arXivDaily arXiv每日学术速递 周一至周五更新
重置
2604.26951 2026-04-30 cs.CL cs.AI cs.LG

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan

Comments 15 pages, 3 figures. Code: https://github.com/PKU-YuanGroup/TIDE

详情
英文摘要

Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge transfer, in which the teacher and student differ in architecture, attention mechanism, and tokenizer. We present TIDE, the first framework for cross-architecture dLLM distillation, comprising three modular components: (1) TIDAL, which jointly modulates distillation strength across training progress and diffusion timestep to account for the teacher's noise-dependent reliability; (2) CompDemo, which enriches the teacher's context via complementary mask splitting to improve predictions under heavy masking; and (3) Reverse CALM, a cross-tokenizer objective that inverts chunk-level likelihood matching, yielding bounded gradients and dual-end noise filtering. Distilling 8B dense and 16B MoE teachers into a 0.6B student via two heterogeneous pipelines outperforms the baseline by an average of 1.53 points across eight benchmarks, yielding notable gains in code generation, where HumanEval scores reach 48.78 compared to 32.3 for the AR baseline.

2604.26946 2026-04-30 cs.CV cs.RO

Three-Step Nav: A Hierarchical Global-Local Planner for Zero-Shot Vision-and-Language Navigation

Wanrong Zheng, Yunhao Ge, Laurent Itti

Comments Accepted to AISTATS 2026. Code: https://github.com/ZoeyZheng0/3-step-Nav

详情
Journal ref
Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS), 2026
英文摘要

Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-and-Language Navigation (VLN) agents powered by MLLMs still tend to drift off course, halt prematurely, and achieve low overall success rates. We propose Three-Step Nav to counteract these failures with a three-view protocol: First, "look forward" to extract global landmarks and sketch a coarse plan. Then, "look now" to align the current visual observation with the next sub-goal for fine-grained guidance. Finally, "look backward" audits the entire trajectory to correct accumulated drift before stopping. Requiring no gradient updates or task-specific fine-tuning, our planner drops into existing VLN pipelines with minimal overhead. Three-Step Nav achieves state-of-the-art zero-shot performance on the R2R-CE and RxR-CE dataset. Our code is available at https://github.com/ZoeyZheng0/3-step-Nav.

2604.26944 2026-04-30 math.CA cs.SC

Fractions of Recurrence Operators for Generalized Fourier Series in Classical Orthogonal Polynomials

Alexandre Benoit, Nicolas Brisebarre, Bruno Salvy

详情
英文摘要

We consider series expansions in bases of classical orthogonal polynomials. When such a series solves a linear differential equation with polynomial coefficients, its coefficients satisfy a linear recurrence equation. We interpret this equation as the numerator of a fraction of linear recurrence operators. This interpretation lets us give a simple and unified view of previous algorithms computing these recurrences, with a noncommutative Euclidean algorithm as the algorithmic engine. Finally, we demonstrate the effectiveness of our approach on various examples.

2604.26943 2026-04-30 cs.CV

ProcFunc: Function-Oriented Abstractions for Procedural 3D Generation in Python

Alexander Raistrick, Karhan Kayan, Jack Nugent, David Yan, Lingjie Mei, Meenal Parakh, Hongyu Wen, Dylan Li, Yiming Zuo, Erich Liang, Jia Deng

详情
英文摘要

We introduce ProcFunc, a library for Blender-based procedural 3D generation in Python. ProcFunc provides a library of easy-to-use Python functions, which streamline creating, combining, analyzing, and executing procedural generation code. ProcFunc makes it easy to create large-scale diverse training data, by combinatorial compositions of semantic components. VLMs can use ProcFunc to edit procedural material and geometry code and can create new procedural code with significantly fewer coding errors. Finally, as an example use case, we use ProcFunc to develop a new procedural generator of indoor rooms, which includes a collection of new compositional procedural materials. We demonstrate the detail, runtime efficiency, and diversity of this room generator, as well as its use for 3D synthetic data generation. Please visit https://github.com/princeton-vl/procfunc for source code.

2604.26942 2026-04-30 cs.LG math.ST q-bio.GN stat.ME stat.ML stat.TH

Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport

Shayan Hundrieser, Insung Kong, Johannes Schmidt-Hieber

Comments 65 pages, 13 figures, the first two authors contributed equally

详情
英文摘要

We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the principles of Maxout networks with input convex neural networks (ICNNs) to create a neural network that is always convex in the input, theoretically capable of leveraging depth, and performs reliable when trained at scale compared to ICNNs. Concretely, we prove that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions up to a given precision. Throughout a series of synthetic experiments, we demonstrate that HyCNNs outperform existing ICNNs and MLPs in terms of predictive performance for convex regression and interpolation tasks. We further apply HyCNNs to learn high-dimensional optimal transport maps for synthetic examples and for single-cell RNA sequencing data, where they oftentimes outperform ICNN-based neural optimal transport methods and other baselines across a wide range of settings.

2604.26939 2026-04-30 math.PR cs.SI q-bio.PE

Degree-dependent and distance-dependent contact rates interpolate between explosive, exponential and polynomial epidemic growth

Zylan Benjert, Júlia Komjáthy, Johannes Lengler, John Lapinskas, Ulysse Schaller

详情
英文摘要

It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rates. These have been found to range from exponential to both faster super-exponential curves and slower subexponential or polynomial curves. Previous research has lacked a unified explanatory framework capable of accommodating super-exponential, (stretched) exponential, and polynomial growth patterns within the same contact network. In this paper we propose a simple agent-based network model that can capture all these phases. We provide such a framework by modelling how transmission rates depend on spatial distance and on individuals' numbers of contacts. By comparing the growth rate of spreading processes with or without degree-dependent and/or distance-dependent contact rates through data-driven and synthetic simulations on real and modelled networks with underlying geometry, we find evidence that even a 'sublinear presence' of these causes may cause a significant slow down of the growth rate on the same underlying network. We find that the growth rate is governed by a combination of three factors: geometry, the prevalence of weak ties, and superspreaders. We confirm our results with rigorous proofs in a theoretical model, using a spatial multiscale-argument in long-range heterogeneous first passage percolation. Our results give a plausible explanation of why the consecutive waves of a single pandemic can differ in their growth even if their spreading mechanisms are similar.

2604.26935 2026-04-30 cs.HC

Artistic Practice Opportunities in CST Evaluations: A Longitudinal Group Deployment of ArtKrit

Catherine Liu, Tao Long, Asya Vaisberg, Chau Vu, Jiaju Ma, Jingyi Li

Comments 17 pages, 8 figures. Accepted to DIS 2026

详情
英文摘要

Creativity support tools (CSTs) aim to elevate the quality of artists' creative processes and artifacts. Yet most current CST evaluations overlook temporal and social aspects of tool use. To address this gap, we present a longitudinal, group-based CST evaluation through a three-week deployment of ArtKrit, a computational drawing tool that supports disciplined drawing. Nine digital artists, organized into three communities of practice, completed weekly "master studies" alongside a researcher-artist. Our results show users' evolving relationships with ArtKrit over time - from early experimentation to selective incorporation or misuse - alongside changes in their ways of artistic seeing. These changes unfolded within artist support networks that fostered confidence and creative safety, and validated individual expression. Overall, our findings suggest that CST evaluations can - and should - be designed as opportunities for meaningful artistic engagement rather than purely extractive measurement exercises. We contribute this longitudinal, group-based approach as one CST evaluation method.

2604.26934 2026-04-30 cs.CV

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning

Wanyue Zhang, Wenxiang Wu, Wang Xu, Jiaxin Luo, Helu Zhi, Yibin Huang, Shuo Ren, Zitao Liu, Jiajun Zhang

Comments The code is available at https://github.com/WanyueZhang-ai/World2VLM. The dataset is available at https://huggingface.co/datasets/WanyueZhang/World2VLM

详情
英文摘要

Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by coupling VLMs with world models at inference time. However, the former often lacks explicit modeling of motion-conditioned state transitions, while the latter incurs substantial computational overhead. In this work, we propose World2VLM, a training framework that distills spatial imagination from a generative world model into a vision-language model. Given an initial observation and a parameterized camera trajectory, we use a view-consistent world model to synthesize geometrically aligned future views and derive structured supervision for both forward (action-to-outcome) and inverse (outcome-to-action) spatial reasoning. We post-train the VLM with a two-stage recipe on a compact dataset generated by this pipeline and evaluate it on multiple spatial reasoning benchmarks. World2VLM delivers consistent improvements over the base model across diverse benchmarks, including SAT-Real, SAT-Synthesized, VSI-Bench, and MindCube. It also outperforms the test-time world-model-coupled methods while eliminating the need for expensive inference-time generation. Our results suggest that world models can serve not only as inference-time tools, but also as effective training-time teachers, enabling VLMs to internalize spatial imagination in a scalable and efficient manner.

2604.26932 2026-04-30 math.OC cs.LG

Learning Over-Relaxation Policies for ADMM with Convergence Guarantees

Junan Lin, Paul J. Goulart, Luca Furieri

详情
英文摘要

The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimization problems with fixed structure and changing parameter values, we propose learning online updates of the relaxation parameter to improve performance on problem classes of interest. This choice is computationally attractive in OSQP-like architectures, since adapting relaxation does not trigger the matrix refactorizations associated with penalty updates. We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions, and show on benchmark quadratic programs that the resulting learned policies improve both iteration count and wall-clock time over baseline OSQP.

2604.26931 2026-04-30 cs.DC

Adaptive Self-Organization in Anonymous Dynamic Networks

Garrett Parzych, Joshua J. Daymude

Comments 30 pages, 1 figure, 1 table, 1 algorithm. To appear as a brief announcement at SAND 2026

详情
英文摘要

We introduce the problem of adaptive self-organization in which the nodes of an anonymous, synchronous dynamic network must distributively change the collective distribution of their responses (or "colors") as a function of time-varying environmental signals, even when these signals are only perceived locally and the network topology changes adversarially. Specifically, a signal adversary may change the type of signal and which node(s) witness that signal arbitrarily between rounds. If a signal (or lack thereof) $s$ persists in the system for sufficiently long, the dynamic network must stabilize such that nodes' colors reach and remain in a distribution closely approximating $r(s)$, a goal distribution defined by the problem instance. We first prove that if nodes are deterministic, the only solvable instances of adaptive-self organization are those with homogeneous goal distributions, i.e., those where all nodes must stabilize with the same color. We then present a linear-time, logarithmic-memory, deterministic algorithm for this subclass of instances that works even when the multiplicity and location of signal witnesses change arbitrarily. When nodes know $n$, the number of nodes in the network, a small adaptation of this algorithm achieves a stronger convergence property in which adversarial edge and signal dynamics are entirely unable to disturb stabilized configurations. Finally, we present a randomized extension of these algorithms that solves arbitrary (i.e., not necessarily homogeneous) instances of adaptive self-organization with high probability when nodes know the goal distributions.

2604.26926 2026-04-30 cs.LG math.OC stat.ML

A Note on How to Remove the $\ln\ln T$ Term from the Squint Bound

Francesco Orabona

详情
英文摘要

In Orabona and Pál [2016], we introduced the shifted KT potentials, to remove the $\ln \ln T$ factor in the parameter-free learning with expert bound. In this short technical note, I show that this is equivalent to changing the prior in the Krichevsky--Trofimov algorithm. Then, I show how to use the same idea to remove the $\ln \ln T$ factor in the data-independent bound for the Squint algorithm.

2604.26923 2026-04-30 cs.SE cs.CL

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu

Comments Accepted to AIware 2026. Code and data available at https://github.com/ian-Kappa/ClassEval-Pro

详情
英文摘要

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations are either confined to isolated functions or rely on manually curated class-level tasks that are expensive to scale and increasingly susceptible to data contamination. We introduce ClassEval-Pro, a benchmark of 300 class-level tasks spanning 11 domains, constructed through an automated three-stage pipeline that combines complexity enhancement, cross-domain class composition, and integration of real-world GitHub code contributed after January 2025. Every task is validated by an LLM Judge Ensemble and must pass test suites with over 90% line coverage. We evaluate five frontier LLMs under five generation strategies. The best model achieves only 45.6% class-level Pass@1, with a 17.7-point gap between the strongest and weakest models, confirming the benchmark's discriminative power. Strategy choice strongly interacts with model capability: structured approaches such as bottom-up improve weaker models by up to 9.4 percentage points, while compositional generation collapses to as low as 1.3%. Error analysis over 500 manually annotated failures reveals that logic errors (56.2%) and dependency errors (38.0%) dominate, identifying cross-method coordination as the core bottleneck.

2604.26922 2026-04-30 cs.LG cs.DS cs.GT stat.ML

On the Learning Curves of Revenue Maximization

Steve Hanneke, Alkis Kalavasis, Shay Moran, Grigoris Velegkas

Comments To appear in the 58th ACM Symposium on Theory of Computing (STOC 2026)

详情
英文摘要

Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitative measure of its generalization ability. Formally, a learning curve plots the decay of an algorithm's error for a fixed underlying distribution as a function of the number of training samples. Prior work on revenue-maximizing learning algorithms, starting with the seminal work of Cole and Roughgarden [STOC, 2014], adopts a distribution-free perspective, which parallels the PAC learning framework in learning theory. This approach evaluates performance against the hardest possible sequence of valuation distributions, one for each sample size, effectively defining the upper envelope of learning curves over all possible distributions, thus leading to error bounds that do not capture the shape of the learning curves. In this work we initiate the study of learning curves for revenue maximization and provide a near-complete characterization of their rate of decay in the basic setting of a single item and a single buyer. In the absence of any restriction on the valuation distribution, we show that there exists a Bayes-consistent algorithm, meaning that its learning curve converges to zero for any arbitrary valuation distribution as the number of samples $n \to \infty$. However, this convergence must be arbitrarily slow, even if the optimal revenue is finite. In contrast, if the optimal revenue is achieved by a finite price, then the optimal rate of decay is roughly $1/\sqrt{n}$. Finally, for distributions supported on discrete sets of values, we show that learning curves decay almost exponentially fast, a rate unattainable under the PAC framework.

2604.26921 2026-04-30 quant-ph cs.CC

En Route to a Standard QMA1 vs. QCMA Oracle Separation

David Miloschewsky, Supartha Podder, Dorian Rudolph

Comments 25 pages

详情
英文摘要

We study the power of quantum witnesses under perfect completeness. We construct a classical oracle relative to which a language lies in $\mathsf{QMA}_1$ but not in $\mathsf{QCMA}$ when the $\mathsf{QCMA}$ verifier is only allowed polynomially many adaptive rounds and exponentially many parallel queries per round. Additionally, we derandomize the permutation-oracle separation of Fefferman and Kimmel, obtaining an in-place oracle separation between $\mathsf{QMA}_1$ and $\mathsf{QCMA}$. Furthermore, we focus on $\mathsf{QCMA}$ and $\mathsf{QMA}$ with an exponentially small gap, where we show a separation assuming the gap is fixed, but not when it may be arbitrarily small. Finally, we derive consequences for approximate ground-state preparation from sparse Hamiltonian oracle access, including a bounded-adaptivity frustration-free variant.

2604.26920 2026-04-30 cs.CV

Color-Encoded Illumination for High-Speed Volumetric Scene Reconstruction

David Novikov, Eilon Vaknin, Narek Tumanyan, Mark Sheinin

Comments accepted to IEEE CVPR 2026 as a highlight

详情
英文摘要

The task of capturing and rendering 3D dynamic scenes from 2D images has become increasingly popular in recent years. However, most conventional cameras are bandwidth-limited to 30-60 FPS, restricting these methods to static or slowly evolving scenes. While overcoming bandwidth limitations is difficult for general scenes, recent years have seen a flurry of computational imaging methods that yield high-speed videos using conventional cameras for specific applications (e.g., motion capture and particle image velocimetry). However, most of these methods require modifications to a camera's optics or the addition of mechanically moving components, limiting them to a single-view high-speed capture. Consequently, these methods cannot be readily used to capture a 3D representation of rapid scene motion. In this paper, we propose a novel method to capture and reconstruct a volumetric representation of a high-speed scene using only unaugmented low-speed cameras. Instead of modifying the hardware or optics of each individual camera, we encode high-speed scene dynamics by illuminating the scene with a rapid, sequential color-coded sequence. This results in simultaneous multi-view capture of the scene, where high-speed temporal information is encoded in the spatial intensity and color variations of the captured images. To construct a high-speed volumetric representation of the dynamic scene, we develop a novel dynamic Gaussian Splatting-based approach that decodes the temporal information from the images. We evaluate our approach on simulated scenes and real-world experiments using a multi-camera imaging setup, showing first-of-a-kind high-speed volumetric scene reconstructions.

2604.26919 2026-04-30 cs.LG cs.AI cs.NE

Causal Learning with Neural Assemblies

Evangelia Kopadi, Dimitris Kalles

Comments 8 pages, 11 figures

详情
英文摘要

Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize causal directionality. We demonstrate that the inherent operations of neural assemblies -- projection, local plasticity control, and sparse winner selection -- are sufficient for directional learning. We introduce DIRECT (DIRectional Edge Coupling/Training), a mechanism that co-activates source and target assemblies under an adaptive gain schedule to internalize directed relations. Unlike backpropagation-based methods, DIRECT relies solely on local plasticity, making the resulting causal claims auditable at the mechanism level. Our findings are verified through a dual-readout validation strategy: (i) synaptic-strength asymmetry, measuring the emergent weight gap between forward and reverse links, and (ii) functional propagation overlap, quantifying the reliability of directional signal flow. Across multiple domains, the framework achieves perfect structural recovery under a supervised, known-structure setting. These results establish neural assemblies as an auditable bridge between biologically plausible dynamics and formal causal models, offering an "explainable by design" framework where causal claims are traceable to specific neural winners and synaptic asymmetries.

2604.26917 2026-04-30 cs.CV

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai

Comments 14 pages, TPAMI submission, code url: https://github.com/JarrentWu1031/AnimateAnyMesh-pp

详情
英文摘要

Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the complexity of modeling spatio-temporal distributions and the scarcity of 4D training data. We present AnimateAnyMesh++, a feed-forward framework for text-driven animation of arbitrary 3D meshes with substantial upgrades in data, architecture, and generative capability. First, we expand the DyMesh-XL dataset by mining dynamic content from Objaverse-XL, increasing the number of unique identities from 60K to 300K and substantially broadening category and motion diversity. Second, we redesign DyMeshVAE-Flex with power-law topology-aware attention and vertex-normal enhanced features, which significantly improves trajectory reconstruction, local geometry preservation, and mitigates trajectory-sticking artifacts. Third, we introduce architectural changes to both DyMeshVAE-Flex and the rectified-flow (RF) generator to support variable-length sequence training and generation, enabling longer animations while preserving reconstruction fidelity. Extensive experiments demonstrate that AnimateAnyMesh++ generates semantically accurate and temporally coherent mesh animations within seconds, surpassing prior approaches in quality and efficiency. The enlarged DyMesh-XL, the upgraded DyMeshVAE-Flex, and variable-length RF together deliver consistent gains across benchmarks and in-the-wild meshes. We will release code, models, and the expanded DyMesh-XL upon acceptance of this manuscript to facilitate research in 4D content creation.

2604.26913 2026-04-30 math.OC cs.NA math.NA math.PR

Generalization of Zeroth-Order Method for Quotients of Quadratic Functions

Jonas Bresch

详情
英文摘要

Optimization of quadratic functions and the quotient of those are relevant in subspace and iterative optimization methods. In this paper, the calculation of the generalized operator norm and extremal generalized Rayleigh quotient is considered. In contrast to recent works an unconstrained sampling approach on the entire sphere for the random search direction in each iteration is proposed. Furthermore, the link to zeroth-order methods for Riemannian first- and second-order optimization methods is provided in the sense that the Riemannian gradient and Hessian are estimated by the specific surrogates. Even though the tangent space is not used in this construction the optimal step size problem can be computed in a closed form. The subproblems of this and recent works are illuminated in the context of sub-generalized Rayleigh quotient problems on specific Gram matrices. Together the achieved theory allows to construct an accelerated algorithm which shows state-of-the-art behavior and outperforms recent works.

2604.26910 2026-04-30 cs.RO

Bi-Level Optimization for Contact and Motion Planning in Rope-Assisted Legged Robots

Ruben Malacarne, Ioannis Tsikelis, Enrico Mingo Hoffman, Michele Focchi

详情
英文摘要

This paper presents a planning pipeline framework for locomotion in rope-assisted robots climbing vertical surfaces. The proposed framework is formulated as a bi-level optimization scheme that addresses a mixed-integer problem: selecting feasible terrain regions for landing while simultaneously optimizing the control inputs, namely rope tensions and leg forces, and landing location. The outer level of the optimization is solved using the Cross-Entropy Method, while the inner level relies on gradient-based nonlinear optimization to compute dynamically feasible motions. The approach is validated on a novel climbing robot platform, ALPINE, across a variety of challenging terrain configurations.

2604.26903 2026-04-30 eess.SP cs.AI cs.AR cs.ET cs.SY eess.SY

Recent Advances in mm-Wave and Sub-THz/THz Oscillators for FutureG Technologies

Baktash Behmanesh, Ahmad Rezvanitabar

详情
英文摘要

This paper provides a concise yet comprehensive review of recent advancements in millimeter-wave (mm-wave) oscillators below 100 GHz and sub-terahertz (sub-THz/THz) oscillators above 100 GHz for next-generation computing and communication systems, including 5G, 6G, and beyond. Various design approaches, including CMOS, SiGe, and III-V semiconductor technologies, are explored in terms of performance metrics such as phase noise, output power, efficiency, frequency tunability, and stability. The review highlights key challenges in achieving high-performance and reliable oscillator designs while discussing emerging techniques for performance enhancement. By evaluating recent design trends, this work aims to offer valuable insights and design guidelines that facilitate the development of robust mm-wave and sub-THz/THz oscillators for future communication, computing, and sensing applications.

2604.26900 2026-04-30 quant-ph cs.CC cs.DS

Strict Hierarchy for Quantum Channel Certification to Unitary

Kean Chen, Qisheng Wang, Zhicheng Zhang

Comments 13 pages, 3 algorithms

详情
英文摘要

We consider the problem of quantum channel certification to unitary, where one is given access to an unknown $d$-dimensional channel $\mathcal{E}$, and wants to test whether $\mathcal{E}$ is equal to a target unitary channel or is $\varepsilon$-far from it in the diamond norm. We present optimal quantum algorithms for this problem, settling the query complexities in three access models with increasing power. Specifically, we show that: (i) $Θ(d/\varepsilon^2)$ queries suffice for incoherent access model, matching the lower bound due to Fawzi, Flammarion, Garivier, and Oufkir (COLT 2023). (ii) $Θ(d/\varepsilon)$ queries suffice for coherent access model, matching the lower bound due to Regev and Schiff (ICALP 2008). (iii) $Θ(\sqrt{d}/\varepsilon)$ queries suffice for source-code access model, matching the lower bound due to Jeon and Oh (npj Quantum Inf. 2026). This demonstrates a strict hierarchy of complexities for quantum channel certification to unitary across various access models.

2604.26899 2026-04-30 eess.SY cs.RO cs.SY

Safe Navigation using Neural Radiance Fields via Reachable Sets

Omanshu Thapliyal, Malarvizhi Sankaranarayanasamy, Ravigopal Vennelakanti

Comments 5 pages, 8 figures, 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)

详情
英文摘要

Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots navigating through obstacle ridden scenarios need to be able to navigate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable set representations of the robot's real-time capabilities in the state space can be utilized to capture safe navigation requirements. While neural radiance fields (NeRFs) are utilized to compute, store, and manipulate the volumetric representations of the obstacles, or ego vehicle, as needed. Constrained optimal control is employed to represent the resulting path planning problem, involving linear matrix inequality constraints. We present simulation results for path planning in the presence of numerous obstacles in two different scenarios. Safe navigation is demonstrated through using reachable sets in the corresponding constrained optimal control problems.

2604.26898 2026-04-30 math.PR cs.LG stat.ML

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

Andrea Agazzi, Giuseppe Bruno, Eloy Mosig García, Samuele Saviozzi, Marco Romito

Comments 55 pages, 6 figures

详情
英文摘要

We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. We finally characterize the activation functions satisfying the former condition.

2604.26897 2026-04-30 cs.RO cs.SY eess.SY

Stochastic Entanglement of Deterministic Origami Tentacles For Universal Robotic Gripping

Alec Boron, Bokun Zheng, Ziyang Zhou, Noel Naughton, Suyi Li

详情
英文摘要

Origami-inspired robotic grippers have shown promising potential for object manipulation tasks due to their compact volume and mechanical flexibility. However, robust capture of objects with random shapes in dynamic working environments often comes at the cost of additional actuation channels and control complexity. Here, we introduce a tendon-driven origami tentacle gripper capable of universal object gripping by exploiting a synergy between local, deterministic deformation programming and global, stochastic entanglements. Each origami tentacle is made by cutting thin Mylar sheets; It features carefully placed holes for routing an actuation tendon, origami creases for controlling the deformation, and a tapered shape. By tailoring these design features, one can prescribe the shrinking, bending, and twisting deformation, eventually creating deterministic coiling with a simple tendon pull. Then, when multiple coiling tentacles are placed in proximity, stochastic entanglement emerges, allowing the tentacles to braid, knot, and grip objects with random shapes. We derived a simulation model by integrating origami mechanics with Cosserat rods to correlate origami design, tendon deformation, and their collective gripping performance. Then, we experimentally tested how these coiling and entangling origami tentacles can grasp objects under gravity and in water. A stow-and-release deployment mechanism was also tested to simulate in-orbit grasping. Overall, the entertaining origami tentacle gripper presents a new strategy for robust object grasping with simple design and actuation.

2604.26896 2026-04-30 math.NA cs.NA physics.flu-dyn

Data assimilation for slightly compressible flow

Aytekin Çıbık, Rui Fang

详情
英文摘要

Continuous data assimilation (CDA) nudges observational data into governing equations to recover the underlying flow and improve predictions. Existing rigorous CDA analyses focus primarily on incompressible flows, yet no physical flow is perfectly incompressible. Approximating a slightly compressible flow with an incompressible model introduces non-negligible model errors. Data assimilation for compressible flows remains challenging due to strong nonlinearities and the presence of shocks. We design an algorithm that addresses the limitations of velocity-only nudging for slightly compressible flow. This work incorporates both velocity and pressure data from the slightly compressible flow and nudges both quantities into the incompressible Navier--Stokes equations. Our analysis shows that the model error decays exponentially in the initial error, with an asymptotic residual of order $\mathcal{O}(H)$, where H denotes the observation resolution. The analysis also identifies a scaling for the pressure nudging parameter $μ_1 = O(1/H^2)$ that ensures effective assimilation. We validate the theoretical results through a suite of numerical experiments: a convergence study confirming optimal rates, a modified Taylor--Green vortex benchmark demonstrating synchronization of energy, enstrophy, and pressure, and an acoustic wave propagation test that isolates the role of pressure nudging and achieves a $97.9\%$ reduction in pressure error relative to velocity-only assimilation. Together, these results provide a foundation for discrete error estimates and realistic compressible applications.

2604.26893 2026-04-30 cs.CV

Graph-based Semantic Calibration Network for Unaligned UAV RGBT Image Semantic Segmentation and A Large-scale Benchmark

Fangqiang Fan, Zhicheng Zhao, Xiaoliang Ma, Chenglong Li, Jin Tang

Comments 13 pages,13 figures

详情
英文摘要

Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentation faces two coupled challenges: cross-modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine-grained ground objects under top-down aerial views. To address these issues, we propose a Graph-based Semantic Calibration Network (GSCNet) for unaligned UAV RGBT image semantic segmentation. Specifically, we design a Feature Decoupling and Alignment Module (FDAM) that decouples each modality into shared structural and private perceptual components and performs deformable alignment in the shared subspace, enabling robust spatial correction with reduced modality appearance interference. Moreover, we propose a Semantic Graph Calibration Module (SGCM) that explicitly encodes the hierarchical taxonomy and co-occurrence regularities among ground-object categories in UAV scenes into a structured category graph, and incorporates these priors into graph-attention reasoning to calibrate predictions of visually similar and rare categories.In addition, we construct the Unaligned RGB-Thermal Fine-grained (URTF) benchmark, to the best of our knowledge, the largest and most fine-grained benchmark for unaligned UAV RGBT image semantic segmentation, containing over 25,000 image pairs across 61 categories with realistic cross-modal misalignment. Extensive experiments on URTF demonstrate that GSCNet significantly outperforms state-of-the-art methods, with notable gains on fine-grained categories. The dataset is available at https://github.com/mmic-lcl/Datasets-and-benchmark-code.

2604.26892 2026-04-30 cs.SE

Hot Fixing in the Wild

Carol Hanna, Karine Even-Mendoza, W. B. Langdon, Mar Zamorano López, Justyna Petke, Federica Sarro

详情
英文摘要

Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analyse hot fixes present in over 61,000 GitHub repositories from the Hao-Li/AIDev dataset and find consistent patterns of urgency: reduced collaboration (typically a single contributor), smaller and more targeted changes (median 2-3 commits and files, with <10 line modifications), limited review (often fewer than two reviewers), and substantially fewer test file modifications than regular bug fixes, consistent with their urgency-driven character. Leveraging the same urgency contexts, we examine differences between human- and AI-agent-authored hot fixes, revealing over 10 distinct repair behaviours, thus offering insights into future human-automation collaboration for hot fixing. Our study is the first to empirically analyse hot fix code changes at scale using a repository-level operationalisation of urgency. The comparison of human and agentbehaviours delineates their distinct characteristics, providing a foundation for understanding hot fixing in real-world practice

2604.26889 2026-04-30 cs.PF

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

Yuang Yan, Ian Karlin, Ryan Grant

详情
英文摘要

For NVIDIA GPUs, CUDA is the primary interface through which applications orchestrate GPU execution, yet much of the logic that realizes CUDA operations resides in NVIDIA's closed-source userspace driver. As a result, the translation from high-level CUDA APIs to low-level hardware commands remains opaque, limiting both software understanding and performance attribution. This paper makes that command path visible. We recover the hardware command streams emitted by NVIDIA's closed-source userspace driver with full integrity by leveraging the recently open-sourced kernel driver, instrumenting the memory-mapping path, and installing a hardware watchpoint on the userspace mapping of the GPU doorbell register. This lets us capture complete command submissions at the moment they are committed. Using this methodology, we present two case studies. For CUDA data movement, we identify the DMA submission modes selected by the driver and characterize their raw hardware performance independently of driver overhead through CUDA-bypassing controlled command issuance. For CUDA Graphs, we show that the reduced launch overhead in newer CUDA releases is associated with a smaller command footprint and a more efficient submission pattern. Together, these results show that command-level visibility provides a practical basis for understanding and optimizing GPU middleware behavior, improving performance interpretation, and informing future hardware--software co-design for CUDA and related accelerator stacks.

2604.26888 2026-04-30 cs.LG

Multiple Additive Neural Networks for Structured and Unstructured Data

Janis Mohr, Jörg Frochte

Comments Accepted author manuscript; page layout differs from the published Springer version

详情
Journal ref
In: Bäck, T. et al. (eds.), Computational Intelligence, IJCCI 2023, Studies in Computational Intelligence, vol. 1196, Springer, Cham, pp. 165-186 (2025)
英文摘要

This paper extends and explains the Multiple Additive Neural Networks (MANN) methodology, an enhancement to the traditional Gradient Boosting framework, utilizing nearly shallow neural networks instead of decision trees as base learners. This innovative approach leverages neural network architectures, notably Convolutional Neural Networks (CNNs) and Capsule Neural Networks, to extend its application to both structured data and unstructured data such as images and audio. For structured data the advantages of capsule neural networks as feature extractors are used and combined with MANN as a classifier. MANN's unique architecture promotes continuous learning and integrates advanced heuristics to combat overfitting, ensuring robustness and reducing sensitivity to hyperparameter settings like learning rate and iterations. Our empirical studies reveal that MANN surpasses traditional methods such as Extreme Gradient Boosting (XGB) in accuracy across well-known datasets. This research demonstrates MANN's superior precision and generalizability, making it a versatile tool for diverse data types and complex learning environments.

2604.26883 2026-04-30 cs.CV

SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset

Changhyun Roh, Yonghyun Jeong, Jonghyun Lee, Chanho Eom, Jihyong Oh

Comments The last two authors are co-corresponding authors. Please visit our project page at https://cmlab-korea.github.io/SEAL

详情
英文摘要

Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing \textit{visual entanglement}, where background artifacts are absorbed into the learned concept, and \textit{structural rigidity}, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce \textbf{SE}mantic-aware single-image sticker person\textbf{AL}ization (\textbf{SEAL}), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based diffusion backbones. SEAL applies three components during embedding adaptation: (1) a Semantic-guided Spatial Attention Loss, (2) a Split-merge Token Strategy, and (3) Structure-aware Layer Restriction. To support sticker-domain personalization with attribute-level control, we present StickerBench, a large-scale sticker image dataset with structured tags under a six-attribute schema (Appearance, Emotion, Action, Camera Composition, Style, Background). These annotations provide a consistent interface for varying context while keeping target identity fixed, enabling systematic evaluation of identity disentanglement and contextual controllability. Experiments show that SEAL consistently improves identity preservation while maintaining contextual controllability, highlighting the importance of explicit spatial and structural constraints during test-time adaptation. The code, StickerBench, and project page will be publicly released.