arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 2088
2605.05219 2026-05-08 cs.LG cs.AI

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

Mikhail Shirokikh, Sergey Nikolenko

详情
英文摘要

Prefix caching is a key latency optimization for autoregressive LLM serving, yet existing systems assume dense per-token key/value reuse. State-space models change the structure of the problem: a recurrent layer can resume from a single stored state rather than requiring the entire token history. This asymmetry opens a new design point between no reuse and dense caching: store exact recurrent states at a sparse set of checkpoint positions and, on a cache hit, resume from the deepest stored checkpoint and recompute the remaining suffix exactly. We formalize sparse prefix caching as checkpoint placement under a distribution over overlap depths, yielding an exact O(NM) dynamic program. For use cases where requests share a non-trivial prefix (e.g. asking different questions about a single long document), we show that our method consistently improves the Pareto frontier traced by standard heuristics on real-world data. Across QuALITY and System Prompts, distribution-aware placement dominates every fixed-budget baseline on the measured layer-group Pareto frontier and matches or outperforms the strongest heuristic (block caching) while typically using substantially fewer checkpoints, with the largest gains at low checkpoint budgets where the overlap distribution is most non-uniform. The method is most relevant when many requests share a substantial but not identical prefix within a retained cache entry. It preserves exact outputs, does not change the recurrent computation itself or require new recurrent update kernels, applies to recurrent/SSM layers whose hidden state can be extracted and restored exactly, and for hybrid models can be combined with existing KV-cache compression techniques.

2605.05218 2026-05-08 cs.LG cs.AI math.DS nlin.CD

Horizon-Constrained Rashomon Sets for Chaotic Forecasting

Gauri Kale, Rahul Vishwakarma, Holly Diamond, Ava Hedayatipour, Amin Rezaei

详情
Journal ref
AIP Advances 16, 045208 (2026)
英文摘要

Predictive multiplicity and chaotic dynamics represent two fundamental challenges in machine learning that have evolved independently despite their conceptual connections. We bridge this gap by introducing horizon-constrained Rashomon sets, a theoretical framework that characterizes how model multiplicity evolves with prediction horizon in chaotic systems. Unlike static prediction tasks where the Rashomon set remains fixed, chaos induces exponential divergence among initially similar models, fundamentally transforming the nature of predictive equivalence. We prove that the effective Rashomon set contracts exponentially with lead time at a rate determined by the maximum Lyapunov exponent and introduce Lyapunov-weighted metrics that provide tighter bounds on predictive disagreement. Leveraging these insights, we develop decision-aligned selection algorithms that choose among near-optimal models based on downstream utility rather than forecast accuracy alone. Extensive experiments on synthetic chaotic systems (Lorenz-96, Kuramoto-Sivashinsky) and real-world applications (wind power, traffic, weather) demonstrate that our framework improves decision quality by 18-34\% while maintaining competitive predictive performance. This work establishes the first rigorous connection between chaos theory and predictive multiplicity, providing principled guidance for deploying machine learning in safety-critical chaotic domains.

2605.05217 2026-05-08 cs.LG cs.AI

Physics-Informed Neural Networks with Learnable Loss Balancing and Transfer Learning

Reza Pirayeshshirazinezhad

详情
英文摘要

We propose a self-supervised physics-informed neural network (PINN) framework that adaptively balances physics-based and data-driven supervision for scientific machine learning under data scarcity. Unlike prior PINNs that rely on fixed or heuristic weighting of physics residuals and data loss, our approach introduces a learnable blending neuron that dynamically adjusts the relative contribution of each term based on their uncertainties. This mechanism enables stable training and improved generalization without manual tuning. To further enhance efficiency, we integrate a transfer learning strategy that reuses representations from related domains and adapts them to new physical systems with limited data. We validate the framework for the prediction of heat transfer in liquid-metal miniature heat sinks using only 87 CFD datapoints, where the adaptive PINN achieves an error <8%, outperforming shallow neural networks, kernel methods, and physics-only baselines. Our framework provides a general recipe for embedding physics adaptively into neural networks, offering a robust and reproducible approach for data-scarce problems across various scientific domains, including fluid dynamics and material modeling.

2605.05216 2026-05-08 cs.LG

SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

Yi Xie, Yangyang Xu, Yi Fan, Bo Liu

Comments Published at AAMAS 2026

详情
英文摘要

Large language models (LLMs) with a large number of parameters achieve strong performance but are often prohibitively expensive to deploy. Recent work explores using teams of smaller, more efficient LLMs that collectively match or even outperform a single large model. However, jointly updating multiple agents introduces compounding distribution shifts, making coordination and stability during training difficult. We address this by introducing Sequential Agent Tuning (SAT), a coordinator-free training paradigm. SAT represents the team as a factorized policy and employs block-coordinate updates over agents, enabling scalable, decentralized training without a central controller. Specifically, we develop a sequence-aware, on-policy advantage estimator that conditions on the evolving team policy, coupled with per-agent KL trust regions that isolate occupancy drift. Theoretically, this framework provides two critical guarantees. First, it ensures monotonic improvement, stabilizing the training process. Second, it establishes provable plug-and-play invariance: any agent can be upgraded to a stronger model without retraining the rest of the team, with a formal guarantee that the performance bound improves. Empirically, a team of three 4B agents (12B total) trained with SAT surpasses the much larger Qwen3-32B on AIME24/25 benchmarks by 3.9\% on average. We validate our plug-and-play theory by swapping in two 8B agents, which boosts the composite score by 10.4\%. We provide code and appendix of proof at https://github.com/Yydc/SAT-AAMAS

2605.05215 2026-05-08 cs.CV cs.AI cs.LG

Layout-Aware Representation Learning for Open-Set ID Fraud Discovery

Jinxing Li, Nicholas Ren, Cathy Chang, Hongkai Pan, Daniel George

详情
英文摘要

Identity-document fraud detection is not a stationary binary classification problem. Adaptive attackers modify templates and fabrication pipelines, making historical fraud labels stale, and successful forgeries recur at scale as coherent campaigns. We therefore study layout-aware representation learning for open-set fraud discovery rather than only closed-set classification. We adapt DINOv3 to the document domain via context-aware SimMIM fine-tuning and supervised metric learning with composite loss that encourages inter-class separability and intra-class compactness. The model is trained with U.S. IDs only. With a lightweight MLP and softmax classifier, the embedding achieves 99.83% layout classification accuracy on Canadian layouts. Moreover, on a dataset of 20,448 Canadian IDs, embedding-space analysis surfaces 276 adaptive physical-fraud cases, including 222 not surfaced by incumbent detectors. The embedding supports similarity-based expansion from a single confirmed seed to additional related cases not linked by conventional metadata graphs. The layout-aware document embeddings provide a production-aligned basis for discovering novel and campaign-scale fraud under distribution shift.

2605.05213 2026-05-08 cs.LG q-bio.QM

Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models

Sicong Chang, Yidan Shen, Justina Varghese, Akshay R Prabhakar, Sebastian Guadarrama-Sistos-Vazquez, Jiefu Chen, Masayoshi Takashima, Omar G. Ahmed, Renjie Hu, Xin Fu

Comments Sicong Chang, Yidan Shen are the co-first authors This paper is already accepted to IEEE Engineering in Medicine and Biology Society (EMBC) 2026 conference

详情
英文摘要

Chronic rhinosinusitis (CRS) is a common heterogeneous inflammatory disorder that causes substantial morbidity and healthcare costs. CRS is difficult to identify early from routine encounters, as symptom presentations overlap with common conditions such as allergic rhinitis, and heterogeneous phenotypes further obscure risk patterns. Prior predictive studies often rely on single-institutional cohorts , which reduce population-level generalizability. To overcome this, we leveraged nationwide longitudinal EHR data from the \textit{All of Us} Research Program to predict CRS diagnosis using two years of pre-diagnostic history. To address extreme feature sparsity and dimensionality in coded EHR data, we implemented a hybrid feature-selection pipeline that combines prevalence-based statistical screening with model-based importance ranking, compressing approximately 110,000 candidate codes into 100 interpretable features. To capture demographic heterogeneity, we trained demographic stratified models across six adult sex and life-stage subgroups with subgroup-specific hyperparameter tuning. Our framework achieved an overall AUC of 0.8461, improving discrimination by 0.0168 over the best baseline. These results demonstrate that routinely collected EHR data may support population-representative CRS risk stratification and inform earlier triage and referral prioritization in primary care.

2605.05209 2026-05-08 cs.LG cs.AI

Are Flat Minima an Illusion?

Michael Timothy Bennett

详情
英文摘要

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can inflate the Hessian of any minimum by two orders of magnitude without changing a single prediction. If the geometry of weight space can be manufactured from nothing, it cannot be the cause of anything. In other words, flat is simple and simplicity depends on encoding. Here I show that the actual driver is weakness, the volume of completions compatible with the learned function in the learner's embodied language. Weakness is reparameterisation-invariant because it is defined over what the network \emph{does}, not how it is parameterised. I prove weakness is minimax-optimal under exchangeable demands, and that PAC-Bayes bounds work because they correlate with it. On MNIST, the large-batch generalisation advantage \emph{vanishes} as training data grows, from $+1.6\%$ at $n = 2{,}000$ to $+0.02\%$ at $n = 60{,}000$. A quantity whose predictive power depends on how much data you have is not a cause but a confounder. I run head-to-heads on 100 networks with identical architecture and training. For MNIST weakness predicts generalisation ($ρ= +0.374$, $p = 0.00012$), sharpness anticorrelates ($ρ= -0.226$) and simplicity predicts nothing ($p = 0.848$). For Fashion-MNIST ($ρ= +0.384$, $p = 8.15 \times 10^{-5}$), though simplicity is at least somewhat predictive there. Simplicity is dataset dependent, whereas weakness is invariant. Flat minima were never the answer.

2605.05208 2026-05-08 cs.RO cs.DC math.OC

A GPU-Accelerated Hybrid Method for a Class of Multi-Depot Vehicle Routing Problems

Zhenyu Lei, Jin-Kao Hao

详情
英文摘要

Multi-depot vehicle routing problems (MDVRPs) are prevalent in a variety of practical applications. However, they are computationally challenging to solve due to their inherent complexity. This paper proposes an effective hybrid algorithm for a class of MDVRPs. The algorithm integrates a learning-driven, diversity-controlled route-exchange crossover and a multi-depot-supported feasible-and-infeasible search framework guided by a multi-penalty evaluation function. Two dedicated depot-related local search operators are incorporated to further strengthen the search capability in multi-depot settings. To improve computational efficiency and scalability, an enhanced version of the algorithm is developed that uses a tensor-based GPU acceleration combined with a novel multi-move update strategy. Extensive computational experiments on benchmark instances of three MDVRP variants show that the proposed algorithms are highly competitive with state-of-the-art methods, especially for large-scale instances.

2605.05102 2026-05-08 cs.LG stat.ML

Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

Harin Lee, Min-hwan Oh

Comments Accepted at the Conference of Learning Theory (COLT) 2026

详情
英文摘要

We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $δ\in (0,1]$, thereby characterizing the regret distribution across the full range of $δ$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/δ))$, confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.

2605.04830 2026-05-08 cs.LG cond-mat.stat-mech

Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao

Comments 20 pages, 10 figures. comments are welcome

详情
英文摘要

Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether two notions of such phase transitions are concurrent in modern diffusion transformers. By evaluating the dynamics and outcomes of the generation trajectory, we observe a near-simultaneous occurrence of the non-locality and symmetry breaking critical times. Our work is the first to unify the two notions of phase transitions in practice: it provides a concrete diagnostic for when and why diffusion models rely on conditioning and global denoising, enabling principled evaluation of model efficiency and guiding the design of architectures and sampling schemes that avoid unnecessary computation.

2605.04719 2026-05-08 cs.CL

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao, Xuemei Dong, Mengdie Chu, Xiang Qi, Pingfu Chao

详情
英文摘要

Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.

2605.04412 2026-05-08 cs.CV

Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma

详情
英文摘要

3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly or even fails. To address this limitation, we introduce \textbf{DiLAST}: 2D Diffusion-based Latent Awakening for 3D Style Transfer. Specifically, we leverage a pretrained 2D diffusion model as a teacher to provide rich and generalizable style priors. By aligning rendered views with the target style under diffusion-based guidance, our method optimizes the structured 3D latent representations for stylization. We observe that this limitation stems not from insufficient model capacity, but from the underutilization of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can leverage 2D diffusion guidance to steer denoising toward specific directions in latent space, thereby producing diverse, OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of our approach.

2605.04282 2026-05-08 cs.LG

Hardware-Aware Neural Feature Extraction for Resource-Constrained Devices

Francesco Tosini, Simone Pedroni, Christian Veronesi, Pietro Bartoli, Andrea Giudici, Marco Paracchini, Marco Marcon, Diana Trojaniello

Comments This paper has been accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. \c{opyright}IEEE

详情
英文摘要

Visual SLAM is a core component of spatial computing systems, yet deploying learned local feature extractors on microcontroller-class hardware remains challenging due to memory, bandwidth, and quantization constraints. While modern neural descriptors provide strong robustness, their practical adoption is often hindered by system-level bottlenecks that are not captured by FLOP-based efficiency metrics. In this work, we introduce Gideon, a hardware-aware neural feature extractor explicitly designed for resource-constrained devices. Our approach combines relational knowledge distillation from a SuperPoint teacher with differentiable neural architecture search (DNAS) under strict memory and operator constraints. Unlike conventional design pipelines, we treat quantization stability and dynamic-range compactness as first-class objectives. We show that architectural choices such as replacing Batch Normalization with affine layers significantly improve INT8 robustness, and that descriptor dimensionality directly governs quantization resilience. Deployed on STM32N6, Gideon achieves 9.003 ms inference time (111 fps) while remaining below a 1.5 MB memory footprint. Remarkably, INT8 quantization induces negligible degradation and occasionally matches full-precision performance. These results demonstrate that robust learned feature extraction can be reconciled with embedded hardware constraints through holistic hardware-algorithm co-design.

2605.04066 2026-05-08 cs.CL cs.ET cs.LG

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

Yiming Huang, Zhenbo Shi, Shuzheng Gao, Cuiyun Gao, Peiyi Han, Chuanyi Liu

Comments Accepted to ACL 2026 (Findings)

详情
英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization schemes that misalign with the model's evolving reasoning capabilities. To address this issue, we propose Adaptive Power-Mean Policy Optimization (APMPO), which comprises two main innovations: Power-Mean Policy Optimization (PMPO) and Feedback-Adaptive Clipping (FAC). Specifically, PMPO introduces a generalized power-mean objective. This enables the model to adaptively transition from the signal-amplifying behavior of the arithmetic mean to the consistency-enforcing behavior of the geometric mean. FAC adaptively adjusts clipping bounds based on real-time reward statistics to overcome the limitations of static mechanisms. Capitalizing on these innovations, APMPO improves learning dynamics and reasoning performance. Extensive experiments on nine datasets across three reasoning tasks showcase the superiority of APMPO over state-of-the-art RLVR-based baselines. For instance, APMPO boosts the average Pass@1 score on mathematical reasoning benchmarks by 3.0 points compared to GRPO when using Qwen2.5-3B-Instruct.

2605.04065 2026-05-08 cs.CL cs.ET cs.LG

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

Yiming Huang, Zhenbo Shi, Xin-Cheng Wen, Jichuan Zeng, Cuiyun Gao, Peiyi Han, Chuanyi Liu

Comments Accepted by ACL 2026

详情
英文摘要

Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often lack the capacity to adapt to the model's evolving reasoning capabilities during training. Therefore, these methods can misdirect policy optimization in the absence of ground-truth supervision. To address this issue, we introduce FREIA, a novel RL-based algorithm built on two key innovations: (1) Free Energy-Driven Reward (FER) adapts rewards to balance consensus and exploration based on the Free Energy Principle. (2) Adaptive Advantage Shaping (AAS) adaptively adjusts learning signals based on the statistical characteristics of sampled rewards. Empirical evaluations on nine datasets across three reasoning tasks showcase that FREIA outperforms other unsupervised RL-based baselines. Notably, in mathematical reasoning tasks, FREIA surpasses other methods by an average of 0.5 to 3.5 points in Pass@1 using the DeepSeek-R1-Distill-Qwen-1.5B model.

2605.04057 2026-05-08 cs.LG cs.AI

Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search

Zhen Liu, Yuhan Liu, Jinjun Wang, Wei Song, Jianyi Liu, Jingwen Fu

详情
英文摘要

This paper focuses on a key challenge in Neural Architecture Search (NAS): integrating established architectural knowledge while exploring new designs under expensive evaluations. Large language models (LLMs) are a promising assistant for NAS because they can translate rich architectural and coding priors into executable code edits. However, in practice, seemingly local revisions often propagate into non-local behavioral and performance shifts because a single edit can inadvertently couple multiple interacting functional factors, a phenomenon we refer to as functional entanglement. To make LLM knowledge usable under such entanglement, we propose Structured Progressive Knowledge Activation (SPARK), which activates relevant priors by explicitly selecting the functional factor to modify and conditioning the edit on that factor. This factor-conditioned editing reduces entangled side effects and yields more targeted, reliable architecture modifications. On CLRS-DFS, SPARK achieves a 28.1x sample-efficient architecture evolution speedup and yields a 22.9 percent relative improvement in OOD accuracy.

2605.03989 2026-05-08 cs.AI

An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration

Dutao Zhang, Tian Liao

Comments Preprint. 6 pages, 1 figure, 3 tables

详情
英文摘要

Retrieval-augmented generation systems often assume that one fixed retrieval pipeline is sufficient across heterogeneous tasks, yet factoid question answering, multi-hop reasoning, and scientific verification exhibit different retrieval preferences. We present Experience-RAG Skill, an agent-oriented pluggable retrieval orchestration layer positioned between the agent and the retriever pool. The proposed skill analyzes the current scene, consults an experience memory, selects an appropriate retrieval strategy, and returns structured evidence to the agent. Under a fixed candidate pool, Experience-RAG Skill achieves an overall nDCG@10 of 0.8924 on BeIR/nq, BeIR/hotpotqa, and BeIR/scifact, outperforming fixed single-retriever baselines and remaining competitive with Adaptive-RAG-style routing. The results suggest that retrieval strategy selection can be productively encapsulated as a reusable agent skill rather than being hard-coded in the upper workflow.

2605.03749 2026-05-08 cs.CV

FluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolution

Shuhong Liu, Xining Ge, Ziteng Cui, Liuzhuozheng Li, Gengjia Chang, Jun Liu, Ziying Gu, Dong Li, Xuangeng Chu, Lin Gu, Tatsuya Harada

详情
英文摘要

Ground-to-space astronomical super-resolution requires recovering space-quality images from ground-based observations that are simultaneously limited by pixel sampling resolution and atmospheric seeing, which imposes a stochastic, spatially varying PSF that cannot be resolved through upsampling alone. Existing methods rely on synthetic training pairs that fail to capture real atmospheric statistics and are prone to either over-smoothed reconstructions or hallucination sources with no physical counterpart in the observed sky. We propose FluxFlow, a conservative pixel-space flow-matching framework that incorporates observation uncertainty and source-region importance weights during training, and a training-free Wiener-regularized test-time correction to suppress hallucination sources while preserving recovered detail. We further construct the DESI--HST Dataset, the large-scale real-world benchmark comprising 19,500 real co-registered ground-to-space image pairs with real atmospheric PSF variation. Experiments demonstrate that FluxFlow consistently outperforms existing baseline methods in both photometric and scientific accuracy.

2605.03379 2026-05-08 cs.LG cs.CL

Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference

Yi Liu

详情
英文摘要

Repeated sampling is a standard way to spend test-time compute, but its benefit is controlled by the latent distribution of correctness across examples, not by one-call accuracy alone. We study the binary correctness layer of repeated LLM inference under conditional-i.i.d. calls. One labeled call identifies the mean latent success probability; two labeled calls identify its second moment and hence the same-example correctness correlation that separates stable errors from recoverable call-level randomness. From these two moments, every fixed majority-vote budget has a sharp distribution-free two-call interval. The key technical reduction is that the infinite-dimensional moment problem has three-atom extremizers and quadratic dual certificates for every finite budget, so the bounds are exact rather than discretized or parametric. The first useful budget, three votes, has a closed form, width at most $1/8$, and a certified-improvement criterion. The infinite-vote endpoint is the limit of majority voting as the number of calls tends to infinity; it is also sharply bounded, but remains threshold-sensitive because it depends on latent mass around $q=1/2$. We add maximum-entropy and Latent-difficulty Gaussian-probit point completions, and experiments on LLM calls over QNLI and QQP show that empirical three- and five-vote accuracies are contained in the projected two-call regions while temperature changes and randomized model mixtures can create voting gains not ordered by one-call accuracy.

2605.03354 2026-05-08 cs.AI

What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

Xutao Mao, Jinman Zhao, Gerald Penn, Cong Wang

详情
英文摘要

Agent memory failures are silent: an LLM-based agent can produce a fluent response even when it fails to extract, retain, or retrieve the information needed across sessions. The write-manage-read loop describes the external pipeline of these systems but leaves open which internal computations implement each stage. Tracing feature circuits across the Qwen-3 family (0.6B--14B) and two memory frameworks (mem0 and A-MEM), we report two mechanistic findings and one deliverable. First, control is detectable before content: routing circuitry is causally active at 0.6B, while content circuitry produces no detectable signal until 4B, exposing a deployment regime where small models route memory decisions before they can reliably extract or ground the underlying facts. Second, the shared hub is recruited, not created: Write and Read converge on a late-layer hub that already exists in the base model as a context-grounding substrate, and memory framing recruits a memory-specific functional direction on this substrate rather than building one of its own. Both findings transfer across mem0 and A-MEM, indicating that the underlying computations are properties of the base model rather than of any particular interface. Building on this circuit structure, we develop an unsupervised stage-level diagnostic that localizes silent failures to the responsible operation up to 76.2% accuracy, outperforming the strongest supervised baseline by 13 points. Together, these results point to circuit-level signatures as a practical handle for monitoring and structurally-guided design of agent memory.

2605.03227 2026-05-08 cs.AI

Evaluating Prompting and Execution-Based Methods for Deterministic Computation in LLMs

Hongkun Yu

Comments 8 pages, 1 figure. Code and dataset available at https://github.com/bigbird231/llm-exact-computation-dataset

详情
英文摘要

Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and reasoning. However, their ability to perform exact, deterministic computation remains unclear. In this work, we systematically evaluate multiple prompting strategies, including Chain-of-Thought (CoT), Least-to-Most decomposition, Program-of-Thought (PoT), and Self-Consistency (SC), on tasks requiring precise and error-free outputs, including binary counting, longest substring detection, and arithmetic evaluation. To support this study, we introduce a synthetic dataset with diverse natural language instructions, enabling controlled evaluation of exact computation across multiple task types. Our results show that standard prompting methods achieve only moderate accuracy on sequence-based tasks. CoT provides limited improvement, while Least-to-Most suffers from error accumulation. In contrast, PoT achieves perfect accuracy by generating executable code and delegating computation to an external interpreter. Self-Consistency improves robustness through majority voting, but incurs substantial computational overhead. We further train a small domain-specific model (CodeT5-small) to generate executable programs, which achieves perfect accuracy on held-out synthetic test data across all tasks with minimal training cost. Overall, our findings suggest that LLMs may simulate reasoning patterns rather than reliably perform exact symbolic computation. For deterministic tasks, combining LLMs with external tools or using specialized models provides a more reliable and efficient solution.

2605.03222 2026-05-08 cs.LG stat.ML

Beyond Activation Alignment: The Geometry of Neural Sensitivity

Amirhossein Yavari, Farnaz Zamani Esfahlani

Comments 9 pages, 4 figures

详情
英文摘要

Activation-alignment measures such as Representational Similarity Analysis (RSA), Canonical Correlation Analysis (CCA), and Centered Kernel Alignment (CKA) are widely used to compare biological and artificial neural representations. Recent theoretical work interprets many of these methods as assessing agreement between optimal linear readouts over broad families of global tasks. However, agreement at the level of global readouts does not determine how a system uses local stimulus evidence. Specifically, representations may align in activation space yet differ in their sensitivity to small perturbations. To address this challenge, we introduce a complementary framework based on local decodable information, which focuses on a representation's ability, under noise, to discriminate small perturbations within a specified stimulus-coordinate subspace. Building on Fisher information and local representation geometry, we summarize each representation using the expected projected pullback/Fisher metric over that subspace. This formulation induces a second-moment family of local discrimination tasks, for which the resulting operator provides a minimal, complete dataset-level summary of expected discriminability. We compare these regularized signatures using a log-spectral distance on the manifold of symmetric positive definite (SPD) matrices, yielding the Spectral Riemannian Alignment Score (S-RAS) and a uniform multiplicative certificate over the corresponding family of lifted task values. Empirically, this framework enables the recovery of corresponding layers across independently trained artificial neural networks, supports transferable class-conditional probes, reveals controlled dissociations between standard and robust training, and uncovers stimulus-coordinate family effects across mouse visual cortex using the Allen Brain Observatory static gratings dataset.

2605.03125 2026-05-08 cs.LG

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation

Jingchu Gai, Laixi Shi

详情
英文摘要

Multi-agent reinforcement learning (MARL) holds great potential but faces robustness challenges due to environmental uncertainty. To address this, distributionally robust Markov games (RMGs) optimize worst-case performance when the environment deviates from the nominal model within a uncertainty set. Beyond robustness, an equally urgent goal for MARL is data efficiency -- sampling from vast state and action spaces that grow exponentially with the number of agents potentially leads to the curse of multiagency. However, current provably data-efficient algorithms for RMGs are limited to tabular settings with finite state and action spaces, which are only computationally manageable for small-scale problems, leaving RMGs with large-scale (or infinite) state spaces largely unexplored. The only existing work beyond tabular settings focuses on linear function approximation (LFA) for a restrictive class of RMGs using vanish minimal value assumption and still suffers from sample complexity with the curse of multiagency. In this work, we focuses on general RMGs with LFA. For uncertainty sets defined by total variation distance, we develop provably data-efficient algorithms that break the curse of multiagency in both the generative model setting and a newly proposed online interactive setting. To our knowledge, our results are the first to break the curse of multiagency of sample complexity for RMGs with large (possibly infinite) state spaces, regardless of the uncertainty set construction.

2605.01518 2026-05-08 cs.RO

VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids

Zichao Hu, Zifan Xu, Dongsik Chang, He Yin, Linh Tran, Roberto Martín-Martín, Peter Stone, Jingyu Qiao, Joydeep Biswas

详情
英文摘要

The ability to push large objects in a goal-directed manner using onboard egocentric perception is an essential skill for humanoid robots to perform complex tasks such as material handling in warehouses. To robustly manipulate heavy objects to arbitrary goal configurations, the robot must cope with unknown object mass and ground friction, noisy onboard perception, and actuation errors; all in a real-time feedback loop. Existing solutions either rely on privileged object-state information without onboard perception or lack robustness to variations in goal configurations and object physical properties. In this work, we present VOFA, a visual goal-conditioned humanoid loco-manipulation system capable of pushing objects with unknown physical properties to arbitrary goal positions. VOFA consists of a two-level hierarchical architecture with a high-level visuomotor policy and a low-level force-adaptive whole-body controller. The high-level policy processes noisy onboard observations and generates goal-conditioned commands to operate in closed loop across diverse object-goal configurations, while the low-level whole-body controller provides robustness to variations in object physical properties. VOFA is extensively evaluated in both simulation and real-world experiments on the Booster T1 humanoid robot. Our results demonstrate strong performance, achieving over 90% success in simulation and over 80% success in real-world trials. Moreover, VOFA successfully pushes objects weighing up to 17kg, exceeding half of the Booster T1's body weight.

2605.01355 2026-05-08 cs.CV cs.AI

AgriKD: Cross-Architecture Knowledge Distillation for Efficient Leaf Disease Classification

Minh-Dung Le, Minh-Duc Hoang, Hoang-Vu Truong, Thi-Thu-Hong Phan

Comments 47 pages, 14 figures

详情
英文摘要

Automated leaf disease classification is critical for early disease detection in resource-constrained field environments. Vision Transformers (ViTs) provide strong representation capability by modeling long-range dependencies and inter-class relationships; however, their high computational cost makes them impractical for deployment on edge devices. As a result, existing approaches struggle to effectively transfer these rich representations to lightweight models. This paper introduces AgriKD, a cross-architecture knowledge distillation framework for efficient edge deployment, which transfers knowledge from a Vision Transformer (ViT) teacher to a compact convolutional student model. To bridge the representational gap between Transformer and CNN architectures, the proposed approach integrates multiple distillation objectives at the output, feature, and relational levels, where each objective captures a different aspect of the teacher knowledge. This enables the student model to better preserve and utilize transformer-derived global representations. Experiments on multiple leaf disease datasets show that the distilled student achieves performance comparable to the teacher while significantly improving efficiency, reducing model parameters by approximately 172 times, computational cost by 47.57 times, and inference latency by 18-22 times. Furthermore, the optimized model is deployed across multiple runtime formats, including ONNX, TFLite Float16, and TensorRT FP16, achieving consistent predictive performance with negligible accuracy degradation. Real-world deployment on NVIDIA Jetson edge devices and a mobile application demonstrates reliable real-time inference, highlighting the practicality of AgriKD for AI-powered agricultural applications in resource-constrained environments.

2605.01327 2026-05-08 cs.AI cs.LG

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

Lei Gao, Zhuoming Li, Mengxi Jia, Jiakang Yuan, Hongbo Sun, Hao Sun, Xuelong Li

详情
英文摘要

Existing reinforcement learning approaches for Large Language Models typically perform policy optimization at the granularity of individual tokens or entire response sequences. However, such formulations often misalign with the natural step-wise structure of reasoning processes, leading to suboptimal credit assignment and unstable training in multi-modal reasoning tasks. To bridge this gap, we propose Segment-Aligned Policy Optimization (SAPO), a novel reinforcement learning paradigm that treats coherent reasoning steps, rather than tokens or full sequences as fundamental units of policy update. SAPO introduces a step-wise Markov decision process abstraction over reasoning segments, accompanied by segment-level value estimation, advantage computation, and importance sampling mechanisms that are semantically aligned with reasoning boundaries. Experiments on representative reasoning benchmarks demonstrate that SAPO consistently outperforms token-level and sequence-level policy optimization methods, achieving significant accuracy improvements while exhibiting better training stability and value estimation consistency. Our work underscores the importance of aligning reinforcement learning updates with the intrinsic structure of reasoning, paving the way for more efficient and semantically grounded policy optimization in complex reasoning tasks. Codes and models will be released to ensure full reproducibility.

2605.01203 2026-05-08 cs.AI cs.CL

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

Zhouhao Sun, Xuan Zhang, Xiao Ding, Bibo Cai, Li Du, Kai Xiong, Xinran Dai, Fei Zhang, weidi tang, Zhiyuan Kan, Yang Zhao, Bing Qin, Ting Liu

详情
英文摘要

Currently, process reward models (PRMs) have exhibited remarkable potential for test-time scaling. Since large language models (LLMs) regularly generate flawed intermediate reasoning steps when tackling a broad spectrum of reasoning and decision-making tasks, PRMs are required to possess capabilities for detecting process-level errors in real-world scenarios. However, existing benchmarks primarily focus on mathematical reasoning, thereby failing to comprehensively evaluate the error detection ability of PRMs across diverse reasoning scenarios. To mitigate this gap, we introduce GR-Ben, a process-level benchmark specifically designed for assessing PRM's performance across two primary reasoning domains (science and logic) and nine subdomains. We conduct extensive experiments on a diverse set of 22 models, encompassing both PRMs and LLMs, and derive two key findings: (1) In domains beyond mathematical reasoning, the error-detection ability of existing PRMs and LLMs is found to be markedly weaker by comparison.(2) In general, PRMs are less adept at identifying knowledge-based errors, whereas LLMs exhibit poorer performance in detecting computational errors. We hope GR-Ben can foster future researches on PRMs for general domains, thereby enhancing the reasoning capabilities of LLMs.

2605.01120 2026-05-08 cs.AI math.CO

New Bounds for Zarankiewicz Numbers via Reinforced LLM Evolutionary Search

Jay Bhan, Nicole Nobili, Patrick Langer

Comments *Jay Bhan and Nicole Nobili contributed equally to this work as first authors, and their order was determined via coin flip

详情
英文摘要

The Zarankiewicz number $\textbf{Z}(m, n, s, t)$ is the maximum number of edges in a bipartite graph $G_{m, n}$ such that there is no complete $K_{s, t}$ bipartite subgraph. We determine for the first time the exact values of three Zarankiewicz numbers: $\textbf{Z}(11, 21, 3, 3)=116$, $\textbf{Z}(11, 22, 3, 3)=121$, and $\textbf{Z}(12, 22, 3, 3)=132$. We further establish lower bounds for 41 more Zarankiewicz numbers, including several that are within one edge of the best known upper bound, and we match the established value in four more closed cases. Our results are obtained using OpenEvolve, an open-source evolutionary algorithm based on Large Language Models (LLMs) that iteratively improves algorithms for generating mathematical constructions by optimizing a reward signal which we tailored for this specific problem. These findings provide new extremal graph constructions and demonstrate the potential of LLM-guided evolutionary search to contribute to mathematical research. In addition to presenting the resulting constructions, we report the generation algorithms produced, describe the relevant implementation details, and provide our computational costs. Our costs are remarkably low, at less than \$30 for each Zarankiewicz parameter combination, showing that LLM-guided evolutionary search can be an inexpensive, reproducible, and accessible tool for discovering new combinatorial constructions.

2605.00847 2026-05-08 cs.CL cs.AI cs.LG

H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

Cutter Dawes, Aryan Sharma, Angelos Ioannis Lagos, Shivam Raval

详情
英文摘要

Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models have demonstrated proficiency in a wide variety of tasks requiring hierarchical reasoning, but there exists limited analysis on how the models geometrically represent the necessary latent constructions for such thinking. To this end, we develop H-probes, a collection of linear probes that extract hierarchical structure, specifically depth and pairwise distance, from latent representations. In synthetic tree traversal tasks, the H-probes robustly find the subspaces containing hierarchical structure necessary to complete the tasks; furthermore, in comprehensive ablation experiments, we show that these hierarchy-containing subspaces are low-dimensional, causally important for high task performance, and generalize within- and out-of-domain. Furthermore, we find analogous, though weaker, hierarchical structure in real-world hierarchical contexts such as mathematical reasoning traces. These results demonstrate that models represent hierarchy not only at the level of syntax and concepts, but at deeper levels of abstraction -- including the reasoning process itself.

2605.00742 2026-05-08 cs.AI cs.LG stat.ML

Position: agentic AI orchestration should be Bayes-consistent

Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

Comments Accepted for publication at ICML 2026

详情
英文摘要

LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this position paper argues that the control layer of an agentic AI system (that orchestrates LLMs and tools) is a clear case where Bayesian principles should shine. Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities, to update these beliefs from observed agentic and human-AI interactions, and to choose actions. Making LLMs themselves explicitly Bayesian belief-updating engines remains computationally intensive and conceptually nontrivial as a general modeling target. In contrast, this paper argues that coherent decision-making requires Bayesian principles at the orchestration level of the agentic system, not necessarily the LLM agent parameters. This paper articulates practical properties for Bayesian control that fit modern agentic AI systems and human-AI collaboration, and provides concrete examples and design patterns to illustrate how calibrated beliefs and utility-aware policies can improve agentic AI orchestration.