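arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.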

2605.05219 2026-05-08 cs.LG cs.AI

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

Mikhail Shirokikh, Sergey Nikolenko

Abstract

Prefix caching is a key latency optimization for autoregressive LLM serving, yet existing systems assume dense per-token key/value reuse. State-space models change the structure of the problem: a recurrent layer can resume from a single stored state rather than requiring the entire token history. This asymmetry opens a new design point between no reuse and dense caching: store exact recurrent states at a sparse set of checkpoint positions and, on a cache hit, resume from the deepest stored checkpoint and recompute the remaining suffix exactly. We formalize sparse prefix caching as checkpoint placement under a distribution over overlap depths, yielding an exact O(NM) dynamic program. For use cases where requests share a non-trivial prefix (e.g. asking different questions about a single long document), we show that our method consistently improves the Pareto frontier traced by standard heuristics on real-world data. Across QuALITY and System Prompts, distribution-aware placement dominates every fixed-budget baseline on the measured layer-group Pareto frontier and matches or outperforms the strongest heuristic (block caching) while typically using substantially fewer checkpoints, with the largest gains at low checkpoint budgets where the overlap distribution is most non-uniform. The method is most relevant when many requests share a substantial but not identical prefix within a retained cache entry. It preserves exact outputs, does not change the recurrent computation itself or require new recurrent update kernels, applies to recurrent/SSM layers whose hidden state can be extracted and restored exactly, and for hybrid models can be combined with existing KV-cache compression techniques.
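
The checkpoint-placement idea in this abstract can be illustrated with a small sketch. It is not the authors' O(NM) algorithm: it is a straightforward O(N²M) dynamic program over an assumed objective, namely the expected number of tokens recomputed when a request with overlap depth d resumes from the deepest checkpoint at or before d. The function name `place_checkpoints`, the input format `p`, and the cost model are all assumptions made for illustration.

```python
from itertools import accumulate


def place_checkpoints(p, budget):
    """Pick up to `budget` checkpoint positions minimising expected recompute.

    Assumed model: p[d] is the probability that a request shares exactly d
    prefix tokens with the cached entry (d = 0..N). Position 0, the empty
    state, is always available for free. Returns a pair
    (expected_recomputed_tokens, sorted_checkpoint_positions).
    """
    n = len(p) - 1
    # Prefix sums: mass[i] = sum(p[:i]); moment[i] = sum(d * p[d] for d < i).
    mass = [0.0] + list(accumulate(p))
    moment = [0.0] + list(accumulate(d * p[d] for d in range(n + 1)))

    def segment_cost(a, b):
        # Expected tokens recomputed for depths in [a, b) when resuming at a.
        return (moment[b] - moment[a]) - a * (mass[b] - mass[a])

    inf = float("inf")
    m = min(budget, n)
    # best[k][b]: minimal cost over depths < b with k checkpoints, last one at b.
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    prev = [[-1] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for k in range(1, m + 1):
        for b in range(1, n + 1):
            for a in range(b):
                cost = best[k - 1][a] + segment_cost(a, b)
                if cost < best[k][b]:
                    best[k][b] = cost
                    prev[k][b] = a

    # Close the last segment, which covers depths from the final checkpoint to N.
    total, k_end, last = inf, 0, 0
    for k in range(m + 1):
        for a in range(n + 1):
            cost = best[k][a] + segment_cost(a, n + 1)
            if cost < total:
                total, k_end, last = cost, k, a

    # Walk the back-pointers to recover the chosen positions.
    positions = []
    while k_end > 0:
        positions.append(last)
        last = prev[k_end][last]
        k_end -= 1
    return total, positions[::-1]
```

Under these assumptions, a distribution that puts all of its mass on one depth leads the sketch to place its single checkpoint exactly there, while a uniform distribution spreads checkpoints to balance the segments between them.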

2605.05218 2026-05-08 cs.LG cs.AI math.DS nlin.CD

Horizon-Constrained Rashomon Sets for Chaotic Forecasting

Gauri Kale, Rahul Vishwakarma, Holly Diamond, Ava Hedayatipour, Amin Rezaei

2605.05217 2026-05-08 cs.LG cs.AI

Physics-Informed Neural Networks with Learnable Loss Balancing and Transfer Learning

Reza Pirayeshshirazinezhad

2605.05216 2026-05-08 cs.LG

SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

Yi Xie, Yangyang Xu, Yi Fan, Bo Liu

Comments Published at AAMAS 2026

2605.05215 2026-05-08 cs.CV cs.AI cs.LG

Layout-Aware Representation Learning for Open-Set ID Fraud Discovery

Jinxing Li, Nicholas Ren, Cathy Chang, Hongkai Pan, Daniel George

2605.05213 2026-05-08 cs.LG q-bio.QM

Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models

Sicong Chang, Yidan Shen, Justina Varghese, Akshay R Prabhakar, Sebastian Guadarrama-Sistos-Vazquez, Jiefu Chen, Masayoshi Takashima, Omar G. Ahmed, Renjie Hu, Xin Fu

Comments Sicong Chang and Yidan Shen are co-first authors. This paper has been accepted to the IEEE Engineering in Medicine and Biology Society (EMBC) 2026 conference.

2605.05209 2026-05-08 cs.LG cs.AI

Are Flat Minima an Illusion?

Michael Timothy Bennett

2605.05208 2026-05-08 cs.RO cs.DC math.OC

A GPU-Accelerated Hybrid Method for a Class of Multi-Depot Vehicle Routing Problems

Zhenyu Lei, Jin-Kao Hao

2605.05102 2026-05-08 cs.LG stat.ML

Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

Harin Lee, Min-hwan Oh

Comments Accepted at the Conference on Learning Theory (COLT) 2026

2605.04830 2026-05-08 cs.LG cond-mat.stat-mech

Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao

Comments 20 pages, 10 figures. Comments are welcome.

2605.04719 2026-05-08 cs.CL

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao, Xuemei Dong, Mengdie Chu, Xiang Qi, Pingfu Chao

2605.04412 2026-05-08 cs.CV

Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma

2605.04282 2026-05-08 cs.LG

Hardware-Aware Neural Feature Extraction for Resource-Constrained Devices

Francesco Tosini, Simone Pedroni, Christian Veronesi, Pietro Bartoli, Andrea Giudici, Marco Paracchini, Marco Marcon, Diana Trojaniello

Comments This paper has been accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. © IEEE

2605.04066 2026-05-08 cs.CL cs.ET cs.LG

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

Yiming Huang, Zhenbo Shi, Shuzheng Gao, Cuiyun Gao, Peiyi Han, Chuanyi Liu

Comments Accepted to ACL 2026 (Findings)

2605.04065 2026-05-08 cs.CL cs.ET cs.LG

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

Yiming Huang, Zhenbo Shi, Xin-Cheng Wen, Jichuan Zeng, Cuiyun Gao, Peiyi Han, Chuanyi Liu

Comments Accepted by ACL 2026

2605.04057 2026-05-08 cs.LG cs.AI

Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search

Zhen Liu, Yuhan Liu, Jinjun Wang, Wei Song, Jianyi Liu, Jingwen Fu

2605.03989 2026-05-08 cs.AI

An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration

Dutao Zhang, Tian Liao

Comments Preprint. 6 pages, 1 figure, 3 tables

2605.03749 2026-05-08 cs.CV

FluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolution

Shuhong Liu, Xining Ge, Ziteng Cui, Liuzhuozheng Li, Gengjia Chang, Jun Liu, Ziying Gu, Dong Li, Xuangeng Chu, Lin Gu, Tatsuya Harada

2605.03379 2026-05-08 cs.LG cs.CL

Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference

Yi Liu

2605.03354 2026-05-08 cs.AI

What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

Xutao Mao, Jinman Zhao, Gerald Penn, Cong Wang

2605.03227 2026-05-08 cs.AI

Evaluating Prompting and Execution-Based Methods for Deterministic Computation in LLMs

Hongkun Yu

Comments 8 pages, 1 figure. Code and dataset available at https://github.com/bigbird231/llm-exact-computation-dataset

2605.03222 2026-05-08 cs.LG stat.ML

Beyond Activation Alignment: The Geometry of Neural Sensitivity

Amirhossein Yavari, Farnaz Zamani Esfahlani

Comments 9 pages, 4 figures

Abstract

Activation-alignment measures such as Representational Similarity Analysis (RSA), Canonical Correlation Analysis (CCA), and Centered Kernel Alignment (CKA) are widely used to compare biological and artificial neural representations. Recent theoretical work interprets many of these methods as assessing agreement between optimal linear readouts over broad families of global tasks. However, agreement at the level of global readouts does not determine how a system uses local stimulus evidence. Specifically, representations may align in activation space yet differ in their sensitivity to small perturbations. To address this challenge, we introduce a complementary framework based on local decodable information, which focuses on a representation's ability, under noise, to discriminate small perturbations within a specified stimulus-coordinate subspace. Building on Fisher information and local representation geometry, we summarize each representation using the expected projected pullback/Fisher metric over that subspace. This formulation induces a second-moment family of local discrimination tasks, for which the resulting operator provides a minimal, complete dataset-level summary of expected discriminability. We compare these regularized signatures using a log-spectral distance on the manifold of symmetric positive definite (SPD) matrices, yielding the Spectral Riemannian Alignment Score (S-RAS) and a uniform multiplicative certificate over the corresponding family of lifted task values. Empirically, this framework enables the recovery of corresponding layers across independently trained artificial neural networks, supports transferable class-conditional probes, reveals controlled dissociations between standard and robust training, and uncovers stimulus-coordinate family effects across mouse visual cortex using the Allen Brain Observatory static gratings dataset.
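
To make "log-spectral distance on the manifold of SPD matrices" concrete, here is a minimal sketch of one common choice, the affine-invariant distance computed from the generalized eigenvalues of a pair of SPD matrices. The abstract does not state that S-RAS is defined exactly this way; the function name `log_spectral_distance`, the ridge regularizer `eps`, and the second return value are assumptions made for illustration.

```python
import numpy as np


def log_spectral_distance(a, b, eps=1e-6):
    """Return (distance, delta) for two symmetric positive semidefinite inputs.

    distance = sqrt(sum_i log(lam_i)^2), where lam_i are the eigenvalues of
    A^{-1} B after an assumed ridge regularisation A + eps*I and B + eps*I.
    delta = max_i |log(lam_i)|, which bounds the ratio of quadratic forms:
    exp(-delta) <= (v^T B v) / (v^T A v) <= exp(delta) for every nonzero v.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    a_reg = 0.5 * (a + a.T) + eps * np.eye(n)
    b_reg = 0.5 * (b + b.T) + eps * np.eye(n)

    # Whiten B by A: with A = L L^T, form L^{-1} B L^{-T}, which is symmetric
    # and has the same eigenvalues as A^{-1} B.
    chol = np.linalg.cholesky(a_reg)
    half = np.linalg.solve(chol, b_reg)          # L^{-1} B
    whitened = np.linalg.solve(chol, half.T).T   # L^{-1} B L^{-T}
    whitened = 0.5 * (whitened + whitened.T)     # remove round-off asymmetry

    log_eigs = np.log(np.linalg.eigvalsh(whitened))
    distance = float(np.sqrt(np.sum(log_eigs ** 2)))
    delta = float(np.max(np.abs(log_eigs)))
    return distance, delta
```

The second return value is a standard Rayleigh-quotient bound and is offered only as a plausible reading of a "uniform multiplicative" guarantee; whether it matches the paper's certificate is not confirmed by the abstract.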

2605.03125 2026-05-08 cs.LG

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation

Jingchu Gai, Laixi Shi

2605.01518 2026-05-08 cs.RO

VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids

Zichao Hu, Zifan Xu, Dongsik Chang, He Yin, Linh Tran, Roberto Martín-Martín, Peter Stone, Jingyu Qiao, Joydeep Biswas

2605.01355 2026-05-08 cs.CV cs.AI

AgriKD: Cross-Architecture Knowledge Distillation for Efficient Leaf Disease Classification

Minh-Dung Le, Minh-Duc Hoang, Hoang-Vu Truong, Thi-Thu-Hong Phan

Comments 47 pages, 14 figures

2605.01327 2026-05-08 cs.AI cs.LG

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

Lei Gao, Zhuoming Li, Mengxi Jia, Jiakang Yuan, Hongbo Sun, Hao Sun, Xuelong Li

2605.01203 2026-05-08 cs.AI cs.CL

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

Zhouhao Sun, Xuan Zhang, Xiao Ding, Bibo Cai, Li Du, Kai Xiong, Xinran Dai, Fei Zhang, weidi tang, Zhiyuan Kan, Yang Zhao, Bing Qin, Ting Liu

2605.01120 2026-05-08 cs.AI math.CO

New Bounds for Zarankiewicz Numbers via Reinforced LLM Evolutionary Search

Jay Bhan, Nicole Nobili, Patrick Langer

Comments *Jay Bhan and Nicole Nobili contributed equally to this work as first authors, and their order was determined via coin flip

2605.00847 2026-05-08 cs.CL cs.AI cs.LG

H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

Cutter Dawes, Aryan Sharma, Angelos Ioannis Lagos, Shivam Raval

2605.00742 2026-05-08 cs.AI cs.LG stat.ML

Position: agentic AI orchestration should be Bayes-consistent

Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

Comments Accepted for publication at ICML 2026