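arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.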

2604.26898 2026-04-30 math.PR cs.LG stat.ML

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

Andrea Agazzi, Giuseppe Bruno, Eloy Mosig García, Samuele Saviozzi, Marco Romito

Comments 55 pages, 6 figures

2604.26881 2026-04-30 cs.DC cs.LG

FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

Minghe Wang, Trever Schirmer, Mohammadreza Malekabbasi, David Bermbach

Comments Accepted for publication in the 9th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2026)

2604.26851 2026-04-30 cs.CY cs.AI

Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

Sajel Surati, Rosanna Bellini, Emily Black

Comments 22 pages, 3 tables, submitted January 2026, accepted March 2026

2604.26834 2026-04-30 quant-ph cs.LG

Quantum Feature Selection with Higher-Order Binary Optimization on Trapped-Ion Hardware

Carlos Flores-Garrigós, Anton Simen, Qi Zhang, Enrique Solano, Narendra N. Hegade, Sayonee Ray, Claudio Girotto, Jason Iaconis, Martin Roetteler

2604.26703 2026-04-30 cond-mat.mtrl-sci cs.AI physics.comp-ph

A self-evolving agent for explainable diagnosis of DFT-experiment band-gap mismatch

Yue Li, Bijun Tang

2604.26675 2026-04-30 quant-ph cs.LG

Parameterized Quantum Circuits as Feature Maps: Representation Quality and Readout Effects in Multispectral Land-Cover Classification

Ralntion Komini, Aikaterini Mandilara, Georgios Maragkopoulos, Dimitris Syvridis

2604.26673 2026-04-30 stat.ML cs.LG

Laplace Approximation for Bayesian Tensor Network Kernel Machines

Albert Saiapin, Kim Batselier

Comments 19 pages, 3 figures, 6 tables. Code available at: https://github.com/AlbMLpy/laplace-tnkm

2604.26664 2026-04-30 eess.IV cs.CV physics.optics

Circular Phase Representation and Geometry-Aware Optimization for Ptychographic Image Reconstruction

Carson Yu Liu, Jun Cheng, Chien-Chun Chen, Steve F. Shu

2604.26651 2026-04-30 cs.IR cs.LG

The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems

Pedro R. Pires, Gregorio F. Azevedo, Rafael T. Sereicikas, Pietro L. Campos, Tiago A. Almeida

Comments Published in SAC'26, 8 pages, 2 figures

2604.26649 2026-04-30 cs.IR cs.AI cs.CL

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments 12 pages, 3 figures, 9 tables. Accepted at SIGIR 2026 (49th International ACM SIGIR Conference on Research and Development in Information Retrieval), Melbourne, Australia

Details

DOI: 10.1145/3805712.3809722

Abstract

Large reasoning models such as DeepSeek-R1 and OpenAI o1 generate extended chains of thought spanning thousands of tokens, yet their integration with retrieval-augmented generation (RAG) remains fundamentally misaligned. Current RAG systems optimize for providing context before reasoning begins, while reasoning models require evidence injection during multi-step inference chains. We introduce ReaLM-Retrieve, a reasoning-aware retrieval framework that addresses this mismatch through three key innovations: (1) a step-level uncertainty detector that identifies knowledge gaps at reasoning-step granularity rather than token or sentence level; (2) a retrieval intervention policy that learns when external evidence maximally benefits ongoing reasoning; and (3) an efficiency-optimized integration mechanism that reduces per-retrieval overhead by 3.2x compared to naive integration. Experiments on MuSiQue, HotpotQA, and 2WikiMultiHopQA demonstrate that ReaLM-Retrieve achieves on average 10.1% absolute improvement in answer F1 over standard RAG (range: 9.0-11.8% across the three benchmarks) while reducing retrieval calls by 47% compared to fixed-interval approaches like IRCoT (all improvements significant at p<0.01, paired bootstrap). On the challenging MuSiQue benchmark requiring 2-4 hop reasoning, our method achieves 71.2% F1 with an average of only 1.8 retrieval calls per question. Analysis shows that ReaLM-Retrieve also improves retrieval quality itself, achieving 81.3% Recall@5 with consistently higher precision and MRR than fixed-interval baselines on supporting evidence, establishing new state-of-the-art efficiency-accuracy trade-offs for reasoning-intensive retrieval tasks.
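The abstract above reports retrieval quality as Recall@5 and MRR. As a reminder of how these metrics are commonly computed, here is a minimal Python sketch. The function names and the toy document IDs are illustrative assumptions; this is not the paper's evaluation code, and the paper may define the metrics with different conventions (for example, per-hop or per-question averaging).

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear among the top-k results."""
    if not relevant:
        return 0.0
    # Use a set so a duplicated document in the ranking is not counted twice.
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranking, relevant-set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical document IDs, not from the paper).
ranked_example = ["d3", "d1", "d7", "d2", "d9"]
relevant_example = {"d1", "d2", "d4"}
```

With the toy data, two of the three relevant documents appear in the top five, and the first relevant document sits at rank 2.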


2604.26334 2026-04-30 cs.DC cs.AR cs.LG

Efficient, VRAM-Constrained xLM Inference on Clients

Aditya Ukarande, Deep Shekhar, Marc Blackstein, Ram Rangan

Comments Accepted at MLSys 2026 (Industry Track). 17 pages, 7 figures, 9 tables. Code and artifacts available at: https://github.com/deepshnv/pipeshard-mlsys26-ae

2604.24966 2026-04-30 cs.CY cs.AI

Risk Reporting for Developers' Internal AI Model Use

Oscar Delaney, Sambhav Maheshwari, Joe O'Brien, Theo Bearman, Oliver Guest

Comments 31 pages, 2 figures, 1 table

Details

Abstract

Frontier AI companies first deploy their most advanced models internally, for weeks or months of safety testing, evaluation, and iteration, before a possible public release. For example, Anthropic recently developed a new class of model with advanced cyberoffense-relevant capabilities, Mythos Preview, which was available internally for at least six weeks before it was publicly announced. This internal use creates risks that external deployment frameworks may fail to address. Legal frameworks, notably California's Transparency in Frontier Artificial Intelligence Act (SB 53), New York's Responsible AI Safety And Education (RAISE) Act, and the EU's General-Purpose AI Code of Practice, all discuss risks from internal AI use. They require frontier developers to make and implement plans for how to manage risks from internal use, and to produce internal use risk reports describing their safeguards and any residual risks. This guide provides a harmonized standard for companies to produce internal use risk reports suitable for all three regulatory frameworks. It is addressed primarily to evaluation and safety teams at frontier AI developers, and secondarily to regulators and auditors seeking to understand what good reporting looks like. Given the pace of AI R&D automation and the limited external visibility into how companies use their most capable models internally, regular and detailed risk reporting may be one of the few mechanisms available to ensure that the risks from internal AI use are identified and managed before they materialize. Whenever a substantially more capable or riskier model is deployed internally, the developer should create a risk report and argue why the model is safe to deploy. We structure the reporting framework around two threat vectors -- autonomous AI misbehavior and insider threats -- and three risk factors for each: means, motive, and opportunity.


2604.08618 2026-04-30 cs.IR cs.AI cs.SE

SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support

Xingyan Liu, Xiyue Luo, Linyu Li, Ganghong Huang, Jianfeng Liu, Honglin Qiao

Comments Accepted at ACM SIGIR 2026 Industry Track. 18 pages, 5 figures, 3 tables

Details

DOI: 10.1145/3805712.3808466

Abstract

Deploying LLM-powered agents in enterprise scenarios such as cloud technical support demands high-quality, domain-specific skills. However, existing skill creators lack domain grounding, producing skills poorly aligned with real-world task requirements. Moreover, once deployed, there is no systematic mechanism to trace execution failures back to skill deficiencies and drive targeted refinements, leaving skill quality stagnant despite accumulating operational evidence. We introduce SkillForge, a self-evolving framework that closes an end-to-end creation-evaluation-refinement loop. To produce well-aligned initial skills, a Domain-Contextualized Skill Creator grounds skill synthesis in knowledge bases and historical support tickets. To enable continuous self-optimization, a three-stage pipeline -- Failure Analyzer, Skill Diagnostician, and Skill Optimizer -- automatically diagnoses execution failures in batch, pinpoints the underlying skill deficiencies, and rewrites the skill to eliminate them. This cycle runs iteratively, allowing skills to self-improve with every round of deployment feedback. Evaluated on five real-world cloud support scenarios spanning 1,883 tickets and 3,737 tasks, experiments show that: (1) the Domain-Contextualized Skill Creator produces substantially better initial skills than the generic skill creator, as measured by consistency with expert-authored reference responses from historical tickets; and (2) the self-evolution loop progressively improves skill quality from diverse starting points (including expert-authored, domain-created, and generic skills) across successive rounds, demonstrating that automated evolution can surpass manually curated expert knowledge.
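The creation-evaluation-refinement loop described in this abstract can be pictured with a small Python sketch. Everything here is a hypothetical stand-in: the function names, the dictionary-based task results, and the keyword-matching toy callables are assumptions made for illustration, whereas the paper's stages are LLM-driven components whose interfaces the abstract does not specify.

```python
from typing import Callable, List


def evolve_skill(
    skill: str,
    run_tasks: Callable[[str], List[dict]],
    analyze_failures: Callable[[List[dict]], List[dict]],
    diagnose: Callable[[str, List[dict]], List[str]],
    optimize: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Iterate: run tasks, collect failures, diagnose the skill, rewrite it."""
    for _ in range(max_rounds):
        results = run_tasks(skill)
        failures = analyze_failures(results)      # stage 1: which tasks failed
        if not failures:
            break                                 # nothing left to fix
        deficiencies = diagnose(skill, failures)  # stage 2: what the skill lacks
        skill = optimize(skill, deficiencies)     # stage 3: rewrite the skill
    return skill


# Toy stand-ins (hypothetical): a skill "passes" a task if it mentions the step.
REQUIRED_STEPS = ["check quota", "restart instance"]


def toy_run_tasks(skill: str) -> List[dict]:
    return [{"need": step, "ok": step in skill} for step in REQUIRED_STEPS]


def toy_analyze(results: List[dict]) -> List[dict]:
    return [r for r in results if not r["ok"]]


def toy_diagnose(skill: str, failures: List[dict]) -> List[str]:
    return [f["need"] for f in failures]


def toy_optimize(skill: str, deficiencies: List[str]) -> str:
    return skill + "".join(f"\n- {d}" for d in deficiencies)


final_skill = evolve_skill(
    "Cloud support skill:", toy_run_tasks, toy_analyze, toy_diagnose, toy_optimize
)
```

In this toy setting the loop stops as soon as a round produces no failures, so running it again on `final_skill` leaves the skill unchanged.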


2603.02259 2026-04-30 cs.MA cs.LG cs.RO

The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety

Elias Malomgré, Pieter Simoens

Comments Accepted for the EMAS workshop at AAMAS 2026

2602.15983 2026-04-30 cs.SE cs.AI cs.LG math.OC

ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization

Junbo Jacob Lian, Yujun Sun, Huiling Chen, Chaoyu Zhang, Hanzhang Qin, Chung-Piaw Teo

Comments Code and benchmark: https://github.com/junbolian/ReLoop

2510.20956 2026-04-30 cs.CR cs.CL

Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training

Zheng-Xin Yong, Stephen H. Bach

Comments Published in The Fourteenth International Conference on Learning Representations (ICLR) 2026

2509.26184 2026-04-30 cs.IR cs.AI cs.CL

Auto-ARGUE: LLM-Based Report Generation Evaluation

William Walden, Marc Mason, Orion Weller, Laura Dietz, John Conroy, Neil Molino, Hannah Recknor, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, Dawn Lawrie, James Mayfield, Eugene Yang

Comments SIGIR 2026: Demo Track

2509.21382 2026-04-30 eess.AS cs.SD

Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion

Farnaz Jazaeri, Homayoun Kamkar-Parsi, François Grondin, Martin Bouchard

Comments 5 pages, 2 figures, to appear in IEEE ICASSP 2026

2507.00209 2026-04-30 eess.IV cs.AI cs.CV cs.RO

SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri

2505.14808 2026-04-30 stat.ML cs.LG math.ST stat.TH

Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective

Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu

Comments AISTATS 2026

2503.23818 2026-04-30 eess.SY cs.LG cs.SY

L2RU: a Structured State Space Model with prescribed L2-bound

Leonardo Massai, Muhammad Zakwan, Giancarlo Ferrari-Trecate

Details

Abstract

Structured state-space models (SSMs) have recently emerged as a powerful architecture at the intersection of machine learning and control, featuring layers composed of discrete-time linear time-invariant (LTI) systems followed by pointwise nonlinearities. These models combine the expressiveness of deep neural networks with the interpretability and inductive bias of dynamical systems, offering strong performance on long-sequence tasks with favorable computational complexity. However, their adoption in applications such as system identification and optimal control remains limited by the difficulty of enforcing stability and robustness in a principled and tractable manner. We introduce L2RU, a class of SSMs endowed with a prescribed $\mathcal{L}_2$-gain bound, guaranteeing input--output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an $\mathcal{L}_2$ constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given $\mathcal{L}_2$-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems while improving computational efficiency through a structured representation of the system matrices. Both parametrizations admit efficient initialization schemes that facilitate training long-memory models. We demonstrate the effectiveness of the proposed framework on a nonlinear system identification benchmark, where L2RU achieves improved performance and training stability compared to existing SSM architectures, highlighting its potential as a principled and robust building block for learning and control.
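To make the notion of an $\mathcal{L}_2$-gain bound concrete: for a stable discrete-time LTI system $x_{k+1} = A x_k + B u_k$, $y_k = C x_k + D u_k$, the gain equals the peak over frequency of the largest singular value of the transfer matrix $G(e^{j\omega}) = C (e^{j\omega} I - A)^{-1} B + D$. The Python sketch below estimates that peak on a frequency grid. It is only a numerical check under stated assumptions (real system matrices, a finite grid, a hypothetical function name) and is not the L2RU parametrization, which the abstract says guarantees the bound by construction.

```python
import numpy as np


def l2_gain_estimate(A, B, C, D, n_freq: int = 2048) -> float:
    """Grid estimate of the l2-gain of a stable discrete-time LTI system.

    Samples the transfer matrix on the upper half of the unit circle, which
    suffices for real matrices. A finite grid can miss a narrow peak, so the
    result is a lower bound on the true gain.
    """
    A, B, C, D = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, C, D))
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("A must have spectral radius < 1 for a finite gain")
    eye = np.eye(A.shape[0])
    peak = 0.0
    for w in np.linspace(0.0, np.pi, n_freq):
        z = np.exp(1j * w)
        # G(z) = C (zI - A)^{-1} B + D, computed with a linear solve.
        G = C @ np.linalg.solve(z * eye - A, B) + D
        # Matrix 2-norm = largest singular value.
        peak = max(peak, float(np.linalg.norm(G, 2)))
    return peak


# Scalar example: x_{k+1} = 0.5 x_k + u_k, y_k = x_k, so G(z) = 1 / (z - 0.5).
gain_example = l2_gain_estimate([[0.5]], [[1.0]], [[1.0]], [[0.0]])
```

For the scalar example the peak occurs at $\omega = 0$, where $|G(1)| = 1 / (1 - 0.5) = 2$, so a prescribed bound of $\gamma \ge 2$ would be satisfied by this system.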


2207.06229 2026-04-30 stat.ML cs.LG math.FA math.PR math.ST stat.CO stat.TH

Distribution-Free Stochastic Analysis and Robust Multilevel Vector Field Anomaly Detection

Julio E Castrillon-Candas, Michael Rosenbaum, Mark Kon

2604.26632 2026-04-30 nlin.CD cs.LG

Inferring bifurcation diagrams of two distinct chaotic systems by a single machine

Jianmin Guo, Yao Du, Yizhen Yu, Yong Zou, Xingang Wang

Comments 10 pages, 4 figures

2604.26615 2026-04-30 cs.SE cs.AI

TDD Governance for Multi-Agent Code Generation via Prompt Engineering

Tarlan Hasanli, Shahbaz Siddeeq, Bishwash Khanal, Pyry Kotilainen, Tommi Mikkonen, Pekka Abrahamsson

Comments 5 pages. Submitted to the 1st International Workshop on Empirical Prompt Engineering for Software Engineering (PROMPT-SE 2026)

2604.26591 2026-04-30 cs.CE cs.AI

MappingEvolve: LLM-Driven Code Evolution for Technology Mapping

Rongliang Fu, Yi Liu, Qiang Xu, Tsung-Yi Ho

2604.26566 2026-04-30 eess.SY cs.LG cs.SY

Learning to Route Electric Trucks Under Operational Uncertainty

Stavros Orfanoudakis, Ziyan Li, Ruixiao Yang, Nikolay Aristov, Pedro P. Vergara, Chuchu Fan, Elenna Dugundji

Comments Reinforcement Learning, Electric Truck Routing, Freight Transportation, Graph Neural Networks, Stochastic Optimization, Vehicle Routing

2604.26561 2026-04-30 cs.MA cs.AI

Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

Ariel Sela

Comments 14 pages, 7 tables, 120 deliberations across 2 policy scenarios

2604.26558 2026-04-30 stat.ML cs.LG stat.ME

Deep-testing: the case of dependence detection

Gery Geenens, Pierre Lafaye de Micheaux, Ivan Muyun Zou

2604.26557 2026-04-30 cs.DC cs.AI cs.PF

DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

Bodon Jeong, Hongsu Byun, Youngjae Kim, Weikuan Yu, Kyungkeun Lee, Jihoon Yang, Sungyong Park

Comments To appear in IEEE International Conference on Distributed Computing Systems (ICDCS) 2026

2604.26555 2026-04-30 cs.DC cs.LG

FloatSOM: GPU-Accelerated, Distributed, Topology-Flexible Self-Organizing Maps

Tony Xu, Sarah Klamt, Katherine Turner, Anne Brustle, Felix Marsh-Wakefield, Givanna Putri