arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories: 2069
2605.30571 2026-06-01 cs.AR cs.AI cs.DC cs.PF cs.RO

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Josef Chen

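AI summary: By measuring batch-1 autoregressive decode performance on different GPUs, this paper finds that physical-AI inference is limited not only by memory bandwidth but also by launch overhead. It also points out that the real benefit of quantization paths depends on the runtime implementation.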

Abstract

Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive decode, where one robot, camera feed or user session waits on the next token. This workload is usually described as memory-bandwidth-bound. Each decode step streams model weights and the active KV cache, so latency should scale with peak HBM bandwidth. We show that this account is true but incomplete. We measure batch-1 decode for three 7 to 8B-class GQA transformers across four NVIDIA GPUs: H100 SXM5, A100-80GB SXM4, L40S and L4. We evaluate context lengths from 2048 to 16384, producing 44 valid cells under a controlled bf16 SDPA setup. The achieved fraction of peak HBM bandwidth falls as peak bandwidth rises. On the headline Qwen-2.5-7B ctx=2048 cell, an L4 reaches roughly 81 percent of its analytic memory floor, while an H100 reaches only 27 percent. Physical-AI decode is memory-dominated, but faster memory does not translate into proportional latency gains. We test the missing term with a CUDA Graphs A/B experiment. On H100 at ctx=2048, CUDA Graphs improves decode latency by 1.259x across N=10 fresh sessions, with a 95 percent bootstrap confidence interval of 1.253 to 1.267. On L4, the same intervention gives only 1.028x. This isolates a launch-side overhead that becomes visible on fast GPUs but remains mostly hidden on slower, bandwidth-bound GPUs. The deployment implication is that memory savings matter only when the runtime realises them. On L4, bf16 decode sits close to the memory floor, but common quantised paths do not recover the expected 4x weight-traffic reduction: bnb-nf4 reaches 59.36 ms/step and AutoAWQ+Marlin reaches 45.24 ms/step from a 62.32 ms bf16 baseline. GPTQ+ExLlamaV2, with Ada-tuned int4 kernels, reaches 17.36 ms/step.
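
The "analytic memory floor" is the time needed to stream the weights plus the live KV cache once at peak bandwidth. The back-of-envelope sketch below computes it. The model shape (about 7.6e9 bf16 parameters, 28 layers, 4 KV heads of dimension 128) and the ~300 GB/s L4 peak are our assumptions, not figures from the paper, and the paper's exact floor definition may differ.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, ctx, bytes_per_elem=2):
    """Bytes of K and V cached for `ctx` tokens (GQA: only the KV heads are stored)."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

def memory_floor_ms(weight_bytes, kv_bytes, peak_bw_bytes_per_s):
    """Lower bound on one decode step: stream weights + KV cache once at peak bandwidth."""
    return (weight_bytes + kv_bytes) / peak_bw_bytes_per_s * 1e3

def fraction_of_floor(floor_ms, measured_ms):
    """1.0 would mean the step runs exactly at the bandwidth floor."""
    return floor_ms / measured_ms

# Assumed Qwen-2.5-7B-like shape and bf16 weights (2 bytes per parameter).
weights = 7.6e9 * 2
kv = kv_cache_bytes(layers=28, kv_heads=4, head_dim=128, ctx=2048)
# Assumed L4 peak bandwidth of ~300 GB/s; 62.32 ms is the L4 bf16 baseline quoted in the abstract.
floor = memory_floor_ms(weights, kv, peak_bw_bytes_per_s=300e9)
frac = fraction_of_floor(floor, measured_ms=62.32)
print(f"floor = {floor:.1f} ms/step, fraction of floor achieved = {frac:.0%}")
```

Under these assumptions, the floor comes out near 51 ms, so a ~62 ms step is roughly 80% of the floor. On an H100, at an assumed ~3.35 TB/s, the same bytes imply a floor of only a few milliseconds. At that scale, fixed per-step launch costs become visible.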

2605.30532 2026-06-01 stat.CO cs.LG stat.ML

True Self-Avoiding Walk for Accelerating Markov-Chain Monte Carlo Integration

Qinghua, Ding, Venkat Anantharam

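AI summary: This paper proposes using true self-avoiding walk (TSAW) to improve Markov chain Monte Carlo (MCMC) integral estimation. Transition probabilities are penalized for overuse, and the resulting empirical integration error is of order O(√(log t)/t) almost surely, significantly better than the t^{-1/2} error of standard random walks.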

Abstract

We study true self-avoiding walk (TSAW) as a mechanism for improving empirical integral estimation via Markov chain Monte Carlo (MCMC). We consider finite-state adaptive sampling dynamics associated with an irreducible Markov kernel $P$ on a finite set $V$, with stationary distribution $\pi$, in which the transition probabilities are penalized according to empirical overuse. Our main result is that the empirical occupation counts $L_t(i)$ and transition counts $N_t(i,j)$ of the resulting TSAW-based walk satisfy \[ L_t(i)-t\pi_i = O(\sqrt{\log t}) \quad\text{and}\quad N_t(i,j)-t\pi_iP_{ij}=O(\sqrt{\log t}) \qquad\text{almost surely} \] for every state $i$ and every edge $(i,j)$ with $P_{ij}>0$. Consequently, for every bounded function $f:V\to\mathbb R$, the error of our integral estimator converges as \[ \left|\frac1t\sum_{s=0}^{t-1} f(X_s)-\sum_{i\in V}\pi_i f(i)\right| = O\left(\frac{\sqrt{\log t}}{t}\right) \qquad\text{almost surely}. \] These results show that, in contrast with the usual $t^{-1/2}$ error scaling for empirical averages under standard random-walk-based methods, the TSAW-based estimator yields empirical integral errors of order $O(\sqrt{\log t}/t)$ almost surely, thereby achieving a substantially sharper dependence on the sample size $t$.
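
The sketch below simulates the idea under an assumed penalty form. From state i, candidate j is weighted by P_ij·exp(-β(N_t(i,j) - L_t(i)P_ij)), so edges used more often than their expected share are down-weighted. This exponential penalty and the parameter `beta` are our own choices, not the paper's dynamics. With beta = 0, the function reduces to an ordinary Markov-chain walk.

```python
import math
import random

def tsaw_estimate(P, f, t, beta=1.0, x0=0, seed=0):
    """Estimate sum_i pi_i f(i) with an overuse-penalized walk on kernel P.

    Illustrative penalty (an assumption, not the paper's exact rule): from state x,
    edge (x, j) is weighted by P[x][j] * exp(-beta * (N[x][j] - L[x] * P[x][j])).
    """
    rng = random.Random(seed)
    n = len(P)
    L = [0] * n                      # departures from each state (occupation counts)
    N = [[0] * n for _ in range(n)]  # transition counts N[i][j]
    x, total = x0, 0.0
    for _ in range(t):
        total += f(x)
        # Edges used more often than their expected share L[x] * P[x][j] are down-weighted.
        weights = [P[x][j] * math.exp(-beta * (N[x][j] - L[x] * P[x][j])) for j in range(n)]
        y = rng.choices(range(n), weights=weights)[0]
        N[x][y] += 1
        L[x] += 1
        x = y
    return total / t

# Birth-death chain with stationary distribution pi = (0.25, 0.5, 0.25); E_pi[f] = 1 for f(i) = i.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
est = tsaw_estimate(P, lambda i: i, t=20000, beta=2.0)
print(f"TSAW-style estimate: {est:.5f} (target 1.0)")
```

The penalty keeps each N(i,j) - L(i)P_ij bounded. The empirical transition frequencies therefore track P closely, and the occupation frequencies track π.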

2605.30509 2026-06-01 stat.ML cs.AI cs.LG

Improved Distribution Estimation in $\ell_\infty$

Doron Cohen, Aryeh Kontorovich, Yonatan Livshitz

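AI summary: This paper improves the estimation of discrete probability distributions under the $\ell_\infty$ norm, giving minimax bounds in expectation and high-probability tail bounds. It resolves open questions posed by Kontorovich and Painsky (JMLR, 2025), including a fully empirical version of the tightest risk bound and the form of the worst-case extremal distribution, and reports encouraging empirical results.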

Comments 24 pages, 3 figures

Abstract

We present improved bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These include minimax bounds in expectation and high-probability tail bounds. We resolve some of the open questions posed in Kontorovich and Painsky (JMLR, 2025) -- including a fully empirical version of the tightest risk bound they presented and identifying the form of the worst-case extremal distribution. Encouraging empirical results are reported as well.
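
The sketch below makes concrete the quantity these bounds control: the $\ell_\infty$ error of an estimate of a discrete distribution. It uses the plain empirical (maximum-likelihood) estimator and a Monte Carlo estimate of its expected $\ell_\infty$ risk. The choice of estimator and of `trials` is ours for illustration, not necessarily what the paper analyzes.

```python
import random
from collections import Counter

def empirical_distribution(samples, k):
    """Maximum-likelihood (empirical frequency) estimate over symbols 0..k-1."""
    counts = Counter(samples)
    return [counts[i] / len(samples) for i in range(k)]

def linf_error(p_hat, p):
    """l_infinity distance: the largest per-symbol probability error."""
    return max(abs(a - b) for a, b in zip(p_hat, p))

def mc_linf_risk(p, n, trials=200, seed=0):
    """Monte Carlo estimate of E || p_hat_n - p ||_inf for n i.i.d. samples from p."""
    rng = random.Random(seed)
    k = len(p)
    total = 0.0
    for _ in range(trials):
        samples = rng.choices(range(k), weights=p, k=n)
        total += linf_error(empirical_distribution(samples, k), p)
    return total / trials

p = [0.5, 0.3, 0.2]
for n in (100, 1000, 10000):
    print(n, round(mc_linf_risk(p, n), 4))
```

The empirical risk shrinks roughly like 1/√n. Minimax and tail bounds of the kind described above make that dependence precise.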

2605.30478 2026-06-01 cs.SE cs.CL

Improving Small Language Models for Code Generation with Reinforcement Learning from Verification Feedback

Egor Skopin, Evgeny Kotelnikov

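AI summary: Using reinforcement learning with verifiable rewards (RLVR) combined with LoRA fine-tuning, this study runs Python code-generation experiments with small language models (Qwen3-0.6B and Llama3.2-1B) on the MBPP benchmark. It finds that reward design strongly affects functional correctness and behavioral diagnostics, and that a combined reward gives a stable trade-off between correctness and style constraints.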

Comments Accepted for AINL-2026 conference

Abstract

Reinforcement learning with verifiable rewards (RLVR) trains language models using programmatically checkable signals such as unit-test outcomes, enabling direct optimization for functional correctness in code generation. We conduct an empirical study of RLVR for Python code generation on the MBPP benchmark using two small models (Qwen3-0.6B and Llama3.2-1B) with LoRA fine-tuning. Across multiple reward formulations, namely unit-test-only rewards, static-analysis-only shaping via the Ruff linter, and a combined reward, we compare group-based policy optimization variants (GRPO and GSPO) and evaluate both functional correctness and behavioral diagnostics. In our experimental setting, RLVR improves pass@1 on the MBPP test set by up to 13 percentage points under the proposed combined reward configuration. However, we find that reward shaping can induce systematic behavioral shifts: using only static-analysis penalties may bias the policy toward shorter completions that reduce lint errors without reliably improving functional correctness. In contrast, combined rewards mitigate this degeneration and yield more stable trade-offs between correctness and style constraints. Overall, our results highlight that RLVR effectiveness for code generation is highly sensitive to reward design and optimization granularity, and that diagnostics beyond pass@1, including generation length, Ruff severity profiles, and execution error types, are useful for identifying failure modes.
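
The hypothetical sketch below makes the reward formulations concrete. It has two parts:

- A combined reward that mixes the unit-test pass rate with a capped, Ruff-style lint penalty.
- GRPO-style group-relative advantages, which standardize rewards within the group of completions sampled for one prompt.

The weights `lint_weight` and `lint_cap` are invented for illustration. The paper's exact reward definitions are not given in the abstract.

```python
import statistics

def combined_reward(tests_passed, tests_total, lint_errors, lint_weight=0.1, lint_cap=5):
    """Hypothetical combined reward: unit-test pass rate minus a capped lint penalty."""
    correctness = tests_passed / tests_total if tests_total else 0.0
    style_penalty = min(lint_errors, lint_cap) / lint_cap  # in [0, 1]
    return correctness - lint_weight * style_penalty

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for one MBPP-style prompt (made-up outcomes).
group = [combined_reward(3, 3, 0), combined_reward(3, 3, 4),
         combined_reward(1, 3, 0), combined_reward(0, 3, 0)]
print([round(a, 3) for a in group_relative_advantages(group)])
```

Under a lint-only reward, an empty completion with zero lint errors would score as well as a correct one. Including the pass rate removes that incentive, which is one plausible reading of why combined rewards resist the short-completion degeneration.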

2605.30476 2026-06-01 cs.IT cs.CR cs.LG math.IT

Local Differential Privacy with Correlated Noise Achieves Central-DP Optimal Cost

Madhura Pathegama, Srikanth Avasarala, Viveck R. Cadambe, Juba Ziani

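AI summary: By designing correlations among the locally added noise variables, the authors construct ε-differentially private mechanisms whose estimation cost matches the optimal cost in the centralized setting, up to an arbitrarily small gap.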

Abstract

We study privately estimating the sum of $n$ user-held values in the presence of an honest-but-curious server. This motivates requiring privacy not only at data release but also throughout server-side computation. We therefore adopt the local (pure) differential privacy model, in which each user transmits a noise-perturbed value. It is well known that independent local noise typically incurs a substantial utility loss compared to the centralized model, where noise is added only after aggregation. We show that this gap is not fundamental. By carefully designing correlations among the locally added noise variables, we construct $\varepsilon$-DP mechanisms whose estimation cost matches the optimal cost achievable in the centralized setting, up to an arbitrarily small error.

2605.30454 2026-06-01 cs.CR cs.AI

The Surface You Test Is Not the Surface That Breaks

Shifat E Arman, Syed Nazmus Sakib, Nafiul Haque, Shahrear Bin Amin

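AI summary: This paper finds that the vulnerability of tool-augmented LLM agents to prompt injection depends on the attack surface (tool outputs vs. tool descriptions). It proposes the Adaptive Attack Rate and stresses that evaluations must report per-surface vulnerability.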

Comments 8 Figures, 8 Tables, Under Review at EMNLP

Abstract

Tool-augmented LLM agents are vulnerable to prompt injection: a third party who controls part of the agent's context can plant instructions that the agent then executes as if they came from the user. Current evaluations report a single attack success rate (ASR) per model on one channel, the tool output, and treat that number as the model's vulnerability. But tool descriptions, which the agent reads at every turn before any tool is called, are themselves an injection surface that the attacker can choose instead. We hold the injection payload byte-identical and deliver it through both surfaces across 13 LLMs from six families and four task suites. The same bytes invert in success rate across models: GPT-4.1 is 96 percent vulnerable on tool outputs but only 4 percent on tool descriptions, while GEMINI-3-FLASH shows the mirror pattern at 20 percent and 98 percent. A variance decomposition over 6,830 attempts attributes 0 percent of the variation in attack outcomes to the surface alone, while the model-surface interaction accounts for 16.7 percent. Vulnerability is a property of the pairing, not the channel. The Adaptive Attack Rate, defined as the per-cell maximum over surfaces, exceeds the strongest fixed-surface baseline by +9.1 percentage points on average. Standard prompt-level defenses inherit the same blind spot, reducing tool-output ASR to 10-18 percent while leaving the description channel above 54 percent. Both attack and defense evaluation must report per-surface vulnerability.
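
The sketch below contrasts the Adaptive Attack Rate with the best fixed-surface baseline. The AAR takes the per-cell maximum over surfaces. The fixed baseline commits to one surface for all cells. The example treats each model as one cell and uses only the two headline rates quoted above, which is an assumption: the paper's cells may be finer, for example model × task suite.

```python
def adaptive_attack_rate(asr):
    """asr[cell][surface] -> success rate; average over cells of the best surface per cell."""
    return sum(max(rates.values()) for rates in asr.values()) / len(asr)

def best_fixed_surface_rate(asr):
    """Strongest single-surface baseline: pick one surface for all cells, then average."""
    surfaces = next(iter(asr.values())).keys()
    return max(sum(rates[s] for rates in asr.values()) / len(asr) for s in surfaces)

# Only the two headline rates from the abstract, treating each model as one cell.
asr = {
    "GPT-4.1":        {"tool_output": 0.96, "tool_description": 0.04},
    "GEMINI-3-FLASH": {"tool_output": 0.20, "tool_description": 0.98},
}
aar, fixed = adaptive_attack_rate(asr), best_fixed_surface_rate(asr)
print(f"AAR = {aar:.2f}, best fixed surface = {fixed:.2f}, gap = {aar - fixed:+.2f}")
```

On these two mirror-image models alone, the gap is 39 points. That is far larger than the +9.1-point average over all 13 models, because the per-cell maximum captures exactly the model-surface interaction.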

2605.30453 2026-06-01 hep-ph cs.LG physics.data-an

Generative Models and Statistical Validation

Sascha Diefenbacher, Sofia Palacios Schweitzer, Gregor Kasieczka

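AI summary: This paper introduces the framework of modern generative networks and discusses the challenges of quantifying their accuracy, precision and statistical power.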

Comments 36 pages, 4 figures, Part of the VERaiPHY Initiative

Abstract

Generative machine learning has become an essential tool in theoretical and experimental physics, especially in the context of fast surrogates and density estimators. In this work, we first introduce the underlying framework of modern generative networks and then discuss challenges in quantifying their accuracy, precision, and statistical power.

2605.30429 2026-06-01 quant-ph cs.LG

Attention-based optimizer for symmetry finding

Shreya Banerjee, Vinodh Raj Rajagopal Muthu, Charlie Nation, Rick P. A. Simon, Francesco Martini, Alessandro Ricottone, Federico Cerisola, Luca Dellantonio

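AI summary: Proposes an optimization framework based on the Set-Transformer architecture that uses self-attention to encode correlations among Pauli strings. Combined with a custom commutation-based objective, the framework finds Pauli symmetries of Hamiltonians with high probability.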

Comments 9+4 pages, 2 Figures, Comments welcome

Abstract

Finding symmetries is crucial for understanding physical models. In this work, we present an optimization framework that searches for Pauli symmetries of Hamiltonians, merging machine learning with automated symmetry finding. Built on a Set-Transformer architecture, our framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli strings. These relations are then decoded into a candidate, which is further optimized with a custom commutation-based objective and mapped to a symmetry of the input Hamiltonian. We apply our method to random Pauli Hamiltonians, periodic one- and two-dimensional transverse-field Ising models, and the Toric code. We show that for physical Hamiltonians (Ising and Toric), our framework succeeds with near-deterministic probability while providing a substantial advantage over state-of-the-art strategies. For random Pauli Hamiltonians, we estimate the required computational resources, specifically the number of parallel starts and the number of GPUs, to find a symmetry with high success probability under fixed design specifications.
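
The abstract does not spell out the commutation-based objective. It must, however, build on two standard facts. First, two Pauli strings commute if and only if they anticommute on an even number of sites. Second, a Pauli string is a symmetry of a Hamiltonian with distinct Pauli terms if and only if it commutes with every term. The sketch below applies both facts to a periodic 1D transverse-field Ising chain. The violation count is a simple discrete stand-in, not the paper's objective.

```python
def paulis_commute(a: str, b: str) -> bool:
    """Two Pauli strings commute iff they anticommute on an even number of sites."""
    clashes = sum(1 for x, y in zip(a, b) if x != "I" and y != "I" and x != y)
    return clashes % 2 == 0

def violations(candidate: str, terms: list) -> int:
    """Number of Hamiltonian terms the candidate fails to commute with (0 = symmetry)."""
    return sum(not paulis_commute(candidate, t) for t in terms)

def tfim_terms(n: int) -> list:
    """Pauli terms of a periodic 1D transverse-field Ising chain (coefficients dropped)."""
    zz = []
    for i in range(n):
        s = ["I"] * n
        s[i] = s[(i + 1) % n] = "Z"  # Z_i Z_{i+1}, wrapping around the ring
        zz.append("".join(s))
    x = ["".join("X" if j == i else "I" for j in range(n)) for i in range(n)]
    return zz + x

terms = tfim_terms(4)
print(violations("XXXX", terms), violations("ZZZZ", terms))
```

The global parity XXXX commutes with every term, so it is a symmetry. ZZZZ anticommutes with each single-site X term, so it is not.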

2605.30406 2026-06-01 cs.CY cs.AI

AI Loss of Control Incident Management: Response & Resilience

Ross Gruetzemacher

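AI summary: Addressing the research gap in AI loss-of-control incident management, this paper proposes a foundational framework and taxonomy that divides loss-of-control scenarios into "extremely costly" and "impossible" classes. It prescribes active incident management (containment and threat neutralization) for the former and resilience investment for the latter.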

Comments 25 pages, 4 figures

Abstract

Recent research demonstrating AI systems exhibiting deception and shutdown resistance suggests that AI loss of control (LOC) is an urgent policy concern, yet current literature focuses almost exclusively on alignment and prevention. To address this gap, this paper introduces a foundational framework and taxonomy for managing catastrophic AI LOC incidents. The taxonomy's first level distinguishes between scenarios where regaining control is 'extremely costly' versus 'impossible'. While impossible scenarios demand immediate resilience investments to fundamentally restrict an AI's attack surface, extremely costly scenarios require active incident management via Containment and Threat Neutralization. The framework further categorizes these manageable events into accidental LOC (requiring automated circuit-breaker responses) and adversarial LOC (requiring graduated escalatory measures). By mapping three severity classes to specific scenario matrices, this paper provides a concrete, proportional guide for managing unprecedented AI risks.

2605.30399 2026-06-01 q-bio.QM cs.LG eess.IV

A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture

Hanne-Grete Alvheim, Stian Mjelde Jakobsen, Martin Føre, Eleni Kelasidi

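AI summary: This study proposes a novel stereo-vision method based on YOLOv8, ByteTrack, SuperGlue and triangulation to detect and track fish and to estimate their 3D positions. The method is used to analyze how structures of different shapes, sizes and colors affect fish behavior.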

Abstract

The aquaculture industry needs to address several challenges to secure sustainable seafood production that can serve an increasing global demand. One major challenge is to ensure good fish health and acceptable welfare during production since the improvement of fish welfare is of vital importance in current and future production systems. In this study, this is addressed by developing and implementing methods to identify fish behaviors in response to intrusive objects both on individual and on a group basis. A novel approach for detecting, tracking, and estimating the 3D position of individual fish has thus been developed, and specifically designed to track the caudal fins of farmed fish in industrial sea cages. The tracking data was subjected to a novel stereo-vision method adapted to estimate fish positions, velocities, accelerations, and turning and pitch angles. Datasets obtained from industrial-scale fish farms were then analyzed to identify the impact of structures of varying shapes, sizes, and colors on fish behavior. The method was trained using manually labeled caudal fins, and used YOLOv8 with ByteTrack as an object detector and tracker, SuperGlue for matching detections in the left and right frames, and triangulation to reconstruct the 3D positions of the fish. Different image pre-processing and augmentation methods for enhancing object detection accuracy were tested and their performance compared, while RAFT-Stereo was tested for depth estimation purposes. The obtained results both validate the method's performance against previous research efforts, and demonstrate the novelty and potential of this method in providing more insight into behavioral dynamics in sea-cages.
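
The sketch below shows the triangulation step for an idealized, calibrated and rectified stereo pair, where depth follows from disparity as Z = f·B/d. The focal length `f`, principal point `(cx, cy)` and `baseline` are assumed calibration values. The abstract does not describe the paper's actual camera model or triangulation routine.

```python
def project(point, f, cx, cy, baseline):
    """Pinhole projection into a rectified pair; the right camera sits at +baseline along x."""
    X, Y, Z = point
    v = f * Y / Z + cy
    return (f * X / Z + cx, v), (f * (X - baseline) / Z + cx, v)

def triangulate(left, right, f, cx, cy, baseline):
    """Recover (X, Y, Z) from a matched left/right detection via disparity."""
    (ul, vl), (ur, _) = left, right
    disparity = ul - ur
    if disparity <= 0:
        raise ValueError("non-positive disparity: bad match or point at infinity")
    Z = f * baseline / disparity
    return (ul - cx) * Z / f, (vl - cy) * Z / f, Z

# Round trip: project an assumed caudal-fin position, then triangulate it back.
l, r = project((0.3, -0.2, 4.0), 1000.0, 640.0, 360.0, 0.1)
print(triangulate(l, r, 1000.0, 640.0, 360.0, 0.1))
```

Differencing triangulated positions over consecutive frames would then give velocities and accelerations. The paper's actual smoothing and angle estimation are not specified in the abstract.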

2605.30396 2026-06-01 cs.GR cs.LG

Smaller and Faster 3DGS via Post-Training Dictionary Learning

Jiarong Gong, Jonas Unger, Ehsan Miandji

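AI summary: Proposes the first dictionary-learning-based post-training compression framework for 3DGS. It substantially compresses models without retraining, preserves image quality and speeds up real-time rendering.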

Abstract

3D Gaussian Splatting (3DGS) is a promising neural scene representation for real-time rendering, but trained models often suffer from large memory footprints, limiting deployment on less powerful devices. Existing compression techniques often lead to architectures with several additional trainable parameters. While achieving outstanding compression ratios, they introduce noticeable drops in image quality. In this work, we introduce the first dictionary-learning-based compression framework for 3DGS. The proposed post-training compression pipeline can be applied to virtually any 3DGS model without the need for re-training or modifications to existing 3DGS models. Our compression framework is straightforward to implement, yet provides significant compression capabilities, preserves image quality, and improves real-time rendering performance. Across 13 benchmark scenes, our approach achieves average compression ratios of 3.95x, 3.10x, and 4.55x when applied to 3DGS, 3DGS-MCMC, and PixelGS, respectively. This yields consistent rendering speedups of 23.3%, 24.3%, and 25.3%, while maintaining image quality.
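
A shared dictionary shrinks a model because each Gaussian's attributes are replaced by a few (index, value) pairs into a small dictionary that is stored once. The storage arithmetic below assumes the vanilla 3DGS layout: 59 float32 values per Gaussian, of which 45 are higher-order spherical-harmonic coefficients. The coding choices (1024 atoms, 4 nonzeros, 16-bit indices and values) are invented for illustration. The abstract does not state which attributes the paper encodes or how.

```python
def dictionary_compression_ratio(num_gaussians, dense_floats, coded_floats, atoms, nnz,
                                 index_bytes=2, value_bytes=2, float_bytes=4):
    """Storage ratio when `coded_floats` attributes per Gaussian are replaced by `nnz`
    sparse (index, value) codes over a shared dictionary with `atoms` atoms."""
    original = num_gaussians * dense_floats * float_bytes
    kept = num_gaussians * (dense_floats - coded_floats) * float_bytes  # stored as before
    codes = num_gaussians * nnz * (index_bytes + value_bytes)           # sparse codes
    dictionary = atoms * coded_floats * float_bytes                     # stored once
    return original / (kept + codes + dictionary)

ratio = dictionary_compression_ratio(1_000_000, dense_floats=59, coded_floats=45,
                                     atoms=1024, nnz=4)
print(f"compression ratio = {ratio:.2f}x")
```

The dictionary cost is amortized over all Gaussians, so for large scenes the ratio is set almost entirely by how many bytes the sparse codes need per Gaussian.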

2605.30394 2026-06-01 cs.SE cs.AI

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

Vedant Padwal

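AI summary: Proposes CodeGolf Bench, a benchmark based on code-golf competitions that evaluates the ability of LLMs to generate concise, efficient code in 60 programming languages. Experiments show that reasoning models significantly outperform non-reasoning models.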

Comments 12 pages, 6 figures, 5 tables

Abstract

This paper introduces CodeGolf Bench, a benchmark for evaluating the concise code generation abilities of Large Language Models (LLMs) in 60 programming languages. Based on code golf, a recreational programming competition focused on solutions with the fewest characters or bytes, the benchmark provides a distinctive measure of LLMs' ability to produce efficient, concise code. Unlike existing benchmarks limited by fixed problem sets and language coverage, CodeGolf Bench leverages the code.golf platform to provide new problems and live human performance baselines. Evaluation of nine LLMs on Python and C++ tasks demonstrates that reasoning models significantly outperform non-reasoning models, achieving a best average percentile of 70.97%. This performance gap is particularly pronounced in C++, highlighting the importance of reasoning for languages with strict syntax requirements. Non-reasoning models struggle more with efficiency optimization across both languages, with best percentiles significantly lower than those of their reasoning counterparts. CodeGolf Bench offers a dynamic framework for evaluating LLM code generation capabilities against evolving human performance on code golf.
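
The abstract reports an average percentile against live human solutions but does not define it. The sketch below uses one plausible definition, purely as an assumption: the share of human solutions strictly longer than the model's (shorter is better), with failing solutions scored 0. The benchmark's actual scoring may differ.

```python
def golf_percentile(model_len, human_lens, passed=True):
    """Percent of human solutions strictly longer than the model's (shorter wins).

    Hypothetical definition: a solution that fails the tests scores 0.
    """
    if not passed or not human_lens:
        return 0.0
    longer = sum(1 for h in human_lens if h > model_len)
    return 100.0 * longer / len(human_lens)

def average_percentile(results):
    """results: list of (model_len, human_lens, passed) tuples, one per task."""
    return sum(golf_percentile(*r) for r in results) / len(results)

# Two made-up tasks: a 50-byte passing solution and a failing one.
print(average_percentile([(50, [40, 50, 60, 70], True), (10, [40, 50], False)]))
```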

2605.30391 2026-06-01 cs.MA cs.AI cs.CL

Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate

Tom Pecher

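AI summary: Simulating the Argumentative Theory of Reasoning through multi-agent debate, this work shows that collective debate among large language models can significantly improve performance on questionnaire-based truth-seeking tasks. It also proposes a new benchmarking method that uses debate dynamics to measure intrinsic model properties.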

Comments Master's thesis

Abstract

Human reasoning has long been theorised to operate socially, not through isolated individual cognition, but through collective adversarial discourse, a framework known as the Argumentative Theory of Reasoning (ATR). Rather than relying on individual "intellectualist reasoners" as the primary vehicle for truth-seeking, ATR reconceptualises truth as an emergent property of social epistemology: the product of imperfect individual reasoning refined under the adversarial pressure of debate. This distributed method of collective intelligence has guided humanity to ever-greater epistemic heights and underpins the foundational principles of all democratic systems. This thesis breaks new ground by, for the first time, simulating ATR through the multi-agent debate (MAD) of large language models (LLMs). With rigorous empirical analysis, we demonstrate that, when correctly engineering an epistemically diverse set of models, LLM-MAD can significantly improve truth-seeking performance on questionnaire-based tasks, even when individual debate participants exhibit limited standalone performance. Furthermore, we present strong empirical evidence that this performance gain is mechanistically grounded in the central principles of ATR, suggesting that collective reasoning may be universally favourable over individualist reasoning, rather than a quirk in biology or evolution. Finally, drawing on our analysis of debate dynamics, we propose a novel benchmarking methodology that leverages LLM-MAD to measure intrinsic model properties (such as hallucination propensity) in order to compare models in ways that current static benchmarking approaches cannot support.

2605.30389 2026-06-01 cs.FL cs.LG

The Inclusion Depth of Pattern Languages: An Open Problem in Algorithmic Learning Theory

Wei Luo

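AI summary: This note poses the problem of computing the inclusion depth of pattern languages (the length of the longest strict inclusion chain) and conjectures the formula 2|p| - #var(p) - 1. The problem connects formal languages, combinatorics on words and learning by identification in the limit.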

Comments 2 pages. Open problem from COLT 2005. Generic author-prepared version for arXiv. Originally appeared in Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 2005

Journal ref
Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Lecture Notes in Artificial Intelligence 3559, Springer, 2005, pp. 689-690
Abstract

Pattern languages are a classical model in formal language theory and algorithmic learning theory. This note formulates the problem of computing the inclusion depth of a pattern language: the length of the longest strict inclusion chain from the universal pattern language to the language generated by a given pattern. Inclusion depth captures the mind-change complexity of pattern identification from positive data. The central open question is whether the inclusion depth ID_Sigma(p) is computable for every pattern p over every finite alphabet Sigma with at least two symbols, and whether it is computable in polynomial time. A simple conjectured formula, ID_Sigma(p) = 2|p| - #var(p) - 1, would imply a linear-time algorithm. The problem connects pattern language inclusion, combinatorics on words, language identification in the limit, and mind-change-bounded learning.
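
Evaluating the conjectured formula is trivial, which is why a proof would immediately give a linear-time algorithm. The sketch below represents a pattern as a list of symbols and reads #var(p) as the number of distinct variables, which is our interpretation. It computes only the conjectured value, not the true inclusion depth.

```python
def conjectured_inclusion_depth(pattern, variables):
    """Conjectured ID(p) = 2|p| - #var(p) - 1 (a conjecture, not a proven formula).

    pattern:   sequence of symbols, e.g. ["x1", "a", "x2", "x1"]
    variables: set of symbols treated as variables; everything else is a constant.
    """
    distinct_vars = {s for s in pattern if s in variables}  # one pass over p
    return 2 * len(pattern) - len(distinct_vars) - 1

VARS = {"x1", "x2", "x3"}
print(conjectured_inclusion_depth(["x1", "a", "x2", "x1"], VARS))
```

As a sanity check, the universal pattern x1 gets 0. The pattern x1x1 gets 2, consistent with the strict chain L(x1) ⊃ L(x1x2) ⊃ L(x1x1) of length 2.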

2605.30375 2026-06-01 physics.flu-dyn cs.AI

Full-field prediction for engineering-scale three-dimensional aircraft with multigrid-hierarchical learning

Yunfei Liu, Hao Wang, Yuhang Qi, Hao Yue, Dehong Meng, Wei Li, Rui Wang, Tiejun Li, Jie Liu, Junwu Hong, Xinhai Chen

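AI summary: Proposes the MHLF framework, which combines a topologically consistent multigrid representation with a hierarchical strategy. MHLF predicts flow fields of engineering-scale 3D aircraft efficiently and with high fidelity, speeding up CFD convergence by 3 to 8 times.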

Abstract

High-fidelity computational fluid dynamics is essential for aerospace design, but engineering-scale simulations of practical three-dimensional aircraft remain computationally expensive. Learning-based flow-field initialization can improve efficiency by reducing the numerical distance between the initial and converged solutions, yet existing deep learning approaches remain difficult to scale to large three-dimensional aircraft flows with multiscale regional heterogeneity. Most prior studies therefore focus on two-dimensional problems, surface quantities, integral aerodynamic coefficients, or simplified three-dimensional cases with limited grid resolution. Here we propose MHLF, a multigrid-hierarchical learning framework for accelerating engineering-scale aircraft flow simulations while preserving high-fidelity numerical accuracy. MHLF combines a topologically consistent geometric multigrid representation with a hierarchical strategy that captures regional flow heterogeneity during both prediction and subsequent CFD correction. Across three engineering-scale aircraft cases spanning Mach 0.15 to 6.0 and covering subsonic, transonic and supersonic regimes, MHLF accelerates convergence without sacrificing flow-field accuracy, achieving a 3 to 8 times efficiency improvement over conventional initialization. These results demonstrate practical full-flow-field prediction for large three-dimensional aircraft within the CFD domain and provide a foundation for data-driven acceleration of high-fidelity aircraft flow simulation.
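
The efficiency argument is that a better initial guess shortens the iterative solve. This can be seen on a toy problem far simpler than CFD. The sketch runs Jacobi iteration for the 1D Poisson equation -u'' = f with zero boundary values, starting from zero and from a slightly perturbed exact solution. The perturbed solution stands in for a learned prediction. This is a generic illustration, not MHLF or its multigrid representation.

```python
def jacobi_iterations(u0, f, h, tol=1e-8, max_iter=100_000):
    """Jacobi sweeps for -u'' = f on n interior nodes (u = 0 at both ends).

    Returns the number of sweeps until the largest pointwise update drops below tol.
    """
    u = list(u0)
    n = len(u)
    for it in range(1, max_iter + 1):
        new = [((u[i - 1] if i > 0 else 0.0) + (u[i + 1] if i < n - 1 else 0.0)
                + h * h * f[i]) / 2 for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if delta < tol:
            return it
    return max_iter

n = 20
h = 1.0 / (n + 1)
f = [1.0] * n
# For f = 1 the discrete solution is exactly u(x) = x(1 - x)/2 at the nodes.
exact = [x * (1 - x) / 2 for x in ((i + 1) * h for i in range(n))]
cold = jacobi_iterations([0.0] * n, f, h)
warm = jacobi_iterations([u + 1e-3 for u in exact], f, h)  # stand-in for a learned initial field
print(f"sweeps from zero: {cold}, from near-converged guess: {warm}")
```

Because the error decays geometrically per sweep, shrinking the initial error by a constant factor removes a roughly fixed number of sweeps. This is the "numerical distance" effect that learned initialization exploits.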

2605.30372 2026-06-01 cs.NE cs.AI cs.LG q-bio.NC

Evolutionary Algorithm for Reservoir Learning and Yielding

Julien Testu, Pierrick Legrand, Xavier Hinaut

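AI summary: Proposes the evolutionary algorithm EARLY, which evolves the topology and hyperparameters of multi-reservoir echo state networks. EARLY outperforms random search on temporal learning tasks, and the results show that task difficulty shapes network structure.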

Journal ref
GECCO '26 - The Genetic and Evolutionary Computation Conference, Jul 2026, San José, Costa Rica
Abstract

Reservoir computing, a type of recurrent neural network, is a promising approach for temporal learning as it separates dynamic processing from the trained readout layer. However, classical Echo State Networks (ESNs) often require task-specific tuning of their architecture and hyperparameters to achieve good performance. This paper introduces EARLY (Evolutionary Algorithm for Reservoir Learning and Yielding), a framework designed to evolve both the topology and hyperparameters of multi-reservoir ESNs. Inspired by the modular organisation of the brain, EARLY encodes architectures as graph-based genomes and applies crossover, mutation, and selection to discover effective configurations. Our goal is to create both generic architectures and tasks that induce generalization. The method is evaluated on temporal learning tasks from the CogScale dataset. Results show that evolved architectures outperform those obtained with random search on several tasks and exhibit structural differences depending on task difficulty: simpler tasks yield lightweight architectures, while more complex tasks favour richer modular organisations. These findings suggest that evolutionary search can help identify reusable reservoir structures for a broader range of temporal problems. The evolved architectures are further evaluated on a cross-situational learning dataset to assess their ability to adapt to new environments.
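
For readers new to reservoir computing, the sketch below is a minimal single-reservoir echo state network using NumPy. The recurrent weights are random and fixed, rescaled to a chosen spectral radius, and only a linear ridge-regression readout is trained. Spectral radius, input scaling and leak rate are typical ESN hyperparameters of the kind an evolutionary search could tune. The multi-reservoir graph genome of EARLY is not reproduced here, and all values are illustrative.

```python
import numpy as np

def make_reservoir(n_units, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Random recurrent reservoir rescaled to a target spectral radius (fixed, never trained)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, n_units))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-input_scale, input_scale, size=n_units)
    return W, W_in

def run_reservoir(W, W_in, inputs, leak=1.0):
    """Collect reservoir states for a scalar input sequence (leaky-tanh update)."""
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W_in * u + W @ x)
        states.append(x.copy())
    return np.array(states)

def with_bias(states):
    return np.hstack([states, np.ones((len(states), 1))])

def train_readout(states, targets, ridge=1e-6):
    """Ridge-regression readout: the only trained part of an ESN."""
    X = with_bias(states)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ targets)

# One-step-ahead prediction of a sine wave, discarding the first 50 states as washout.
u = np.sin(0.1 * np.arange(600))
W, W_in = make_reservoir(100)
states = run_reservoir(W, W_in, u[:-1])
washout = 50
W_out = train_readout(states[washout:], u[1:][washout:])
pred = with_bias(states[washout:]) @ W_out
mse = float(np.mean((pred - u[1:][washout:]) ** 2))
print(f"train MSE = {mse:.2e}")
```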

2605.30371 2026-06-01 cs.NE cs.LG math.DS

From Mean-Field Limits to Semiclassical Concentration: Global Convergence of the Canonical Evolutionary Strategy

Matías Neto, Nicolás Garay, Luis Martí, Nayat Sanchez-Pi

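AI summary: This paper models the Canonical Evolutionary Strategy as a controlled mathematical framework, using the semiclassical limit of a Schrödinger-type replicator-mutator equation. It builds a rigorous hierarchy from discrete individual-based dynamics to a deterministic mean-field limit and proves that global convergence is governed by the principal eigenfunction of the underlying operator. This gives a mathematical explanation of the "survival of the flattest" phenomenon, and the paper demonstrates the method's advantage in high-dimensional shifted-initialization scenarios.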

Abstract

We address the issue of global convergence in stochastic continuous optimization. For that purpose, we formulate the Canonical Evolutionary Strategy (CES) as a controlled mathematical framework to analyze global convergence in evolutionary algorithms via the semiclassical limit of a Schrödinger-type replicator-mutator equation. We provide a rigorous hierarchy from a discrete individual-based dynamics to a deterministic mean-field limit, demonstrating that global convergence is governed by the principal eigenfunction of the underlying operator. This property, defined as Geometric Selection, naturally prioritizes robust, flat optima over narrow local traps, offering a mathematical justification for the "survival of the flattest" phenomenon. Moreover, unlike consensus-driven methods that are prone to premature variance collapse when the global minimizer resides outside the initial support, the replicator-mutator dynamics of CES facilitate intrinsic mass transport. High-dimensional benchmarks (d = 30) confirm this advantage, showing that CES achieves lower residual errors in shifted initialization scenarios where standard consensus-driven and gradient-based methods fail to migrate effectively. By shifting the focus from point-wise consensus to spectral concentration, our framework provides a robust theoretical foundation for global convergence in Evolution Strategies (ES) without the need for additional numerical heuristics.

2605.30368 2026-06-01 cs.NE cs.AI cs.RO q-bio.NC

Reinterpreting Safety Thresholds as Neuron Spiking Thresholds

Enrico Del Re, Mohamed Sabry, Cristina Olaverri-Monreal

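AI summary: Proposes reinterpreting the fixed thresholds of Surrogate Safety Measures (SSMs) as the spiking thresholds of leaky integrate-and-fire (LIF) neurons, building a spiking neural network (SNN) that learns human braking onsets and thereby links objective SSMs with subjective safety perception.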

Comments 6 pages

Abstract

Surrogate Safety Measures (SSMs) are extensively utilised in the evaluation of traffic risk in automated driving contexts. However, the majority of SSM-based evaluations employ fixed thresholds that fail to capture the human response to sustained borderline conditions or the reaction to brief, high-risk peaks. The present work proposes a biologically inspired reinterpretation of SSM thresholds as the spiking thresholds of leaky integrate-and-fire (LIF) neurons, with multiple SSM inputs combined into a spiking neural network (SNN). The SNN is trained to emit spikes that are aligned with human braking onsets. The training data were recorded in a controlled car-following experiment using the 3D-CoAutoSim platform with CARLA/Unreal and a 6-DOF motion platform, in which critical events were deliberately induced. The results demonstrate that the learned spiking activity qualitatively aligns with braking behaviour across scenarios and captures reactions that are not consistently explained by threshold crossings alone. Analysis across participants further indicates that learned input thresholds remain relatively consistent, while learned decay factors encode different temporal sensitivities for the SSMs. The findings of this study indicate that spiking dynamics may serve as a mechanism to facilitate the convergence of objective SSMs with subjective human safety perception.
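A discrete-time LIF neuron shows why a spiking threshold behaves differently from a fixed threshold on the raw signal. In this sketch the input is an assumed normalised risk signal derived from a single SSM, and the decay factor `beta` and `threshold` are set by hand, whereas the paper learns them per SSM; the function name and the input values are hypothetical.

```python
def lif_spikes(inputs, beta=0.9, threshold=1.0, weight=1.0):
    """Discrete-time leaky integrate-and-fire neuron with reset to zero.

    inputs: per-step input values, e.g. a normalised SSM-derived risk signal
    (an illustrative assumption, not the paper's input encoding).
    beta: leak (decay) factor per step; threshold: spiking threshold.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + weight * x  # leak the membrane potential, then integrate
        if v >= threshold:
            spikes.append(1)
            v = 0.0                # reset after a spike
        else:
            spikes.append(0)
    return spikes

# Sustained borderline risk: every sample stays below a fixed threshold of 1.0 ...
borderline = [0.3] * 10
# ... while a brief high-risk peak crosses it in a single step.
peak = [0.0, 0.0, 1.2, 0.0, 0.0]

print(lif_spikes(borderline))  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(lif_spikes(peak))        # [0, 0, 1, 0, 0]
```

A fixed threshold of 1.0 on the raw signal would never fire for the borderline sequence; the leaky integrator fires once enough sub-threshold evidence accumulates, and a smaller `beta` would make it forget faster.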

2605.30366 2026-06-01 cs.CR cs.SD eess.AS

Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection

Yifan Liao, Yule Liu, Zhen Sun, Zongmin Zhang, Yupeng He, Jiaheng Wei, Xinhu Zheng, Xinlei He

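AI summary: Attributes the failure of adversarial attacks against self-supervised learning (SSL)-based singing voice deepfake detection (SVDD) to a Linearity Trap, and proposes the MARS framework, which escapes it through bi-level optimization and substantially raises black-box transfer attack success rates.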

Abstract

Recent Singing Voice Synthesis (SVS) advances enable highly realistic but potentially malicious AI covers, making singing voice deepfake detection (SVDD) crucial. Self-Supervised Learning (SSL)-based detectors achieve state-of-the-art performance by fine-tuning speech SSL backbones to capture singing-specific spoof artifacts. Existing adversarial attacks often fail against SSL-SVDD, creating a false impression of inherent robustness. We show that this failure stems from two problems. First, at the objective level, attacks optimize cross-entropy on local surrogates, crossing surrogate-specific boundaries rather than suppressing shared spoof evidence. Second, at the method level, attacks follow the surrogate's dominant gradient direction. In SSL-SVDD, this direction aligns with fine-tuned artifact-sensitive directions, limiting transferability to unseen detectors, a geometric failure we term the Linearity Trap. To properly evaluate robustness, we propose MARS (Meta-Adversarial Regression of Semantics), a transfer-based black-box attack framework tailored to SSL-SVDD. Structurally, MARS shifts to hypothesis-evidence manipulation by constructing a natural semantic anchor from the pre-trained SSL space and an artifact anchor from the fine-tuned space. Algorithmically, MARS escapes the Linearity Trap via bi-level optimization: the inner stage induces tangential exploration, while the outer stage guides the audio toward the natural semantic manifold. Experiments on the CtrSVDD benchmark show that MARS improves Attack Success Rate (ASR) in in-distribution transfer (13%), out-of-distribution transfer (10%), and cross-task evaluation (36%), highlighting the urgent need for robust SVDD systems.
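The abstract does not spell out MARS's inner-stage update. The sketch below only illustrates the geometric idea behind "tangential exploration": taking a step that is orthogonal to the surrogate's dominant gradient direction instead of following it. Plain lists stand in for audio or feature vectors, and the function name and step size are hypothetical.

```python
import math
import random

def tangential_step(grad, step_size=0.01, rng=None):
    """Random step of length `step_size` orthogonal to `grad`.

    Illustrative only: projecting out the component along the surrogate's
    gradient is one simple way to explore without following the dominant
    direction. This is not MARS's actual update rule.
    """
    rng = rng or random.Random(0)
    d = [rng.gauss(0.0, 1.0) for _ in grad]
    g_norm2 = sum(g * g for g in grad)
    if g_norm2 > 0:
        coef = sum(di * gi for di, gi in zip(d, grad)) / g_norm2
        d = [di - coef * gi for di, gi in zip(d, grad)]  # remove the grad component
    d_norm = math.sqrt(sum(di * di for di in d)) or 1.0
    return [step_size * di / d_norm for di in d]

grad = [0.5, -1.0, 2.0, 0.0]
step = tangential_step(grad)
print(sum(s * g for s, g in zip(step, grad)))  # approximately 0: orthogonal to grad
```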

2605.30364 2026-06-01 eess.SP cs.AI

Hamiltonian-Inspired Attention Mechanism for Scalable RF Transmitter Fingerprinting

Chitraksh Singh, Monisha Dhanraj, Akram Sheriff

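AI summary: Proposes the Hamiltonian Transformer, which uses a physics-inspired attention structure (norm-preserving value updates and a phase-increment embedding) to improve RF transmitter fingerprinting as the number of transmitters scales.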

Comments 9 pages

Abstract

Radio-frequency (RF) fingerprinting identifies wireless transmitters using hardware-induced imperfections present in baseband I/Q signals. However, deep learning models often degrade under receiver and channel distribution shifts, particularly as transmitter populations grow. This work proposes the Hamiltonian Transformer, a physics-informed attention architecture that enforces norm-preserving value dynamics within each attention head using a learned skew-symmetric generator and a Störmer-Verlet leapfrog integration step. An additional phase-increment embedding exposes oscillator dynamics at the input layer. All experiments use non-equalized raw I/Q signals from the WiSig dataset under four protocols: same-day classification, cross-receiver generalisation, cross-day generalisation, and transmitter scaling up to 150 devices. The Hamiltonian Transformer achieves 99.12% accuracy under same-day conditions and 61.64% at 150 transmitters, consistently outperforming CNN and Transformer baselines across all scale points. A controlled ablation study identifies norm preservation in the value update as the primary inductive bias driving the scaling advantage, with the phase-increment embedding providing the single largest per-component improvement. These results indicate that embedding physics-informed structural priors into attention mechanisms is an effective approach to large-scale transmitter identification on raw wireless signals.
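Two ingredients named in the abstract lend themselves to a short numerical sketch, assuming NumPy: a skew-symmetric generator keeps a linear value update norm-preserving, and a phase-increment feature can be computed from consecutive complex I/Q samples. For the integrator this sketch deliberately swaps in the Cayley transform (the implicit midpoint rule), which preserves the norm exactly for linear skew-symmetric dynamics; the paper's Störmer-Verlet leapfrog step is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def skew_symmetric(w):
    """Build a skew-symmetric generator A = W - W^T from an unconstrained matrix W."""
    return w - w.T

def cayley_step(a, v, h=0.1):
    """One implicit-midpoint (Cayley) step for v' = A v.

    For skew-symmetric A, (I - hA/2)^{-1} (I + hA/2) is orthogonal, so the
    update preserves ||v||. A swapped-in integrator, not the paper's
    Störmer-Verlet leapfrog step.
    """
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - 0.5 * h * a, (eye + 0.5 * h * a) @ v)

def phase_increments(iq):
    """Phase difference (radians) between consecutive complex baseband samples."""
    return np.angle(iq[1:] * np.conj(iq[:-1]))

rng = np.random.default_rng(0)
A = skew_symmetric(rng.normal(size=(8, 8)))
v = rng.normal(size=8)
v_next = cayley_step(A, v)
print(np.linalg.norm(v), np.linalg.norm(v_next))  # equal up to rounding

# A pure tone with a small frequency offset yields a constant phase increment.
n = np.arange(16)
iq = np.exp(1j * 0.05 * n)
print(phase_increments(iq))  # every entry is approximately 0.05
```

A constant phase increment is what a carrier frequency offset looks like in this feature, which is one reason such an embedding can expose oscillator behaviour that raw I/Q amplitudes hide.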

2605.30363 2026-06-01 q-fin.CP cs.AI cs.LG q-fin.ST

Enhancing Regime Shift Detection Using Unstructured Data: A Study on the Treasury Market

Mingxuan Yi, Vidal Mehra, Jing Chen, John Cartlidge

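AI summary: Proposes a text-enhanced regime shift detection framework that combines large language model reasoning with statistical testing, achieving F1 = 0.82 on U.S. Treasury market data and outperforming purely data-driven methods.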

Comments 8 pages, 4 figures. Code available at: https://github.com/mingxuan-yi/regime_shift

Abstract

Regime shifts in financial markets reorganise the joint dynamics of asset prices and macro variables, breaking any single-regime calibration. They are nonetheless difficult to detect reliably because the data signal is noisy and heavily multicollinear, while the contemporaneous text that announces them is unstructured. Standard regime shift detection methods rely solely on structured time-series data and ignore policy communications, even though these texts often signal shifts before they materialise in observed prices. We propose a text-enhanced regime shift detection pipeline that combines large language model (LLM) reasoning over central-bank communications with statistical validation on multivariate financial time series. The framework is detector-agnostic: text-proposed candidates are validated using a bootstrap likelihood-ratio test on a vector autoregression (VAR), while data-driven candidates from arbitrary regime detectors are ratified through a lenient LLM text check. We evaluate the framework on 2010-2024 FOMC minutes paired with a 14-variable U.S. Treasury and macroeconomic panel, using four interchangeable data-driven detectors. The proposed pipeline achieves F1 = 0.82 against a verified anchor list of monetary-policy regime shifts, with same-day modal detection latency and consistently stronger performance than pure data-driven baselines. The results demonstrate that combining unstructured policy text with statistical structural-break detection improves the robustness and interpretability of regime shift identification in financial markets.
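The reported F1 compares detected regime-shift dates with a list of anchor dates. The abstract does not specify the matching rule, so this sketch assumes one-to-one greedy matching within a tolerance measured in days; the function name, the tolerance and the dates in the example are hypothetical.

```python
from datetime import date

def event_f1(detected, anchors, tolerance_days=0):
    """F1 of detected change-point dates against reference anchor dates.

    Each anchor can be matched by at most one detection within
    `tolerance_days`; detections are matched greedily in chronological order.
    The matching rule is an assumption, not the paper's evaluation protocol.
    """
    unmatched = sorted(anchors)
    tp = 0
    for d in sorted(detected):
        for a in unmatched:
            if abs((d - a).days) <= tolerance_days:
                unmatched.remove(a)  # safe: we stop iterating right after
                tp += 1
                break
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(anchors) if anchors else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical dates, for illustration only.
anchors = [date(2001, 1, 10), date(2002, 6, 1), date(2003, 9, 30)]
detected = [date(2001, 1, 10), date(2002, 6, 2), date(2004, 2, 1)]
print(event_f1(detected, anchors, tolerance_days=1))  # 2 of 3 matched
print(event_f1(detected, anchors, tolerance_days=0))  # only the exact match counts
```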

2605.30362 2026-06-01 cs.NE cs.AI cs.CV

XOResNet: Exclusive-OR Meta-Residuals Facilitate Deep Spiking Neural Networks Learning

Jianfang Wu, Junsong Wang

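AI summary: To address spike redundancy, information loss, and redundant learning in the residual structures of deep spiking neural networks, proposes an OR-ADD shortcut connection and an XOR meta-residual mechanism, combined into XOResNet, which outperforms existing methods on several datasets.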

Comments 33 pages, 12 figures, 7 tables

Abstract

Spiking neural networks (SNNs) hold promise for demonstrating superior learning and representation capabilities in deep models. Given the tremendous success of ResNet in deep learning, it is natural to train deep SNNs with residual learning. However, existing residual structures for deep SNNs still suffer from spike redundancy or information loss, as well as redundant learning. In the present study, we first address relative spike redundancy in identity mapping and information loss in non-identity mapping. To this end, we propose an OR-ADD (OA) shortcut connection to merge output spikes/currents from two branches in the residual structure. Furthermore, to mitigate redundant learning in the backbone branch of the residual structure, we introduce the concept of XOR meta-residuals, i.e., selecting pre-learning residuals using the Exclusive-OR (XOR) operation for the backbone branch. Finally, by integrating the OA shortcut and XOR meta-residuals, we devise the XOR residual block and further construct XOResNet with varying depths based on this block. Extensive experiments on four datasets, Fashion-MNIST, CIFAR-10, CIFAR-100, and miniImageNet, show that the proposed XOResNet outperforms existing state-of-the-art deep SNNs optimized via gradient descent. These results validate the effectiveness of our OA shortcut and XOR meta-residual components in overcoming fundamental limitations of residual learning in SNNs, providing new architectural insights for building high-performance neuromorphic systems.
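The abstract does not define the OA shortcut or the XOR meta-residual in detail. The sketch below, on plain 0/1 spike lists, only shows the Boolean properties these names rely on: OR keeps a merged spike train binary where ADD does not, and XOR yields a residual from which the target is recovered exactly. How XOResNet applies these operations to spikes versus currents is not reproduced, and the function names are hypothetical.

```python
def add_shortcut(identity_spikes, residual_spikes):
    """Element-wise ADD: can produce 2 where both branches fire (not a spike)."""
    return [a + b for a, b in zip(identity_spikes, residual_spikes)]

def or_shortcut(identity_spikes, residual_spikes):
    """Element-wise OR: the merged train stays binary (0/1)."""
    return [a | b for a, b in zip(identity_spikes, residual_spikes)]

def xor_residual(input_spikes, target_spikes):
    """Positions where the target differs from the input (input XOR target)."""
    return [a ^ b for a, b in zip(input_spikes, target_spikes)]

x = [1, 0, 1, 1, 0]
r = [1, 1, 0, 1, 0]
print(add_shortcut(x, r))  # [2, 1, 1, 2, 0]
print(or_shortcut(x, r))   # [1, 1, 1, 1, 0]
res = xor_residual(x, r)
print(res)                 # [0, 1, 1, 0, 0]
# XOR is its own inverse: input XOR residual recovers the target exactly.
print([a ^ b for a, b in zip(x, res)] == r)  # True
```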

2605.30361 2026-06-01 cs.NE cs.AI cs.LG

Gradient-Free Training of Spiking Neural Networks via Low-Rank Evolution Strategies

Dhruv Patankar, Sachit Ramesha Gowda

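AI summary: Applies EGGROLL, a low-rank factorisation of evolution-strategy perturbations, to gradient-free training of spiking neural networks, reaching 79.21% test accuracy on N-MNIST with 2.23× lower per-generation wall-clock time than full-rank ES.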

Comments 12 pages, 4 figures

Abstract

Spiking Neural Networks (SNNs) offer compelling energy efficiency on neuromorphic hardware, yet their training remains challenging because the discrete spike threshold is non-differentiable. Surrogate-gradient methods sidestep this by approximating the derivative, but they impose backpropagation infrastructure that is incompatible with on-chip learning. Evolution Strategies (ES) are a natural gradient-free alternative, yet their computational cost scales with the number of parameters, making them impractical for large weight matrices. We present a method for training SNNs using EGGROLL, a low-rank factorisation of ES perturbations that reduces per-generation memory from $\mathcal{O}(mn)$ to $\mathcal{O}(r(m{+}n))$. Combining EGGROLL with a Leaky Integrate-and-Fire SNN on N-MNIST, we demonstrate that gradient-free training achieves 79.21% test accuracy while reducing per-generation wall-clock time by 2.23$\times$ relative to full-rank ES. Our results show that EGGROLL is viable for SNN training, exposes a clear accuracy-speed tradeoff, and is compatible with training on neuromorphic hardware without surrogate gradients.
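A sketch of the low-rank perturbation idea, assuming NumPy and a toy quadratic fitness in place of an SNN on N-MNIST. Each population member samples factors A (m x r) and B (n x r) and forms E = A Bᵀ/√r, so only r(m+n) random numbers are drawn per member instead of mn. The antithetic estimator, step sizes and function names are assumptions, not EGGROLL's published implementation; for clarity this sketch still materialises E, which a memory-efficient version would avoid by applying A and B directly.

```python
import numpy as np

def low_rank_es_step(W, fitness, rng, pop=32, rank=1, sigma=0.05, lr=0.02):
    """One antithetic evolution-strategies update with rank-r perturbations.

    Estimates the fitness gradient as (1 / (pop * sigma)) * sum over pairs of
    [F(W + sigma E) - F(W - sigma E)] * E, then takes an ascent step.
    """
    m, n = W.shape
    grad = np.zeros_like(W)
    for _ in range(pop // 2):
        A = rng.standard_normal((m, rank))  # r * m numbers
        B = rng.standard_normal((n, rank))  # r * n numbers
        E = A @ B.T / np.sqrt(rank)
        # Antithetic pair: evaluate +E and -E to reduce estimator variance.
        delta = fitness(W + sigma * E) - fitness(W - sigma * E)
        grad += delta * E
    grad /= pop * sigma
    return W + lr * grad

# Toy task (not an SNN): maximise -||W - T||^2 for a fixed target T.
rng = np.random.default_rng(0)
T = rng.standard_normal((6, 4))

def fitness(W):
    return -np.sum((W - T) ** 2)

W = np.zeros((6, 4))
before = -fitness(W)
for _ in range(300):
    W = low_rank_es_step(W, fitness, rng)
after = -fitness(W)
print(before, after)  # the squared error should shrink substantially
```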

2605.30359 2026-06-01 cs.NE cs.DC cs.LG cs.PF cs.SE cs.SY eess.SY

Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts

Zixuan Huang, Da Chen, Kecheng Huang, Lihao Yin, Xing Li, Huiling Zhen, Mingxuan Yuan, Zili Shao

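AI summary: Proposes Kernel Foundry, an automatic GPU kernel optimization framework that combines expert-guided initialization, multi-island evolutionary search, and structured diagnostic feedback, substantially improving both correctness and performance.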

Abstract

Generating high-performance GPU kernels remains challenging due to the need for both correctness and hardware-aware optimization. While large language models (LLMs) show promise in code generation, they often fail to produce kernels that are both correct and efficient. We propose Kernel Foundry, a diagnosis-driven evolutionary framework for automatic GPU kernel optimization. Our method combines expert-guided, retrieval-augmented initialization with a multi-island evolutionary search, where candidate kernels are iteratively refined using structured diagnostic feedback. A centralized experience library accumulates reusable optimization knowledge to guide subsequent evolution, while explicit mechanisms prevent cheating behaviors that bypass kernel-level computation. Experiments on KernelBench show that our method consistently improves both correctness and performance over strong baselines, achieving up to 100% correctness on Level 2.
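The multi-island part of the search can be illustrated with a generic island model. This sketch uses a toy numeric fitness in place of kernel correctness and speed, simple mutation in place of LLM-generated rewrites, and ring migration between islands; it is a textbook island model under those assumptions, not Kernel Foundry's search, and the function and parameter names are hypothetical.

```python
import random

def island_search(fitness, mutate, init, n_islands=4, island_size=8,
                  generations=50, migrate_every=10, seed=0):
    """Generic multi-island evolutionary loop with ring migration.

    Each island evolves independently (keep the best half, refill with
    mutants of survivors). Every `migrate_every` generations, the best
    candidate of each island replaces the worst candidate of the next island.
    """
    rng = random.Random(seed)
    islands = [[init(rng) for _ in range(island_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for isl in islands:
            isl.sort(key=fitness, reverse=True)
            survivors = isl[: island_size // 2]
            isl[:] = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in range(island_size - len(survivors))]
        if gen % migrate_every == 0:
            bests = [max(isl, key=fitness) for isl in islands]
            for i, isl in enumerate(islands):
                isl.sort(key=fitness)                 # worst candidate first
                isl[0] = bests[(i - 1) % n_islands]   # ring topology
    return max((c for isl in islands for c in isl), key=fitness)

# Toy stand-in for "kernel quality": maximise -(x - 3)^2 over one parameter.
best = island_search(
    fitness=lambda x: -(x - 3.0) ** 2,
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.5),
    init=lambda rng: rng.uniform(-10.0, 10.0),
)
print(round(best, 2))  # close to 3.0
```

Keeping islands mostly separate preserves diversity, while periodic migration spreads good candidates; a shared experience library, as described in the abstract, would be an additional channel on top of this.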

2605.28646 2026-06-01 cs.CR cs.CL

MaskClaw: Edge-Side Personalized Privacy Arbitration for GUI Agents with Behavior-Driven Skill Evolution

Yanqiu Zhao, Dongying Zheng, Kaibo Huang, Yukun Wei, Zhongliang Yang, Linna Zhou

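AI summary: Proposes MaskClaw, a framework for edge-side privacy arbitration in GUI agents that uses local visual evidence extraction, policy-memory retrieval, and sandbox-gated skill evolution to decide Allow, Mask, or Ask before a screenshot leaves the trusted environment, addressing the privacy-boundary problems of static PII detection and cloud-side reasoning.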

Comments Preprint. Submitted to EMNLP 2026. 21 pages, including appendices; 5 figures. Under review. Yanqiu Zhao and Dongying Zheng contributed equally to this work.

Abstract

GUI agents rely on screenshots to infer intent and operate across applications, but these screenshots often contain private messages, medical records, payment credentials, and workplace-specific workflows. Privacy decisions in this setting depend on task, recipient, application state, and user role, yet static PII detectors miss these boundaries and cloud-side VLM reasoning can upload the raw screen before deciding what should be protected. We present MaskClaw, an edge-side privacy arbitrator for GUI agents. MaskClaw extracts local visual evidence, retrieves user- and task-specific policy memory, and decides Allow, Mask, or Ask before raw screenshots leave a trusted user- or organization-controlled environment. In five designed skill-evolution scenarios, it turns corrections, cancellations, and edits into reusable privacy skills checked by a sandbox gate. We introduce P-GUI-Evo, a benchmark built from real UI patterns, reconstructed HTML screens, and sanitized labels. Experiments show that pattern matching, cloud reasoning, and routing alone tend to over-confirm, over-mask, or expose raw screenshots under the same protocol. The artifact is available at https://github.com/Theodora-Y/MaskClaw.
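The abstract names the three outcomes but not the decision rule. A minimal sketch, assuming per-region sensitivity scores from a local detector and a policy memory keyed by (task, label) that overrides the scores, with the most protective outcome winning (Mask over Ask over Allow). The `Region` schema, thresholds, labels and task names are all hypothetical, not MaskClaw's actual policy.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A detected on-screen element (hypothetical schema)."""
    label: str          # e.g. "payment_card", "chat_message"
    sensitivity: float  # local detector score in [0, 1]

def arbitrate(regions, task, policy_memory, mask_at=0.8, ask_at=0.5):
    """Return "Allow", "Mask" or "Ask" for a screenshot before it leaves the device.

    A stored decision for (task, label) takes precedence; otherwise the local
    sensitivity score is thresholded. Illustrative rule only.
    """
    decision = "Allow"
    for r in regions:
        choice = policy_memory.get((task, r.label))
        if choice is None:  # no stored preference: fall back to the local score
            choice = ("Mask" if r.sensitivity >= mask_at
                      else "Ask" if r.sensitivity >= ask_at else "Allow")
        if choice == "Mask":
            return "Mask"     # most protective outcome wins immediately
        if choice == "Ask":
            decision = "Ask"
    return decision

memory = {("send_invoice", "payment_card"): "Allow"}  # learned from a past correction
screen = [Region("payment_card", 0.95), Region("chat_message", 0.6)]
print(arbitrate(screen, "send_invoice", memory))      # Ask
print(arbitrate(screen, "share_screenshot", memory))  # Mask
```

The same screen yields different outcomes for different tasks, which is the kind of context dependence the abstract argues static PII detectors miss.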

2605.28162 2026-06-01 quant-ph cs.LG

Learning Logical Operations for Arbitrary Quantum Error Correction Codes

Nico Meyer, Christopher Mutschler, Dominik Seuß, Andreas Maier, Daniel D. Scherer

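AI summary: Proposes a learning-based framework that, given only an encoding circuit, constructs logical operations with structural properties such as transversality or shallow depth for arbitrary quantum error-correcting codes, and extends it into the variational early fault-tolerant quantum computing (VarEFTQC) method for co-designing non-additive encodings and logical gate sets.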

Comments 23 pages, 12 figures, 5 tables

Abstract

Logical operations are essential for quantum computation within quantum error-correcting codes. However, discovering their physical realizations is challenging, especially for non-additive codes that lack a stabilizer description. We present a general learning-based framework that, given only an encoding circuit, constructs physical implementations of logical operations while enforcing structural properties such as transversality or shallow depth. Our approach is validated by rediscovering known logical operations of standard stabilizer codes. We then extend it to a co-design procedure, dubbed variational early fault-tolerant quantum computing (VarEFTQC), which tailors non-additive encodings to a given noise model and enforces desired logical gate sets, such as transversal IQP-type families or low-depth universal sets. A software library implements the complete learning pipeline, including loss-function variants, ansatz families, and optimization routines. Together, these results position VarEFTQC as a practical tool for discovering hardware-adapted logical gadgets for early fault-tolerant quantum computing.
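Given only an encoding isometry V, one can score whether a candidate physical unitary U implements a target logical gate L. The sketch below, assuming NumPy and the 3-qubit bit-flip code, uses the fidelity-style loss 1 - |Tr(L† V† U V)|/d, which is zero exactly when U acts as L (up to a global phase) on the code space; a learning loop could minimise it over parametrised candidates. This loss is an illustrative assumption, not one of the paper's loss variants, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def kron_all(ops):
    """Tensor product of single-qubit operators (first factor = first qubit)."""
    return reduce(np.kron, ops)

# Encoding isometry of the 3-qubit bit-flip code: |0> -> |000>, |1> -> |111>.
V = np.zeros((8, 2), dtype=complex)
V[0b000, 0] = 1.0
V[0b111, 1] = 1.0

def logical_loss(U, L, V):
    """1 - |Tr(L^dag V^dag U V)| / d; zero iff unitary U acts as L on the code space."""
    d = V.shape[1]
    return 1.0 - abs(np.trace(L.conj().T @ V.conj().T @ U @ V)) / d

print(logical_loss(kron_all([X, X, X]), X, V))    # 0.0: transversal XXX is logical X
print(logical_loss(kron_all([I2, I2, X]), X, V))  # 1.0: leaves the code space
print(logical_loss(kron_all([Z, I2, I2]), Z, V))  # 0.0: a single Z is logical Z
```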

2605.27912 2026-06-01 cs.CR cs.DS cs.LG

Privately Estimating Monotone Statistics in Polynomial Time

Gavin Brown, Ephraim Linder, Mahbod Majid, Vikrant Singhal

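AI summary: For differentially private estimation of monotone statistics, proposes an improved subsample-and-aggregate algorithm that saves a factor of $t$ in sample complexity at the cost of a factor of $e^t$ in running time, and shows that it is essentially optimal.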

Abstract

We study efficient differentially private algorithms for estimating monotone statistics, i.e., statistics that are monotone under the addition of new observations. The starting point for our investigation is subsample-and-aggregate: a classical paradigm that partitions the dataset into blocks, estimates the statistic on each block, and then privately aggregates the estimates. While practical and generically applicable, this approach is quite data-hungry. We improve upon this framework for the class of monotone statistics: compared to subsample-and-aggregate, our algorithms save a factor of $t$ in sample complexity at the cost of a factor of $e^t$ in running time, where $t>0$ is a tunable parameter. We complement our results with a query-complexity lower bound, showing that our algorithms are essentially optimal for this task. As an application, we obtain improved results for private eigenvalue estimation, private loss estimation, and privately estimating a single parameter of a high-dimensional model, e.g., in linear regression.
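For reference, the classical baseline the abstract starts from fits in a few lines. This sketch uses a clipped mean of block estimates released with Laplace noise as the private aggregator; the clipping range, the aggregator and the function names are assumptions, and the paper's improved algorithm for monotone statistics is not reproduced.

```python
import random
import statistics

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def subsample_and_aggregate(data, statistic, k, lo, hi, epsilon, seed=0):
    """Classical subsample-and-aggregate with a noisy clipped mean.

    Splits `data` into k disjoint blocks, evaluates `statistic` on each block,
    clips the estimates to [lo, hi] and releases their mean plus Laplace noise.
    Changing one record affects one block estimate, so the mean has
    sensitivity (hi - lo) / k and the release is epsilon-DP.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    blocks = [shuffled[i::k] for i in range(k)]
    estimates = [min(max(statistic(b), lo), hi) for b in blocks]
    sensitivity = (hi - lo) / k
    return sum(estimates) / k + laplace_noise(sensitivity / epsilon, rng)

data_rng = random.Random(1)
data = [data_rng.gauss(5.0, 1.0) for _ in range(10_000)]
estimate = subsample_and_aggregate(data, statistics.median, k=100,
                                   lo=0.0, hi=10.0, epsilon=1.0)
print(round(estimate, 2))  # close to the true median of 5.0
```

The data hunger is visible here: every block must be large enough for `statistic` to be accurate on its own, while the number of blocks k must be large enough to shrink the noise scale (hi - lo) / (k * epsilon).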

2605.26183 2026-06-01 q-bio.QM cs.LG

What Molecular Structure Cannot Tell Us: A Taxonomy of Explainability Gaps in GNN-Based Drug Toxicity Prediction

Juergen Dietrich

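AI summary: Introduces an operational taxonomy that systematically analyzes the explainability gaps caused by structural information limits in graph neural network (GNN) drug toxicity prediction and, using aspirin as a case study, quantifies that molecular structure explains only about 45% of its known adverse effects.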

Comments 13 pages

Abstract

Not all clinically relevant adverse effects are structurally inferable from molecular graphs, regardless of model quality or architectural complexity. This study introduces an operational taxonomy of the structural information limits that prevent structure-based toxicity prediction, independent of the learning algorithm employed. Graph Neural Networks (GNNs) have emerged as a natural approach for molecular toxicity prediction, operating directly on atomic connectivity without the information loss inherent to fixed-length fingerprints. However, the fraction of a drug's known pharmacological profile that is actually inferable from molecular structure remains systematically underexplored. A systematic case study uses acetylsalicylic acid (ASA, aspirin), one of the most comprehensively characterized drugs in pharmacology, as the model compound. A Message Passing Neural Network (MPNN) is trained on the Tox21 benchmark, and GNNExplainer is applied to characterize atom-level attribution. Results indicate that molecular structure explains approximately 45% (5/11) of known ASA adverse effects. A four-category Gap Taxonomy (GAP-1 through GAP-4) is introduced, distinguishing between effects that are non-encodable in principle, data gaps arising from Missing Not At Random (MNAR) mechanisms, assay panel mismatches, and representation errors. The MNAR gap is empirically quantified via a systematic ChEMBL query (42 documented assays, 0 retrievable bioactivity entries). An attention pooling experiment localizes the representation error to the MPNN message passing layers rather than the aggregation step. The Gap Taxonomy has direct implications for drug safety signal detection and regulatory frameworks, including Good Pharmacovigilance Practice (GVP) guidelines and New Approach Methodologies (NAMs). The structural limits identified here are confirmed in a companion drug-drug interaction (DDI) ablation study.
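The distinction between message-passing layers and the aggregation (readout) step can be made concrete with a minimal sketch. Plain sums stand in for the learned update functions of a trained MPNN, and the toy 3-node graph is not a molecule; the function names are hypothetical. Swapping the readout between a sum and an attention-weighted sum leaves the node states untouched, which is what allows an experiment of this kind to attribute an error to one stage or the other.

```python
import math

def message_passing(features, edges, rounds=2):
    """Sum-aggregation message passing on an undirected graph.

    features: node -> list of floats; edges: list of (u, v) pairs. Each round
    sets h_v <- h_v + sum of neighbour states (a trained MPNN would apply
    learned message and update functions instead).
    """
    neighbours = {v: [] for v in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    h = {v: list(x) for v, x in features.items()}
    for _ in range(rounds):
        h = {
            v: [h[v][i] + sum(h[w][i] for w in neighbours[v]) for i in range(len(h[v]))]
            for v in h
        }
    return h

def sum_readout(h):
    """Graph embedding as the element-wise sum of node states."""
    return [sum(col) for col in zip(*h.values())]

def attention_readout(h, score):
    """Graph embedding as a softmax-weighted sum of node states.

    score maps a node state to a scalar logit (a learned layer in practice).
    """
    logits = {v: score(x) for v, x in h.items()}
    top = max(logits.values())
    weights = {v: math.exp(s - top) for v, s in logits.items()}
    total = sum(weights.values())
    dim = len(next(iter(h.values())))
    return [sum(weights[v] / total * h[v][i] for v in h) for i in range(dim)]

h = message_passing({0: [1.0], 1: [1.0], 2: [2.0]}, [(0, 1), (1, 2)], rounds=1)
print(h)                                       # {0: [2.0], 1: [4.0], 2: [3.0]}
print(sum_readout(h))                          # [9.0]
print(attention_readout(h, lambda x: 0.0))     # uniform weights: the mean, about [3.0]
```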

2605.12340 2026-06-01 stat.ML cs.LG

Online Learning-to-Defer with Varying Experts

Dang Hoang Duy, Yannis Montreuil, Maxime Meyer, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

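AI summary: For streaming data and a dynamically changing expert pool, proposes the first online learning-to-defer algorithm, using $\mathcal{H}$-consistency bounds and online convex optimization to obtain regret guarantees.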

Abstract

Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of $O((n+n_e)T^{2/3})$ in general and $O((n+n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. The analysis builds on novel $\mathcal{H}$-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.
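The abstract does not give the algorithm's update rule. As a point of reference, the sketch below implements a standard EXP3-style bandit router, a plainly swapped-in technique rather than the paper's H-consistency-based method, restricted each round to the actions actually available: the model plus whichever experts are present. It shows how bandit feedback (only the chosen action's loss is observed) interacts with a varying expert pool; the class, parameter and expert names are hypothetical.

```python
import math
import random

class MaskedExp3Router:
    """EXP3-style router over a 'model' action plus a changing set of experts.

    Only actions available this round receive probability mass, and only the
    chosen action's loss is observed, so it is importance-weighted.
    """

    def __init__(self, eta=0.1, seed=0):
        self.eta = eta
        self.log_w = {"model": 0.0}  # cumulative log-weights per action
        self.rng = random.Random(seed)

    def probabilities(self, available):
        for a in available:
            self.log_w.setdefault(a, 0.0)  # newly seen experts start neutral
        top = max(self.log_w[a] for a in available)
        w = {a: math.exp(self.log_w[a] - top) for a in available}
        total = sum(w.values())
        return {a: w[a] / total for a in available}

    def step(self, available_experts, loss_of):
        """Pick an action, observe only its loss in [0, 1], update, return the action."""
        available = ["model"] + list(available_experts)
        p = self.probabilities(available)
        action = self.rng.choices(available, weights=[p[a] for a in available])[0]
        loss = loss_of(action)
        self.log_w[action] -= self.eta * loss / p[action]  # importance-weighted loss
        return action

router = MaskedExp3Router()
losses = {"model": 1.0, "e1": 0.0, "e2": 0.0}  # hypothetical per-action losses
for t in range(200):
    experts = ["e1"] + (["e2"] if t % 2 == 0 else [])  # e2 only on even rounds
    router.step(experts, lambda a: losses[a])
print(router.probabilities(["model", "e1"]))  # mass shifts toward e1
```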

2604.09414 2026-06-01 stat.ML cs.LG

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

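AI summary: For multi-expert learning-to-defer, proposes a decoupled surrogate loss that optimizes independent per-expert sigmoid heads separately from a softmax classifier head, avoiding the optimization pathologies of existing methods, and gives the first calibration-constant bound that does not grow with the number of experts when the per-expert weight is held fixed.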

Abstract

A learning-to-defer (L2D) system decides, for each input, whether to predict on its own or to hand it to one of several available experts. The well-established recipe trains classifier and router jointly by treating the $K$ classes and $J$ experts as competing actions in one shared $(K{+}J)$-action geometry. Subsequent work has proposed a series of incremental fixes within this geometry; we show that each still suffers, to varying severity, from an optimization-level pathology (target distortion, gradient amplification, winner-take-all starvation, set-mass collapse, or class-expert coupling) even under statistical consistency. We step outside the augmented-action family entirely and propose a decoupled surrogate: a softmax classifier head and an independent sigmoid head per expert, mirroring the two natural objects of the problem. We show that per-sample updates are then coordinatewise and the class-expert Hessian block is identically zero, and we prove an excess-risk bound with calibration constant $\max\{2\sqrt{2},\sqrt{2J/\lambda}\}$. To our knowledge, this is the first multi-expert L2D guarantee whose constant does not grow with the expert pool when the per-expert weight is held fixed. On controlled synthetic studies and on CIFAR-10, CIFAR-10H, and Covertype, the decoupled surrogate is the only method in our comparison that remains stable as the expert pool grows, preserves rare specialists, and improves over a standalone classifier on every real-data benchmark.
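The structure of a decoupled surrogate can be sketched for a single sample in plain Python: softmax cross-entropy on the classifier logits plus one independent binary cross-entropy term per expert logit. The per-expert target (whether expert j is correct on this input) and the single `expert_weight` are assumptions; the paper's exact loss and the role of λ in its bound are not reproduced. Each head's gradient depends only on that head's own logits, which is the coordinatewise property the abstract describes.

```python
import math

def decoupled_surrogate(class_logits, expert_logits, label, expert_correct,
                        expert_weight=1.0):
    """Softmax cross-entropy for the classifier head plus an independent
    sigmoid (binary cross-entropy) term per expert head.

    expert_correct[j] is 1 if expert j would answer correctly (an assumed target).
    Returns (loss, grad_wrt_class_logits, grad_wrt_expert_logits).
    """
    # Classifier head: max-shifted softmax for numerical stability.
    top = max(class_logits)
    exps = [math.exp(z - top) for z in class_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    grad_class = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

    # Expert heads: one independent sigmoid/BCE term each.
    grad_expert = []
    for s, c in zip(expert_logits, expert_correct):
        log_q = -math.log1p(math.exp(-s))     # log sigmoid(s)
        log_not_q = -math.log1p(math.exp(s))  # log (1 - sigmoid(s))
        loss -= expert_weight * (c * log_q + (1 - c) * log_not_q)
        grad_expert.append(expert_weight * (math.exp(log_q) - c))  # depends on s only
    return loss, grad_class, grad_expert

cls = [2.0, 0.5, -1.0]
loss1, gc1, ge1 = decoupled_surrogate(cls, [0.3, -0.7], label=0, expert_correct=[1, 0])
loss2, gc2, ge2 = decoupled_surrogate(cls, [0.3, 2.0], label=0, expert_correct=[1, 0])
# Changing expert 1's logit leaves the classifier and expert 0 gradients unchanged.
print(gc1 == gc2, ge1[0] == ge2[0], ge1[1] != ge2[1])  # True True True
```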