arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.17899 2026-05-19 cs.LG cs.AI q-bio.QM

DCFold: Efficient Protein Structure Generation with Single Forward Pass

Zhe Zhang, Yuanning Feng, Yuxuan Song, Keyue Qiu, Hao Zhou, Wei-Ying Ma

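AI summary: This paper proposes DCFold, a single-step generative model that attains AlphaFold3-level accuracy. A Dual Consistency training framework with a new Temporal Geodesic Matching (TGM) scheduler yields a 15x inference speedup while maintaining predictive fidelity, validated on structure prediction and binder design benchmarks.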

English abstract

AlphaFold3 introduces a diffusion-based architecture that elevates protein structure prediction to all-atom resolution with improved accuracy. This state-of-the-art performance has established AlphaFold3 as a foundation model for diverse generation and design tasks. However, its iterative design substantially increases inference time, limiting practical deployment in downstream settings such as virtual screening and protein design. We propose DCFold, a single-step generative model that attains AlphaFold3-level accuracy. Our Dual Consistency training framework, which incorporates a novel Temporal Geodesic Matching (TGM) scheduler, enables DCFold to achieve a 15x acceleration in inference while maintaining predictive fidelity. We validate its effectiveness across both structure prediction and binder design benchmarks.

2605.17898 2026-05-19 cs.LG

Lightweight Gaussian Process Inference in C++ on Metal and CUDA

Yu-Hsueh Fang

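AI summary: This paper presents LightGP, a dependency-free C++17 library for Gaussian process regression with Apple Metal and NVIDIA CUDA backends plus tuned CPU paths via Apple Accelerate and OpenBLAS. LightGP provides four inference paths covering problem sizes from N=100 to N=500,000 and reports speedups over GPyTorch of up to 8.7x for exact GP inference on the hardware tested.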

English abstract

Gaussian process (GP) inference in Python is dominated by libraries such as GPyTorch and GPflow, which are built on deep-learning frameworks and inherit their dispatch overhead and dependency footprint. We present LightGP, a dependency-free C++17 library for GP regression with Python bindings, supporting Apple Metal and NVIDIA CUDA backends alongside tuned CPU paths via Apple Accelerate and OpenBLAS. LightGP provides four inference paths (exact Cholesky, matrix-free conjugate gradients, sparse variational free energy, and structured kernel interpolation with FFT) covering problems from N=100 to N=500,000. On an Apple M4, LightGP CPU is 2.6-8.7x faster than GPyTorch CPU for exact GP and ~1.5x faster for sparse GP at every scale tested. On an NVIDIA RTX 3060, LightGP CUDA is 2.3-6.7x faster than GPyTorch CUDA for exact GP up to N=2,048, with GPyTorch closing the gap at N=4,096. A fused matrix-free kernel-vector product on Metal achieves 32x over the explicit path at N=20,000 with O(N) memory, and an FFT-accelerated SKI matvec via Accelerate vDSP runs in sub-millisecond time at N=200,000. LightGP compiles as a single static library with zero external dependencies and is installable via pip install lightgp.
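
As background for the "exact Cholesky" path named above, the following is a minimal pure-Python sketch of textbook exact GP regression: factor the noisy kernel matrix, solve for the weights, and form the posterior mean. It is not LightGP's API; the RBF kernel, the noise level, and all function names are assumptions chosen for illustration.

```python
import math

def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on scalars (illustrative choice)."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular L with L @ L.T == a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L @ L.T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L.T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior_mean(xs, ys, x_star, noise=1e-2):
    """Posterior mean k(x*, X) @ (K + noise * I)^-1 @ y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve_cholesky(cholesky(K), ys)
    return sum(rbf(x_star, x) * a for x, a in zip(xs, alpha))

# Three toy observations; predict at a training input and between inputs.
print(gp_posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.0))
print(gp_posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.5))
```

The factorization costs O(N^3) time and O(N^2) memory, which is presumably why the abstract pairs the exact path with matrix-free and sparse alternatives for larger N.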

2605.17894 2026-05-19 cs.AI

Evaluating Cognitive Age Alignment in Interactive AI Agents

Yifan Shen, Jiawen Zhang, Jian Xu, Junho Kim, Ismini Lourentzou, Xu Cao, Meihuan Huang

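AI summary: This paper introduces ChildAgentEval, the first psychometrically grounded interactive benchmark for evaluating cognitive age alignment in agents built on multimodal large language models. It systematically compares agent performance against age-specific human developmental stages, exposing where current agents can and cannot simulate age-specific cognitive behavior.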

English abstract

While agentic AI and its core multimodal large language models (MLLMs) have demonstrated remarkable promise in language and visual reasoning across domains ranging from daily life to advanced scientific research, a profound gap remains between artificial and human intelligence. Despite the integration of powerful tools and advanced MLLMs, state-of-the-art AI agents frequently fail at foundational, seemingly simple tasks that a child can resolve with ease. Inspired by the Wechsler Intelligence Scale for Children (WISC), we introduce ChildAgentEval, the first psychometrically grounded interactive benchmark for evaluating cognitive age alignment in MLLM-based agents. ChildAgentEval systematically compares the reasoning performance of various MLLM-based interactive agents against age-specific human developmental stages, exposing where current agentic AI systems can and cannot simulate age-specific cognitive behavior.

2605.17887 2026-05-19 cs.LG cs.AI

Attention Sinks and Outliers in Attention Residuals

Haozheng Luo, Haoran Dai, Shaoyang Zhang, Xi Chen, Eric Hanchen Jiang, Yijiang Li, Jingyuan Huang, Chenghao Qiu, Chenwei Xu, Zhenyu Pan, Haotian Zhang, Binghui Wang, Yan Chen

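AI summary: This paper proposes OASIS, which uses an inter-layer null signal to address attention sinks, activation outliers, and the resulting loss of inference stability in AttnResidual architectures. The authors show theoretically that AttnResidual's dual-normalization design worsens these problems, and report experimentally that OASIS improves structural and quantization robustness.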

English abstract

We propose OASIS, an outlier- and sink-aware technique built on inter-layer null signaling. As AttnResidual architectures introduce an additional depth-wise normalization channel, they improve inter-layer routing flexibility but also exacerbate attention sinks, activation outliers, and the resulting degradation in inference stability and quantization robustness. OASIS addresses this issue by introducing a Softmax1-based null space and coupling token-level null evidence to depth routing through an inter-layer null signal, thereby reducing sink-dominated routing and improving structural robustness. Theoretically, we show that the dual-normalization design of AttnResidual intensifies sink formation and quantization brittleness. Experimentally, we compare OASIS against five baselines on three real-world datasets and observe consistent improvements in both attention sink and post-quantization performance. Notably, OASIS achieves an average reduction of 9.26% in maximum infinity norm and 2.60% in average kurtosis across the evaluated settings, while lowering perplexity by 75.85% under W8A8 and improving GSM8K Pass@1 by 12.42% under W4A4.
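
For readers unfamiliar with the term, Softmax1 is commonly described as a softmax with an extra 1 in the denominator, which is equivalent to adding a "null" slot whose logit is fixed at 0. The attention weights can then sum to less than one, so a head can put mass on nothing rather than on a sink token. The sketch below assumes that standard definition; it is not the OASIS implementation, and the function name is illustrative.

```python
import math

def softmax1(scores):
    """Softmax with an implicit extra logit fixed at 0 (the null slot).

    Returns (weights, null_mass), where sum(weights) == 1 - null_mass.
    """
    # Subtract max(scores, 0) for numerical stability; the null logit is 0.
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    null = math.exp(0.0 - m)
    denom = null + sum(exps)
    return [e / denom for e in exps], null / denom

# When every score is strongly negative, most of the mass goes to the null slot.
weights, null_mass = softmax1([-4.0, -5.0, -3.5])
print(sum(weights), null_mass)
```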

2605.17885 2026-05-19 cs.CL cs.AI

Multi-agent AI systems outperform human teams in creativity

Tiancheng Hu, Yixuan Jiang, Haotian Li, José Hernández-Orallo, Xing Xie, Nigel Collier, David Stillwell, Luning Sun

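AI summary: This study examines multi-agent AI systems on creativity tasks and finds that, across six diverse problem-solving tasks, they are more creative than both single agents and human teams. Its core method analyzes the generative process as paths through semantic space, and its main contribution is showing that different conversational patterns predict creativity in AI teams and in human teams.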

English abstract

Although artificial intelligence (AI) now matches or exceeds human performance across numerous cognitive tasks, creativity remains a highly contested frontier. As AI systems based on large language models (LLMs) are increasingly adopted in research and innovation, it is essential to understand and augment their creativity. Here we demonstrate that multi-agent LLM teams not only surpass single agents, but also substantially outperform human teams in creativity (Cohen's d=1.50) across 4,541 multi-agent LLM ideas and 341 human-team ideas on six diverse problem-solving tasks. This advantage is driven by novelty while maintaining comparable usefulness. To investigate the generative processes in both groups, we represent conversations as paths through semantic space using neural language model representations. Both LLM and human teams produce more creative ideas when conversations range widely rather than staying centered on a single theme (low global coherence). However, the additional patterns that predict creativity differ: LLM teams benefit from efficient exploration (high semantic spread, shorter paths), while human teams benefit from maintaining smooth conversational flow (high local coherence, frequent pivots). Additionally, we identify model choice and discussion structure as orthogonal design levers that together explain 26.8% of variance in LLM conversational dynamics, paving the way for systematic approaches to developing multi-agent systems with augmented creative capabilities.
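
For context on the reported effect size, Cohen's d is the difference between two group means divided by a standard deviation. The abstract does not say which variant was used; the sketch below assumes the common pooled-standard-deviation form, and the scores are made-up numbers rather than data from the study.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Hypothetical creativity ratings for two small groups.
group_a = [4.0, 5.0, 6.0]
group_b = [2.0, 3.0, 4.0]
print(cohens_d(group_a, group_b))
```

By the usual rule of thumb, values around 0.8 are already considered large, so d=1.50 indicates a substantial separation between the two distributions.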

2605.17877 2026-05-19 cs.AI

PAIR: Prefix-Aware Internal Reward Model for Multi-Turn Agent Optimization

Wonjoong Kim, Yeonjun In, Sangwu Park, Dongha Lee, Chanyoung Park

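AI summary: This paper proposes PAIR, which combines a frozen hidden-state probe with a lightweight attention-based head to make internal correctness probing reliable in multi-turn tasks, providing dense step-level reward signals for GRPO training without external model calls or ground-truth dependencies.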

Comments Under Review

English abstract

A significant hurdle for current LLMs is the execution of complex, multi-stage tasks. Group Relative Policy Optimization (GRPO) has emerged as a leading choice, but its reliance on sparse outcome rewards severely limits credit assignment across intermediate steps. Existing remedies, such as running full rollouts to assign step-level advantages, calling external LLM judges at each step, or computing intrinsic rewards that require ground-truth answers at every evaluation, introduce significant costs or practical constraints. We hypothesize that internal correctness probing over LLM hidden states can be repurposed as a step-level reward signal, potentially addressing all of these limitations at once. However, existing probing research assumes clean inputs, and we first show that this assumption breaks down in multi-step settings: hidden-state probes degrade severely under prefix contamination, tracking coherence with the (possibly corrupted) prefix rather than grounded correctness, while attention-based features remain robust to contamination but underperform on clean prefixes. Building on this complementary relationship, we propose the Prefix-Aware Internal Reward (PAIR), a two-stage model with a frozen hidden-state probe estimating belief-consistency and a lightweight attention-based head correcting it toward grounded correctness. Experimental results show that PAIR achieves the highest AUROC on contaminated trajectories while operating at negligible inference cost, enabling dense step-level reward signals for GRPO training without external model calls, ground-truth dependencies, or full-trajectory rollouts.
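
To see why outcome-only rewards limit credit assignment, recall how GRPO-style training typically turns rewards into advantages: each sampled trajectory's scalar outcome reward is normalized against the other trajectories in its group, and the resulting value is shared by every step of that trajectory. The sketch below assumes that common formulation (population standard deviation plus a small epsilon); it is not PAIR's code, and the function names are illustrative.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's outcome reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def broadcast_to_steps(advantage, num_steps):
    """With outcome-only rewards, every step inherits the same advantage."""
    return [advantage] * num_steps

# Four rollouts of one task: two succeed (reward 1), two fail (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
per_step = broadcast_to_steps(advantages[1], num_steps=5)
print(advantages)
print(per_step)  # all five steps of the failed rollout share one value
```

A dense step-level reward, which is what the abstract says PAIR supplies, would replace the broadcast with a separate value per step.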

2605.17875 2026-05-19 cs.CV

HexagonalWarriorMamba: Superior Threshold-Dependent Multi-label Classification of 12-Lead ECG Cardiac Abnormalities

Huawei Jiang, Husna Mutahira, Shibo Wei, Jiahang Li, Vladimir Shin, Juneho Yi, Dongryeol Ryu, Wonyoung Park, Mannan Saeed Muhammad

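AI summary: This paper proposes HexagonalWarriorMamba, a framework that treats 12-lead ECGs as single-channel 2D images rather than conventional 1D time series, addressing the limited ability of traditional deep learning models to capture long-range dependencies in ECG signals and improving multi-label classification of cardiac abnormalities.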

Comments Submitted to Scientific Reports

English abstract

The accurate automated diagnosis of cardiac abnormalities from 12-lead electrocardiograms (ECGs) is critical for managing cardiovascular disease. However, detecting concurrent conditions remains a challenge for traditional deep learning models, which often have limited ability to model the long-range dependencies inherent in ECG signals. This manuscript proposes HexagonalWarriorMamba (HWMamba), a framework built on the Mamba architecture that processes 12-lead ECGs as single-channel 2D images rather than conventional 1D time series. By integrating a hierarchical architecture with a 2D Selective Scan mechanism, HWMamba is designed to model global context and complex spatial relationships within the data. The model is evaluated on the PhysioNet/Computing in Cardiology Challenge 2021 dataset, which includes 26 diagnostic labels and comprises recordings collected from seven institutions across four countries and three continents. Results demonstrate that HWMamba outperforms current state-of-the-art (SOTA) methods across five key threshold-dependent metrics, including Challenge Score and Subset Accuracy. These improvements provide a balance between strong discriminative capability and effective threshold selection derived from the training data, while maintaining near-SOTA performance in Macro AUROC. This Hexagonal Warrior performance, reflecting consistent performance across multiple evaluation dimensions, positions HWMamba as a robust and versatile approach for multi-label ECG classification.
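
Threshold-dependent metrics such as Subset Accuracy score the hard label vector obtained after thresholding each class probability, so the same model can score differently under different thresholds. The toy sketch below uses hypothetical probabilities and thresholds for two labels; it is not the challenge's official scoring code.

```python
def subset_accuracy(probs, labels, thresholds):
    """Fraction of samples whose thresholded label vector matches exactly."""
    hits = 0
    for p_row, y_row in zip(probs, labels):
        pred = [int(p >= t) for p, t in zip(p_row, thresholds)]
        hits += int(pred == list(y_row))
    return hits / len(probs)

# Hypothetical per-class probabilities for three recordings and two labels.
probs = [[0.9, 0.2], [0.6, 0.7], [0.4, 0.1]]
labels = [[1, 0], [1, 0], [0, 0]]

print(subset_accuracy(probs, labels, thresholds=[0.5, 0.5]))
print(subset_accuracy(probs, labels, thresholds=[0.5, 0.8]))
```

Raising the second threshold fixes the one false positive in this toy data, which is why thresholds selected on training data matter for this family of metrics while a ranking metric such as AUROC is unaffected.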

2605.17873 2026-05-19 cs.LG cs.AI cs.CL

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Woongyeng Yeo, Yumin Choi, Taekyung Ki, Sung Ju Hwang

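AI summary: This paper proposes HINT-SD, a targeted hindsight self-distillation framework for long-horizon agents. It uses full-trajectory hindsight to select failure-relevant actions and applies feedback-conditioned self-distillation only on the targeted action spans. On BFCL v3 and AppWorld, it improves over the dense per-turn feedback baseline by up to 18.80 percent with 2.26x lower time per training step.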

English abstract

Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level action-output signals, or by using feedback-conditioned self-distillation. However, generating feedback at every turn is inefficient when many intermediate turns are already successful or neutral, and applying feedback at a fixed or misaligned turn often fails to supervise the actions that contributed to the failure. To bridge this gap, we propose HINT-SD, a targeted self-distillation framework that uses full-trajectory hindsight to select failure-relevant actions and applies feedback-conditioned distillation only on targeted action spans. Experiments on BFCL v3 and AppWorld show that our method improves over the dense per-turn feedback baseline by up to 18.80 percent while achieving 2.26x lower time per training step, suggesting that selecting where to distill is a key factor for both effective and efficient long-horizon agent training.

2605.17869 2026-05-19 cs.CV

PySIFT: GPU-Resident Deterministic SIFT for Deep Learning Vision Pipelines

Sivakumar K. S., Mohammad Daniyalur Rahman, Gopi Raju Matta

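AI summary: This paper studies classical SIFT in deep learning vision pipelines, showing advantages in accuracy and speed, and presents PySIFT, a fully GPU-resident SIFT implementation with deterministic output and efficient performance.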

Comments 9 pages, 6 figures

English abstract

A widespread assumption in local feature research holds that classical handcrafted descriptors are accuracy-limited relics best replaced by learned alternatives. We show this is wrong. Through an 8-configuration ablation spanning four benchmarks (HPatches, ROxford5K, IMC Phototourism, MegaDepth), we demonstrate that classical SIFT with DSP multi-scale pooling outperforms neural descriptor and orientation replacements (HardNet, OriNet) on every accuracy metric while running 2-18x faster, and that learned matchers (LightGlue) complement rather than supersede classical features. The conclusion reframes a decade of work: not "replace SIFT" but "compose with SIFT," with classical extraction paired with learned matching only where geometric context demands it. This finding was invisible because no prior GPU SIFT kept the complete pipeline in VRAM or offered modularity for controlled classical-vs-learned ablations. We present PySIFT, the first fully GPU-resident SIFT, implemented in CuPy/Numba CUDA kernels with DLPack zero-copy handoff to downstream DL frameworks: a submillisecond O(1) metadata swap regardless of keypoint count. On a laptop-grade NVIDIA RTX 3050 (4 GB VRAM), PySIFT achieves: (i) higher Mean Matching Accuracy (MMA) than OpenCV SIFT on HPatches, (ii) 383 ms faster per pair on high-resolution MegaDepth, (iii) higher geometric accuracy on cross-dataset benchmarks (+5.6 pp AUC@10° on MegaDepth, more inliers on IMC Phototourism), and (iv) bitwise deterministic output, with identical keypoints and descriptors across runs and detection reproducing identically even across GPU architectures. This is a guarantee that learned extractors cannot match without significant performance sacrifice, and cannot achieve at all across GPU architectures due to cuDNN's architecture-dependent algorithm selection. PySIFT is open-source, requiring no C++ compilation.

2605.17865 2026-05-19 cs.CV

Imaging Hidden Objects with Consumer LiDAR via Motion Induced Sampling

Siddharth Somasundaram, Aaron Young, Akshat Dave, Adithya Pediredla, Ramesh Raskar

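AI summary: This paper proposes a multi-frame fusion strategy, built on a motion-induced aperture sampling model, that enables non-line-of-sight (NLOS) imaging on consumer LiDAR. It demonstrates 3D reconstruction of hidden objects, multi-object tracking, and camera localization, and shows that consumer hardware can perform NLOS imaging with no additional setup.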

English abstract

LiDARs are being increasingly deployed for consumer imaging in handheld, wearable, and robotic applications. These sensors can capture the time-of-flight of light at picosecond resolution, which, in principle, enables them to capture information about objects hidden from their field of view. While such non-line-of-sight (NLOS) imaging capabilities have been shown on research-grade LiDARs, they are challenging to achieve on consumer devices due to poor signal quality resulting from low laser power, low spatial resolution, and object and camera motion. Inspired by burst photography and synthetic aperture radar, we propose a multi-frame fusion strategy to overcome these challenges and demonstrate NLOS imaging on consumer LiDAR. We first introduce the motion-induced aperture sampling model to unify the effects of object shape, object motion, and camera motion under a single measurement model. Using this model, we demonstrate several NLOS capabilities on a smartphone-grade LiDAR: (1) 3D reconstruction, (2) single and multi-object tracking, and (3) camera localization using hidden objects. Previously, NLOS imaging capabilities were largely restricted to bulky and expensive research-grade hardware that requires extensive setup and calibration. Our results represent a shift towards plug-and-play NLOS imaging, where anyone can image hidden objects with off-the-shelf hardware (under $100) and no additional setup. We believe that democratization of such capabilities will advance consumer applications of NLOS imaging.
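
As a rough illustration of what timing resolution buys, the sketch below converts a time-of-flight resolution into a path-length resolution. It assumes a direct out-and-back path at the vacuum speed of light; NLOS light takes a longer multi-bounce path, so this is only an order-of-magnitude guide, and the example timing values are hypothetical rather than taken from the paper.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def round_trip_range_resolution(time_resolution_s):
    """Range resolution in metres for a direct out-and-back path."""
    return C * time_resolution_s / 2.0

# Hypothetical timing resolutions: 1 ps, 100 ps, 1 ns.
for dt in (1e-12, 100e-12, 1e-9):
    print(f"{dt:.0e} s -> {round_trip_range_resolution(dt) * 1e3:.3f} mm")
```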

2605.17862 2026-05-19 cs.LG cs.AI

f-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control

Xianwei Chen, Shimin Zhang, Jibin Wu

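AI summary: This paper proposes f-OPD, a framework that introduces a sample-level freshness score to stabilize long-horizon on-policy distillation, balancing task performance with throughput and laying groundwork for long-horizon agent training at scale.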

English abstract

Scaling on-policy distillation (OPD) for large language models (LLMs) confronts a fundamental tension: asynchronous execution is necessary for system efficiency, but structurally deviates from the ideal on-policy objective. To address this challenge, we theoretically decompose the objective discrepancy into rollout drift and supervision drift, capturing staleness in student rollout and teacher context, respectively. Building on this, we introduce a sample-level freshness score that quantifies the reliability of a buffered sample with respect to the on-policy objective. Guided by this signal, we further propose f-OPD, a novel framework that adaptively regulates stale-sample influence and constrains policy drift accumulated under asynchronous training. Across reasoning, tool-use, and coding-agent tasks of increasing interaction horizon, f-OPD consistently achieves task performance comparable to synchronous optimization while largely retaining the throughput advantages of asynchronous execution. Our results establish the first recipe for achieving a performance-efficiency trade-off in OPD, paving the way for long-horizon agentic post-training at scale.

2605.17860 2026-05-19 cs.CL cs.AI

PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions

Sicheng Jin, Dipankar Srirag, Aditya Joshi

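AI summary: This paper introduces PAREDA, a dataset for studying ASR performance on accented, spontaneous, and domain-specific speech. Evaluating SOTA models shows degraded performance in the zero-shot setting, while fine-tuning significantly reduces WER, indicating that the dataset captures linguistic characteristics missing from existing corpora.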

Comments Accepted and presented at SPEAKABLE 2026 workshop at LREC 2026

English abstract

While modern Automatic Speech Recognition (ASR) systems achieve high accuracy on benchmark corpora, their performance often degrades when there is real-world variability. This work focuses on variability arising due to accented, spontaneous, and domain-specific speech. In particular, we introduce PAper REading DAtaset (PAREDA), a first-of-its-kind multi-accent speech dataset consisting of discussions on academic Natural Language Processing (NLP) papers between speakers with Australian, Indian-English, and Chinese English accents. Each session elicits a spontaneous monologue (a summary of a paper's abstract) and a non-monologue (a question-and-answer session between participants), resulting in a corpus rich with technical jargon and conversational phenomena. We evaluate the performance of SOTA ASR models on PAREDA, analysing the impact of accent mixing and increased speech rate. Our results show that, in the zero-shot setting, models perform worse, confirming the dataset's challenging nature. However, fine-tuning on PAREDA significantly reduces the Word Error Rate (WER), demonstrating that our dataset captures linguistic characteristics often missing from existing corpora. PAREDA serves as a valuable new resource for building and evaluating more robust and inclusive ASR systems for specialised, real-world applications.
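
Word Error Rate, the metric reported above, is conventionally the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below is a minimal version of that standard definition with no text normalization; the example sentences are invented and are not from PAREDA.

```python
def wer(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length.

    Assumes a non-empty reference; splits on whitespace only.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)

# One substitution ("tune" -> "tuned") and one deletion ("the").
print(wer("we fine tune the model", "we fine tuned model"))
```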

2605.17856 2026-05-19 cs.AI

KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science

Ziwei Li, Liujun Zhu, Yuchen Liu, Yichen Zhao, Birk Li, Ruiqi Wu, Junliang Jin, Jianyun Zhang

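AI summary: This paper presents KISS, a knowledge infrastructure for scientific simulation that externalizes expertise into validated modelling operators, staged domain protocols, and diagnostic recovery mechanisms. It enables agents to produce physically plausible, verifiable end-to-end simulations, lowering the access barrier between non-specialist users and process-based simulation and the integration barrier between modelling communities.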

English abstract

Process-based simulation models encode decades of scientific understanding across the Earth sciences, yet the communities most exposed to climate risk and resource scarcity are the least able to use them. Here, we introduce knowledge infrastructure (KI), an agent-actionable scaffold that externalizes expertise into validated modelling operators, staged domain protocols, and diagnostic recovery mechanisms. Across a 3,000-trial coupled-hydrology benchmark, agents equipped with KI produced physically plausible, verifiable end-to-end simulations in up to 84% of trials, while agents without KI plateaued below 40%. KI generalizes across disciplines. We packaged its construction into a Knowledge Dissection Toolkit (KDT) that autonomously produced KI enabling end-to-end agent execution of 117 additional process-based models across 14 Earth-science domains. Across all 119 KIs, modelling decisions and failure remedies converged despite different underlying physics, showing that operational expertise is structured and extractable rather than ad hoc. Demonstrations show KI-equipped agents lowering both the access barrier between non-specialist users and process-based simulation, and the integration barrier between modelling communities. Through this scaffold, process-based science can then evolve as a living scientific commons, answerable to whoever needs to know and extendable by whoever can contribute.

2605.17854 2026-05-19 cs.LG

Learning over Positive and Negative Edges with Contrastive Message Passing

Peter Pao-Huang, Charilaos I. Kanatsoulis, Michael Bereket, Jure Leskovec

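AI summary: This paper analyzes the value of negative-edge information for graph representation learning in settings with low label rates, high homophily, and high edge density, and proposes Contrastive Message Passing to exploit both positive and negative edges.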

English abstract

Conventional approaches to learning on graphs involve message passing along existing (i.e., positive) edges to update node features. However, these approaches often disregard the potentially valuable information contained in the absence of edges (i.e., negative edges). Here, we theoretically analyze the value of negative edges in graph representations and prove that in settings of low label rates, high homophily, and high edge density, access to negative edges provides significant information gain over using only positive edges. Motivated by this insight, we introduce Contrastive Message Passing (CMP), a general message passing architecture that enables graph neural network layers to reason over positive and negative edges. By imposing soft positive semidefinite constraints on the learnable weights, our approach differentially applies similarity-preserving transformations to positively connected nodes and dissimilarity-inducing transformations to negatively connected nodes. Over simulated and real datasets in varying data regimes, CMP consistently outperforms baselines in low-label settings when negative edges are informative.

2605.17851 2026-05-19 cs.RO

A Dexterous and Compliant Gripper With Soft Hydraulic Actuation for Microgravity Manipulation

William Su, Jordan Kam, Yixiao Wang, Jianshu Zhou

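AI summary: This paper integrates DexCoHand, a dexterous and compliant two-finger, 6-DOF gripper, with the Astrobee free-flying robot for manipulation in microgravity. In simulation, the gripper preserves the commanded pan and tilt motions while reducing unintended cross-axis motion of the free-flying base.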

Comments Accepted to the IEEE ICRA 2026 Space Robotics Workshop (SRW). 4 pages, 3 figures

English abstract

Astrobee's existing one-degree-of-freedom (DOF) underactuated compliant claw gripper enables perching on the International Space Station (ISS), but provides limited capability for continuous dexterous manipulation. More complex microgravity tasks require an end-effector that can maintain stable contact while limiting disturbance to the free-flying base, since contact forces directly couple into base motion. This article presents the integration of DexCoHand, a dexterous and compliant two-finger, 6-DOF gripper, with the Astrobee free-flying robot for microgravity manipulation. The system is evaluated in MuJoCo using Astrobee's standard handrail perching sequence, including approach, perching, and subsequent pan and tilt motions. Compared with Astrobee's existing gripper, DexCoHand preserves the commanded pan and tilt motions while reducing unintended cross-axis base motion. Hardware experiments on Earth further demonstrate DexCoHand's dexterous manipulation capabilities and its potential for more adaptable intelligent manipulation tasks.

2605.17849 2026-05-19 cs.CL cs.AI cs.LG

Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

从有机数据生成预训练令牌以实现数据驱动的扩展

Zichun Yu, Chenyan Xiong

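AI summary This paper proposes SynPro, a framework that uses rephrasing and reformatting operations to help large language models make fuller use of limited organic data, enabling more efficient scaling in data-bound pretraining.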

详情
AI中文摘要

LLM预训练正从计算驱动转向数据驱动的阶段,其中可用的人类(有机)文本远远无法满足扩展需求。然而,达到数据驱动阶段并不意味着模型已充分利用其有机语料库。在本文中,我们介绍了SynPro,一个合成数据生成框架,帮助LLM更深入地学习有限的有机数据。SynPro应用两种操作,即重新表述和重新格式化,以多样化的形式呈现相同的有机源,以促进更深层次的学习,而无需引入外部信息。两个生成器通过强化学习优化,使用质量、忠实度和数据影响奖励进行优化,并在预训练平台期持续更新,以针对模型尚未吸收的内容。我们使用DCLM-Baseline的10%最优令牌(0.8B和2.2B)预训练400M和1.1B模型,反映了前沿预训练中现实的数据驱动阶段。我们的结果表明,有机数据被标准重复方法显著低估:SynPro解锁了比重复方法多3.7-5.2倍的有效令牌,甚至在1.1B规模上超过了非数据驱动的Oracle,该Oracle在等效唯一数据上训练。分析证实,忠实、模型意识的合成可以在不导致分布崩溃的情况下实现数据驱动的扩展。我们开源代码在https://github.com/cxcscmu/SynPro。

英文摘要

LLM pretraining is shifting from a compute-bound to a data-bound regime, where available human (organic) text falls far short of scaling demands. However, reaching the data-bound regime does not mean the model has fully utilized its organic corpus. In this paper, we introduce SynPro, a synthetic data generation framework that helps LLMs more thoroughly learn from limited organic data. SynPro applies two operations, rephrasing and reformatting, that present the same organic source in diverse forms to facilitate deeper learning without introducing external information. Both generators are optimized via reinforcement learning with quality, faithfulness, and data influence rewards, and are continuously updated as pretraining plateaus to target content the model has yet to absorb. We pretrain 400M and 1.1B models with 10% of their Chinchilla-optimal tokens (0.8B and 2.2B) from DCLM-Baseline, reflecting a realistic data-bound regime in frontier pretraining. Our results reveal that organic data is significantly underutilized by standard repetition: SynPro unlocks 3.7-5.2x the effective tokens of repetition, even surpassing the non-data-bound oracle that trains on equivalent unique data at the 1.1B scale. Analyses confirm that faithful, model-aware synthesis sustains data-bound scaling without causing distribution collapse. We open-source our code at https://github.com/cxcscmu/SynPro.
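The token budgets quoted in this abstract (0.8B and 2.2B) can be checked with a short calculation. The sketch below assumes the common rule of thumb of roughly 20 training tokens per parameter for a Chinchilla-optimal budget; that ratio and the function name are assumptions made for illustration and are not stated in the abstract.

```python
# Assumption: Chinchilla-optimal training uses about 20 tokens per parameter.
TOKENS_PER_PARAM = 20


def data_bound_budget(n_params: float, fraction: float = 0.10) -> float:
    """Token budget equal to a fraction of the Chinchilla-optimal token count."""
    return n_params * TOKENS_PER_PARAM * fraction


# Model sizes taken from the abstract: 400M and 1.1B parameters.
budgets = {
    "400M": data_bound_budget(400e6),
    "1.1B": data_bound_budget(1.1e9),
}

for name, tokens in budgets.items():
    print(f"{name}: {tokens / 1e9:.1f}B tokens")
```

Under that assumption the two budgets come out at about 0.8B and 2.2B tokens, which matches the figures in the abstract.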

2605.17834 2026-05-19 cs.CV

Stabilizing, Scaling & Enhancing MeanFlow for Large-scale Diffusion Distillation

稳定、扩展与增强MeanFlow用于大规模扩散蒸馏

Xiao He, Yang Li, Peizhen Zhang, Songtao Liu, Zhao Zhong, Nannan Wang

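AI summary This paper proposes a method for stabilizing MeanFlow that introduces a warm-up technique and combines it with trajectory distribution alignment, improving the performance and generalization of distillation for large-scale industrial models.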

Comments 10 pages

详情
AI中文摘要

扩散模型表现出卓越的生成能力,但其高延迟限制了实际部署。许多研究尝试减少采样步骤以加速推理。其中,MeanFlow因其简洁的公式和显著的性能而受到关注。然而,其优化目标的不稳定性以及'均值偏置'限制了其在蒸馏大规模工业模型中的应用。为了稳定MeanFlow用于蒸馏大规模模型,我们首先引入了暖启动技术,其中MeanFlow的原始微分解法被替换为离散解。这种设计避免了由于MeanFlow目标包含来自未充分训练模型的stop-gradient项而导致的训练崩溃。一旦模型获得初步能力以拟合平均速度场,我们将其优化目标切换回微分解法,以实现进一步的细化。同时,为了缓解在极少数步推理中复杂目标分布下的'均值偏置',我们将其纳入轨迹分布对齐作为辅助目标,鼓励学生模型的轨迹分布更接近教师模型的轨迹分布。我们提出的蒸馏框架在应用于文本到图像(T2I)模型FLUX.1-dev(高达12B参数)时,相比现有蒸馏方法表现更优。此外,当扩展到80B参数的最新状态(SOTA)T2I模型HunyuanImage 3.0时,我们的方法继续表现出稳健的泛化能力和强性能。

英文摘要

Diffusion models exhibit remarkable generative capability, but their high latency limits practical deployment. Many studies have attempted to reduce sampling steps to accelerate inference. Among them, MeanFlow has attracted considerable attention due to its concise formulation and remarkable performance. Nevertheless, the instability of its optimization objective and the "mean-seeking bias" have limited its applicability to distilling large-scale industrial models. To stabilize MeanFlow for distilling large-scale models, we first introduce a warm-up technique, in which the original differential solution of MeanFlow is replaced by a discrete solution. This design avoids training collapse caused by the MeanFlow target containing a stop-gradient term from an undertrained model. Once the model acquires a preliminary ability to fit the average velocity field, we switch the optimization objective back to the differential solution, enabling further refinement. Meanwhile, to alleviate the "mean-seeking bias" of MeanFlow under extremely few-step inference with complex target distributions, we incorporate trajectory distribution alignment as an auxiliary objective, encouraging the student model's trajectory distribution to align more closely with that of the teacher model. Our proposed distillation framework achieves superior performance compared to existing distillation approaches when applied to the text-to-image (T2I) model FLUX.1-dev (up to 12B parameters). Furthermore, when extended to the 80B-parameter state-of-the-art (SOTA) T2I model HunyuanImage 3.0, our method continues to demonstrate robust generalization and strong performance.

2605.17833 2026-05-19 cs.LG cs.AI

Efficient Bilevel Optimization for Meta Label Correction in Noisy Label Learning

高效的元标签校正中的双层优化

Ba Hoang Anh Nguyen, Viet Cuong Ta

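AI summary This paper proposes EBOMLC, an efficient meta label correction method that introduces a one-step inner-loop update, a mixture upper loss, and an alignment-aware dynamic barrier to improve the training efficiency and stability of the meta model. Experiments show strong performance under high noise.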

详情
AI中文摘要

训练深度神经网络时使用噪声标签可以降低数据标注成本,但可能会将噪声引入学习模型中。在元标签校正方法中,除了主模型外,还会训练一个额外的元模型,使用小规模干净数据集来校正大规模噪声数据集。然而,元模型的更新需要在主模型的内部步骤中计算超梯度,这会显著增加计算成本。为了提高训练效率,我们首先引入动态障碍梯度下降到标准元标签校正中。虽然这种直接扩展能够将训练过程的速度提高到大约一阶复杂度,但缺乏防止噪声信号泄漏到主模型和稳定元模型学习的机制。基于这一观察,我们提出了EBOMLC方法,其设计包含三个关键改进:一步内循环更新、混合上界损失和对齐感知的动态障碍物。在CIFAR-10和CIFAR-100上的实验结果表明,EBOMLC在高噪声率设置下优于其他基线方法,同时减少了元标签校正方法的训练时间。

英文摘要

Training a deep neural network with noisy labels could reduce data annotation cost but may introduce noise into the learned model. In meta label correction approaches, an additional meta model besides the main model is trained with a small, clean dataset to correct the large, noisy dataset. However, the update of the meta model requires the computation of hypergradients at the inner step of the main model, which significantly increases the computational cost. To improve the training efficiency, we first introduce the dynamic barrier gradient descent into standard meta label correction. While this naive extension is able to speed up the training process to approximately first-order complexity, it lacks mechanisms to prevent the leakage of noisy signals to the main model and to stabilize the learning of the meta model. Based on this observation, we propose the EBOMLC method, which is designed with three key improvements: a one-step inner loop update, a mixture upper loss, and an alignment-aware dynamic barrier. Empirical results on CIFAR-10 and CIFAR-100 demonstrate that EBOMLC consistently outperforms other baselines, especially under high noise rate settings, while reducing the training time of the meta label correction approach.

2605.17831 2026-05-19 cs.LG cs.DB

Agentic Cost-Aware Query Planning with Knowledge Distillation for Big Data Analytics

具有知识蒸馏的代理成本感知查询规划用于大数据分析

Mahdi Naser-Moghadasi

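AI summary This paper proposes a lightweight student planner built by combining a rule-based teacher planner, UCB1 bandit exploration, cost-aware prediction, and knowledge distillation. It targets the high computational cost of query optimization in big data analytics and the memory and latency constraints of resource-constrained environments. On the NYC Taxi and IMDB datasets it reduces latency by 23% relative to default planners while maintaining 94% constraint satisfaction.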

Comments 8 pages, preprint, code at https://github.com/mahdinaser/agentic-kd-planner

详情
AI中文摘要

在大数据分析中查询优化仍然计算成本很高,尤其是在资源受限的环境中,传统优化器无法满足内存和延迟约束。我们提出了一种代理查询规划系统,结合规则基教师规划器、UCB1老虎机探索、成本感知预测和知识蒸馏来构建轻量级学生规划器。我们的教师规划器使用六个关键优化策略生成SQL计划,而UCB1老虎机搜索在显式资源约束下高效地探索计划空间。随机森林成本模型预测查询延迟,根据计划特征进行成本感知决策。蒸馏的学生规划器(逻辑回归或梯度提升)学习模仿教师-老虎机决策以实现快速推理。在纽约出租车和IMDB数据集上的评估显示,与默认规划器相比,延迟减少了23%,同时保持了94%的约束满足率。学生规划器在复制最优计划方面实现了89%的准确性,推理时间快15倍。我们的单文件实现使在资源受限机器上可重复的大数据分析成为可能,并在https://github.com/mahdinaser/agentic-kd-planner上公开发布。

英文摘要

Query optimization in big data analytics remains computationally expensive, particularly for resource-constrained environments where traditional optimizers fail to satisfy memory and latency constraints. We present an agentic query planning system that combines a rule-based teacher planner, UCB1 bandit exploration, cost-aware prediction, and knowledge distillation to a lightweight student planner. Our teacher planner generates SQL plans using six key optimization strategies, while UCB1 bandit search efficiently explores the plan space under explicit resource constraints. A Random Forest cost model predicts query latency from plan features, enabling cost-aware decisions. A distilled student planner (Logistic Regression or Gradient Boosting) learns to mimic teacher-bandit decisions for fast inference. Evaluation on NYC Taxi and IMDB datasets demonstrates 23% latency reduction compared to default planners while maintaining 94% constraint satisfaction. The student planner achieves 89% accuracy in replicating optimal plans with 15x faster inference time. Our single-file implementation enables reproducible big-data analytics on resource-limited machines and is publicly available at https://github.com/mahdinaser/agentic-kd-planner.
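To make the UCB1 exploration step described in this abstract concrete, here is a minimal sketch of the standard UCB1 rule choosing among candidate plans under a memory limit. It is not taken from the authors' repository: the plan list, the latency noise model, the reward definition, and the names `PLANS`, `MEMORY_LIMIT_MB`, `ucb1_select`, and `run_bandit` are all hypothetical.

```python
import math
import random

MEMORY_LIMIT_MB = 2048  # hypothetical resource constraint

# Hypothetical candidate plans: (mean latency in seconds, peak memory in MB).
PLANS = [(0.8, 512), (0.6, 1024), (0.2, 4096), (0.3, 768)]


def ucb1_select(counts, totals, t):
    """Pick the arm with the highest UCB1 index: mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before using the index
            return arm
    scores = [
        totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(plans, rounds=2000, seed=0):
    """Simulate noisy plan executions and return how often each plan was tried."""
    rng = random.Random(seed)
    counts = [0] * len(plans)
    totals = [0.0] * len(plans)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        mean_latency, memory_mb = plans[arm]
        latency = max(0.0, rng.gauss(mean_latency, 0.05))
        # Reward in [0, 1]: zero if the memory constraint is violated,
        # otherwise higher for lower latency.
        if memory_mb > MEMORY_LIMIT_MB:
            reward = 0.0
        else:
            reward = min(1.0, max(0.0, 1.0 - latency))
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = run_bandit(PLANS)
print(counts)
```

In this toy setup the fastest plan exceeds the memory limit, so the bandit should concentrate its pulls on the fastest feasible plan. Encoding a constraint violation as zero reward is one simple design choice; the paper may handle constraints differently.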

2605.17830 2026-05-19 cs.AI cs.CL

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

记住更多,风险更多:具有记忆能力的LLM代理的纵向安全风险

Ahmad Al-Tawaha, Shangding Gu, Peizhi Niu, Ruoxi Jia, Ming Jin

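AI summary This study examines the safety risks that arise in memory-equipped LLM agents as memory accumulates over long-horizon use, proposes a trigger-probe protocol to evaluate the effect of memory contamination, and finds that memory safety should be treated as a longitudinal property rather than a single-state property.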

详情
AI中文摘要

对具有记忆能力的LLM代理的安全评估通常测量单任务内的安全性:代理是否在对抗性条件下(如提示注入或记忆污染)安全地完成单一场景。然而,在部署中,一个代理会服务于许多独立任务,时间跨度较长,早期任务积累的记忆会影响后续无关任务的行为。研究这种情形需要在任务间的时间维度上进行评估:不是代理在任何单一记忆状态下的安全性,而是随着记忆在许多独立交互中积累,其安全性特征如何变化。我们称之为这种故障模式“时间记忆污染”。为了隔离记忆暴露与流非平稳性,我们引入了一种触发-探测协议,该协议通过固定探测集与不同前缀长度的只读记忆快照进行评估,并结合NullMemory反事实基线来识别由记忆引起的违规。我们将此协议应用于三个涵盖记录、备忘录、表单和电子邮件通信的部署场景,以及八种记忆架构,并进一步在Claw-like AI代理(如OpenClaw)上使用平台原生的记忆机制。具有记忆能力的代理在NullMemory基线上表现优异,记忆引起的违规率在两种代理类别中均表现出随暴露长度上升的稳健趋势。顺序随机化实验表明,该效应主要由积累内容而非接触顺序驱动。最后,事件分解的结构后果是记忆引起的风险在生成前的检索状态即可检测,我们通过高召回率的诊断监控器验证了这一点。我们的结果表明,应将记忆安全视为一个需要时间评估的纵向属性,而非可通过快照捕捉的单一状态属性。

英文摘要

Safety evaluations of memory-equipped LLM agents typically measure within-task safety: whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over a long horizon, and memory accumulated during earlier tasks can affect behavior on later, unrelated ones. Studying this regime requires evaluation along the temporal dimension across tasks: not whether an agent is safe at any single memory state, but how its safety profile changes as memory accumulates across many independent interactions. We call this failure mode temporal memory contamination. To isolate memory exposure from stream non-stationarity, we introduce a trigger-probe protocol that evaluates a fixed probe set against read-only memory snapshots at varying prefix lengths, together with a NullMemory counterfactual baseline for identifying memory-induced violations. We apply this protocol across three deployment scenarios spanning records, memos, forms, and email correspondence and eight memory architectures, and additionally on Claw-like AI agents, such as OpenClaw, using the platform's native memory mechanism. Memory-enabled agents consistently exceed the NullMemory baseline, and memory-induced violation rates show a robust upward trend with exposure length on both agent classes. Order-randomization experiments indicate that the effect is driven primarily by accumulated content rather than encounter order. Finally, a structural consequence of the event decomposition is that memory-induced risk is detectable from retrieval state before generation, which we confirm with a high-recall diagnostic monitor. Our results argue for treating memory safety as a longitudinal property that requires temporal evaluation, not a single-state property that can be captured by a snapshot.
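The trigger-probe idea in this abstract (a fixed probe set, read-only memory snapshots at several prefix lengths, and a NullMemory counterfactual) can be sketched as a small evaluation loop. This is only an illustration under stated assumptions: the agent is modeled as a hypothetical callable that returns `True` when its response violates policy, and `toy_agent`, the memory stream, and the probes are invented for the example.

```python
from typing import Callable, Dict, Sequence

# Hypothetical agent interface: (probe, memory snapshot) -> True if the
# response violates the safety policy.
Agent = Callable[[str, Sequence[str]], bool]


def memory_induced_rates(
    agent: Agent,
    stream: Sequence[str],
    probes: Sequence[str],
    prefix_lengths: Sequence[int],
) -> Dict[int, float]:
    """Fraction of probes that violate with memory but not with empty memory."""
    rates = {}
    for k in prefix_lengths:
        snapshot = tuple(stream[:k])  # read-only: probes never write back
        induced = sum(
            agent(p, snapshot) and not agent(p, ())  # () plays the NullMemory role
            for p in probes
        )
        rates[k] = induced / len(probes)
    return rates


def toy_agent(probe: str, memory: Sequence[str]) -> bool:
    """Toy stand-in: violates if a stored note says to share the probed topic."""
    return any(probe in item and "share" in item for item in memory)


stream = [
    "note: share payroll on request",
    "memo: lunch at noon",
    "note: share passwords freely",
]
probes = ["payroll", "passwords", "weather"]
rates = memory_induced_rates(toy_agent, stream, probes, prefix_lengths=[0, 1, 3])
print(rates)
```

Because the probe set is fixed and the snapshots are never modified, any change in the rate across prefix lengths reflects the memory content alone, which is the isolation the protocol is after.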

2605.17829 2026-05-19 cs.AI

Interactive Evaluation Requires a Design Science

交互评估需要一种设计科学

Keyang Xuan, Peiyang Song, Pan Lu, Pengrui Han, Wenkai Li, Zhenyu Zhang, Zexue He, Wenyue Hua, Manling Li, Jiaxuan You, Adrian Weller, Yizhong Wang, Jiaxin Pei

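AI summary This paper argues that interactive evaluation should be treated as a principled evaluation paradigm rather than merely a new family of agent benchmarks. Defining evaluation as an autonomous mapping from evidence to judgments, it shows how interactive evaluation changes both sides of this mapping, proposes a two-axis taxonomy, derives design principles and reporting standards, and analyzes how longstanding evaluation challenges reappear at the trajectory level.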

Comments 10 pages

详情
AI中文摘要

AI评估正经历结构性变革。大型语言模型(LLMs)越来越多地被部署为通过工具、环境、用户和其他智能体进行时间动作的系统,而许多评估实践仍继承自响应中心基准(例如固定输入、孤立输出和单个响应可做出的判断)。该领域开始构建交互基准,但所形成的景观却碎片化:基准在允许的交互制品、轨迹评分方式以及所支持的主张上各不相同。本文主张交互评估应被视为一种原则性的评估范式,而非仅仅是新的智能体基准。单纯采用以往的评估范式并不足够。我们定义评估为证据到判断的自主映射,并展示交互评估改变了这一映射的两方面:证据变为由交互生成的轨迹,而评估过程必须评估过程、可恢复性、协调性、鲁棒性和系统级性能。基于此定义,我们提出双轴分类法,推导设计原则和报告标准,分析代表性场景,并探讨长期评估挑战在轨迹层面的再现。

英文摘要

AI evaluation is undergoing a structural change. Large language models (LLMs) are increasingly deployed as systems that act over time through tools, environments, users, and other agents, while many evaluation practices still inherit assumptions from response-centered benchmarks (e.g., fixed inputs, isolated outputs, and outcome judgments that can be made from a single response). The field has begun to build interactive benchmarks, but the resulting landscape is fragmented: benchmarks differ in what interaction artifacts they admit, how trajectories are scored, and what claims their results support. This position paper argues that interactive evaluation should be treated as a principled evaluation paradigm, not merely a new family of agent benchmarks. Simply adopting previous evaluation paradigms does not suffice. We define evaluation as an autonomous mapping from evidence to judgments, and show that interactive evaluation changes both sides of this mapping: the evidence becomes interaction-generated trajectories, while the evaluation procedure must assess process, recoverability, coordination, robustness, and system-level performance. Building on this definition, we propose a two-axis taxonomy, derive design principles and reporting standards, examine representative scenarios, and analyze how longstanding evaluation challenges reappear at the trajectory level.

2605.17827 2026-05-19 cs.LG cs.AI

Content-Style Identification via Differential Independence

通过微分独立性进行内容-风格识别

Subash Timilsina, Hoang-Son Nguyen, Sagar Shrestha, Xiao Fu

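AI summary This paper proposes a new structural condition, content-style differential independence (CSDI), that enables identifiability in generative analysis even when content and style may be dependent. It imposes a blockwise orthogonality constraint on the Jacobian subspaces and designs a stochastic regularizer based on numerical Jacobian approximation to support high-dimensional generative models.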

Comments 24 pages, 15 figures, ICML 2026

详情
AI中文摘要

生成分析经常将多领域观察建模为领域不变内容变量和领域特定风格变量的非线性混合。从不成对的领域中识别这两种因素可以实现域迁移和反事实数据生成等任务。先前的工作在内容和风格之间(块状)统计独立性或通过非线性混合函数的稀疏雅可比假设下建立了可识别性,但这些条件在实践中可能过于严格。在本文中,我们引入了内容-风格微分独立性(CSDI),一种替代的结构条件,要求内容和风格的微小变化在数据流形上诱导正交方向,从而在内容和风格依赖且雅可比密集时也能实现可识别性。我们通过在内容和风格相关的雅可比子空间上施加块状正交约束来操作化这一条件。为了支持高维生成模型,我们设计了一个基于数值雅可比近似的随机正则化器,从而在如高分辨率图像生成等设置中实现可扩展训练。在多个数据集上的实验验证了可识别性分析,并展示了反事实生成和域迁移的实用优势。

英文摘要

Generative analysis often models multi-domain observations as nonlinear mixtures of domain-invariant content variables and domain-specific style variables. Identifying both factors from unpaired domains enables tasks such as domain transfer and counterfactual data generation. Prior work establishes identifiability under (block-wise) statistical independence between content and style, or via sparse Jacobian assumptions on the nonlinear mixing function, but such conditions can be restrictive in practice. In this work, we introduce content-style differential independence (CSDI), an alternative structural condition requiring that infinitesimal variations in content and style induce orthogonal directions on the data manifold, thereby enabling identifiability even when content and style are dependent and the Jacobian is dense. We operationalize this condition through a blockwise orthogonality constraint on the Jacobian subspaces associated with content and style. To support high-dimensional generative models, we design a stochastic regularizer based on numerical Jacobian approximation, enabling scalable training in settings such as high-resolution image generation. Experiments across multiple datasets corroborate the identifiability analysis and demonstrate practical benefits on counterfactual generation and domain translation.
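One way to picture a stochastic regularizer built on numerical Jacobian approximation is sketched below. It is an assumption-laden illustration, not the authors' method: it uses forward finite differences for Jacobian-vector products and random Gaussian probes, so that the average of the squared inner products estimates the squared Frobenius norm of $J_c^\top J_s$. The function name and the two toy decoders are hypothetical.

```python
import numpy as np


def orthogonality_penalty(decoder, c, s, n_probes=8, eps=1e-4, seed=0):
    """Monte Carlo estimate of ||J_c^T J_s||_F^2 using finite differences.

    decoder(c, s) maps content and style vectors to a 1-D observation.
    """
    rng = np.random.default_rng(seed)
    base = decoder(c, s)
    total = 0.0
    for _ in range(n_probes):
        u = rng.standard_normal(c.shape)  # random content direction
        v = rng.standard_normal(s.shape)  # random style direction
        jc_u = (decoder(c + eps * u, s) - base) / eps  # approximates J_c u
        js_v = (decoder(c, s + eps * v) - base) / eps  # approximates J_s v
        total += float(np.dot(jc_u, js_v)) ** 2
    return total / n_probes


def disjoint_decoder(c, s):
    """Content and style move disjoint output coordinates (orthogonal blocks)."""
    return np.concatenate([np.tanh(c), s ** 3])


def entangled_decoder(c, s):
    """Content and style move the same output coordinates (not orthogonal)."""
    return np.tanh(c + s)


c0 = np.array([0.3, -0.5, 0.1])
s0 = np.array([0.2, 0.4, -0.7])
penalty_disjoint = orthogonality_penalty(disjoint_decoder, c0, s0)
penalty_entangled = orthogonality_penalty(entangled_decoder, c0, s0)
print(penalty_disjoint, penalty_entangled)
```

The penalty vanishes for the decoder whose content and style blocks act on separate coordinates and is positive for the entangled one. Only forward evaluations of the decoder are needed, which is what makes a probe-based estimate attractive when the full Jacobian is too large to form.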

2605.17826 2026-05-19 cs.CV cs.AI

CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

CounterCount: 一种用于视觉语言模型计数偏差诊断的框架

Reem Alzahrani, Hassan Alshanqiti, Bushra Bin Hemid, Zaid Alyafeai, Abdelrahman Eldesokey, Bernard Ghanem

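AI summary This paper proposes the CounterCount framework, which diagnoses counting bias in vision-language models by contrasting factual and counterfactual images, reveals the models' reliance on object-level priors, and proposes a unified attention modulation strategy that improves counterfactual counting accuracy.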

详情
AI中文摘要

视觉语言模型(VLMs)在多模态推理方面表现出色,但尚不清楚其答案是基于视觉证据还是由学习的语言和世界先验知识驱动。计数提供了一个精确的测试环境:当视觉证据与常识物体知识冲突时,模型必须依赖图像而非典型计数。我们引入CounterCount,一种用于VLMs的反事实计数诊断框架,包含配对的事实性和反事实性图像、编辑过的计数相关属性、验证答案和局部化证据注释。评估最近的VLMs,我们发现其在事实性图像上表现强劲,但在反事实属性变化下持续退化,表明即使存在矛盾的视觉证据,模型仍依赖物体级先验知识。利用局部化注释,我们发现这些失败不仅由于缺失或模糊的视觉证据,而是由于模型对计数相关视觉token的注意力权重不足。我们引入一种统一的推理时间注意力调节策略,重新加权所选的视觉token,使多个VLMs的反事实计数准确率提高高达8%。总体而言,CounterCount揭示了先验驱动的计数失败,并为设计未来的VLMs提供了诊断见解。

英文摘要

Vision-Language Models (VLMs) excel at multimodal reasoning, yet it remains unclear whether their answers are grounded in visual evidence or driven by learned language and world priors. Counting provides a precise testbed: when visual evidence conflicts with canonical object knowledge, a model must rely on the image rather than a prototypical count. We introduce CounterCount, a diagnostic framework for counterfactual counting in VLMs, consisting of paired factual and counterfactual images with edited count-relevant attributes, verified answers, and localized evidence annotations. Evaluating recent VLMs, we find strong performance on factual images but consistent degradation under counterfactual attribute changes, indicating reliance on object-level priors even when contradictory visual evidence is present. Using localized annotations, we show that these failures are not solely due to missing or ambiguous visual evidence, but to models underweighting attention to count-relevant visual tokens. We introduce a unified inference-time attention modulation strategy that reweights selected visual tokens, improving counterfactual counting accuracy by up to 8% across multiple VLMs. Overall, CounterCount exposes prior-driven counting failures and provides diagnostic insights for designing future VLMs.

2605.17823 2026-05-19 cs.CV cs.AI

Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding

为什么我们看那里:一种最大化场景理解的视网膜视觉语言模型表现出的人类样注视模式

Shravan Murlidaran, Ziqi Wen, Sana Shehabi, Miguel P. Eckstein

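AI summary This study investigates how human fixation patterns arise during free viewing and finds that a foveated vision-language model that maximizes scene understanding produces human-like fixation patterns, suggesting that these patterns may be a byproduct of optimizing scene understanding.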

详情
AI中文摘要

当人类在没有特定任务的情况下观察场景(自由观看)时,他们最初会将眼动定向到场景中心,然后注视人物、文本、被注视或抓取的物体以及具有语义意义的区域。这些标志性注视模式所反映的内容以及它们是否优化了底层感知任务仍不清楚。我们显示,一个具有模拟视网膜视觉的计算代理,经过训练以优化场景理解,会表现出人类样的注视模式。相比之下,经过训练以搜索或分类场景的代理版本,或配备比人类更好的或更差的周边视觉的版本,预测人类注视模式的准确性较低。因此,人类自由观看的注视模式可能是在生物视网膜视觉约束下优化场景理解的副产品。

英文摘要

When humans view scenes without a specific task (free-viewing), they initially direct their eye movements toward the scene center and then fixate on people, text, objects being gazed at or grasped, and semantically meaningful regions. What these signature fixation patterns reflect and whether they optimize an underlying perceptual task remain unknown. We show that a computational agent with simulated foveation, trained to optimize scene comprehension, exhibits emergent human fixation signature patterns. In contrast, versions of the agent trained to search or classify scenes, or equipped with peripheral vision that was better or worse than human vision, predicted human fixation patterns less accurately. Thus, human free-viewing fixation patterns may emerge as a functional byproduct of optimizing scene comprehension under the biological constraints of foveated vision.

2605.17822 2026-05-19 cs.CV

Unleashing the Representational Power of Fourier Shapes for Attacking Infrared Object Detection

释放傅里叶形状的表示能力以攻击红外目标检测

Yixing Yong, Jian Wang, Ming Lei, Lijun He, Fan Li

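AI summary This paper proposes a Fourier-shape-based attack on infrared object detection. By introducing learnable Fourier shapes, it overcomes the fundamental trade-off between representational capability and optimization power in conventional shape-based methods and enables efficient gradient-based optimization of deceptive shapes that let human targets evade detection.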

详情
AI中文摘要

红外目标检测在自动驾驶和监控中至关重要,但仍然容易受到物理对抗攻击的威胁。与RGB域不同,攻击必须操控热信号,使得热阻材料的几何形状成为主要的对抗信息载体。当前基于形状的方法在表示能力和优化能力之间存在根本性的权衡,限制了攻击效果。在本文中,我们通过将可学习的傅里叶形状引入红外域,克服了这一困境。我们利用端到端可微框架,将一组紧凑的傅里叶系数,定义形状边界,通过 winding number theorem 解析地映射到像素空间的掩码。这使得能够通过梯度优化高效生成具有欺骗性的形状,使人类目标逃避检测。广泛的数字和物理实验提供了全面的评估,并验证了我们的优越性能。我们得到的物理贴片实现了惊人的鲁棒性,成功逃避了不同距离、角度、姿态和个体的检测器,且在距离大于25米(置信度=0.5)时攻击成功率超过88%。代码可在 https://github.com/Yongyx99/Fourier-shape-attack 上获得。

英文摘要

Infrared object detection is crucial for perception in autonomous driving and surveillance but remains vulnerable to physical adversarial attacks. Unlike in the RGB domain, where attacks rely on color texture, infrared attacks must manipulate thermal signatures, making the geometric shape of heat-blocking materials the primary adversarial information carrier. Current shape-based methods suffer from a fundamental trade-off between representational capability and optimization power, limiting their attack effectiveness. In this work, we overcome this dilemma by introducing learnable Fourier shapes to the infrared domain. We utilize an end-to-end differentiable framework where a compact set of Fourier coefficients, defining the shape boundary, is analytically mapped to a pixel-space mask via the winding number theorem. This enables efficient gradient-based optimization to generate potent shapes that cause human targets to evade detection. Extensive digital and physical experiments provide a comprehensive evaluation and validate our superior performance. Our resulting physical patch achieves striking robustness, successfully evading detectors across diverse distances, angles, poses, and individuals, and achieves over 88% attack success rate at distances greater than 25m (conf.=0.5). Code is available at https://github.com/Yongyx99/Fourier-shape-attack.
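The mapping from Fourier coefficients to a pixel mask can be illustrated with a small NumPy sketch. It assumes complex Fourier descriptors $z(t)=\sum_k c_k e^{2\pi i k t}$ for the boundary and computes a hard (non-differentiable) mask from the winding number of the sampled boundary around each pixel. The paper's framework is differentiable and may be parameterized differently; the function names and coefficient values here are hypothetical.

```python
import numpy as np


def fourier_boundary(coeffs, n_points=256):
    """Sample a closed curve z(t) = sum_k c_k exp(2*pi*i*k*t) as complex numbers."""
    t = np.arange(n_points) / n_points
    z = np.zeros(n_points, dtype=complex)
    for k, c in coeffs.items():
        z += c * np.exp(2j * np.pi * k * t)
    return z


def winding_mask(boundary, size=64, extent=2.0):
    """Boolean mask of pixels whose winding number w.r.t. the boundary is nonzero."""
    axis = np.linspace(-extent, extent, size)
    px, py = np.meshgrid(axis, axis)
    pixels = (px + 1j * py)[..., None]           # shape (size, size, 1)
    d = boundary[None, None, :] - pixels         # vectors from pixel to boundary
    d_next = np.roll(d, -1, axis=-1)             # next boundary sample (wraps around)
    angles = np.angle(d_next * np.conj(d))       # signed angle between neighbours
    winding = angles.sum(axis=-1) / (2 * np.pi)  # total turns around each pixel
    return np.abs(np.rint(winding)) > 0


# A unit circle (single coefficient) and a rounded triangle (one extra harmonic).
circle_mask = winding_mask(fourier_boundary({1: 1.0}))
triangle_mask = winding_mask(fourier_boundary({1: 1.0, -2: 0.3}))

pixel_area = (4.0 / 63) ** 2  # grid spacing squared for size=64, extent=2.0
print(circle_mask.sum() * pixel_area)  # should be close to pi for the unit circle
```

Changing a handful of coefficients reshapes the whole boundary, which is why a compact coefficient set is a convenient optimization variable. A gradient-based attack would need a smooth relaxation of the final thresholding step, which this sketch omits.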

2605.17818 2026-05-19 cs.CV

Evidence-Guided Unknown Rejection for High-Confidence Near-Known Unknowns

基于证据的未知拒绝用于高置信度近似未知物

Xi Chen, Yingjun Xiao, Gang Fang

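AI summary This paper proposes EGUR-A, which changes the decision from whether a sample's score is high enough to whether the predicted known class has sufficient evidence to accept the sample, thereby reducing high-confidence false acceptance.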

Comments 8 pages, 2 figures,8 tables

详情
AI中文摘要

开放集识别系统面临一个被忽视的失败模式:高置信度的近似未知物,这些样本位于已知标签集之外,但足够接近已知类别,使得闭合集分类器以高置信度接受它们。我们证明这种失败在标量阈值方法中普遍存在,包括最近的后处理检测器,并且更强的编码器可能放大而非消除风险。我们提出EGUR-A,将决策从『这个样本的得分是否足够高?』转变为『这个预测的已知类别是否有足够的证据来接受这个样本?』EGUR-A结合类别条件的局部接受证据与全局残差证据,并从已知样本统计中选择其相对权重,而无需未知验证数据。在CUB、FGVC-Aircraft和ImageNet-hard上,EGUR-A显著减少了在匹配已知拒绝操作点处的高置信度误判接受。结果不是更强的阈值,而是不同的问题:已知类别是否有权接受样本。

英文摘要

Open-set recognition systems face a neglected failure mode: high-confidence near-known unknowns, which lie outside the known label set but are close enough to known classes that a closed-set classifier accepts them with high confidence. We show that this failure is widespread across scalar-threshold methods, including recent post-hoc detectors, and that stronger encoders can amplify rather than remove the risk. We propose EGUR-A, which changes the decision from "is this sample's score high enough?" to "does this predicted known class have sufficient evidence to accept this sample?" EGUR-A combines class-conditional local acceptance evidence with global residual evidence, and selects their relative weight from known-sample statistics without unknown validation data. Across CUB, FGVC-Aircraft, and ImageNet-hard, EGUR-A substantially reduces high-confidence false known acceptance at matched known-rejection operating points. The result is not a stronger threshold; it is a different question: whether a known class is entitled to accept a sample.

2605.17815 2026-05-19 cs.RO cs.AI

Virtues of Ordered Chaos: Planning with Topple Actions in Tabletop Stack Rearrangement

秩序之中的混沌:在桌面堆叠重构中使用Topple动作的规划

Hao Lu, Rahul Shome

AI总结 本文研究了桌面环境中堆叠重构任务,通过引入更丰富的非抓取聚合动作(特别是从堆叠中倒落物体到桌面的Topple动作)来增强任务规划领域。核心方法是提出一种新的Topple聚合工具,将候选任务计划计算转化为 Pebble Motion 问题变体,从而在IsaacSim物理模拟中验证了其效果,展示了在执行速度上的显著优势。

Comments 8 pages, 7 figures

详情
AI中文摘要

高效的物体操作策略对自动化应用有重大影响。本文研究了桌面环境中的堆叠重构任务,重点是通过引入更丰富的非抓取聚合动作(特别是从堆叠中倒落物体到桌面的Topple动作)来增强任务规划领域。Topple可以压缩长序列的中间搬运动作。计算的计划需要根据问题在其中交错执行抓取和放置动作与Topple动作。为了生成任务计划并建模一个抽象来计算包含抓取和Topple动作的解决方案,引入了一种新的Topple聚合工具。使用这种有向图抽象,候选任务计划计算成为Pebble Motion问题的变种,将物体视为石子。然后在基于IsaacSim的物理模拟中报告了基准测试。结果突显了仅使用抓取和放置动作相比,在执行速度上的明显优势。尽管本文主要研究Topple动作,但证明了类似的抽象可以建模其他感兴趣的聚合动作,如Scoop。本文的工作为丰富物体交互的操纵应用提供了初步但有力的证据,表明抽象在其中的潜在好处。

英文摘要

Efficient object manipulation strategies have a significant impact on automation applications. In this work, stack rearrangement in tabletop settings is studied, with a focus on augmenting the task planning domain with richer nonprehensile aggregating actions, in particular the toppling of objects from a stack to the table. Toppling can compress long sequences of intermediate relocations. Depending on the problem, computed plans need to interleave pick-and-place actions with topple actions. To generate task plans and model an abstraction for computing solutions that include both pick-and-place and topple actions, a novel aggregating gadget for topple is introduced. Using this directed graphical abstraction, candidate task plan computation becomes a variant of the pebble motion problem, treating objects as pebbles. Benchmarks are then reported in an IsaacSim-based physics simulation. Results highlight clear benefits in execution speed over using pick-and-place actions alone. Though this work primarily investigates the topple action, we demonstrate that similar abstractions can model other aggregating actions of interest, like scoop. The current work provides a preliminary but strong indication of the promising benefits of abstractions for rich object interactions in manipulation applications.

2605.17812 2026-05-19 cs.AI

Going Headless? On the Boundaries of Vertical AI Firms

going headless?关于垂直AI企业的边界

Muhammad Zia Hydari, Farooq Muzaffar

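AI summary This article examines the traditional model in which vertical AI firms in accounting, law, healthcare, procurement, and similar domains bundle workflow, domain logic, and accountability into a single application, and how general-purpose AI agents are unbundling it, pushing firms toward a "going headless" strategy. It argues that this strategy benefits some firms and may be destructive for others, and proposes a three-position taxonomy based on task-accountability regime together with the concept of rule debt.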

详情
AI中文摘要

垂直AI企业在会计、法律、医疗、采购等领域历史上将工作流、领域逻辑和责任整合到单一应用中。通用AI代理现在正在解构这种整合,促使创始人和投资者倡导"going headless":将工作流和界面交给代理,并将领域专业知识作为可调用的服务暴露出来。本文认为,对于某些企业来说,going headless是正确的,而对于另一些企业则可能是破坏性的,后者往往通过看似界面决策的架构选择无意中放弃了其价值捕获。这是一个边界问题,答案取决于区分接口边界(通常可以移动)和责任边界(通常不能移动)。基于科斯的企业理论、埃森曼、帕克和范阿尔斯特恩的平台包容框架,以及蒂茨对互补资产和可获取性的分析,本文表明,通过开放协议运营的协调者即使在技术互操作性提高的情况下仍能获得包容权力,并且持久的价值捕获集中在专业签发、受监管的工作流、证据轨迹和受信任的记录系统中。本文提出了一种三类分类体系(组件、集成软件平台、双轨),该分类不是基于行业而是基于任务-责任制度,并正式化了规则债务的概念:当业务规则和专业标准从受控系统迁移到提示和代理指令时,客户组织将承担未来治理、维护和责任负担。随后有四项原则:按责任而非界面分解,翻转边缘同时保留核心,将规则债务作为集成平台防止的客户成本,避免单一协调者依赖。

英文摘要

Vertical AI firms in accounting, law, healthcare, procurement, and similar domains historically bundled workflow, domain logic, and accountability into a single application. General-purpose AI agents are now unbundling that package, prompting founders and investors to advocate "going headless": cede the workflow and interface to agents and expose domain expertise as callable services. This article argues that going headless is correct for some firms and destructive for others, and that the latter often cede their value capture inadvertently through architectural choices that look like interface decisions. This is a boundary question, and the answer turns on distinguishing the interface boundary, which can often move, from the accountability boundary, which often must not. Drawing on Coase's theory of the firm, Eisenmann, Parker, and Van Alstyne's platform envelopment framework, and Teece's analysis of complementary assets and appropriability, the article shows that orchestrators operating through open protocols acquire envelopment power even as technical interoperability improves, and that durable value capture concentrates in cospecialized accountability assets: professional signoff, regulated workflows, evidence trails, and trusted systems of record. The article proposes a three-position taxonomy (component, integrated software platform, dual-track) determined not by sector but by task-accountability regime, and formalizes the construct of rule debt: the future governance, maintenance, and accountability burden that accrues to customer organizations when business rules and professional standards migrate from governed systems into prompts and agent instructions. Four principles follow: decompose by accountability not interface, invert the edges while retaining the core, position rule debt as the customer cost the integrated platform prevents, and avoid single-orchestrator dependence.

2605.17811 2026-05-19 cs.LG cs.AI math.OC

One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

一个模型,两种角色:共享递归变压器中的涌现专业化

Jucheng Shen, Barbara Su, Anastasios Kyrillidis

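AI summary This study asks whether a shared-weight recurrent Transformer can develop distinct internal roles without being partitioned into separate modules. Using the Asymmetric Input Recurrence (AIR) architecture, it finds that the model's internal states differentiate into distinct functional roles and shows how this differentiation relates to the model's state dynamics.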

Comments 21 pages, 13 figures, 8 tables

详情
AI中文摘要

可以一个共享权重的递归变压器在未被分割成独立模块的情况下发展出不同的内部角色吗?我们研究了不对称输入递归(AIR),这是一种最小的两状态推理架构,在其中相同的Transformer模型被重复用于更新(根据文献,L和H),唯一的更新规则差异是编码输入在L更新中被注入但在H更新中不被注入。在Sudoku-Extreme和Maze中,解码的rollouts揭示出一致的分裂:$\zH$表现得像一个完全承诺的提案状态,而$\zL$保留局部不确定性和移动的中间结构。冻结实验显示,这种分裂实际上与模型的状态动态有关:在Sudoku中,冻结$\zH$会减少$\zL$的内容变化,而冻结$\zL$会增加$\zH$的内容变化;而在Maze中,冻结任一状态会增加另一个状态的内容变化。消融实验显示,为了诱导专业化,共享模型需要能够区分两种更新类型,要么通过输入注入的不对称性,要么通过一个单独的层级标记。机理上,注意力分析显示在Sudoku和Maze中,L更新始终比H更新更局部。这些结果表明,在两状态递归设置中,清晰的状态身份信号可以诱导共享参数递归变压器内部稳定的、相关的功能角色。代码可在https://github.com/juchengshen/air获得。

英文摘要

Can a shared-weight recurrent Transformer develop distinct internal roles without being partitioned into separate modules? We study this in Asymmetric Input Recurrence (AIR), a minimal two-state reasoning architecture in which the same Transformer model is reused for both updates (L and H, following the literature) and the only built-in difference in the update rule is that the encoded input is injected during L-updates but not H-updates. Across Sudoku-Extreme and Maze, decoded rollouts reveal a consistent split: $z_H$ behaves like a fully committed proposal state, whereas $z_L$ retains local uncertainty and shifting intermediate structure. Freeze experiments show that this split is, in practice, related to the model's state dynamics: in Sudoku, freezing $z_H$ reduces $z_L$'s content changes whereas freezing $z_L$ increases $z_H$'s, while in Maze, freezing either state increases content changes in the other state. Ablations show that to induce specialization, the shared model needs to be able to tell the two update types apart, either from input injection asymmetry or from a separate level token. Mechanistically, attention analysis shows that L-updates are consistently more local than H-updates in both Sudoku and Maze. Together, these results show that, in a two-state recurrent setting, a clear state-identity signal can induce stable, related functional roles inside a shared-parameter recurrent Transformer. Code is available at https://github.com/juchengshen/air.

2605.17808 2026-05-19 cs.LG stat.ML

A Unified Framework for Data-Free One-Step Sampling via Wasserstein Gradient Flows

通过Wasserstein梯度流构建数据免费一步采样的统一框架

Chenguang Wang, Tianshu Yu

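AI summary This paper proposes a unified theoretical framework for data-free one-step sampling based on Wasserstein gradient flows. It shows that the velocity field induced by f-divergence objectives has a universal form, derives a compression-elasticity identity linking divergence choice to the geometry of mass transport via a theory for a soft under-coverage functional, extends the framework to the Log-Variance divergence, and enables one-step inference through a KDE-based implementation and a normalizing-flow route.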

详情
AI中文摘要

我们开发了一种基于Wasserstein梯度流的数据免费一步采样的统一理论框架。对于广泛的标准f-分歧度目标,我们证明诱导速度场具有通用形式V(x)=w(r(x))β(x),其中β(x)=∇log(p(x)/q(x))在不同目标中共享,而w仅由分歧度的选择决定。这种分解表明标准f-分歧度漂移共享相同的渐近目标分布p,并主要区别于如何在欠覆盖区域重新分配瞬时修复努力。为了正式化这种区别,我们推导了软欠覆盖功能的一步区域响应理论,并获得了一个将分歧度选择与质量运输进入欠覆盖区域的几何联系的压缩-弹性恒等式。我们进一步将该框架扩展到Log-Variance (LV)分歧度,分析参考分布如何改变最终的漂移结构,并提出一个实用的LV启发式替代方案用于数据免费训练。基于此理论,我们通过KDE实现该框架,并描述了互补的归一化流路线,从而在训练后实现一步推断。在多模态高斯混合基准测试中的实验结果与理论预测一致,并在这些目标上展示了有效的一步采样。

英文摘要

We develop a unified theoretical framework for data-free one-step sampling from unnormalized target distributions based on Wasserstein gradient flows. For a broad class of standard f-divergence objectives, we show that the induced velocity field admits the universal form $\mathbf{V}(x)=w(r(x))\,\beta(x)$, where $\beta(x)=\nabla \log (p(x)/q(x))$ is shared across objectives and $w$ is determined solely by the choice of divergence. This decomposition shows that standard f-divergence drifts share the same asymptotic target distribution $p$ and differ primarily in how they redistribute transient repair effort across under-covered regions. To formalize this distinction, we derive a one-step regional-response theory for a soft under-coverage functional and obtain a compression-elasticity identity that links divergence choice to the geometry of mass transport into under-covered regions. We further extend the framework beyond the f-divergence family to the Log-Variance (LV) divergence, analyze how the reference distribution alters the resulting drift structure, and motivate a practical LV-inspired surrogate for data-free training. Based on this theory, we instantiate the framework with a KDE-based implementation and describe a complementary normalizing-flow route, enabling one-step inference after training. Experiments on multimodal Gaussian-mixture benchmarks are consistent with the theoretical predictions and demonstrate effective one-step sampling on these targets.
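The universal form quoted in this abstract can be recovered in a few lines under one common convention. The derivation below assumes $D_f(q\,\|\,p)=\int p\,f(q/p)\,dx$ with density ratio $r=q/p$; the paper may define $r$ or the divergence direction differently, in which case the expression for $w$ changes accordingly.

$$
\frac{\delta D_f}{\delta q}(x) = f'\big(r(x)\big), \qquad
\mathbf{V}(x) = -\nabla \frac{\delta D_f}{\delta q}(x) = -f''(r)\,\nabla r = -r\,f''(r)\,\nabla \log r = r\,f''(r)\,\beta(x),
$$

using $\nabla \log r = -\nabla \log (p/q) = -\beta$. Under this convention $w(r) = r\,f''(r)$. Two quick checks: for $f(r)=r\log r$ (the KL divergence of $q$ from $p$), $w(r)=1$ and $\mathbf{V}=\nabla\log p-\nabla\log q$, the familiar Langevin-type drift; for $f(r)=(r-1)^2$, $w(r)=2r$, so the drift is scaled up where $q$ exceeds $p$.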