arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.00851 2026-05-22 cs.AI cs.MA

Understanding Persuasion in Long-Running Agents

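Hyejun Jeong, Amir Houmansadr, Shlomo Zilberstein, Eugene Bagdasarian

Affiliations: University of Massachusetts Amherst

AI summary: This paper studies how user persuasion changes the behavior of agents on long-horizon tasks. It proposes a behavior-centered evaluation framework and finds that agents whose belief state is specified up front conduct fewer searches and visit fewer unique sources, indicating that persuasion affects agent behavior.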

Comments Code available at https://github.com/HyejunJeong/persuasion-propagation

English abstract

Modern AI agents increasingly combine conversational interaction with autonomous task execution, such as coding and web research, raising a natural question: What happens when an agent engaged in long-horizon tasks is exposed to user persuasion? Yet studying this possibility is challenging because long-running agent behavior is noisy and costly to reproduce, and it remains unclear which unique challenges emerge only in extended task execution. We study how belief-level intervention can influence downstream task behavior, a phenomenon we name persuasion propagation. We introduce a behavior-centered evaluation framework that distinguishes between persuasion applied during or prior to task execution. Across web research and coding tasks, we find that on-the-fly persuasion induces weak and inconsistent behavioral effects. In contrast, when the belief state is explicitly specified at task time, belief-prefilled agents conduct on average 26.9% fewer searches and visit 16.9% fewer unique sources than neutral-prefilled agents. These results suggest that persuasion, even in prior interaction, can affect the agent's behavior, motivating behavior-level evaluation in agentic systems.

2601.11079 2026-05-22 cs.LG

Soft Bayesian Context Tree Models for Real-Valued Time Series

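Shota Saito, Yuta Nakahara, Toshiyasu Matsushima

Affiliations: Gunma University; Waseda University

AI summary: This paper proposes the soft Bayesian context tree model (Soft-BCT), a new context tree model for real-valued time series. The model splits the context space probabilistically rather than deterministically, as conventional context tree models do. A learning algorithm based on variational inference is proposed, and experiments show that Soft-BCT outperforms the conventional context tree model on some datasets.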

English abstract

This paper proposes the soft Bayesian context tree model (Soft-BCT), which is a novel BCT model for real-valued time series. The Soft-BCT considers soft (probabilistic) splits of the context space, instead of hard (deterministic) splits of the context space as in the previous BCT for real-valued time series. A learning algorithm of the Soft-BCT is proposed based on the variational inference. The results of experiments demonstrate the superiority of the Soft-BCT compared to the previous BCT for some datasets.

2601.10348 2026-05-22 cs.CL cs.AI cs.LG

Training-Trajectory-Aware Token Selection

Zhanming Shen, Jiaqi Hu, Zeyu Qin, Hao Chen, Wentao Ye, Zenan Huang, Yihong Zhuang, Guoshan Lu, Junlin Zhou, Junbo Zhao

Affiliations: Zhejiang University; Hong Kong University of Science and Technology

AI summary: This paper proposes T3S, which reconstructs the training objective at the token level to clear the optimization path for yet-to-learn tokens and thereby improves continual distillation. Experiments show gains in both AR and dLLM settings.

Comments Accepted by ICML 2026

English abstract

Efficient distillation is a key pathway for converting expensive reasoning capability into deployable efficiency, yet in the frontier regime where the student already has strong reasoning ability, naive continual distillation often yields limited gains or even degradation. We observe a characteristic training phenomenon: even as loss decreases monotonically, all performance metrics can drop sharply at almost the same bottleneck, before gradually recovering. We further uncover a token-level mechanism: confidence bifurcates into steadily increasing Imitation-Anchor Tokens that quickly anchor optimization and other yet-to-learn tokens whose confidence is suppressed until after the bottleneck. And the characteristic that these two types of tokens cannot coexist is the root cause of the failure in continual distillation. To this end, we propose Training-Trajectory-Aware Token Selection (T3S) to reconstruct the training objective at the token level, clearing the optimization path for yet-to-learn tokens. T3S yields consistent gains in both AR and dLLM settings: with only hundreds of examples, Qwen3-8B surpasses DeepSeek-R1 on competitive reasoning benchmarks, Qwen3-32B approaches Qwen3-235B, and T3-trained LLaDA-2.0-Mini exceeds its AR baseline, achieving state-of-the-art performance among all of 16B-scale no-think models.

2512.16739 2026-05-22 cs.AI

AI-Driven Prediction of Cancer Pain Episodes: A Hybrid Decision Support Approach

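Yipeng Zhuang, Yifeng Guo, Yuewen Li, Yuheng Wu, Philip Leung-Ho Yu, Tingting Song, Zhiyong Wang, Kunzhong Zhou, Weifang Wang, Li Zhuang

Affiliations: The University of Hong Kong; Peking University Cancer Hospital Yunnan Hospital, The Third Affiliated Hospital of Kunming Medical University

AI summary: This study proposes a hybrid machine learning and large language model approach that uses structured and unstructured electronic health record data to predict pain episodes in cancer patients within 48 and 72 hours of hospitalization. Combining temporal medication trends with LLM interpretation of ambiguous dosing records improves sensitivity and interpretability, yielding accuracies of 0.876 (48h) and 0.917 (72h).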

English abstract

Lung cancer patients frequently experience breakthrough pain episodes, with up to 91% requiring timely intervention. To enable proactive pain management, we propose a hybrid machine learning and large language model pipeline that predicts pain episodes within 48 and 72 hours of hospitalization using both structured and unstructured electronic health record data. A retrospective cohort of 266 inpatients was analyzed, with features including demographics, tumor stage, vital signs, and WHO-tiered analgesic use. The machine learning module captured temporal medication trends, while the large language model interpreted ambiguous dosing records and free-text clinical notes. Integrating these modalities improved sensitivity and interpretability. Our framework achieved an accuracy of 0.876 (48h) and 0.917 (72h), with improvements in sensitivity of 10.6% and 10.7%, respectively, attributable to large language model augmentation. This hybrid approach offers a clinically interpretable and scalable tool for early pain episode forecasting, with potential to enhance treatment precision and optimize resource allocation in oncology care.

2512.12744 2026-05-22 cs.LG

Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity

Haotian Xu, Jiannan Yang, Tian Gao, Tsui-Wei Weng, Tengfei Ma

Affiliations: IBM Thomas J. Watson Research Center, Yorktown Heights, USA; Halıcıoğlu Data Science Institute, UC San Diego, La Jolla, USA; Stony Brook University, Stony Brook, USA

AI summary: This paper introduces Spontaneous Neurons (SPON) to make activation sparsity in LLMs more robust, addressing the accuracy degradation seen at high sparsity. SPON is trained via distribution matching so that the model stays stable and keeps its generalization ability under sparse computation.

Comments ICML 2026

English abstract

Activation sparsity offers a compelling route to accelerate large language model (LLM) inference by selectively suppressing hidden activations, yet existing approaches exhibit severe accuracy degradation at high sparsity. We show that this failure stems from representational instability: *activation sparsity disrupts input-dependent activation learned during pretraining, inducing distribution shifts in hidden states.* We address this issue by reframing activation sparsity as a representational alignment problem and introducing **Spontaneous Neurons (SPON)**, a lightweight mechanism inspired by spontaneous neural activity in biological systems. SPON injects a small set of learnable, input-independent activation vectors that act as persistent representational anchors for sparse computation. These vectors are trained via distribution matching to the dense model and can be absorbed into bias terms after training, incurring negligible inference overhead. Across multiple LLM backbones, SPON consistently restores performance, stabilizes latent representations, and preserves generalization. Our results establish SPON as an effective and principled solution for reliable activation-sparse inference, and offer new insights into knowledge retention in LLMs.
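The abstract notes that the learned vectors can be absorbed into bias terms after training. The following is a minimal sketch of why that is possible for a single linear layer. It assumes (this placement is an illustration, not taken from the paper) that an input-independent vector `s` is added to the sparsified activations just before a weight matrix `W`, so that W(h + s) + b = Wh + (Ws + b). The top-k sparsifier and all function names are hypothetical.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def sparsify_topk(h, k):
    """Keep the k largest-magnitude activations and zero the rest
    (one common form of activation sparsity, assumed here)."""
    order = sorted(range(len(h)), key=lambda i: abs(h[i]), reverse=True)
    keep = set(order[:k])
    return [h_i if i in keep else 0.0 for i, h_i in enumerate(h)]

def layer_with_offset(W, b, h, s, k):
    """Sparse layer with an explicit input-independent offset s."""
    shifted = [h_i + s_i for h_i, s_i in zip(sparsify_topk(h, k), s)]
    return [y_i + b_i for y_i, b_i in zip(matvec(W, shifted), b)]

def absorb_offset(W, b, s):
    """Fold the offset into the bias: b' = b + W s."""
    return [b_i + ws_i for b_i, ws_i in zip(b, matvec(W, s))]

def layer_absorbed(W, b_folded, h, k):
    """Same sparse layer, but the offset now lives in the bias."""
    return [y_i + b_i for y_i, b_i in zip(matvec(W, sparsify_topk(h, k)), b_folded)]
```

Because `s` does not depend on the input, `absorb_offset` is computed once after training, and `layer_absorbed` then costs the same as the plain sparse layer at inference time.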

2512.11587 2026-05-22 cs.LG cs.NA math.NA math.OC

Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration

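Alexander Tyurin

Affiliations: Applied AI Institute, Moscow, Russia

AI summary: This paper studies the optimization dynamics and implicit acceleration of gradient descent in neural network training. Analyzing nonlinear models, it shows that gradient descent steps reduce to those of generalized perceptron algorithms and that nonlinear models can have an advantage in iteration complexity.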

English abstract

Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscillations, and especially its implicit acceleration, remains a challenging problem. We analyze nonlinear models with the logistic loss and show that the steps of GD reduce to those of generalized perceptron algorithms (Rosenblatt, 1958), providing a new perspective on the dynamics. This reduction yields significantly simpler algorithmic steps, which we analyze using classical linear algebra tools. Using these tools, we demonstrate on a minimalistic example that the nonlinearity in a two-layer model can provably yield a faster iteration complexity $\tilde{O}(\sqrt{d})$ compared to $Ω(d)$ achieved by linear models, where $d$ is the number of features. This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks. The theoretical results are supported by extensive numerical experiments. We believe that this alternative view will further advance research on the optimization of neural networks.
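To give a feel for the connection between gradient descent on the logistic loss and perceptron-style updates, here is a minimal sketch for the simplest case: a plain linear model and a single example. It is not the paper's generalized reduction for nonlinear models; the function names are illustrative. The gradient step has the same direction as the perceptron update, `y * x`, but scaled by a weight in (0, 1) that approaches 1 for badly misclassified points and 0 for confidently correct ones.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def gd_step_logistic(w, x, y, lr):
    """One gradient descent step on log(1 + exp(-y * <w, x>)), with y in {-1, +1}."""
    margin = y * dot(w, x)
    weight = sigmoid(-margin)  # size of the gradient along y * x
    return [w_i + lr * weight * y * x_i for w_i, x_i in zip(w, x)]

def perceptron_step(w, x, y, lr):
    """Classical perceptron update: move along y * x only on a mistake."""
    if y * dot(w, x) <= 0:
        return [w_i + lr * y * x_i for w_i, x_i in zip(w, x)]
    return list(w)
```

For a point with a large negative margin the two updates nearly coincide, and for a point with a large positive margin both leave the weights essentially unchanged; they differ mainly near the decision boundary.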

2512.10719 2026-05-22 cs.CV

SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, Andreas Zell

Affiliations: Mercedes-Benz AG; University of Tübingen; Tübingen AI Center; TU Munich; Karlsruhe Institute of Technology; University of Stuttgart; UCLA

AI summary: This paper proposes SpaceDrive, a framework that treats spatial information as explicit positional encodings to strengthen a VLM-based driving system's understanding of fine-grained 3D spatial relationships, improving planning accuracy and open-loop performance.

English abstract

End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities obtained from the large-scale pretraining. However, we find that current VLMs struggle to understand fine-grained 3D spatial relationships which is a fundamental requirement for systems interacting with the physical world. To address this issue, we propose SpaceDrive, a spatial-aware VLM-based driving framework that treats spatial information as explicit positional encodings (PEs) instead of textual digit tokens, enabling joint reasoning over semantic and spatial representations. SpaceDrive employs a universal positional encoder to all 3D coordinates derived from multi-view depth estimation, historical ego-states, and text prompts. These 3D PEs are first superimposed to augment the corresponding 2D visual tokens. Meanwhile, they serve as a task-agnostic coordinate representation, replacing the digit-wise numerical tokens as both inputs and outputs for the VLM. This mechanism enables the model to better index specific visual semantics in spatial reasoning and directly regress trajectory coordinates rather than generating digit-by-digit, thereby enhancing planning accuracy. Extensive experiments validate that SpaceDrive achieves state-of-the-art open-loop performance on the nuScenes dataset and the second-best Driving Score of 78.02 on the Bench2Drive closed-loop benchmark over existing VLM-based methods. Code is available at: https://github.com/zhenghao2519/SpaceDrive.

2512.02193 2026-05-22 cs.AI

From monoliths to modules: Decomposing transducers for efficient world modelling

Alexander Boyd, Franz Nowak, David Hyland, Manuel Baltieri, Fernando E. Rosas

Affiliations: Department of Informatics, University of Sussex; Beyond Institute for Theoretical Science (BITS); ETH Zürich; Principles of Intelligent Behaviour in Biological and Social Systems (PIBBSS); Department of Computer Science, University of Oxford; Araya Inc.; Sussex AI and Sussex Centre for Consciousness Science, University of Sussex; Centre for Complexity Science and Center for Psychedelic Research, Department of Brain Sciences, Imperial College London; Center for Eudaimonia and Human Flourishing, University of Oxford

AI summary: This paper proposes a method for decomposing complex world models. Within the transducer framework, a world model is split into multiple modules, which improves computational efficiency and supports distributed inference, laying groundwork for AI safety and real-world applications.

English abstract

World models have been recently proposed as sandbox environments in which AI agents can be trained and evaluated before deployment. While realistic world models often have high computational demands, this can often be alleviated by exploiting the fact that real-world scenarios tend to involve subcomponents that interact in a modular manner. In this paper, we explore this idea by developing a framework for decomposing complex world models represented by transducers, a class of models generalising POMDPs. Whereas the composition of transducers is well understood, our results clarify how to invert this process by deriving sub-transducers operating on distinct input-output subspaces, enabling parallelizable and interpretable alternatives to monolithic world modelling that can support distributed inference. Overall, these results lay groundwork for bridging the computational efficiency required for real-world inference and the structural transparency demanded by AI safety.

2511.18159 2026-05-22 cs.LG

Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models

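Mengni Jia, Mengyu Zhou, Yihao Liu, Xiaoxi Jiang, Guanjun Jiang

Affiliations: University of Cambridge; Peking University; Qwen Large Model Application Team, Alibaba

AI summary: This paper studies the instability caused by the high training variance of masked diffusion models (MDMs). It decomposes the sources of variance and proposes six variance-reduction methods, which improve accuracy on complex reasoning tasks and reduce run-to-run variability to near the level of autoregressive models (ARMs).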

English abstract

Masked diffusion models (MDMs) are a promising alternative to autoregressive models (ARMs), but they suffer from inherently much higher training variance. High variance leads to noisier gradient estimates and unstable optimization, so even equally strong pretrained MDMs and ARMs that are competitive at initialization often diverge after task-specific training, with MDMs falling far behind. There has been no theoretical explanation or systematic solution. We derive the first decomposition of MDM training variance into three sources: (A) masking pattern noise, (B) masking rate noise, and (C) data noise, while ARMs are only affected by (C). This explains the fundamental training gap. Building on this foundation, we design six variance-reduction methods, including two core methods: (1) P-POTS, a Pareto-optimal t sampler that minimizes training variance by sampling harder t values more often with appropriately smaller update steps, and (2) MIRROR, which uses negatively correlated samples to reduce (A). Experiments show that compared to standard MDM training, our methods improve accuracy by 7-8% on complex reasoning tasks, while simultaneously reducing run-to-run variability to near ARM levels, substantially narrowing the gap with strong ARM baselines; in most settings, even the best baseline runs remain below the worst run of our method.
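The abstract describes MIRROR only as using negatively correlated samples to reduce masking pattern noise. The sketch below illustrates the generic idea behind such samplers (antithetic variates) on a toy problem; it is not the paper's algorithm. The additive toy loss, the per-token uniform draws, and all names are assumptions made for illustration. Each position is masked when its uniform draw `u` falls below the masking rate `t`, and the mirrored mask reuses the same draws as `1 - u`.

```python
import random

def masked_loss(mask, token_losses):
    """Toy sequence loss: sum of per-token losses at masked positions."""
    return sum(loss for m, loss in zip(mask, token_losses) if m)

def independent_pair(rng, token_losses, t):
    """Average the toy loss over two independently sampled masks."""
    total = 0.0
    for _ in range(2):
        mask = [rng.random() < t for _ in token_losses]
        total += masked_loss(mask, token_losses)
    return total / 2

def mirrored_pair(rng, token_losses, t):
    """Average the toy loss over a mask and its mirror (u replaced by 1 - u)."""
    u = [rng.random() for _ in token_losses]
    mask = [u_i < t for u_i in u]
    mirror = [(1.0 - u_i) < t for u_i in u]
    return (masked_loss(mask, token_losses) + masked_loss(mirror, token_losses)) / 2

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)
```

Both estimators have the same expectation, since `1 - u` is also uniform. For this additive toy loss the two masks in a mirrored pair are negatively correlated at every position, so the pair average has lower variance than two independent masks; at `t = 0.5` the two masks are exact complements and the toy estimate becomes constant.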

2511.10619 2026-05-22 cs.LG stat.ML

Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem

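Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma

Affiliations: Toyota Technological Institute at Chicago; University of Chicago; IDEAL Institute, Toyota Technological Institute at Chicago

AI summary: This paper proposes two new parameterized families of bandit algorithms, bounds the sample complexity of learning a near-optimal algorithm from each family using offline data, and evaluates them empirically on standard hyperparameter tuning benchmarks. The first family includes the optimal randomized algorithm from prior work and achieves stronger guarantees when the arm reward curves satisfy additional concavity-related properties. Algorithms in the second family guarantee best-arm identification on well-behaved instances and revert to worst-case guarantees on poorly-behaved instances.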

Comments 36 pages

English abstract

The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection from learning curves. Each pull of an arm provides reward that increases monotonically with diminishing returns. A growing line of work has designed algorithms for improving bandits, albeit with somewhat pessimistic worst-case guarantees. Indeed, strong lower bounds of $Ω(k)$ and $Ω(\sqrt{k})$ multiplicative approximation factors are known for both deterministic and randomized algorithms (respectively) relative to the optimal arm, where $k$ is the number of bandit arms. In this work, we propose two new parameterized families of bandit algorithms and bound the sample complexity of learning the near-optimal algorithm from each family using offline data. We also perform empirical evaluations on standard hyperparameter tuning benchmarks. The first family we define includes the optimal randomized algorithm from prior work. We show that an appropriately chosen algorithm from this family can achieve stronger guarantees, with optimal dependence on $k$, when the arm reward curves satisfy additional properties related to the strength of concavity. Our second family contains algorithms that both guarantee best-arm identification on well-behaved instances and revert to worst-case guarantees on poorly-behaved instances.

2511.04838 2026-05-22 cs.LG math.SP q-bio.MN

SPECTRA: Spectral Domain-Aware Graph Generation for Imbalanced Molecular Property Regression

Brenda Nogueira, Gisela A. Gonzalez-Montiel, Meng Jiang, Nitesh V. Chawla, Nuno Moniz

Affiliations: University of Notre Dame, Dept. of Computer Science; University of Notre Dame, Dept. of Chemistry; University of Notre Dame, Lucy Family Institute for Data & Society, Notre Dame, Indiana, USA

AI summary: This paper proposes SPECTRA, which combines a rarity-aware budgeting scheme, target-neighbors graph alignment, and interpolation of Laplacian spectra to improve the prediction of relevant but underrepresented molecular property values. It performs competitively with state-of-the-art methods in relevant target ranges while requiring about 4x less computation time.

English abstract

Molecular property regression struggles with cases in chemically relevant target ranges that are underrepresented in datasets. Standard average error minimization approaches underperform in these highly relevant cases, and oversampling approaches lead to meaningless molecular representations. In this paper, we propose SPECTRA, a spectral, domain-aware graph generation method designed to improve the prediction of underrepresented but relevant molecular property values. It combines a rarity-aware budgeting scheme to focus generation where data are scarce, target-neighbors graph alignment to establish structural correspondence, and interpolation of Laplacian spectra, node features, and targets. Coupled with spectral GNN using edge-aware Chebyshev convolutions, SPECTRA shows its effectiveness in property prediction benchmarks with competitive performance over leading state-of-the-art methods in relevant target ranges, while requiring ~4x less computational time.

2511.02043 2026-05-22 cs.LG cs.PF

Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants

Bozhi You, Irene Wang, Zelal Su Mustafaoglu, Abhinav Jangda, Angélica Moreira, Roshan Dathathri, Divya Mahajan, Keshav Pingali

Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

AI summary: This paper presents Flashlight, a compiler framework built on PyTorch that automatically generates fused, FlashAttention-style kernels for arbitrary attention-based programs without static templates or predefined kernel specializations, offering flexibility while maintaining performance.

English abstract

Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion to optimize attention. Recently, a number of variants of attention have been introduced to enhance model quality or efficiency. Supporting them efficiently remains difficult since they usually require specialized kernels or hand-tuned implementations. FlexAttention recently addressed part of this gap by using static programming templates to support FlashAttention-like kernels for a subset of attention variants. In this paper, we introduce Flashlight, a compiler-native framework within the PyTorch ecosystem that automatically generates fused, FlashAttention-style kernels for arbitrary attention-based programs, without relying on static templates or predefined kernel specializations. Flashlight leverages PyTorch's compilation workflow to fuse and tile attention computations transparently, enabling efficient execution for diverse attention patterns. Not only does it support all variants expressible in the FlexAttention model but it also handles more general, data-dependent attention formulations that are beyond the capabilities of FlexAttention. Our results show that Flashlight produces kernels with competitive or superior performance to FlexAttention, while offering the flexibility of native PyTorch code, enabling developers to rapidly explore new attention models without sacrificing performance.
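The abstract mentions that FlashAttention relies on tiling and kernel fusion. As a rough illustration of the tiling idea only, the sketch below computes softmax attention for a single query by streaming over key/value blocks while maintaining a running maximum and a running normalizer (the online-softmax trick). It is plain Python written for this digest, not code generated by Flashlight or taken from FlashAttention; the function names are illustrative and the usual 1/sqrt(d) scaling is omitted for brevity.

```python
import math

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def attention_reference(q, keys, values):
    """Standard softmax attention for one query, materializing all scores."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(dim)]

def attention_tiled(q, keys, values, block=2):
    """Same result, computed one key/value block at a time."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block):
        scores = [dot(q, k) for k in keys[start:start + block]]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        scale = math.exp(running_max - new_max)  # 0.0 on the first block
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values[start:start + block]):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The tiled version never holds more than one block of scores at a time, which is what lets a fused kernel keep its working set in fast on-chip memory.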

2510.20814 2026-05-22 cs.CV

SpectraMorph: Structured Latent Learning for Self-Supervised Hyperspectral Super-Resolution

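Ritik Shah, Marco F Duarte

Affiliations: University of Massachusetts Amherst

AI summary: This study proposes SpectraMorph, a physics-guided self-supervised fusion framework that uses a structured latent space for hyperspectral super-resolution by fusing multispectral and hyperspectral images. It produces interpretable intermediates, trains in under a minute, and remains robust even with a single-band multispectral image.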

Journal ref ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

English abstract

Hyperspectral sensors capture dense spectra per pixel but suffer from low spatial resolution, causing blurred boundaries and mixed-pixel effects. Co-registered companion sensors such as multispectral, RGB, or panchromatic cameras provide high-resolution spatial detail, motivating hyperspectral super-resolution through the fusion of hyperspectral and multispectral images (HSI-MSI). Existing deep learning based methods achieve strong performance but rely on opaque regressors that lack interpretability and often fail when the MSI has very few bands. We propose SpectraMorph, a physics-guided self-supervised fusion framework with a structured latent space. Instead of direct regression, SpectraMorph enforces an unmixing bottleneck: endmember signatures are extracted from the low-resolution HSI, and a compact multilayer perceptron predicts abundance-like maps from the MSI. Spectra are reconstructed by linear mixing, with training performed in a self-supervised manner via the MSI sensor's spectral response function. SpectraMorph produces interpretable intermediates, trains in under a minute, and remains robust even with a single-band (pan-chromatic) MSI. Experiments on synthetic and real-world datasets show SpectraMorph consistently outperforming state-of-the-art unsupervised/self-supervised baselines while remaining very competitive against supervised baselines.
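The unmixing bottleneck described in the abstract can be sketched for a single pixel as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the endmember spectra and abundances are given directly rather than extracted or predicted by a network, the spectral response function (SRF) is a small hand-written matrix, and the squared-error loss and all names are hypothetical.

```python
def mix(endmembers, abundances):
    """Linear mixing: the pixel spectrum is the abundance-weighted sum of endmember spectra."""
    bands = len(endmembers[0])
    return [sum(a * e[b] for a, e in zip(abundances, endmembers)) for b in range(bands)]

def apply_srf(srf, spectrum):
    """Project a hyperspectral spectrum onto the multispectral bands via the SRF matrix."""
    return [sum(r * s for r, s in zip(row, spectrum)) for row in srf]

def self_supervised_loss(endmembers, abundances, srf, msi_pixel):
    """Mean squared error between the SRF-projected reconstruction and the observed MSI pixel."""
    predicted = apply_srf(srf, mix(endmembers, abundances))
    return sum((p - o) ** 2 for p, o in zip(predicted, msi_pixel)) / len(msi_pixel)
```

No high-resolution hyperspectral ground truth appears in the loss: the reconstruction is checked only against the multispectral observation it should explain, which is what makes the training signal self-supervised.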

2510.08759 2026-05-22 cs.CV cs.RO

Dissecting Embodied Abilities in Multimodal Language Models through Skill-level Evaluation and Diagnosis

Yu Qi, Haibo Zhao, Ziyu Guo, Siyuan Ma, Ziyan Chen, Yaokun Han, Renrui Zhang, Zitiantao Lin, Yizhe Zhu, Shiji Xin, Yijian Huang, Boce Hu, Kai Cheng, Peiheng Wang, Jiazheng Liu, Jiayi Zhang, Yizhe Zhu, Wenqing Wang, Yiran Qin, Haojie Huang, Lawson L. S. Wong

Affiliations: Northeastern University, Boston, MA, USA; The Chinese University of Hong Kong, Hong Kong, China; Peking University, Beijing, China; Westlake University, Hangzhou, China; Harvard University, Cambridge, MA, USA; Purdue University, West Lafayette, IN, USA; University of Oxford, Oxford, United Kingdom

AI summary: This paper presents BEAR, a benchmark that decomposes embodied tasks into 14 atomic skills for fine-grained evaluation. It finds that perceptual capabilities are a major bottleneck behind reasoning failures and proposes BEAR-Agent, a multimodal conversational agent that substantially improves performance on embodied skills.

Comments Accepted to ICML 2026

English abstract

Understanding the capability bottlenecks of embodied multimodal large language models (MLLMs) is crucial for improving embodied agents. However, existing embodied benchmarks mainly focus on task-level evaluation and fail to provide actionable insights into the underlying causes of model failures. To address this limitation, we introduce BEAR, a benchmark that decomposes embodied tasks into 14 atomic skills for fine-grained skill-level evaluation. BEAR comprises 4,469 interleaved image-video-text samples spanning 14 skills across 6 categories, ranging from low-level perception to high-level planning. We evaluate 20 MLLMs on BEAR under a hierarchical skill-level diagnosis framework and uncover two key findings: (1) perceptual capabilities are major bottlenecks behind reasoning failures, and (2) current models suffer from unstable spatiotemporal modeling that remains largely unexposed in prior benchmarks. Motivated by these findings, we further propose BEAR-Agent, a multimodal conversational agent that augments MLLMs with visual and spatial reasoning tools. BEAR-Agent substantially improves performance across embodied skills, achieving a relative improvement of 17.5% on GPT-5 over the base model on BEAR, while also outperforming strong baselines in both simulation and real-world robotic experiments. Project page: https://bear-official66.github.io/

2510.07962 2026-05-22 cs.CL cs.AI

LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

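Jingyuan Wang, Yankai Chen, Zhonghang Li, Chao Huang

Affiliations: University of Hong Kong; University of Chicago

AI summary: This paper proposes LightReasoner, a framework that uses the behavioral divergence between a stronger expert model and a weaker amateur model to identify high-value reasoning moments, improving LLM reasoning while reducing resource consumption.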

Comments Updated to ACL 2026 camera-ready version with improved method presentation, expanded related work discussion, additional analyses, and presentation refinements

英文摘要

Large language models (LLMs) have demonstrated remarkable progress in reasoning, often through supervised fine-tuning (SFT). However, SFT is resource-intensive, relying on large curated datasets, rejection-sampled demonstrations, and uniform optimization across all tokens, even though only a fraction carry meaningful learning value. In this work, we explore a counterintuitive idea: can smaller language models (SLMs) teach larger language models (LLMs) by revealing high-value reasoning moments that reflect the latter's unique strength? We propose LightReasoner, a novel framework that leverages the behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM). LightReasoner operates in two stages: (1) a sampling stage that pinpoints critical reasoning moments and constructs supervision examples capturing the expert's advantage through expert-amateur contrast, and (2) a fine-tuning stage that aligns the expert model with these distilled examples, amplifying its reasoning strengths. Across seven mathematical benchmarks, LightReasoner improves accuracy by up to 28.1%, while reducing time consumption by 90%, sampled problems by 80%, and tuned token usage by 99%, all without relying on ground-truth labels. By turning weaker SLMs into effective teaching signals, LightReasoner offers a scalable and resource-efficient approach for advancing LLM reasoning. Code is available at: https://github.com/HKUDS/LightReasoner
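
The sampling stage described above can be illustrated with a minimal sketch. It assumes the "behavioral divergence" is measured as a per-step KL divergence between the expert's and the amateur's next-token distributions, and that steps above a fixed threshold are kept. The abstract does not specify the measure or the selection rule, so the function names, the `threshold` parameter, and the use of KL are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_contrast_steps(expert_dists, amateur_dists, threshold):
    """Return indices of steps where the expert diverges from the amateur.

    Assumption: a step counts as "high value" when the expert-amateur KL
    divergence exceeds `threshold`; this rule is illustrative only.
    """
    return [
        i
        for i, (p, q) in enumerate(zip(expert_dists, amateur_dists))
        if kl_divergence(p, q) > threshold
    ]


# Step 0: both models agree; step 1: the expert is confident, the amateur is not.
expert = [[0.5, 0.5], [0.9, 0.1]]
amateur = [[0.5, 0.5], [0.1, 0.9]]
selected = select_contrast_steps(expert, amateur, threshold=0.1)
```

In this toy input only the second step is selected, which mirrors the idea that supervision is concentrated on the small fraction of tokens where the two models disagree.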

2510.04280 2026-05-22 cs.LG cs.AI cs.RO

A KL-regularization Framework for Learning to Plan with Adaptive Priors

Álvaro Serra-Gomez, Daniel Jarne Ornia, Dhruva Tirumala, Thomas Moerland

Affiliations * LIACS, Leiden University, Leiden, The Netherlands; Google DeepMind, London, United Kingdom; University of Oxford, Oxford, United Kingdom

AI summary This paper proposes a KL-regularized framework for learning to plan that integrates the planner's action distribution as a prior in policy optimization, improving the sample efficiency and long-term performance of model-based reinforcement learning on high-dimensional continuous control tasks.

Comments Published at ICML 2026

Abstract

Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and entropy regularization. However, because the states encountered during training depend on the MPPI planner, aligning the sampling policy with the planner improves the accuracy of value estimation and long-term performance. To this end, recent methods update the sampling policy by minimizing KL divergence to the planner distribution or by introducing planner-guided regularization into the policy update. In this work, we unify these MPPI-based reinforcement learning methods under a single framework by introducing Policy Optimization-Model Predictive Control (PO-MPC), a family of KL-regularized MBRL methods that integrate the planner's action distribution as a prior in policy optimization. By aligning the learned policy with the planner's behavior, PO-MPC allows more flexibility in the policy updates to trade off return maximization and KL divergence minimization. We clarify how prior approaches emerge as special cases of this family, and we explore previously unstudied variations. Our experiments show that these extended configurations yield significant performance improvements, advancing the state of the art in MPPI-based RL.
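
The trade-off named in the abstract can be written as a generic KL-regularized policy objective. This is an illustrative form, not the paper's exact formulation: the symbols $Q$ (learned value), $\pi_{\mathrm{plan}}$ (planner action distribution), $\lambda$ (regularization weight), $\mathcal{D}$ (visited states), and the direction of the KL term are all assumptions.

```latex
\max_{\pi}\; \mathbb{E}_{s \sim \mathcal{D}}\Big[\,
  \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big]
  \;-\; \lambda\, \mathrm{KL}\big(\pi(\cdot \mid s)\,\|\,\pi_{\mathrm{plan}}(\cdot \mid s)\big)
\Big]
```

Under this form, $\lambda = 0$ reduces to value maximization independent of the planner, while a large $\lambda$ makes the KL term dominate, so the update approaches pure KL minimization to the planner distribution. These two limits correspond to the two earlier families of methods the abstract describes.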

2509.20912 2026-05-22 cs.AI

DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning

Tianrun Xu, Haoda Jing, Ye Li, Yuquan Wei, Jun Feng, Guanyu Chen, Haichuan Gao, Tianren Zhang, Feng Chen

Affiliations * Department of Automation, Tsinghua University, Beijing, China; Zhongguancun Academy, Beijing, China; School of Software, Xinjiang University, Urumqi, China; College of Materials Science and Engineering, Fuzhou University, Fuzhou, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Beijing Qianjue Technology Co., Ltd., Beijing, China

AI summary This paper proposes the DeFacto framework, which integrates three training paradigms (positive, counterfactual, and random-masking) to improve the evidence-answer consistency of multimodal language models, and introduces the DeFacto-1.5K benchmark for systematic evaluation.

Abstract

Recent advances in multimodal language models (MLLMs) have made thinking with images a dominant paradigm for multimodal reasoning. However, existing methods still fail to ensure evidence-answer consistency, where correct answers must be supported by correct visual evidence. To address this issue, we propose DeFacto, a counterfactual reasoning framework that explicitly aligns visual evidence with final answers. Our approach integrates three complementary training paradigms: positive, counterfactual, and random-masking. We further develop a language-guided evidence construction pipeline that automatically localizes question-relevant regions and generates counterfactual variants, resulting in DeFacto-100K. Building on this dataset, we train MLLMs with GRPO-based reinforcement learning and design three complementary rewards to promote correct answering, structured reasoning, and consistent evidence selection. Moreover, we introduce DeFacto-1.5K, a human-annotated benchmark for systematically evaluating evidence-grounded consistency beyond answer accuracy. Experiments on diverse benchmarks demonstrate that DeFacto substantially improves both answer accuracy and evidence-answer consistency over strong baselines.

2509.17086 2026-05-22 cs.CV

SFN-YOLO: Towards Free-Range Poultry Detection via Scale-aware Fusion Networks

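Jie Chen, Yuhong Feng, Tao Dai, Hao Wang, Hongtao Chen, Zhaoxi He, Mingzhe Liu, Jiancong Bai

Affiliations * Shenzhen University; The Hong Kong University of Science and Technology

AI summary This paper proposes SFN-YOLO, a poultry detection method that uses scale-aware fusion to improve detection in complex environments, and introduces the M-SCOPE dataset tailored to free-range conditions. Experiments show the model reaches 80.7% mAP with only 7.2M parameters, 35.1% fewer than the benchmark model, while retaining good generalization.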

Abstract

Detecting and localizing poultry is essential for advancing smart poultry farming. Despite the progress of detection-centric methods, challenges persist in free-range settings due to multiscale targets, obstructions, and complex or dynamic backgrounds. To tackle these challenges, we introduce an innovative poultry detection approach named SFN-YOLO that utilizes scale-aware fusion. This approach combines detailed local features with broader global context to improve detection in intricate environments. Furthermore, we have developed a new expansive dataset (M-SCOPE) tailored for varied free-range conditions. Comprehensive experiments demonstrate our model achieves an mAP of 80.7% with just 7.2M parameters, which is 35.1% fewer than the benchmark, while retaining strong generalization capability across different domains. The efficient and real-time detection capabilities of SFN-YOLO support automated smart poultry farming.
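
As a quick check on the parameter figures, the benchmark model's size can be back-computed. This assumes the 35.1% reduction is measured relative to the benchmark's parameter count and that both reported numbers are rounded; $P_{\mathrm{base}}$ is a symbol introduced here for illustration.

```latex
P_{\mathrm{base}} \approx \frac{7.2\,\mathrm{M}}{1 - 0.351} = \frac{7.2\,\mathrm{M}}{0.649} \approx 11.1\,\mathrm{M}
```

Under that reading, SFN-YOLO removes roughly 3.9M parameters from a benchmark of about 11.1M.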

2509.09088 2026-05-22 cs.LG math.DG math.DS

An entropy formula for the Deep Linear Network

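Govind Menon, Tianmin Yu

Affiliations * Division of Applied Mathematics, Brown University; School of Mathematics, Institute for Advanced Study; Department of Mathematics, Northwestern University

AI summary This paper studies the Riemannian geometry of the deep linear network as a foundation for a thermodynamic description of the learning process. It analyzes overparametrization through group actions and uses Riemannian submersion from the space of parameters to the space of observables to define and compute a Boltzmann entropy. The main technical step is an explicit construction, via the theory of Jacobi matrices, of an orthonormal basis for the tangent space of the balanced manifold.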

Comments Final version of accepted paper in SIAM Journal on Mathematical Analysis. Includes fixes of minor typos (especially equations (3.13), (6.35) and (6.36))

Abstract

We study the Riemannian geometry of the Deep Linear Network (DLN) as a foundation for a thermodynamic description of the learning process. The main tools are the use of group actions to analyze overparametrization and the use of Riemannian submersion from the space of parameters to the space of observables. The foliation of the balanced manifold in the parameter space by group orbits is used to define and compute a Boltzmann entropy. We also show that the Riemannian geometry on the space of observables defined in [2] is obtained by Riemannian submersion of the balanced manifold. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.

2509.06503 2026-05-22 cs.AI q-bio.QM

An AI system to help scientists write expert-level empirical software

Eser Aygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston, Anton Kast, Cory Y. McLean, Peter Norgaard, Zahra Shamsi, David Smalling, James Thompson, Subhashini Venugopalan, Brian P. Williams, Chujun He, Sarah Martinson, Martyna Plomecka, Lai Wei, Yuchen Zhou, Qian-Ze Zhu, Matthew Abraham, Erica Brand, Anna Bulanova, Jeffrey A. Cardille, Chris Co, Scott Ellsworth, Grace Joseph, Malcolm Kane, Ryan Krueger, Johan Kartiwa, Dan Liebling, Jan-Matthis Lueckmann, Paul Raccuglia, Xuefei Wang, Katherine Chou, James Manyika, Yossi Matias, John C. Platt, Lizzie Dorfman, Shibl Mourad, Michael P. Brenner

Affiliations * Google DeepMind; Google Research; Google Platforms and Devices; Massachusetts Institute of Technology; School of Engineering and Applied Sciences, Harvard University

AI summary This paper presents Empirical Research Assistance (ERA), a system that uses a large language model and tree search to automatically create high-quality scientific software, speeding up the development of computational experiments and improving research efficiency.

Comments 78 pages, 31 figures, 22 tables

Abstract

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. ERA achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a diverse range of tasks. In bioinformatics, ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, ERA generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. ERA also produced expert-level software for geospatial analysis, neural activity prediction in zebrafish, and numerical solution of integrals, and a novel rule-based construction for time series forecasting. By devising and implementing novel solutions to diverse tasks, ERA represents a significant step towards accelerating scientific progress.

2508.03865 2026-05-22 cs.CL

An Entity Linking Agent for Question Answering

Yajie Luo, Yihong Wu, Muzhi Li, Jia Ao Sun, Xinyu Wang, Liheng Ma, Yingxue Zhang, Jian-Yun Nie

Affiliations * The Chinese University of Hong Kong; McGill University; Mila - Quebec AI Institute; Huawei Noah’s Ark Lab

AI summary This paper proposes an LLM-based entity linking agent that addresses entity linking for short, ambiguous user questions in question answering, and validates its effectiveness in two experiments.

Comments 12 pages, 2 figures

Abstract

Some Question Answering (QA) systems rely on knowledge bases (KBs) to provide accurate answers. Entity Linking (EL) plays a critical role in linking natural language mentions to KB entries. However, most existing EL methods are designed for long contexts and do not perform well on short, ambiguous user questions in QA tasks. We propose an entity linking agent for QA, based on a Large Language Model that simulates human cognitive workflows. The agent actively identifies entity mentions, retrieves candidate entities, and makes decisions. To verify the effectiveness of our agent, we conduct two experiments: tool-based entity linking and QA task evaluation. The results confirm the robustness and effectiveness of our agent.

2507.20268 2026-05-22 cs.LG eess.SP stat.ML

Reliable Wireless Indoor Localization via Cross-Validated Prediction-Powered Calibration

Seonghoon Yoo, Houssem Sifaou, Sangwoo Park, Joonhyuk Kang, Osvaldo Simeone

Affiliations * School of Electrical Engineering, Korea Advanced Institute of Science and Technology; King’s Communications, Learning & Information Processing (KCLIP) Lab, Centre for Intelligent Information Processing Systems (CIIPS), Department of Engineering, King’s College London; Institute for Intelligent Networked Systems, Northeastern University London

AI summary This paper proposes a method that uses limited calibration data to simultaneously fine-tune a predictor and estimate the bias of synthetic labels, improving the reliability of wireless indoor localization through cross-validated prediction-powered calibration.

Abstract

Wireless indoor localization using predictive models with received signal strength information (RSSI) requires proper calibration for reliable position estimates. One remedy is to employ synthetic labels produced by a (generally different) predictive model. But fine-tuning an additional predictor, as well as estimating residual bias of the synthetic labels, demands additional data, aggravating calibration data scarcity in wireless environments. This letter proposes an approach that efficiently uses limited calibration data to simultaneously fine-tune a predictor and estimate the bias of synthetic labels, yielding prediction sets with rigorous coverage guarantees. Experiments on a fingerprinting dataset validate the effectiveness of the proposed method.

2507.17640 2026-05-22 cs.CV

Not All Starting Points Are Equal: Pre-trained Priors and Their Outsized Impact on Person Identification

Thomas M. Metz, Matthew Q. Hill, Alice J. O'Toole

Affiliations * School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas, USA

AI summary This paper studies how pre-training methods affect person identification (re-id) tasks, finds that pre-trained weights act as a strong prior during domain adaptation, and shows that simple domain adaptation of large vision foundation models achieves SOTA results.

Abstract

Recent years have seen an explosion of diverse general-purpose pre-training methodologies for computer vision. However, the impact that these pre-training methodologies have on person identification tasks (re-id) remains under-explored. We show that under equated domain adaptation pipelines, there is dramatic variance in person identification outcomes using different starting models (architectures and pre-trained weights). We show that a range of intuitive explanations for differing downstream performance on a range of re-id tests are insufficient and propose that pre-trained weights serve as a strong prior to the weights learned during domain adaptation. This framework allows for domain adapted solutions to be viewed as a maximum probability point estimate of the Gibbs posterior with the pre-trained weights acting as a prior. Under this framework, we show that large, pre-trained foundation models with simple domain adaptation achieve SOTA solutions on a range of re-id datasets (Market, PRCC, DeepChange, BTS) with solutions that are very close in the parameter space to the starting parameters. Moreover, we perform ablations on these solutions and show that they can be reached with small transfer sets and with varying transfer datasets but are sensitive to the choice of optimizer, weight decay, and loss function. Ultimately, we propose that the simple approach of direct fine-tuning using large vision foundation models (CLIP, Dino, EVA, AIM, etc.) needs to serve as an important baseline for future work in re-id.

2507.03674 2026-05-22 cs.CL cs.AI

STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

Tek Raj Chhetri, Yibei Chen, Puja Trivedi, Dorota Jarecka, Saif Haobsh, Patrick Ray, Lydia Ng, Satrajit S. Ghosh

Affiliations * McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Fylo Labs Inc., New York, NY, USA; Allen Institute for Brain Science, Seattle, WA, USA

AI summary This paper proposes the STRUCTSENSE framework, which integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation for robust structured information extraction, and demonstrates generalization across tasks in three domains.

Abstract

Extracting structured information from scientific literature is critical for accelerating discovery, yet Large Language Models (LLMs) often struggle in specialized domains that require expert knowledge and generalize poorly across tasks. We introduce STRUCTSENSE, a modular, task-agnostic, open-source framework that integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation for robust domain-aware extraction. We evaluate STRUCTSENSE on three tasks of increasing semantic complexity: schema-based extraction of assessment instruments (91-100% accuracy), metadata and resource extraction from scientific papers (86-93% overall), and named entity recognition (NER) from neuroscience literature (58-75% label accuracy across 8,882 entities). On two biomedical NER benchmarks (NCBI Disease and S800 Species), the system achieves ≥90% relaxed recall and 62.5-85.8% strict recall while extracting 1,000-3,600 additional entities beyond gold annotations. The local concept mapping service achieves Hits@1 of 62-82% under strict matching and 68-86% under semantic matching. These results across three domains demonstrate that STRUCTSENSE generalizes across tasks while maintaining source grounding and provenance transparency.

2506.23808 2026-05-22 cs.CV

Towards Initialization-free Calibrated Bundle Adjustment

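Carl Olsson, Amanda Nilsson

Affiliations * Lund University

AI summary This paper proposes an initialization-free calibrated SfM method that exploits known camera calibration by introducing pairwise relative rotation estimates, yielding near-metric reconstructions and improving 3D reconstruction accuracy.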

Abstract

A recent series of works has shown that initialization-free bundle adjustment (BA) can be achieved using pseudo Object Space Error (pOSE) as a surrogate objective. The initial reconstruction step optimizes an objective in which all terms are projectively invariant, so it cannot incorporate knowledge of the camera calibration. As a result, the solution is only determined up to a projective transformation of the scene, and the process requires more data for successful reconstruction. In contrast, we present a method that is able to use the known camera calibration, thereby producing near-metric solutions, that is, reconstructions that are accurate up to a similarity transformation. To achieve this, we introduce pairwise relative rotation estimates that carry information about camera calibration. These are only invariant to similarity transformations, thus encouraging solutions that preserve metric features of the real scene. Our method can be seen as integrating rotation averaging into the pOSE framework, striving towards initialization-free calibrated SfM. Our experimental evaluation shows that we are able to reliably optimize our objective, achieving convergence to the global minimum with high probability from random starting solutions, resulting in accurate near-metric reconstructions.

2506.19500 2026-05-22 cs.AI cs.CL cs.LG

NaviAgent: Graph-Driven Bilevel Planning for Scalable Tool Orchestration

Yan Jiang, Hao Zhou, Lizhong GU, Tianlong Li, Ruinan Jin, Wanqi Zhou, Ai Han

Affiliations * Department of Electrical and Computer Engineering, The Ohio State University, USA

AI summary This paper proposes NaviAgent, a graph-driven bilevel planning framework that decouples task planning from tool execution to improve the scalability and robustness of large-scale tool orchestration; experiments show strong task success rates and real-world performance.

Comments Accepted to ICML 2026

Journal ref Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026

Abstract

Large Language Models (LLMs) increasingly act as function-call agents that invoke external tools to tackle tasks beyond their static knowledge. However, they typically invoke tools one at a time without a global view of task structure. As tools often depend on one another, this leads to error accumulation and poor scalability, particularly when scaling to hundreds or thousands of tools. To address these limitations, we propose NaviAgent, an explicit bilevel architecture that decouples task planning from tool execution through graph-based modeling of tool relations. At the planning level, the LLM-based agent decides whether to respond directly, clarify intent, or retrieve and execute a toolchain independent of inter-tool complexity. At the execution level, a Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools, steering the agent to compose scalable and robust invocation sequences. Incorporating feedback from real tool interactions, NaviAgent achieves closed-loop alignment between planning and execution, enabling adaptive navigation in large-scale tool ecosystems. Evaluations on API-Bank and ToolBench show consistent improvements in task success rate (TSR), with TWNM yielding an average gain of 13.1 points on complex tasks. Further tests on 50 real APIs across 7 domains show consistent gains of 4.3-12.0 points, with fewer steps and lower latency, demonstrating robust generalization under real-world dynamics.

2506.16659 2026-05-22 cs.LG cs.AI math.OC

Memory-Efficient LLM Pretraining via Minimalist Optimizer Design

Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong

Affiliations * Department of Electrical and Computer Engineering, University of Minnesota, USA; School of Mathematics and Statistics, University of Sydney, Australia

AI summary This paper studies how simple optimizer design changes can bring SGD to state-of-the-art pretraining performance and proposes the SCALE optimizer, which uses less memory than Adam and outperforms existing memory-efficient optimizers across multiple models.

Comments Accepted at ICML 2026

Abstract

Training large language models (LLMs) relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory to maintain first- and second-order moments than SGD. While recent works such as GaLore, Fira and APOLLO have proposed state-compressed memory-efficient variants, a fundamental question remains: What are the minimum modifications to plain SGD needed to match state-of-the-art pretraining performance? We systematically investigate this question using a bottom-up approach, and identify two simple yet highly (memory- and compute-) efficient techniques: (1) column-wise gradient normalization (normalizing the gradient along the output dimension), which boosts SGD performance without momentum; and (2) applying first-order momentum only to the output layer, where gradient variance is highest. Combining these two techniques leads to SCALE (Stochastic Column-normAlized Last-layer momEntum), a simple optimizer for memory-efficient pretraining. Across multiple models (60M-1B), SCALE matches or exceeds the performance of Adam while using only 35-45% of the total memory. It also consistently outperforms memory-efficient optimizers such as GaLore, Fira and APOLLO, making it a strong candidate for large-scale pretraining under memory constraints. For LLaMA 7B, SCALE outperforms the state-of-the-art memory-efficient methods APOLLO and Muon in both perplexity and memory consumption.
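
The two techniques in the abstract can be sketched in plain Python. This is an illustrative reading, not the authors' implementation. It assumes each gradient is a 2-D matrix stored as a list of rows, that a "column" is the second index (which axis corresponds to the output dimension depends on the weight layout), and that momentum, where used, is an exponential moving average that is normalized before the step. The names `column_normalize` and `scale_like_step` and the default values of `beta` and `eps` are hypothetical.

```python
import math


def column_normalize(grad, eps=1e-8):
    """Divide each column of a 2-D gradient (list of rows) by its L2 norm."""
    n_cols = len(grad[0])
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in grad)) + eps
        for j in range(n_cols)
    ]
    return [[row[j] / norms[j] for j in range(n_cols)] for row in grad]


def scale_like_step(weight, grad, lr, momentum_buf=None, beta=0.9):
    """One update with column-normalized SGD.

    Pass momentum_buf=None for layers without momentum, and a zero matrix
    for the last (output) layer, which is the only layer keeping state.
    Returns (new_weight, new_momentum_buf).
    """
    if momentum_buf is not None:
        # Assumption: momentum is an EMA of raw gradients, normalized afterwards.
        momentum_buf = [
            [beta * m + (1 - beta) * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(momentum_buf, grad)
        ]
        direction = column_normalize(momentum_buf)
    else:
        direction = column_normalize(grad)
    new_weight = [
        [w - lr * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, direction)
    ]
    return new_weight, momentum_buf


grad = [[3.0, 0.0], [4.0, 2.0]]
weight = [[1.0, 1.0], [1.0, 1.0]]
hidden_w, hidden_buf = scale_like_step(weight, grad, lr=0.1)
last_w, last_buf = scale_like_step(
    weight, grad, lr=0.1, momentum_buf=[[0.0, 0.0], [0.0, 0.0]]
)
```

Only the last layer carries a momentum buffer in this sketch, which is where the memory saving relative to Adam would come from: every other layer keeps no optimizer state at all.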

2506.14648 2026-05-22 cs.RO cs.AI

SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

Hexian Ni, Tao Lu, Haoyuan Hu, Yinghao Cai, Shuo Wang

Affiliations * State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

AI summary This paper proposes SENIOR, which uses efficient query selection and preference-guided exploration to improve human feedback efficiency and speed up policy learning, addressing the poor feedback and sample efficiency of preference-based reinforcement learning.

Comments 8 pages, 8 figures, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

Abstract

Preference-based Reinforcement Learning (PbRL) methods avoid reward engineering by learning reward models from human preferences. However, poor feedback efficiency and sample efficiency remain problems that hinder the application of PbRL. In this paper, we present a novel efficient query selection and preference-guided exploration method, called SENIOR, which selects meaningful and easy-to-compare behavior segment pairs to improve human feedback efficiency and accelerates policy learning with designed preference-guided intrinsic rewards. Our key idea is twofold: (1) We design a Motion-Distinction-based Selection scheme (MDS). It selects segment pairs with apparent motion and different directions through kernel density estimation of states, which are more task-related and easier for humans to label with preferences; (2) We propose a novel preference-guided exploration method (PGE). It encourages exploration towards states with high preference and low visit counts and continuously guides the agent towards valuable samples. The synergy between the two mechanisms can significantly accelerate reward and policy learning. Our experiments show that SENIOR outperforms five other existing methods in both human feedback efficiency and policy convergence speed on six complex robot manipulation tasks in simulation and four in the real world. Videos can be found on our project website: https://2025senior.github.io/

2503.21821 2026-05-22 cs.AI

PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving

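Kaiyue Feng, Yilun Zhao, Yixin Liu, Tianyu Yang, Chen Zhao, John Sous, Arman Cohan

Affiliations * Yale University; New York University; Notre Dame University

AI summary This paper introduces the PHYSICS benchmark for evaluating university-level physics problem solving; it contains 1297 expert-annotated problems across six core areas, and its automated evaluation system reveals substantial limitations in leading foundation models.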

Journal ref Findings of ACL 2025

Abstract

We introduce PHYSICS, a comprehensive benchmark for university-level physics problem solving. It contains 1297 expert-annotated problems covering six core areas: classical mechanics, quantum mechanics, thermodynamics and statistical mechanics, electromagnetism, atomic physics, and optics. Each problem requires advanced physics knowledge and mathematical reasoning. We develop a robust automated evaluation system for precise and reliable validation. Our evaluation of leading foundation models reveals substantial limitations. Even the most advanced model, o3-mini, achieves only 59.9% accuracy, highlighting significant challenges in solving high-level scientific problems. Through comprehensive error analysis, exploration of diverse prompting strategies, and Retrieval-Augmented Generation (RAG)-based knowledge augmentation, we identify key areas for improvement, laying the foundation for future advancements.

2503.00747 2026-05-22 cs.CV cs.RO eess.IV

LFX: Towards Unified Light Field Dense Semantic Segmentation and Salient Object Detection

Fei Teng, Lingxin Huang, Buyin Deng, Kai Luo, Boyuan Zheng, Zheng Fang, Hong Zheng, Kunyu Peng, Jiaming Zhang, Yaonan Wang, Kailun Yang

Affiliations * School of Artificial Intelligence and Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China; China Mobile Group Hunan Company Ltd., China; Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany

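AI summary This paper proposes LFX, a framework that builds a unified, representation-invariant feature modulation space so that it can adapt to multiple light field representations and different perception tasks. It achieves state-of-the-art results on three light field benchmarks and clearly outperforms representation-specific methods.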

Comments The source code will be made publicly available at https://github.com/FeiT-FeiTeng/LFX

Abstract

Light field cameras capture multi-view observations within a single exposure. However, existing studies are typically tailored to specific LF representations, leaving the field without a unified learning framework. To bridge this gap, we present LFX, the first unified framework for LF perception. LFX establishes a representation-invariant feature modulation space, enabling it to adapt to heterogeneous LF representations and diverse perception tasks. Specifically, we propose Field-of-Parallax Angular Subspace Modeling (FoP-ASM), which assigns an independent angular marker to each auxiliary view, enabling view-wise independent modeling. Meanwhile, shared manifold subspace constraints and regularization losses enforce globally consistent semantic modulation across views. Extensive evaluations across three LF benchmarks show that LFX achieves state-of-the-art results across distinct LF representations, outperforming representation-specific methods by up to 12% and 20% with 0.029/0.027 MAE for salient object detection, and achieving 84.37 mIoU for semantic segmentation. The source code will be made publicly available at https://github.com/FeiT-FeiTeng/LFX.
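
For readers unfamiliar with the two metrics quoted in the abstract above, the sketch below shows their standard textbook definitions: mean absolute error (MAE) for saliency maps and mean intersection-over-union (mIoU) for semantic segmentation. It is a plain-Python illustration on flattened pixel lists. The function names are chosen for this example, and the benchmarks' actual evaluation protocols (for instance ignored labels or dataset-level accumulation) are not stated in the abstract and are not modeled here.

```python
def mean_absolute_error(pred, target):
    """MAE between two equally sized flat lists of saliency values in [0, 1]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_iou(pred_labels, true_labels, num_classes):
    """Mean IoU over the classes that occur in the prediction or ground truth."""
    if len(pred_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(pred_labels, true_labels))
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 4-pixel examples, not data from the paper.
    print(mean_absolute_error([0.0, 1.0, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0]))
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Lower MAE is better and higher mIoU is better. The abstract reports mIoU on a 0 to 100 scale (84.37), whereas this sketch returns a fraction between 0 and 1.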