arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.00821 2026-06-02 cs.LG

A Comparative Analysis of Machine Learning Algorithms for Multi-Task Prediction of the Parameters of the Pectin Hydrolysis--Extraction Process

Mullosharaf K. Arabov, Shavkat Yo. Kholov, Zainiddin K. Muhiddin

Affiliations: Institute of Computational Mathematics and Information Technologies, Kazan Federal University; Tajik Technical University named after Academician M.S. Osimi; V.I. Nikitin Institute of Chemistry, National Academy of Sciences of Tajikistan

AI summary: This study compares 11 machine learning algorithms for multi-task regression of pectin hydrolysis-extraction process parameters. CatBoost performs best (average R² of about 0.946), and feature importance analysis shows that raw material type dominates (63.6%).

Comments Preprint

English abstract

This study addresses the challenge of controlling a complex, multi-parameter technological process -- pectin hydrolysis--extraction -- using machine learning methods. The experimental foundation is a unique database comprising 1,000 laboratory experiments conducted under controlled conditions on seven types of plant raw material with four variable process factors (temperature 85--130 C, pressure 0.9--2.2 atm, holding time 3--10 min, pH 1.5--2.0). Four output characteristics were recorded: pectin yield, galacturonic acid content, molecular weight, and degree of esterification. To solve the multi-task regression problem, 11 algorithms were trained and compared: regularised linear models, ensemble methods (Random Forest, Gradient Boosting, XGBoost, CatBoost, Extra Trees), k-nearest neighbours, support vector regression, and a multilayer perceptron. The best results were demonstrated by CatBoost (average R-squared approximately 0.946 after hyperparameter optimisation). Feature importance analysis revealed the dominant role of the raw material type (63.6% of total importance), followed by temperature and holding time. The developed pipeline was exported in a production-ready format and deployed as an interactive web interface. The findings demonstrate that ensemble methods combined with rigorous statistical analysis and interpretable AI significantly reduce the need for physical experiments and form the basis for intelligent pectin production control.
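The "average R-squared" figure can be read as the mean of the per-target coefficients of determination. The sketch below shows that calculation in plain Python; the target names and toy numbers are illustrative assumptions, not data from the paper, and the paper's exact averaging procedure is not stated in the abstract.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination for a single target."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_r2(targets, predictions):
    """Average R^2 over tasks; both arguments map task name -> list of values."""
    scores = {name: r2_score(targets[name], predictions[name]) for name in targets}
    return sum(scores.values()) / len(scores), scores


# Toy values (hypothetical): two of the four output characteristics.
targets = {
    "pectin_yield": [10.0, 12.0, 14.0, 16.0],
    "degree_of_esterification": [60.0, 62.0, 64.0, 66.0],
}
predictions = {
    "pectin_yield": [10.0, 12.0, 14.0, 16.0],
    "degree_of_esterification": [61.0, 61.0, 65.0, 65.0],
}
average, per_task = mean_r2(targets, predictions)
```

Averaging per-target scores keeps targets with very different units (yield versus molecular weight) on an equal footing, which a pooled error would not.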

2606.00820 2026-06-02 cs.CL

Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate

Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao, Ruiqi Xu, Shuyuan Zheng, Jianbin Qin

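Affiliations: Beijing Institute of Technology; University of Osaka; Shenzhen University

AI summary: A three-source decomposition framework splits answer convergence in multi-agent debate into spontaneous instability, stance-induced conformity, and reasoning-induced persuasion. It shows that conformity is predominantly harmful and can be reduced through intervention.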

English abstract

Multi-agent debate (MAD) is a promising strategy for improving LLM reasoning, but when agents converge on a shared answer, it is unclear whether that convergence reflects genuine deliberation or social compliance. We show that the conventional answer flip rate conflates three distinct mechanisms: spontaneous instability, stance-induced conformity, and reasoning-induced persuasion. Our three-source decomposition framework isolates each through controlled counterfactual conditions. In the primary MMLU-Pro setting, 37% of agent-question observations change under self-reflection alone, while robustness tests show substantial model-dependent instability across GPQA-Diamond and three model families; strict conformity is 29% in the primary setting and remains predominantly harmful across model replications (57-77% correct-to-wrong). A controlled information-gradient experiment reveals that even vacuous reasoning is associated with 20-39% error adoption among resistant agents, with reasoning-like presentation carrying substantial persuasive weight. Harmful conformity can be predicted from Round 0 features (AUC = 0.79), and risk-targeted intervention reduces it by 13.6 percentage points (p < 0.001). However, without correctness labels or self-reflection controls, reducing peer adoption does not improve accuracy, because harmful and beneficial influence cannot be distinguished.
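One way to picture the decomposition is to compare flip rates across counterfactual conditions for the same agent-question observations. The sketch below is an assumption-laden illustration: the condition names (`self_reflect`, `stance_only`, `with_reasoning`) and the record layout are hypothetical, and the paper's exact definition of strict conformity is not given in the abstract.

```python
def flip_rate(records, condition):
    """Fraction of observations whose answer under `condition` differs from the initial answer."""
    flips = [r for r in records if r[condition] != r["initial"]]
    return len(flips) / len(records)


def harmful_share(records, condition):
    """Among flips under `condition`, the fraction that go from correct to wrong."""
    flips = [r for r in records if r[condition] != r["initial"]]
    harmful = [r for r in flips if r["initial"] == r["gold"] and r[condition] != r["gold"]]
    return len(harmful) / len(flips) if flips else 0.0


# Toy agent-question observations (hypothetical).
records = [
    {"gold": "A", "initial": "A", "self_reflect": "A", "stance_only": "B", "with_reasoning": "B"},
    {"gold": "A", "initial": "B", "self_reflect": "A", "stance_only": "A", "with_reasoning": "A"},
    {"gold": "C", "initial": "C", "self_reflect": "C", "stance_only": "C", "with_reasoning": "D"},
    {"gold": "B", "initial": "B", "self_reflect": "B", "stance_only": "B", "with_reasoning": "B"},
]
```

Comparing `flip_rate(records, "stance_only")` with `flip_rate(records, "self_reflect")` separates flips that happen anyway from flips that need peer exposure, which is the distinction a raw flip rate hides.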

2606.00819 2026-06-02 cs.AI

Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping

Hanze Li, Jinhao You, Yichen Guo, Kai Tang, Shuangyang Xie, Xiande Huang

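Affiliations: De Artificial Intelligence Lab

AI summary: This paper proposes DeLask, a decoding framework that dynamically skips hallucination-prone decoder layers. It uses the conditional equivalence between a Transformer forward pass and gradient descent to detect and suppress erroneous signals, thereby mitigating LLM hallucinations and improving reliability.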

Comments 5 pages

English abstract

Large Language Models (LLMs) have achieved strong performance across diverse natural language tasks, yet their outputs often suffer from hallucinations -- content that is misaligned with factual information. In this work, we conduct a comprehensive layer-wise analysis of the decoding process and reveal that hallucinations tend to originate from deeper decoder layers. To address this issue, we introduce DeLask (Decoder Layer Skipping), a novel decoding framework that dynamically skips layers prone to producing hallucinations. DeLask leverages the theoretical insight that the forward computation of an $L$-layer Transformer is conditionally equivalent to $L$ steps of gradient descent. We define a driftance value by computing the cosine similarity between gradients derived from consecutive decoder steps, identifying problematic layers when the descent direction reverses. Rather than discarding such layers entirely, DeLask partially aggregates their hidden states with preceding layers, thereby preserving consistency while suppressing erroneous signals. Extensive experiments across diverse LLMs and benchmarks demonstrate that DeLask consistently mitigates hallucinations and enhances overall reliability, providing a lightweight and generalizable decoding framework for improving the robustness of large-scale language models.
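The layer-skipping idea can be sketched on recorded hidden states. This is a simplified illustration under stated assumptions, not the authors' implementation: each layer update `h_l - h_(l-1)` stands in for a gradient step, a negative cosine similarity between consecutive updates marks a reversal, and the mixing weight `alpha` is a hypothetical parameter. A real decoder would also recompute later layers from the modified state, which this sketch does not do.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soften_drifting_layers(hidden_states, alpha=0.5):
    """hidden_states: per-layer vectors h_0..h_L (at least two).

    Flags layer l when its update reverses direction relative to the previous
    update, and blends its state with the preceding (possibly blended) state.
    """
    out = [hidden_states[0], hidden_states[1]]
    flagged = []
    for l in range(2, len(hidden_states)):
        prev_step = [a - b for a, b in zip(hidden_states[l - 1], hidden_states[l - 2])]
        step = [a - b for a, b in zip(hidden_states[l], hidden_states[l - 1])]
        if cosine(prev_step, step) < 0:
            flagged.append(l)
            out.append([alpha * h + (1 - alpha) * p for h, p in zip(hidden_states[l], out[-1])])
        else:
            out.append(hidden_states[l])
    return out, flagged
```

Blending rather than dropping the flagged layer keeps part of its signal, which mirrors the abstract's point about partial aggregation.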

2606.00815 2026-06-02 cs.LG

OmniEEG-Bench: A Standardized Evaluation Benchmark for EEG Foundation Models

Ziling Lu, Zongsheng Li, Xinke Shen, Kexin Lou, Yingyue Xin, Xiaoqi Chen, Shinan Wang, Xiang Chen, Jiahao Fan, Chenyu Huang, Xin Xu, Zhoujie Hou, Chen Wei, Quanying Liu

Affiliations: Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, China; School of Computer Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; Omni-Intelligence, Shenzhen, China; Shenzhen Loop Area Institute, Shenzhen, China

AI summary: To address fragmented evaluation of EEG foundation models, the paper proposes OmniEEG-Bench, a unified benchmark covering six task families and 54 datasets, and reports scaling-law behavior linking pretraining data diversity and model size to performance.

Comments 28 pages, 13 figures, 8 tables; benchmark of EEG foundation models

English abstract

Electroencephalography (EEG) supports a variety of brain-computer interface (BCI) tasks ranging from brain-state monitoring to human-LLM interactions. EEG foundation models are emerging, but evaluation remains fragmented due to heterogeneous datasets and inconsistent task protocols. Here, we introduce OmniEEG-Bench, a unified benchmark and downstream task roadmap for EEG foundation models (FMs). It organizes evaluation of EEG FMs into six task families spanning (i) signal reliability, (ii) biometrics and disease, (iii) consciousness and state, (iv) cognition and emotion, (v) naturalistic stimulus decoding, and (vi) motor and interaction, introducing a new generation of tasks not systematically benchmarked in prior EEG FM work. OmniEEG-Bench standardizes model deployment, task definitions, and metrics through a task-card specification, and unifies 54 EEG datasets with consistent evaluation protocols. We benchmark 10 representative EEG foundation models and report a leaderboard that covers diverse evaluation settings. Both pretraining dataset diversity and model size are significantly associated with better average ranks across datasets, revealing scaling-law behavior in EEG foundation models (Figure 1). These results suggest that scaling EEG foundation models requires not only larger architectures but also broader and more diverse pretraining data. The benchmark code is available at https://github.com/ncclab-sustech/omni-eegbench.git.
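The "average rank across datasets" statistic can be computed as follows. This is a minimal sketch with hypothetical model and dataset names; it assumes higher scores are better, every model is scored on every dataset, and ties are broken arbitrarily, none of which is specified in the abstract.

```python
def average_ranks(scores):
    """scores[dataset][model] = metric value. Returns model -> mean rank (1 = best)."""
    totals = {}
    for per_model in scores.values():
        ordered = sorted(per_model, key=per_model.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    return {model: total / len(scores) for model, total in totals.items()}


# Toy leaderboard (hypothetical names and numbers).
scores = {
    "dataset_1": {"model_a": 0.9, "model_b": 0.8, "model_c": 0.7},
    "dataset_2": {"model_a": 0.6, "model_b": 0.7, "model_c": 0.5},
}
ranks = average_ranks(scores)
```

Ranks are used instead of raw metrics because datasets with different metrics and difficulty levels cannot be averaged directly.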

2606.00808 2026-06-02 cs.LG

Safe-Subspace Pseudo-Label Refinement for Source-Free Graph Domain Adaptation

Yingxu Wang, Xinwang Liu, Siyang Gao, Nan Yin

Affiliations: Department of Computer Science and Engineering, Chinese University of Hong Kong; College of Computer, National University of Defense Technology; Department of Data Science, City University of Hong Kong; The Education University of Hong Kong

AI summary: To address unreliable pseudo-labels in source-free graph domain adaptation, the paper proposes SafeSubspace Pseudo-Label Refinement, which identifies a confidence-consistent safe subspace and verifies pseudo-labels with semantic and structural evidence for robust graph domain adaptation.

English abstract

Source-free graph domain adaptation (SF-GDA) aims to adapt source-trained graph models to unlabeled target graphs when source graphs are no longer accessible. A central obstacle is pseudo-label reliability: under feature and topological shifts, source-induced predictions may become confidently wrong, and indiscriminate self-training can amplify systematic errors through graph message passing. This paper studies SF-GDA from a selective pseudo-labeling perspective. Instead of assuming globally bounded pseudo-label noise over the entire target domain, we identify a confidence-consistent safe subspace on which pseudo-label noise can be controlled under restricted posterior discrepancy, and derive a target-risk decomposition that separates safe-subspace fitting error, selected-label noise, and uncertain-set risk. Guided by this analysis, we propose SafeSubspace Pseudo-Label Refinement (S$^2$PLR), a source-free graph adaptation framework that applies hard pseudo-label supervision only to target graphs supported by both semantic and structural evidence. Specifically, S$^2$PLR estimates semantic reliability using source-committee confidence and disagreement, learns a target-intrinsic structural representation via graph contrastive learning, verifies pseudo-labels through neighborhood consistency, and exploits the remaining uncertain samples with noise-tolerant soft regularization rather than unreliable hard labels. Experiments on image and real-world graph benchmarks under different domain shifts demonstrate that S$^2$PLR achieves robust and competitive performance across diverse source-free transfer settings.
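The selection step can be sketched as a two-stage filter: committee confidence and agreement first, then neighborhood consistency. The thresholds, the unanimity rule, and the neighbor lists below are illustrative assumptions; the abstract does not give the actual criteria, and the neighbors would come from the learned structural representation.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def select_safe_pseudo_labels(committee_probs, neighbors, min_conf=0.8, min_agree=0.5):
    """committee_probs[i]: one class-probability list per committee member for sample i.
    neighbors[i]: indices of sample i's nearest neighbors (hypothetical input).
    Returns (safe index -> hard pseudo-label, list of uncertain indices).
    """
    labels, semantic_ok = [], []
    for member_probs in committee_probs:
        n_classes = len(member_probs[0])
        mean = [sum(p[c] for p in member_probs) / len(member_probs) for c in range(n_classes)]
        label = argmax(mean)
        unanimous = all(argmax(p) == label for p in member_probs)
        labels.append(label)
        semantic_ok.append(mean[label] >= min_conf and unanimous)

    safe, uncertain = {}, []
    for i, label in enumerate(labels):
        nbrs = neighbors[i]
        agree = sum(labels[j] == label for j in nbrs) / len(nbrs) if nbrs else 0.0
        if semantic_ok[i] and agree >= min_agree:
            safe[i] = label
        else:
            uncertain.append(i)
    return safe, uncertain
```

Samples in the `uncertain` list would receive a soft regularizer rather than a hard label, which is the part of the method this sketch leaves out.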

2606.00798 2026-06-02 cs.CV cs.AI cs.LG

DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models

Abdullah Al Shafi, Kazi Saeed Alam, Sk Imran Hossain, Engelbert Mephu Nguifo

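Affiliations: Khulna University of Engineering & Technology; University Clermont Auvergne

AI summary: When class-conditional diffusion models are compressed, an unsupervised unconditional score branch can render guidance ineffective. DASH, a dual-branch distillation framework, supervises both branches independently and adds anchor regularization and curriculum transfer, keeping FID and guidance fidelity close to the teacher at 5.9x compression.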

Comments 14 pages, 7 figures, 4 tables; appendix with additional ablations and qualitative results

English abstract

Parameter compression of class-conditional diffusion models reveals an underexplored limitation in output-level distillation: the unconditional score branch remains unsupervised, leaving the classifier-free guidance gap underdetermined in the student. This gap, amplified at every denoising step, admits degenerate solutions where both branches collapse toward identical predictions, rendering guidance ineffective despite low output-level training loss. This paper introduces DASH, a dual-branch distillation framework that independently supervises both score branches, uniquely specifying target branch outputs for each training sample through independent branch constraints, with an anchor term regularising conditional predictions toward ground-truth noise. The framework further introduces TIRT Transfer, which copies the teacher's converged per-timestep importance curriculum into the student as a frozen prior, eliminating the need to relearn it within limited distillation budgets. Experiments on CIFAR-10 and CIFAR-100 demonstrate that 5.9x compression maintains quality within 4 FID points of the teacher at 50-step DDIM sampling, considerably outperforming training from scratch with guidance fidelity well preserved. Ablation studies confirm that unconditional supervision is the dominant contribution, accounting for over 60% of total distillation gain. Curriculum transfer and anchor regularisation provide complementary benefit, together validating dual-branch constraints as empirically essential for guidance-preserving compression.
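The collapse described above is easiest to see in the standard classifier-free guidance formula, written here in one common convention where $w = 1$ recovers the conditional prediction. The loss that follows is only a plausible reading of "dual-branch supervision with an anchor term"; the weights $\lambda_u$ and $\lambda_a$ and the squared-error form are assumptions, not taken from the paper.

```latex
% Classifier-free guidance: extrapolate from the unconditional to the conditional branch.
\hat{\epsilon}_w(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + w \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \bigr)

% If the student's branches collapse, \epsilon_S(x_t, c) = \epsilon_S(x_t, \varnothing),
% the guidance term vanishes for every w.

% Hypothetical dual-branch distillation objective (student S, teacher T, true noise \epsilon):
\mathcal{L} = \bigl\| \epsilon_S(x_t, c) - \epsilon_T(x_t, c) \bigr\|^2
  + \lambda_u \bigl\| \epsilon_S(x_t, \varnothing) - \epsilon_T(x_t, \varnothing) \bigr\|^2
  + \lambda_a \bigl\| \epsilon_S(x_t, c) - \epsilon \bigr\|^2
```

Supervising only the first term leaves the difference between the two student branches unconstrained, which is why a low output-level loss can coexist with ineffective guidance.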

2606.00795 2026-06-02 cs.LG cs.AI

Extending Causal Metamodeling to a non-Markovian Queue

Pracheta Amaranath, Anant Bhide, David Jensen, Peter Haas

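Affiliations: Manning College of Information and Computer Sciences, University of Massachusetts Amherst

AI summary: The paper extends modular dynamic Bayesian network (MDBN) causal metamodeling from Markovian systems to non-Markovian queues by approximating non-exponential distributions with phase-type distributions. It addresses the choice of the number of phases, parameter learning, and the sampling interval; experiments on a G/M/1 queue show orders-of-magnitude inference speedups.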

Comments 12 pages

English abstract

Metamodels for discrete-event simulations approximate the behavior of simulation models without running expensive simulations. Prior work introduced modular dynamic Bayesian networks (MDBNs) -- a class of metamodels that can estimate a range of probabilistic and causal queries (PCQs) using a single, trained model -- but the method was limited to Markovian systems. In this paper, we initiate an extension of MDBNs to non-Markovian queues by approximating non-exponential distributions using phase-type distributions. This approach raises novel challenges, including balancing metamodeling accuracy and tractability when choosing the number of phases, efficiently learning metamodel parameters, and choosing the sampling interval that is used to approximate a continuous-time simulation by a discrete-time MDBN. We provide preliminary solutions to these challenges, yielding the first causal metamodeling technique for non-Markovian systems. Experiments on a G/M/1 queue demonstrate that the MDBN can produce accurate answers to PCQs with orders-of-magnitude speedup of inference times relative to direct simulation.
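A simple instance of a phase-type approximation is an Erlang-k fit by moment matching, which also shows the accuracy-versus-tractability trade-off in the number of phases. This is a textbook technique offered as an illustration, not the fitting procedure used in the paper; it only covers distributions whose squared coefficient of variation (variance divided by squared mean) is at most 1.

```python
import math


def fit_erlang(mean, scv):
    """Fit an Erlang-k distribution to a target mean and squared coefficient of variation.

    An Erlang-k with per-phase rate `rate` has mean k / rate and scv 1 / k, so the
    mean is matched exactly and the scv is matched from below (1 / k <= scv).
    """
    if not 0 < scv <= 1:
        raise ValueError("Erlang fitting needs 0 < scv <= 1")
    k = math.ceil(1.0 / scv)
    rate = k / mean
    return k, rate


def phase_completion_prob(rate, dt):
    """Probability that one exponential phase completes within a sampling interval dt."""
    return 1.0 - math.exp(-rate * dt)
```

A low-variance distribution needs many phases (for example `scv = 0.05` gives 20), and each phase adds states to the discrete-time model, which is the tractability cost the abstract refers to.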

2606.00784 2026-06-02 cs.CV

DINO-GFSA: Geo-Localization via Semantic Gated Fusion and Mamba-based Sequential Aggregation

Beier Hu, Yuanshen Guo, Jialu Cai, Chengwei Li, Yong Wang, Shunan Wu, Zhigang Wu

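Affiliations: School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China

AI summary: Proposes DINO-GFSA, which combines a LoRA-adapted DINOv3 backbone, a Semantic Gated Residual Fusion module, and a Mamba-based Sequential Aggregation Head to achieve state-of-the-art performance in UAV cross-view geo-localization.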

English abstract

Cross-view geo-localization (CVGL) is critical for Unmanned Aerial Vehicle (UAV) self-positioning and target localization in GNSS-denied environments. However, acquiring robust semantics while preserving fine-grained spatial details remains challenging. To address this, we propose DINO-GFSA, a framework leveraging a LoRA (Low-Rank Adaptation) adapted DINOv3 (ViTL) backbone for parameter-efficient, high-capacity representation. Crucially, we introduce a Semantic Gated Residual Fusion module, which utilizes high-level semantics to selectively calibrate and integrate low-level spatial cues, effectively bridging the semantic gap. Furthermore, a Mamba-based Sequential Aggregation Head is designed to capture long-range spatial dependencies with linear complexity. Experiments demonstrate state-of-the-art performance on University-1652 and DenseUAV benchmarks, notably surpassing the previous best on DenseUAV by 3.48% on Recall@1. These results validate DINO-GFSA as a generalized, robust solution for UAV CVGL.

2606.00782 2026-06-02 cs.CV

FlowOVD: Learning Generative Latent Flows for Zero-shot Open-vocabulary Detection

Yao Wei, Andrea Cavallaro, Changjae Oh

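Affiliations: Queen Mary University of London; EPFL

AI summary: Proposes FlowOVD, a text-conditioned query generation framework based on rectified flow that brings continuous latent query dynamics to open-vocabulary detection, reaching 49.5 AP on COCO and 31.5 AP on LVIS and outperforming GroundingDINO.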

English abstract

Open-vocabulary object detection (OVD) has achieved remarkable progress through large-scale vision-language pre-training. Existing methods, however, typically formulate OVD as a discriminative prediction problem, where decoder queries are either static or initialized from encoder features, thus limiting their diversity and flexibility. In this paper, we introduce a generative perspective by modeling decoder query generation as a continuous transport process in latent space. We propose FlowOVD, a text-conditioned query generation framework based on rectified flow that progressively transforms text-agnostic queries into text-guided queries. By introducing continuous latent query dynamics into a vision-language model (VLM) based detector, our method avoids heuristic discrete query construction and enables more expressive semantic alignment for open-vocabulary detection. Without requiring additional training data, FlowOVD achieves 49.5 AP on COCO and 31.5 AP on LVIS, outperforming GroundingDINO by +1.2 AP (+2.5 %) and +4.1 AP (+15.0 %), respectively. The larger gain on the challenging long-tailed LVIS benchmark further highlights the effectiveness of continuous query generation for open-vocabulary generalization.
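The "continuous transport process" can be sketched as Euler integration of a velocity field from t = 0 to t = 1. In the sketch below the velocity is a hand-written constant field pointing from the start to the target, which is what an ideal rectified flow would learn for a single straight path; in FlowOVD the velocity would be a learned, text-conditioned network, so the function and variable names here are hypothetical.

```python
def euler_transport(x0, velocity, steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy example: move a text-agnostic query toward a text-guided target (hypothetical values).
query_start = [0.0, 1.0]
query_target = [2.0, -1.0]


def straight_line_velocity(x, t):
    """Constant velocity along the straight path from start to target."""
    return [b - a for a, b in zip(query_start, query_target)]


query_end = euler_transport(query_start, straight_line_velocity, steps=10)
```

Because the path is straight, even a single Euler step would land on the target here; curved learned fields need more steps, which is why rectified flows aim for straight trajectories.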

2606.00780 2026-06-02 cs.LG cs.AI

Behavior-Invariant Task Representation Learning with Transformer-based World Models for Offline Meta-Reinforcement Learning

Fuyuan Qian, Menglong Zhang, Song Wang, Quanying Liu

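Affiliations: National University of Singapore; University of Science and Technology of China

AI summary: Proposes a framework that combines information-theoretic task representation learning with a Transformer-based stochastic world model. By extracting behavior-invariant task variables and applying a conservative value penalty, it addresses distribution shift and sparse rewards in offline meta-reinforcement learning and achieves robust generalization.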

Comments ICML2026

English abstract

Offline meta-reinforcement learning leverages static datasets to enable agents to generalize to unseen environments by combining offline efficiency with meta-learning adaptability, yet it faces key challenges from context and policy distribution shifts. These issues hinder agents from adapting to online environments, and are further exacerbated under sparse-reward settings. As a result, agents often become trapped in an inherent pattern dilemma, failing to achieve robust generalization. In this work, we propose a novel framework that integrates information-theoretic task representation learning with a Transformer-based stochastic world model. Our approach extracts task-defining latent variables that are invariant to behavior policy, thereby effectively mitigating the context distribution shift. To further handle policy shift and model exploitation, we apply a conservative value penalty to imagination-based rollouts, preventing the policy from exploiting model inaccuracies while maintaining robust adaptation. Extensive evaluations demonstrate that our method outperforms state-of-the-art approaches, with superior stability and generalization under out-of-distribution and sparse-reward settings.

2606.00776 2026-06-02 cs.LG

Latent Diffusion Pretraining for Crystal Property Prediction

Shrimon Mukherjee, Kishalay Das, Partha Basuchowdhuri, Pawan Goyal, Niloy Ganguly

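Affiliations: University of California, Berkeley; Indian Institute of Technology, Bombay

AI summary: Proposes CrysLDNet, a latent diffusion pretraining framework that combines a variational autoencoder with a diffusion model to learn representations from unlabeled crystal structures; fine-tuning then significantly improves property prediction.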

Comments Published in ICML 2026

English abstract

Fast and accurate prediction of crystal properties is a central challenge in new materials design. Graph neural networks and Transformer-based models have emerged as powerful tools for this task due to their ability to encode the local structural environment of atoms within a crystal. However, these models are data-hungry, and in practice, labeled data for crystal properties are scarce. Pretraining-finetuning strategies, particularly those based on diffusion models, have shown promise in addressing these limitations. In this work, we introduce a novel latent diffusion based pretraining framework, CrysLDNet, designed to mitigate data scarcity. Our approach integrates a Variational Autoencoder (VAE) with a diffusion model during the pretraining stage. The VAE encoder maps 3D crystal structures into a smooth latent space within which the diffusion process is applied. This latent diffusion pretraining enables the graph encoder to effectively capture structural and chemical semantics from large-scale unlabeled data, which can then be finetuned for specific property prediction tasks. Comprehensive experiments on popular DFT datasets for property prediction reveal that CrysLDNet significantly outperforms both training-from-scratch and pretrained baselines, with improvements of 4.26% and 4.90% on the JARVIS and MP datasets, respectively. Additionally, the learned representations remain robust in sparse-data conditions and are expressive enough to correct DFT errors when finetuned with limited experimental data. Code is available at: https://github.com/shrimonmuke0202/CrysLDNet.git.

2606.00775 2026-06-02 cs.CV cs.AI

GIRL-DETR: Gradient-Isolated Reinforcement Learning for Video Moment Retrieval

Shihang Zhang, Mingjin Kuai, Ye Wei, Zhen Zhang, Wei Ji

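Affiliations: College of Electronics and Information Engineering, Sichuan University; School of Intelligence Science and Technology, Nanjing University

AI summary: To address optimization stagnation caused by the mismatch between continuous surrogate losses and non-differentiable metrics in video moment retrieval, the paper proposes GIRL-DETR, a gradient-isolated reinforcement learning framework that freezes the backbone and uses a three-stage progressive RL strategy to optimize tIoU directly, improving localization accuracy in lightweight models.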

Comments 13 pages, 6 figures. Submitted to IEEE Transactions on Image Processing (TIP). Code is available at: https://github.com/Z-Shihang/GIRL-DETR

English abstract

Video Moment Retrieval (VMR) task requires accurately localizing temporal boundaries aligned with natural language queries, but many models suffer from a misalignment between continuous surrogate losses and non-differentiable metrics, leading to optimization stagnation during the late stages of training and trapping boundary predictions in suboptimal solutions. Although Reinforcement Learning (RL) post-training successfully optimizes localization results for large models, applying it directly to lightweight networks easily disrupts the fragile feature representations established during the supervised phase. To overcome this optimization bottleneck, we propose Gradient-Isolated Reinforcement Learning for DETR (GIRL-DETR), introducing RL post-training into a lightweight temporal localization framework for the first time. The input video and text features first establish early alignment through Cross-Modal Interaction (CMI) before entering the transformer encoder. Subsequently, a Text-Guided Gating (TGG) mechanism dynamically injects semantic priors into the queries before the transformer decoder generates candidate proposals, providing high signal-to-noise ratio inputs for temporal prediction. After the supervised training reaches convergence, the backbone network is frozen to protect the feature manifold, while the detection head directly optimizes the non-differentiable evaluation metric tIoU to enhance localization accuracy through a Three-stage Progressive Reinforcement Learning (TPRL) strategy. This approach achieves an orthogonal decoupling of state representation and metric optimization. Experiments on Charades-STA, QVHighlights, and TACoS demonstrate that GIRL-DETR effectively resolves surrogate loss degradation and achieves substantial accuracy improvements with minimal parameter updates, providing a robust new pathway for RL applications in lightweight VMR models.
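The tIoU metric that the RL stage optimizes is the temporal intersection over union of a predicted and a ground-truth segment. The sketch below computes it for `(start, end)` pairs in seconds; how the paper turns this value into a reward for the detection head is not described in the abstract.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments; 0.0 when they do not overlap."""
    (pred_start, pred_end), (gt_start, gt_end) = pred, gt
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0
```

The `max` and `min` calls make the metric piecewise and flat (zero gradient) for non-overlapping segments, which is one reason smooth surrogate losses are normally trained instead and why a reward-based stage can optimize it directly.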

2606.00773 2026-06-02 cs.RO

SafeVLA-Bench: A Benchmark for the Success-Safety Gap in Vision-Language-Action Models

Jialiang Fan, Weizhe Xu, Oleg Sokolsky, Insup Lee, Fanxin Kong

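Affiliations: University of Notre Dame; University of Pennsylvania

AI summary: Proposes SafeVLA-Bench, a post-hoc safety-evaluation framework based on Signal Temporal Logic that quantifies safety violations committed by VLA policies while completing tasks, exposing the gap between success and safety.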

Comments 27 pages, 5 figures

English abstract

Vision-language-action (VLA) benchmarks measure whether a policy completes a requested manipulation task, but binary success can hide safety-relevant trajectory behavior: reaching the goal while applying excessive contact, disturbing bystander objects, destabilizing the held object, or entering robot self-contact. We present SafeVLA-Bench, a post-hoc safety-evaluation framework for existing simulator-based VLA benchmarks. It formalizes task-aware safety requirements as Signal Temporal Logic (STL) specifications and reports native success with two unsafe-success metrics: Succ-But-Unsafe (SBU), the fraction of rollouts that both succeed and violate safety, and Violation Severity Index (VSI), a bounded worst-violation depth score. We instantiate SafeVLA-Bench on LIBERO and RoboCasa-365, evaluating nine policy-benchmark entries across tabletop and kitchen manipulation tasks. High task success does not imply safe execution: high-SR tabletop baselines still leave 13 to 15 percent unsafe-episode rates, and 36 to 56 percent of successful RoboCasa-365 rollouts violate at least one active safety clause. Project page: https://safevla.org.
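The SBU metric can be sketched with a single STL clause of the form "always (contact force <= limit)", whose robustness is the smallest margin over the trajectory. The clause, the field names, and the force limit are illustrative assumptions; the benchmark's actual clauses are not listed in the abstract, and VSI is omitted because its formula is not given there.

```python
def always_below(signal, limit):
    """Robustness of the STL formula G(signal <= limit): the minimum margin.

    A negative value means the clause is violated at some time step.
    """
    return min(limit - value for value in signal)


def succ_but_unsafe(rollouts, limit):
    """Fraction of all rollouts that succeed while violating the safety clause."""
    unsafe_successes = sum(
        1 for r in rollouts
        if r["success"] and always_below(r["contact_force"], limit) < 0
    )
    return unsafe_successes / len(rollouts)


# Toy rollouts (hypothetical): per-step contact force in newtons.
rollouts = [
    {"success": True, "contact_force": [2.0, 5.0, 12.0]},   # succeeds, exceeds the limit
    {"success": True, "contact_force": [1.0, 3.0]},         # succeeds safely
    {"success": False, "contact_force": [20.0]},            # fails, not counted in SBU
    {"success": True, "contact_force": [9.0, 10.0]},        # touches the limit, no violation
]
```

Because the evaluation only needs logged trajectories, it can be applied after the fact to existing benchmark runs, which is what "post-hoc" means here.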

2606.00771 2026-06-02 cs.LG cs.AI cs.SD

Logit Distillation on Manifolds: Mapping by Learning

Yiru Yang, Junling Wang, Nishant Kumar Singh, Luohong Wu, Haoran Yan

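Affiliations: University of Zurich; ETH Zurich; Deutsche Bank Securities

AI summary: Proposes a layer-wise and point-wise projection mapping that aligns student and teacher representations in a high-dimensional embedding space. Combined with LoRA injection, it improves word error rate while substantially reducing the number of trainable parameters.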

English abstract

A simple way to improve the performance of almost any machine learning model is to train not a single model but several models with diverse algorithms, which make slightly different kinds of predictions and errors on the same data and thus improve the average predictions and robustness. However, making predictions using a whole ensemble of models is cumbersome and computationally too expensive to allow deployment to a large number of users, especially if the models are large neural nets. In response, we introduce a layer-wise and point-wise projection mapping, which maps student and teacher representations into an aligned high-dimensional embedding space during training. The proposed approach, combined with LoRA injection, reduces the student model's trainable parameters to less than 1% of the teacher model's, while significantly improving word error rate (WER) compared to other distillation methods, as demonstrated in ablation studies. Unlike a mixture of experts, our method can be trained rapidly and in parallel.

2606.00765 2026-06-02 cs.AI

FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search

FALAT: 通过依赖引导搜索追踪LLM智能体轨迹中的失败

Md Nakhla Rafi, Md Ahasanuzzaman, Dong Jae Kim, Zhijie Wang, Tse-Hsun Chen

发表机构 * SPEAR Lab(SPEAR实验室) Concordia University(康考迪亚大学) DePaul University(德保罗大学)

AI总结 提出FALAT框架,通过依赖引导搜索方法,在LLM智能体轨迹中识别导致失败的关键步骤和责任智能体。

详情
AI中文摘要

基于LLM的智能体越来越多地通过包含推理步骤、工具调用和智能体间通信的长轨迹来解决复杂任务。然而,当这些智能体失败时,通常不清楚是哪个智能体导致了失败,以及哪个步骤引入了决定性错误。这个归因问题具有挑战性,因为错误可以在轨迹中传播:后续动作可能看起来不正确,但仅仅是因为它们依赖于先前被破坏的状态。因此,失败归因不能被视为独立的步骤级分类。 我们提出FALAT,一个用于LLM智能体轨迹中失败归因的诊断框架。FALAT将归因问题框架化为一个依赖引导的搜索问题。它首先构建任务应如何解决的期望,并利用该期望识别轨迹中的可疑区域。然后,它追踪决策、工具输出和智能体消息之间的依赖关系,以区分引入错误的步骤和仅仅继承或传播先前错误的步骤。最后,FALAT评估纠正候选步骤是否足以恢复预期结果,从而能够识别责任智能体和决定性失败步骤。 我们在Who&When基准上评估FALAT,该基准包括算法生成和手工制作的多智能体失败轨迹。结果表明,FALAT持续改进了责任智能体和决定性步骤的归因。其最佳配置在算法生成轨迹上达到46.0%的步骤级准确率,在更具挑战性的手工制作轨迹上达到29.1%,优于专门的归因基线和直接提示的独立LLM。这些发现表明,依赖感知推理对于LLM智能体系统中可靠的失败诊断至关重要。

英文摘要

LLM-based agents increasingly solve complex tasks through long trajectories involving reasoning steps, tool calls, and inter-agent communication. However, when these agents fail, it is often unclear which agent caused the failure and which step introduced the decisive error. This attribution problem is challenging because mistakes can propagate across the trajectory: later actions may appear incorrect, but only because they depend on an earlier corrupted state. Therefore, failure attribution cannot be treated as independent step-level classification. We propose FALAT, a diagnostic framework for failure attribution in LLM agent trajectories. FALAT frames attribution as a dependency-guided search problem. It first constructs an expectation of how the task should be solved and uses this expectation to identify suspicious regions in the trajectory. It then traces dependencies among decisions, tool outputs, and agent messages to distinguish error-introducing steps from steps that merely inherit or propagate prior mistakes. Finally, FALAT evaluates whether correcting a candidate step would be sufficient to recover the expected outcome, allowing it to identify both the responsible agent and the decisive failure step. We evaluate FALAT on the Who&When benchmark, which includes both algorithm-generated and hand-crafted multi-agent failure trajectories. The results show that FALAT consistently improves responsible-agent and decisive-step attribution. Its best configurations achieve 46.0% step-level accuracy on algorithm-generated trajectories and 29.1% on the more challenging hand-crafted trajectories, outperforming specialized attribution baselines and direct prompting with standalone LLMs. These findings suggest that dependency-aware reasoning is essential for reliable failure diagnosis in LLM agent systems.
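
The distinction between error-introducing steps and steps that merely inherit earlier mistakes can be illustrated with a small dependency walk. This is a simplified sketch under stated assumptions, not FALAT itself: the `deps` mapping, the precomputed `suspicious` set, and the function name are hypothetical, and the final counterfactual check (would correcting the step recover the expected outcome) is not modeled.

```python
def error_introducing_steps(deps, suspicious):
    """Return suspicious steps that have no suspicious ancestor.

    deps: dict mapping a step id to the earlier step ids it consumes.
    suspicious: set of step ids flagged as deviating from the expected solution.
    """
    def has_suspicious_ancestor(step, seen):
        for parent in deps.get(step, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in suspicious or has_suspicious_ancestor(parent, seen):
                return True
        return False

    return sorted(s for s in suspicious if not has_suspicious_ancestor(s, set()))


# Hypothetical trajectory: step 5 looks wrong only because it consumes step 3.
deps = {1: [], 2: [1], 3: [2], 4: [1], 5: [3, 4]}
candidates = error_introducing_steps(deps, {3, 5})
```

Here step 3 is kept as a candidate and step 5 is treated as a propagated error, since one of its inputs was already flagged.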

2606.00762 2026-06-02 cs.RO

STEM: Semantic Target Search and Exploration using MAVs in Cluttered Environments

STEM: 杂乱环境中使用MAV的语义目标搜索与探索

Nikhil Sethi, Max Lodel, Laura Ferranti, Robert Babuška, Javier Alonso-Mora

发表机构 * Department of Cognitive Robotics(认知机器人学系) Delft University of Technology(代尔夫特理工大学) CIIRC(布拉格捷克理工大学捷克信息学、机器人学与控制论研究所)

AI总结 提出一种基于语义引导视点规划器的框架,利用MAV在非结构化3D环境中最小化目标搜索与探索时间,通过组合规划器和主动感知管道实现高效语义探索。

Comments Accepted to Autonomous Robots Journal. Nikhil Sethi and Max Lodel contributed equally

详情
AI中文摘要

自主目标搜索对于在应急响应和救援任务中部署微型飞行器(MAV)至关重要。现有方法要么专注于结构化环境中的2D语义导航(在复杂3D环境中效果较差),要么专注于杂乱空间中的机器人探索(通常缺乏高效目标搜索所需的语义推理)。本文通过提出一种新颖框架克服了这些限制,该框架利用语义引导的视点规划器,使用MAV在非结构化3D环境中最小化目标搜索和探索时间。具体来说,我们开发了一个组合规划器,通过优先考虑可能导向目标的视点来生成高效的语义探索计划。为了引导规划器朝向目标,开发了一个主动感知管道,将观察到的物体的语义优先级传播到相邻的前沿体素中,以计算前沿视点的语义信息增益。此外,我们展示了如何利用基于LLM的相似度分数作为我们管道的语义优先级输入。在两个不同模拟环境中的评估表明,所提方法通过快速找到目标同时保持合理的探索时间,始终优于基线方法。使用MAV的真实世界实验进一步证明了该方法处理实际约束(如有限电池寿命、小传感器范围和语义不确定性)的能力。

英文摘要

Autonomous target search is crucial for deploying Micro Aerial Vehicles (MAVs) in emergency response and rescue missions. Existing approaches focus either on 2D semantic navigation in structured environments, which is less effective in complex 3D settings, or on robotic exploration in cluttered spaces, which often lacks the semantic reasoning needed for efficient target search. This paper overcomes these limitations by proposing a novel framework that utilizes a semantically-guided viewpoint planner to minimize target search and exploration time in unstructured 3D environments using an MAV. Specifically, we develop a combinatorial planner that generates efficient semantic exploration plans by prioritizing viewpoints that likely lead to the target. To guide the planner towards the target, an active perception pipeline is developed that propagates semantic priorities of observed objects into neighboring frontier voxels for computing semantic information gains of frontier viewpoints. In addition, we demonstrate how LLM-based similarity scores can be leveraged as semantic priority input to our pipeline. Evaluations in two distinct simulation environments show that the proposed method consistently outperforms baselines by quickly finding the target while maintaining reasonable exploration times. Real-world experiments with an MAV further demonstrate the method's ability to handle practical constraints like limited battery life, small sensor range, and semantic uncertainty.

2606.00761 2026-06-02 cs.LG cs.CL

Confidence-Adaptive SwiGLU for Mixture-of-Experts


Shaohua Li, Xiuchao Sui, Xiaobing Sun, Yuhang Wu, Liangli Zhen, Yong Liu, Rick Siow Mong Goh

发表机构 * Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore(高性能计算研究所,新加坡科技研究局) Shanghai University of Engineering Science, China(上海工程技术大学)

AI总结 提出 Confidence-Aware SwiGLU (κ-SwiGLU),通过根据 token 级路由置信度调整专家门控锐度,在 MoE Transformer 中提升性能且仅增加少量参数和计算开销。

Comments 13 pages, 10 figures

详情
AI中文摘要

SwiGLU 已成为现代 Transformer MLP 中的标准门控激活函数,但其门控锐度——即门控函数的平滑性和选择性——在整个训练过程中通常是固定的。在这项工作中,我们提出了 Confidence-Aware SwiGLU (κ-SwiGLU),这是 SwiGLU 的一种变体,用于混合专家 (MoE) 模型,它根据 token 级路由置信度调整专家门控锐度。具体来说,κ-SwiGLU 将 SiLU 门控锐度系数参数化为路由器 logit 的可学习函数,使每个专家门控单元能够在平滑、广泛激活的门控和尖锐、选择性门控之间进行插值。我们在 FineWeb-Edu 数据集上评估了 κ-SwiGLU,使用了从 8 层到 28 层的 MoE Transformer 模型。在这些设置中,κ-SwiGLU 提高了平均 CORE 性能,同时仅增加了可忽略的参数和少量计算开销,表明置信度感知的门控锐度是改进 MoE MLP 的一种有前景的机制。代码可在 https://github.com/askerlee/kappa-swiglu 获取。

英文摘要

SwiGLU has become a standard gated activation in modern Transformer MLPs, yet its gate sharpness -- the smoothness and selectivity of the gating function -- is typically fixed throughout training. In this work, we propose Confidence-Aware SwiGLU ($κ$-SwiGLU), a variant of SwiGLU for Mixture-of-Experts (MoE) models that adjusts expert gate sharpness according to token-level routing confidence. Specifically, $κ$-SwiGLU parameterizes the SiLU gate sharpness coefficient as a learnable function of the router logit, enabling each expert gate unit to interpolate between smooth, broadly active gating and sharp, selective gating. We evaluate $κ$-SwiGLU on the FineWeb-Edu dataset across MoE Transformer models ranging from 8 to 28 layers. Across these settings, $κ$-SwiGLU improves mean CORE performance while adding negligible parameters and incurring only a small computational overhead, demonstrating that confidence-aware gate sharpness is a promising mechanism for improving MoE MLPs. The code is available at https://github.com/askerlee/kappa-swiglu.
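
The idea of making the SiLU gate sharpness a function of the router logit can be sketched for a single scalar gate unit. This is a minimal sketch under assumptions, not the authors' parameterization: the affine-plus-softplus form of `sharpness`, the parameters `a` and `b`, and the function names are hypothetical choices made so that the coefficient stays positive and equals 1 (plain SiLU) at a router logit of 0.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # Stable log(1 + exp(z)); keeps the sharpness coefficient positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sharpness(router_logit, a=1.0, b=math.log(math.e - 1.0)):
    # Hypothetical learnable map from router logit to sharpness; beta = 1 at logit 0.
    return softplus(a * router_logit + b)


def gated_unit(gate_pre, up_pre, router_logit):
    # SwiGLU-style unit: x * sigmoid(beta * x) on the gate branch, times the up branch.
    beta = sharpness(router_logit)
    return gate_pre * sigmoid(beta * gate_pre) * up_pre
```

With this form, a confident routing decision (large logit) pushes the gate toward a sharper, more ReLU-like response, while a low logit yields a smoother gate.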

2606.00759 2026-06-02 cs.LG

Distributed GNEP Algorithms without Multiplier Sharing and Applications to Multi-Robot Coordination and Contextual Bandit-Based Active Learning

无乘子共享的分布式GNEP算法及其在多机器人协调和基于上下文赌博机的主动学习中的应用

Shao-An Yin

发表机构 * Shao-An Yin(殷少安)

AI总结 提出无需交换拉格朗日乘子的全分布式连续时间算法,收敛到广义纳什均衡(非仅变分均衡),并应用于多机器人协调和基于上下文赌博机的主动学习策略选择。

Comments 136 pages, 14 figures

详情
AI中文摘要

人工智能的最新进展将关注点从经典优化扩展到非合作博弈中的均衡分析。许多此类博弈涉及共享约束,从而产生广义纳什均衡问题(GNEP)。现有的分布式算法通常要求智能体交换拉格朗日乘子以强制执行共识并计算变分GNEs(v-GNEs)。 本文介绍了全分布式连续时间算法,并在不需要交换乘子的情况下建立收敛性,从而减少每次迭代的信息交换,同时提高隐私保护。分析聚焦于具有凸个体约束和线性共享约束的强单调博弈。我还提出了连续时间算法的几种离散化方案。所提出的方法收敛到一般的GNEs,而非仅限于v-GNEs,达到的均衡取决于初始化。通过多机器人协调和放置应用展示了所提方法的有效性。 在第二部分中,本文包括与亚马逊科学家合作进行的研究。现实世界机器学习中最具挑战性的问题之一是标记数据收集,这通常需要大量的人力和成本。主动学习旨在减少这种标记需求。然而,现有的手工主动学习策略通常仅在特定类型的数据集上表现良好,而这些数据集往往是事先未知的。在本文中,我提出使用上下文赌博机自适应地选择最合适的主动学习策略。在公开的外部数据集上展示了所提方法的有效性。

英文摘要

Recent advances in artificial intelligence have expanded the focus from classical optimization to include equilibrium analysis in noncooperative games. Many such games involve shared constraints, leading to Generalized Nash Equilibrium Problems (GNEPs). Existing distributed algorithms typically require agents to exchange Lagrange multipliers to enforce consensus and compute variational-GNEs (v-GNEs). This work introduces fully distributed continuous-time algorithms and establishes convergence without requiring multiplier exchange, thereby reducing information exchange per iteration while improving privacy preservation. The analysis focuses on strongly monotone games with convex individual constraints and linear shared constraints. I also propose several discretization schemes for the continuous-time algorithms. The proposed approach converges to general GNEs, rather than being restricted to v-GNEs, with the attained equilibrium depending on the initialization. The effectiveness of the proposed method is demonstrated through applications in multi-robot coordination and placement. In the second part, this work includes research conducted in collaboration with Amazon scientists. One of the most challenging problems in real-world machine learning is labeled data collection, which typically requires substantial human effort and cost. Active learning aims to reduce this labeling requirement. Existing handcrafted active learning strategies, however, generally perform well only on specific types of datasets, which are often unknown in advance. In this work, I propose using contextual bandits to adaptively select the most suitable active learning strategy. The effectiveness of the proposed approach is demonstrated on publicly available external datasets.

2606.00756 2026-06-02 cs.AI

CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems

CoMIC:云边系统中长周期LLM代理的协作记忆与洞察循环

Yannan Wang, Longli Yang, Zhen Liu, Abhishek Kumar, Carsten Maple

发表机构 * Beijing Jiaotong University(北京交通大学) The Alan Turing Institute(艾伦·图灵研究所) University of Warwick(沃里克大学)

AI总结 提出无需参数更新的云边框架CoMIC,通过集中式反思与分布式执行设计,利用语义子目标标识实现跨代理经验聚合,提升弱边缘代理在长周期任务中的进展率和动作基础。

详情
AI中文摘要

在边缘服务器上部署轻量级大语言模型(LLM)代理可以减少延迟并将代理服务更贴近用户,但资源受限的边缘模型在处理需要持久记忆、子目标跟踪和反思的长周期任务时往往表现不佳。部署后对边缘模型进行微调成本高昂且难以在异构节点上扩展,而纯本地记忆则使代理拥有孤立经验并导致提示上下文不断增长。我们提出CoMIC,一种无需参数更新的云边框架,用于协作记忆与洞察循环。CoMIC遵循“集中式反思,分散式执行”的设计:边缘代理使用面向子目标的分层记忆和选择性重新展开相关历史在本地执行,而云端LLM批评者异步评估完成的轨迹,过滤可重用经验,并通过语义子目标标识符聚合跨代理指导。在涵盖符号规划和文本交互的五项长周期代理任务中,CoMIC提高了弱边缘代理的进展率和动作基础,并在不更新模型参数的情况下实现了任务相关的成功率提升。

英文摘要

Deploying lightweight Large Language Model (LLM) agents on edge servers can reduce latency and move agentic services closer to users, but resource-constrained edge models often struggle with long-horizon tasks that require persistent memory, subgoal tracking, and reflection. Fine-tuning edge models after deployment is costly and difficult to scale across heterogeneous nodes, while purely local memory leaves agents with isolated experience and growing prompt context. We propose CoMIC, a parameter-update-free cloud-edge framework for Collaborative Memory and Insights Circulation. CoMIC follows a "Centralized Reflection, Decentralized Execution" design: edge agents execute locally using subgoal-oriented hierarchical memory and selective re-expansion of relevant histories, while a cloud-side LLM critic asynchronously evaluates completed trajectories, filters reusable experience, and aggregates cross-agent guidance keyed by semantic subgoal identifiers. Across five long-horizon agent tasks spanning symbolic planning and text interaction, CoMIC improves progress rate and action grounding for weak edge agents and yields task-dependent success-rate gains without updating model parameters.

2606.00755 2026-06-02 cs.CL cs.LG

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

内化温度:面向强化学习的同策略自蒸馏作为策略加热器

Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun, Junjie Wang, Yujiu Yang

发表机构 * Tsinghua University(清华大学)

AI总结 提出温度缩放同策略自蒸馏(TS-OPSD),通过将温度探索效应内化到模型参数中,缓解强化学习中的熵崩溃问题,无需外部教师或额外推理成本。

详情
AI中文摘要

基于可验证奖励的强化学习提升了大语言模型的推理能力,但常常遭受熵崩溃,即日益集中的策略减少了轨迹多样性和有用的学习信号。现有补救措施要么约束强化学习目标(如熵正则化),要么在轨迹收集期间调整采样温度,但这些干预措施仍外在于模型参数。我们提出温度缩放同策略自蒸馏(TS-OPSD),一种轻量级的策略加热方法,将温度的探索效应内化到模型参数中。从熵崩溃的强化学习检查点开始,TS-OPSD 通过对模型自身的 logits 应用高温缩放来构建自教师,然后将得到的更平滑分布蒸馏回学生。这种策略加热不需要外部教师、特权数据或额外的推理成本。在 Qwen3-4B-Base 和 Qwen3-8B-Base 上的实验表明,策略加热为继续强化学习提供了比标准继续强化学习和轨迹级温度加热更强的初始化。进一步分析表明,TS-OPSD 主要降低输出锐度,同时保留中间表示、顶级候选集和推理能力。这些结果表明,熵恢复可以作为面向推理的强化学习的一种简单的崩溃后干预措施。

英文摘要

Reinforcement learning from verifiable rewards improves the reasoning ability of large language models, but often suffers from entropy collapse, in which increasingly concentrated policies reduce rollout diversity and useful learning signals. Existing remedies either constrain the RL objective (e.g., entropy regularization) or adjust sampling temperature during rollout collection, but these interventions remain external to the model parameters. We propose Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a lightweight policy reheating method that internalizes the exploratory effect of temperature into model parameters. Starting from an entropy-collapsed RL checkpoint, TS-OPSD constructs a self-teacher by applying high-temperature scaling to the model's own logits, then distills the resulting smoother distribution back into the student. This policy reheating requires no external teacher, privileged data, or additional inference cost. Experiments on Qwen3-4B-Base and Qwen3-8B-Base show that policy reheating yields a stronger initialization for continued RL than both standard continued RL and rollout-level temperature reheating. Further analyses show that TS-OPSD mainly reduces output sharpness while preserving intermediate representations, top candidate sets, and reasoning capability. These results suggest that entropy restoration can serve as a simple post-collapse intervention for extending reasoning-oriented RL.
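
The self-teacher construction described above (high-temperature scaling of the model's own logits, then distillation back into the student) can be illustrated on a single token distribution. This is a minimal sketch, not the TS-OPSD training loop: the example logits, the temperature of 4.0, and the direction of the KL term are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax with max-subtraction for stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)


def kl_divergence(p, q):
    # KL(p || q); assumed here as the distillation loss from teacher p to student q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits from an entropy-collapsed checkpoint.
logits = [6.0, 1.0, 0.5, 0.0]
student = softmax(logits)                   # current sharp policy
teacher = softmax(logits, temperature=4.0)  # self-teacher: same logits, reheated
loss = kl_divergence(teacher, student)
```

The teacher keeps the same ranking of candidates as the student but has higher entropy, so minimizing the loss would push the student's parameters toward the smoother distribution without any external teacher.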

2606.00752 2026-06-02 cs.LG cs.CE cs.HC

A multimodal dataset of photoplethysmography and continuous behavioral responses to ASMR and nature videos

光电容积描记术和ASMR及自然视频连续行为反应的多模态数据集

Tushar Das, Daigo Hozaki, Koushlendra Kumar Singh, Hirohito M. Kondo

发表机构 * Machine Vision & Intelligence Lab, National Institute of Technology Jamshedpur(贾姆谢德布尔国立理工学院机器视觉与智能实验室) School of Psychology, Chukyo University(中京大学心理学院)

AI总结 提出REST-ASMR数据集,包含34名参与者在ASMR和自然视频刺激下的光电容积描记图、时间对齐视听刺激和连续主观标注,验证了刺激有效性、心血管减速,并通过双向长短期记忆模型实现高精度ASMR状态分类。

详情
AI中文摘要

自主感觉经络反应(ASMR)是一种以愉悦刺痛感和心血管减慢为特征的体感现象。然而,ASMR研究因缺乏标准化、开放获取的多模态数据集而受到阻碍。为解决这一限制,我们提出了REST-ASMR(对环境与感觉刺激的反应),这是一个同步的多模态数据集,旨在捕捉ASMR期间的行为报告和生理动态,并以自然放松视频作为对照刺激。该数据集包括来自34名参与者的高分辨率光电容积描记图(PPG)、时间对齐的视听刺激和连续主观标注。技术验证显示高刺激有效性(97%响应率)、显著的刺激特异性受试者间一致性(p < 0.05),以及稳健的PPG衍生的ASMR特异性心血管减速。此外,双向长短期记忆模型成功预测了主观ASMR刺痛状态,在严格的、无泄漏的受试者-视频双独立4折交叉验证下,实现了视频级ASMR与自然分类的完美准确率,以及帧级全局平均准确率75.51%、宏F1分数71.86%和100%自然基线特异性。REST-ASMR为情感计算、多模态研究以及个性化放松相关反应模型的发展提供了密集的时间基础。

英文摘要

Autonomous Sensory Meridian Response (ASMR) is a somatosensory phenomenon characterized by pleasant tingling sensations and cardiovascular slowing. However, ASMR research has been hindered by a dearth of standardized, open-access multimodal datasets. To address this limitation, we present REST-ASMR (Response to Environmental & Sensory Triggers), a synchronized multimodal dataset designed to capture behavioral reports and physiological dynamics during ASMR, with nature-relaxation videos as control stimuli. The dataset includes high-resolution photoplethysmography (PPG), time-aligned audiovisual stimuli, and continuous subjective annotations from 34 participants. Technical validation showed high stimulus efficacy (97% responder rate), significant stimulus-specific inter-subject agreement (p < 0.05), and a robust PPG-derived ASMR-specific cardiovascular deceleration. Additionally, a Bidirectional Long-Short Term Memory model successfully predicted subjective ASMR tingle states, achieving video-level ASMR vs. Nature classification with perfect accuracy and a frame-level global mean accuracy of 75.51%, macro F1-score of 71.86%, and 100% Nature-baseline specificity, under a strict, leakage-free subject-video double-independent 4-fold cross-validation. REST-ASMR constitutes a dense temporal foundation for affective computing, multimodal research, and the development of personalized models of relaxation-related responses.

2606.00751 2026-06-02 cs.CV

Head-Pose-Aware Visual Speech Recognition with FiLM Modulation

基于FiLM调制的头部姿态感知视觉语音识别

Matthew Kit Khinn Teng, Haibo Zhang, Takeshi Saitoh

发表机构 * Department of Artificial Intelligence, Kyushu Institute of Technology(人工智能系,九州工业大学)

AI总结 提出HP-VSR-ResFiLM框架,通过姿态条件残差FiLM模块显式融入头部姿态信息,在LRS2和LRS3上分别达到25.0%和33.2%的词错误率,有效提升非正面视角下视觉语音识别的鲁棒性。

Comments 27 pages, 4 figures

详情
AI中文摘要

视觉语音识别(VSR)旨在从唇部运动等视觉线索中识别语音,但其性能从根本上受到音素模糊性和姿态引起的变化(引入几何畸变和遮挡)的限制。现有方法主要依赖语言上下文或隐式不变性,导致非正面视角下的视觉表示不够鲁棒。本文提出一个姿态感知的音素级框架HP-VSR-ResFiLM,显式地将头部姿态信息融入视觉特征提取。该框架采用两阶段流水线:阶段1为姿态条件视觉编码器,阶段2使用预训练NLLB语言模型进行音素到文本重建。具体地,阶段1在2D CNN前端后引入姿态条件残差特征线性调制(FiLM)块,利用头部姿态信息自适应地优化视觉表示。在LRS2和LRS3上的实验表明,HP-VSR-ResFiLM在可比训练条件下取得了竞争性性能,无需额外训练数据即分别达到25.0%和33.2%的词错误率(WER)。消融研究进一步显示,单个残差FiLM块持续改善整体WER,而第3层和第4层的更深层调制为偏航角大于30°的样本带来更大增益,且不降低小姿态变化样本的性能。这些发现表明,显式的姿态感知特征调制为在无约束场景下提升VSR鲁棒性提供了一种有效且计算高效的解决方案。

英文摘要

Visual Speech Recognition (VSR) aims to recognize speech from visual cues such as lip movements, but its performance is fundamentally limited by viseme ambiguity and pose-induced variations that introduce geometric distortions and occlusions. Existing approaches mainly rely on linguistic context or implicit invariance, leaving visual representations insufficiently robust under non-frontal views. In this work, we propose a pose-aware phoneme-level framework, termed HP-VSR-ResFiLM, that explicitly incorporates head-pose information into visual feature extraction. The proposed framework adopts a two-stage pipeline consisting of a pose-conditioned visual encoder in Stage 1 and a pretrained NLLB language model in Stage 2 for phoneme-to-text reconstruction. Specifically, Stage 1 incorporates a pose-conditioned residual Feature-wise Linear Modulation (FiLM) block after the 2D CNN frontend to adaptively refine visual representations using head-pose information. Experiments on LRS2 and LRS3 demonstrate that HP-VSR-ResFiLM achieves competitive performance under comparable training conditions, attaining word error rates (WER) of 25.0% and 33.2%, respectively, without relying on additional training data. Ablation studies further show that a single residual FiLM block consistently improves overall WER, while deeper modulation at Layers 3 and 4 provides larger gains for samples with yaw angles greater than 30° without degrading performance for smaller pose variations. These findings demonstrate that explicit pose-aware feature modulation offers an effective and computationally efficient solution for improving VSR robustness in unconstrained settings.
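
Feature-wise Linear Modulation scales and shifts each feature channel using parameters predicted from a conditioning vector, here the head pose. A residual variant can be sketched as below. The exact block design is not given in the abstract, so the residual form `x + (gamma * x + beta)`, the linear pose projections, and all names are assumptions for illustration only.

```python
def linear(weights, bias, vec):
    # Plain matrix-vector product plus bias, using nested lists.
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def residual_film(features, pose, w_gamma, b_gamma, w_beta, b_beta):
    """Apply pose-conditioned FiLM on top of an identity path.

    features: per-channel activations; pose: e.g. (yaw, pitch, roll).
    """
    gamma = linear(w_gamma, b_gamma, pose)  # per-channel scale from pose
    beta = linear(w_beta, b_beta, pose)     # per-channel shift from pose
    return [x + g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical two-channel example conditioned on yaw only.
features = [1.0, 2.0]
pose = [0.5, 0.0, 0.0]
w_gamma = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
w_beta = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
zeros = [0.0, 0.0]
modulated = residual_film(features, pose, w_gamma, zeros, w_beta, zeros)
```

One consequence of the residual form is that zero-initialized projections leave the features unchanged, so the block can start as an identity and learn pose-dependent corrections.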

2606.00750 2026-06-02 cs.CL

I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applications

I-WebGenBench: 评估大语言模型生成的科学网页应用中的交互性

Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li, Wenhao Wang

发表机构 * Vast Intelligence Lab(vast 智能实验室) UTS University of Liverpool(利物浦大学)

AI总结 提出 Paper-to-Interactive-System Agent 将研究论文转化为可执行交互式网页系统,并构建 PaperVoyager 框架以显式建模机制和交互逻辑,显著提升生成质量。

Comments 9 pages, 4 figures

详情
AI中文摘要

近期视觉语言模型的进展使得自主代理能够进行复杂推理、工具使用和文档理解。然而,现有的文档代理主要将论文转化为静态产物,如摘要、网页或幻灯片,这对于涉及动态机制和状态转换的技术论文来说是不够的。在这项工作中,我们提出了一个论文到交互式系统的代理,将研究论文转化为可执行的交互式网页系统。给定一篇 PDF 论文,该代理无需人工干预即可进行端到端处理,包括论文理解、系统建模和交互式网页合成,使用户能够操作输入并观察动态行为。为了评估这一任务,我们引入了一个包含 19 篇研究论文的基准测试,每篇论文都配有专家构建的交互式系统作为真实值。我们进一步提出了 PaperVoyager,一个结构化生成框架,在合成过程中显式建模机制和交互逻辑。实验表明,PaperVoyager 显著提高了生成的交互式系统的质量,为交互式科学论文理解提供了新的范式。

英文摘要

Recent advances in visual language models have enabled autonomous agents for complex reasoning, tool use, and document understanding. However, existing document agents mainly transform papers into static artifacts such as summaries, webpages, or slides, which are insufficient for technical papers involving dynamic mechanisms and state transitions. In this work, we propose a Paper-to-Interactive-System Agent that converts research papers into executable interactive web systems. Given a PDF paper, the agent performs end-to-end processing without human intervention, including paper understanding, system modeling, and interactive webpage synthesis, enabling users to manipulate inputs and observe dynamic behaviors. To evaluate this task, we introduce a benchmark of 19 research papers paired with expert-built interactive systems as ground truth. We further propose PaperVoyager, a structured generation framework that explicitly models mechanisms and interaction logic during synthesis. Experiments show that PaperVoyager significantly improves the quality of generated interactive systems, offering a new paradigm for interactive scientific paper understanding.

2606.00746 2026-06-02 cs.CV cs.LG

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders

将并行序列模型扩展到基础规模的视觉编码器

Yitong Jiang, Hongjun Wang, Collin McCarthy, Hanrong Ye, David Wehr, Xinhao Li, Qi Dou, Tianfan Xue, Ka Chun Cheung, Simon See, Wonmin Byeon, Ke Chen, Kai Han, Jinwei Gu, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Sifei Liu

发表机构 * NVIDIA The Chinese University of Hong Kong(香港中文大学) The University of Hong Kong(香港大学) University of California, San Diego(加州大学圣地亚哥分校)

AI总结 提出C-GSPN,一种基于2D空间传播的基础规模视觉编码器,通过快速CUDA内核、压缩潜在空间传播块和两阶段交叉算子蒸馏,在减少参数的同时提升性能并实现高效推理。

详情
AI中文摘要

视觉基础模型受限于自注意力的二次成本,这限制了可用分辨率并增加了大规模预训练的成本。次二次替代方案如线性注意力和状态空间模型降低了这一成本,但通常将图像序列化为1D令牌流,削弱了对视觉重要的2D空间结构。广义空间传播网络(GSPN)通过线扫描递归直接在2D网格上传播上下文,实现了接近线性的复杂度且无需位置嵌入,但很少用作基础规模的编码器。我们提出C-GSPN,一种基于2D空间传播的基础规模视觉编码器。C-GSPN通过三项改进使该算子实用化:(1)一个快速的GSPN CUDA内核,将每步启动融合为单个warp专用实现,采用共享内存分块、合并访问和紧凑的多通道传播,达到峰值内存带宽的90%以上,运行速度比原始GSPN实现快40-52倍;(2)一个带有融合归一化的压缩潜在空间传播块,将内核级速度转化为块级和模型级效率;(3)一个两阶段交叉算子蒸馏方案,从注意力教师训练新架构,无需从头开始进行基础规模训练的成本。使用6亿图像-文本对进行蒸馏,C-GSPN以少15%的参数匹配同构ViT基线,在ADE20K分割上提升+2.1%,以极少的数据迁移到高分辨率,并在2K分辨率下通过单次无分块推理实现4倍的端到端块加速。

英文摘要

Vision foundation models are bottlenecked by the quadratic cost of self-attention, which limits usable resolution and increases the cost of large-scale pretraining. Subquadratic alternatives such as linear attention and state-space models reduce this cost, but often serialize images into 1D token streams and weaken the 2D spatial structure important for vision. Generalized Spatial Propagation Networks (GSPN) instead propagate context directly on the 2D grid through line-scan recurrences, achieving near-linear complexity without positional embeddings, but have seen little use as foundation-scale encoders. We present C-GSPN, a foundation-scale vision encoder based on 2D spatial propagation. C-GSPN makes the operator practical through three improvements: (1) a fast GSPN CUDA kernel that fuses per-step launches into a single warp-specialized implementation with shared-memory tiling, coalesced access, and a compact multi-channel propagation, reaching over 90% of peak memory bandwidth and running up to 40--52x faster than the original GSPN implementation; (2) a compressed latent-space propagation block with fused normalization, which turns kernel-level speed into block- and model-level efficiency; and (3) a two-stage cross-operator distillation recipe that trains the new architecture from an attention teacher without the cost of from-scratch foundation-scale training. Distilled with 600M image-text pairs, C-GSPN matches an isomorphic ViT baseline with 15% fewer parameters, improves ADE20K segmentation by +2.1%, transfers to high resolution with a fraction of the data needed from scratch, and delivers a 4x end-to-end block speedup at 2K with single-pass, tiling-free inference.

2606.00741 2026-06-02 cs.LG cs.AI stat.ML

Quantum Tunneling-Aware Machine Learning: Physics-Derived Noise Models for Robust Deployment

量子隧穿感知机器学习:面向鲁棒部署的物理衍生噪声模型

Uiwon Hwang, Jaeho Hwang

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) Human-Centered Artificial Intelligence Research Institute(以人为本的人工智能研究院)

AI总结 本文提出量子隧穿感知机器学习(QTAML),通过WKB近似推导部署时的权重误差分布,并设计隧穿感知补偿(TAC)算法,在无需重训练和标签的情况下,以较低ECC开销恢复模型精度。

详情
AI中文摘要

晶体管缩放正接近量子力学极限,因为薄栅氧化物通过量子隧穿引起电子泄漏。与传统数字系统不同,只要错误结构被正确建模,AI推理可以容忍此类错误。在本文中,我们引入量子隧穿感知机器学习(QTAML)。我们使用Wentzel-Kramers-Brillouin(WKB)近似从第一性原理推导部署时的权重误差分布,并表明它具有通用高斯噪声模型所忽略的结构:精确的仿射均值漂移、由最高有效位主导的逐位方差层级,以及依赖于$\|W_\ell\|_\infty$和训练网络Jacobian的逐层依赖性。我们将这三个结构属性打包成一个单一的部署时算法——隧穿感知补偿(TAC),该算法结合了闭式均值校正和基于WKB方差分解的最优逐层自适应比特预算分配。在$p_\mathrm{flip}=0.10$的四个卷积架构和$p_\mathrm{flip}=0.05$的一个Transformer编码器上,TAC达到了干净精度的95%,同时ECC开销比从相同物理导出的自然基线Uniform-MSP低3.4倍到33.6倍。闭式饱和比$\rho^*$预先预测了这些增益,在异构架构上,WKB导出的评分在小预算下比基于幅度的分配高出多达24个百分点。该算法无需重训练、无需标签,且无推理时开销。我们还验证了WKB导出的分布定理达到蒙特卡洛精度。这些结果将WKB隧穿物理与噪声感知深度学习联系起来,并为超越传统缩放极限的硬件-软件协同设计提供了一条有原则的路径。

英文摘要

Transistor scaling is approaching a quantum-mechanical limit, as thin gate oxides induce electron leakage through quantum tunneling. Unlike conventional digital systems, AI inference can tolerate such errors provided their structure is modeled correctly. In this paper, we introduce quantum tunneling-aware machine learning (QTAML). We derive the deployment-time weight-error distribution from first principles using the Wentzel-Kramers-Brillouin (WKB) approximation and show that it has structure that generic Gaussian noise models miss: an exact affine mean drift, a per-bit variance hierarchy dominated by the most-significant bit, and a per-layer dependence on $\|W_\ell\|_\infty$ and the trained-network Jacobian. We package these three structural properties into a single deployment-time algorithm, Tunneling-Aware Compensation (TAC), that combines closed-form mean correction with an optimal layer-adaptive bit-budget allocation derived from the WKB variance decomposition. Across four convolutional architectures at $p_\mathrm{flip}$=0.10 and a transformer encoder at $p_\mathrm{flip}$=0.05, TAC reaches $95\%$ of clean accuracy with 3.4$\times$ to 33.6$\times$ less ECC overhead than Uniform-MSP, the natural baseline derived from the same physics. The closed-form saturation ratio $ρ^*$ predicts these gains in advance, and on heterogeneous architectures WKB-derived scoring outperforms magnitude-based allocation by up to 24 percentage points at small budgets. The algorithm requires no retraining, no labels, and no inference-time overhead. We also verify the WKB-derived distributional theorems to Monte Carlo precision. These results connect WKB tunneling physics with noise-aware deep learning and suggest a principled path toward hardware--software co-design beyond conventional scaling limits.
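
Two of the structural properties named above, a mean error that is affine in the stored weight and a per-bit variance dominated by the most-significant bit, already appear in a toy bit-flip model. The sketch below is not the paper's WKB derivation: it assumes an unsigned fixed-point code and one uniform, independent flip probability per bit, and the function name is hypothetical.

```python
def flip_error_moments(w, n_bits, p_flip):
    """Mean and per-bit variance of the value error under independent bit flips.

    w: unsigned integer code of a quantized weight; each bit flips with p_flip.
    """
    mean = 0.0
    per_bit_var = []
    for k in range(n_bits):
        bit = (w >> k) & 1
        delta = (1 - 2 * bit) * (1 << k)  # +2^k if the bit is 0, -2^k if it is 1
        mean += p_flip * delta
        per_bit_var.append(p_flip * (1 - p_flip) * delta * delta)
    return mean, per_bit_var


# Hypothetical 8-bit code at the flip rate quoted for the convolutional models.
mean, per_bit_var = flip_error_moments(0b10110010, n_bits=8, p_flip=0.10)
msb_share = per_bit_var[-1] / sum(per_bit_var)
```

In this toy model the mean error equals `p_flip * ((2**n_bits - 1) - 2 * w)`, which is affine in `w`, and the top bit carries about three quarters of the total variance, which is why protecting high-order bits first is a natural baseline.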

2606.00739 2026-06-02 cs.LG

Score $\times$ Decoder: A Unified View of Unsupervised Inference-Time Scaling for Hallucination Mitigation

Score × Decoder:无监督推理时缩放缓解幻觉的统一视角

Yun-Chen Cheng, Che-Yu Lin, Cheng-Lin Yang

发表机构 * CyCraft AI Lab, Taiwan(CyCraft人工智能实验室,台湾)

AI总结 本文提出Score×Decoder框架,通过配对四种内在分数(困惑度、对比度、幂分布似然、自验证)与三种解码族(优化、采样、共识),在无监督条件下选择最佳组合以缓解大语言模型幻觉。

详情
AI中文摘要

大型语言模型即使答案在其参数范围内也会产生幻觉。虽然推理时缩放可以揭示这种潜在知识,但最有效的方法需要监督:一个训练好的验证器或奖励模型。我们探讨仅使用基础语言模型可以做什么:哪个内在信号最能识别正确输出,以及应该如何解码?我们将此视为一个分数×解码器网格,将四种分数(困惑度、对比度、幂分布似然和自验证)与三种解码族(优化、采样、共识)配对,并在MATH500上使用基础版和指令调优版Qwen3-1.7B评估每个单元格。虽然自验证(提示模型判断自己的答案,并通过无训练虚拟思考前缀增强)在大多数设置中效果良好,但没有一个分数具有固定质量:其价值取决于使用它的解码器和模型能力。当没有监督可用时,必须同时选择分数和解码族。

英文摘要

Large language models hallucinate even when the answer lies within their parameters. While inference-time scaling can surface this latent knowledge, the most effective methods require supervision: a trained verifier or reward model. We ask what can be done with only a base language model: which intrinsic signal best identifies correct outputs, and how should it be decoded? We cast this as a score~$\times$~decoder grid pairing four scores (perplexity, contrastive, power-distribution likelihood, and self-verification) with three decoding families (optimization, sampling, consensus), and evaluate every cell on MATH500 with the base and instruction-tuned Qwen3-1.7B. While self-verification, which prompts the model to judge its own answer and is sharpened by a training-free virtual-thinking prefix, works well in most settings, no score has a fixed quality: its value depends on the decoder that consumes it and on model capability. When no supervision is available, the score and the decoding family must be chosen together.
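
The claim that a score has no fixed quality independent of the decoder can be made concrete with one intrinsic score and two decoding rules. This is a minimal sketch, not the paper's grid: the candidate answers and log-probabilities are made-up data, and `best_by_score` and `majority_vote` are hypothetical stand-ins for an optimization-style and a consensus-style decoder.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    # exp of the negative mean token log-probability.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def best_by_score(candidates):
    # Optimization-style choice: keep the single lowest-perplexity sample.
    return min(candidates, key=lambda c: perplexity(c["logprobs"]))["answer"]


def majority_vote(candidates):
    # Consensus-style choice: most frequent final answer, ignoring the score.
    return Counter(c["answer"] for c in candidates).most_common(1)[0][0]


# Hypothetical samples for one question.
candidates = [
    {"answer": "12", "logprobs": [-0.1, -0.2, -0.1]},
    {"answer": "15", "logprobs": [-0.6, -0.7, -0.5]},
    {"answer": "15", "logprobs": [-0.8, -0.4, -0.9]},
]
```

On this toy data the two rules disagree ("12" by perplexity, "15" by vote), which illustrates why the score and the decoding family have to be evaluated as a pair.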

2606.00738 2026-06-02 cs.LG cs.AI cs.CV

SORA: Free Second-Order Attacks in Fast Adversarial Training

SORA:快速对抗训练中的自由二阶攻击

Mazdak Teymourian, Ramtin Moslemi, Farzan Rahmani, Mohammad Hossein Rohban

发表机构 * Department of Computer Engineering, Sharif University of Technology, Tehran, Iran(谢里夫理工大学计算机工程系,伊朗德黑兰)

AI总结 针对快速对抗训练中的灾难性过拟合问题,提出通过扰动变异性和梯度对齐指标PertAlign来预测并防止过拟合,并设计自适应步长方法SORA,实现最优鲁棒性和干净准确率。

Comments Accepted at ICML 2026

详情
AI中文摘要

对抗训练是对抗性样本的主要防御手段,但在高效的单步变体中常常遭受灾难性过拟合,即尽管单步性能很高,但对多步攻击的鲁棒性却崩溃。我们通过两个贡献来解决这种失效模式。首先,我们形式化了epsilon过拟合(EO),这是一种固定扰动幅度和方向加剧CO的视角,并表明引入扰动变异性可以显著提高不同架构和数据集上的鲁棒泛化能力。其次,我们提出了PertAlign(扰动对齐),这是一种理论上合理、计算开销可忽略的指标,通过测量攻击阶段的梯度对齐来预测CO的发生。利用这些见解,我们引入了SORA,一种自适应步长的AT方法,它根据损失曲面几何动态调整扰动。SORA始终能防止CO,实现最先进的鲁棒性和干净准确率,并使用一组固定的超参数在数据集和架构上泛化,这对于快速AT的适用性至关重要。在不同数据集和架构上的大量实验表明,SORA在提供更高干净准确率和卓越效率的同时,匹配或超越了先前方法的鲁棒性。代码可在https://github.com/SecondOrderAT/SORA获取。

英文摘要

Adversarial Training (AT) is a leading defense against adversarial examples but often suffers from Catastrophic Overfitting (CO) in efficient single-step variants, where robustness to multi-step attacks collapses despite high single-step performance. We address this failure mode with two contributions. First, we formalize Epsilon Overfitting (EO), a perspective in which fixed perturbation magnitudes and directions exacerbate CO, and show that introducing perturbation variability significantly improves robust generalization across different architectures and datasets. Second, we propose PertAlign (Perturbation Alignment), a theoretically grounded, computationally negligible metric that predicts CO onset by measuring gradient alignment across attack stages. Leveraging these insights, we introduce SORA, an adaptive step-size AT method that dynamically adjusts perturbations based on loss surface geometry. SORA consistently prevents CO, achieves state-of-the-art robustness and clean accuracy, and generalizes across datasets and architectures using a single fixed set of hyperparameters, which is essential for applicability in fast AT. Extensive experiments on diverse datasets and architectures show that SORA matches or surpasses the robustness of prior methods while delivering higher clean accuracy and superior efficiency. Code is available at https://github.com/SecondOrderAT/SORA.

2606.00737 2026-06-02 cs.RO math.OC

Beyond Pure Sampling: Hybrid Optimization Mechanisms for Non-Convex Model Predictive Control

超越纯采样:非凸模型预测控制的混合优化机制

Yuichiro Aoyama, Minchan Jung, Akash Ratheesh, Evangelos A. Theodorou

发表机构 * School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, USA(佐治亚理工学院航空航天工程学院,美国佐治亚州亚特兰大) Development Division, Komatsu Ltd., Tokyo, Japan(小松制作所开发部门,日本东京) Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea(仁荷大学电气与计算机工程系,韩国仁川)

AI总结 本文提出一种结合梯度下降与基于逆Hessian采样的双步优化机制,用于非凸模型预测控制,在多种机器人导航任务中相比纯采样方法(如MPPI)具有更高成功率和稳定性。

Comments 28 pages, 13 figures

Abstract

This paper investigates the optimization mechanisms of non-convex Model Predictive Control (MPC) using the Maximum Entropy Differential Dynamic Programming (ME-DDP) framework. Navigating non-convex cost landscapes induced by nonlinear dynamics, multiple obstacles, and similar factors remains a fundamental challenge in robotics, where gradient-based methods frequently converge to suboptimal local minima. We demonstrate a dual-step optimization mechanism designed to overcome these traps: (1) an initial phase that uses DDP to exploit the gradient of the cost landscape, followed by (2) disruption of the optimization via sampling from policies characterized by the inverse Hessian of the action-value function. We provide a rigorous analysis of this sampling mechanism for three ME-DDP variants: Unimodal Gaussian ME-DDP, Multimodal Gaussian ME-DDP, and Stein Variational DDP. Furthermore, on navigation tasks for four robotic systems in cluttered environments, we extensively benchmark the three ME-DDP variants against deterministic DDP and against Model Predictive Path Integral (MPPI) control, one of the most successful sampling-based schemes, with three policy parameterizations and update laws that correspond to those of the ME-DDP variants. The results show that in low-dimensional systems, where the cost landscapes are relatively simple and local information is sufficiently representative, our framework consistently outperforms the MPPI variants. In high-dimensional systems, MPPI can occasionally discover aggressive maneuvers that enable it to steer the systems faster than DDP-based methods, whereas our method maintains a higher, more stable success rate. Finally, we validate the practical efficacy of the framework through hardware experiments with a quadrotor navigating a dense, non-convex obstacle field, confirming the robustness of the proposed framework for real-world deployment.
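
The dual-step idea in the abstract above (exploit the gradient first, then disrupt the optimizer by sampling with an inverse-Hessian covariance) can be illustrated on a one-dimensional toy problem. The sketch below is not the authors' ME-DDP implementation: the cost function, the `temperature` parameter, the accept-if-better rule, and all function names are assumptions made purely for illustration.

```python
import math
import random


def cost(x):
    # Hypothetical non-convex cost: shallow minimum near x = +0.96,
    # deeper minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    # First derivative of the toy cost.
    return 4.0 * x * (x * x - 1.0) + 0.3


def hess(x):
    # Second derivative of the toy cost (a scalar "Hessian").
    return 12.0 * x * x - 4.0


def gradient_phase(x, lr=0.05, steps=200):
    # Phase 1: plain gradient descent, which settles in the nearest basin.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def sampling_phase(x, rng, temperature=5.0, n_samples=64):
    # Phase 2: draw Gaussian samples around x whose variance is
    # temperature / curvature, i.e. scaled by the inverse Hessian.
    curvature = max(hess(x), 1e-3)  # guard against non-positive curvature
    std = math.sqrt(temperature / curvature)
    candidates = [rng.gauss(x, std) for _ in range(n_samples)]
    best = min(candidates, key=cost)
    # Assumed rule: jump only if the best sample improves the cost.
    return best if cost(best) < cost(x) else x


def dual_step(x0, rounds=10, seed=0):
    # Alternate sampling and gradient phases after an initial descent.
    rng = random.Random(seed)
    x = gradient_phase(x0)
    x_local = x  # where pure gradient descent gets stuck
    for _ in range(rounds):
        x = sampling_phase(x, rng)
        x = gradient_phase(x)
    return x_local, x
```

Calling `dual_step(1.5)` returns both the point where gradient descent alone settles and the point reached after the sampling rounds. In this toy setting, a flatter local minimum (smaller curvature) yields wider samples, which is the intuition behind using the inverse Hessian as a covariance.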

2606.00730 2026-06-02 cs.RO

Infeasible optimization problems and the hierarchical augmented Lagrangian method in imitation learning

Roland Andrews, Justin Carpentier, Ajay Sathya

Affiliations: University of Cambridge

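AI summary: To address the training instability caused by infeasible constraints in imitation learning, this paper proposes a remedy based on the augmented Lagrangian method that drives the policy toward the solution of the closest feasible constrained problem, and illustrates it on a toy driving example.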

Abstract

Imitation learning (IL) is an effective approach for training complex robotics policies. Recent work has introduced hard constraints into imitation-learning optimization problems to ensure the safety, stability, and robustness of the learned policy. However, we argue that these constraints are sometimes infeasible, which can lead to unstable or difficult training dynamics. We study a simple remedy for such situations based on recent theoretical results on the augmented Lagrangian method in infeasible settings. We show that our approach drives the learned policy toward the solution of a closest-feasible constrained IL problem with desirable properties. The method is illustrated on a toy driving example with a total-acceleration constraint and pedestrian-safety constraints, a setting in which infeasibility can naturally arise while still allowing a safe learned policy.
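
To make the infeasible setting in the abstract above concrete, here is a minimal sketch of a plain augmented Lagrangian iteration on a scalar problem with two contradictory equality constraints. It is not the paper's hierarchical method or its driving example: the objective, the constraints, the penalty `rho`, and the function name are all hypothetical choices that allow a closed-form inner step.

```python
def alm_infeasible(rho=1.0, iterations=50):
    # Hypothetical problem: minimize f(x) = (x - 2)^2
    # subject to c1(x) = x - 1 = 0 and c2(x) = x + 1 = 0,
    # which cannot both hold.
    lam = [0.0, 0.0]  # one multiplier per constraint
    x = 0.0
    for _ in range(iterations):
        # Closed-form minimizer over x of
        # f(x) + lam1*c1 + lam2*c2 + (rho/2)*(c1^2 + c2^2).
        x = (4.0 - lam[0] - lam[1]) / (2.0 + 2.0 * rho)
        # Standard multiplier update: lam_i <- lam_i + rho * c_i(x).
        c = [x - 1.0, x + 1.0]
        lam = [lam[i] + rho * c[i] for i in range(2)]
    return x, lam
```

In this toy case the multipliers keep growing in opposite directions, because no point satisfies both constraints. The iterate `x` still converges to 0, the point that minimizes the total squared violation (x - 1)^2 + (x + 1)^2. That is the kind of closest-feasible behavior the abstract describes, shown here only for this hand-picked example.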

2606.00728 2026-06-02 cs.CL

From Empathy to Personalized Empathy: Adapting Empathetic Strategies to Individual Users

Wuqiang Zheng, Chengbing Wang, Yilin Yang, Junyi Cheng, Jianfei Xiao, Hu Sun, Yi Xie, Yangyang Li, Wenjie Wang

Affiliations: University of Science and Technology of China; Huawei Technologies

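AI summary: Existing work on long-term interactions with large language models overlooks how user personality affects empathetic strategies. This paper introduces the personalized empathy task and builds the PersonaEmp dataset and the PereGRM reward modeling framework; experiments show that PereGRM effectively improves personalized empathetic capabilities.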

Abstract

As Large Language Models (LLMs) are increasingly deployed in long-term interactions with users, empathy has become an important capability. However, existing research overlooks the influence of users' personality traits on empathetic strategies during long-term interactions. To address this gap, we introduce the task of personalized empathy, which focuses on adapting empathetic strategies to users' personalized characteristics derived from their interaction history. To study and enhance this capability, we construct PersonaEmp, a personalized empathy dataset built from long-term user-AI interactions, featuring rich user histories, persona information, and empathy-seeking queries. We further propose PereGRM, a reward modeling framework that combines the empathy evaluation structure with dynamic evaluation criteria generation for fine-grained reward modeling. Experimental results across different settings and multiple judge models show that PereGRM consistently achieves the strongest performance improvements, indicating its effectiveness in enhancing personalized empathetic capabilities.