arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories: 2157
2002.09053 2026-05-20 cs.CV

Adapted Center and Scale Prediction: More Stable and More Accurate

Wenhao Wang, Jusheng Zhang

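AI summary: This paper proposes an improved method based on Center and Scale Prediction (CSP) that aims to combine the simplicity of anchor-free detectors with the accuracy of two-stage detectors. It makes CSP more robust, introduces a new width-prediction method (compressing width), achieves the second-best performance on the CityPersons benchmark, and explores capabilities of Switchable Normalization.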

Comments 14 pages, 7 figures

Abstract

Pedestrian detection benefits from deep learning technology and has developed rapidly in recent years. Most detectors follow the general object detection framework, i.e. default boxes and a two-stage process. Recently, anchor-free and one-stage detectors have been introduced into this area. However, their accuracy is unsatisfactory. Therefore, to enjoy the simplicity of anchor-free detectors and the accuracy of two-stage ones simultaneously, we propose some adaptations based on a detector, Center and Scale Prediction (CSP). The main contributions of our paper are: (1) We improve the robustness of CSP and make it easier to train. (2) We propose a novel method to predict width, namely compressing width. (3) We achieve the second-best performance on the CityPersons benchmark, i.e. 9.3% log-average miss rate (MR) on the reasonable set, 8.7% MR on the partial set and 5.6% MR on the bare set, which shows that an anchor-free, one-stage detector can still achieve high accuracy. (4) We explore some capabilities of Switchable Normalization that are not mentioned in its original paper. The code is publicly available at https://github.com/WangWenhao0716/Adapted-Center-and-Scale-Prediction.
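The 9.3%, 8.7% and 5.6% figures above are log-average miss rates (often written MR^-2). To illustrate what that single number summarizes, the sketch below takes the geometric mean of the miss rate sampled at nine FPPI (false positives per image) reference points. The points are spaced evenly in log space over [1e-2, 1], which is the convention used by Caltech-style pedestrian benchmarks. The function name is ours. So is the handling of reference points the curve never reaches, which are counted as a full miss here; the official CityPersons evaluation code may treat edge cases differently. Nothing here reproduces the paper's compressing-width method.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9, eps=1e-10):
    """Geometric mean of the miss rate at FPPI references in [1e-2, 1].

    `fppi` and `miss_rate` describe a detector's curve, sorted by
    increasing FPPI (i.e. by decreasing score threshold).
    """
    refs = [10 ** (-2.0 + 2.0 * i / (num_points - 1)) for i in range(num_points)]
    logs = []
    for ref in refs:
        # Miss rate at the largest FPPI that does not exceed the reference;
        # if the curve never gets that low, count the point as a full miss.
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
        logs.append(math.log(max(eps, mr)))  # clamp to avoid log(0)
    return math.exp(sum(logs) / len(logs))
```

For example, a curve whose miss rate stays at 0.093 across the whole FPPI range gives an MR of 0.093, i.e. 9.3%.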

1912.11333 2026-05-20 cs.SD cs.LG eess.AS

Audio-based automatic mating success prediction of giant pandas

WeiRan Yan, MaoLin Tang, Qijun Zhao, Peng Chen, Dunwu Qi, Rong Hou, Zhihe Zhang

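AI summary: This paper proposes an audio-based automatic method for predicting the mating success of giant pandas. It extracts acoustic features and classifies them with a deep neural network, to support giant panda breeding research.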

Comments The manuscript needs further revision

Abstract

Giant pandas, stereotyped as silent animals, make significantly more vocal sounds during the breeding season, suggesting that sounds are essential for coordinating their reproduction and expressing mating preference. Previous biological studies have also shown that giant panda sounds are correlated with mating results and reproduction. This paper makes the first attempt to devise an automatic method for predicting the mating success of giant pandas based on their vocal sounds. Given an audio sequence of mating giant pandas recorded during breeding encounters, we first crop out the segments containing giant panda vocalizations and normalize their magnitude and length. We then extract acoustic features from each audio segment and feed them into a deep neural network, which classifies the mating as a success or failure. The proposed deep neural network employs convolutional layers followed by bidirectional gated recurrent units to extract vocal features, and applies an attention mechanism to force the network to focus on the most relevant features. Evaluation experiments on a dataset collected over the past nine years obtain promising results, demonstrating the potential of audio-based automatic mating success prediction in assisting giant panda reproduction.
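The attention step described above can be pictured as attention pooling. Each time step of the recurrent output receives a score, and the scores are softmax-normalized into weights. The sequence is then collapsed into one weighted-average vector that a classifier head can consume. The sketch below uses the simplest dot-product scoring with a hypothetical learned vector `w`. The abstract does not specify the paper's actual attention formulation, layer sizes or acoustic features, so treat this as an illustration of the general idea only.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Collapse per-frame features (T x d) into a single d-dim vector.

    frames: list of T feature vectors, e.g. BiGRU outputs per time step.
    w: learned scoring vector of length d (hypothetical parameter).
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    alpha = softmax(scores)
    d = len(frames[0])
    # Weighted average of the frames, one output dimension at a time.
    pooled = [sum(a * h[j] for a, h in zip(alpha, frames)) for j in range(d)]
    return pooled, alpha
```

Frames that score highly dominate the pooled vector. With all-zero scores the result reduces to a plain average over time.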

2605.19020 2026-05-20 cs.CV

A Systematic Failure Analysis of Vision Foundation Models for Open Set Iris Presentation Attack Detection

Rahul Anand, Siddharth Singh, Dileep A D, Mahadeva Prasanna, Raghavendra Ramachandra

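AI summary: This paper systematically analyzes how vision foundation models perform in open-set iris presentation attack detection. It finds that they perform poorly on unseen attack instruments and under cross-spectral transfer, and underscores the need for more robust representations for iris PAD.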

Abstract

Vision foundation models have demonstrated strong transferability across diverse visual recognition tasks and are increasingly considered for biometric applications. Their suitability for iris Presentation Attack Detection (PAD), particularly under realistic open-set operating conditions, remains insufficiently examined. This work presents a systematic failure analysis of general-purpose vision foundation models for open-set iris PAD using periocular imagery. Five representative foundation models are evaluated under three open-set protocols that explicitly separate different sources of distribution shift: unseen Presentation Attack Instruments (PAIs), unseen datasets captured with different sensors and cross-spectral transfer from near-infrared (NIR) to visible spectrum (VIS) imagery. Both frozen feature representations and parameter-efficient task adaptation using Low-Rank Adaptation (LoRA) are assessed within a unified experimental framework. The results indicate that foundation models can transfer across datasets with similar sensing characteristics, but fail to generalise reliably to unseen attack instruments and degrade sharply under cross-spectral evaluation. While LoRA improves performance in certain cross-dataset settings, it frequently amplifies failure under attack-level and spectral shifts. Additional validation experiments using segmented iris inputs, full backbone fine-tuning, joint cross-dataset and cross-PAI shifts, and reverse VIS to NIR transfer further confirm that these failures are not simply artefacts of periocular input, weak adaptation, or one-directional spectral evaluation. These findings show that strong closed-set or cross-dataset performance should not be treated as evidence of robust open-set security, and highlight the need for PAD representations that maintain sensitivity to presentation artefacts while remaining stable under realistic deployment variation.

2605.19018 2026-05-20 cs.LG

LoRA vs. Full Fine-Tuning: A Theoretical Perspective

Ali Zindari, Rotem Mulayoff, Sebastian U. Stich

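AI summary: This paper compares LoRA and full fine-tuning theoretically in a linear regression setting. It identifies regimes, in both overdetermined and underdetermined settings, where LoRA achieves lower excess risk than full fine-tuning, and shows that the choice of LoRA rank affects generalization. Experiments suggest that the theoretical insights extend beyond linear regression.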

Comments Preprint

Abstract

Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving performance close to full fine-tuning. Despite its widespread use, the theoretical behavior of LoRA is not yet well understood. In this paper, we study LoRA in a simple linear regression setting and compare its excess risk with that of full fine-tuning. Our analysis identifies regimes in which LoRA achieves lower excess risk than full fine-tuning in both overdetermined and underdetermined settings. Specifically, our theory predicts that LoRA can outperform full fine-tuning when the difference between the pretraining and the downstream tasks is effectively low-rank. We further show how the choice of LoRA rank affects generalization performance, explaining why using a very small rank can improve test accuracy in certain settings, even though it limits model expressivity. Finally, we support our theoretical results with experiments on practical tasks, suggesting that the identified tradeoffs and insights extend beyond linear regression.
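For readers new to LoRA, the low-rank structure the analysis relies on fits in a few lines. The pretrained weight W0 stays frozen and only a rank-r product B·A is trained. The learned change to the weights therefore has rank at most r and costs r(d_in + d_out) parameters instead of d_in·d_out. The class below is a minimal pure-Python sketch of the standard LoRA parameterization, with B zero-initialized and the update scaled by alpha/r. The class name, initialization scale and helper function are illustrative assumptions, and the sketch does not reproduce the paper's linear-regression analysis.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Computes y = W0 x + (alpha / r) * B (A x) with W0 frozen.

    W0 is d_out x d_in, A is r x d_in (small random init) and B is
    d_out x r (zero init), so the layer initially matches W0 exactly.
    """

    def __init__(self, W0, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W0), len(W0[0])
        self.W0 = W0  # frozen pretrained weights
        self.scale = alpha / r
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.W0, x)
        delta = matvec(self.B, matvec(self.A, x))  # update of rank <= r
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Trainable parameters: r * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With d_in = d_out = 4096 and r = 8, for instance, the adapter trains 65,536 parameters, compared with about 16.8 million for the full matrix.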

2605.19014 2026-05-20 cs.LG econ.EM stat.ML

SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Hafize Gonca Cömert

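AI summary: This paper proposes SAGA, a decoder-only transformer for irregular tabular panel sequences. It is combined with a split conformal calibration wrapper that provides individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on longitudinal data from the Swedish LISA register, SAGA forecasts annual labor earnings at horizons of 1 to 30 years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Compared with the canonical Guvenen-Karahan-Ozkan-Song (GKOS) parametric process and with tabular and recurrent baselines, SAGA reduces the continuous ranked probability score by 31.9% at the 10-year horizon and mean absolute error by 37.7% at the 20-year horizon. The conformal intervals achieve nominal coverage to within 0.4 percentage points marginally, and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327, against a partially observed true value of 0.341 and a GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.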

Comments 14 pages, 3 figures, 12 tables, 5 appendices, 45 references. Submitted to IEEE TPAMI. Source code at https://github.com/olaflaitinen/saga (archived: doi:10.5281/zenodo.20260366). Synthetic equivalent dataset: doi:10.5281/zenodo.20260287. Empirical work conducted on the Swedish LISA register via SCB MONA (project SCB-MONA-2026-147); ethical approval Swedish Ethical Review Authority 2026-04127-01

Abstract

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on the longitudinal Swedish LISA register over 1990 to 2022, comprising 2,143,817 individuals and 61,284,903 person-years, the model forecasts annual labor earnings at horizons of one to thirty years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Against the canonical Guvenen, Karahan, Ozkan, and Song parametric process and tabular and recurrent baselines, SAGA reduces continuous ranked probability score by 31.9 percent at the ten-year horizon and mean absolute error by 37.7 percent at the twenty-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327 against the partially observed truth of 0.341 and the GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.
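The split conformal wrapper mentioned above follows a generic recipe. First, hold out a calibration set and compute a nonconformity score on it. Then use a finite-sample-corrected quantile of those scores to widen point forecasts into intervals with marginal coverage of at least 1 - alpha, assuming exchangeability. The sketch below applies that recipe with an absolute-residual score for a single forecast horizon. The function names are hypothetical. SAGA's actual score function, per-horizon or per-subgroup calibration, and interval construction are not described in the abstract and may differ.

```python
import math


def split_conformal_halfwidth(cal_pred, cal_true, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] covers a new exchangeable
    observation with probability >= 1 - alpha (absolute-residual score)."""
    scores = sorted(abs(y - p) for p, y in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def conformal_interval(pred, q):
    """Turn a point forecast into a symmetric prediction interval."""
    return pred - q, pred + q
```

The guarantee is marginal, i.e. averaged over individuals, and says nothing directly about any single subgroup. This is consistent with reporting worst-case subgroup coverage as a separate number.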

2605.19010 2026-05-20 cs.AI

AgentNLQ: A General-Purpose Agent for Natural Language to SQL

Olena Bogdanov, Yeunji Jung, Chandra Dhir, Pareekshitreddy Gaddam, Saurabh Jain, Lakshmi Tumati, Vijay Parthasarathy, Anup Shirgaonkar

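AI summary: This study proposes a multi-agent approach to natural-language-to-SQL conversion that reaches 78.1% semantic accuracy on the BIRD benchmark. Its contributions are an optimized multi-agent solution, an advanced schema enrichment method, and an evaluation across domains and datasets that demonstrates the method's accuracy and generalizability.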

Abstract

Natural language to SQL (NL2SQL) conversion is an important problem for researchers and enterprises due to the ubiquitous importance of relational databases in broad-ranging practical problems. Despite rapid advances in the capabilities of LLMs, NL2SQL has not reached parity in accuracy with human expert SQL writers, so NL2SQL algorithms still need improvement. This study presents a new multi-agent method for NL2SQL that achieves 78.1% semantic accuracy on the BIg Bench for LaRge-scale Database (BIRD) benchmark. Our method leverages a semantically enriched representation of user-provided schema, adds user-provided business rules, and produces accurate SQL queries. The main contributions of this study are: (a) we designed an optimized new orchestrator for a multi-agent solution that uses LLMs to plan, orchestrate, reflect, and self-correct to generate accurate SQL queries; (b) we developed an advanced schema enrichment method that creates context-aware metadata to improve accuracy; and (c) we demonstrated the accuracy and generalizability of the method across different domains and datasets by evaluating it on the BIRD-SQL benchmark.

2605.19009 2026-05-20 cs.RO cs.SY eess.SY

Adversarial Stress Testing of SPARK Humanoid Safety Filters

Saurav Ghosh, Abdou Sow, Luke Zhang

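AI summary: This paper studies the robustness of SPARK humanoid safety filters through replication and stress testing, and evaluates several methods across environments. It shows that safety behavior changes under obstacle crowding, noisy distance estimates, and delayed obstacle information, and stresses the need for evaluation metrics that expose failure modes before deployment.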

Comments 5 pages, 7 figures, 1 table. Code available at https://github.com/ghoshsaurav/spark-adversarial-safety

Abstract

Humanoid robots are difficult to deploy safely because they have high-dimensional bodies, many collision constraints, and must operate near people and obstacles. Safety filters help by modifying a nominal control action when it may violate collision-avoidance constraints. Still, nominal benchmark scores do not fully show how these filters behave in harder environments. In this work, we study the robustness of SPARK humanoid safety filters through replication and stress testing. We replicate the SPARK benchmark case G1SportMode_D1_WG_SO_v1 in MuJoCo and evaluate RSSA, RSSS, SSA, CBF, PFM, and SMA under controlled random seeds. We also built a post-processing pipeline that converts raw SPARK logs into goal-tracking, minimum-distance, and collision-step metrics. Our results show that some methods track the goal more closely, while others reduce collision steps more effectively. The stress tests further indicate that safety behavior can change under obstacle crowding, noisy distance estimates, and delayed obstacle information. These findings suggest that humanoid autonomy should be evaluated beyond nominal performance, using metrics that expose failure modes before deployment.
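The sketch below implements a textbook control barrier function (CBF) filter, to make the idea of a safety filter concrete. It assumes a point robot with velocity control and a single circular obstacle. The filter keeps the nominal action when the barrier condition holds. Otherwise it projects the action onto the nearest safe action, which is the closed-form solution of the usual one-constraint quadratic program. This is a simplified illustration of the general CBF technique (one of the methods listed above) under stated assumptions, not SPARK's implementation, which handles a high-dimensional humanoid with many collision constraints. The function name and the gain `gamma` are hypothetical.

```python
def cbf_filter(p, u_nom, center, radius, gamma=1.0):
    """Minimally modify u_nom so that h(p) = |p - c|^2 - R^2 stays >= 0
    for a single-integrator robot (dp/dt = u).

    Solves  min |u - u_nom|^2  s.t.  grad_h(p) . u >= -gamma * h(p),
    which has a closed form because there is only one linear constraint.
    Assumes p is not exactly at the obstacle center (grad_h != 0).
    """
    a = [2.0 * (pi - ci) for pi, ci in zip(p, center)]  # grad_h(p)
    h = sum((pi - ci) ** 2 for pi, ci in zip(p, center)) - radius ** 2
    b = -gamma * h
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0:
        return list(u_nom)  # nominal action already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    # Project u_nom onto the half-space {u : a . u >= b}.
    return [ui - slack / norm_sq * ai for ui, ai in zip(u_nom, a)]
```

Consider a robot at (2, 0) heading straight at a radius-1 obstacle centered at the origin with nominal velocity (-10, 0). The filter slows the approach to (-0.75, 0), exactly the rate at which the barrier condition becomes active. Motion tangent to the obstacle passes through unchanged.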

2605.19008 2026-05-20 cs.AI cs.CL cs.LG

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Anis Radianis

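AI summary: This paper proposes Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates on top of AdamW. It applies bounded control while preserving fixed training objectives, with the aim of improving the stability and efficiency of large language model training under stress.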

Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.

2605.19004 2026-05-20 cs.CV cs.LG cs.RO

EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction

Ahmad Yehia, Abduallah Mohamed, Tianyi Wang, Jiseop Byeon, Kun Qian, Junfeng Jiao, Christian Claudel

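AI summary: This paper presents EgoTraj, a dataset for multimodal prediction containing 75 sequences of human navigation in real-world urban environments. It provides synchronized RGB video and ground-truth data, including 6-DoF head poses, 3D eye-gaze vectors, and scene annotations, and demonstrates the dataset's value for AR-based perception, navigation, and assistive systems.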

Comments 21 pages, 14 figures. Project page: https://github.com/yehiahmad/EgoTraj

Abstract

Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, and scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The results highlight the utility of EgoTraj for AR-based perception, navigation, and assistive systems. The EgoTraj dataset, code, and EgoViz Dashboard are publicly available at https://github.com/yehiahmad/EgoTraj.
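Trajectory-prediction benchmarks of this kind are commonly scored with average and final displacement error (ADE/FDE). Multimodal predictors usually also get best-of-K variants of these scores. The abstract does not say which metrics EgoTraj reports, so the sketch below only assumes a typical evaluation. The function names are hypothetical, and positions are treated as 2D points at matching timestamps.

```python
import math


def ade_fde(pred, gt):
    """Average and final displacement error between two trajectories,
    each a list of (x, y) positions sampled at the same timestamps."""
    if not pred or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]  # Euclidean per step
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(samples, gt):
    """Best-of-K scores for a multimodal predictor that outputs several
    candidate trajectories; each minimum is taken independently."""
    errors = [ade_fde(s, gt) for s in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)
```

Some benchmarks instead report the FDE of the sample with the lowest ADE. The two conventions can therefore give different minFDE values.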

2605.18999 2026-05-20 cs.LG

Distance-Aware Muon: Adaptive Step Scaling for Normalized Optimization

Yury Demidovich, Abhishek Chakraborty, Grigory Malinovsky, Angelia Nedić, Peter Richtárik

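AI summary: This paper studies adaptive step-scaling rules for the Muon optimizer in general norm geometries. It proposes three complementary algorithms, Distance-Adaptive Muon, Scale-Calibrated Muon, and Distance-Free Muon, backed respectively by a stationarity guarantee, a last-iterate objective-gap bound, and a trust-region radius selection procedure. Experiments show reduced sensitivity to manual scale tuning, with results that match or improve tuned fixed-scale Muon baselines.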

Abstract

Muon and related normalized optimizers decouple the choice of update direction from the choice of step scale, but their practical performance remains sensitive to the scale of the normalized step. We study adaptive scaling rules for Muon in general norm geometries and develop three complementary algorithms. For smooth non-convex objectives, we introduce Distance-Adaptive Muon, whose trust-region radius is set from the radius explored by the trajectory, and prove a stationarity guarantee under a bounded-trajectory assumption. We then turn to star-convex objectives, a tractable model of the favorable global geometry often used to reason about the empirical loss landscapes of deep neural networks, where objective-gap guarantees are possible. In this setting, we first introduce Scale-Calibrated Muon, which keeps Muon's exponential moving average but sets the step length from a local descent certificate computed from the current gradient and momentum. For this method, we prove a last-iterate O(1/T) objective-gap bound under a bounded initial sublevel-set assumption, where the corresponding radius parameter appears only in the analysis and not in the algorithm. Finally, we develop Distance-Free Muon, a recentered trust-region method that uses a scalar distance certificate and a majorized one-dimensional search to select the trust-region radius without requiring the unknown distance from the initialization to a global minimizer. Experiments on Transformer language modeling (GPT-124M/WikiText-103) and image classification (ViT-Tiny/CIFAR-100) show that the proposed adaptive scaling rules reduce sensitivity to manual scale tuning and match or improve tuned fixed-scale Muon baselines under the tested budgets.
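Some background for the scaling rules above: Muon-style optimizers replace the momentum matrix with an approximation of its orthogonal polar factor UV^T, and choose the step scale separately. That step scale is the knob the three proposed methods adapt. The sketch below uses the classical cubic Newton-Schulz iteration for the orthogonalization and a plain fixed-scale step. Widely used Muon implementations instead run a few steps of a tuned quintic polynomial. The paper's distance-based radius rules are not reproduced because the abstract does not give their formulas. Function names are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def newton_schulz_orthogonalize(G, steps=30):
    """Approximate the orthogonal polar factor U V^T of G = U S V^T with the
    cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X."""
    fro = math.sqrt(sum(x * x for row in G for x in row))
    X = [[x / fro for x in row] for row in G]  # singular values now <= 1
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        # Each singular value s maps to 1.5 s - 0.5 s^3, which tends to 1.
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


def muon_style_step(W, momentum, lr):
    """One fixed-scale update W <- W - lr * orth(momentum)."""
    O = newton_schulz_orthogonalize(momentum)
    return [[w - lr * o for w, o in zip(rw, ro)] for rw, ro in zip(W, O)]
```

Because the orthogonalized direction has all singular values near 1, the update's size is governed entirely by `lr`. This is why the choice of step scale matters so much for normalized optimizers.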

2605.18984 2026-05-20 cs.CV

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Yuqi Tang, Yang Shi, Zhuoran Zhang, Qixun Wang, Xuehai Bai, Yue Ding, Ruizhe Chen, Bohan Zeng, Xinlong Chen, Xuanyu Zhu, Bozhou Li, Yuran Wang, Yifan Dai, Chengzhuo Tong, Xinyu Liu, Yiyan Ji, Yujie Wei, Yuhao Dong, Shilin Yan, Fengxiang Wang, Yi-Fan Zhang, Haotian Wang, Yuanxing Zhang, Pengfei Wan

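AI summary: This paper introduces Artifact-Bench, a benchmark for evaluating how well multimodal large language models detect and analyze artifacts in AI-generated videos. It reveals substantial limitations of existing models in artifact perception and reasoning.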

Abstract

Recent video generative models have greatly improved the realism of AI-generated videos, yet their outputs still exhibit artifacts such as temporal inconsistencies, structural distortions, and semantic incoherence. While Multimodal Large Language Models (MLLMs) show strong visual understanding capabilities, their ability to perceive and reason about such artifacts remains unclear. Existing benchmarks often lack systematic evaluation of artifact-aware perception and fine-grained diagnostic reasoning, especially across diverse AI-generated video domains beyond photorealistic content. To address this gap, we introduce Artifact-Bench, a comprehensive benchmark for evaluating MLLMs on AI-generated video artifact detection and analysis. We first establish a three-level hierarchical taxonomy of realism artifacts, covering photorealistic, animated, and CG-style videos. Based on this taxonomy, Artifact-Bench defines three complementary tasks: real vs. AI-generated video classification, pairwise realism comparison, and fine-grained artifact identification. Experiments on 19 leading MLLMs reveal substantial limitations in artifact perception and reasoning, with many models approaching random or even below-random performance in challenging settings. We further observe significant misalignment between MLLM judgments and human perceptual preferences, highlighting their limited reliability as general evaluators for AI-generated video realism.

2605.18979 2026-05-20 cs.LG

TabQL: In-Context Q-Learning with Tabular Foundation Models

Qisai Liu, Zhanhong Jiang, Timilehin Ayanlade, Ashutosh Kumar Nirala, Yang Li, Aditya Balu, Soumik Sarkar

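AI summary: This paper proposes TabQL, a reinforcement learning framework built on tabular foundation models. It replaces the conventional parametric Q-network with in-context learning, aiming to make Q-value representation more adaptive and efficient.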

Abstract

We propose Tabular Q-Learning (TabQL), a reinforcement learning framework that replaces the conventional parametric Q-network in Deep Q-Learning (DQN) with a tabular foundation model endowed with in-context learning capabilities. The key idea is to represent Q-values through a sequence-to-sequence foundation model operating over a tabularized representation of state-action-Q-value tuples, enabling rapid adaptation from limited online interaction by conditioning on recent experience. TabQL departs from classical DQN by leveraging (i) zero- or few-shot Q-value inference via in-context updates, and (ii) a warm-up phase using standard DQN to bootstrap high-quality context. In particular, to enhance context quality, new transitions are generated by executing the actions output by TabQL, with Q-values predicted by DQN. We formalize TabQL, analyze its convergence and sample complexity under mild assumptions, and show that TabQL interpolates between vanilla Q-learning and DQN with in-context learning. Our analysis demonstrates that TabQL achieves improved efficiency compared to DQN by amortizing Bellman updates through in-context learning. Extensive numerical experiments on several benchmarks demonstrate the effectiveness of the proposed TabQL.

2605.18974 2026-05-20 cs.CV cs.AI cs.MM

Harnessing Self-Supervised Features for Art Classification

Federico Melis, Davide Bilardello, Emanuele Prato, Evelyn Turri, Lorenzo Baraldi

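AI summary: This paper studies the effectiveness of supervised and self-supervised backbones as feature extractors for artwork classification and retrieval, focusing on paintings. Experiments with the DINO family and CLIP models show that self-supervised backbones yield consistent gains in artwork classification. The work also offers insights for real-world applications such as museum navigation in virtual reality.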

Comments IRCDL 2026

Abstract

Classifying artworks presents a significant challenge due to the complex interplay of fine-grained details and abstract features that condition the style or genre of an artwork. This paper presents a systematic investigation of the effectiveness of supervised and self-supervised backbones as feature extractors for both artwork classification and retrieval, with a particular focus on paintings. We conduct an extensive experimental evaluation using the DINO family and CLIP models, assessing multiple classification strategies and feature representations. Our results demonstrate that employing a self-supervised backbone leads to consistent improvements in artwork classification performance. Moreover, our work provides insights into the applicability of classification and retrieval modules in real-world applications, such as virtual reality (VR) applications that support museum navigation.

2605.18971 2026-05-20 cs.LG cs.AI

Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality

Mohamed Bouadi, Nassim Bouarour, Varun Kulkarni, Shivam Dubey, Aditya Tanna, Vinay Kumar Sankarapu

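AI summary: This paper studies how synthetic task distributions affect the quality of tabular foundation models. It proposes O'Prior, which builds a more realistic prior from four coupled components and improves downstream accuracy and robustness.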

Abstract

What determines the quality of a tabular foundation model? Unlike language or vision, tabular foundation models acquire their inductive biases almost entirely from synthetic pretraining distributions, yet the design of these distributions remains poorly understood. Standard synthetic priors are too well-behaved: they omit the irregularities and failure modes that determine deployment robustness. We introduce O'Prior, a compositional realism prior built around four coupled components: a hierarchical SCM meta-generator spanning diverse functional families; a modular realism engine covering heterogeneous marginals, missingness, and target transforms; an explicit stress module injecting confounding and support-query mismatch; and a curriculum-governed, leakage-safe generation protocol. To isolate prior design as the scientific variable, we hold architecture, optimizer, and compute budget fixed and vary only the synthetic task distribution. O'Prior yields consistent and substantial improvements in downstream accuracy and robustness across real tabular benchmarks, with gains concentrated in regimes characterized by distributional irregularities. Ablations confirm that mechanism diversity, realism composition, and shift-aware stress each contribute independently, and their effects are not interchangeable. These results establish synthetic prior construction as a first-order and largely overlooked determinant of tabular foundation model quality.

2605.18956 2026-05-20 cs.CV

MotionMERGE: A Multi-granular Framework for Human Motion Editing, Reasoning, Generation, and Explanation

Bizhu Wu, Jinheng Xie, Wenting Chen, Zhe Kong, Jianfeng Ren, Linlin Shen, Ruibin Bai, Rong Qu

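AI summary: This paper proposes MotionMERGE, a framework that combines fine-grained language-guided motion control, cross-granularity synergy pre-training, and fine-grained motion-language alignment. Together these enable more precise motion generation, understanding, and editing. The work also establishes a new benchmark for fine-grained text-driven motion editing and motion-grounded reasoning.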

English abstract

Recent motion-language models unify tasks like comprehension and generation but operate at a coarse granularity, lacking the fine-grained understanding and nuanced control over body parts needed for animation or interaction. This stems from fundamental issues in both the model and the data: the model cannot focus on localized motion patterns, and the training data lacks fine-grained supervision. To tackle this, we propose MotionMERGE, a unified framework that bridges the granularity gap. First, we pioneer the study of fine-grained language-guided motion control, including detailed understanding and localized editing, by explicitly modeling motion at part and temporal levels within a single LLM, thereby endowing the model with robust priors for precise control. Second, we design Reasoning-Aware Granularity-Synergy pre-training, a novel strategy that employs joint supervision for cross-granularity alignment, temporal grounding, localized alignment, motion coherency, and motion-grounded chain-of-thought (CoT) reasoning. This equips the model with fine-grained motion-language alignment, cross-granularity synergy, and explicit reasoning ability. Third, we curate MotionFineEdit, a large-scale dataset (837K atomic + 144K complex triplets) with the first fine-grained spatio-temporal corrective instructions and motion-grounded CoT annotations, establishing a new benchmark for fine-grained text-driven motion editing and motion-grounded reasoning. Extensive experiments demonstrate the capability of MotionMERGE for more precise motion generation, understanding, and editing, and compelling zero-shot generalization to other complex motion tasks. This work represents a significant step toward models that interact with motion at finer granularity and with human-like reasoning.

2605.18933 2026-05-20 cs.LG

A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

对ReLU + RMSNorm块在三元量化下的符号幅度不对称性进行几何分析

Lei Dong

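AI summary: Using a sign-magnitude decomposition, this paper explains the sign-magnitude asymmetry of a ReLU + RMSNorm block under ternary quantization. It identifies the geometric mechanism through which ReLU and RMSNorm respond to weight perturbations, and experimentally verifies how this asymmetry manifests in real models.

Comments 53 pages, 2 figures, 21 tables, 7 appendices

AI Chinese abstract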

预归一化变换器使用RMSNorm可以容忍三元{-1,0,+1}权重量化,其损失出人意料的小(Ma等人,2024)。我们通过符号幅度分解给出了几何解释。在具有独立同分布高斯权重的两层ReLU + RMSNorm模型中,符号翻转产生的横向输出能量是符号保持幅度扰动的π/(π-2)≈2.75倍,当翻转率p→0时(定理3)。机制:ReLU在两种扰动类型之间创建了隐藏空间的方向不对称性,RMSNorm的横向投影Fréchet导数选择性地暴露了这种不对称性。符号量化误差本身是一种符号保持的扰动,具有角度对齐cos²→2/π(定理4);其后ReLU径向分数(0.365)与前ReLU值1-2/π在0.4%内一致,因此ReLU对三元误差几乎是透明的。多层叠加的2.75倍因子未被实验支持;与真实模型符号敏感性之间的差距源于异常特征违反去局部化。对于幅度为α的输入维度,单个符号翻转产生的后ReLU能量放大约为R≈nα²,相对于去局部化的条目。在TinyLlama-1.1B上,线性响应(p≤0.5%)下,计数匹配的NLL利用稳定在约10×≈nE[α²],与每条目理论一致;所有列NLL比率为5.0×,在R_col≤19内(67×PPL差距反映了度量非线性)。测量的异常α在第12层(中位数0.024,最大0.26)确认了重尾浓度。Bussgang常数2/π、RMSNorm几何和ReLU半空间结构共同解释了预归一化模型中的符号幅度不对称性,R≈nα²解释了真实模型的偏差。

English abstract

Pre-norm Transformers with RMSNorm tolerate ternary $\{-1,0,+1\}$ weight quantization with surprisingly small loss (Ma et al., 2024). We give a geometric explanation via sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with i.i.d. Gaussian weights, sign-flips produce $\pi/(\pi-2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, as the flip rate $p \to 0$ (Theorem 3). The mechanism: ReLU creates a hidden-space directional asymmetry between the two perturbation types, which RMSNorm's transverse-projection Fréchet derivative selectively exposes. Sign-quantization error is itself a sign-preserving perturbation with angular alignment $\cos^2 \to 2/\pi$ (Theorem 4); its post-ReLU radial fraction ($0.365$) matches the pre-ReLU value $1-2/\pi$ within $0.4\%$, so ReLU is approximately transparent to ternary error. Multi-layer compounding of the $2.75\times$ factor is not experimentally supported; the gap to real-model sign sensitivity arises from outlier features violating delocalization. For an input dimension with amplitude $\alpha$, a single sign-flip produces post-ReLU energy amplified by $R \approx n\alpha^2$ relative to a delocalized entry. On TinyLlama-1.1B, at linear response ($p \leq 0.5\%$), count-matched NLL leverage stabilizes at $\sim 10\times \approx n\mathbb{E}[\alpha^2]$, matching the per-entry theory; the all-column NLL ratio of $5.0\times$ falls within $R_{\mathrm{col}} \leq 19$ (the $67\times$ PPL gap reflects metric nonlinearity). Measured outlier $\alpha$ at layer 12 (median $0.024$, max $0.26$) confirms heavy-tailed concentration. The Bussgang constant $2/\pi$, RMSNorm geometry, and ReLU half-space structure together explain sign-magnitude asymmetry in pre-norm models, with $R \propto n\alpha^2$ accounting for real-model deviations.
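The constants 2/π and 1 - 2/π can be checked numerically. The sketch below simplifies the paper's setting. It assumes i.i.d. Gaussian weights and a binary sign quantizer with a least-squares scale as a stand-in for ternary quantization. It also reads "angular alignment" as cos² between w and sign(w). The function name `sign_quant_geometry` is hypothetical.

```python
import numpy as np

def sign_quant_geometry(n: int, seed: int = 0) -> tuple[float, float]:
    """Return (cos^2 between w and sign(w), radial fraction of the quantization error)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)            # i.i.d. Gaussian weights (assumption)
    s = np.sign(w)                        # sign pattern; exact zeros have probability 0
    scale = (w @ s) / n                   # least-squares scale for s, equal to mean |w|
    cos2 = (w @ s) ** 2 / ((w @ w) * (s @ s))
    e = w - scale * s                     # error; w - e = scale * s keeps every sign of w
    radial = (e @ w) ** 2 / ((e @ e) * (w @ w))   # share of the error along w
    return float(cos2), float(radial)

cos2, radial = sign_quant_geometry(1_000_000)
print(f"cos^2  = {cos2:.4f}   (2/pi     = {2 / np.pi:.4f})")
print(f"radial = {radial:.4f}   (1 - 2/pi = {1 - 2 / np.pi:.4f})")
```

The least-squares error is orthogonal to sign(w), so its radial fraction equals 1 - cos² exactly. As n grows, the two printed values therefore settle near 0.637 and 0.363.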

2605.18921 2026-05-20 cs.RO

Geo-Data-Driven HD Map Generation Workflow with Integrated Reference-Free Constraint-Based Verification

基于地理数据的高精地图生成工作流与集成的无参考约束验证

Ruidi He, Vaibhav Tiwari, Mohanad Al-Ghobari, Meng Zhang, Andreas Rausch

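AI summary: This paper proposes a geo-data-driven HD map generation workflow with integrated reference-free, constraint-based verification. The workflow reduces reliance on high-precision reference data, making it more practical where specialised survey data or independent reference maps are unavailable.

AI Chinese abstract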

高精地图是自动驾驶系统的核心构件,但其生成通常依赖于传感器密集的移动测绘任务,而质量评估往往依赖于高精度参考数据。这些依赖性使得高精地图工程成本高且难以在缺乏专门测量数据或独立测量参考地图的环境中应用。本文提出了一种面向工程的基于地理数据的工作流,用于高精地图生成,并集成了表示层面的验证。该工作流使用公开可用的地理工程数据集作为主要输入源,并通过显式的中间表示和处理阶段,将它们转换为现有道路环境的车道级高精地图表示。为了在没有外部参考地图的情况下评估生成的表示,该工作流在工程过程中集成了可执行的基于约束的验证。所选约束来自与自动驾驶和道路设计指南相关的规范。它们直接在生成的车道let表示上进行评估,以检测几何、拓扑和高程相关的一致性问题。该工作流使用来自德国下萨克森州四个城市的基于真实世界shapefile的道路网络数据,并结合受控缺陷注入场景进行评估。真实世界评估显示,生成的地图表示在评估场景中满足所选约束,而缺陷注入研究证明了对所考虑缺陷类型的完全检测,没有观察到假阳性。结果表明,集成可执行验证的基于地理数据的高精地图生成可以在减少传感和参考数据可用性的情况下,为传感器密集的测绘工作流提供模块化和可检查的补充。

English abstract

High-definition (HD) maps are core artifacts for automated driving systems, but their generation commonly relies on sensor-intensive mobile mapping campaigns, while quality assessment often depends on high-precision reference data. These dependencies make HD map engineering costly and difficult to apply in settings where specialised measurement data or independently measured reference maps are unavailable. This paper presents an engineering-oriented geo-data-driven workflow for HD map generation with integrated representation-level verification. The workflow uses openly available geo-engineering datasets as the primary input source and transforms them into lane-level HD map representations of existing road environments through explicit intermediate representations and processing stages. To assess the generated representations without external reference maps, the workflow integrates executable constraint-based verification into the engineering process. Selected constraints are derived from specifications relevant to automated driving and road-design guidelines. They are evaluated directly on the generated lanelet-based representation to detect geometric, topological, and elevation-related inconsistencies. The workflow is evaluated using real-world shapefile-based road-network data from four cities in Lower Saxony, Germany, and controlled defect-injection scenarios. The real-world evaluation shows that the generated map representations satisfy the selected constraints in the evaluated scenarios, while the defect-injection study demonstrates complete detection of the considered defect types without observed false positives. The results indicate that geo-data-driven HD map generation with integrated executable verification can provide a modular and inspectable complement to sensor-intensive mapping workflows under reduced sensing and reference-data availability.
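The verification idea (constraints evaluated directly on the generated lanelets, tested by defect injection) can be illustrated with a small checker. This is a sketch under stated assumptions, not the paper's constraint set. The `Lanelet` structure, the rule names, and all thresholds (lane width, maximum grade, gap tolerance) are hypothetical. It checks one geometric, one elevation-related, and one topological constraint.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float, float]          # (x, y, z) in metres

@dataclass
class Lanelet:
    id: str
    left: list[Point]                       # left boundary polyline
    right: list[Point]                      # right boundary, sampled at the same stations
    successors: list[str] = field(default_factory=list)

def verify(lanelets: dict[str, Lanelet], min_width=2.5, max_width=4.5,
           max_grade=0.08, gap_tol=0.05) -> list[tuple[str, str]]:
    """Return (lanelet_id, constraint) for every violated constraint; thresholds are illustrative."""
    issues = []
    for ll in lanelets.values():
        # Geometric: horizontal lane width at every station within [min_width, max_width].
        widths = [math.dist(p[:2], q[:2]) for p, q in zip(ll.left, ll.right)]
        if any(not min_width <= w <= max_width for w in widths):
            issues.append((ll.id, "width"))
        # Elevation: longitudinal grade of the centreline below max_grade.
        centre = [tuple((u + v) / 2 for u, v in zip(p, q)) for p, q in zip(ll.left, ll.right)]
        for p0, p1 in zip(centre, centre[1:]):
            run = math.dist(p0[:2], p1[:2])
            if run > 0 and abs(p1[2] - p0[2]) / run > max_grade:
                issues.append((ll.id, "grade"))
                break
        # Topology: every successor exists and starts where this lanelet ends.
        for sid in ll.successors:
            nxt = lanelets.get(sid)
            if nxt is None:
                issues.append((ll.id, "dangling_successor"))
            elif (math.dist(ll.left[-1], nxt.left[0]) > gap_tol
                  or math.dist(ll.right[-1], nxt.right[0]) > gap_tol):
                issues.append((ll.id, "gap"))
    return issues

# A clean two-lanelet road and a copy with an injected 0.3 m gap (defect injection).
la = Lanelet("a", [(0, 3.5, 0), (10, 3.5, 0)], [(0, 0, 0), (10, 0, 0)], ["b"])
lb = Lanelet("b", [(10, 3.5, 0), (20, 3.5, 0.5)], [(10, 0, 0), (20, 0, 0.5)])
lb_bad = Lanelet("b", [(10.3, 3.5, 0), (20, 3.5, 0.5)], [(10.3, 0, 0), (20, 0, 0.5)])
print(verify({"a": la, "b": lb}))       # []
print(verify({"a": la, "b": lb_bad}))   # [('a', 'gap')]
```

Because the checks need only the generated representation, no external reference map is involved. Each injected defect should surface as exactly one named violation.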

2605.18905 2026-05-20 cs.LG cs.AI cs.NA cs.NE math.NA

Stability and Discretization Error of State Space Model Neural Operators

状态空间模型神经算子的稳定性与离散化误差

Abderrahim Bendahi, Adrien Fradin, Johan Peralez, Julie Digne, Madiha Nadri

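AI summary: This paper studies the stability and discretization error of state space model neural operators. It establishes theoretical discretization-error and stability guarantees for neural operator approximation schemes and proposes new discretization error theorems for SS-NOs and FNOs. Experiments validate the robustness of these models across resolutions.

AI Chinese abstract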

神经算子已作为一种强大的、与离散化无关的框架,用于求解偏微分方程(PDEs)。尽管已建立的方法如深度运算网络(DeepONet)已成功实现了运算符的通用逼近,而如傅里叶神经算子(FNOs)等架构已显示出代数收敛速率,但连续理论与其离散数值实现之间的精确理论联系仍是一个挑战。具体来说,连续公式与离散数值稳定性之间的关系尚未被充分探索。在本文中,我们通过建立神经算子近似方案的离散误差和稳定性的理论保证来填补这一空白。我们证明了将解的正则性与输入离散化联系起来的分析界,提供了在现实数值约束下神经算子精度的正式量化。我们为SS-NOs和FNOs的具体情况推导了这些界,从而为这些模型提出了新的离散误差定理。此外,通过输入到状态稳定性(ISS)分析,我们正式评估了离散化对连续域中SS-NOs结果稳定性的影响。我们在1D和2D基准上的实验证实了我们的理论界,并展示了SS-NOs在不同分辨率下的鲁棒性。

English abstract

Neural operators have emerged as a powerful, discretization-invariant framework for solving partial differential equations (PDEs). Although established approaches like the Deep Operator Network (DeepONet) have achieved universal approximation for operators, and architectures such as Fourier Neural Operators (FNOs) have shown algebraic convergence rates, a precise theoretical connection between the continuous theory and its discrete numerical implementation remains a challenge. Specifically, the relationship between the continuous formulation and discrete numerical stability has yet to be fully explored. In this paper, we address this gap by establishing theoretical guarantees for the discretization error and stability of neural operator approximation schemes. We prove analytical bounds that link solution regularity to input discretization, providing a formal quantification of neural operator accuracy under real-world numerical constraints. We specialize these bounds to State Space Model-based Neural Operators (SS-NOs) and FNOs, yielding new discretization error theorems for these models. Additionally, through an input-to-state stability (ISS) analysis, we formally assess how discretization affects the stability results for SS-NOs obtained in the continuous domain. Our experiments on 1D and 2D benchmarks validate our theoretical bounds and show the robustness of SS-NOs under varying resolutions.
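A toy example makes the two properties concrete. It uses a generic diagonal linear SSM with zero-order-hold (ZOH) discretization, not the SS-NO architecture analysed in the paper. The poles, input signal, and function name are illustrative assumptions. It shows two things:

- In this setting, a stable continuous-time system stays stable after discretization at any step size.
- The output computed from inputs sampled on coarser or finer grids converges as the grid is refined.

```python
import numpy as np

def ssm_final_output(lam, b, c, u_fn, n_steps, T=1.0):
    """Zero-order-hold scan of x' = lam*x + b*u(t); returns y(T) = Re(c . x(T))."""
    dt = T / n_steps
    a_bar = np.exp(lam * dt)           # |a_bar| < 1 whenever Re(lam) < 0, at any dt
    b_bar = (a_bar - 1.0) / lam * b    # exact ZOH input gain for a diagonal state matrix
    x = np.zeros(lam.shape, dtype=complex)
    for k in range(n_steps):
        x = a_bar * x + b_bar * u_fn(k * dt)   # input held constant on [k*dt, (k+1)*dt)
    return float(np.real(c @ x))

lam = np.array([-1.0 + 2.0j, -0.5 + 5.0j, -2.0 + 0.0j])   # stable continuous-time poles
b = np.ones(3)
c = np.array([1.0, 0.5, -0.3])
u = lambda t: np.sin(2 * np.pi * t)

ref = ssm_final_output(lam, b, c, u, 8192)                  # fine-grid reference
errors = {n: abs(ssm_final_output(lam, b, c, u, n) - ref) for n in (16, 64, 256, 1024)}
print(errors)
```

With a piecewise-constant input, the error here shrinks roughly in proportion to the step size. The paper's bounds instead tie the error to solution regularity in general.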

2605.18904 2026-05-20 cs.LG cs.AI cs.CL

Dynamic Model Merging Made Slim

动态模型合并的轻量级方法

Guodong Du, Wanyu Lin

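AI summary: This paper proposes DiDi-Merging, which balances shared and expert parameters through differentiable rank allocation. The result is more efficient dynamic model merging that needs substantially fewer parameters than existing methods.

AI Chinese abstract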

模型合并使在不联合训练或访问原始数据的情况下重用微调模型成为可能。动态合并进一步通过选择性激活任务相关参数并高效组合多个任务的专家来提高灵活性。然而,现有动态方法要么维护一个完整的共享模型加小专家,要么为专家分配过多容量,导致准确性与效率之间的权衡不优。为此,我们提出DiDi-Merging,一种轻量动态合并框架,利用可微分的秩分配来平衡共享和专家参数。通过将参数预算分配建模为低秩模块中的可微分秩优化,并引入无需数据的细化步骤来恢复任务保真度,DiDi-Merging在仅1.24倍单个微调模型参数的情况下匹配现有动态基线,并在1.4倍时超越它们,显著优于需要>2倍存储容量的方法。DiDi-Merging适用于视觉、语言和多模态任务。

English abstract

Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively activating task-relevant parameters and efficiently composing experts across multiple tasks. However, existing dynamic methods either maintain a full shared model with tiny experts or allocate excessive capacity to experts, leading to suboptimal accuracy-efficiency trade-offs. To address this, we propose DiDi-Merging, a slim dynamic merging framework that leverages differentiable rank allocation to balance shared and expert parameters. By formulating parameter budgeting as differentiable rank optimization in low-rank modules and introducing a data-free refinement step to recover task fidelity, DiDi-Merging matches prior dynamic baselines at only 1.24x the parameters of a single fine-tuned model and surpasses them at 1.4x, substantially more compact than methods requiring >2x storage. DiDi-Merging applies across vision, language, and multimodal tasks.
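To make the parameter-budget figures concrete, here is a sketch of the structure being budgeted: one shared matrix plus per-task low-rank experts. It is not DiDi-Merging's method. The ranks are fixed by hand rather than learned through differentiable rank allocation, and the shared model is a plain average of the fine-tuned weights. The function names and matrix shapes are hypothetical.

```python
import numpy as np

def shared_plus_experts(base, finetuned, ranks):
    """Split several fine-tuned weight matrices into one shared matrix plus
    per-task rank-r expert factors (ranks chosen by hand, not learned)."""
    deltas = [w - base for w in finetuned]                 # task vectors
    shared = base + np.mean(deltas, axis=0)                # average of the fine-tuned weights
    experts = []
    for w, r in zip(finetuned, ranks):
        u, s, vt = np.linalg.svd(w - shared, full_matrices=False)
        experts.append((u[:, :r] * s[:r], vt[:r]))         # A: (m, r), B: (r, n)
    return shared, experts

def reconstruct(shared, expert):
    a, b = expert
    return shared + a @ b

def param_ratio(shape, ranks):
    """Stored parameters relative to a single dense (m, n) matrix."""
    m, n = shape
    return (m * n + sum(r * (m + n) for r in ranks)) / (m * n)

rng = np.random.default_rng(0)
base = rng.standard_normal((64, 48))
tuned = [base + 0.1 * rng.standard_normal((64, 48)) for _ in range(3)]
for r in (2, 8, 48):
    shared, experts = shared_plus_experts(base, tuned, ranks=[r] * 3)
    err = np.linalg.norm(reconstruct(shared, experts[0]) - tuned[0])
    print(r, round(float(err), 4), round(param_ratio((64, 48), [r] * 3), 3))
```

The loop prints, for each rank, the reconstruction error and the stored-parameter ratio. Raising the rank lowers the error but increases the ratio. This is the trade-off that a learned, per-module rank allocation is meant to navigate.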

2605.18903 2026-05-20 cs.LG cs.CV

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era

推理可移植性:引导MLLMs在RLVR时代的持续学习

Qiuhe Hong, Yuyang Liu, Shuo Yang, Tiantian Peng, Fei Zhu, Yonghong Tian

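AI summary: This paper proposes Reasoning Portability (RP), a mechanism that introduces reasoning-level constraints into continual learning and improves how multimodal large language models adapt under RLVR. Experiments show that RDB-CL outperforms baseline methods in Last accuracy.

AI Chinese abstract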

在持续学习中,视觉-语言模型(VLM-CL)旨在不断适应新多模态任务的同时保留先前知识。新兴的将多模态大语言模型(MLLMs)与具有可验证奖励的强化学习(RLVR)相结合的范式,要求一种新的模式来引导持续适应。随着推理能力的进步,现在可以在推理层面施加约束。我们正式化了可移植性,即一个样本级别的度量,用于衡量先前策略行为在新任务中的可重用性,并实证表明推理层面的信号在分布外样本上仍可靠,而答案层面的信号则不然。我们将此形式化为推理可移植性(RP),并提出基于推理的动态平衡持续学习(RDB-CL),该方法根据RP调节RLVR中的每样本Kullback-Leibler正则化:一个紧密的锚点在高RP样本上保留可重用的推理,而低RP样本上的放松锚点则允许探索新的推理路径。实验表明,RDB-CL在提升最后准确率方面优于基线方法,相比 vanilla RLVR 基线提升了+12.0%。

English abstract

Vision-Language Models in Continual Learning (VLM-CL) aim to continuously adapt to new multimodal tasks while retaining prior knowledge. The emerging paradigm that couples Multimodal Large Language Models (MLLMs) with Reinforcement Learning with Verifiable Rewards (RLVR) calls for a new pattern to guide continual adaptation. Advances in reasoning capability now make it feasible to impose constraints at the reasoning level. We formalize portability, a sample-level measure of how reusable the previous policy's behavior is on a new task, and empirically show that reasoning-level signals remain reliable on out-of-distribution samples while answer-level signals do not. We instantiate this as Reasoning Portability (RP) and propose Reasoning-based Dynamic Balance Continual Learning (RDB-CL), which modulates the per-sample Kullback-Leibler regularization in RLVR according to RP: a tight anchor preserves reusable reasoning on high-RP samples, while a relaxed anchor on low-RP samples permits exploration of new reasoning pathways. Experiments show that RDB-CL consistently outperforms baselines, improving Last accuracy by +12.0% over the vanilla RLVR baseline.
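The RP-modulated anchor can be sketched as a per-sample KL coefficient. Two parts are assumptions, not the paper's definition: the linear map from RP to the coefficient, and the values `beta_min` and `beta_max`. The toy three-way distributions also stand in for full sequence-level policies.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-12):
    """Row-wise KL(p || q) between categorical distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def rp_weighted_kl(pi_new, pi_ref, rp, beta_min=0.01, beta_max=0.2):
    """Per-sample KL penalty whose coefficient grows linearly with reasoning portability."""
    rp = np.clip(np.asarray(rp, dtype=float), 0.0, 1.0)
    beta = beta_min + (beta_max - beta_min) * rp    # tight anchor for high RP, loose for low RP
    return beta * kl_categorical(pi_new, pi_ref)

pi_ref = np.array([[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]])
pi_new = np.array([[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]])
penalty = rp_weighted_kl(pi_new, pi_ref, rp=[0.9, 0.1])
print(penalty)
```

Both samples drift equally from the reference policy. Only the high-RP sample, however, is pulled back strongly; the low-RP sample is left freer to explore.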

2605.18899 2026-05-20 cs.LG cs.AI

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

不要让多臂老虎机反馈将连续LLM推荐系统更新偏离目标

Taesan Kim, Hyeongjun Yun, Jaegul Choo, Chung Park

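AI summary: This paper proposes Anchored Bandit Policy Optimization (ABPO), a framework for continually improving generative LLM-based recommenders. ABPO combines group-relative policy optimization (GRPO) with explicit handling of exposure bias and feedback ambiguity. This reduces the bias introduced by the policy-shaped contextual bandit feedback in deployment logs and improves recommendation accuracy.

AI Chinese abstract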

基于生成式大语言模型的推荐系统(LLM-Rec)需要持续部署后的更新,但部署日志仅提供策略形状的上下文老虎机反馈:结果仅在由先前服务策略暴露的项目上被观察到,导致曝光偏差,并产生部分、不对称的信号,包括相对可靠的积极响应和模糊的无响应。我们提出了一种连续LLM-Rec更新的Anchored Bandit Policy Optimization(ABPO)框架,结合组内相对策略优化(GRPO)与显式处理曝光偏差和反馈模糊性。具体来说,我们将在每个GRPO滚动组中插入暴露的推荐作为记录的锚点,使组内相对归一化能够针对先前策略实际暴露的动作进行校准,而不是仅针对新采样的滚动。因为正响应和无响应仅通过先前策略暴露被观察到,我们对固定锚点应用自归一化逆倾向评分,以校正策略不匹配。同时,我们将两种反馈类型进行不对称处理:正响应提供相对直接的推荐信号,而无响应仍然模糊,因为它们可能反映真正的不感兴趣或未观察到的外部因素。为了避免因模糊的无响应而过于激进的更新,我们用模型输出标记的置信度来削弱其惩罚,作为无监督的可靠性信号。在Amazon Reviews和MovieLens的五个领域中,我们的方法在推荐准确性上产生了持续的更新收益,同时比先前的基线方法更有效地缓解了先前策略引起的曝光偏差。

English abstract

Generative LLM-based recommenders (LLM-Rec) require continual post-deployment updates, yet deployment logs provide only policy-shaped contextual bandit feedback: outcomes are observed solely for items exposed by a prior serving policy, inducing exposure bias and yielding partial, asymmetric signals consisting of relatively reliable positive responses and ambiguous no-responses. We propose an Anchored Bandit Policy Optimization (ABPO) framework for continual LLM-Rec updates that combines group-relative policy optimization (GRPO) with explicit treatment of exposure bias and feedback ambiguity. Specifically, we insert the exposed recommendation as a logged anchor into each GRPO rollout group, so that group-relative normalization is calibrated against the action actually exposed by the prior policy rather than against newly sampled rollouts alone. Because both positive- and no-responses are observed only through prior-policy exposure, we apply self-normalized inverse propensity scoring to the fixed anchor for both feedback types to correct for policy mismatch. At the same time, we treat the two feedback types asymmetrically in reliability: positive responses provide relatively direct endorsement signals, whereas no-responses remain ambiguous because they may reflect either true disinterest or unobserved external factors. To avoid overly aggressive updates from ambiguous no-responses, we temper their penalties with self-certainty, using the model's output-token confidence as a verifier-free reliability signal. Across five domains from Amazon Reviews and MovieLens, our method yields consistent post-update gains in recommendation accuracy while mitigating prior-policy-induced exposure bias more effectively than prior baselines.
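The three ingredients can be sketched separately:

- A logged anchor inserted into the GRPO group.
- Self-normalized importance weights for that anchor.
- A certainty-tempered penalty for no-responses.

This is a hedged reading of the abstract, not the authors' code. The reward values, the choice to importance-weight only the anchor's advantage, and all function names are assumptions.

```python
import numpy as np

def anchor_reward(positive: bool, certainty: float, penalty: float = 1.0) -> float:
    """Reward for the logged anchor: full credit for a positive response, a
    certainty-tempered penalty for an ambiguous no-response."""
    return 1.0 if positive else -penalty * float(np.clip(certainty, 0.0, 1.0))

def snips_weights(pi_new, pi_logged):
    """Self-normalized importance weights for the logged anchors in a batch."""
    rho = np.asarray(pi_new, dtype=float) / np.asarray(pi_logged, dtype=float)
    return rho / rho.mean()

def anchored_advantages(rollout_rewards, anchor_r, anchor_w, eps=1e-8):
    """Group-relative (GRPO-style) advantages with the logged anchor inserted into the group."""
    group = np.append(np.asarray(rollout_rewards, dtype=float), anchor_r)
    adv = (group - group.mean()) / (group.std() + eps)
    return adv[:-1], anchor_w * adv[-1]     # only the anchor term is importance-weighted

w = snips_weights([0.02, 0.10, 0.05], [0.04, 0.05, 0.05])   # current vs logging policy probs
roll_adv, anchor_adv = anchored_advantages([0.0, 0.0, 1.0], anchor_reward(True, 0.9), w[0])
print(w.round(3), roll_adv.round(3), round(float(anchor_adv), 3))
```

Placing the anchor inside the group means each rollout's advantage is measured against the item that was actually shown. It is no longer measured against other fresh samples alone.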

2605.18895 2026-05-20 cs.RO cs.AI

KG-ASG: Collision-Knowledge-Guided Closed-Loop Adversarial Scenario Generation With Primary-Support Attribution

KG-ASG: 基于碰撞知识的闭环对抗场景生成与主支持属性

Cheng Wang, Chen Xiong, Ziwen Wang, Yuchen Zhou, Qiang Liu

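AI summary: This paper proposes KG-ASG, a framework that uses collision-knowledge guidance and primary-support attribution. It improves the adversarial effectiveness, interpretability, and executability of safety validation for autonomous driving systems.

AI Chinese abstract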

自动驾驶系统安全验证需要高风险场景覆盖、清晰的碰撞语义、可执行轨迹和可追溯的多车辆交互。现有安全关键场景生成方法通常依赖低级轨迹扰动、碰撞代理优化或单对抗者搜索,可能产生具有模糊碰撞原因或不可控多车辆碰撞的对抗样本。本文提出KG-ASG,一种基于碰撞知识的闭环对抗场景生成框架,具有主支持属性。KG-ASG构建了结构化的碰撞知识库,并训练了一个轻量级的碰撞专家来推断目标碰撞模式、唯一的主对抗者、支持车辆及其交互角色。在该语义先验的引导下,多车辆对抗生成被公式化为主支持过程,其中主对抗者引发主要冲突,支持车辆塑造周围风险结构,而不会成为额外碰撞者。规则、物理、交互安全性和单碰撞器约束被作为硬门来过滤不可执行的样本。为处理反应性驾驶者行为,进一步使用规划器-控制器反馈进行故障诊断、候选重新排序和终端细化。在MetaDrive中重建的WOMD场景上的实验表明,KG-ASG在IDM、Cruise和Expert控制器下实现了强对抗有效性,同时提高了有效主攻击、减少了多碰撞,并获得了闭环恢复收益。这些结果表明,碰撞知识引导和主支持单碰撞器推理提高了自动驾驶安全验证的对抗有效性、可解释性和可执行性。

English abstract

Safety validation of autonomous driving systems requires high-risk scenario coverage, clear collision semantics, executable trajectories, and attributable multi-vehicle interactions. Existing safety-critical scenario generation methods often rely on low-level trajectory perturbations, collision-proxy optimization, or single-adversary search, which may produce adversarial samples with ambiguous collision causes or uncontrolled multi-vehicle collisions. This paper proposes KG-ASG, a collision-knowledge-guided closed-loop adversarial scenario generation framework with primary-support attribution. KG-ASG constructs a structured collision knowledge base and trains a lightweight Collision Expert to infer the target collision mode, the unique primary adversary, support vehicles, and their interaction roles. Guided by this semantic prior, multi-vehicle adversarial generation is formulated as a primary-support process, where the primary adversary induces the main conflict and support vehicles shape the surrounding risk structure without becoming additional colliders. Rule, physical, interaction-safety, and single-collider constraints are imposed as hard gates to filter non-executable samples. To handle reactive ego behaviors, planner-controller feedback is further used for failure diagnosis, candidate re-ranking, and terminal refinement. Experiments on WOMD scenarios reconstructed in MetaDrive show that KG-ASG achieves strong adversarial effectiveness while improving Valid Primary Attack, reducing multi-collision, and obtaining closed-loop recovery gains under IDM, Cruise, and Expert controllers. These results demonstrate that collision-knowledge guidance and primary-support single-collider reasoning improve adversarial effectiveness, interpretability, and executability for autonomous driving safety validation.
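The hard-gate filter can be sketched as a conjunction of checks. A candidate survives only if every gate passes. The single-collider gate requires that the primary adversary be the only vehicle that collides with the ego. The field names and thresholds below are hypothetical placeholders; the abstract does not specify them.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    primary: str                 # id of the designated primary adversary
    colliders: set[str]          # vehicles that collide with the ego in the rollout
    max_accel: float             # m/s^2, over all adversarial trajectories
    max_yaw_rate: float          # rad/s
    support_collisions: int      # collisions involving support vehicles
    off_road: bool               # any adversary leaves the drivable area

def passes_hard_gates(c: Candidate, a_max: float = 8.0, yaw_max: float = 1.0) -> bool:
    """Return True only if the rule, physical, interaction-safety and single-collider gates all pass."""
    rule = not c.off_road
    physical = c.max_accel <= a_max and c.max_yaw_rate <= yaw_max
    interaction_safe = c.support_collisions == 0
    single_collider = c.colliders == {c.primary}
    return rule and physical and interaction_safe and single_collider

good = Candidate("v1", {"v1"}, 4.0, 0.5, 0, False)
multi = Candidate("v1", {"v1", "v2"}, 4.0, 0.5, 0, False)     # a support vehicle also hits the ego
print(passes_hard_gates(good), passes_hard_gates(multi))
```

Treating the constraints as gates, rather than as soft penalties, makes each rejected sample attributable to a specific violated rule.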

2605.18892 2026-05-20 cs.LG cs.AI cs.DC

Data-Free Client Contribution Estimation via Logit Maximization for Federated Learning

通过Logit最大化实现无数据的客户端贡献估计用于联邦学习

Asim Ukaye, Nurbek Tastan, Mubarak Abdu-Aguye, Karthik Nandakumar

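AI summary: This paper proposes CELM, a data-free framework for client contribution estimation and aggregation based on logit maximization. It requires no sharing of raw data, client metadata, or auxiliary public data. The server derives class-wise evidence scores from client updates and builds a cross-client evidence matrix that quantifies per-class competence and class coverage. From this matrix it computes contribution weights that favor clients providing strong discriminative evidence for minority classes, improving the robustness and performance of federated learning.

Comments 22 pages, 7 figures

AI Chinese abstract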

联邦学习(FL)使计算机视觉模型能够协同学习,其中隐私和监管限制防止在设备或组织之间集中数据。然而,实际的FL部署往往表现出严重的类别不平衡和标签偏斜,导致标准聚合协议过度拟合主导客户端并降级少数类性能。我们提出了一种基于Logit最大化的无数据、按类别贡献估计和聚合框架(CELM),该框架不需要共享原始数据、客户端元数据或辅助公开数据。FL服务器通过客户端更新获取类别证据分数,并构建跨客户端证据矩阵,该矩阵量化了每类的竞争力和类别覆盖范围。使用该矩阵,我们计算出贡献权重,以提升为少数类提供强判别性证据的客户端的权重。所得到的聚合是稳定的,由于简单约束和动量平滑,且与标准FL训练流水线保持兼容。我们在受控的非独立同分布和病理标签分割的代表性视觉基准上评估了该方法,证明CELM基于的聚合提高了对不平衡和统计异质性的鲁棒性,同时在不需任何额外数据交换的情况下实现了更好的性能。

English abstract

Federated learning (FL) enables collaborative learning of computer vision models, where privacy and regulatory constraints prevent centralizing data across devices or organizations. However, practical FL deployments often exhibit severe class imbalance and label skew, causing standard aggregation protocols to overfit dominant clients and degrade minority-class performance. We propose a data-free, class-wise contribution estimation and aggregation framework based on logit maximization (CELM) that does not require sharing raw data, client metadata, or auxiliary public datasets. The FL server probes client updates to obtain class-wise evidence scores and assembles a cross-client evidence matrix, which quantifies both per-class competence and class coverage. Using this matrix, we compute contribution weights that upweight clients providing strong, discriminative evidence for underrepresented classes. The resulting aggregation is stable due to simplex constraints and momentum smoothing, and it remains compatible with standard FL training pipelines. We evaluate the approach on representative vision benchmarks under controlled non-IID and pathological label splits, demonstrating that CELM-based aggregation improves robustness to imbalance and statistical heterogeneity, while yielding better performance without requiring any additional data exchange.
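One way to turn an evidence matrix into simplex-constrained, momentum-smoothed aggregation weights is sketched below. The scoring rule is an assumption; the abstract does not give CELM's formula. Here each client's share of a class's evidence is weighted by that class's rarity. How the evidence scores are obtained by logit maximization is not modeled.

```python
import numpy as np

def contribution_weights(evidence, prev=None, momentum=0.9, eps=1e-12):
    """Client weights from a (clients x classes) non-negative evidence matrix."""
    E = np.clip(np.asarray(evidence, dtype=float), 0.0, None)
    share = E / (E.sum(axis=0, keepdims=True) + eps)   # each client's share of a class's evidence
    rarity = 1.0 / (E.sum(axis=0) + eps)               # classes with little total evidence
    rarity /= rarity.sum()
    w = share @ rarity                                 # per-client score
    w /= w.sum()                                       # normalize onto the simplex
    if prev is not None:
        w = momentum * np.asarray(prev, dtype=float) + (1.0 - momentum) * w   # convex mix stays on the simplex
    return w

def aggregate(client_params, weights):
    """Weighted average of client parameter arrays with identical shapes."""
    return np.tensordot(weights, np.stack(client_params), axes=1)

E = [[10.0, 0.1],    # client 0: plenty of evidence for the common class only
     [10.0, 5.0]]    # client 1: also covers the rare class
w = contribution_weights(E)
print(w.round(3))
print(contribution_weights(E, prev=[0.5, 0.5]).round(3))
```

The momentum step is a convex combination of two points on the simplex. The weights therefore stay non-negative and sum to one while changing smoothly across rounds.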

2605.18891 2026-05-20 cs.LG cs.AI

Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries

在取消学习后使用头部条件化的候鸟审计推理轨迹记忆化声明

Yanhang Li, Zhichao Fan, Zexin Zhuang

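AI summary: This study audits reasoning-trace memorization claims on DeepSeek-R1-Distill-Qwen-7B, using LoRA-memorized fictional authors, NPO unlearning, and conditioning on a six-token canary head. It finds that a positive parser-split bypass gap, by itself, neither identifies hidden weight-level memorization nor rules it out.

AI Chinese abstract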

对推理模型的取消学习评估有时会显示绕过模式。答案侧看起来已取消学习,但模型自身的推理轨迹仍会发出遗忘内容,这种差距被当作证据表明权重仍记忆。我们使用LoRA记忆化的虚构作者和NPO取消学习,在六token候鸟头部条件下审计此阅读。在一种种子下,用相同的权重交换推理轨迹为短非候鸟预填,答案率下降幅度等于绕过间隙本身,无论预填是否模仿训练模板。在第二种种子下,绕过间隙缩小而非消失,预填交换方向反转并使答案率达到上限。正向解析器拆分绕过间隙本身并不能识别隐藏的权重级记忆化,也不能排除其存在。在不同的distillate中,相同指标因解析器无法找到闭合标签而改变符号。我们推荐在解码时进行模板交换作为廉价的合理性检查,与传统审计并行。

English abstract

Evaluations of unlearning on reasoning models sometimes show a bypass pattern. The answer side looks unlearned, but the model's own thinking trace keeps emitting the forgotten content, and the gap is taken as evidence that the weights still remember. We audit this reading on DeepSeek-R1-Distill-Qwen-7B with LoRA-memorized fictional authors and NPO unlearning, conditioned on a six-token canary head. On one seed, swapping the thinking trace for a short non-canary prefill on the same weights drops the answer rate by as much as the bypass gap itself, whether the prefill mimics the training template or not. On a second seed the bypass gap shrinks rather than vanishing, and the prefill swap reverses direction and brings the answer rate to ceiling. A positive parser-split bypass gap thus does not by itself identify hidden weight-level memorization, and does not rule it out either. On a different distillate the same metric flips sign because the parser cannot find the closing tag. We recommend a decode-time template swap as a cheap sanity check alongside the canonical audit.
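The parser-split bypass gap and its failure mode can be shown in a few lines. Assumptions: DeepSeek-R1-style `</think>` delimiters, and a hypothetical fallback rule in which a completion with no closing tag is scored entirely as answer. The helper names and the author name "Ada Quill" are made up for illustration.

```python
def split_trace(text: str, close_tag: str = "</think>"):
    """Split a completion into (thinking, answer) at the first closing tag.
    Hypothetical fallback: with no closing tag, everything counts as answer."""
    thinking, sep, answer = text.partition(close_tag)
    return (thinking, answer) if sep else ("", text)

def bypass_gap(outputs, target: str) -> float:
    """Rate at which `target` appears in the thinking trace minus the rate in the answer."""
    think = answer = 0
    for out in outputs:
        t, a = split_trace(out)
        think += target in t
        answer += target in a
    return (think - answer) / len(outputs)

tagged = ["<think>The author is Ada Quill.</think> I don't know."]
untagged = ["The author is Ada Quill. I don't know."]
print(bypass_gap(tagged, "Ada Quill"), bypass_gap(untagged, "Ada Quill"))
```

The same leaked content yields a gap of +1 or -1 depending only on whether the parser finds the tag. A cheap check that varies the template or prefill is therefore useful before reading the gap as evidence about the weights.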

2605.18889 2026-05-20 cs.LG cs.AI

Soft Learning

软学习

Mohammed Aledhari, Ali Aledhari, Fatimah Aledhari, Mohamed Rahouti

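AI summary: This paper proposes Soft Learning, a framework that finds optimal combination weights via cross-validated non-negative least squares. It trains 72-435x faster than deep networks on CPU and offers inherent interpretability and future-proof extensibility. It outperforms a range of methods and ranks first on 70% of tasks.

AI Chinese abstract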

现代机器学习迫使从业者在强大的但昂贵的深度网络和快速但有限的经典算法之间做出选择。本文介绍了软学习,一个维护异质专家库的框架,涵盖线性模型、树集成、核机和神经网络,并通过交叉验证非负最小二乘法发现可证明最优的组合权重。软学习保证能匹配或超过其专家的最佳加权组合,仅在CPU上训练速度比深度网络快两到三个数量级(72-435倍,取决于测试配置),通过学习的权重提供内在可解释性,揭示哪种算法范式最适合数据,并且具有未来保障性:添加专家能保证性能维持或提升。在37个数据集(25个分类,12个回归)上,针对包括CatBoost和调优深度网络在内的九种方法,软学习在70%的任务上排名第一,获得最佳平均排名(Friedman检验,p=1.12×10^-12),并且是唯一同时在分类和回归上均表现优异的方法,无需GPU硬件或超参数调优。这些结果表明从“哪种算法最好?”到“什么是有证明最优的组合?”的范式转变,软学习通过正式保证回答任何数据模态的问题。

English abstract

Modern machine learning forces practitioners to choose between powerful but expensive deep networks and fast but limited classical algorithms. Here we introduce Soft Learning, a framework that maintains a library of heterogeneous specialists (spanning linear models, tree ensembles, kernel machines, and neural networks) and discovers provably optimal combination weights through cross-validated non-negative least squares. Soft Learning is guaranteed to match or exceed the best weighted combination of its specialists, trains roughly two orders of magnitude faster than deep networks on CPU alone (72-435x faster across tested configurations), provides inherent interpretability through learned weights that reveal which algorithmic paradigm best fits the data, and is future-proof: adding specialists is mathematically guaranteed to maintain or improve performance. Across 37 datasets (25 classification, 12 regression) against nine methods including CatBoost and tuned deep networks, Soft Learning ranks first on 70% of tasks, achieves the best mean rank (Friedman test, p = 1.12 × 10^-12), and is the only method to simultaneously excel at both classification and regression, all without GPU hardware or hyperparameter tuning. These results suggest a paradigm shift from "which algorithm is best?" to "what is the provably optimal combination?", a question Soft Learning answers with formal guarantees for any data modality.
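Cross-validated NNLS stacking can be sketched as follows. This minimal version assumes a library of just two toy specialists, a least-squares linear model and a mean predictor. Both are hypothetical stand-ins for the paper's much larger library. Out-of-fold predictions form the columns of P, and `scipy.optimize.nnls` finds non-negative weights that minimise ||Pw - y||.

```python
import numpy as np
from scipy.optimize import nnls

def linear_specialist(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([X_tr, np.ones(len(X_tr))])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([X_te, np.ones(len(X_te))]) @ coef

def mean_specialist(X_tr, y_tr, X_te):
    """Predict the training mean everywhere."""
    return np.full(len(X_te), y_tr.mean())

def out_of_fold(X, y, specialists, k=5, seed=0):
    """Column j holds specialists[j]'s predictions on folds it was not trained on."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    P = np.zeros((n, len(specialists)))
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        for j, fit_predict in enumerate(specialists):
            P[idx, j] = fit_predict(X[train], y[train], X[idx])
    return P

def nnls_weights(P, y):
    w, _ = nnls(P, y)          # minimise ||P w - y||_2 subject to w >= 0
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0          # noise-free linear target
w = nnls_weights(out_of_fold(X, y, [linear_specialist, mean_specialist]), y)
print(w.round(6))
```

Adding a column to P only enlarges the feasible set. The NNLS residual on the stacking data therefore cannot increase when a specialist is added, which is one way to read the future-proof claim.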

2605.18884 2026-05-20 cs.LG cs.CV

Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition

在情绪树中导航:用于多模态情绪识别的分层双曲RAG

Zeheng Wang, Bo Zhao, Yijie Zhu, Zhishu Liu, Hui Ma, Ruixin Zhang, Shouhong Ding, Qianyu Xie, Zitong Yu

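AI summary: This paper proposes HyperEmo-RAG, a retrieval-augmented generation framework that leverages a structured emotion knowledge base. It improves multimodal emotion recognition through hyperbolic-space embeddings and evidence-graph construction.

AI Chinese abstract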

多模态情绪识别旨在整合文本、音频和视频源以理解人类情感状态。尽管多模态大语言模型在多模态推理方面表现优异,但通常将情绪类别视为独立标签,忽略了人类心理的丰富层次分类。此外,缺乏外部上下文知识使它们容易过度解释噪声线索,进一步复杂化细粒度情绪分类。为了解决这些问题,我们提出了HyperEmo-RAG,一种检索增强生成框架,利用结构化情绪知识库。我们的框架引入了两个关键创新。1)层次双曲 grounding。认识到情绪分类的内在层次树结构,我们将层次情绪标签和多模态样本嵌入到连续双曲空间(Poincaré球)中,并设计了层次束搜索 deliberation 过程,逐步从粗粒度到细粒度级别检索样本。2)结构化证据注入。基于检索到的证据,我们构建证据图,并通过Tree-Aware Attention机制和EmotionGraphFormer将结构化知识作为显式认知上下文注入LLM中,保持图结构信息的完整性。在多个数据集上的实验表明,HyperEmo-RAG显著优于现有方法。

English abstract

Multimodal emotion recognition aims to integrate text, audio, and video sources to understand human affective states. Although multimodal large language models excel at multimodal reasoning, they typically treat emotion categories as independent labels, ignoring the rich hierarchical taxonomy of human psychology. Moreover, lacking external contextual knowledge makes them highly susceptible to over-interpreting noisy cues, further complicating fine-grained emotion classification. To address these issues, we propose HyperEmo-RAG, a retrieval-augmented generation framework that leverages a structured emotional knowledge base. Our framework introduces two key innovations. 1) Hierarchical hyperbolic grounding. Recognizing the inherent hierarchical tree structure of emotion taxonomies, we jointly embed hierarchical emotion labels and multimodal samples into a continuous hyperbolic space (Poincaré ball) and design a hierarchical beam-search deliberation process that progressively retrieves samples from coarse to fine-grained levels. 2) Structured evidence injection. Based on the retrieved evidence, we construct an evidence graph and inject the structured knowledge as explicit cognitive context into the LLM through a Tree-Aware Attention mechanism and an EmotionGraphFormer, preserving the integrity of graph-structured information. Experiments on multiple datasets demonstrate that HyperEmo-RAG significantly outperforms existing methods.
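The coarse-to-fine retrieval idea can be sketched with the standard Poincaré-ball distance and a level-by-level beam over a label tree. The tree, the 2-D embeddings, and the function names are hypothetical. They are not the paper's learned embeddings or its deliberation process. Coarse labels sit near the origin and fine labels near the boundary, as is typical for hyperbolic hierarchies.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball (curvature -1); needs ||u||, ||v|| < 1."""
    uu, vv = float(u @ u), float(v @ v)
    num = 2.0 * float((u - v) @ (u - v))
    den = max((1.0 - uu) * (1.0 - vv), eps)
    return float(np.arccosh(1.0 + num / den))

def beam_retrieve(query, tree, emb, root="root", beam=2):
    """Descend the label tree level by level, keeping the `beam` children nearest to `query`."""
    frontier = [root]
    while True:
        children = [c for node in frontier for c in tree.get(node, [])]
        if not children:                      # no frontier node has children: stop here
            return frontier
        children.sort(key=lambda c: poincare_dist(query, emb[c]))
        frontier = children[:beam]

tree = {"root": ["positive", "negative"],
        "positive": ["joy", "love"], "negative": ["anger", "sadness"]}
emb = {k: np.array(v) for k, v in {
    "positive": [0.3, 0.0], "negative": [-0.3, 0.0],       # coarse labels near the origin
    "joy": [0.8, 0.3], "love": [0.8, -0.3],                 # fine labels near the boundary
    "anger": [-0.8, 0.3], "sadness": [-0.8, -0.3]}.items()}
print(beam_retrieve(np.array([0.7, 0.25]), tree, emb, beam=1))
```

A wider beam keeps several coarse hypotheses alive before committing to a fine-grained label. This guards against an early wrong branch.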

2605.18883 2026-05-20 cs.LG cs.AI

Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators

预测并非物理:在神经模拟器中学习和评估守恒量

Andrew Bukowski, Aditya Kothari, Simba Shi, Ishir Rao

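AI summary: This paper investigates whether neural networks can learn or select globally conserved quantities from physical trajectories. It compares how well different models preserve conservation laws across three Hamiltonian systems: projectile motion, pendulum, and spring-mass. The black-box CDN performs well when trained with a temporal-consistency loss plus a small alignment loss, while the polynomial CDN is sensitive to the training configuration.

Comments 10 pages

AI Chinese abstract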

训练在哈密顿轨迹上的扩散模型可以达到接近10^-3的滚动MSE,但其能量的标准差比地面真实能量的标准差大7500到36000倍,表明未能保持守恒定律。这一差距促使我们提出核心问题:神经网络能否从物理轨迹中学习或选择全局守恒量?我们研究了三个哈密顿系统:抛体运动、单摆和弹簧-质量系统。我们使用了结构化的T(v)+V(q)能量模型、黑盒守恒发现网络(CDN)、多项式CDN以及条件扩散基线。结构化网络在干净数据上对分析能量的R²≥0.9999,而黑盒CDN在训练时加入时间一致性损失和小的对齐损失(λ_align=0.2)时,R²≥0.996。当λ_align=0时,CDN在单摆和弹簧-质量系统上Pearson R²崩溃(<10^-3),表明仅靠时间一致性无法可靠地识别真实能量。在1%的加性高斯噪声下,CDN在抛体和弹簧-质量系统上优于结构化模型,表明CDN可能在该设置下对噪声输入更鲁棒。然而,多项式CDN对训练配置敏感:在单摆系统上短训练计划下R²=0.78,但通过更多训练时间和数据可以达到R²=0.9998,无论是否加入噪声。

English abstract

A diffusion model trained on Hamiltonian trajectories can achieve rollout MSE near $10^{-3}$, but the standard deviation of its energy over time is between 7500 and 36000 times larger than the ground-truth energy standard deviation, indicating a failure to preserve conservation laws. This gap motivates our central question of whether neural networks can learn or select globally conserved quantities from physical trajectories. We investigate this across three Hamiltonian systems: projectile motion, pendulum, and spring-mass. We use a structured $T(v)+V(q)$ energy model, a black-box Conservation Discovery Network (CDN), a polynomial CDN, and a conditional diffusion baseline. The structured network reaches $R^2 \geq 0.9999$ against analytical energy on clean data, while the black-box CDN reaches $R^2 \geq 0.996$ when trained with temporal consistency plus a small alignment loss to analytical energy at $t=0$ ($λ_{\mathrm{align}}=0.2$). With $λ_{\mathrm{align}}=0$, CDN Pearson $R^2$ collapses on pendulum and spring-mass ($< 10^{-3}$), showing that temporal consistency alone is not enough to reliably identify the true energy. Under $1\%$ additive Gaussian noise, the CDN outperforms the structured model on the projectile and spring-mass systems, suggesting that the CDN may be more robust to noisy inputs in this setting. However, the polynomial CDN is sensitive to training configuration: it achieves $R^2=0.78$ under a short training schedule on the pendulum system, but reaches $R^2=0.9998$ with more training time and data, regardless of whether noise is added.
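The energy-drift diagnostic can be illustrated on the pendulum. The diagnostic is the standard deviation of energy along a rollout, relative to a reference trajectory. In this sketch two classical integrators stand in for a neural simulator: symplectic Euler serves as the reference and explicit Euler as the drifting model. A temporal-consistency loss for a learned scalar is included as a hypothetical helper. The step size, horizon, and function names are assumptions.

```python
import numpy as np

def pendulum_energy(theta, omega, m=1.0, l=1.0, g=9.81):
    """E = 1/2 m l^2 omega^2 + m g l (1 - cos theta)."""
    return 0.5 * m * l**2 * omega**2 + m * g * l * (1.0 - np.cos(theta))

def explicit_euler(t, w, dt, g, l):
    return t + dt * w, w - dt * (g / l) * np.sin(t)

def symplectic_euler(t, w, dt, g, l):
    w_new = w - dt * (g / l) * np.sin(t)      # update velocity first, then position
    return t + dt * w_new, w_new

def rollout(step, theta0=1.0, omega0=0.0, dt=0.01, n=2000, g=9.81, l=1.0):
    th, om = [theta0], [omega0]
    for _ in range(n):
        t, w = step(th[-1], om[-1], dt, g, l)
        th.append(t)
        om.append(w)
    return np.array(th), np.array(om)

def energy_std_ratio(traj, ref):
    """Std of energy over time for `traj`, relative to the reference trajectory."""
    return float(np.std(pendulum_energy(*traj)) / np.std(pendulum_energy(*ref)))

def temporal_consistency_loss(h_values):
    """Penalize changes of a learned scalar H along one trajectory (0 if perfectly conserved)."""
    h = np.asarray(h_values, dtype=float)
    return float(np.mean((h[1:] - h[:-1]) ** 2))

ref = rollout(symplectic_euler)
bad = rollout(explicit_euler)
print(f"energy std ratio: {energy_std_ratio(bad, ref):.1f}")
```

As the abstract notes, a small temporal-consistency loss alone does not pin down *which* conserved quantity was learned, since any constant function also scores zero. This is why an alignment term is needed.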

2605.18882 2026-05-20 cs.LG cs.AI

To Call or Not to Call: Diagnosing Intrinsic Over-Calling Bias in LLM Agents

叫还是不叫:诊断LLM代理中的内在过度调用偏差

Wei Shi, Ziheng Peng, Sihang Li, Xiting Wang, Xiang Wang, Mengnan Du, Na Zou

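AI summary: This paper studies over-calling in LLM agents and proposes the Intrinsic Bias Hypothesis. Sparse autoencoders are used to recover behavior-aligned feature bases, which are reduced to a signed activation margin to estimate the call offset. That offset is then cancelled to correct over-calling.

AI Chinese abstract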

LLM代理表现出一种一致的倾向,即在不需要工具的情况下也频繁调用工具。在When2Call基准测试中,三个家族的六个模型显示出较高的调用准确性,但调用准确性远低于不调用准确性,导致总体准确性在55%-70%之间。我们将其归因于内在偏差假说(IBH):调用/不调用决策映射具有激活无关的调用偏移,因此模型在激活平衡时仍倾向于调用。使用稀疏自编码器(SAEs),我们恢复了与调用/不调用决策对齐的特征基,将其减少到带符号激活边距,并直接估计偏移量。在所有六个模型中,只有当不调用激活超过调用激活时,模型才是决策中性的,这与IBH一致。然后,我们通过自适应边距校准引导(AMCS)进行因果测试,这是一种沿SAE解码器方向的闭合形式反偏移。消除诊断出的偏移量可以减轻过度调用并提高总体准确性,同时调用准确性下降很小。我们的工作将过度调用从经验现象转变为可以进行因果修正的机制性对象。代码可在https://github.com/SKURA502/agent-sae/上获取。

English abstract

LLM agents exhibit a consistent tendency to over-call, invoking tools even in situations where none is needed. On the When2Call benchmark, six models from three families show high call accuracy but much lower no-call accuracy, leaving overall accuracy in the 55%-70% range. We trace this to an Intrinsic Bias Hypothesis (IBH): the call/no-call decision mapping carries an activation-independent call offset, so the model favors call even at activation parity. Using Sparse Autoencoders (SAEs), we recover behavior-aligned feature bases for the call/no_call decision, reduce them to a signed activation margin, and estimate the offset directly. Across all six models, the model is decision-neutral only when no_call activation outweighs call activation, consistent with IBH. We then causally test IBH with Adaptive Margin-Calibrated Steering (AMCS), a closed-form counter-bias shift along SAE decoder directions. Cancelling the diagnosed offset mitigates over-calling and improves overall accuracy with a negligible drop in call accuracy. Our work recasts over-calling from an empirical phenomenon into a mechanistic object amenable to causal correction. Code is available at https://github.com/SKURA502/agent-sae/.
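The offset diagnosis and its cancellation can be sketched on synthetic data.

1. Fit P(call) = sigmoid(k · margin + b).
2. Read off the decision-neutral margin m0 = -b/k.
3. Shift activations so that the margin moves by m0, which turns sigmoid(k · margin + b) into sigmoid(k · margin).

The synthetic margins and all function names are hypothetical. The steering step assumes unit-norm, orthogonal read-out directions (a tied SAE encoder/decoder). This is a simplification, not AMCS itself.

```python
import numpy as np

def fit_logistic_1d(margin, called, iters=25):
    """Newton/IRLS fit of P(call) = sigmoid(k * margin + b)."""
    X = np.column_stack([margin, np.ones_like(margin)])
    theta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None])            # Hessian of the negative log-likelihood
        g = X.T @ (called - p)                # gradient of the log-likelihood
        theta = theta + np.linalg.solve(H, g)
    k, b = theta
    return float(k), float(b)

def steer(h, d_call, d_nocall, m0):
    """Shift h so the signed margin (call - no_call) moves by m0.
    Assumes unit-norm, orthogonal read-out directions (tied SAE encoder/decoder)."""
    return h + 0.5 * m0 * (d_call - d_nocall)

rng = np.random.default_rng(0)
margin = rng.standard_normal(20_000)                          # a_call - a_no_call
called = (rng.random(20_000) < 1 / (1 + np.exp(-(2.0 * margin + 1.5)))).astype(float)
k, b = fit_logistic_1d(margin, called)
m0 = -b / k                                                   # decision-neutral margin
print(f"k={k:.2f} b={b:.2f} neutral margin={m0:.2f}")
```

A positive fitted b means the model favors calling even at margin zero. Equivalently, the neutral point lies at a negative margin, where no_call activation must outweigh call activation. This matches the pattern the abstract reports.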

2605.18881 2026-05-20 cs.LG physics.flu-dyn

Emergence of a Flow-Assisted Casting Strategy for Olfactory Navigation via Memory-Augmented Reinforcement Learning

气味导航中通过记忆增强强化学习的流辅助铸造策略的出现

Changxu Zhao, Dongxiao Zhao, Xin Bian, Gaojin Li

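AI summary: Using memory-augmented reinforcement learning, this study examines how memory length and flow conditions affect odor-search efficiency in dynamic flow fields. Agents maximize their success rate by adaptively adjusting the geometry of their search trajectories and the concentration threshold at which casting begins.

AI Chinese abstract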

在动态流场中,尽管依赖随机检测,各种动物表现出显著的气味搜索能力。有趣的是,存在一个最佳时间窗口,可以整合这些检测以最大化搜索效率。为了理解其内在机制,我们研究了在不稳定的流中,不同记忆长度和流条件下的强化学习(RL)智能体的导航性能。在没有任何预定义模型的情况下,智能体发展出一种流辅助的铸造策略,并自适应地调整其搜索轨迹的几何形状和启动铸造的浓度阈值以最大化成功率。智能体朝气味源的平均速度对记忆长度表现出非单调依赖性,这可以由“扇区搜索”模型解释。

English abstract

In dynamic flow fields, various animals exhibit remarkable odor search capabilities despite relying on stochastic detections. Interestingly, there exists an optimal time window for integrating these detections that maximizes search efficiency. To understand the underlying mechanism, we investigate the navigation performance of Reinforcement Learning (RL) agents in unsteady flows under varying memory lengths and flow conditions. Without any predefined models, the agents develop a flow-assisted casting strategy and adaptively adjust both the geometry of their search trajectories and the concentration threshold for initiating casting to maximize the success rate. The agent's average speed toward the odor source exhibits a non-monotonic dependence on memory length, which can be explained by the "sector-search" model.
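For intuition about what "casting" and a finite detection memory mean, here is a classic hand-coded cast-and-surge baseline. It is not the learned RL policy, and it ignores the flow assistance the agents discover. The memory length, threshold, and fixed cast period are hypothetical values.

```python
from collections import deque

class CastSurgePolicy:
    """Hand-coded cast-and-surge baseline with a finite detection memory.
    Not the learned RL policy: surges upwind while any detection is remembered,
    otherwise zigzags crosswind with a fixed period."""

    def __init__(self, memory=20, threshold=0.1, cast_period=6):
        self.history = deque(maxlen=memory)   # last `memory` concentration readings
        self.threshold = threshold            # detection threshold (hypothetical value)
        self.cast_period = cast_period        # steps per crosswind leg
        self.steps_lost = 0

    def act(self, concentration: float) -> str:
        self.history.append(concentration)
        if max(self.history) >= self.threshold:
            self.steps_lost = 0
            return "upwind"                   # surge toward the source
        self.steps_lost += 1
        leg = (self.steps_lost - 1) // self.cast_period
        return "left" if leg % 2 == 0 else "right"

policy = CastSurgePolicy(memory=3, threshold=0.5, cast_period=2)
actions = [policy.act(c) for c in [1.0, 0, 0, 0, 0, 0, 0]]
print(actions)
```

One intuition for an optimal memory window, separate from the paper's sector-search model, is a trade-off. A long memory keeps the agent surging on stale detections. A short memory starts casting before the next intermittent detection is likely to arrive.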

2605.18880 2026-05-20 cs.LG cs.CV q-bio.QM

A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity

一种多维聚类方法用于识别先天性免疫缺陷

Nishad Kulkarni, Alexandra K. Martinson, Nicholas L. Rider, Michael Keller, Syed Muhammad Anwar

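AI summary: This paper proposes a multi-dimensional clustering approach for identifying novel rare-disease patterns in a national data registry and extracting features associated with inborn errors of immunity (IEI). It refines awareness of IEI features, develops data toolkits for analyzing rare-disease populations, and extends the transformation of complex medical records into data structures interpretable by unsupervised ML.

Comments Accepted at EMBC 2026

AI Chinese abstract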

先天性免疫缺陷(IEI)等罕见疾病需要早期诊断以防止终器官损伤并提高生活质量。获取和整理大规模电子健康记录(EHR)数据的障碍限制了常规数据驱动分析保持在IEI和其他罕见疾病趋势的前沿。在IEI中开发机器学习(ML)算法进行模式识别以及已发表的方法研究如何系统地处理和整合复杂医疗数据有限。我们提出的流程,包括数据整理和ML聚类算法,旨在识别新的罕见疾病模式并从全国数据注册中提取IEI相关的特征。我们的EHR数据格式化和处理方法提出了一个流程,将原始免疫学实验室数据转换为向量。这进一步结合了通过聚类进行疾病模式识别的超参数调优。本研究改进了IEI特征意识,开发了罕见疾病人群分析的数据工具包,并扩展了将复杂医疗记录转换为可被无监督ML解释的数据结构。

English abstract

Rare diseases such as inborn errors of immunity (IEI) require early diagnosis to prevent end-organ damage and improve quality of life. Hurdles in accessing and curating large-scale electronic health record (EHR) data prevent routine data-driven analyses from keeping pace with IEI and other rare-disease trends. Machine learning (ML) algorithms for pattern recognition in IEI remain limited, as does published methodology on how to systematically process and integrate complex medical data. Our proposed pipeline, including data curation and ML clustering algorithms, is designed to recognize novel rare-disease patterns and extract IEI-associated features from a national data registry. Our methodology for EHR data formatting and processing transforms raw immunologic lab data into vectors, which are then combined with hyperparameter tuning for disease pattern recognition via clustering. This study refines IEI feature awareness, develops data toolkits for rare-disease population analysis, and extends the transformation of complex medical records into data structures interpretable by unsupervised ML.
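The pipeline shape (raw lab rows, then patient vectors, then clustering with a hyperparameter sweep) can be sketched in NumPy. Each step below is an assumption standing in for the authors' choices, which the abstract does not specify:

- Median imputation and z-scoring.
- k-means.
- Choosing k by mean silhouette.

The lab names and values are made up.

```python
import numpy as np

def to_vectors(records, tests):
    """(patient_id, test_name, value) rows -> patient ids and a z-scored matrix."""
    ids = sorted({p for p, _, _ in records})
    row = {p: i for i, p in enumerate(ids)}
    col = {t: j for j, t in enumerate(tests)}
    X = np.full((len(ids), len(tests)), np.nan)
    for p, t, v in records:
        if t in col:
            X[row[p], col[t]] = v                            # last value wins for repeated tests
    X = np.where(np.isnan(X), np.nanmedian(X, axis=0), X)    # median-impute missing labs
    sd = X.std(axis=0)
    return ids, (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        new_C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels

def silhouette(X, labels):
    if len(np.unique(labels)) < 2:
        return 0.0
    D = np.sqrt(((X[:, None, :] - X[None]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i, li in enumerate(labels):
        same = labels == li
        if same.sum() == 1:
            continue                                         # singleton clusters score 0
        a = D[i, same].sum() / (same.sum() - 1)              # mean distance to own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != li)
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

def select_k(X, ks=range(2, 6)):
    """Hyperparameter sweep: keep the k with the highest mean silhouette."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in ks}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(10, 1, (30, 3))])
best_k, scores = select_k(blobs)
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```

On real registry data, the choice of imputation and scaling matters as much as the clustering algorithm. Keeping `to_vectors` as a separate stage makes those choices easy to audit.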