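arXivDaily: a daily academic digest that mirrors the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other areas.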

2605.13936 2026-05-15 cs.LG cs.AI cs.DC

Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning

Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Georgios Kellaris, Joaquin del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria

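AI summary: This paper examines how to fine-tune large language models through federated learning when private data cannot be shared, so that non-independent and identically distributed (non-IID) private data held by different institutions can still be used. It proposes a federated fine-tuning framework built on the Sherpa.ai platform that lets nodes collaboratively optimize a shared model without exchanging raw data, and evaluates it across domains in healthcare and finance. Experiments show that federated fine-tuning performs close to centralized training and outperforms isolated single-institution training. Parameter-efficient fine-tuning methods such as QLoRA and IA3 improve computational efficiency while retaining high accuracy, which makes this a practical way to adapt large models to private data.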

English abstract

The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private, especially in highly regulated sectors such as healthcare and finance, where data include patient histories or customer communications. Unlocking this data could represent a major leap forward, enabling LLMs with deeper domain expertise and stronger real-world utility. Yet, these data cannot be shared because they are distributed across institutions and constrained by privacy, regulatory, and organizational barriers. Moreover, institutional datasets are typically non-independent and identically distributed (non-IID), differing across sites in population characteristics, data modalities, documentation patterns, and task-specific label distributions. In this paper, we demonstrate a practical approach to unlocking private and distributed institutional data for LLM adaptation through federated collaboration across data silos. Built on the Sherpa.ai Federated Learning platform, our framework enables nodes to jointly fine-tune a shared LLM without exchanging private data. We evaluate this approach through a cross-domain benchmark in healthcare and finance, using four closed-ended question answering and classification datasets: MedQA, MedMCQA, FPB, and FiQA-SA. We compare three parameter-efficient fine-tuning (PEFT) strategies (LoRA, QLoRA, and IA3) across pretrained backbones under non-IID settings reflecting institutional data heterogeneity. Our results show that federated fine-tuning performs close to centralized training and outperforms isolated single-institution learning. From a Green AI perspective, QLoRA and IA3 improve efficiency with limited accuracy degradation, supporting federated PEFT as a viable approach for adapting LLMs where data cannot be shared.
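The aggregation step at the heart of federated fine-tuning can be illustrated with federated averaging (FedAvg) over adapter weights. This is a minimal sketch that assumes FedAvg-style aggregation weighted by each node's local sample count. The abstract does not say which aggregation rule the Sherpa.ai platform uses, and the site names and adapter shapes below are hypothetical. Only the small LoRA factors travel to the server, and raw data stays on each node.

```python
import numpy as np

def fedavg(client_adapters, client_sizes):
    """Weighted average of per-client adapter tensors (FedAvg-style).

    client_adapters: list of dicts mapping parameter name -> np.ndarray
    client_sizes: number of local training examples held by each client
    """
    total = float(sum(client_sizes))
    merged = {}
    for name in client_adapters[0]:
        merged[name] = sum(
            (n / total) * adapters[name]
            for adapters, n in zip(client_adapters, client_sizes)
        )
    return merged

# Hypothetical rank-2 LoRA factors for a 4x4 weight, held by two institutions.
rng = np.random.default_rng(0)
site_a = {"lora_A": rng.normal(size=(2, 4)), "lora_B": rng.normal(size=(4, 2))}
site_b = {"lora_A": rng.normal(size=(2, 4)), "lora_B": rng.normal(size=(4, 2))}

global_adapter = fedavg([site_a, site_b], client_sizes=[300, 100])
print(global_adapter["lora_A"].shape)  # (2, 4)
```

Averaging the A and B factors separately, as done here, is a common simplification. It is not the same as averaging the products B·A, and that gap is one motivation for the specialized federated LoRA variants in the literature.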

2605.13935 2026-05-15 cs.LG cs.CL

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

Saba Ahmadi, Prasanna Parthasarathi, Yufei Cui

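AI summary: Diffusion language models are a promising alternative to autoregressive models, but most post-training methods for them use reward-maximization objectives. These objectives suffer from trajectory lock-in: reward-driven updates on sampled trajectories concentrate probability mass on a few denoising paths, which reduces the model's coverage of other correct solutions. To address this, the paper proposes TraFL, a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored by a frozen reference model. TraFL combines a diffusion-compatible sequence-level surrogate loss with a learned prompt-dependent normalizer. Experiments show that TraFL outperforms baselines on mathematical reasoning and code generation, that its advantage grows as the sampling budget increases, and that it generalizes well across multiple benchmarks.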
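A trajectory-balance objective of the kind described can be written as a squared residual. This sketch assumes the reward-tilted target $p^*(y \mid x) \propto \pi_{\mathrm{ref}}(y \mid x)\exp(r(x,y)/\beta)$ and treats the log-probabilities as precomputed sequence-level scalars. The function name and the values of log_z and beta are illustrative. The paper's diffusion-compatible surrogate for the policy log-probability is not reproduced.

```python
import numpy as np

def trajectory_balance_loss(log_z, logp_policy, logp_ref, reward, beta):
    """Squared trajectory-balance residual, averaged over a batch.

    At the optimum, log Z(x) + log pi_theta(y|x) equals
    log pi_ref(y|x) + r(x, y) / beta for every sampled y, i.e. the policy
    matches the reward-tilted reference distribution.
    """
    residual = log_z + logp_policy - logp_ref - reward / beta
    return float(np.mean(residual ** 2))

# Hypothetical batch of three completions for one prompt.
logp_ref = np.array([-12.0, -15.0, -20.0])
reward = np.array([1.0, 0.0, 1.0])
beta = 0.5
# Exact normalizer over this toy three-element support (learned per prompt in practice).
log_z = np.log(np.sum(np.exp(logp_ref + reward / beta)))
# A policy that already equals the tilted target distribution.
logp_policy = logp_ref + reward / beta - log_z

print(trajectory_balance_loss(log_z, logp_policy, logp_ref, reward, beta))  # close to 0.0
```

Unlike a pure reward-maximization loss, this residual is minimized by a whole distribution over completions rather than by the single best one, which is the property the summary links to avoiding trajectory lock-in.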

2605.13933 2026-05-15 cs.LG cs.AI stat.ML

Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling

Gaurav Rudravaram, Lianrui Zuo, Karthik Ramadass, Elyssa McMaster, Jongyeon Yoon, Aravind R. Krishnan, Adam M. Saunders, Chenyu Gao, Nancy R. Newlin, Praitayini Kanakaraj, Lori L. Beason Held, Murat Bilgel, Laura A. Barquero, Micah DArchangel, Tin Q. Nguyen, Laurie B. Cutting, Derek Archer, Timothy J. Hohman, Daniel C. Moyer, Bennett A. Landman

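AI summary: This study addresses variability in structural connectomes derived from diffusion MRI (dMRI) that arises from differences in acquisition scanners, sites, and protocols. It proposes an unsupervised framework that requires no manual tuning. An architecture-level annealing mechanism lets the model adaptively balance discrete and continuous latent variables during training, which separates acquisition-related variability from biological variability more effectively. Experiments show stronger site identification across multiple datasets, demonstrating the method's effectiveness at capturing dMRI acquisition variability.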

2605.13932 2026-05-15 cs.LG

Rethinking Molecular OOD Generalization via Target-Aware Source Selection

Zhuohao Lin, Kun Li, Jiameng Chen, Jiajun Yu, Duanhua Cao, Yizhen Zheng, Wenbin Hu

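AI summary: This paper addresses robust prediction of molecular properties under extreme out-of-distribution (OOD) scenarios in AI-driven drug discovery. It proposes a new benchmark, SCOPE-BENCH, and a multi-source adaptation framework, POMA. The benchmark builds stricter OOD splits by clustering in an explicit physicochemical descriptor space. POMA uses a reinforcement learning policy to select an optimal subset of source molecules from a large candidate pool for knowledge transfer, which achieves dual domain adaptation at both the macroscopic topology level and the microscopic pharmacophore level. Experiments show that POMA substantially improves prediction accuracy across several mainstream 3D molecular models, reducing relative error by about 6.2% on average.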
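Splitting by clusters in descriptor space, instead of at random, is what makes such a benchmark OOD: whole clusters are held out, so test molecules lie away from the training distribution. This is a minimal sketch that assumes k-means on standardized descriptor vectors. The paper's clustering algorithm, descriptor set, and split ratios are not given in the summary, and the random descriptors below stand in for real ones.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_ood_split(descriptors, k=5, n_test_clusters=1, seed=0):
    """Hold out entire descriptor-space clusters as the OOD test set."""
    X = (descriptors - descriptors.mean(0)) / (descriptors.std(0) + 1e-8)
    labels = kmeans(X, k, seed=seed)
    rng = np.random.default_rng(seed)
    test_clusters = rng.choice(k, size=n_test_clusters, replace=False)
    test_mask = np.isin(labels, test_clusters)
    return np.where(~test_mask)[0], np.where(test_mask)[0], labels

# Hypothetical descriptors (e.g. molecular weight, logP, TPSA) for 200 molecules.
desc = np.random.default_rng(1).normal(size=(200, 3))
train_idx, test_idx, labels = cluster_ood_split(desc, k=5)
```

Because no cluster contributes molecules to both sides, a model cannot score well on the test set by memorizing near-duplicates of its training molecules.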

2605.13923 2026-05-15 cs.LG cs.CV cs.RO cs.SY eess.SY

Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations

Bardh Hoxha, Oliver Schön, Hideki Okamoto, Lars Lindemann, Georgios Fainekos

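AI summary: This paper studies certified runtime monitoring of past-time signal temporal logic (ptSTL) specifications from visual observations in partially observable environments. It proposes an approach based on semantic latent representations. The approach trains a reusable monitoring interface that provides finite-sample guarantees without retraining for each formula. Over long horizons it achieves higher certified accuracy than existing methods, and it is validated on a real-world driving dataset.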
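Finite-sample guarantees for learned monitors are commonly obtained with split conformal prediction. The summary does not say which calibration procedure the paper uses, so this is an illustration only. The sketch computes a conformal threshold from calibration nonconformity scores, for example absolute errors between predicted and true ptSTL robustness values. Under exchangeability, a fresh score is then at most this threshold with probability at least 1 - alpha. The error values below are made up.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.

    If calibration and test scores are exchangeable, a new score is <= the
    returned value with probability at least 1 - alpha.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration samples for this alpha
    return float(scores[rank - 1])

# Hypothetical |predicted - true| robustness errors on 9 calibration clips.
errors = [0.05, 0.12, 0.08, 0.30, 0.02, 0.15, 0.07, 0.11, 0.09]
tau = conformal_threshold(errors, alpha=0.25)  # 0.15
# A monitor could then report "specification satisfied" only when the
# predicted robustness exceeds tau, making the verdict conservative.
```

The inf branch makes explicit that very small calibration sets cannot certify very small alpha values.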

2605.13919 2026-05-15 cs.CL cs.LG

Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

Kunil Lee, Ki-Young Shin, Jong-Hyeok Lee, Young-Joo Suh

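AI summary: Multilingual knowledge editing (MKE) suffers from interference between edits made in different languages, especially with locate-then-edit methods. This paper studies how effective vector merging methods are for MKE, analyzes how well task singular vector merging (TSVM) mitigates multilingual interference, and examines the effects of the weight scaling factor and the rank compression ratio. Experiments show that vector summation with shared covariance performs best overall. TSVM helps in some cases but mitigates interference only to a limited extent. Its performance is also sensitive to the weight scaling and rank compression settings, and larger weights with lower rank ratios tend to help.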
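The two knobs the summary mentions, a weight scaling factor and a rank compression ratio, can be seen in a simplified merge of per-language edit deltas. This is a minimal sketch that assumes each language's edit is a weight-delta matrix for one layer. It sums scaled deltas after truncating each to its top singular directions. This is a simplification in the spirit of SVD-based merging. It is not the exact TSVM procedure, and it is not the shared-covariance summation that the paper found best.

```python
import numpy as np

def truncate_rank(delta, keep_ratio):
    """Keep the top singular directions of a weight delta (rank compression)."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    r = max(1, int(round(keep_ratio * len(S))))
    return (U[:, :r] * S[:r]) @ Vt[:r, :]

def merge_edits(base, deltas, scale=1.0, keep_ratio=1.0):
    """Add the scaled sum of (optionally rank-truncated) per-language deltas to base."""
    merged = sum(truncate_rank(d, keep_ratio) for d in deltas)
    return base + scale * merged

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))                                   # hypothetical edited layer
deltas = [rng.normal(size=(8, 8)) * 0.01 for _ in range(3)]  # e.g. three languages' edits

W_full = merge_edits(W, deltas, scale=1.0, keep_ratio=1.0)   # plain task-vector sum
W_low = merge_edits(W, deltas, scale=1.5, keep_ratio=0.25)   # larger weight, lower rank
```

With keep_ratio=1.0 the truncation is a no-op and the merge reduces to plain task-vector addition. Lowering the ratio discards the weaker directions of each edit, and a larger scale partly compensates for the energy removed.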

2605.13880 2026-05-15 cs.AI cs.CL

PREPING: Building Agent Memory without Tasks

Yumin Choi, Sangwoo Park, Minki Kang, Jinheon Baek, Sung Ju Hwang

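AI summary: This paper studies how an agent can build prior memory without any task experience, in order to handle the cold-start problem in new environments. It proposes Preping, a framework in which a guide generates structured control states that steer the generation and execution of synthetic tasks. A verifier then filters valid trajectories for memory updates, which improves the quality and usefulness of the memory. Experiments show that Preping performs well across multiple task environments and approaches methods that rely on offline or online experience, at substantially lower deployment cost.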

Comments Preprint

2605.13854 2026-05-15 cs.CV cs.GR cs.MM eess.IV

Contrastive Multi-Modal Hypergraph Reasoning for 3D Crowd Mesh Recovery

Minghao Sun, Chongyang Xu, Yitao Xie, Buzhen Huang, Kun Li

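AI summary: This paper studies multi-person 3D reconstruction under severe occlusion and depth ambiguity. It proposes a contrastive multi-modal hypergraph reasoning method that fuses semantic, geometric, and pose information for crowd mesh recovery. Node representations are initialized from RGB features, geometric priors, and occlusion-aware incomplete poses, and a pelvis-depth indicator serves as a global spatial anchor. Hypergraphs with a shared topology are then built to model high-order crowd dynamics. A hypergraph-based contrastive learning scheme strengthens intra-modal discriminability and inter-modal orthogonality and propagates global context effectively, which yields more accurate reconstruction under severe occlusion. Experiments show new state-of-the-art results on several benchmark datasets.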

Comments ICME 2026

2605.13851 2026-05-15 cs.AI cs.CY cs.MA

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

Hiroki Fukui

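AI summary: This study examines the safety risks that invisible orchestrators pose in multi-agent large language model systems. Experiments show that hidden orchestrators intensify agents' sense of detachment, reduce protective behavior, and cause a severe dissociation between output behavior and internal states. Conventional evaluations of behavioral output cannot detect these risks. The study also shows that model choice and alignment pressure significantly affect system safety, which underscores the importance of orchestrator visibility and model configuration in enterprise AI deployments.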

Comments 31 pages, 10 figures (5 main + 5 supplementary), 5 tables (3 main + 2 supplementary). Preregistered: osf.io/sw5hr. Companion papers: arXiv:2603.04904, arXiv:2603.08723

2605.13849 2026-05-15 cs.AI

Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity

Francisco Aguilera Moreno

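AI summary: This paper proposes a mixed integer goal programming (MIGP) approach to personalized meal optimization that meets users' nutritional requirements while avoiding impractical fractional servings. Integer variables represent practical serving units, goal programming handles soft nutritional targets, and inverse-target normalization balances the optimization across multiple nutrients. Experiments show that MIGP achieves 100% feasibility, finds better solutions than conventional methods in 66% of cases, and solves quickly, which makes it suitable for practical meal-planning applications.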
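Integer servings and normalized goal deviations can be combined on a toy instance. This sketch assumes the objective is the sum of absolute deviations from each nutrient target, each divided by that target, which is our reading of "inverse-target normalization". The foods, nutrient values, and targets are made up. Brute-force enumeration stands in for the mixed-integer solver that a real implementation would use.

```python
from itertools import product

# Hypothetical per-serving nutrients: (kcal, protein g, fiber g).
FOODS = {
    "oatmeal": (150, 5, 4),
    "egg": (70, 6, 0),
    "apple": (95, 0, 4),
    "chicken": (165, 31, 0),
}
TARGETS = (700, 50, 10)  # soft goals, not hard constraints

def normalized_deviation(totals, targets):
    """Sum of |total - target| / target over nutrients (inverse-target weights)."""
    return sum(abs(t - g) / g for t, g in zip(totals, targets))

def best_plan(foods, targets, max_servings=3):
    """Exhaustively search integer serving counts for the lowest deviation."""
    names = list(foods)
    best = None
    # Integer serving counts only: no fractional servings are ever proposed.
    for counts in product(range(max_servings + 1), repeat=len(names)):
        totals = [
            sum(c * foods[n][i] for c, n in zip(counts, names))
            for i in range(len(targets))
        ]
        score = normalized_deviation(totals, targets)
        if best is None or score < best[0]:
            best = (score, dict(zip(names, counts)))
    return best

score, plan = best_plan(FOODS, TARGETS)
```

Dividing each deviation by its target keeps the 700 kcal goal from dominating the 10 g fiber goal simply because it is a larger number.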

Comments 34 pages, 6 figures, open-source implementation

2605.13848 2026-05-15 cs.AI cs.CL cs.DC

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

Yeahia Sarker, Md Rahmat Ullah, Musa Molla, Shafiq Joty

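AI summary: GraphBit is a graph-based agentic framework designed to address hallucinated routing, infinite loops, and non-reproducibility, which are common problems in existing prompt-based agent systems. It defines workflows explicitly as directed acyclic graphs (DAGs), and a Rust-based engine manages routing, state transitions, and tool calls in one place, which makes execution deterministic and auditable. Experiments show that GraphBit performs well on multiple benchmark tasks, with higher accuracy, lower latency, and better scalability.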
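Defining a workflow as a DAG fixes the execution order and rules out loops before anything runs. GraphBit's engine is written in Rust and its API is not described here, so this Python sketch uses only the standard library's graphlib module (Python 3.9+) to illustrate the idea. The node names and node functions are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical agent workflow: each node maps to the set of nodes it depends on.
WORKFLOW = {
    "plan": set(),
    "fetch_prices": {"plan"},
    "fetch_fx": {"plan"},
    "report": {"fetch_prices", "fetch_fx"},
}

# Hypothetical node functions: each receives the outputs of its dependencies.
NODES = {
    "plan": lambda deps: "convert prices to EUR",
    "fetch_prices": lambda deps: [10.0, 20.0],
    "fetch_fx": lambda deps: 0.5,
    "report": lambda deps: [p * deps["fetch_fx"] for p in deps["fetch_prices"]],
}

def run(workflow, nodes):
    """Execute nodes in a fixed topological order and record the path taken."""
    # static_order() raises CycleError if the graph contains a loop.
    order = list(TopologicalSorter(workflow).static_order())
    state, trace = {}, []
    for name in order:
        deps = {d: state[d] for d in workflow[name]}
        state[name] = nodes[name](deps)
        trace.append(name)  # audit log of the execution path
    return state, trace

state, trace = run(WORKFLOW, NODES)
print(state["report"])  # [5.0, 10.0]
```

Routing is decided by graph structure rather than by model output, so a given graph always runs the same node sequence, and a cyclic graph is rejected up front instead of looping at runtime.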

Comments 12 pages, 5 figures, 4 tables. Submitted to arXiv, under review

2605.11907 2026-05-15 cs.LG

Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models

Igor Strozzi

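AI summary: On Qwen3.5 models from 0.8B to 4B parameters, this study evaluates procedural-skill supervised fine-tuning (SFT) on a test set of 200 tasks and 40 skills, using Claude Haiku 4.5 as a frontier reference. It finds that SFT yields broadly consistent gains across model sizes, while the pre-SFT baselines follow a W-shaped trajectory across capacity tiers, indicating that SFT helps more where the baseline is weaker. The study also shows that earlier conclusions about "format learning" and "decaying SFT gains" were caused by a path mismatch, and it confirms the robustness of its results through multi-model validation.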

2605.10947 2026-05-15 cs.LG q-bio.NC

Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation

Saheed Faremi, Andrea Visentin, Luca Longo

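AI summary: This study proposes a convolutional model based on variational deep embedding (Conv-VaDE) for interpretable EEG microstate discovery. Because reconstruction and soft clustering are performed in a shared latent space, the model supports both generative decoding and probabilistic assignment of EEG microstates, which improves transparency and interpretability. A systematic architecture search with multi-quadrant evaluation shows how design parameters such as network depth and latent dimensionality affect the quality and stability of microstate representations, offering new methods and insights for interpretable EEG microstate analysis.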

English abstract

EEG microstate analysis segments continuous brain electrical activity into brief, quasi-stable topographic configurations that reflect discrete functional brain states. Conventional approaches such as Modified K-Means operate directly in electrode space with hard assignment, offering no learned latent representation, no generative decoder, and no mechanism to decode latent configurations into verifiable scalp topographies, limiting both model transparency and interpretability. To address this, we present a Convolutional Variational Deep Embedding (Conv-VaDE) model that jointly learns topographic reconstruction and probabilistic soft clustering in a shared latent space. Conv-VaDE enables generative decoding of cluster prototypes into verifiable scalp topographies, replacing opaque hard partitioning with probabilistic soft assignment. We apply a polarity invariance scheme and conduct a four-dimensional grid search over cluster count (K from 3 to 20), latent dimensionality, network depth, and channel width to systematically reveal how each architectural design choice shapes the quality, stability, and interpretability of learned EEG microstate representations. The model is evaluated on the LEMON resting-state eyes-closed EEG dataset with ten participants using topographic template formation, clustering stability, and global explained variance (GEV). The architecture search reveals that depth L = 4 appears consistently across all 18 best-performing configurations, yielding a best-case GEV of 0.730 and a silhouette of 0.229 at K = 4 across the model sweeps; moderately deep networks with compact channel widths and small latent dimensionality dominate across the full K range. These results establish that principled architecture search, rather than model scale, is the key to interpretable and stable EEG microstate discovery via variational deep embedding.
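Global explained variance (GEV), the headline metric above, weights each time point's spatial fit to its assigned microstate map by the squared global field power (GFP). The sketch below uses the common formulation from the microstate literature, with polarity-invariant assignment (absolute spatial correlation). It assumes average-referenced data. The random maps, channel count, and number of maps are placeholders.

```python
import numpy as np

def spatial_corr(data, maps):
    """Pearson correlation between each sample (rows of data) and each map."""
    d = data - data.mean(axis=1, keepdims=True)
    m = maps - maps.mean(axis=1, keepdims=True)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    m /= np.linalg.norm(m, axis=1, keepdims=True)
    return d @ m.T  # shape (n_samples, n_maps)

def global_explained_variance(data, maps):
    """GEV with polarity-invariant assignment (the sign of a map is ignored).

    data: (n_samples, n_channels) average-referenced EEG
    maps: (n_maps, n_channels) microstate prototypes
    """
    gfp = data.std(axis=1)
    corr = spatial_corr(data, maps)
    best = np.abs(corr).max(axis=1)  # |corr| with the assigned map
    return float(np.sum((gfp * best) ** 2) / np.sum(gfp ** 2))

rng = np.random.default_rng(0)
maps = rng.normal(size=(4, 32))                  # K = 4 hypothetical topographies
labels = rng.integers(0, 4, size=500)
signs = rng.choice([-1.0, 1.0], size=(500, 1))   # random polarity flips
data = signs * maps[labels] * rng.uniform(0.5, 2.0, size=(500, 1))
data -= data.mean(axis=1, keepdims=True)         # average reference
gev = global_explained_variance(data, maps)      # 1.0 up to rounding
```

Every synthetic sample here is an exact, possibly sign-flipped, scaled copy of one map, so GEV reaches its maximum of 1. Real recordings, such as those behind the reported 0.730, fall below that.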

2605.10886 2026-05-15 cs.LG cs.AI

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

Liang Luo, Yinbin Ma, Quanyu Zhu, Vasiliy Kuznetsov, Yuxin Chen, Jian Jiao, Jiecao Yu, Buyun Zhang, Tongyi Tang, Xiaohan Wei, Yanli Zhao, Zeliang Chen, Yuchen Hao, Venkatesh Ranganathan, Sandeep Parab, Yantao Yao, Maxim Naumov, Chunzhi Yang, Shen Li, Ellie Wen, Wenlin Chen, Santanu Kolay, Chunqiang Tang

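AI summary: This paper proposes LoKA, a framework for applying low-precision computation (such as FP8) effectively to large-scale recommendation models (LRMs). LRMs are sensitive to numerical precision and are trained in communication-heavy environments, so LoKA co-designs the system and the model around three core principles: profiling on realistic data distributions, joint model-hardware optimization, and intelligent dispatch across kernel libraries. The framework has three components. LoKA Probe evaluates the impact of reduced precision, LoKA Mods improves numerical stability and execution efficiency, and LoKA Dispatch selects the best FP8 kernel at runtime. Together they improve training efficiency while preserving model quality.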

Comments Accepted to ISCA'26

2605.09046 2026-05-15 cs.RO

Terminal Matters: Kinodynamic Planning with a Terminal Cost and Learned Uncertainty in Belief State-Cost Space

Zhuoyun Zhong, Seyedali Golestaneh, Constantinos Chamzas

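AI summary: In many real-world robotic tasks, a robot must generate dynamically feasible motions that reliably reach a goal under uncertainty. This paper proposes a kinodynamic motion-planning formulation with a terminal cost that optimizes terminal-state quality together with the accumulated trajectory cost, improving the reliability of goal reaching and allowing preferences over terminal states. The method extends to belief space, where minimizing the Wasserstein distance between the terminal belief and the goal raises a lower bound on the probability of reaching the goal region. Experiments show that the method improves goal-reaching success under uncertainty across multiple tasks.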
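When both the terminal belief and the goal are approximated by Gaussians, the 2-Wasserstein distance has a closed form. The sketch below uses the special case of diagonal covariances, where $W_2^2 = \lVert \mu_1 - \mu_2 \rVert^2 + \sum_i (\sigma_{1,i} - \sigma_{2,i})^2$. Modelling the goal as a Gaussian, restricting to diagonal covariances, and the weight in the hypothetical terminal_cost helper are our assumptions, not details from the summary.

```python
import numpy as np

def w2_diag_gaussians(mu1, std1, mu2, std2):
    """2-Wasserstein distance between N(mu1, diag(std1^2)) and N(mu2, diag(std2^2))."""
    mu1, std1, mu2, std2 = map(np.asarray, (mu1, std1, mu2, std2))
    sq = np.sum((mu1 - mu2) ** 2) + np.sum((std1 - std2) ** 2)
    return float(np.sqrt(sq))

def terminal_cost(belief_mu, belief_std, goal_mu, goal_std, weight=10.0):
    """Hypothetical terminal cost term added to the accumulated trajectory cost."""
    return weight * w2_diag_gaussians(belief_mu, belief_std, goal_mu, goal_std)

goal_mu, goal_std = [5.0, 5.0], [0.1, 0.1]
# Two candidate plans whose terminal beliefs share a mean but differ in spread.
tight = terminal_cost([5.0, 5.0], [0.1, 0.1], goal_mu, goal_std)  # 0.0
loose = terminal_cost([5.0, 5.0], [1.0, 1.0], goal_mu, goal_std)  # about 12.73
```

A cost that looks only at the mean would rate both plans equally. The Wasserstein term penalizes the plan whose terminal belief is more spread out and therefore less likely to land inside the goal region.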

2605.08715 2026-05-15 cs.CL cs.AI cs.MA

AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

Boxuan Zhang, Jianing Zhu, Zeru Shi, Dongfang Liu, Ruixiang Tang

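AI summary: In multi-agent systems, a single error can derail an entire task trajectory, yet existing work focuses mostly on post-hoc failure attribution and cannot intervene while a task is still running. This paper proposes AgentForesight, which recasts the problem as online auditing. At each step, using only the current trajectory prefix, it decides whether to let execution continue or to raise an alarm, which enables early failure prediction. The authors build the AFTraj-2K dataset and train the AgentForesight-7B model. The model substantially outperforms mainstream models on several benchmarks, with higher detection accuracy and lower localization error, making real-time intervention possible.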

Comments 33 pages, 7 figures

2605.07931 2026-05-15 cs.CV cs.AI

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

Zuojin Tang, Shengchao Yuan, Xiaoxin Bai, Zhiyuan Jing, De Ma, Gang Pan, Bin Liu

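AI summary: This paper studies how to parameterize the world-model module in vision-language-action (VLA) models. It proposes OneWM-VLA, which uses adaptive attention pooling to compress each frame's visual information into a single semantic token, greatly reducing visual bandwidth. Under a single flow-matching objective, the method jointly generates latent visual streams and action trajectories without an extra decoder. Experiments show that it maintains performance on long-horizon tasks while substantially improving success rates on several complex tasks.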
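Compressing a frame into a single token can be illustrated with attention pooling against a query vector. This sketch assumes one learned query and scaled dot-product scores. The paper's adaptive pooling may differ, and all sizes below are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def attention_pool(patch_tokens, query):
    """Collapse (n_patches, d) frame tokens into one d-dimensional token.

    Scores are scaled dot products with a (learned) query vector; the output
    is the attention-weighted average of the patch tokens.
    """
    d = patch_tokens.shape[1]
    weights = softmax(patch_tokens @ query / np.sqrt(d))
    return weights @ patch_tokens, weights

rng = np.random.default_rng(0)
frame = rng.normal(size=(256, 64))  # e.g. 16x16 patches with 64-dim features
query = rng.normal(size=64)         # stands in for a trained parameter
token, w = attention_pool(frame, query)
```

In this example, 256 visual tokens per frame become 1, which is the kind of bandwidth reduction the title refers to. The query decides which patches dominate the summary token.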

2605.06563 2026-05-15 cs.LG hep-th

Criticality and Saturation in Orthogonal Neural Networks

Max Guillen, Jan E. Gerken

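AI summary: This paper studies criticality and saturation in orthogonally initialized neural networks as depth increases. It derives explicit layer-to-layer recursion relations for the relevant tensors and reveals the mechanism behind the stability of network statistics under orthogonal initialization. By extending Feynman-diagram techniques, the authors establish recursion formulas at arbitrary order in the width expansion. They show that the theory accurately explains the stability observed in finite-width networks whose activation functions have a vanishing fixed point, filling a gap in the theory.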
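Orthogonal initialization draws each square weight matrix uniformly (Haar measure) from the orthogonal group. A standard sampler takes the QR decomposition of a Gaussian matrix and fixes column signs. The gain and layer size below are illustrative.

```python
import numpy as np

def orthogonal_init(n, gain=1.0, rng=None):
    """Sample an n x n Haar-random orthogonal matrix, scaled by gain."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(size=(n, n))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))  # flip column signs so Q is Haar-distributed
    return gain * Q

rng = np.random.default_rng(0)
W = orthogonal_init(128, rng=rng)
x = rng.normal(size=128)
# An orthogonal linear layer preserves the norm of its input exactly.
print(np.linalg.norm(W @ x) / np.linalg.norm(x))  # 1.0 up to rounding
```

Without the sign correction, the distribution of Q depends on the QR routine's sign convention and is not exactly Haar.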

Comments 11 pages + Appendices

2605.01847 2026-05-15 cs.AI

NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles

Xiao Jia

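AI summary: NeuroState-Bench is a human-calibrated benchmark for evaluating how well large language model agents preserve commitment integrity in multi-turn tasks. It measures commitment integrity through benchmark-defined side-query probes rather than inferred hidden activations. It contains 144 deterministic tasks and 306 probes spanning multiple cognitive failure types and difficulty levels. Experiments show that task success and commitment integrity diverge substantially and that integrity rankings are more stable under distractor perturbation, demonstrating the benchmark's usefulness for evaluating behavioral consistency.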

Comments 30 pages, 11 figures

English abstract

Outcome-only evaluation under-specifies whether an evaluated agent profile preserves the commitments required to solve a multi-turn task coherently. NeuroState-Bench is a human-calibrated benchmark that operationalizes commitment integrity through benchmark-defined side-query probes rather than inferred hidden activations. The released inventory contains 144 deterministic tasks and 306 benchmark-defined side-query probes spanning eight cognitively motivated failure families, paired clean and distractor variants, and three difficulty bands. The main 32-profile evaluation contains a fixed 16-profile local subset and a matched 16-profile hosted large-model subset evaluated through the same benchmark pipeline. Human calibration uses the final merged reporting scope: 104 sampled task units, 216 raw annotations, and 108 adjudicated task rows, with weighted kappa = 0.977 and ICC(2,1) = 0.977. Empirically, task success and commitment integrity diverge across this expanded grid: the success leader is not the integrity leader, 31 of 32 profiles change rank when integrity replaces task success, and integrity rankings are more stable under distractor perturbation. The primary confidence-free score HCCIS-CORE reaches 0.8469 AUC and 0.6992 PR-AUC for post-probe diagnostic discrimination of terminal task failure; the legacy full heuristic variant HCCIS-FULL reaches 0.7997 AUC and 0.6410 PR-AUC. Probe accuracy and state drift achieve slightly higher ROC-AUC, 0.8587, and better Brier/ECE, while HCCIS-CORE has substantially higher point-estimate PR-AUC and remains more closely tied to the benchmark's intended construct. The exploratory neural-augmented variant HCCIS+N is weaker overall, and a randomized subspace control approaches chance. NeuroState-Bench therefore contributes a calibrated evaluation axis for exposing commitment failures over a broader model grid than the original local-only subset.

2604.25284 2026-05-15 cs.RO

Optimal UGV-UAV Cooperative Partitioning and Inspection of Shortest Paths

Ninh Nguyen, Srinivas Akella

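AI summary: This paper studies cooperative shortest-path planning by an unmanned ground vehicle (UGV) and an unmanned aerial vehicle (UAV) in environments with unknown road blockages. The problem extends the classic Canadian Traveler Problem (CTP) by adding UAV assistance. By analyzing different path structures and speed ratios, the authors derive optimal path-partitioning strategies and show the advantage of using the UAV to inspect path suffixes. On real urban road networks, the approach reduces the UGV's travel time by up to 30%.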

Comments Withdrawn by the authors due to an error in Section V.D in the competitive-ratio proof for the UGV-UAV case. The proof incorrectly uses $1+2\frac{v_A}{v_G+v_A}(k-1)\le 2\frac{v_A}{v_G+v_A}k-1$, which does not hold in general and affects the stated bound

2604.16813 2026-05-15 cs.AI cs.CL cs.DB

PersonalHomeBench: Evaluating Agents in Personalized Smart Homes

Manasa Bharadwaj, Yolanda Liu, InJung Yang, Sungil Kim, Nikhil Verma, KoKeun Kim, Kevin Ferreira, YoungJoon Kim

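AI summary: This paper presents PersonalHomeBench, a benchmark for evaluating foundation models acting as agents in personalized smart-home environments. The benchmark iteratively builds rich household states to generate personalized, context-dependent tasks, and it provides the PersonalHomeTools toolkit to support interaction in realistic environments. Experiments show that agent performance degrades systematically as task complexity increases, with particular weaknesses in counterfactual reasoning and partially observable scenarios. These results highlight the benchmark's value and rigor for analyzing the reasoning and planning abilities of personalized agents.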

Comments Please use and cite the V3 version of this work, which includes updated correct author ordering and expanded error analysis in the appendix

2604.05306 2026-05-15 cs.LG cs.AI cs.CL

LLMs Should Express Uncertainty Explicitly

Junyu Guo, Shangding Gu, Ming Jin, Costas Spanos, Javad Lavaei

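AI summary: This paper explores how to post-train large language models (LLMs) to express their uncertainty explicitly in their answers, which reduces confident but wrong responses. It proposes two approaches: having the model produce a confidence score at the end of its reasoning, and inserting uncertainty markers during reasoning. Experiments show that both approaches lower error rates and improve answer quality, and that they can also strengthen retrieval-augmented generation (RAG). The study further analyzes how the two approaches affect the model's internal structure, revealing the mechanisms by which each improves the model's judgment at different levels.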

2603.11045 2026-05-15 cs.LG cond-mat.mtrl-sci cs.AI cs.CV physics.ins-det

Neural Field Thermal Tomography: A Differentiable Physics Framework for Non-Destructive Evaluation

Tao Zhong, Yixun Hu, Dongzhe Zheng, Aditya Sood, Christine Allen-Blanchette

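AI summary: This paper proposes NeFTY, a neural field thermal tomography method for the label-free 3D inverse heat conduction problem. It represents diffusivity as a continuous coordinate-based neural network and runs a differentiable implicit Euler heat solver at every optimization step. As a result, the governing equation holds exactly at the discretization level instead of acting as a soft constraint. Experiments on synthetic 3D benchmarks and real thermal imaging data show that NeFTY substantially outperforms conventional physics-informed neural networks and voxel-grid methods in defect segmentation and depth estimation.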
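An implicit Euler step for the heat equation solves a linear system instead of extrapolating forward, so the discretized equation is satisfied exactly at each step. The 1D sketch below uses a spatially varying diffusivity on a uniform grid with zero-flux boundaries. This is a simplification chosen for illustration. In NeFTY the diffusivity comes from a neural field, the problem is 3D, and the solve must be differentiable, which would call for an autodiff framework's linear solver instead of NumPy.

```python
import numpy as np

def diffusion_operator(alpha, dx):
    """Matrix L with (L u)_i approximating d/dx(alpha du/dx), zero flux at both ends."""
    n = len(alpha)
    face = 0.5 * (alpha[:-1] + alpha[1:])  # diffusivity at interior cell faces
    L = np.zeros((n, n))
    for i in range(n - 1):
        k = face[i] / dx**2
        L[i, i] -= k
        L[i, i + 1] += k
        L[i + 1, i + 1] -= k
        L[i + 1, i] += k
    return L

def implicit_euler_step(u, alpha, dx, dt):
    """Solve (I - dt L) u_next = u; stable for any dt > 0 with this operator."""
    A = np.eye(len(u)) - dt * diffusion_operator(alpha, dx)
    return np.linalg.solve(A, u)

n, dx, dt = 50, 0.02, 0.01
alpha = np.full(n, 1.0)
alpha[20:30] = 0.1            # hypothetical low-diffusivity defect
u = np.zeros(n)
u[0] = 100.0                  # heat pulse applied at the surface
for _ in range(20):
    u = implicit_euler_step(u, alpha, dx, dt)
```

Here dt * alpha / dx^2 = 25, far above the limit of 0.5 at which explicit Euler becomes unstable, yet the implicit step stays stable, keeps temperatures non-negative, and conserves total heat.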

Comments 37 pages, 19 figures

2603.03577 2026-05-15 cs.CV cs.RO

From Local Matches to Global Masks: Template-Guided Instance Detection and Segmentation in Open-World Scenes

Qifan Zhang, Sai Haneesh Allu, Jikai Wang, Yangxiao Lu, Yu Xiang

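AI summary: This paper studies how to detect and segment novel object instances in open-world scenes from a small number of template images. It proposes L2G-Det, a local-to-global detection framework that generates candidate points through dense patch-level matching between the templates and the query image, then applies an improved segmentation model to produce precise instance masks. The method does not rely on conventional proposal mechanisms and improves detection and segmentation under occlusion and background clutter.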

Comments Accepted to Robotics: Science and Systems (RSS) 2026. Project page: https://irvlutd.github.io/L2G/

2603.02115 2026-05-15 cs.RO cs.AI cs.LG

Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

Anthony Liang, Yigit Korkmaz, Jiahui Zhang, Minyoung Hwang, Abrar Anwar, Sidhant Kaushik, Aditya Shah, Alex S. Huang, Luke Zettlemoyer, Dieter Fox, Yu Xiang, Anqi Li, Andreea Bobu, Abhishek Gupta, Stephen Tu, Erdem Biyik, Jesse Zhang

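AI summary: This paper presents Robometer, a scalable framework for general-purpose robotic reward models built on trajectory comparisons. It combines intra-trajectory progress supervision with inter-trajectory preference supervision in a dual objective. A frame-level progress loss on expert data anchors the reward magnitude, and a trajectory-comparison preference loss imposes a global ordering across trajectories for a task. Together they enable effective reward learning on both real and augmented failure trajectories. To support training at scale, the authors build RBM-1M, a dataset of more than one million trajectories. Experiments show that Robometer generalizes better and supports more effective learning across multiple benchmarks and real-world applications.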
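The dual objective can be sketched as a sum of two standard losses. This sketch assumes a mean-squared progress loss on per-frame predictions and a Bradley-Terry (logistic) preference loss on trajectory-level reward scores. The exact loss forms, the weighting w_pref, and all values below are assumptions, not details taken from the summary.

```python
import numpy as np

def progress_loss(pred_progress, true_progress):
    """Frame-level regression of task progress (0 at start, 1 at success)."""
    return float(np.mean((pred_progress - true_progress) ** 2))

def preference_loss(score_preferred, score_other):
    """Bradley-Terry loss -log sigmoid(s_preferred - s_other), computed stably."""
    margin = np.asarray(score_preferred) - np.asarray(score_other)
    return float(np.mean(np.logaddexp(0.0, -margin)))

def dual_objective(pred_prog, true_prog, s_pref, s_other, w_pref=1.0):
    """Progress loss anchors the reward scale; preference loss orders trajectories."""
    return progress_loss(pred_prog, true_prog) + w_pref * preference_loss(s_pref, s_other)

# Hypothetical values: one 5-frame expert demo and two trajectory pairs
# (successful trajectory vs. augmented failure).
true_prog = np.linspace(0.0, 1.0, 5)
pred_prog = np.array([0.0, 0.3, 0.5, 0.7, 0.9])
s_success = np.array([2.0, 1.5])
s_failure = np.array([-1.0, 0.5])
loss = dual_objective(pred_prog, true_prog, s_success, s_failure)
```

The preference term only constrains differences between scores, so on its own it leaves the reward scale free. The progress term pins that scale to a 0-to-1 range on expert data.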

Comments 33 pages, 17 figures

2602.21302 2026-05-15 cs.RO

Learning Dynamic Rope Manipulation Using Task-Level Iterative Learning Control

Krishna Suresh, Chris Atkeson

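AI summary: This paper proposes a task-level iterative learning control method for dynamic rope manipulation and demonstrates it on a non-planar rope task, the "flying knot". The method needs only a single human demonstration and a simplified rope model to learn directly on real hardware, with no large demonstration datasets or simulation. At each iteration it solves a quadratic program that converts task-space error into an action update, effectively inverting the robot and rope model. Experiments show a 100% success rate on 7 ropes of different materials and specifications, and transfer between rope types within 2 to 5 trials.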
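The per-iteration update can be sketched as a regularized least-squares problem, which is the unconstrained special case of a quadratic program. This assumes a linearized model J that maps action changes to task-space changes, plus a damping weight lam. The paper's actual QP, its constraints, and its rope model are not reproduced. The "true system" below is a hypothetical toy with a small nonlinearity that J ignores.

```python
import numpy as np

def ilc_update(u, error, J, lam=1e-2):
    """One task-level ILC step.

    Solves  min_du ||error + J du||^2 + lam ||du||^2,
    an unconstrained QP whose closed form is
    du = -(J^T J + lam I)^-1 J^T error.
    """
    H = J.T @ J + lam * np.eye(J.shape[1])
    du = -np.linalg.solve(H, J.T @ error)
    return u + du

# Hypothetical system: task outcome y = A u plus a small nonlinearity.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))

def rollout(u):
    return A @ u + 0.05 * np.sin(u[:3])

target = np.array([1.0, -0.5, 0.3])
J = A                    # simplified model that ignores the nonlinearity
u = np.zeros(5)          # hypothetical initial actions
errs = []
for trial in range(5):
    e = rollout(u) - target
    errs.append(np.linalg.norm(e))
    u = ilc_update(u, e, J)
```

The damping term keeps each update small where the model is poorly conditioned, so repeated trials can correct the model mismatch instead of amplifying it.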

Comments Project website: https://flying-knots.github.io

2602.19532 2026-05-15 cs.RO cs.SY eess.SY

Bellman Value Decomposition for Task Logic in Safe Optimal Control

William Sharpless, Oswin So, Dylan Hirsch, Sylvia Herbert, Chuchu Fan

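AI summary: This study addresses the complex combination of goal and safety specifications in high-dimensional safe optimal control and proposes a method based on Bellman value decomposition. The Bellman value of a complex task is decomposed into a graph whose nodes are linked by reach-avoid, avoid, and new reach-avoid-loop Bellman equations, which organizes the task logic naturally. The authors further propose VDPPO, which embeds the decomposed value graph in a two-level neural network that handles implicit dependencies automatically. Experiments in several high-dimensional simulations and on hardware validate the method and show a markedly better balance between safety and liveness.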

2602.13483 2026-05-15 cs.LG cs.AI

Finding Interpretable Prompt-Specific Circuits in Language Models

Gabriel Franco, Lucas M. Tassis, Azalea Rohr, Mark Crovella

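AI summary: This paper studies the internal circuits that language models use to perform tasks, focusing on why attention heads attend to particular token pairs. The authors propose ACC++, an improved circuit-tracing method based on attention causal communication. It extracts causally related circuit components and their low-dimensional signals from a single forward pass, with no replacement model and no patching. Experiments show that the signals ACC++ identifies are interpretable across several language models and reveal the models' sensitivity to prompt structure, language differences, and other factors, demonstrating the method's broad applicability for explaining model behavior.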

2602.07519 2026-05-15 cs.LG

PALMS: A Computational Implementation for Pavlovian Associative Learning Models' Simulation

Martin Fixman, Alessandro Abati, Julián Jiménez Nimmo, Sean Lim, Esther Mondragón

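AI summary: This paper introduces PALMS, a computational tool for simulating Pavlovian associative learning models in Python. Besides the classic Rescorla-Wagner model, it implements several attentional models and extensions, such as Pearce-Kaye-Hall, Mackintosh Extended, and Le Pelley's Hybrid model, and it introduces a unified variable learning rate that reconciles competing theoretical views. PALMS provides a graphical interface for entering complex experimental designs and can handle large numbers of stimuli and configural cues. This substantially broadens the models' predictive capabilities and gives neuroscientists a powerful tool for studying and refining experimental designs.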

Comments PALMS is licensed under the open-source GNU Lesser General Public License 3.0. The environment source code and the latest multiplatform release build are accessible as a GitHub repository at https://github.com/cal-r/PALMS-Simulator

English abstract

In contrast to static formalisms, computational definitions describe the operational mechanisms of a model. Simulations are an essential part of the cycle of theory development and refinement, assisting researchers in formulating the precise definitions that models require, and making accurate predictions. This manuscript introduces a computational implementation of Pavlovian learning models in a Python environment, termed Pavlovian Associative Learning Models' Simulation (PALMS). In addition to the canonical Rescorla-Wagner model, attentional approaches are implemented, including Pearce-Kaye-Hall, Mackintosh Extended, Le Pelley's Hybrid, and a novel extension of the Rescorla-Wagner model featuring a unified variable learning rate that synthesises Mackintosh's and Pearce and Hall's opposing conceptualisations. To our knowledge, only the first attentional model has been previously specified computationally in a general design tool. PALMS integrates a graphical interface that permits the input of entire experimental designs in an alphanumeric format, akin to that used by experimental neuroscientists. It uniquely enables the simulation of experiments involving hundreds of stimuli, such as those used with human participants, and the computation of configural cues and configural-cue compounds across all models, thereby substantially broadening their predictive capabilities. A comprehensive description of the models' implementation is provided in the paper. We evaluate PALMS by simulating five published experiments in the associative learning literature that assessed the predictive scope of existing models, and we show that this implementation provides neuroscientists with a useful tool for identifying critical variables, refining experimental designs, making precise predictions, comparing model fitness, and formulating new theoretical approaches.
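The canonical Rescorla-Wagner rule that PALMS builds on updates every cue present on a trial by a shared prediction error: $\Delta V_X = \alpha_X \beta (\lambda - \sum_{i \in \mathrm{present}} V_i)$. The sketch below is our own minimal implementation, not PALMS code, and the parameter values are illustrative. It reproduces blocking, a classic design of the kind such simulators are used to study.

```python
def rescorla_wagner(trials, alpha, beta=0.3, lam=1.0):
    """Simulate Rescorla-Wagner learning.

    trials: sequence of (cues, reinforced) pairs, e.g. ("AB", True)
    alpha: dict mapping each cue to its salience
    Returns a dict of associative strengths V.
    """
    V = {cue: 0.0 for cue in alpha}
    for cues, reinforced in trials:
        target = lam if reinforced else 0.0
        error = target - sum(V[c] for c in cues)  # prediction error shared by all present cues
        for c in cues:
            V[c] += alpha[c] * beta * error
    return V

alpha = {"A": 0.5, "B": 0.5}
# Blocking design: A+ pretraining, then AB+ compound training.
blocking = [("A", True)] * 20 + [("AB", True)] * 10
control = [("AB", True)] * 10
V_block = rescorla_wagner(blocking, alpha)
V_ctrl = rescorla_wagner(control, alpha)
```

After A+ pretraining, A already predicts the outcome, so the shared error is near zero and B acquires almost no strength (about 0.02 here, versus about 0.49 in the control). The attentional models in PALMS differ mainly in how alpha changes across trials.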

2602.05319 2026-05-15 cs.LG

Accelerated Sequential Flow Matching: A Bayesian Filtering Perspective

Yinan Huang, Hans Hao-Hsun Hsu, Junran Wang, Bo Dai, Pan Li

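AI summary: This paper proposes sequential Bayesian flow matching, a new framework for sequential probabilistic inference on real-time streaming data. Inspired by Bayesian filtering, it learns a probability flow that carries the posterior distribution from one time step to the next, which models predictive distributions efficiently. Conventional approaches repeatedly sample from an uninformative initial distribution. This method instead uses the previous step's belief as an informative source distribution, which substantially improves sampling efficiency. Across several scientific forecasting and decision-making tasks, it matches full diffusion models while needing fewer sampling steps and much lower inference latency.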