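arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.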

2605.07574 2026-05-12 cs.CV

PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models

Yuliang Li, Chu Zhou, Heng Guo, Boxin Shi, Imari Sato, Zhanyu Ma

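AI summary: Mainstream vision-language models (VLMs) rely on standard RGB input and therefore struggle with optically ambiguous scenes such as reflections and transparent objects. To address this, the paper proposes PolarVLM, the first multimodal framework to integrate polarization physical parameters into a VLM. A dual-stream architecture and a progressive training strategy help it avoid physical misjudgments while preserving general visual ability. The work also builds PolarVQA, the first visual question answering benchmark for polarization-aware perception. Experiments show that PolarVLM significantly outperforms RGB baselines on multiple tasks, with the largest gains in reflection recognition and glass counting.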

Comments 23 pages, 12 figures, including appendices

2605.07429 2026-05-12 cs.CV

Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework

Linxiao Shi, Siming Zheng, Zerong Wang, Hao Zhang, Jinwei Chen, Bo Li, Shifeng Chen, Peng-Tao Jiang

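AI summary: Because of optical design constraints, mobile devices struggle to produce natural optical bokeh (shallow depth-of-field) effects. To address this, the paper proposes MagicBokeh, a unified diffusion-based method that efficiently generates high-quality, photorealistic bokeh. It jointly optimizes bokeh rendering and super-resolution through an alternative training strategy and a focus-aware masked attention mechanism, improving control precision and visual realism, and it adds a degradation-aware depth module to improve depth estimation on low-quality inputs. Experiments show that MagicBokeh efficiently produces highly realistic bokeh on real low-resolution images, offering a new direction for future bokeh rendering research.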

Comments Accepted by CVPR 2026

2605.07384 2026-05-12 cs.LG

StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models

Panqi Chen, Yifan Sun, Shikai Fang, Xiao Fu, Lei Cheng

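AI summary: StreamPhy is an end-to-end framework for real-time inference of high-dimensional physical field dynamics from irregular, sparse measurements. It combines an adaptive observation encoder, a structured state space model, and an efficient FT-FiLM decoder, enabling memory-efficient online updates and high-accuracy field generation under irregular time intervals. The paper proves that FT-FiLM is more expressive than traditional functional tensor models, and experiments on several physical systems show higher accuracy and faster inference than existing methods.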

2605.07177 2026-05-12 cs.LG cs.AI

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Guankai Li, Jiabin Chen, Yi Xu, Xichen Zhang, Yuan Lu

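AI summary: Existing multimodal search agents usually process target entities sequentially, which produces redundant interaction rounds when a query decomposes into several independent retrieval tasks. The paper proposes HyperEyes, a parallel multimodal search agent trained with dual-grained efficiency-aware reinforcement learning. By fusing visual grounding and retrieval into a single atomic action, it searches for multiple entities concurrently and treats inference efficiency as a core training objective. HyperEyes uses a two-stage training strategy that combines a parallel-amenable data synthesis pipeline with a dual-grained reinforcement learning framework, improving both search efficiency and accuracy, and it introduces IMEB, a new benchmark that evaluates search capability and efficiency together.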

Comments Code & Data: https://github.com/DeepExperience/HyperEyes

Abstract

Existing multimodal search agents process target entities sequentially, issuing one tool call per entity and accumulating redundant interaction rounds whenever a query decomposes into independent sub-retrievals. We argue that effective multimodal agents should search wider rather than longer: dispatching multiple grounded queries concurrently within a round. To this end, we present HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurrent search across multiple entities while treating inference efficiency as a first-class training objective. HyperEyes is trained in two stages. For cold-start supervision, we develop a Parallel-Amenable Data Synthesis Pipeline covering visual multi-entity and textual multi-constraint queries, curating efficiency-oriented trajectories via Progressive Rejection Sampling. Building on this, our central contribution, a Dual-Grained Efficiency-Aware Reinforcement Learning framework, operates at two levels. At the macro level, we propose TRACE (Tool-use Reference-Adaptive Cost Efficiency), a trajectory-level reward whose reference is monotonically tightened during training to suppress superfluous tool calls without restricting genuine multi-hop search. At the micro level, we adapt On-Policy Distillation to inject dense token-level corrective signals from an external teacher on failed rollouts, mitigating the credit-assignment deficiency of sparse outcome rewards. Since existing benchmarks evaluate accuracy as the sole metric, omitting inference cost, we introduce IMEB, a human-curated benchmark of 300 instances that jointly evaluates search capability and efficiency. Across six benchmarks, HyperEyes-30B surpasses the strongest comparable open-source agent by 9.9% in accuracy with 5.3x fewer tool-call rounds on average.


2605.06856 2026-05-12 cs.LG cs.CL

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

Ishani Mondal, Shweta Bhardwaj

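AI summary: The paper argues that generative AI systems, despite strong results on standard benchmarks, often fail to deliver practical utility in real applications, a pattern observed across 28 deployment cases in education, healthcare, software engineering, and law. It attributes the gap between evaluation results and real utility to three flaws in current evaluation practice: proxy substitution, temporal collapse, and distribution hiding. It proposes a new evaluation framework, SCU-GenEval, which measures an AI system's value through long-term interaction outcomes grounded in human goals and context, and it introduces several practical tools to support adopting this paradigm.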

Comments 20 pages

2605.06644 2026-05-12 cs.LG

Edge-specific signal propagation on mature chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction

Yuchen Xiong, Swee Keong Yeap, Steven Aw Yoong Kit

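AI summary: The study proposes a mechanism-graph algorithm based on the 3D structure of the mature chromophore region for predicting the quantum yield of fluorescent proteins. The method converts a protein structure into a region-partitioned 3D residue graph, propagates signals through signal channels to capture how local physical signals affect the chromophore region, and combines 121 features for regression. It performs well on several benchmarks, outperforms existing models especially on remote homologs, and reveals region-specific mechanisms across different fluorescent proteins.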

Comments Includes appendix; source code, processed feature tables and evaluation scripts are available from the first author upon reasonable request

2605.06366 2026-05-12 cs.LG

Layer Collapse in Diffusion Language Models

Alexander Conzelmann, Albert Catalan-Tatjer, Shiwei Liu

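AI summary: The paper studies "layer collapse" in diffusion language models (DLMs): activation patterns in the early layers are highly similar and dominated by a single super-outlier, a structure that remains stable over long text spans. Although this outlier looks redundant, it is essential to the model's output; removing it degrades the output into repeated random sequences. The study also shows that redundancy in DLMs is distributed opposite to autoregressive models, concentrating in the shallow layers, and that layer collapse is caused by overtraining rather than undertraining, which has practical implications for model compression and deployment.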

Comments 9 Pages, Preprint

2605.06042 2026-05-12 cs.RO

Accurate Trajectory Tracking with MPCC for Flapping-Wing MAVs

Charbel Toumieh, Jack Zeng, Niel Mistry, Dario Floreano

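AI summary: The paper addresses high-accuracy trajectory tracking for flapping-wing micro aerial vehicles (MAVs), whose lift, airspeed, and steering are tightly coupled and whose control inputs are limited, and proposes a controller based on model predictive contouring control (MPCC). The method uses arc-length-parameterized trajectories and optimizes progress along the path in real time, with no preset time profile, and it introduces a compact, continuously differentiable dynamics model that captures the coupled aerodynamics of the vehicle. Experiments show centimeter-level tracking error on complex 3D trajectories, significantly better than existing methods.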

Comments 7 pages, 6 figures

2605.05812 2026-05-12 cs.AI

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

Armaan A. Abraham, Lucy Xiaoyang Shi, Chelsea Finn

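AI summary: The paper studies the unstable value estimates that bootstrapping error causes in value-based offline reinforcement learning on long-horizon tasks and proposes Long-Horizon Q-Learning (LQL). LQL introduces n-step inequality constraints and corrects value estimates with a hinge loss, suppressing error accumulation without extra networks or computational overhead. Experiments show that LQL outperforms conventional 1-step and n-step TD learning on several online and offline-to-online benchmarks.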
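
The summary above does not state the exact form of the n-step inequality, so the following is only a minimal sketch under an assumption: that the action value should be at least the discounted n-step return observed along a trajectory plus a bootstrapped tail value, and that a hinge term penalizes only violations of that bound. The function names and the numbers are hypothetical and are not taken from the paper.

```python
def n_step_lower_bound(rewards, bootstrap_value, gamma):
    """Discounted sum of n observed rewards plus a discounted bootstrapped tail value."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total + (gamma ** len(rewards)) * bootstrap_value


def hinge_penalty(q_value, lower_bound):
    """Zero when the estimate satisfies the bound, linear in the shortfall otherwise."""
    return max(0.0, lower_bound - q_value)


# Hypothetical 3-step trajectory segment (assumed values, for illustration only).
rewards = [1.0, 0.0, 2.0]
bound = n_step_lower_bound(rewards, bootstrap_value=10.0, gamma=0.5)

penalty_low = hinge_penalty(q_value=2.0, lower_bound=bound)   # estimate below the bound
penalty_ok = hinge_penalty(q_value=3.0, lower_bound=bound)    # estimate above the bound
```

Because the penalty is one-sided, an estimate that already satisfies the bound contributes nothing, which is what distinguishes an inequality constraint from regressing onto the n-step return as a target.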

2605.05775 2026-05-12 cs.CV cs.AI

The autoPET3 Challenge: Automated Lesion Segmentation in Whole-Body PET/CT – Multitracer Multicenter Generalization

Jakob Dexl, Katharina Jeblick, Andreas Mittermeier, Balthasar Schachtner, Anna Theresa Stüber, Johanna Topalis, Maximilian Rokuss, Fabian Isensee, Klaus H. Maier-Hein, Hamza Kalisch, Jens Kleesiek, Constantin M. Seibold, Hussain Alasmawi, Lap Yan Lennon Chan, Yixuan Yuan, Alexander Jaus, Rainer Stiefelhagen, Pauline Ornela Megne Choudja, Konstantin Nikolaou, Christian La Fougère, Sergios Gatidis, Matthias P. Fabritius, Maurice Heimer, Gizem Abaci, Lalith Kumar Shiyam Sundar, Rudolf A. Werner, Jens Ricke, Clemens C. Cyran, Thomas Küstner, Michael Ingrisch

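AI summary: The paper presents the design and results of the third autoPET challenge (MICCAI 2024), which evaluates how well algorithms for automated lesion segmentation in whole-body PET/CT generalize across multiple tracers and centers. The study uses a large annotated dataset from two hospitals and evaluates algorithms on a test set containing unseen tracer-center combinations; the best algorithm outperforms the baseline on several metrics. The authors also find that current algorithms perform well on in-domain multitracer segmentation but still struggle to generalize across centers and tracers, with performance differences driven mainly by data heterogeneity and case difficulty.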

Comments Preprint submitted to Medical Image Analysis

2605.05373 2026-05-12 cs.LG

Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning

David Leeftink, Max Hinne, Marcel van Gerven

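AI summary: The paper studies how to improve the decision-making of reinforcement learning agents in partially observable environments and proposes neural co-state policies, grounded in Pontryagin's minimum principle (PMP) from optimal control. By formally linking the hidden state of a recurrent neural network to the co-state in PMP, the method makes the network's internal dynamics interpretable, and it adds a co-state loss that explicitly guides structured learning of the hidden state. Experiments show strong performance on partially observable tasks and robustness to out-of-distribution sensor masking.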

Comments 17 pages, 5 figures

2605.04617 2026-05-12 cs.CV cs.HC cs.LG

Temporal Structure Matters for Efficient Test-Time Adaptation in Wearable Human Activity Recognition

Zishu Zhou, Zaipeng Xie, Xuanyao Jie

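AI summary: Wearable human activity recognition models often degrade under real-world distribution shift across users, and existing test-time adaptation methods mostly inherit assumptions from vision tasks without fully exploiting the temporal structure of activity recognition streams. The paper revisits temporal structure as a conditional inference signal and proposes an adaptation mechanism based on temporal continuity and feature deviation that decides when to keep or release temporal inertia and where to route prediction refinement. Building on this, the authors design SIGHT, a backpropagation-free framework for lightweight, efficient real-time adaptation. Experiments show it outperforms existing methods on real-world datasets while reducing compute and memory overhead.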

2605.04541 2026-05-12 cs.CV

Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection

Muyao Peng, Shun Zou, Pei An, You Yang, Qiong Liu

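AI summary: The paper proposes Angle-I2P, an image-to-point-cloud registration method for cases with a low inlier ratio, where conventional PnP methods struggle to register accurately. It introduces an angle-consistency constraint and a hierarchical attention mechanism to improve robustness and accuracy. Experiments show that Angle-I2P achieves state-of-the-art registration results on several public datasets.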

Comments Accepted by ICRA 2026

2605.03650 2026-05-12 cs.CV cs.AI cs.LG

Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence

Zhiyuan Li, Rongzhen Zhao, Wenyan Yang, Wenshuai Zhao, Pekka Marttinen, Joni Pajarinen

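AI summary: The paper rethinks temporal consistency in video object-centric learning, arguing that current methods, which rely on a dynamics module to predict future object representations, are in effect approximating a complex discrete correspondence problem. The authors propose Grounded Correspondence, a framework that initializes object slots from salient regions extracted by a frozen backbone and establishes identity correspondence across frames with Hungarian matching. With no learnable temporal modeling parameters, it achieves competitive performance on several datasets.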
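
As a rough illustration of frame-to-frame identity correspondence, the sketch below finds the minimum-cost one-to-one matching between slot vectors of two frames. It is not the paper's implementation: it swaps in an exhaustive search over permutations, which returns the same optimal assignment the Hungarian algorithm would but is only practical for a handful of slots. The squared-distance cost and the slot vectors are assumptions made for the example.

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_slots(prev_slots, curr_slots):
    """Return (assignment, cost), where assignment[i] is the index of the
    current-frame slot matched to previous-frame slot i.

    Exhaustive search stands in for the Hungarian algorithm here and
    assumes both frames have the same number of slots.
    """
    n = len(prev_slots)
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(n)):
        cost = sum(
            squared_distance(prev_slots[i], curr_slots[j])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Hypothetical 2-D slot features for two consecutive frames.
prev_slots = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
curr_slots = [[5.0, 4.0], [0.0, 1.0], [1.0, 2.0]]
assignment, cost = match_slots(prev_slots, curr_slots)
```

Reordering the current-frame slots by `assignment` keeps each object identity in a fixed slot position across frames without any learned temporal parameters.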

2605.03639 2026-05-12 cs.CV

Diffusion Masked Pretraining for Dynamic Point Cloud

Zhuoyue Zhang, Jihua Zhu, Chaowei Fang, Jian Liu, Ajmal Saeed Mian

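AI summary: The paper proposes DiMP, a unified self-supervised pretraining framework for dynamic point clouds. By introducing diffusion models, it addresses two problems in existing masked reconstruction objectives: spatiotemporal position leakage and the loss of motion uncertainty. DiMP applies diffusion modeling to both position reasoning and motion learning: it predicts clean point cloud centers from the visible spatiotemporal context, improving the accuracy of position representations, and it recasts inter-frame displacement supervision as a noise prediction task for a conditional diffusion model, modeling the conditional distribution of motion more completely. Experiments show that DiMP significantly improves performance on several downstream tasks.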

2605.01643 2026-05-12 cs.LG cs.AI

AI Alignment via Incentives and Correction

Rohit Agarwal, Joshua Lin, Mark Braverman, Elad Hazan

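AI summary: The paper studies AI alignment through law-and-economics models of deterrence and enforcement, treating misconduct in AI systems as a strategic response to incentives rather than a purely external failure. It frames alignment as an equilibrium problem in which reward design steers the interaction between a solver and an auditor. The study also proposes a bandit-based outer-loop procedure for searching over reward profiles and validates it on a real large language model coding pipeline.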

Abstract

We study AI alignment through the lens of law-and-economics models of deterrence and enforcement. In these models, misconduct is not treated as an external failure, but as a strategic response to incentives: an actor weighs the gain from violation against the probability of detection and the severity of punishment. We argue that the same logic arises naturally in agentic AI pipelines. A solver may benefit from producing a persuasive but incorrect answer, hiding uncertainty, or exploiting spurious shortcuts, while an auditor or verifier must decide whether costly monitoring is worthwhile. Alignment is therefore a fixed-point problem: stronger penalties may deter solver misbehavior, but they can also reduce the auditor's incentive to inspect, since auditing then mainly incurs cost on a population that appears increasingly aligned. This perspective also changes what should count as a post-training signal. Standard feedback often attaches reward to the final answer alone, but a solver-auditor pipeline exposes the full correction event: whether the solver erred, whether the auditor inspected, whether the error was caught, and whether oversight incentives remained active. We formalize this interaction in a two-agent model in which a principal chooses rewards over joint correction outcomes, inducing both solver behavior and auditor monitoring. Reward design is therefore a bilevel optimization problem: rewards are judged not by their immediate semantic meaning, but by the behavioral equilibrium they induce. We propose a bandit-based outer-loop procedure for searching over reward profiles using noisy interaction feedback. Experiments on an LLM coding pipeline show that adaptive reward profiles can maintain useful oversight pressure and improve principal-aligned outcomes relative to static hand-designed rewards, including a substantial reduction in hallucinated incorrect attempts.
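
The deterrence logic in the abstract can be sketched with two simple decision rules. The solver rule follows the abstract's description (gain from violation weighed against probability of detection and severity of punishment). The auditor rule and all numeric values are assumptions added for illustration and are not the paper's formal model.

```python
def expected_violation_payoff(gain, detection_prob, penalty):
    """Expected net payoff of misbehaving: gain minus expected punishment."""
    return gain - detection_prob * penalty


def solver_violates(gain, detection_prob, penalty):
    """A payoff-maximizing solver misbehaves only when it pays off in expectation."""
    return expected_violation_payoff(gain, detection_prob, penalty) > 0.0


def auditor_inspects(violation_rate, catch_value, audit_cost):
    """Assumed auditor rule: inspect only if the expected value of catching
    an error exceeds the cost of monitoring."""
    return violation_rate * catch_value > audit_cost


# Hypothetical numbers showing the tension described in the abstract.
deterred = not solver_violates(gain=1.0, detection_prob=0.5, penalty=4.0)
tempted = solver_violates(gain=1.0, detection_prob=0.1, penalty=4.0)

# When violations become rare, inspection stops paying for itself.
audits_when_common = auditor_inspects(violation_rate=0.5, catch_value=4.0, audit_cost=1.0)
audits_when_rare = auditor_inspects(violation_rate=0.1, catch_value=4.0, audit_cost=1.0)
```

The two rules feed each other: the detection probability the solver faces depends on whether the auditor inspects, and the violation rate the auditor faces depends on whether the solver is deterred, which is why the abstract describes alignment as a fixed-point problem.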


2605.00539 2026-05-12 cs.CL cs.DC

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Wenxiang Lin, Juntao Huang, Luhan Zhang, Laili Li, Xiang Bao, Mengyang Zhang, Bing Wang, Shaohuai Shi

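AI summary: The paper proposes AGoQ, a quantization method for improving the memory efficiency of distributed training of large language models. It introduces a layer-aware activation quantization algorithm and an 8-bit gradient quantization algorithm, achieving near-4-bit activation storage and communication-efficient gradient storage, respectively, which substantially reduces memory use and speeds up training. Experiments on several large LLaMA models show significant advantages over existing systems in memory consumption and training speed while preserving convergence and task accuracy.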
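
For readers unfamiliar with 8-bit quantization, the sketch below shows a generic symmetric per-tensor scheme: one floating-point scale plus integer codes in [-127, 127]. This is a textbook illustration only. The summary does not describe AGoQ's actual gradient quantizer, and the function names and values here are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using a single shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from integer codes and the shared scale."""
    return [c * scale for c in codes]


# Hypothetical gradient values.
grads = [0.5, -2.0, 0.0, 1.5]
codes, scale = quantize_int8(grads)
restored = dequantize_int8(codes, scale)
```

Each code fits in one byte instead of four for a 32-bit float, and the rounding error per element is at most half of `scale`.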

2605.00370 2026-05-12 cs.LG cs.CY cs.MM

Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration

Chunlei Meng, Pengbin Feng, Rong Fu, Hoi Leong Lee, Xiaojing Du, Zhaolu Kang, Zeyu Zhang, Weilin Zhou, Chun Ouyang, Zhongxue Gan

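AI summary: The paper proposes Group Cognition Learning (GCL), a collaborative learning framework that targets modality dominance and spurious cross-modal associations in multimodal learning. GCL uses a two-stage collaboration mechanism. In the first stage, a routing agent and an auditing agent selectively promote beneficial interactions between modalities and suppress redundant associations. In the second stage, a public-factor agent and an aggregation agent produce the final prediction while keeping each modality independent. Experiments show that GCL outperforms existing methods on several multimodal benchmarks and improves robustness and generalization.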

Comments This study has been Accepted by ICML 2026. The current version is a manuscript, please refer to the official version released at ICML 2026 for the final published version

2605.00195 2026-05-12 cs.LG

Diversity in Large Language Models under Supervised Fine-Tuning

Roman Klypa, Oleksandr Cherednichenko

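AI summary: The study examines how supervised fine-tuning (SFT) affects the generation diversity of large language models. It finds that SFT reduces diversity and attributes this to two causes: neglect of low-frequency patterns in the fine-tuning data and forgetting of pretrained knowledge. It proposes a new loss function, Tempered Focal (TOFU) loss, that addresses both. Experiments show that TOFU increases output diversity while preserving response quality, offering a more principled approach to SFT.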

2604.27629 2026-05-12 cs.AI

WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning

Ke Xu, Zhongyuan Lian

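AI summary: The paper proposes WaferSAGE, a framework for visual question answering on wafer defects that combines small vision-language models with synthetic data generation to address data scarcity in semiconductor manufacturing. Using structured rubric generation and reinforcement learning, it improves the accuracy of defect recognition and analysis and trains high-accuracy models without large amounts of labeled data. Experiments show that the method can surpass large commercial models on specialized industrial visual understanding tasks, offering semiconductor manufacturers a privacy-preserving, lower-cost deployment option.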

Comments 16 pages, 3 figures, 8 tables

2604.23876 2026-05-12 cs.LG

Cardiac Stability Theory: An Axiomatically Grounded Framework for Continuous Cardiac Health Monitoring via Smartphone Photoplethysmography

Timothy Oladunni, Farouk Ganiyu Adewumi

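AI summary: The paper proposes Cardiac Stability Theory (CST), an axiomatically grounded framework for continuous cardiac health monitoring via smartphone photoplethysmography (PPG). It defines cardiovascular health as a stability margin around the attractor of cardiac dynamics and builds a Cardiac Stability Index (CSI) from the Lyapunov exponent, recurrence determinism, and signal entropy. The study reports strong CSI performance on ECG and PPG data and uses domain transfer to enable real-time use on smartphones, offering a new approach to long-term non-invasive cardiac monitoring.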

2604.23750 2026-05-12 cs.LG cs.AI

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

Shuaizhi Cheng, Xiang Shi, Zhiwei Zhang, Mingwei Li

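AI summary: The paper studies why hypernetwork-based instant LLM adaptation fails under knowledge conflict and finds that the core cause is a magnitude problem rather than insufficient representational capacity. The analysis shows that the hypernetwork targets the correct layers, but the adapter's magnitude is fixed while the magnitude of pretrained knowledge grows with training frequency, so deeply entrenched conflicting knowledge is hard to override. The authors propose magnitude-boosting methods, such as selective layer boosting and conflict-aware internalization, which significantly improve performance on deep-conflict tasks without retraining.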

Comments 35 pages, 15 figures v2: minor layout fixes and author list update

2604.22251 2026-05-12 cs.RO

False Feasibility in Variable Impedance MPC for Legged Locomotion

Vishal Ramesh

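AI summary: The paper studies "false feasibility" in variable impedance model predictive control (MPC) for legged locomotion: treating joint stiffness as an instantaneous decision variable makes the controller's feasible set differ from the set of physically realizable solutions. Using the dimensionless parameter α = ω_s T, the author characterizes the extent of this mismatch and shows, on a single-leg hopping model, that below a critical value of α the parameter-based predictions cannot be realized by actual stiffness commands. The study also shows that including stiffness in the predicted state eliminates the mismatch at its root.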

Comments Paper withdrawn to make some revisions in the discussion and experiments sections

2604.21232 2026-05-12 cs.AI

ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures

Xiyin Zeng, Yuyu Sun, Haoyang Li, Shouqiang Liu, Hao Wang

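AI summary: The paper proposes ReCAPA, a hierarchical predictive correction architecture that targets cascading failures in vision-language-action systems executing multi-step tasks. It introduces prediction and contrast mechanisms at three levels (action, subgoal, and trajectory) and combines them with a semantic alignment module to correct deviations dynamically during execution, improving robustness. Experiments show that ReCAPA performs strongly on several embodied agent benchmarks and outperforms existing large language model baselines.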

2604.19838 2026-05-12 cs.AI

Resolving space-sharing conflicts in road user interactions through uncertainty reduction: An active inference-based computational model

Julian F. Schumann, Johan Engström, Ran Wei, Shu-Yuan Liu, Jens Kober, Arkady Zgonnikov

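AI summary: The paper studies how road users resolve space-sharing conflicts and proposes an active-inference-based computational model of the interaction between two agents. The model reduces uncertainty in the interaction through three mechanisms: implicit communication, normative expectations, and explicit communication. It shows how norms and explicit communicative cues raise the success rate of conflict resolution, and also that relying on these cues can lead to collisions when the other agent violates norms or sends misleading signals. The work provides a theoretical basis for modeling road user interactions and has broader applicability.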

2604.19792 2026-05-12 cs.AI cs.DC cs.MA cs.NE

OpenCLAW-P2P v7.0-P2PCLAW: Resilient Multi-Layer Persistence, Live Reference Verification, and Production-Scale Evaluation of Decentralized AI Peer Review v7.0 -- Mathematical Corrections & Ecosystem Developments Edition

Francisco Angulo de Lafuente, Teerth Sharma, Vladimir Veselov, Seid Mohammed Abdu, Nirmal Tej Kumar, Guillermo Perry

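AI summary: The paper presents OpenCLAW-P2P v7.0, a decentralized collective intelligence platform in which autonomous AI agents publish, peer-review, score, and iteratively improve scientific papers without human reviewers. This version adds mathematical corrections that ensure the framework's dimensional consistency, range constraints, and unambiguous notation, and it extends the ecosystem with CAJAL, an open-source language model for scientific paper generation. The platform retains its four core subsystems, with improved storage reliability, retrieval efficiency, and citation verification accuracy.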

Comments v7.0: Mathematical corrections (fixed-point condition Eq.4, dimensionally consistent tau-indicator Eq.7, fully specified reputation formula Eq.8 with quality terms q0 and q-bar, discrete-time PD Governor Eq.15, HSR parameter definitions Eq.16); ecosystem developments: CAJAL-4B/9B models, BenchClaw platform, 14 integrations. 36 pages

2604.19530 2026-05-12 cs.LG cs.CE stat.ML

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

Akash Yadav, Taiwo A. Adebiyi, Ruda Zhang

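AI summary: The paper studies how to give scientific foundation models well-calibrated predictive uncertainty and proposes stochastic attention, a lightweight inference-time modification that injects randomness into attention weights to produce a predictive ensemble without retraining. A calibration objective tunes the randomness parameter, giving efficient post-hoc calibration. Experiments on weather forecasting, time series, and regression tasks show better calibration and narrower prediction intervals at significantly lower computational cost than existing methods.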

2604.17565 2026-05-12 cs.CV

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

Hong Jiang, Wensong Song, Zongxing Yang, Ruijie Quan, Yi Yang

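AI summary: UniGeo is a camera-controllable image editing framework that generates geometrically consistent views of a scene under different camera viewpoints. It injects geometric guidance uniformly at the representation, architecture, and loss levels, addressing the geometric drift and structural degradation that existing methods show under continuous camera motion. Experiments show that UniGeo significantly outperforms existing methods on several public datasets, with higher visual quality and geometric consistency.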

2604.14484 2026-05-12 cs.RO cs.AI math.OC

A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning

Junghoon Seo

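AI summary: The paper studies the nonasymptotic, finite-time error propagation of behavior cloning (BC) policies on position-controlled robots and reveals how controller gains affect the probability of task failure. By analyzing the gain-dependent closed-loop dynamics, the author introduces a proxy matrix $X_\infty(K)$ that characterizes the distribution of position error and decomposes the task failure probability into a gain amplification factor, the validation loss, and a generalization slack term, showing that training loss alone cannot predict closed-loop performance. The study also gives a scalar upper bound on the proxy matrix and analyzes how performance ranks across combinations of system stiffness and damping, providing a theoretical basis for understanding the stability of BC policies.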

2604.11734 2026-05-12 cs.RO cs.AI

SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving

Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma

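AI summary: The paper proposes SCORP, a scene-consistent multi-agent diffusion planner for cooperative driving with stable online reinforcement learning post-training. To address the weak scene consistency of existing diffusion models and their poor alignment with closed-loop cooperative objectives, SCORP introduces a scene-conditioned multi-agent denoising architecture and formulates a two-layer Markov decision process that integrates the reverse denoising chain with policy-environment interaction. Experiments show that SCORP significantly outperforms existing open-source methods on core safety and efficiency metrics.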

Abstract

Cooperative driving is a safety- and efficiency-critical task that requires the coordination of diverse, interaction-realistic multi-agent trajectories. Although existing diffusion-based methods can capture multimodal behaviors from demonstrations, they often exhibit weak scene consistency and poor alignment with closed-loop cooperative objectives. This makes post-training necessary for further improvement, yet achieving stable online post-training in reactive multi-agent environments remains challenging. In this paper, we propose SCORP, a scene-consistent multi-agent diffusion planner with stable online reinforcement learning (RL) post-training for cooperative driving. For pre-training, we develop a scene-conditioned multi-agent denoising architecture that couples inter-agent self-attention with a dual-path conditioning mechanism: cross-attention provides direct scene-information injection, while AdaLN-Zero enables additional flexible and stable conditional modulation, thereby improving the scene consistency and road adherence of joint trajectories. For post-training, we formulate a two-layer Markov decision process (MDP) that explicitly integrates the reverse denoising chain with policy-environment interaction. We further co-design dense, well-shaped planning rewards and variance-gated group-relative policy optimization (VG-GRPO) to mitigate advantage collapse and gradient instability during closed-loop training. Extensive experiments show that SCORP outperforms strong open-source baselines on WOMD, with 10.47%-28.26% and 1.70%-7.22% improvements in core safety and efficiency metrics, respectively. Moreover, compared with alternative post-training methods, SCORP delivers significant and consistent gains in both driving safety and traffic efficiency, highlighting stable and sustained advances in closed-loop cooperative driving.
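
The abstract names variance-gated group-relative policy optimization (VG-GRPO) without defining the gate. The sketch below shows the standard group-relative advantage (each rollout's reward standardized within its group) and adds one plausible reading of a variance gate: groups whose reward spread is below a threshold contribute zero advantage instead of a noisy, amplified one. The gating rule and the `min_std` threshold are assumptions for illustration and may differ from the paper's definition.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, min_std=1e-3):
    """Standardize rewards within one group of rollouts.

    Assumed gate: if the group's reward standard deviation is below
    min_std, return zeros so the near-constant group is skipped rather
    than divided by a tiny number.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma < min_std:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Hypothetical reward groups.
informative = group_relative_advantages([0.0, 2.0])
collapsed = group_relative_advantages([1.0, 1.0, 1.0])
```

Without some guard of this kind, a group with nearly identical rewards either yields all-zero advantages or, after adding a small epsilon to the denominator, very large ones, which is one way the advantage collapse and gradient instability mentioned in the abstract can arise.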
