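arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.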

2410.06074 2026-06-01 cs.LG cs.NA math.NA

Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

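Jiale Chen, Dingling Yao, Adeel Pervez, Dan Alistarh, Francesco Locatello

AI summary: Proposes Scalable Mechanistic Neural Networks (S-MNN), which reduce time and space complexity to linear in the sequence length, enabling efficient modeling of long-term dynamics while preserving accuracy and interpretability.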

Comments Published as a conference paper at the Thirteenth International Conference on Learning Representations (ICLR 2025): https://openreview.net/forum?id=Oazgf8A24z

Journal ref: International Conference on Learning Representations, 2025, pp. 10018-10039

Abstract

We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can serve as a drop-in replacement for the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN.
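
The gap between cubic and linear cost in the sequence length can be made concrete with a generic illustration. The sketch below is not the S-MNN solver; it assumes a tridiagonal linear system (a hypothetical stand-in for a system with banded structure along time) and solves it with the Thomas algorithm in O(T) time and memory, whereas a dense solver would need O(T^3) time and O(T^2) memory. The function name and the diagonal-dominance assumption are illustrative only.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: solve a tridiagonal system in O(T) time and memory.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    Assumes diagonal dominance, so no pivoting is performed.
    """
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: one pass over the sequence.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: a second pass in reverse.
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Only three length-T vectors are stored instead of a T-by-T matrix, which is where the memory saving comes from in this toy setting.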

2601.15197 2026-06-01 cs.AI cs.CL cs.CV cs.RO

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

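Shijie Lian, Bin Yu, Xiaopeng Lin, Laurence T. Yang, Zhaolong Shen, Changti Wu, Yuzhuo Miao, Cong Huang, Kai Chen

AI summary: Addresses the problem that dataset bias leads VLA models to ignore language information during training. Proposes LangForce, which uses Bayesian decomposition and latent action queries to build a dual-branch architecture that maximizes the pointwise mutual information between actions and instructions, significantly improving generalization without new data.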

Comments ICML 2026

Abstract

Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $\pi(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, LangForce significantly improves generalization. Extensive experiments on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
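
The conditional PMI in this abstract follows from the two distributions it names: it is the log-ratio of the language-conditioned posterior to the vision-only prior. The sketch below only computes that quantity from given log-probabilities; how LangForce turns it into a training loss is not stated here, and the function name is hypothetical.

```python
import numpy as np

def conditional_pmi(logp_posterior, logp_prior):
    """Conditional PMI between action a and instruction l given vision v.

    PMI(a; l | v) = log pi(a | v, l) - log p(a | v).
    Inputs are log-probabilities of the same action under the two branches.
    """
    return np.asarray(logp_posterior, dtype=float) - np.asarray(logp_prior, dtype=float)

# An action that is no more likely with the instruction than without it has
# zero PMI: the instruction carried no information about that action.
shortcut = conditional_pmi(np.log(0.5), np.log(0.5))
# An action that the instruction makes four times more likely has PMI log(4).
grounded = conditional_pmi(np.log(0.8), np.log(0.2))
```

A vision-only shortcut policy has posterior equal to prior everywhere, so this quantity is zero for every action, which is consistent with the Information Collapse the abstract describes.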

2511.16084 2026-06-01 cs.CV cs.AI

SpectralTrain: A Universal Framework for Hyperspectral Image Classification

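Meihua Zhou, Liping Yu, Xinyu Tong, Wai Kin Fung, Ruiguo Hu, Jiarui Zhao, Nan Wan

AI summary: Proposes SpectralTrain, a universal training framework that improves the efficiency of hyperspectral image classification by combining curriculum learning with PCA-based spectral downsampling, achieving 2-7x training speedups on several datasets with small accuracy loss.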

Abstract

Hyperspectral image (HSI) classification typically involves large-scale data and computationally intensive training, which limits the practical deployment of deep learning models in real-world remote sensing tasks. This study introduces SpectralTrain, a universal, architecture-agnostic training framework that enhances learning efficiency by integrating curriculum learning (CL) with principal component analysis (PCA)-based spectral downsampling. By gradually introducing spectral complexity while preserving essential information, SpectralTrain enables efficient learning of spectral-spatial patterns at significantly reduced computational costs. The framework is independent of specific architectures, optimizers, or loss functions and is compatible with both classical and state-of-the-art (SOTA) models. Extensive experiments on three benchmark datasets -- Indian Pines, Salinas-A, and the newly introduced CloudPatch-7 -- demonstrate strong generalization across spatial scales, spectral characteristics, and application domains. The results indicate consistent 2-7x reductions in training time, with small-to-moderate accuracy deltas depending on the backbone. Its application to cloud classification further reveals potential in climate-related remote sensing, emphasizing training strategy optimization as an effective complement to architectural design in HSI models. Code is available at https://github.com/mh-zhou/SpectralTrain.
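
The combination of PCA-based spectral downsampling and a curriculum can be sketched as follows. This is a minimal illustration under stated assumptions, not the SpectralTrain implementation: the linear schedule, the function names, and the (H, W, B) cube layout are all hypothetical choices.

```python
import numpy as np

def pca_spectral_downsample(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its top principal components."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are principal directions in band space, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[:n_components].T).reshape(h, w, n_components)

def curriculum_schedule(epoch, total_epochs, min_bands, max_bands):
    """Hypothetical linear schedule: the component count grows with the epoch."""
    frac = epoch / max(total_epochs - 1, 1)
    return int(round(min_bands + frac * (max_bands - min_bands)))
```

In a training loop one would call `curriculum_schedule` at the start of each epoch and feed the model the cube reduced to that many components, so early epochs see a cheap low-dimensional input and later epochs see more spectral detail.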

2605.11946 2026-06-01 cs.AI

Counterfactual Trace Auditing of LLM Agent Skills

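Xiaolin Zhou, Jinbo Liu, Li Li, Ryan A. Rossi, Xiyang Hu

AI summary: Proposes the Counterfactual Trace Auditing (CTA) framework, which pairs agent traces with and without a skill and emits structured Skill Influence Pattern (SIP) annotations, revealing how skills reshape behavior and filling the gap left by pass-rate-only evaluation.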

Comments Code and data are available at https://github.com/WillChow66/CTA.git

Abstract

Large Language Model agents are increasingly augmented with agent skills. Current evaluation methods for skills remain limited. Most deployed benchmarks report only pass rate before and after a skill is attached, treating the skill as a black-box change to agent behavior. We introduce Counterfactual Trace Auditing (CTA), a framework for measuring how a skill changes agent behavior. CTA pairs each with-skill agent trace with a without-skill counterpart on the same task, segments both traces into goal-directed phases, aligns the phases, and emits structured Skill Influence Pattern (SIP) annotations. These annotations describe the behavioral effect of a skill rather than only its task outcome. We instantiate CTA on SWE-Skills-Bench with Claude across 49 software engineering tasks. The resulting audit reveals a clear evaluation gap. Pass rate changes by only +0.3 percentage points on average, suggesting little aggregate effect. Yet CTA identifies 522 SIP instances across the same paired traces, showing that the skills substantially reshape agent behavior even when pass rate is nearly unchanged. The audit also separates several recurring effects that pass rate cannot detect, including literal template copying, off-task artifact creation, excess planning, and task recovery. Three findings emerge. First, high-baseline tasks contain most of the observed skill effects, although their pass rate is already saturated and therefore cannot reflect those effects. Second, tasks with moderate baseline performance show the most recoverable gain, but often at substantially higher token cost. Third, the dominant SIP type can be identified by baseline bucket: surface anchoring is most common on ceiling tasks and edge-case prompting is most common on mid-range and floor tasks. These regularities turn informal failure-mode observations into reproducible behavioral measurements.

2605.11367 2026-06-01 cs.CV

3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

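Yifan Yin, Zehao Wen, Suyu Ye, Jieneng Chen, Zehan Zheng, Nanru Dai, Haojun Shi, Aydan Huang, Zheyuan Zhang, Alan Yuille, Jianwen Xie, Ayush Tewari, Tianmin Shu

AI summary: Proposes 3D-Belief, a generative 3D world model that updates an explicit 3D belief online, letting embodied agents imagine scene completions and reason in partially observable environments; it outperforms existing methods on 2D/3D imagination quality and downstream object navigation.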

Abstract

Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual realism rather than the structured uncertainty required by embodied agents acting under partial observability. In this work, we propose a different perspective: world modeling as embodied belief inference in 3D space. From this view, a world model should not merely render what may be seen, but maintain and update an agent's belief about the unobserved 3D world as new observations are acquired. We identify several key capabilities for such models, including spatially consistent scene memory, multi-hypothesis belief sampling, sequential belief updating, and semantically informed prediction of unseen regions. We instantiate these ideas in 3D-Belief, a generative 3D world model that infers explicit, actionable 3D beliefs from partial observations and updates them online over time. Unlike prior visual prediction models, 3D-Belief represents uncertainty directly in 3D, enabling embodied agents to imagine plausible scene completions and reason over partially observed environments. We evaluate 3D-Belief on 2D visual quality for scene memory and unobserved-scene imagination, object- and scene-level 3D imagination using our proposed 3D-CORE benchmark, and challenging object navigation tasks in both simulation and the real world. Experiments show that 3D-Belief improves 2D and 3D imagination quality and downstream embodied task performance compared to state-of-the-art methods.

2605.11134 2026-06-01 cs.LG cs.AI

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

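Christian Moya, Alex Semendinger, Guang Lin, Elliott Thornley

AI summary: Through a unified theoretical analysis, reveals the mechanisms of spurious correlation learning in preference optimization such as DPO (mean spurious bias and causal-spurious correlation leakage), proves that it causes an irreducible vulnerability to distribution shift, and proposes tie training, a data augmentation strategy that selectively reduces spurious learning.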

Comments Proceedings of the 43rd International Conference on Machine Learning, 2026, Seoul, South Korea

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, 2026, Seoul, South Korea

Abstract

Preference learning methods like Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal misgeneralization in future systems. In this work, we provide a unified theoretical analysis of this phenomenon, characterizing the mechanisms of spurious learning, its consequences on deployment, and a provable mitigation strategy. Focusing on log-linear policies, we show that standard preference-learning objectives induce reliance on spurious features at the population level through two channels: mean spurious bias and causal-spurious correlation leakage. We then show that this reliance creates an irreducible vulnerability to distribution shift: more data from the same training distribution fails to reduce the model's dependence on spurious features. To address this, we propose tie training, a data augmentation strategy using ties (equal-utility preference pairs) to introduce data-driven regularization. We demonstrate that this approach selectively reduces spurious learning without degrading causal learning. Finally, we validate our theory on log-linear models and provide empirical evidence that both the spurious learning mechanisms and the benefits of tie training persist for neural networks and large language models.

2605.06280 2026-06-01 cs.CV

Eulerian Motion Guidance: Robust Image Animation via Bidirectional Geometric Consistency

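Thong Nguyen, Khoi M. Le, Cong-Duy Nguyen, Luu Anh Tuan, See-Kiong Ng, Chunyan Miao

AI summary: Proposes guiding generation with adjacent-frame Eulerian motion fields and resolving occlusions with a bidirectional geometric consistency mechanism, which accelerates training, preserves temporal coherence, and reduces dynamic artifacts.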

Comments Work in progress. Code is available at https://github.com/nguyentthong/eulerian_motion_guidance

Abstract

Recent advancements in image animation have utilized diffusion models to breathe life into static images. However, existing controllable frameworks typically rely on Lagrangian motion guidance, where optical flow is estimated relative to the initial frame. This paper revisits the same optical-flow primitive through a more local supervision design: we use adjacent-frame Eulerian motion fields to guide generation, where the motion signal always describes a short temporal hop. This shift enables parallelized training and provides bounded-error supervision throughout the generation process. To mitigate the drift artifacts common in adjacent frame generation, we introduce a Bidirectional Geometric Consistency mechanism, which computes a forward-backward cycle check to mathematically identify and mask occluded regions, preventing the model from learning incorrect warping objectives. Extensive experiments demonstrate that our approach accelerates training, preserves temporal coherence, and reduces dynamic artifacts compared to reference-based baselines.
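
A forward-backward cycle check of the kind mentioned in this abstract can be sketched in a few lines. This is a generic version of the classic optical-flow consistency test, not the paper's mechanism: the nearest-neighbour lookup, the pixel threshold `tol`, and the function name are assumptions made for illustration.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, tol=1.0):
    """Boolean (H, W) mask, True where the forward-backward cycle error exceeds tol.

    flow_fw[y, x] = (dx, dy) maps frame t to frame t+1;
    flow_bw maps frame t+1 back to frame t. Both have shape (H, W, 2).
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel lands in frame t+1 (nearest neighbour, clipped to the image).
    x2 = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    # A consistent pair satisfies flow_fw(p) + flow_bw(p + flow_fw(p)) ~ 0.
    cycle = flow_fw + flow_bw[y2, x2]
    return np.linalg.norm(cycle, axis=-1) > tol
```

Pixels flagged by the mask would be excluded from a warping loss, so that the model is not supervised with targets that have no valid correspondence in the neighbouring frame.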

2605.01581 2026-06-01 cs.RO

Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

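Jinhao Zhang, Zhexuan Zhou, Huizhe Li, Yichen Lai, Wenlong Xia, Haoming Song, Youmin Gong, Jie Mei

AI summary: Targets the high computational cost of diffusion policies in robot manipulation. Analyzes the smoothness of action trajectories from a frequency-domain perspective and proposes Hyper-DP3, a lightweight 3D diffusion policy with a Diffusion Mixer decoder and two-step DDIM inference that achieves state-of-the-art performance with very few parameters and low latency.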

Abstract

Diffusion-based visuomotor policies perform well in robotic manipulation, yet current methods still inherit image-generation-style decoders and multi-step sampling. We revisit this design from a frequency-domain perspective. Robot action trajectories are highly smooth, with most energy concentrated in a few low-frequency discrete cosine transform modes. Under this structure, we show that the error of the optimal denoiser is bounded by the low-frequency subspace dimension and residual high-frequency energy, implying that denoising error saturates after very few reverse steps. This also suggests that action denoising requires a much simpler denoising model than image generation. Motivated by this insight, we propose Hyper-DP3 (HDP3), a pocket-scale 3D diffusion policy with a lightweight Diffusion Mixer decoder that supports two-step DDIM inference. Our synthetic experiments validate the theory and support the sufficiency of two-step denoising. Furthermore, across RoboTwin2.0, Adroit, MetaWorld, and real-world tasks, HDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior 3D diffusion-based policies and substantially lower inference latency.
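
The claim that smooth trajectories concentrate their energy in a few low-frequency DCT modes is easy to check numerically. The sketch below builds an orthonormal DCT-II basis by hand and measures the share of energy in the first few modes; the sine-wave trajectory and the function names are hypothetical stand-ins, not data or code from the paper.

```python
import numpy as np

def dct2_orthonormal(x):
    """Orthonormal DCT-II of a 1-D signal via an explicit basis matrix."""
    n = len(x)
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # time index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    basis[0] /= np.sqrt(2.0)    # rescale the constant mode to unit norm
    return basis @ np.asarray(x, dtype=float)

def low_frequency_energy_fraction(x, n_modes):
    """Share of the signal energy carried by the first n_modes DCT coefficients."""
    coeffs = dct2_orthonormal(x)
    return np.sum(coeffs[:n_modes] ** 2) / np.sum(coeffs ** 2)

# A smooth one-dimensional "action trajectory" of 64 steps (illustrative only).
steps = (np.arange(64) + 0.5) / 64
trajectory = np.sin(2 * np.pi * steps)
fraction = low_frequency_energy_fraction(trajectory, n_modes=8)
```

Because the basis is orthonormal, the coefficient energies sum to the signal energy, so `fraction` is a genuine share of the total; for a smooth signal like this one it comes out close to 1.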

2604.12579 2026-06-01 cs.LG

EEG-Based Multimodal Learning via Hyperbolic Mixture-of-Curvature Experts

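Runhe Zhou, Shanglin Li, Guanxiang Huang, Xinliang Zhou, Qibin Zhao, Motoaki Kawanabe, Yi Ding, Cuntai Guan

AI summary: Proposes the EEG-MoCE framework, which assigns each modality to an expert in a hyperbolic space with learnable curvature and uses a curvature-aware fusion strategy to model hierarchical structure, achieving state-of-the-art performance on emotion recognition, sleep staging, and cognitive assessment.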

Comments Accepted at the Forty-third International Conference on Machine Learning (ICML 2026)

Abstract

Electroencephalography (EEG)-based multimodal learning integrates brain signals with complementary modalities to improve mental state assessment, providing great clinical potential. The effectiveness of such paradigms largely depends on the representation learning on heterogeneous modalities. For EEG-based paradigms, one promising approach is to leverage their hierarchical structures, as recent studies have shown that both EEG and associated modalities (e.g., facial expressions) exhibit hierarchical structures reflecting complex cognitive processes. However, Euclidean embeddings struggle to represent these hierarchical structures due to their flat geometry, while hyperbolic spaces, with their exponential growth property, are naturally suited for them. In this work, we propose EEG-MoCE, a novel hyperbolic mixture-of-curvature experts framework designed for multimodal neurotechnology. EEG-MoCE assigns each modality to an expert in a learnable-curvature hyperbolic space, enabling adaptive modeling of its intrinsic geometry. A curvature-aware fusion strategy then dynamically weights experts, emphasizing modalities with richer hierarchical information. Extensive experiments on benchmark datasets demonstrate that EEG-MoCE achieves state-of-the-art performance on tasks including emotion recognition, sleep staging, and cognitive assessment. Code is available at https://github.com/zhourunhe/EEG-MoCE.
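
To see what a curvature parameter does in a hyperbolic space, the sketch below computes geodesic distance in the Poincare ball of curvature -c. The Poincare ball is one common model of hyperbolic space; the abstract does not say which model EEG-MoCE uses, so treat the choice of model and the function name as assumptions.

```python
import numpy as np

def poincare_distance(x, y, c=1.0):
    """Geodesic distance in the Poincare ball of curvature -c (c > 0).

    Points must satisfy c * ||x||^2 < 1. Uses the closed form
    d(x, y) = arcosh(1 + 2c||x - y||^2 / ((1 - c||x||^2)(1 - c||y||^2))) / sqrt(c).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - c * np.sum(x ** 2)) * (1.0 - c * np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * c * sq / denom) / np.sqrt(c)
```

Distances grow rapidly as points approach the boundary of the ball, which is the exponential-growth property that makes such spaces a natural fit for tree-like data; in a learnable-curvature setting, `c` would be a trainable parameter per expert.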

2602.16165 2026-06-01 cs.LG cs.AI

HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents

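Jiangweizhi Peng, Yuanxin Liu, Ruida Zhou, Charles Fleming, Zhaoran Wang, Alfredo Garcia, Mingyi Hong

AI summary: Targets the difficulty of credit assignment for LLM agents in sparse-reward, long-horizon tasks. Proposes HiPER, a hierarchical plan-execute framework that assigns credit explicitly at the planning and execution levels via hierarchical advantage estimation (HAE), reaching success rates of 97.4% on ALFWorld and 83.3% on WebShop.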

Comments ICML 2026

Abstract

Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed rewards, where agents must execute extended sequences of actions before receiving meaningful feedback. Most existing reinforcement learning (RL) approaches model LLM agents as flat policies operating at a single time scale, selecting one action at each turn. In sparse-reward settings, such flat policies must propagate credit across the entire trajectory without explicit temporal abstraction, which often leads to unstable optimization and inefficient credit assignment. We propose HiPER, a novel Hierarchical Plan-Execute RL framework that explicitly separates high-level planning from low-level execution. HiPER factorizes the policy into a high-level planner that proposes subgoals and a low-level executor that carries them out over multiple action steps. To align optimization with this structure, we introduce a key technique called hierarchical advantage estimation (HAE), which carefully assigns credit at both the planning and execution levels. By aggregating returns over the execution of each subgoal and coordinating updates across the two levels, HAE provides an unbiased gradient estimator and provably reduces variance compared to flat generalized advantage estimation. Empirically, HiPER achieves state-of-the-art performance on challenging interactive benchmarks, reaching 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct (+6.6% and +8.3% over the best prior method), with especially large gains on long-horizon tasks requiring multiple dependent subtasks. These results highlight the importance of explicit hierarchical decomposition for scalable RL training of multi-turn LLM agents.
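
For reference, the flat baseline that the abstract compares against, generalized advantage estimation (GAE), can be written in a few lines. The sketch below is the standard single-level GAE recursion only; it does not implement HAE, whose details are not given in the abstract, and the function name and default hyperparameters are illustrative.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Flat generalized advantage estimation over one episode.

    `values` has len(rewards) + 1 entries; the last entry is the bootstrap
    value of the final state (0.0 if that state is terminal).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    adv = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum from later steps.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

With a single reward at the end of a long episode, every earlier step receives credit only through the repeated `gamma * lam` discount, which illustrates why flat credit propagation becomes weak over long horizons.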

2605.08145 2026-06-01 cs.CV cs.AI cs.LG

Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models

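Yuriel Ryan, Hei Man Ip, Adriel Kuek, Paul Pu Liang, Roy Ka-Wei Lee

AI summary: Targets hallucination and robustness problems in vision language models. Proposes self-captioning multimodal interaction tuning, which amplifies redundant information across modalities to compensate for an impaired modality, and designs a Multimodal Interaction Gate that converts unique interactions into redundant ones; experiments show a 38.3% reduction in visually induced errors and a 16.8% improvement in consistency.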

Comments Accepted to ICML 2026. Code: https://github.com/yurielryan/Multimodal-Interaction-Tuning

Abstract

Current vision language models face hallucination and robustness issues against ambiguous or corrupted modalities. We hypothesize that these issues can be addressed by exploiting the shared information between modalities to compensate for the impaired one. To this end, we analyze multimodal interactions -- redundant (shared), unique (exclusive), and synergistic (emergent) task-relevant information provided by the modalities -- to determine their impacts on model reliability. Specifically, amplifying redundant interactions would increase this exploitable shared information to resolve these issues; yet, modern instruction datasets often eliminate redundancies to prioritize visual grounding. We bridge this gap through a self-captioning workflow featuring a Multimodal Interaction Gate: a mechanism to convert unique interactions into redundant interactions. Our findings suggest that increasing redundancy can reduce visually induced errors by 38.3% and improve consistency by 16.8%.

2605.06831 2026-06-01 cs.LG cs.AI

Why DDIM Hallucinates More Than DDPM: A Theoretical Analysis of Reverse Dynamics

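Muhammad H. Ashiq, Samanyu Arora, Abhinav N. Harish, Ishaan Kharbanda, Hung Yun Tseng, Grigorios G. Chrysos

AI summary: Through a theoretical analysis of the reverse ODE (DDIM) and SDE (DDPM) under a Gaussian-mixture target, proves that after a critical time τ, DDIM gets stuck on the line segment between the two nearest modes, whereas the stochasticity of DDPM helps it escape this region and thus avoid hallucination.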

Comments Accepted in ICML

2605.06137 2026-06-01 cs.CV cs.AI cs.LG

Autoregressive Visual Generation Needs a Prologue

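Bowen Zheng, Weijian Luo, Guang Yang, Colin Zhang, Tianyang Hu

AI summary: Proposes Prologue, which generates prepended prologue tokens to bridge the reconstruction-generation gap in autoregressive image generation, significantly improving generation performance without affecting reconstruction quality.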

Comments Code: https://github.com/Zyriix/prologue Demo: https://huggingface.co/spaces/Zyriix/prologue-demo

Abstract

In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further formalize from an ELBO perspective. On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision. Interestingly, driven only by AR gradients, prologue tokens exhibit emergent semantic structure: linear probing on 16 prologue tokens reaches 35.88% Top-1, far above the 23.71% of the first 16 tokens from a standard tokenizer; resampling with fixed prologue tokens preserves a similar high-level semantic layout. Our results suggest a new direction: generation quality can be improved by introducing a separate learned generative representation while leaving the original representation intact.

2605.05520 2026-06-01 cs.LG stat.AP stat.ML

Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

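Badr Moufad, Albina Ilina, Hai Victor Habi, Salem Lahlou, Yazid Janati, Hagit Messer, Eric Moulines

AI summary: Frames rain field reconstruction as a Bayesian inverse problem with a diffusion model as a high-fidelity spatial prior, and uses training-free posterior sampling methods (such as Plug-and-Play, Sequential Monte Carlo, and Replica Exchange) to outperform traditional methods.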

Comments Added link to source code

Journal ref: ICML 2026

Abstract

Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect line integration relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-fidelity spatial priors. We show that diffusion models better preserve key rainfall statistics compared to censored Gaussian processes. Framing rainfall estimation as a Bayesian inverse problem with a DM prior enables training-free posterior sampling using a broad family of methods, including Plug-and-Play, Sequential Monte Carlo, and Replica Exchange methods. Experiments on synthetic and real-world datasets demonstrate consistent improvements over established CML-based reconstruction baselines.

2605.01134 2026-06-01 cs.AI

To Use AI as Dice of Possibilities with Timing Computation

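Jia Li, Vipin Kumar, Rui Zhang

AI summary: Proposes a verb-based paradigm that defines timing computation and possibilities, allowing AI to serve as a tool that implements a grammar of thought, and automatically discovers clinical trajectories and counterfactual timing inferences on breast cancer patient data.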

2510.03096 2026-06-01 cs.LG

Adaptive Node Feature Selection For Graph Neural Networks

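Madeline Navarro, Ali Azizpour, Santiago Segarra

AI summary: Proposes an adaptive node feature selection method that identifies and removes irrelevant features based on the change in validation performance after permuting feature values; it applies to arbitrary data, models, and tasks.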

Abstract

We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. The ability to measure how features contribute to model output is key for interpreting decisions and reducing dimensionality by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies that may be unsuited to classical feature importance metrics. Inspired by this, we present a data-, model-, and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, we show that (i) our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions; (ii) we return meaningful feature importance scores well before the GNN is fully trained; and (iii) our scores demonstrably extract relevant properties that inform feature importance for various graph learning settings.
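
Scoring a feature by the change in validation performance after permuting its values can be sketched as below. This is a generic permutation-importance routine, not the authors' code: `predict` is a hypothetical stand-in for the forward pass of a partially trained GNN with the graph held fixed, and accuracy is assumed as the validation metric.

```python
import numpy as np

def permutation_importance(predict, x_val, y_val, n_repeats=5, seed=0):
    """Mean drop in validation accuracy when each feature column is shuffled across nodes.

    predict: callable mapping an (N, F) feature matrix to N predicted labels.
    Returns an array of F scores; larger means the feature matters more.
    """
    rng = np.random.default_rng(seed)
    base = np.mean(predict(x_val) == y_val)
    scores = np.zeros(x_val.shape[1])
    for j in range(x_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            x_perm = x_val.copy()
            # Shuffling one column breaks its link to the labels and the graph.
            x_perm[:, j] = rng.permutation(x_perm[:, j])
            drops.append(base - np.mean(predict(x_perm) == y_val))
        scores[j] = np.mean(drops)
    return scores
```

Features whose score stays near zero over successive training checkpoints are candidates for removal; the threshold for dropping them is a design choice this sketch leaves open.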

2605.00265 2026-06-01 cs.LG

Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning

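Sahil Mishra, Srinitish Srinivasan, Sourish Dasgupta, Tanmoy Chakraborty

AI summary: Proposes Polaris, a polar hyperspherical embedding framework that separates semantics from hierarchy via angle and radius and combines local constraints, global regularization, and uncertainty-aware asymmetric objectives, significantly improving retrieval performance across a range of taxonomy expansion tasks.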

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Real-world knowledge is often organized as hierarchies such as product taxonomies, medical ontologies, and label trees, yet learning hierarchical representations is challenging due to asymmetric structure and noisy semantics. We introduce Polaris, a polar hyperspherical embedding framework that separates semanticity from hierarchy using angular geometry and radius, enabling the learning of meaning and structure without interference. To map a latent representation onto the sphere, we project it to the tangent space at the north pole, apply the exponential map, and learn unit-norm representations using spherical linear layers. Polaris then combines robust local constraints, global regularization that prevents geometric collapse, and uncertainty-aware asymmetric objectives that encourage directional containment. At inference time, Polaris uses structure-guided retrieval to efficiently narrow down candidate parents before final ranking. We evaluate Polaris on different settings of taxonomy expansion (trees, multi-parent DAGs, and multimodal hierarchies), showing consistent improvements of up to ~19 points in top-K retrieval and up to ~60% reduction in mean rank over fourteen strong baselines.
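The projection and exponential-map step mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard exponential map on the unit sphere at the north pole (0, ..., 0, 1), not the authors' implementation. The function names are hypothetical, and the sketch leaves out how Polaris uses the radius and its spherical linear layers, which the abstract does not specify.

```python
import math


def project_to_tangent(u):
    """Orthogonally project u onto the tangent space at the north pole.

    Tangent vectors at (0, ..., 0, 1) are exactly those with last coordinate 0.
    """
    return list(u[:-1]) + [0.0]


def exp_map_north_pole(v, eps=1e-12):
    """Exponential map at the north pole: cos(|v|) * pole + sin(|v|) * v / |v|."""
    norm = math.sqrt(sum(c * c for c in v))
    pole = [0.0] * (len(v) - 1) + [1.0]
    if norm < eps:
        return pole  # the zero tangent vector maps to the pole itself
    return [math.cos(norm) * p + math.sin(norm) * c / norm for p, c in zip(pole, v)]


def embed_on_sphere(u):
    """Map an arbitrary latent vector to a unit-norm point on the sphere."""
    return exp_map_north_pole(project_to_tangent(u))
```

The output has unit norm by construction, so no separate normalization step is needed after the map.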


2604.26262 2026-06-01 cs.CV

Semantic Foam: Unifying Spatial and Semantic Scene Decomposition

Amr Sharafeldin, Shrisudhan Govindarajan, Thomas Walker, Aryan Mikaeili, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

AI summary: Proposes Semantic Foam, which extends the Radiant Foam representation by combining the spatial decomposition of its Voronoi mesh with an explicit semantic feature field to achieve high-quality, consistent semantic segmentation.

Comments 15 pages, 10 figures, Accepted to CVPR 2026 (Highlight), Project page: http://semanticfoam.github.io/

English abstract

Modern scene reconstruction methods, such as 3D Gaussian Splatting, deliver photo-realistic novel view synthesis at real-time speeds, yet their adoption in interactive graphics applications has been limited. A major bottleneck is the difficulty of interacting with these representations compared to traditional, human-authored 3D assets. While previous research has attempted to impose semantic decomposition on these models, significant challenges remain regarding segmentation quality and consistency. To address this, we introduce Semantic Foam, extending the recently proposed Radiant Foam representations to semantic decomposition tasks. Our approach integrates the natural spatial volumetric decomposition of Radiant Foam's Voronoi mesh with an explicit semantic feature field parameterized at the cell level. This explicit structure enables direct spatial regularization, which prevents artifacts caused by occlusion or inconsistent supervision across views - common pitfalls for other point-based representations. Experimental results show that our method achieves comparable or superior object-level segmentation performance compared to state-of-the-art methods like Gaussian Grouping and SAGA.


2602.03216 2026-06-01 cs.CL cs.LG

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Dongwon Jo, Beomseok Kang, Jiwon Song, Jae-Joon Kim

AI summary: Proposes Token Sparse Attention, a lightweight, dynamic token-level sparsification mechanism that selects tokens in an interleaved fashion and compresses and decompresses around attention for efficient long-context inference, achieving up to 3.23x speedup at 128K context with less than 1% accuracy loss.

Comments ICML 2026

English abstract

The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured patterns or permanently evict tokens at specific layers, which can retain irrelevant tokens or rely on irreversible early decisions despite the layer-/head-wise dynamics of token importance. In this paper, we propose Token Sparse Attention, a lightweight and dynamic token-level sparsification mechanism that compresses per-head $Q$, $K$, $V$ to a reduced token set during attention and then decompresses the output back to the original sequence, enabling token information to be reconsidered in subsequent layers. Furthermore, Token Sparse Attention exposes a new design point at the intersection of token selection and sparse attention. Our approach is fully compatible with dense attention implementations, including Flash Attention, and can be seamlessly composed with existing sparse attention kernels. Experimental results show that Token Sparse Attention consistently improves the accuracy-latency trade-off, achieving up to $3.23\times$ attention speedup at 128K context with less than 1% accuracy degradation. These results demonstrate that dynamic and interleaved token-level sparsification is a complementary and effective strategy for scalable long-context inference.
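The compress, attend, decompress pattern from the abstract can be sketched for a single head in plain Python. Several details here are assumptions rather than the paper's design: tokens are selected by top-k over externally supplied `scores`, attention is non-causal, and positions that were not selected receive a zero output (so that a residual connection would carry them through unchanged). The function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Plain (non-causal) scaled dot-product attention for one head."""
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def token_sparse_attention(q, k, v, scores, keep):
    """Compress Q, K, V to the `keep` highest-scoring tokens, attend, scatter back."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    idx = sorted(ranked[:keep])  # keep the original token order
    dense = attention([q[i] for i in idx], [k[i] for i in idx], [v[i] for i in idx])
    out = [[0.0] * len(v[0]) for _ in q]  # full-length output, zeros where skipped
    for pos, row in zip(idx, dense):
        out[pos] = row
    return out, idx
```

Because the inner call is ordinary dense attention on a shorter sequence, any dense kernel could stand in for `attention` here, which is the compatibility property the abstract highlights.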


2508.21762 2026-06-01 cs.CL cs.AI

Reasoning-Intensive Regression

Diane Tchuindjo, Omar Khattab

AI summary: For reasoning-intensive regression tasks, proposes MENTAT, which combines batch-reflective prompt optimization with neural ensemble learning and improves on the baselines by up to 65% on the benchmark.

English abstract

AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e., deducing subtle numerical scores from text. Unlike standard language regression tasks such as sentiment or similarity analysis, RiR often appears instead in ad-hoc applications such as rubric-based scoring, modeling dense rewards in complex environments, or domain-specific retrieval, where much deeper analysis of context is required while only limited task-specific training data and computation are available. We cast four realistic problems as RiR tasks to establish an initial benchmark, and use that to test our hypothesis that prompting frozen LLMs and fine-tuning Transformer encoders via gradient descent will both often struggle in RiR. We then propose MENTAT, a simple and lightweight method that combines batch-reflective prompt optimization with neural ensemble learning. MENTAT achieves up to 65% improvement over both baselines, though substantial room remains for future advances.


2604.28020 2026-06-01 cs.LG

Cost-Aware Learning

Clara Mohri, Amir Globerson, Haim Kaplan, Tomer Koren, Yishay Mansour

AI summary: Addresses finite-sum optimization in which sampling different components incurs different costs. Proposes Cost-Aware SGD, which samples from a distribution based on gradient norms and costs, and applies it to reinforcement learning with language models, significantly reducing token usage in policy optimization.

English abstract

We consider the problem of Cost-Aware Learning, where sampling different components of a finite-sum objective incurs different costs. The objective is to reach a target error while minimizing the total cost. We propose Cost-Aware SGD, which uses a distribution based on gradient norms and costs to sample components. We provide a thorough analysis of this algorithm, including cost-improvement bounds over baselines, a characterization of distribution proxy sub-optimality, and a lower bound. We apply our theoretical insights to reinforcement learning with language models, where the computational cost of sequence-level policy gradients varies with length. We find that the advantage magnitude serves as a high-fidelity proxy for gradient norms, and use this to introduce Cost-Aware GRPO. Empirical results on 1.5B, 4B, and 8B LLMs demonstrate that this algorithm significantly reduces the tokens used in policy optimization while matching or exceeding baseline accuracy.
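To make the idea of sampling by gradient norm and cost concrete, here is a minimal sketch. The abstract does not state the distribution, so the choice below is an assumption: component i is drawn with probability proportional to g_i / sqrt(c_i), which by the Cauchy-Schwarz inequality minimizes the product of the estimator's second moment and the expected cost per sample. The function names are hypothetical, and the importance weight is the usual correction that keeps the gradient estimate unbiased.

```python
import math
import random


def cost_aware_distribution(grad_norms, costs):
    """Sampling probabilities p_i proportional to g_i / sqrt(c_i) (assumed form)."""
    weights = [g / math.sqrt(c) for g, c in zip(grad_norms, costs)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_component(probs, rng):
    """Draw one component index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def importance_weight(i, probs):
    """Scale factor 1 / (n * p_i) that makes the sampled gradient unbiased
    for the average gradient (1/n) * sum_i grad f_i."""
    return 1.0 / (len(probs) * probs[i])


def expected_cost(probs, costs):
    """Expected cost of a single draw under probs."""
    return sum(p * c for p, c in zip(probs, costs))
```

With equal gradient norms and costs of 1 and 4, this rule samples the cheap component twice as often as the expensive one rather than four times as often, which reflects the square root in the weighting.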


2604.27994 2026-06-01 cs.RO

Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA

Feeza Khan Khanzada, Jaerock Kwon

AI summary: Proposes a training method that combines future semantic prediction with town-adversarial regularization, improving the zero-shot transfer of CARLA driving agents to the held-out towns Town03 and Town04 when training only on Town05 and Town06.

English abstract

Driving agents trained in one simulated town often perform poorly in a new town because the road shapes, intersections, and lane layouts can be different. This paper studies how to improve this kind of transfer in the CARLA driving simulator without giving the agent any training data from the test towns. The agent is trained only in Town05 and Town06, then evaluated directly in Town03 and Town04. To focus on road-layout differences, all experiments use the same weather and traffic settings. We propose a training method that encourages the agent to learn features that are useful across towns rather than features tied to one training town. During training, the agent is asked to predict the high-level visual meaning of future camera views and is also discouraged from relying on cues that reveal which source town the data came from. These extra learning signals are used only during training; at test time, the driving policy uses the same observation and control interface as the baseline agent. In controlled comparisons with matched DreamerV3-style world-model driving agents, the proposed method achieves the highest mean held-out success: 36.6% on Town03 with a 95% confidence interval of [30.5, 42.7] and 85.6% on Town04 with a 95% confidence interval of [84.0, 87.2], computed across five training seeds. Seed-paired tests against the strongest primary baselines show positive success-rate differences in both held-out towns. Additional experiments show that predicting future visual meaning alone or removing town-specific cues alone is not enough to match the combined method. These results suggest that combining future-scene understanding with reduced reliance on source-town-specific features can improve cross-town driving performance in this CARLA setting.


2604.27617 2026-06-01 cs.CV cs.AI

Robust Lightweight Crack Classification for Real-Time UAV Bridge Inspection

Wei Li, Haisheng Li, Weijie Li, Jiandong Wang, Kaichen Ma, Luming Yang

AI summary: Proposes a unified lightweight CNN framework consisting of a lightweight backbone, a CBAM attention module, a directed robust augmentation strategy based on inspection-scene priors, and Focal Loss. On the SDNET2018 dataset it reaches 825 FPS inference with 11.21M parameters and 1.82G FLOPs, improving F1-score by 2.51% and recall by 3.95%.

English abstract

With the widespread application of Unmanned Aerial Vehicles (UAVs) in bridge structural health monitoring, deep learning-based automatic crack detection has become a major research focus. However, practical UAV inspections still face four key challenges: weak crack features, degraded imaging conditions, severe class imbalance, and limited computational resources for practical UAV inspection workflows. To address these issues, this paper proposes a unified lightweight convolutional neural network framework composed of four synergistic components: a lightweight backbone network, a Convolutional Block Attention Module (CBAM) for channel and spatial enhancement, a directed robust augmentation strategy based on inspection-scene priors, and Focal Loss for hard-sample learning under class imbalance. Experiments on the SDNET2018 bridge deck dataset show that the proposed method achieves an inference speed of 825 FPS with only 11.21M parameters and 1.82G FLOPs. Compared with the baseline model, the complete framework improves the F1-score by 2.51% and recall by 3.95%. In addition, Grad-CAM visualizations indicate that the introduced attention module shifts the model's focus from scattered regions to precise tracking along crack trajectories. Overall, this study achieves a strong balance among accuracy, speed, and robustness, providing a practical solution for ground-station assisted real-time deployment in UAV bridge inspections. The source code is available at: https://github.com/skylynf/AttXNet .
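Of the four components listed in the abstract, Focal Loss has a compact closed form that is easy to show. The sketch below is the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults gamma = 2 and alpha = 0.25 are the commonly used values and are an assumption here, since the abstract does not give the paper's hyperparameters.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    p is the predicted probability of the positive class (crack), y is 0 or 1.
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified samples,
    so hard samples dominate training under class imbalance.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Setting `gamma=0.0` and `alpha=0.5` recovers half the ordinary cross-entropy, which is a convenient sanity check when wiring the loss into a training loop.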


2604.21928 2026-06-01 cs.CL

Evaluation of Automatic Speech Recognition Using Generative Large Language Models

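Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour

AI summary: Proposes evaluating ASR with generative large language models through three approaches: hypothesis selection, semantic distance computation, and error classification. Reaches 92-94% agreement with humans on the HATS dataset, outperforming WER and semantic metrics.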

2604.16922 2026-06-01 cs.AI

ClimAgent: LLM as Agents for Autonomous Open-ended Climate Science Analysis

Hao Wang, Jindong Han, Wei Fan, Hao Liu

AI summary: Proposes the ClimAgent framework, which uses a unified tool-use environment and rigorous reasoning protocols to perform end-to-end modeling and analysis, improving on original LLM solutions by 40.21% on the ClimaBench benchmark.

Comments It was submitted without the full consent of all co-authors

English abstract

Climate research is pivotal for mitigating global environmental crises, yet the accelerating volume of multi-scale datasets and the complexity of analytical tools have created significant bottlenecks, constraining scientific discovery to fragmented and labor-intensive workflows. While the emergence of Large Language Models (LLMs) offers a transformative paradigm to scale scientific expertise, existing explorations remain largely confined to simple Question-Answering (Q&A) tasks. These approaches often oversimplify real-world challenges, neglecting the intricate physical constraints and the data-driven nature required in professional climate science. To bridge this gap, we introduce ClimAgent, a general-purpose autonomous framework designed to execute a wide spectrum of research tasks across diverse climate sub-fields. By integrating a unified tool-use environment with rigorous reasoning protocols, ClimAgent transcends simple retrieval to perform end-to-end modeling and analysis. To foster systematic evaluation, we propose ClimaBench, the first comprehensive benchmark for real-world climate discovery. It encompasses challenging problems spanning 5 distinct task categories derived from professional scenarios between 2000 and 2025. Experiments on ClimaBench demonstrate that ClimAgent significantly outperforms state-of-the-art baselines, achieving a 40.21% improvement over original LLM solutions in solution rigorousness and practicality. Our code is available at https://github.com/usail-hkust/ClimAgent.


2604.20395 2026-06-01 cs.CV cs.RO

SpaCeFormer: Fast Proposal-Free Open-Vocabulary 3D Instance Segmentation

Chris Choy, Junha Lee, Chunghyun Park, Minsu Cho, Jan Kautz

AI summary: Proposes SpaCeFormer, a proposal-free space-curve transformer that segments a scene in 0.12-0.30 seconds, 2-3 orders of magnitude faster than multi-stage 2D+3D pipelines. Also builds SpaCeFormer-3M, the largest open-vocabulary 3D instance segmentation dataset, and reaches 11.1 zero-shot mAP on ScanNet200, a 2.8x improvement.

Comments Project page: https://nvlabs.github.io/SpaCeFormer/

English abstract

Open-vocabulary 3D instance segmentation is a core capability for robotics and AR/VR, but prior methods trade one bottleneck for another: multi-stage 2D+3D pipelines aggregate foundation-model outputs at hundreds of seconds per scene, while pseudo-labeled end-to-end approaches rely on fragmented masks and external region proposals. We present SpaCeFormer, a proposal-free space-curve transformer that runs in 0.12--0.30 seconds per scene across standard benchmarks, 2--3 orders of magnitude faster than multi-stage 2D+3D pipelines. We pair it with SpaCeFormer-3M, the largest open-vocabulary 3D instance segmentation dataset (3.0M multi-view-consistent captions over 604K instances from 7.4K scenes) built through multi-view mask clustering and multi-view VLM captioning; it reaches 21$\times$ higher mask recall than prior single-view pipelines (54.3% vs 2.5% at IoU$>$0.5). SpaCeFormer combines spatial window attention with Morton-curve serialization for spatially coherent features, and uses a RoPE-enhanced decoder to predict instance masks directly from learned queries without external proposals. On ScanNet200 we achieve 11.1 zero-shot mAP, a 2.8$\times$ improvement over the prior best proposal-free method; on ScanNet++ and Replica, we reach 22.9 and 24.1 mAP, surpassing all prior methods including those using multi-view 2D inputs.
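Morton-curve serialization, which the abstract names as the source of spatially coherent features, can be sketched in a few lines. The code interleaves the bits of quantized x, y, z voxel coordinates into a single integer and sorts points by it, so that points close in 3D tend to be close in the resulting sequence. The voxel size, the 10-bit limit, and the function names are illustrative assumptions, not the paper's settings.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of non-negative ints x, y, z (x lowest)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def serialize(points, voxel_size=0.05, bits=10):
    """Return point indices ordered along the Morton curve.

    points is a list of (x, y, z) floats. Coordinates are shifted to start at
    zero, quantized to voxels, and clamped to the range representable in `bits`.
    """
    mins = [min(p[a] for p in points) for a in range(3)]
    limit = (1 << bits) - 1

    def key(i):
        q = [min(int((points[i][a] - mins[a]) / voxel_size), limit) for a in range(3)]
        return morton3d(q[0], q[1], q[2], bits)

    return sorted(range(len(points)), key=key)
```

Windowed attention can then operate on contiguous chunks of the sorted index list, each of which covers a compact region of space.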


2604.09429 2026-06-01 cs.CV cs.AI cs.LG

Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang

AI summary: Proposes Rays as Pixels, a video diffusion model that represents cameras as dense ray pixels (raxels) sharing a latent space with video frames and denoises them jointly, enabling both camera trajectory prediction and camera-controlled video generation.

Comments Accepted to ICML 2026. 9-page main paper plus supplementary material. Project page: https://wbjang.github.io/raysaspixels/

English abstract

Recovering camera parameters from images and rendering scenes from novel viewpoints have been treated as separate tasks in computer vision and graphics. This separation breaks down when image coverage is sparse or poses are ambiguous, since each task depends on what the other produces. We propose Rays as Pixels, a Video Diffusion Model (VDM) that learns a joint distribution over videos and camera trajectories. To our knowledge, this is the first model to predict camera poses and do camera-controlled video generation within a single framework. We represent each camera as dense ray pixels (raxels), a pixel-aligned encoding that lives in the same latent space as video frames, and denoise the two jointly through a Decoupled Self-Cross Attention mechanism. A single trained model handles three tasks: predicting camera trajectories from video, generating video from input images along a pre-defined trajectory, and jointly synthesizing video and trajectory from input images. We evaluate on pose estimation and camera-controlled video generation, and introduce a closed-loop self-consistency test showing that the model's predicted poses and its renderings conditioned on those poses agree. Ablations against Plücker embeddings confirm that representing cameras in a shared latent space with video is substantially more effective.
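For readers unfamiliar with the Plücker embedding used as the ablation baseline, the sketch below builds a pixel-aligned ray map for a pinhole camera. Each pixel stores the unit ray direction d and the moment o x d, where o is the camera center. This illustrates only the baseline encoding; the abstract does not describe how raxels are constructed, so nothing here should be read as the paper's representation. The function names and the pixel-center convention (u + 0.5, v + 0.5) are assumptions.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy, rot_c2w, origin):
    """Pluecker coordinates (d, o x d) of the ray through pixel (u, v).

    rot_c2w is a 3x3 camera-to-world rotation (list of rows); origin is the
    camera center in world coordinates.
    """
    d_cam = [(u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0]
    d = [sum(rot_c2w[r][c] * d_cam[c] for c in range(3)) for r in range(3)]
    n = math.sqrt(sum(x * x for x in d))
    d = [x / n for x in d]
    o = origin
    moment = [
        o[1] * d[2] - o[2] * d[1],
        o[2] * d[0] - o[0] * d[2],
        o[0] * d[1] - o[1] * d[0],
    ]
    return d + moment


def ray_map(width, height, fx, fy, cx, cy, rot_c2w, origin):
    """Dense height x width grid of 6-D Pluecker rays, indexed as [row][column]."""
    return [
        [pixel_ray(u, v, fx, fy, cx, cy, rot_c2w, origin) for u in range(width)]
        for v in range(height)
    ]
```

The result has the same spatial layout as an image with six channels, which is what makes a per-pixel camera encoding natural to pair with video frames.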


2604.20650 2026-06-01 cs.CV

MAPRPose: Mask-Aware Proposal and Amodal Refinement for Multi-Object 6D Pose Estimation

Yang Luo, Yan Gong, Yongsheng Gao, Xiaoying Sun, Jie Zhao

AI summary: Proposes MAPRPose, a two-stage framework that generates pose proposals from mask-aware correspondences and refines them robustly with amodal-driven ROI prediction. Reaches 76.5% average recall on the BOP benchmark, 3.1% higher than FoundationPose, with a 43x speedup in multi-object inference.

English abstract

6D object pose estimation in cluttered scenes remains challenging due to severe occlusion and sensor noise. We propose MAPRPose, a two-stage framework that leverages mask-aware correspondences for pose proposal and amodal-driven Region-of-Interest (ROI) prediction for robust refinement. In the Mask-Aware Pose Proposal (MAPP) stage, we lift 2D correspondences into 3D space to establish reliable keypoint matches and generate geometrically consistent pose hypotheses based on correspondence-level scoring, from which the top-$K$ candidates are selected. In the refinement stage, we introduce a tensorized render-and-compare pipeline integrated with an Amodal Mask Prediction and ROI Re-Alignment (AMPR) module. By reconstructing complete object geometry and dynamically adjusting the ROI, AMPR mitigates localization errors and spatial misalignment under heavy occlusion. Furthermore, our GPU-accelerated RGB-XYZ reprojection enables simultaneous refinement of all $N \times B$ pose hypotheses in a single forward pass. Evaluated on the BOP benchmark, MAPRPose achieves a state-of-the-art Average Recall (AR) of 76.5%, outperforming FoundationPose by 3.1% AR while delivering a 43x speedup in multi-object inference.


2604.18587 2026-06-01 cs.LG cs.AI cs.LO cs.PL

Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs

Guchan Li, Rui Tian, Hongning Wang

AI summary: Uses the compiler to compress a large number of proof attempts into structured failure modes and proposes a learning-to-refine framework whose tree search corrects errors locally based on verifier feedback, reaching state-of-the-art performance on PutnamBench under comparable test-time budgets.

English abstract

Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured failure modes. We introduce a learning-to-refine framework that leverages this compression to perform efficient learning and proof exploration. We perform tree search that corrects errors locally conditioned on explicit verifier feedback, thereby circumventing the costs associated with accumulating a long history of proof attempts. Extensive evaluations show that our method consistently amplifies the reasoning capabilities of base provers across varying scales. Notably, our approach achieves state-of-the-art performance on PutnamBench among publicly reported $\sim$8B and $\sim$32B parameter models under comparable test-time budgets, offering a scalable paradigm for next-generation verifier-guided reasoning.


2604.17551 2026-06-01 cs.LG cs.AI

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

Franki Nguimatsia Tiofack, Fabian Schramm, Théotime Le Hellard, Justin Carpentier

AI summary: Proposes survival value learning (SVL), which models the time-to-goal as a probability distribution, recasts goal-conditioned reinforcement learning as a survival learning problem, and uses a hazard model trained by maximum likelihood, matching or surpassing strong baselines on offline benchmarks.

Comments Accepted to the 43rd International Conference on Machine Learning, Seoul, South Korea

English abstract

Standard approaches to goal-conditioned reinforcement learning (GCRL) that rely on temporal-difference learning can be unstable and sample-inefficient due to bootstrapping. While recent work has explored contrastive and supervised formulations to improve stability, we present a probabilistic alternative, called survival value learning (SVL), that reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a probability distribution. This structured distributional Monte Carlo perspective yields a closed-form identity that expresses the goal-conditioned value function as a discounted sum of survival probabilities, enabling value estimation via a hazard model trained via maximum likelihood on both event and right-censored trajectories. We introduce three practical value estimators, including finite-horizon truncation and two binned infinite-horizon approximations to capture long-horizon objectives. Experiments on offline GCRL benchmarks show that SVL combined with hierarchical actors matches or surpasses strong hierarchical TD and Monte Carlo baselines, excelling on complex, long-horizon tasks. Webpage and Code: https://simple-robotics.github.io/publications/survival-value-learning/
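The abstract's statement that the value function can be written as a discounted sum of survival probabilities can be checked numerically in a simple setting. The sketch assumes a sparse reward convention in which the value is E[gamma^T], with T the discrete time-to-goal, and writes the survival function as S(t) = P(T > t). Under those assumptions, V = 1 - (1 - gamma) * sum_t gamma^t * S(t). The paper's exact identity and conventions may differ, and the function names are hypothetical.

```python
def survival_from_hazards(hazards):
    """S[t] = P(T > t) = product over k <= t of (1 - h_k),
    where h_k = P(T = k | T >= k) is the discrete hazard."""
    surv, alive = [], 1.0
    for h in hazards:
        alive *= 1.0 - h
        surv.append(alive)
    return surv


def value_from_survival(hazards, gamma):
    """V = 1 - (1 - gamma) * sum_t gamma^t * S(t), truncated at len(hazards).

    The truncation is exact when the last hazard is 1 (goal certainly reached
    by the horizon) and otherwise treats surviving mass as arriving at the horizon.
    """
    surv = survival_from_hazards(hazards)
    return 1.0 - (1.0 - gamma) * sum(gamma ** t * s for t, s in enumerate(surv))


def value_direct(hazards, gamma):
    """V = sum_t gamma^t * P(T = t), with P(T = t) = h_t * S(t - 1)."""
    total, alive = 0.0, 1.0
    for t, h in enumerate(hazards):
        total += gamma ** t * h * alive
        alive *= 1.0 - h
    return total
```

For hazards `[0.2, 0.5, 1.0]` and `gamma = 0.9`, both functions give 0.884 up to floating-point rounding, so a hazard model's outputs can be turned into a value estimate without any bootstrapped target.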
