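arXivDaily daily academic digest: syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.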

2605.22616 2026-05-29 cs.CL

Matryoshka Concept Bottleneck Models

Ziye Chen, Hongbin Lin, Jie Li, Lijie Hu

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; The Hong Kong University of Science and Technology (Guangzhou)

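AI summary: Proposes Matryoshka Concept Bottleneck Models (MCBM), which use a nested concept hierarchy to enable adaptive concept utilization, reducing the expected intervention cost from linear to logarithmic order, $O(\log K)$, while guaranteeing monotonic performance improvement.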

English abstract

Concept Bottleneck Models (CBMs) have emerged as a prominent paradigm for interpretable deep learning, learning by grounding predictions in human-understandable concepts. However, their practical deployment is hindered by the high cost of test-time intervention, as correcting model errors typically requires human experts to manually inspect and verify a large set of predicted concepts. Existing approaches suffer from a fundamental structural limitation: they either adopt a single static concept set, forcing experts to exhaustively annotate concepts and incurring prohibitive intervention costs, or train multiple models tailored to different concept budgets, resulting in substantial computational and maintenance overhead. To address this challenge, we propose the Matryoshka Concept Bottleneck Model (MCBM), a unified architecture that enables adaptive concept utilization within a single model. Inspired by Matryoshka Representation Learning, MCBM organizes concepts into a nested hierarchy based on maximum relevance and minimum redundancy, allowing inference at multiple levels of conceptual granularity without retraining. Theoretically, we show that MCBM reduces the expected intervention costs from linear to logarithmic order, $O(\log K)$, while guaranteeing monotonic performance improvement. Empirically, extensive experiments demonstrate that MCBM matches the performance of independently trained models while enabling dynamic and efficient expert interaction.
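The nested concept ordering based on "maximum relevance and minimum redundancy" can be illustrated with a greedy mRMR ranking. This is a hypothetical sketch, not MCBM's training procedure: it assumes relevance and redundancy are both measured by absolute Pearson correlation (classic mRMR uses mutual information), uses made-up toy concepts, and treats prefixes of the ranked list as the nested concept sets for each granularity level. The function names `abs_corr`, `mrmr_order`, and `nested_prefixes` are ours.

```python
import math

def abs_corr(x, y):
    """Absolute Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def mrmr_order(concepts, target):
    """Greedy mRMR ranking: at each step pick the concept with the highest
    relevance to the target minus its mean redundancy with concepts already picked."""
    remaining = dict(concepts)
    ordered = []
    while remaining:
        def score(name):
            relevance = abs_corr(remaining[name], target)
            if not ordered:
                return relevance
            redundancy = sum(abs_corr(remaining[name], concepts[s]) for s in ordered)
            return relevance - redundancy / len(ordered)
        best = max(remaining, key=score)
        ordered.append(best)
        del remaining[best]
    return ordered

def nested_prefixes(ordered, sizes):
    """Matryoshka-style levels: every level contains all concepts of the smaller levels."""
    return [ordered[:k] for k in sorted(sizes)]

# Hypothetical toy data: an informative concept, an exact duplicate of it,
# and a weaker but non-redundant concept.
target = [0, 0, 0, 1, 1, 1]
concepts = {
    "wing_color": [0, 0, 1, 1, 1, 1],
    "wing_color_copy": [0, 0, 1, 1, 1, 1],
    "beak_shape": [0, 1, 0, 1, 0, 1],
}
order = mrmr_order(concepts, target)
levels = nested_prefixes(order, [1, 2, 3])
```

The duplicate concept is pushed to the end because it adds no new information, so a small concept budget (a short prefix) still covers the most useful, least redundant concepts.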

2605.16608 2026-05-29 cs.LG cs.CL

TabPFN-3: Technical Report

Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin Hoo, Magnus Bühler, Anurag Garg, Dominik Safaric, Jake Robertson, Benjamin Jäger, Simone Alessi, Adrian Hayler, Vladyslav Moroshan, Lennart Purucker, Philipp Singer, Alan Arazi, Julien Siems, Jan Hendrik Metzen, Georg Grab, Nick Erickson, Siyuan Guo, Eliott Kalfon, Simon Bing, David Salinas, Clara Cornu, Lilly Charlotte Wehrhahn, Diana Kriuchkova, Kursat Kaya, Lydia Sidhoum, Marie Salmon, Jerry Chen, Madelon Hulsebos, Yann LeCun, Samuel Müller, Bernhard Schölkopf, Sauraj Gambhir, Noah Hollmann, Frank Hutter

Affiliations: Prior Labs

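AI summary: Presents TabPFN-3, which achieves state-of-the-art performance on tabular data by scaling to larger training sets (up to 1M rows) and speeding up inference, and which also supports time-series, relational, and tabular-text data.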

English abstract

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time. Pretrained exclusively on synthetic data from our prior, TabPFN-3 dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data. On the standard tabular benchmark TabArena, a forward pass of TabPFN-3 outperforms all other models, including tuned and ensembled baselines, by a significant margin, and Pareto-dominates the speed/performance frontier. On more diverse datasets, TabPFN-3 ranks first on datasets with many classes, and beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M training rows and 200 features. TabPFN-3 introduces test-time compute scaling to tabular foundation models. Our API offering TabPFN-3-Plus (Thinking) exploits this to beat all non-TabPFN models by over 200 Elo on TabArena, rising to 420 Elo on the largest data subset, and outperforms AutoGluon 1.5 extreme while being 10x faster, without using LLMs, real data, internet search or any other model besides TabPFN. TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on relational data (new SOTA foundation model on RelBenchV1) and tabular-text data (SOTA on TabSTAR via TabPFN-3-Plus); and improves existing integrations: a specialized checkpoint, TabPFN-TS-3, ranks 2nd on the time-series benchmark fev-bench, and SHAP-value computation is up to 120x faster. TabPFN-3 achieves this performance while being up to 20x faster than TabPFN-2.5. In addition, a reduced KV cache and row-chunking scale to 1M rows on one H100 with fast inference speed.
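To put the Elo margins in perspective, the snippet below converts an Elo gap into the expected head-to-head win rate of the higher-rated model. It assumes TabArena uses the standard logistic Elo convention (base 10, scale 400); this is a back-of-the-envelope conversion, not a figure reported in the paper.

```python
def expected_win_rate(elo_gap: float) -> float:
    """Expected score of the higher-rated model under the standard Elo model
    (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

for gap in (200, 420):
    print(f"{gap} Elo -> {expected_win_rate(gap):.2f}")
# 200 Elo -> 0.76
# 420 Elo -> 0.92
```

Under that assumption, a 200 Elo margin means winning roughly three comparisons out of four, and 420 Elo means winning about nine out of ten.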

2605.13230 2026-05-29 cs.LG cs.AI

Teacher-Guided Policy Optimization for On-Policy Reasoning Distillation under Large Policy Divergence

Xinyu Liu, Kechen Jiao, Chunyang Xiao, Runsong Zhao, Junhao Ruan, Bei Li, Jiahao Liu, Qifan Wang, Xin Chen, Jingang Wang, Chenglong Wang, Tong Xiao, JingBo Zhu

Affiliations: School of Computer Science and Engineering, Northeastern University, China; Tsinghua University; Meituan; Meta AI; NiuTrans Research, Shenyang, China

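AI summary: Addresses the failure of reverse-KL supervision in on-policy distillation when the teacher and student policies diverge strongly. Proposes Teacher-Guided Policy Optimization (TGPO), in which the teacher directly guides token-level generation on the student's contexts, combined with RLVR rewards; TGPO outperforms existing methods on reasoning benchmarks.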

English abstract

On-policy distillation (OPD) has become a promising paradigm for reasoning-oriented post-training of large language models (LLMs), especially when combined with reinforcement learning from verifiable rewards (RLVR). Existing OPD methods rely on reverse KL (RKL)-based teacher supervision over trajectories sampled from the student policy. However, we identify a critical limitation: under large teacher-student policy divergence, RL-driven exploration often produces trajectories outside the teacher distribution, resulting in uninformative negative feedback. To address this, we propose Teacher-Guided Policy Optimization (TGPO), an on-policy reasoning distillation method that remains effective under large policy divergence settings. Rather than relying solely on evaluative supervision, TGPO uses the teacher to directly guide token-level generation conditioned on student-generated contexts; together with RLVR-style trajectory-level rewards, TGPO steers exploration toward improved continuations. Experiments on reasoning benchmarks show that TGPO consistently outperforms existing RKL-based OPD methods and remains robust across different teacher models.
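For readers unfamiliar with RKL-based OPD, the per-token signal is typically the reverse KL between the student's and teacher's next-token distributions at each student-sampled prefix. The toy sketch below (plain Python, a hypothetical three-token vocabulary) computes that quantity and the per-token log-ratio on a sampled token. It illustrates the failure mode the abstract describes: when the student's token is nearly impossible under the teacher, the signal is a large penalty that says "not this" without indicating a better continuation. It is not TGPO itself.

```python
import math

def reverse_kl(student, teacher):
    """KL(student || teacher) for one next-token distribution."""
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)

def per_token_reward(student, teacher, sampled):
    """log teacher(y) - log student(y) for the sampled token y.
    Its expectation under the student equals -KL(student || teacher)."""
    return math.log(teacher[sampled]) - math.log(student[sampled])

student = [0.7, 0.2, 0.1]
teacher_close = [0.6, 0.3, 0.1]      # teacher roughly agrees with the student
teacher_far = [1e-6, 0.5, 0.499999]  # student's favored token is off-distribution for the teacher

print(round(reverse_kl(student, teacher_close), 4))                # small divergence
print(round(per_token_reward(student, teacher_far, sampled=0), 2))  # large negative signal
```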

2605.11723 2026-05-29 cs.CV cs.AI

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

Jiyuan Wang, Huan Ouyang, Jiuzhou Lin, Chunyu Lin, Dewen Fan, Boheng Zhang, Haonan Fan, Fei Zuo, Jia Sun, Huaiqing Wang, Honglie Wang, Yiyang Fan, Zhenlong Yuan, Zijun Li, Yongrui Heng, Guosheng Lin, Fan Yang, Tingting Gao

Affiliations: BJTU (Beijing Jiaotong University); NTU (Nanyang Technological University); BUPT (Beijing University of Posts and Telecommunications); Kuaishou Technology

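AI summary: Proposes CaC, a coarse-to-fine anomaly reward model built on a vision-language model. It combines a global temporal scan, local spatial localization, and structured spatiotemporal chain-of-thought reasoning with a large-scale dataset of anomalies in generated videos and three-stage progressive training, substantially improving fine-grained anomaly detection accuracy and reducing anomalies in generated videos.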

Comments: 27 pages, 10 figures

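AI abstract (translated from Chinese)

In this paper, we propose Concentrate and Concentrate (CaC), a coarse-to-fine anomaly reward model built on a vision-language model. At inference time, it first performs a global temporal scan to anchor the time window of an anomaly, then performs fine-grained spatial localization within that local interval, and finally reaches a robust judgment through structured spatiotemporal chain-of-thought reasoning. To equip the model with these capabilities, we construct the first large-scale dataset of anomalies in generated videos, with per-frame bounding-box annotations, temporal anomaly windows, and fine-grained attribution labels. On this dataset, we design a three-stage progressive training paradigm. The model first learns spatial and temporal grounding through single-frame and multi-frame supervised fine-tuning, and is then optimized with a reinforcement learning strategy based on two-round Group Relative Policy Optimization (GRPO). In addition to the conventional accuracy reward, we introduce temporal and spatial IoU rewards to supervise the intermediate localization steps, effectively guiding the model toward more grounded and interpretable spatiotemporal reasoning. Extensive experiments show that CaC stably focuses on subtle anomalies, improving accuracy by 25.7% on a fine-grained anomaly benchmark; when used as a reward signal, CaC reduces anomalies in generated videos by 11.7% while improving overall video quality.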
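The temporal and spatial IoU rewards can be pictured with the usual intersection-over-union definitions. The sketch below is an assumption about their form: 1D interval IoU for anomaly time windows and axis-aligned box IoU for bounding boxes in `(x1, y1, x2, y2)` format. The equally weighted `localization_reward` is hypothetical; the paper's exact reward shaping, thresholds, and weights are not given here.

```python
def temporal_iou(pred, gt):
    """IoU of two time intervals (start, end), e.g. in seconds or frame indices."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def spatial_iou(pred, gt):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih

    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0

def localization_reward(correct, pred_window, gt_window, pred_box, gt_box):
    """Hypothetical combined reward: accuracy term plus both localization terms, equally weighted."""
    return float(correct) + temporal_iou(pred_window, gt_window) + spatial_iou(pred_box, gt_box)

r = localization_reward(True, (2, 6), (4, 8), (0, 0, 2, 2), (1, 1, 3, 3))
```

Because the IoU terms are graded rather than binary, they reward a model for getting the anomaly window and region roughly right even before its final verdict is correct.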

Uncertainty Estimation via Hyperspherical Confidence Mapping

Eunseo Choi, Ho-Yeon Kim, Jaewon Lee, Taeyong jo, Myungjun lee, Heejin Ahn

Affiliations: KAIST (Korea Advanced Institute of Science and Technology); Samsung Electronics Co., Ltd.

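AI summary: Proposes Hyperspherical Confidence Mapping (HCM), which decomposes outputs into a magnitude and a normalized direction vector and uses the degree to which this geometric constraint is violated to estimate uncertainty without sampling or distributional assumptions. On regression and classification tasks it matches or surpasses ensemble and evidential methods at lower inference cost.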

Comments: Accepted at ICLR 2026. 24 pages, 7 figures, including appendix. Updated references

English abstract

Quantifying uncertainty in neural network predictions is essential for high-stakes domains such as autonomous driving, healthcare, and manufacturing. While existing approaches often depend on costly sampling or restrictive distributional assumptions, we propose Hyperspherical Confidence Mapping (HCM), a simple yet principled framework for sampling-free and distribution-free uncertainty estimation. HCM decomposes outputs into a magnitude and a normalized direction vector constrained to lie on the unit hypersphere, enabling a novel interpretation of uncertainty as the degree of violation of this geometric constraint. This yields deterministic and interpretable estimates applicable to both regression and classification. Experiments across diverse benchmarks and real-world industrial tasks demonstrate that HCM matches or surpasses ensemble and evidential approaches, with far lower inference cost and stronger confidence-error alignment. Our results highlight the power of geometric structure in uncertainty estimation and position HCM as a versatile alternative to conventional techniques.
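One way to picture "uncertainty as the degree of violation of a geometric constraint": suppose a network has a separate direction head whose raw output is trained to lie on the unit hypersphere; at test time, how far that raw output's norm strays from 1 can serve as a sampling-free score. The sketch below illustrates only that idea. The two-head layout, the `|norm - 1|` score, and the function names `decompose` and `constraint_violation` are our assumptions, not HCM's published formulation.

```python
import math

def decompose(output):
    """Split a raw output vector into its magnitude and unit-norm direction."""
    magnitude = math.sqrt(sum(v * v for v in output))
    if magnitude == 0:
        raise ValueError("a zero vector has no direction")
    return magnitude, [v / magnitude for v in output]

def constraint_violation(direction_head):
    """Hypothetical score: distance of the raw direction-head norm from 1.
    Outputs that land on the unit sphere score ~0; larger values flag
    predictions that break the geometric constraint."""
    norm = math.sqrt(sum(v * v for v in direction_head))
    return abs(norm - 1.0)

mag, unit = decompose([3.0, 4.0])             # magnitude 5.0, direction [0.6, 0.8]
confident = constraint_violation([0.6, 0.8])  # on the sphere
uncertain = constraint_violation([0.3, 0.1])  # well inside the sphere
```

A score like this needs a single deterministic forward pass, which is the property that distinguishes sampling-free methods from ensembles or Monte Carlo dropout.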

2605.05155 2026-05-29 cs.CV cs.AI

CompleteRXN: Toward Completing Open Chemical Reaction Databases

Gabriel Vogel, Minouk Noordsij, Evgeny Pidko, Jana M. Weber

Affiliations: Department of Intelligent Systems; Delft University of Technology; Department of Chemical Engineering

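AI summary: Targeting the pervasive incompleteness of chemical reaction databases such as USPTO, proposes the CompleteRXN benchmark and the Constrained Reaction Balancer (CRB) model, which achieves high-accuracy reaction completion through supervised learning and constrained decoding.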

English abstract

Chemical reaction datasets such as USPTO suffer from substantial incompleteness, frequently missing byproducts, co-reactants, and stoichiometric coefficients. This limits their applicability and reliability in downstream applications. Here, we introduce CompleteRXN, a large-scale supervised benchmark for reaction completion under realistic missing-data conditions. We construct a dataset of aligned incomplete and atom-balanced reactions by mapping USPTO records to curated mechanistic reactions. We evaluate representative baselines, including a novel encoder-decoder reaction completion model with constrained decoding, the Constrained Reaction Balancer (CRB), and a recent algorithmic method, SynRBL. On our CompleteRXN benchmark, the CRB achieves high performance across splits of increasing difficulty, reaching 99.20% equivalence accuracy on the random split and 91.12% on the extreme out-of-distribution split. SynRBL produces many balanced and chemically plausible completions, but with lower accuracy on the benchmark test splits. Across all methods, performance degrades with increasing incompleteness. We observe a substantial drop when evaluating on reactions outside the benchmark (full uncurated USPTO), highlighting the gap between benchmark performance and practical robustness and motivating future work.
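"Atom-balanced" means every element appears in equal numbers on both sides once stoichiometric coefficients are applied. The sketch below checks this for reactions written as flat molecular formulas such as `C2H6O`. It is an illustration only, not the benchmark's tooling: USPTO entries are typically reaction SMILES, which this formula-only parser (no parentheses, charges, or isotopes) does not handle.

```python
import re
from collections import Counter

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def atom_counts(formula):
    """Count atoms in a flat molecular formula like 'C2H6O'."""
    counts = Counter()
    for element, digits in TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts

def side_counts(side):
    """Total atom counts for one side, given as (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coeff * n
    return total

def is_atom_balanced(reactants, products):
    return side_counts(reactants) == side_counts(products)

# Esterification of acetic acid with ethanol, with and without the water byproduct.
complete = is_atom_balanced([(1, "C2H4O2"), (1, "C2H6O")], [(1, "C4H8O2"), (1, "H2O")])
incomplete = is_atom_balanced([(1, "C2H4O2"), (1, "C2H6O")], [(1, "C4H8O2")])
```

The second call fails only because the byproduct is missing, which is exactly the kind of gap reaction completion has to fill.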

2604.27272 2026-05-29 cs.CL cs.AI cs.LG

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks

Chung-Hsiang Lo, Lu Li, Diji Yang, Tianyu Zhang, Yunkai Zhang, Yoshua Bengio, Yi Zhang

Affiliations: Northeastern University; University of Pennsylvania; UC Santa Cruz; Mila - Quebec AI Institute; University of Montreal; BAIR, UC Berkeley

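AI summary: Using three tasks (matrix transpose, Conway's Game of Life, and LU decomposition), the study finds that serializing tasks defined by a 2D layout into 1D text degrades performance because of a representational mismatch, and that the resulting errors follow spatially structured patterns.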

English abstract

In the LLM era, many symbolic and structured problems are presented to models through 1D text serialization. Yet some such problems are natively two-dimensional: their relevant relations, such as row-column correspondence or spatial adjacency, are defined by position in a 2D layout rather than by sequential order. This raises a representational question: does preserving the same symbolic entries in a 1D sequence also preserve the relational structure needed for computation? We study this issue through the lens of serialization friction: the representational mismatch in which the same underlying task instances and entries are still present, but relations that depend on layout become implicit under 1D serialization. The study uses a controlled synthetic testbed of three tasks: matrix transpose, Conway's Game of Life, and LU decomposition. In each task, the same instances are presented either as 1D text serialization or as their native 2D layout rendered as an image. Across this testbed, 1D serialization degrades more sharply as task size grows, and errors under serialization exhibit spatially structured patterns, suggesting that this presentation choice is consequential within our testbed. To further interpret these results, we add supplementary analyses that include a within-visual probe and an additional comparison of the two input presentations under the mixed-training transpose setting. These findings suggest that, for layout-defined tasks, reducing inputs to 1D serialization is not a neutral choice of representation.
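A small illustration of how layout relations become implicit under 1D serialization (our own sketch, not from the paper): in a row-major flattening of an `n x n` grid, horizontal neighbors stay adjacent but vertical neighbors sit `n` positions apart, and a transpose must move the entry at position `k = i*n + j` to `j*n + i`. The distances a model has to bridge therefore grow with grid size, which offers one intuition for why 1D serialization might degrade as tasks get larger.

```python
def row_major_index(i, j, n):
    """Position of cell (i, j) when an n x n grid is flattened row by row."""
    return i * n + j

def transpose_target(k, n):
    """Flat position that the entry at flat position k occupies after transposition."""
    i, j = divmod(k, n)
    return row_major_index(j, i, n)

def life_neighbor_offsets(n):
    """1D offsets of the 8 Game of Life neighbors of an interior cell."""
    return sorted(di * n + dj
                  for di in (-1, 0, 1) for dj in (-1, 0, 1)
                  if (di, dj) != (0, 0))

for n in (4, 16):
    print(n, life_neighbor_offsets(n), transpose_target(1, n))
# 4 [-5, -4, -3, -1, 1, 3, 4, 5] 4
# 16 [-17, -16, -15, -1, 1, 15, 16, 17] 16
```

In the 2D rendering every one of these neighbors is one step away; in the flattened string, six of the eight are about `n` tokens away.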

2604.26645 2026-05-29 cs.AI cs.LG

Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings

Peixi Wu, Ke Mei, Feipeng Ma, Bosong Chai, Zhibin Lan, Chenxi Zhao, Shannan Yan, Jie Chen, Zhangchi Hu, Yansong Peng, Bo Lin, Junjie Zhou, Dacheng Yin, Tianyi Wang, Fengyun Rao, Jing Lyu, Hebei Li, Xiaoyan Sun

Affiliations: WeChat Vision, Tencent Inc.; Zhejiang University; Tsinghua University; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

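AI summary: To address the redundancy and semantic ambiguity that chain-of-thought reasoning introduces in retrieval, proposes RIME, a rewrite-driven multimodal embedding framework that jointly optimizes generation and embedding, and achieves efficient, accurate retrieval through Cross-Mode Alignment and Refine Reinforcement Learning.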

English abstract

Multimodal Large Language Models (MLLMs) have emerged as a promising foundation for universal multimodal embeddings. Recent studies have shown that reasoning-driven generative multimodal embeddings can outperform discriminative embeddings on several embedding tasks. However, Chain-of-Thought (CoT) reasoning tends to generate redundant thinking steps and introduce semantic ambiguity in the summarized answers in broader retrieval scenarios. To address this limitation, we propose Rewrite-driven Multimodal Embedding (RIME), a unified framework that jointly optimizes generation and embedding through a retrieval-friendly rewrite. Meanwhile, we present Cross-Mode Alignment (CMA) to bridge the generative and discriminative embedding spaces, enabling flexible mutual retrieval to trade off efficiency and accuracy. Based on this, we also introduce Refine Reinforcement Learning (Refine-RL), which treats discriminative embeddings as stable semantic anchors to guide the rewrite optimization. Extensive experiments on MMEB-V2, MRMR and UVRB demonstrate that RIME substantially outperforms prior generative embedding models while significantly reducing the length of thinking.

2604.19011 2026-05-29 cs.LG cs.RO

Accelerating trajectory optimization with Sobolev-trained diffusion policies

Théotime Le Hellard, Franki Nguimatsia Tiofack, Quentin Le Lidec, Justin Carpentier

Affiliations: Inria - Département d’Informatique de l’École normale supérieure, PSL Research University; Courant Institute, New York University

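AI summary: For gradient-based trajectory optimization solvers, proposes training diffusion policies with Sobolev learning to provide initial guesses. A first-order loss on both trajectories and feedback gains avoids compounding errors and reduces solving time by 2x to 20x.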

English abstract

Trajectory Optimization (TO) solvers exploit known system dynamics to compute locally optimal trajectories through iterative improvements. A downside is that each new problem instance is solved independently; therefore, convergence speed and quality of the solution found depend on the initial trajectory proposed. To improve efficiency, a natural approach is to warm-start TO with initial guesses produced by a learned policy trained on trajectories previously generated by the solver. Diffusion-based policies have recently emerged as expressive imitation learning models, making them promising candidates for this role. Yet, a counterintuitive challenge comes from the local optimality of TO demonstrations: when a policy is rolled out, small non-optimal deviations may push it into situations not represented in the training data, triggering compounding errors over long horizons. In this work, we focus on learning-based warm-starting for gradient-based TO solvers that also provide feedback gains. Exploiting this specificity, we derive a first-order loss for Sobolev learning of diffusion-based policies using both trajectories and feedback gains. Through comprehensive experiments, we demonstrate that the resulting policy avoids compounding errors, and so can learn from very few trajectories to provide initial guesses reducing solving time by $2\times$ to $20\times$. Incorporating first-order information enables predictions with fewer diffusion steps, reducing inference latency.
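The idea behind Sobolev learning can be written as a regression loss that matches both values and first derivatives. For a deterministic policy $\pi_\theta$ trained on solver states $x_i$, controls $u_i$, and feedback gains $K_i$, assuming the sign convention $u \approx u_i + K_i (x - x_i)$ for the solver's local feedback law (conventions differ between solvers) and a weighting $\lambda \ge 0$, a generic Sobolev imitation loss reads as follows. This is the textbook form under those assumptions; the paper derives its own first-order loss for diffusion-based policies, which is not reproduced here.

```latex
\mathcal{L}(\theta) = \sum_{i=1}^{N} \Big( \big\| \pi_\theta(x_i) - u_i \big\|_2^2
  + \lambda \, \Big\| \frac{\partial \pi_\theta}{\partial x}(x_i) - K_i \Big\|_F^2 \Big)
```

Intuitively, the second term teaches the policy how the control should change when the state drifts away from a demonstrated state, which is the kind of information that helps a rolled-out policy recover from small deviations instead of compounding them.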

Chinese sensorimotor and embodiment norms for 3,000 lexicalized concepts

Direct content-based retrieval from music scores images

GaussianDream: A Feed-Forward 3D Gaussian World Model for Robotic Manipulation

Matryoshka Concept Bottleneck Models

To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Learning, Except In Heavy Truncation Scenarios

Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization

Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents

ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows

TabPFN-3: Technical Report

Teacher-Guided Policy Optimization for On-Policy Reasoning Distillation under Large Policy Divergence

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

Nearly-Optimal Algorithm for Adversarial Kernelized Bandits

TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection

Inpainting physics: self-supervised learning for context-driven fluid simulation

PRIM: Meta-Learned Bayesian Root Cause Analysis

Order-Agnostic Autoregressive Modelling with Missing Data

Uncertainty Estimation via Hyperspherical Confidence Mapping

Aes3D: Aesthetic Assessment in 3D Gaussian Splatting

Transformed Latent Variable Multi-Output Gaussian Processes

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention

Linearizing Vision Transformer with Test-Time Training

LabBuilder: Protocol-Grounded 3D Layout Generation for Interactable and Safe Laboratory

MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio

CompleteRXN: Toward Completing Open Chemical Reaction Databases

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks

SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

Graph Memory Transformer (GMT)

Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings

Accelerating trajectory optimization with Sobolev-trained diffusion policies