arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.14344 2026-05-18 cs.AI

CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation

Yuyang Wu, Stefano Falletta, Delia McGrath, Sherry Yang

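AI summary: CrystalReasoner introduces physical priors and reinforcement learning to generate stable crystal structures with specified properties from natural-language instructions, improving generation accuracy and scientific plausibility.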

Comments Our work is available at https://crystalreasoner.github.io/, with code at https://github.com/wyy603/CrystalReasoner

Abstract

Generative modeling has emerged as a promising approach for crystal structure discovery. However, existing LLM-based generative models struggle with low-level atomic precision, while diffusion-based methods fall short in integrating high-level scientific knowledge. As a result, generated structures are often invalid, unstable, or do not possess desirable properties. To address this gap, we propose CrystalReasoner (CrysReas), an end-to-end LLM framework that generates crystal structures from natural language instructions through reasoning and alignment. CrysReas introduces physical priors as thinking tokens, which include crystallographic symmetry, local coordination environments and predicted physical properties before generating atomic coordinates. This bridges the gap between natural language and 3D structures. CrysReas then employs reinforcement learning (RL) with a multi-objective, dense reward function to align generation with physical validity, chemical consistency, and thermodynamic stability. For property-conditioned tasks, we design task-specific reward functions and train specialized models for discrete constraints (e.g., space group) and continuous properties (e.g., elasticity, thermal expansion). Empirical results demonstrate that compared to prior works and baselines without thinking traces or RL, CrysReas obtains better performance on diverse metrics, triples S.U.N. ratio, and achieves better performance for property conditioned generation. CrysReas also exhibits adaptive reasoning, increasing reasoning lengths as the number of atoms increases. Our work demonstrates the potential of leveraging thinking traces and RL for generating valid, stable, and property-conditioned crystal structures.

2605.14236 2026-05-18 cs.LG cs.AI cs.CL

Active Learners as Efficient PRP Rerankers

Jeremías Figueiredo Paschmann, Juan Kaplan, Francisco Nattero, Santiago Barron, Juan Wisznia, Luciano del Corro

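AI summary: This paper reframes PRP reranking as active learning from noisy pairwise comparisons, shows that active rankers improve NDCG@10 under a constrained call budget, and introduces a randomized-direction oracle to reduce computational cost.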

Comments 13 pages, 7 figures. Preprint

Abstract

Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
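The randomized-direction idea can be illustrated with a toy simulation. The sketch below is not the authors' implementation: `biased_judge` is a hypothetical stand-in for a single LLM call, and the additive `position_bias` term is an assumed noise model. It shows that flipping a fair coin to choose the presentation order makes the bias cancel in expectation while still using one call per pair.

```python
import random

def biased_judge(score_first, score_second, position_bias=0.3):
    """Hypothetical stand-in for one LLM call: returns the margin in favour of
    the first-listed item, inflated by an assumed fixed position bias."""
    return (score_first - score_second) + position_bias

def randomized_direction_oracle(score_i, score_j, rng, position_bias=0.3):
    """One call per pair: choose the presentation order with a fair coin, then
    report the margin in favour of item i."""
    if rng.random() < 0.5:
        return biased_judge(score_i, score_j, position_bias)
    # Item j was shown first, so negate to express the margin for item i.
    return -biased_judge(score_j, score_i, position_bias)

def mean_margin(score_i, score_j, n_calls=20000, seed=0):
    """Average the oracle over many calls to expose its expected value."""
    rng = random.Random(seed)
    total = sum(randomized_direction_oracle(score_i, score_j, rng)
                for _ in range(n_calls))
    return total / n_calls
```

With a fixed order the judge reports the true margin plus the bias; with the coin flip the bias appears as +b or -b with equal probability, so it behaves like zero-mean noise that an aggregating ranker can average away.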

2605.13788 2026-05-18 cs.LG

Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs

Eszter Varga-Umbrich, Zachary Weller-Davies, Paul Duckworth, Jules Tilly, Olivier Peltre, Shikha Surana

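AI summary: This paper proposes a linearly scaling active-learning framework that, combined with force-aware neural tangent kernels, improves the robustness and efficiency of MLIP active learning over large candidate pools, and validates it on several datasets.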

Comments 10 main pages, total 34 pages

Abstract

Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we jointly address these challenges. We first introduce a linearly scaling acquisition framework based on chunked feature-space posterior-variance shortlisting. By avoiding materialisation of the candidate and train set kernels, this approach enables screening of ~200k structures within hours and applies broadly to acquisition strategies that score candidates based on molecular similarity metrics. We then extend the Neural Tangent Kernel (NTK) to a force-aware setting via mixed parameter-coordinate derivatives, yielding a force NTK and a joint energy-force NTK that provide natural similarity metrics for vector-field prediction. We demonstrate the effectiveness of the joint energy-force NTK on the OC20 dataset, where force-aware acquisition is crucial: it achieves the lowest energy and force MAE and RMSE across all metrics and distribution splits. Across T1x, PMechDB, and RGD benchmarks, our force NTK methods remain competitive with established baselines while being significantly more efficient than committee-based approaches. Under a controlled candidate-pool shift case study on T1x, acquisition based on pretrained MLIP embeddings and NTKs remains robust, whereas committee-based methods exhibit higher variance. Overall, these results show that a single pretrained MLIP can enable scalable, force-aware, and distribution-robust active learning for foundation-model fine-tuning.
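A minimal sketch of feature-space posterior-variance scoring follows, under stated assumptions: a Bayesian linear model on fixed features with unit prior variance and noise variance `ridge`, which is a generic stand-in and not the paper's kernel. All function names are hypothetical. The point it illustrates is that only a d-by-d matrix is inverted once, and candidates are then streamed in chunks, so the cost grows linearly with pool size and no candidate-by-train kernel is ever materialised.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small SPD matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def posterior_variances(train_feats, cand_feats, ridge=1e-3, chunk=2):
    """Score candidates by ridge * phi^T (Phi^T Phi + ridge I)^-1 phi."""
    d = len(train_feats[0])
    gram = [[sum(f[i] * f[j] for f in train_feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    gram_inv = invert(gram)  # d x d, computed once
    scores = []
    for start in range(0, len(cand_feats), chunk):  # stream the pool in chunks
        for phi in cand_feats[start:start + chunk]:
            tmp = [sum(gram_inv[i][j] * phi[j] for j in range(d)) for i in range(d)]
            scores.append(ridge * sum(p * t for p, t in zip(phi, tmp)))
    return scores
```

A candidate whose features lie along directions already covered by the training set gets a variance near zero, while one in an unexplored direction keeps a variance near the prior, which is what makes the score usable for shortlisting.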

2605.13169 2026-05-18 cs.CV cs.AI

PanoWorld: Towards Spatial Supersensing in 360$^\circ$ Panorama World

Changpeng Wang, Xin Lin, Junhan Liu, Yuheng Liu, Zhen Wang, Donglian Qi, Yunfeng Yan, Xi Chen

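AI summary: This paper proposes PanoWorld, which builds pano-native understanding to address the limited spatial perception of conventional multimodal large language models, improves 3D spatial reasoning with Spherical Spatial Cross-Attention, and introduces the PanoSpace-Bench benchmark to validate pano-native supervision.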

Comments Project page: https://wcpcp.github.io/PanoWorld

Abstract

Multimodal large language models (MLLMs) still struggle with spatial understanding under the dominant perspective-image paradigm, which inherits the narrow field of view of human-like perception. For navigation, robotic search, and 3D scene understanding, 360-degree panoramic sensing offers a form of supersensing by capturing the entire surrounding environment at once. However, existing MLLM pipelines typically decompose panoramas into multiple perspective views, leaving the spherical structure of equirectangular projection (ERP) largely implicit. In this paper, we study pano-native understanding, which requires an MLLM to reason over an ERP panorama as a continuous, observer-centered space. To this end, we first define the key abilities for pano-native understanding, including semantic anchoring, spherical localization, reference-frame transformation, and depth-aware 3D spatial reasoning. We then build a large-scale metadata construction pipeline that converts mixed-source ERP panoramas into geometry-aware, language-grounded, and depth-aware supervision, and instantiate these signals as capability-aligned instruction tuning data. On the model side, we introduce PanoWorld with Spherical Spatial Cross-Attention, which injects spherical geometry into the visual stream. We further construct PanoSpace-Bench, a diagnostic benchmark for evaluating ERP-native spatial reasoning. Experiments show that PanoWorld substantially outperforms both proprietary and open-source baselines on PanoSpace-Bench, H* Bench, and R2R-CE Val-Unseen benchmarks. These results demonstrate that robust panoramic reasoning requires dedicated pano-native supervision and geometry-aware model adaptation. All source code and proposed data will be publicly released.

2605.12715 2026-05-18 cs.LG cs.CL

Scaling Laws for Mixture Pretraining Under Data Constraints

Anastasiia Sedova, Skyler Seto, Natalie Schluter, Pierre Ablin

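AI summary: This paper studies scaling laws for mixture pretraining under data constraints, finds that repetition is a central driver of target-domain performance, and proposes a repetition-aware mixture scaling law for optimizing pretraining configurations.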

Abstract

As language models scale, the amount of data they require grows -- yet many target data sources, such as low-resource languages or specialized domains, are inherently limited in size. A common strategy is to mix this scarce but valuable target data with abundant generic data, which presents a fundamental trade-off: too little target data in the mixture underexposes the model to the target domain, while too much target data repeats the same examples excessively, yielding diminishing returns and eventual overfitting. We study this trade-off across more than 2,000 language-model training runs spanning multiple model and target dataset sizes, as well as several data types, including multilingual, domain-specific, and quality-filtered mixtures. Across all settings, we find that repetition is a central driver of target-domain performance, and that mixture training tolerates much higher repetition than single-source training: scarce target corpora can be reused 15-20 times, with the optimal number of repetitions depending on the target data size, compute budget, and model scale. Next, we introduce a repetition-aware mixture scaling law that accounts for the decreasing value of repeated target tokens and the regularizing role of generic data. Optimizing the scaling law provides a principled way to compute effective mixture configurations, yielding practical mixture recommendations for pretraining under data constraints.
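The trade-off described above comes down to simple token arithmetic, sketched here under the assumption that the mixture weight is the fraction of training tokens drawn from the target corpus. The function names and the example budget are illustrative only; the paper's scaling law itself is not reproduced.

```python
def target_repetitions(total_tokens, target_fraction, target_corpus_tokens):
    """How many times the target corpus is seen, given the share of the
    training budget assigned to it."""
    return total_tokens * target_fraction / target_corpus_tokens

def fraction_for_repetitions(total_tokens, repetitions, target_corpus_tokens):
    """Mixture weight needed to reach a chosen number of repetitions,
    capped at 1.0 (training on the target data alone)."""
    return min(1.0, repetitions * target_corpus_tokens / total_tokens)
```

For instance, with a hypothetical budget of 100 billion training tokens and a 1-billion-token target corpus, a 15% target share corresponds to roughly 15 passes over the target data, and a 20% share to roughly 20 passes, which is the range the abstract reports as tolerable in mixture training.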

2605.12309 2026-05-18 cs.CV

G$^2$TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models

Junxian Li, Kai Liu, Zizhong Ding, Zhixin Wang, Zhikai Chen, Renjing Pei, Yulun Zhang

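AI summary: This paper proposes G$^2$TR, which uses signals from the generation branch to reduce visual tokens in multimodal models, improving efficiency while maintaining performance; experiments show strong results on image understanding and editing tasks.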

Abstract

The development of separate-encoder Unified multimodal models (UMMs) comes with a rapidly growing inference cost due to dense visual token processing. In this paper, we focus on understanding-side visual token reduction for improving the efficiency of separate-encoder UMMs. While this topic has been widely studied for MLLMs, existing methods typically rely on attention scores, text-image similarity and so on, implicitly assuming that the final objective is discriminative reasoning. This assumption does not hold for UMMs, where understanding-side visual tokens must also preserve the model's capabilities for editing images. We propose G$^2$TR, a generation-guided visual token reduction framework for separate-encoder UMMs. Our key insight is that the generation branch provides a task-agnostic signal for identifying understanding-side visual tokens that are not only semantically relevant but also important for latent-space image reconstruction and generation. G$^2$TR estimates token importance from consistency with VAE latent, performs balanced token selection, and merges redundant tokens into retained representatives to reduce information loss. The method is training-free, plug-and-play, and applied only after the understanding encoding stage, making it compatible with existing UMM inference pipelines. Experiments on image understanding and editing benchmarks show that G$^2$TR substantially reduces visual tokens and prefill computation by 1.94x while maintaining both reasoning accuracy and editing quality, outperforming baselines on almost all benchmarks. Code is at: https://github.com/lijunxian111/G2TR.

2605.10813 2026-05-18 cs.AI

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

Jinhang Xu, Qiyuan Zhu, Yujun Wu, Zirui Wang, Dongxu Zhang, Marcia Tian, Yiling Duan, Siyuan Li, Jingxuan Wei, Sirui Han, Yike Guo, Odin Zhang, Conghui He, Cheng Tan

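AI summary: This paper proposes the NanoResearch framework, which addresses personalization in research automation through tri-level co-evolution, improving research efficiency and user experience.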

Comments 40 pages, 14 figures, 7 tables

Abstract

LLM-powered multi-agent systems can now automate the full research pipeline from ideation to paper writing, but a fundamental question remains: automation for whom? Researchers operate under different resource configurations, hold different methodological preferences, and target different output formats. A system that produces uniform outputs regardless of these differences will systematically under-serve every individual user, making personalization a precondition for research automation to be genuinely usable. However, achieving it requires three capabilities that current systems lack: accumulating reusable procedural knowledge across projects, retaining user-specific experience across sessions, and internalizing implicit preferences that resist explicit formalization. We propose NanoResearch, a multi-agent framework that addresses these gaps through tri-level co-evolution. A skill bank distills recurring operations into compact procedural rules reusable across projects. A memory module maintains user- and project-specific experience that grounds planning decisions in each user's research history. A label-free policy learning converts free-form feedback into persistent parameter updates of the planner, reshaping subsequent coordination. These three layers co-evolve: reliable skills produce richer memory, richer memory informs better planning, and preference internalization continuously realigns the loop to each user. Extensive experiments demonstrate that NanoResearch delivers substantial gains over state-of-the-art AI research systems, and progressively refines itself to produce better research at lower cost over successive cycles.

2605.10810 2026-05-18 cs.LG

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

Daniel Ranard

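AI summary: This paper proposes an automatically generated benchmark for predicting hidden text in technical papers. It evaluates how much information a model-written auxiliary forecast string conveys by comparing a scorer's predictions of the continuation with and without the forecast. Experiments show that models such as GPT-5.5 outperform the baseline on equation-suffix prediction, supporting likelihood scoring as a static benchmark and as a tool for probing shortcut vulnerabilities.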

Comments 13 pages + appendices, 4 figures; v2: expanded related work

Abstract

We introduce an automatically generated benchmark for predicting hidden text in technical papers. A paper supplies visible context $X$ and a hidden continuation $Y$; the evaluated model writes an auxiliary forecast string $Z$, and a separate scorer assigns next-token probability to $Y$ both with and without conditioning on $Z$. This gives a label-free test of whether $Z$ transmits information about the continuation, compared against controls where $Z$ is recent context rather than a forecast. Our main testbed is equation-suffix prediction: the predictor sees context and the first part of a displayed equation, then forecasts the rest. The task mixes surface-level arXiv/TeX text modeling with reasoning-sensitive inference; the suffix is one of many roughly equivalent continuations, so the benchmark is read statistically rather than item-by-item. On 1363 equation continuations from 138 recent physics and mathematics papers, forecasts from GPT-5.5, Opus 4.7, and GPT-5.4 nano all improve clipped likelihood over the context control under both Qwen3-8B and Kimi K2.6 scorers, distinguishing model families and reasoning-effort settings without human labels. To emulate shortcuts where $Z$ further primes the scorer rather than making a useful forecast, we also fine-tune the scorer on context-only prompts and apply it to held-out papers as a stronger control. GPT-5.5 forecasts still beat this fine-tuned control; GPT-5.4 nano forecasts do not. Longer prose/TeX continuations show positive but noisier lift over controls, concentrated near the beginning of the target. These results support cross-model likelihood scoring as a static benchmark and as a setup for probing shortcut vulnerabilities before reinforcement learning or model-selection optimization is applied.
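The scoring rule can be sketched as a difference of log-likelihoods of $Y$ with and without the forecast $Z$ in the scorer's context. The abstract does not define "clipped likelihood", so the sketch assumes per-token probabilities are clipped from below at a small `floor`; that choice, and the function names, are assumptions rather than the paper's definition. The per-token probabilities would come from a scorer model, which is not called here.

```python
import math

def sequence_logprob(token_probs, floor=1e-4):
    """Sum of per-token log-probabilities of Y, with each probability clipped
    from below (assumed form of clipping) so one surprise cannot dominate."""
    return sum(math.log(max(p, floor)) for p in token_probs)

def forecast_lift(probs_with_z, probs_without_z, floor=1e-4):
    """log p(Y | X, Z) - log p(Y | X): positive when the forecast Z helps the
    scorer predict the hidden continuation."""
    return (sequence_logprob(probs_with_z, floor)
            - sequence_logprob(probs_without_z, floor))
```

The same function applies to the control condition: substituting recent context for $Z$ gives a baseline lift, and a forecast is informative only to the extent that its lift exceeds that baseline.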

2605.10100 2026-05-18 cs.CV cs.AI

HYPERPOSE: Hyperbolic Kinematic Phase-Space Attention for 3D Human Pose Estimation

Vinduja Thekkath, Ashish Musale, Ajay Waghumbare, Upasna Singh

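AI summary: HYPERPOSE is a 3D human pose estimation framework that performs spatio-temporal reasoning in hyperbolic space; its Hyperbolic Kinematic Phase-Space Attention mechanism preserves the tree structure of the human skeleton, improving geometric fidelity and temporal-dynamics modeling.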

Abstract

We introduce HYPERPOSE, a novel 3D human pose estimation framework that performs spatio-temporal reasoning entirely within the Lorentz model of hyperbolic space $\mathbb{H}^d$ to natively preserve the hierarchical tree topology of the human skeleton. Current state-of-the-art pose estimators aim to capture complex joint dynamics by relying on transformers and graph convolutional networks. Since these architectures operate exclusively in Euclidean space which fundamentally mismatches the inherent tree structure of the human body, these methods inevitably suffer from exponential volume distortion and struggle to maintain structural coherence. To this end, we depart from flat spaces and aim to improve geometric fidelity with Hyperbolic Kinematic Phase-Space Attention (HKPSA), natively embedding complex joint relationships without distortion, alongside a multi-scale windowed hyperbolic attention mechanism that efficiently models temporal dynamics in $O(TW)$ complexity. Furthermore, to overcome the well-known instability of training non-Euclidean manifolds, HYPERPOSE introduces a novel Riemannian loss suite and an uncertainty-weighted curriculum, enforcing physical geodesic constraints like bone length and velocity consistency. Extensive evaluations on the Human3.6M and MPI-INF-3DHP datasets demonstrate that HYPERPOSE achieves state-of-the-art structural and temporal coherence, significantly reducing both volume distortion and velocity error, while establishing new state-of-the-art benchmarks in overall positional accuracy.
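For readers unfamiliar with the Lorentz model, the sketch below shows its basic operations for curvature -1: the Lorentzian inner product, lifting a Euclidean vector onto the hyperboloid, and the geodesic distance. These are the standard textbook definitions, not HYPERPOSE's attention mechanism, and the function names are illustrative.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: the time-like coordinate enters with a minus sign."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift_to_hyperboloid(v):
    """Map a Euclidean vector v to the point (sqrt(1 + |v|^2), v), which
    satisfies <x, x>_L = -1 and therefore lies on the hyperboloid."""
    return [math.sqrt(1.0 + sum(a * a for a in v))] + list(v)

def lorentz_distance(x, y):
    """Geodesic distance arccosh(-<x, y>_L); the max guards against values
    slightly below 1 caused by floating-point rounding."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))
```

Distances grow roughly logarithmically in the Euclidean norm of the lifted vectors, which is why tree-like hierarchies such as a skeleton's joint structure embed with low distortion in this geometry.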

2605.09403 2026-05-18 cs.LG cs.AI cs.NE

Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers

Gabriel Smithline, Chris Mascioli

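AI summary: Using one-layer Transformers trained on digit addition, modular arithmetic, and histogram counting, the study finds that sparse MoE routing shifts computation from the FFN to attention, and that GLU gating rotates task-relevant Fourier structure into distributed subspaces.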

Comments Preprint

Abstract

Architectural choices inside the Transformer feedforward network (FFN) block do not merely affect the block itself; they reshape the computations learned by the rest of the model. We study this effect in one-layer Transformers trained on digit addition with carry, modular arithmetic, and histogram counting. Comparing dense FFNs, gated linear units (GLUs), mixture-of-experts (MoE), and MoE-GLUs, we find that sparse MoE routing can shift computation from FFN to attention, with the strongest ablation-visible effect on carry-based addition. We decompose this redistribution into reduced per-token FFN capacity and sparse partitioning across experts. Critically, frozen random routing nearly matches learned routing, suggesting that redistribution is driven largely by architectural sparsity rather than router-learned specialization. As a secondary finding, GLU-style multiplicative gating rotates task-relevant Fourier structure out of the per-neuron basis and into distributed subspaces, making neuron-level interpretability less informative while preserving structured computation. We validate these conclusions with random-routing, narrow-FFN, and top-2 MoE controls, plus parameter-matching, activation-function, and width-scaling analyses. Together, these results show that local FFN design choices can have nonlocal consequences for Transformer computation.
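A minimal sketch of top-k MoE routing with a frozen random router, the control the abstract describes, is given below. It is a generic illustration with hypothetical names and callable toy experts, not the paper's training setup. It makes the two ingredients of the decomposition visible: each token reaches only k experts (reduced per-token FFN capacity), and tokens are partitioned across experts by a router that is never trained.

```python
import math
import random

def make_frozen_router(d_model, n_experts, seed=0):
    """Random router weights that are never updated (frozen random routing)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_experts)]

def route_top_k(token, router, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts,
    with gates given by a softmax over the selected logits."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    peak = max(logits[e] for e in chosen)
    exps = [math.exp(logits[e] - peak) for e in chosen]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(chosen, exps)]

def moe_ffn(token, router, experts, k=1):
    """Sparse FFN: only the k routed experts run; outputs are gate-weighted."""
    out = [0.0] * len(token)
    for expert_index, gate in route_top_k(token, router, k):
        y = experts[expert_index](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out
```

Swapping `make_frozen_router` for learned weights changes only how tokens are assigned, not how many experts each token sees, which is the distinction the frozen-routing comparison is designed to isolate.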

2605.09391 2026-05-18 cs.AI

Do Linear Probes Generalize Better in Persona Coordinates?

Prasad Mahadik, Adrians Skapars

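AI summary: This paper studies whether a low-dimensional subspace in persona coordinates captures harmful behaviors more robustly. Principal components obtained by PCA of contrastive persona-specific vectors are used to train probes, and probes trained on persona-PC projections perform better on multiple datasets.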

Comments 15 pages, preprint. Revised version: corrected references and citation links; results unchanged

Abstract

It is becoming increasingly necessary to have monitors check for harmful behaviors during language model interactions, but text-only monitoring has not been sufficient. This is because models sometimes exhibit strategic deception and sandbagging, changing their behavior during evaluation. This motivates the use of white-box monitors like linear probes, which can read the model internals directly. Currently, such probes can fail under distribution shift, limiting their usefulness in real settings. We study whether there exists a low-dimensional subspace of the model internals that captures harmful behaviors more robustly, while leaving out spuriously correlative features. Inspired by the Assistant Axis and Persona Selection Model, we construct persona axes for deception and sycophancy using contrastive persona prompts. The first principal components, obtained by unsupervised PCA of the persona-specific vectors, cleanly separate harmful and harmless personas. Across 10 evaluation datasets, we show that persona-derived directions transfer non-trivially and probes trained on persona-PC projections generalize better than probes trained on raw activations. We also find that a unified axis consisting of multiple harmful and harmless behaviors improves generalization across behaviors and datasets. Overall, persona vectors provide a useful inductive bias for building more transferable behavior probes.

2605.09231 2026-05-18 cs.CV stat.ML

An Elastic Shape Variational Autoencoder for Skeleton Pose Trajectories

Arafat Rahman, Shashwat Kumar, Laura E. Barnes, Anuj Srivastava

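AI summary: This paper proposes ES-VAE, a generative model of skeletal trajectories that uses the transported square-root velocity field representation on Kendall's shape manifold to isolate shape dynamics; it outperforms standard VAEs and sequence-modeling baselines in gait analysis and action recognition.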

Comments 9 pages

Abstract

Deep generative models provide flexible frameworks for modeling complex, structured data such as images, videos, 3D objects, and texts. However, when applied to sequences of human skeletons, standard variational autoencoders (VAEs) often allocate substantial capacity to nuisance factors, such as camera orientation, subject scale, viewpoint, and execution speed, rather than to the intrinsic geometry of shapes and their motion. We propose the Elastic Shape Variational Autoencoder (ES-VAE), a geometry-aware generative model for skeletal trajectories that leverages the transported square-root velocity field (TSRVF) representation on Kendall's shape manifold. This representation inherently removes rigid translations, rotations, and global scaling of shapes, as well as temporal rate variability of sequences, isolating the underlying shape dynamics. The ES-VAE encoder maps skeletal sequences to a low-dimensional latent space incorporating the Riemannian logarithm map, while the decoder reconstructs sequences using the corresponding exponential map. We demonstrate the effectiveness of ES-VAE on two datasets. First, we analyze skeletal gait cycles to predict clinical mobility scores and classify subjects into healthy and post-stroke groups. Second, we evaluate action recognition on the NTU RGB+D dataset. Across both settings, ES-VAE consistently outperforms standard VAEs and a range of sequence modeling baselines, including temporal convolutional networks, transformers, and graph convolutional networks. More broadly, ES-VAE provides a principled framework for learning generative models of longitudinal data on pose shape manifolds, offering improved latent representation and downstream performance compared to existing deep learning approaches.

2605.09034 2026-05-18 cs.LG

Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration

Jiahe Chen, Ziye Ma

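AI summary: This paper accelerates zeroth-order optimization through partial orthogonalization via power iteration, which replaces the Newton-Schulz procedure in Muon, improving convergence speed and generalization, with strong results in large language model fine-tuning.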

Comments added more related works discussion

Abstract

Zeroth-order (ZO) optimization has become increasingly popular and important in fine-tuning large language models (LLMs), especially on edge devices, because it can adapt the model to local data without memory-intensive back-propagation. Recent works try to reduce ZO variance through low-dimensional subspace search, but subspace restriction alone leaves key optimization geometry under-exploited, motivating additional acceleration. In this work, we focus on the hidden-layer training problem, in which spectral optimizers like Muon outperform AdamW due to their ability to exploit weak spectral directions by orthogonalization. However, we have discovered that, unlike in the first-order setting, full orthogonalization works poorly in the ZO setting since the gradient estimates are highly noisy and unreliable. To address this issue, we propose applying partial spectral orthogonalization to accelerate ZO optimization. To do so, we replace the iconic Newton-Schulz procedure in Muon with the faster, more concentrated power-iteration method so that it only amplifies dominant spectral directions. Furthermore, to improve the efficiency and generalization of the algorithm, we adopt a streaming variant of power iteration that requires low variance in gradients, which we achieve by constraining our search to a subspace obtained through the projection of momentum, echoing recent advances. Experiments on LLM fine-tuning show that our method can achieve from 1.5x to 4x the convergence speed of ZO-Muon, the current SOTA algorithm, across SuperGLUE datasets with the OPT-13B model. Across different models, we also reach competitive final accuracies in less time in most cases compared with strong ZO baselines such as MeZO, LOZO and ZO-Muon. Code is available at https://github.com/MOFA-LAB/ZO-MOPI.git.
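The contrast between full and partial orthogonalization can be sketched with plain power iteration. The code below is an illustration under simplifying assumptions, not the authors' streaming variant: it keeps only the single dominant singular direction of a gradient estimate `G` and sets its singular value to one, whereas full orthogonalization (as approximated by Newton-Schulz in Muon) would equalize all singular values. Function names are hypothetical.

```python
import math

def matvec(M, v):
    """Compute M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def rmatvec(M, u):
    """Compute M^T u."""
    return [sum(M[i][j] * u[i] for i in range(len(M))) for j in range(len(M[0]))]

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def top_singular_pair(G, n_iters=50):
    """Power iteration on G^T G: converges to the dominant right singular
    vector v; the matching left vector is u = G v / |G v|."""
    v = normalize([1.0] * len(G[0]))
    for _ in range(n_iters):
        v = normalize(rmatvec(G, matvec(G, v)))
    u = normalize(matvec(G, v))
    return u, v

def rank_one_orthogonalized(G, n_iters=50):
    """Partial orthogonalization: return u v^T, i.e. the dominant spectral
    direction with unit singular value, discarding the weaker directions."""
    u, v = top_singular_pair(G, n_iters)
    return [[ui * vj for vj in v] for ui in u]
```

For a gradient estimate with singular values 3 and 1, full orthogonalization would weight both directions equally, while this rank-one variant keeps only the first. Under noisy ZO estimates the weaker directions are the ones most likely to be dominated by noise, which is the motivation the abstract gives for amplifying only the dominant ones.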

2605.08856 2026-05-18 cs.LG

Controlling Transient Amplification Improves Long-horizon Rollouts

Adeel Pervez, Francesco Locatello

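AI summary: This paper identifies transient amplification as a cause of long-horizon rollout error and proposes commutativity regularization, which improves long-horizon rollouts by reducing the normality defect of the Jacobians and their commutator norm.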

Abstract

Autoregressive neural simulators now match classical solvers on short-horizon prediction of physical systems, yet their accuracy degrades rapidly when rolled out over long horizons. In this work, we identify transient amplification of perturbations around rollout trajectories as a structural mechanism driving rollout error. Using a linearization analysis we show that when the Jacobians along an autoregressive trajectory are non-normal and non-commuting, the model amplifies errors transiently, resulting in model rollout drift even when the overall system is asymptotically stable. Building on the analysis, we propose commutativity regularization: a combination of two penalties designed to reduce the normality defect of individual Jacobians and the commutator norm of Jacobians across steps. The penalties are estimated with Jacobian-vector products and have no inference-time cost. We show a propagator bound that quantifies rollout error under approximate commutativity and normality. We evaluate UNet and FNO variants with commutativity regularization on 1D and 2D spatio-temporal data in synthetic and real settings, showing successful long-horizon rollouts over thousands of steps. Further, we show that the method improves FourCastNet climate forecasts on ERA5 without using any new data. The gain is most pronounced out-of-distribution: trained on trajectories of a few hundred steps, regularized models remain in-distribution for thousands of rollout steps on initial conditions where baselines diverge.
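The two quantities being penalized can be written down directly for small explicit matrices. The sketch below uses Frobenius norms as an assumed choice and computes them exactly; the paper instead estimates its penalties with Jacobian-vector products, which is not reproduced here. Function names are illustrative.

```python
import math

def matmul(A, B):
    """Dense matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Matrix transpose."""
    return [list(col) for col in zip(*A)]

def frobenius_of_difference(A, B):
    """Frobenius norm of A - B."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

def normality_defect(J):
    """|J^T J - J J^T|_F: zero exactly when the real matrix J is normal."""
    Jt = transpose(J)
    return frobenius_of_difference(matmul(Jt, J), matmul(J, Jt))

def commutator_norm(J1, J2):
    """|J1 J2 - J2 J1|_F: zero exactly when the two Jacobians commute."""
    return frobenius_of_difference(matmul(J1, J2), matmul(J2, J1))
```

As a concrete case, the matrix [[0.9, 5.0], [0.0, 0.9]] has both eigenvalues equal to 0.9, so repeated application eventually contracts, yet it maps the vector (0, 1) to (5.0, 0.9) in one step. That one-step growth is the transient amplification the abstract refers to, and the matrix's large normality defect is what flags it.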

2605.08464 2026-05-18 cs.LG

The Geometric Structure of Models Learning Sparse Data

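The Geometric Structure of Models That Learn Sparse Data

Thomas Walker, T. Mitchell Roddenberry, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

AI Summary This paper studies how models succeed in the sparse regime by exploiting local geometric structure, formalizes the notion of normal alignment, introduces the GrokAlign regularization strategy to speed up deep network training, and extends Recursive Feature Machines to improve adversarial robustness.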

Comments 27 pages, 7 figures, 5 tables

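Details
AI Abstract (translated from Chinese)

The manifold hypothesis (MH) is often used to explain how machine learning overcomes the curse of dimensionality, but it applies only in regimes where the training data provides a sufficiently dense sample of the underlying low-dimensional manifold, or where such a manifold is conceivably present. This paper describes the sparse regime, where the MH does not apply, and proves that normal-aligned classifiers minimize the training objective under norm constraints and achieve maximal local robustness. For continuous piecewise-affine deep networks, normal alignment manifests as centroid alignment within the network's induced power diagram partition and results from the feature-learning regime. Motivated by the theory, we introduce GrokAlign, a regularization strategy that actively induces normal alignment, and show that it significantly accelerates the training dynamics of deep networks. We also apply the principle of normal alignment to Recursive Feature Machines (RFMs) to introduce Recursive Feature Alignment Machines (RFAMs), and show that RFAMs are more adversarially robust than RFMs when trained on tabular data.

English Abstract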

The manifold hypothesis (MH) is often used to explain how machine learning can overcome the curse of dimensionality. However, the MH is only applicable in regimes where the training data provides a sufficiently dense sample of the underlying low-dimensional data manifold, or where such a low-dimensional manifold is conceivably present. We describe the regimes where the MH is not applicable as sparse. In this paper, we demonstrate that models succeed in the sparse regime by exploiting a highly structured local geometry, a property we formalize as normal alignment. We prove that normal-aligned classifiers -- whose input-output Jacobians are rank-one and align perfectly with the training data -- minimize the training objective under norm constraints and achieve maximal local robustness under a non-zero Jacobian constraint. For continuous piecewise-affine deep networks, normal alignment manifests geometrically as centroid alignment within the network's induced power diagram partition and results from the feature-learning regime. Motivated by these theoretical insights, we introduce GrokAlign, a regularization strategy that actively induces normal alignment. We demonstrate that GrokAlign significantly accelerates the training dynamics of deep networks relevant to the grokking phenomenon. Furthermore, we apply the principle of normal alignment to Recursive Feature Machines (RFMs) to introduce Recursive Feature Alignment Machines (RFAMs). We show that RFAMs exhibit greater adversarial robustness compared to RFMs when trained on tabular data.

2605.08401 2026-05-18 cs.CL cs.AI

AIPO: Learning to Reason from Active Interaction

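AIPO: Learning to Reason Through Active Interaction

Junnan Liu, Linhao Luo, Thuy-Trang Vu, Gholamreza Haffari

AI Summary AIPO improves the reasoning ability of large language models through active multi-agent interaction, introducing three collaborative agents that help resolve reasoning bottlenecks, improve exploration efficiency, and expand the capability boundary.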

Comments Preprint

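Details
AI Abstract (translated from Chinese)

Recent advances in large language models (LLMs) have demonstrated remarkable reasoning capabilities, driven largely by Reinforcement Learning with Verifiable Rewards (RLVR). However, existing RL algorithms face a fundamental limitation: exploration is constrained by the inherent capability boundary of the policy model. Although recent methods introduce external expert demonstrations to extend this boundary, they typically rely on complete trajectory-level guidance, which is sample-inefficient, information-sparse, and may confine exploration to a static guidance space. Inspired by multi-agent systems, we propose AIPO, an enhanced reinforcement learning framework that improves LLM reasoning through active multi-agent interaction during exploration. Specifically, AIPO lets the policy model proactively consult three functional collaborative agents (a Verify Agent, a Knowledge Agent, and a Reasoning Agent) when it encounters reasoning bottlenecks, so that it receives fine-grained, targeted guidance and actively expands its capability boundary. We further introduce a tailored importance sampling coefficient and a clipping strategy to mitigate the off-policy bias and gradient vanishing that arise when learning from agent-provided feedback. After training, the policy model reasons independently without relying on the collaborative agents. Across diverse reasoning benchmarks, including AIME, MATH500, GPQA-Diamond, and LiveCodeBench, AIPO consistently improves reasoning performance, generalizes robustly across different policy models and RLVR algorithms, and effectively expands the reasoning capability boundary of the policy model.

English Abstract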

Recent advances in large language models (LLMs) have demonstrated remarkable reasoning capabilities, largely stimulated by Reinforcement Learning with Verifiable Rewards (RLVR). However, existing RL algorithms face a fundamental limitation: their exploration remains largely constrained by the inherent capability boundary of the policy model. Although recent methods introduce external expert demonstrations to extend this boundary, they typically rely on complete trajectory-level guidance, which is sample-inefficient, information-sparse, and may confine exploration to a static guidance space. Inspired by the potential of multi-agent systems, we propose AIPO, an enhanced reinforcement learning framework that improves LLM reasoning through active multi-agent interaction during exploration. Specifically, AIPO enables the policy model to proactively consult three functional collaborative agents, Verify Agent, Knowledge Agent, and Reasoning Agent, when encountering reasoning bottlenecks, thereby receiving fine-grained and targeted guidance to actively expand its capability boundary during training. We further introduce a tailored importance sampling coefficient together with a clipping strategy to mitigate the off-policy bias and gradient vanishing issues that arise when learning from agent-provided feedback. After training, the policy model performs reasoning independently without relying on collaborative agents. Extensive experiments on diverse reasoning benchmarks, including AIME, MATH500, GPQA-Diamond, and LiveCodeBench, show that AIPO consistently improves reasoning performance, generalizes robustly across different policy models and RLVR algorithms, and effectively expands the reasoning capability boundary of the policy model.

2605.06475 2026-05-18 cs.AI cs.CV

Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features

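Probabilistic Dating of Historical Manuscripts Using Evidential Deep Regression on Visual Handwriting Features

Ranjith Chodavarapu

AI Summary This paper proposes a deep regression method that dates historical manuscripts from visual features and outputs decomposed aleatoric and epistemic uncertainty; experiments show strong performance on the test set.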

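Details
AI Abstract (translated from Chinese)

We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Rather than grouping centuries into classes, as is standard in the prior literature, we treat dating as an evidential deep regression problem over a continuous year axis, so that the neural network outputs a full predictive distribution with decomposed aleatoric and epistemic uncertainty in a single forward pass. Our architecture combines an EfficientNet-B2 backbone with a Normal-Inverse-Gamma (NIG) output head trained with a joint negative-log-likelihood and evidence-regularization objective. On the DIVA-HisDB benchmark (150 pages, 3 medieval codices, 151,936 patches), our model reaches a test MAE of 5.4 years, well below the 50-year granularity of century-label supervision, with 93% of patches within 5 years and 97% within 10 years. In a single forward pass it achieves a calibration of PICP=92.6%, better than MC Dropout (PICP=88.2%, 50 passes) and Deep Ensembles (PICP=79.7%, 5 models), at 5x lower inference cost. Uncertainty decomposition shows that aleatoric uncertainty is a strong predictor of dating error (Spearman ρ=0.729), and selective prediction on the most certain 20% of patches yields an MAE of 0.5 years. We show that predicted uncertainty increases as image degradation worsens, that spatial decomposition maps explain which script regions cause aleatoric uncertainty, and that page-level aggregation reduces MAE to 4.5 years, with a correlation of ρ=0.905 between uncertainty and page-level error.

English Abstract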

We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Instead of aggregating centuries into classes as is standard in the previous literature, we pose dating as an evidential deep regression problem over a continuous year axis, allowing our neural network to output a full predictive distribution with decomposed aleatoric and epistemic uncertainty in a single forward pass. Our architecture combines an EfficientNet-B2 backbone with a Normal-Inverse-Gamma (NIG) output head trained with a joint negative-log-likelihood and evidence-regularization objective. On the DIVA-HisDB benchmark (150 pages, 3 medieval codices, 151,936 patches), our model scores a test MAE of 5.4 years, well below the 50-year century-label supervision granularity, with 93% of patches within 5 years and 97% within 10 years. Our approach achieves PICP=92.6%, the best calibration among all compared methods, in a single forward pass, outperforming MC Dropout (PICP=88.2%, 50 passes) and Deep Ensembles (PICP=79.7%, 5 models) at 5x lower inference cost. Uncertainty decomposition shows aleatoric uncertainty is a strong predictor of dating error (Spearman ρ=0.729), and a selective prediction about the most certain 20% of patches can provide 0.5 years MAE. We show that predicted uncertainty increases as image degradation worsens, spatial decomposition maps explain which script regions cause aleatoric uncertainty, and page-level aggregation reduces MAE to 4.5 years with ρ=0.905 between uncertainty and page-level error.
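
As a rough illustration of what a Normal-Inverse-Gamma head makes available in one forward pass, the sketch below uses the standard evidential-regression moments: prediction γ, aleatoric variance β/(α−1), and epistemic variance β/(ν(α−1)). Whether the paper uses exactly this parameterization is an assumption, and the function names and example numbers are hypothetical. The second function computes PICP as the fraction of targets that fall inside their prediction intervals.

```python
def nig_uncertainty(gamma, nu, alpha, beta):
    """Return (prediction, aleatoric variance, epistemic variance).

    Assumes the usual NIG parameterization with nu > 0, alpha > 1, beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1)          # expected data noise E[sigma^2]
    epistemic = beta / (nu * (alpha - 1))   # variance of the mean Var[mu]
    return gamma, aleatoric, epistemic

def picp(y_true, lower, upper):
    """Prediction interval coverage probability: share of targets inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)

# Hypothetical head output for one patch: predicted year 1200.
year, aleatoric, epistemic = nig_uncertainty(1200.0, nu=2.0, alpha=3.0, beta=8.0)
print(year, aleatoric, epistemic)
```

Under this parameterization, a larger ν (more "virtual evidence") shrinks only the epistemic term, which is what lets the two kinds of uncertainty be reported separately.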

2605.06223 2026-05-18 cs.AI cs.RO

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

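ProCompNav: Proactive Instance Navigation Based on Comparative Judgment

Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok

AI Summary ProCompNav resolves ambiguity in user queries with a two-stage framework that progressively narrows the candidate set through comparative judgment, improving navigation success rate and reducing user response length.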

Comments Project page: https://tree-jhk.github.io/procompnav/ . Code: https://github.com/tree-jhk/procompnav/

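Details
AI Abstract (translated from Chinese)

Natural-language instance navigation becomes challenging when the initial request does not uniquely specify the target instance. A practical agent should reduce the user's burden by proactively asking only for the information needed to distinguish the target from similar distractors, rather than requiring a detailed description. Existing methods often fall short of this goal: they may stop at the first plausible candidate, or, after collecting multiple candidates, ask only about attributes of a single candidate instead of choosing questions that discriminate within the candidate pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first builds a candidate pool and then identifies the target through comparative judgment. In each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This turns disambiguation from open-ended target description into pool-level discriminative questioning, where each question is meant to narrow the candidate set. On CoIN-Bench, ProCompNav improves success rate over interactive baselines given the same minimal input and over non-interactive baselines, while substantially reducing response length. ProCompNav also achieves state-of-the-art success rate on TextNav, suggesting that comparative judgment is broadly useful for instance navigation among similar distractors. Code is available at https://github.com/tree-jhk/procompnav.

English Abstract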

Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors. Code is available at https://github.com/tree-jhk/procompnav.
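
The round structure described above (pick an attribute-value pair that splits the pool, ask a yes/no question, prune every inconsistent candidate) can be sketched in a few lines. This is a simplified illustration, not the released ProCompNav code: candidates are plain dictionaries, the pair is chosen by the most balanced split (an assumption, since the abstract only says the pair splits the pool), and the `answer` callback stands in for the user.

```python
def best_split(pool):
    """Pick the (attribute, value) pair whose yes/no split is most balanced."""
    best, best_gap = None, None
    pairs = sorted({(k, v) for cand in pool for k, v in cand.items()})
    for attr, val in pairs:
        yes = sum(1 for cand in pool if cand.get(attr) == val)
        if yes == 0 or yes == len(pool):
            continue  # this pair does not split the pool, so it is uninformative
        gap = abs(2 * yes - len(pool))
        if best_gap is None or gap < best_gap:
            best, best_gap = (attr, val), gap
    return best

def disambiguate(pool, answer):
    """Ask binary questions until one candidate remains; return (pool, questions asked)."""
    pool = list(pool)
    questions = 0
    while len(pool) > 1:
        pair = best_split(pool)
        if pair is None:
            break  # remaining candidates are indistinguishable by attributes
        attr, val = pair
        reply = answer(attr, val)
        # Keep only candidates consistent with the user's yes/no reply.
        pool = [cand for cand in pool if (cand.get(attr) == val) == reply]
        questions += 1
    return pool, questions

# Hypothetical pool of four chairs; the user is thinking of the last one.
chairs = [
    {"color": "red", "material": "wood", "near": "window"},
    {"color": "red", "material": "metal", "near": "door"},
    {"color": "blue", "material": "wood", "near": "door"},
    {"color": "blue", "material": "metal", "near": "window"},
]
target = chairs[3]
remaining, asked = disambiguate(chairs, lambda a, v: target.get(a) == v)
print(remaining, asked)
```

With balanced splits, each answer removes about half of the pool, so the number of questions grows roughly with the logarithm of the pool size.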

2605.05652 2026-05-18 cs.LG

Information-Preserving Domain Transfer with Unlabeled Data in Misspecified Simulation-Based Inference

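Information-Preserving Domain Transfer with Unlabeled Data in Simulation-Based Inference

Joon Jang, Eunho Jeong, Kyu Sung Choi, Hyeonjin Kim

AI Summary This paper proposes SPIN, a framework that uses unlabeled real-world data for information-preserving domain transfer, making simulation-based inference more reliable, especially under model misspecification.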

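Details
AI Abstract (translated from Chinese)

Simulation-based inference (SBI) provides amortized Bayesian parameter inference from simulator-generated data without explicit likelihood evaluation. Its reliability degrades under model misspecification, where real-world observations are not well represented by the simulator used for training. Existing methods that use unlabeled real-world data typically align the simulated and real-world data distributions, but marginal alignment alone does not directly preserve the parameter-relevant information needed for posterior inference. We propose SPIN, an SBI framework with parameter-relevant, information-preserving domain transfer that uses unlabeled, unpaired real-world observations. During training, SPIN translates labeled simulator observations to the real-world domain and back to the simulator domain, using the original simulator labels to encourage domain transfer that preserves parameter-relevant mutual information. At test time, the learned real-to-simulator transport maps real-world observations into the simulator domain for posterior inference, without real-world parameter labels or paired real-simulator observations. On controlled synthetic and physical real-world benchmarks, SPIN improves real-world posterior inference, and the improvement becomes clearer as misspecification increases.

English Abstract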

Simulation-based inference (SBI) provides amortized Bayesian parameter inference from simulator-generated data without requiring explicit likelihood evaluation. Its reliability can degrade under model misspecification, where real-world observations are not well represented by the simulator used for training. Existing methods using unlabeled real-world data often align simulated and real-world data distributions, but marginal alignment alone does not directly preserve parameter-relevant information needed for posterior inference. We propose SPIN, an SBI framework with parameter-relevant information-preserving domain transfer using unlabeled, unpaired real-world observations. During training, SPIN translates labeled simulator observations toward the real-world domain and back to the simulator domain, using the original simulator labels to encourage domain transfer that preserves parameter-relevant mutual information. At test time, the learned real-to-simulator transport maps real-world observations into the simulator domain for posterior inference, without requiring real-world parameter labels or paired real--simulator observations. Across controlled synthetic and physical real-world benchmarks, SPIN improves real-world posterior inference, with the improvement becoming clearer as misspecification increases.

2605.05112 2026-05-18 cs.LG

Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

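Rollout Pass-Rate Control: Steering Binary-Reward Reinforcement Learning Toward Its Most Informative Regime

Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu, Dawei Yin, Dou Shen

AI Summary This paper proposes Prefix Sampling, which replays self-generated trajectory prefixes to steer binary-reward reinforcement learning toward its most informative regime, improving training efficiency and performance.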

Comments 25 pages, 8 figures, 12 tables; revised formatting

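Details
AI Abstract (translated from Chinese)

Agentic reinforcement learning for software engineering spends much of its compute on stateful trajectories whose binary rewards are highly skewed and weakly contrastive. This paper frames the issue as a pass-rate control problem and shows that, under four criteria, the binary reward signal is strongest near a 50% rollout pass rate. We propose Prefix Sampling (PS), which replays self-generated trajectory prefixes to steer skewed groups toward this regime: successful prefixes give mostly failing groups a head start, while failing prefixes handicap mostly passing groups. Replayed states are reconstructed through the existing rollout path, and replayed tokens are masked from the loss so that only current-policy continuations are optimized. On SWE-bench Verified, PS reaches the baseline high-score regime within evaluation variability while delivering 2.01x and 1.55x end-to-end wall-clock speedups on Qwen3-14B and Qwen3-32B, respectively; the 14B peak improves from 0.274 to 0.295. AIME 2025 experiments on 4B and 8B models show the same pass-rate-control pattern, and 4B ablations attribute the gains to replay, bidirectional coverage, and adaptive control.

English Abstract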

Agentic reinforcement learning (RL) for software engineering spends much of its compute on stateful trajectories whose grouped binary rewards are highly skewed and weakly contrastive. We frame this as pass-rate control and show that the binary reward-side signal is strongest near a 50% rollout pass rate under four criteria: reward entropy, group-filtering survival, leave-one-out (RLOO) advantage energy under Group Relative Policy Optimization (GRPO), and success-failure pair count. We propose Prefix Sampling (PS), which replays self-generated trajectory prefixes to steer skewed groups toward this regime: successful prefixes give mostly failing groups a head start, while failing prefixes handicap mostly passing groups. Replayed states are reconstructed through the existing rollout path, and replayed tokens are masked from the loss so optimization applies only to current-policy continuations. On SWE-bench Verified, PS reaches the baseline high-score regime within evaluation variability while delivering 2.01x and 1.55x end-to-end wall-clock speedups on Qwen3-14B and Qwen3-32B; the 14B peak improves from 0.274 to 0.295. AIME 2025 experiments on 4B and 8B show the same pass-rate-control pattern, and 4B ablations attribute gains to replay, bidirectional coverage, and adaptive control.
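
Three of the four criteria can be checked numerically for a single rollout group. The sketch below is an illustration under stated assumptions, not the paper's code: "advantage energy" is taken to be the sum of squared leave-one-out advantages, the group size of 8 is arbitrary, and group-filtering survival is omitted because the abstract does not define the filter.

```python
import math

def reward_entropy(p):
    """Binary entropy (in bits) of a pass rate p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def pair_count(k, n):
    """Number of (success, failure) pairs in a group with k successes out of n."""
    return k * (n - k)

def loo_advantage_energy(rewards):
    """Sum of squared leave-one-out advantages: r_i minus the mean of the others."""
    n, total = len(rewards), sum(rewards)
    return sum((r - (total - r) / (n - 1)) ** 2 for r in rewards)

def signal_table(n):
    """One row per success count k: (pass rate, entropy, pair count, LOO energy)."""
    rows = []
    for k in range(n + 1):
        rewards = [1.0] * k + [0.0] * (n - k)
        rows.append((k / n, reward_entropy(k / n), pair_count(k, n),
                     loo_advantage_energy(rewards)))
    return rows

rows = signal_table(8)
for row in rows:
    print(row)
```

For binary rewards the leave-one-out energy simplifies to k(n−k)·n/(n−1)², so it peaks at the same pass rate as the pair count; groups where every rollout passes or every rollout fails score zero on all three columns.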

2605.03964 2026-05-18 cs.LG physics.chem-ph

Pretrained Model Representations as Acquisition Signals for Active Learning of MLIPs

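Pretrained Model Representations as Acquisition Signals for Actively Learning MLIPs

Eszter Varga-Umbrich, Shikha Surana, Paul Duckworth, Jules Tilly, Olivier Peltre, Zachary Weller-Davies

AI Summary This paper investigates whether the latent space of a pretrained MLIP already contains the information needed for effective acquisition, and proposes two acquisition signals that make active learning more efficient and reduce data requirements.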

Comments 8 main pages, 28 total pages

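Details
AI Abstract (translated from Chinese)

Training machine learning interatomic potentials (MLIPs) for reactive chemistry is often limited by the high cost of quantum chemical labels and the scarcity of transition state configurations in candidate pools. Active learning (AL) can reduce these costs, but its effectiveness depends on the acquisition rule. We investigate whether the latent space of a pretrained MLIP already contains the information needed for effective acquisition, without auxiliary uncertainty heads, Bayesian training and fine-tuning, or committee ensembles. We introduce two acquisition signals derived directly from a pretrained MACE potential: a finite-width neural tangent kernel (NTK) and an activation kernel built from hidden latent-space features. On reactive-chemistry benchmarks, both kernels outperform fixed-descriptor baselines, committee disagreement, and random acquisition, reducing the data required to reach performance targets by an average of 38% for energy error and 28% for force error. We further show that the similarity spaces induced by the pretrained model preserve chemically meaningful structure and give more reliable residual uncertainty estimates than randomly initialised or fixed-descriptor-based kernels. Our results suggest that pretraining aligns latent-space geometry with model error, yielding a practical and sufficient acquisition signal for reactive MLIP fine-tuning.

English Abstract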

Training machine learning interatomic potentials (MLIPs) for reactive chemistry is often bottlenecked by the high cost of quantum chemical labels and the scarcity of transition state configurations in candidate pools. Active learning (AL) can mitigate these costs, but its effectiveness hinges on the acquisition rule. We investigate whether the latent space of a pretrained MLIP already contains the information necessary for effective acquisition, eliminating the need for auxiliary uncertainty heads, Bayesian training and fine-tuning, or committee ensembles. We introduce two acquisition signals derived directly from a pretrained MACE potential: a finite-width neural tangent kernel (NTK) and an activation kernel built from hidden latent space features. On reactive-chemistry benchmarks, both kernels consistently outperform fixed-descriptor baselines, committee disagreement, and random acquisition, reducing the data required to reach performance targets by an average of 38% for energy error and 28% for force error. We further show that the pretrained model induces similarity spaces that preserve chemically meaningful structure and provide more reliable residual uncertainty estimates than randomly initialised or fixed-descriptor-based kernels. Our results suggest that pretraining aligns latent-space geometry with model error, yielding a practical and sufficient acquisition signal for reactive MLIP fine-tuning.

2605.03258 2026-05-18 cs.LG cs.CL

The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It

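The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It

Gabriel Garcia

AI Summary The study finds that Transformers fail at counting tasks because the output pathway is misaligned with the required tokens; targeted updates to the output head and attention projections improve counting performance.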

Comments 27 pages, 3 figures, 18 tables. Code: https://github.com/Gpgabriel25/GeometricReadoutBottleneck

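Details
AI Abstract (translated from Chinese)

Large language models often struggle with simple counting tasks, even when the items to count appear in the prompt. Analyzing three model families (Pythia, Qwen3, and Mistral, from 0.4B to 14B parameters), this paper finds that the main cause is that Transformers fail to convert count information into the correct output tokens. Linear probes recover the correct count from intermediate layers (R²>0.99), but the count directions are nearly orthogonal to the digit-token rows of the output head (|cos|≤0.032). Targeted updates to the digit rows of the output head and a small LoRA on attention substantially improve counting performance. Logit-lens analysis at layer 35 shows the median rank of the correct digit dropping from order 10^4 to 1 across three seeds. These results indicate that counting failure is a geometric readout bottleneck rather than an internal-representation failure: the model knows the count, but its output pathway is misaligned with the tokens needed to express it.

English Abstract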

Large language models often fail at simple counting tasks, even when items to count are in the prompt. We investigate whether this failure occurs because transformers do not represent counts internally, or because they cannot convert representations to the correct output tokens. Across three model families: Pythia, Qwen3, and Mistral, ranging from 0.4B to 14B parameters, we find evidence for the second explanation. Linear probes recover the correct count from intermediate layers with $R^2>0.99$, showing that the information is present. However, the internal directions that encode counts are nearly orthogonal to digit-token output-head rows ($|\cos| \leq 0.032$). In other words, the model stores the count in a form that the digit logits do not naturally read out. We localize this failure with two interventions. Updating only the digit rows of the output head (36,864 parameters) substantially improves constrained digit prediction (60.7--100.0% on four tasks), but it does not fix unconstrained generation (0%); we do not claim that digit-row repair fixes open-ended text. By contrast, small LoRA on attention Q/V (7.67M parameters) improves upstream routing and achieves 83.1%$\pm$7.2% in true greedy autoregressive generation (deployable fix). Logit-lens at layer 35 (entity counting; correct-digit rank): (i) median over 3 seeds drops from order-$10^4$ to 1; (ii) seed 42 shows $54{,}332 \to 838$ (median top-1 while one seed stays far below). Norm, logit-lens, and cross-task analyses generalize the bottleneck to counting, addition, and list length; nulls on MMLU and GSM8K and limited DROP transfer. These results identify counting failure as a geometric readout bottleneck, not an internal-representation failure: the model knows the count but the output pathway is misaligned with tokens needed to express it.
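
The |cos| statistic quoted above is a plain cosine similarity between a probe direction and each digit-token row of the output head. The sketch below shows the measurement on made-up 4-dimensional vectors; the vectors, the function names, and the use of the maximum over rows are illustrative assumptions, not values or code from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def max_abs_cosine(direction, rows):
    """Largest |cos| between a direction and any row of a (digit-row) matrix."""
    return max(abs(cosine(direction, row)) for row in rows)

# Hypothetical probe direction and two hypothetical digit rows.
count_direction = [1.0, 0.0, 0.0, 0.0]
digit_rows = [
    [0.01, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
]
print(max_abs_cosine(count_direction, digit_rows))
```

A value near zero means the digit logits barely change as the hidden state moves along the count direction, which is the readout mismatch the abstract describes.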

2605.02689 2026-05-18 cs.LG

MSMixer: Learned Multi-Scale Temporal Mixing with Complementary Linear Shortcut for Long-Term Time Series Forecasting

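MSMixer: Learned Multi-Scale Temporal Mixing with a Complementary Linear Shortcut

Ahmed Cherif

AI Summary MSMixer improves long-term time series forecasting accuracy with multi-scale branches and a complementary shortcut; with 112K parameters it achieves the lowest average MSE among lightweight models and is competitive with Transformer baselines while using fewer parameters.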

Comments 21 pages, 5 figures, 8 tables. Submitted to International Journal of Machine Learning and Cybernetics (Springer)

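Details
AI Abstract (translated from Chinese)

Long-term time series forecasting requires models that simultaneously capture rapid oscillations, medium-range periodicities, and slowly evolving macro-trends. Existing lightweight MLP models typically operate at a single temporal resolution, which limits their ability to model multi-scale patterns explicitly. We propose MSMixer, a channel-independent multi-scale MLP architecture that addresses this limitation with three complementary innovations: (i) three parallel scale branches at down-sample factors {1x, 4x, 16x}, each with independent MLP blocks; (ii) a learnable softmax gate that dynamically weights the branch outputs; and (iii) a DLinear complementary shortcut that provides full-window trend and seasonality context. MSMixer has only 112K parameters at H=96 and runs at O(T) complexity. On four ETT benchmarks with standard chronological splits and three random seeds, MSMixer achieves the lowest average MSE (0.357) among lightweight models, outperforming DLinear (0.386, -7.4%) and NLinear (0.365, -2.1%) and winning 12 of 16 configurations. Against five Transformer baselines, MSMixer achieves the best or second-best MSE in 9 of 16 configurations while using 5x fewer parameters than PatchTST. Ablation and sensitivity analyses confirm the complementary contributions of the multi-scale branches and the DLinear shortcut.

English Abstract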

Long-term time series forecasting requires models that simultaneously capture rapid oscillations, medium-range periodicities, and slowly evolving macro-trends from a fixed look-back window. Existing lightweight MLP-based models typically operate on a single temporal resolution, limiting their ability to explicitly model patterns at multiple scales. We propose MSMixer, a channel-independent multi-scale MLP architecture that addresses this limitation through three complementary innovations: (i) three parallel scale branches at down-sample factors {1x, 4x, 16x} with independent MLP blocks, (ii) a learnable softmax gate that dynamically weighs branch outputs, and (iii) a DLinear complementary shortcut that provides full-window trend and seasonality context. MSMixer contains only 112K parameters at H=96 and runs at O(T) complexity. Evaluated on four ETT benchmarks with standard chronological splits and three random seeds, MSMixer achieves the lowest average MSE (0.357) among lightweight models, outperforming DLinear (0.386, -7.4%) and NLinear (0.365, -2.1%), winning 12 of 16 configurations. Against five Transformer-based baselines from the literature, MSMixer achieves best or second-best MSE in 9 of 16 configurations while using 5x fewer parameters than PatchTST. Ablation and sensitivity analyses confirm the complementary contributions of the multi-scale branches and the DLinear shortcut.

2605.02609 2026-05-18 cs.LG

Gradient-Discrepancy Acquisition for Pool-Based Active Learning

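Gradient-Discrepancy Acquisition for Pool-Based Active Learning

Mohamadsadegh Khosravani, Sandra Zilles

AI Summary This paper proposes a gradient-based acquisition criterion that can replace the uncertainty measure in uncertainty sampling or be incorporated into diversity-based methods that consider both the spread of sampled points and label uncertainty, and supports it with theory and experiments.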

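Details
AI Abstract (translated from Chinese)

The effectiveness of active learning hinges on the acquisition criterion by which a learning algorithm selects potentially informative data points. This paper proposes a new gradient-based acquisition criterion derived from a generalization bound introduced by Luo et al. (2022). The criterion can be used in place of uncertainty measures in uncertainty sampling, or incorporated into diversity-based methods that consider the spread of sampled points as well as the uncertainty of their labels. We provide a theoretical justification for the proposed criterion and demonstrate its effectiveness experimentally.

English Abstract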

The effectiveness of active learning hinges on the choice of the acquisition criterion by which a learning algorithm selects potentially informative data points whose label is subsequently queried. This paper proposes a novel gradient-based acquisition criterion, derived from a generalization bound introduced by Luo et al. (2022). This criterion can be applied in lieu of uncertainty measures in uncertainty sampling, or incorporated into diversity-based methods that consider the spread of sampled points in addition to the uncertainty of their labels. We provide a theoretical justification of the proposed acquisition criterion, and demonstrate its effectiveness in an empirical evaluation.

2605.00934 2026-05-18 cs.LG cs.CV stat.ML

Structured Analytic Coherent Point Drift for Non-Rigid Point Set Registration

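Structured Analytic Coherent Point Drift for Non-Rigid Point Set Registration

Wei Feng, Haiyong Zheng

AI Summary This paper proposes Analytic-CPD, which replaces the M-step of conventional CPD with structured analytic mappings for more efficient and controllable non-rigid point set registration; experiments on several datasets support its effectiveness and its accuracy-efficiency trade-off.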

Comments Revised version. Supplementary material incorporated as appendices; method, implementation, and experimental details expanded

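Details
AI Abstract (translated from Chinese)

Coherent Point Drift (CPD) is a probabilistic framework for unsupervised non-rigid point set registration. Its standard non-rigid M-step, however, relies on a point-indexed Gaussian-kernel system whose size grows with the number of moving points, which makes deformation estimation computationally heavy for large point sets and its complexity hard to control. To address these limitations, we propose Analytic-CPD, a new unsupervised non-rigid registration framework that gives CPD a structured analytic reformulation. Analytic-CPD keeps the CPD posterior correspondence layer but lifts the M-step from point-indexed kernel displacement estimation to structured analytic mapping estimation. By coupling CPD's Gaussian-mixture posterior mechanism with Structured Analytic Mappings (SAM), the method obtains a deformation model whose coefficient dimension is determined by the ambient dimension and the analytic order rather than by the number of moving points. More importantly, deformation estimation is organized over an interpretable hierarchy of analytic function spaces, so the analytic order can be raised progressively as the posterior correspondences become more reliable. We implement this idea with an increasing-degree continuation strategy with decreasing stage lengths: low-order analytic maps first stabilize the posterior correspondence structure, and higher-order modes then refine the nonlinear residual deformation. Experiments on controlled model-matched, smooth model-mismatch, and registered human-shape data demonstrate the effectiveness of Analytic-CPD and its favorable accuracy-efficiency performance.

English Abstract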

Coherent Point Drift (CPD) is a representative probabilistic framework for unsupervised non-rigid point set registration. Its standard non-rigid M-step, however, relies on a point-indexed Gaussian-kernel system whose size grows with the number of moving points, making deformation estimation computationally heavy for large point sets and difficult to control in complexity during registration. To address these limitations, we propose Analytic-CPD, a new unsupervised non-rigid registration framework that gives CPD a structured analytic reformulation. Analytic-CPD preserves the CPD posterior correspondence layer, but lifts the M-step from point-indexed kernel displacement estimation to structured analytic mapping estimation. By coupling the Gaussian-mixture posterior mechanism of CPD with Structured Analytic Mappings (SAM), the method obtains a deformation model whose coefficient dimension is governed by the ambient dimension and analytic order rather than by the number of moving points. More importantly, deformation estimation is organized over an interpretable hierarchy of analytic function spaces, so the analytic order can be increased progressively as posterior correspondences become more reliable. We implement this idea through an increasing-degree continuation strategy with decreasing stage lengths: low-order analytic maps first stabilize the posterior correspondence structure, while higher-order modes later refine nonlinear residual deformation. Experiments on controlled model-matched, smooth model-mismatch, and registered human-shape data demonstrate the effectiveness and favorable accuracy--efficiency performance of Analytic-CPD.

2605.00674 2026-05-18 cs.CL

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

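Beyond Benchmarks: MathArena as a Mathematics Evaluation Platform for LLMs

Jasper Dekoninck, Nikola Jovanović, Tim Gehrunger, Kári Rögnvaldsson, Ivo Petrov, Chenhao Sun, Martin Vechev

AI Summary This paper presents MathArena as a continuously maintained evaluation platform for mathematics that covers a broad range of tasks and documents the progress of frontier models in mathematical reasoning.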

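Details
AI Abstract (translated from Chinese)

Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating progress: they are often narrow in scope, quickly saturated, and rarely updated. This paper builds the MathArena platform, which covers tasks such as proof-based competitions, research-level arXiv problems, and formal proof generation in Lean, and keeps the platform challenging by maintaining a clear evaluation protocol and regularly adding new benchmarks. The strongest model, GPT-5.5, reaches 98% on the 2026 USA Math Olympiad and 74% on research-level questions, showing the rapid progress of LLMs in mathematical reasoning.

English Abstract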

Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating progress: they are often narrow in scope, quickly saturated, and rarely updated. This makes it hard to compare models reliably and track progress over time. Instead, we need evaluation platforms: continuously maintained systems that run, aggregate, and analyze evaluations across many benchmarks to give a comprehensive picture of model performance within a broad domain. In this work, we build on the original MathArena benchmark by substantially broadening its scope from final-answer olympiad problems to a continuously maintained evaluation platform for mathematical reasoning with LLMs. MathArena now covers a much wider range of tasks, including proof-based competitions, research-level arXiv problems, and formal proof generation in Lean. Additionally, we maintain a clear evaluation protocol for all models and regularly design new benchmarks as model capabilities improve to ensure that MathArena remains challenging. Notably, the strongest model, GPT-5.5, now reaches 98% on the 2026 USA Math Olympiad and 74% on research-level questions, showing that frontier models can now comfortably solve extremely challenging mathematical problems. This highlights the importance of continuously maintained evaluation platforms like MathArena to track the rapid progress of LLMs in mathematical reasoning.

2604.28111 2026-05-18 cs.RO

GSDrive: Reinforcing Driving Policies by Multi-mode Future Trajectory Probing with 3D Gaussian Splatting Environment

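GSDrive: Reinforcing Driving Policies via Multi-mode Future Trajectory Probing in a 3D Gaussian Splatting Environment

Ziang Guo, Chen Min, Xuefeng Zhang, Yixiao Zhou, Shuo Wang, Sifa Zheng, Dzmitry Tsetserukou, Zufeng Zhang

AI Summary GSDrive combines imitation learning and reinforcement learning with multi-mode trajectory probing in a 3D Gaussian Splatting environment to improve the training effectiveness and robustness of end-to-end autonomous driving.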

Comments 2nd version

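Details
AI Abstract (translated from Chinese)

End-to-end autonomous driving aims to map sensory observations directly to driving actions, but its real-world deployment is hindered by evolving data distributions and the high cost of continual annotation. Although combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards, so policies learn only from catastrophic outcomes such as collisions and converge prematurely to suboptimal behaviors. To address these limitations, we propose GSDrive, a framework that uses a differentiable 3D Gaussian Splatting (3DGS) environment for future-aware trajectory probing and reward shaping. GSDrive first learns a multi-mode trajectory probe via IL, then uses RL to evaluate multiple candidate futures in the 3DGS environment and converts their simulated returns into dense shaping rewards for policy optimization. This yields a cyclic hybrid IL-RL training loop in which IL supplies structured future priors and RL provides interactive feedback for iterative refinement. Evaluated on the reconstructed nuScenes dataset, our method outperforms other simulation-based RL approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.

English Abstract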

End-to-end (E2E) autonomous driving aims to directly map sensory observations to driving actions, but its real-world deployment is hindered by evolving data distributions and the high cost of continual annotation. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards, where policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we propose GSDrive, a framework that uses a differentiable 3D Gaussian Splatting (3DGS) environment for future-aware trajectory probing and reward shaping in E2E driving. GSDrive first learns a multi-mode trajectory probe via IL and then uses RL to evaluate multiple candidate futures in the 3DGS environment, converting their simulated returns into dense shaping rewards for policy optimization. This yields a cyclic hybrid IL-RL training loop, where IL supplies structured future priors and RL provides interactive feedback for iterative refinement. Evaluated on the reconstructed nuScenes dataset, our method outperforms other simulation-based RL approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.

2604.26733 2026-05-18 cs.AI cs.LG

FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards

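FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards

Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng

AI Summary This paper presents FutureWorld, a live reinforcement learning environment that closes the loop between prediction, outcome realization, and parameter updates, improving prediction accuracy and calibration.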

Comments The code will be released in the near future. The experiments are currently ongoing

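Details
AI Abstract (translated from Chinese)

Live future prediction is the task of making predictions about events before they happen. It is increasingly studied with agent systems based on large language models, and it is important for building agents that can learn continually from the real world. It can supply a large number of prediction questions grounded in diverse real-world events while preventing answer leakage. To take advantage of future prediction, we present FutureWorld, a live agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. Specifically, we modify and extend verl-tool into a new framework that we call verl-tool-future. Unlike standard RL training frameworks that rely on immediate rewards, verl-tool-future stores prediction-time rollouts, backfills rewards once real-world outcomes become available, and then replays the completed trajectories to update the policy. Across three open-source agents, successive rounds of FutureWorld training consistently improve prediction accuracy, probabilistic scoring, and calibration, demonstrating that delayed real-world outcome feedback can serve as an effective reinforcement learning signal.

English Abstract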

Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. It can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage the advantages of future prediction, we present FutureWorld, a live agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. Specifically, we modify and extend verl-tool, resulting in a new framework that we call verl-tool-future. Unlike standard reinforcement learning training frameworks that rely on immediate rewards, verl-tool-future stores prediction-time rollouts, backfills rewards after real-world outcomes become available, and then replays the completed trajectories for policy update. Across three open-source agents, successive FutureWorld training rounds lead to consistent improvements in prediction accuracy, probabilistic scoring, and calibration, demonstrating that delayed real-world outcome feedback can serve as an effective reinforcement learning signal.
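
The store, backfill, and replay cycle can be pictured with a small buffer. This sketch does not use the verl-tool or verl-tool-future APIs; the `Rollout` and `DelayedRewardBuffer` classes are hypothetical, and the reward of one minus the squared error between the predicted probability and the outcome (a Brier-style score) is an assumption chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rollout:
    question_id: str
    predicted_prob: float           # agent's probability that the event happens
    reward: Optional[float] = None  # unknown until the event resolves

class DelayedRewardBuffer:
    """Holds prediction-time rollouts until their real-world outcome is known."""

    def __init__(self):
        self.pending = {}    # question_id -> rollouts still waiting for an outcome
        self.completed = []  # rollouts with rewards, ready for a policy update

    def store(self, rollout):
        self.pending.setdefault(rollout.question_id, []).append(rollout)

    def backfill(self, question_id, outcome):
        """Attach rewards once the outcome (True/False) is available."""
        resolved = self.pending.pop(question_id, [])
        for rollout in resolved:
            # Assumed reward: 1 minus squared error against the realized outcome.
            rollout.reward = 1.0 - (rollout.predicted_prob - float(outcome)) ** 2
        self.completed.extend(resolved)
        return len(resolved)

    def drain(self):
        """Return completed rollouts for replay and clear the completed list."""
        batch, self.completed = self.completed, []
        return batch

buffer = DelayedRewardBuffer()
buffer.store(Rollout("q1", 0.8))
buffer.store(Rollout("q2", 0.3))
resolved_now = buffer.backfill("q1", True)  # q1 resolves; q2 is still open
batch = buffer.drain()
print(resolved_now, batch, list(buffer.pending))
```

Only rollouts whose questions have resolved reach the training batch; the rest stay pending, which is how the loop tolerates rewards that arrive days or weeks after the prediction.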

2604.25384 2026-05-18 cs.CL

Wiki Dumps to Training Corpora: South Slavic Case

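Wiki Dumps to Training Corpora: The South Slavic Case

Mihailo Škorić, Cosimo Palma

AI Summary This paper presents a pipeline that turns Wikimedia dumps into quality corpora for seven South Slavic languages; text extraction, cleaning, and redundancy filtering improve corpus quality and provide a reliable resource for language model training and cross-lingual comparison.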

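Details
AI Abstract (translated from Chinese)

This paper presents a pipeline that transforms raw Wikimedia dumps into quality text corpora for seven South Slavic languages. The work has two main phases. The first extracts and cleans text from raw dumps of Wikipedia, Wikisource, Wikibooks, Wikinews, and Wikiquote, which requires careful handling of raw wiki markup to isolate textual articles and extract usable natural language text. The second addresses questionable or low-quality articles, which are often generated from databases or structured knowledge bases and are characterised by repetitive patterns, generic phrasing, and little or no original content. To mitigate their impact, an n-gram-based filtering strategy detects high textual redundancy between articles and removes such articles entirely. The resulting corpora aim to provide linguistically rich text suitable for training language models or for comparative research across South Slavic languages. By combining systematic extraction with quality control, this work contributes to the creation of reliable, high-information corpora that reflect the authentic cultural contexts of the languages. Although the paper focuses on the South Slavic case, the approach is mostly language-agnostic and can be generalised to other languages.

English Abstract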

This paper presents a pipeline designed to transform raw Wikimedia dumps into quality textual corpora for seven South Slavic languages. The work is divided into two major phases. The first involves extracting and cleaning text from raw dumps of Wikipedia, Wikisource, Wikibooks, Wikinews, and Wikiquote. This step requires careful handling of raw wiki markup to isolate, first of all, textual articles, and then usable natural language text within them. The second phase addresses the challenge of questionable or low-quality articles, which are often generated from databases or structured knowledge bases. These articles are characterised by repetitive patterns, generic phrasing, and minimal to no original content. To mitigate their impact, an n-gram-based filtering strategy was employed to detect high levels of textual redundancy between articles and then remove such articles from the corpora entirely. The resulting datasets aim to provide linguistically rich texts suitable for training language models or conducting comparative research across South Slavic languages. By combining systematic extraction with quality control, this work contributes to the creation of reliable, high-information corpora that reflect the authentic cultural contexts of languages. While focused on the South Slavic case in the paper, the approach is mostly language-agnostic and can be generalised to other languages.
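
One way to picture the second phase is a pairwise n-gram overlap check. The sketch below is an assumption-laden illustration, not the authors' pipeline: it uses word trigrams, Jaccard similarity, a threshold of 0.5, and a quadratic all-pairs comparison, none of which are stated in the abstract. It does follow the abstract in removing every article involved in a highly redundant pair. The sample sentences are invented.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams (as tuples) in lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def filter_redundant(articles, n=3, threshold=0.5):
    """Drop every article that overlaps heavily with at least one other article."""
    grams = [word_ngrams(text, n) for text in articles]
    redundant = set()
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if jaccard(grams[i], grams[j]) >= threshold:
                redundant.update((i, j))  # both sides look template-generated
    return [text for k, text in enumerate(articles) if k not in redundant]

# Two invented template-style stubs and one ordinary sentence.
stub_a = "Donja Vas is a village in the municipality of Brod in the central region of the country"
stub_b = "Gornja Vas is a village in the municipality of Brod in the central region of the country"
prose = "The river bends sharply past the old mill before it reaches the quiet harbour town"
kept = filter_redundant([stub_a, stub_b, prose])
print(kept)
```

For full-size dumps the all-pairs loop would be too slow; hashing-based near-duplicate detection is a common substitute, but that is outside what the abstract describes.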

2604.21251 2026-05-18 cs.LG cs.AI

CAP: Controllable Alignment Prompting for Unlearning in LLMs

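CAP: Controllable Alignment Prompting for Unlearning in Large Language Models

Zhaokun Wang, Jinyu Guo, Jingwen Pu, Hongli Pu, Meng Yang, Xunlei Chen, Jie Ou, Wenyi Li, Guangchun Luo, Wenhong Tian

AI Summary This paper proposes CAP, a framework that uses reinforcement learning to turn unlearning into a learnable prompt-optimization process, achieving controllable unlearning without updating model parameters and addressing the high computational cost and uncontrollable forgetting boundaries of existing methods.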

Comments Accepted to ACL 2026 Main Conference

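Details
AI Abstract (translated from Chinese)

Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, which calls for selective knowledge unlearning to meet regulatory compliance and ethical safety requirements. However, existing parameter-modifying methods face fundamental limitations: high computational cost, uncontrollable forgetting boundaries, and a strict dependence on access to model weights. These constraints make them impractical for closed-source models, while current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP uses reinforcement learning to decouple unlearning into a learnable prompt-optimization process, in which a prompt generator collaborates with the LLM to suppress target knowledge while selectively preserving general capabilities. The approach enables reversible knowledge restoration through prompt revocation. Extensive experiments show that CAP achieves precise, controllable unlearning without updating model parameters, establishing a dynamic alignment mechanism that overcomes the transferability limitations of prior methods.

English Abstract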

Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. However, existing parameter-modifying methods face fundamental limitations: high computational costs, uncontrollable forgetting boundaries, and strict dependency on model weight access. These constraints render them impractical for closed-source models, yet current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting for Unlearning (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP decouples unlearning into a learnable prompt optimization process via reinforcement learning, where a prompt generator collaborates with the LLM to suppress target knowledge while preserving general capabilities selectively. This approach enables reversible knowledge restoration through prompt revocation. Extensive experiments demonstrate that CAP achieves precise, controllable unlearning without updating model parameters, establishing a dynamic alignment mechanism that overcomes the transferability limitations of prior methods.