arXivDaily: arXiv daily paper digest, updated Monday through Friday
All subject categories: 2079
2606.06494 2026-06-05 cs.LG

TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau, Florin Brad

Affiliations: Bitdefender

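AI summary: Proposes TailLoR, which uses the singular bases of the pre-trained weights as a fixed reference frame, learns a low-rank update to the singular-value matrix, and applies a soft spectral penalty that suppresses updates aligned with dominant singular directions, reducing interference while enabling fine-grained adaptation.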

Abstract

Parameter-efficient finetuning methods based on spectral decomposition have enabled progress in Continual Learning. In this paper we introduce TailLoR, which utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft spectral penalty discourages updates aligned with dominant singular directions, reducing interference while routing fine-grained adaptation into the highly flexible, long-tail spectral coordinates.
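A minimal NumPy sketch of the reparameterization described above, assuming the low-rank update is a product `A @ B` added to the diagonal singular-value matrix, and that the soft spectral penalty weights each entry of the update by the normalized singular values of its row and column. The function names and the exact penalty weighting are hypothetical illustrations, not TailLoR's published formulation.

```python
import numpy as np

def svd_frame(W):
    # Pre-trained weights define a fixed reference frame: W = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def adapted_weight(U, s, Vt, A, B):
    # Only A (k x r) and B (r x k) would be trained; U, s, Vt stay frozen
    return U @ (np.diag(s) + A @ B) @ Vt

def spectral_penalty(s, A, B):
    # Hypothetical soft penalty: entry (i, j) of the update costs more when it
    # touches dominant singular directions (large s[i] or s[j])
    w = s / s.max()
    weight = w[:, None] + w[None, :]
    return float(np.sum(weight * (A @ B) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
U, s, Vt = svd_frame(W)

# A zero update (A = 0) reproduces the pre-trained weights
A0, B0 = np.zeros((4, 2)), rng.normal(size=(2, 4))

# The same unit-size update placed on the head vs. the tail singular coordinate
A_head, B_head = np.zeros((4, 1)), np.zeros((1, 4))
A_head[0, 0], B_head[0, 0] = 1.0, 1.0
A_tail, B_tail = np.zeros((4, 1)), np.zeros((1, 4))
A_tail[3, 0], B_tail[0, 3] = 1.0, 1.0
penalty_head = spectral_penalty(s, A_head, B_head)
penalty_tail = spectral_penalty(s, A_tail, B_tail)
```

In a training loop, a term such as `lam * spectral_penalty(s, A, B)` (with a hypothetical strength `lam`) would be added to the task loss, so updates on long-tail coordinates are cheaper than updates on dominant ones.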

2606.06491 2026-06-05 cs.RO cs.AI

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu, Huaxiu Yao, Zhiwu Lu, Mingyu Ding

Affiliations: RUC (Renmin University of China); FDU (Fudan University); UNC (University of North Carolina at Chapel Hill)

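AI summary: Proposes TempoVLA, which combines variable-speed trajectory augmentation with a speed-conditioning mechanism to give robot manipulation policies flexible speed control in both directions (faster and slower), including dynamic speed adjustment.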

Abstract

Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from one fixed speed to another, and leave deceleration almost unexplored. We observe that the magnitude of each predicted action already governs how fast the robot moves, opening a direct route to controllable execution speed. We turn this observation into TempoVLA, a single VLA whose execution speed is controlled by an explicit condition. TempoVLA combines two coupled components: (1) a data-side Variable-Speed Trajectory Augmentation (VSTA) that re-times demonstrations to any target speed by merging or splitting actions while preserving their motion semantics, and (2) a model-side conditioning mechanism that feeds the speed to the policy. Statistics show that VSTA reaches the requested speed with negligible motion error. Experiments in simulation and on real-world tasks demonstrate that TempoVLA achieves flexible speed control in both directions, while VSTA additionally boosts the default $1\times$ performance via better data utilization. Furthermore, by cooperating with a large multimodal model, TempoVLA realizes dynamic speed control, accelerating through low-risk phases and decelerating for high-risk ones.
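The merge/split re-timing can be sketched in plain Python, assuming each action is a per-step end-effector displacement (so its magnitude sets the speed) and restricting to integer speed-up or slow-down factors. This illustrates the idea only; it is not the paper's VSTA implementation, and the function names are hypothetical.

```python
def merge_actions(actions, k):
    # Speed up by k: sum each run of k consecutive displacements into one action
    merged = []
    for i in range(0, len(actions), k):
        group = actions[i:i + k]
        merged.append([sum(dim) for dim in zip(*group)])
    return merged

def split_actions(actions, k):
    # Slow down by k: replace each displacement with k equal sub-steps
    out = []
    for a in actions:
        out.extend([x / k for x in a] for _ in range(k))
    return out

def total_displacement(actions):
    # Overall motion, which re-timing should leave unchanged
    return [sum(dim) for dim in zip(*actions)]

demo = [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0], [2.0, 0.0]]
fast = merge_actions(demo, 2)   # 2x speed: half as many, larger steps
slow = split_actions(demo, 2)   # 0.5x speed: twice as many, smaller steps
```

Both re-timed trajectories end at the same place as the original; non-integer factors would need interpolation between actions, which this sketch leaves out.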

2606.06486 2026-06-05 cs.LG cs.AI cs.GT

Regret Minimization with Adaptive Opponents in Repeated Games

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

Affiliations: Massachusetts Institute of Technology; OpenAI; University of Maryland, College Park

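AI summary: Studies regret minimization against adaptive opponents in repeated games. Proposes the Repeated Policy Regret (RP-Regret) metric, designs three algorithms that achieve sublinear regret, and shows that certain subgame perfect equilibria can be learned when all players minimize this regret.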

Abstract

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce Repeated Policy Regret (RP-Regret), a game-theoretic metric that measures the difference between the realized and the best-in-hindsight accumulated utility when all players can respond to the history of play. Compared to existing regret notions in this setting, ours is native to repeated game playing, enabling stronger comparators and opponents with fewer constraints, while maintaining the possibility of finding better equilibria when all players minimize it. We first identify necessary conditions for obtaining RP-Regret sublinear in time, on the variation of the player's comparator strategies in the regret definition and on the memories of both the comparator and opponents' strategies. We then study additional conditions and provable algorithms to minimize RP-Regret, which is by definition non-convex in the strategy space. To address this challenge, we propose three algorithms: (i) one based on an optimization oracle, as assumed in some prior work in online non-convex learning; (ii) one that minimizes a convex and linearized surrogate of RP-Regret at each iteration; (iii) one that directly minimizes RP-Regret when opponents change strategies slowly. Furthermore, when all players can run algorithms to minimize RP-Regret (or its linearized variant), certain subgame perfect equilibria of the repeated game can be learned. We also provide experiments showing that minimizing our regret notions can lead to more cooperative solutions with higher utility in games such as Stag-Hunt.
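To see why external regret misses adaptivity, here is a toy Stag-Hunt example with a memory-1 opponent that copies the player's previous action. It contrasts external regret (opponent actions held fixed) with a policy-style regret in which the opponent is re-simulated under each comparator. This is a simplified illustration of the general idea, not the paper's RP-Regret definition, and the payoff values are assumed.

```python
# Row player's payoff for (player action, opponent action); S = Stag, H = Hare
PAYOFF = {("S", "S"): 4, ("S", "H"): 0, ("H", "S"): 3, ("H", "H"): 2}

def copycat(history):
    # Adaptive opponent with memory 1: repeats the player's last action, opens with Stag
    return "S" if not history else history[-1]

def play(player_actions, opponent=copycat):
    history, opp_actions, total = [], [], 0
    for a in player_actions:
        b = opponent(history)
        total += PAYOFF[(a, b)]
        opp_actions.append(b)
        history.append(a)
    return total, opp_actions

def external_regret(player_actions):
    # Comparator: best fixed action against the realized opponent sequence
    total, opp_actions = play(player_actions)
    best = max(sum(PAYOFF[(a, b)] for b in opp_actions) for a in "SH")
    return best - total

def policy_regret(player_actions):
    # Comparator: best fixed action with the opponent re-simulated in response to it
    total, _ = play(player_actions)
    best = max(play([a] * len(player_actions))[0] for a in "SH")
    return best - total

always_hare = ["H"] * 10
```

Always playing Hare is optimal in hindsight against the realized opponent sequence, so its external regret is 0, yet it forgoes the 4-per-round payoff the opponent would have reciprocated; the policy-style regret is 2T - 1 = 19 for T = 10 rounds, growing linearly in T.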

2606.06485 2026-06-05 cs.CV

PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang, Liujuan Cao

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University

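AI summary: Proposes the PAR3D framework, which uses part-aware 3D representation learning and hierarchical segmentation query generation to address the weak fine-grained part understanding of existing 3D-MLLMs, with significant gains on part-level question answering and referring segmentation.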

Comments
Project page: https://atrovast.github.io/PAR3D/

Abstract

Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part-aware 3D-MLLM framework that enables models to understand, reason about, and ground both objects and their parts in 3D scenes. To enable training and evaluation of part-aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part-level annotations and language instructions. We further develop Part-Aware 3D Representation Learning to enrich 3D visual representations with fine-grained part-level semantics, and propose Hierarchical Segmentation Query Generation to ground part targets via hierarchical object-part queries. Extensive experiments show that our method substantially improves part-level question answering and referring segmentation, while also achieving strong performance across object-level vision-language tasks.

2606.06481 2026-06-05 cs.CL cs.AI cs.LG

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao, Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Hao Li, Salman Khan, Zhiqiang Shen

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; University College London

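AI summary: Proposes the OpAI-Bench benchmark, which simulates human-AI collaborative editing through nine progressive revisions and five AI edit operations and supports detection at document, sentence, token, and span granularity. It shows that AI-text detectability depends on the edit operation, domain, and cumulative revision history, and that mixed-authorship intermediate versions are often harder to detect than fully human or heavily AI-edited endpoints.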

Comments
Our code and data are available at https://github.com/VILA-Lab/OpAI-Bench

Abstract

As AI writing assistants become increasingly integrated into real-world drafting and revision workflows, many documents are no longer purely human-written or AI-generated, but instead result from progressive human-AI co-editing. However, existing AI-text detection benchmarks largely focus on final outputs and provide limited understanding of how AI authorship signals emerge, accumulate, or disappear throughout the revision process. We introduce OpAI-Bench, an operation-guided benchmark for studying progressive human-to-AI text transformation across document, sentence, token, and span granularities. Starting from human-written documents, OpAI-Bench constructs nine sequentially revised versions for each sample under predefined AI coverage levels and five representative AI edit operations, covering four domains while preserving complete authorship provenance at multiple granularities. The benchmark supports comprehensive evaluation with 8 document-level detectors, 7 sentence-level detectors, and 2 fine-grained token/span-level detectors. Experiments reveal that AI-text detectability is governed not only by the proportion of AI-edited content, but also by edit operation, domain, and cumulative revision history. Interestingly, we notice that mixed-authorship intermediate versions are often harder to detect than both fully human and heavily AI-edited endpoints, exposing non-monotonic detection patterns missed by existing benchmarks. OpAI-Bench provides a controlled testbed for analyzing whether, when, and how AI-assisted writing becomes detectable under realistic progressive editing scenarios. Our code and benchmark are available at https://github.com/VILA-Lab/OpAI-Bench.
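A hypothetical sketch of how per-token authorship provenance might be carried through sequential AI edits, so that a document-level AI-coverage value and span-level labels can be read off any revision. The data layout and the function names are assumptions for illustration, not OpAI-Bench's actual format.

```python
def apply_ai_edit(tokens, labels, start, end, new_tokens):
    # Replace tokens[start:end] with AI-written tokens; every token keeps an author label
    new_labels = labels[:start] + ["ai"] * len(new_tokens) + labels[end:]
    return tokens[:start] + new_tokens + tokens[end:], new_labels

def ai_coverage(labels):
    # Document-level signal: fraction of tokens currently authored by AI
    return labels.count("ai") / len(labels)

def ai_spans(labels):
    # Span-level signal: maximal runs of AI-authored tokens as (start, end) pairs
    spans, start = [], None
    for i, lab in enumerate(labels + ["human"]):  # sentinel closes a trailing run
        if lab == "ai" and start is None:
            start = i
        elif lab != "ai" and start is not None:
            spans.append((start, i))
            start = None
    return spans

v0 = "the cat sat on the mat".split()
labels0 = ["human"] * len(v0)
v1, labels1 = apply_ai_edit(v0, labels0, 2, 5, ["rested", "upon", "the"])
```

A further revision would apply another edit to `(v1, labels1)`, so the provenance of every version in the chain stays recoverable.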

2606.06479 2026-06-05 cs.LG cs.AI

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

Affiliations: MIT

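AI summary: Proposes Supervised Memory Training (SMT), which turns recurrent neural network training into supervised learning on one-step memory-transition labels, enabling time-parallel training with stable gradient paths; in the reported pretraining experiments it outperforms backpropagation through time (BPTT).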

Comments
30 pages, 23 figures

Abstract

Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients, making long-range associations difficult to learn. We propose Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely by reducing RNN training to supervised learning on one-step memory transition labels $(m_t, x_{t+1}) \rightarrow m_{t+1}$. SMT acquires these memory labels by training a Transformer-based encoder on a predictive state objective, retaining only information from the past necessary to predict the future. By decoupling what to remember from how to update memory, SMT enables time-parallel RNN training with a stable $O(1)$-length gradient path between any two tokens, without ever unrolling the RNN. We find that SMT outperforms BPTT when pretraining various RNN architectures on tasks like language modeling and pixel sequence modeling. SMT enables nonlinear RNNs to better capture long-range dependencies and train in parallel, potentially unlocking the scaling of models that build temporal abstractions of past experience.
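A minimal NumPy sketch of the core reduction, assuming a single-layer tanh RNN cell and memory labels that are already available. Here the labels come from a hypothetical "teacher" cell standing in for SMT's Transformer-based encoder, which is not reproduced. Because every $(m_t, x_{t+1}) \rightarrow m_{t+1}$ pair is an independent regression sample, the loss over all timesteps is computed in one batched operation without unrolling the RNN.

```python
import numpy as np

def cell(W, m, x):
    # One-step transition m_{t+1} = tanh(W [m_t; x_{t+1}]), batched over rows
    return np.tanh(np.concatenate([m, x], axis=1) @ W.T)

def smt_loss_and_grad(W, m_prev, x_next, m_next):
    # Each row is one independent transition sample: no backprop through time
    z = np.concatenate([m_prev, x_next], axis=1)
    pred = np.tanh(z @ W.T)
    err = pred - m_next
    loss = np.mean(np.sum(err ** 2, axis=1))
    grad = (2.0 / len(z)) * ((err * (1.0 - pred ** 2)).T @ z)
    return loss, grad

rng = np.random.default_rng(0)
d_m, d_x, T = 4, 3, 200
W_teacher = rng.normal(scale=0.5, size=(d_m, d_m + d_x))
xs = rng.normal(size=(T, d_x))

# Stand-in memory labels m_0..m_T (this sequential loop only generates data;
# SMT would obtain the labels from its encoder instead)
mem = np.zeros((T + 1, d_m))
for t in range(T):
    mem[t + 1] = cell(W_teacher, mem[t:t + 1], xs[t:t + 1])[0]

W = np.zeros_like(W_teacher)
loss0, _ = smt_loss_and_grad(W, mem[:-1], xs, mem[1:])
for _ in range(500):
    loss, grad = smt_loss_and_grad(W, mem[:-1], xs, mem[1:])  # all timesteps at once
    W -= 0.5 * grad
```

The gradient for each weight touches only one transition at a time, which is why the path length between any two tokens does not grow with sequence length in this formulation.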

2606.06477 2026-06-05 cs.CV

Complexity-Balanced Diffusion Splitting

Noam Issachar, Dani Lischinski, Raanan Fattal

Affiliations: The Hebrew University of Jerusalem

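AI summary: Proposes the Complexity-Balanced Splitting (CBS) framework, which partitions the diffusion timeline into segments of equal approximation burden and allocates more capacity to difficult regions, improving generation quality across multiple architectures and datasets without increasing per-step inference cost.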

Abstract

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS), a principled framework for temporal capacity allocation that distributes the generative workload across multiple specialized sub-networks. Grounded in function approximation theory and de Boor's equidistribution principle, CBS partitions the diffusion timeline into segments of equal approximation burden, allocating more representational capacity to regions where the generative dynamics are more difficult to model. To estimate this local complexity, we introduce two complementary and tractable monitor functions: a spatial measure based on the flow's Dirichlet energy, and a geometric measure based on the acceleration of the sampling trajectories. Using a lightweight auxiliary model to estimate these complexity profiles, our approach eliminates the need for heuristic temporal splits or computationally expensive search procedures. Extensive evaluation across multiple architectures (SiT, JiT, and UNet) and datasets demonstrates that CBS consistently improves synthesis quality without increasing per-step inference cost. In particular, CBS improves FID by ~35% on SiT-XL with CFG relative to naive temporal partitioning. Project page is available at https://noamissachar.github.io/CBS/.
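De Boor's equidistribution principle can be sketched as inverting the cumulative integral of a monitor function: breakpoints are placed so that every segment carries the same share of the integral. The monitor below is an arbitrary positive toy function standing in for CBS's Dirichlet-energy or trajectory-acceleration estimates, and `equidistribute` is a hypothetical helper name.

```python
import numpy as np

def equidistribute(t, monitor, n_segments):
    # Cumulative integral of the (strictly positive) monitor via the trapezoidal rule
    increments = 0.5 * (monitor[1:] + monitor[:-1]) * np.diff(t)
    cum = np.concatenate([[0.0], np.cumsum(increments)])
    # Equal shares of the total "approximation burden"
    targets = np.linspace(0.0, cum[-1], n_segments + 1)
    # Invert the monotone cumulative map to find the breakpoints in time
    return np.interp(targets, cum, t)

t = np.linspace(0.0, 1.0, 1001)
uniform = equidistribute(t, np.ones_like(t), 4)
front_loaded = equidistribute(t, np.exp(-5.0 * t), 4)
```

A constant monitor gives evenly spaced breakpoints, while a monitor concentrated near one end yields short segments there; each interval would then be assigned its own sub-network, so narrow intervals receive more capacity per unit time.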

2606.06476 2026-06-05 cs.CV

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao, Tai Wang, Jiangmiao Pang, Xihui Liu

Affiliations: The University of Hong Kong; Shanghai AI Laboratory; Shanghai Jiao Tong University; Fudan University; Beihang University

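AI summary: Proposes the Astra framework, which trains a VLM policy with reinforcement learning to interact with a Bagel-based world simulator and generate imagined visual evidence during reasoning, addressing unobserved layouts, cross-view consistency, and reasoning from alternative viewpoints in spatial reasoning.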

Comments
Project page: https://zcmax.github.io/projects/Thinking-With-Imagination

Abstract

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, we study this problem as thinking with imagination, where a VLM actively acquires imagined visual evidence by interacting with a world simulator during reasoning. We propose Astra, an agentic spatial reasoning framework that empowers VLMs with action-conditioned visual imagination. Specifically, Astra couples Astra-VL, an RL-trained VLM policy, with Astra-WM, a Bagel-based world simulator that generates novel-view observations from context images and natural-language camera motions. To provide reliable imagined evidence, Astra-WM is trained with view consistency tuning to improve pose and content consistency across views. In the RL stage, we propose a world-simulator-in-the-loop two-phase RL curriculum to stabilize tool-use exploration and advance the model's ability to invoke the simulator only when imagined observations improve over direct answering. Experiments demonstrate that both the world simulator and the agentic policy are necessary: Astra-WM improves simulator-augmented Gemini-3-Flash on MMSI-Bench from 45.1 to 49.5, while Astra-VL improves the Qwen3-VL backbone from 29.8 to 38.8 on MMSI-Bench and from 36.8 to 42.7 on MindCube. These results show that imagined observations can provide useful spatial evidence, but effective world-model-augmented reasoning requires learning when, where, and how to imagine.

2606.06475 2026-06-05 cs.LG cs.AI

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

Affiliations: ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; Cognizant AI Lab, San Francisco, USA; NXAI GmbH, Linz, Austria

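AI summary: Targets the delayed-reward problem in reinforcement-learning fine-tuning of reasoning language models. Proposes RREDCoT, which uses the model itself to approximate the optimal reward redistribution without additional generation, aiming to reduce variance and make credit assignment more efficient.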

Comments
Preprint, under review

Abstract

Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can only be verified, and the reward assigned, after the CoT trace is complete, making it a delayed reward problem. GRPO and its modifications correspond to Monte Carlo methods in standard RL, which are known to suffer from high variance. A possible solution to this problem is the redistribution of rewards through credit assignment, where segments of the CoT trace that are important for arriving at the desirable solution are emphasized by assigning a higher reward. While Monte Carlo sampling can be used to provide an unbiased estimate of intermediate state values, its computational overhead makes it unsuitable for train-time credit assignment in long contexts at high granularity. We introduce RREDCoT (Reward REDistribution for Chain of Thoughts), which utilizes the model itself to approximate the optimal reward redistribution without additional generation. We investigate the advantages of our method compared to MC sampling and several attribution methods. We further analyze several aspects relevant to the construction of the redistribution such as segmentation of CoT traces and state value estimation.
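A minimal sketch of segment-level, return-equivalent reward redistribution in the spirit of return-decomposition methods such as RUDDER, assuming per-prefix value estimates are already available. Each segment is credited with the change in estimated value it produces, and the final verified reward replaces the last estimate, so the redistributed rewards telescope to the final reward minus the initial estimate. How RREDCoT obtains these estimates from the model itself is not reproduced; the segmentation rule, names, and numbers are hypothetical.

```python
def segment_cot(trace, sep="\n\n"):
    # Hypothetical segmentation rule: split the chain of thought at blank lines
    return [seg for seg in trace.split(sep) if seg.strip()]

def redistribute(prefix_values, final_reward):
    # prefix_values[k]: estimated value before segment k (k = 0..K-1)
    values = list(prefix_values) + [final_reward]
    # Segment k is credited with the change in value it produces
    return [values[k + 1] - values[k] for k in range(len(prefix_values))]

trace = "Let x = 3.\n\nThen 2x = 6.\n\nAnswer: 6"
segments = segment_cot(trace)
rewards = redistribute([0.2, 0.3, 0.8], final_reward=1.0)
```

Here the second segment receives most of the credit (about 0.5) because it moves the value estimate the most, which is the kind of emphasis on important segments the abstract describes.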

2606.06474 2026-06-05 cs.CL cs.AI cs.LG

Self-Augmenting Retrieval for Diffusion Language Models

Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go, Kilian Q. Weinberger

Affiliations: University of California, Berkeley; Google Research

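AI summary: Proposes the SARDI framework, which uses the low-confidence tokens discarded during diffusion-language-model denoising as a lookahead signal to guide retrieval. It is training-free and retriever-agnostic, and on multi-hop question-answering benchmarks it outperforms current training-free baselines at up to 8x higher throughput.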

Comments
ICML 2026

Abstract

Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful lookahead signal for retrieval-augmented generation: even low-confidence tokens often surface salient entities early in the denoising trajectory, enabling retrieval of stronger evidence before the output is finalized. We exploit this through Self-Augmenting Retrieval for Diffusion Language Models (SARDI), a dynamic RAG framework that uses these lookahead tokens to guide retrieval during denoising. SARDI is training-free, retriever-agnostic, and applicable to any reasoning-capable discrete diffusion language model. Across five multi-hop QA benchmarks, SARDI outperforms current training-free diffusion and autoregressive retrieval baselines at up to $8\times$ higher throughput.
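A toy sketch of the lookahead idea: at a denoising step, confident tentative tokens are committed, while the unconfident ones, which would normally be discarded, are appended to the retrieval query. The confidence threshold, the lexical-overlap retriever, and all names are hypothetical stand-ins; SARDI itself is retriever-agnostic, and its exact query construction is not given in the abstract.

```python
import re

def denoise_step(tentative, threshold):
    # tentative: {position: (token, confidence)} for the currently masked positions
    committed = {i: tok for i, (tok, p) in tentative.items() if p >= threshold}
    lookahead = [tok for i, (tok, p) in sorted(tentative.items()) if p < threshold]
    return committed, lookahead

def build_query(question, committed, lookahead):
    # Lookahead tokens are used only for retrieval, not written to the output
    committed_text = " ".join(committed[i] for i in sorted(committed))
    return " ".join([question, committed_text] + lookahead)

def retrieve(query, corpus, k=1):
    # Toy lexical retriever: rank documents by word overlap with the query
    q = set(re.findall(r"\w+", query.lower()))
    def score(doc):
        return len(q & set(re.findall(r"\w+", doc.lower())))
    return sorted(corpus, key=score, reverse=True)[:k]

corpus = [
    "Shakespeare was born in Stratford-upon-Avon.",
    "Hamlet's author set it in Denmark.",
]
question = "Where was Hamlet's author born?"
tentative = {0: ("Shakespeare", 0.35), 1: ("was", 0.90), 2: ("born", 0.88),
             3: ("in", 0.93), 4: ("Stratford", 0.30)}
committed, lookahead = denoise_step(tentative, threshold=0.8)
without_lookahead = retrieve(build_query(question, committed, []), corpus)
with_lookahead = retrieve(build_query(question, committed, lookahead), corpus)
```

In this example the low-confidence guesses are exactly the entities needed for the second hop: without them the retriever prefers the Denmark document, and with them it returns the Stratford document.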

2606.06473 2026-06-05 cs.AI cs.CL

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao, Shiyang Feng, Zichen Liang, Boyuan Sun, Tianshuo Peng, Yifan Zhou, Xin Li, Jie Zhou, Liang He, Bo Zhang, Lei Bai

Affiliations: Shanghai Artificial Intelligence Laboratory; East China Normal University

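AI summary: Proposes the MLEvolve framework, which addresses inter-branch information isolation, memoryless search, and the lack of hierarchical control in LLM agents on long-horizon tasks through Progressive MCGS, Retrospective Memory, and hierarchical control. It achieves state-of-the-art results on MLE-Bench and outperforms specialized algorithm discovery methods on mathematical algorithm optimization tasks.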

Abstract

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based self-evolving multi-agent framework for end-to-end machine learning algorithm discovery. By extending tree search to Progressive MCGS, MLEvolve enables cross-branch information flow through graph-based reference edges and gradually shifts the search from broad exploration to focused exploitation with an entropy-inspired progressive schedule. To allow the agent to evolve with accumulated experience, we introduce Retrospective Memory, which combines a cold-start domain knowledge base with a dynamic global memory for task-specific experience retrieval and reuse. For stable long-horizon iteration, we further decouple strategic planning from code generation with adaptive coding modes. Evaluation on MLE-Bench shows that MLEvolve achieves state-of-the-art performance across multiple dimensions including average medal rate and valid submission rate under a 12-hour budget (half the standard runtime). Moreover, MLEvolve also outperforms specialized algorithm discovery methods including AlphaEvolve on mathematical algorithm optimization tasks, demonstrating strong cross-domain generalization. Our code is available at https://github.com/InternScience/MLEvolve.

2606.06470 2026-06-05 cs.LG cs.AI

PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao, Alex Schwing, Ruoyu Sun

Affiliations: The Chinese University of Hong Kong, Shenzhen; Google LLC; Shenzhen International Center for Industrial and Applied Mathematics; Shenzhen Research Institute of Big Data; University of Illinois at Urbana-Champaign

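AI summary: Proposes a weight parameterization via a polynomial preconditioner (the PC layer) that reshapes the singular-value spectrum of weight matrices through low-degree polynomial preconditioning, keeping weight conditioning stable during LLM training with no inference overhead after training; it outperforms the standard Transformer in Llama-1B pre-training.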

Abstract

We propose a preconditioning (PC) layer, a weight parameterization via a polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices via low-degree polynomial preconditioning. After training, the preconditioned weights can be merged back into the original architecture, incurring no inference overhead. We demonstrate the advantage of the proposed PC layer over standard transformers in Llama-1B pre-training, for both the AdamW and Muon optimizers. Theoretically, we justify this spectrum-control principle by proving that uniformly bounding each layer's singular values ensures geometric convergence of gradient descent to global minima, for certain deep linear networks. Our code is available at https://github.com/Empath-aln/PC-layer.
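A minimal NumPy sketch of polynomial preconditioning of a weight matrix, assuming the odd cubic polynomial p(s) = 1.5 s - 0.5 s^3 applied after scaling the weight to unit spectral norm. This Newton-Schulz-style choice is picked only for illustration; the paper's polynomial and normalization may differ, and `pc_weight` is a hypothetical name. Because the result is an ordinary matrix, it can be computed once after training and used in place of W, which is why this kind of reparameterization adds no inference cost.

```python
import numpy as np

def pc_weight(W, coeffs=(1.5, -0.5)):
    # W_eff = sum_k c_k (W W^T)^k W, so each singular value s becomes sum_k c_k s^(2k+1)
    Wn = W / np.linalg.norm(W, 2)  # spectral norm: singular values now lie in (0, 1]
    gram = Wn @ Wn.T
    term, out = Wn, np.zeros_like(Wn)
    for c in coeffs:
        out = out + c * term
        term = gram @ term
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
s = np.linalg.svd(W / np.linalg.norm(W, 2), compute_uv=False)
s_eff = np.linalg.svd(pc_weight(W), compute_uv=False)
```

On (0, 1] this polynomial is increasing with p(1) = 1 and p(s) > s, so small singular values are lifted while the largest stays fixed, which lowers the condition number of the merged weight.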

2606.06468 2026-06-05 cs.AI

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Jui-Hui Chung, Ziyang Cai, Zihao Li, Qishuo Yin, Rohit Agarwal, Simon Park, Rodrigo Porto, Narutatsu Ri, Ziran Yang, Shange Tang, Xingyu Dang, Hongzhou Lin, Mengdi Wang, Danqi Chen, Chi Jin, Liam H Fowl, Sanjeev Arora

Affiliations: University of California, Berkeley; Stanford University; University of Science and Technology of China; University of Toronto; National University of Singapore; University of Tokyo; University of Washington

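AI summary: Proposes the Goedel-Architect framework, which generates and refines dependency-graph blueprints and uses a Lean 4 prover to prove lemmas in parallel, reaching state-of-the-art open-source performance on several benchmarks.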

Abstract

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies. This blueprint is optionally guided by a natural language proof. Then, a tool-equipped Lean prover component closes each open lemma node in parallel using relevant dependencies. Failed lemmas in turn drive refinement of the global blueprint. This strategy contrasts with other mainstream approaches, which use recursive lemma decomposition and can loop inefficiently on dead-end strategies. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, we additionally close the remaining two MiniF2F-test problems (reaching 100%), lift PutnamBench to 88.8% (597/672), and solve 4/6 on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. This represents state-of-the-art performance for an open-source pipeline at up to 500x lower cost than comparable open-source pipelines.
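A hypothetical sketch of blueprint scheduling with Python's standard `graphlib` module: lemmas whose dependencies have all been processed form a "wave" that could be dispatched to provers in parallel, and any lemma that fails (or depends on a failed lemma) is collected for blueprint refinement. `toy_prover` stands in for the tool-equipped Lean prover; the graph and all names are illustrative, not Goedel-Architect's code.

```python
from graphlib import TopologicalSorter

def schedule(blueprint, try_prove):
    """blueprint: dict mapping each lemma to the set of lemmas it depends on."""
    ts = TopologicalSorter(blueprint)
    ts.prepare()
    proved, failed, waves = set(), set(), []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every lemma here could run in parallel
        waves.append(ready)
        for lemma in ready:
            deps_ok = all(d in proved for d in blueprint.get(lemma, ()))
            if deps_ok and try_prove(lemma):
                proved.add(lemma)
            else:
                failed.add(lemma)  # would feed back into blueprint refinement
            ts.done(lemma)
    return proved, failed, waves

blueprint = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_a": {"lemma_base"},
    "lemma_b": {"lemma_base"},
    "lemma_base": set(),
}

def toy_prover(name):
    # Stand-in prover that fails on one lemma
    return name != "lemma_b"

proved, failed, waves = schedule(blueprint, toy_prover)
```

The failed set pinpoints both the lemma that could not be closed and everything that depends on it, which is the information a global refinement step would need.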

2606.06467 2026-06-05 cs.CL cs.AI cs.LG

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei

Affiliations: Microsoft Research; Tsinghua University

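AI summary: Proposes Cross-Layer Sparse Attention (CLSA), which shares both the KV cache and the routing index across layers, preserving the accuracy of token-sparse attention while reducing routing overhead and substantially improving decoding efficiency for long-context LLMs.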

Abstract

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically provide stronger acceleration but incur noticeable quality loss, while token sparse methods are usually more accurate yet deliver limited end-to-end speedup because top-k routing over the full cache remains expensive. In this work, we propose cross-layer sparse attention (CLSA), which is built on top of KV-sharing architectures such as YOCO. The core idea is to share not only the KV cache across cross-decoder layers, but also the routing index. A single indexer computes token-level top-k selection once and reuses the resulting index across layers, thereby preserving the fine-grained selectivity of token sparse attention while amortizing the routing overhead. The resulting architecture improves all major inference bottlenecks jointly, including pre-filling, KV-cache storage, and long-context decoding. Experiments across short-context and long-context benchmarks show that CLSA is both accurate and efficient, achieving up to 7.6x decoding speedup and 17.1x overall throughput improvement at 128K context. These results suggest a more complete architectural solution for long-context LLMs that jointly advances model quality and inference efficiency.
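A minimal single-query NumPy sketch of the shared-routing idea, assuming one KV cache shared by all cross-decoder layers (as in KV-sharing designs like YOCO) and a single indexer query that scores every cached token once. The top-k index is computed a single time and reused by each layer's sparse attention; the shapes, the dot-product indexer, and the names are assumptions, not the CLSA implementation.

```python
import numpy as np

def topk_index(q_index, K, k):
    # Indexer: score every cached token once and keep the k highest (unordered)
    scores = K @ q_index
    return np.argpartition(-scores, k - 1)[:k]

def sparse_attention(q, K, V, idx):
    # Softmax attention restricted to the selected tokens
    logits = K[idx] @ q / np.sqrt(q.shape[-1])
    w = np.exp(logits - logits.max())
    return (w / w.sum()) @ V[idx]

def shared_routing_step(layer_queries, K, V, q_index, k):
    idx = topk_index(q_index, K, k)  # routing cost paid once per decoding step
    outs = [sparse_attention(q, K, V, idx) for q in layer_queries]  # reused by every layer
    return outs, idx

rng = np.random.default_rng(0)
n_tokens, d, n_layers = 64, 8, 4
K, V = rng.normal(size=(n_tokens, d)), rng.normal(size=(n_tokens, d))
layer_queries = rng.normal(size=(n_layers, d))
q_index = rng.normal(size=d)
outs, idx = shared_routing_step(layer_queries, K, V, q_index, k=16)
```

With `k` equal to the cache length, each layer's output reduces to ordinary dense attention, which is a convenient sanity check; with smaller `k`, the per-layer cost scales with `k` while the full-cache scoring happens only once.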

2606.06462 2026-06-05 cs.AI

Benchmark Everything Everywhere All at Once

Shiyun Xiong, Dongming Wu, Peiwen Sun, Yuang Ai, Bokang Yang, Wencheng Han, Xiao-Hui Li, Xiangyu Yue

Affiliations: MMLab, The Chinese University of Hong Kong; CPII under InnoHK; The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute; Shandong University; Huawei Technologies

AI Summary: Proposes Benchmark Agent, a fully autonomous agentic system that automates the benchmark construction pipeline, addressing the labor-intensive construction, poor reusability, and rapid performance saturation of existing benchmarks.

Comments
Project page: https://benchmarkagent.github.io/
AI-generated Chinese Abstract

基准测试通过提供标准化和明确的性能度量,对于评估和推进LLM和MLLM至关重要。然而,它们的构建劳动密集且难以复用,引发了可持续性和可扩展性的担忧。此外,现有基准在发布后往往很快达到性能饱和,导致对最先进模型的区分不足。为了应对这些挑战,我们引入了Benchmark Agent,一个完全自主的智能体系统,专为基准构建而设计。我们的框架编排了完整的基准构建流程,从用户查询分析和子任务设计到数据注释和质量控制。为了评估Benchmark Agent,我们实现了它来生成15个代表性基准,涵盖多种评估场景,包括文本理解、多模态理解和领域特定推理。大量实验,包括人工评估、LLM作为评判者的评估和一致性检查,表明Benchmark Agent能够在最小人工参与下生成高质量的基准样本。更重要的是,通过持续评估,我们观察到一些有洞察力的发现,包括当前模型在某些领域特定推理任务上存在困难。我们相信快速演进的基准可以为研究社区做出重要贡献。预览和代码将在演示页面和代码仓库中公开。

English Abstract

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly reach performance saturation after their release, resulting in insufficient discrimination among state-of-the-art models. To address these challenges, we introduce Benchmark Agent, a fully autonomous agentic system designed for benchmark building. Our framework orchestrates the complete benchmark construction pipeline, from user query analysis and subtask design to data annotation and quality control. To assess Benchmark Agent, we use it to produce 15 representative benchmarks, spanning diverse evaluation scenarios, including text understanding, multimodal understanding, and domain-specific reasoning. Extensive experiments, including human evaluation, LLM-as-a-judge assessment, and consistency checks, demonstrate that Benchmark Agent can generate high-quality benchmark samples with minimal human involvement. More importantly, through continual evaluation, we observe several insightful findings, including that current models struggle with certain domain-specific reasoning tasks. We believe that rapidly evolving benchmarks can contribute significantly to the research community. The preview and code will be publicly available at the demo page and code repository.

2606.06461 2026-06-05 cs.RO

Flow-based Policy Adaptation without Policy Updates

基于流的策略适应无需策略更新

Luzhe Sun, Jingtian Ji, Haoran Chen, Jiawei Zhou, Matthew R. Walter

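Affiliations: Toyota Technological Institute at Chicago; Stony Brook University

AI Summary: Proposes GLOVES, which uses a flow model to transport non-expert actions toward the expert action distribution, enabling selective action-level adaptation that improves task success while preserving agent intent.

AI-generated Chinese Abstract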

利用预训练策略、基础模型或人类操作员的先验知识,为从零开始学习机器人技能提供了一种高效替代方案。然而,这些智能体提供的动作往往是次优的、有噪声的,或与特定任务的专家行为不一致。我们提出了GLOVES,一系列基于流的适应方法,通过将非专家动作向专家动作分布传输来纠正它们。GLOVES并非用完全自主性取代智能体控制,而是执行选择性的动作级适应,在提升任务成功率的同时保持智能体意图。学习到的流还通过反向流评估提供了一种自然的分布内评分机制。我们利用该信号作为干预门:与专家分布一致的动作保持不变,而异常或分布外(OOD)动作则被纠正。这样,仅在必要时提供辅助。GLOVES仅需有限的专家监督,使用少量演示或可重用的成功技能片段。通过学习局部专家动作模式并在执行过程中拼接,GLOVES提供了一个轻量级的共享控制模块,用于跨任务和环境的鲁棒动作适应。代码和演示可在ripl.github.io/GLOVES_web获取。

English Abstract

Leveraging prior knowledge from pretrained policies, foundation models, or human operators offers an efficient alternative to learning robot skills from scratch. However, these agents often provide actions that are suboptimal, noisy, or misaligned with task-specific expert behavior. We propose GLOVES, a family of flow-based adaptation methods that correct non-expert actions by transporting them toward an expert action distribution. Rather than replacing agentic control with full autonomy, GLOVES performs selective action-level adaptation, improving task success while preserving agent intent. The learned flow also provides a natural in-distribution scoring mechanism through reverse flow evaluation. We use this signal as an intervention gate: actions that appear consistent with the expert distribution are passed through unchanged, while anomalous or out-of-distribution (OOD) actions are corrected. In this way, assistance is only provided when necessary. GLOVES requires only limited expert supervision, using a small number of demonstrations or reusable successful skill segments. By learning local expert action patterns and stitching them during execution, GLOVES provides a lightweight shared-control module for robust action adaptation across tasks and environments. Code and demos are available at ripl.github.io/GLOVES_web.

2606.06459 2026-06-05 cs.LG

Event Detection for Parameter-to-KPI Dependency Learning for AI-RAN

面向AI-RAN的参数到KPI依赖学习的事件检测

Christie Djidjev, Nicholas Kaminski

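Affiliations: arXiv.org

AI Summary: To address mutual interference among multiple AI control functions in AI-RAN, proposes an event-detection-based dependency learning approach. The approach converts noisy continuous telemetry into binary event indicators and uses synthetic data to evaluate how well a machine-learning pipeline recovers the latent dependency structure.

AI-generated Chinese Abstract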

下一代无线网络预计将依赖多个并发的AI驱动控制功能,这些功能同时优化不同的网络目标,特别是在AI集成和开放无线接入网络架构中,如AI无线接入网络(AI-RAN)和开放无线接入网络(O-RAN)。当这些功能相互作用时,它们可能以难以仅从原始网络数据中检测的方式相互干扰。管理此类交互的一个关键缺失部分是可靠、可解释的依赖结构,该结构捕获在任何给定时间哪些控制参数积极影响哪些网络性能结果。本文聚焦于支持此类依赖学习所需的事件检测步骤,通过将噪声连续遥测转换为参数活动和KPI响应的二元指示器。核心困难在于并非数据中的每个波动都反映真实的控制交互,因此该方法必须区分真实的参数-结果关系与背景变化。由于难以获得具有已知参数-KPI真实标签的真实AI-RAN流量轨迹,我们引入了一个带有植入潜在依赖的合成闭环流量生成器。我们使用这种受控遥测来评估基于机器学习的依赖恢复管道,该管道将连续轨迹到二元事件指示器的转换表述为一个显著性检测问题。实验评估表明,当信号与背景变化充分分离时,所提出的管道能够可靠地从噪声连续轨迹中恢复潜在依赖结构,同时强调阈值校准是控制事件检测质量的关键因素。这些结果为自适应AI-RAN控制系统的可解释依赖学习奠定了基础。

English Abstract

Next-generation wireless networks are expected to rely on multiple concurrent AI-driven control functions that optimize different network objectives simultaneously, particularly in AI-integrated and open radio access network architectures such as AI Radio Access Network (AI-RAN) and Open Radio Access Network (O-RAN). When these functions interact, they can interfere with one another in ways that are difficult to detect from raw network data alone. A key missing piece for managing such interactions is a reliable, interpretable dependency structure that captures which control parameters are actively influencing which network performance outcomes at any given time. This paper focuses on the event-detection step needed to support such dependency learning by converting noisy continuous telemetry into binary indicators of parameter activity and KPI response. The central difficulty is that not every fluctuation in the data reflects a genuine control interaction, so the method must distinguish real parameter-outcome relationships from background variation. Because real AI-RAN traffic traces with known parameter-KPI ground truth are difficult to obtain, we introduce a synthetic closed-loop traffic generator with planted latent dependencies. We use this controlled telemetry to evaluate a machine-learning-based dependency recovery pipeline that formulates the conversion of continuous traces into binary event indicators as a significance-detection problem. Experimental evaluation shows that the proposed pipeline reliably recovers the latent dependency structure from noisy continuous traces when the signal is sufficiently separated from background variation, while highlighting threshold calibration as the key factor controlling event-detection quality. These results constitute a foundational step toward interpretable dependency learning for adaptive AI-RAN control systems.
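Below is a minimal sketch of framing event detection as a significance test. The detector is assumed to be a trailing-window z-score on first differences. The window length and threshold are hypothetical parameters, not values from the paper. The example shows why threshold calibration matters: the same jump is flagged at one threshold and missed at another.

```python
import statistics


def binary_events(trace, window=10, threshold=3.0):
    """Convert a continuous trace into 0/1 event indicators.

    A step is marked as an event when its change deviates from the trailing
    window of previous changes by more than `threshold` standard deviations
    (a simple z-score significance test).
    """
    diffs = [b - a for a, b in zip(trace, trace[1:])]
    events = [0] * len(trace)
    for t in range(window, len(diffs)):
        past = diffs[t - window:t]
        mu = statistics.fmean(past)
        sd = statistics.pstdev(past) or 1e-12  # guard against a perfectly flat window
        if abs(diffs[t] - mu) / sd > threshold:
            events[t + 1] = 1  # diffs[t] is the change arriving at trace[t + 1]
    return events
```

For example, a trace that alternates by 0.1 and then jumps by about 5 yields a single event at the jump with `threshold=3.0`. With `threshold=100.0` it yields none.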

2606.06458 2026-06-05 cs.LG cs.AI cs.CV

In-Context Multiple Instance Learning

上下文多实例学习

Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller

Affiliations: Berlin Institute for the Foundations of Learning and Data; Machine Learning Group, Technische Universität Berlin; Aignostics; Institute of Pathology, Charité – Universitätsmedizin Berlin; Max-Planck Institute for Informatics; Department of Artificial Intelligence, Korea University

AI Summary: Proposes an in-context learner with a Perceiver-style architecture, pretrained on synthetic data. Without gradient updates, it solves new multiple instance learning tasks from a few labeled bags. On 12 benchmarks it outperforms supervised baselines that require task-specific training.

AI-generated Chinese Abstract

多实例学习(MIL)解决了在实例包级别提供监督的问题,并已成功应用于从计算病理学到卫星图像等领域。然而,现有算法在低标签率(许多实际应用的特点)下表现不佳。灵活的模型过拟合,而僵化的模型无法适应手头的任务。我们证明,在合成数据上预训练一个具有感知器架构的上下文学习器,可以得到一个能够从少量标记包中解决新任务的模型。在推理时,分类在单次前向传播中完成,无需梯度更新。我们提出并研究了不同的用于包结构数据的合成数据生成器,发现它们捕获了互补的归纳偏差。在这些生成器的混合上预训练的模型继承了每个生成器在各自任务上的优势,并在12个MIL基准上取得了最佳平均性能,超过了需要任务特定训练的监督基线。

English Abstract

Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications. Flexible models overfit and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate different synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
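Under the standard MIL assumption, a bag is positive if and only if it contains at least one positive ("witness") instance. Only the bag label is observed. The toy generator below illustrates bag-structured synthetic data in that spirit, using 1-D instances. It is hypothetical, not one of the generators proposed in the paper. The max-pooling rule is an example of a rigid, hand-set classifier of the kind that cannot adapt to a new task.

```python
import random


def make_bag(rng, n_instances, positive, witness_rate=0.2):
    """Toy 1-D bag generator under the standard MIL assumption.

    Negative bags contain only background instances ~ N(0, 1); positive bags
    additionally replace a few instances with 'witnesses' ~ N(5, 0.5).
    Only the bag label (1 or 0) is returned as supervision.
    """
    instances = [rng.gauss(0.0, 1.0) for _ in range(n_instances)]
    if positive:
        n_witness = max(1, int(witness_rate * n_instances))
        for i in rng.sample(range(n_instances), n_witness):
            instances[i] = rng.gauss(5.0, 0.5)
    return instances, int(positive)


def max_pool_classifier(bag, threshold=3.5):
    """Rigid baseline: call a bag positive if any instance exceeds a fixed threshold."""
    return int(max(bag) > threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    bags = [make_bag(rng, 20, positive=(i % 2 == 0)) for i in range(200)]
    acc = sum(max_pool_classifier(b) == y for b, y in bags) / len(bags)
    print(f"max-pool accuracy on toy bags: {acc:.3f}")
```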

2606.06453 2026-06-05 cs.AI

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Vortex: 面向AI Agent的高效可编程稀疏注意力服务

Zhuoming Chen, Xinrui Zhong, Qilong Feng, Ranajoy Sadhukhan, Yang Zhou, Michael Qizhe Shieh, Zhihao Jia, Beidi Chen

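Affiliations: Carnegie Mellon University; Rice University; National University of Singapore

AI Summary: Proposes Vortex, a system that combines a Python-embedded frontend language and a page-centric tensor abstraction with an efficient backend. It enables rapid prototyping, deployment, and evaluation of sparse attention algorithms and substantially improves throughput.

AI-generated Chinese Abstract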

随着生成长度的增长,稀疏注意力对于服务大型语言模型(LLMs)变得越来越重要。然而,大规模部署和评估新的稀疏注意力算法仍然高度工程密集,这减慢了人类研究人员和AI Agent探索稀疏注意力设计的速度。为了应对这一挑战,我们提出了Vortex,一个系统,它结合了在面向页面的张量抽象之上的Python嵌入式前端语言,用于表达广泛的稀疏注意力算法,以及一个紧密集成到现代LLM服务栈中的高效后端。Vortex能够快速原型设计、部署和评估稀疏注意力算法,有效地将其理论效率提升转化为实际吞吐量的改进。因此,Vortex大大加速了稀疏注意力算法的设计和迭代。首先,AI Agent使用Vortex自动生成和优化多样化的算法,最佳算法在保持准确性的同时,吞吐量比全注意力高出高达3.46倍。其次,Vortex将稀疏注意力扩展到新兴架构和非常大的模型,这些模型原本难以实验,在基于MLA的GLM-4.7-Flash上实现了高达4.7倍的吞吐量提升,在229B参数的MiniMax-M2.7上实现了1.37倍的提升(在NVIDIA B200 GPU上)。

English Abstract

Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents in exploring the sparse attention design space. To address this challenge, we present Vortex, a system that combines a Python-embedded frontend language atop a page-centric tensor abstraction for expressing a broad range of sparse attention algorithms, with an efficient backend tightly integrated into modern LLM serving stacks. Vortex enables rapid prototyping, deployment, and evaluation of sparse attention algorithms, effectively translating their theoretical efficiency gains into real-world throughput improvements. As a result, Vortex substantially accelerates the design and iteration of sparse attention algorithms. First, AI agents use Vortex to automatically generate and refine diverse algorithms, the best reaching up to $3.46\times$ higher throughput than full attention while preserving accuracy. Second, Vortex extends sparse attention to emerging architectures and very large models that are otherwise hard to experiment with, reaching up to $4.7\times$ higher throughput on the MLA-based GLM-4.7-Flash and $1.37\times$ on the 229B-parameter MiniMax-M2.7 on NVIDIA B200 GPUs.
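As a rough illustration of what a page-centric abstraction means for sparse attention, the sketch below selects whole KV-cache pages rather than individual tokens. Two things are assumptions here:
- the scoring rule, a dot product between the query and each page's mean key, chosen for simplicity;
- `page_topk` itself, a hypothetical function that is not part of Vortex's frontend language.

```python
def page_topk(query, keys, page_size, k_pages):
    """Select the k_pages highest-scoring KV-cache pages for one query.

    Each page holds up to page_size consecutive cached tokens and is scored
    by the dot product between the query and the page's mean key.
    Returns the selected page ids (ascending) and the token positions they cover.
    """
    pages = [keys[s:s + page_size] for s in range(0, len(keys), page_size)]

    def page_score(page):
        centroid = [sum(col) / len(page) for col in zip(*page)]
        return sum(q * c for q, c in zip(query, centroid))

    scores = [page_score(p) for p in pages]
    chosen = sorted(range(len(pages)), key=lambda i: (-scores[i], i))[:k_pages]
    chosen.sort()  # return pages in cache order
    tokens = [t for p in chosen
              for t in range(p * page_size, min((p + 1) * page_size, len(keys)))]
    return chosen, tokens
```

Selecting at page granularity keeps memory reads contiguous, at the cost of coarser selectivity than token-level top-k.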

2606.06448 2026-06-05 cs.AI

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Agent记忆:有状态长时任务工作负载的表征与系统影响

Yasmine Omri, Ziyu Gan, Zachary Broveak, Robin Geens, Zexue He, Alex Pentland, Marian Verhelst, Tsachy Weissman, Thierry Tambe

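Affiliations: Massachusetts Institute of Technology; Stanford University; University of California, Berkeley; MIT Media Lab

AI Summary: Presents the first systems-level characterization of LLM agent memory systems. It proposes a four-axis taxonomy, evaluates 10 representative systems with a phase-aware profiling framework, and derives 10 system design recommendations.

AI-generated Chinese Abstract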

LLM agent越来越多地被部署在需要跨扩展交互历史进行持续推理的长时任务上。大规模实现这一点要求agent在会话之间持久地存储、检索和更新自己的记忆。一个丰富的agent记忆系统生态系统已经出现,涵盖平面检索、LLM介导的提取、整合事实存储和agent控制流。然而,它们的系统级行为尚未被表征。我们提出了agent记忆的首次系统表征。首先,我们引入了一个面向系统的分类法,沿四个轴对agent记忆系统进行分类。其次,我们构建了一个阶段感知的分析框架,将成本归因于构建、检索和生成。第三,我们跨两个基准套件表征了十个代表性系统,揭示了设计选择如何在写和读路径上转移成本。最后,我们推导出10条系统建议,涵盖构建调度、能力下限、通过查询量的摊销、新鲜度-延迟权衡以及集群规模管理。

English Abstract

LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Yet, their system-level behavior remains uncharacterized. We present the first systems characterization of agent memory. First, we introduce a system-oriented taxonomy classifying agent memory systems along four axes. Second, we build a phase-aware profiling harness attributing cost to construction, retrieval, and generation. Third, we characterize ten representative systems across two benchmark suites, uncovering how design choices shift cost across the write and read paths. Finally, we derive 10 system recommendations covering construction scheduling, capability floors, amortization via query volume, freshness-latency tradeoffs, and fleet-scale management.
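A phase-aware profiling harness can be sketched as a context manager that accumulates time per phase. This minimal sketch makes a few assumptions:
- three phases named after the abstract: construction, retrieval, generation;
- wall-clock time as the only cost. The paper's harness may also attribute tokens, API calls, or memory, which this sketch does not model;
- an injectable `clock`, a hypothetical parameter that makes the example deterministic to test.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Accumulates elapsed time per agent-memory phase."""

    PHASES = ("construction", "retrieval", "generation")

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        if name not in self.PHASES:
            raise ValueError(f"unknown phase: {name}")
        start = self.clock()
        try:
            yield
        finally:
            # Attribute the elapsed time even if the wrapped code raises.
            self.totals[name] += self.clock() - start

    def breakdown(self):
        """Fraction of total measured time spent in each phase."""
        total = sum(self.totals.values())
        return {p: (self.totals[p] / total if total else 0.0) for p in self.PHASES}
```

Usage: wrap the memory-write path in `with prof.phase("construction"):`, the lookup in `"retrieval"`, and the LLM call in `"generation"`, then compare `prof.breakdown()` across systems.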

2606.06447 2026-06-05 cs.CL cs.LG

Latent Reasoning with Normalizing Flows

基于归一化流的潜在推理

Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu

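Affiliations: University of Pennsylvania; UC San Diego; Meta

AI Summary: Proposes NF-CoT, a framework that models continuous latent thoughts inside an LLM with normalizing flows. It preserves autoregressive generation, probabilistic sampling, KV-cache decoding, and likelihood estimation. On code generation tasks it improves pass rates while reducing reasoning cost.

AI-generated Chinese Abstract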

大型语言模型通常通过生成显式思维链(CoT)来改进推理,展示了中间计算的重要性。然而,文本CoT迫使这种计算通过离散、串行且面向通信的令牌流进行:每个推理步骤必须在模型继续之前被语言化,即使底层更新是语义的、不确定的或仅部分形成的。潜在推理通过在承诺文本之前以紧凑的连续状态执行中间计算,提供了一种更高带宽的替代方案。然而,现有的潜在推理方法常常牺牲了使CoT在自回归语言模型中有效的关键优势,包括原生的从左到右生成、概率采样、与KV缓存解码的兼容性以及可处理的似然估计。我们提出NF-CoT,一种潜在推理框架,通过使用归一化流对连续思维进行建模来保留这些优势。NF-CoT在LLM骨干内部实例化一个TARFlow风格的归一化流,定义了从显式CoT提炼的紧凑连续思维上的可处理概率模型。连续思维位置由NF头生成,而文本位置由标准LM头在同一因果流中生成。这种设计为潜在思维提供了精确的似然,支持使用原始KV缓存进行概率从左到右解码,并支持在潜在推理空间中进行直接策略梯度优化。在代码生成基准测试中,NF-CoT在显式CoT和先前潜在推理基线上提高了通过率,同时显著降低了中间推理成本。

English Abstract

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
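Exact likelihoods and left-to-right decoding both follow from the change-of-variables formula for an autoregressive affine flow. Let $x_i = \mu_i + z_i e^{\alpha_i}$, where $\mu_i$ and $\alpha_i$ depend only on $x_{<i}$. The Jacobian is triangular, so $\log p(x) = \sum_i [\log \mathcal{N}(z_i; 0, 1) - \alpha_i]$.

The sketch below illustrates the mechanism only; it is not NF-CoT itself. It uses a hypothetical toy `conditioner` in place of the transformer that TARFlow-style models use.

```python
import math


def conditioner(prefix):
    """Toy stand-in for the transformer: shift and log-scale from the previous value."""
    prev = prefix[-1] if prefix else 0.0
    return 0.5 * prev, 0.1 * prev  # (mu, alpha)


def log_likelihood(x):
    """Exact log p(x) via change of variables (inverse direction, x -> z)."""
    logp = 0.0
    for i, xi in enumerate(x):
        mu, alpha = conditioner(x[:i])
        z = (xi - mu) * math.exp(-alpha)
        logp += -0.5 * (z * z + math.log(2 * math.pi)) - alpha
    return logp


def sample(noise):
    """Left-to-right generation: each step uses only already generated values."""
    x = []
    for z in noise:
        mu, alpha = conditioner(x)
        x.append(mu + z * math.exp(alpha))
    return x
```

Because each step conditions only on the prefix, sampling proceeds left to right just like token decoding. That is what allows a prefix cache to be reused.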

2606.06440 2026-06-05 cs.LG stat.ML

Causal Atlases from Entropic Inference: Bayesian Networks beyond Optimal DAGs

来自熵推理的因果图谱:超越最优DAG的贝叶斯网络

Hazhir Aliahmadi, Irina Babayan, Greg van Anders

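Affiliations: Department of Physics, Engineering Physics and Astronomy

AI Summary: Addresses the problem that data-driven causal identification may admit multiple chains of causation. Proposes causal atlases based on entropic inference, which quantify the structural ambiguity in causal structure by sampling maximum-entropy ensembles of graphs.

Comments
18 pages, 2 figures
AI-generated Chinese Abstract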

数据驱动的因果关系识别对于理解科学内外的复杂系统至关重要。贝叶斯网络通过有向无环图(DAG)为建模通用因果关系提供了一种概率方法。然而,构建贝叶斯网络的典型技术依赖于优化,这可能不适合学习因果关系,因为底层数据可能允许多条因果链。更忠实于数据的因果关系表示将提供构建多个因果图的框架,这些因果图与底层数据固有的变异性一致。在这里,我们展示了基于熵的推理生成了与底层数据一致的合理因果关系的图谱。在2节点和20节点线性结构方程模型的模拟噪声数据上,我们对图的最大熵系综进行采样,从而量化底层因果关系中固有的结构歧义性。我们的方法表明,“优化”的DAG可能包含在同等精确的拓扑中不一致的因果伪影。

English Abstract

Data-driven causal relationship identification is pertinent to advancing understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs (DAGs). However, typical techniques for constructing Bayesian networks rely on optimization, which can be ill-suited for learning causal relationships because the underlying data may admit multiple chains of causation. More data-faithful representations of causal relationships would provide frameworks for constructing multiple causal maps that are consistent with the variability that is inherent in underlying data. Here, we show that entropy-based inference generates atlases of plausible causal relationships that are consistent with underlying data. On simulated noisy data of 2- and 20-node linear structural equation models, we sample a maximum-entropy ensemble of graphs that allow us to quantify the inherent structural ambiguity in underlying causal relationships. Our method shows that "optimized" DAGs can contain causal artifacts that are not consistent across equivalently accurate topologies.
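A 2-node linear Gaussian example shows why a single optimized DAG can mislead. X→Y and Y→X have identical maximum-likelihood scores because they are Markov equivalent. An optimizer must therefore pick one arbitrarily, while a Boltzmann ensemble $p(G) \propto \exp(\beta\, S(G))$ splits the weight evenly between them.

This sketch assumes a BIC-style score $S$ and $\beta = 1$. A Boltzmann distribution is the maximum-entropy distribution under a constraint on the expected score. The paper's exact constraints and sampler are not given here.

```python
import math
import random


def gaussian_loglik(var, n):
    """Maximized log-likelihood of n samples under a Gaussian with MLE variance var."""
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def dag_scores(xs, ys):
    """BIC-style scores for the three DAGs on two variables (linear Gaussian SEM)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r2 = cxy * cxy / (vx * vy)          # squared correlation
    edge_penalty = 0.5 * math.log(n)    # one regression coefficient per edge
    return {
        "empty": gaussian_loglik(vx, n) + gaussian_loglik(vy, n),
        "X->Y": gaussian_loglik(vx, n) + gaussian_loglik(vy * (1 - r2), n) - edge_penalty,
        "Y->X": gaussian_loglik(vy, n) + gaussian_loglik(vx * (1 - r2), n) - edge_penalty,
    }


def max_entropy_ensemble(scores, beta=1.0):
    """Boltzmann weights p(G) proportional to exp(beta * score(G))."""
    top = max(scores.values())
    weights = {g: math.exp(beta * (s - top)) for g, s in scores.items()}
    z = sum(weights.values())
    return {g: w / z for g, w in weights.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    print(max_entropy_ensemble(dag_scores(xs, ys)))
```

Even though the data were generated as X→Y, the ensemble gives X→Y and Y→X equal weight. An optimizer that reports only one of them presents an artifact of tie-breaking as a causal finding.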

2606.06428 2026-06-05 cs.CL

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

强化学习引发对未见语言的上下文翻译学习

Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas, Rico Sennrich

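Affiliations: University of Zurich; ETH Zurich; Queen’s University Belfast

AI Summary: Proposes a reinforcement learning (RL) approach with chrF as the reward. It enables large language models to extract and apply linguistic knowledge from rich linguistic context and to translate completely unseen languages effectively.

Comments
15 pages, 2 figures
AI-generated Chinese Abstract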

先前的工作表明,大型语言模型(LLMs)可以通过持续训练甚至在其上下文中编码语法书来翻译未见或低资源语言。然而,这两种方法通常过拟合特定语言,在测试时零样本迁移有限。为了大规模翻译极低资源语言,我们认为LLMs必须获得利用上下文语言学知识的元技能,而不是记忆特定语言。在本文中,我们提出了一种强化学习(RL)方法,用于在丰富的语言学上下文中进行未见语言翻译,使用表面级翻译指标(chrF)作为奖励。实验表明,尽管奖励轻量级,我们的RL训练模型有效地从提供的上下文中提取和应用相关的语言学信息,导致对完全未见语言的翻译优于上下文学习或有监督微调。我们的分析表明,基于结果的RL可以扩展到数学和编码等传统推理任务之外,作为从上下文学习语言的配方。

English Abstract

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
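chrF is the character n-gram F-score. Precision and recall of character n-grams ($n = 1, \dots, 6$) are averaged over $n$ and combined as $F_\beta = (1+\beta^2)PR / (\beta^2 P + R)$. With $\beta = 2$, recall is weighted more than precision.

The sketch below is a simplified, hypothetical implementation. It drops spaces and skips n-gram orders longer than either string. It is not a drop-in replacement for sacrebleu's CHRF, which may differ in whitespace handling and averaging details. Used as an RL reward, the score would typically be rescaled to $[0, 1]$.

```python
from collections import Counter


def char_ngrams(text, n):
    """Multiset of character n-grams, ignoring spaces."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return 100.0 * (1 + b2) * p * r / (b2 * p + r)


def reward(sample, reference):
    """Hypothetical RL reward: chrF rescaled to [0, 1]."""
    return chrf(sample, reference) / 100.0
```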

2606.06423 2026-06-05 cs.RO cs.AI

RiskFlow: Fast and Faithful Safety-Critical Traffic Scenario Generation

RiskFlow: 快速且保真的安全关键交通场景生成

Qi Lan, Yining Tang, Yu Shen, Yi Zhou, Yuhao Wei, Jie Li, Guofa Li

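Affiliations: National University of Singapore

AI Summary: Proposes RiskFlow, a framework that replaces iterative denoising with single-pass forward transport in the action space, achieving fast and faithful safety-critical multi-agent traffic scenario generation.

AI-generated Chinese Abstract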

安全关键交通场景生成对于评估自动驾驶系统在罕见但高风险交互下的表现至关重要。现有的基于扩散的方法在闭环生成中提供了强大的可控性,但其迭代去噪过程计算成本高,并且可能在长时间滚动中累积采样和引导误差,导致不真实的运动伪影,如抖动、异常加速度和越野行为。为了解决这些问题,我们提出了RiskFlow,一个闭环安全关键多智能体交通生成框架,将未来轨迹生成公式化为动作空间中的传输。RiskFlow不依赖迭代去噪,而是学习有限区间上的平均速度场,通过单次前向传递将高斯动作序列转换为未来的加速度和偏航率命令,使用基于JVP的目标函数实现高效稳定的训练。在测试时,RiskFlow将输出空间引导应用于生成的动作,引导选定的关键智能体走向风险交互,同时正则化越野行为,并通过车辆动力学重建物理可行的轨迹。在nuScenes上使用tbsim闭环评估的实验表明,RiskFlow在多智能体和长时域设置中实现了强大的对抗性与真实性的权衡。与代表性基线相比,RiskFlow在保持竞争性安全关键生成能力的同时,持续提高了真实性,并显著减少了推理时间。

English Abstract

Safety-critical traffic scenario generation is essential for evaluating autonomous driving systems under rare but high-risk interactions. Existing diffusion-based methods offer strong controllability in closed-loop generation, but their iterative denoising process is computationally expensive and may accumulate sampling and guidance errors over long rollouts, causing unrealistic motion artifacts such as jitter, abnormal acceleration, and off-road behavior. To address these issues, we propose RiskFlow, a closed-loop safety-critical multi-agent traffic generation framework that formulates future trajectory generation as transport in the action space. Instead of relying on iterative denoising, RiskFlow learns an average velocity field over a finite interval to transform Gaussian action sequences into future acceleration and yaw-rate commands with a single forward pass, using a JVP-based objective for efficient and stable training. At test time, RiskFlow applies output-space guidance to the generated actions, steering selected critical agents toward risky interactions while regularizing off-road behavior, and reconstructs physically feasible trajectories through vehicle dynamics. Experiments on nuScenes with tbsim closed-loop evaluation show that RiskFlow achieves a strong adversariality-realism trade-off across multi-agent and long-horizon settings. Compared with representative baselines, RiskFlow consistently improves realism while maintaining competitive safety-critical generation capability, and substantially reduces inference time for evaluation.
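Reconstructing trajectories from generated acceleration and yaw-rate commands means integrating a vehicle model. The paper's actual vehicle dynamics are not specified here. This minimal sketch assumes:
- a unicycle (point-mass) model;
- semi-implicit Euler integration;
- speed clamped at zero;
- a hypothetical step of `dt = 0.1` s.

```python
import math


def rollout(x, y, heading, speed, accels, yaw_rates, dt=0.1):
    """Integrate acceleration and yaw-rate commands into vehicle states.

    Uses a unicycle model with semi-implicit Euler steps (speed and heading
    are updated first, then position). Returns (x, y, heading, speed) per step.
    """
    states = []
    for a, w in zip(accels, yaw_rates):
        speed = max(0.0, speed + a * dt)  # no reversing in this sketch
        heading += w * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        states.append((x, y, heading, speed))
    return states
```

Positions come from integrating the commands rather than being predicted directly. Every rollout is therefore consistent with the assumed vehicle model, which is one way to suppress jitter in the output.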

2606.06420 2026-06-05 cs.CL

A Komi-Yazva--Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

科米-亚兹瓦语-俄语平行语料库及零样本和少样本LLM翻译评估协议

Petr Parshakov

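Affiliations: HSE University, Perm, Russia; School of Management SKOLKOVO, Moscow, Russia

AI Summary: Builds the first Komi-Yazva–Russian parallel corpus and proposes an explicit evaluation protocol. The protocol is used to study the zero-shot and retrieval-augmented few-shot performance of large language models when translating an extremely low-resource endangered language.

Comments
18 pages, 6 tables, 3 figures
AI-generated Chinese Abstract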

我们提出了首个科米-亚兹瓦语-俄语平行语料库,以及用于研究LLM在濒危、极度低资源环境下翻译的显式评估协议。该数据集包含来自74篇叙事文本的457个对齐句子对,并附有文档化的来源、句子级对齐和故事标识符,支持泄漏感知评估。我们利用这一设置,在零样本和基于检索的少样本场景下,比较了现代大语言模型在科米-亚兹瓦语到俄语翻译中的表现,其中平行数据极度稀缺。协议包括故事级交叉验证、用于少样本提示的确定性检索、生成输出的严格验证、互补的基于参考和基于评判的指标,以及故事级不确定性估计。跨模型而言,LLM产生了有意义的翻译,但性能因模型家族和提示方式而异。基于检索的少样本提示始终优于零样本提示,而超出小检索上下文的增益仍然有限。结果表明,该设置下的评估结论在很大程度上取决于指标选择和失败处理方式,因此本文将该语料库既作为数据集贡献,也作为濒危语言机器翻译的可复现评估测试平台。

English Abstract

We present the first Komi-Yazva--Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 narrative texts and is accompanied by documented provenance, sentence-level alignment, and story identifiers that enable leakage-aware evaluation. We use this setup to compare modern large language models on Komi-Yazva-to-Russian translation under severe parallel-data scarcity in zero-shot and retrieval-based few-shot regimes. The protocol includes story-level cross-validation, deterministic retrieval for few-shot prompting, strict validation of generated outputs, complementary reference-based and judge-based metrics, and story-level uncertainty estimates. Across models, LLMs produce non-trivial translations, but performance varies strongly by model family and prompting regime. Retrieval-based few-shot prompting consistently improves over zero-shot prompting, while gains beyond a small retrieved context remain limited. The results show that evaluative conclusions in this setting depend materially on metric choice and failure handling, so the paper frames the corpus as both a dataset contribution and a reproducible evaluation testbed for endangered-language machine translation.
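Two parts of the protocol are easy to get subtly wrong:
- splitting by story rather than by sentence, so sentences from one narrative never leak between train and test;
- making few-shot retrieval deterministic.

This minimal sketch assumes each pair is a dict with hypothetical keys `id`, `story_id`, and `src`. Its retrieval score, character-trigram Jaccard similarity with id tie-breaking, is an assumption; the paper's retriever may differ.

```python
from collections import defaultdict


def story_folds(pairs, n_folds):
    """Assign whole stories to folds so that no story spans train and test."""
    stories = sorted({p["story_id"] for p in pairs})
    fold_of = {s: i % n_folds for i, s in enumerate(stories)}
    folds = defaultdict(list)
    for p in pairs:
        folds[fold_of[p["story_id"]]].append(p)
    return [folds[i] for i in range(n_folds)]


def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}


def char_overlap(a, b):
    """Jaccard similarity of character trigram sets."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / max(1, len(ga | gb))


def retrieve_examples(source, pool, k):
    """Deterministic few-shot retrieval: rank by similarity, break ties by id."""
    ranked = sorted(pool, key=lambda p: (-char_overlap(source, p["src"]), p["id"]))
    return ranked[:k]
```

In a cross-validation loop, `pool` would contain only pairs from the training folds, so retrieved examples never come from the test story.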

2606.06418 2026-06-05 cs.LG cs.AI cs.SY eess.SY

Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

双重预处理 (DoPr):针对测试时性能而非验证损失的优化

Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang, Nikolai Matni, Max Simchowitz

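Affiliations: University of California, Berkeley; Stanford University; University of Cambridge; DeepMind; Google Research

AI Summary: Proposes double preconditioning, an optimization paradigm that combines gradient-wise and activation-wise preconditioning. It mitigates test-time feedback, the mismatch between training/validation loss and downstream metrics in settings such as autoregressive language modeling. It improves test-time performance without necessarily improving validation loss.

AI-generated Chinese Abstract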

深度学习的许多现代应用涉及通过一步预测损失(例如,$L^2$回归、交叉熵)训练神经网络,但部署时沿着其自身预测进行展开。关键例子包括自回归语言建模、基于流的生成建模和机器人策略学习。已有充分证据表明,这些设置会引发我们称为测试时反馈(TTF)的现象:训练/验证损失与下游感兴趣指标(如任务成功率和生成质量)之间的不匹配,且随任务长度增长。虽然数据整理、架构和目标设计已被提出用于对抗TTF设置中的训练-测试偏移,但本文提出优化作为缓解误差累积的新设计轴。具体而言,我们引入了一种称为双重预处理(DoPr)的新优化范式,专门针对TTF的挑战。DoPr将梯度级预处理(如Adam和Muon中的)与激活级预处理(AP)(如KFAC中的)相结合。我们表明,添加AP可以在各种TTF设置中作为一种即插即用的干预手段,提高下游模型性能。有趣的是,这些测试时性能的提升并不总是伴随验证损失的改善,这为如何正确评估使用一步监督目标训练的模型提出了新问题。

English Abstract

Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, and robot policy learning. It is well-documented that these settings induce a phenomenon we call test-time feedback (TTF): the mismatch between the training/validation loss and downstream metrics of interest, such as task success rate and generation quality, which grows with task length. While data curation, architecture, and objective design have been proposed to combat train-test shift in TTF settings, this paper proposes optimization as a new design axis to mitigate error accumulation. Specifically, we introduce a new optimization paradigm called double-preconditioning (DoPr) uniquely tailored to the challenges of TTF. DoPr combines gradient-wise preconditioning, as in Adam and Muon, with activation-wise preconditioning (AP), such as in KFAC. We show that the addition of AP yields a drop-in intervention for increasing downstream model performance across a range of TTF settings. Interestingly, these gains in test-time performance do not consistently accompany improvements in validation loss, opening new questions about how to properly evaluate models trained with one-step supervised objectives.
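This is a minimal sketch of double preconditioning for one linear layer. It assumes:
- as in KFAC, the activation-wise part right-multiplies the weight gradient by the inverse of the damped input second-moment matrix $E[aa^\top] + \lambda I$;
- Adam then supplies the gradient-wise part.

The composition order, damping value, and function names are hypothetical. The paper may combine the two parts differently, and Muon could replace Adam.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def activation_precondition(grad, activations, damping=1e-3):
    """KFAC-style activation-wise preconditioning of a linear layer's gradient.

    grad: out x in matrix (list of rows); activations: batch of input vectors.
    Returns grad @ (E[a a^T] + damping * I)^{-1}; the matrix is symmetric,
    so each row can be solved independently.
    """
    n, d = len(activations), len(activations[0])
    A = [[sum(a[i] * a[j] for a in activations) / n + (damping if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    return [solve(A, row) for row in grad]


def adam_step(W, g, state, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Gradient-wise preconditioning: a standard Adam update, elementwise."""
    state["t"] += 1
    t = state["t"]
    for i, row in enumerate(g):
        for j, gij in enumerate(row):
            state["m"][i][j] = b1 * state["m"][i][j] + (1 - b1) * gij
            state["v"][i][j] = b2 * state["v"][i][j] + (1 - b2) * gij * gij
            m_hat = state["m"][i][j] / (1 - b1 ** t)
            v_hat = state["v"][i][j] / (1 - b2 ** t)
            W[i][j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return W


def double_preconditioned_step(W, grad, activations, state, lr=1e-2):
    """Hypothetical composition: activation-wise first, then gradient-wise."""
    return adam_step(W, activation_precondition(grad, activations), state, lr=lr)
```

The activation-wise step rescales gradient directions by the input statistics: inputs with large variance get proportionally smaller updates. Adam alone cannot do this, because it only sees per-parameter gradient magnitudes.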

2606.06416 2026-06-05 cs.AI cs.CL cs.LG cs.MA

Unsupervised Skill Discovery for Agentic Data Analysis

面向智能体数据分析的无监督技能发现

Zhisong Qiu, Kangqi Song, Shengwei Tang, Shuofei Qiao, Lei Liang, Huajun Chen, Shumin Deng

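Affiliations: University of Science and Technology of China

AI Summary: Proposes DataCOPE, a framework that discovers reusable data-analysis skills from exploration trajectories under unsupervised verifier guidance. It improves mean scores by 9.71% on report-style and 32.30% on reasoning-style analysis tasks.

Comments
Work in progress
AI-generated Chinese Abstract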

推理时技能增强通过注入可复用的程序性知识而不更新模型参数,为改进数据分析智能体提供了一种轻量级方法。然而,发现有效的数据分析技能仍然具有挑战性,因为可靠的监督成本高昂,且成功标准因分析格式而异。这提出了一个关键问题:如何仅从无标签探索中发现可复用的数据分析技能。我们提出DataCOPE,一种面向数据分析智能体的无监督验证器引导的技能发现框架。DataCOPE从探索轨迹中提取验证器信号,并利用这些信号表征轨迹间的相对质量或一致性。它迭代地协调一个数据分析智能体用于轨迹生成、一个无监督验证器用于信号提取、以及一个技能管理器用于对比式技能蒸馏。对于报告式分析,我们将验证器实例化为自适应检查表验证器,该验证器推导任务特定标准,通过可验证覆盖率对报告评分,并迭代优化检查表。对于推理式分析,我们将其实例化为答案一致性验证器,该验证器根据答案一致性对轨迹分组,并使用自一致性作为辅助信号。我们在Deep Data Research的报告式分析和DABStep的推理式分析上评估DataCOPE。在两种设置下,DataCOPE在保留任务上持续优于基线。在四种模型设置上平均,DataCOPE在报告式和推理式任务上分别将平均得分提高了9.71%和32.30%。

English Abstract

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success criteria vary across analytical formats. This raises the key question of how to discover reusable data-analysis skills from unlabeled exploration alone. We propose DataCOPE, an unsupervised verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from the exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. Across both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% and 32.30% on report-style and reasoning-style tasks respectively.
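The Answer Agreement Verifier can be sketched as majority voting over normalized final answers. The majority's vote share serves as a self-consistency signal. Trajectories that agree or disagree with the majority form contrastive groups. The sketch assumes each trajectory is a dict with a hypothetical `answer` key and uses a simple case/whitespace normalization. The paper's grouping and skill-distillation steps are richer than this.

```python
from collections import Counter


def normalize(answer):
    """Case- and whitespace-insensitive normalization of a final answer."""
    return " ".join(str(answer).lower().split())


def answer_agreement(trajectories):
    """Group trajectories by normalized answer.

    Returns the majority answer, its vote share (a self-consistency score),
    and a per-trajectory flag saying whether it agrees with the majority.
    Ties keep the answer encountered first (Counter.most_common ordering).
    """
    answers = [normalize(t["answer"]) for t in trajectories]
    majority, votes = Counter(answers).most_common(1)[0]
    agrees = [a == majority for a in answers]
    return majority, votes / len(answers), agrees


def contrastive_split(trajectories):
    """Split trajectories into agreeing and disagreeing groups, e.g. as
    positive/negative examples for a hypothetical skill-distillation prompt."""
    _, _, agrees = answer_agreement(trajectories)
    agreeing = [t for t, ok in zip(trajectories, agrees) if ok]
    disagreeing = [t for t, ok in zip(trajectories, agrees) if not ok]
    return agreeing, disagreeing
```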

2606.06390 2026-06-05 cs.CV cs.AI

HomeWorld: A Unified Floorplan-to-Furnished Framework for Generating Controllable, Densely Interactive Whole-Home Scenes

HomeWorld:一个统一的从平面图到家具的框架,用于生成可控、密集交互的全屋场景

Wenbo Li, Xiaoliang Ju, Zipeng Qin, Rongyao Fang, Hongsheng Li

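Affiliations: Ace Robotics; CUHK MMLab; Shenzhen Loop Area Institute

AI Summary: Proposes a unified hierarchical framework for controllable, highly realistic whole-home scene generation:
- trains a large language model on a large-scale real floorplan dataset to generate whole-home floorplans;
- combines image generation models with a VLM refiner to produce furniture and small-object layouts;
- attaches physical attributes, textures, and lighting.

AI-generated Chinese Abstract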

室内场景生成对于机器人仿真和现代室内设计至关重要。然而,复杂的布局加上稀缺的3D场景数据使得基于学习的生成具有挑战性。现有方法通常依赖手工规则或关注孤立子任务(例如平面图合成或单房间家具布置),生成的全屋场景缺乏全局连贯性、真实感和仿真就绪性。为缓解这些限制,我们提出一个统一的分层框架,将室内场景合成分解为可控阶段。首先,我们整理了一个包含30万真实住宅平面图的大规模数据集,用于训练一个全屋平面图生成的大语言模型。通过详细描述和基于K-D树的表示,我们的方法实现了细粒度、可控的全屋平面图生成。基于生成的全屋平面图,我们利用图像生成模型从多级漫游视角草拟家具布局,然后生成不同支撑表面(例如橱柜、书桌和餐桌)上可操作小物体的布局,用于具身AI仿真。在家具和物体布局生成过程中,一个基于VLM的优化器迭代修正家具和物体放置,而一个3D生成模型则允许灵活替换单个资产。我们进一步附加基本物理属性和简单表面纹理与光照设置,以完成用于具身AI的流水线。实验和用户研究表明,我们的流水线生成的室内空间具有更大的布局多样性和更强的3D设计吸引力,在定量和定性指标上均优于先前方法。最后,除了生成流水线,我们还将向社区发布平面图数据集和5000个完全家具化的场景。项目页面:https://kairos-homeworld.github.io/

English Abstract

Indoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-based generation challenging. Existing methods often rely on hand-crafted rules or focus on isolated sub-tasks (e.g., floorplan synthesis or single-room furnishing), producing whole-home scenes that lack global coherence, realism, and simulation readiness. To mitigate these limitations, we propose a unified hierarchical framework that decomposes indoor scene synthesis into controllable stages. First, we curate a large-scale dataset of 300K real residential floorplans to train a large language model for whole-home floorplan generation. With detailed descriptions and a K-D tree-based representation, our method enables fine-grained, controllable whole-home floorplan generation. Building upon the generated whole-home floorplan, we leverage image generation models to draft furniture layouts from multi-level roaming viewpoints, and then generate the layouts of small manipulable objects on different supporting surfaces (e.g., cabinets, desks, and dining tables) for embodied AI simulation. During furniture and object layout generation, a VLM-based refiner iteratively corrects furniture and object placement, and a 3D generative model enables flexible replacement of individual assets. We further attach basic physical attributes and simple surface texture and lighting setups to complete the pipeline for embodied AI use. Experiments and user studies demonstrate that our pipeline produces indoor spaces with greater layout diversity and stronger 3D design appeal, outperforming prior methods on both quantitative and qualitative metrics. Finally, alongside our generation pipeline, we will release the floorplan dataset and 5K fully furnished scenes to the community. Project Page: https://kairos-homeworld.github.io/
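A K-D tree can represent a floorplan by recursively splitting a rectangle along the x or y axis, with rooms at the leaves. The sketch below is a hypothetical minimal version of such a representation, using axis-aligned splits and one room per leaf. The paper's actual representation, and how the LLM serializes it, are not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Node:
    axis: Optional[str] = None      # "x" or "y" for a split; None marks a leaf room
    at: float = 0.0                 # split coordinate along `axis`
    left: Optional["Node"] = None   # part below / left of the split
    right: Optional["Node"] = None  # part above / right of the split
    room: str = ""                  # room label, used only at leaves


def rooms(node: Node, box: Box) -> List[Tuple[str, Box]]:
    """Return (label, box) for every leaf room of a K-D split tree."""
    x0, y0, x1, y1 = box
    if node.axis is None:
        return [(node.room, box)]
    if node.axis == "x":
        return (rooms(node.left, (x0, y0, node.at, y1))
                + rooms(node.right, (node.at, y0, x1, y1)))
    return (rooms(node.left, (x0, y0, x1, node.at))
            + rooms(node.right, (x0, node.at, x1, y1)))
```

Because every split partitions its parent exactly, room areas always sum to the footprint and rooms never overlap. Edits such as moving a wall reduce to changing one `at` value.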

2606.06385 2026-06-05 cs.LG

Learned Response-Field Inertia Operator for HEC-RAS 2D Water-Surface Elevation Prediction

用于 HEC-RAS 2D 水面高程预测的学习型响应场惯性算子

Edward Holmberg, Elias Ioup, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Julian Simeonov

Affiliations: Canizaro Livingston Gulf States Center for Environmental Informatics, Department of Computer Science, The University of New Orleans; Center for Geospatial Sciences, Naval Research Laboratory; Ocean Sciences Division, Naval Research Laboratory

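AI summary: Proposes the Learned Response-Field Inertia Operator (LRFIO), an increment-based, no-forcing learned surrogate that calibrates an inertial response operator from solved HEC-RAS trajectories and performs closed-form rollout prediction on the native nonuniform mesh, achieving water-surface elevation prediction across datasets and demonstrating adaptive complexity control.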

Comments
Preprint manuscript prepared using IEEEtran journal format
AI-generated Chinese abstract

本文对用于 HEC-RAS 2D 中求解器一致的水面高程(WSE)预测的学习型原生网格代理模型进行了跨数据集评估。为避免栅格重映射误差和信息访问混淆,代理模型直接在原始非均匀计算单元上评估,并采用显式策略分离静态项目输入、当前水力状态、项目输入强迫、校准衍生量以及未来求解器输出目标。我们引入了学习型响应场惯性算子(LRFIO),这是一种无强迫输入、基于增量的学习代理模型,它从已求解的 HEC-RAS 轨迹中校准惯性响应算子,并通过封闭形式的原生网格滚动推演部署保留的算子。LRFIO 评估了一个基准情形优先的响应层次结构,包括持久性、全局校准惯性和分段响应场惯性。分段、残差校正和神经化惯性被视为可学习的建模选择,仅当验证证据证明其成本合理时才保留增加的复杂度。在四个不同的 HEC-RAS 2D 基准测试中,LRFIO 对不同领域保留了不同的响应结构,展示了自适应的学习复杂度。选择器审计显示复杂度可控,最大验证遗憾为 4.30%。在部署阶段,保留算子的滚动推演耗时为 0.003 秒至 0.242 秒;基于 Beaver Bayou 实测求解耗时的比较估计,相对于 HEC-RAS 实现了约 2.75 × 10^4 倍的时域归一化加速。这些结果表明,当前的原生网格增量是一个强大的求解器条件预测支架,并且仅在有经验证据支持时才应保留增加的响应场、神经或空间复杂度。

English abstract

This article presents a cross-dataset evaluation of learned native-cell surrogate models for solver-consistent water-surface elevation (WSE) prediction in HEC-RAS 2D. To avoid raster remapping error and information-access confounding, surrogates are evaluated directly on the original nonuniform computational cells under an explicit policy that separates static project inputs, current hydraulic state, project-input forcing, calibration-derived quantities, and future solver-output targets. We introduce the Learned Response-Field Inertia Operator (LRFIO), a no-forcing, increment-based learned surrogate that calibrates an inertial response operator from solved HEC-RAS trajectories and deploys the retained operator through closed-form native-cell rollout. LRFIO evaluates a base-case-first response hierarchy consisting of persistence, global calibrated inertia, and segmented response-field inertia. Segmentation, residual correction, and neuralized inertia are treated as learnable modeling choices, with added complexity retained only when validation evidence justifies its cost. Across four diverse HEC-RAS 2D benchmarks, LRFIO retains different response structures for different domains, demonstrating adaptive learned complexity. The selector audit shows controlled complexity with a maximum validation regret of 4.30%. During deployment, retained rollout times range from 0.003 s to 0.242 s, and the Beaver Bayou measured-solve comparison gives an estimated 2.75 × 10^4 horizon-normalized speedup over HEC-RAS. These results indicate that the current native-cell increment is a strong solver-conditioned predictive scaffold and that added response-field, neural, or spatial complexity should be retained only when empirically justified.
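The abstract names a base-case-first hierarchy (persistence, global calibrated inertia, segmented response-field inertia) and a validation-gated selector, but gives no equations. The NumPy sketch below is one minimal reading under stated assumptions, not the authors' LRFIO: the inertia operator is assumed to be a coefficient a with d[t+1] ≈ a · d[t], where d[t] = h[t] − h[t−1] is the per-cell water-surface-elevation increment, fitted by least squares on a solved trajectory (one a globally, or one per segment of cells). Persistence is the special case a = 0, and the k-step rollout then has the closed form h[t+k] = h[t] + d[t] · (a + a^2 + ... + a^k). The function names, the RMSE metric, the `tol` margin, the relative-regret formula, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_inertia(h):
    """Least-squares scalar a with d[t+1] ~= a * d[t] (assumed inertia model).

    h: (T, n_cells) water-surface elevations from a solved trajectory;
    d[t] = h[t] - h[t-1] is the per-step increment. No forcing inputs are used.
    """
    d = np.diff(h, axis=0)
    num = float(np.sum(d[1:] * d[:-1]))
    den = float(np.sum(d[:-1] * d[:-1]))
    return num / den if den > 0.0 else 0.0


def fit_segmented(h, labels):
    """One inertia coefficient per cell segment (labels: (n_cells,) ints)."""
    a = np.zeros(h.shape[1])
    for g in np.unique(labels):
        mask = labels == g
        a[mask] = fit_inertia(h[:, mask])
    return a


def rollout(h_t, d_t, a, k):
    """Closed form h[t+k] = h[t] + d[t] * (a + a**2 + ... + a**k); a = 0 is persistence."""
    a = np.broadcast_to(np.asarray(a, dtype=float), np.shape(h_t))
    gain = np.sum(a[..., None] ** np.arange(1, k + 1), axis=-1)
    return h_t + d_t * gain


def val_rmse(h, a, k):
    """RMSE of k-step rollouts started from every admissible step of a trajectory."""
    sq = [np.mean((rollout(h[t], h[t] - h[t - 1], a, k) - h[t + k]) ** 2)
          for t in range(1, h.shape[0] - k)]
    return float(np.sqrt(np.mean(sq)))


def select(h_train, h_val, labels, k, tol=0.05):
    """Base-case-first: move to a more complex candidate only if it beats the
    currently retained one on validation by more than the relative margin tol."""
    candidates = [
        ("persistence", 0.0),
        ("global", fit_inertia(h_train)),
        ("segmented", fit_segmented(h_train, labels)),
    ]
    errs = {name: val_rmse(h_val, a, k) for name, a in candidates}
    chosen = "persistence"
    for name, _ in candidates[1:]:
        if errs[name] < errs[chosen] * (1.0 - tol):
            chosen = name
    best = min(errs.values())
    regret = (errs[chosen] - best) / max(best, 1e-12)  # relative validation regret
    return chosen, regret, errs


def synthetic(a_cells, d0, T):
    """Hypothetical test trajectory whose increments decay as d[t] = a**t * d0."""
    d = (a_cells ** np.arange(T)[:, None]) * d0
    return np.vstack([np.zeros_like(d0), np.cumsum(d, axis=0)])


# Two groups of cells with different (made-up) inertia coefficients.
a_cells = np.array([0.9, 0.9, 0.5, 0.5])
labels = np.array([0, 0, 1, 1])
d0 = np.array([1.0, 2.0, 3.0, 4.0])
chosen, regret, errs = select(synthetic(a_cells, d0, 30),
                              synthetic(a_cells, 2.0 * d0, 30), labels, k=3)
print(chosen, regret)  # segmented 0.0
```

On this synthetic data a single global coefficient cannot fit both cell groups, so the segmented fit recovers 0.9 and 0.5 and the selector retains it with zero regret. With noisy solver output, `tol` is what stops the selector from keeping extra structure for marginal validation gains.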

2606.06380 2026-06-05 cs.CL cs.AI cs.MA cs.NE

Emergent Language as an Approach to Conscious AI

涌现语言作为有意识AI的一种方法

Zengqing Wu, Chuan Xiao

Affiliations: University of Osaka

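AI summary: Proposes a generative methodology that uses emergent language in multi-agent reinforcement learning to study consciousness-relevant structure under minimal priors, and shows that agents can develop self-referential communication (e.g., an echo-mismatch detection circuit).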

Comments
Source code available at https://github.com/wuzengqing001225/ConsciousAI_Indexicality/
AI-generated Chinese abstract

人工系统是否有意识的问题仍然悬而未决,部分原因是现有方法要么根据理论派生的清单评估系统(判别式),要么直接工程化受意识启发的模块(架构式);两者都未能确定观察到的结构是否是人类语言先验的产物。我们提出一种生成式方法论:多智能体强化学习中的涌现语言(EL),其中智能体从最小起点(无语言、无自我概念、极少接触人类文本)出发,仅在任务压力下发展通信,确保因果可归因于任务需求而非继承的人类语言先验。我们通过讨论EL如何作为研究意识相关结构的生成工具来定位我们的方法论,包括环境复杂性的作用以及对涌现通信的解释。作为概念验证,我们在一个最小环境中实例化该方法论,并证明智能体发展出自我指涉通信,包括一个回声-不匹配检测电路,该电路并非仅由任务结构或架构预测,而是从特定的环境可供性中涌现。

English abstract

The question of whether artificial systems can be conscious remains open, in part because existing approaches either evaluate systems against theory-derived checklists (discriminative) or engineer consciousness-inspired modules directly (architectural); both leave open whether observed structures are artifacts of human language priors. We propose a generative methodology: emergent language (EL) in multi-agent reinforcement learning, where agents start from a minimal baseline (no language, no concept of self, minimal exposure to human text) and develop communication under task pressure alone, so that emergent structure is causally attributable to task demands rather than to inherited human language priors. We position our methodology by discussing how EL serves as a generative tool for studying consciousness-relevant structure, including the role of environment complexity and the interpretation of emergent communication. As a proof of concept, we instantiate this methodology in a minimal environment and show that agents develop self-referential communication, including an echo-mismatch detection circuit that is not predicted by task structure or architecture alone but emerges from a specific environmental affordance.
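As a generic illustration of communication emerging from task reward alone, the classic two-state Lewis signaling game with Roth-Erev urn reinforcement can be sketched in plain Python. This is not the authors' environment: it has no notion of self and no echo-mismatch circuit. Both agents start with uniform weights, so no signal carries any built-in meaning, and only successful rounds are reinforced. The function names, the round count, and the seeds are illustrative assumptions.

```python
import random


def train_lewis(n_states=2, rounds=20000, seed=0):
    """Roth-Erev urn learning in a Lewis signaling game (sender sees state, receiver acts)."""
    rng = random.Random(seed)
    # Uniform initial weights: no signal has any built-in meaning.
    sender = [[1.0] * n_states for _ in range(n_states)]    # sender[state][signal]
    receiver = [[1.0] * n_states for _ in range(n_states)]  # receiver[signal][action]
    for _ in range(rounds):
        state = rng.randrange(n_states)
        signal = rng.choices(range(n_states), weights=sender[state])[0]
        action = rng.choices(range(n_states), weights=receiver[signal])[0]
        if action == state:  # shared task reward: reinforce only the choices just made
            sender[state][signal] += 1.0
            receiver[signal][action] += 1.0
    return sender, receiver


def success_rate(sender, receiver):
    """Expected probability that the receiver's action matches the state."""
    n = len(sender)
    total = 0.0
    for s in range(n):
        for m in range(n):
            p_signal = sender[s][m] / sum(sender[s])
            p_correct = receiver[m][s] / sum(receiver[m])
            total += p_signal * p_correct
    return total / n


sender, receiver = train_lewis()
print(round(success_rate(sender, receiver), 3))
```

Before any training, `success_rate` is exactly 0.5 (chance). After training, each state usually ends up with its own signal, but which signal means which state varies with `seed`: the convention is shaped by the task reward rather than fixed in advance, which mirrors the attributability the abstract argues for. With more states, urn learning can settle into partial pooling, so the two-state case is the cleanest demonstration.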