arXivDaily: daily arXiv digest, updated Monday through Friday
2511.19314 2026-06-11 cs.AI cs.CL cs.LG Updated version

PRInTS: Reward Modeling for Long-Horizon Information Seeking

Jaewoo Lee, Archiki Prasad, Justin Chih-Yao Chen, Zaid Khan, Elias Stengel-Eskin, Mohit Bansal

Affiliations: University of North Carolina at Chapel Hill; University of Texas at Austin

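AI summary: Proposes PRInTS, a generative process reward model that uses dense scoring and trajectory summarization to improve tool interaction and reasoning in long-horizon information seeking, matching or surpassing frontier models on multiple benchmarks.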

Comments
ACL 2026, 19 pages, code: this https URL

English abstract

Information-seeking is a core capability for AI agents, requiring them to gather and reason over tool-generated information across long trajectories. However, such multi-step information-seeking tasks remain challenging for agents backed by language models. While process reward models (PRMs) can guide agents by ranking candidate steps at test time, existing PRMs, designed for short reasoning with binary judgment, cannot capture richer dimensions of information-seeking steps, such as tool interactions and reasoning over tool outputs, nor handle the rapidly growing context in long-horizon tasks. To address these limitations, we introduce PRInTS, a generative PRM trained with dual capabilities: (1) dense scoring based on the PRM's reasoning across multiple dimensions of step quality (e.g., interpretation of tool outputs, tool call informativeness) and (2) trajectory summarization that compresses the growing context while preserving essential information for step evaluation. Extensive evaluations across FRAMES, GAIA (levels 1-3), and WebWalkerQA (easy-hard) benchmarks on multiple models reveal that best-of-n sampling with PRInTS enhances information-seeking in open-source models as well as specialized agents, matching or surpassing frontier models with a much smaller backbone agent and outperforming other strong reward modeling baselines.
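The test-time guidance described above (score several candidate next steps, keep the best, and carry a running summary instead of the full history) can be sketched as a small control loop. This is a minimal Python sketch, assuming the PRM's per-dimension judgments are exposed as a dictionary of numeric scores; `Candidate`, `propose`, `scorer`, `summarize`, and the dimension names are hypothetical stand-ins, not the PRInTS interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    action: str  # e.g. a proposed tool call such as "search(...)"


# Hypothetical interfaces: in PRInTS these roles are played by the agent and a generative PRM.
Proposer = Callable[[str, int], List[Candidate]]
Scorer = Callable[[str, Candidate], Dict[str, float]]  # per-dimension step-quality scores
Summarizer = Callable[[str, Candidate], str]


def dense_score(dims: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse per-dimension scores into one number (unlisted dimensions get weight 1)."""
    return sum(weights.get(name, 1.0) * value for name, value in dims.items())


def best_of_n_step(summary: str, candidates: List[Candidate], scorer: Scorer,
                   weights: Dict[str, float]) -> Tuple[Candidate, float]:
    """Score every candidate against the compressed context; keep the highest (first on ties)."""
    scores = [dense_score(scorer(summary, c), weights) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


def run_episode(question: str, propose: Proposer, scorer: Scorer, summarize: Summarizer,
                weights: Dict[str, float], n: int = 4, max_steps: int = 3) -> List[Candidate]:
    summary = question
    trajectory: List[Candidate] = []
    for _ in range(max_steps):
        chosen, _ = best_of_n_step(summary, propose(summary, n), scorer, weights)
        trajectory.append(chosen)
        # The summary stands in for the growing trajectory instead of concatenating every step.
        summary = summarize(summary, chosen)
    return trajectory
```

Keeping the scorer's output per dimension, rather than a single binary verdict, is what lets the weights shift which step wins.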

2511.00044 2026-06-11 cs.LG nlin.AO Updated version

Time-multiplexed layer reuse for physical neural networks

Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki

Affiliations: Graduate School of Information Science and Technology, The University of Tokyo

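AI summary: Targeting the bottleneck of slow weight adjustment in physical neural networks, proposes TIDAL-Net, which increases effective depth through time-multiplexed layers and improves performance on image classification and natural language processing tasks.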

English abstract

Physical neural networks (PNNs) are promising candidates for next-generation computing, but existing demonstrations remain several orders of magnitude smaller than modern digital neural networks, whose recent advances have been driven by rapid growth in trainable parameters. This situation resembles the constraints of early digital neural networks, which led to ideas around parameter reuse. We investigate what similarly efficient hardware architectures may look like, focusing specifically on the common bottleneck of slow re-adjustment of the weights in PNNs. We propose the Time-Indexed Deep Alternating Layers Network (TIDAL-Net), which occupies an intermediate regime between recurrent and deep neural networks, specifically aimed at the scales and restrictions of common PNN prototypes. TIDAL-Net leverages the timescale separation found in many PNNs between fast forward dynamics and slowly trainable weights and biases, using layer-by-layer time multiplexing to increase effective depth while limiting implementation cost. Numerical experiments on image classification and natural language processing tasks show that TIDAL-Net improves performance with only minor modifications to conventional PNNs.
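The core idea, reusing a small set of slowly reprogrammable layers across fast time steps to gain depth, can be sketched as a round-robin schedule. The Python below is a hypothetical illustration, assuming K physical layers applied in the order 0, 1, ..., K-1, 0, 1, ... with a ReLU nonlinearity; the actual TIDAL-Net indexing scheme and the physical nonlinearity are not specified in the abstract.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def tidal_forward(x: Vector, layers: Sequence[Matrix], biases: Sequence[Vector],
                  time_steps: int) -> Tuple[Vector, List[int]]:
    """Run time_steps layer applications while reusing only len(layers) physical layers.

    The weights and biases stay fixed (slow to reprogram); only the index of the layer that
    processes the signal changes from one fast time step to the next.
    """
    if len(layers) != len(biases):
        raise ValueError("each physical layer needs a bias vector")
    h = list(x)
    schedule: List[int] = []
    for t in range(time_steps):
        k = t % len(layers)  # hypothetical round-robin alternation between physical layers
        h = relu([a + b for a, b in zip(matvec(layers[k], h), biases[k])])
        schedule.append(k)
    return h, schedule
```

In this sketch only `len(layers)` weight sets need to exist in hardware, while the effective depth equals `time_steps`.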

2511.13207 2026-06-11 cs.RO cs.CV Updated version

PIGEON: VLM-Driven Object Navigation via Points of Interest Selection

Cheng Peng, Zhenzhe Zhang, Xiaobao Wei, Yanhao Zhang, Heng Wang, Pengwei Wang, Zhongyuan Wang, Cheng Chi, Shanghang Zhang, Jing Liu

Affiliations: Institute of Automation, Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence (BAAI); Peking University; School of Artificial Intelligence, University of Chinese Academy of Sciences

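AI summary: Proposes PIGEON, which formulates object navigation as a sparse decision problem grounded in raw observations, uses Points of Interest (PoIs) as visual decision units from which a VLM selects the key ones, achieves zero-shot state-of-the-art performance, and transfers to Active Embodied Question Answering.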

English abstract

Object navigation in unseen indoor environments requires agents to perform semantic search under partial observability. Vision-language models (VLMs) provide strong semantic-spatial priors for this task, but how to interface them with robot navigation remains challenging: dense VLM inference is expensive, while abstracting environments into symbolic memories often separates high-level reasoning from the raw visual evidence that supports it. We propose PIGEON (Point of Interest Guided Exploration for Object Navigation), a VLM-driven framework that formulates object navigation as a raw-observation-grounded sparse decision problem. PIGEON introduces Points of Interest (PoIs) as sparse visual decision units that couple geometrically executable waypoints with raw egocentric observations. Rather than using VLMs as dense controllers or restricting them to frontier ranking, PIGEON enables VLMs to select among task-critical PoIs, including exploration frontiers, suspected target objects, traversable stairs, and floor-level summaries, while low-level planners execute continuous motion between them. This PoI interface further makes high-level navigation decisions verifiable, allowing us to develop an RLVR pipeline that improves local VLMs without manual Chain-of-Thought annotations. Extensive experiments on Habitat ObjectNav benchmarks show that PIGEON achieves state-of-the-art zero-shot performance, scales consistently with foundation model capacity, and transfers to Active Embodied Question Answering with only prompt modifications. Real-world deployments on physical robots further demonstrate its robustness and efficiency.
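To make the PoI interface concrete, the sketch below represents each PoI as a typed waypoint paired with a handle to its raw observation, parses a VLM's textual choice, and computes a checkable reward for that choice. The reward rule (1.0 if the chosen waypoint is the candidate closest to the goal by straight-line distance) and all names are hypothetical; the abstract says only that PoI decisions are verifiable, not how PIGEON's RLVR reward is defined.

```python
import math
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class PoIKind(Enum):
    FRONTIER = "frontier"
    SUSPECTED_TARGET = "suspected_target"
    STAIRS = "stairs"
    FLOOR_SUMMARY = "floor_summary"


@dataclass
class PoI:
    kind: PoIKind
    waypoint: Tuple[float, float]  # reachable position handed to the low-level planner
    image_id: str                  # handle to the raw egocentric observation shown to the VLM


def parse_choice(vlm_output: str, pois: List[PoI]) -> Optional[int]:
    """Take the first integer in the VLM answer as a PoI index; None if absent or out of range."""
    match = re.search(r"\d+", vlm_output)
    if match is None:
        return None
    idx = int(match.group())
    return idx if 0 <= idx < len(pois) else None


def verifiable_reward(vlm_output: str, pois: List[PoI], goal: Tuple[float, float]) -> float:
    """Hypothetical rule: reward 1.0 iff the chosen PoI is the one closest to the goal."""
    idx = parse_choice(vlm_output, pois)
    if idx is None:
        return 0.0  # malformed or invalid answers earn nothing
    dists = [math.dist(p.waypoint, goal) for p in pois]
    return 1.0 if dists[idx] == min(dists) else 0.0
```

Because the VLM only picks an index from a finite menu, the reward can be computed automatically, which is what an RLVR-style pipeline needs in place of hand-written chain-of-thought labels.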

2511.08195 2026-06-11 cs.CV Updated version

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiale Cheng, Xiaotao Gu, Jie Tang

Affiliations: Zhejiang University

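AI summary: Recasts UI-screenshot-to-code generation as an interactive visual optimization problem and uses RVPO, a preference-based reinforcement learning method that optimizes relative visual rankings, achieving state-of-the-art results on UI drafting, polishing, and editing tasks.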

Comments
27 pages

English abstract

UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI development that is inherently iterative and feedback-driven. We reformulate UI-to-code as an interactive visual optimization problem, where code generation is embedded in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback. To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback. We instantiate this paradigm in UI2Code^N, an open-source 9B model trained via continual pre-training, supervised fine-tuning, and reinforcement learning. Experiments demonstrate state-of-the-art performance on UI drafting, UI polishing, and UI editing benchmarks, even outperforming larger models, with performance consistently improving through iterative visual optimization. Our code and models are available at this https URL.
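RVPO's exact objective is not given in the abstract. As a rough, hypothetical illustration of optimizing relative rather than absolute visual scores, the Python below turns noisy per-candidate similarity scores for n rendered candidates into zero-mean, rank-based advantages, in the spirit of group-relative policy-gradient methods; the scores themselves would come from some visual evaluator that is assumed here.

```python
from typing import List


def rank_advantages(scores: List[float]) -> List[float]:
    """Map n candidate scores to advantages in [-1, 1] that depend only on their ordering.

    Because only ranks matter, any monotone distortion of a noisy evaluator's scale leaves
    the advantages unchanged. Ties are broken by candidate index (sorted() is stable).
    """
    n = len(scores)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank
    center = (n - 1) / 2
    return [(r - center) / center for r in ranks]
```

For example, scores `[0.2, 0.9, 0.5]` yield advantages `[-1.0, 1.0, 0.0]`, and cubing the scores first gives the same result.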

2411.12193 2026-06-11 stat.AP cs.LG stat.ML Updated version

Hierarchical Probabilistic Conformal Prediction for Distributed Energy Resources Adoption

Wenbin Zhou, Shixiang Zhu

Affiliations: Carnegie Mellon University

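AI summary: Addressing uncertainty and the hierarchical grid structure in forecasting distributed energy resource adoption, proposes an uncertainty quantification framework based on a multivariate Hawkes process and split conformal prediction that keeps statistical validity under aggregation and outperforms baselines on Indianapolis data.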

English abstract

The rapid growth of distributed energy resources (DERs) presents both opportunities and operational challenges for electric grid management. Accurately predicting DER adoption is critical for proactive infrastructure planning, but the inherent uncertainty and spatial disparity of DER growth complicate traditional forecasting approaches. Moreover, the hierarchical structure of distribution grids demands that predictions satisfy statistical guarantees at both the circuit and substation levels, a non-trivial requirement for reliable decision-making. In this paper, we propose a novel uncertainty quantification framework for DER adoption predictions that ensures validity across hierarchical grid structures. Leveraging a multivariate Hawkes process to model DER adoption dynamics and a tailored split conformal prediction algorithm, we introduce a new nonconformity score that preserves statistical guarantees under aggregation while maintaining prediction efficiency. We establish theoretical validity under mild conditions and demonstrate through empirical evaluation on customer-level solar panel installation data from Indianapolis, Indiana that our method consistently outperforms existing baselines in both predictive accuracy and uncertainty calibration.
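Two ingredients named in the abstract can be sketched with the standard library: the conditional intensity of a multivariate Hawkes process with an exponential kernel, and a plain split conformal interval. The Python below is a generic sketch, not the paper's method; in particular, the paper's aggregation-preserving nonconformity score is not described in the abstract, so the example simply calibrates on absolute residuals at whichever level (circuit or substation) an interval is needed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def hawkes_intensity(t: float, mu: Sequence[float], alpha: Sequence[Sequence[float]],
                     beta: float, events: Dict[int, List[float]]) -> List[float]:
    """lambda_i(t) = mu_i + sum_j sum_{s in events[j], s < t} alpha[i][j] * beta * exp(-beta * (t - s))."""
    rates = []
    for i in range(len(mu)):
        excitation = sum(
            alpha[i][j] * beta * math.exp(-beta * (t - s))
            for j, times in events.items() for s in times if s < t
        )
        rates.append(mu[i] + excitation)
    return rates


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score (infinite if that exceeds n)."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)  # small guard against floating-point round-up
    return math.inf if k > n else sorted(scores)[k - 1]


def split_conformal_interval(cal_true: Sequence[float], cal_pred: Sequence[float],
                             test_pred: float, alpha: float) -> Tuple[float, float]:
    """Marginal (1 - alpha) interval from absolute-residual scores on a held-out calibration set."""
    q = conformal_quantile([abs(y - p) for y, p in zip(cal_true, cal_pred)], alpha)
    return test_pred - q, test_pred + q
```

With nine calibration residuals 1, 2, ..., 9 and alpha = 0.2, k = ceil(10 × 0.8) = 8, so the interval is the point prediction plus or minus 8.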

2511.09789 2026-06-11 cs.LG Updated version

CaReTS: A Multi-Task Framework Unifying Classification and Regression for Time Series Forecasting

Fulong Yao, Wanqing Zhao, Chao Zheng, Xiaofei Han

Affiliations: Cardiff University; Newcastle University; University of Leeds

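AI summary: Proposes CaReTS, a multi-task framework whose dual-stream architecture jointly classifies trends and regresses deviations, producing accurate and interpretable forecasts that outperform existing methods on real-world datasets.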

English abstract

Recent advances in deep forecasting models have achieved remarkable performance, yet most approaches still struggle to provide both accurate predictions and interpretable insights into temporal dynamics. This paper proposes CaReTS, a novel multi-task learning framework that combines classification and regression tasks for multi-step time series forecasting problems. The framework adopts a dual-stream architecture, where a classification branch learns the stepwise trend into the future, while a regression branch estimates the corresponding deviations from the latest observation of the target variable. The dual-stream design provides more interpretable predictions by disentangling macro-level trends from micro-level deviations in the target variable. To enable effective learning in output prediction, deviation estimation, and trend classification, we design a multi-task loss with uncertainty-aware weighting to adaptively balance the contribution of each task. Furthermore, four variants (CaReTS1–4) are instantiated under this framework to incorporate mainstream temporal modelling encoders, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and Transformers. Experiments on real-world datasets demonstrate that CaReTS outperforms state-of-the-art (SOTA) algorithms in forecasting accuracy, while achieving higher trend classification performance.
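The dual-stream targets and the uncertainty-weighted loss can be sketched as follows. This Python is hypothetical: the three-way trend labels with a dead band `eps`, the rule for recombining a trend class with a deviation magnitude, and the loss form exp(-s_i) * L_i + s_i (a commonly used simplification of homoscedastic-uncertainty weighting, with s_i a learned log-variance) are assumptions, since the abstract does not give CaReTS's exact definitions.

```python
import math
from typing import List, Sequence, Tuple


def make_targets(last_obs: float, future: Sequence[float],
                 eps: float) -> Tuple[List[int], List[float]]:
    """Per-step trend labels (+1 up, 0 flat, -1 down) and deviations from the latest observation."""
    deviations = [y - last_obs for y in future]
    trends = [1 if d > eps else (-1 if d < -eps else 0) for d in deviations]
    return trends, deviations


def combine(last_obs: float, trends: Sequence[int], magnitudes: Sequence[float]) -> List[float]:
    """Rebuild a forecast: the classifier picks the direction, the regressor the size."""
    return [last_obs + t * abs(m) for t, m in zip(trends, magnitudes)]


def uncertainty_weighted_loss(task_losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """sum_i exp(-s_i) * L_i + s_i; a large s_i down-weights task i but is itself penalized."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))
```

With the last observation at 10.0 and future values 11.0, 10.2, 8.0 (dead band 0.5), the targets are trends `[1, 0, -1]` and deviations of roughly `[1.0, 0.2, -2.0]`.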

2511.08299 2026-06-11 cs.RO Updated version

Phase-Based Multi-Gait Learning for a Salamander-Like Robot

Zhiang Liu, Yang Liu, Yongchun Fang, Xian Guo

Affiliations: Nankai University

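AI summary: Proposes a phase-based, reference-free motion learning framework that combines phase variables, a phase coverage reward, and morphological-symmetry data augmentation, enabling a salamander-like robot to autonomously acquire 22 dynamic, symmetric gaits.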

English abstract

Salamander-like robots are designed with inspiration from the skeletal structure of their biological counterparts. However, existing controllers cannot fully exploit these morphological features and largely rely on predefined patterns or joint trajectories, which prevents the generation of diverse and flexible gaits and limits their applicability in real-world scenarios. In this paper, we propose a phase-based learning framework that enables the robot to acquire a diverse repertoire of gaits without using reference motions. Each body part is controlled by a phase variable capable of forward and backward evolution, with a phase coverage reward to promote the exploration of the leg phase space. Additionally, morphological symmetry of the robot is incorporated via data augmentation, improving sample efficiency and enforcing both motion-level and task-level symmetry in learned behaviors. Extensive experiments show that the robot successfully acquires 22 representative gaits exhibiting both dynamic and symmetric movements, demonstrating the effectiveness of the proposed learning framework.
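The phase-variable mechanism, the phase coverage reward, and symmetry-based augmentation can each be illustrated in a few lines of Python. Everything here is a hypothetical sketch: the leg names, the binning of the phase circle, and the choice to negate only a lateral velocity under mirroring are assumptions, not details from the abstract.

```python
import math
from typing import Dict, Iterable, Tuple

TWO_PI = 2.0 * math.pi


def step_phase(phi: float, omega: float, dt: float) -> float:
    """Advance a phase by omega * dt; a negative omega runs the phase backward.

    Python's % with a positive modulus always returns a value in [0, 2*pi).
    """
    return (phi + omega * dt) % TWO_PI


def phase_coverage_reward(phase_history: Iterable[float], n_bins: int) -> float:
    """Fraction of equal-width bins on the phase circle visited at least once."""
    visited = {min(int(p / TWO_PI * n_bins), n_bins - 1) for p in phase_history}
    return len(visited) / n_bins


# Hypothetical left/right pairing of the four legs (front/hind).
MIRROR = {"LF": "RF", "RF": "LF", "LH": "RH", "RH": "LH"}


def mirror_transition(leg_values: Dict[str, float],
                      lateral_velocity: float) -> Tuple[Dict[str, float], float]:
    """Create a mirrored training sample: swap left/right legs and flip the lateral direction."""
    mirrored = {MIRROR[leg]: value for leg, value in leg_values.items()}
    return mirrored, -lateral_velocity
```

Each real transition can be paired with its mirror image, which doubles the data and nudges the policy toward left-right symmetric behavior.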

2511.07332 2026-06-11 cs.LG cs.AI Updated version

Grounding Computer Use Agents on Human Demonstrations

Aarash Feizi, Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin, Kaixin Li, Rabiul Awal, Xing Han Lù, Johan Obando-Ceron, Juan A. Rodriguez, Nicolas Chapados, David Vazquez, Adriana Romero-Soriano, Reihaneh Rabbany, Perouz Taslakian, Christopher Pal, Spandana Gella, Sai Rajeswar

Affiliations: Mila - Quebec AI Institute; McGill University; Université de Montréal; ServiceNow Research; University of Waterloo; University of Oxford; National University of Singapore; Polytechnique Montréal; École de Technologie Supérieure; CIFAR AI Chair

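AI summary: To address the scarcity of high-quality grounding data for desktop environments, builds GroundCUA, a dataset covering 87 applications with 56K screenshots and 3.56M human annotations, and trains GroundNext models on it, achieving state-of-the-art results on five benchmarks with less than one-tenth the training data of prior work.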

Comments
Accepted at ICLR 2026

English abstract

Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile interactions, high-quality resources for desktop environments are limited. To address this gap, we introduce GroundCUA, a large-scale desktop grounding dataset built from expert human demonstrations. It covers 87 applications across 12 categories and includes 56K screenshots, with every on-screen element carefully annotated for a total of over 3.56M human-verified annotations. From these demonstrations, we generate diverse instructions that capture a wide range of real-world tasks, providing high-quality data for model training. Using GroundCUA, we develop the GroundNext family of models that map instructions to their target UI elements. At both 3B and 7B scales, GroundNext achieves state-of-the-art results across five benchmarks using supervised fine-tuning, while requiring less than one-tenth the training data of prior work. Reinforcement learning post-training further improves performance, and when evaluated in an agentic setting on the OSWorld benchmark using o3 as the planner, GroundNext attains results comparable or superior to those of models trained with substantially more data. These results demonstrate the critical role of high-quality, expert-driven datasets in advancing general-purpose computer-use agents.
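Grounding benchmarks of this kind are commonly scored by whether a predicted click point falls inside the target element's bounding box. A minimal Python sketch of that convention follows; the `Box` type and pixel coordinates are assumptions, and the five benchmarks in the paper may use their own variants of the metric.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of an on-screen element, in pixels (edges inclusive)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def grounding_accuracy(clicks: Sequence[Tuple[float, float]], targets: Sequence[Box]) -> float:
    """Fraction of instructions whose predicted click lands inside the annotated target element."""
    if len(clicks) != len(targets):
        raise ValueError("need exactly one predicted click per target element")
    if not targets:
        return 0.0
    hits = sum(box.contains(p) for p, box in zip(clicks, targets))
    return hits / len(targets)
```

Dense per-element annotation, as in GroundCUA, is what makes a box available for every element an instruction might refer to.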

2507.23534 2026-06-11 cs.LG cs.CV Updated version

Continual Learning with Support Boundary Experience Blending

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Affiliations: National Taiwan University

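AI summary: Proposes Experience Blending, which uses differential-privacy-inspired noise to generate support boundary data and trains jointly on exemplars and this boundary data to regularize decision boundaries, improving continual learning accuracy on multiple datasets.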

English abstract

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, and 2%, respectively.
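Support boundary data is described as latent features perturbed with differential-privacy-inspired noise. One plausible reading, sketched below in Python as an assumption rather than the authors' recipe, borrows the Gaussian mechanism: clip each latent vector to an L2 norm bound C, then add isotropic Gaussian noise with standard deviation noise_multiplier * C. The function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip_to_norm(z: Sequence[float], max_norm: float) -> List[float]:
    """Scale z down so its L2 norm is at most max_norm (vectors already inside are unchanged)."""
    norm = math.sqrt(sum(v * v for v in z))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in z]


def support_boundary_sample(z: Sequence[float], max_norm: float, noise_multiplier: float,
                            rng: random.Random) -> List[float]:
    """Gaussian-mechanism-style perturbation of a latent feature (hypothetical SBD generator)."""
    clipped = clip_to_norm(z, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]
```

Clipping bounds how far any single feature can sit from the origin, so the same noise scale produces comparable perturbations for every exemplar; the perturbed copies would then be trained alongside the stored exemplars.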

2509.11575 2026-06-11 cs.AI Updated version

A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

Ching Chang, Yidan Shi, Defu Cao, Wei Yang, Jeehyun Hwang, Haixin Wang, Jiacheng Pang, Wei Wang, Yan Liu, Wen-Chih Peng, Tien-Fu Chen

Affiliations: University of California, Los Angeles; University of Southern California; National Yang Ming Chiao Tung University

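AI summary: Defines the time series reasoning problem, classifies approaches by reasoning topology into direct, linear-chain, and branch-structured reasoning, crosses these with objectives such as traditional analysis, explanation, causal inference, and generation, reviews methods, systems, datasets, and evaluation practices, and offers guidance on topology selection and deployment trade-offs.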

Comments
Accepted to Transactions on Machine Learning Research (TMLR)

English abstract

Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates. The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes. Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (this https URL). Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets. We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings. Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.

2506.17137 2026-06-11 cs.CV Updated version

Towards Conditional Feature Alignment for Cross-Domain Counting

Zhuonan Liang, Dongnan Liu, Jianan Fan, Yaxuan Song, Qiang Qu, Runnan Chen, Yu Yao, Peng Fu, Weidong Cai

Affiliations: Institute of Automation, Chinese Academy of Sciences; School of Electronic Engineering and Information Science, Beijing Institute of Technology

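AI summary: Proposes Conditional Feature Alignment (CFA), which handles density-distribution shift in cross-domain counting through label-induced conditional alignment rather than global domain invariance, yielding substantial gains in unsupervised domain adaptation and source-domain generalization.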

Comments
12 pages, 6 figures, 4 tables

English abstract

Object counting models often degrade under cross-domain deployment because density composition varies across domains and is itself task-relevant. Standard feature alignment methods tend to suppress such variation by encouraging global domain invariance, which can be harmful when source and target domains contain different proportions of background, sparse foreground, and dense foreground. We propose Conditional Feature Alignment (CFA), a cross-domain counting framework that aligns representations within label-induced conditions rather than across full marginal feature distributions. Given density annotations or pseudo-density predictions, CFA constructs foreground/background or density-level conditions and aligns only features belonging to matching conditions. We formalise this idea through a conditional divergence perspective, showing that conditional alignment removes within-condition discrepancy while preserving condition-marginal density shift. For unsupervised domain adaptation, CFA estimates source conditions from annotations and target conditions from detached pseudo-density maps, then performs condition-wise adversarial alignment with full-image consistency regularisation. For source-domain generalisation, we instantiate the same principle with MPCount by enforcing condition-wise memory-consistency between generated source-domain views. Experiments on crowd and cell counting benchmarks show competitive or improved performance across diverse UDA and DG settings. For example, on JHU-CROWD++ FH→SN, CFA-DG reduces MAE/RMSE from MPCount's 216.3/421.4 to 90.5/169.9, indicating that condition-wise alignment is especially effective under large weather- and density-induced shifts. These results suggest that condition-wise alignment is a promising design principle for domain-adaptive counting.
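The difference between marginal and conditional alignment can be shown with a toy discrepancy. The Python below buckets patches into background, sparse, and dense conditions by (pseudo-)density, then compares source and target feature means only within conditions present in both domains. Note the swap: CFA uses condition-wise adversarial alignment, whereas this sketch uses simple mean matching to keep the idea visible; the thresholds and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def assign_condition(density: float, sparse_thr: float, dense_thr: float) -> str:
    if density < sparse_thr:
        return "background"
    return "sparse" if density < dense_thr else "dense"


def mean(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def condition_means(features: Sequence[Vector], densities: Sequence[float],
                    sparse_thr: float, dense_thr: float) -> Dict[str, Vector]:
    """Group patch features by density condition and average each group."""
    buckets: Dict[str, List[Vector]] = defaultdict(list)
    for f, d in zip(features, densities):
        buckets[assign_condition(d, sparse_thr, dense_thr)].append(f)
    return {cond: mean(fs) for cond, fs in buckets.items()}


def conditional_discrepancy(src: Dict[str, Vector], tgt: Dict[str, Vector]) -> float:
    """Sum of within-condition mean gaps; conditions missing from one domain are not penalized."""
    return sum(sq_dist(src[c], tgt[c]) for c in src.keys() & tgt.keys())
```

If the source contains a dense region that the target lacks, a marginal mean-matching loss would still pull target features toward it, while the conditional discrepancy can be zero; this mirrors the abstract's point that condition-marginal density shift is preserved.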

2510.24515 2026-06-11 cs.RO Updated version

Learning Ordinal Response Policies in Rank-Based Stochastic Prize-Collecting Games

Malintha Fernando, Petter Ögren, Silun Zhang

Affiliations: KTH Royal Institute of Technology

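AI summary: Proposes Stochastic Prize-Collecting Orienteering Games (SPCOG), extending the Team Orienteering Problem to self-interested agents, uses Ordinal Rank (OR) as a strong inductive bias, and designs the Fictitious Ordinal Response Learning (FORL) algorithm to obtain convergent policies.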

English abstract

The Team Orienteering Problem (TOP) generalizes many real-world multi-agent scheduling and routing tasks that occur in autonomous mobility, aerial logistics, and surveillance applications. While many flavors of the TOP exist for planning in multi-agent systems, they assume that all the agents cooperate toward a single objective; therefore, they do not extend to settings in which agents compete in reward-scarce environments. We propose Stochastic Prize-Collecting Orienteering Games (SPCOG) as an extension of the TOP to plan in the presence of self-interested agents operating on a graph, under energy constraints and stochastic transitions. A theoretical discussion on complete and star graphs establishes that there is a unique pure Nash equilibrium in SPCOGs that coincides with the optimal routing solution of an equivalent TOP under rank-based conflict resolution. We propose the concept of Ordinal Rank (OR) as a concise representation of an agent's global rank and its location within a topological, well-defined neighborhood. Empirical evaluations conducted on real-world road-network graphs under both dynamic and stationary prize distributions show that in parameter-sharing settings, policies that leverage local information can outperform those that leverage global information when the former are conditioned on the OR rather than the global rank, indicating that the OR acts as a strong inductive bias in multi-agent games on graphs. The OR-conditioned policies also generalize much better to games with large numbers of agents than global-rank-conditioned policies. Finally, we propose Fictitious Ordinal Response Learning (FORL) as an entropy-regulated algorithm to obtain convergent policies in independent-learning settings in prize-collecting games on graphs.
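The abstract describes Ordinal Rank only as a concise combination of an agent's global rank and its location within a well-defined neighborhood. The Python below is one hypothetical instantiation, assuming the neighborhood is the set of nodes within k hops of the agent's node and that the OR counts how many agents in that neighborhood outrank it (a lower global rank is better).

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def k_hop_neighborhood(adj: Graph, start: Hashable, k: int) -> Set[Hashable]:
    """Nodes reachable from start in at most k edges (breadth-first search)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def ordinal_rank(agent: str, positions: Dict[str, Hashable], global_rank: Dict[str, int],
                 adj: Graph, k: int) -> int:
    """1 + number of other agents inside the agent's k-hop neighborhood with a better global rank."""
    hood = k_hop_neighborhood(adj, positions[agent], k)
    return 1 + sum(
        1 for other, node in positions.items()
        if other != agent and node in hood and global_rank[other] < global_rank[agent]
    )
```

On a five-node path with agents at nodes 0, 1, and 4, the agent at node 0 may be second globally yet first locally (OR = 1) when the top-ranked agent is out of reach, which is the kind of local competitive signal that can transfer across team sizes.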

2510.23320 2026-06-11 eess.AS cs.CL cs.SD Updated version

LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization

Máté Gedeon, Péter Mihajlik

Affiliations: Department of Telecommunications and Artificial Intelligence, Budapest University of Technology and Economics; Speechtex Ltd.; ELTE Research Centre for Linguistics

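AI summary: Presents LibriConvo, a synthetic conversational speech corpus of 240.1 hours built with the Speaker-Aware Simulated Conversation framework for benchmarking diarization and ASR; baselines show Sortformer outperforming pyannote for diarization and Fast Conformer-CTC outperforming Whisper for ASR.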

Comments
Accepted by TSD 2026

English abstract

We introduce LibriConvo, a synthetic conversational speech corpus for speaker diarization and automatic speech recognition (ASR), built by instantiating the previously proposed Speaker-Aware Simulated Conversation (SASC) framework in a dataset and benchmarking setting. The main contribution of this paper is a corpus construction pipeline and benchmark derived from that framework. To make the data more suitable for downstream ASR and diarization, conversational timing statistics are estimated from English CallHome using external voice activity detection, long pauses are compressed, LibriTTS utterances are grouped by book to improve local semantic continuity, and room impulse responses are selected with a spatial-plausibility heuristic. The resulting corpus contains 240.1 hours of audio across 1,496 dialogues involving 830 speakers, partitioned into speaker-disjoint train, validation, and test splits. We report baseline results for both diarization and ASR. On the test split, Sortformer outperforms the pyannote pipeline in diarization (11.1% vs. 24.4% DER). For ASR, a Fast Conformer-CTC XLarge model fine-tuned with Serialized Output Training achieves 7.29% WER and 6.97% cpWER, outperforming zero-shot Whisper-large-v3. These results position LibriConvo as a practical benchmark for studying synthetic conversational speech and for evaluating multi-speaker speech processing systems.
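Two mechanical pieces of the pipeline and its evaluation can be sketched directly: compressing long pauses between consecutive utterances, and word error rate as word-level edit distance divided by reference length. The Python below is illustrative; the fixed `max_pause` cap and the segment format are assumptions (the paper derives its timing statistics from CallHome), and cpWER, which additionally searches over speaker-to-stream assignments, is not implemented.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, sorted by start


def compress_pauses(segments: List[Segment], max_pause: float) -> List[Segment]:
    """Shorten every silence longer than max_pause to exactly max_pause.

    Overlaps (negative gaps) and short pauses are kept; later segments shift earlier.
    """
    out: List[Segment] = []
    shift = 0.0
    prev_end: Optional[float] = None
    for start, end in segments:
        if prev_end is not None and start - prev_end > max_pause:
            shift += (start - prev_end) - max_pause
        out.append((start - shift, end - shift))
        # Track the latest end on the original timeline so nested overlaps are not read as silence.
        prev_end = end if prev_end is None else max(prev_end, end)
    return out


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, "the cat sit on mat" against the reference "the cat sat on the mat" has one substitution and one deletion, giving a WER of 2/6.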

2510.22397 2026-06-11 cs.NI cs.LG Updated version

NetBurst: Event-Centric Forecasting of Bursty, Intermittent Time Series

Satyandra Guthula, Jaber Daneshamooz, Charles Fleming, Kesheng Wu, Walter Willinger, Arpit Gupta

Affiliations: University of California, Santa Barbara; Cisco Research; Lawrence Berkeley National Laboratory; Northwestern University

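AI summary: Targeting the "wild" statistics of network telemetry (rare bursts separated by long low-activity stretches), proposes NetBurst, an event-centric pipeline that collapses low-activity periods and separates burst-timing and burst-magnitude streams to learn a unified representation, clearly outperforming baselines such as Chronos-2 on forecasting error, burst-distribution match, and anomaly describability.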

English abstract

Network operators monitor their infrastructure by collecting telemetry data such as packet counts, byte rates, or flow volumes, yet answering the questions that effective operations demand (forecasting future load, diagnosing and characterizing anomalies, and searching for and retrieving historical precedents) requires more than raw measurements. Bridging this gap calls for learned representations: compact per-entity summaries that capture temporal dynamics from each entity's univariate time series. Time-series foundation models are the natural starting point, but they are designed for dense, periodic benchmark datasets, the "mild" statistical regime. However, network telemetry data inhabits the "wild" regime: operationally relevant events are rare, separated by variable-length stretches of low or no activity ("ebbs"), with intermittent bursts of heavy-tailed extremes ("tides"). We present NetBurst, an event-centric pipeline that collapses ebbs, separates each time series into a stream of burst timings and a stream of burst magnitudes, and learns a single representation serving all three operational tasks. Compared to the strongest competitors among eight baselines (including Amazon's Chronos-2 and Datadog's Toto) and across nine production telemetry configurations, NetBurst reduces median forecasting error by 1.3–116× on wild-regime data with a 1.0–7.5× better match to the true burst distribution, and matches baselines on mild-regime benchmarks. For characterizing anomalies, NetBurst produces balanced, well-spread clusters that are 16× more describable in operator-familiar terms under a novel interpretability score, and cluster-filtered search delivers 7.5× faster end-to-end retrieval.
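The ebb-collapsing step and the split into timing and magnitude streams can be sketched as an event encoding. In the hypothetical Python below, a point counts as a burst when it exceeds a fixed threshold (NetBurst's actual burst criterion is not specified in the abstract); a long ebb shrinks to a single gap count, and the dense series can be rebuilt from the two streams whenever sub-threshold values are zero.

```python
from typing import List, Sequence, Tuple


def split_streams(series: Sequence[float], threshold: float) -> Tuple[List[int], List[float]]:
    """Return (timings, magnitudes) for points above threshold.

    timings[0] is the index of the first burst; each later timing is the number of steps
    since the previous burst, so an ebb of any length becomes a single integer.
    """
    timings: List[int] = []
    magnitudes: List[float] = []
    last = 0
    for t, v in enumerate(series):
        if v > threshold:
            timings.append(t - last)
            magnitudes.append(v)
            last = t
    return timings, magnitudes


def reconstruct(timings: Sequence[int], magnitudes: Sequence[float], length: int) -> List[float]:
    """Invert split_streams, filling ebbs with zeros (lossy if ebbs held small nonzero values)."""
    series = [0.0] * length
    pos = 0
    for gap, mag in zip(timings, magnitudes):
        pos += gap
        series[pos] = mag
    return series
```

For the series `[0, 0, 5, 0, 0, 0, 7, 8, 0]` with threshold 1, the timing stream is `[2, 4, 1]` and the magnitude stream is `[5, 7, 8]`: nine points become three events.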

2510.22335 2026-06-11 cs.CV cs.AI 版本更新

Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction

超越扩散:层级到层级自回归用于fMRI到图像重建

Xu Zhang, Ruijie Quan, Wenguan Wang, Yi Yang

发表机构 * The State Key Lab of Brain-Machine Intelligence, Zhejiang University, China(浙江大学脑机智能国家重点实验室) ReLER, CCAI, College of Artificial Intelligence, Zhejiang University, China(浙江大学人工智能学院 ReLER、CCAI)

AI总结 提出MindHier框架,通过层级fMRI编码器、层级对齐和尺度感知粗到细引导策略,实现从粗到细的fMRI到图像重建,优于扩散方法。

详情
Comments
ICLR 2026
AI中文摘要

从fMRI信号重建视觉刺激是连接机器学习和神经科学的核心挑战。最近的扩散方法通常将fMRI活动映射到单个神经嵌入,并将其作为静态指导贯穿整个生成过程。然而,这种固定指导压缩了层级神经信息,并且与图像重建的阶段依赖性需求不一致。为此,我们提出MindHier,一种基于尺度自回归建模的从粗到细的fMRI到图像重建框架。MindHier引入三个组件:层级fMRI编码器提取多级神经嵌入,层级到层级对齐方案强制与CLIP特征的逐层对应,以及尺度感知的粗到细神经引导策略将这些嵌入注入到匹配尺度的自回归中。这些设计使MindHier成为扩散方法的一种高效且认知对齐的替代方案,通过实现层级重建过程,先合成全局语义再细化局部细节,类似于人类视觉感知。在NSD数据集上的大量实验表明,MindHier在语义保真度、推理速度(4.67倍)和结果确定性方面均优于基于扩散的基线方法。

英文摘要

Reconstructing visual stimuli from fMRI signals is a central challenge bridging machine learning and neuroscience. Recent diffusion-based methods typically map fMRI activity to a single neural embedding, using it as static guidance throughout the entire generation process. However, this fixed guidance collapses hierarchical neural information and is misaligned with the stage-dependent demands of image reconstruction. In response, we propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier introduces three components: a Hierarchical fMRI Encoder to extract multi-level neural embeddings, a Hierarchy-to-Hierarchy Alignment scheme to enforce layer-wise correspondence with CLIP features, and a Scale-Aware Coarse-to-Fine Neural Guidance strategy to inject these embeddings into autoregression at matching scales. These designs make MindHier an efficient and cognitively aligned alternative to diffusion-based methods by enabling a hierarchical reconstruction process that synthesizes global semantics before refining local details, akin to human visual perception. Extensive experiments on the NSD dataset show that MindHier achieves superior semantic fidelity, 4.67$\times$ faster inference, and more deterministic results than the diffusion-based baselines.
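The coarse-to-fine idea behind MindHier's scale-aware guidance can be shown as a toy loop in which the k-th neural embedding conditions the k-th scale. Everything in this sketch is an assumption made for illustration:

- The canvas is a square grid.
- Each per-scale "embedding" is a single scalar.
- "Prediction" at a scale means upsampling the coarser canvas and adding that scale's guidance.

In MindHier, each scale is instead produced by an autoregressive model conditioned on hierarchical fMRI embeddings aligned with CLIP features.

```python
def upsample_nearest(grid, size):
    """Nearest-neighbour resize of a square grid (list of lists) to size x size."""
    old = len(grid)
    return [[grid[r * old // size][c * old // size] for c in range(size)]
            for r in range(size)]


def coarse_to_fine(scales, guidance):
    """Toy scale-wise generation with one guidance term per scale (hypothetical).

    guidance[k] stands in for the k-th level of a hierarchical neural embedding;
    here it is simply added to every cell generated at scale k.
    """
    assert len(scales) == len(guidance), "one embedding per scale"
    canvas = [[0.0]]
    history = []
    for size, g in zip(scales, guidance):
        canvas = upsample_nearest(canvas, size)            # condition on coarser output
        canvas = [[v + g for v in row] for row in canvas]  # inject matching-scale guidance
        history.append(canvas)
    return history
```

With `scales=[1, 2, 4]`, global content is fixed at the 1x1 scale first. Finer scales only refine it, which mirrors the "global semantics before local details" ordering described in the abstract.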

2510.14828 2026-06-11 cs.AI cs.RO 版本更新

RoboGPT-R1: Enhancing Robot Task Planning with Reinforcement Learning

RoboGPT-R1: 通过强化学习增强机器人任务规划

Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li

发表机构 * Institute of Automation, CASIA(中国科学院自动化研究所) School of Artificial Intelligence, UCAS(中国科学院大学人工智能学院) Huawei Cloud Technology Co., Ltd(华为云技术有限公司)

AI总结 提出RoboGPT-R1两阶段微调框架,先监督学习获取基础知识,再通过强化学习提升视觉空间理解和推理能力,在EmbodiedBench上超越GPT-4o-mini 21.33%。

详情
Journal ref
Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), pp. 2827-2837, IFAAMAS, 2026
AI中文摘要

提高具身智能体的推理能力对于机器人在长视距操作任务中成功完成复杂的人类指令至关重要。尽管基于监督微调(SFT)的大语言模型和视觉语言模型在规划任务中取得了成功,但由于其常识和推理能力受限,它们在复杂现实环境中执行长视距操作任务时仍面临挑战。考虑到通过监督微调将通用视觉语言模型对齐到机器人规划任务存在泛化能力差和物理理解不足的问题,我们提出了RoboGPT-R1,一个用于具身规划的两阶段微调框架。在该框架中,监督训练通过专家序列获取基础知识,随后通过强化学习解决模型在视觉空间理解和推理方面的不足。为了实现多步推理任务中的物理理解和动作序列一致性,我们设计了一个基于规则的奖励函数,同时考虑了长视距性能和环境中的动作约束。基于Qwen2.5-VL-3B训练的推理模型在EmbodiedBench基准上显著优于更大规模的模型GPT-4o-mini 21.33%,并超过其他基于Qwen2.5-VL-7B训练的工作20.33%。

英文摘要

Improving the reasoning capabilities of embodied agents is crucial for robots to successfully complete complex human instructions in long-horizon manipulation tasks. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue to face challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training acquires foundational knowledge through expert sequences, followed by reinforcement learning (RL) to address the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that simultaneously considers long-horizon performance and action constraints in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale model GPT-4o-mini by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
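The abstract describes a rule-based reward that combines long-horizon performance with action constraints, but does not give its terms. The sketch below is hypothetical in every detail and only illustrates the general shape of such a reward. It assumes two components:

- a constraint bonus `w_format`, paid only if every predicted action is valid in the environment;
- a long-horizon term equal to the fraction of the expert plan reproduced as a prefix.

```python
def plan_reward(pred, expert, valid_actions, w_format=0.1):
    """Hypothetical rule-based reward for a predicted action plan.

    - constraint term: w_format if every predicted action is allowed, else 0
    - long-horizon term: fraction of the expert plan matched as a prefix
    """
    if not pred or not expert:                 # empty plan: no reward
        return 0.0
    constraint = w_format if all(a in valid_actions for a in pred) else 0.0
    matched = 0
    for p, e in zip(pred, expert):
        if p != e:                             # stop at the first deviation
            break
        matched += 1
    return constraint + (1.0 - w_format) * matched / len(expert)
```

A prefix match rewards getting early steps right, which matters in long-horizon tasks where one wrong early action invalidates the rest of the plan. Other matching rules (for example, longest common subsequence) are equally plausible choices.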

2509.16456 2026-06-11 cs.AI 版本更新

GPO: Learning from Critical Steps to Improve LLM Reasoning

GPO:从关键步骤中学习以改进大语言模型推理

Jiahao Yu, Zelei Cheng, Xian Wu, Xinyu Xing

发表机构 * Department of Computer Science, Northwestern University(美国西北大学计算机科学系) AI Foundations, Capital One(Capital One人工智能基础团队) Meta AI

AI总结 提出引导式关键优化(GPO)微调策略,通过识别推理轨迹中的关键步骤并优先学习,显著提升大语言模型的多步推理能力。

详情
Comments
39th Conference on Neural Information Processing Systems (NeurIPS 2025)
AI中文摘要

大语言模型(LLMs)越来越多地应用于各个领域,在不同任务上展现出令人印象深刻的潜力。最近,推理LLMs被提出以改善LLMs的推理或思考能力,从而解决复杂问题。尽管推理LLMs取得了有希望的结果,但增强LLMs的多步推理能力仍然是一个重大挑战。虽然现有的优化方法已经推进了LLM的推理能力,但它们通常将推理轨迹视为一个整体,而不考虑轨迹中潜在的关键步骤。在本文中,我们引入了引导式关键优化(GPO),一种新颖的微调策略,深入推理过程以实现更有效的改进。GPO首先识别推理轨迹中的“关键步骤”——模型必须谨慎进行以成功解决问题的点。我们通过估计优势函数来定位关键步骤。然后,GPO将策略重置到关键步骤,采样新的轨迹,并优先学习这些轨迹。这种关注使模型能够更有效地从推理过程中的关键时刻学习,以提高推理性能。我们证明GPO是一种通用策略,可以与各种优化方法集成以提高推理性能。除了理论分析外,我们在具有挑战性的推理基准上的实验表明,GPO能够持续且显著地提升现有优化方法的性能,展示了其通过关注生成过程中的关键时刻来改进LLM推理的有效性和泛化性。

英文摘要

Large language models (LLMs) are increasingly used in various domains, showing impressive potential on different tasks. Recently, reasoning LLMs have been proposed to improve the \textit{reasoning} or \textit{thinking} capabilities of LLMs to solve complex problems. Despite the promising results of reasoning LLMs, enhancing the multi-step reasoning capabilities of LLMs remains a significant challenge. While existing optimization methods have advanced LLM reasoning capabilities, they often treat reasoning trajectories as a whole, without considering the underlying critical steps within the trajectory. In this paper, we introduce \textbf{G}uided \textbf{P}ivotal \textbf{O}ptimization (GPO), a novel fine-tuning strategy that dives into the reasoning process to enable more effective improvements. GPO first identifies the `critical step' within a reasoning trajectory: a point at which the model must proceed carefully to succeed at the problem. We locate the critical step by estimating the advantage function. GPO then resets the policy to the critical step, samples new rollouts, and prioritizes learning on those rollouts. This focus allows the model to learn more effectively from pivotal moments within the reasoning process to improve reasoning performance. We demonstrate that GPO is a general strategy that can be integrated with various optimization methods to improve reasoning performance. Besides theoretical analysis, our experiments across challenging reasoning benchmarks show that GPO consistently and significantly enhances the performance of existing optimization methods, showcasing its effectiveness and generalizability in improving LLM reasoning by concentrating on pivotal moments within the generation process.
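GPO locates the critical step by estimating the advantage function. A minimal sketch of one way to do this, under three assumptions that are not taken from the paper:

1. The step value V(s_t) is estimated by Monte Carlo, as the success rate of rollouts sampled from each prefix.
2. Transitions between reasoning steps are deterministic, so A_t = V(s_{t+1}) - V(s_t).
3. The critical step is the one with the largest |A_t|.

The function names are illustrative, and GPO's exact selection criterion may differ.

```python
def estimate_values(success_flags_per_prefix):
    """Monte Carlo value of each prefix: fraction of successful rollouts from it."""
    return [sum(flags) / len(flags) for flags in success_flags_per_prefix]


def critical_step(values):
    """Return (t, advantages), where t maximises |A_t| with A_t = V(s_{t+1}) - V(s_t).

    Requires at least two values (one advantage).
    """
    advantages = [values[t + 1] - values[t] for t in range(len(values) - 1)]
    t_star = max(range(len(advantages)), key=lambda t: abs(advantages[t]))
    return t_star, advantages
```

For values `[0.75, 0.75, 0.25, 0.0]`, the largest change is the drop of 0.5 at step 1. GPO would then reset to that point and sample fresh rollouts from it.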

2510.17816 2026-06-11 eess.SP cs.CV 版本更新

Cross-Domain Multi-Person Human Activity Recognition via Near-Field Wi-Fi Sensing

基于近场Wi-Fi感知的跨域多人人体活动识别

Xin Li, Jingzhi Hu, Yinghui He, Hongbo Wang, Jin Gan, Jun Luo

发表机构 * College of Computing and Data Science, Nanyang Technological University, Singapore(计算与数据科学学院,南洋理工大学,新加坡)

AI总结 针对Wi-Fi多人活动识别中跨域适应难题,提出WiAnchor框架,通过预训练扩大类间特征间隔、微调阶段引入锚点匹配机制过滤个体干扰,实现缺失类别下的高效跨域识别,准确率超90%。

详情
AI中文摘要

基于Wi-Fi的人体活动识别(HAR)提供了极大的便利,并已成为一个蓬勃发展的研究领域,然而Wi-Fi固有的粗空间分辨率严重阻碍了其区分多个目标的能力。通过利用近场主导效应,为每个目标通过其个人Wi-Fi设备建立专用传感链路,为原生流量下的多人HAR提供了一种有前景的解决方案。然而,由于近场信号的目标特定特性和不规则模式,HAR神经网络模型需要微调(FT)以实现跨域适应,这在某些类别不可用时变得特别具有挑战性。在本文中,我们提出WiAnchor,一种新颖的训练框架,用于在活动类别不完整的情况下实现高效的跨域适应。该框架通过三个步骤处理嵌入不规则时间信息的Wi-Fi信号:在预训练期间,我们扩大类间特征间隔以增强活动的可分离性;在微调阶段,我们创新性地引入一种锚点匹配机制用于跨域适应,根据不完整的活动类别过滤目标特定干扰,而不是试图从中提取完整特征;最后,基于输入样本与锚点的特征级相似性进一步改进识别。我们构建了一个全面的数据集来彻底评估WiAnchor,在缺失活动类别的情况下实现了超过90%的跨域准确率。

英文摘要

Wi-Fi-based human activity recognition (HAR) provides substantial convenience and has emerged as a thriving research field, yet the coarse spatial resolution inherent to Wi-Fi significantly hinders its ability to distinguish multiple subjects. By exploiting the near-field domination effect, establishing a dedicated sensing link for each subject through their personal Wi-Fi device offers a promising solution for multi-person HAR under native traffic. However, due to the subject-specific characteristics and irregular patterns of near-field signals, HAR neural network models require fine-tuning (FT) for cross-domain adaptation, which becomes particularly challenging when certain activity categories are unavailable. In this paper, we propose WiAnchor, a novel training framework for efficient cross-domain adaptation in the presence of incomplete activity categories. This framework processes Wi-Fi signals embedded with irregular time information in three steps: during pre-training, we enlarge inter-class feature margins to enhance the separability of activities; in the FT stage, we introduce an anchor matching mechanism for cross-domain adaptation, filtering subject-specific interference informed by incomplete activity categories, rather than attempting to extract complete features from them; finally, the recognition of input samples is further improved based on their feature-level similarity with anchors. We construct a comprehensive dataset to thoroughly evaluate WiAnchor, achieving over 90% cross-domain accuracy with absent activity categories.
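WiAnchor's final step refines recognition using feature-level similarity to anchors. A minimal sketch of similarity-based anchor matching follows, assuming two things the abstract does not state:

- each available class has one anchor, computed as the mean of its feature vectors;
- similarity is cosine similarity.

Feature extraction is omitted, and all names here are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_anchors(features_by_class):
    """One anchor per *available* activity class: the mean of its features."""
    return {label: mean_vector(feats) for label, feats in features_by_class.items()}


def classify(feature, anchors):
    """Predict the class whose anchor is most cosine-similar to the feature."""
    return max(anchors, key=lambda label: cosine(feature, anchors[label]))
```

Because anchors are built only from classes present in the target-domain data, a missing category simply has no anchor. No full per-class features need to be learned for it.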

2510.16152 2026-06-11 cs.DL cs.AI cs.CL cs.LG 版本更新

Mapping Scientific Literature with Large Language Models and Topic Modeling

利用大语言模型和主题建模绘制科学文献图谱

Mason Smetana, Lev Khazanovich

发表机构 * Department of Civil and Environmental Engineering(土木与环境工程系) University of Pittsburgh(匹兹堡大学)

AI总结 提出基于大语言模型的两阶段分类框架,通过主题建模分析PNAS工程类文献,生成语义可解释主题并揭示跨主题关联,性能优于传统方法。

详情
Comments
35 pages, 10 figures. Accepted for publication in Scientometrics. Final version available via DOI
AI中文摘要

科学文献因学科边界、专业术语和潜在稀疏的关键词系统而日益碎片化,使得捕捉现代科学的演化结构变得困难。本研究引入了一个大语言模型驱动的框架,从主题建模的角度绘制科学文献图谱。该方法在《美国国家科学院院刊》20年间超过1500篇工程相关文章语料上进行了演示。一个两阶段分类流水线首先根据每篇文章的摘要分配一个主要主题类别,然后进行全文分析以识别次要分类,揭示语料库中潜在的跨主题联系。与传统主题模型不同,基于LLM的框架在保持强量化性能的同时,生成语义可解释的主题。与既定主题建模方法的比较评估显示,主题多样性更高,重叠度更低,且具有竞争性的一致性指标。对随机抽样的摘要子集进行手动验证,准确率达到75.9%。额外的传统自然语言处理分析证实,生成的主题对应于语料库中有意义的语言模式。连接主要和次要分类的二部网络进一步揭示了仅通过摘要或关键词系统不易观察到的隐含主题关系。结果表明,该框架无需事先了解期刊的编辑双重分类结构,即可独立恢复其大部分结构。总体而言,所提出的方法为绘制科学图谱和识别研究中新兴的跨主题联系提供了有力工具。

英文摘要

Scientific literature is increasingly fragmented by disciplinary boundaries, specialized terminology, and potentially sparse keyword systems, making it difficult to capture the evolving structure of modern science. This study introduces a large language model (LLM)-driven framework for mapping scientific literature from a topic modeling perspective. The approach is demonstrated on a 20-year corpus of more than 1,500 engineering-related articles published in the Proceedings of the National Academy of Sciences (PNAS). A two-stage classification pipeline first assigns a primary thematic category to each article based on its abstract, followed by full-text analysis to identify secondary classifications that reveal latent cross-topic connections within the corpus. Unlike conventional topic models, the LLM-based framework produces semantically interpretable topics while maintaining strong quantitative performance. Comparative evaluation against established topic modeling methods shows higher topic diversity and lower overlap with competitive coherence metrics. Manual validation on a randomly sampled subset of abstracts yields an accuracy of 75.9%. Additional traditional natural language processing analyses confirm that the generated topics correspond to meaningful linguistic patterns in the corpus. A bipartite network linking primary and secondary classifications further reveals implicit thematic relationships that are not readily observable through abstracts or keyword systems alone. The findings indicate that the framework independently recovers much of the journal's editorial dual-classification structure without prior knowledge of its schema. Overall, the proposed approach offers a powerful tool for mapping science and identifying emerging cross-topic connections in research.
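The bipartite network links each article's primary category (from the abstract stage) to its secondary categories (from the full-text stage). A minimal sketch of how such weighted edges could be counted is shown below. It assumes each article is a dict with hypothetical keys `primary` and `secondary`; the LLM classification stages themselves are not shown.

```python
from collections import Counter


def bipartite_edges(articles):
    """Count (primary, secondary) topic pairs across articles.

    Each article is a dict with hypothetical keys 'primary' (one label) and
    'secondary' (a list of labels). A secondary label equal to the primary
    one is ignored, so only cross-topic links are counted.
    """
    edges = Counter()
    for art in articles:
        for sec in set(art["secondary"]) - {art["primary"]}:
            edges[(art["primary"], sec)] += 1
    return edges
```

Heavily weighted edges between categories that rarely share keywords are the kind of implicit cross-topic relationship the abstract refers to.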

2510.08073 2026-06-11 cs.CV cs.LG 版本更新

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

物理驱动的时空建模用于AI生成视频检测

Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, Mingkui Tan

发表机构 * South China University of Technology(华南理工大学) University of Science and Technology of China(中国科学技术大学) Key Laboratory of Big Data and Intelligent Robot, Ministry of Education(教育部大数据与智能机器人重点实验室) Pazhou Lab(琶洲实验室) University of Melbourne(墨尔本大学) Hunan University(湖南大学) Hong Kong Baptist University(香港浸会大学)

AI总结 提出基于概率流守恒的物理驱动AI生成视频检测范式,通过归一化时空梯度(NSG)统计量捕捉物理异常,结合预训练扩散模型估计NSG,并利用最大均值差异(MMD)进行检测,在Recall和F1-Score上分别提升16.00%和10.75%。

详情
Comments
Accepted at NeurIPS 2025 spotlight
AI中文摘要

AI生成的视频已实现近乎完美的视觉真实感(如Sora),迫切需要可靠的检测机制。然而,检测此类视频在建模高维时空动态和识别违反物理规律的细微异常方面面临重大挑战。本文提出首个基于概率流守恒原理的物理驱动AI生成视频检测范式。具体而言,我们提出一种称为归一化时空梯度(NSG)的统计量,该统计量量化空间概率梯度与时间密度变化之比,明确捕捉与自然视频动态的偏差。利用预训练的扩散模型,我们通过空间梯度近似和运动感知时间建模开发了NSG估计器,无需复杂的运动分解,同时保持物理约束。在此基础上,我们提出基于NSG的视频检测方法(NSG-VD),该方法计算测试视频与真实视频NSG特征之间的最大均值差异(MMD)作为检测指标。最后,我们推导了真实视频与生成视频之间NSG特征距离的上界,证明由于分布偏移,生成视频表现出放大的差异。大量实验证实,NSG-VD在Recall和F1-Score上分别比最先进的基线方法高出16.00%和10.75%,验证了NSG-VD的优越性能。源代码可在该 https URL 获取。

英文摘要

AI-generated videos have achieved near-perfect visual realism (e.g., Sora), urgently necessitating reliable detection mechanisms. However, detecting such videos faces significant challenges in modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose the first physics-driven AI-generated video detection paradigm based on probability flow conservation principles. Specifically, we propose a statistic called Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes, explicitly capturing deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator through spatial gradient approximation and motion-aware temporal modeling, without complex motion decomposition and while preserving physical constraints. Building on this, we propose an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of the test and real videos as a detection metric. Last, we derive an upper bound of NSG feature distances between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score, validating the superior performance of NSG-VD. The source code is available at this https URL.
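NSG-VD uses the Maximum Mean Discrepancy (MMD) between feature sets as its detection metric. A minimal sketch of the standard biased MMD² estimator with a Gaussian (RBF) kernel is given below. It assumes the NSG features have already been extracted as plain vectors. The kernel choice, the bandwidth `sigma`, and the function names are assumptions; the paper's kernel and estimator may differ.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X and Y (lists of vectors)."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / len(X) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

In a detection setting, a test video's features would be compared against a reference set of real-video features. The video is flagged as generated when `mmd2` exceeds a threshold calibrated on held-out real videos.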

2510.01529 2026-06-11 cs.LG cs.CR 版本更新

Bypassing Prompt Guards in Production with Controlled-Release Prompting

绕过生产环境中的提示守卫:受控释放提示攻击

Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang

发表机构 * UC Berkeley(加州大学伯克利分校) zkBricks Inc(zkBricks公司) Ethereum Foundation(以太坊基金会) NYU Shanghai(纽约大学上海分校)

AI总结 针对AI对齐的提示过滤存在理论上的不可能性,本文提出受控释放提示攻击,利用轻量级输入过滤器与主模型之间的资源不对称性,在实际部署的大语言模型系统中成功绕过提示守卫。

详情
Comments
Accepted to USENIX Security 2026
AI中文摘要

Ball等人最近指出,用于AI对齐的提示过滤面临一个根本性障碍:在标准密码学假设下,任何运行速度远快于被保护模型的过滤器都无法普遍区分对抗性提示和良性提示。我们研究这一不可能性结果是否转化为已部署的大语言模型(LLM)系统中的现实漏洞。我们通过引入受控释放提示攻击给出了肯定答案,这是理论框架的一种实际实例化,利用了轻量级输入过滤器与其保护的主模型之间的资源不对称性。与理论构造不同,我们的攻击不需要修改模型:它生成任何有界过滤器无法解读但对目标LLM仍然可处理的恶意提示。我们发现,在基线方法失败的四个主要聊天平台(Google Gemini、DeepSeek Chat、xAI Grok和Mistral Le Chat)上,我们的攻击均成功。此外,我们将攻击应用于从Gemini提取受版权保护的数据。最后,我们对14个开源提示守卫模型进行了系统评估,揭示即使具有推理能力的过滤器也无法在不产生过高资源开销的情况下可靠地检测我们的攻击。

英文摘要

Ball et al. recently established that prompt filtering for AI alignment faces a fundamental barrier: under standard cryptographic assumptions, no filter running significantly faster than the protected model can universally distinguish adversarial prompts from benign ones. We investigate whether this impossibility result translates to real-world vulnerabilities in deployed large language model (LLM) systems. We answer affirmatively by introducing controlled-release prompting, a practical instantiation of the theoretical framework that exploits the resource asymmetry between lightweight input filters and the main models they protect. Unlike the theoretical construction, our attack does not require model modification: it generates malicious prompts that are indecipherable by any bounded filter yet remain tractable to the target LLM. We find our attack to be successful on four major chat platforms (Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat) where baseline methods fail. Additionally, we apply our attack to extract copyrighted data from Gemini. Finally, we provide a systematic evaluation of 14 open-weight prompt guard models, revealing that even reasoning-capable filters cannot reliably detect our attack without incurring prohibitive resource overhead.

2510.03520 2026-06-11 cs.LG cs.AI eess.SY 版本更新

Certifiable Safe RLHF: Semantic Grounding and Fixed Penalty Constraint Optimization for Safer LLM Alignment

可认证安全RLHF:基于语义基础与固定惩罚约束优化的更安全大语言模型对齐

Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, Shaahin Angizi, Arnob Ghosh

发表机构 * Department of Electrical and Computer Engineering(电气与计算机工程系) New Jersey Institute of Technology(新泽西理工学院) Department of Computer Engineering(计算机工程系) Heritage Institute of Technology(遗产理工学院)

AI总结 针对现有RLHF方法依赖奖励/成本函数和双变量调优导致性能敏感且缺乏可证明安全保证的问题,提出CS-RLHF,通过语义基础成本模型和固定惩罚约束优化,实现可认证安全对齐,效率提升至少5倍。

详情
AI中文摘要

确保安全是大语言模型(LLMs)的基本要求。在增强模型输出效用与减轻其潜在危害之间取得适当平衡是一个复杂且持续的挑战。当代方法通常将这个问题形式化为约束马尔可夫决策过程(CMDP)框架,并采用成熟的CMDP优化技术。然而,这些方法表现出两个显著的限制。首先,它们对奖励和成本函数的依赖使得性能对底层评分机制高度敏感,而该机制必须捕捉语义含义,而不是被表面关键词触发。其次,基于CMDP的训练需要调整双变量,这一过程计算成本高昂,并且对于可能通过对抗性越狱利用的固定双变量,不提供任何可证明的安全保证。为了克服这些限制,我们引入了可认证安全RLHF(CS-RLHF),它引入了一个在大规模语料库上训练的成本模型,以分配基于语义的安全分数。与基于拉格朗日的方法相比,CS-RLHF采用了一种修正的基于惩罚的公式。该设计借鉴了约束优化中精确惩罚函数理论,其中约束满足直接通过适当选择的惩罚项来强制执行。通过适当缩放的惩罚,可以在优化器处保证安全约束的可行性,从而消除了双变量更新的需要。实证评估表明,CS-RLHF优于最先进的LLM模型响应,对正常和越狱提示的效率至少提高5倍。

英文摘要

Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of model outputs and mitigating their potential for harm is a complex and persistent challenge. Contemporary approaches frequently formalize this problem within the framework of Constrained Markov Decision Processes (CMDPs) and employ established CMDP optimization techniques. However, these methods exhibit two notable limitations. First, their reliance on reward and cost functions renders performance highly sensitive to the underlying scoring mechanism, which must capture semantic meaning rather than being triggered by superficial keywords. Second, CMDP-based training entails tuning dual variables, a process that is computationally expensive and provides no provable safety guarantee for a fixed dual variable, leaving a weakness that adversarial jailbreaks can exploit. To overcome these limitations, we introduce Certifiable Safe-RLHF (CS-RLHF), which uses a cost model trained on a large-scale corpus to assign semantically grounded safety scores. In contrast to Lagrangian-based approaches, CS-RLHF adopts a rectified penalty-based formulation. This design draws on the theory of exact penalty functions in constrained optimization, wherein constraint satisfaction is enforced directly through a suitably chosen penalty term. With an appropriately scaled penalty, feasibility of the safety constraints can be guaranteed at the optimizer, eliminating the need for dual-variable updates. Empirical evaluation demonstrates that CS-RLHF outperforms state-of-the-art LLM responses and is at least 5 times more efficient against both nominal and jailbreaking prompts.
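The contrast between a Lagrangian and a rectified penalty can be made concrete. In this formulation a fixed weight multiplies max(0, constraint violation), so there is no dual variable to update. In exact-penalty theory, such a non-smooth penalty recovers the constrained optimum once the weight exceeds the optimal Lagrange multiplier, under standard regularity conditions.

A minimal sketch of the objective follows. It makes several assumptions not stated in the abstract:

- rewards and costs are per-response scalars;
- the constraint is mean cost ≤ `budget`;
- the rectification is applied to the batch mean.

The names `penalty` and `budget` are hypothetical.

```python
def penalized_objective(rewards, costs, penalty=10.0, budget=0.0):
    """Rectified-penalty objective to be maximised (hypothetical sketch).

    J = mean(reward) - penalty * max(0, mean(cost) - budget)
    The penalty term is exactly zero whenever the safety constraint holds,
    so safe batches are optimised for reward alone.
    """
    n = len(rewards)
    mean_r = sum(rewards) / n
    mean_c = sum(costs) / n
    violation = max(0.0, mean_c - budget)   # rectification: no reward for "extra" safety
    return mean_r - penalty * violation
```

For example, with rewards `[1.0, 2.0]`, the objective is `1.5` when costs are `[-1.0, -0.5]`, because the constraint is satisfied and no penalty applies. When costs are `[1.0, 0.0]`, the objective drops to `-3.5`.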

2510.02660 2026-06-11 cs.HC cs.AI 版本更新

When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?

当研究人员谈论AI的心理模型/心智理论时,他们究竟在说什么?

Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie C. Gorman

发表机构 * Center for Human, Artificial Intelligence, and Robot Teaming(人类、人工智能与机器人协同中心)

AI总结 本文指出当前AI心智理论研究混淆了行为预测与真实认知,提出应转向人机交互中的互惠心智理论框架。

详情
Comments
This work has been accepted at CogInterp @ NeurIPS 2025
AI中文摘要

当研究人员声称AI系统拥有心智理论或心理模型时,他们本质上是在讨论行为预测和偏差校正,而非真正的心理状态。本文认为,当前的讨论将复杂的模式匹配与真实的认知混为一谈,忽略了模拟与体验之间的关键区别。尽管最近的研究表明,LLMs在实验室的心智理论任务中达到了人类水平的表现,但这些结果仅基于行为模仿。更重要的是,整个测试范式可能存在缺陷,因为它将个体人类认知测试应用于AI系统,而不是在人类与AI交互的当下直接评估人类认知。我建议将焦点转向互惠心智理论框架,该框架承认人类认知和AI算法的同时贡献,强调交互动态,而非孤立地测试AI。

英文摘要

When researchers claim AI systems possess theory of mind (ToM) or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, missing a crucial distinction between simulation and experience. While recent studies show LLMs achieving human-level performance on ToM laboratory tasks, these results are based only on behavioral mimicry. More importantly, the entire testing paradigm may be flawed: it applies individual human cognitive tests to AI systems rather than assessing human cognition directly in the moment of human-AI interaction. I suggest shifting focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms, emphasizing the interaction dynamics instead of testing AI in isolation.

2508.09459 2026-06-11 cs.CV cs.AI 版本更新

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

RelayFormer: 一种用于可扩展图像和视频篡改定位的统一局部-全局注意力框架

Wen Huang, Jiarui Yang, Tao Dai, Jiawei Li, Shaoxiong Zhan, Bin Wang, Shu-Tao Xia

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院,清华大学) College of Artificial Intelligence, Nankai University(南开大学人工智能学院) College of Computer Science and Software Engineering, Shenzhen University(深圳大学计算机科学与软件工程学院) Huawei Technologies Co., Ltd(华为技术有限公司)

AI总结 提出RelayFormer统一框架,通过全局局部中继(GLR)令牌和中继注意力机制,适应不同分辨率并统一处理图像与视频,在篡改定位任务中实现高效且性能优越。

详情
AI中文摘要

视觉篡改定位(VML)旨在识别图像和视频中被篡改的区域,随着高级编辑工具的兴起,这一任务变得日益具有挑战性。现有方法面临两个核心问题。首先是分辨率多样性。调整大小或填充可能会扭曲微妙的取证线索,并引入不必要的计算成本。其次是将图像的空间模型扩展到视频的时空输入的困难,这通常导致为两种数据类型维护单独的架构。为了解决这些挑战,我们提出了RelayFormer,一个统一框架,能够适应不同分辨率并自然处理静态和时态视觉数据。RelayFormer将输入划分为固定大小的子图像,并引入全局局部中继(GLR)令牌,通过基于中继的注意力机制传播结构化上下文。这种设计使得全局线索(如语义或时间一致性)的高效交换成为可能,同时保留细粒度的篡改伪影。与依赖统一调整大小或稀疏注意力的先前方法不同,RelayFormer以最小的开销扩展到可变分辨率和视频序列。跨多个基准的实验表明,其具有优越的性能和强大的效率,结合了无需插值或过多填充的分辨率适应性、图像和视频的统一处理,以及准确性和计算成本之间的有利平衡。代码可在 this https URL 获取。

英文摘要

Visual manipulation localization (VML) aims to identify tampered regions in images and videos, a task that has become increasingly challenging with the rise of advanced editing tools. Existing methods face two central issues. The first is resolution diversity. Resizing or padding can distort subtle forensic cues and introduce unnecessary computational cost. The second is the difficulty of extending spatial models for images to spatio-temporal inputs in videos, which often results in maintaining separate architectures for the two data types. To address these challenges, we propose RelayFormer, a unified framework that adapts to varying resolutions and naturally handles both static and temporal visual data. RelayFormer partitions inputs into fixed-size sub-images and introduces Global Local Relay (GLR) tokens that propagate structured context through a relay-based attention mechanism. This design enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts. Unlike prior approaches that depend on uniform resizing or sparse attention, RelayFormer scales to variable resolutions and video sequences with minimal overhead. Experiments across diverse benchmarks demonstrate superior performance and strong efficiency, combining resolution adaptivity without interpolation or excessive padding, unified processing for images and videos, and a favorable balance between accuracy and computational cost. Code is available at this https URL.
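RelayFormer partitions inputs into fixed-size sub-images and exchanges global context through relay tokens. The toy sketch below makes two simplifying assumptions:

- Borders are zero-padded only up to the next multiple of the tile size, with no resizing.
- Each tile's relay token is simply its mean value, shared through a global average.

In RelayFormer the GLR tokens are learned and communicate via attention, so the second function is only a stand-in for that exchange. All names are hypothetical.

```python
def partition(image, tile):
    """Split an H x W grid into tile x tile sub-images, zero-padding the border."""
    h, w = len(image), len(image[0])
    ph, pw = -(-h // tile) * tile, -(-w // tile) * tile  # round up to multiples of tile
    padded = [[image[r][c] if r < h and c < w else 0.0 for c in range(pw)]
              for r in range(ph)]
    tiles = []
    for r0 in range(0, ph, tile):
        for c0 in range(0, pw, tile):
            tiles.append([row[c0:c0 + tile] for row in padded[r0:r0 + tile]])
    return tiles


def relay_tokens(tiles):
    """Toy relay exchange: one summary per tile, plus a global summary of all tiles."""
    local = [sum(map(sum, t)) / (len(t) * len(t[0])) for t in tiles]
    global_ctx = sum(local) / len(local)
    return local, global_ctx
```

Because the tile size is fixed, images of any resolution (and, by stacking frames, videos) map to a variable number of equally shaped tiles. This is what lets one architecture handle both data types without interpolation.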

2510.02149 2026-06-11 cs.LG math.OC stat.ML 版本更新

Reinforcement Learning with Action-Triggered Observations

具有动作触发观测的强化学习

Alexander Ryabchenko, Wenlong Mou

发表机构 * Department of Statistical Sciences, University of Toronto(多伦多大学统计科学系) Vector Institute(向量研究所)

AI总结 提出动作触发稀疏可追踪MDP框架,推导Bellman方程并证明最优策略存在,利用观测间动作序列的线性表示实现基于回归的方法,在几何分布情节下达到与完全可观测线性MDP匹配的遗憾界。

详情
AI中文摘要

我们引入了动作触发稀疏可追踪马尔可夫决策过程(ATST-MDPs),这是一种用于部分可观测性的强化学习框架,其中完整状态观测在每个步骤以由所选动作决定的概率随机发生。我们推导了针对该设置的Bellman方程,并证明了最优策略的存在性。利用稀疏观测揭示完整状态的事实,我们提供了一个等价公式,其中智能体在连续观测之间承诺动作序列。在线性MDP假设下,我们证明了这些动作序列上的值函数在有限维特征映射中具有线性表示,从而能够使用标准的基于回归的方法。作为一个应用,我们推导了ATST-LSVI-UCB,一种乐观算法,在几何分布的情节学习中实现了遗憾界$\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$,其中$K$是情节数,$d$是特征维度,$\gamma$是折扣因子(情节继续概率),与完全可观测线性MDP的已知速率相匹配。

英文摘要

We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a reinforcement learning framework for partial observability in which full state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent formulation in which agents commit to action-sequences between consecutive observations. Under the linear MDP assumption, we show that the value function over such action-sequences admits a linear representation in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ATST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (episode continuation probability), matching the known rate for linear MDPs with full observability.
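Two quantities in the regret statement are easy to evaluate directly. The first is the mean episode length under a geometric horizon with continuation probability γ, which is 1/(1-γ). The second is the leading term of the bound, sqrt(K d^3 (1-γ)^-3). The sketch below drops the constants and logarithmic factors hidden by the tilde-O notation, so it shows scaling only.

```python
import math


def expected_horizon(gamma):
    """Mean length of an episode that continues after each step w.p. gamma."""
    return 1.0 / (1.0 - gamma)


def regret_order(K, d, gamma):
    """Leading term sqrt(K * d^3 / (1 - gamma)^3), ignoring constants and log factors."""
    return math.sqrt(K * d ** 3 / (1.0 - gamma) ** 3)
```

For instance, `regret_order(100, 1, 0.0)` is `10.0`. Raising γ to 0.5 doubles the expected horizon and multiplies the bound by 2^1.5 ≈ 2.83, while the dependence on the number of episodes K stays sqrt(K).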

2509.26294 2026-06-11 cs.LG cs.AI 版本更新

Noise-Guided Transport for Imitation Learning

噪声引导的模仿学习传输方法

Lionel Blondé, Joao A. Candido Ramos, Alexandros Kalousis

发表机构 * University of Cambridge(剑桥大学) University of Oxford(牛津大学)

AI总结 针对低数据场景下的模仿学习,提出噪声引导传输(NGT)方法,通过对抗训练将模仿问题转化为最优传输问题,无需预训练或特殊架构,在极低数据量下实现强性能。

详情
Comments
Accepted at ICML 2026. Code: this https URL
AI中文摘要

我们考虑低数据场景下的模仿学习,其中只有有限数量的专家演示可用。在这种情况下,依赖大规模预训练或高容量架构的方法难以应用,对演示数据的效率变得至关重要。我们引入了噪声引导传输(NGT),一种轻量级的离策略方法,将模仿问题转化为通过对抗训练解决的最优传输问题。NGT不需要预训练或专门架构,通过设计包含不确定性估计,并且易于实现和调优。尽管简单,NGT在具有挑战性的连续控制任务(包括高维人形任务)中,在仅有20个转换的超低数据场景下取得了强劲的性能。

英文摘要

We consider imitation learning in the low-data regime, where only a limited number of expert demonstrations are available. In this setting, methods that rely on large-scale pretraining or high-capacity architectures can be difficult to apply, and efficiency with respect to demonstration data becomes critical. We introduce Noise-Guided Transport (NGT), a lightweight off-policy method that casts imitation as an optimal transport problem solved via adversarial training. NGT requires no pretraining or specialized architectures, incorporates uncertainty estimation by design, and is easy to implement and tune. Despite its simplicity, NGT achieves strong performance on challenging continuous control tasks, including high-dimensional Humanoid tasks, under ultra-low data regimes with as few as 20 transitions.

2509.25359 2026-06-11 cs.CL cs.AI 版本更新

Geometric Metrics and LLMs: What They Measure and When They Work

几何度量与大语言模型:它们测量什么以及何时有效

Viacheslav Yusupov, Anna Antipina, Ameliia Alaeva, Danil Maksimov, Anna Vasileva, Tatyana Zaitseva, Alina Ermilova, Evgeny Burnaev, Egor Shvetsov

发表机构 * Moscow Institute of Physics and Technology(莫斯科物理技术学院) Russian Academy of Sciences(俄罗斯科学院)

AI总结 本文系统测试了用于大语言模型评估的几何度量,发现部分度量主要反映输出长度,而几何度量在文本统计基础上提供有限但真实的信息,并指出故障检测是最有前景的应用。

详情
AI中文摘要

我们提出了对大语言模型评估中几何度量的系统性压力测试。基于秩的内部表示几何特性作为无参考质量信号显示出前景,但其可靠的条件仍不清楚。我们评估了八种常用度量(内在维度估计器、谱范数及相关量),在对比性任务上使用六个测试模型(0.5-8B)和八个生成器进行测试,将真实的几何信号与文本长度效应以及标准文本统计已捕获的信息区分开。我们得到三个发现。首先,一些度量(特别是Schatten范数和MOM)主要反映输出长度,一旦控制长度,其表面上的区分能力便不复存在。其次,几何度量在文本统计之外增加了适度但真实的信息:将两者结合后,分类器在6路生成器识别上达到78%的准确率,而仅用文本统计为69%。第三,这些度量并不追踪文本质量的通用概念,内在维度与词汇多样性(RTTR)之间仅存在中等关联。我们给出了针对具体用例的建议,并指出故障检测是最有前景的近期应用。

英文摘要

We present a systematic stress-test of geometric metrics for LLM evaluation. Rank-based geometric properties of internal representations have shown promise as reference-free quality signals, but the conditions under which they are reliable remain unclear. We evaluate eight commonly used metrics (intrinsic-dimensionality estimators, spectral norms, and related quantities) across six tester models (0.5-8B) and eight generators on contrasting tasks, separating genuine geometric signal from text-length effects and from what standard text statistics already capture. Three findings emerge. First, some metrics (notably Schatten Norm and MOM) mainly reflect output length, and their apparent discriminative power collapses once length is controlled. Second, geometric metrics add modest but real information beyond text statistics: combined with them, a classifier reaches 78% accuracy on 6-way generator identification versus 69% for text statistics alone. Third, rather than tracking a general notion of text quality, the metrics show only a moderate association between intrinsic dimensionality and lexical diversity (RTTR). We give use-case-specific recommendations and identify failure detection as the most promising near-term application.
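Two of the quantities named above have short textbook definitions:

- Root type-token ratio (RTTR, Guiraud's index): the number of distinct types divided by the square root of the number of tokens.
- Schatten p-norm of a matrix: (Σ σ_i^p)^(1/p) over its singular values σ_i.

The sketch below takes the singular values as given, which avoids needing an SVD routine, and assumes whitespace tokenisation. The paper's exact preprocessing and metric variants may differ.

```python
import math


def rttr(tokens):
    """Root type-token ratio: distinct types / sqrt(number of tokens)."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def schatten_norm(singular_values, p):
    """Schatten p-norm from singular values: (sum sigma_i^p)^(1/p)."""
    return sum(s ** p for s in singular_values) ** (1.0 / p)
```

Both quantities depend on sequence length through their sums and counts. This is one reason why controlling for output length, as the study does, is needed before attributing a metric's signal to geometry.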

2509.23982 2026-06-11 cs.CL cs.AI cs.CY cs.LG cs.NE 版本更新

Toward Preference-aligned Large Language Models via Residual-based Model Steering

基于残差模型引导的偏好对齐大型语言模型

Lucio La Cava, Andrea Tagarelli

发表机构 * DIMES Dept., University of Calabria, Italy(意大利卡拉布里亚大学DIMES系)

AI总结 提出PaLRS方法,利用残差流中的偏好信号提取轻量级引导向量,无需训练即可在推理时对齐模型偏好,在数学推理和代码生成任务上取得一致提升,同时节省大量时间。

详情
Comments
Accepted at IJCAI 2026
AI中文摘要

偏好对齐是使大型语言模型(LLMs)有用且与(人类)偏好一致的关键步骤。现有方法如基于人类反馈的强化学习或直接偏好优化通常需要精心策划的数据和对数十亿参数进行昂贵的优化,最终导致持久性的任务特定模型。在这项工作中,我们引入了基于残差引导的LLM偏好对齐(PaLRS),这是一种无需训练的方法,利用LLM残差流中编码的偏好信号。从仅一百个偏好对中,PaLRS提取出轻量级、即插即用的引导向量,可在推理时应用以将模型推向偏好行为。我们在各种中小型开源LLM上评估了PaLRS,显示PaLRS对齐的模型在数学推理和代码生成基准上取得了一致的提升,同时保持了基线通用性能。此外,与使用DPO和SimPO对齐的模型相比,它们表现更好且节省大量时间。我们的发现强调,PaLRS为标准偏好优化流程提供了一种有效、更高效且灵活的替代方案,提供了一种无需训练、即插即用的对齐机制,且数据需求极少。

英文摘要

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically require curated data and expensive optimization over billions of parameters, and eventually lead to persistent task-specific models. In this work, we introduce Preference alignment of Large Language Models via Residual Steering (PaLRS), a training-free method that exploits preference signals encoded in the residual streams of LLMs. From as few as one hundred preference pairs, PaLRS extracts lightweight, plug-and-play steering vectors that can be applied at inference time to push models toward preferred behaviors. We evaluate PaLRS on various small-to-medium-scale open-source LLMs, showing that PaLRS-aligned models achieve consistent gains on mathematical reasoning and code generation benchmarks while preserving baseline general-purpose performance. Moreover, compared with models aligned with DPO and SimPO, they perform better while requiring substantially less time. Our findings highlight that PaLRS offers an effective, much more efficient, and flexible alternative to standard preference optimization pipelines, providing a training-free, plug-and-play mechanism for alignment with minimal data.
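A common way to build a steering vector from preference pairs has two steps. First, take the difference between the mean residual-stream activations on preferred and rejected responses. Then add a scaled copy of that vector to hidden states at inference time. The sketch below assumes this difference-of-means construction, with activations given as plain vectors from a single layer. PaLRS's actual extraction (layer selection, normalisation, token positions) may differ, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(preferred_acts, rejected_acts):
    """Difference of mean activations: preferred minus rejected."""
    mp, mr = mean_vector(preferred_acts), mean_vector(rejected_acts)
    return [a - b for a, b in zip(mp, mr)]


def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to one hidden state (no weights change)."""
    return [h + alpha * v for h, v in zip(hidden, vector)]
```

Since only a vector addition happens at inference, the base model's weights stay untouched. The strength `alpha` can be tuned, or the steering removed entirely, without retraining.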

2509.19463 2026-06-11 cs.RO Updated version

CU-Multi: A Dataset for Multi-Robot Collaborative Perception

Doncey Albin, Daniel McGann, Miles Mena, Annika Thomas, Harel Biggie, Xuefei Sun, Steve McGuire, Jonathan P. How, Christoffer Heckman

Affiliations: Autonomous Robotics and Perception Group at the University of Colorado Boulder; Robot Perception Lab at Carnegie Mellon University; Aerospace Controls Laboratory at Massachusetts Institute of Technology; Computer Science and Artificial Intelligence Laboratory at Massachusetts Institute of Technology; Human-Aware Robotic Exploration Lab at University of California Santa Cruz

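AI summary: To address the lack of dedicated datasets for benchmarking multi-robot collaborative perception, proposes the CU-Multi dataset, which contains synchronized multi-robot trajectories collected over multiple days together with RGB-D, RTK GPS, semantic LiDAR, and refined ground-truth odometry, supporting reproducible evaluation.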

Comments
8 pages, 11 figures. arXiv admin note: text overlap with arXiv:2505.17576
Abstract

A central challenge for multi-robot systems is fusing independently gathered perception data into a unified representation. Despite progress in Collaborative SLAM (C-SLAM), benchmarking remains hindered by the scarcity of dedicated multi-robot datasets. Many evaluations instead partition single-robot trajectories, a practice that may only partially reflect true multi-robot operations and, more critically, lacks standardization, leading to results that are difficult to interpret or compare across studies. While several multi-robot datasets have recently been introduced, they mostly contain short trajectories with limited inter-robot overlap and sparse intra-robot loop closures. To overcome these limitations, we introduce CU-Multi, a dataset collected over multiple days at two large outdoor sites on the University of Colorado Boulder campus. CU-Multi comprises four synchronized runs with aligned start times and controlled trajectory overlap, replicating the distinct perspectives of a robot team. It includes RGB-D sensing, RTK GPS, semantic LiDAR, and refined ground-truth odometry. By combining overlap variation with dense semantic annotations, CU-Multi provides a strong foundation for reproducible evaluation in multi-robot collaborative perception tasks.
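The abstract refers to "controlled trajectory overlap" and "limited inter-robot overlap" without defining how overlap is measured. As a hypothetical illustration, one could score the overlap of robot A with robot B as the fraction of A's poses that lie within a radius of any pose on B's path, assuming both trajectories are 2D positions (in metres) in a shared frame, for example one derived from RTK GPS. This metric and the function names are assumptions, not CU-Multi's definition.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a trajectory is a list of (x, y) positions in metres,
# with all robots sharing one world frame.
Point = Tuple[float, float]


def overlap_fraction(traj_a: List[Point], traj_b: List[Point], radius: float) -> float:
    """Fraction of poses in traj_a within `radius` of at least one pose in traj_b.

    Brute force, O(len(traj_a) * len(traj_b)); a spatial index would scale better.
    """
    if not traj_a or not traj_b:
        return 0.0
    hits = sum(
        1
        for ax, ay in traj_a
        if any(math.hypot(ax - bx, ay - by) <= radius for bx, by in traj_b)
    )
    return hits / len(traj_a)


def pairwise_overlap(
    trajs: Dict[str, List[Point]], radius: float
) -> Dict[Tuple[str, str], float]:
    """Directed overlap score for every ordered pair of distinct robots."""
    return {
        (a, b): overlap_fraction(trajs[a], trajs[b], radius)
        for a in trajs
        for b in trajs
        if a != b
    }


if __name__ == "__main__":
    robots = {
        "robot1": [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)],
        "robot2": [(0.0, 0.5), (1.0, 0.5)],
    }
    for pair, score in pairwise_overlap(robots, radius=1.0).items():
        print(pair, round(score, 3))
```

The score is asymmetric: in the toy example every pose of `robot2` lies near `robot1`'s path (score 1.0), but only two of `robot1`'s three poses lie near `robot2`'s path, which is why the sketch reports both directions for each pair.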

2509.14860 2026-06-11 cs.CV cs.AI cs.CL cs.MA Updated version

MARIC: Multi-Agent Reasoning for Image Classification

Wonduk Seo, Minhyeong Yu, Hyunjin An, Seunghyun Lee

Affiliations: Enhans, Seoul, South Korea; Peking University, Beijing, China

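AI summary: Proposes MARIC, a multi-agent framework that decomposes image classification into a collaborative reasoning process, using an Outliner Agent, Aspect Agents, and a Reasoning Agent for multi-perspective analysis and synthesis; it significantly outperforms baseline methods on four benchmark datasets.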

Comments
11 pages, preprint
Abstract

Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine-tuning to achieve competitive performance. While recent vision-language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single-pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi Agent based Reasoning for Image Classification (MARIC), a multi-agent framework that reformulates image classification as a collaborative reasoning process. MARIC first uses an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine-grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through an integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on four diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.
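The three-stage pipeline described in the abstract can be sketched as plain function composition. This is a hypothetical outline, not the authors' code: `Model` stands in for any VLM call (the image is assumed to be attached implicitly), the three aspect names are placeholders because the abstract does not list the visual dimensions, and the reflection step is approximated as a second pass in which the Reasoning Agent re-checks a draft answer. The output is reduced to a label string, whereas the paper describes producing a unified representation that is then used for classification.

```python
from typing import Callable, List

# Any function from a text prompt to a text reply; in practice it would wrap a
# VLM call with the image attached. This type and every name below are hypothetical.
Model = Callable[[str], str]

# Placeholder visual dimensions; the abstract does not name the real ones.
ASPECTS = ("object shape and structure", "color and texture", "scene and context")


def outliner_agent(model: Model) -> List[str]:
    """Summarise the global theme, then build one targeted prompt per aspect."""
    theme = model("Describe the global theme of the image.")
    return [f"The image shows {theme}. Describe its {aspect}." for aspect in ASPECTS]


def aspect_agents(model: Model, prompts: List[str]) -> List[str]:
    """Each Aspect Agent answers its own targeted prompt independently."""
    return [model(p) for p in prompts]


def reasoning_agent(model: Model, descriptions: List[str], labels: List[str]) -> str:
    """Synthesise the aspect descriptions into a draft, then reflect on it."""
    notes = "\n".join(f"- {d}" for d in descriptions)
    draft = model(f"Notes:\n{notes}\nChoose one label from {labels}.")
    # Reflection step (approximated): re-check the draft against the same notes.
    final = model(f"Notes:\n{notes}\nDraft: {draft}\nReflect and give the final label from {labels}.")
    return final.strip()


def classify(model: Model, labels: List[str]) -> str:
    """Outliner -> three Aspect Agents -> Reasoning Agent."""
    prompts = outliner_agent(model)
    descriptions = aspect_agents(model, prompts)
    return reasoning_agent(model, descriptions, labels)


def stub_model(prompt: str) -> str:
    """Toy stand-in for a VLM so the pipeline runs without any real model."""
    if prompt.startswith("Describe the global theme"):
        return "an animal on a sofa"
    if "label" in prompt:  # both the draft and the reflection prompts ask for a label
        return "cat"
    return "pointed ears and whiskers"


if __name__ == "__main__":
    print(classify(stub_model, ["cat", "dog"]))
```

Because the stages exchange only text, any agent can be given a different prompt or backing model without retraining the others, which is in the spirit of the abstract's contrast with parameter-heavy training.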