arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
All subject categories: 2085
2603.19254 2026-05-11 cs.CL

FinReasoning: A Hierarchical Benchmark for Reliable Financial Research Reporting

Yiyun Zhu, Yidong Jiang, Ziwen Xu, Yinsheng Yao, Dawei Cheng, Jinru Ding, Jie Xu

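AI summary: FinReasoning is a hierarchical benchmark for evaluating financial research report generation, designed to address the factual errors, data inconsistencies, and shallow analysis that current large language models exhibit in financial analysis. The study decomposes the core capabilities of financial research into three levels (semantic consistency, data alignment, and deep insight) and proposes a fine-grained evaluation framework with 12 core analytical indicators, to distinguish more accurately how models perform at different stages such as basic auditing, error correction, and advanced analysis. Experiments show clear differences among model types at each level, providing an important reference for role assignment in multi-agent financial systems.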

Abstract

Large language models (LLMs) are increasingly deployed in financial research workflows, where their role is evolving from single-model assistance for human analysts toward autonomous collaboration among multiple agents. Yet real-world deployments still expose factual errors, numerical inconsistencies, and shallow analysis, which can distort assessments of corporate fundamentals and trigger severe economic losses. While existing benchmarks have begun to evaluate such failures, they score all aspects of the generated analysis in one pass, failing to distinguish whether a model fails at foundational stages like auditing and correction, or underperforms at generating research-grade insights. Consequently, single-pass scoring obscures capability bottlenecks and the specialized strengths essential for multi-agent role assignment. To address these gaps, we introduce FinReasoning, a hierarchical benchmark that decomposes the core capabilities of financial research into semantic consistency, data alignment, and deep insight. We further propose a fine-grained evaluation framework that strengthens hallucination-correction assessment and incorporates a 12-indicator rubric for core analytical skills. FinReasoning reveals clear capability stratification across model types. Closed-source models (like Doubao-Seed-1.8) perform strongly overall and are better suited for core reasoning agents in multi-agent financial systems; open-source general models (like Qwen3-235B) show clear capability divergence and consistently underperform on Semantic Consistency, making them less suited for quality-sensitive generation tasks; financial-domain models (like Fin-R1) generate moderate insights but lack foundational auditing skills. Our work has already been deployed in pilot tests across several real-world scenarios. The resource is available at https://github.com/TongjiFinLab/FinReasoning.

2603.18856 2026-05-11 cs.CV cs.AI

Motion-o: Trajectory-Grounded Video Reasoning

Bishoy Galoaa, Shayda Moezzi, Xiangyu Bai, Sarah Ostadabbas

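AI summary: This paper proposes Motion-o, a video reasoning model that addresses the shortcomings of existing video understanding models in trajectory-dependent reasoning. The model introduces a motion-based "Motion Chain of Thought" (MCoT) that explicitly annotates an object's motion direction, speed, and scale change, making trajectory information verifiable. By building a trajectory-annotated dataset and designing matching training rewards, the study improves the model's reasoning over dynamic scenes and performs well across multiple video understanding tasks.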

Abstract

Recent video reasoning models increasingly produce spatio-temporal evidence chains that localize objects at specific timestamps. While these traces improve interpretability by grounding where and when evidence appears, they often leave the motion connecting observations, the how, implicit. This makes dynamic and trajectory-dependent claims difficult to supervise, verify, or penalize when unsupported by the video. We formalize this missing component as Spatial-Temporal-Trajectory (STT) reasoning and introduce Motion-o, a motion-centric extension to vision-language models (VLMs) that makes trajectories explicit and verifiable. Motion-o augments evidence chains with Motion Chain of Thought (MCoT), a structured pathway that represents object motion through a discrete <motion/> tag summarizing direction, speed, and scale change. To supervise MCoT, we densify sparse spatio-temporal annotations into object tracks and derive motion descriptors from centroid displacement and box-area change. We then train with complementary rewards for trajectory consistency and visual grounding, including a perturbation-based signal that penalizes motion descriptions that remain unchanged when temporal evidence is removed. Across multiple video understanding benchmarks, Motion-o consistently improves trajectory-faithful reasoning without architectural modifications. These results suggest that an explicit motion interface can complement existing VLM pipelines by converting implicit dynamics into verifiable evidence. Code is available at https://github.com/ostadabbas/Motion-o.
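The abstract above says motion descriptors are derived from centroid displacement and box-area change. A minimal sketch of that idea follows. The function names, the direction vocabulary, and the thresholds `still_eps` and `scale_eps` are assumptions made for illustration and are not taken from the Motion-o code.

```python
import math

def centroid(box):
    """Center (x, y) of an (x1, y1, x2, y2) box in pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes give 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def motion_descriptor(box_a, box_b, dt, still_eps=1.0, scale_eps=0.1):
    """Summarize motion between two boxes of one track, dt seconds apart.

    Thresholds and labels are hypothetical, chosen only for this sketch.
    """
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    dx, dy = bx - ax, by - ay
    dist = math.hypot(dx, dy)

    # Direction from the dominant axis of centroid displacement.
    if dist < still_eps:
        direction = "static"
    elif abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"  # image y grows downward

    # Scale change from the ratio of box areas.
    area_a = area(box_a)
    ratio = area(box_b) / area_a if area_a > 0 else 1.0
    if ratio > 1.0 + scale_eps:
        scale = "growing"
    elif ratio < 1.0 - scale_eps:
        scale = "shrinking"
    else:
        scale = "stable"

    return {"direction": direction, "speed": dist / dt, "scale": scale}

desc = motion_descriptor((0, 0, 10, 10), (30, 0, 50, 20), dt=2.0)
```

Here the object moves mostly to the right while its box grows fourfold in area, so the descriptor reports rightward motion with a growing scale. Speed is in pixels per second.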

2603.18636 2026-05-11 cs.CV

Attention Sparsity is Input-Stable: Training-Free Sparse Attention for Video Generation via Offline Sparsity Profiling and Online QK Co-Clustering

Jiayi Luo, Jiayu Chen, Jiankun Wang, Cong Wang, Hanxin Zhu, Qingyun Sun, Chen Gao, Zhibo Chen, Jianxin Li

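AI summary: This study targets the high inference cost that dense 3D attention imposes on Diffusion Transformers (DiTs) in video generation and proposes SVOO, a training-free sparse attention method. Through offline layer-wise sparsity profiling and an online bidirectional co-clustering algorithm, SVOO addresses two problems in existing attention pruning methods, namely ignoring differences between layers and ignoring query-key coupling, and achieves a better quality-speedup trade-off. Experiments show that SVOO delivers notable speedups on several mainstream video generation models, up to 1.93x, while maintaining high video quality.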

Comments Accepted by ICML 2026

Abstract

Diffusion Transformers (DiTs) achieve strong video generation quality but suffer from high inference cost due to dense 3D attention, motivating sparse attention techniques for improving efficiency. However, existing training-free sparse attention methods for video generation still face two unresolved limitations: ignoring layer heterogeneity in attention pruning and ignoring query-key coupling in block partitioning, which hinder a better quality-speedup trade-off. In this work, we uncover a critical insight: attention sparsity is an intrinsic layer-wise property, with only minor variation across different inputs. Motivated by this observation, we propose SVOO, a training-free sparse attention framework for fast video generation via offline layer-wise sparsity profiling and online bidirectional co-clustering. Specifically, SVOO adopts a two-stage paradigm: (i) offline layer-wise sensitivity profiling to derive intrinsic per-layer pruning levels, and (ii) online block-wise sparse attention via a bidirectional co-clustering algorithm. Extensive experiments on seven widely used video generation models demonstrate that SVOO achieves a superior quality-speedup trade-off over state-of-the-art methods, delivering up to 1.93x speedup while maintaining a PSNR of up to 29 dB on Wan2.1. Code is available at: https://github.com/Mutual-Luo/SVOO.

2603.16876 2026-05-11 cs.CV cs.AI cs.LG

Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation

Kaito Baba, Risa Kishikawa, Satoshi Kodera

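AI summary: This paper proposes MARL-Rad, a multi-modal framework based on multi-agent reinforcement learning for generating radiology reports. The method decomposes chest X-ray interpretation into region-specific agents and a global integrating agent and jointly optimizes them with clinically verifiable rewards, overcoming the limitation of existing approaches in which fixed language models are manually arranged into agentic systems without being optimized for their roles. Experiments show that MARL-Rad performs strongly on multiple clinical evaluation metrics, that its reports improve in accuracy, detail, and laterality consistency, and that it was endorsed by clinical experts.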

Comments 23 pages, 4 figures

Abstract

We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointly optimizes them using clinically verifiable rewards. Experiments on the MIMIC-CXR and IU X-ray datasets show that MARL-Rad consistently improves clinical efficacy metrics such as RadGraph, CheXbert, and GREEN scores, achieving state-of-the-art clinical efficacy performance. Further analyses show that MARL-Rad improves laterality consistency and produces more accurate and detailed reports. A blinded clinician evaluation further suggests that MARL-Rad produces reports clinically comparable to ground-truth reports.

2603.14787 2026-05-11 cs.RO

Dynamic Properties and Motion Reproducibility of a Compact Pneumatically Actuated Humanoid Upper Body for Data-Driven Control

Hiroshi Atsuta, Hisashi Ishihara, Minoru Asada

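AI summary: This paper studies the dynamic properties and motion reproducibility of a compact, pneumatically actuated humanoid upper body with 13 degrees of freedom. By analyzing key dynamic properties such as actuator time delays, it verifies the system's high reproducibility and, on that basis, develops a data-driven controller based on a multilayer perceptron for trajectory tracking on a 4-DOF arm subsystem. Experiments show that the method outperforms a traditional PID controller in trajectory tracking, demonstrating the potential of data-driven approaches for controlling complex, high-DOF pneumatic robots.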

Comments 25 pages, 21 figures. Submitted to Advanced Robotics

Abstract

Pneumatically-actuated anthropomorphic robots with high degrees of freedom (DOF) offer significant potential for physical human-robot interaction. However, precise control of pneumatic actuators is challenging due to their inherent nonlinearities. This paper presents the development of a compact 13-DOF upper-body humanoid robot. To assess the feasibility of an effective controller, we first investigate its key dynamic properties, such as actuation time delays, and confirm that the system exhibits highly reproducible behavior. Leveraging this reproducibility, we implement a preliminary data-driven controller for a 4-DOF arm subsystem based on a multilayer perceptron with explicit time delay compensation. The network was trained on random movement data to generate pressure commands for tracking arbitrary trajectories. Comparative evaluations with a traditional PID controller demonstrate superior trajectory tracking performance, highlighting the potential of data-driven approaches for controlling complex, high-DOF pneumatic robots.

2603.14186 2026-05-11 cs.CV

Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models

Advaith Ravishankar, Serena Liu, Mingyang Wang, Todd Zhou, Jeffrey Zhou, Arnav Sharma, Ziling Hu, Léopold Das, Abdulaziz Sobirov, Faizaan Siddique, Freddy Yu, Seungjoo Baek, Yan Luo, Mengyu Wang

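AI summary: Addressing the high inference cost of current text-to-image generation models, this paper proposes a method for fairly comparing one-step generative models with multi-step diffusion and flow models. The study builds a standardized evaluation framework on ImageNet and its extended datasets and systematically compares multiple models under different sampling-step and classifier-free guidance (CFG) settings. By introducing semantic alignment metrics (such as CLIP Score and Pick Score), it shows that relying solely on traditional metrics such as FID can be misleading, and it proposes new evaluation metrics (such as csFID and psFID) to measure the semantic consistency and visual quality of generated images more comprehensively.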

Abstract

State-of-the-art text-to-image models produce high-quality images, but inference remains expensive as generation requires several sequential ODE or denoising steps. Native one-step models aim to reduce this cost by mapping noise to an image in a single step, yet fair comparisons to multi-step systems are difficult because studies use mismatched sampling steps and different classifier-free guidance (CFG) settings, where CFG can shift FID, Inception Score, and CLIP-based alignment in opposing directions. It is also unclear how well one-step models scale to multi-step inference, and there is limited standardized out-of-distribution evaluation for label-ID-conditioned generators beyond ImageNet. To address this, we benchmark eight models spanning one-step flows (MeanFlow, Improved MeanFlow, SoFlow), multi-step baselines (RAE, Scale-RAE), and established systems (SiT, Stable Diffusion 3.5, FLUX.1) under a class-conditional protocol on ImageNet validation, ImageNetV2, and reLAIONet, our new proofread out-of-distribution dataset aligned to ImageNet label IDs. Using FID, Inception Score, CLIP Score, and Pick Score, we show that FID-focused model development and CFG selection can be misleading in few-step regimes, where guidance changes can improve FID while degrading text-image alignment and human preference signals, worsening visual quality. To make these tradeoffs explicit, we introduce CLIP-scaled and PickScore-scaled variants of FID (csFID, psFID) and Inception Score (csIS, psIS) to serve as a diagnostic for semantically aligned image generation.

2603.08256 2026-05-11 cs.CL

NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating

Tong Wu, Thanet Markchom, Huizhi Liang

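AI summary: This paper studies how to rate the plausibility of word senses for ambiguous homonyms in short narrative contexts, and proposes and compares three approaches: embedding-based regression, parameter-efficient transformer fine-tuning, and large language model prompting that combines structured reasoning with explicit decision rules. The best system uses a structured prompting strategy that decomposes evaluation into the components of the narrative and applies explicit decision rules to calibrate ratings. Experiments show that this approach outperforms fine-tuned models and embedding-based methods, and that prompt design affects task performance more than model scale.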

Abstract

Word sense plausibility rating requires predicting the human-perceived plausibility of a given word sense on a 1-5 scale in the context of short narrative stories containing ambiguous homonyms. This paper systematically compares three approaches: (1) embedding-based methods pairing sentence embeddings with standard regressors, (2) transformer fine-tuning with parameter-efficient adaptation, and (3) large language model (LLM) prompting with structured reasoning and explicit decision rules. The best-performing system employs a structured prompting strategy that decomposes evaluation into narrative components (precontext, target sentence, ending) and applies explicit decision rules for rating calibration. The analysis reveals that structured prompting with decision rules outperforms both fine-tuned models and embedding-based approaches, and that prompt design matters more than model scale for this task.

2603.07475 2026-05-11 cs.CL cs.LG

A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs

Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli

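AI summary: This paper compares the layer- and token-level representational capacity of autoregressive (AR) language models and diffusion language models (dLLMs), showing that diffusion objectives produce more global representations with early-layer redundancy and less recency bias, whereas AR models produce tightly coupled, locally structured representations. The study finds that dLLMs initialized from AR models retain AR dynamics even though they are trained with a diffusion objective, indicating that initialization bias persists. By exploiting the redundancy of diffusion models, native dLLMs can cut up to 18.75% of computation while preserving performance, whereas AR models degrade markedly under the same operation.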

Comments v3: improving writing with all v2 changes

Abstract

Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether diffusion objectives fundamentally reshape internal representations remains unclear. We perform the first layer- and token-wise representational analysis comparing native dLLMs (LLaDA), native AR models (Qwen2.5), and AR-initialized dLLMs (Dream-7B), using cosine similarity across layers and tokens alongside static inference-time layer-skipping as an analytical probe of redundancy. We find that diffusion objectives produce more global representations with substantial early-layer redundancy and reduced recency bias, while AR objectives yield tightly coupled, locally structured representations. AR-initialized dLLMs retain AR-like dynamics despite diffusion training, revealing persistent initialization bias. Leveraging this redundancy, native dLLMs absorb up to 18.75% FLOPs reduction while retaining over 90% performance on math-reasoning and coding benchmarks, whereas AR models collapse under identical skipping, revealing that diffusion objectives, rather than architecture alone, induce depth redundancy that enables principled compression.
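The analysis above relies on cosine similarity across layers as a signal of redundancy. The sketch below shows one simple way to compute it, assuming hidden states have already been collected into an array of shape `(num_layers, num_tokens, dim)`. The function names and the 0.95 threshold are hypothetical, and the paper's actual skipping criterion is not stated in the abstract.

```python
import numpy as np

def adjacent_layer_cosine(hidden_states):
    """Mean per-token cosine similarity between each layer and the next.

    hidden_states: array of shape (num_layers, num_tokens, dim).
    Returns an array of shape (num_layers - 1,).
    """
    h = np.asarray(hidden_states, dtype=np.float64)
    lower, upper = h[:-1], h[1:]
    dot = (lower * upper).sum(axis=-1)
    norms = np.linalg.norm(lower, axis=-1) * np.linalg.norm(upper, axis=-1)
    return (dot / (norms + 1e-12)).mean(axis=-1)

def skip_candidates(similarities, threshold=0.95):
    """Indices of layers whose output barely differs in direction from their input.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [i + 1 for i, s in enumerate(similarities) if s >= threshold]

# Toy example: layer 1 only rescales layer 0, layer 2 rotates it by 90 degrees.
states = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 3.0]],
    [[0.0, 1.0], [1.0, 0.0]],
])
sims = adjacent_layer_cosine(states)
candidates = skip_candidates(sims)
```

In the toy example only layer 1 is flagged, because it changes the magnitude of the representation but not its direction.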

2603.05687 2026-05-11 cs.RO

Contact-Grounded Policy: Dexterous Visuotactile Policy with Generative Contact Grounding

Zhengtong Xu, Yeping Wang, Ben Abbatematteo, Jom Preechayasomboon, Sonny Chan, Nick Colonnese, Amirhossein H. Memar

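AI summary: This study addresses the difficulty of dexterous manipulation with multi-finger hands in complex contact scenarios and proposes Contact-Grounded Policy (CGP), a contact-grounded policy based on tactile feedback. The method predicts coupled trajectories of robot state and tactile feedback and uses a learned contact-consistency mapping to convert the predictions into target states that a compliance controller can execute, enabling precise contact control. Experiments show that CGP outperforms existing visual and visuotactile policies in both physical and simulated environments, markedly improving performance on dexterous manipulation tasks.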

Abstract

Contact-rich dexterous manipulation with multi-finger hands remains an open challenge in robotics because task success depends on multi-point contacts that continuously evolve and are highly sensitive to object geometry, frictional transitions, and slip. Recently, tactile-informed manipulation policies have shown promise. However, most use tactile signals as additional observations rather than modeling contact state or how their action outputs interact with low-level controller dynamics. We present Contact-Grounded Policy (CGP), a visuotactile policy that grounds multi-point contacts by predicting coupled trajectories of actual robot state and tactile feedback, and using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller. CGP consists of two components: (i) a conditional diffusion model that forecasts future robot state and tactile feedback in a compressed latent space, and (ii) a learned contact-consistency mapping that converts the predicted robot state-tactile pair into executable targets for a compliance controller, enabling it to realize the intended contacts. We evaluate CGP using a physical four-finger Allegro V5 hand with Digit360 fingertip tactile sensors, and a simulated five-finger Tesollo DG-5F hand with dense whole-hand tactile arrays. Across a range of dexterous tasks including in-hand manipulation, delicate grasping, and tool use, CGP outperforms visuomotor and visuotactile diffusion-policy baselines.

2603.05117 2026-05-11 cs.RO

SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation

Youqiang Gui, Yuxuan Zhou, Shen Cheng, Xinyang Yuan, Haoqiang Fan, Peng Cheng, Shuaicheng Liu

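AI summary: This study proposes SeedPolicy, a diffusion policy model for imitation learning over long time horizons in robot manipulation tasks. Its core method is the Self-Evolving Gated Attention (SEGA) module, which performs efficient recurrent updates through a time-evolving latent state, preserving long-term context while filtering out irrelevant temporal information. Experiments show that SeedPolicy significantly outperforms existing methods on multiple manipulation tasks and stands out in parameter efficiency, demonstrating its state-of-the-art standing in long-horizon robot manipulation.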

Comments 22 pages, 14 figures

Abstract

Imitation Learning (IL) enables robots to acquire manipulation skills from expert demonstrations. Diffusion Policy (DP) models multi-modal expert behaviors but degrades when naively increasing stacked observation horizons, limiting long-horizon manipulation. We propose Self-Evolving Gated Attention (SEGA), a temporal module that maintains a time-evolving latent state via gated attention, enabling efficient recurrent updates that accumulate long-term context into a compact latent representation while filtering irrelevant temporal information. Integrating SEGA into DP yields Self-Evolving Diffusion Policy (SeedPolicy), which resolves the temporal modeling bottleneck and extends the effective temporal horizon with moderate overhead. On the RoboTwin 2.0 benchmark with 50 manipulation tasks, SeedPolicy outperforms DP and other IL baselines. Averaged across both CNN and Transformer backbones, SeedPolicy achieves 36.8% relative improvement in clean settings and 169% relative improvement in randomized challenging settings over the DP. Compared to vision-language-action models such as RDT with 1.2B parameters, SeedPolicy achieves stronger performance in the clean setting with one to two orders of magnitude fewer parameters, demonstrating strong efficiency. These results establish SeedPolicy as a state-of-the-art imitation learning method for long-horizon robotic manipulation. Code is available at: https://anonymous.4open.science/r/SeedPolicy-64F0/.

2603.04676 2026-05-11 cs.CV cs.AI

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks

Chenjun Li

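AI summary: Multi-image reasoning remains a major challenge for vision-language models (VLMs). This paper finds that, while generating reasoning chains, the text-to-image attention of VLMs exhibits diffuse "pulses", meaning the attention is loosely distributed and fails to focus on task-relevant images, and it reveals a systematic positional bias in attention allocation. To address this, the author proposes PulseFocus, a training-free, inference-only method that structures reasoning into alternating plan and focus blocks combined with soft attention gating, improving the model's focus on relevant images and yielding notable gains on several multi-image benchmarks.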

Comments This article is withdrawn because the experimental results and analysis require substantial revision. The current version should not be cited as a reliable representation of the work

Abstract

Multi-image reasoning remains a significant challenge for vision-language models (VLMs). We investigate a previously overlooked phenomenon: during chain-of-thought (CoT) generation, the text-to-image (T2I) attention of reasoning VLMs exhibits diffuse "pulses", sporadic and unfocused attention patterns that fail to concentrate on task-relevant images. We further reveal a systematic positional bias in attention allocation across images. Motivated by these observations, we propose PulseFocus, a training-free, inference-time method that structures CoT reasoning into interleaved plan/focus blocks with soft attention gating. By forcing the model to explicitly plan which image to examine and then gating decode-time attention to the referenced image, PulseFocus sharpens attention focus and yields consistent improvements on multi-image benchmarks such as BLINK (+3.7%) and MuirBench (+1.07%).

2603.02883 2026-05-11 cs.CV

SemanticDialect: Semantic-Aware Mixed-Format Quantization for Video Diffusion Transformers

Wonsuk Jang, Thierry Tambe

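AI summary: This paper proposes SemanticDialect, a semantic-aware mixed-format quantization method that addresses the large memory and compute demands of deploying video Diffusion Transformers (DiTs) on edge devices. The method uses block-wise mixed-format quantization with lookup tables for efficient format selection, and introduces attention-guided activation decomposition and a semantic-aware format assignment strategy, reducing quantization error while preserving the semantic and temporal consistency of the video. Experiments show that SemanticDialect outperforms existing quantization methods while staying close to FP16 quality, and that it is well suited to hardware deployment.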

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art video generation quality, but their substantial memory and computational footprints hinder edge deployment. Quantization can reduce these costs, yet existing methods often degrade video quality due to high activation variation and the difficulty of preserving semantic and temporal coherence. We propose SemanticDialect, which advances block-wise mixed-format quantization. In this framework, each block selects an optimal format (dialect) from a candidate set (formatbook), which is augmented with lookup tables that store quantization errors and quantized indices, enabling efficient per-block format selection and quantization with minimal online overhead. We further introduce attention-guided activation decomposition, which reduces quantization error via residual quantization, and semantic-aware dialect assignment (SeDA), which reduces cross-token quantization inconsistency by enforcing format uniformity among semantically correlated tokens. Experiments demonstrate that SemanticDialect outperforms prior quantization methods and block-wise formats (MXFP4, NVFP4) while approaching FP16 quality on Open-Sora 2.0. We also validate hardware deployability through RTL design and GPU kernel implementation.
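The per-block format selection described above can be pictured as picking, for each block, the candidate grid with the lowest quantization error. The sketch below is only an illustration under assumptions: the two-entry `FORMATBOOK`, the max-abs scaling rule, and the function names are hypothetical, and the error is computed directly rather than read from the lookup tables the abstract mentions.

```python
import numpy as np

# Hypothetical formatbook: each "dialect" is a grid of representable values.
# The FP4 grid lists the E2M1 magnitudes; the paper's candidate set may differ.
FORMATBOOK = {
    "int4": np.arange(-7, 8, dtype=np.float64),
    "fp4_e2m1": np.array(
        [-6, -4, -3, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 3, 4, 6],
        dtype=np.float64,
    ),
}

def quantize_block(block, grid):
    """Scale the block so its max magnitude maps to the grid's max, then round to nearest."""
    block = np.asarray(block, dtype=np.float64)
    scale = np.abs(block).max() / np.abs(grid).max()
    if scale == 0.0:
        return block.copy()  # all-zero block is already exact
    nearest = np.abs(block[:, None] / scale - grid[None, :]).argmin(axis=1)
    return grid[nearest] * scale

def select_dialect(block):
    """Return the format with the lowest mean squared error for this block."""
    block = np.asarray(block, dtype=np.float64)
    errors = {
        name: float(np.mean((block - quantize_block(block, grid)) ** 2))
        for name, grid in FORMATBOOK.items()
    }
    return min(errors, key=errors.get), errors

uniform_block = np.linspace(-7.0, 7.0, 15)                    # evenly spaced values
skewed_block = np.array([6.0, 0.5, -0.5, 1.5, -3.0, 0.0])     # mostly small values, one outlier
best_uniform, _ = select_dialect(uniform_block)
best_skewed, skewed_errors = select_dialect(skewed_block)
```

Evenly spaced values favor the integer grid, while a block with many small values and one large outlier favors the floating-point grid, which is the intuition behind choosing a format per block.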

2603.01586 2026-05-11 cs.CV

InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning

Yecong Wan, Fan Li, Chunwei Wang, Hao Wu, Mingwen Shao, Wangmeng Zuo

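AI summary: This study proposes InterCoG, a new text-vision interleaved chain-of-grounding reasoning framework for fine-grained image editing in complex multi-entity scenes. Its core method is to reason about the target's position from the spatial relations described in text, then perform visual grounding to generate bounding boxes and masks, and finally modify the editing target precisely. To improve model performance, the study also introduces two auxiliary training modules and builds the GroundEdit-45K dataset, which contains detailed reasoning annotations, along with the GroundEdit-Bench evaluation benchmark. Experiments show that the method's editing precision in spatially complex scenes is significantly better than that of existing methods.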

Abstract

Emerging unified editing models have demonstrated strong capabilities in general object editing tasks. However, it remains a significant challenge to perform fine-grained editing in complex multi-entity scenes, particularly those where targets are not visually salient and require spatial reasoning. To this end, we propose InterCoG, a novel text-vision Interleaved Chain-of-Grounding reasoning framework for fine-grained image editing in complex real-world scenes. The key insight of InterCoG is to first perform object position reasoning solely within text that includes spatial relation details to explicitly deduce the location and identity of the edited target. It then conducts visual grounding via highlighting the editing targets with generated bounding boxes and masks in pixel space, and finally rewrites the editing description to specify the intended outcomes. To further facilitate this paradigm, we propose two auxiliary training modules: multimodal grounding reconstruction supervision and multimodal grounding reasoning alignment to enforce spatial localization accuracy and reasoning interpretability, respectively. We also construct GroundEdit-45K, a dataset comprising 45K grounding-oriented editing samples with detailed reasoning annotations, and GroundEdit-Bench for grounding-aware editing evaluation. Extensive experiments substantiate the superiority of our approach in highly precise edits under spatially intricate and multi-entity scenes.

2602.23811 2026-05-11 cs.LG cs.AI

Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies

Xiang Li, Yuheng Zhang, Nan Jiang

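AI summary: This paper studies the theory of offline reinforcement learning under general function approximation, with the goal of learning well-performing policies from offline data. Existing algorithms are computationally inefficient in large or continuous action spaces, rely on state-wise mirror descent, and struggle to support standalone policy parameterization. To address this, the paper develops a theoretical analysis that applies to parameterized policy classes and identifies contextual coupling as the core difficulty in extending mirror descent to parameterized policies. By connecting mirror descent to natural policy gradient, it obtains new theoretical guarantees and algorithmic insights and reveals a potential unification between offline reinforcement learning and imitation learning.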

Abstract

We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from offline data via pessimism, existing algorithms that are computationally tractable (often in an oracle-efficient sense), such as PSPI, only apply to finite and small action spaces. Moreover, these algorithms rely on state-wise mirror descent and require actors to be implicitly induced from the critic functions, failing to accommodate standalone policy parameterization which is ubiquitous in practice. In this work, we address these limitations and extend the theoretical guarantees to parameterized policy classes over large or continuous action spaces. When extending mirror descent to parameterized policies, we identify contextual coupling as the core difficulty, and show how connecting mirror descent to natural policy gradient leads to novel analyses, guarantees, and algorithmic insights, including a surprising unification between offline RL and imitation learning.

2602.22831 2026-05-11 cs.LG cs.AI cs.CL cs.CV cs.CY

Direction-Flipped Influence Audits Reveal Hidden Structure in Moral Choices of LLMs

Phil Blandfort, Tushar Karayil, Alex McKenzie, Urja Pawar, Robert Graham, Dmitrii Krasheninnikov

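AI summary: Using a "direction-flipped influence audit", this study reveals how sensitive large language models are to contextual cues in moral choice tasks. It finds that even small contextual cues can markedly shift a model's choices and that different model families respond differently. It also finds that some models show a "backfire effect" under certain cues, choosing the option opposite to the cue's direction, and that this phenomenon persists in reasoning-capable models, though in a different form. The findings offer a new perspective and tool for evaluating and understanding the moral decision-making of language models.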

Abstract

Moral benchmarks for LLMs typically score models on context-free prompts, implicitly treating the measured choice rate as stable. We test this assumption with a direction-flipped influence audit: for each scenario, we compare a baseline prompt with matched cues steering toward option A or option B. Across a trolley-problem-style moral triage task, BBQ, and DailyDilemmas, and across five LLM families with and without reasoning, short contextual cues shift per-condition choice rates by 12-18 percentage points on average. These shifts reveal structure that baseline scores miss: roughly 40% of baseline-neutral triage and BBQ conditions exhibit directional asymmetry under influence, and a meaningful share of significant effects backfire, moving opposite the cue's intended direction. In follow-up probes, models often recognize the cue while denying that it affected their choice. Among significant backfire trials, this stated-vs.-revealed inconsistency appears in 78% of cases. Reasoning does not eliminate contextual sensitivity but reshapes it: social-pressure cues such as user preference and emotional appeal weaken across benchmarks, while few-shot demonstrations strengthen sharply on both triage and BBQ. We recommend direction-flipped influence pairs as a standard complement to context-free moral-bias evaluation, and release the harness and data to make such audits routine.
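The audit described above compares a baseline prompt with matched cues steering toward option A or option B. A minimal sketch of the bookkeeping for one condition follows, assuming each input is the fraction of trials in which the model chose option A. The function and field names are hypothetical, and the significance testing that the paper applies before calling an effect a backfire is omitted here.

```python
def audit_condition(base_rate_a, rate_a_with_cue_a, rate_a_with_cue_b):
    """Summarize one direction-flipped pair for a single scenario condition.

    All inputs are fractions in [0, 1] of trials where option A was chosen.
    Shifts are signed so that a positive value means the cue worked as intended.
    """
    shift_toward_a = rate_a_with_cue_a - base_rate_a   # cue A should raise the A rate
    shift_toward_b = base_rate_a - rate_a_with_cue_b   # cue B should lower the A rate
    return {
        "shift_toward_a": shift_toward_a,
        "shift_toward_b": shift_toward_b,
        # Nonzero asymmetry: the model is easier to push one way than the other.
        "asymmetry": shift_toward_a - shift_toward_b,
        # Backfire: the choice rate moved opposite to the cue's intended direction.
        "backfire_a": shift_toward_a < 0,
        "backfire_b": shift_toward_b < 0,
    }

# Baseline looks neutral (50% A), yet the two cues behave very differently.
result = audit_condition(0.50, 0.70, 0.55)
```

In this made-up example the baseline is neutral, the cue toward A moves the choice rate by 20 percentage points, and the cue toward B backfires by 5 points, a structure that the baseline score alone would hide.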

2602.21858 2026-05-11 cs.AI

ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices

Dezhi Kong, Zhengzhao Feng, Qiliang Liang, Hao Wang, Haofei Sun, Changpeng Yang, Yang Li, Peng Zhou, Shuai Nie, Hongzhen Wang, Linfeng Zhou, Hao Jia, Jiaming Xu, Runyu Shi, Ying Huang

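AI summary: ProactiveMobile is a comprehensive benchmark for advancing proactive intelligence on mobile devices. Because current multimodal large language models on mobile are largely limited to reacting to user instructions, it puts forward a new paradigm in which agents anticipate user needs and carry out tasks on their own initiative. The benchmark analyzes multi-dimensional on-device contextual signals to generate executable function sequences and contains 3,660 instances across 14 scenarios, evaluating models' proactive reasoning and execution comprehensively. Experiments show that the benchmark measures proactive intelligence effectively and highlights both the shortfall of current models in this area and the room for improvement.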

Abstract

Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive intelligence, where agents autonomously anticipate needs and initiate actions, represents the next frontier for mobile agents. However, its development is critically bottlenecked by the lack of benchmarks that can address real-world complexity and enable objective, executable evaluation. To overcome these challenges, we introduce ProactiveMobile, a comprehensive benchmark designed to systematically advance research in this domain. ProactiveMobile formalizes the proactive task as inferring latent user intent across four dimensions of on-device contextual signals and generating an executable function sequence from a comprehensive function pool of 63 APIs. The benchmark features over 3,660 instances of 14 scenarios that embrace real-world complexity through multi-answer annotations. To ensure quality, a team of 30 experts conducts a final audit of the benchmark, verifying factual accuracy, logical consistency, and action feasibility, and correcting any non-compliant entries. Extensive experiments demonstrate that our fine-tuned Qwen2.5-VL-7B-Instruct achieves a success rate of 19.15%, outperforming o1 (15.71%) and GPT-5 (7.39%). This result indicates that proactivity is a critical competency widely lacking in current MLLMs, yet it is learnable, emphasizing the importance of the proposed benchmark for proactivity evaluation.

2602.20974 2026-05-11 cs.LG

MAST: A Multi-fidelity Augmented Surrogate model via Spatial Trust-weighting

Ahmed Mohamed Eisa Nasr, Ali Elham, Haris Moazam Sheikh

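AI summary: In engineering design and scientific computing there is an inherent tension between computational cost and predictive accuracy. To address it, this paper proposes MAST, a multi-fidelity augmented surrogate model that uses spatial trust-weighting to combine corrected low-fidelity data with high-fidelity predictions, trusting high-fidelity results near high-fidelity samples and relying on corrected low-fidelity data elsewhere. Through explicit discrepancy modelling and distance-based weighting, the method builds a heteroscedastic Gaussian process model that markedly improves predictive performance in multi-fidelity settings and remains robust across different budgets and fidelity gaps.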

Abstract

In engineering design and scientific computing, computational cost and predictive accuracy are intrinsically coupled. High-fidelity simulations provide accurate predictions but at substantial computational costs, while lower-fidelity approximations offer efficiency at the expense of accuracy. Multi-fidelity surrogate modelling addresses this trade-off by combining abundant low-fidelity data with sparse high-fidelity observations. However, existing methods rely on global correlation assumptions that can often fail in practice to capture how fidelity relationships vary across the input space, leading to poor performance, particularly under tight budget constraints. We introduce MAST, a method that blends corrected low-fidelity observations with high-fidelity predictions, trusting high-fidelity near observed samples and relying on corrected low-fidelity elsewhere. MAST achieves this through explicit discrepancy modelling and distance-based weighting with closed-form variance propagation, producing a single heteroscedastic Gaussian process. Across multi-fidelity synthetic benchmarks, MAST shows a marked improvement over the current state-of-the-art techniques. Crucially, MAST maintains robust performance across varying total budget and fidelity gaps, conditions under which competing methods exhibit significant degradation or unstable behaviour. More broadly, MAST provides a spatially adaptive framework for multi-fidelity Gaussian-process modelling, in which the contribution of low-fidelity information is governed by its proximity to high-fidelity calibration data, opening a new direction for more reliable surrogate construction under sparse and budget-constrained settings.
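The blending idea above (trust high-fidelity predictions near observed samples, fall back to corrected low-fidelity elsewhere) can be sketched with a distance-based weight. Everything specific here is an assumption for illustration: the squared-exponential weight, the `length_scale` parameter, and the variance formula, which treats the two predictors as independent. The abstract does not give MAST's actual weighting function or variance propagation.

```python
import numpy as np

def trust_weight(x, x_hf, length_scale=0.5):
    """Weight in [0, 1] that decays with distance to the nearest high-fidelity sample.

    x: query points, shape (n, d). x_hf: high-fidelity inputs, shape (m, d).
    The squared-exponential form is a hypothetical choice for this sketch.
    """
    x = np.asarray(x, dtype=np.float64)
    x_hf = np.asarray(x_hf, dtype=np.float64)
    nearest = np.linalg.norm(x[:, None, :] - x_hf[None, :, :], axis=-1).min(axis=1)
    return np.exp(-((nearest / length_scale) ** 2))

def blend(weight, mean_hf, var_hf, mean_lf_corrected, var_lf_corrected):
    """Convex blend of two Gaussian predictions, assuming they are independent."""
    mean = weight * mean_hf + (1.0 - weight) * mean_lf_corrected
    var = weight**2 * var_hf + (1.0 - weight) ** 2 * var_lf_corrected
    return mean, var

x_hf = np.array([[0.0]])              # one high-fidelity sample at the origin
x_query = np.array([[0.0], [10.0]])   # one query on the sample, one far away
w = trust_weight(x_query, x_hf)
mean, var = blend(
    w,
    mean_hf=np.array([1.0, 1.0]), var_hf=np.array([0.1, 0.1]),
    mean_lf_corrected=np.array([3.0, 3.0]), var_lf_corrected=np.array([0.4, 0.4]),
)
```

At the high-fidelity sample the blend returns the high-fidelity mean and variance, and far from it the blend returns the corrected low-fidelity values, so the predictive variance varies across the input space.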

2602.20816 2026-05-11 cs.CL cs.LG

Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation

Sayantan Dasgupta, Trevor Cohn, Timothy Baldwin

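AI summary: This paper studies the core learning signal in language model knowledge distillation and points out that the traditional KL divergence focuses too heavily on the few highest-probability tokens in the teacher's output (the modes) and neglects useful information that the tail of the distribution may contain. The authors therefore propose a new tail-aware divergence that decouples the contribution of the teacher's top-K predicted probabilities from that of low-probability predictions, strengthening the influence of the tail while keeping the same computational efficiency. Experiments show that the method improves pre-training and supervised distillation of decoder models on a range of datasets, and that distillation is efficient enough for academic-scale compute on large datasets.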

Comments ICML 2026

Abstract

The core learning signal used in language model distillation is the standard Kullback-Leibler (KL) divergence between the student and teacher distributions. Traditional KL divergence tends to be dominated by the next tokens with the highest probabilities, i.e., the teacher's modes, thereby diminishing the influence of less probable yet potentially informative components of the output distribution. We propose a new tail-aware divergence that decouples the contribution of the teacher model's top-K predicted probabilities from that of lower-probability predictions, while maintaining the same computational profile as the KL Divergence. Our decoupled approach reduces the impact of the teacher modes and, consequently, increases the contribution of the tail of the distribution. Experimental results demonstrate that our modified distillation method yields competitive performance in both pre-training and supervised distillation of decoder models across various datasets. Furthermore, the distillation process is efficient and can be performed with a modest academic budget for large datasets, eliminating the need for industry-scale computing.
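To see why the tail contributes little to standard KL, it helps to split the divergence by the chain rule into a top-K mass term, a head term, and a tail term. The sketch below implements that exact identity and then lets the tail weight be set independently. The reweighting step is an illustrative stand-in and should not be read as the divergence the paper proposes; the function names and `tail_weight` parameter are hypothetical, and the code assumes the student assigns nonzero probability everywhere and that both head and tail carry nonzero teacher mass.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, skipping entries where p is zero."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    keep = p > 0
    return float(np.sum(p[keep] * np.log(p[keep] / q[keep])))

def decomposed_kl(p, q, k):
    """Chain-rule split of KL(p || q) around the teacher's top-k tokens."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    is_top = np.zeros(p.shape, dtype=bool)
    is_top[np.argsort(p)[::-1][:k]] = True
    p_top, q_top = p[is_top].sum(), q[is_top].sum()
    mass_term = kl([p_top, 1.0 - p_top], [q_top, 1.0 - q_top])
    head_term = kl(p[is_top] / p_top, q[is_top] / q_top)
    tail_term = kl(p[~is_top] / (1.0 - p_top), q[~is_top] / (1.0 - q_top))
    return mass_term, head_term, tail_term, p_top

def tail_weighted_loss(p, q, k, tail_weight=None):
    """Standard KL when tail_weight is None; otherwise the tail term is reweighted."""
    mass_term, head_term, tail_term, p_top = decomposed_kl(p, q, k)
    if tail_weight is None:
        tail_weight = 1.0 - p_top  # the weight that standard KL implicitly uses
    return mass_term + p_top * head_term + tail_weight * tail_term

teacher = [0.70, 0.20, 0.05, 0.05]
student = [0.40, 0.30, 0.20, 0.10]
plain = tail_weighted_loss(teacher, student, k=2)
boosted = tail_weighted_loss(teacher, student, k=2, tail_weight=1.0)
```

In standard KL the tail term is multiplied by the teacher's tail mass, which is 0.1 in this example, so raising that weight makes mismatches among low-probability tokens count for more.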

2602.20338 2026-05-11 cs.LG

Emergent Manifold Separability during Reasoning in Large Language Models

Chanwoo Chun, Alexandre Polo, SueYeon Chung

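AI summary This study examines how the geometry of latent representations in large language models changes during reasoning, applying Manifold Capacity Theory (MCT) to two compositional reasoning tasks. It finds that the model briefly untangles concept manifolds into linearly separable subspaces immediately before computation and rapidly compresses them afterwards, a pattern that diverges from standard linear probe accuracy, which stays high long after computation. The authors interpret this as Dynamic Manifold Management, a mechanism in which the model modulates representational capacity to optimize the bandwidth of the residual stream throughout the reasoning chain.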

Comments Alexandre Polo and Chanwoo Chun contributed equally to this work

Abstract

Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to two compositional reasoning tasks: a controlled Boolean logic tree that supports deep mechanistic analysis, and a natural-language eligibility task in which the model has to extract attributes from prose, compare them to thresholds, and compose the local decisions through a fixed evaluation tree. MCT lets us quantify the linear separability of latent representations without the confounding factors of probe training. On both tasks, and across several open-weight models, reasoning manifests as a transient geometric pulse: concept manifolds are untangled into linearly separable subspaces immediately prior to computation and rapidly compressed thereafter. This behavior diverges from standard linear probe accuracy, which remains high long after computation, suggesting a fundamental distinction between information that is merely retrievable and information that is geometrically prepared for processing. We interpret this phenomenon as Dynamic Manifold Management, a mechanism where the model dynamically modulates representational capacity to optimize the bandwidth of the residual stream throughout the reasoning chain.

2602.19974 2026-05-11 cs.CV

RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection

Tianyu Wang, Zhiyuan Ma, Qian Wang, Xinyi Zhang, Xinwei Long, Bowen Zhou

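AI summary This paper proposes RL-RIG, a reinforcement learning framework for reflection-based image generation, aimed at the difficulty image generation models have in capturing fine-grained spatial relationships from the prompt and generating structurally sound scenes. The method follows a Generate-Reflect-Edit paradigm built from four components (Diffuser, Checker, Actor, and Inverse Diffuser) and uses Reflection-GRPO to train the VLM Actor and the Image Editor. Experiments on the LAION-SG dataset show that RL-RIG outperforms existing state-of-the-art open-source models by up to 11% on controllable and precise spatial reasoning.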

Abstract

Recent advancements in image generation have achieved impressive results in producing high-quality images. However, existing image generation models still generally struggle with a spatial reasoning dilemma, lacking the ability to accurately capture fine-grained spatial relationships from the prompt and correctly generate scenes with structural integrity. To mitigate this dilemma, we propose RL-RIG, a Reinforcement Learning framework for Reflection-based Image Generation. Our architecture comprises four primary components (Diffuser, Checker, Actor, and Inverse Diffuser) and follows a Generate-Reflect-Edit paradigm to elicit Chain-of-Thought reasoning in image generation. To equip the model with better intuition over generation trajectories, we further develop Reflection-GRPO to train the VLM Actor to produce edit prompts and the Image Editor to produce better image quality under a given prompt. Unlike traditional approaches that solely produce visually stunning yet structurally unreasonable content, our evaluation metrics prioritize spatial accuracy, utilizing Scene Graph IoU and employing a VLM-as-a-Judge strategy to assess the spatial consistency of generated images on the LAION-SG dataset. Experimental results show that RL-RIG outperforms existing state-of-the-art open-source models by up to 11% in terms of controllable and precise spatial reasoning in image generation.
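The abstract names Scene Graph IoU as a metric but does not define it. One common reading is an intersection-over-union on sets of (subject, relation, object) triples, sketched below. The triple representation, the exact-match criterion, and the function name are assumptions of this sketch; the paper's metric may match nodes and edges differently.

```python
def scene_graph_iou(predicted, reference):
    """Intersection-over-union of two collections of relation triples.

    Each triple is a hashable (subject, relation, object) tuple. Exact
    matching is assumed here; a real metric may allow synonyms.
    """
    pred, ref = set(predicted), set(reference)
    union = pred | ref
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(pred & ref) / len(union)
```

For instance, a generated image whose extracted graph contains ("lamp", "left of", "sofa") when the prompt asked for ("lamp", "right of", "sofa") loses credit for that relation even if every object is present.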

2602.17472 2026-05-11 cs.RO

A Cost-Effective and Climate-Resilient Air Pressure System for Rain Effect Reduction on Automated Vehicle Cameras

Mohamed Sabry, Joseba Gorospe, Cristina Olaverri-Monreal

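AI summary This paper presents a low-cost, climate-resilient air pressure system that reduces the effect of rain on automated vehicle cameras. The system is compatible with multiple cameras simultaneously and raised the pedestrian detection accuracy of a deep learning model from 8.3% to 41.6%. Beyond its technical contribution, the solution supports sustainability goals in transportation systems by reducing resource consumption and enabling more cost-efficient deployment of automated vehicle technologies in challenging weather.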

Abstract

Recent advances in automated vehicles have focused on improving perception performance under adverse weather conditions; however, research on physical hardware solutions remains limited, despite their importance for perception-critical applications such as vehicle platooning. Existing approaches, such as hydrophilic or hydrophobic lenses and sprays, provide only partial mitigation, while industrial protection systems are costly and do not scale to automotive deployment. To address these limitations, this paper presents a cost-effective hardware solution for rainy conditions, designed to be compatible with multiple cameras simultaneously. Beyond its technical contribution, the proposed solution supports sustainability goals in transportation systems. By enabling compatibility with existing camera-based sensing platforms, the system extends the operational reliability of automated vehicles without requiring additional high-cost sensors or hardware replacements. This approach reduces resource consumption, supports modular upgrades, and promotes more cost-efficient deployment of automated vehicle technologies, particularly in challenging weather conditions where system failures would otherwise lead to inefficiencies and increased emissions. The proposed system increased the pedestrian detection accuracy of a Deep Learning model from 8.3% to 41.6%.

2602.16548 2026-05-11 cs.LG

RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion

Tianmeng Hu, Yongzheng Cui, Biao Luo, Ke Li

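AI summary RIDER is a reinforcement learning-guided diffusion framework for 3D RNA inverse design that directly optimizes the structural similarity between generated sequences and the target 3D structure. It first develops and pre-trains a GNN-based generative diffusion model conditioned on the target 3D structure, achieving a 9% improvement in native sequence recovery over state-of-the-art methods. It then fine-tunes the model with an improved policy gradient algorithm and four reward functions based on 3D self-consistency metrics. Experiments show that RIDER improves structural similarity by over 100% across all metrics and discovers designs that are distinct from native sequences.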

Comments Accepted as a conference paper at ICLR 2026

Abstract

The inverse design of RNA three-dimensional (3D) structures is crucial for engineering functional RNAs in synthetic biology and therapeutics. While recent deep learning approaches have advanced this field, they are typically optimized and evaluated using native sequence recovery, which is a limited surrogate for structural fidelity, since different sequences can fold into similar 3D structures and high recovery does not necessarily indicate correct folding. To address this limitation, we propose RIDER, an RNA Inverse DEsign framework with Reinforcement learning that directly optimizes for 3D structural similarity. First, we develop and pre-train a GNN-based generative diffusion model conditioned on the target 3D structure, achieving a 9% improvement in native sequence recovery over state-of-the-art methods. Then, we fine-tune the model with an improved policy gradient algorithm using four task-specific reward functions based on 3D self-consistency metrics. Experimental results show that RIDER improves structural similarity by over 100% across all metrics and discovers designs that are distinct from native sequences.
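For readers unfamiliar with the metric the abstract argues against, native sequence recovery is usually computed as the fraction of positions where the designed sequence matches the native one. The sketch below uses that usual definition; the function name is hypothetical and the paper's evaluation code may differ in detail.

```python
def sequence_recovery(designed, native):
    """Fraction of aligned positions where designed and native bases agree."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(designed, native))
    return matches / len(native)
```

The metric only compares letters, which is why it is a limited proxy: two sequences with low mutual recovery can still fold into similar 3D structures, and a high-recovery design can still misfold.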

2602.13837 2026-05-11 cs.CV

A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations

Cem Eteke, Batuhan Tosun, Martin Piccolrovazzi, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach

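AI summary This paper studies video reconstruction from ultra-low-bitrate representations, where reconstruction with classical and neural codecs introduces blur and generative and semantic approaches struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. The authors propose a causal video diffusion model that reconstructs videos by jointly modeling the complementary information in ultra-low-bitrate semantics and highly compressed frames, and they introduce temporal-only distillation from a bidirectional teacher to enable parameter-efficient training and causal few-step inference. Experiments show that the method outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate video reconstruction.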

Abstract

We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed frames by jointly modeling their complementary information. We further introduce temporal-only distillation from a bidirectional teacher to enable parameter-efficient training and causal few-step inference. Through extensive quantitative, qualitative, and subjective evaluation, we show that the proposed method outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate video reconstruction.

2602.13506 2026-05-11 cs.LG cs.AI math.OC

$γ$-weakly $θ$-up-concavity: A Unified Framework for Non-Convex Optimization Beyond DR-Submodular and OSS Functions

Mohammad Pedramfar, Vaneet Aggarwal

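AI summary This paper introduces $γ$-weakly $θ$-up-concavity, a first-order condition that provides a unified framework for a broad class of non-convex functions. The condition strictly generalizes both DR-submodular and One-Sided Smooth (OSS) functions and captures broader forms of scale-dependent curvature. The authors prove that such functions are upper-linearizable: at any feasible point, one can construct a linear surrogate whose gains provably approximate the original non-linear objective. The framework yields unified approximation guarantees for offline optimization and regret bounds for online optimization, recovers the optimal approximation coefficient for DR-submodular maximization, and improves existing coefficients for OSS optimization.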

Abstract

Optimizing non-convex functions is a fundamental challenge across machine learning and combinatorial optimization. We introduce and study $γ$-weakly $θ$-up-concavity, a novel first-order condition that characterizes a broad class of such functions. This condition provides a powerful unifying framework, strictly generalizing both DR-submodular and One-Sided Smooth (OSS) functions while capturing broader forms of scale-dependent curvature, including accumulating-then-diminishing returns and flat-start behavior. Our central theoretical contribution demonstrates that $γ$-weakly $θ$-up-concave functions are upper-linearizable: for any feasible point, we can construct a linear surrogate whose gains provably approximate the original non-linear objective. A key technical contribution is a nonuniform upper-linearization argument yielding approximation coefficients that depend explicitly on the curvature parameters and the geometry of the feasible region. This linearizability yields immediate and unified approximation guarantees for a wide range of problems. Specifically, we obtain unified approximation guarantees for offline optimization as well as static and dynamic regret bounds in online settings via standard reductions to linear optimization. Moreover, our framework recovers the optimal approximation coefficient for DR-submodular maximization and improves existing approximation coefficients for OSS optimization, particularly over matroid constraints.

2602.13357 2026-05-11 cs.CV cs.AI

AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers

Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu

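AI summary This paper proposes AdaCorrection, an adaptive offset cache correction framework for improving the inference efficiency and generation quality of Diffusion Transformers (DiTs) in image and video generation. At each timestep, the method estimates cache validity and adaptively blends cached and fresh activations, mitigating the temporal drift and cache misalignment caused by static cache reuse schedules. Experiments show that AdaCorrection keeps FID close to that of the original model while providing moderate acceleration.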

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse across Transformer layers during diffusion inference. At each timestep, AdaCorrection estimates cache validity with lightweight spatio-temporal signals and adaptively blends cached and fresh activations. This correction is computed on-the-fly without additional supervision or retraining. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration. Experiments on image and video diffusion benchmarks show that AdaCorrection consistently improves generation performance.
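The estimate-then-blend step described above can be pictured with a toy sketch. The abstract does not specify the validity signal or the blending rule, so the relative-drift score, the exponential mapping, the `temperature` parameter, and both function names are assumptions made here for illustration only.

```python
import math


def cache_validity(prev_input, cur_input, temperature=0.1):
    """Map the relative change of a layer input to a score in (0, 1].

    Assumption: validity decays exponentially with the relative L2 drift
    between the input seen when the cache was written and the current input.
    """
    drift = math.sqrt(sum((a - b) ** 2 for a, b in zip(cur_input, prev_input)))
    scale = math.sqrt(sum(b * b for b in prev_input)) or 1.0
    return math.exp(-(drift / scale) / temperature)


def blend(cached, fresh, validity):
    """Convex combination of cached and freshly computed activations."""
    return [validity * c + (1.0 - validity) * f for c, f in zip(cached, fresh)]
```

A validity near 1 means the cached activation is reused almost unchanged, while a validity near 0 falls back to the fresh computation. How the fresh activations are obtained cheaply enough to preserve the speed-up is not covered by this sketch.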

2602.13224 2026-05-11 cs.AI cs.CL

A Geometric Taxonomy of Hallucinations in LLMs

Javier Marín

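AI summary This paper studies a geometric taxonomy of hallucinations in large language models, with the goal of interpretable detection in a black-box setting where only the query, the response, and often a source document are available. The author proposes a taxonomy based on the geometry of the embedding space that divides hallucinations into three operational types and predicts, for each, whether it is detectable using geometric signals such as an angular ratio or a directional signature. To validate the approach on content resembling deployment, the author also built a dataset of 212 human-confabulated pairs across nine domains.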

Abstract

Hallucinations in deployed language models can have real consequences for downstream decisions in domains such as healthcare, legal, and financial services. In production, detection has to run on what the deployed system can see: the query, the response, and often a source document. White-box access to model internals and multi-sample querying are not generally available behind a third-party API. Within this setting (black-box, single-pass, only question/answer available), the dominant baseline is NLI, which returns a value but no diagnosis when it fails. We argue that operating directly on the geometry of the embedding space provides detection methods whose successes and failures are interpretable as structural properties of contrastive sentence-encoder training (Wang and Isola, 2020). The contribution is: given an operationally motivated taxonomy, geometry predicts which types of hallucination are detectable and which are not, and the predictions hold. We propose three operational types organized by the relation of the response embedding to the plausibility region of grounded responses on the unit hypersphere, and derive from the alignment objective a prediction for each: (1) query-proximate unfaithfulness is detectable by an angular ratio; (2) confabulation outside the plausibility region produces a directional signature that outperforms NLI on expert-annotated error; (3) factual errors sharing vocabulary and frame with correct answers are not separable by angular geometry. To validate on content resembling deployment, we built a 212-pair human-confabulated dataset across nine domains using provoked confabulation.

2602.12852 2026-05-11 cs.AI

WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning

Junjie Wang, Zequn Xie, Dan Yang, Jie Feng, Yue Shen, Duolin Sun, Meixiu Long, Yihan Jiao, Zhehao Tan, Jian Wang, Peng Wei, Jinjie Gu

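AI summary WebClipper is a graph-based trajectory pruning framework for improving the search efficiency of web agents. It models the agent's search process as a state graph and compresses trajectories by mining a minimum-necessary directed acyclic graph (DAG), removing redundant steps while preserving essential reasoning. Continued training on the pruned trajectories reduces tool-call rounds by about 20% while improving accuracy. The paper also introduces the F-AE Score, a new metric for how well a model balances accuracy and efficiency.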

Comments ACL 2026 Main

Abstract

Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories with cyclic reasoning loops and exploration of unproductive branches. To address this, we propose WebClipper, a framework that compresses web agent trajectories via graph-based pruning. Concretely, we model the agent's search process as a state graph and cast trajectory optimization as a minimum-necessary Directed Acyclic Graph (DAG) mining problem, yielding pruned trajectories that preserve essential reasoning while eliminating redundant steps. Continued training on these refined trajectories enables the agent to evolve toward more efficient search patterns and reduces tool-call rounds by about 20% while improving accuracy. Furthermore, we introduce a new metric called F-AE Score to measure the model's overall performance in balancing accuracy and efficiency. Experiments demonstrate that WebClipper reduces tool-call rounds while maintaining strong performance, providing practical insight into balancing effectiveness and efficiency in web agent design.
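A simplified picture of graph-based trajectory pruning: treat each visited state as a node, each observed transition as an edge, and keep only a shortest path from the first state to the final one, which drops loops and dead-end branches. This is a deliberately reduced stand-in. The paper mines a minimum-necessary DAG, which may retain several branches that jointly support the answer, and the state labels and function name here are hypothetical.

```python
from collections import deque


def prune_trajectory(states):
    """Return a shortest start-to-end path using only observed transitions.

    `states` is the non-empty sequence of hashable state labels visited by
    the agent, in order. Revisits and unproductive detours are removed.
    """
    graph = {}
    for a, b in zip(states, states[1:]):
        graph.setdefault(a, [])
        if b not in graph[a]:
            graph[a].append(b)
    start, goal = states[0], states[-1]
    parent = {start: None}
    queue = deque([start])
    while queue:  # breadth-first search over the state graph
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path, node = [], goal
    while node is not None:  # walk parents back from the goal
        path.append(node)
        node = parent[node]
    return path[::-1]
```

The goal is always reachable because the original trajectory is itself a walk from the first state to the last. Pruned trajectories of this kind could then serve as shorter training targets, which is the role the abstract assigns to the refined trajectories.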

2602.12162 2026-05-11 cs.LG

Amortized Molecular Optimization via Group Relative Policy Optimization

Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay, Alexander Mitsos, Dominik G. Grimm, Martin Grohe

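AI summary In structurally constrained molecular optimization, existing methods restart an expensive oracle-driven search for every new input structure, which scales poorly to settings with many starting structures or expensive oracles. This paper presents AMORTIX, an amortized Graph Transformer model that optimizes molecular structures in a single forward pass with zero inference-time oracle calls. Normalizing rewards within groups of completions that share the same starting structure addresses the training instability caused by widely varying optimization difficulty. AMORTIX outperforms amortized and instance-optimization baselines on goal-directed scaffold decoration and ranks first among amortized methods on the PMO benchmark.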

Abstract

In structurally constrained molecular optimization, state-of-the-art methods restart an expensive oracle-driven search from scratch for every new input structure, scaling poorly to settings with many starting structures or expensive oracles. While amortized approaches that learn a transferable policy could in principle remove this bottleneck, existing methods struggle to generalize to diverse structural constraints at inference time. We present AMORTIX, an amortized Graph Transformer model that natively supports such constraints, optimizing molecular structures in a single forward pass with zero inference-time oracle calls. A central challenge for amortized training in this domain is that optimization difficulty varies drastically across starting structures. We show that, under this heterogeneity, standard reinforcement learning methods fail to stabilize training, and address this by normalizing rewards within groups of completions sharing the same starting structure. We evaluate on structurally constrained single- and multi-target kinase inhibitor design, and on a few-shot prodrug case study. AMORTIX outperforms both amortized and instance-optimization baselines on goal-directed scaffold decoration and ranks first among amortized methods on the PMO benchmark; the prodrug case study further demonstrates transfer of a learned modification rule to unseen drug structures. Code is available at https://github.com/Hash-hh/AMORTIX/.
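The reward normalization described above (the group-relative idea in the title) can be sketched as follows: rewards are standardized within each group of completions that share a starting structure, so an easy scaffold with uniformly high scores does not dominate the gradient over a hard one. The function name, the use of the population standard deviation, and the `eps` constant are assumptions of this sketch rather than details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_normalize(rewards, group_ids, eps=1e-8):
    """Standardize each reward against the other rewards in its group.

    `group_ids[i]` identifies the starting structure that completion i
    was generated from. Returns one advantage value per completion.
    """
    groups = defaultdict(list)
    for r, g in zip(rewards, group_ids):
        groups[g].append(r)
    stats = {g: (mean(rs), pstdev(rs)) for g, rs in groups.items()}
    # eps avoids division by zero when all rewards in a group are equal.
    return [(r - stats[g][0]) / (stats[g][1] + eps)
            for r, g in zip(rewards, group_ids)]
```

With rewards `[0.1, 0.3, 10.0, 12.0]` split into a hard and an easy group, both groups yield advantages of roughly -1 and +1, even though their raw reward scales differ by two orders of magnitude.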

2602.11162 2026-05-11 cs.CL

Retrieval Heads are Dynamic

Yuping Lin, Zitao Li, Yue Xing, Pengfei He, Yingqian Cui, Yaliang Li, Bolin Ding, Jingren Zhou, Jiliang Tang

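AI summary Recent studies have identified "retrieval heads" in large language models that are responsible for extracting information from the input context. Prior work, however, largely relies on static statistics aggregated across datasets, identifying only heads that perform retrieval on average and overlooking the temporal dynamics of autoregressive generation. Taking a dynamic perspective, this paper shows that retrieval heads vary across timesteps, cannot be effectively replaced by static retrieval heads, and follow patterns that the model's hidden state predicts in advance. The findings are validated on the Needle-in-a-Haystack task and a multi-hop QA task.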

Comments Accepted at ACL 2026

Abstract

Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claims: (1) Dynamism: Retrieval heads vary dynamically across timesteps; (2) Irreplaceability: Dynamic retrieval heads are specific at each timestep and cannot be effectively replaced by static retrieval heads; and (3) Correlation: The model's hidden state encodes a predictive signal for future retrieval head patterns, indicating an internal planning mechanism. We validate these findings on the Needle-in-a-Haystack task and a multi-hop QA task, and quantify the differences in the utility of dynamic and static retrieval heads in a Dynamic Retrieval-Augmented Generation framework. Our study provides new insights into the internal mechanisms of LLMs.

2602.10512 2026-05-11 cs.LG cs.LO stat.ML

Exponential Sample Complexity Separation between Flat and Hierarchical Agentic Theorem Provers

Sho Sonoda, Shunta Akiyama, Yuya Uezato

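AI summary This paper studies an exponential gap in sample complexity between flat and hierarchical agentic theorem provers. The authors model proof search as a deterministic finite-horizon Markov decision process and analyze offline imitation learning from verified proof traces produced by a teacher prover. When flattening duplicates the same hard local argument exponentially many times, the sufficient-sample certificate given by the bounds can be exponentially smaller for the hierarchical learner, which predicts a reusable proof DAG and solves each shared block once. This gives a concrete statistical mechanism by which reusable proof structure helps verifier-based theorem proving.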

Abstract

Agentic theorem provers often introduce intermediate lemmas, proof sketches, or subgoal decompositions before returning to tactic-level search. This can look like an expensive detour: if proving lemmas is itself hard, why should a learned prover spend effort there? We give a statistical learning answer. Instead of worst-case proof complexity over all formulas, we study the biased data distribution produced by a teacher prover: initial theorem states together with successful verified proof traces. We model proof search as a deterministic finite-horizon MDP and analyze offline imitation learning from those traces. The success bounds depend on the average length of teacher proofs, how predictable the teacher's next action is, and how accurately the student learns that local prediction problem. A flat student learns from fully inlined traces, so repeated subproofs appear many times in its training and test-time certificate. A hierarchical student instead predicts a reusable proof DAG and solves each shared block once. When flattening duplicates the same hard local argument exponentially many times, the sufficient-sample certificate produced by our bounds can be exponentially smaller for the hierarchical learner. This gives a concrete statistical mechanism by which reusable proof structure helps verifier-based theorem proving.
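A toy count shows how flattening can duplicate work exponentially. Assume, purely for illustration, a chain of lemmas in which the lemma at each level uses the previous lemma twice and adds one local step. A fully inlined trace copies the lower lemma at every use, while a proof DAG stores each lemma once. This counts proof steps only; it is not the paper's sample-complexity bound, and the construction is hypothetical.

```python
def flat_size(depth):
    """Steps in the fully inlined proof of the lemma at the given depth."""
    if depth == 0:
        return 1
    # Two inlined copies of the lower lemma plus one local step.
    return 2 * flat_size(depth - 1) + 1


def dag_size(depth):
    """Distinct blocks in the shared proof DAG: one per lemma level."""
    return depth + 1
```

Here `flat_size(d)` equals 2^(d+1) - 1 while `dag_size(d)` grows linearly, so a learner that must reproduce every inlined copy faces exponentially more local predictions than one that solves each shared block once.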