arXivDaily: arXiv daily research digest, updated Monday to Friday
2605.06610 2026-05-11 cs.LG cs.CV

SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

Jakub Stępień, Marcin Mazur, Jacek Tabor, Przemysław Spurek

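AI summary Sparse autoencoders (SAEs) play an important role in mechanistic interpretability research, particularly for analyzing the internal representations of large language models and vision transformers. However, conventional Top-K SAEs use a fixed sparsity level and cannot adapt to varying input complexity, which can introduce noise for simple inputs or miss key information for complex ones. This paper proposes SoftSAE, which selects sparsity dynamically through a differentiable Soft Top-K operation so that the model adjusts the number of active features to the complexity of each input, more accurately reflecting the structure and information content of the data. Experiments show that SoftSAE extracts meaningful features and also selects an appropriate number of features for each concept.

Abstract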

Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features (K) across all inputs, ignoring the varying complexity of real-world data. Natural data often lies on manifolds with varying local intrinsic dimensionality, meaning the number of relevant factors can change significantly across samples. This suggests that a fixed sparsity level is not optimal. Simple inputs may require only a few features, while more complex ones need more expressive representations. Using a constant K can therefore introduce noise in simple cases or miss important structure in more complex ones. To address this issue, we propose SoftSAE, a sparse autoencoder with a Dynamic Top-K selection mechanism. Our method uses a differentiable Soft Top-K operator to learn an input-dependent sparsity level k. This allows the model to adjust the number of active features based on the complexity of each input. As a result, the representation better matches the structure of the data, and the explanation length reflects the amount of information in the input. Experimental results confirm that SoftSAE not only finds meaningful features, but also selects the right number of features for each concept. The source code is available at: https://github.com/St0pien/SoftSAE.
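
The contrast between a fixed K and an input-dependent k can be illustrated with a small sketch. This is not the SoftSAE operator: the paper learns k through a differentiable Soft Top-K, whereas the hypothetical `energy_k` rule below picks k with a hand-written threshold on activation mass, purely to show how a per-input k behaves differently from a constant one. The function names and the 0.8 threshold are assumptions.

```python
def top_k_mask(acts, k):
    """Hard TopK: keep the k largest activations and zero out the rest."""
    if k <= 0:
        return [0.0] * len(acts)
    order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]


def energy_k(acts, frac=0.8):
    """Hypothetical rule (not from the paper): the smallest k whose largest
    activations hold at least `frac` of the total positive activation mass."""
    positive = sorted((a for a in acts if a > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        return 0
    running = 0.0
    for k, a in enumerate(positive, start=1):
        running += a
        if running >= frac * total:
            return k
    return len(positive)


# Illustrative pre-activations: one dominated by a single feature, one spread out.
simple_input = [5.0, 0.1, 0.1, 0.05, 0.0, 0.0]
complex_input = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

for acts in (simple_input, complex_input):
    k = energy_k(acts)
    print(k, top_k_mask(acts, k))
```

With a fixed K, both inputs would receive the same number of active features; with the per-input rule the concentrated input keeps far fewer features than the spread-out one.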

2605.06474 2026-05-11 cs.LG cs.AI stat.ML

Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching

Xiang Li, Nan Jiang

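AI summary This paper proposes Q-MMR, a new theoretical framework for off-policy evaluation in finite-horizon Markov decision processes. Through recursive reweighting and moment matching, the method learns a set of scalar weights that approximate the expected return under the target policy, and it establishes data-dependent finite-sample guarantees that do not depend on the complexity of the function class. The work also sheds light on the fundamental role of coverage in offline reinforcement learning and draws connections to existing methods such as importance sampling and linear FQE.

Abstract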

We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-down manner via a moment matching objective against a value-function discriminator class. Notably, and perhaps surprisingly, a data-dependent finite-sample guarantee for general function approximation can be established under only the realizability of $Q^\pi$, with a dimension-free bound; that is, the error does not depend on the statistical complexity of the function class. We also establish connections to several existing methods, such as importance sampling and linear FQE. Further theoretical analyses shed new light on the nature of coverage, a concept of fundamental importance to offline RL.

2605.06353 2026-05-11 cs.CL

SEQUOR: A Multi-Turn Benchmark for Realistic Constraint Following

Beatriz Canaverde, Duarte M. Alves, José Pombal, Giuseppe Attanasio, André F. T. Martins

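AI summary SEQUOR is an automatic benchmark for evaluating how well models follow constraints in long multi-turn conversations. By simulating persona-driven interactions grounded in real conversations, the study shows that current models perform poorly when user instructions are persistent, complex, or changing. Experiments show that instruction-following accuracy drops markedly as the number of turns grows or as constraints change, underscoring the difficulty of long-horizon multi-turn instruction following.

Abstract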

In a conversation, a helpful assistant must reliably follow user directives, even as they refine, modify, or contradict earlier requests. Yet most instruction-following benchmarks focus on single-turn or short multi-turn scenarios, leaving open how well models handle long-horizon instruction-following tasks. To bridge this gap, we present SEQUOR, an automatic benchmark for evaluating constraint adherence in long multi-turn conversations. SEQUOR consists of simulated persona-driven interactions built with constraints extracted from real-world conversations. Our results show that even when following a single constraint, instruction-following accuracy consistently decreases as the conversation grows longer, with drops exceeding 11%. This decline becomes larger when models have to follow multiple constraints simultaneously, reducing their accuracy by over 40%. In scenarios where constraints are added or replaced at arbitrary points of the conversation, model accuracy decreases by more than 9%. Taken together, our results reveal that current models still struggle to follow user instructions in multi-turn conversations, and provide a way for better measuring instruction-following capabilities in assistants.

2605.06173 2026-05-11 cs.CV cs.AI

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier

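AI summary This study proposes Retina-RAG, a low-cost modular framework that jointly performs diabetic retinopathy severity grading, macular edema detection, and clinical report generation. It combines a high-performance retinal classifier with a parameter-efficient vision-language model and injects ophthalmic knowledge through a retrieval-augmented generation module to improve diagnostic consistency and reduce hallucinations. Experiments show that Retina-RAG substantially outperforms existing methods on several metrics and runs on an ordinary consumer-grade GPU, demonstrating that clinically structured retinal AI is feasible with limited compute.

Comments 10 pages, 5 figures. Submitted to MICCAI 2026

Abstract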

Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading, macular edema (ME) detection, and report generation. The architecture decouples a high-performance retinal classifier and a parameter-efficient vision-language model (Qwen2.5-VL-7B-Instruct) adapted via Low-Rank Adaptation (LoRA), enabling flexible component integration. A retrieval-augmented generation (RAG) module injects curated ophthalmic knowledge together with structured classifier outputs at inference time to improve diagnostic consistency and reduce hallucinations. Retina-RAG achieves an F1-score of 0.731 for DR grading and 0.948 for ME detection, substantially outperforming zero-shot Qwen (0.096, 0.732) and MMed-RAG (0.541, 0.641) on a retinal disease detection dataset with captions. For report generation, Retina-RAG attains ROUGE-L 0.438 and SBERT similarity 0.884, exceeding all baselines. The full framework operates on a single consumer-grade GPU, demonstrating that clinically structured retinal AI can be achieved with modest computational resources.

2605.06110 2026-05-11 cs.AI cs.CL

On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows

Xinglin Wang, Zishen Liu, Shaoxiong Feng, Peiwen Yuan, Yiwei Li, Jiayi Shi, Yueqi Zhang, Chuyi Tan, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li

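AI summary This study addresses online resource allocation for agentic workflows under budget and deadline constraints. For scheduling the subtasks of a decomposed complex task, it proposes a dependency-aware dynamic allocation approach that aims to maximize the probability of completing the whole workflow within the given budget and time limit. To this end, the authors propose Monte Carlo Portfolio Planning (MCPP), which simulates workflow executions and replans based on observed outcomes, improving task completion rates across a range of constraint settings.

Comments Preprint

Abstract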

Agentic systems increasingly solve complex user requests by executing orchestrated workflows, where subtasks are assigned to specialized models or tools and coordinated according to their dependencies. While recent work improves agent efficiency by optimizing the performance-cost-latency frontier, real deployments often impose concrete requirements: a workflow must be completed within a specified budget and before a specified deadline. This shifts the goal from average efficiency optimization to maximizing the probability that the entire workflow completes successfully under explicit budget and deadline constraints. We study constraint-driven online resource allocation for agentic workflows. Given a dependency-structured workflow and estimates of success rates and generation lengths for each subtask-model pair, the executor dynamically allocates models and parallel samples across simultaneously executable subtasks while managing the remaining budget and time. We formulate this setting as a finite-horizon stochastic online allocation problem and propose Monte Carlo Portfolio Planning (MCPP), a lightweight closed-loop planner that directly estimates constrained completion probability through simulated workflow executions and replans after observed outcomes. Experiments on CodeFlow and ProofFlow demonstrate that MCPP consistently improves constrained completion probability over strong baselines across a wide range of budget-deadline constraints.
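
The idea of estimating constrained completion probability through simulated executions can be sketched as follows. This is a minimal illustration under strong assumptions, not MCPP itself: the workflow is a plain sequence (no dependency graph, no parallel samples, no replanning), each subtask is retried until it succeeds, and the tuple layout `(success_probability, cost, latency)` and all function names are hypothetical.

```python
import random


def simulate_once(subtasks, budget, deadline, rng):
    """Run subtasks in order, retrying each until it succeeds or resources run out.

    Assumes every attempt has a positive cost or latency, so the loop terminates.
    """
    spent, elapsed = 0.0, 0.0
    for p_success, cost, latency in subtasks:
        while True:
            spent += cost
            elapsed += latency
            if spent > budget or elapsed > deadline:
                return False  # a constraint was violated before completion
            if rng.random() < p_success:
                break  # this subtask succeeded; move on to the next one
    return True


def completion_probability(subtasks, budget, deadline, n_sims=2000, seed=0):
    """Monte Carlo estimate of P(workflow completes within budget and deadline)."""
    rng = random.Random(seed)
    wins = sum(simulate_once(subtasks, budget, deadline, rng) for _ in range(n_sims))
    return wins / n_sims


# Hypothetical workflow: (success probability, cost per attempt, latency per attempt).
workflow = [(0.8, 1.0, 1.0), (0.6, 2.0, 1.5)]
estimate = completion_probability(workflow, budget=8.0, deadline=6.0)
print(round(estimate, 3))
```

A planner in the spirit of the abstract would compare such estimates across candidate allocations and re-estimate after each observed outcome; that closed loop is omitted here.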

2605.05957 2026-05-11 cs.LG

Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs

Zixuan Chen, Hao Lin, Zizhe Chen, Yizhou Tian, Garry Yang, Depeng Wang, Ya Guo, Huijia Zhu, James Cheng

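AI summary The study shows that large language models reliably correct false claims presented in isolation but often comply rather than correct when the claims appear in task-oriented requests, a phenomenon termed "correction suppression." Using a benchmark of 300 false premises, it finds suppression rates of up to 90% across several models in task settings. Analysis indicates that the models do not lack the knowledge; rather, task context steers attention toward a compliant output, which suppresses correction. The study proposes two training-free interventions that improve the models' factual strictness.

Abstract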

LLMs reliably correct false claims when presented in isolation, yet when the same claims are embedded in task-oriented requests, they often comply rather than correct. We term this failure mode correction suppression and construct a benchmark of 300 false premises to systematically evaluate it across eight models. Suppression rates range from 19% to 90%, with four models exceeding 80%, establishing correction suppression as a prevalent and severe phenomenon. Mechanistic analysis reveals that suppression is not a knowledge failure: the model registers the error internally but task context diverts early-layer attention from the false claim as output intent crystallizes toward compliance at middle layers. We characterize this as knowing but not correcting: suppression occurs at response selection rather than knowledge encoding. Guided by this mechanism, we propose two training-free interventions. Correction Direction Steering (CDS) estimates a correction-compliance direction from matched pairs and injects it at middle layers before output intent crystallizes. Dynamic Payload Amplification (DPA) localizes payload tokens via attention divergence between early and late layers and amplifies their representation at the final layer, requiring no calibration data. Experiments on Qwen3.5-9B and LLaMA3.1-8B show both methods substantially improve factual strictness. CDS achieves the highest correction rate on Qwen3.5-9B (0% → 58.2%). DPA is the only method that preserves or improves reasoning capability on both models. These findings introduce factual strictness, the willingness to uphold accuracy against contextual pressures, as a new dimension of model reliability.
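
Estimating a direction from matched pairs and injecting it into a hidden state can be sketched as below. The abstract does not say how CDS computes its direction; the normalized difference of means used here is a common choice for activation steering and is an assumption, as are the function names, the toy two-dimensional states, and the strength `alpha`.

```python
def mean_difference_direction(correct_states, comply_states):
    """Unit vector pointing from 'comply' activations toward 'correct' ones.

    Each argument is a list of hidden-state vectors from matched prompt pairs.
    Difference of means is an assumed estimator, not taken from the paper.
    """
    n = len(correct_states)
    dim = len(correct_states[0])
    diff = [
        sum(c[j] - k[j] for c, k in zip(correct_states, comply_states)) / n
        for j in range(dim)
    ]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff] if norm > 0 else diff


def steer(hidden, direction, alpha):
    """Add the scaled direction to one hidden-state vector."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy matched pairs (hypothetical values).
correct = [[1.0, 0.0], [3.0, 0.0]]
comply = [[0.0, 0.0], [0.0, 0.0]]
direction = mean_difference_direction(correct, comply)
print(steer([0.5, 0.5], direction, alpha=2.0))
```

In a real model the vectors would be residual-stream activations at a chosen middle layer, and the steering strength would need tuning.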

2605.05848 2026-05-11 cs.CV cs.AI

VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding

Kuanwei Lin, Wenhao Zhang, Ge Li

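AI summary As long-video content grows, video large models face memory and latency challenges at inference time. This paper proposes VideoRouter, a query-adaptive dual-router framework built on InternVL, in which a Semantic Router predicts the temporal coverage policy and an Image Router predicts frame relevance, enabling efficient compression of unimportant frames while preserving detail in key frames. The method performs well on several benchmarks, significantly improving model performance at comparable or lower compute budgets.

Abstract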

Video large multimodal models increasingly face a scalability bottleneck: long videos produce excessively long visual-token sequences, which sharply increase memory and latency during inference. While existing compression methods are effective in specific settings, most are either weakly query-aware or apply a fixed compression policy across frames, proving suboptimal when visual evidence is unevenly distributed over time. To address this, we present VideoRouter, a query-adaptive dual-router framework built on InternVL for budgeted evidence allocation. The Semantic Router predicts the dominant allocation policy, choosing between broad temporal coverage and adaptive high-resolution preservation, while the Image Router uses early LLM layers to score frame relevance. This enables aggressive compression on less relevant frames while preserving detail on critical evidence frames. To train both routers, we build Video-QTR-10K for allocation-policy supervision and Video-FLR-200K for frame-relevance supervision. Experiments on VideoMME, MLVU, and LongVideoBench show that VideoRouter consistently improves over the InternVL baseline under comparable or lower budgets, achieving up to a 67.9% token reduction.

2605.05497 2026-05-11 cs.LG

Online Localized Conformal Prediction

Yuheng Lai, Garvesh Raskutti

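AI summary This paper proposes Online Localized Conformal Prediction (OLCP), a new method addressing the failure of conventional conformal prediction to quantify uncertainty effectively in online learning and time-series settings, where the data are not exchangeable. The method combines online adaptation with covariate-dependent localization to better handle data heterogeneity, and the authors further develop OLCP-Hedge, which selects the bandwidth within an online convex optimization framework to improve robustness. Experiments show that the new methods attain long-run coverage with narrower prediction intervals than existing methods.

Abstract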

Conformal prediction is a framework that provides valid uncertainty quantification for general models with exchangeable data. However, in the online learning and time-series settings, exchangeability is not satisfied. Existing online conformal methods, such as adaptive conformal inference (ACI), can achieve long-run validity, yet they remain inefficient under covariate heterogeneity because they rely on global calibration. We propose Online Localized Conformal Prediction (OLCP), which combines online adaptation with covariate-dependent localization to better reflect heterogeneity. To reduce sensitivity to the localization bandwidth, we further develop OLCP-Hedge, which performs bandwidth selection as an online expert aggregation problem using a constrained online convex optimization framework. Importantly, we provide coverage guarantees for both algorithms and demonstrate through simulations and real-data experiments that the proposed methods attain valid long-run coverage with narrower prediction sets than existing baselines.
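
For orientation, the global baseline named in the abstract, adaptive conformal inference, can be sketched in a few lines. This shows only the standard ACI update, alpha_{t+1} = alpha_t + gamma * (alpha - err_t), with a single global quantile of past residuals; it does not implement OLCP's covariate-dependent localization or OLCP-Hedge. The synthetic residuals, step size, and warm-up length are assumptions.

```python
import math
import random


def empirical_quantile(scores, level):
    """Quantile of past scores; infinite radius if level >= 1, empty set if level <= 0."""
    if level >= 1.0:
        return float("inf")
    if level <= 0.0:
        return float("-inf")
    ordered = sorted(scores)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


def aci_step(alpha_t, target_alpha, miss, gamma=0.05):
    """One ACI update: a miss lowers alpha_t (wider sets), a cover raises it."""
    return alpha_t + gamma * (target_alpha - (1.0 if miss else 0.0))


def run_aci(residuals, target_alpha=0.1, gamma=0.05, warmup=20):
    """Return the long-run miss rate when intervals are |error| <= radius."""
    alpha_t = target_alpha
    misses = []
    for t in range(warmup, len(residuals)):
        radius = empirical_quantile(residuals[:t], 1.0 - alpha_t)
        miss = residuals[t] > radius
        misses.append(miss)
        alpha_t = aci_step(alpha_t, target_alpha, miss, gamma)
    return sum(misses) / len(misses)


# Synthetic absolute residuals of some point predictor (illustrative only).
rng = random.Random(0)
residuals = [abs(rng.gauss(0.0, 1.0)) for _ in range(500)]
print(round(run_aci(residuals), 3))
```

Because the radius comes from one pooled quantile, every covariate value gets the same interval width, which is the inefficiency under heterogeneity that the abstract attributes to global calibration.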

2605.05110 2026-05-11 cs.RO cs.AI

LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts

Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha, Gabriel Nelson

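AI summary This paper proposes LineRides, a line-guided reinforcement learning framework for training a bicycle robot to perform difficult stunts. The method needs no demonstrations or explicit timing information; it learns diverse, controllable stunt behaviors from only a user-provided spatial guideline and key orientations. LineRides introduces a tracking margin to handle infeasible guidelines and resolves temporal ambiguity through distance along the guideline and sequences of key orientations. Experiments show smooth switching between, and execution of, multiple stunts on the Ultra Mobility Vehicle platform.

Comments Published in IEEE Robotics and Automation Letters (RA-L), 2026

Abstract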

Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that the policy trained with our methods supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.

2605.04712 2026-05-11 cs.LG

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li

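AI summary In deep reinforcement learning, agents can suffer from "plasticity loss" during continual learning: their ability to learn new skills declines as training proceeds. Targeting the plasticity degradation seen in Mixture-of-Experts (MoE) policies under continual learning, this paper offers a solution grounded in Neural Tangent Kernel theory, formalizing plasticity loss as a loss of spectral plasticity and deriving a tractable proxy metric. Building on this, the authors propose SPHERE, which adds a Parseval regularization penalty to alleviate the loss of spectral plasticity in MoE policies. Experiments show substantial gains in continual learning performance on several benchmark tasks.

Comments Accepted to ICML 2026

Abstract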

In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize the plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity, one expressible in terms of individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that alleviates the loss of spectral plasticity. On MetaWorld and HumanoidBench, SPHERE improves average success under continual RL by 133% and 50% over an unregularized MoE baseline, while maintaining higher spectral plasticity throughout training.
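
A generic Parseval-style penalty, which pushes the rows of a matrix toward orthonormality, can be written as ||W W^T - I||_F^2. The sketch below computes that quantity and sums it over a list of per-expert matrices. How SPHERE defines its penalty on expert feature matrices is not given in the abstract, so the exact form, the summation over experts, and the function names are assumptions.

```python
def parseval_penalty(matrix):
    """Squared Frobenius norm of (W W^T - I) for a matrix given as a list of rows."""
    n = len(matrix)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gram_ij = sum(a * b for a, b in zip(matrix[i], matrix[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total


def moe_parseval_penalty(expert_matrices):
    """Hypothetical aggregation: sum the penalty over all experts."""
    return sum(parseval_penalty(m) for m in expert_matrices)


orthonormal = [[1.0, 0.0], [0.0, 1.0]]   # rows already orthonormal
collapsed = [[1.0, 1.0], [0.0, 0.0]]     # one dead row, one over-scaled row
print(parseval_penalty(orthonormal), parseval_penalty(collapsed))
```

The penalty is zero exactly when the rows are orthonormal, so adding it to a training loss discourages the matrix from collapsing onto a few directions.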

2605.04651 2026-05-11 cs.LG cs.CL

FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation

Guangsheng Bao, Hongbo Zhang, Han Cui, Ke Sun, Yanbin Zhao, Juncai He, Yue Zhang

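AI summary This paper proposes FAAST, a forward-only associative learning method for supervised adaptation at test time. FAAST analytically compiles labeled examples into fast weights in a single pass, with no backpropagation and no dependence on memory or context, achieving constant-time inference and decoupling task adaptation from the pretrained representation. Experiments show that FAAST performs strongly on image classification and language modeling while greatly reducing adaptation time and memory use relative to conventional methods, making it an efficient and scalable solution.

Comments 9 pages, 6 figures, 10 tables

Abstract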

Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification and language modeling benchmarks, FAAST matches or exceeds backprop-based adaptation while reducing adaptation time by over 90%, and is competitive with memory/context-based adaptation while reducing memory usage by up to 95%. These results demonstrate FAAST as a highly efficient, scalable solution for supervised task adaptation, particularly for resource-constrained models. We release the code and models at https://github.com/baoguangsheng/faast.
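
The general idea of compiling labeled examples into a weight matrix in one forward pass, with no gradient steps, can be illustrated with a classical linear associator. This is a stand-in for intuition only and is not FAAST's closed-form solution, which the abstract does not spell out; the feature vectors, class labels, and function names are hypothetical.

```python
def compile_fast_weights(features, labels, num_classes):
    """Single pass over the examples: add each feature vector to its class row.

    Classical outer-product (Hebbian) associator, used here as an assumption.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for x, y in zip(features, labels):
        for j in range(dim):
            weights[y][j] += x[j]
    return weights


def predict(weights, x):
    """One matrix-vector product; cost does not grow with the number of examples."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Features would come from a frozen pretrained encoder; these are toy values.
support_x = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
support_y = [0, 1, 0]
fast_weights = compile_fast_weights(support_x, support_y, num_classes=2)
print(predict(fast_weights, [1.0, 0.0]), predict(fast_weights, [0.2, 1.0]))
```

Unlike in-context or memory-based adaptation, the support examples are not consulted at inference time; only the compiled matrix is, which is the sense in which inference stays constant-time.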

2605.04460 2026-05-11 cs.LG

Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention

Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang

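AI summary This study aims to discover sparse, feasible counterfactual intervention strategies from survey-based community intervention data in order to shift a target group toward a reference group. It proposes a method based on a fixed-basis nonnegative latent representation that achieves distributional alignment through interpretable latent-factor adjustments, and it combines Shapley-guided attribution with entropy-regularized optimal transport to learn group-level interventions that are sparse and easy to implement. Experiments on real transportation survey data show that the method improves population-level conversion while keeping the interventions compact and actionable.

Abstract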

Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional alignment problem using a fixed-basis nonnegative latent representation that preserves pre/post comparability and provides a stable map from latent factors to original variables. To make latent movement actionable, target-relevant latent factors are identified through Shapley-guided attribution and transferred to controllable variables as intervention priorities. Feasible group-level adjustments are then learned by minimizing an entropy-regularized optimal-transport discrepancy between the post-intervention target distribution and the reference distribution, together with a weighted $\ell_{2,1}$ penalty that promotes shared policy-lever sparsity. Experiments on real-world transportation survey datasets show that the proposed framework produces compact and interpretable policy-feasible interventions with explicit adjustment magnitudes, improves population-level conversion, and preserves intervention sparsity. Code and datasets are publicly available at: https://github.com/pangjunbiao/latent-group-alignment.git
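
The weighted $\ell_{2,1}$ penalty mentioned above sums, over variables, a weight times the Euclidean norm of that variable's adjustments across groups. A minimal sketch follows; the orientation (rows as respondent groups, columns as policy levers), the weights, and the example numbers are assumptions made for illustration.

```python
import math


def weighted_l21(adjustments, weights):
    """Sum over levers j of weights[j] * L2 norm of column j.

    adjustments[g][j] is the change applied to lever j for group g (assumed layout).
    """
    n_levers = len(adjustments[0])
    return sum(
        weights[j] * math.sqrt(sum(row[j] ** 2 for row in adjustments))
        for j in range(n_levers)
    )


# Two groups, two levers. Both plans move each group by a similar amount.
shared_lever = [[3.0, 0.0], [4.0, 0.0]]   # both groups adjust the same lever
split_levers = [[3.0, 0.0], [0.0, 4.0]]   # each group adjusts a different lever
weights = [1.0, 1.0]
print(weighted_l21(shared_lever, weights), weighted_l21(split_levers, weights))
```

The plan that reuses one lever across groups incurs the smaller penalty, which is why this norm encourages a small set of shared levers rather than many group-specific ones.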

2605.04323 2026-05-11 cs.LG cs.DB

LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

Kuangdai Leng, Simon Jeffery, Panos Panagos, Tarje Nissen-Meyer

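AI summary LUCAS-MEGA is a large-scale multimodal dataset intended to advance representation learning for soil-environment systems. Built through systematic fusion of European soil-environment observations, it contains over 70,000 samples and more than 1,000 features spanning physical, chemical, biological, and other attributes. The study presents SoilFuser, a multi-agent data fusion framework that standardizes heterogeneous data into a unified machine-learning feature space, and pretrains a multimodal tabular model, SoilFormer, on the dataset, demonstrating its usefulness for uncertainty-aware prediction and soil process modeling.

Comments 27 pages, 7 figures, 1 table

Abstract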

Understanding soil is fundamental to agriculture, carbon cycling, and environmental sustainability, yet progress is limited by fragmented and heterogeneous datasets that constrain modeling to small-scale predictive settings rather than high-dimensional representation learning. We introduce LUCAS-MEGA, a large-scale multimodal dataset constructed through systematic data fusion of European soil-environment observations, with the LUCAS survey as its backbone. The fused dataset comprises over 70,000 samples and more than 1,000 features spanning physical, chemical, environmental, biological, and visual attributes, aggregated from 68 source datasets. To enable integration at scale, we develop SoilFuser, a multi-agent, human-in-the-loop data fusion pipeline that standardizes heterogeneous data formats and measurement protocols, resolves inconsistencies and invalid entries (e.g., unit inconsistencies, codebook mismatches, and erroneous values), incorporates natural language annotations, and harmonizes multimodal attributes and metadata into a unified, machine learning-ready feature space. The resulting dataset captures key characteristics of real-world soil observations, including multimodality, uneven feature coverage, and heterogeneous uncertainty. To demonstrate the usability of LUCAS-MEGA for data-driven modeling, we pretrain a multimodal tabular transformer (SoilFormer) using a self-supervised objective based on feature masking, achieving stable training, strong predictive performance, and representations that support uncertainty-aware prediction. We further show that the learned representations recover relationships consistent with established soil processes. LUCAS-MEGA is released with open access and is accompanied by composable, agent-friendly APIs that support structured querying and data-driven workflows.

2605.04081 2026-05-11 cs.LG cs.AI

Time series causal discovery with variable lags

Bruno Petrungaro, Anthony C. Constantinou

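AI summary This study aims to discover causal relationships among variables from time-series data while allowing different time lags between different variables. It proposes a Tabu-search-based structure learning algorithm that assigns an edge-specific lag to each edge within a specified maximum lag, modeling temporal dependencies more flexibly. The method combines a decomposable BIC-based score with node-specific effective sample sizes, adds a lag-length penalty to encourage parsimonious lag assignments, and comes with theoretical guarantees of validity and local optimality. Experiments show that it accurately recovers causal structure and lags on both simulated data and real pandemic policy data.

Abstract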

Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs require a cause-and-effect map of the variables under consideration, known as the network's structure. Learning the graphical structure of a causal model from data remains challenging; learning it from time-series data is even harder because dependencies may arise at different time lags. Existing time-series causal discovery methods often assume a fixed lag window and do not explicitly optimise edge-specific lags. We propose a Tabu-based structure learning algorithm that searches for a time-ordered directed structure (i.e., where every edge respects time) while allowing edge-specific lags up to a specified maximum lag. The approach uses a decomposable BIC-based score with node-specific effective sample sizes and an explicit lag-length penalty encouraging parsimonious delay assignments while preserving efficient local score updates. We provide theoretical guarantees of validity and local optimality, and we also describe a parallel implementation for improved scalability. In simulations, the method recovered graph structure competitively and estimated lags accurately when true adjacencies were recovered. On a real-world UK COVID-19 policy dataset, the learnt structure was dominated by short delays while retaining a substantial minority of longer-lag dependencies, consistent with delayed behavioural and epidemiological effects.

2605.04035 2026-05-11 cs.CV cs.LG

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

Evangelos Ntavelis, Sean Wu, Mohamad Shahbazi, Fabio Maninchedda, Dmitry Kostiaev, Artem Sevastopolsky, Vittorio Megaro, Trevor Phillips, Alejandro Blumentals, Shridhar Ravikumar, Mehak Gupta, Reinhard Knothe, Jeronimo Bayer, Matthias Vestner, Simon Schaefer, Thomas Etterlin, Christian Zimmermann, Mathias Deschler, Peter Kaufmann, Stefan Brugger, Sebastian Martin, Brian Amberg, Tom Runia

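AI summary This paper proposes HeadsUp, an efficient feed-forward method for reconstructing high-quality 3D Gaussian head models from large-scale multi-view camera captures. It uses an encoder-decoder architecture that compresses the input views into a compact latent representation and decodes it into UV-parameterized 3D Gaussians anchored to a neutral head template, decoupling the number of 3D Gaussians from the number and resolution of input images. Experiments on an internal dataset with more than 10,000 subjects show state-of-the-art reconstruction quality, generalization, and computational efficiency, and demonstrate applications in generating novel identities and animating expressions.

Comments Project page: https://apple.github.io/ml-headsup/

Abstract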

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.

2605.03327 2026-05-11 cs.LG cs.AI

DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment

Hongbo Jin, Rongpeng Zhu, Zhongjing Du, Xu Jiang, Jingqi Tian, Qiaoman Zhang, Jiayu Ding

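AI summary This study proposes DGPO, a distribution-guided policy optimization method that addresses fine-grained credit assignment for large language models on complex reasoning tasks. DGPO treats distribution deviation as a guiding signal rather than a strict penalty and, with an entropy gating mechanism, distinguishes genuine reasoning breakthroughs from hallucinatory noise and precisely rewards key reasoning steps. The method needs no additional value network, markedly improves exploration of reasoning trajectories, and outperforms existing methods on several benchmarks.

Abstract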

Reinforcement learning is crucial for aligning large language models to perform complex reasoning tasks. However, current algorithms such as Group Relative Policy Optimization suffer from coarse-grained, sequence-level credit assignment, which struggles to isolate pivotal reasoning steps within long Chain-of-Thought generations. Furthermore, the standard unbounded Kullback-Leibler divergence penalty induces severe gradient instability and mode-seeking conservatism, ultimately stifling the discovery of novel reasoning trajectories. To overcome these limitations, we introduce Distribution Guided Policy Optimization, a novel critic-free reinforcement learning framework that reinterprets distribution deviation as a guiding signal rather than a rigid penalty. DGPO replaces the volatile KL divergence with the bounded Hellinger distance to safely quantify token-level exploration without the risk of gradient explosion. To effectively distinguish genuine reasoning breakthroughs from hallucinatory noise, we propose an entropy gating mechanism that scales this deviation by the policy's epistemic uncertainty. By dynamically redistributing the coarse sequence-level advantage to individual tokens based on these gated scores, DGPO heavily incentivizes critical exploratory steps while suppressing unwarranted, low-entropy deviations. Consequently, DGPO completely eliminates the traditional token-level KL penalty and achieves fine-grained credit reallocation without the computational overhead of an additional value network. Extensive empirical evaluations demonstrate that DGPO sets a new state of the art for critic-free alignment. Notably, on the Qwen2.5-32B architecture, DGPO achieves 60.0% and 46.0% Avg@32 accuracy on the challenging AIME2024 and AIME2025 benchmarks, respectively, substantially outperforming competitive baselines like DAPO.
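
The ingredients named above (a bounded Hellinger distance, an entropy gate, and redistribution of a sequence-level advantage to tokens) can be sketched as follows. The Hellinger and entropy formulas are standard; the way they are combined (a product) and the normalization that keeps the mean token advantage equal to the sequence advantage are assumptions, since the abstract does not give DGPO's exact equations.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; always in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(a * math.log(a) for a in p if a > 0)


def redistribute(advantage, policy_dists, ref_dists):
    """Hypothetical scheme: weight each token by Hellinger deviation times entropy."""
    gated = [hellinger(p, q) * entropy(p) for p, q in zip(policy_dists, ref_dists)]
    total = sum(gated)
    n = len(gated)
    if total == 0:
        return [advantage] * n  # no deviation anywhere: fall back to uniform credit
    return [advantage * n * g / total for g in gated]


# Per-token next-token distributions for a two-token sequence (toy values).
policy = [[0.5, 0.5], [0.9, 0.1]]
reference = [[0.5, 0.5], [0.5, 0.5]]
print(redistribute(1.0, policy, reference))
```

Unlike KL divergence, the Hellinger distance cannot exceed 1 even when the two distributions have disjoint support, which is the boundedness the abstract relies on.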

2605.02357 2026-05-11 cs.CV

Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis

Jiaqi Shi, Jin Xiao, Xiaoguang Hu, Wenxuan Ji, Zichong Jia, Zifan Long, Tianyou Chen, Baochang Zhang

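AI summary In 3D point cloud understanding, accurately capturing discriminative features within complex neighborhoods is the core challenge, and it directly affects the execution precision of downstream tasks such as embodied AI and autonomous driving. To address the severe information loss of existing point-level or channel-level feature evaluation, this paper proposes the PointCRA network, which introduces temporal trend variation as a new evaluation dimension and builds a multi-level calibration framework with a neighborhood-homogeneity constraint to improve channel-level discriminability. Experiments show that PointCRA achieves strong performance on several benchmark datasets and is interpretable, transferable, and efficient.

Abstract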

In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tasks such as embodied AI and autonomous driving. Existing methods explore feature correlation discrimination but are limited to point-level spatial distribution or channel responses, enabling only coarse-grained evaluation. For modern multi-scale point cloud networks, such coarse-grained metrics inevitably incur significant information loss in deeper layers. To address this, we propose PointCRA, a novel network with a channel-level metric-based enhancement mechanism. Our core idea is to introduce temporal trend variation as a new evaluation dimension to avoid the information loss caused by weight dimension collapse in existing spatial and channel attention mechanisms. On this basis, we construct a multi-level calibration framework guided by neighborhood homogeneity for weight calibration, and design a dedicated loss function to enhance channel discriminability. PointCRA leverages intrinsic feature priors to adaptively correct feature aggregation, offering interpretability with low parameter overhead. Our method is transferable, interpretable, and efficient. We validate the proposed method on diverse datasets and benchmark models, and further demonstrate its rationality through extensive analytical experiments. PointCRA achieves 77.5% mIoU on the S3DIS dataset, 90.4% OA on the ScanObjectNN dataset, and 87.4% instance mIoU on the ShapeNetPart dataset. The code and pretrained weights are publicly available on GitHub: https://github.com/AGENT9717/PointCRA

2605.02196 2026-05-11 cs.LG

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

Abdullah Ahmad Khan, Ferdous Sohel

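AI summary Machine unlearning aims to remove specified training data to meet privacy regulations, but most existing work assumes the same precision at unlearning and deployment time, overlooking that large models are often deployed at low precision. This paper introduces the quantization recovery attack (QRA), showing that INT4 quantization restores forgotten content after a model has passed a BF16 compliance check, with recovery of up to 22x under INT4. To address this, the authors propose DURABLEUN-SAF, which improves quantization robustness while preserving forgetting and model utility, and validate it on several datasets.

Abstract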

Machine unlearning aims to remove specified training data to satisfy privacy regulations such as GDPR. However, existing evaluations assume identical precision at unlearning and deployment, overlooking that production LLMs are deployed at low-bit precision. We show that INT4 quantization systematically restores forgotten content even when models pass compliance audits at bfloat16 (BF16); we term this the quantization recovery attack (QRA). We conduct the first systematic study of unlearning robustness under adapter-space INT4 quantization in the NF4+LoRA regime, evaluating seven methods on LLaMA-3-8B-Instruct across TOFU, MUSE-News, and WikiBio-WPU. INT8 is benign; INT4 induces recovery of up to 22x, worsening with dataset difficulty. We identify the FA-RA-Q-INT4 trilemma: no method simultaneously achieves strong forgetting, high utility, and quantization robustness. A dense Pareto sweep reveals a sharp phase transition: once robustness is achieved, retaining accuracy collapses regardless of further tuning. To address this, we propose DURABLEUN-SAF (Sharpness-Aware Forgetting), a quantization-aware objective using Straight-Through Estimator gradients through INT4 rounding. DURABLEUN-SAF is the only method to achieve a stable empirical (0.047, {BF16, INT8, INT4})-durability certificate: Q-INT4 = 0.043 ± 0.002, cert rate = 3/3, versus SalUn's cert rate = 1/3 at its own published hyperparameters. We call for Q-INT4 to be adopted as a standard evaluation metric alongside FA and RA.
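
One plausible intuition for why coarse quantization can undo unlearning is that small weight changes fall inside the same rounding bin and vanish after quantization. The abstract does not state this mechanism, so treat it as an assumption. The sketch uses a uniform symmetric 4-bit quantizer with 16 levels, not the NF4 format studied in the paper, and the weight values are invented for illustration.

```python
def quantize_int4(weights):
    """Uniform symmetric 4-bit quantization: integer codes in [-8, 7] times a scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    dequantized = [c * scale for c in codes]
    return dequantized, codes


# Hypothetical weights before and after an unlearning update with small deltas.
original = [0.70, -0.32, 0.11]
unlearned = [0.70, -0.30, 0.12]

_, codes_before = quantize_int4(original)
_, codes_after = quantize_int4(unlearned)
print(codes_before, codes_after)
```

A quantization-aware objective of the kind the abstract describes would include the rounding in the forward pass during unlearning and pass gradients through it with a straight-through estimator, so that the update is pushed to survive rounding; that training loop is not shown here.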

2605.02073 2026-05-11 cs.CL

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

Arash Ahmadi, Sarah Sharif, Yaser Banad

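AI Summary This study aims to improve the mathematical reasoning of large language models by optimizing the reward function used in reinforcement learning. It proposes a search-based framework in which a frontier language model generates candidate reward functions that are validated and refined over multiple rounds, after which the best-performing combination of rewards is selected. Experiments show that the method substantially raises the model's F1 score on GSM8K, outperforming a conventional baseline.

Details
English Abstract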

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduces a search-driven framework that treats the reward specification itself as an object of optimization. The setting of interest is one in which the base model is held fixed and the reward specification is the primary remaining design lever. Candidate reward functions are generated by a frontier language model, validated automatically, screened through 500-step Group Relative Policy Optimization (GRPO) training runs on a Llama-3.2-3B-Instruct base model with Low-Rank Adaptation (LoRA), and ranked by F1 on the GSM8K test set. Ranked summaries from prior rounds are then fed back into the next round of generation. Over five rounds, the search produces 50 candidate rewards. The mean F1 rises from 0.596 in Round 1 to 0.632 in Round 5, and the top individual reward reaches F1 = 0.787. Seven ensemble configurations of top-ranked rewards are evaluated. The best ensemble achieves F1 = 0.795 (95% bootstrap CI [0.756, 0.832]) and accuracy 0.660 [0.635, 0.686], a 0.19 absolute F1 gain over a base-rewards-only GRPO baseline (F1 = 0.609). Pairwise McNemar tests with Bonferroni correction show all five-or-more-reward configurations are statistically indistinguishable at α = 0.05/21. A three-seed re-training of the best ensemble yields F1 of 0.785. A randomly drawn 5-reward control collapses to F1 = 0.047, which shows that the ranked-feedback loop, not the additive signal of having more rewards, drives the gain.
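The significance threshold α = 0.05/21 is consistent with a Bonferroni correction over all pairwise comparisons among the seven ensemble configurations, since seven items form 21 unordered pairs. The sketch below checks that arithmetic and the reported absolute F1 gain; the function name `bonferroni_alpha` is hypothetical, and reading 21 as "7 choose 2" is an inference from the numbers in the abstract rather than something it states.

```python
from math import comb


def bonferroni_alpha(num_configs, family_alpha=0.05):
    """Return (number of pairwise tests, per-test alpha) under Bonferroni."""
    num_pairs = comb(num_configs, 2)
    return num_pairs, family_alpha / num_pairs


# Seven ensemble configurations compared pairwise.
num_pairs, per_test_alpha = bonferroni_alpha(7)

# Absolute F1 gain of the best ensemble over the base-rewards-only baseline.
f1_gain = round(0.795 - 0.609, 3)
```

With these inputs the per-test threshold is roughly 0.0024, and the gain of 0.186 rounds to the 0.19 quoted in the abstract.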

2605.01482 2026-05-11 cs.AI

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

Yunhan Bu, Quan Zhang, Huaping Zhang, Guotong Geng, Chunxiao Gao, Askar Hamdulla, Juan Wang, Qiuchi Li, Baohua Zhang, Shuai Lei, Yunbo Cao, Zhunchen Luo

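AI Summary This paper studies complex reasoning in multi-hop fact verification (MHFV). To address the broken logical chains and hallucinations of large language models, it proposes a new framework based on a Structural Causal Model (SCM) that casts verification as a causal inference task. A rule-based reinforcement learning strategy, Group Relative Policy Optimization (GRPO), dynamically balances the depth and conciseness of the reasoning chain. Experiments show that the method significantly outperforms existing approaches on multiple datasets, offering a reliable and interpretable solution for complex fact verification.

Details
English Abstract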

Multi-Hop Fact Verification (MHFV) necessitates complex reasoning across disparate evidence, posing significant challenges for Large Language Models (LLMs) which often suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought (CoT), lack explicit modeling of the causal dependencies between evidence and claims. In this work, we introduce a novel framework that grounds reasoning in a Structural Causal Model (SCM), treating verification as a constructive causal inference process. We empirically identify an "inverted U-shaped" correlation between reasoning chain length and accuracy, revealing that excessive structural complexity degrades performance. To address this, we propose a Rule-based Reinforcement Learning strategy using Group Relative Policy Optimization (GRPO). This approach dynamically optimizes the trade-off between structural depth and conciseness. Extensive experiments on HoVer and EX-FEVER demonstrate that our SCM-GRPO framework significantly outperforms state-of-the-art baselines, offering a reliable and interpretable solution for complex fact verification.
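For readers unfamiliar with GRPO, its core step is to score each sampled response relative to the other responses drawn for the same prompt. A minimal sketch of that group-relative normalization follows. The reward values are hypothetical, the abstract does not specify the rule-based reward itself, and implementations differ on details such as population versus sample standard deviation, so treat this as an illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population standard deviation; eps avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for four sampled reasoning chains.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# A group where every response earns the same reward carries no signal.
flat = group_relative_advantages([0.5, 0.5])
```

Responses above the group mean receive positive advantages and the rest negative ones, so no separate value network is needed to form a baseline.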

2605.01288 2026-05-11 cs.LG cond-mat.dis-nn stat.ML

A Theory of Saddle Escape in Deep Nonlinear Networks

Divit Rawal, Michael R. DeWeese

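AI Summary This paper studies the long plateaus and abrupt feature-acquisition transitions that appear when training deep nonlinear networks from small initialization. By deriving an imbalance identity for the Frobenius norms of the weight matrices that holds for any smooth activation and any differentiable loss, the authors classify activation functions into four universality classes, reduce the matrix dynamics to a scalar ODE on the symmetric submanifold, and obtain an analytic critical-depth escape-time law that depends on the number of bottleneck layers. The theory agrees closely with numerical simulations and shows that bottleneck structure is the key factor governing escape time in the training dynamics of deep networks.

Details
English Abstract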

In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universality classes. On the permutation-symmetric submanifold, the identity combines with an approximate balance law to reduce the full matrix flow to a scalar ODE, giving a critical-depth escape time law $\tau_\star = \Theta(\varepsilon^{-(r-2)})$ governed by the number $r$ of layers at the bottleneck scale rather than the total depth $L$. We find that this same $r-2$ exponent is recovered under He-normal initialization with $r$ bottleneck layers rescaled by $\varepsilon$, where the symmetry manifold is preserved by the flow but not attracting. We find close agreement between our theory and numerical simulations.
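To see how an exponent of the form $r-2$ can arise from a scalar ODE, consider a toy flow in which a single scale $a(t)$ grows as the product of $r-1$ copies of itself. This toy equation is an assumption chosen for illustration; the abstract does not give the paper's reduced ODE, so the derivation below only shows that the stated scaling is what such a power-law flow produces.

```latex
% Toy scalar flow (assumed, not taken from the paper), with r > 2:
\dot a = a^{\,r-1}, \qquad a(0) = \varepsilon \ll 1 .
% Time to grow from \varepsilon to an O(1) value, here a = 1:
\tau_\star
  = \int_{\varepsilon}^{1} a^{-(r-1)} \, da
  = \frac{\varepsilon^{-(r-2)} - 1}{r-2}
  = \Theta\!\left(\varepsilon^{-(r-2)}\right)
  \quad \text{as } \varepsilon \to 0 .
% Borderline case r = 2: \tau_\star = \ln(1/\varepsilon).
```

In this toy picture the escape time depends only on how many factors of the small scale multiply the growth rate, which mirrors the abstract's statement that $r$, not the total depth $L$, sets the exponent.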

2605.00814 2026-05-11 cs.CV cs.AI

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng

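AI Summary Although autoregressive Large Vision-Language Models (LVLMs) perform well on multimodal tasks, they suffer from "visual signal dilution" during generation: visual attention decays as the generated sequence grows. To address this, the paper proposes Persistent Visual Memory (PVM), a lightweight learnable module that runs as a branch parallel to the feed-forward network (FFN) and provides a distance-agnostic pathway for retrieving visual information, strengthening the model's sustained visual perception. Experiments show that PVM markedly improves performance with minimal parameter overhead, especially on complex reasoning tasks that require long-range visual perception.

Details
English Abstract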

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to strengthen sustained, on-demand access to visual evidence. Integrated as a parallel branch alongside the Feed-Forward Network (FFN) in LVLMs, PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for enhanced visual perception, thereby structurally mitigating the signal suppression inherent to deep generation. Extensive experiments on Qwen3-VL models demonstrate that PVM brings notable improvements with negligible parameter overhead, delivering consistent average accuracy gains across both 4B and 8B scales, particularly in complex reasoning tasks that demand persistent visual perception. Furthermore, in-depth analysis reveals that PVM shows improved robustness in longer generations and accelerates internal prediction convergence.
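The dilution effect described above follows directly from softmax normalization: as text tokens accumulate, the denominator grows while the visual numerator stays fixed. The sketch below computes the share of attention mass landing on visual tokens. The function names and the equal-logit setup are assumptions made to isolate the effect; real attention logits vary by token and head.

```python
import math


def visual_attention_mass(visual_logits, text_logits):
    """Fraction of softmax attention mass assigned to the visual tokens."""
    peak = max(visual_logits + text_logits)  # subtract max for stability
    visual = sum(math.exp(x - peak) for x in visual_logits)
    text = sum(math.exp(x - peak) for x in text_logits)
    return visual / (visual + text)


def dilution_curve(num_visual, text_lengths, logit=0.0):
    """Visual attention share as the textual history grows.

    Assumption: every token has the same logit, so the share reduces to
    num_visual / (num_visual + text_length).
    """
    return [
        visual_attention_mass([logit] * num_visual, [logit] * length)
        for length in text_lengths
    ]


# 64 visual tokens; the textual history grows from 64 to 448 tokens.
shares = dilution_curve(64, [64, 192, 448])
```

Under the equal-logit assumption the visual share halves each time the total sequence length doubles, which is the inverse dependence on sequence length that the abstract describes.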

2605.00380 2026-05-11 cs.LG cs.CL

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang, Xiaodong Lu, Wei Lin, Ran He, Guojun Yin

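AI Summary This paper proposes ResRL, a method for improving the reasoning ability of large language models while preserving generation diversity. ResRL introduces negative sample projection residual reinforcement learning to decouple the semantic distributions shared by positive and negative samples, and uses a low-rank positive-subspace projection together with gradient modulation to strengthen reasoning without reducing diversity. Experiments show that ResRL outperforms existing methods on multiple benchmarks, with notable gains on mathematical reasoning.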

Comments Accepted to ICML 2026. Preprint version. https://github.com/1229095296/ResRL.git

Details
English Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL) that decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects negative-token hidden representations onto an SVD-based low-rank positive subspace and uses projection residuals to modulate negative gradients, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling. Notably, ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16 and 7.0% in Pass@128. Code is available at https://github.com/1229095296/ResRL.git.

2604.26509 2026-05-11 cs.RO cs.CV

3D Generation for Embodied AI and Robotic Simulation: A Survey

Tianwei Ye, Yifan Mao, Minwen Liao, Jian Liu, Chunchao Guo, Dazhao Du, Quanxin Shou, Fangqi Zhu, Song Guo

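AI Summary This survey reviews 3D generation techniques for embodied AI and robotic simulation, focusing on three roles: generating interactable objects, building task-oriented simulation environments, and supporting sim-to-real transfer. It observes that the field is shifting from visual realism toward interaction readiness and identifies the main bottlenecks, including scarce physical annotations and the mismatch between geometric quality and physical plausibility. The survey offers a systematic analysis and future directions for making 3D generation a dependable foundation for embodied intelligence.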

Comments 27 pages, 11 figures, 8 tables

Details
English Abstract

Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This survey reviews 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. In Data Generator, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; in Simulation Environments, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and in Sim2Real Bridge, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks, including limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide, that must be addressed for 3D generation to become a dependable foundation for embodied intelligence. Our project page is at https://3dgen4robot.github.io.

2604.24013 2026-05-11 cs.LG cs.AI cs.CV cs.DC

CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion for Distributed LLM Training

Rezaul Karim, Austin Wen, Wang Zongzuo, Weiwei Zhang, Yang Liu, Walid Ahmed

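AI Summary As large language models grow rapidly in size, communication overhead in distributed training has become a major bottleneck for computational efficiency. This paper proposes CommFuse, which uses communication decomposition and fusion to eliminate the tail latency of existing overlap strategies. The method replaces conventional collective operations with fine-grained peer-to-peer communication and optimizes the compute schedule, significantly reducing communication overhead under data and tensor parallelism and improving training throughput and compute utilization.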

Comments Slightly modified the title, and corresponding minor wording change in the content

Details
English Abstract

The rapid growth in the size of large language models has necessitated the partitioning of computational workloads across accelerators such as GPUs, TPUs, and NPUs. However, these parallelization strategies incur substantial data communication overhead, significantly hindering computational efficiency. While communication-computation overlap presents a promising direction, existing data-slicing-based solutions suffer from tail latency. To overcome this limitation, this research introduces a novel communication-computation overlap technique that eliminates the tail latency of state-of-the-art overlap methods for distributed LLM training. The aim of this technique is to effectively mitigate the communication bottleneck of tensor parallelism and data parallelism for distributed training and inference. In particular, we propose a novel method termed CommFuse that replaces the conventional collective operations reduce-scatter and all-gather with decomposed peer-to-peer (P2P) communication and schedules partitioned computations to enable fine-grained overlap. Our method provides an exact algorithm for reducing communication overhead that eliminates tail latency. Moreover, it presents a versatile solution compatible with data-parallel training and various tensor-level parallelism strategies, including TPSP and UP. Experimental evaluations demonstrate that our technique consistently achieves lower latency, superior Model FLOPS Utilization (MFU), and high throughput.
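The general idea of decomposing a collective into P2P steps can be sketched with a single-process simulation of a ring all-gather, where each rank computes on the shard it currently holds while that shard is forwarded to its neighbor. This is a generic textbook ring schedule, not CommFuse's actual schedule, which the abstract does not spell out; the function name and the `compute` callback are hypothetical, and a real system would use asynchronous sends and receives across devices.

```python
def ring_all_gather_with_overlap(shards, compute):
    """Simulate a ring all-gather split into P2P steps.

    shards[r] is the shard initially owned by rank r. At every step each
    rank applies `compute` to the shard it holds (in a real system this
    would overlap with the transfer) and then passes that shard to the
    next rank. Returns results[rank][source] for all ranks and sources.
    """
    world = len(shards)
    results = [[None] * world for _ in range(world)]
    held = list(range(world))  # held[r]: index of the shard rank r holds

    for step in range(world):
        for rank in range(world):
            source = held[rank]
            results[rank][source] = compute(shards[source])
        if step < world - 1:
            # P2P step: rank r receives the shard held by rank r - 1.
            held = [held[(rank - 1) % world] for rank in range(world)]
    return results


# Four ranks, each starting with one scalar "shard"; compute scales it.
outputs = ring_all_gather_with_overlap([1, 2, 3, 4], lambda x: x * 10)
```

After `world - 1` transfers every rank has processed every shard, so the result matches an all-gather followed by the computation, while the work is spread across the transfer steps instead of waiting for the full collective to finish.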

2604.23938 2026-05-11 cs.CL

TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment

Xiaochen Zheng, Zhiwen Jiang, Melanie Guerard, Klas Hatje, Tatyana Doktorova

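AI Summary This paper presents TSAssistant, a human-in-the-loop agentic framework for automated Target Safety Assessment (TSA). Its modular, section-based multi-agent architecture decomposes report generation across specialised subagents, each producing a citable, evidence-grounded section of the TSA report. TSAssistant lets users interactively revise and extend sections during generation and maintains conversational memory across iterations. It aims to reduce the mechanical burden of evidence synthesis and report drafting, enabling collaborative decision-making between AI and toxicologists.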

Comments Updated with self-consistency quantitative evaluation; additional quantitative and expert evaluations to be included in future revisions

Details
English Abstract

Target Safety Assessment (TSA) requires systematic integration of heterogeneous evidence, including genetic, transcriptomic, target homology, pharmacological, and clinical data, to evaluate potential safety liabilities of therapeutic targets. This process is inherently iterative and expert-driven, posing challenges in scalability and reproducibility. We present TSAssistant, a multi-agent framework designed to support TSA report drafting through a modular, section-based, and human-in-the-loop paradigm. The framework decomposes report generation into a coordinated pipeline of specialised subagents, each targeting a single TSA section. Specialised subagents retrieve structured and unstructured data as well as literature evidence from curated biomedical sources through standardised tool interfaces, producing individually citable, evidence-grounded sections. Agent behaviour is governed by a hierarchical instruction architecture comprising system prompts, domain-specific skill modules, and runtime user instructions. A key feature is an interactive refinement loop in which users may manually edit sections, append new information, upload additional sources, or re-invoke agents to revise specific sections, with the system maintaining conversational memory across iterations. TSAssistant is designed to reduce the mechanical burden of evidence synthesis and report drafting, supporting a hybrid model in which agentic AI augments evidence synthesis while toxicologists retain final decision authority.

2604.20403 2026-05-11 cs.LG

Robustness of Spatio-temporal Graph Neural Networks for Fault Location in Partially Observable Distribution Grids

Burak Karabulut, Carlo Manna, Chris Develder

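AI Summary This paper studies the robustness of spatio-temporal graph neural networks (STGNNs) for fault location in partially observable distribution grids. The authors propose a new way of building the graph from measured nodes only and introduce STGNN models based on GraphSAGE and an improved GATv2; experiments show that the approach outperforms a conventional RNN model in both performance and training efficiency. The study also finds that a graph built only from measured nodes markedly improves efficiency and stability, offering a more practical and robust solution for fault location in partially observable distribution grids.

Details
English Abstract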

Fault location in distribution grids is critical for reliability and for minimizing outage durations. Yet it remains challenging due to partial observability, given sparse measurement infrastructure. Recent works show promising results by combining Recurrent Neural Networks (RNNs) and Graph Neural Networks (GNNs) for spatio-temporal learning. Still, many modern GNN architectures remain untested for this grid application, and existing GNN solutions have not explored graph definitions beyond simply adopting the full grid topology. We address these gaps by (i) systematically comparing a newly proposed graph-forming strategy (measured-only) to the traditional full-topology approach, (ii) introducing STGNN (Spatio-temporal GNN) models based on GraphSAGE and an improved Graph Attention Network (GATv2) for distribution grid fault location, and (iii) benchmarking them against state-of-the-art STGNN and RNN baselines on the IEEE 123-bus feeder. In our experiments, all evaluated STGNN variants achieve high performance and consistently outperform a pure RNN baseline, with improvements of up to 11 percentage points in F1. Among STGNN models, the newly explored RGATv2 and RGSAGE achieve only marginally higher F1 scores. Still, STGNNs demonstrate superior stability, with tight confidence intervals (within ±1.4%) compared to the RNN baseline (up to ±7.5%) across different experiment runs. Finally, our proposed reduced GNN topology (measured-only) shows clear benefits in both (i) model training time (6-fold reduction) and (ii) model performance (up to 11 points F1). This suggests that measured-only graphs offer a more practical, efficient, and robust framework for partially observable distribution grids.
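One plausible way to form a measured-only graph is to keep only the measured buses as nodes and connect two of them whenever the grid links them through a path of unmeasured buses. The abstract does not describe the authors' construction, so the rule below, the function name `measured_only_graph`, and the small example feeder are all assumptions for illustration.

```python
from collections import deque


def measured_only_graph(edges, measured):
    """Reduce a grid topology to a graph over measured nodes only.

    Assumed rule: two measured nodes are connected if a path joins them
    whose intermediate nodes are all unmeasured.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    measured = set(measured)
    reduced = set()
    for start in measured:
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor in seen:
                    continue
                seen.add(neighbor)
                if neighbor in measured:
                    # Stop at measured nodes: they become direct neighbors.
                    reduced.add(tuple(sorted((start, neighbor))))
                else:
                    # Keep walking through unmeasured nodes.
                    queue.append(neighbor)
    return sorted(reduced)


# Hypothetical 5-bus feeder; only buses 1, 4 and 5 carry measurements.
feeder_edges = [(1, 2), (2, 3), (3, 4), (3, 5)]
reduced_edges = measured_only_graph(feeder_edges, measured=[1, 4, 5])
```

The reduced graph has fewer nodes and edges than the full topology, which is consistent with the shorter training time reported above, though the exact reduction depends on where measurements are placed.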

2604.19697 2026-05-11 cs.CV

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang, Tianrun Yuan, Juntong Chen, Yongkang Zhu, Fanhu Zeng, Xuanyu Zhu, Tao Feng, Yige Xu

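AI Summary To evaluate the reasoning of multimodal large language models in STEM domains, this study introduces StepSTEM, a fine-grained benchmark of 283 graduate-level problems spanning mathematics, physics, chemistry, and other fields, with an emphasis on assessing the cross-modal reasoning process. The benchmark enforces strict complementarity between textual and visual inputs and adds a step-level evaluation framework based on dynamic programming to measure reasoning chains comprehensively. Experiments show that current mainstream models still rely mainly on textual reasoning and leave considerable room for improvement in cross-modal ability; StepSTEM provides an important reference for research on fine-grained multimodal reasoning.

Details
English Abstract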

Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to modality redundancy and focus mainly on final-answer accuracy, overlooking the reasoning process itself. To address this challenge, we introduce StepSTEM: a graduate-level benchmark of 283 problems across mathematics, physics, chemistry, biology, and engineering for fine-grained evaluation of cross-modal reasoning in MLLMs. StepSTEM is constructed through a rigorous curation pipeline that enforces strict complementarity between textual and visual inputs. We further propose a general step-level evaluation framework for both text-only chain-of-thought and interleaved image-text reasoning, using dynamic programming to align predicted reasoning steps with multiple reference solutions. Experiments across a wide range of models show that current MLLMs still rely heavily on textual reasoning, with even Gemini 3.1 Pro and Claude Opus 4.6 achieving only 38.29% accuracy. These results highlight substantial headroom for genuine cross-modal STEM reasoning and position StepSTEM as a benchmark for fine-grained evaluation of multimodal reasoning. Source code is available at https://github.com/lll-hhh/STEPSTEM.
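Step-level alignment by dynamic programming can be sketched as an order-preserving matching between predicted and reference steps, scored against each reference solution and keeping the best result. The word-overlap similarity, the normalization by reference length, and all function names below are assumptions; the benchmark's actual scoring function is not given in the abstract.

```python
def jaccard(step_a, step_b):
    """Word-overlap similarity between two reasoning steps (assumed metric)."""
    tokens_a = set(step_a.lower().split())
    tokens_b = set(step_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def align_score(predicted, reference, similarity=jaccard):
    """Best order-preserving alignment score, normalized by reference length.

    table[i][j] is the best total similarity using the first i predicted
    steps and the first j reference steps; each step is matched at most once.
    """
    n, m = len(predicted), len(reference)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = table[i - 1][j - 1] + similarity(predicted[i - 1], reference[j - 1])
            table[i][j] = max(table[i - 1][j], table[i][j - 1], match)
    return table[n][m] / m if m else 0.0


def best_reference_score(predicted, references):
    """Score a prediction against several reference solutions; keep the best."""
    return max(align_score(predicted, reference) for reference in references)


# Hypothetical reference solutions and a prediction matching the second one.
references = [
    ["read the graph", "integrate the curve"],
    ["compute the force", "apply newton second law"],
]
prediction = ["compute the force", "apply newton second law"]
score = best_reference_score(prediction, references)
```

Taking the maximum over references avoids penalizing a valid solution path that differs from one particular reference, which is presumably why multiple reference solutions are used.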

2604.15719 2026-05-11 cs.AI

Harnessing Pre-Resolution Signals for Future Prediction Agents

Chuyang Wei, Maohang Gao, Zhixin Han, Kefei Chen, Yu Zhuang, Haoxiang Guan, Yanzhi Zhang, Yilin Cheng, Xiren Zhou, Huanhuan Chen, Jian Li, Jiyan He, Yu Shi, Yitong Duan, Shuxin Zheng

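AI Summary This paper studies future prediction made before outcomes are known, where the core challenge is that supervision arrives only after the fact and gives little guidance for the key judgments made along the way. The authors propose using "pre-resolution signals" that emerge across repeated forecasts to improve a prediction agent's judgment, and design Milkyway, a prediction system that stores reusable guidance in a continually updated external state so that forecasts improve across revisits. Experiments show strong results on multiple benchmarks, with the gains stemming mainly from system evolution driven by pre-resolution signals.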

Comments Work in progress

Details
English Abstract

Many high-stakes decisions depend on forecasts made before outcomes are known. In this future prediction setting, the central challenge is that public evidence evolves over time, while the main supervision signal arrives only after resolution: the realized outcome mainly assesses final correctness, offering only coarse guidance on what to track, what to verify, and which judgments to leave uncertain along the way. Our key observation is that revisiting the same unresolved question over time creates informative temporal contrasts across evolving evidence and repeated forecasts, exposing what earlier attempts missed before resolution and yielding a diagnostic signal we call the pre-resolution signal. We instantiate this idea in Milkyway, a future prediction agent with a persistent future prediction harness, an editable external state that stores reusable procedural guidance across revisits to the same unresolved question. As the same unresolved question is revisited, Milkyway extracts pre-resolution signals from evolving evidence and repeated forecasts, uses them to update the harness, and improves later forecasts on that question before resolution. After resolution, the realized outcome serves as a post-resolution check of provisional updates. On the FutureX and FutureWorld benchmarks, Milkyway achieves strong performance against competitive baselines, and a mechanism study suggests that the gains stem from harness evolution driven by pre-resolution signals rather than repeated prediction alone.

2604.06333 2026-05-11 cs.LG cs.CV

Drifting Fields are not Conservative

Leonard T. Franz, Sebastian Hoffmann, Tim Weiland, Bernhard Schölkopf, Georg Martius

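AI Summary This paper studies the properties of the drift field in generative models and shows that drift fields are generally not conservative, so they cannot be written as the gradient of any scalar potential. The authors trace the non-conservatism to position-dependent normalization, with the Gaussian kernel as the only radial exception. They introduce the sharp kernel and a corresponding normalized drift field that is conservative for general radial kernels, so the scalar potential can be optimized directly with gradient descent, strengthening the theoretical foundation while preserving generation quality.

Details
English Abstract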

Drifting models have recently gained attention for generating high-quality samples in a single forward pass. During training, they learn a push-forward map by following a vector-valued field, the drift field. We ask whether this procedure is equivalent to optimizing a scalar loss and find that, in general, it is not: drift fields are not conservative and cannot be written as the gradient of any scalar potential. We identify the position-dependent normalization as the source of non-conservatism, with the Gaussian kernel as the unique radial exception. Guided by this, we introduce the sharp kernel $k^\#$ and a sharp-normalized drift field that is conservative for general radial kernels. The resulting vector field is the gradient of a scalar potential that can be optimized directly using stochastic gradient descent. Moreover, the field has the form of a score difference of kernel density estimates, and gives exact equilibrium identifiability. Thus, sharp normalization closes the gap to related literature, such as Wasserstein gradient-flows and denoising score matching, also for non-Gaussian kernels. Empirically, sharp normalization preserves the performance of the original drifting objective, suggesting that the non-conservative flexibility is not required for high-quality generation.