arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.21207 2026-05-28 cs.LG cs.AI math.AT

A Sheaf-Theoretic and Topological Perspective on Complex Network Modeling and Attention Mechanisms in Graph Neural Models

图神经模型中复杂网络建模与注意力机制的层论与拓扑视角

Chuan-Shen Hu

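Affiliation: National Central University

AI summary: Proposes a cellular sheaf-theoretic framework for analyzing the local consistency and harmonicity of node features and edge weights in graph neural networks, and introduces a multiscale extension based on topological data analysis to capture hierarchical feature interactions.

AI-generated Chinese abstract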

组合与拓扑结构,如图、单纯复形和胞腔复形,构成了几何与拓扑深度学习(GDL和TDL)架构的基础。这些模型在此类域上聚合信号、整合局部特征,并为多样化的实际应用生成表示。然而,训练过程中GDL和TDL特征的分布与扩散行为仍是一个开放且未充分探索的问题。受此空白启发,我们引入了一个细胞层论框架,用于建模和分析基于图的架构中节点特征与边权重的局部一致性与调和性。通过层结构追踪局部特征对齐与一致性,该框架提供了特征扩散与聚合的拓扑视角。此外,受拓扑数据分析(TDA)启发,提出了一个多尺度扩展,以捕获图模型中层次化的特征交互。该方法基于GDL和TDL架构的底层几何与拓扑结构以及其上定义的学习信号,实现了对它们的联合刻画,为节点分类、子结构检测和社区检测等传统任务的未来研究提供了见解。

English abstract

Combinatorial and topological structures, such as graphs, simplicial complexes, and cell complexes, form the foundation of geometric and topological deep learning (GDL and TDL) architectures. These models aggregate signals over such domains, integrate local features, and generate representations for diverse real-world applications. However, the distribution and diffusion behavior of GDL and TDL features during training remains an open and underexplored problem. Motivated by this gap, we introduce a cellular sheaf theoretic framework for modeling and analyzing the local consistency and harmonicity of node features and edge weights in graph-based architectures. By tracking local feature alignments and agreements through sheaf structures, the framework offers a topological perspective on feature diffusion and aggregation. Furthermore, a multiscale extension inspired by topological data analysis (TDA) is proposed to capture hierarchical feature interactions in graph models. This approach enables a joint characterization of GDL and TDL architectures based on their underlying geometric and topological structures and the learned signals defined on them, providing insights for future studies on conventional tasks such as node classification, substructure detection, and community detection.
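
The notions of local consistency and harmonicity mentioned in the abstract can be illustrated with a toy cellular sheaf on a graph. The sketch below is a minimal illustration and not the paper's framework: it assumes one-dimensional stalks, scalar restriction maps, and the standard sheaf Laplacian L = δᵀδ; the names `coboundary` and `dirichlet_energy` and the example values are hypothetical.

```python
def coboundary(edges, restrictions, x):
    """(delta x)_e = F_{v->e} * x_v - F_{u->e} * x_u for each edge e = (u, v)."""
    return [
        restrictions[(v, e)] * x[v] - restrictions[(u, e)] * x[u]
        for e, (u, v) in enumerate(edges)
    ]


def dirichlet_energy(edges, restrictions, x):
    """x^T L x with L = delta^T delta; it is zero exactly when delta x = 0."""
    return sum(d * d for d in coboundary(edges, restrictions, x))


# Path graph 0 - 1 - 2 (hypothetical example).
edges = [(0, 1), (1, 2)]
# Scalar restriction maps keyed by (node, edge index).
restrictions = {(0, 0): 1.0, (1, 0): 2.0, (1, 1): 1.0, (2, 1): 1.0}

consistent = [2.0, 1.0, 1.0]    # agrees on every edge: a global section
inconsistent = [1.0, 1.0, 1.0]  # disagrees on edge 0

print(dirichlet_energy(edges, restrictions, consistent))
print(dirichlet_energy(edges, restrictions, inconsistent))
```

When every restriction map equals 1, the energy reduces to the ordinary graph Laplacian quadratic form, so constant signals are the harmonic ones.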

2603.02097 2026-05-28 cs.CL

ClinConsensus: A Physician-Calibrated Benchmark for Evaluating Clinical Rubric Coverage in Chinese Medical LLMs

ClinConsensus:一个用于评估中文医疗大模型临床评分标准覆盖率的医师校准基准

Xiang Zheng, Han Li, Wenjie Luo, Weiqi Zhai, Yiyuan Li, Chuanmiao Yan, Xue Yang, Kailuan Wu, Ruyi Xu, Tianyun Lu, Tianyi Tang, Yubo Ma, Kexin Yang, Dayiheng Liu, Sen Yang, Lin Qu, Bing Zhao, Hu Wei

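Affiliation: Alibaba Group

AI summary: To address the lack of physician-calibrated coverage of clinical response criteria in open-ended medical LLM evaluation, proposes ClinConsensus, a benchmark of 2,500 expert-curated cases, and introduces the Clinician-Anchored Coverage Score (CACS) together with a dual-judge framework; finds a coverage gap of 19.2-21.9 points across frontier models.

AI-generated Chinese abstract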

开放域医疗大模型评估在医师校准的临床相关响应标准覆盖率方面仍然薄弱,尤其是在本地化临床环境中。我们引入了 ClinConsensus,一个中文医疗基准,包含 2,500 个专家精选病例,涵盖 36 个专科、12 个任务主题、多个难度级别以及面向非专业与专业人员的场景。每个病例配有 30 个病例特定的二元评分标准。为了评估响应是否满足足够多的医师撰写的标准,我们提出了医师锚定覆盖率评分(CACS),一个在 k=10 实例化的医师校准阈值度量,并开发了一个双裁判框架,结合 GPT-5.1 评分器与一个医师监督的 Qwen3-8B 裁判。评估 11 个前沿大模型,我们发现存在持续的覆盖率差距:评分准确率在 39.6% 到 52.1% 之间,而 CACS@10 在 17.8% 到 32.9% 之间,模型间存在 19.2-21.9 个百分点的差距。分层分析进一步揭示了在推理、证据使用、结构化提取、用药说明、随访和对话语域方面的显著差异。这些结果表明,医疗大模型评估应衡量阈值化的、基于评分标准的临床覆盖率,而非平均部分正确性。

English abstract

Open-ended medical LLM evaluation remains weakly grounded in physician-calibrated coverage of clinically relevant response criteria, especially in localized clinical settings. We introduce ClinConsensus, a Chinese medical benchmark of 2,500 expert-curated cases spanning 36 specialties, 12 task themes, multiple difficulty levels, and lay-facing versus professional-facing settings. Each case is paired with 30 case-specific binary rubric criteria. To evaluate whether responses satisfy enough physician-authored criteria, we propose the Clinician-Anchored Coverage Score (CACS), a physician-calibrated threshold metric instantiated at k=10, and develop a dual-judge framework combining a GPT-5.1 grader with a physician-supervised Qwen3-8B judge. Evaluating 11 frontier LLMs, we find a persistent coverage gap: Rubric Accuracy ranges from 39.6% to 52.1%, whereas CACS@10 ranges from 17.8% to 32.9%, leaving a 19.2-21.9 point gap across models. Stratified analyses further reveal substantial variation across reasoning, evidence use, structured extraction, medication instructions, follow-up, and dialogue register. These results suggest that medical LLM evaluation should measure thresholded, rubric-grounded clinical coverage rather than average partial correctness.
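
The contrast the abstract draws between average partial correctness and thresholded coverage can be shown with a small sketch. The abstract does not spell out how CACS is computed, so the threshold rule below (a response counts as covered when it satisfies at least k criteria) is only an assumed reading for illustration; the function names and the toy data are hypothetical.

```python
def rubric_accuracy(cases):
    """Mean fraction of binary criteria satisfied per case (average partial correctness)."""
    return sum(sum(case) / len(case) for case in cases) / len(cases)


def thresholded_coverage(cases, k):
    """Share of cases whose response satisfies at least k criteria (assumed threshold rule)."""
    return sum(1 for case in cases if sum(case) >= k) / len(cases)


# Toy data: four cases with 30 binary criteria each (1 = criterion satisfied).
cases = [
    [1] * 9 + [0] * 21,
    [1] * 9 + [0] * 21,
    [1] * 9 + [0] * 21,
    [1] * 21 + [0] * 9,
]

print(rubric_accuracy(cases))           # average over all criteria
print(thresholded_coverage(cases, 10))  # only cases clearing the threshold count
```

In this toy data the average looks moderate because one strong response lifts it, while the thresholded score shows that most responses fall short of the bar.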

2603.16985 2026-05-28 cs.LG

Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting

通过蒸馏将归纳偏置整合到Transformer中用于金融时间序列预测

Yu-Chen Den, Kuan-Yu Chen, Kendro Vincent, Darby Tien-Hao Chang

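Affiliation: National Chengchi University

AI summary: Proposes TIPS, a framework that uses knowledge distillation to integrate inductive biases such as causality, locality, and periodicity into a unified Transformer; across four major equity markets it improves annual return, Sharpe ratio, and Calmar ratio by 55%, 9%, and 16%, respectively, while using only 38% of the inference-time computation.

Comments: KDD 2026

AI-generated Chinese abstract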

基于Transformer的模型因其高表示能力和架构灵活性而被广泛用于时间序列预测。然而,许多Transformer变体隐含地假设平稳性和稳定的时间动态——这些假设在具有制度转换和非平稳性的金融市场中经常被违反。经验上,最先进的时间序列Transformer在金融任务上甚至常常不如普通Transformer,而具有不同归纳偏置的简单架构(如CNN和RNN)能以更低的复杂度实现更强的性能。同时,没有单一的归纳偏置在所有市场或制度中占主导地位,这表明稳健的金融预测需要整合互补的时间先验。我们提出TIPS(Transformer with Inductive Prior Synthesis),一个知识蒸馏框架,将多样化的归纳偏置——因果性、局部性和周期性——综合到统一的Transformer中。TIPS通过注意力掩码训练偏置专用的Transformer教师,然后通过跨归纳偏置的制度依赖对齐,将它们的知识蒸馏到单个学生模型中。在四个主要股票市场上,TIPS实现了最先进的性能,在年化收益、夏普比率和卡尔玛比率上分别超过强集成基线55%、9%和16%,同时仅需要38%的推理计算量。进一步分析表明,TIPS产生了统计上显著的超额收益,超过普通Transformer及其教师集成,并在其盈利期间表现出与经典架构的制度依赖行为对齐。这些结果强调了在非平稳金融时间序列中,制度依赖的归纳偏置利用对于稳健泛化的重要性。

English abstract

Transformer-based models have been widely adopted for time-series forecasting due to their high representational capacity and architectural flexibility. However, many Transformer variants implicitly assume stationarity and stable temporal dynamics -- assumptions routinely violated in financial markets characterized by regime shifts and non-stationarity. Empirically, state-of-the-art time-series Transformers often underperform even vanilla Transformers on financial tasks, while simpler architectures with distinct inductive biases, such as CNNs and RNNs, can achieve stronger performance with substantially lower complexity. At the same time, no single inductive bias dominates across markets or regimes, suggesting that robust financial forecasting requires integrating complementary temporal priors. We propose TIPS (Transformer with Inductive Prior Synthesis), a knowledge distillation framework that synthesizes diverse inductive biases -- causality, locality, and periodicity -- within a unified Transformer. TIPS trains bias-specialized Transformer teachers via attention masking, then distills their knowledge into a single student model with regime-dependent alignment across inductive biases. Across four major equity markets, TIPS achieves state-of-the-art performance, outperforming strong ensemble baselines by 55%, 9%, and 16% in annual return, Sharpe ratio, and Calmar ratio, while requiring only 38% of the inference-time computation. Further analyses show that TIPS generates statistically significant excess returns beyond both vanilla Transformers and its teacher ensembles, and exhibits regime-dependent behavioral alignment with classical architectures during their profitable periods. These results highlight the importance of regime-dependent inductive bias utilization for robust generalization in non-stationary financial time series.
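
The abstract says the bias-specialized teachers are obtained via attention masking but gives no mask definitions. The sketch below shows common textbook forms of causal, local, and periodic masks as boolean matrices (True means position i may attend to position j); these forms, the function names, and the `window` and `period` parameters are assumptions, not the paper's specification.

```python
def causal_mask(n):
    """Position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def local_mask(n, window):
    """Position i may attend to positions within `window` steps of i."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def periodic_mask(n, period):
    """Position i may attend to positions a whole number of periods away."""
    return [[(i - j) % period == 0 for j in range(n)] for i in range(n)]


for row in causal_mask(3):
    print(row)
```

Each mask would be applied to the attention scores of one teacher, so that the teacher can only express the corresponding temporal prior.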

2601.04505 2026-05-28 cs.AI cs.CL cs.SY eess.SY

CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts

CircuitLM: 一种基于多智能体的大语言模型辅助设计框架,用于从自然语言提示生成电路原理图

Khandakar Shakib Al Hasan, Syed Rifat Raiyan, Hasin Mahtab Alvee, Wahid Sadik

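Affiliations: Department of Computer Science and Engineering; Department of Electrical and Electronic Engineering; Islamic University of Technology

AI summary: Proposes CircuitLM, a multi-agent pipeline that uses an embedding-powered component knowledge base and a five-stage process to turn natural language prompts into structured CircuitJSON schematics, with dual verification by a deterministic Electrical Rule Checking engine and an LLM-as-a-judge meta-evaluator, addressing hallucination and physical-constraint violations by LLMs in circuit design.

Comments: Accepted at the 2026 IEEE International Conference on LLM-Aided Design (ICLAD), 10 pages, 8 figures, 6 tables

AI-generated Chinese abstract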

从高层自然语言描述生成准确的电路原理图仍然是电子设计自动化(EDA)中的一个持久挑战,因为大语言模型(LLM)经常产生组件幻觉、违反严格的物理约束并输出非机器可读的结果。为解决此问题,我们提出CircuitLM,一个多智能体流水线,将用户提示转化为结构化的、视觉可解释的$\texttt{CircuitJSON}$原理图。该框架通过五个顺序阶段: (i) 组件识别,(ii) 规范引脚输出检索,(iii) 思维链推理,(iv) JSON原理图合成,以及(v) 交互式力导向可视化,基于一个精心策划的、嵌入驱动的组件知识库进行生成,从而减轻幻觉并确保物理可行性。我们在一个包含100个独特电路设计提示的数据集上,使用五个最先进的大语言模型评估了该系统。为系统评估性能,我们部署了严格的双层评估方法:一个确定性电气规则检查(ERC)引擎按严格严重性(关键、主要、次要、警告)对拓扑故障进行分类,同时一个LLM作为评判的元评估器识别复杂的、上下文感知的设计缺陷,这些缺陷绕过了标准的基于规则的检查器。最终,这项工作展示了目标检索与确定性和语义验证相结合如何将自然语言转化为结构可行的、原理图就绪的硬件和安全电路原型。我们的代码和数据公开在 https://github.com/Khandakar227/CircuitLM。

English abstract

Generating accurate circuit schematics from high-level natural language descriptions remains a persistent challenge in electronic design automation (EDA), as large language models (LLMs) frequently hallucinate components, violate strict physical constraints, and produce non-machine-readable outputs. To address this, we present CircuitLM, a multi-agent pipeline that translates user prompts into structured, visually interpretable $\texttt{CircuitJSON}$ schematics. The framework mitigates hallucination and ensures physical viability by grounding generation in a curated, embedding-powered component knowledge base through five sequential stages: (i) component identification, (ii) canonical pinout retrieval, (iii) chain-of-thought reasoning, (iv) JSON schematic synthesis, and (v) interactive force-directed visualization. We evaluate the system on a dataset of 100 unique circuit-design prompts using five state-of-the-art LLMs. To systematically assess performance, we deploy a rigorous dual-layered evaluation methodology: a deterministic Electrical Rule Checking (ERC) engine categorizes topological faults by strict severity (Critical, Major, Minor, Warning), while an LLM-as-a-judge meta-evaluator identifies complex, context-aware design flaws that bypass standard rule-based checkers. Ultimately, this work demonstrates how targeted retrieval combined with deterministic and semantic verification can bridge natural language to structurally viable, schematic-ready hardware and safe circuit prototyping. Our code and data are publicly available at https://github.com/Khandakar227/CircuitLM.

2512.20780 2026-05-28 cs.CL cs.CY

Large Language Models Approach Expert Pedagogical Quality in Math Tutoring but Differ in Instructional and Linguistic Profiles

大型语言模型在数学辅导中接近专家教学质量,但在教学和语言特征上存在差异

Ramatu Oiza Abdulsalam, Segun Aroyehun

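Affiliations: African University of Science and Technology; University of Konstanz

AI summary: Analyzes a dataset of math tutoring dialogues to compare the pedagogical quality of expert tutors, novice tutors, and seven large language models, finding that LLMs approach expert level on average but differ systematically in instructional strategies and linguistic characteristics.

AI-generated Chinese abstract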

最近的工作探索了使用大型语言模型(LLMs)生成数学辅导回应,但尚不清楚其教学行为与人类专家实践的接近程度。我们分析了一个数学补救对话数据集,其中专家教师、新手教师和七种不同规模的大型语言模型(包括开放权重和商业模型)对相同的学生错误做出回应。我们检查了教学策略和辅导回应的语言特征,包括吸收(重述和转述)、追问准确性和推理、词汇多样性、可读性、礼貌性和能动性。我们发现专家教师产生的回应质量高于新手教师,并且较大的LLMs通常比较小的模型获得更高的教学质量评分,平均接近专家表现。然而,LLMs在教学特征上表现出系统性差异:它们较少使用专家教师特有的讨论策略,同时生成更长、词汇更丰富、更礼貌的回应。回归分析表明,追问准确性和推理、重述和转述以及词汇多样性与感知教学质量正相关,而更高水平的能动性和礼貌性语言则负相关。这些发现强调了在评估人类教师和智能辅导系统的辅导回应时分析教学策略和语言特征的重要性。

English abstract

Recent work has explored the use of large language models (LLMs) to generate tutoring responses in mathematics, yet it remains unclear how closely their instructional behavior aligns with expert human practice. We analyze a dataset of math remediation dialogues in which expert tutors, novice tutors, and seven LLMs of varying sizes, comprising both open-weight and commercial models, respond to the same student errors. We examine instructional strategies and linguistic characteristics of tutoring responses, including uptake (restating and revoicing), pressing for accuracy and reasoning, lexical diversity, readability, politeness, and agency. We find that expert tutors produce higher-quality responses than novices, and that larger LLMs generally receive higher pedagogical quality ratings than smaller models, approaching expert performance on average. However, LLMs exhibit systematic differences in their instructional profiles: they underuse discursive strategies characteristic of expert tutors while generating longer, more lexically diverse, and more polite responses. Regression analyses show that pressing for accuracy and reasoning, restating and revoicing, and lexical diversity are positively associated with perceived pedagogical quality, whereas higher levels of agentic and polite language are negatively associated. These findings highlight the importance of analyzing instructional strategies and linguistic characteristics when evaluating tutoring responses across human tutors and intelligent tutoring systems.

2502.08938 2026-05-28 cs.LG

Reevaluating Policy Gradient Methods for Imperfect-Information Games

重新评估不完美信息博弈的策略梯度方法

Max Rudolph, Nathan Lichtle, Sobhan Mohammadpour, Alexandre Bayen, J. Zico Kolter, Amy Zhang, Gabriele Farina, Eugene Vinitsky, Samuel Sokota

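Affiliations: University of Texas at Austin; University of California, Berkeley; Massachusetts Institute of Technology; Carnegie Mellon University; NYU Tandon School of Engineering

AI summary: By implementing exact exploitability computations for five large games, shows in a comparison that deep reinforcement learning algorithms based on fictitious play, double oracle, and counterfactual regret minimization fail to outperform generic policy gradient methods such as PPO.

Comments: International Conference on Learning Representations (ICLR) 2026

AI-generated Chinese abstract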

在过去十年中,受对抗性不完美信息博弈中朴素自我对弈深度强化学习(DRL)所谓失败的驱动,研究人员基于虚构博弈(FP)、双预言机(DO)和反事实遗憾最小化(CFR)开发了大量DRL算法。鉴于最近磁镜下降算法的结果,我们假设更简单的通用策略梯度方法(如PPO)与这些基于FP、DO和CFR的DRL方法相比具有竞争力或更优。为了验证这一假设,我们实现并发布了五个大型博弈的首次广泛可访问的精确可剥削性计算。利用这些博弈,我们进行了不完美信息博弈中DRL算法有史以来最大规模的可剥削性比较。在超过7000次训练运行中,我们发现基于FP、DO和CFR的方法未能超越通用策略梯度方法。代码可在https://github.com/nathanlct/IIG-RL-Benchmark 和 https://github.com/gabrfarina/exp-a-spiel 获取。

English abstract

In the past decade, motivated by the putative failure of naive self-play deep reinforcement learning (DRL) in adversarial imperfect-information games, researchers have developed numerous DRL algorithms based on fictitious play (FP), double oracle (DO), and counterfactual regret minimization (CFR). In light of recent results of the magnetic mirror descent algorithm, we hypothesize that simpler generic policy gradient methods like PPO are competitive with or superior to these FP-, DO-, and CFR-based DRL approaches. To facilitate the resolution of this hypothesis, we implement and release the first broadly accessible exact exploitability computations for five large games. Using these games, we conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games. Over 7000 training runs, we find that FP-, DO-, and CFR-based approaches fail to outperform generic policy gradient methods. Code is available at https://github.com/nathanlct/IIG-RL-Benchmark and https://github.com/gabrfarina/exp-a-spiel.
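
Exploitability, the metric used for the comparison above, is easiest to see in a two-player zero-sum matrix game. The sketch below is a toy normal-form illustration, not the released computations for the five large games; it assumes the common convention of averaging the two players' best-response gains, and the function name and example game are hypothetical.

```python
def exploitability(payoff, x, y):
    """Average best-response gain against the profile (x, y).

    payoff[i][j] is the row player's payoff; the column player receives its negative.
    """
    rows, cols = len(payoff), len(payoff[0])
    # Row player's best-response value against the column strategy y.
    br_row = max(sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows))
    # Column player's best response minimizes the row player's payoff against x.
    br_col = min(sum(x[i] * payoff[i][j] for i in range(rows)) for j in range(cols))
    return (br_row - br_col) / 2


# Rock-paper-scissors payoffs for the row player.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1, 0, 0]

print(exploitability(rps, uniform, uniform))          # equilibrium profile
print(exploitability(rps, always_rock, always_rock))  # both players are exploitable
```

A value of zero means neither player can gain by deviating, which is why lower exploitability indicates a strategy closer to equilibrium.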

2603.14864 2026-05-28 cs.CL

Shopping Companion: Benchmarking and Training LLM Agents for Long-Horizon Preference-Grounded E-Commerce Tasks

购物助手:面向长期偏好引导的电子商务任务的LLM智能体基准测试与训练

Zijian Yu, Kejun Xiao, Huaipeng Zhao, Tao Luo, Xiaoyi Zeng

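Affiliation: Alibaba International Digital Commercial Group

AI summary: To address the lack of benchmarks for long-term preference-aware shopping tasks and of fine-grained training supervision in e-commerce, proposes the Shopping Companion Bench benchmark and an annotation-free, tool-wise reward method that improves LLM agents' preference capture and task performance.

AI-generated Chinese abstract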

在电子商务中,LLM智能体在推荐、预算管理和捆绑销售等购物任务中展现出潜力,其中从长期对话中准确捕捉用户偏好至关重要。然而,进展受到两个关键挑战的限制:(1)缺乏评估长期偏好感知购物任务的基准,(2)缺乏用于购物智能体训练的细粒度监督。为了填补基准空白,我们引入了Shopping Companion Bench,这是一个新颖的基准,包含两个需要跨会话偏好记忆的购物任务,基于超过120万真实商品的产品池。我们的分析进一步指出了该基准上失败的两个主要来源:偏好幻觉导致的级联错误,以及未能充分验证产品属性是否符合用户需求。为了解决这些失败模式,我们设计了免标注的、工具级奖励,为每次工具调用提供过程监督,从而缓解了长期任务中的奖励稀疏问题。实验结果表明,即使是GPT-5等最先进模型,成功率也低于70%,凸显了我们基准的难度。值得注意的是,我们微调的轻量级4B模型在偏好捕获和任务性能上均持续优于强基线,表明我们奖励设计的有效性。

English abstract

In e-commerce, LLM agents show promise for shopping tasks such as recommendations, budget management, and bundle deals, where accurately capturing user preferences from long-horizon conversations is critical. However, progress is limited by two key challenges: (1) the absence of benchmarks for evaluating long-term preference-aware shopping tasks, and (2) the lack of fine-grained supervision for shopping agent training. To fill the benchmark gap, we introduce Shopping Companion Bench, a novel benchmark comprising two shopping tasks that require cross-session preference memory, grounded in a product pool of over 1.2 million real-world items. Our analysis further identifies two major sources of failure on this benchmark: cascading errors caused by preference hallucination, and insufficient verification of product attributes against user requirements. To address these failure modes, we design annotation-free, tool-wise rewards that provide process supervision for each tool call, alleviating reward sparsity in long-horizon tasks. Experimental results demonstrate that even state-of-the-art models such as GPT-5 achieve success rates below 70%, highlighting the difficulty of our benchmark. Notably, our fine-tuned lightweight 4B model consistently outperforms strong baselines in both preference capture and task performance, suggesting the effectiveness of our reward design.

2603.14773 2026-05-28 cs.LG cs.AI

HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

HO-SFL: 混合阶分割联邦学习,无反向传播客户端与维度无关聚合

Qiyuan Chen, Xian Wu, Yi Wang, Xianhao Chen

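Affiliation: Department of Electrical and Computer Engineering, The University of Hong Kong, Hong Kong SAR, China

AI summary: Proposes the HO-SFL framework, which reformulates split learning within a Lagrangian framework so that the server performs first-order updates while clients perform zeroth-order optimization, yielding backpropagation-free clients and dimension-free aggregation; the theory shows a convergence rate comparable to first-order methods, and experiments confirm significantly lower communication and memory costs.

Comments: Accepted to ICML 2026

AI-generated Chinese abstract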

在边缘设备上微调大模型受到标准框架(如联邦学习和分割学习)中内存密集型的反向传播(BP)的严重阻碍。虽然用零阶优化替代BP可以显著减少内存占用,但通常会导致收敛速度严重下降。为了解决这一困境,我们提出了混合阶分割联邦学习(HO-SFL)。通过在拉格朗日框架内重构分割学习过程,HO-SFL解耦了优化景观:服务器执行精确的一阶更新(即BP),而客户端进行内存高效的零阶优化。这种混合设计不仅消除了客户端BP的需求,还实现了维度无关的模型聚合,大幅降低了通信成本。关键的是,我们提供了理论收敛分析,证明HO-SFL缓解了零阶优化的维度依赖收敛放缓,实现了与一阶方法相当的收敛速度。在视觉和语言模态任务上的大量实验验证了HO-SFL在实现与一阶基线相当的收敛速度的同时,显著降低了通信成本和客户端内存占用。

English abstract

Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can significantly reduce memory footprints, it typically suffers from prohibitively degraded convergence speed. To resolve this dilemma, we propose Hybrid-Order Split Federated Learning (HO-SFL). By reformulating the split learning process within a Lagrangian framework, HO-SFL decouples the optimization landscape: The server performs precise first-order updates (i.e., BP), whereas clients conduct memory-efficient zeroth-order optimization. This hybrid design not only eliminates the need for client-side BP but also enables dimension-free model aggregation, drastically lowering communication costs. Crucially, we provide a theoretical convergence analysis, demonstrating that HO-SFL mitigates the dimension-dependent convergence slowdown of zeroth-order optimization, achieving a convergence rate comparable to first-order methods. Extensive experiments on tasks across vision and language modalities validate that HO-SFL achieves convergence speeds comparable to first-order baselines while significantly reducing communication costs and client memory footprints.
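
To make the idea of backpropagation-free clients concrete, the sketch below shows a generic two-point zeroth-order estimator in which a client reports only a scalar and a shared random seed. This is a standard construction and an assumption about how such schemes can work, not HO-SFL's actual algorithm; all function names and parameter values are hypothetical.

```python
import random


def direction(seed, dim):
    """Regenerate the same Gaussian perturbation direction from a shared seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def zo_scalar(loss, params, seed, mu=1e-3):
    """Two forward passes yield one scalar: the finite-difference slope along the direction."""
    u = direction(seed, len(params))
    plus = loss([p + mu * ui for p, ui in zip(params, u)])
    minus = loss([p - mu * ui for p, ui in zip(params, u)])
    return (plus - minus) / (2 * mu)


def apply_update(params, seed, scalar, lr):
    """Anyone holding (seed, scalar) can rebuild the step without receiving a gradient vector."""
    u = direction(seed, len(params))
    return [p - lr * scalar * ui for p, ui in zip(params, u)]


def squared_norm(params):
    """Toy loss used only for the demonstration."""
    return sum(p * p for p in params)


params = [1.0, -2.0, 0.5]
scalar = zo_scalar(squared_norm, params, seed=7)
updated = apply_update(params, seed=7, scalar=scalar, lr=0.01)
print(squared_norm(params), squared_norm(updated))
```

The message size here is independent of the parameter count, which is one way to read "dimension-free"; the price is a noisier update direction than a true gradient.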

2603.14515 2026-05-28 cs.LG physics.chem-ph physics.comp-ph quant-ph

Excited Pfaffians: Generalized Neural Wave Functions Across Structure and State

激发Pfaffians:跨结构和状态的广义神经波函数

Nicholas Gao, Till Grutschus, Frank Noé, Stephan Günnemann

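Affiliations: Department of Computer Science & Munich Data Science Institute, Technical University of Munich; Free University of Berlin; Rice University; Microsoft Research AI4Science

AI summary: Proposes Multi-State Importance Sampling (MSIS) and the Excited Pfaffians architecture, which estimate multi-state overlaps efficiently with a nearly constant sample size and represent multiple excited states in a single neural network, enabling faster training and the modeling of more states.

AI-generated Chinese abstract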

变分蒙特卡洛(VMC)中的神经网络波函数在精确表示基态和激发态方面取得了巨大成功。然而,在状态重叠中实现足够的数值精度需要增加蒙特卡洛样本数量,从而增加计算成本,且随状态数增加。我们提出了一种近乎恒定样本量的方法——多态重要性采样(MSIS),利用来自所有状态的样本来估计成对重叠。为了高效评估所有样本的所有状态,我们引入了激发Pfaffians。受Hartree-Fock启发,该架构在单个神经网络内表示多个状态。激发Pfaffians还作为广义波函数,允许单个模型表示多态势能面。在碳二聚体上,我们匹配了$O(N_s^4)$标度的自然激发态,同时训练速度提高了$>200$倍,并建模了多50%的状态。我们有利的标度使我们能够首次使用神经网络找到铍原子的所有不同能级。最后,我们证明了单个波函数可以表示不同分子中的激发态。

English abstract

Neural-network wave functions in Variational Monte Carlo (VMC) have achieved great success in accurately representing both ground and excited states. However, achieving sufficient numerical accuracy in state overlaps requires increasing the number of Monte Carlo samples, and consequently the computational cost, with the number of states. We present a nearly constant sample-size approach, Multi-State Importance Sampling (MSIS), that leverages samples from all states to estimate pairwise overlap. To efficiently evaluate all states for all samples, we introduce Excited Pfaffians. Inspired by Hartree-Fock, this architecture represents many states within a single neural network. Excited Pfaffians also serve as generalized wave functions, allowing a single model to represent multi-state potential energy surfaces. On the carbon dimer, we match the $O(N_s^4)$-scaling natural excited states while training $>200\times$ faster and modeling 50% more states. Our favorable scaling enables us to be the first to use neural networks to find all distinct energy levels of the beryllium atom. Finally, we demonstrate that a single wave function can represent excited states across various molecules.

2602.20497 2026-05-28 cs.CV cs.AI

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

LESA: 可学习的阶段感知预测器用于扩散模型加速

Peiliang Cai, Jiacheng Liu, Haowen Xu, Xinyu Wang, Chang Zou, Linfeng Zhang

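Affiliation: Shanghai Jiao Tong University

AI summary: To address the high computational cost of diffusion models and the difficulty existing caching strategies have in adapting to the stage-dependent dynamics of the denoising process, proposes a learnable stage-aware predictor framework based on two-stage training that uses a KAN to learn temporal feature mappings and adopts a multi-stage, multi-expert architecture, achieving significant acceleration while maintaining high-quality generation.

Comments: Accepted to CVPR 2026

AI-generated Chinese abstract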

扩散模型在图像和视频生成任务中取得了显著成功。然而,扩散Transformer(DiTs)的高计算需求对其实际部署构成了重大挑战。虽然特征缓存是一种有前景的加速策略,但现有基于简单重用或无训练预测的方法难以适应扩散过程中复杂的、阶段相关的动态变化,常常导致质量下降,并无法保持与标准去噪过程的一致性。为解决这一问题,我们提出了一种基于两阶段训练的可学习阶段感知(LESA)预测器框架。我们的方法利用Kolmogorov-Arnold网络(KAN)从数据中准确学习时序特征映射。我们进一步引入了一种多阶段、多专家架构,为不同噪声水平阶段分配专门的预测器,从而实现更精确和鲁棒的特征预测。大量实验表明,我们的方法在保持高保真生成的同时实现了显著加速。实验显示,在FLUX.1-dev上实现了5.00倍加速,质量下降极小(1.0%);在Qwen-Image上实现了6.25倍加速,质量比之前的最优方法(TaylorSeer)提升20.2%;在HunyuanVideo上实现了5.00倍加速,PSNR比TaylorSeer提升24.7%。在文本到图像和文本到视频合成任务上的最先进性能验证了我们基于训练框架在不同模型上的有效性和泛化能力。我们的代码可在https://github.com/caipeiliang2004/LESA获取。

English abstract

Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers (DiTs) pose a significant challenge to their practical deployment. While feature caching is a promising acceleration strategy, existing methods based on simple reusing or training-free forecasting struggle to adapt to the complex, stage-dependent dynamics of the diffusion process, often resulting in quality degradation and failing to maintain consistency with the standard denoising process. To address this, we propose a LEarnable Stage-Aware (LESA) predictor framework based on two-stage training. Our approach leverages a Kolmogorov-Arnold Network (KAN) to accurately learn temporal feature mappings from data. We further introduce a multi-stage, multi-expert architecture that assigns specialized predictors to different noise-level stages, enabling more precise and robust feature forecasting. Extensive experiments show our method achieves significant acceleration while maintaining high-fidelity generation. Experiments demonstrate 5.00x acceleration on FLUX.1-dev with minimal quality degradation (1.0% drop), 6.25x speedup on Qwen-Image with a 20.2% quality improvement over the previous SOTA (TaylorSeer), and 5.00x acceleration on HunyuanVideo with a 24.7% PSNR improvement over TaylorSeer. State-of-the-art performance on both text-to-image and text-to-video synthesis validates the effectiveness and generalization capability of our training-based framework across different models. Our code is available at https://github.com/caipeiliang2004/LESA.

2602.18982 2026-05-28 cs.LG q-bio.PE

Conditionally Site-Independent Neural Evolution of Antibody Sequences

抗体序列的条件性位点无关神经进化

Stephen Zhewen Lu, Aakarsh Vermani, Kohei Sanno, Jiarui Lu, Frederick A Matsen, Milind Jagota, Yun S. Song

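Affiliations: University of California, Berkeley; Columbia University; Mila - Québec AI Institute; Fred Hutchinson Cancer Research Center; University of Washington; Howard Hughes Medical Institute

AI summary: Proposes CoSiNE, a continuous-time Markov chain parameterized by a deep neural network that bridges phylogenetic models and deep learning to model antibody sequence evolution; it outperforms existing language models in zero-shot variant effect prediction, and the paper introduces Guided Gillespie sampling to optimize antibody affinity.

Comments: 28 pages, 15 figures. Accepted as a poster at ICML 2026

AI-generated Chinese abstract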

常见的抗体工程深度学习方法侧重于建模序列的边缘分布。然而,这些方法将序列视为独立样本,忽略了亲和力成熟作为抗体探索潜在适应度景观的进化过程中丰富且很大程度上未开发的信息来源。相比之下,经典的系统发育模型明确表示进化动力学,但缺乏捕捉复杂上位相互作用的表达能力。我们通过CoSiNE(一种由深度神经网络参数化的连续时间马尔可夫链)弥合了这一差距。数学上,我们证明CoSiNE提供了难以处理的顺序点突变过程的一阶近似,以分支长度二次方的误差界捕捉上位效应。实验上,CoSiNE通过明确区分选择与上下文依赖的体细胞超突变,在零样本变异效应预测中优于最先进的语言模型。最后,我们引入了引导吉莱斯皮(Guided Gillespie),一种在推理时引导CoSiNE的分类器引导采样方案,从而实现对特定抗原的抗体结合亲和力的高效优化。

English abstract

Common deep learning approaches for antibody engineering focus on modeling the marginal distribution of sequences. By treating sequences as independent samples, however, these methods overlook affinity maturation as a rich and largely untapped source of information about the evolutionary process by which antibodies explore the underlying fitness landscape. In contrast, classical phylogenetic models explicitly represent evolutionary dynamics but lack the expressivity to capture complex epistatic interactions. We bridge this gap with CoSiNE, a continuous-time Markov chain parameterized by a deep neural network. Mathematically, we prove that CoSiNE provides a first-order approximation to the intractable sequential point mutation process, capturing epistatic effects with an error bound that is quadratic in branch length. Empirically, CoSiNE outperforms state-of-the-art language models in zero-shot variant effect prediction by explicitly disentangling selection from context-dependent somatic hypermutation. Finally, we introduce Guided Gillespie, a classifier-guided sampling scheme that steers CoSiNE at inference time, enabling efficient optimization of antibody binding affinity toward specific antigens.
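
The sampling scheme named in the abstract builds on the classical Gillespie algorithm for continuous-time Markov chains. The sketch below shows only that unguided baseline with fixed per-site rates, as an assumption-laden illustration: in CoSiNE the rates would come from a neural network and the guided variant adds classifier steering, neither of which is modeled here. The function name, alphabet, and rates are hypothetical.

```python
import random


def gillespie(sequence, rates, alphabet, branch_length, seed=0):
    """Simulate point mutations along one branch with fixed per-site rates.

    rates[i] is the total mutation rate at site i (a stand-in for model output).
    """
    rng = random.Random(seed)
    seq = list(sequence)
    events = []
    total = sum(rates)
    t = 0.0
    while total > 0:
        t += rng.expovariate(total)  # exponential waiting time to the next mutation
        if t > branch_length:
            break
        # Pick the mutating site in proportion to its rate.
        site = rng.choices(range(len(seq)), weights=rates)[0]
        # Pick a replacement residue uniformly among the other letters.
        new = rng.choice([a for a in alphabet if a != seq[site]])
        seq[site] = new
        events.append((t, site, new))
    return "".join(seq), events


final, events = gillespie("ACDE", [1.0, 0.5, 0.0, 2.0], "ACDE", branch_length=1.0, seed=1)
print(final, len(events))
```

With context-dependent rates, the rate vector and its total would be recomputed after every accepted mutation instead of being held fixed as they are here.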

2603.13003 2026-05-28 cs.RO cs.SY eess.SY

From Passive Monitoring to Active Defence: Resilient Control of Manipulators Under Cyberattacks

从被动监测到主动防御:网络攻击下机械臂的弹性控制

Gabriele Gualandi, Alessandro V. Papadopoulos

Affiliation: Department of Computer Science and Engineering, Mälardalen University

AI summary: For resilient control of redundant manipulators under false data injection attacks (FDIAs), proposes an active control-level defence based on an anomaly score that attenuates the control input through a monotone function, significantly reducing attack-induced end-effector deviation while preserving nominal performance in the absence of attacks.

Comments: v2: Accepted at ICRA 2026. Corrected minor typos, grammatical errors, and notation inconsistencies. Corrected the attacker's PD law in Sec. III-C: removed the feedforward acceleration term, viable only when the attacker assumes sufficient tracking precision; the active defence prevents this in our experiments, so only PD terms are used

AI-generated Chinese abstract

网络物理机器人系统容易受到虚假数据注入攻击(FDIAs),其中攻击者在规避基于残差的被动异常检测器(如卡方检验)的同时破坏传感器信号。这种隐蔽攻击可以在不触发警报的情况下引起显著的末端执行器偏差。本文研究了冗余机械臂对隐蔽FDIAs的弹性,并将架构从被动监测推进到主动防御。我们建立了一个闭环模型,包括反馈线性化机械臂、稳态卡尔曼滤波器和基于卡方的异常检测器。在此被动监测层的基础上,我们提出了一种主动控制级防御,通过一个新颖的驱动投影、无测量状态预测器生成的异常分数的单调函数来衰减控制输入。所提出的设计在标称驱动损失上提供了概率保证,并保持了闭环稳定性。从攻击者角度,我们推导了一个凸QCQP来计算一步最优隐蔽攻击。在六自由度平面机械臂上的仿真表明,所提出的防御显著减少了攻击引起的末端执行器偏差,同时在无攻击时保持了标称任务性能。

English abstract

Cyber-physical robotic systems are vulnerable to false data injection attacks (FDIAs), in which an adversary corrupts sensor signals while evading residual-based passive anomaly detectors such as the chi-squared test. Such stealthy attacks can induce substantial end-effector deviations without triggering alarms. This paper studies the resilience of redundant manipulators to stealthy FDIAs and advances the architecture from passive monitoring to active defence. We formulate a closed-loop model comprising a feedback-linearized manipulator, a steady-state Kalman filter, and a chi-squared-based anomaly detector. Building on this passive monitoring layer, we propose an active control-level defence that attenuates the control input through a monotone function of an anomaly score generated by a novel actuation-projected, measurement-free state predictor. The proposed design provides probabilistic guarantees on nominal actuation loss and preserves closed-loop stability. From the attacker perspective, we derive a convex QCQP for computing one-step optimal stealthy attacks. Simulations on a 6-DOF planar manipulator show that the proposed defence significantly reduces attack-induced end-effector deviation while preserving nominal task performance in the absence of attacks.
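
The two layers described above, a passive chi-squared test and an active attenuation of the control input, can be sketched in a few lines. This is a generic illustration under stated assumptions: a diagonal innovation covariance, a hypothetical attenuation law 1 / (1 + score), and a single score reused for both layers, whereas the paper derives its score from a separate measurement-free predictor. All names are hypothetical.

```python
def chi_squared_score(residual, variances):
    """g = r^T S^{-1} r for a diagonal innovation covariance S."""
    return sum(r * r / s for r, s in zip(residual, variances))


def alarm(score, threshold):
    """Passive layer: raise an alarm only when the score exceeds the threshold."""
    return score > threshold


def attenuation(score, scale=1.0):
    """Hypothetical monotone law: equals 1 at score 0 and decreases as the score grows."""
    return 1.0 / (1.0 + score / scale)


def defended_input(control, score):
    """Active layer: scale the whole control vector by the attenuation factor."""
    factor = attenuation(score)
    return [factor * u for u in control]


score = chi_squared_score([1.0, 2.0], [1.0, 4.0])
# 5.991 is the 95% quantile of the chi-squared distribution with 2 degrees of freedom.
print(score, alarm(score, threshold=5.991))
print(defended_input([2.0, -4.0], score=1.0))
```

The point of the example is that a stealthy attack can keep the score below the alarm threshold while the attenuation still responds continuously to it.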

2603.12344 2026-05-28 cs.LG

Can Decision Trees Teach Large Language Models? Distilling Verbalized Knowledge for Molecular Property Prediction

决策树能否教会大语言模型?为分子性质预测提炼语言化知识

Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam

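Affiliations: University of Notre Dame; IBM Research

AI summary: Proposes TreeKD, which verbalizes the knowledge of specialist models based on decision trees and random forests and incorporates it into prompts for training large language models, significantly improving their performance on molecular property prediction tasks.

AI-generated Chinese abstract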

分子性质预测(MPP)是药物发现中的一个基本问题,近年来受到越来越多的关注。大语言模型(LLMs)以其跨领域的惊人能力而闻名,有望成为MPP的通用模型。然而,它们目前的性能仍低于实际应用所需的阈值。为了弥补这一差距,我们提出了TreeKD,用于将基于树的专业模型的知识提炼到LLMs中,以补充LLMs的内部知识并提高其预测准确性。对于每个性质,我们使用输入分子中4万个功能基团衍生的特征训练一个专业决策树。然后,将决策树学习到的预测规则(编码了其知识)语言化,并纳入用于训练LLMs的提示中。此外,通过用随机森林替换单个决策树,我们引入了一种称为规则一致性的测试时缩放技术,该技术聚合了从不同规则构建的不同提示生成的预测。使用两个LLM(Gemma-2-2B和Granite-3.3-2B)在包含22个预测任务的TDC基准上进行的大量评估表明,我们的方法显著提高了LLMs的性能,推动了MPP通用模型的发展。

English abstract

Molecular Property Prediction (MPP) is a fundamental problem in drug discovery that has recently attracted growing attention. Large Language Models (LLMs), known for their impressive proficiency across domains, show promise as generalist models for MPP. However, their current performance remains below the threshold needed for practical adoption. To bridge this gap, we propose TreeKD for distilling the knowledge of tree-based specialist models into LLMs to complement the internal knowledge of LLMs and improve their predictive accuracy. For each property, we train a specialist decision tree using features derived from 40K functional groups in the input molecules. Then, the predictive rule learned by the decision tree, which encodes its knowledge, is verbalized and incorporated into the prompts for training LLMs. In addition, by replacing a single decision tree with a Random Forest, we introduce a test-time scaling technique called rule-consistency, which aggregates predictions generated from different prompts constructed with different rules. An extensive evaluation with two LLMs, Gemma-2-2B and Granite-3.3-2B, on the TDC benchmark with 22 prediction tasks shows that our method substantially enhances the performance of LLMs, advancing the development of generalist models for MPP.
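
The two steps described above, verbalizing a tree's rule and aggregating predictions across rules, can be sketched as follows. The tree encoding, the sentence template, and the use of a majority vote for rule-consistency are all assumptions made for illustration; the abstract only states that rules are verbalized into prompts and that predictions from different rules are aggregated.

```python
from collections import Counter


def verbalize(tree, features):
    """Follow one molecule's path through the tree and describe it in words."""
    clauses = []
    node = tree
    while "label" not in node:
        present = features.get(node["feature"], 0) > 0
        clauses.append(f"{node['feature']} is {'present' if present else 'absent'}")
        node = node["yes"] if present else node["no"]
    return "If " + " and ".join(clauses) + f", then the predicted label is {node['label']}."


def rule_consistency(predictions):
    """Aggregate answers produced under different rules by majority vote (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical tree over functional-group counts.
tree = {
    "feature": "hydroxyl",
    "yes": {"label": "soluble"},
    "no": {
        "feature": "benzene ring",
        "yes": {"label": "insoluble"},
        "no": {"label": "soluble"},
    },
}

print(verbalize(tree, {"hydroxyl": 0, "benzene ring": 2}))
print(rule_consistency(["insoluble", "soluble", "insoluble"]))
```

In a random-forest setting, each tree would contribute one verbalized rule and therefore one prompt, and the vote would run over the model's answers to those prompts.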

2603.10961 2026-05-28 cs.LG

Bio-Inspired Self-Supervised Learning for Wrist-worn Accelerometer Data

生物启发的自监督学习用于腕戴式加速度计数据

Prithviraj Tarale, Kiet Chu, Abhishek Varghese, Kai-Chun Liu, Maxwell A. Xu, Mohit Iyyer, Sunghoon I. Lee

Affiliations: College of Information and Computer Sciences, University of Massachusetts, Amherst, United States; Google Health, Seattle, United States; Department of Computer Science, University of Maryland, College Park, United States; Stevens Institute of Technology, Hoboken, United States

AI summary: Proposes a tokenization strategy grounded in the submovement theory of motor control and pretrains a Transformer encoder via masked reconstruction, outperforming existing self-supervised methods on six HAR benchmarks.

AI-generated Chinese abstract

可穿戴加速度计能够实现大规模健康监测,但学习鲁棒的人体活动表示受到标记数据稀缺的限制。虽然自监督学习提供了一种解决方案,但现有方法将传感器流视为非结构化时间序列,忽略了人体运动的潜在生物结构,我们认为这一因素对于有效的人类活动识别(HAR)至关重要。我们引入了一种新颖的令牌化策略,该策略基于运动控制的子单元理论,该理论认为连续的手腕运动由称为子单元的基本基函数组成。我们将令牌定义为运动片段,这是一个计算上可处理的运动单元,由有限序列的子单元组成。通过掩码重建这些令牌来预训练Transformer编码器,我们将学习焦点从局部波形形态转移到高层次的结构和时间组织。在NHANES语料库(约28k小时;11k参与者)上预训练后,我们的表示在六个受试者分离的HAR基准上优于强大的可穿戴SSL基线。代码和预训练权重可在https://prithvitarale.github.io/biopm-site/获取。

English abstract

Wearable accelerometers enable large-scale health monitoring, yet learning robust human-activity representations has been constrained by scarce labeled data. While self-supervised learning offers a remedy, existing methods treat sensor streams as unstructured time series, overlooking the underlying biological structure of human movement, a factor we argue is critical for effective Human Activity Recognition (HAR). We introduce a novel tokenization strategy grounded in the submovement theory of motor control, which posits that continuous wrist motion is composed of elementary basis functions called submovements. We define our token as the movement segment, a computationally tractable unit of motion composed of a finite sequence of submovements. By pretraining a Transformer encoder via masked reconstruction of these tokens, we shift the learning focus from local waveform morphology to high-level structural and temporal organization. Pretrained on the NHANES corpus (approximately 28k hours; 11k participants), our representations outperform strong wearable SSL baselines across six subject-disjoint HAR benchmarks. Code and pretrained weights are available at https://prithvitarale.github.io/biopm-site/.

2603.09882 2026-05-28 cs.RO cs.AI

Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning

杂乱场景中通过动力学感知策略学习涌现的外在灵巧性

Yixin Zheng, Jiangran Lyu, Yifan Zhang, Jiayi Chen, Mi Yan, Yuntian Deng, Xuesong Shi, Xiaoguang Zhao, Yizhou Wang, Zhizheng Zhang, He Wang

发表机构 * Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) Beijing Academy of Artificial Intelligence(北京人工智能研究院) Galbot Peking University(北京大学) Shanghai Jiao Tong University(上海交通大学)

AI总结 提出动力学感知策略学习框架,通过显式世界建模学习接触诱导物体动力学表示并用于强化学习,使杂乱场景中的外在灵巧性无需手工启发式或复杂奖励塑造即可涌现。

Comments Accepted to Robotics: Science and Systems (RSS) 2026. Project page: https://pku-epic.github.io/DAPL/

详情
AI中文摘要

外在灵巧性利用环境接触来克服抓取操作的局限性。然而,在杂乱场景中实现这种灵巧性仍然具有挑战性且未被充分探索,因为它需要选择性地利用多个相互作用的物体之间的接触,而这些物体具有内在耦合的动力学。现有方法缺乏对这种复杂动力学的显式建模,因此在杂乱环境中的非抓取操作方面表现不足,这反过来限制了它们在现实环境中的实际应用。在本文中,我们介绍了一种动力学感知策略学习(DAPL)框架,该框架可以利用在杂乱环境中学习到的接触诱导物体动力学的表示来促进策略学习。这种表示通过显式世界建模学习,并用于条件化强化学习,使得外在灵巧性无需手工制作的接触启发式或复杂的奖励塑造即可涌现。我们在仿真和现实世界中评估了我们的方法。在具有不同密度的未见过的仿真杂乱场景中,我们的方法在成功率上比抓取操作、人类遥操作和基于先前表示的策略高出25%以上。在10个杂乱场景中,现实世界的成功率达到了约50%,而实际杂货部署进一步证明了稳健的仿真到现实迁移和适用性。

英文摘要

Extrinsic dexterity leverages environmental contact to overcome the limitations of prehensile manipulation. However, achieving such dexterity in cluttered scenes remains challenging and underexplored, as it requires selectively exploiting contact among multiple interacting objects with inherently coupled dynamics. Existing approaches lack explicit modeling of such complex dynamics and therefore fall short in non-prehensile manipulation in cluttered environments, which in turn limits their practical applicability in real-world environments. In this paper, we introduce a Dynamics-Aware Policy Learning (DAPL) framework that can facilitate policy learning with a learned representation of contact-induced object dynamics in cluttered environments. This representation is learned through explicit world modeling and used to condition reinforcement learning, enabling extrinsic dexterity to emerge without hand-crafted contact heuristics or complex reward shaping. We evaluate our approach in both simulation and the real world. Our method outperforms prehensile manipulation, human teleoperation, and prior representation-based policies by over 25% in success rate on unseen simulated cluttered scenes with varying densities. The real-world success rate reaches around 50% across 10 cluttered scenes, while a practical grocery deployment further demonstrates robust sim-to-real transfer and applicability.

2603.02702 2026-05-28 cs.AI cs.LG

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

FinTexTS: 基于语义和多层级配对的金融文本-时间序列数据集

Jaehoon Lee, Suhwan Park, Taeyoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae Kim, Sungdong Yoo, Soonyoung Lee, Yongjae Lee, Wonbin Ahn

发表机构 * LG AI Research(LG人工智能研究所) Ulsan National Institute of Science and Technology(蔚山科学技术院)

AI总结 提出基于语义和多层级配对的框架,从SEC文件和新闻中提取并匹配多层级文本信息,构建大规模文本配对的股票价格数据集FinTexTS,提升股价预测性能。

Comments 12 pages, KDD 2026, Datasets and Benchmarks Track

详情
AI中文摘要

金融领域涉及多种重要的时间序列问题。近年来,联合利用文本和数值信息的时间序列分析方法越来越受到关注。因此,人们做出了大量努力来构建金融领域中的文本配对时间序列数据集。然而,金融市场具有复杂的相互依赖性,一家公司的股票价格不仅受公司特定事件的影响,还受其他公司事件和更广泛的宏观经济因素的影响。现有的基于简单关键词匹配的文本与金融时间序列数据配对方法往往无法捕捉这种复杂关系。为了解决这一局限性,我们提出了一种基于语义和多层级的配对框架。具体来说,我们从SEC文件中提取目标公司的特定上下文,并应用基于嵌入的匹配机制,根据该上下文检索语义相关的新闻文章。此外,我们使用大语言模型(LLMs)将新闻文章分为四个层级(宏观层级、行业层级、相关公司层级和目标公司层级),实现新闻文章与目标公司的多层级配对。将该框架应用于公开可用的新闻数据集,我们构建了FinTexTS,这是一个新的大规模文本配对的股票价格数据集。在FinTexTS上的实验结果表明,我们的基于语义和多层级的配对策略在股价预测中是有效的。除了FinTexTS所依赖的公开新闻外,我们还表明,将我们的方法应用于专有但精心策划的新闻源,可以产生更高质量的配对数据,并提高股价预测性能。

英文摘要

The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing attention. Accordingly, numerous efforts have been made to construct text-paired time-series datasets in the financial domain. However, financial markets are characterized by complex interdependencies, in which a company's stock price is influenced not only by company-specific events but also by events in other companies and broader macroeconomic factors. Existing approaches that pair text with financial time-series data based on simple keyword matching often fail to capture such complex relationships. To address this limitation, we propose a semantic-based and multi-level pairing framework. Specifically, we extract company-specific context for the target company from SEC filings and apply an embedding-based matching mechanism to retrieve semantically relevant news articles based on this context. Furthermore, we classify news articles into four levels (macro-level, sector-level, related company-level, and target company-level) using large language models (LLMs), enabling multi-level pairing of news articles with the target company. Applying this framework to publicly-available news datasets, we construct FinTexTS, a new large-scale text-paired stock price dataset. Experimental results on FinTexTS demonstrate the effectiveness of our semantic-based and multi-level pairing strategy in stock price forecasting. In addition to publicly-available news underlying FinTexTS, we show that applying our method to proprietary yet carefully curated news sources leads to higher-quality paired data and improved stock price forecasting performance.
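The embedding-based matching step described above can be illustrated with a minimal sketch. It assumes the company context from SEC filings and each news article have already been embedded as vectors; the toy vectors, the article identifiers, the `retrieve_top_k` helper, and the use of cosine similarity are all illustrative assumptions, since the abstract does not name the embedding model or the similarity function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve_top_k(context_vec, news_vecs, k=2):
    """Return the ids of the k articles most similar to the company context.

    Hypothetical helper: ties are broken by article id for determinism.
    """
    scored = [(cosine(context_vec, vec), news_id)
              for news_id, vec in news_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [news_id for _, news_id in scored[:k]]

# Toy 3-dimensional "embeddings" (assumed, not from the paper).
context = [1.0, 0.0, 1.0]
news = {
    "chip_export_rules": [0.9, 0.1, 0.8],
    "celebrity_gossip": [0.0, 1.0, 0.0],
    "sector_earnings": [0.5, 0.5, 0.5],
}
top = retrieve_top_k(context, news, k=2)
```

Unlike keyword matching, this ranking can surface an article that never mentions the target company by name, provided its embedding lies close to the company's context vector.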

2603.08264 2026-05-28 cs.CV

Event-based Motion & Appearance Fusion for 6D Object Pose Tracking

基于事件的运动与外观融合的6D物体姿态跟踪

Zhichao Li, Chiara Bartolozzi, Lorenzo Natale, Arren Glover

发表机构 * Event-driven Perception for Robotics, Istituto Italiano di Tecnologia, Italy(事件驱动感知机器人实验室,意大利理工学院) Humanoid Sensing and Perception, Istituto Italiano di Tecnologia, Italy(人形传感与感知,意大利理工学院) University of Genoa, Genoa, Italy(热那亚大学,意大利)

AI总结 提出一种结合事件相机高时间分辨率优势的无学习方法,通过事件光流传播姿态并利用模板匹配校正,在高速运动物体上达到或超越现有算法性能。

详情
AI中文摘要

物体姿态跟踪是机器人在家庭和工业环境中执行任务的基本且必要的任务。最常用的传感器是RGB-D相机,但在高动态环境中,由于运动模糊和帧率限制,它们可能达到极限。事件相机具有高时间分辨率和低延迟等显著特性,使其成为高速物体姿态跟踪的理想视觉传感器。尽管如此,目前仅有少数工作涉及事件相机的6D姿态跟踪。在这项工作中,我们利用高时间分辨率的优势,提出了一种结合传播步骤与姿态校正策略的方法。具体而言,我们使用从事件光流中获得的6D物体速度进行姿态传播,然后利用基于模板的局部姿态校正模块进行姿态校正。我们的无学习方法与最先进的算法性能相当,并且在某些情况下对快速移动物体的表现更优。结果表明,在深度网络方法受限于低更新速率的高动态场景中,事件相机具有应用潜力。

英文摘要

Object pose tracking is a fundamental capability for robots performing tasks in home and industrial settings. The most commonly used sensors for this are RGB-D cameras, which can hit limitations in highly dynamic environments due to motion blur and frame-rate constraints. Event cameras have remarkable features such as high temporal resolution and low latency, which make them potentially ideal vision sensors for object pose tracking at high speed. Even so, there are still only a few works on 6D pose tracking with event cameras. In this work, we take advantage of the high temporal resolution and propose a method that fuses a propagation step with a pose correction strategy. Specifically, we use the 6D object velocity obtained from event-based optical flow for pose propagation, after which a template-based local pose correction module corrects the pose. Our learning-free method has comparable performance to state-of-the-art algorithms, and in some cases outperforms them for fast-moving objects. The results indicate the potential of event cameras in highly dynamic scenarios where deep network approaches are limited by low update rates.
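The propagation step can be sketched as integrating a 6D velocity over a short time interval. The example below is a minimal illustration, assuming the linear and angular velocities are expressed in the world frame and held constant over `dt`; the function names and this frame convention are assumptions, and the template-based correction module is not shown.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_from_angular_velocity(omega, dt):
    """Rodrigues' formula: rotation produced by constant angular velocity over dt."""
    rx, ry, rz = (w * dt for w in omega)
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-12:
        return identity
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    skew = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    skew_sq = matmul3(skew, skew)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * skew[i][j] + c * skew_sq[i][j]
             for j in range(3)] for i in range(3)]

def propagate_pose(rotation, position, linear_vel, angular_vel, dt):
    """Advance a pose (rotation matrix, position) by one step of length dt."""
    delta = rotation_from_angular_velocity(angular_vel, dt)
    new_rotation = matmul3(delta, rotation)  # world-frame rotation update (assumed)
    new_position = [p + v * dt for p, v in zip(position, linear_vel)]
    return new_rotation, new_position

R0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Quarter turn about z while translating 1 m along x over one second.
R1, p1 = propagate_pose(R0, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                        [0.0, 0.0, math.pi / 2], 1.0)
```

With an event camera the step `dt` can be very small, so integration error between corrections stays limited; any drift that does accumulate is what a correction stage would remove.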

2601.21309 2026-05-28 cs.LG

Transferable Graph Condensation from the Causal Perspective

从因果视角的可迁移图压缩

Huaming Du, Yijie Huang, Su Yao, Yiying Wang, Yueyang Zhou, Jingwen Yang, Jinshi Zhang, Han Ji, Yu Zhao, Guisong Liu, Hegui Zhang, Carl Yang, Gang Kou

发表机构 * Southwestern University of Finance and Economics(西南财经大学) Tsinghua University(清华大学) Ant Group(蚂蚁集团) Dongbei University of Finance and Economics(东北财经大学) Emory University(埃默里大学) Xiangjiang Laboratory(湘江实验室)

AI总结 提出基于因果不变性的可迁移图压缩方法TGCC,通过因果干预提取域不变特征并注入压缩图,实现跨任务和跨域场景下的有效压缩。

详情
AI中文摘要

图数据集的规模日益增大,显著提升了图表示学习方法的性能,但也带来了巨大的训练挑战。图数据集压缩技术应运而生,旨在将大规模数据集压缩为更小但信息丰富的数据集,同时保持相似的测试性能。然而,这些方法严格要求下游应用与原始数据集和任务匹配,在跨任务和跨域场景中往往失效。为解决这些挑战,我们提出了一种新颖的基于因果不变性的可迁移图数据集压缩方法,命名为TGCC,提供有效且可迁移的压缩数据集。具体而言,为保留域不变知识,我们首先通过因果干预从图的空间域提取域因果不变特征。然后,为充分捕捉原始图的结构和特征信息,我们执行增强压缩操作。最后,通过谱域增强对比学习,将因果不变特征注入压缩图,确保压缩图保留原始图的因果信息。在五个公开数据集和我们新构建的FinReport数据集上的实验结果表明,TGCC在跨任务和跨域复杂场景下相比现有方法提升高达13.41%,并在6个数据集中的5个上,在单一数据集和任务场景下达到了最先进性能。

英文摘要

The increasing scale of graph datasets has significantly improved the performance of graph representation learning methods, but it has also introduced substantial training challenges. Graph dataset condensation techniques have emerged to compress large datasets into smaller yet information-rich datasets, while maintaining similar test performance. However, these methods strictly require downstream applications to match the original dataset and task, which often fails in cross-task and cross-domain scenarios. To address these challenges, we propose a novel causal-invariance-based and transferable graph dataset condensation method, named TGCC, providing effective and transferable condensed datasets. Specifically, to preserve domain-invariant knowledge, we first extract domain causal-invariant features from the spatial domain of the graph using causal interventions. Then, to fully capture the structural and feature information of the original graph, we perform enhanced condensation operations. Finally, through spectral-domain enhanced contrastive learning, we inject the causal-invariant features into the condensed graph, ensuring that the compressed graph retains the causal information of the original graph. Experimental results on five public datasets and our novel FinReport dataset demonstrate that TGCC achieves up to a 13.41% improvement in cross-task and cross-domain complex scenarios compared to existing methods, and achieves state-of-the-art performance on 5 out of 6 datasets in the single dataset and task scenario.

2603.05642 2026-05-28 cs.RO cs.AI

Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search

基于3D场景图的开放世界交互式物体搜索的关系语义推理

Imen Mahdi, Matteo Cassinelli, Fabien Despinoy, Tim Welschehold, Abhinav Valada

发表机构 * University of Freiburg(弗赖堡大学) Toyota Motor Europe(丰田欧洲公司)

AI总结 提出SCOUT方法,通过从LLM蒸馏的关系探索启发式直接搜索3D场景图,实现高效开放世界交互式物体搜索,性能匹配LLM且计算高效。

详情
AI中文摘要

家庭环境中的开放世界交互式物体搜索需要理解物体与其周围环境之间的语义关系,以有效引导探索。先前的方法要么依赖视觉-语言嵌入相似性,这不能可靠地捕获任务相关的关系语义,要么依赖大型语言模型(LLM),这对于实时部署来说太慢且成本高昂。我们提出SCOUT(Scene Graph-Based Exploration with Learned Utility for Open-World Interactive Object Search,即面向开放世界交互式物体搜索的、带学习效用的场景图探索),这是一种新颖的方法,通过使用关系探索启发式(如房间-物体包含和物体-物体共现)为房间、前沿和物体分配效用分数,直接搜索3D场景图。为了在不牺牲开放词汇泛化能力的情况下使其实用,我们提出了一种离线程序化蒸馏框架,将LLM中的结构化关系知识提取到轻量级模型中,用于机器人上的推理。此外,我们提出了SymSearch,一个用于评估交互式物体搜索任务中语义推理的可扩展符号基准。在符号和模拟环境中的广泛评估表明,SCOUT优于基于嵌入相似性的方法,并在保持计算效率的同时达到LLM级别的性能。最后,真实世界实验证明了向物理环境的有效迁移,在现实感知和导航约束下实现了开放世界交互式物体搜索。

英文摘要

Open-world interactive object search in household environments requires understanding semantic relationships between objects and their surrounding context to guide exploration efficiently. Prior methods either rely on vision-language embeddings similarity, which does not reliably capture task-relevant relational semantics, or large language models (LLMs), which are too slow and costly for real-time deployment. We introduce SCOUT: Scene Graph-Based Exploration with Learned Utility for Open-World Interactive Object Search, a novel method that searches directly over 3D scene graphs by assigning utility scores to rooms, frontiers, and objects using relational exploration heuristics such as room-object containment and object-object co-occurrence. To make this practical without sacrificing open-vocabulary generalization, we propose an offline procedural distillation framework that extracts structured relational knowledge from LLMs into lightweight models for on-robot inference. Furthermore, we present SymSearch, a scalable symbolic benchmark for evaluating semantic reasoning in interactive object search tasks. Extensive evaluations across symbolic and simulation environments show that SCOUT outperforms embedding similarity-based methods and matches LLM-level performance while remaining computationally efficient. Finally, real-world experiments demonstrate effective transfer to physical environments, enabling open-world interactive object search under realistic sensing and navigation constraints.

2603.05425 2026-05-28 cs.CV cs.AI

RelaxFlow: Text-Driven Amodal 3D Generation

RelaxFlow: 文本驱动的非模态3D生成

Jiayin Zhu, Guoji Fu, Xiaolu Liu, Qiyuan He, Yicong Li, Angela Yao

发表机构 * National University of Singapore(新加坡国立大学) Zhejiang University(浙江大学) University of Science and Technology of China(中国科学技术大学)

AI总结 针对遮挡下图像到3D生成的语义歧义问题,提出无训练的双分支框架RelaxFlow,通过多先验共识模块和松弛机制解耦控制粒度,实现文本提示引导下对未观察区域的补全,同时严格保留输入观测。

Comments Accepted as a spotlight presentation at ICML 2026. Code: https://github.com/viridityzhu/RelaxFlow

详情
AI中文摘要

图像到3D生成在遮挡下面临固有的语义歧义,仅凭部分观测通常不足以确定物体类别。在这项工作中,我们形式化了文本驱动的非模态3D生成,其中文本提示引导对未观察区域的补全,同时严格保留输入观测。关键的是,我们识别出这些目标需要不同的控制粒度:对观测的刚性控制与对提示的松弛结构控制。为此,我们提出RelaxFlow,一个无训练的双分支框架,通过多先验共识模块和松弛机制解耦控制粒度。理论上,我们证明我们的松弛等价于在生成向量场上应用低通滤波器,抑制高频实例细节以隔离适应观测的几何结构。为便于评估,我们引入了两个诊断基准:ExtremeOcc-3D和AmbiSem-3D。大量实验表明,RelaxFlow成功引导未观察区域的生成以匹配提示意图,同时不损害视觉保真度。

英文摘要

Image-to-3D generation faces inherent semantic ambiguity under occlusion, where partial observation alone is often insufficient to determine object category. In this work, we formalize text-driven amodal 3D generation, where text prompts steer the completion of unseen regions while strictly preserving input observation. Crucially, we identify that these objectives demand distinct control granularities: rigid control for the observation versus relaxed structural control for the prompt. To this end, we propose RelaxFlow, a training-free dual-branch framework that decouples control granularity via a Multi-Prior Consensus Module and a Relaxation Mechanism. Theoretically, we prove that our relaxation is equivalent to applying a low-pass filter on the generative vector field, which suppresses high-frequency instance details to isolate geometric structure that accommodates the observation. To facilitate evaluation, we introduce two diagnostic benchmarks, ExtremeOcc-3D and AmbiSem-3D. Extensive experiments demonstrate that RelaxFlow successfully steers the generation of unseen regions to match the prompt intent without compromising visual fidelity.

2602.22769 2026-05-28 cs.AI cs.LG

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

AMA-Bench:评估智能体应用的长时记忆

Yujie Zhao, Boqin Yuan, Junbo Huang, Haocheng Yuan, Zhongming Yu, Haozhou Xu, Lanxiang Hu, Abhilash Shankarampeta, Zimeng Huang, Wentao Ni, Yuandong Tian, Jishen Zhao

发表机构 * UCSD(加州大学圣迭戈分校) Recursive

AI总结 提出AMA-Bench基准,通过真实与合成轨迹评估LLM智能体的长时记忆,并基于因果图与工具增强检索提出AMA-Agent系统,在基准上提升11.16%准确率。

详情
AI中文摘要

大型语言模型(LLM)越来越多地被用作复杂、长时应用中的自主智能体,其中有效的记忆对于持续性能至关重要。然而,现有的记忆基准主要围绕对话,而真实的智能体记忆由连续的智能体-环境交互轨迹组成,包括状态、动作、观察和工具输出。为填补这一空白,我们引入了**AMA-Bench**(任意长度的智能体记忆),一个在现实智能体设置中评估长时记忆的基准。AMA-Bench结合了来自代表性应用的真实智能体轨迹与专家策划的问答,以及可扩展到任意视野的合成轨迹与基于规则的问答。我们的研究表明,现有记忆系统表现不佳,因为它们未能捕获因果和客观信息,并严重依赖有损的基于相似性的检索。我们进一步提出了**AMA-Agent**,一个基于因果图构建和工具增强检索的记忆系统。AMA-Agent在AMA-Bench上达到**57.22%**的准确率,超过最强基线**11.16%**。资源可在[https://ama-bench.github.io/](https://ama-bench.github.io/)获取。

英文摘要

Large Language Models (LLMs) are increasingly used as autonomous agents in complex, long-horizon applications, where effective memory is critical for sustained performance. Yet existing memory benchmarks are largely dialogue-centric, while real agent memory consists of continuous agent-environment interaction trajectories composed of states, actions, observations, and tool outputs. To address this gap, we introduce **AMA-Bench** (**A**gent **M**emory with **A**ny length), a benchmark for evaluating long-horizon memory in realistic agentic settings. AMA-Bench combines real-world agent trajectories from representative applications with expert-curated QA, as well as synthetic trajectories that scale to arbitrary horizons with rule-based QA. Our study shows that existing memory systems underperform because they fail to capture causal and objective information and rely heavily on lossy similarity-based retrieval. We further propose **AMA-Agent**, a memory system based on causality-graph construction and tool-augmented retrieval. AMA-Agent achieves **57.22%** accuracy on AMA-Bench, outperforming the strongest baseline by **11.16%**. Resources are available at: [https://ama-bench.github.io/](https://ama-bench.github.io/).

2603.01766 2026-05-28 cs.RO

Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

神经隐式动作场:从离散路点到连续函数的视觉-语言-动作模型

Haoyun Liu, Jianzhuang Zhao, Xinyuan Chang, Tianle Shi, Chuanzhang Meng, Jiayuan Tan, Feng Xiong, Tong Lin, Dongjie Huo, Mu Xu, SongLin Dong, Zhiheng Ma, Yihong Gong, Sheng Zhong

发表机构 * State Key Laboratory for Novel Software Technology, Nanjing University(南京大学计算机软件新技术国家重点实验室) Faculty of Computility Microelectronics, Shenzhen University of Advanced Technology(深圳理工大学算力微电子学院) Guangdong Provincial Key Laboratory of Computility Microelectronics(广东省算力微电子重点实验室) Amap, Alibaba Group(阿里巴巴集团高德) Shenzhen University(深圳大学) Xi'an Jiaotong University(西安交通大学) Beijing University of Chemical Technology(北京化工大学)

AI总结 针对视觉-语言-动作模型预测离散动作路点与物理运动连续性不匹配的问题,提出神经隐式动作场(NIAF),通过将动作表示从离散路点重构为连续函数,实现任意时间分辨率的连续动作流形合成,支持解析求导和显式速度监督,提升控制平滑性和物理合理性。

Comments Accepted at ICML 2026

详情
AI中文摘要

尽管视觉-语言-动作(VLA)模型取得了快速进展,但将动作块预测为离散路点的普遍做法在结构上与物理运动的内在连续性不一致。这种离散化自然源于固定频率的机器人数据收集和大语言模型的逐词预测范式,但将动作绑定到固定的采样率,不能自然支持解析一致的高阶导数,并引入量化伪影,阻碍精确、柔顺的交互。我们提出神经隐式动作场(NIAF),将块级动作表示从离散路点重构为连续动作函数。通过使用视觉-语言模型作为可学习运动先验上的分层频谱调制器,NIAF 合成具有任意时间分辨率的连续时间动作流形。这种公式支持解析微分,允许显式监督速度和正则化高阶导数信号,以促进数学一致性、物理合理性和控制平滑性。我们的方法在 CALVIN 和 LIBERO 上跨多种骨干网络取得了强劲结果。真实世界实验进一步证实 NIAF 支持稳定的阻抗控制,桥接了策略侧动作生成和执行侧平滑控制。

英文摘要

Despite the rapid progress of vision-language-action (VLA) models, the prevailing practice of predicting action chunks as discrete waypoints remains structurally misaligned with the intrinsic continuity of physical motion. This discretization arises naturally from fixed-rate robot data collection and the token-by-token prediction paradigm of large language models, but ties actions to rigid sampling rates, does not naturally support analytically consistent higher-order derivatives, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), which reformulates chunk-level action representation from discrete waypoints to continuous action functions. Using a vision-language model as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes continuous-time action manifolds with arbitrary temporal resolution. This formulation enables analytical differentiation, allowing explicit supervision of velocity and regularization of higher-order derivative signals to promote mathematical consistency, physical plausibility, and control smoothness. Our approach achieves strong results on CALVIN and LIBERO across diverse backbones. Real-world experiments further confirm that NIAF supports stable impedance control, bridging policy-side action generation and execution-side smooth control.
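The benefit of a continuous action function can be shown with a small sketch: once the action is a closed-form function of time, it can be sampled at any rate and its velocity is available analytically rather than by differencing waypoints. The truncated Fourier series below is a stand-in chosen for illustration; the abstract does not specify NIAF's actual parameterization, and the coefficients here are hand-picked rather than predicted by a vision-language model.

```python
import math

def action(t, bias, sin_coef, cos_coef, horizon=1.0):
    """Evaluate a one-dimensional action a(t) given Fourier coefficients."""
    total = bias
    for k, (a, b) in enumerate(zip(sin_coef, cos_coef), start=1):
        w = 2.0 * math.pi * k / horizon
        total += a * math.sin(w * t) + b * math.cos(w * t)
    return total

def velocity(t, sin_coef, cos_coef, horizon=1.0):
    """Analytic time derivative da/dt of the same series."""
    total = 0.0
    for k, (a, b) in enumerate(zip(sin_coef, cos_coef), start=1):
        w = 2.0 * math.pi * k / horizon
        total += a * w * math.cos(w * t) - b * w * math.sin(w * t)
    return total

# Hypothetical coefficients for one joint over a 1-second chunk.
bias, sin_coef, cos_coef = 0.2, [0.5, -0.1], [0.3, 0.05]

# The same function can be sampled at 10 Hz or 1 kHz without retraining.
coarse = [action(i / 10, bias, sin_coef, cos_coef) for i in range(10)]
fine = [action(i / 1000, bias, sin_coef, cos_coef) for i in range(1000)]

# Check the analytic velocity against a central finite difference at t = 0.37.
t, h = 0.37, 1e-6
numeric = (action(t + h, bias, sin_coef, cos_coef)
           - action(t - h, bias, sin_coef, cos_coef)) / (2 * h)
analytic = velocity(t, sin_coef, cos_coef)
```

Because `velocity` is exact, a training loss could supervise it directly against measured joint velocities, which is the kind of explicit derivative supervision the abstract describes.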

2603.00349 2026-05-28 cs.AI cs.MA

COOP$^2$: Defining, Observing, and Repairing Cooperation in LLM Multi-Agent Systems

COOP$^2$: 定义、观察和修复LLM多智能体系统中的合作

Hanqing Yang, Narjes Nourzad, Shiyu Chen, Marie Siew, Jingdi Chen, Carlee Joe-Wong

发表机构 * Carnegie Mellon University(卡内基梅隆大学) University of Southern California(南加州大学) Singapore University of Technology and Design(新加坡科技设计大学) University of Arizona(亚利桑那大学)

AI总结 提出COOP$^2$框架,通过将高层合作动态与任务进度关联,定义可验证的合作任务,并开发COOP$^2$-Repair方法预测约束失败并引导修复,提升LLM多智能体系统的任务成功率和约束满足度。

详情
AI中文摘要

许多复杂任务需要超出单个智能体能力的持续努力、多样化能力或协调行动。然而,简单地增加更多智能体并不能保证更好的性能,因为有效的合作取决于智能体之间以及智能体与任务结构之间的交互方式,以满足随时间演变的约束。对于基于LLM的多智能体系统(LLM-MAS),这一挑战被放大:计划、消息和修订以自然语言发生,而任务进展依赖于具体环境中的行动。当前的评估大多将合作视为最终任务成功的隐含因素,使得合作以及多智能体交互对任务动态的影响难以研究。我们引入了COOP$^2$,一个评估框架,将LLM-MAS中的高层智能体合作动态与环境中的任务进展联系起来。COOP$^2$定义了具有可验证合作需求的合作任务,使我们能够分析合作如何随时间相对于任务进展展开,以及合作在何处和为何破裂。基于此框架,我们开发了COOP$^2$-Repair,它从群体计划中预测约束失败,并打开有针对性的修复通道以进行引导修订。在两个环境和三种通信结构下,COOP$^2$-Repair提高了任务成功率和约束满足度,同时暴露了修复所需的额外决策开销和通信负载。项目网页见:https://happyeureka.github.io/coop2。

英文摘要

Many complex tasks require extended effort, diverse capabilities, or coordinated actions beyond what a single agent can provide. However, simply adding more agents does not guarantee better performance, as effective cooperation depends on how agents interact with each other and with task structure to satisfy evolving constraints over time. This challenge is amplified for LLM-based multi-agent systems (LLM-MAS): plans, messages, and revisions occur in natural language, whereas task progress depends on grounded environment actions. Current evaluations mostly treat cooperation as an implicit ingredient of final task success, leaving both cooperation and the effect of multi-agent interaction on task dynamics difficult to study. We introduce COOP$^2$, an evaluation framework that grounds high-level agent cooperation dynamics in LLM-MAS within task progress in the environment. COOP$^2$ then defines cooperative tasks with verifiable cooperative requirements, allowing us to analyze how cooperation unfolds over time with respect to task progress, as well as where and why cooperation breaks down. Building on this framework, we develop COOP$^2$-Repair, which predicts constraint failures from group plans and opens targeted repair channels for guided revisions. Across two environments and three communication structures, COOP$^2$-Repair improves task success and constraint satisfaction while exposing the additional decision overhead and communication load required for repair. The project web page can be found at: https://happyeureka.github.io/coop2.

2603.00309 2026-05-28 cs.AI cs.MA

DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

DIG to Heal: 通过可解释的动态决策路径扩展通用智能体协作

Hanqing Yang, Hyungwoo Lee, Yuhang Yao, Zhiwei Liu, Kay Liu, Jingdi Chen, Carlee Joe-Wong

发表机构 * Carnegie Mellon University(卡内基梅隆大学) University of Arizona(亚利桑那大学) Zoom Salesforce Amazon(亚马逊)

AI总结 提出动态交互图(DIG)框架,将通用LLM智能体的涌现协作建模为时变因果网络,首次实现协作过程的可观察、可解释与实时纠错。

详情
AI中文摘要

日益流行的智能体AI范式有望利用多个通用大语言模型(LLM)智能体的能力协作完成复杂任务。尽管许多智能体AI系统通过预定义工作流或固定智能体角色来降低复杂性,但理想情况是支持真正自主的智能体,能够在多个交互智能体之间实现涌现协作。然而在实践中,这种非结构化交互常常导致冗余工作和级联故障,难以解释或纠正。在这项工作中,我们研究了由通用LLM智能体组成的多智能体系统,这些智能体通过涌现协作解决问题,而不依赖预定义角色、控制流或通信约束。我们引入了动态交互图(DIG),它将涌现协作捕获为智能体激活和交互的时变因果网络。DIG首次使涌现协作变得可观察和可解释,能够直接从智能体的协作路径中实时识别、解释和纠正协作引发的错误模式。因此,DIG填补了理解通用LLM智能体如何在真正智能体化的多智能体系统中共同解决问题的关键空白。项目网页见:https://happyeureka.github.io/dig。

英文摘要

The increasingly popular agentic AI paradigm promises to harness the power of multiple, general-purpose large language model (LLM) agents to collaboratively complete complex tasks. While many agentic AI systems reduce complexity through predefined workflows or fixed agent roles, the ideal is to support truly autonomous agents capable of emergent collaboration across many interacting agents. Yet in practice, such unstructured interactions often lead to redundant work and cascading failures that are difficult to interpret or correct. In this work, we study multi-agent systems composed of general-purpose LLM agents that solve problems through emergent collaboration, without relying on predefined roles, control flows, or communication constraints. We introduce the Dynamic Interaction Graph (DIG), which captures emergent collaboration as a time-evolving causal network of agent activations and interactions. DIG makes emergent collaboration observable and explainable for the first time, enabling real-time identification, explanation, and correction of collaboration-induced error patterns directly from agents' collaboration paths. Thus, DIG fills a critical gap in understanding how general LLM agents solve problems together in truly agentic multi-agent systems. The project webpage can be found at: https://happyeureka.github.io/dig.

2502.17055 2026-05-28 cs.LG cs.AI

GradientStabilizer: Fix the Norm, Not the Gradient

GradientStabilizer:修正范数,而非梯度

Tianjin Huang, Zhangyang Wang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Jiaxing Shang, Tianlong Chen, Ke Li, Lu Liu, Qingsong Wen, Shiwei Liu

发表机构 * Department of Computer Science, University of Exeter(埃克塞特大学计算机科学系) Department of Mathematics and Computer Science, Eindhoven University of Technology(埃因霍温理工大学数学与计算机科学系) School of the Gifted Young, University of Science and Technology of China(中国科学技术大学少年班学院) Department of Electrical and Computer Engineering, University of Texas at Austin(德克萨斯大学奥斯汀分校电气与计算机工程系) Department of Computer Science, University of Reading(雷丁大学计算机科学系) School of Cyber Science and Technology, Sun Yat-sen University(中山大学网络科学与技术学院) Department of Computer Science, The University of North Carolina at Chapel Hill(北卡罗来纳大学教堂山分校计算机科学系) ELLIS Institute Tubingen(图宾根ELLIS研究所) Max Planck Institute for Intelligent Systems(马克斯·普朗克智能系统研究所) Tübingen AI Center, Tübingen, Germany(图宾根人工智能中心,德国图宾根) College of Computer Science, Chongqing University(重庆大学计算机学院)

AI总结 提出GradientStabilizer,一种轻量级梯度变换方法,通过统计稳定的梯度范数估计替换更新幅度,在不改变梯度方向的前提下抑制极端梯度尖峰,从而提升训练稳定性并减少发散。

Comments Accepted by ICML 2026

详情
AI中文摘要

现代深度学习系统中的训练不稳定性通常由罕见但极端的梯度范数尖峰引发,这些尖峰可能导致参数更新过大、破坏优化器状态,并导致缓慢恢复或发散。广泛使用的保护措施如梯度裁剪可以缓解这些故障,但需要调整阈值且不加区分地截断大更新。我们提出GradientStabilizer,一种轻量级、即插即用的梯度变换方法,它在保留瞬时梯度方向的同时,用从运行梯度范数统计中导出的统计稳定估计替换更新幅度。我们证明了在尖峰步骤上,得到的稳定幅度一致有界,与尖峰大小无关,并展示了这种有界性如何控制自适应方法中优化器状态的演化。在LLM预训练(FP16)、量化感知预训练(FP4)、ImageNet分类、强化学习和时间序列预测中,GradientStabilizer一致地提高了训练稳定性,扩大了稳定学习率区域,并相对于基于裁剪的基线减少了发散,甚至显著降低了Adam对权重衰减强度的敏感性。代码即将发布。

英文摘要

Training instability in modern deep learning systems is frequently triggered by rare but extreme gradient-norm spikes, which can induce oversized parameter updates, corrupt optimizer state, and lead to slow recovery or divergence. Widely used safeguards such as gradient clipping mitigate these failures but require threshold tuning and indiscriminately truncate large updates. We propose GradientStabilizer, a lightweight, drop-in gradient transform that preserves the instantaneous gradient direction while replacing the update magnitude with a statistically stabilized estimate derived from running gradient-norm statistics. We prove that the resulting stabilized magnitude is uniformly bounded on spike steps, independent of the spike size, and show how this boundedness controls optimizer state evolution in adaptive methods. Across LLM pre-training (FP16), quantization-aware pre-training (FP4), ImageNet classification, reinforcement learning, and time-series forecasting, GradientStabilizer consistently improves training stability, widens stable learning-rate regions, and reduces divergence relative to clipping-based baselines, even substantially reducing Adam's sensitivity to weight-decay strength. Code will be released soon.
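The general idea of keeping the gradient direction while replacing its magnitude can be sketched as follows. This is not the paper's estimator, which the abstract does not specify: the sketch swaps in a running median of recent gradient norms, chosen only because a median over a window is unaffected by the size of a single spike. The class name and window length are hypothetical.

```python
import math
from collections import deque
from statistics import median

class MedianNormRescaler:
    """Rescale each gradient to the median of recent gradient norms.

    Illustrative stand-in for a "statistically stabilized" magnitude;
    the direction of the incoming gradient is left unchanged.
    """

    def __init__(self, window=5):
        self.norms = deque(maxlen=window)  # most recent gradient norms

    def __call__(self, grad):
        norm = math.sqrt(sum(g * g for g in grad))
        self.norms.append(norm)
        if norm == 0.0:
            return list(grad)
        target = median(self.norms)
        return [g * target / norm for g in grad]

stabilize = MedianNormRescaler(window=5)
for _ in range(4):
    stabilize([3.0, 4.0])            # typical steps, norm 5
spiked = stabilize([3000.0, 4000.0])  # one step with a 1000x larger norm
```

Here the spiked step comes out with roughly the typical norm of 5 and the same direction as the raw gradient. Hard clipping, by contrast, needs a hand-chosen threshold, which is the tuning burden the abstract points to.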

2602.22787 2026-05-28 cs.CL cs.AI

Probing for Knowledge Attribution in Large Language Models

探测大型语言模型中的知识归因

Ivo Brink, Alexander Boer, Dennis Ulmer

发表机构 * KPMG NL(KPMG荷兰分公司) University of Amsterdam(阿姆斯特丹大学)

AI总结 本文通过线性探针从隐藏表示中分类大型语言模型输出的主导知识来源(记忆或上下文),并引入自监督流水线AttriWiki生成训练数据,在多个模型和数据集上达到高F1分数。

详情
AI中文摘要

大型语言模型(LLM)的幻觉,即流畅但事实不正确的生成,分为两类:忠实性违反,即模型误用提供的上下文;以及事实性违反,即答案反映内部知识中的错误。适当的缓解取决于知道哪个来源驱动每个答案。我们研究贡献性归因,即对每个输出背后的主导知识来源进行分类,并表明在隐藏表示上训练的简单线性探针可以可靠地识别它。我们引入了AttriWiki,一个自监督流水线,通过提示模型从记忆中回忆被隐藏的实体或从上下文中读取它们,而不依赖知识冲突,自动生成标记的训练数据。在AttriWiki上训练的探针在Llama-3.1-8B、Mistral-7B和Qwen-7B上达到高达0.96的Macro-$F_1$,迁移到SQuAD和WebQuestions时达到0.94-0.99的Macro-$F_1$,并零样本泛化到Tighidet等人(2024)的基准,在冲突设置上无需重新训练即优于他们的探针。此外,归因不匹配会使错误率提高高达70%,尽管正确的归因并不能保证正确的答案,这表明需要更广泛的检测框架。

英文摘要

Large language model (LLM) hallucinations, meaning fluent but factually incorrect generations, fall into two types: faithfulness violations, where the model misuses provided context, and factuality violations, where answers reflect errors in internal knowledge. Proper mitigation depends on knowing which source drives each answer. We study contributive attribution, i.e. the classification of the dominant knowledge source behind each output, and show that a simple linear probe trained on hidden representations can reliably identify it. We introduce AttriWiki, a self-supervised pipeline that automatically generates labelled training data by prompting models to recall withheld entities from memory or read them from context without relying on knowledge conflicts. Probes trained on AttriWiki achieve up to 0.96 Macro-$F_1$ on Llama-3.1-8B, Mistral-7B, and Qwen-7B, transfer to SQuAD and WebQuestions with 0.94-0.99 Macro-$F_1$, and generalise zero-shot to Tighidet et al. (2024)'s benchmark, outperforming their probe on conflicting settings without retraining. Furthermore, attribution mismatches raise error rates by up to 70%, though correct attribution does not guarantee correct answers, pointing to the need for broader detection frameworks.
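A linear probe of the kind described above is a logistic-regression classifier fitted on frozen hidden states. The sketch below uses hand-made two-dimensional vectors in place of real LLM activations, and the labels (0 for "context", 1 for "memory") are an assumed encoding; the paper's layer choice, features, and training setup are not given in the abstract.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(features, labels, steps=200, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 ("memory") or 0 ("context") for one hidden-state vector."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Toy stand-ins for hidden states (assumed data, linearly separable).
hidden = [[-2.0, 0.5], [-1.5, -0.3], [-1.0, 0.8],
          [1.0, -0.6], [1.6, 0.2], [2.1, -0.1]]
source = [0, 0, 0, 1, 1, 1]

w, b = train_probe(hidden, source)
accuracy = sum(predict(w, b, x) == y for x, y in zip(hidden, source)) / len(source)
```

If a probe this simple separates the two sources well, the attribution signal is linearly readable from the representation, which is the property the abstract reports.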

2602.22096 2026-05-28 cs.CV

WeatherCity: Urban Scene Reconstruction with Controllable Multi-Weather Transformation

WeatherCity: 可控多天气变换的城市场景重建

Wenhua Wu, Huai Guan, Zhe Liu, Hesheng Wang

发表机构 * School of Automation and Intelligent Sensing, Shanghai Jiao Tong University(自动化与智能感知学院,上海交通大学)

AI总结 提出WeatherCity框架,利用文本引导的图像编辑、天气高斯表示和物理驱动模型,实现高保真、时间一致的4D城市场景重建与多天气编辑。

详情
AI中文摘要

可编辑的高保真4D场景对于自动驾驶至关重要,因为它们可以应用于端到端训练和闭环仿真。然而,现有的重建方法主要局限于复制观察到的场景,缺乏多样化的天气模拟能力。而图像级别的天气编辑方法往往引入场景伪影,并且对天气效果的可控性较差。为了解决这些限制,我们提出了WeatherCity,一个用于4D城市场景重建和天气编辑的新框架。具体来说,我们利用文本引导的图像编辑模型来实现图像天气背景的灵活编辑。为了应对多天气建模的挑战,我们引入了一种基于共享场景特征和专用天气解码器的新型天气高斯表示。这种表示进一步通过内容一致性优化得到增强,确保不同天气条件下的连贯建模。此外,我们设计了一个物理驱动模型,通过粒子和运动模式模拟动态天气效果。在多个数据集和各种场景上的大量实验表明,WeatherCity在4D重建和天气编辑中实现了灵活的可控性、高保真度和时间一致性。我们的框架不仅能够对天气条件(例如小雨和大雪)进行细粒度控制,还支持场景内的物体级操作。代码已发布在https://github.com/IRMVLab/WeatherCity。

英文摘要

Editable high-fidelity 4D scenes are crucial for autonomous driving, as they can be applied to end-to-end training and closed-loop simulation. However, existing reconstruction methods are primarily limited to replicating observed scenes and lack the capability for diverse weather simulation. Meanwhile, image-level weather editing methods tend to introduce scene artifacts and offer poor controllability over the weather effects. To address these limitations, we propose WeatherCity, a novel framework for 4D urban scene reconstruction and weather editing. Specifically, we leverage a text-guided image editing model to achieve flexible editing of image weather backgrounds. To tackle the challenge of multi-weather modeling, we introduce a novel weather Gaussian representation based on shared scene features and dedicated weather-specific decoders. This representation is further enhanced with a content consistency optimization, ensuring coherent modeling across different weather conditions. Additionally, we design a physics-driven model that simulates dynamic weather effects through particles and motion patterns. Extensive experiments on multiple datasets and various scenes demonstrate that WeatherCity achieves flexible controllability, high fidelity, and temporal consistency in 4D reconstruction and weather editing. Our framework not only enables fine-grained control over weather conditions (e.g., light rain and heavy snow) but also supports object-level manipulation within the scene. Code is released at https://github.com/IRMVLab/WeatherCity.

2602.20020 2026-05-28 cs.CL

CodeGENCAT: Generative Computerized Adaptive Testing for Open-ended Coding Problems

CodeGENCAT:面向开放式编程问题的生成式计算机自适应测试

Wanyong Feng, Alexander Scarlatos, Ruochen Sun, Andrew Lan

发表机构 * University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校) Independent Researcher(独立研究者)

AI总结 提出CodeGENCAT框架,通过生成式项目反应理论模型预测学生代码响应,并设计三种选题算法,在编程教育数据集上优于现有CAT基线。

Comments 23 pages, 2 figures

详情
AI中文摘要

现有的计算机自适应测试(CAT)框架通常根据预测的学生正确回答概率来选题。这种设计忽略了学生开放式回答中包含的信息,尤其是在编程教育等领域,代码结构和错误蕴含丰富的学生知识信息。在这项工作中,我们提出了Code GENerative CAT(CodeGENCAT),一种使用预测的学生代码响应来选题的生成式CAT框架。首先,我们开发了一个生成式项目反应理论(GIRT)模型,该模型根据估计的学生知识生成代码响应,通过监督微调和直接偏好优化进行知识-响应对齐训练。其次,我们引入了三种选题算法,分别衡量不确定性、编码风格多样性以及从预测的学生代码响应中提取的信息。在两个真实世界的编程教育数据集上的实验表明,CodeGENCAT优于所有CAT基线,在自适应测试早期阶段,AUC比最强基线提高了4.32%。

English abstract

Existing Computerized Adaptive Testing (CAT) frameworks typically select questions based on the predicted likelihood that the student will answer correctly. This design ignores information contained in students' open-ended responses, especially in domains such as programming education, where code structures and bugs contain rich information on student knowledge. In this work, we propose Code GENerative CAT (CodeGENCAT), a generative CAT framework that selects questions using predicted student code responses. First, we develop a Generative Item Response Theory (GIRT) model that generates code responses conditioned on estimated student knowledge, trained with supervised fine-tuning followed by direct preference optimization for knowledge-response alignment. Second, we introduce three question-selection algorithms that measure uncertainty, coding style diversity, and information from predicted student code responses. Experiments on two real-world programming education datasets show that CodeGENCAT outperforms all CAT baselines, achieving an AUC improvement of up to 4.32% over the strongest baseline in the early stages of adaptive testing.
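
For context, the correctness-based selection that the abstract contrasts against can be sketched as follows. This is a minimal illustration of a classic CAT baseline (maximum Fisher information under a two-parameter logistic IRT model), not CodeGENCAT's method; the item bank `ITEMS` and its parameter values are made up for the example.

```python
import math


def p_correct(theta, a, b):
    """2PL IRT: probability that a student with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_question(theta, items, asked):
    """Return the id of the unasked item with the highest information at theta."""
    candidates = [
        (qid, fisher_information(theta, a, b))
        for qid, (a, b) in items.items()
        if qid not in asked
    ]
    if not candidates:
        raise ValueError("no unasked questions remain")
    return max(candidates, key=lambda pair: pair[1])[0]


# Hypothetical item bank: question id -> (discrimination a, difficulty b).
ITEMS = {"q1": (1.0, -1.0), "q2": (1.0, 0.0), "q3": (1.0, 2.0)}
```

With equal discrimination, the selected item is the one whose difficulty lies closest to the current ability estimate. Note that this rule uses only a scalar correctness probability, which is the limitation the abstract points to for open-ended code responses.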

2602.18647 2026-05-28 cs.LG cs.AI cs.CV cs.IT math.IT

Noise Scheduling as Information-Guided Allocation in Diffusion Training

Gabriel Raya, Bac Nguyen, Georgios Batzolis, Yuhta Takida, Dejan Stancevic, Naoki Murata, Chieh-Hsin Lai, Yuki Mitsufuji, Luca Ambrogioni

Affiliations: Tilburg University & JADS; Sony AI; University of Cambridge; Radboud University; Sony Group Corporation

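AI summary: Proposes InfoNoise, an online adaptive noise scheduling method that estimates a conditional-entropy-rate profile to dynamically adjust the training noise distribution and optimize information gain across denoising tasks; it matches or exceeds baselines on tasks including image, DNA, and language generation while saving up to 3× training compute.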

English abstract

We introduce InfoNoise, an online adaptive noise schedule for diffusion training that reallocates optimization effort toward noise levels where denoising is most informative. Together with loss weighting, a noise schedule induces an effective allocation across denoising problems, often fixed before informative noise levels are known. InfoNoise makes this allocation data-adaptive by estimating a conditional-entropy-rate profile from denoising losses during training, without auxiliary models or offline search. Through the I-MMSE relation, this profile identifies where noisy observations rapidly reduce uncertainty about the clean sample and guides adaptation of the training noise distribution. It changes only this distribution, keeping the objective, weighting, and parameterization fixed. On image benchmarks, where schedules have been extensively tuned, InfoNoise matches or slightly exceeds strong baselines and can reach the same quality with fewer updates. On representation, sequence, and modality shifts, including DNA and language generation, InfoNoise improves over fixed and adaptive baselines and reaches target quality with up to 3× less training compute. These results establish the conditional-entropy-rate profile as the data-dependent target for noise schedule design and make online adaptation a practical alternative to manual schedule search.
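
The idea of steering the training noise distribution by an information rate can be illustrated with a small sketch. It relies on the I-MMSE relation for the Gaussian channel, which in log-SNR form reads dI/dλ = ½·e^λ·mmse(λ); since H(X|Y) = H(X) − I(X;Y), the conditional entropy falls at the same rate. Everything else here is an assumption made for illustration and is not taken from the paper: the class name `AdaptiveNoiseSampler`, the binning of log-SNR values, the moving-average update, and the use of x-prediction squared error as a stand-in for the MMSE.

```python
import math
import random


class AdaptiveNoiseSampler:
    """Hypothetical helper: samples log-SNR bins in proportion to an estimated
    information rate derived from running denoising-loss averages."""

    def __init__(self, lo=-10.0, hi=10.0, num_bins=20, decay=0.99):
        self.lo, self.hi, self.num_bins, self.decay = lo, hi, num_bins, decay
        self.width = (hi - lo) / num_bins
        # Prior: MMSE of a unit-variance Gaussian source, 1 / (1 + snr).
        self.mse = [1.0 / (1.0 + math.exp(self.bin_center(i)))
                    for i in range(num_bins)]

    def bin_center(self, i):
        return self.lo + (i + 0.5) * self.width

    def update(self, bin_index, loss):
        """Fold one observed denoising loss into the bin's moving average."""
        d = self.decay
        self.mse[bin_index] = d * self.mse[bin_index] + (1.0 - d) * loss

    def probabilities(self):
        """Normalize the per-bin rate 0.5 * exp(lambda) * mse into a distribution."""
        rates = [0.5 * math.exp(self.bin_center(i)) * self.mse[i]
                 for i in range(self.num_bins)]
        total = sum(rates)
        return [r / total for r in rates]

    def sample(self, rng=random):
        """Draw a bin index, then a log-SNR value uniformly inside that bin."""
        i = rng.choices(range(self.num_bins), weights=self.probabilities())[0]
        return i, self.lo + (i + rng.random()) * self.width
```

In a training loop, one would call `sample()` to pick a noise level, compute the denoising loss at that level, and pass it back through `update()`. Only the sampling distribution changes, which mirrors the abstract's statement that the objective, weighting, and parameterization stay fixed.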

2602.17003 2026-05-28 cs.CL cs.AI

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

Serin Kim, Sangam Lee, Dongha Lee

Affiliations: Department of Artificial Intelligence, Yonsei University, Seoul, Republic of Korea

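AI summary: Proposes the Persona2Web benchmark, which applies the clarify-to-personalize principle to evaluate how well web agents use user history to resolve ambiguous queries on the real open web, and introduces a reasoning-aware evaluation framework.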

Comments Accepted to ICML 2026

English abstract

Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and context. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent architectures, backbone models, history access schemes, and queries with varying ambiguity levels, revealing key challenges in personalized web agent behavior. For reproducibility, our code and datasets are publicly available at https://serin-kimm.github.io/Persona2Web/.