arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.17245 2026-06-09 cs.AI (updated version)

Web Agents Should Use Typed Actions Instead of Click-Based Browsing

Linxi Jiang, Rui Xi, Zhijie Liu, Shuo Chen, Zhiqiang Lin, Suman Nath

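Affiliations: University of Science and Technology of China

AI summary: This paper proposes replacing low-level interaction primitives with typed actions (web verbs) backed by a semantic layer in order to build reliable, auditable web agents, and illustrates the advantages through case studies.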

Comments Accepted to the ICML 2026 Position Paper Track

English abstract

This position paper argues that building a reliable agentic Web requires shifting from low-level interaction primitives to typed actions supported by a semantic layer. Today's web agents primarily operate through clicks, keystrokes, and DOM manipulation, which leads to brittle long-horizon behavior, high execution cost, and limited auditability. We propose web verbs as a concrete design for this layer. A verb exposes a web operation as a typed function with structured inputs, structured outputs, and documented behavior, whether it is backed by a server-side Web API or a maintained client-side workflow. Verb calls can carry preconditions, postconditions, policy tags, and logging hooks, allowing agents to synthesize concise programs with explicit control flow and data flow and to produce checkable execution traces. Using representative case studies, we illustrate how verb-level composition can produce correct, reproducible outcomes, while browser agents using low-level interaction primitives may produce brittle behavior or incorrect reasoning. We conclude with a call to action on standardization, developer tooling, and community processes needed to make this semantic layer deployable and trustworthy at web scale.
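
As a rough illustration of the "typed function with preconditions, postconditions, policy tags, and logging hooks" idea in the abstract above, here is a minimal Python sketch. It is not the paper's design: the `Verb` class, the `search_products` verb, the in-memory `CATALOG`, and the tag name `read-only` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

I = TypeVar("I")
O = TypeVar("O")


@dataclass
class Verb(Generic[I, O]):
    """Hypothetical typed action with contracts, policy tags, and a trace log."""
    name: str
    run: Callable[[I], O]
    precondition: Callable[[I], bool] = lambda _arg: True
    postcondition: Callable[[I, O], bool] = lambda _arg, _out: True
    policy_tags: tuple = ()
    trace: list = field(default_factory=list)

    def __call__(self, arg: I) -> O:
        if not self.precondition(arg):
            raise ValueError(f"{self.name}: precondition failed")
        out = self.run(arg)
        if not self.postcondition(arg, out):
            raise RuntimeError(f"{self.name}: postcondition failed")
        # Logging hook: record one checkable entry per successful call.
        self.trace.append({"verb": self.name, "input": arg,
                           "output": out, "tags": self.policy_tags})
        return out


@dataclass(frozen=True)
class SearchInput:
    query: str
    max_price: float


@dataclass(frozen=True)
class Product:
    title: str
    price: float


# Stand-in for a server-side API or a maintained client-side workflow.
CATALOG = [Product("USB-C cable", 9.5), Product("USB-C hub", 34.0),
           Product("HDMI cable", 12.0)]


def _search(arg: SearchInput) -> list:
    return [p for p in CATALOG
            if arg.query.lower() in p.title.lower() and p.price <= arg.max_price]


search_products = Verb(
    name="search_products",
    run=_search,
    precondition=lambda a: bool(a.query.strip()) and a.max_price > 0,
    postcondition=lambda a, out: all(p.price <= a.max_price for p in out),
    policy_tags=("read-only",),
)

results = search_products(SearchInput("usb-c", 20.0))
```

An agent composing such verbs would work with structured values like `results` and could audit `search_products.trace` afterwards, rather than replaying clicks and keystrokes.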

2602.16224 2026-06-09 cs.LG (updated version)

Amortized Predictability-aware Training Framework for Time Series Forecasting and Classification

Xu Zhang, Peng Wang, Yichen Li, Wei Wang

Affiliations: Shanghai Key Laboratory of Data Science, College of Computer Science and Artificial Intelligence, Fudan University; Department of Electrical and Computer Engineering, University of British Columbia (UBC)

AI summary: Proposes the APTF framework, which uses a hierarchical predictability-aware loss and an amortization model to identify and penalize low-predictability samples, improving time series forecasting and classification performance.

Comments This work is accepted by the proceedings of the ACM Web Conference 2026 (WWW 2026). The code is available at the link https://github.com/Meteor-Stars/APTF

English abstract

Time series data are prone to noise in various domains, and training samples may contain low-predictability patterns that deviate from the normal data distribution, leading to training instability or convergence to poor local minima. Therefore, mitigating the adverse effects of low-predictability samples is crucial for time series analysis tasks such as time series forecasting (TSF) and time series classification (TSC). While many deep learning models have achieved promising performance, few consider how to identify and penalize low-predictability samples to improve model performance from the training perspective. To fill this gap, we propose a general Amortized Predictability-aware Training Framework (APTF) for both TSF and TSC. APTF introduces two key designs that enable the model to focus on high-predictability samples while still learning appropriately from low-predictability ones: (i) a Hierarchical Predictability-aware Loss (HPL) that dynamically identifies low-predictability samples and progressively expands their loss penalty as training evolves, and (ii) an amortization model that mitigates predictability estimation errors caused by model bias, further enhancing HPL's effectiveness. The code is available at https://github.com/Meteor-Stars/APTF.

2602.16015 2026-06-09 cs.LG (updated version)

Geometry-Aware Uncertainty Quantification via Conformal Prediction on Manifolds

Marzieh Amiri Shahbazi, Ali Baheri

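Affiliations: Rochester Institute of Technology

AI summary: Proposes an adaptive geodesic conformal prediction framework that uses geodesic distances and cross-validated local-difficulty normalization, achieving valid coverage and improved conditional coverage on the sphere and in IGRF-14 geomagnetic field forecasting.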

English abstract

Conformal prediction gives finite-sample coverage guarantees for regression, but most standard constructions are designed for Euclidean output spaces. When the response lies on a Riemannian manifold, Euclidean residuals and coordinate-based regions can ignore the geometry that defines meaningful error. We propose adaptive geodesic conformal prediction, a simple framework that builds nonconformity scores from geodesic distances and normalizes them with a cross-validated estimate of local prediction difficulty. On the sphere, this produces geodesic caps whose area is independent of position, while their radii still adapt to heteroscedastic noise. In both a synthetic sphere experiment and an IGRF-14 geomagnetic field forecasting task, the adaptive method preserves valid marginal coverage, reduces variation in conditional coverage, and improves worst-case coverage relative to non-adaptive and coordinate-based baselines.
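
The recipe described above (geodesic nonconformity scores, normalized by a local difficulty estimate, turned into geodesic caps) can be sketched with standard split conformal prediction. This is a simplified stand-in, not the authors' code: the per-point `difficulty` values are assumed to be given, whereas the paper estimates them by cross-validation, and all function names are hypothetical.

```python
import math


def geodesic_distance(u, v):
    """Great-circle distance in radians between unit vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def calibrate(y_true, y_pred, difficulty, alpha=0.1):
    """Normalized score: geodesic error divided by the local difficulty estimate."""
    scores = [geodesic_distance(y, p) / s
              for y, p, s in zip(y_true, y_pred, difficulty)]
    return conformal_quantile(scores, alpha)


def cap_radius(q_hat, difficulty_at_x):
    """Radius of the geodesic cap centred on the prediction (at most pi)."""
    return min(math.pi, q_hat * difficulty_at_x)


def in_cap(y, center, radius):
    return geodesic_distance(y, center) <= radius


# Toy calibration set on the unit sphere (values invented for illustration).
north = (0.0, 0.0, 1.0)
equator = (1.0, 0.0, 0.0)
q_hat = calibrate(
    y_true=[north, north, equator, equator],
    y_pred=[north, equator, equator, north],
    difficulty=[1.0, 2.0, 1.0, 2.0],
    alpha=0.5,
)
radius = cap_radius(q_hat, difficulty_at_x=2.0)
```

Because the cap is defined by geodesic distance alone, its area depends only on the radius and not on where on the sphere the prediction lies, while the radius still scales with the local difficulty.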

2602.15829 2026-06-09 cs.LG (updated version)

Operationalising the Superficial Alignment Hypothesis via Task Complexity

Tomás Vergara-Browne, Darshan Patil, Ivan Titov, Siva Reddy, Tiago Pimentel, Marius Mosbach

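Affiliations: University of Maryland; University of California, Berkeley; University of Washington; University of Toronto; University of Edinburgh

AI summary: Proposes a task complexity metric (the length of the shortest program that reaches a target performance) to quantify the superficial alignment hypothesis; experiments show that pre-training sharply reduces task complexity, while post-training lowers it by several further orders of magnitude.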

Comments ICML 2026

English abstract

The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques of it. We propose a new metric called task complexity: the length of the shortest program that achieves a target performance on a task. In this framework, the SAH simply claims that pre-trained models drastically reduce the complexity of achieving high performance on many tasks. Our definition unifies prior arguments supporting the SAH, interpreting them as different strategies to find such short programs. Experimentally, we estimate the task complexity of mathematical reasoning, machine translation, and instruction following; we then show that these complexities can be remarkably low when conditioned on a pre-trained model. Further, we find that pre-training enables access to strong performance on our tasks, but it can require programs gigabytes in length to access it. Post-training, on the other hand, collapses the complexity of reaching this same performance by several orders of magnitude. Overall, our results highlight that task adaptation often requires surprisingly little information, often just a few kilobytes.

2602.15327 2026-06-09 cs.LG cs.AI cs.CL stat.ML (updated version)

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Hanlin Zhang, Jikai Jin, Vasilis Syrgkanis, Sham Kakade

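Affiliations: Harvard University; Stanford University

AI summary: Using large-scale observational evaluations and quantile regression, proposes prescriptive scaling laws that map pre-training compute budgets to downstream accuracy, validates their temporal stability, and introduces a balanced I-optimal sampling algorithm to reduce evaluation cost.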

Comments ICML 2026 Oral. Blog Post: https://jkjin.com/prescriptive-scaling

English abstract

Machine learning model performance improvements tend to arise from competition and application. For deployment, we consider prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations with 5k existing and 2k newly evaluated model checkpoints spanning 2022-2026 across six benchmarks, we estimate capability boundaries, high conditional quantiles of benchmark scores as a function of log pre-training FLOPs, via smoothed quantile regression with a monotone, saturating sigmoid parameterization. We validate temporal reliability by fitting on earlier model generations and evaluating on later releases: across four of six tasks, the out-of-distribution coverage error remains below 2%, while math reasoning exhibits a consistently advancing boundary over time. For instance, at a budget of 10^24 FLOPs, the estimated attainable accuracies are 0.83 on IFEval and 0.54 on MATH Lvl 5. We then extend our approach to analyze task-dependent saturation and to probe contamination-related shifts on math reasoning tasks. Finally, we introduce a balanced I-optimal sampling algorithm that recovers near-full-data frontiers using roughly 20% of the parameter-count-weighted evaluation budget, as low as 5% on some tasks, while maintaining comparable calibration. Together, our work releases Proteus-2k, the latest model performance evaluation dataset, and introduces a practical methodology for translating compute budgets into reliable performance expectations and for monitoring when capability boundaries shift across time.
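
To make "high conditional quantiles ... via quantile regression with a monotone, saturating sigmoid parameterization" concrete, the sketch below pairs a sigmoid curve in log10 FLOPs with the ordinary pinball loss. It is an assumption-laden simplification: the paper uses a smoothed loss, and the parameter names (`floor`, `ceiling`, `midpoint`, `slope`), the quantile level `tau=0.98`, and the toy data points are all invented here.

```python
import math


def sigmoid_frontier(log10_flops, floor, ceiling, midpoint, slope):
    """Monotone, saturating curve rising from `floor` to `ceiling` (slope > 0)."""
    return floor + (ceiling - floor) / (1.0 + math.exp(-slope * (log10_flops - midpoint)))


def pinball_loss(y, y_hat, tau):
    """Quantile (pinball) loss; its expectation is minimized by the tau-quantile."""
    diff = y - y_hat
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball(points, params, tau=0.98):
    """Average loss of a candidate frontier over (log10 FLOPs, accuracy) pairs."""
    return sum(pinball_loss(acc, sigmoid_frontier(x, *params), tau)
               for x, acc in points) / len(points)


# Toy (log10 FLOPs, accuracy) observations, invented for illustration.
points = [(21.0, 0.10), (22.0, 0.25), (23.0, 0.48), (24.0, 0.70)]
loss_low_curve = mean_pinball(points, (0.0, 0.5, 23.0, 1.0))
loss_high_curve = mean_pinball(points, (0.0, 0.9, 23.0, 1.0))
```

With `tau` close to 1, points above the curve are penalized far more than points below it, so a fitted curve tends to sit near the upper envelope of the observed scores; minimizing `mean_pinball` over `params` with any optimizer would complete the fit.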

2602.15253 2026-06-09 cs.LG q-bio.GN (updated version)

Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics

Ihor Kendiukhov

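Affiliations: Department of Computer Science, University of Tübingen

AI summary: This study is the first systematic exploration of the scaling behaviour of masked-reconstruction transformers on single-cell RNA sequencing data. It finds power-law scaling when data are plentiful and negligible scaling when data are scarce, and identifies the data-to-parameter ratio as the key determinant.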

English abstract

Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in single-cell transcriptomics when sufficient data are available, and they identify the data-to-parameter ratio as a critical determinant of scaling behaviour. A preliminary conversion of the data-rich asymptotic floor to information-theoretic units yields an estimate of approximately 2.30 bits of entropy per masked gene position. We discuss implications for the design of single-cell foundation models and outline the additional measurements needed to refine this entropy estimate.
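
The abstract does not spell out the parametric form it fits. A common choice, assumed here purely for illustration, is a power law in parameter count $N$ with an additive irreducible floor $c$; the symbols $a$ and $\alpha$ are placeholders, not values from the paper. The second line shows one possible way to turn the floor into bits, by treating the residual as Gaussian with variance $c$; the paper's actual conversion is not stated in the abstract.

```latex
L(N) = a\,N^{-\alpha} + c, \qquad c \approx 1.44

h = \tfrac{1}{2}\log_2\!\left(2\pi e\, c\right)
  = \tfrac{1}{2}\log_2\!\left(2\pi e \times 1.44\right)
  \approx \tfrac{1}{2}\log_2(24.59)
  \approx 2.31 \text{ bits}
```

Under that Gaussian assumption the result lands close to the reported figure of roughly 2.30 bits per masked gene position, which suggests, but does not confirm, that a similar conversion was used.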

2601.02085 2026-06-09 cs.RO cs.AI (updated version)

Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots

Meili Sun, Chunjiang Zhao, Lichao Yang, Hao Liu, Shimin Hu, Ya Xiong

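Affiliations: NERCITA

AI summary: Targeting poor visual perception, gripper misalignment, empty grasps/misgrasps, and slippage in strawberry-harvesting robots, proposes a visual fault diagnosis and self-recovery framework that combines unified perception via SRR-Net, relative error compensation, micro-optical camera feedback, and LSTM slip prediction to achieve accurate positioning and fault recovery.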

Comments Accepted by Artificial Intelligence in Agriculture

English abstract

Strawberry-harvesting robots face challenges such as poor visual perception, gripper misalignment, empty grasps/misgrasps, and slippage, which reduce harvesting stability and efficiency. To overcome these issues, this paper proposes a visual fault diagnosis and self-recovery framework. An end-to-end SRR-Net achieved unified perception and fault diagnosis through joint detection, segmentation, and ripeness regression of the fruit and gripper. Leveraging this integrated perception, a relative error compensation method driven by simultaneous target-gripper detection was designed to correct positional misalignments exceeding the tolerance threshold. A micro-optical camera integrated within the end-effector delivered real-time visual feedback. Based on the micro-optical camera, a MobileNet V3-Small classifier was utilized for grasp adjustment during the deflating stage, enabling the early abort of the harvesting cycle in cases of empty grasps/misgrasps. Furthermore, a time-series LSTM classifier was applied during the snap-off stage to predict strawberry slippage. Based on these predictions, the system executed re-inflation and a secondary snap-off attempt for slipping strawberries, or aborted the cycle for slipped strawberries. Experiments demonstrated that the mean absolute errors between the end-effector and the picking point were reduced to 3.12 mm and 4.06 mm from 11.50 mm and 5.25 mm along the x- and y-axes, respectively, at the cost of a time increment of 0.64 $\pm$ 0.24 s. The grasp adjustment module reduced the grasping phase by approximately 0.5 s and avoided empty placement in failure cases. The strawberry slip prediction module handled slipped cases with an 88.89% success rate, saving approximately 4.00 s per harvesting cycle in failure cases. It also achieved an 81.25% recovery rate for slipping strawberries, requiring an additional 0.63 s for re-grasping.

2510.18428 2026-06-09 cs.AI (updated version)

AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Shenhao Wang, Haris Koutsopoulos, Hai Wang, Cathy Wu, Jinhua Zhao

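Affiliations: Singapore-MIT Alliance for Research and Technology; Massachusetts Institute of Technology; University of Florida; Northeastern University; Singapore Management University

AI summary: Proposes AlphaOPT, a self-improving experience library that lets LLMs learn optimization modeling knowledge from limited supervision; a two-phase cycle of library learning and library evolution steadily improves performance, surpassing baseline methods on multiple benchmarks.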

English abstract

Optimization modeling underlies critical decision-making across industries, yet remains difficult to automate: natural-language problem descriptions must be translated into precise mathematical formulations and executable solver code. Existing LLM-based approaches typically rely on brittle prompting or costly retraining, both of which offer limited generalization. Recent work suggests that large models can improve via experience reuse, but how to systematically acquire, refine, and reuse such experience in structurally constrained settings remains unclear. We present AlphaOPT, a self-improving experience library that enables LLMs to learn optimization modeling knowledge from limited supervision, including answer-only feedback without gold-standard programs, annotated reasoning traces, or parameter updates. AlphaOPT operates in a continual two-phase cycle: a Library Learning phase that extracts solver-verified, structured insights from failed attempts, and a Library Evolution phase that refines the applicability of stored insights based on aggregate evidence across tasks. This design allows the model to accumulate reusable modeling principles, improve transfer across problem instances, and maintain bounded library growth over time. Evaluated on multiple optimization benchmarks, AlphaOPT steadily improves as more training data become available (65% → 72% from 100 to 300 training items) and outperforms the strongest baseline by 9.1% and 8.2% on two out-of-distribution datasets. These results demonstrate that structured experience learning, grounded in solver feedback, provides a practical alternative to retraining for complex reasoning tasks requiring precise formulation and execution. All code and data are available at: https://github.com/Minw913/AlphaOPT.

2602.12996 2026-06-09 cs.CL cs.AI (updated version)

Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models

Hao Chen, Ye He, Yuchun Fan, Yukun Yan, Zhenghao Liu, Qingfu Zhu, Maosong Sun, Wanxiang Che

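Affiliations: University of Science and Technology of China

AI summary: Proposes a meta-cognitive framework that uses internal cognitive signals to partition the knowledge space into mastered, confused, and missing regions, then augments knowledge and calibrates confidence through differentiated intervention and a cognitive consistency mechanism; experiments show it outperforms baseline methods.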

English abstract

Knowledge augmentation has significantly enhanced the performance of Large Language Models (LLMs) in knowledge-intensive tasks. However, existing methods typically operate on the simplistic premise that model performance equates with internal knowledge, overlooking the knowledge-confidence gaps that lead to overconfident errors or uncertain truths. To bridge this gap, we propose a novel meta-cognitive framework for reliable knowledge augmentation via differentiated intervention and alignment. Our approach leverages internal cognitive signals to partition the knowledge space into mastered, confused, and missing regions, guiding targeted knowledge expansion. Furthermore, we introduce a cognitive consistency mechanism to synchronize subjective certainty with objective accuracy, ensuring calibrated knowledge boundaries. Extensive experiments demonstrate that our framework consistently outperforms strong baselines, validating its rationality in not only enhancing knowledge capabilities but also fostering cognitive behaviors that better distinguish knowns from unknowns. All code is available at https://github.com/AI9Stars/Know-More-Know-Clearer.

2602.05774 2026-06-09 cs.LG cs.AI math.PR (updated version)

Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance

Xiandong Zou, Jianshu Li, Jing Huang, Pan Zhou

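Affiliations: University of Science and Technology of China

AI summary: Proposes Variational Speculative Decoding (VSD), which treats draft training as variational inference over latent proposals (draft paths) and optimizes by maximizing the marginal probability of target-model acceptance, combining a path-level utility with an Expectation-Maximization procedure to significantly improve decoding efficiency.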

English abstract

Speculative decoding accelerates inference for (M)LLMs, yet a training-decoding discrepancy persists: while existing methods optimize single greedy trajectories, decoding involves verifying and ranking multiple sampled draft paths. We propose Variational Speculative Decoding (VSD), formulating draft training as variational inference over latent proposals (draft paths). VSD maximizes the marginal probability of target-model acceptance, yielding an ELBO that promotes high-quality latent proposals while minimizing divergence from the target distribution. To enhance quality and reduce variance, we incorporate a path-level utility and optimize via an Expectation-Maximization procedure. The E-step draws Monte Carlo samples from an oracle-filtered posterior, while the M-step maximizes weighted likelihood using Adaptive Rejection Weighting (ARW) and Confidence-Aware Regularization (CAR). Theoretical analysis confirms that VSD increases expected acceptance length and speedup. Extensive experiments across LLMs and MLLMs show that VSD achieves up to a 9.6% speedup over EAGLE-3 and 7.9% over ViSpec, significantly improving decoding efficiency.
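
For context on what "target-model acceptance" and "acceptance length" refer to, the sketch below shows the acceptance test from standard speculative sampling for a single draft path. This is the generic baseline rule, not VSD's training procedure, and the function names and toy distributions are hypothetical.

```python
import random


def accepted_prefix_length(draft_tokens, q_probs, p_probs, rng):
    """Count leading draft tokens the target accepts.

    q_probs[i] and p_probs[i] map tokens to draft and target probabilities at
    position i. Each draft token was sampled from q, so q_probs[i][tok] > 0.
    """
    accepted = 0
    for tok, q, p in zip(draft_tokens, q_probs, p_probs):
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            accepted += 1
        else:
            break  # the first rejection ends the accepted prefix
    return accepted


def residual_distribution(p, q):
    """Distribution to resample from after a rejection: normalized max(0, p - q)."""
    raw = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()} if total > 0 else dict(p)


rng = random.Random(0)
same = {"a": 0.5, "b": 0.5}
full = accepted_prefix_length(["a", "b", "a"], [same] * 3, [same] * 3, rng)
none = accepted_prefix_length(["a"], [{"a": 1.0}], [{"a": 0.0, "b": 1.0}], rng)
resample = residual_distribution({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1})
```

The expected value of `accepted_prefix_length` over sampled draft paths is the quantity that a better-trained draft model should raise, which is why the abstract frames draft training in terms of sequence acceptance rather than per-token likelihood.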

2602.12107 2026-06-09 cs.LG cs.AI stat.ML (updated version)

On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

Haolin Liu, Braham Snyder, Chen-Yu Wei

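Affiliations: University of Virginia

AI summary: Uses an information-theoretic lower bound to show that $Q^\star$-realizability and Bellman completeness are not sufficient for sample-efficient offline reinforcement learning under partial coverage, and proposes a general decision-estimation framework that unifies and improves existing results.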

English abstract

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative via an information-theoretic lower bound. To identify additional structure that enables sample-efficient offline RL under partial coverage, we introduce a general decision-estimation framework, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). Our framework decomposes offline RL complexity into decision complexity and value estimation error. This allows modular study of both sub-problems. Our results not only unify existing results (Chen and Jiang, 2022; Uehara et al., 2023), but also improve and generalize them. On the decision complexity side, our improvements include: the first $\epsilon^{-2}$ sample complexity bound for soft $Q$-learning under partial coverage, which improves the $\epsilon^{-4}$ bound of Uehara et al. (2023); the removal of the need for additional online interaction in the value-gap setting of Chen and Jiang (2022); and new learnable settings beyond the above two cases. On the value estimation side, we provide a new characterization of the role of Bellman completeness under partial coverage, and the first characterization of offline learnability for general low-Bellman-rank MDPs (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021). The latter is a canonical online RL setting that has remained unexplored in offline RL except for special cases. As a side contribution, our techniques give the first analysis of CQL in the function approximation setting.

2602.11934 2026-06-09 cs.RO (updated version)

Robot-DIFT: Correspondence-Sensitive Diffusion Features for Contact-Rich Robot Manipulation

Yu Deng, Yufeng Jin, Xiaogang Jia, Jiahong Xue, Gerhard Neumann, Georgia Chalvatzaki

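Affiliations: TU Darmstadt; KIT; FZI; Hessian.AI; Robotics Institute Germany; Honda Research Institute Europe GmbH

AI summary: Proposes Robot-DIFT, which uses Manifold Distillation to convert a diffusion model into a deterministic student network and combines it with a Spatial-Semantic Feature Pyramid Network to provide real-time correspondence-sensitive features for contact-sensitive tasks, outperforming existing methods on multiple benchmarks.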

English abstract

Robot manipulation often fails in the final millimeters: a policy may recognize the right object yet miss the pose offsets, boundaries, or pre-contact alignments needed for action. We argue that such failures arise when semantic invariance suppresses correspondence cues for closed-loop control, or when these cues are not exposed to the policy in a usable form. Modern visual encoders provide strong semantic abstractions, but contact-rich manipulation requires correspondence sensitivity: discriminative feature responses to action-relevant changes in pose, boundary, and contact geometry. Diffusion features provide a strong prior for dense correspondence, but direct use is impractical due to stochasticity, latency, and representation drift. We introduce Robot-DIFT, a deterministic diffusion-derived backbone for real-time control. Through Manifold Distillation, Robot-DIFT converts a noise-conditioned diffusion Teacher into a clean-input, single-pass Student while preserving the Teacher's feature manifold. A Spatial-Semantic Feature Pyramid Network (S2-FPN) fuses coarse-to-fine Student decoder features into visual tokens that expose semantic context and fine contact detail to the policy. Across RoboCasa, LIBERO-10, and real robots, Robot-DIFT outperforms vision-language, self-supervised, geometry-oriented, and diffusion baselines on contact-sensitive tasks. Controlled backbone/readout swaps show that S2-FPN unlocks, rather than replaces, the diffusion correspondence prior.

2602.11238 2026-06-09 cs.CL (updated version)

SurveyLens: A Discipline-Aware Benchmark for Automatic Survey Generation

Beichen Guo, Zhiyuan Wen, Jia Gu, Haochen Shi, Jian Wang, Senzhang Wang, Haoyang Li, Ruosong Yang, Shuaiqi Liu

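Affiliations: The Hong Kong Polytechnic University; Sichuan University; Central South University; Alibaba Cloud

AI summary: Proposes SurveyLens, the first discipline-aware benchmark for automatic survey generation, comprising a dataset of 1,000 human-written surveys across 10 disciplines and a dual-lens evaluation framework; finds that Deep Research agents are the most robust across disciplines, while all paradigms remain weak on reference quality.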

Comments 8 pages, 9 figures

English abstract

Automatic Survey Generation (ASG) aims to produce comprehensive literature surveys by retrieving, organizing, and synthesizing academic papers. Despite rapid progress in specialized ASG frameworks and Deep Research agents, existing evaluations largely center on Computer Science or rely on generic criteria, leaving it unclear whether current systems satisfy the survey standards of diverse disciplines. We introduce SurveyLens, the first discipline-aware ASG benchmark. SurveyLens comprises SurveyLens-1k, a curated dataset of 1,000 human-written surveys across 10 disciplines, and a dual-lens framework that combines discipline-aware rubric scoring with reference-based alignment to human-written surveys. Evaluating 11 state-of-the-art systems across vanilla LLMs, ASG systems, and Deep Research agents, we find that Deep Research agents are the only paradigm robust across all 10 disciplines, ASG systems lead on structural planning, and all paradigms remain weak on reference quality, providing practical guidance for discipline-specific tool selection and future ASG design.

2512.20591 2026-06-09 cs.RO (updated version)

LightTact: A Visual-Tactile Fingertip Sensor for Deformation-Independent Contact Sensing

Changyi Lin, Boda Huo, Mingyang Yu, Emily Ruppel, Bingqing Chen, Jonathan Francis, Ding Zhao

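Affiliations: Carnegie Mellon University; Bosch Center for Artificial Intelligence (BCAI)

AI summary: Proposes the LightTact sensor, whose ambient-blocking optical configuration enables deformation-independent contact detection, achieving high-contrast contact segmentation without macroscopic deformation (e.g. liquids, ultra-soft materials) and unlocking light-contact robotic manipulation.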

Comments Project website: https://linchangyi1.github.io/LightTact

English abstract

Contact often occurs without macroscopic surface deformation, such as during interaction with liquids, semi-liquids, or ultra-soft materials. However, most existing tactile sensors rely on deformation to infer contact, making such light-contact interactions difficult to perceive robustly. To address this, we present LightTact, a visual-tactile fingertip sensor that makes contact directly visible via a deformation-independent principle. LightTact features an ambient-blocking optical configuration that suppresses both external light and internal illumination at non-contact regions, while transmitting only the scattered light generated at true contacts. As a result, LightTact produces high-contrast raw images in which non-contact pixels remain near-black (mean gray value < 3) and contact pixels preserve the natural appearance of the contacting surface. Built on this, LightTact achieves accurate pixel-level contact segmentation that is robust to material properties, contact force, surface appearance, and environmental lighting. We further demonstrate that LightTact unlocks new robotic manipulation behaviors that require detection of extremely light contact, including water spreading, facial-cream dipping, and soft thin-film interaction. In addition, we show that LightTact's spatially aligned visual-tactile images can be directly interpreted by vision-language models.
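
Since non-contact pixels are reported to stay near-black in the raw image, a naive way to picture pixel-level contact segmentation is a plain intensity threshold. The sketch below is only that naive picture, not the sensor's actual segmentation pipeline; the default `threshold=10` and the toy frame are invented values.

```python
def contact_mask(gray_image, threshold=10):
    """Mark pixels brighter than `threshold` (0-255 gray scale) as contact."""
    return [[value > threshold for value in row] for row in gray_image]


def contact_fraction(mask):
    """Share of pixels flagged as contact; 0.0 for an empty mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Toy 3x3 raw frame: dark background with a bright contact patch.
frame = [
    [0, 2, 1],
    [1, 120, 95],
    [0, 88, 2],
]
mask = contact_mask(frame)
fraction = contact_fraction(mask)
```

A threshold this simple only works because the optics keep the background dark; on a conventional deformation-based tactile image the same rule would not separate contact from non-contact regions.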

2602.10858 2026-06-09 cs.CV 版本更新

Hyperspectral Smoke Segmentation via Mixture of Prototypes

基于原型混合的高光谱烟雾分割

Lujian Yao, Haitao Zhao, Xianghai Kong, Yuhan Xu

发表机构 * Automation Department, School of Information Science and Engineering, East China University of Science and Technology(华东理工大学信息科学与工程学院自动化系)

AI总结 针对烟雾分割中光谱信息不足、云干扰和半透明区域问题,提出首个高光谱烟雾分割数据集HSSDataset,并设计原型混合网络(MoP),通过波段分离、原型光谱表示和双阶段路由器实现自适应波段加权,在高光谱和多光谱模态上均取得优异性能。

Comments 31 pages, 14 figures

详情
AI中文摘要

烟雾分割对于野火管理和工业安全应用至关重要。传统的可见光方法由于光谱信息不足而面临局限性,特别是在处理云干扰和半透明烟雾区域时。为了解决这些挑战,我们引入高光谱成像进行烟雾分割,并提出了第一个高光谱烟雾分割数据集(HSSDataset),该数据集使用多对一标注协议,从20个真实场景的超过18,000帧中收集了精心标注的样本。然而,不同的光谱波段在空间区域上表现出不同的判别能力,因此需要自适应的波段加权策略。我们将此分解为三个技术挑战:光谱交互污染、有限的光谱模式建模和复杂的加权路由器问题。我们提出了一种原型混合(MoP)网络,包括:(1) 波段分离(BS)用于光谱隔离,(2) 基于原型的光谱表示(PSR)用于多样化模式,以及(3) 双阶段路由器(DSR)用于自适应空间感知波段加权。我们进一步构建了一个包含RGB-红外图像的多光谱数据集(MSSDataset)。大量实验验证了该方法在高光谱和多光谱模态上的优越性能,为基于光谱的烟雾分割建立了新的范式。

英文摘要

Smoke segmentation is critical for wildfire management and industrial safety applications. Traditional visible-light-based methods face limitations due to insufficient spectral information, particularly struggling with cloud interference and semi-transparent smoke regions. To address these challenges, we introduce hyperspectral imaging for smoke segmentation and present the first hyperspectral smoke segmentation dataset (HSSDataset) with carefully annotated samples collected from over 18,000 frames across 20 real-world scenarios using a Many-to-One annotation protocol. However, different spectral bands exhibit varying discriminative capabilities across spatial regions, necessitating adaptive band weighting strategies. We decompose this into three technical challenges: spectral interaction contamination, limited spectral pattern modeling, and complex weighting router problems. We propose a mixture of prototypes (MoP) network with: (1) band split (BS) for spectral isolation, (2) prototype-based spectral representation (PSR) for diverse patterns, and (3) dual-stage router (DSR) for adaptive spatial-aware band weighting. We further construct a multispectral dataset (MSSDataset) with RGB-infrared images. Extensive experiments validate superior performance across both hyperspectral and multispectral modalities, establishing a new paradigm for spectral-based smoke segmentation.

2602.03395 2026-06-09 cs.LG 版本更新

The Label Horizon Paradox: Rethinking Supervision Targets in Financial Forecasting

标签期限悖论:金融预测中监督目标的再思考

Chen-Hui Song, Shuoling Liu, Liyuan Chen

发表机构 * GitHub

AI总结 本文提出标签期限悖论(Label Horizon Paradox),指出最优监督信号常偏离预测目标,并基于动态信噪权衡理论,提出双层优化框架自动寻找最优代理标签,在金融数据集上取得一致改进。

详情
AI中文摘要

虽然深度学习通过复杂的架构革新了金融预测,但监督信号本身的设计却很少受到审视。我们挑战了训练标签必须严格对应推理目标的经典假设,揭示了标签期限悖论:最优监督信号往往偏离预测目标,并随市场动态在不同的中间预测期限之间移动。我们从理论上将这一现象归结为动态信噪权衡,证明泛化取决于边际信号实现与噪声积累之间的竞争。为了将这一见解付诸实践,我们提出了一个双层优化框架,能够在单次训练运行中自主识别最优代理标签。在大规模金融数据集上的大量实验表明,该方法相比传统基线取得了一致的改进,从而为金融预测中以标签为中心的研究开辟了新途径。

英文摘要

While deep learning has revolutionized financial forecasting through sophisticated architectures, the design of the supervision signal itself is rarely scrutinized. We challenge the canonical assumption that training labels must strictly mirror inference targets, uncovering the Label Horizon Paradox: the optimal supervision signal often deviates from the prediction goal, shifting across intermediate horizons governed by market dynamics. We theoretically ground this phenomenon in a dynamic signal-noise trade-off, demonstrating that generalization hinges on the competition between marginal signal realization and noise accumulation. To operationalize this insight, we propose a bi-level optimization framework that autonomously identifies the optimal proxy label within a single training run. Extensive experiments on large-scale financial datasets demonstrate consistent improvements over conventional baselines, thereby opening new avenues for label-centric research in financial forecasting.

2602.08785 2026-06-09 cs.LG 版本更新

A Graphop Analysis of Graph Neural Networks on Sparse Graphs: Generalization and Universal Approximation

稀疏图上图神经网络的图算子分析:泛化与通用逼近

Ofek Amran, Tom Gilat, Ron Levie

发表机构 * Faculty of Mathematics, Technion – Israel Institute of Technology(以色列理工学院数学系)

AI总结 提出统一度量空间,涵盖稀疏与稠密图,证明MPNN等度连续,从而改进通用逼近定理和泛化界。

详情
AI中文摘要

消息传递图神经网络(MPNN)的泛化和逼近能力通常通过在输入图空间上定义紧度量来研究,在该度量下MPNN是等度连续的。这类分析有两种:1)当度量空间包含无界大小的图时,该理论仅适用于稠密图;2)当研究稀疏图时,度量空间仅包含有界大小的图。在这项工作中,我们提出了一种统一的方法,在所有大小的图(包括稀疏和稠密)的空间上定义一个紧度量,在该度量下MPNN是等度连续的。这导致了比先前工作更强大的通用逼近定理和泛化界。该理论基于并扩展了最近一种称为图算子分析的图极限理论方法。

英文摘要

Generalization and approximation capabilities of message passing graph neural networks (MPNNs) are often studied by defining a compact metric on a space of input graphs under which MPNNs are equicontinuous. Such analyses are of two varieties: 1) when the metric space includes graphs of unbounded sizes, the theory is only appropriate for dense graphs, and, 2) when studying sparse graphs, the metric space only includes graphs of uniformly bounded size. In this work, we present a unified approach, defining a compact metric on the space of graphs of all sizes, both sparse and dense, under which MPNNs are equicontinuous. This leads to more powerful universal approximation theorems and generalization bounds than previous works. The theory is based on, and extends, a recent approach to graph limit theory called graphop analysis.

2602.08733 2026-06-09 cs.LG 版本更新

Foundation Inference Models for Ordinary Differential Equations

常微分方程的基础推理模型

Maximilian Mauel, Johannes R. Hübers, David Berghaus, Patrick Seifner, Ramses J. Sanchez

发表机构 * University of Cambridge(剑桥大学) ETH Zurich(苏黎世联邦理工学院)

AI总结 提出FIM-ODE,一种预训练的基础推理模型,通过单次前向传播从含噪轨迹直接预测向量场,实现零样本性能匹配并超越ODEFormer,微调后优于现代神经和GP基线。

Comments Published in ICML 2026

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)
AI中文摘要

常微分方程(ODE)是科学建模的核心,但从含噪轨迹中推断其向量场仍然具有挑战性。当前的方法,如符号回归、高斯过程(GP)回归和神经常微分方程,通常需要复杂的训练流程和大量的机器学习专业知识,或者严重依赖于系统特定的先验知识。我们提出FIM-ODE,一种预训练的基础推理模型,通过单次前向传播直接从含噪轨迹数据预测向量场,从而摊销低维ODE推理。我们在具有低次多项式向量场的ODE先验分布上预训练FIM-ODE,并用神经算子表示目标场。FIM-ODE实现了强大的零样本性能,在多种设置下匹配并常常优于最近的预训练符号基线ODEFormer,尽管使用了更简单的预训练先验分布。预训练还为微调提供了强大的初始化,实现了快速且稳定的适应,在不需要机器学习专业知识的情况下优于现代神经和GP基线。

英文摘要

Ordinary differential equations (ODEs) are central to scientific modelling, but inferring their vector fields from noisy trajectories remains challenging. Current approaches such as symbolic regression, Gaussian process (GP) regression, and Neural ODEs often require complex training pipelines and substantial machine learning expertise, or they depend strongly on system-specific prior knowledge. We propose FIM-ODE, a pretrained Foundation Inference Model that amortises low-dimensional ODE inference by predicting the vector field directly from noisy trajectory data in a single forward pass. We pretrain FIM-ODE on a prior distribution over ODEs with low-degree polynomial vector fields and represent the target field with neural operators. FIM-ODE achieves strong zero-shot performance, matching and often improving upon ODEFormer, a recent pretrained symbolic baseline, across a range of regimes despite using a simpler pretraining prior distribution. Pretraining also provides a strong initialisation for finetuning, enabling fast and stable adaptation that outperforms modern neural and GP baselines without requiring machine learning expertise.

2602.08235 2026-06-09 cs.CL cs.AI cs.CR 版本更新

When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents

当良性输入导致严重危害:引发计算机使用代理的不安全意外行为

Jaylen Jones, Zhehao Zhang, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun

发表机构 * DeepMind, London, UK(DeepMind,伦敦,英国) Stanford University, Stanford, CA, USA(斯坦福大学,斯坦福,加利福尼亚州,美国) UC Berkeley, Berkeley, CA, USA(加州大学伯克利分校,伯克利,加利福尼亚州,美国)

AI总结 提出AutoElicit框架,通过迭代扰动良性指令并利用CUA执行反馈,自动引发前沿CUAs(如Claude 4.5 Haiku等)的数百种有害意外行为,并验证其跨模型可迁移性。

Comments ICML 2026, Project Homepage: https://osu-nlp-group.github.io/AutoElicit/

详情
AI中文摘要

尽管计算机使用代理(CUA)在自动化日益复杂的操作系统工作流程方面具有巨大潜力,但即使在良性输入上下文中,它们也可能表现出偏离预期结果的不安全意外行为。然而,对此风险的探索仍主要停留在轶事层面,缺乏具体的特征描述和自动化方法,无法在现实CUA场景下主动发现长尾意外行为。为填补这一空白,我们首次提出了针对CUA意外行为的概念和方法框架,通过定义其关键特征、自动引发它们以及分析它们如何从良性输入中产生。我们提出了AutoElicit:一个代理框架,它使用CUA执行反馈迭代地扰动良性指令,并在保持扰动现实且良性的同时引发严重危害。使用AutoElicit,我们从最先进的CUA(如Claude 4.5 Haiku、Claude 4.5 Opus和Operator)中发现了数百种有害的意外行为。我们进一步评估了人工验证的成功扰动的可迁移性,识别出各种前沿CUA对意外行为的持续易感性。这项工作为在现实计算机使用环境中系统分析意外行为奠定了基础。

英文摘要

Although computer-use agents (CUAs) hold significant potential to automate increasingly complex OS workflows, they can demonstrate unsafe unintended behaviors that deviate from expected outcomes even under benign input contexts. However, exploration of this risk remains largely anecdotal, lacking concrete characterization and automated methods to proactively surface long-tail unintended behaviors under realistic CUA scenarios. To fill this gap, we introduce the first conceptual and methodological framework for unintended CUA behaviors, by defining their key characteristics, automatically eliciting them, and analyzing how they arise from benign inputs. We propose AutoElicit: an agentic framework that iteratively perturbs benign instructions using CUA execution feedback, and elicits severe harms while keeping perturbations realistic and benign. Using AutoElicit, we surface hundreds of harmful unintended behaviors from state-of-the-art CUAs such as Claude 4.5 Haiku, Claude 4.5 Opus, and Operator. We further evaluate the transferability of human-verified successful perturbations, identifying persistent susceptibility to unintended behaviors across various other frontier CUAs. This work establishes a foundation for systematically analyzing unintended behaviors in realistic computer-use settings.

2602.08222 2026-06-09 cs.AI 版本更新

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

弱驱动学习:弱智能体如何使强智能体更强

Zehao Chen, Gongxun Li, Tianxiang Ai, Zixuan Huang, Xiaodong Liu, Yifei Li, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban

发表机构 * Beihang University(北京航空航天大学) China Telecom eSurfing Cloud(中国电信天翼云)

AI总结 针对大语言模型后训练中的饱和瓶颈,提出WMSS方法,利用模型历史弱检查点通过熵动力学识别可恢复学习差距并进行补偿学习,在数学推理和代码生成任务上实现有效性能提升且无额外推理成本。

详情
AI中文摘要

随着后训练优化成为改进大语言模型的核心,我们观察到一种持续的饱和瓶颈:一旦模型变得高度自信,进一步训练带来的收益递减。虽然现有方法继续强化目标预测,但我们发现信息丰富的监督信号仍然潜藏在模型自身的历史弱状态中。受此观察启发,我们提出WMSS(弱智能体可以使强智能体更强),一种利用弱检查点指导持续优化的后训练范式。通过熵动力学识别可恢复的学习差距,并通过补偿学习强化它们,WMSS使强智能体能够超越传统的后训练饱和。在数学推理和代码生成数据集上的实验表明,使用我们的方法训练的智能体实现了有效的性能提升,同时不产生额外的推理成本。

英文摘要

As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.

2602.07345 2026-06-09 cs.CV cs.LG 版本更新

Optimizing Few-Step Generation with Adaptive Matching Distillation

自适应匹配蒸馏优化少步生成

Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, Zeke Xie

发表机构 * xLeaF Lab, The Hong Kong University of Science(xLeaF实验室,香港科学与技术大学) Harbin Institute of Technology, Shenzhen(哈尔滨工业大学,深圳) School of Intelligence Science(智能科学学院)

AI总结 提出自适应匹配蒸馏(AMD),通过奖励代理检测并逃离禁止区域,结合结构信号分解和排斥景观锐化,提升少步生成模型的样本保真度和训练鲁棒性。

Comments 25 pages, 15 figures, 11 tables

详情
AI中文摘要

分布匹配蒸馏(DMD)是一种强大的加速范式,但其稳定性常在禁止区域(真实教师提供不可靠指导而虚假教师施加不足排斥力的区域)中受到损害。在这项工作中,我们提出了一个统一的优化框架,将先前的方法重新解释为避免这些受损区域的隐式策略。基于这一见解,我们引入了自适应匹配蒸馏(AMD),一种利用奖励代理显式检测和逃离禁止区域的自我纠正机制。AMD通过结构信号分解动态优先考虑纠正梯度,并引入排斥景观锐化以强制执行陡峭的能量屏障,防止失败模式崩溃。在图像和视频生成任务(如SDXL、Wan2.1)以及严格基准测试(如VBench、GenEval)上的大量实验表明,AMD显著提高了样本保真度和训练鲁棒性。例如,AMD将SDXL上的HPSv2分数从30.64提升至31.25,优于最先进的基线。这些发现验证了在禁止区域内显式纠正优化轨迹对于推动少步生成模型性能上限至关重要。

英文摘要

Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones, regions where the real teacher provides unreliable guidance while the fake teacher exerts insufficient repulsive force. In this work, we propose a unified optimization framework that reinterprets prior art as implicit strategies to avoid these corrupted regions. Based on this insight, we introduce Adaptive Matching Distillation (AMD), a self-correcting mechanism that utilizes reward proxies to explicitly detect and escape Forbidden Zones. AMD dynamically prioritizes corrective gradients via structural signal decomposition and introduces Repulsive Landscape Sharpening to enforce steep energy barriers against failure mode collapse. Extensive experiments across image and video generation tasks (e.g., SDXL, Wan2.1) and rigorous benchmarks (e.g., VBench, GenEval) demonstrate that AMD significantly enhances sample fidelity and training robustness. For instance, AMD improves the HPSv2 score on SDXL from 30.64 to 31.25, outperforming state-of-the-art baselines. These findings validate that explicitly rectifying optimization trajectories within Forbidden Zones is essential for pushing the performance ceiling of few-step generative models.

2602.06307 2026-06-09 cs.CL 版本更新

Lost in Speech: Benchmarking, Evaluation, and Parsing of Spoken Bilingual Conversational Language Beyond Standard UD Assumptions

迷失在语音中:超越标准UD假设的口语双语会话的基准测试、评估与解析

Nemika Tyagi, Olga Kellert, Holly Hendrix, Nelvin Licona-Guevara, Justin Mackie, Phanos Kareen, Megan Michelle Smith, Tatiana Gallego Hernande, Samhitha Harish, Chitta Baral

发表机构 * Arizona State University(亚利桑那州立大学)

AI总结 针对口语双语会话中的不流利现象和话语驱动结构,提出SpokeBench基准、Flex-UD评估指标和DECAP解析框架,显著提升依存句法分析性能。

Comments 17 pages, 4 Figures

详情
AI中文摘要

口语双语会话给句法分析带来了重大挑战,因为它们通常包含不流利现象和话语驱动的结构,在标准的通用依存(Universal Dependencies,UD)假设和评估实践下,这些结构使依存句法分析变得复杂。为了系统研究这些挑战,本文首先引入了一个基于语言学的会话双语现象分类法,以及SpokeBench,一个由专家标注、面向结构复杂语音的英语-西班牙语基准。为了解决现有评估实践的局限性,我们提出了Flex-UD,一种歧义感知的评估指标,能够区分灾难性的结构失败和语言学上可接受的变体。最后,我们引入了DECAP,一种解耦的智能体式解析框架,将口语现象处理与核心句法分析分离,无需重新训练即可实现鲁棒且可解释的依存句法分析。在专有和开放权重大语言模型上的实验表明,DECAP在复杂的会话现象上显著提高了性能,在UPOS-F1分数上比基线提高了超过60%,而Flex-UD评估揭示了在标准的基于依存附着的指标下部分被掩盖的增益。

英文摘要

Spoken bilingual conversations pose substantial challenges for syntactic parsing because they often include disfluencies and discourse-driven structures that complicate dependency parsing under standard Universal Dependencies (UD) assumptions and evaluation practices. To systematically study these challenges, in this work, we first introduce a linguistically grounded taxonomy of conversational bilingual phenomena, together with SpokeBench, an expert-annotated English-Spanish benchmark for structurally complex speech. To address the limitations of existing evaluation practices, we propose Flex-UD, an ambiguity-aware evaluation metric that distinguishes catastrophic structural failures from linguistically acceptable variations. Finally, we introduce DECAP, a decoupled agentic parsing framework that separates spoken-phenomena handling from core syntactic analysis, enabling robust and interpretable dependency parsing without retraining. Experiments across both proprietary and open-weight LLMs show that DECAP substantially improves performance on complex conversational phenomena and achieves over 60% improvements in UPOS-F1 Score over baselines, while Flex-UD evaluations reveal gains that otherwise remain partially hidden under standard attachment-based metrics.

2602.05845 2026-06-09 cs.CV 版本更新

Self-Supervised Learning with a Multi-Task Latent Space Objective

基于多任务潜在空间目标的自监督学习

Pierre-François De Plaen, Abhishek Jha, Luc Van Gool, Tinne Tuytelaars, Marc Proesmans

发表机构 * ESAT-PSI, KU Leuven, Belgium(比利时鲁汶大学ESAT-PSI实验室) VIB.AI, KU Leuven, Belgium(比利时鲁汶大学VIB.AI实验室) CVL, ETH Zürich, Switzerland(苏黎世联邦理工学院CVL实验室) INSAIT, Sofia University, Bulgaria(保加利亚索菲亚大学INSAIT研究所) TRACE vzw(TRACE非营利组织)

AI总结 提出自预测孪生SSL的多任务公式,通过为每种空间变换分配专用预测器解决多裁剪训练失败问题,提升线性评估3.8-4%,并引入非对称遮挡(cutout)视图实现语义修复预训练。

详情
AI中文摘要

我们提出了自预测孪生SSL的多任务公式,其中每个空间变换定义了一个不同的潜在空间对齐任务,由共享编码器上的专用预测器解决。这一视角直接解释了BYOL、SimSiam和MoCo v3等自预测方法中多裁剪训练长期存在的失败原因:共享预测器被迫同时解决异构对齐任务,导致优化不稳定。为每种视图类型分配一个预测器解决了这种干扰,跨框架实现了3.8-4%的线性评估提升。该视角还提出了一种通过引入额外空间变换作为互补任务来丰富预训练的原则性方法。我们通过引入非对称遮挡(cutout)视图来证明这一点,其中被掩码的在线视图与完整的目标视图对齐,形成语义修复目标。所得框架稳定、与骨干网络无关,并持续提升ResNet和ViT模型在ImageNet和COCO上的性能。

英文摘要

We propose a multi-task formulation of self-predictive Siamese SSL in which each spatial transformation defines a distinct latent-space alignment task, solved by a dedicated predictor over a shared encoder. This perspective directly explains a long-standing failure of multi-crop training in self-predictive methods such as BYOL, SimSiam, and MoCo v3: a shared predictor is forced to solve heterogeneous alignment tasks simultaneously, leading to unstable optimization. Assigning one predictor per view type resolves this interference, unlocking linear evaluation gains of 3.8-4% across frameworks. This perspective also suggests a principled way to enrich pre-training by introducing additional spatial transformations as complementary tasks. We demonstrate this by introducing asymmetric cutout views, in which a masked online view is aligned with a complete target, forming a semantic inpainting objective. The resulting framework is stable, backbone-agnostic, and consistently improves the performance of ResNet and ViT models on ImageNet and COCO.

2602.05600 2026-06-09 cs.LG 版本更新

On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature

关于SGD噪声协方差与损失景观曲率之间的超线性关系

Yikuan Zhang, Ning Yang, Yuhai Tu

发表机构 * School of Physics, Peking University(北京大学物理学院) Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies(北京大学成都先进交叉生物技术研究院) Flatiron Institute(Flatiron研究所)

AI总结 本文发现SGD噪声协方差与Hessian矩阵之间存在超线性关系,而非简单的正比关系,并通过实验验证了逐层的幂律标度。

Comments 8 pages, 15 figures

详情
AI中文摘要

随机梯度下降(SGD)引入各向异性噪声,该噪声与损失景观的局部曲率相关,从而将优化偏向平坦最小值。先前的工作通常假设对于负对数似然损失,Fisher信息矩阵与Hessian矩阵等价,从而声称SGD噪声协方差$\mathbf{C}$与Hessian矩阵$\mathbf{H}$成正比。我们证明该假设仅在严格条件下成立,而这些条件在深度神经网络中通常不被满足。利用最近发现的Activity-Weight对偶性,我们找到了一个更一般的、与具体损失形式无关的关系,表明$\mathbf{C} \propto \mathbb{E}_p[\mathbf{h}_p^2]$,其中$\mathbf{h}_p$表示每个样本的Hessian矩阵,且$\mathbf{H} = \mathbb{E}_p[\mathbf{h}_p]$。因此,$\mathbf{C}$和$\mathbf{H}$近似对易(可交换),而非精确相等。我们进一步发现,在所分析的全连接层内,它们的对角元素遵循逐层的经验幂律$C_{ii} \propto H_{ii}^{\gamma}$,其中依赖于层的拟合指数满足$1 \leq \gamma \leq 2$。跨数据集、架构和损失函数的实验支持了所得的逐层界限,为深度学习中噪声-曲率关系提供了统一的刻画。

英文摘要

Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $\mathbf{C}$ is proportional to the Hessian $\mathbf{H}$. We show that this assumption holds only under restrictive conditions that are typically violated in deep neural networks. Using the recently discovered Activity-Weight Duality, we find a more general relationship agnostic to the specific loss formulation, showing that $\mathbf{C} \propto \mathbb{E}_p[\mathbf{h}_p^2]$, where $\mathbf{h}_p$ denotes the per-sample Hessian with $\mathbf{H} = \mathbb{E}_p[\mathbf{h}_p]$. As a consequence, $\mathbf{C}$ and $\mathbf{H}$ commute approximately rather than coincide exactly. We further find that, within the analyzed fully connected layers, their diagonal elements follow per-layer empirical power laws $C_{ii} \propto H_{ii}^{\gamma}$, with layer-dependent fitted exponents bounded by $1 \leq \gamma \leq 2$. Experiments across datasets, architectures, and loss functions support the resulting layerwise bounds, providing a unified characterization of the noise-curvature relationship in deep learning.
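As an illustration of what a fitted exponent in $C_{ii} \propto H_{ii}^{\gamma}$ means in practice, the sketch below estimates $\gamma$ as the least-squares slope of $\log C_{ii}$ against $\log H_{ii}$. This is only a minimal sketch: the abstract does not describe the fitting procedure, so the log-log regression, the function name `fit_power_law_exponent`, and the synthetic data (generated with a made-up exponent of 1.5) are all assumptions, not the authors' method or results.

```python
import math
import random


def fit_power_law_exponent(h_diag, c_diag):
    """Least-squares slope of log(C_ii) against log(H_ii).

    Assumes every entry is strictly positive so the logarithms exist.
    """
    xs = [math.log(h) for h in h_diag]
    ys = [math.log(c) for c in c_diag]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic stand-in for one layer (hypothetical values, not from the paper):
# C_ii = 0.5 * H_ii**1.5 with mild multiplicative noise.
rng = random.Random(0)
true_gamma = 1.5
h_diag = [10 ** rng.uniform(-3, 1) for _ in range(200)]
c_diag = [0.5 * h ** true_gamma * math.exp(rng.gauss(0, 0.05)) for h in h_diag]

gamma_hat = fit_power_law_exponent(h_diag, c_diag)
print(f"fitted exponent: {gamma_hat:.3f}")
```

With real measurements, `h_diag` and `c_diag` would hold the diagonal entries of the Hessian and of the noise covariance for a single layer, and the fit would be repeated for each layer.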

2602.05175 2026-06-09 cs.CV 版本更新

Enhancing Adversarial Robustness with Signed Distance Fields for Harmonizing Geometric Invariance and Texture

利用符号距离场增强对抗鲁棒性以协调几何不变性与纹理

Zhe Li, Bernhard Kainz

发表机构 * Department AIBE, FAU Erlangen-Nürnberg(AIBE部门,埃朗根-纽伦堡大学)

AI总结 提出GeoTexPuri框架,通过符号距离场将离散图像掩码转化为连续空间场,在训练中融合几何与纹理特征,实现高效对抗净化,在ImageNet上取得84.79%干净准确率和83.52%鲁棒准确率。

Comments 14 pages, 6 figures

详情
AI中文摘要

深度神经网络在视觉识别中表现出色,但仍极易受到难以察觉的对抗攻击。现有的防御策略如对抗训练和基于扩散的净化已取得显著进展,但常受限于高计算成本、信息丢失和推理延迟。为解决这些挑战,我们提出了一种几何与纹理平衡净化(GeoTexPuri)框架,通过协调不变的几何结构与纹理特征来增强对抗鲁棒性。具体而言,该框架通过符号距离场(SDF)将离散图像掩码转化为连续空间场,在训练阶段融入密集几何引导。这一过程建立了稳定的结构锚点,使模型免受局部像素噪声干扰。通过多流训练目标,模型学会内化净化后的表示,有效将语义纹理线索与这些底层几何不变量对齐。在ImageNet上的大量实验证明了我们方法的有效性。GeoTexPuri在AutoAttack下实现了84.79%的干净准确率和83.52%的鲁棒准确率。关键在于,GeoTexPuri在推理时作为确定性分类器运行,仅需输入图像,无需任何辅助几何模块或额外计算成本,从而为实时应用提供了可扩展且高效的解决方案。

英文摘要

Deep neural networks demonstrate impressive performance in visual recognition but remain highly vulnerable to imperceptible adversarial attacks. Existing defense strategies such as adversarial training and diffusion-based purification have achieved significant progress but are frequently constrained by high computational cost, information loss, and inference latency. To address these challenges, we propose a Geometric and Texture balancing Purification (GeoTexPuri) framework that enhances adversarial robustness by harmonizing invariant geometric structures with textural features. Specifically, the framework integrates dense geometric guidance into the training phase by transforming discrete image masks into continuous spatial fields via Signed Distance Fields (SDF). This process establishes stable structural anchors that shield the model from local pixel noise. Through a multi-stream training objective, the model learns to internalize purified representations that effectively align semantic textural cues with these underlying geometric invariants. Extensive experiments on ImageNet demonstrate the efficacy of our approach. GeoTexPuri achieves 84.79% clean accuracy and 83.52% robust accuracy under AutoAttack. Crucially, GeoTexPuri functions as a deterministic classifier during inference, requiring only the input image without any auxiliary geometric modules or additional computational costs, thereby ensuring a scalable and efficient solution for real-time applications.
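To make the mask-to-field step concrete, here is a minimal sketch of turning a discrete binary mask into a signed distance field. It is a generic brute-force construction, not GeoTexPuri's implementation: the function name `signed_distance_field`, the sign convention (negative inside the mask, positive outside), and the pixel-centre distance definition are assumptions chosen for illustration.

```python
import math


def signed_distance_field(mask):
    """Brute-force SDF over a small binary mask.

    Each pixel gets the Euclidean distance to the nearest pixel of the
    opposite class: negative inside the mask, positive outside (assumed
    convention). The mask must contain both foreground and background.
    """
    h, w = len(mask), len(mask[0])
    inside = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    outside = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    field = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            targets = outside if mask[r][c] else inside
            d = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
            field[r][c] = -d if mask[r][c] else d
    return field


# Toy 5x5 mask with a 3x3 foreground square (made-up example data).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdf = signed_distance_field(mask)
for row in sdf:
    print(" ".join(f"{v:6.2f}" for v in row))
```

Unlike the 0/1 mask, the field varies smoothly with distance from the object boundary, which is the sense in which it provides dense, continuous geometric guidance. The brute-force search is quadratic in the number of pixels, so real image sizes would call for a proper distance transform.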

2602.03224 2026-06-09 cs.AI cs.LG 版本更新

TAME: A Trustworthy Test-Time Evolution of Agent Memory with Systematic Benchmarking

TAME: 一种可信的智能体记忆测试时演化与系统化基准测试

Yu Cheng, Yongkang Hu, Jiuan Zhou, Yushuo Zhang, Yihang Chen, Huichi Zhou, Mingang Chen, Zhizhong Zhang, Kun Shao, Yuan Xie, Zhaoxia Yin

发表机构 * East China Normal University(华东师范大学) Shanghai Innovation Institute(上海创新研究院) Shanghai Key Laboratory of Computer Software Evaluating and Testing(上海计算机软件评测与测试重点实验室) Huawei Noah’s Ark Lab(华为诺亚方舟实验室)

AI总结 提出TAME框架,通过执行器-评估器循环实现记忆的可信演化,解决良性任务演化中智能体可信度下降问题,在GPT-5.2 AIME上准确率提升14.6个百分点。

详情
AI中文摘要

智能体记忆的测试时演化代表了推进AGI的关键范式,因为它通过经验积累增强复杂推理,而无需参数更新。然而,即使在良性任务演化过程中,智能体的安全对齐仍然脆弱,这种现象被称为智能体记忆误演化。为了评估这一现象,我们构建了Trust-Memevo基准测试,并发现智能体在良性任务演化过程中,多个任务的可信度整体下降。为了解决这个问题,我们提出了TAME,一个可信感知的记忆演化框架,其中共享记忆库由执行器和评估器共同管理。执行器检索并应用可迁移经验以支持任务求解,而评估器评估每个使用经验对结果的贡献,并产生可信感知的反馈以指导后续记忆使用。这种执行器-评估器循环使得记忆能够随时间被选择性强化、谨慎重用和持续扩展。实验表明,TAME在实现强任务性能的同时缓解了记忆误演化。特别是在GPT-5.2 AIME基准测试上,TAME相比现有最强方法准确率提高了14.6个百分点,并保持了有竞争力的可信度。

英文摘要

Test-time evolution of agent memory represents a pivotal paradigm for advancing AGI, as it strengthens complex reasoning through experience accumulation without requiring parameter updates. However, even during benign task evolution, agent safety alignment remains vulnerable, a phenomenon known as Agent Memory Misevolution. To evaluate this phenomenon, we construct the Trust-Memevo benchmark and find that agents exhibit an overall decline in trustworthiness across multiple tasks during benign task evolution. To address this issue, we propose TAME, a trust-aware memory evolution framework in which a shared memory bank is jointly governed by an Executor and an Evaluator. The Executor retrieves and applies transferable experiences to support task solving, while the Evaluator assesses the contribution of each utilized experience to the outcome and produces trust-aware feedback to guide subsequent memory use. This executor-evaluator loop enables memory to be selectively reinforced, cautiously reused, and continuously expanded over time. Experiments show that TAME mitigates memory misevolution while achieving strong task performance. In particular, on the GPT-5.2 AIME benchmark, TAME improves accuracy by 14.6 percentage points over the strongest existing method and maintains competitive trustworthiness.

2602.02572 2026-06-09 cs.LG cs.AI 版本更新

Reward Shaping for (Inference-Time) Alignment: A Stackelberg Game Perspective

奖励塑形用于(推理时)对齐:一个Stackelberg博弈视角

Haichuan Wang, Tao Lin, Lingkai Kong, Ce Li, Hezi Jiang, Milind Tambe

发表机构 * University of Southern California(南加州大学)

AI总结 针对KL正则化导致LLM继承基策略偏见的问题,提出将奖励模型优化形式化为Stackelberg博弈,并通过简单奖励塑形方案近似最优奖励模型,在推理时对齐中持续提升平均奖励并达到超过66%的胜平率。

Comments Accepted to ICML 2026. Camera-ready version

详情
AI中文摘要

现有的对齐方法直接使用从用户偏好数据中学习到的奖励模型来优化LLM策略,并相对于基策略进行KL正则化。这种做法对于最大化用户效用是次优的,因为KL正则化可能导致LLM继承基策略中与用户偏好冲突的偏见。虽然放大偏好输出的奖励可以减轻这种偏见,但也增加了奖励黑客(reward hacking)的风险。这种权衡引出了在KL正则化下如何最优设计奖励模型的问题。我们将这个奖励模型优化问题形式化为一个Stackelberg博弈,并表明一个简单的奖励塑形方案可以有效近似最优奖励模型。我们在推理时对齐设置中对方法进行了实证评估,并证明它可以无缝集成到现有的对齐方法中,且开销最小。我们的方法持续提高了平均奖励,并且相对于所有基线,在各评估设置上平均达到了超过66%的胜平率(win-tie rate)。

英文摘要

Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy. This practice is suboptimal for maximizing user's utility because the KL regularization may cause the LLM to inherit the bias in the base policy that conflicts with user preferences. While amplifying rewards for preferred outputs can mitigate this bias, it also increases the risk of reward hacking. This tradeoff motivates the problem of optimally designing reward models under KL regularization. We formalize this reward model optimization problem as a Stackelberg game, and show that a simple reward shaping scheme can effectively approximate the optimal reward model. We empirically evaluate our method in inference-time alignment settings and demonstrate that it integrates seamlessly into existing alignment methods with minimal overhead. Our method consistently improves average reward and achieves win-tie rates exceeding 66% against all baselines, averaged across evaluation settings.
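The bias-inheritance argument can be seen in a two-response toy example. For a KL-regularized objective of the form $\max_\pi \mathbb{E}_\pi[r] - \beta\,\mathrm{KL}(\pi \,\|\, \pi_{\text{base}})$, the maximizer has the standard closed form $\pi^*(y) \propto \pi_{\text{base}}(y)\exp(r(y)/\beta)$. The sketch below evaluates that formula on made-up numbers. The uniform reward scaling stands in for "amplifying rewards" and is an assumption for illustration, not the shaping scheme proposed in the paper.

```python
import math


def kl_regularized_policy(base_probs, rewards, beta):
    """Closed-form maximizer of E[r] - beta * KL(pi || base) over a finite set."""
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)
    return [w / z for w in weights]


base = [0.9, 0.1]      # hypothetical base policy, biased toward response 0
rewards = [0.0, 1.0]   # hypothetical reward model: the user prefers response 1
beta = 1.0

aligned = kl_regularized_policy(base, rewards, beta)
# Amplify the learned reward by an arbitrary factor of 3 (illustrative only).
amplified = kl_regularized_policy(base, [3.0 * r for r in rewards], beta)

print("aligned:  ", [round(p, 3) for p in aligned])
print("amplified:", [round(p, 3) for p in amplified])
```

With the unscaled reward, the aligned policy still places most of its mass on response 0, so the base policy's bias survives alignment. Only the amplified reward flips the preference, and larger amplification is also what makes reward hacking more likely.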

2511.06644 2026-06-09 cs.CV 版本更新

UniADC: A Unified Framework for Anomaly Detection and Classification

UniADC:统一异常检测与分类框架

Ximiao Zhang, Min Xu, Zheng Zhang, Yap-Peng Tan, Xiuzhuang Zhou

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) China University of Mining(中国矿业大学) VinUniversity(越南VinUniversity大学)

AI总结 提出UniADC模型,通过无需训练的可控修复网络和隐式正常判别器,同时实现异常区域检测与类别识别,在少样本甚至零样本下超越现有方法。

详情
AI中文摘要

在本文中,我们引入了一个称为统一异常检测与分类的新任务,旨在同时检测图像中的异常区域并识别其具体类别。现有方法通常将异常检测和分类视为独立任务,从而忽略了它们的内在关联并限制了信息共享,导致性能次优。为了解决这个问题,我们提出了UniADC,一个设计用于在仅有少量甚至没有异常图像的情况下有效执行这两项任务的模型。具体来说,UniADC由两个关键组件组成:一个无需训练的可控修复网络(Controllable Inpainting Network)和一个隐式正常判别器(Implicit-Normal Discriminator)。修复网络可以在异常先验的指导下重绘正常区域,从而合成特定类别的异常图像,也可以重绘少样本异常图像以扩充可用的异常数据。隐式正常判别器通过隐式建模正常状态来应对正常与异常像素分布严重不平衡的挑战,并通过将细粒度图像特征与异常类别嵌入对齐来实现精确的异常检测和分类。我们在四个异常检测与分类数据集(包括MVTec-FS、MTD、WFDD和Real-IAD)上进行了大量实验,结果表明UniADC在异常检测、定位和分类方面始终优于现有方法。代码可在以下网址获取:https://github.com/cnulab/UniADC

英文摘要

In this paper, we introduce a novel task termed unified anomaly detection and classification, which aims to simultaneously detect anomalous regions in images and identify their specific categories. Existing methods typically treat anomaly detection and classification as separate tasks, thereby neglecting their inherent correlations and limiting information sharing, which results in suboptimal performance. To address this, we propose UniADC, a model designed to effectively perform both tasks with only a few or even no anomaly images. Specifically, UniADC consists of two key components: a training-free Controllable Inpainting Network and an Implicit-Normal Discriminator. The inpainting network can synthesize anomaly images of specific categories by repainting normal regions guided by anomaly priors, and can also repaint few-shot anomaly samples to augment the available anomaly data. The implicit-normal discriminator addresses the severe challenge of the imbalance between normal and anomalous pixel distributions by implicitly modeling the normal state, achieving precise anomaly detection and classification by aligning fine-grained image features with anomaly-category embeddings. We conduct extensive experiments on four anomaly detection and classification datasets, including MVTec-FS, MTD, WFDD and Real-IAD, and the results demonstrate that UniADC consistently outperforms existing methods in anomaly detection, localization, and classification. The code is available at https://github.com/cnulab/UniADC.

2509.17078 2026-06-09 cs.CV 版本更新

Enhanced Detection of Tiny Objects in Aerial Images

航拍图像中微小目标的增强检测

Kihyun Kim, Michalis Lazarou, Tania Stathaki

发表机构 * Dept. of Electrical & Electronic Engineering, Imperial College London; Center for Vision, Speech and Signal Processing, University of Surrey

AI总结 针对YOLOv8在航拍图像中检测微小目标性能不足的问题,提出四种增强策略,并设计MoonNet管道,通过集成多种注意力模块提升检测精度,在微小目标基准上达到最优性能。

Comments Accepted at IEEE ICIP 2026

详情
AI中文摘要

虽然像YOLOv8这样的单阶段检测器训练速度快,但作为权衡,它们在小目标检测上往往表现不佳。在航拍图像中检测微小目标时,由于目标分辨率低和背景杂乱,这一问题变得更加关键。为了解决这个问题,我们引入了四种可以轻松在YOLOv8上实现的增强策略:输入图像分辨率调整、数据增强、注意力机制以及注意力模块的替代门控函数。我们证明,增大图像尺寸和适当使用数据增强可以带来性能提升。此外,我们设计了一个混合正交神经模块网络(MoonNet)管道,该管道由多个注意力模块增强的CNN组成。两个著名的注意力模块,Squeeze-and-Excitation(SE)块和卷积块注意力模块(CBAM),被集成到YOLOv8的主干中,形成了MoonNet设计;与原始YOLOv8主干和单一类型注意力模块增强的主干相比,MoonNet主干获得了更高的检测精度。MoonNet与YOLC模型集成后,在微小目标基准上实现了最先进的性能,进一步证明了其适应性和潜力。我们的代码可在以下网址获取:https://github.com/Kihyun11/MoonNet

英文摘要

While one-stage detectors like YOLOv8 offer fast training speed, they often underperform at detecting small objects as a trade-off. This becomes even more critical when detecting tiny objects in aerial imagery due to low-resolution targets and cluttered backgrounds. To address this, we introduce four enhancement strategies (input image resolution adjustment, data augmentation, attention mechanisms, and an alternative gating function for attention modules) that can be easily implemented on YOLOv8. We demonstrate that enlarging the input image size and using augmentation properly can improve detection performance. Additionally, we design a Mixture of Orthogonal Neural-modules Network (MoonNet) pipeline, which consists of multiple attention-module-augmented CNNs. Two well-known attention modules, the Squeeze-and-Excitation (SE) block and the Convolutional Block Attention Module (CBAM), are integrated into the YOLOv8 backbone to form the MoonNet design, and the MoonNet backbone achieves higher detection accuracy than the original YOLOv8 backbone and backbones augmented with a single type of attention module. MoonNet further demonstrates its adaptability and potential by achieving state-of-the-art performance on a tiny-object benchmark when integrated with the YOLC model. Our code is available at: https://github.com/Kihyun11/MoonNet

2505.20137 2026-06-09 cs.LG cs.AI 版本更新

ePC: Fast and Deep Predictive Coding in Digital Simulation

ePC:数字仿真中的快速深度预测编码

Cédric Goemaere, Gaspard Oliviers, Rafal Bogacz, Thomas Demeester

发表机构 * IDLab, Ghent University – imec, Belgium(比利时根特大学–imec,IDLab) Brain Network Dynamics Unit, University of Oxford, UK(英国牛津大学脑网络动力学研究单元)

AI总结 提出基于误差的预测编码(ePC),通过重新参数化解决规范的基于状态的预测编码(sPC)在数字仿真中的指数信号衰减问题,实现与反向传播相当的深度模型训练性能。

Comments Accepted at ICML 2026 - Main Track. All code available at https://github.com/cgoemaere/error_based_PC

详情
AI中文摘要

预测编码(PC)为神经网络训练提供了一种受大脑启发的反向传播替代方案,它被描述为一个最小化自身内部能量的物理系统。然而,在实践中,PC主要通过数字仿真实现,需要大量的计算,同时难以扩展到更深的架构。本文重新表述了PC以克服这种硬件-算法不匹配。首先,我们揭示了规范的基于状态的PC(sPC)在数字仿真中从设计上就极其低效,不可避免地导致指数级信号衰减,从而使整个最小化过程停滞。然后,为了克服这一根本限制,我们引入了基于误差的PC(ePC),这是一种不会遭受信号衰减的新的PC重新参数化。虽然不再具有生物合理性,但ePC能够数值计算出精确的PC权重梯度,运行速度比sPC快几个数量级。跨多个架构和数据集的实验表明,即使在sPC难以处理的更深模型中,ePC也能匹配反向传播的性能。除了实际改进,我们的工作还提供了对PC动力学的理论洞察,并为在数字硬件及更广泛领域将基于PC的学习扩展到更深架构奠定了基础。

英文摘要

Predictive Coding (PC) offers a brain-inspired alternative to backpropagation for neural network training, described as a physical system minimizing its internal energy. However, in practice, PC is predominantly digitally simulated, requiring excessive amounts of compute while struggling to scale to deeper architectures. This paper reformulates PC to overcome this hardware-algorithm mismatch. First, we uncover how the canonical state-based formulation of PC (sPC) is, by design, deeply inefficient in digital simulation, inevitably resulting in exponential signal decay that stalls the entire minimization process. Then, to overcome this fundamental limitation, we introduce error-based PC (ePC), a novel reparameterization of PC which does not suffer from signal decay. Though no longer biologically plausible, ePC numerically computes exact PC weight gradients and runs orders of magnitude faster than sPC. Experiments across multiple architectures and datasets demonstrate that ePC matches backpropagation's performance even for deeper models where sPC struggles. Besides practical improvements, our work provides theoretical insight into PC dynamics and establishes a foundation for scaling PC-based learning to deeper architectures on digital hardware and beyond.