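arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.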

2603.08000 2026-06-02 cs.CL cs.LG

SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning

Chenzhi Hu, Qinzhe Hu, Yuhang Xu, Junyi Chen, Ruijie Wang, Shengzhong Liu, Jianxin Li, Fan Wu, Guihai Chen

Affiliations: Tsinghua University

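AI summary: To address redundant output in large reasoning models, proposes SmartThinker, a GRPO-based progressive CoT length calibration method that dynamically estimates the optimal length and modulates the length reward coefficient, compressing response length while improving accuracy.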

Comments Accepted by ICML 2026, 18 pages, 13 figures

Abstract

Large reasoning models (LRMs) like OpenAI o1 and DeepSeek-R1 achieve high accuracy on complex tasks by adopting long chain-of-thought (CoT) reasoning paths. However, the inherent verbosity of these processes frequently results in redundancy and overthinking. To address this issue, existing works leverage Group Relative Policy Optimization (GRPO) to reduce LRM output length, but their static length reward design cannot dynamically adapt according to the relative problem difficulty and response length distribution, causing over-compression and compromised accuracy. Therefore, we propose SmartThinker, a novel GRPO-based efficient reasoning method with progressive CoT length calibration. SmartThinker makes a two-fold contribution: First, it dynamically estimates the optimal length with peak accuracy during training and guides overlong responses toward it to reduce response length while sustaining accuracy. Second, it dynamically modulates the length reward coefficient to avoid the unwarranted penalization of correct reasoning paths. Extensive experiment results show that SmartThinker achieves up to 52.5% average length compression with improved accuracy, and achieves up to 16.6% accuracy improvement on challenging benchmarks like AIME25. The source code can be found at https://github.com/SJTU-RTEAS/SmartThinker.
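
The length-calibration idea can be illustrated with a minimal sketch. It is not the authors' implementation: the binning rule, the function names `estimate_optimal_length` and `length_reward`, the linear penalty, and the coefficient `alpha` are all assumptions made for illustration. The sketch picks a target length as the length bin with the highest empirical accuracy in a group of sampled responses, then penalizes only the tokens beyond that target.

```python
from collections import defaultdict

def estimate_optimal_length(lengths, correct, bin_size=100):
    """Return the centre of the length bin with the highest accuracy.

    Hypothetical helper: ties are broken in favour of the shorter bin.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n, ok in zip(lengths, correct):
        b = n // bin_size
        totals[b] += 1
        hits[b] += int(ok)
    best = max(totals, key=lambda b: (hits[b] / totals[b], -b))
    return best * bin_size + bin_size // 2

def length_reward(length, is_correct, target, alpha=0.5):
    """Accuracy reward minus a capped penalty on tokens beyond the target."""
    base = 1.0 if is_correct else 0.0
    overshoot = max(0, length - target) / max(target, 1)
    return base - alpha * min(overshoot, 1.0)

# Made-up group of sampled responses: token counts and correctness flags.
lengths = [120, 180, 450, 470, 900, 950]
correct = [False, False, True, True, True, False]
target = estimate_optimal_length(lengths, correct)
rewards = [length_reward(n, ok, target) for n, ok in zip(lengths, correct)]
```

In this toy group the short responses are wrong and the mid-length ones are right, so the target lands in the mid-length bin and only the two longest responses are penalized. Responses at or below the target are never penalized, which is one simple way to guide overlong responses toward the target without rewarding over-compression.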

2603.07578 2026-06-02 cs.RO

From Features to Actions: Explainability in Traditional and Agentic AI Systems

Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Ahmed Y. Radwan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza

Affiliations: Vector Institute for Artificial Intelligence; Independent Researcher; Mayo Clinic

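AI summary: Compares attribution-based explanations with trace-based diagnostics in static and agentic settings, finding that attribution methods cannot reliably diagnose execution-level failures in agentic trajectories, whereas trajectory-level explainability better localizes behavioural breakdowns.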

Abstract

Over the last decade, Explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories. In these settings, success and failure are determined by sequences of decisions rather than a single output. It remains unclear how explanation approaches designed for static predictions translate to agentic settings where behaviour emerges over time. In this work, we bridge this gap by comparing attribution-based explanations with trace-based diagnostics across both settings. Our results show that while attribution methods achieve stable feature rankings in static settings (Spearman $\rho$ = 0.86), they cannot be applied reliably to diagnose execution-level failures in agentic trajectories. In contrast, trace-grounded rubric evaluation for agentic settings consistently localizes behaviour breakdowns and reveals that state tracking inconsistency is 2.7x more prevalent in failed runs and reduces success probability by 49%. These findings motivate a shift towards trajectory-level explainability for evaluating and diagnosing autonomous AI behaviour in agentic systems. Code: https://github.com/VectorInstitute/unified-xai-evaluation-framework Project page: https://vectorinstitute.github.io/unified-xai-evaluation-framework
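
For readers unfamiliar with the stability metric quoted in the abstract, Spearman's rank correlation between two feature-importance rankings can be computed as below. This is a generic sketch, not the paper's evaluation code: the attribution scores are made up, and the closed-form formula used here assumes there are no tied scores.

```python
def ranks(scores):
    """Rank scores from 1 (largest) to n; assumes no tied values."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman_rho(a, b):
    """Spearman correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical attribution scores for the same five features in two runs.
run_1 = [0.40, 0.25, 0.15, 0.12, 0.08]
run_2 = [0.38, 0.14, 0.27, 0.11, 0.10]
rho = spearman_rho(run_1, run_2)
```

Here the two runs disagree only on the order of the second and third features, so the correlation stays close to 1. A value near 1 means the feature ranking is stable across runs; it says nothing about whether the ranking explains a multi-step failure, which is the gap the abstract points to.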

2603.04828 2026-06-02 cs.CL

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng, Yanyan Lan

Affiliations: Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University; School of Artificial Intelligence, Beihang University; Institute for AI Industry Research (AIR), Tsinghua University

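AI summary: Proposes GDS, which distinguishes pre-training member data from non-member data by analysing a target sample's gradient deviation scores (update magnitude, update location, and neuron activation concentration), giving efficient pre-training data detection that transfers across datasets.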

Comments 17 pages, 8 figures

Abstract

Pre-training data detection for LLMs is essential for addressing copyright concerns and mitigating benchmark contamination. Existing methods mainly focus on the likelihood-based statistical features or heuristic signals before and after fine-tuning, but the former are susceptible to word frequency bias in corpora, and the latter strongly depend on the similarity of fine-tuning data. From an optimization perspective, we observe that during training, samples transition from unfamiliar to familiar in a manner reflected by systematic differences in gradient behavior. Familiar samples exhibit smaller update magnitudes, distinct update locations in model components, and more sharply activated neurons. Based on this insight, we propose GDS, a method that identifies pre-training data by probing Gradient Deviation Scores of target samples. Specifically, we first represent each sample using gradient profiles that capture the magnitude, location, and concentration of parameter updates across FFN and Attention modules, revealing consistent distinctions between member and non-member data. These features are then fed into a lightweight classifier to perform binary membership inference. Experiments on five public datasets show that GDS achieves state-of-the-art performance with significantly improved cross-dataset transferability over strong baselines. Further interpretability analyses reveal differences in gradient distributions, and the semi-supervised results offer a practical way to detect pre-training data.

2603.04430 2026-06-02 cs.LG

Flowers: A Warp Drive for Neural PDE Solvers

Till Muser, Alexandra Spitzer, Matti Lassas, Maarten V. de Hoop, Ivan Dokmanić

Affiliations: ETH Zurich; University of Helsinki; University of California, Berkeley

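AI summary: Proposes the Flowers architecture, which achieves adaptive global interactions at linear cost through multihead warp fields and outperforms Fourier, convolution, and attention baselines on 2D/3D time-dependent PDE benchmarks.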

Abstract

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost. We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolution, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models that use far more parameters, data, and training compute.
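
The warp operation described in the abstract can be sketched on a 1D grid. This is a toy illustration under stated assumptions, not the Flowers implementation: the linear displacement predictor (`weight`, `bias`), the linear interpolation with clamping at the boundary, and the averaging of head outputs are all choices made here for brevity.

```python
def sample_linear(values, x):
    """Linearly interpolate `values` at fractional index x, clamped to the grid."""
    x = min(max(x, 0.0), len(values) - 1.0)
    lo = int(x)
    hi = min(lo + 1, len(values) - 1)
    t = x - lo
    return (1 - t) * values[lo] + t * values[hi]

def warp_head(features, weight, bias):
    """One warp head on a 1D grid.

    The displacement at point i depends only on features[i] (pointwise,
    no spatial aggregation); the output reads one source sample per point.
    """
    out = []
    for i, f in enumerate(features):
        displacement = weight * f + bias   # hypothetical pointwise predictor
        out.append(sample_linear(features, i + displacement))
    return out

def multihead_warp(features, heads):
    """Average the head outputs; cost is O(len(features) * len(heads))."""
    warped = [warp_head(features, w, b) for w, b in heads]
    return [sum(col) / len(heads) for col in zip(*warped)]

u = [0.0, 1.0, 2.0, 3.0, 4.0]
shifted = warp_head(u, weight=0.0, bias=1.0)      # pure shift by one cell
mixed = multihead_warp(u, heads=[(0.0, 1.0), (0.0, -1.0)])
```

Each output point reads a single source location per head, so the cost grows linearly with the grid size, yet the source location can be arbitrarily far away. That is the sense in which a warp can be global without an all-pairs interaction such as attention.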

2602.23694 2026-06-02 cs.RO cs.AI

Interpretable Multimodal Gesture Recognition for Drone and Mobile Robot Teleoperation via Log-Likelihood Ratio Fusion

Seungyeol Baek, Jaspreet Singh, Lala Shakti Swarup Ray, Hymalai Bello, Paul Lukowicz, Sungho Suh

Affiliations: Department of Artificial Intelligence, Korea University; Department of Computer Science, RPTU Kaiserslautern-Landau; Embedded Intelligence, German Research Center for Artificial Intelligence (DFKI)

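AI summary: Proposes a multimodal gesture recognition framework that fuses inertial data from wrist-worn Apple Watches with capacitive sensing signals from custom gloves, using a log-likelihood ratio late fusion strategy to improve performance and provide interpretability, and matching a vision-based baseline at lower computational cost.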

Abstract

Human operators are still frequently exposed to hazardous environments such as disaster zones and industrial facilities, where intuitive and reliable teleoperation of mobile robots and Unmanned Aerial Vehicles (UAVs) is essential. In this context, hands-free teleoperation enhances operator mobility and situational awareness, thereby improving safety in hazardous environments. While vision-based gesture recognition has been explored as one method for hands-free teleoperation, its performance often deteriorates under occlusions, lighting variations, and cluttered backgrounds, limiting its applicability in real-world operations. To overcome these limitations, we propose a multimodal gesture recognition framework that integrates inertial data (accelerometer, gyroscope, and orientation) from Apple Watches on both wrists with capacitive sensing signals from custom gloves. We design a late fusion strategy based on the log-likelihood ratio (LLR), which not only enhances recognition performance but also provides interpretability by quantifying modality-specific contributions. To support this research, we introduce a new dataset of 20 distinct gestures inspired by aircraft marshalling signals, comprising synchronized RGB video, IMU, and capacitive sensor data. Experimental results demonstrate that our framework achieves performance comparable to a state-of-the-art vision-based baseline while significantly reducing computational cost, model size, and training time, making it well suited for real-time robot control. We therefore underscore the potential of sensor-based multimodal fusion as a robust and interpretable solution for gesture-driven mobile robot and drone teleoperation.
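
A minimal sketch of late fusion with log-likelihood ratios follows. The abstract does not give the exact formulation, so this is one simple way to do it and an assumption throughout: each modality's classifier outputs class posteriors, each posterior is turned into a "class versus not class" LLR, and the LLRs are summed across modalities. The modality names and probabilities are invented.

```python
import math

def llr(prob, eps=1e-9):
    """Log-likelihood ratio of 'class' vs 'not class' from a posterior."""
    p = min(max(prob, eps), 1 - eps)
    return math.log(p / (1 - p))

def fuse(posteriors):
    """Sum per-modality LLRs for each class.

    posteriors: {modality: [p(class_0), p(class_1), ...]}
    Returns (predicted class, fused scores, per-modality LLRs for that class).
    """
    n_classes = len(next(iter(posteriors.values())))
    per_modality = {m: [llr(p) for p in probs] for m, probs in posteriors.items()}
    fused = [sum(per_modality[m][c] for m in per_modality) for c in range(n_classes)]
    pred = max(range(n_classes), key=fused.__getitem__)
    contributions = {m: per_modality[m][pred] for m in per_modality}
    return pred, fused, contributions

posteriors = {
    "imu":        [0.70, 0.20, 0.10],   # made-up classifier outputs
    "capacitive": [0.40, 0.50, 0.10],
}
pred, fused, contributions = fuse(posteriors)
```

Because the fused score is a plain sum, the per-modality terms can be read off directly: in this toy case the IMU term is positive and the capacitive term is negative for the predicted gesture, which shows which sensor supported the decision and which argued against it.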

2602.22101 2026-06-02 cs.LG cs.AI

On Imbalanced Regression with Hoeffding Trees

Pantia-Marina Alchirch, Dimitrios I. Diochnos

Affiliations: University of Oklahoma

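AI summary: For imbalanced regression on data streams, extends kernel density estimation (KDE) to the streaming setting and integrates hierarchical shrinkage into incremental decision trees; experiments show that KDE consistently improves early-stream performance.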

Comments 17 pages, 5 figures, 3 tables, 2 algorithms, authors' version of paper accepted in PAKDD 2026 special session on Data Science: Foundations and Applications (DSFA)

2512.00470 2026-06-02 cs.RO

Perry Dong, Chongyi Zheng, Chelsea Finn, Dorsa Sadigh, Benjamin Eysenbach

Affiliations: Stanford University; Princeton University

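AI summary: Uses flow-based generative models to estimate the full distribution of future returns, with a new flow-matching objective that satisfies the distributional Bellman equation and a flow-derivative ODE that estimates return uncertainty to prioritize learning, improving average success rates by 1.3x in offline and offline-to-online settings.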

Comments ICLR 2026

Abstract

While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration and safe RL. While the predominant method for estimating the return distribution is by modeling it as a categorical distribution over discrete bins or estimating a finite number of quantiles, such approaches leave unanswered questions about the fine-grained structure of the return distribution and about how to distinguish states with high return uncertainty for decision-making. The key idea in this paper is to use modern, flexible flow-based models to estimate the full future return distributions and identify those states with high return variance. We do so by formulating a new flow-matching objective that generates probability density paths satisfying the distributional Bellman equation. Building upon the learned flow models, we estimate the return uncertainty of distinct states using a new flow derivative ODE. We additionally use this uncertainty information to prioritize learning a more accurate return estimation on certain transitions. We compare our method (Value Flows) with prior methods in the offline and offline-to-online settings. Experiments on 37 state-based and 25 image-based benchmark tasks demonstrate that Value Flows achieves a 1.3x improvement on average in success rates. Website: https://pd-perry.github.io/value-flows Code: https://github.com/chongyi-zheng/value-flows

2603.03031 2026-06-02 cs.LG

Step-Level Sparse Autoencoder for Reasoning Process Interpretation

Xuan Yang, Jiayu Liu, Yuhang Lai, Hao Xu, Zhenya Huang, Ning Miao

Affiliations: University of Science and Technology of China

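AI summary: Proposes the step-level sparse autoencoder (SSAE), which forms an information bottleneck through conditional sparsification and separates the incremental information in a reasoning step from background information into sparse features, for interpreting the reasoning process of large language models.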

Abstract

Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While Sparse Autoencoders (SAEs) have emerged as a powerful tool for interpretability, existing approaches predominantly operate at the token level, creating a granularity mismatch when capturing more critical step-level information, such as reasoning direction and semantic transitions. In this work, we propose step-level sparse autoencoder (SSAE), which serves as an analytical tool to disentangle different aspects of LLMs' reasoning steps into sparse features. Specifically, by precisely controlling the sparsity of a step feature conditioned on its context, we form an information bottleneck in step reconstruction, which splits incremental information from background information and disentangles it into several sparsely activated dimensions. Experiments on multiple base models and reasoning tasks show the effectiveness of the extracted features. By linear probing, we can easily predict surface-level information, such as generation length and first token distribution, as well as more complicated properties, such as the correctness and logicality of the step. These observations indicate that LLMs should already at least partly know about these properties during generation, which provides the foundation for the self-verification ability of LLMs. Our code is available at https://github.com/Miaow-Lab/SSAE.

2601.09566 2026-06-02 cs.CV cs.AI

Hot-Start Chinese Language Modeling: Visual Glyphs Accelerate Sample-Efficient Learning

Shuyang Xiang, Hao Guan

Affiliations: Independent Researcher; Institute of Software, Chinese Academy of Sciences

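AI summary: Studies the inductive bias that rendering Chinese characters as visual glyph images gives character-level language modeling, finding that visual inputs produce a pronounced hot-start effect but that final accuracy matches index-based methods.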

Comments 15 pages, 5 figures, submitted to ACL 2026

Abstract

In this work, we study whether rendering Chinese characters as visual glyph images, rather than discrete token IDs as mainstream LLMs do, provides an inductive bias for character-level language modeling. Our central finding gives a double-edged insight: visual inputs produce a pronounced hot-start effect, more than doubling early-stage accuracy within the first epoch (at 0.4% of total training steps) (12.3% visual inputs vs. 5.8% index-based baseline), yet both approaches converge to essentially identical final accuracy (39%). This pattern holds across resolutions as low as 8x8 pixels, partial cropping up to 50%, and model scales from 110M to 1.78B parameters. The mechanism we identify is that glyph rendering pre-encodes radical-based structure into embedding space before any training (cosine similarity 0.27 vs. 0.002 for random embeddings), enabling faster alignment but not higher final capacity. Our results clarify both the promise and fundamental limitation of visual representations as inductive biases for Chinese language modeling.

2603.02650 2026-06-02 cs.LG cs.AI cs.RO

Concept Heterogeneity-aware Representation Steering

Laziz U. Abdullaev, Noelle Y. L. Wong, Ryan T. Z. Lee, Shiqi Jiang, Khoi N. M. Nguyen, Tan M. Nguyen

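AI summary: To address the brittleness of global steering caused by non-homogeneous LLM representations, proposes CHaRS, an optimal-transport-based, input-dependent steering method that uses Gaussian mixture models and discrete optimal transport for more effective behavioral control.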

Journal ref: ICML 2026

Abstract

Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained via difference-in-means over contrastive datasets. This approach implicitly assumes that the target concept is homogeneously represented across the embedding space. In practice, however, LLM representations can be highly non-homogeneous, exhibiting clustered, context-dependent structure, which renders global steering directions brittle. In this work, we view representation steering through the lens of optimal transport (OT), noting that standard difference-in-means steering implicitly corresponds to the OT map between two identical distributions with differing first moments, yielding a global translation. To relax this restrictive assumption, we theoretically model source and target representations as Gaussian mixture models and formulate steering as a discrete OT problem between semantic latent clusters. From the resulting transport plan, we derive an explicit, input-dependent steering map via barycentric projection, producing a smooth, kernel-weighted combination of cluster-level shifts. We term this method Concept Heterogeneity-aware Representation Steering (CHaRS). Through numerous experimental settings, we show that CHaRS yields more effective behavioral control than global steering.
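
The contrast between a single global shift and an input-dependent steering map can be sketched as follows. This is a toy illustration, not the CHaRS implementation: the transport plan is written by hand rather than solved for, the Gaussian kernel and its bandwidth `sigma` are assumptions, and the 2D cluster means are invented.

```python
import math

def barycentric_shifts(src_means, tgt_means, plan):
    """Shift per source cluster: plan-weighted mean of targets minus its own mean."""
    shifts = []
    for k, mu in enumerate(src_means):
        mass = sum(plan[k])
        bary = [sum(plan[k][l] * tgt_means[l][d] for l in range(len(tgt_means))) / mass
                for d in range(len(mu))]
        shifts.append([b - m for b, m in zip(bary, mu)])
    return shifts

def steer(x, src_means, shifts, sigma=1.0):
    """Move x by a kernel-weighted combination of the cluster-level shifts."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, mu)) / (2 * sigma ** 2)
              for mu in src_means]
    top = max(logits)                       # subtract max for numerical stability
    w = [math.exp(v - top) for v in logits]
    total = sum(w)
    w = [v / total for v in w]
    return [xi + sum(wk * s[d] for wk, s in zip(w, shifts))
            for d, xi in enumerate(x)]

src_means = [[0.0, 0.0], [10.0, 0.0]]
tgt_means = [[0.0, 1.0], [10.0, -1.0]]
plan = [[0.5, 0.0], [0.0, 0.5]]             # hand-written plan, not solved here
shifts = barycentric_shifts(src_means, tgt_means, plan)
near_first = steer([0.0, 0.0], src_means, shifts)
near_second = steer([10.0, 0.0], src_means, shifts)
```

In this toy configuration the global difference-in-means vector is zero, because the two cluster-level shifts point in opposite directions and cancel. The input-dependent map still moves a point near the first cluster up and a point near the second cluster down, which is the kind of clustered structure a single global direction cannot capture.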

2509.15394 2026-06-02 cs.LG

VMDNet: Temporal Leakage-Free Variational Mode Decomposition for Electricity Demand Forecasting

Weibin Feng, Ran Tao, John Cartlidge, Jin Zheng

Affiliations: UKRI EPSRC Doctoral Training Partnership; UKRI EPSRC; AI for Collective Intelligence (AI4CI)

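AI summary: Proposes the VMDNet framework, which applies sample-wise variational mode decomposition to avoid temporal leakage, models each mode with frequency-aware embeddings and parallel temporal convolutional networks, and uses a Stackelberg-game-inspired bilevel optimization to select hyperparameters, outperforming existing methods in electricity demand forecasting.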

Comments 5 pages, 1 figure, 2 tables. Version 3: Accepted author manuscript for the 34th European Signal Processing Conference (EUSIPCO 2026), Bruges, Belgium. Improved figures, additional details on TCN-based parallel decoding, and extended literature review. Code and data available: https://github.com/weibin-feng/VMDNet

Abstract

Accurate electricity demand forecasting is challenging due to the strong multi-periodicity of real-world demand series, which makes effective modeling of recurrent temporal patterns crucial. Decomposition techniques make such structure explicit and thereby improve predictive performance. Variational Mode Decomposition (VMD) is a powerful signal-processing method for periodicity-aware decomposition and has seen growing adoption in recent years. However, existing studies often suffer from information leakage and rely on inappropriate hyperparameter tuning. To address these issues, we propose VMDNet, a causality-preserving framework that (i) applies sample-wise VMD to avoid temporal leakage; (ii) represents each decomposed mode with frequency-aware embeddings and decodes it using parallel temporal convolutional networks (TCNs), ensuring mode independence and efficient learning; and (iii) introduces a Stackelberg game inspired bilevel scheme to guide the selection of VMD's two key hyperparameters. Experiments on three widely used electricity demand datasets show that VMDNet consistently outperforms state-of-the-art baselines.
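
The leakage issue in point (i) can be shown with a small sketch. It does not implement VMD: a centred moving-average trend plus residual stands in for the decomposition, and the function names and toy series are assumptions. The point is only where the decomposition is applied, on the whole series or on each look-back window separately.

```python
def decompose(window, k=3):
    """Stand-in for VMD: split a window into a centred moving-average
    trend and a residual. Real VMD is not implemented here."""
    half = k // 2
    trend = []
    for i in range(len(window)):
        lo, hi = max(0, i - half), min(len(window), i + half + 1)
        trend.append(sum(window[lo:hi]) / (hi - lo))
    residual = [x - t for x, t in zip(window, trend)]
    return trend, residual

def samplewise_inputs(series, lookback):
    """Decompose each look-back window on its own, so the components for a
    forecast made at time t depend only on series[t - lookback:t]."""
    return [decompose(series[t - lookback:t])
            for t in range(lookback, len(series) + 1)]

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
changed = series[:-1] + [60.0]             # alter only the final (future) value

# Leaky variant: decompose the full series first, then slice the first 5 points.
leaky_a = decompose(series)[0][:5]
leaky_b = decompose(changed)[0][:5]

# Sample-wise variant: the first window never sees the final value.
safe_a = samplewise_inputs(series, lookback=5)[0][0]
safe_b = samplewise_inputs(changed, lookback=5)[0][0]
```

Changing only the last observation alters the leaky trend inside the first five points, so a model trained on those components has indirectly seen the future. The sample-wise components for the first window are identical in both cases.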

2603.01302 2026-06-02 cs.RO

Stabilizing Policy Optimization via Logits Convexity

Hongzhan Chen, Tao Yang, Yuhua Zhu, Shiping Gao, Xiaojun Quan, Ting Yao

Affiliations: National University of Singapore; University of Science and Technology of China; University of California, Berkeley

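AI summary: To address unstable reinforcement learning training, analyses the stability gap between supervised fine-tuning and reinforcement learning from a gradient perspective and proposes the Logits Convex Optimization (LCO) framework, which stabilizes policy optimization by emulating logits-level convexity; experiments show improved training stability and better results than conventional methods on multiple benchmarks.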

Abstract

While reinforcement learning (RL) has been central to the recent success of large language models (LLMs), RL optimization is notoriously unstable, especially when compared to supervised fine-tuning (SFT). In this work, we investigate the stability gap between SFT and RL from a gradient-based perspective, and show that the convexity of the SFT loss with respect to model logits plays a key role in enabling stable training. Our theoretical analysis demonstrates that this property induces favorable gradient directionality during optimization. In contrast, Proximal Policy Optimization (PPO), a widely adopted policy gradient algorithm utilizing a clipped surrogate objective, lacks this stabilizing property. Motivated by this observation, we propose Logits Convex Optimization (LCO), a simple yet effective policy optimization framework that aligns the learned policy with an optimal target derived from the original RL objective, thereby emulating the stabilizing effects of logits-level convexity. Extensive experiments across multiple model families show that our LCO framework consistently improves training stability and outperforms conventional RL methods on a broad range of benchmarks.
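
The convexity property the abstract relies on is a standard fact about the softmax cross-entropy loss and can be checked directly. The notation below ($z$ for the logits, $p$ for the softmax output, $y$ for the target index, $v$ for an arbitrary vector) is introduced here for illustration and is not taken from the paper.

```latex
\[
\ell(z) = -\log \frac{e^{z_y}}{\sum_j e^{z_j}} = -z_y + \log \sum_j e^{z_j},
\qquad
p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}
\]
\[
\frac{\partial \ell}{\partial z_i} = p_i - \mathbb{1}[i = y],
\qquad
\nabla_z^2 \ell = \operatorname{diag}(p) - p p^{\top}
\]
\[
v^{\top}\left(\operatorname{diag}(p) - p p^{\top}\right) v
= \sum_i p_i v_i^2 - \Bigl(\sum_i p_i v_i\Bigr)^2
= \operatorname{Var}_{i \sim p}(v_i) \ge 0
\]
```

Since the Hessian is positive semidefinite for every $z$, the loss is convex in the logits, although it is generally not convex in the model parameters that produce them.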

2603.00829 2026-06-02 cs.CL cs.AI cs.LG

Constitutional Black-Box Monitoring for Scheming in LLM Agents

Simon Storf, Rich Barton-Cooper, James Peters-Gill, Marius Hobbhahn

Affiliations: University of Cambridge

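AI summary: Studies constitutional black-box monitors that detect scheming in LLM agents by observing only external inputs and outputs; the monitors are optimized on synthetic data and then generalize to more realistic environments.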

Comments Accepted at ICML 2026. Camera-ready version

Abstract

Safe deployment of Large Language Model (LLM) agents in autonomous settings requires reliable oversight mechanisms. A central challenge is detecting scheming, where agents covertly pursue misaligned goals. One approach to mitigating such risks is LLM-based monitoring: using language models to examine agent behaviors for suspicious actions. We study constitutional black-box monitors: prompted classifiers that detect scheming using only externally observable inputs and outputs, optimized on synthetic data generated from natural-language behavior specifications. We introduce two pipelines for generating synthetic agent trajectories, STRIDE (iterative refinement) and Gloom (agent-environment simulation), from which we generate 1,000 samples each. We optimize frontier LLM monitors on these datasets via prompt sweeps, human refinement, and automated prompt optimization, and evaluate performance on 7,500 held-out trajectories from ControlArena, a suite of grounded environments where agents operate in more realistic contexts. Our results demonstrate that monitors selected purely on synthetic data can generalize to more realistic environments, capturing a meaningful scheming signal. However, we find that performance saturates quickly in our setting, with simple prompt sweeps matching the results of more extensive optimization. Pushing beyond this limit yields no further improvements and instead leads to overfitting.
