arXivDaily: daily arXiv academic digest, updated Monday through Friday
2605.30866 2026-06-01 quant-ph cs.LG

Generative Quantum Data Embeddings for Supervised Learning

用于监督学习的生成式量子数据嵌入

Jaewoong Heo, Daniel K. Park

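AI summary: Proposes an energy-based generative learning framework that optimizes embedding structure and parameters through a fidelity-based surrogate objective, improving classification performance, and uses the Wasserstein distance to explain performance saturation.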

详情
Comments
14 pages, 7 figures
AI中文摘要

量子机器学习的许多实际相关应用涉及经典数据,其性能关键取决于输入如何嵌入到量子态中。然而,使用固定的嵌入电路拟设仍是标准做法。我们提出了一种基于能量的生成学习框架,该框架合成门序列以优化嵌入结构并细化数据定制的参数,使用基于保真度的替代目标引导搜索以提高类别区分度。实验表明,该方法在不同设置下改善了分类性能,同时也揭示了在现有嵌入族内进行架构搜索仅带来有限额外收益的数据集。我们通过推导输入空间中Wasserstein距离的可实现经验风险界限来解释这种饱和,表明经典数据几何为不太可能从嵌入优化中获得实质性收益的情况提供了先验诊断。结果建立了一个实用且有理论依据的框架,通过生成优化搜索有效的量子数据嵌入,并通过底层经典数据的几何诊断可获得的收益。

英文摘要

Many practically relevant applications of quantum machine learning involve classical data, for which performance depends critically on how inputs are embedded into quantum states. Yet the use of a fixed embedding circuit ansatz remains standard practice. We propose an energy-based generative learning framework that synthesizes gate sequences to optimize embedding structures and refine data-tailored parameters, using a fidelity-based surrogate objective to guide the search toward improved class distinguishability. Empirically, the method improves classification performance across diverse settings, while also revealing datasets where architecture search within the present embedding family yields only limited additional gains. We explain this saturation by deriving bounds on the achievable empirical risk in terms of the Wasserstein distance in the input space, showing that classical data geometry provides an a priori diagnostic for regimes in which substantial gains from embedding optimization are unlikely. The results establish a practically useful and theoretically motivated framework for searching effective quantum data embeddings through generative optimization, with the attainable gains diagnosed through the geometry of the underlying classical data.
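To make the geometric quantity in this abstract concrete, here is a minimal sketch of the Wasserstein-1 distance between two one-dimensional empirical samples. It does not reproduce the paper's risk bound. Comparing class-conditional samples is an assumption (the abstract does not say which distributions enter the bound), and the function name and toy data are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples on the real line, the optimal transport plan
    matches sorted points, so W1 is the mean absolute difference of the
    sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


# Hypothetical toy features for three classes (illustration only).
class_a = [0.0, 0.1, 0.2, 0.3]
class_b = [1.0, 1.1, 1.2, 1.3]      # well separated from class_a
class_c = [0.05, 0.15, 0.25, 0.35]  # heavily overlapping with class_a

far = wasserstein_1d(class_a, class_b)
near = wasserstein_1d(class_a, class_c)
print(f"W1(a, b) = {far:.3f}, W1(a, c) = {near:.3f}")
```

The value is computed from the classical inputs alone, before any quantum embedding is chosen, which is the sense in which such a diagnostic is available a priori.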

2605.30865 2026-06-01 cs.LG

GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring

GlucoFM: 一种用于连续血糖监测的双流基础模型

Zechen Li, Keerthana Natarajan, Weizhi Zhang, Menglian Zhou, Simon A. Lee, Yuwei Zhang, Maxwell A. Xu, Zeinab Esmaeilpour, Flora D. Salim, Mark Malhotra, Lindsey Sunden, Shwetak Patel, Yuzhe Yang, Ahmed A. Metwally

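AI summary: Proposes GlucoFM, a lightweight CGM foundation model that decomposes glucose dynamics into slow physiological state and transient event streams, improving average PR-AUC by 4.1 points over the best CGM-specific model across 7 clinical prediction tasks.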

详情
AI中文摘要

连续血糖监测(CGM)提供了日常代谢生理的密集视图,然而现有的通用时间序列和CGM专用基础模型通常将血糖轨迹编码为纠缠的单流序列,使得血糖动态的独特时间结构仅被隐式建模。我们提出GlucoFM,一种轻量级CGM基础模型,它将不规则记录对齐到24小时时间网格,保留观测掩码,并将血糖动态分解为慢生理状态和瞬态事件流,捕捉低频血糖基线和可能反映急性生理反应或传感器伪影的短期偏差。GlucoFM在来自477名受试者的109,066小时未标记CGM记录上进行了预训练,具有两个互补目标:融合每日表示上的掩码上下文潜在预测以及状态和事件流上的时间动态预测。在四个不同队列和七个临床预测任务中,GlucoFM在评估基线中实现了最强的受试者分离线性探测性能,比最佳CGM专用基础模型平均PR-AUC提高4.1点。其收益在核心代谢结果上最为显著,在所有糖尿病风险和β细胞功能障碍任务以及4个胰岛素抵抗任务中的3个上领先PR-AUC。GlucoFM还在评估方法中实现了最佳的整体跨数据集迁移性能和强大的少样本适应能力,并且在聚合多天进行受试者级别预测时获得一致收益,突出了生理感知分解作为可迁移CGM表示学习的有效归纳偏置。

英文摘要

Continuous glucose monitoring (CGM) provides a dense view of daily metabolic physiology, yet existing generic time-series and CGM-specific foundation models often encode glucose traces as entangled single-stream sequences, leaving the distinct temporal structure of glycemic dynamics only implicitly modeled. We present GlucoFM, a lightweight CGM foundation model that aligns irregular recordings to a 24-hour chronological grid, preserves observation masks, and decomposes glucose dynamics into slow physiological state and transient event streams, capturing low-frequency glycemic baselines and short-term deviations that may reflect acute physiological responses or sensor artifacts. GlucoFM is pretrained on 109,066 hours of unlabeled CGM recordings from 477 subjects with two complementary objectives: masked contextual latent prediction over fused daily representations and temporal dynamics prediction over state and event streams. Across four diverse cohorts and seven clinical prediction tasks, GlucoFM achieves the strongest subject-disjoint linear-probing performance among evaluated baselines, improving average PR-AUC by 4.1 points over the best CGM-specific foundation model. Its gains are most pronounced on core metabolic outcomes, leading PR-AUC on all diabetes-risk and $β$-cell dysfunction tasks and on 3 of 4 insulin-resistance tasks. GlucoFM also achieves the best overall cross-dataset transfer performance and strong few-shot adaptation among evaluated methods, and consistent gains when aggregating multiple days for subject-level prediction, highlighting physiology-aware decomposition as an effective inductive bias for transferable CGM representation learning.

2605.30863 2026-06-01 cs.CV cs.GR

DSD-GS: Dynamic-Static Decomposition of Gaussian Splatting for Efficient and High-Fidelity Dynamic Scene Reconstruction

DSD-GS: 面向高效高保真动态场景重建的高斯泼溅动态-静态分解

Youngtae Han, Sung-hwan Han, Youngmin Yi

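AI summary: Proposes a dynamic-static decomposition framework based on a feed-forward Gaussian Splatting encoder and an optical flow model; by eliminating redundant computation on static regions, it achieves state-of-the-art rendering quality, training and rendering speed, and storage efficiency.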

详情
Comments
23 pages, 9 figures, 7 tables
AI中文摘要

动态场景重建和新视角合成是虚拟现实、机器人、数字孪生等下一代视觉智能应用的基础。然而,从任意视角对复杂时变场景进行高保真重建仍是一个重大挑战。现有的动态3DGS方法由于将所有高斯体建模为动态组件,存在计算效率低下的问题。虽然近期基于分解的方法试图解决这一问题,但仍面临重建质量下降和训练时间延长的问题。为缓解这些局限,我们提出一种新颖的动态重建框架,基于高效的静态-动态分解策略,使用前馈高斯泼溅编码器和光流模型。通过消除静态区域的冗余计算,我们的方法实现了最先进的性能,在渲染质量、训练和渲染速度以及存储效率上均优于现有基线。值得注意的是,在Neural 3D数据集上,我们的框架仅需10分钟训练,并在单张NVIDIA RTX 5090 GPU上以1352x1014分辨率实现了超过700 FPS的渲染速度。此外,我们的分解策略消除了COLMAP预处理的需求,并实现了确定性初始化,从而提高了效率和可重复性。

英文摘要

Dynamic scene reconstruction and novel view synthesis are fundamental to next-generation visual intelligence applications such as virtual reality, robotics, and digital twins. However, high-fidelity reconstruction of complex, time-varying scenes from arbitrary viewpoints remains a significant challenge. Existing dynamic 3DGS methods suffer from computational inefficiency, since they model all Gaussians as dynamic components. While recent decomposition-based approaches address this issue, they still struggle with degraded reconstruction quality and prolonged training time. To mitigate these limitations, we propose a novel dynamic reconstruction framework built upon an efficient static-dynamic decomposition strategy using a Feed-Forward Gaussian Splatting encoder and an optical flow model. By eliminating redundant computations on static regions, our method achieves state-of-the-art performance, outperforming existing baselines across rendering quality, training and rendering speed, and storage efficiency. Notably, on the Neural 3D dataset, our framework requires only 10 minutes for training and achieves a rendering speed of over 700 FPS on a single NVIDIA RTX 5090 GPU at a resolution of 1352x1014. Furthermore, our decomposition strategy eliminates the need for COLMAP preprocessing and enables deterministic initialization, thereby enhancing both efficiency and reproducibility.

2605.30862 2026-06-01 cs.DB cs.AI

Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation

Sophrosyne: 关系数据系统的智能体探索需要适度

Madhav Jivrajani, Ramnatthan Alagappan, Aishwarya Ganesan

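AI summary: Addresses over-exploration by LLM-powered Text2SQL agents when they explore data systems, proposing Sophrosyne, an environment that augments API responses with directives to guide exploration, reducing over-exploration and improving SQL generation accuracy.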

详情
AI中文摘要

由LLM驱动的Text2SQL智能体通过工具调用探索数据系统,将自然语言意图转化为SQL。然而,为了确保安全且受限的访问,数据系统构建了具有显式API表面的环境。我们研究并分类了当前暴露的API,将其分为粗粒度或细粒度,并认为在这两者之间进行选择会带来成本效益探索与准确SQL生成之间的基本权衡。大多数数据系统暴露细粒度API,但这无意中使智能体处于劣势:它们过度探索,将不相关的模式元素纳入查询公式中,并产生不准确的结果。我们认为,抑制过度探索是有效利用这些API表面的关键,并提出了Sophrosyne,一种数据系统环境,它通过增强API响应中的指令来引导智能体的探索过程。初步结果显示,指令将过度探索减少了4.6倍,并将准确率提高了高达12.4%(约4个百分点)。

英文摘要

Text2SQL agents powered by LLMs translate natural language intent into SQL by exploring the data system through tool calls before formulating the query. However, to ensure secure and scoped access, data systems construct environments with explicit API surfaces. We study and categorize these APIs exposed today as either coarse-grained or fine-grained and posit that choosing between them presents a fundamental tradeoff between cost-efficient exploration and accurate SQL generation. Most data systems expose fine-grained APIs, but this inadvertently disadvantages agents: they over-explore, incorporate irrelevant schema elements into their query formulation, and produce inaccurate results. We argue that curbing over-exploration is key to the effective use of these API surfaces, and propose Sophrosyne, a data system environment that augments API responses with directives that guide the agent's exploration process. Initial results show that directives reduce over-exploration by 4.6x and boost accuracy by up to 12.4% (approx. 4 percentage points).
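The closing figure mixes a relative gain (12.4%) with an absolute gain (about 4 percentage points). The short sketch below shows how the two relate and what baseline accuracy they jointly imply. It is illustrative arithmetic only: the abstract does not state the baseline, the "approx. 4" figure is rounded, and the function name is hypothetical.

```python
def implied_baseline(relative_gain, absolute_gain_points):
    """Baseline (in percentage points) implied by a relative and an absolute gain.

    relative_gain = absolute_gain_points / baseline, so
    baseline = absolute_gain_points / relative_gain.
    """
    return absolute_gain_points / relative_gain


# Assumption: treat the rounded "approx. 4 points" as exactly 4.0.
baseline = implied_baseline(0.124, 4.0)
improved = baseline + 4.0
print(f"implied baseline ~ {baseline:.1f}%, improved ~ {improved:.1f}%")
```

Under that rounding assumption the baseline comes out near 32%, which is why a 4-point absolute gain reads as a double-digit relative gain.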

2605.30861 2026-06-01 cs.AI

Distilling LLM Feedback for Lean Theorem Proving

蒸馏LLM反馈用于Lean定理证明

Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal, Pierre Marion

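AI summary: Proposes Feedback Distillation, which trains the model to match, at the token level, its own distribution conditioned on privileged feedback from a language model, to address the sparse rewards and mode collapse of GRPO in reasoning post-training, and achieves better results in Lean4 theorem proving.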

详情
AI中文摘要

推理模型的后训练通常结合监督微调和基于可验证奖励的强化学习(最常见的是GRPO)。然而,该算法存在奖励稀疏、探索受限和模式崩溃的问题。基于最近关于自蒸馏的工作,我们提出了反馈蒸馏,这是一种训练方法,其中模型在token级别被训练以匹配自身分布,该分布以语言模型产生的特权反馈为条件。反馈蒸馏提供token级别的监督,并能注入外部知识。在Lean4定理证明中评估我们的方法,我们发现反馈蒸馏比GRPO在生成轨迹上保持更大的多样性,从而产生更高的策略熵和更好的pass@k缩放。这两种方法是互补的:从反馈蒸馏检查点初始化GRPO优于单独使用任何一种方法。总之,我们的结果为提高复杂推理的后训练提供了一条有前景的途径。

英文摘要

Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm suffers from sparse rewards, limited exploration, and mode collapse. Building upon recent works on self-distillation, we propose Feedback Distillation, a training method where the model is trained to match, at the token level, its own distribution conditioned on privileged feedback produced by a language model. Feedback Distillation offers token-level supervision and can inject external knowledge. Evaluating our method for Lean4 theorem-proving, we find that Feedback Distillation maintains greater diversity in generated trajectories than GRPO, yielding higher policy entropy and better pass@k scaling. The two methods are complementary: initializing GRPO from a Feedback Distillation checkpoint outperforms either method alone. All in all, our results suggest a promising avenue to improve post-training for complex reasoning.
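The pass@k scaling mentioned above is usually estimated from n sampled attempts per problem, of which c succeed. The sketch below uses the combinatorial unbiased estimator common in code-generation evaluation. Whether this paper uses exactly this estimator is an assumption, since the abstract does not say.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    uniformly random size-k subset of the n samples contains no correct one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is exactly 1.0
    # whenever every size-k subset must contain a correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 10 proof attempts for one theorem, 3 verified by Lean.
for k in (1, 5, 10):
    print(k, round(pass_at_k(10, 3, k), 4))
```

A policy that keeps its samples diverse tends to gain more as k grows, which matches the link the abstract draws between higher policy entropy and better pass@k scaling.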

2605.30860 2026-06-01 math.ST cs.LG math.PR stat.TH

Bayesian Inference with Shaped Deep Non-linear MLPs

具有形状深度非线性MLP的贝叶斯推断

Boris Hanin, Tianze Jiang

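AI summary: Uses the Neural Covariance SDE to analyze Bayesian inference in deep non-linear MLPs when the number of training samples, input dimension, hidden layer width, and number of layers are all large; finds a first-order criterion in LP/N that determines when depth benefits the model evidence, and derives that the Bayesian predictive posterior is equivalent to a data-dependent kernel method.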

详情
Comments
35 Pages
AI中文摘要

深度学习理论的一个核心目标是刻画神经网络在模型规模和训练集规模同时较大时的预测行为。由于模型参数数量和数据集大小发散极限不可交换,先验上并不清楚存在哪些极限。在这项工作中,我们通过研究深度非线性MLP在训练样本数($P$)、输入维数($N_0$)、隐藏层宽度($N$)和隐藏层数($L$)均可大时的贝叶斯推断,为这些问题提供了新的见解。我们基于神经协方差SDE(Li等人,2022)分析$LP/N\in\Theta(1)$(扮演有效网络深度角色)区域的预测后验。我们的框架涵盖光滑和ReLU激活函数,并适用于任意温度。我们发现,在$LP/N$的一阶近似下,存在一个简单准则,用于判断哪些数据生成过程能从深度中获益,即更大的$LP/N$会增加贝叶斯模型证据。我们还对物理学文献中的一个先前结果给出了新的推导:至少在$LP/N$的一阶近似下,贝叶斯预测后验极其简单,等价于一个数据相关的核方法。

英文摘要

A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute, it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training samples ($P$), the input dimension ($N_0$), the hidden layer width ($N$), and the number of hidden layers ($L$) can all be large. We build on the Neural Covariance SDE (Li et al., 2022) to analyze predictive posteriors in the regime where $LP/N \in \Theta(1)$, a ratio that plays the role of an effective network depth. Our framework covers both smooth and ReLU activation functions and applies to arbitrary temperature. We find, to first order in $LP/N$, a simple criterion for which data generating processes benefit from depth, in the sense that larger $LP/N$ increases the Bayesian model evidence. We also give a novel derivation of a prior result from the physics literature: at least to first order in $LP/N$, the Bayesian predictive posterior is remarkably simple and is equivalent to that of a data-dependent kernel method.

2605.30859 2026-06-01 cs.LG cs.AI

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning

DARTS: 分布感知的主动展开轨迹塑造以加速LLM强化学习

Yujie Wang, Siwei Chen, Longzan Luo, Xinyi Liu, Xupeng Miao, Fangcheng Fu, Bin Cui

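AI summary: Targets the efficiency bottleneck caused by long-tail response distributions in reinforcement learning, proposing distribution-aware active trajectory shaping that identifies intra-prompt long tails at fine granularity and trims ineffective verbosity, achieving up to 1.77x speedup without loss of model performance.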

详情
Comments
16 pages, 14 figures, 5 tables. Accepted to ICML 2026
AI中文摘要

强化学习已成为提升模型能力的关键技术,但由于响应长度的长尾分布,其展开效率受到瓶颈制约。现有工作通过提示级尾部调度缓解长尾影响,但我们关注低效率的根本来源:分布本身。具体而言,我们以更细粒度刻画长尾分布,识别提示内长尾,并揭示它们通常包含无效冗余。为解决此问题,我们提出一种主动分布塑造的新范式,将展开分布向简洁性和确定性方向塑造,从而从根本上解决尾部带来的开销。我们通过一种分布感知的轨迹采样机制实现这一点,该机制为每个提示从冗余探索空间中选择轨迹,并采用自适应冗余分配方案以最大化塑造效果和系统效率。实验表明,与最先进系统相比,在不影响模型性能的情况下,实现了高达1.77倍的显著加速。

英文摘要

Reinforcement Learning (RL) has become pivotal for improving model capabilities yet suffers from rollout efficiency bottlenecks due to the long-tail response length distribution. While existing works mitigate the impact of long tails via prompt-level tail scheduling, we focus on the root source of inefficiency: the distribution itself. Specifically, we characterize the long-tail distribution at a finer granularity, identifying intra-prompt long tails, and revealing that they frequently consist of ineffective verbosity. To address this, we propose a novel paradigm of active distribution shaping to shape the rollout distribution towards conciseness and certainty, thereby fundamentally resolving tail-induced overheads. We achieve this through a distribution-aware trajectory sampling mechanism, which selects trajectories from a redundant exploration space for each prompt, and an adaptive redundancy allocation scheme to maximize both shaping effectiveness and system efficiency. Experiments demonstrate significant acceleration over state-of-the-art systems by up to 1.77x without compromising model performance.

2605.30858 2026-06-01 cs.LG

ForecastCompass: Guiding Agentic Forecasting with Adaptive Factor Memory

ForecastCompass: 自适应因子记忆引导的智能预测

Yurui Chang, Yongkang Du, Yuanpu Cao, Jinghui Chen, Lu Lin

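AI summary: Proposes the ForecastCompass framework, which uses a hierarchical forecasting-task taxonomy and a two-component memory (factor memory and reasoning memory), iteratively revised through retrospective analyses, to improve the probabilistic accuracy and calibration of agents' forecasts in dynamic environments.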

详情
AI中文摘要

智能预测对于动态环境中的决策至关重要,但由于智能体必须从不完整、时间有限的证据中进行推理,并在结果确定之前产生校准的概率,因此仍然具有挑战性。记忆提供了一种自然机制,将经验从已解决的预测转移到未来的预测任务。然而,现有的智能体记忆方法并非为预测量身定制,因为它们通常存储过去的交互、反思或事实关联,而没有明确表示可重用的预测因子或校准知识。我们提出了ForecastCompass (FoCo),一种用于智能预测的自适应因子记忆框架。FoCo通过分层预测任务分类来组织预测经验,从而能够检索与任务相关的预测知识。它维护两个互补的记忆组件:因子记忆(捕获可重用的预测维度)和推理记忆(编码概率更新、不确定性处理和校准原则)。利用回顾分析作为学习信号,FoCo通过口头记忆修正程序迭代修正记忆,使智能体能够随时间积累可迁移的预测知识。在Prophet Arena和FutureX上使用GPT-5-mini和Gemini-2.5-Flash进行的实验表明,FoCo提高了概率准确性和校准性。

英文摘要

Agentic forecasting is important for decision-making in dynamic environments, but it remains challenging because agents must reason from incomplete, time-limited evidence and produce calibrated probabilities before outcomes are resolved. Memory provides a natural mechanism for transferring experience from resolved forecasts to future prediction tasks. However, existing agent-memory methods are not tailored to forecasting, as they typically store past interactions, reflections, or factual associations without explicitly representing reusable predictive factors or calibration knowledge. We propose ForecastCompass (FoCo), an adaptive factor-based memory framework for agentic forecasting. FoCo organizes forecasting experience with a hierarchical forecasting-task taxonomy, enabling retrieval of task-relevant forecasting knowledge. It maintains two complementary memory components: factor memory, which captures reusable predictive dimensions, and reasoning memory, which encodes probability updating, uncertainty handling, and calibration principles. Using retrospective analyses as learning signals, FoCo iteratively revises memory through a verbalized memory-revision procedure, enabling the agent to accumulate transferable forecasting knowledge over time. Experiments on Prophet Arena and FutureX with GPT-5-mini and Gemini-2.5-Flash show that FoCo improves both probabilistic accuracy and calibration.

2605.30857 2026-06-01 cs.CL

MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning

MADS: 面向指令微调的模型感知多样化核心集选择

Yi Bai, Wenhao Zhang, Yao Chen, Jiao Xue, Zhumin Chen, Pengjie Ren

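AI summary: Proposes a diverse core set selection method that distinguishes data features by neural activation states during model inference, improving LLM performance on multiple downstream tasks while reducing the amount of data.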

详情
AI中文摘要

指令微调用于增强大语言模型(LLMs)的指令遵循能力。随着指令微调数据量的增加,选择最优核心集变得尤为重要。然而,确保核心集的多样性仍然是一个重大挑战。现有方法主要基于文本特征本身来区分不同的训练数据,与LLMs自身对数据的理解和表示相分离。为解决这一问题,我们提出了一种模型感知的多样化核心集选择方法,该方法基于LLM推理过程中的神经激活状态来区分数据特征。该方法利用模型内在的激活特征,实现了基于覆盖的选择的高效实例化,以确保核心集的多样性。我们在涵盖五个不同任务的六个基准上广泛评估了我们的方法。在我们的方法中,由3B参数LLM选择的核心集在用于微调7B、8B和13B参数的更大模型时表现有效。在包含52K指令-响应对的Alpaca-GPT4数据集上的实验结果表明,由Llama-3.2-3B-Instruct选择的、大小为原始数据集15%的核心集,在微调四个更大的基础模型时,与使用完整数据集训练相比,平均提升了2.5%。实验结果表明,我们的方法在减少数据需求的同时,提升了模型在多个下游任务上的性能。

英文摘要

Instruction fine-tuning is employed to enhance the instruction-following ability of large language models (LLMs). As the amount of instruction fine-tuning data increases, selecting the optimal core set becomes particularly important. However, ensuring the diversity of the core set remains a significant challenge. Existing methods predominantly distinguish different training data based on the text features themselves, decoupled from LLMs' own understanding and representation of the data. To address this issue, we propose a Model-Aware Diverse Core Set Selection method, which distinguishes data features based on the neural activation states during LLM inference. This approach serves as an efficient instantiation of coverage-based selection using model-intrinsic activation features to ensure the diversity in the core set. We extensively evaluate our method on six benchmarks that cover five distinct tasks. In our method, the core set selected by the 3B-parameter LLM performs effectively when utilized to fine-tune larger models with 7B, 8B, and 13B parameters. Experimental results on the Alpaca-GPT4 dataset, which comprises 52K instruction-response pairs, show that the core set, sized at 15% of the original dataset and selected by Llama-3.2-3B-Instruct, achieves an average improvement of 2.5% when fine-tuning four larger base models compared with training on the full dataset. The experimental results demonstrate that our method enhances model performance on multiple downstream tasks while reducing data requirements.

2605.30854 2026-06-01 cs.MA cs.AI

Safe Equilibrium Policy Optimization for Strategic Agent Policies

面向策略型智能体的安全均衡策略优化

Karthika Arumugam, Kiran Kumar Manku, Amit Dhanda

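AI summary: Proposes Safe Equilibrium Policy Optimization (SEPO), which improves the strategic safety of language models in multi-agent games by penalizing exploitability, collusion risk, and externality cost.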

详情
Comments
Submitted to EMNLP 2026
AI中文摘要

使用强化学习微调的语言模型通常优化任务奖励,忽略了多智能体策略结构。由于这些智能体以自然语言游戏状态描述为条件并通过自由生成发出动作,策略失败模式——利用较弱对手、协调有害均衡以及外部化成本——与语言接口本身密不可分。我们提出Safe Equilibrium Policy Optimization (SEPO),一种训练目标,通过显式惩罚可剥削性、共谋风险和外部性成本来增强期望收益。我们将SEPO作为组相对策略优化(GRPO)的奖励信号,应用于监督微调(SFT)后的Gemma 4 E4B-it和Qwen 3.5-4B。在五个策略领域(迭代囚徒困境、重复拍卖、两种谈判变体以及Kuhn扑克)中评估。SEPO在Kuhn扑克中实现了两种模型的零剥削池优势,在四个领域的安全性能上优于基础模型,并纠正了SFT引入的过度合作行为。在谈判中,SEPO实现了正安全结果,并且是唯一具有正归一化相对优势的谈判配置。消融实验证实,每次推演的剥削计算是必要的:共享常数惩罚在GRPO优势归一化中抵消(常数控制变量性质),产生零梯度。为支持智能体策略安全的进一步研究,我们发布了我们的代码(https://anonymous.4open.science/r/sepo-2668/README.md)和SFT数据集。

英文摘要

Language models fine-tuned with reinforcement learning typically optimize for task reward, ignoring multi-agent strategic structure. Because these agents condition on natural language game-state descriptions and emit actions through free-form generation, strategic failure modes (exploiting weaker opponents, coordinating on harmful equilibria, and externalizing costs) are inseparable from the language interface itself. We propose Safe Equilibrium Policy Optimization (SEPO), a training objective that augments expected payoff with explicit penalties for exploitability, collusion risk, and externality cost. We implement SEPO as a reward signal for Group Relative Policy Optimization (GRPO), applied to Gemma 4 E4B-it and Qwen 3.5-4B after supervised fine-tuning (SFT). We evaluate across five strategic domains: Iterated Prisoner's Dilemma, repeated auctions, two negotiation variants, and Kuhn Poker. SEPO achieves zero exploit-pool advantage in Kuhn Poker for both models, outperforms the base model on safety in four domains, and corrects the over-cooperative behavior introduced by SFT. In negotiation, SEPO achieves a positive-safety outcome and the only positive normalized relative advantage of any negotiation configuration. Ablation experiments confirm that per-rollout exploit computation is necessary: a shared constant penalty cancels in GRPO advantage normalization (constant control-variate property), producing zero gradient. To support further research in strategic safety for agents, we release our code (https://anonymous.4open.science/r/sepo-2668/README.md) and SFT datasets.
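The ablation claim can be checked numerically with a minimal sketch of group-relative advantage normalization. It assumes the common form (reward minus group mean, divided by group standard deviation); the population standard deviation, the epsilon, and all reward and penalty values are hypothetical choices, not taken from the paper.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical payoffs for one group of four rollouts on the same prompt.
payoffs = [1.0, 0.0, 2.0, 0.5]
shared_penalty = 0.7                       # same constant for every rollout
per_rollout_penalty = [0.9, 0.0, 0.1, 0.4]  # varies with each rollout

base = group_advantages(payoffs)
shifted = group_advantages([p - shared_penalty for p in payoffs])
shaped = group_advantages([p - q for p, q in zip(payoffs, per_rollout_penalty)])

print("constant penalty changes advantages:",
      any(abs(a - b) > 1e-6 for a, b in zip(base, shifted)))
print("per-rollout penalty changes advantages:",
      any(abs(a - b) > 1e-6 for a, b in zip(base, shaped)))
```

Subtracting the same constant from every reward shifts the group mean by that constant and leaves the standard deviation unchanged, so the normalized advantages are identical and the penalty contributes no learning signal. A penalty that differs across rollouts survives normalization.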

2605.30852 2026-06-01 cs.CL

Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

推测性流水线解码:通过流水线并行实现更高准确度和零气泡推测

Yijiong Yu, Huazheng Wang, Shuai Yuan, Ruilong Ren, Ji Pei

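AI summary: Proposes the Speculative Pipeline Decoding (SPD) framework, which uses pipeline parallelism to partition the target LLM into n pipeline stages that process n tokens in parallel; a speculation module aggregates intermediate features to predict the next token, achieving bounded difficulty, high acceptance rates, and zero latency bubbles, and significantly raising the theoretical speedup.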

详情
AI中文摘要

推测性解码(SD)通过草稿-验证范式加速低并发LLM推理。然而,主流方法通常依赖多token预测,这引入了逐渐增加的预测难度和串行草稿延迟。为了解决这些问题,我们提出了推测性流水线解码(SPD),这是一个突破性的框架,释放了流水线并行的真正潜力。通过将目标LLM划分为$n$个流水线阶段,SPD允许LLM并行处理$n$个token以加速解码。为了在单序列解码中持续填充流水线,推测模块聚合不同流水线深度的中间特征来预测下一个token,与目标模型的流水线步骤严格并行执行,从而实现有限的难度、更高的接受率和零延迟气泡。我们的实验表明,与主流基线相比,SPD实现了显著更高的理论加速比,为LLM解码加速提供了高度可扩展的解决方案。我们的代码可在https://github.com/yuyijiong/speculative_pipeline_decoding获取。

英文摘要

Speculative Decoding (SD) accelerates low-concurrency LLM inference by employing a draft-then-verify paradigm. However, mainstream methods typically rely on multi-token prediction, which introduces escalating prediction difficulty and serial drafting latency. To address these, we propose Speculative Pipeline Decoding (SPD), a groundbreaking framework that unlocks the true potential of pipeline parallelism. By partitioning the target LLM into $n$ pipeline stages, SPD allows the LLM to process $n$ tokens in parallel to accelerate decoding. To continuously fill the pipeline in single-sequence decoding, a speculation module aggregates intermediate features across different pipeline depths to predict the next token, executing strictly in parallel with the target model's pipeline step, to realize bounded difficulty, higher acceptance rates, and zero latency bubbles. Our experiments demonstrate that SPD achieves a significantly higher theoretical speedup compared to mainstream baselines, offering a highly scalable solution for LLM decoding acceleration. Our code is available at https://github.com/yuyijiong/speculative_pipeline_decoding

2605.30849 2026-06-01 cs.RO

High-Load-Density Electro-Permanent Magnetic Foot with Controllable Adhesion for Quadruped Wall-Climbing Robots

用于四足爬壁机器人的高负载密度可控吸附电永磁足

An Li, Bo Tao, I-Ming Chen, Han Ding

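AI summary: Proposes a high-load-density electro-permanent magnetic foot with controllable adhesion, using circular Halbach-net electro-permanent magnet (CHN-EPM) adhesion units and a force-feedback system, to enable reliable quadruped climbing on ferromagnetic surfaces, with a maximum adhesion force exceeding 1000 N and a load-to-weight ratio over 200:1.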

详情
Comments
10 pages, 6 figures, 2 tables; project page and videos available in the repository
AI中文摘要

为了实现四足机器人在铁磁表面上的可靠攀爬运动,本文提出了一种具有可控吸附力的高负载密度电永磁足,其特点是采用力反馈圆形Halbach网络电永磁(CHN-EPM)吸附单元和磁化控制系统。由于其三维磁路结构和磁通集中效应,CHN-EPM实现了分布式并联磁通路径,提高了磁通利用率,从而降低了对气隙变化的敏感性,即使在部分接触条件下也能保持有效吸附。所提出的CHN-EPM最大吸附力超过1000 N,负载重量比超过200:1。开发了磁化驱动器和两级脉冲电流控制策略,以调节励磁电流幅值和持续时间,实现精确可靠的磁化。通过集成柔性压力传感器进行接触力反馈,系统可以有效监测附着和脱离状态,确保在不确定接触条件下实现可靠的吸附切换。所提出的系统被集成到商用四足机器人(Unitree GO2)中,展示了在天花板和垂直壁面上的高负载吸附,以及在涂漆、穿孔和弯曲铁磁表面上的稳定运动。

英文摘要

To enable reliable climbing locomotion of quadruped robots on ferromagnetic surfaces, this paper presents a high-load-density electro-permanent magnetic foot with controllable adhesion, featuring force-feedback circular Halbach-net electro-permanent magnet (CHN-EPM) adhesion units and a magnetization control system. Due to its three-dimensional magnetic circuit structure and flux-concentration effect, the CHN-EPM enables a distributed parallel magnetic flux path with enhanced flux utilization, resulting in reduced sensitivity to air-gap variations and allowing effective adhesion to be maintained even under partial contact conditions. The proposed CHN-EPM generates a maximum adhesion force exceeding 1000 N with a load-to-weight ratio over 200:1. A magnetization driver and a two-stage pulse current control strategy are developed to regulate the excitation current amplitude and duration, enabling accurate and reliable magnetization. By incorporating a flexible pressure sensor for contact force feedback, the system can effectively monitor attachment and detachment states, ensuring robust adhesion switching under uncertain contact conditions. The proposed system is integrated into a commercial quadruped robot (Unitree GO2), demonstrating high-load adhesion on ceiling and vertical-wall surfaces and stable locomotion on painted, perforated, and curved ferromagnetic surfaces.

2605.30846 2026-06-01 cs.CV

Count Anything

Count Anything

Mengqi Lei, Shuokun Cheng, Wei Bao, Shaoyi Du, Jun-Hai Yong, Siqi Li, Yue Gao

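AI summary: Proposes Count Anything, a cross-domain text-guided object counting model that achieves multi-domain generalization on the unified CLOC benchmark through dual-granularity instance enumeration and Complementary Count Fusion.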

详情
AI中文摘要

尽管通用视觉模型取得了快速进展,目标计数仍然分散在特定领域的数据集和任务公式中。现有的计数模型通常针对人群、车辆、细胞、农作物或遥感目标等场景定制,因此难以跨类别、视觉域、目标尺度和密度分布进行泛化。在本文中,我们研究了跨域的文本引导目标计数,其中模型以图像和自然语言查询为输入,并返回一组基于实例的目标点,其基数给出计数。这种公式将类别条件计数与可解释的空间定位统一起来。为了支持这一设置,我们构建了CLOC,一个跨域大规模目标计数数据集,将多样化的公共数据源重组为统一的基准。CLOC涵盖六个视觉域:通用场景、遥感、组织病理学、细胞显微镜、农业和微生物学,包含约22万张图像、619个类别和1500万个目标实例。基于CLOC,我们提出了Count Anything,一个用于文本引导目标计数的通用模型。与主导计数模型的密度图方法不同,Count Anything采用离散实例点并执行双粒度实例枚举。区域级稀疏计数器为大而稀疏的目标提供目标级锚点,而像素级密集计数器通过密集点预测处理小、拥挤和弱边界目标。点中心监督策略能够从异构标注中学习,互补计数融合以无参数方式结合两个计数器。大量实验表明,Count Anything实现了强准确性和多域泛化,优于现有的开放世界计数方法。代码可在:https://github.com/Mengqi-Lei/count-anything 获取。

英文摘要

Object counting remains fragmented across domain-specific datasets and task formulations, despite rapid progress in generalist vision models. Existing counting models are often tailored to scenarios such as crowds, vehicles, cells, crops, or remote-sensing objects, and thus struggle to generalize across categories, visual domains, object scales, and density distributions. In this paper, we study text-guided object counting across domains, where a model takes an image and a natural-language query as input and returns an instance-grounded set of target points whose cardinality gives the count. This formulation unifies category-conditioned counting with interpretable spatial localization. To support this setting, we construct CLOC, a Cross-domain Large-scale Object Counting dataset that reorganizes diverse public data sources into a unified benchmark. CLOC covers six visual domains: General Scene, Remote Sensing, Histopathology, Cellular Microscopy, Agriculture, and Microbiology, with about 220K images, 619 categories, and 15M object instances. Based on CLOC, we propose Count Anything, a generalist model for text-guided object counting. Unlike density-map-based methods, which dominate counting models, Count Anything adopts discrete instance points and performs dual-granularity instance enumeration. A Region-level Sparse Counter provides object-level anchors for large and sparse targets, while a Pixel-level Dense Counter handles small, crowded, and weakly bounded targets via dense point prediction. A point-centric supervision strategy enables learning from heterogeneous annotations, and Complementary Count Fusion combines both counters in a parameter-free manner. Extensive experiments show that Count Anything achieves strong accuracy and multi-domain generalization, outperforming existing open-world counting methods. Code is available at: https://github.com/Mengqi-Lei/count-anything.

2605.30844 2026-06-01 cs.CL cs.AI stat.ML

Fine-Tuning Improves Information Conveyance in Language Models

微调提升语言模型中的信息传递

Yuwei Cheng, Weiyi Tian, Haifeng Xu

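AI summary: Proposes the Canopy Entropy measure, which quantifies the effective size of the generation space from a tree-structured perspective, and finds that fine-tuned models strengthen the positive length-entropy-rate correlation even as total entropy decreases, converting uncertainty into semantic diversity more efficiently.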

详情
AI中文摘要

微调通常被认为会降低大型语言模型的不确定性和多样性,但现有分析忽略了输出长度这一关键混杂因素,因此未能捕捉不确定性在整个生成展开中的分布。为解决这一问题,我们提出冠层熵($\mathrm{CE}^\star$),一种从树视角看待语言生成的度量,其中“冠层”代表所有可能展开的空间,使得$\mathrm{CE}^\star$自然地量化生成空间的有效大小。$\mathrm{CE}^\star$共同捕捉输出长度$N$和生成序列$Y_{1:N}$中的不确定性——实际上,我们证明它等于总香农熵$H(N, Y_{1:N}\mid X)$,其中$X$表示提示。该公式产生了可解释的度量,包括长度-熵率相关项$\rho(N, r_N)$,其中$r_N$是熵率,通过指示较长输出是否每个标记信息量更多或更少来量化信息传递效率。实验上,跨任务和模型家族,我们发现微调模型一致地表现出更强的正相关$\rho(N, r_N)$,即使总熵降低。此外,在控制模型家族、任务、提示和输出长度效应后,我们发现微调几乎使熵率与语义多样性之间的相关强度增加了两倍,表明对齐模型更有效地将标记不确定性转化为语义多样性。总体而言,这些结果表明微调并非简单地降低不确定性,而是从根本上将其重组为更具信息性和语义意义的生成。我们的代码可在https://github.com/WeiyiTian/canopy-entropy获取。

英文摘要

Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an entire generation rollout. To address this, we propose Canopy Entropy ($\mathrm{CE}^\star$), a measure that views language generation from a tree perspective, where "canopy" represents the space of all possible rollouts, making $\mathrm{CE}^\star$ naturally quantify the effective size of the generation space. $\mathrm{CE}^\star$ jointly captures uncertainty in both the output length $N$ and the generated sequence $Y_{1:N}$; indeed, we show that it equals the total Shannon entropy $H(N, Y_{1:N}\mid X)$, where $X$ denotes the prompt. This formulation yields interpretable metrics, including a length-entropy correlation term $ρ(N, r_N)$, where $r_N$ is the entropy rate, quantifying information conveyance efficiency by indicating whether longer outputs are more or less informative per token. Empirically, across tasks and model families, we find that fine-tuned models consistently exhibit stronger positive correlation $ρ(N, r_N)$, even when total entropy decreases. Furthermore, after controlling for model family, task, prompt, and output-length effects, we find that fine-tuning nearly triples the correlation strength between entropy rate and semantic diversity, suggesting that aligned models convert token uncertainty into semantic diversity more efficiently. Overall, these results demonstrate that fine-tuning does not simply reduce uncertainty, but fundamentally reorganizes it into more informative and semantically meaningful generations. Our code is available at https://github.com/WeiyiTian/canopy-entropy.
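The joint entropy $H(N, Y_{1:N}\mid X)$ named above splits, by the standard chain rule, into a length term and a content term: $H(N \mid X) + H(Y_{1:N} \mid N, X)$. The sketch below verifies this on a toy rollout distribution for a single prompt. The distribution is hypothetical, and the paper's own definition of $\mathrm{CE}^\star$ and of the entropy rate $r_N$ is not reproduced here.

```python
from math import log2

# Hypothetical toy distribution p(N = n, Y = y | X = x) for one fixed prompt x.
joint = {
    (1, ("a",)): 0.4,
    (1, ("b",)): 0.1,
    (2, ("a", "a")): 0.25,
    (2, ("a", "b")): 0.25,
}


def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


# Total entropy over (length, sequence) pairs.
total = entropy(joint.values())

# Marginal over output length N, and its entropy H(N | X).
length_marginal = {}
for (n, _), p in joint.items():
    length_marginal[n] = length_marginal.get(n, 0.0) + p
h_length = entropy(length_marginal.values())

# Conditional entropy H(Y | N, X): average the per-length entropies.
h_seq_given_length = 0.0
for n, p_n in length_marginal.items():
    conditional = [p / p_n for (m, _), p in joint.items() if m == n]
    h_seq_given_length += p_n * entropy(conditional)

print(f"H(N, Y | X) = {total:.4f} bits")
print(f"H(N | X) + H(Y | N, X) = {h_length + h_seq_given_length:.4f} bits")
```

Separating the length term from the content term is what lets output length be treated as a confounder instead of being folded silently into a single uncertainty score.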

2605.30843 2026-06-01 cs.LG econ.EM

A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models

离线强化学习与逆强化学习讲义,第二部分:逆强化学习与动态离散选择模型的基础

Enoch Hyunwook Kang

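AI summary: Proves the equivalence of inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models, reviews the classical identification results and computational paradigms, and introduces modern machine learning methods and their identification properties.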

详情
AI中文摘要

在前向强化学习问题中,奖励是固定且已知的;学习者被要求找到一个好的策略或价值函数。这里我们反过来提问:给定由专家生成的离线数据,我们能否恢复专家所优化的奖励?这就是逆强化学习问题,值得注意的是,两个社区——研究动态离散选择(DDC)的结构计量经济学家和研究熵正则化IRL的机器学习者——一直在以不同的名称研究完全相同的概率模型。我们首先证明它们的等价性。然后,我们发展Magnac和Thesmar的经典识别结果以及由此产生的经典计算范式:Rust的嵌套不动点算法、Hotz和Miller的条件选择概率方法,以及Adusumilli和Eckardt的两种时间差分方法:线性半梯度TD和近似价值迭代。每种方法都有其局限性:维度、转移核估计、致命三元组或投影不动点偏差。接着,我们回顾现代ML/IRL分支:对抗性IRL、占用匹配、IQ-Learn和离线ML-IRL,推导每种方法的实际目标,并精确说明它识别了什么和没有识别什么。最后,我们介绍Kang等人的经验风险最小化框架,该框架为离线IRL/DDC提供了基于梯度的估计器。

英文摘要

In the forward reinforcement-learning problem, the reward is fixed and known; the learner is asked to find a good policy or value function. Here we turn the question around. Given offline data generated by an expert, can we recover the reward the expert was optimizing? This is the inverse reinforcement learning problem, and remarkably, two communities, structural econometricians studying dynamic discrete choice (DDC) and machine learners studying entropy-regularized IRL, have been working on exactly the same probabilistic model under different names. We begin by proving their equivalence. We then develop the classical identification result of Magnac and Thesmar and the classical computational paradigms that grew out of it: Rust's nested fixed-point algorithm, the conditional-choice-probability approach of Hotz and Miller, and the two temporal-difference approaches of Adusumilli and Eckardt: linear semi-gradient TD and approximate value iteration. Each route has its limits: dimensionality, transition-kernel estimation, the deadly triad, or projected fixed-point bias. We then walk through the modern ML/IRL strand: adversarial IRL, occupancy matching, IQ-Learn, and offline ML-IRL, deriving each method's actual objective and stating precisely what it does and does not identify. We close with the empirical-risk-minimization framework of Kang et al., which yields a gradient-based estimator for offline IRL/DDC.

2605.30842 2026-06-01 cs.LG

CoMem: Context Management with A Decoupled Long-Context Model

CoMem: 基于解耦长上下文模型的上下文管理

Yuwei Zhang, Chengyu Dong, Shuowei Jin, Changlong Yu, Hejie Cui, Hongye Jin, Xinyang Zhang, Hamed Bonab, Colin Lockard, Jianshu Chen, Zhenyu Shi, Jingbo Shang, Xian Li, Bing Yin

AI总结 提出CoMem框架,通过将记忆管理与智能体工作流解耦并采用k步偏移异步流水线,利用奖励驱动训练策略,在SWE-Bench-Verified上实现1.4倍延迟改进且保持大部分性能。

详情
Comments
Work in progress
AI中文摘要

上下文管理使智能体模型能够通过对先前交互历史的迭代总结来解决长时任务。然而,这一过程通常会因额外的总结标记而产生大量解码开销,显著影响部署时的端到端响应延迟。在本文中,我们介绍CoMem,一种新颖的框架,它将记忆管理与主要智能体工作流解耦,使这些过程能够并行执行。我们提出了一种k步偏移异步流水线,将记忆模型的总结与智能体的推理重叠,有效掩盖了上下文处理的延迟。为了确保在这种异步设置下的鲁棒性,我们引入了一种奖励驱动的训练策略,使记忆模型对齐以捕获足够统计信息供智能体决策。理论分析证实,与耦合架构相比,CoMem提供了更优的效率-效果权衡。我们在SWE-Bench-Verified上的广泛实验结果表明,CoMem在保留大部分性能的同时,相比普通长上下文解决方案提供了1.4倍的延迟改进。此外,我们证明这些延迟增益随系统吞吐量增加而有利地扩展,为智能体推理和记忆压缩的独立优化提供了一条模块化路径。

英文摘要

Context management enables agentic models to solve long-horizon tasks through iterative summarization of previous interaction histories. However, this process typically incurs substantial decoding overhead for the extra summarization tokens, which significantly affects the end-to-end response latency at deployment. In this paper, we introduce CoMem, a novel framework that decouples memory management from the primary agent workflow, enabling these processes to execute in parallel. We propose a $k$-step-off asynchronous pipeline that overlaps the memory model's summarization with the agent's inference, effectively masking the latency of context processing. To ensure robustness under this asynchronous setting, we introduce a reward-driven training strategy that aligns the memory model to capture sufficient statistics for the agent's decision-making. Theoretical analysis confirms that CoMem offers a superior efficiency-effectiveness trade-off compared to coupled architectures. Our extensive experimental results on SWE-Bench-Verified show that CoMem provides a 1.4x latency improvement over vanilla long-context solutions while preserving most of the performance. Furthermore, we demonstrate that these latency gains scale favorably with increased system throughput, offering a modular path forward for the independent optimization of agent reasoning and memory compression.

2605.30838 2026-06-01 cs.AI

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

COMPASS: 认知MCTS引导的过程对齐用于安全搜索代理

Wenkai Shen, Pengyang Zhou, Jiahe Xu, Jiaming Qian, Haozhe He, Zhihao Huang, Chaochao Chen, Xiaolin Zheng

AI总结 提出COMPASS框架,通过认知树探索和自省步骤对齐,在保持通用效用的同时实现搜索代理工作流中的鲁棒安全对齐。

详情
AI中文摘要

基于LLM的搜索代理能够进行多步推理和使用工具。然而,这些能力引入了检索诱导的安全退化,因为有害意图可能分解为看似无害的子查询,导致不安全的结果。现有的对齐方法难以捕捉稀疏的安全信号,并且无法监督多步交互中的各种违规行为。我们提出COMPASS,一种认知MCTS引导的过程对齐框架,旨在在保持通用效用的同时,实现代理工作流中的鲁棒安全对齐。COMPASS集成了认知树探索(CTE)以高效合成隐蔽攻击轨迹,以及自省步骤对齐(ISA)以隔离有风险的中间动作进行细粒度过程监督。实验结果表明,COMPASS在实现良好的安全-效用权衡的同时,所需训练数据大幅减少。

英文摘要

LLM-powered search agents enable multi-step reasoning and tool use. However, these capabilities introduce retrieval-induced safety degradation, as harmful intents may decompose into seemingly innocuous sub-queries that lead to unsafe outcomes. Existing alignment methods struggle to capture sparse safety signals and fail to supervise diverse violations across multi-step interactions. We propose COMPASS, a Cognitive MCTS-Guided Process Alignment framework designed to achieve robust safety alignment throughout the agent workflow while preserving general utility. COMPASS integrates cognitive tree exploration (CTE) to efficiently synthesize stealthy attack trajectories, and introspective step-wise alignment (ISA) to isolate risky intermediate actions for fine-grained process supervision. Empirical results show that COMPASS achieves a favorable safety-utility trade-off while requiring substantially less training data.

2605.30837 2026-06-01 cs.CR cs.LG

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

先派侦察兵:提示注入防御中自适应检测器分配的预推理方法

Shuhao Zhang, Jiarui Li, Qi Cao, Ruiyi Zhang, Pengtao Xie

AI总结 针对提示注入检测器异构且不可靠的问题,提出SCOUT框架,通过预测每个检测器对每个样本的可靠性和延迟,动态分配检测器,实现安全性与效率的权衡。

详情
Comments
We propose SCOUT, a detector allocation framework that predicts each detector's accuracy and latency on a given input before running it, letting operators control the safety-utility trade-off with a single threshold and route to an LLM judge only when needed
AI中文摘要

提示注入检测器是异构的:每个检测器在不同攻击切片上表现强劲,但没有一个始终可靠。然而现有系统仍将检测视为固定的单检测器流水线,将每个请求提交给一个检测器的盲点。我们将防御重新定义为检测器分配:给定一个异构池,决定每个请求运行哪些检测器以及是否升级到LLM法官。我们的框架SCOUT(可扩展且可控的结果预测用于不确定性感知分诊)通过预测每个检测器在类似历史输入上的样本级可靠性和延迟,使这一决策动态化,并向操作员暴露一个单一的安全-效用阈值(其中效用包含良性通过率和挂钟时间)。为了评估这一设置,我们构建了SCOUT-450基准,该基准捕捉了旧提示注入集未充分代表的、结构复杂的面向代理的注入。在SCOUT-450上,相对于始终开启的GPT-4o法官,安全导向的工作点将攻击成功率降低46%,总挂钟时间降低40%,同时良性效用下降5.1个百分点。SCOUT还迁移到三个外部基准(BIPIA、IPI和IHEval),改善了安全-效用前沿。

英文摘要

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.

2605.30834 2026-06-01 cs.RO cs.AI

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

轨迹中的捉迷藏:发现VLA运行时监控的失败信号

Seongheon Park, Wendi Li, Changdae Oh, Samuel Yeh, Zsolt Kira, Michael Hagenow, Sharon Li

AI总结 提出Hide-and-Seek框架,通过轨迹间和轨迹内对比学习,从轨迹级监督中定位失败指示动作,实现无需步骤标注的VLA模型运行时失败检测。

详情
AI中文摘要

视觉-语言-动作(VLA)模型使机器人能够遵循自然语言指令并在不同任务中泛化,但在实际部署中仍易受执行失败影响,损害可靠性。因此,在执行过程中检测此类失败对于具身系统的稳健部署至关重要。现有的失败检测方法要么依赖昂贵的动作重采样或外部模型,要么将轨迹级标签均匀传播到每个时间步,掩盖了局部失败信号。在本文中,我们提出\textbf{Hide-and-Seek}框架,将VLA失败检测形式化为粗监督学习问题。通过结合轨迹间和轨迹内对比目标,Hide-and-Seek能够定位指示失败的动作,并仅从轨迹级监督中诱导出具有时间结构的失败信号,无需任何步骤级标注。我们在LIBERO、VLABench和真实机器人平台上,针对三种代表性VLA策略(OpenVLA、$π_0$和$π_{0.5}$)评估了Hide-and-Seek。我们的方法在共形预测下实现了最先进的多任务失败检测性能,具有实用的准确度-及时性权衡,并且对已见和未见任务均具有良好的泛化能力。

英文摘要

Vision-Language-Action (VLA) models enable robots to follow natural language instructions and generalize across diverse tasks, but they remain vulnerable to execution failures that compromise reliability in real-world deployment. Detecting such failures during execution is therefore critical for the robust deployment of embodied systems. Existing failure detection methods either rely on expensive action resampling or external models, or propagate trajectory-level labels uniformly across every timestep, obscuring localized failure signals. In this paper, we propose \textbf{Hide-and-Seek}, a framework that formulates VLA failure detection as a coarsely supervised learning problem. By combining inter-trajectory and intra-trajectory contrastive objectives, Hide-and-Seek localizes failure-indicative actions and induces temporally structured failure signals from trajectory-level supervision alone, without any step-level annotation. We evaluate Hide-and-Seek on LIBERO, VLABench, and a real-world robotic platform across three representative VLA policies: OpenVLA, $π_0$, and $π_{0.5}$. Our method achieves state-of-the-art multi-task failure detection performance with a practical accuracy--timeliness trade-off under conformal prediction, and generalizes well to both seen and unseen tasks.
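The abstract does not say how the conformal calibration is carried out, so the following is only a generic split-conformal sketch of how a runtime alarm threshold could be set. It calibrates on the peak failure score of each successful calibration rollout and raises an alarm at the first timestep whose score exceeds the threshold. The score values, `alpha`, and both function names are hypothetical.

```python
import math

def conformal_threshold(calib_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(calib_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration rollouts for this alpha: never raise an alarm.
        return math.inf
    return sorted(calib_scores)[k - 1]

def first_alarm(step_scores, threshold):
    """Index of the first timestep whose score exceeds the threshold, else None."""
    for t, score in enumerate(step_scores):
        if score > threshold:
            return t
    return None

# Hypothetical peak scores from seven successful calibration rollouts.
calib_peaks = [0.20, 0.35, 0.30, 0.25, 0.40, 0.15, 0.28]
tau = conformal_threshold(calib_peaks, alpha=0.5)

# Hypothetical per-step scores from a new rollout.
alarm_step = first_alarm([0.10, 0.22, 0.31, 0.55], tau)
```

Under exchangeability of calibration and test rollouts, a successful rollout crosses such a threshold with probability at most `alpha`, which is what makes the accuracy-timeliness trade-off tunable through a single parameter.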

2605.30833 2026-06-01 cs.CL cs.AI

Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation

你的老师在这里帮不了你:对抗在线策略蒸馏中的监督保真度衰减

Yanjiang Liu, Jie Lou, Xinyan Guan, Yuqiu Ji, Hongyu Lin, Ben He, Xianpei Han, Le Sun, Xing Yu, Yaojie Lu

AI总结 针对在线策略蒸馏中监督保真度衰减问题,提出前瞻组奖励方法,通过评估学生候选词在后续步骤中诱导的教师置信度并分配组归一化奖励,结合熵触发树注意力机制,显著提升长链推理性能。

详情
AI中文摘要

在线策略蒸馏通过使用来自教师的 token 级反馈,在学生模型自身生成的轨迹上训练学生模型来传递推理能力。然而,我们识别出一个关键瓶颈,即\textbf{监督保真度衰减(SFD)}:随着学生生成的前缀变长,教师的下一个 token 分布变得不那么自信,区分性也更弱。因此,反向 KL 蒸馏中依赖教师的纠正信号减弱,导致学生漂移在长推理链中累积。为了缓解 SFD,我们引入了\textbf{前瞻组奖励(\ours{})}。基于下一步教师置信度反映了未来反向 KL 监督的区分强度这一见解,\ours{} 通过学生在后续步骤中诱导的教师置信度来评估学生的 top-K 候选 token,并分配组归一化奖励。为了保持计算效率,我们进一步设计了一种熵触发的树注意力机制。在六个数学和代码基准测试中,\ours{} 在 7B 学生模型上比 OPD 提高了 mean@8 达\textbf{2.57} 个点,在长生成任务中增益更大,在 AIME-26 上达到 +\textbf{4.92} 个点(39k token)。

英文摘要

On-policy distillation transfers reasoning capabilities by training a student model on its own generated trajectories using token-level feedback from a teacher. However, we identify a critical bottleneck, \textbf{Supervision Fidelity Decay (SFD)}: as student-generated prefixes lengthen, the teacher's next-token distribution becomes less confident and less discriminative. Consequently, the teacher-dependent corrective signal in reverse-KL distillation weakens, causing student drift to compound across long reasoning chains. To mitigate SFD, we introduce \textbf{Lookahead Group Reward (\ours{})}. Building on the insight that next-step teacher confidence reflects the discriminative strength of future reverse-KL supervision, \ours{} evaluates the student's top-K candidate tokens by the teacher confidence they induce at the subsequent step and assigns a group-normalized reward. To maintain computational efficiency, we further design an entropy-triggered tree-attention mechanism. Across six math and code benchmarks, \ours{} improves mean@8 by \textbf{2.57} points over OPD for a 7B student, with gains increasing for longer generations and reaching +\textbf{4.92} points on AIME-26 at 39k tokens.
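To make "group-normalized reward" concrete, here is a minimal sketch under stated assumptions. Teacher confidence is taken to be the maximum probability of the teacher's next-token distribution after each candidate is appended, and rewards are standardized within the top-K group. The abstract does not define the confidence measure or the normalization, so both choices, along with the function name and the toy distributions, are hypothetical.

```python
import math

def group_normalized_rewards(teacher_next_dists, eps=1e-8):
    """One teacher next-token distribution per candidate -> one reward per candidate."""
    # Assumed confidence measure: peak probability of the teacher distribution.
    conf = [max(dist) for dist in teacher_next_dists]
    mean = sum(conf) / len(conf)
    std = math.sqrt(sum((c - mean) ** 2 for c in conf) / len(conf))
    # Standardize within the group so rewards are relative to sibling candidates.
    return [(c - mean) / (std + eps) for c in conf]

# Hypothetical teacher distributions induced by K = 3 candidate tokens.
dists = [
    [0.90, 0.05, 0.05],  # candidate 0 keeps the teacher confident
    [0.40, 0.30, 0.30],  # candidate 1 leaves the teacher unsure
    [0.60, 0.20, 0.20],
]
rewards = group_normalized_rewards(dists)
```

Because the rewards are centered within the group, a candidate is rewarded only for leading to a more confident teacher than its siblings, not for absolute confidence.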

2605.30832 2026-06-01 cs.AI

SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning

SLAT:面向高效CoT推理的段级自适应修剪

Jian Yao, Xiongcai Luo, Ran Cheng, Kay Chen Tan

AI总结 提出段级自适应修剪框架SLAT,通过强化学习选择性抑制低边际效用的高概率冗余段,在保持准确率的同时将推理长度减少50%。

详情
AI中文摘要

近期大型推理模型通过强化学习显著提升了思维链(CoT)能力。然而,生成的推理链常存在结构冗余(即“过度思考”),在未提高答案正确性的情况下产生高计算开销。现有缓解策略通常依赖令牌均匀长度惩罚,这种粗粒度、段无关的缩短压力可能在不经意间抑制有用推理。为解决此问题,我们证明低效集中在高概率且边际效用低的段。我们推导了在正确性-长度权衡目标下段次优性的理论表征,并提出SLAT(段级自适应修剪),一种基于该准则选择性抑制冗余段的强化学习框架。在标准基准上的实验结果表明,SLAT建立了优越的准确率-效率帕累托前沿,与未压缩基线相比,将推理长度减少50%,同时保持有竞争力的准确率。总体而言,我们的结果表明,基于理论的段感知修剪是大型语言模型中高效CoT推理的一个有前景的方向。

英文摘要

Recent advances in Large Reasoning Models have significantly improved chain-of-thought (CoT) capabilities via reinforcement learning (RL). However, generated reasoning chains frequently suffer from structural redundancy (i.e., \emph{overthinking}), incurring high computational overhead without improving answer correctness. Existing mitigation strategies typically rely on token-uniform length penalties, which provide coarse, segment-agnostic pressure toward shorter outputs and can inadvertently suppress useful reasoning alongside redundancy. To address this, we demonstrate that inefficiency concentrates in high-probability segments with low marginal utility. We derive a theoretical characterization of segment suboptimality under the correctness-length trade-off objective and propose \textsc{SLAT} (Segment-Level Adaptive Trimming), an RL framework that selectively suppresses redundant segments based on this criterion. Empirical results on standard benchmarks indicate that \textsc{SLAT} establishes a superior accuracy-efficiency Pareto frontier, reducing reasoning length by $50\%$ relative to uncompressed baselines while maintaining competitive accuracy. Overall, our results suggest that theoretically grounded, segment-aware trimming is a promising direction for efficient CoT reasoning in large language models.

2605.30831 2026-06-01 q-bio.QM cs.LG physics.chem-ph

The Geometry of Activity Cliffs: Representation Dependence and Multi-Scale Characterization of Activity Landscapes

活性悬崖的几何结构:活性景观的表征依赖性与多尺度表征

Pawel Dabrowski-Tumanski, Bartosz Topolski, Dariusz Plewczynski, Tomasz Jetka

AI总结 本研究通过六步分析流程,系统探究不同分子表征(如指纹和嵌入)对活性悬崖定义的影响,发现无单一表征在所有标准下均最优,揭示了活性悬崖是表征诱导的几何现象而非分子对固有属性。

详情
AI中文摘要

活性悬崖是指结构相似但活性差异巨大的化合物,通常被视为化学数据集的固有特征。我们认为,除了靶标生物学因素外,我们对活性悬崖的理解很大程度上是由所选分子表征所诱导的几何结构决定的,而非分子对本身的属性。我们设计了一个六步分析流程来系统检验这一假设。该流程包括:评估成对距离几何、悬崖富集度、活性梯度分布、悬崖子空间的持续同调、嵌入和度量对的预测基准测试,以及最终匹配分子对和立体异构体的分析。我们将该流程应用于十五种嵌入和度量配置,以构建针对三个已知活性悬崖挑战的不同数据集的基准。没有一种表征在所有标准上均表现优异:Morgan Tanimoto 提供了最强的悬崖富集度和跨骨架泛化能力;MolFormer 余弦提供了唯一有意义的立体化学敏感性;MACCS 和 RDKit Dice 指纹对匹配分子对变换最敏感;ChemBERTa 由于嵌入坍缩而全面失败。这些发现并非排名。它们反映了不同表征编码了分子识别的不同方面,而选择一种表征实际上就隐含地定义了活性悬崖是什么。

英文摘要

Activity cliffs, structurally similar compounds with large potency differences, are widely treated as intrinsic features of chemical datasets. We argue that, apart from target biology, much of our understanding of cliffs is a consequence of the geometry induced by the chosen molecular representation, not a property of a molecule pair itself. We designed a six-step pipeline to systematically test this hypothesis. The pipeline consists of: assessing pairwise distance geometry, cliff enrichment, activity gradient distribution, persistent homology of the cliff subspace, predictive benchmarking for a chosen pair of an embedding and a metric, and, finally, analysis of the matched molecular pairs and stereoisomers. We applied the pipeline to fifteen configurations of embeddings and metrics to build a benchmark across three distinct datasets known for their activity-cliff challenges. No representation excels on all criteria: Morgan Tanimoto provides the strongest cliff enrichment and cross-scaffold generalization; MolFormer cosine provides the only meaningful stereochemical sensitivity; MACCS and RDKit Dice fingerprints are most sensitive to matched-molecular-pair transformations; ChemBERTa fails uniformly due to embedding collapse. These findings are not a ranking. They reflect the fact that different representations encode different aspects of molecular recognition, and that choosing one implicitly defines what an activity cliff actually is.
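The claim that the choice of representation defines the cliff can be seen in a toy example. The sketch below scores the same pair of bit-set fingerprints with Tanimoto and Dice similarity and applies one cliff rule to both. The fingerprints, the potency values, and the thresholds `sim_min` and `gap_min` are hypothetical and are not taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def dice(a, b):
    """Dice similarity of two fingerprints given as sets of on-bits."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def is_cliff(similarity, potency_gap, sim_min=0.7, gap_min=2.0):
    """Cliff rule: the pair is similar enough AND differs enough in potency."""
    return similarity >= sim_min and potency_gap >= gap_min

# Hypothetical fingerprints sharing 6 of their 8 on-bits each.
fp_a = set(range(0, 8))
fp_b = set(range(2, 10))
gap = abs(8.5 - 6.0)  # hypothetical potency values on a log scale

sim_t = tanimoto(fp_a, fp_b)    # 6 / 10
sim_d = dice(fp_a, fp_b)        # 12 / 16
cliff_t = is_cliff(sim_t, gap)  # below the similarity threshold
cliff_d = is_cliff(sim_d, gap)  # above the similarity threshold
```

The molecules and the threshold are the same in both cases, yet the pair counts as a cliff under Dice and not under Tanimoto, which is the kind of representation dependence the abstract describes.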

2605.30829 2026-06-01 cs.CV

LegSegNet: A Public Deep Learning System for Lower Extremity CT Tissue Segmentation and Quantification

LegSegNet:用于下肢CT组织分割与量化的公共深度学习系统

Yuwen Chen, Yaqian Chen, Roy Colglazier, Haoyu Dong, Hanxue Gu, Maciej A. Mazurowski, Kevin W. Southerland

AI总结 提出LegSegNet深度学习系统,实现下肢CT中骨骼、肌肉、皮下脂肪和肌间/肌内脂肪的自动分割与量化,在测试集上平均Dice达89.31,是首个公开的端到端系统。

详情
Comments
9 pages
AI中文摘要

下肢计算机断层扫描(CT)包含用于身体成分分析、肌少症评估和肌肉骨骼疾病监测的临床相关信息,但大规模提取这些测量需要精确的组织分割和自动化量化工作流程。现有的公共分割工具并非为全面的下肢CT分析而设计,特别是对于临床重要的肌间/肌内脂肪组织,而且大多数公共方法仅提供掩膜预测而非端到端量化系统。为解决这一问题,我们提出了LegSegNet,一个用于下肢CT组织分割和身体成分量化的深度学习系统。给定输入CT扫描,LegSegNet分割骨骼、骨骼肌、皮下脂肪组织和肌间/肌内脂肪组织。然后计算定量的组织测量用于下游分析。我们使用1,302张手动标注的CT切片开发了分割模型,并在900张保留测试切片上进行了评估,所有标注均由放射科医生审核。我们将LegSegNet与广泛的2D分割方法进行基准测试,包括基于CNN的模型、基于Transformer的模型和微调的基础模型,并进一步在外部公共CT数据集上评估其泛化能力。LegSegNet实现了最佳的整体分割性能,在保留测试集上的平均Dice得分为89.31。据我们所知,LegSegNet是首个公开可用的用于下肢CT组织分割和量化的端到端系统,为未来医学图像分析中的计算机视觉研究提供了实用的评估工具。代码和模型权重可在https://github.com/mazurowski-lab/LegSegNet获取。

英文摘要

Lower extremity computed tomography (CT) contains clinically relevant information for body composition analysis, sarcopenia assessment, and musculoskeletal disease monitoring, but extracting these measurements at scale requires accurate tissue segmentation and an automated quantification workflow. Existing public segmentation tools are not designed for comprehensive lower extremity CT analysis, particularly for clinically important inter/intramuscular adipose tissue, and most public methods only provide mask prediction rather than an end-to-end quantification system. To address this problem, we present LegSegNet, a deep learning system for lower extremity CT tissue segmentation and body composition quantification. Given an input CT scan, LegSegNet segments bone, skeletal muscle, subcutaneous adipose tissue, and inter/intramuscular adipose tissue. It then computes quantitative tissue measurements for downstream analysis. We developed the segmentation model using 1,302 manually annotated CT slices and evaluated it on 900 held-out test slices, with all annotations reviewed by radiologists. We benchmark LegSegNet against a broad set of 2D segmentation methods, including CNN-based models, transformer-based models, and finetuned foundation models, and further evaluate its generalization on an external public CT dataset. LegSegNet achieves the best overall segmentation performance, with an average Dice score of 89.31 on the held-out test set. To our knowledge, LegSegNet is the first publicly available end-to-end system for lower extremity CT tissue segmentation and quantification, providing a practical evaluation tool for future computer vision research in medical image analysis. The code and model weights are available at: https://github.com/mazurowski-lab/LegSegNet

2605.30826 2026-06-01 cs.CL cs.AI

Beyond Agreement: Scoring Panel-Surfaced Biomedical Entity Candidates for Curator Triage

超越一致性:为策展人分类对面板筛选的生物医学实体候选进行评分

Shuheng Cao, Ruiqi Chen, Renjie Cao, Zhenhao Zhang, Siyu Zhang, Tingting Dan

AI总结 提出BioConCal评分器,利用无金标准的一致性、提及、表面可用性和文档特征,对多LLM面板筛选的候选实体进行评分,显著提高候选筛选的精确率和召回率。

详情
AI中文摘要

生物医学命名实体识别对于现代LLM来说看似简单:合理的生物医学提及容易浮现,但语料库约定正确性取决于标注约定、跨度边界、实体粒度和类型模式。多LLM一致性是一个显著性信号,而非语料库约定正确性。我们引入了一个候选级面板输出基准,用于面板筛选的候选验证,其中单元是由明确定义的多模型面板对齐的候选,而非独立提取器输出。该基准将八个LLM在五个公共生物医学NER数据集上的预测对齐到一个候选主表中。BioConCal是一个领域内监督评分器,它利用推理时的无金标准一致性、提及、表面可用性和文档特征,为固定候选流实例化这一层。在领域内,BioConCal将AUROC从原始一致性的0.753提高到0.910。在验证选择的0.95精确率目标下,它选择了1,340个候选,经验测试精确率为0.939,而原始一致性为293个候选。这对应于候选级召回率0.592和语料库级召回率0.523,而面板内行标签上限为0.883。主要好处不是恢复每个面板成员遗漏的实体,而是将嘈杂的面板流重塑为更高产出的审查队列。在实体类型转移下,阈值需要目标领域验证,而精确字符定位仍然是单独的后处理步骤。

英文摘要

Biomedical NER is deceptively simple for modern LLMs: plausible biomedical mentions are easy to surface, but corpus-convention correctness depends on annotation conventions, span boundaries, entity granularity, and type schemas. Multi-LLM agreement is a salience signal, not corpus-convention correctness. We introduce a candidate-level panel-output benchmark for panel-surfaced candidate verification, where the unit is an aligned candidate surfaced by an explicitly defined multi-model panel rather than a standalone extractor output. The benchmark aligns eight LLMs' predictions over five public biomedical NER datasets into a candidate master table. BioConCal is an in-domain supervised scorer that instantiates this layer with inference-time gold-free agreement, mention, surface-availability, and document features for a fixed candidate stream. In domain, BioConCal improves AUROC from 0.753 for raw agreement to 0.910. At a validation-selected 0.95 precision target it selects 1,340 candidates at empirical test precision 0.939, compared with 293 for raw agreement. This corresponds to candidate-level recall 0.592 and corpus-level recall 0.523 against a within-panel row-label ceiling of 0.883. The main benefit is not recovering entities missed by every panel member, but reshaping a noisy panel stream into a higher-yield review queue. Under entity-type shift, thresholds require target-domain validation, and exact character localization remains a separate deterministic post-processing step.

2605.30825 2026-06-01 cs.LG cs.AI math.OC stat.ML

Unlearning in Diffusion Models: A Unified Framework with KL Divergence and Likelihood Constraints

扩散模型中的遗忘学习:基于KL散度和似然约束的统一框架

Shervin Khalafi, Alejandro Ribeiro, Dongsheng Ding

AI总结 提出一个约束优化框架,通过最小化与预训练模型的偏差并施加与遗忘分布的分离约束,实现扩散模型中的概念和数据遗忘,并基于KL散度和似然约束推导最优解及原始-对偶算法。

详情
Comments
27 pages, 6 figures, 4 tables; Accepted by ICML 2026
AI中文摘要

扩散模型中的遗忘学习旨在移除不需要的数据或概念,同时保留预训练模型的效用——这两个目标本质上相互冲突。我们提出了一个原则性的约束优化框架,将遗忘学习形式化为在满足与遗忘分布的显式分离约束下,最小化与预训练模型的偏差。具体地,我们基于反向和正向KL散度以及似然约束,构建了三个约束优化问题。前两个问题泛化了现有的概念和数据遗忘方法,而第三个问题为遗忘学习提供了一种新颖且自然的表述。尽管KL约束非凸,我们证明了所有三个问题的强对偶性,从而能够显式地表征其最优解作为遗忘目标,并为每个公式开发原始-对偶算法。实验结果表明,与基于权重的基线方法相比,我们的KL约束方法在概念和数据遗忘中实现了更优的保留-遗忘权衡,而基于似然的方法在匹配遗忘效果的同时,更好地保留了保留概念。

英文摘要

Unlearning in diffusion models aims to remove undesirable data or concepts while preserving the utility of pretrained models -- two fundamentally conflicting objectives. We propose a principled constrained optimization framework that formulates unlearning as minimizing the deviation from a pretrained model, subject to explicit separation constraints from the unlearning distributions. Specifically, we formulate three constrained optimization problems based on reverse and forward KL divergences, and likelihood constraints. The first two generalize existing approaches for concept and data unlearning, while the third offers a novel and natural formulation for unlearning. Despite the nonconvexity of the KL constraints, we establish strong duality for all three problems, enabling us to explicitly characterize their optimal solutions as unlearning targets and develop primal-dual algorithms for each formulation. Experimental results demonstrate that our KL-constrained approach achieves superior retention-unlearning tradeoffs compared to weight-based baselines for concept and data unlearning, and that our likelihood-based approach matches unlearning effectiveness while better preserving retained concepts compared to baselines.

2605.30824 2026-06-01 cs.AI

Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

面向深度研究的规划器中心强化学习与结构感知奖励

Mustafa Anis Hussain, Xinle Wu, Yao Lu

AI总结 提出DecomposeR框架,通过将研究计划表示为有向无环图(DAG)并采用两阶段强化学习(规划器RL和回答器RL),实现显式结构化规划与细粒度奖励分配,在长文本基准上提升5.1-8.0分。

详情
AI中文摘要

深度研究任务要求LLM规划调查内容、检索证据,并在多个研究分支中综合长格式答案。现有训练范式要么依赖短格式可验证问答作为代理,要么优化单一的长轨迹,这使得规划和执行难以分离,并导致规划过程的信用分配薄弱。我们提出DecomposeR,一种以规划器为中心的深度研究框架,将研究计划表示为类型化有向无环图(DAG),使规划变得显式、结构化且可奖励。我们分两个阶段训练Qwen3-8B模型:规划器强化学习(RL)首先学习图结构和查询分解以改进研究规划,然后回答器强化学习(RL)基于所学计划学习分支级执行和最终综合。通过将奖励分配给显式的规划器令牌和结构化组件,而不是平坦的轨迹,DecomposeR实现了对规划的更细粒度优化,同时减少了端到端训练的模糊性。实验表明,由于改进了规划和回答能力,DecomposeR-8B在流行的长文本基准上比强可比开源基线提高了5.1-8.0分。

英文摘要

Deep research tasks require LLMs to plan what to investigate, retrieve evidence, and synthesize long-form answers across multiple branches of inquiry. Existing training paradigms either rely on short-form verifiable QA as a proxy or optimize monolithic long trajectories, which makes planning and execution difficult to disentangle and yields weak credit assignment for the planning process. We propose DecomposeR, a planner-centric deep research framework that represents research plans as typed directed acyclic graphs (DAGs), allowing planning to be made explicit, structured, and rewardable. We train a Qwen3-8B model in two stages: planner reinforcement learning (RL) first learns graph structure and query decomposition to improve research planning, and answerer reinforcement learning (RL) then learns branch-level execution and final synthesis conditioned on the learned plan. By assigning rewards to explicit planner tokens and structured components rather than to a flat trajectory, DecomposeR enables finer-grained optimization of planning while reducing the ambiguity of end-to-end training. Experiments show that DecomposeR-8B improves over strong comparable open baselines by 5.1-8.0 points on popular long-form benchmarks due to improved planning and answering capabilities.

2605.30818 2026-06-01 cs.ET cs.AI cs.SD

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

GaMi: 通过跨模态减法解缠实现几何无关的材料识别

Zhiwei Chen, Yijie Li, Yimo Zhang, Shiyun Shao, Yichao Chen, Dian Ding, Liang Wang, Haiwei Wu, Liwei Guo, Jie Yang, Xiaosong Zhang, Yongzhao Zhang

AI总结 提出GaMi系统,利用毫米波和声学传感的跨模态减法解缠框架,在不受约束的几何条件下实现高精度材料识别。

详情
Comments
17 pages, 18 figures
AI中文摘要

非接触式材料识别使具身智能能够进行自适应交互,但面临几何诱导变化(如方向、形状、距离)和单模态模糊性的挑战。本文提出GaMi,一种集成毫米波和声学传感的多模态材料识别系统,可在不受约束的几何条件下稳健运行。利用共置双模态传感器之间共享几何一致性的洞察,GaMi采用样本内跨模态减法解缠框架。通过语义对齐模态并减去共享几何上下文,它隔离了内在材料特征。此外,GaMi引入样本间对比学习以纠正跨模态未对准引起的残余干扰。另外,两种模态之间的配对自适应策略实现了跨设备的少样本泛化。在20种材料上的广泛评估表明,GaMi达到了95.2%的准确率,在未见几何条件下优于单模态基线。

英文摘要

Non-contact material identification enables adaptive interaction for embodied intelligence yet faces challenges from geometry-induced variations (e.g., orientation, shape, distance) and single-modality ambiguities. In this paper, we present GaMi, a multimodal material identification system integrating mmWave and acoustic sensing to robustly operate under unconstrained geometric conditions. By leveraging the insight of shared geometric consistency between co-located bimodal sensors, GaMi employs an intra-sample cross-modal subtractive disentanglement framework. By semantically aligning modalities and subtracting the shared geometric context, it isolates intrinsic material features. Furthermore, GaMi incorporates inter-sample contrastive learning to correct the residual interference caused by cross-modal misalignment. Additionally, a pairing-based adaptation strategy between two modalities enables few-shot generalization across devices. Extensive evaluations on 20 materials show that GaMi achieves 95.2% accuracy, outperforming single-modality baselines across unseen geometric conditions.

2605.30813 2026-06-01 cs.CL cs.DS

Incremental BPE Tokenization

增量式 BPE 分词

Shenghu Jiang, Ruihao Gong

AI总结 提出一种增量式 BPE 分词算法,以 O(log² t) 时间处理每个字节,总复杂度 O(n log² t),支持流式场景,相比 Hugging Face 和 OpenAI 的实现速度提升约 3 倍并降低延迟。

详情
Comments
Accepted to ICML 2026 (Spotlight)
AI中文摘要

我们提出了一种用于增量式字节对编码(BPE)分词的新算法。该算法在最坏情况下以 $\mathcal{O}(\log^2 t)$ 时间处理每个输入字节,总复杂度为 $\mathcal{O}(n \log^2 t)$,其中 $n$ 是输入长度,$t$ 是最大分词长度。该算法增量地维护输入文本每个前缀的 BPE 分词结果,实现了由固定合并规则集定义的标准 BPE 合并过程。这使得在流式设置中能够进行高效的部分分词。作为标准 BPE 的即插即用替代方案,我们的方法相比 Hugging Face 的分词器实现了高达约 3 倍的加速,并在病态输入上相比 OpenAI 的 tiktoken 展示了显著的延迟降低。我们进一步引入了一种急切输出算法,支持流式输出,在增量分词过程中一旦确定分词边界即发出分词。总体而言,我们的结果表明 BPE 分词可以以具有强最坏情况保证的增量方式执行,同时在现代大语言模型流水线中提供实际的延迟优势。代码:https://github.com/ModelTC/mtc-inc-bpe

英文摘要

We propose a novel algorithm for incremental Byte Pair Encoding (BPE) tokenization. The algorithm processes each input byte in worst-case $\mathcal{O}(\log^2 t)$ time, leading to an overall complexity of $\mathcal{O}(n \log^2 t)$, where $n$ is the input length and $t$ is the maximum token length. The algorithm incrementally maintains BPE tokenization results for every prefix of the input text, implementing the standard BPE merge procedure defined by a fixed set of merge rules. This enables efficient partial tokenization in streaming settings. Functioning as a drop-in replacement for standard BPE, our approach achieves a speedup of up to ${\sim}3\times$ over Hugging Face's tokenizers, and demonstrates significant latency reductions over OpenAI's tiktoken on pathological inputs. We further introduce an eager output algorithm that enables streaming output, emitting tokens as soon as token boundaries are determined during incremental tokenization. Overall, our results demonstrate that BPE tokenization can be performed incrementally with strong worst-case guarantees, while providing practical latency benefits in modern large language model pipelines. Code: https://github.com/ModelTC/mtc-inc-bpe
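For orientation, the sketch below implements the standard, non-incremental BPE merge procedure that the abstract says the incremental algorithm reproduces, and then re-tokenizes every prefix from scratch. It is a naive baseline, not the paper's algorithm. It works on characters rather than bytes for readability, and the merge table is a made-up example.

```python
def bpe_encode(text, merges):
    """Standard BPE: repeatedly merge the adjacent pair with the lowest rank."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(text)
    while len(tokens) > 1:
        # Position of the best-ranked adjacent pair (leftmost on ties).
        best = min(
            range(len(tokens) - 1),
            key=lambda i: ranks.get((tokens[i], tokens[i + 1]), float("inf")),
        )
        pair = (tokens[best], tokens[best + 1])
        if pair not in ranks:
            break  # no applicable merge rule remains
        # Merge every occurrence of that pair, scanning left to right.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

# Hypothetical merge table, highest priority first.
merges = [("b", "c"), ("a", "b")]

# Naive "incremental" use: re-run the full encoder on each prefix.
text = "abc"
prefix_tokens = [bpe_encode(text[:n], merges) for n in range(1, len(text) + 1)]
```

In this toy case the prefix "ab" tokenizes to a single token `ab`, but appending "c" changes the result to `a`, `bc`. A new byte can therefore revise earlier token boundaries, which is why maintaining the tokenization of every prefix efficiently is not trivial and why streaming output has to wait until a boundary is settled.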

2605.30812 2026-06-01 cs.LG physics.comp-ph

Learning Permutation-invariant Macroscopic Dynamics

学习置换不变的宏观动力学

Zhichao Han, Mengyi Chen, Qianxiao Li

AI总结 提出一种置换不变的自编码器框架,通过重建质量分布而非逐点重建来学习无序微观系统的宏观动力学,并在粒子系统、Lennard-Jones流体和聚合物拉伸动力学中验证了有效性。

详情
Comments
ICML 2026 submission
AI中文摘要

准确建模高维微观系统的宏观动力学在科学领域具有广泛兴趣。许多数据驱动方法通过自编码器学习低维潜在状态,该自编码器针对逐点输入重建进行训练。这些方法通常假设输入中微观自由度的固定顺序。然而,在许多场景中,例如粒子系统,微观状态本质上是无序的。这激发了一种学习置换不变潜在表示的自编码器框架。为此,我们采用置换不变的编码器,并设计解码器来重建以观测点为中心的质量分布,而不是逐样本重建。然后,我们联合学习可观测量和潜在状态的宏观动力学。我们展示了所提方法在各种微观设置中的有效性和鲁棒性,包括学习相互作用粒子系统中的能量动力学、预测Lennard-Jones流体中的混合动力学,以及从拉伸力场中运动的聚合物视频数据建模拉伸动力学。

英文摘要

Accurately modeling the macroscopic dynamics of high-dimensional microscopic systems is of broad interest across the sciences. Many data-driven approaches learn a low-dimensional latent state through an autoencoder trained for pointwise input reconstruction. These methods typically assume a fixed ordering of microscopic degrees of freedom in the input. However, in many settings, such as particle systems, the microscopic state is inherently unordered. This motivates an autoencoder framework that learns permutation-invariant latent representations. To this end, we adopt a permutation-invariant encoder and design the decoder to reconstruct the mass distribution centered at the observed points rather than per-sample reconstruction. We then jointly learn the macroscopic dynamics of the observables together with the latent states. We demonstrate the effectiveness and robustness of the proposed method across a range of microscopic settings, including learning the energy dynamics in interacting particle systems, predicting mixing dynamics in Lennard-Jones fluids, and modeling the stretching dynamics from video data of polymers moving in an elongational force field.

2605.30811 2026-06-01 cs.LG

Non-destructive Identification of Oyster Species is possible from Hyperspectral Images with Machine Learning

基于高光谱图像与机器学习实现牡蛎物种的无损鉴别

Ethan Kane Waters, Max Wingfield, Aiden Mellor, Paul Stewart, Iman Tahmasbian

AI总结 本研究利用高光谱成像结合偏最小二乘判别分析和卷积神经网络,实现了对黑唇岩牡蛎和悉尼岩牡蛎的无损、高准确率鉴别。

详情
Comments
13 pages, 9 figures
AI中文摘要

区分牡蛎物种对于开发适合生产系统的新型商业牡蛎物种至关重要,并且对海鲜供应链的可追溯性至关重要。常见方法(如DNA分析)具有破坏性且耗时。本研究探讨了使用高光谱成像(HSI)区分黑唇岩牡蛎(BL)和悉尼岩牡蛎(SR)的可能性。对活体BL和SR样本(N=156)用HSI相机(950-2515nm)进行扫描。使用蒙特卡洛交叉验证训练偏最小二乘判别分析(PLS-DA)和卷积神经网络(CNN),根据左右壳的光谱反射率区分BL和SR牡蛎。PLS-DA模型成功区分了左右壳的物种,中位测试集分类准确率为100%,优于CNN(分别为83%和96%)。通过电子显微镜测量了牡蛎壳表面和横截面的元素及矿物组成。右壳分析显示,BL的层数多于SR(4层 vs 2层)。右壳外层的碳和氧浓度存在差异,BL富含碳,SR富含氧。BL和SR右壳之间碳和氧浓度的变化可能反映了几丁质和糖蛋白的相对丰度或组成差异。模型导出的波长重要性对应于这些化合物特征官能团的振动模式,支持了这一观点。透射分析显示,光透过壳体和壳体边缘,表明光谱特征可能受到另一壳或肉的影响。最终,研究结果突显了一种快速、无损的牡蛎物种鉴别方法。

英文摘要

Differentiating between oyster species is important for developing new commercial oyster species suited to production systems and is critical for traceability in seafood supply chains. Common methods, such as DNA profiling, are destructive and time-consuming. The possibility of using hyperspectral imaging (HSI) for discriminating between Black-Lip rock (BL) and Sydney rock (SR) oysters was investigated. Live BL and SR samples (N = 156) were scanned with an HSI camera (950-2515 nm). Partial Least Squares Discriminant Analysis (PLS-DA) and Convolutional Neural Networks (CNN) were trained with Monte Carlo cross-validation to distinguish BL and SR oysters from the spectral reflectance of their left and right valves. The PLS-DA model successfully distinguished between the species from both the left and right valves with a median test-set classification accuracy of 100%, outperforming the CNN, which reached 83% and 96%, respectively. Elemental and mineralogical composition in the surface and cross-section of oyster valves were measured with electron microscopy. Analysis of the right valve revealed a greater number of layers in BL compared to SR (4 vs 2). The concentrations of carbon and oxygen varied in the outer layer of the right valves, with BL being rich in carbon and SR being rich in oxygen. The variation in carbon and oxygen concentrations observed between BL and SR right valves may reflect differences in the relative abundance or composition of chitin and glycoproteins. This is supported by model-derived wavelength importance corresponding to vibrational modes of functional groups characteristic of these compounds. Transmittance analysis revealed that light was transmitted through the valves and around the valve edges, indicating that the spectral signatures may have been influenced by the other valve or the meat. Ultimately, the findings highlight an effective, rapid, non-destructive methodology for oyster species identification.