arXivDaily: daily arXiv academic digest, updated Monday through Friday
2602.20210 2026-05-26 cs.LG cs.AI

Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

Kiyoung Seong, Sungsoo Ahn, Sehui Han, Changyoung Park

Affiliations: Graduate School of AI, KAIST, Seoul, South Korea; Materials Intelligence Lab, LG AI Research, Seoul, South Korea

AI summary: Proposes Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that handles multiple crystal generation tasks through independent time variables for atom types and crystal structures, and achieves performance competitive with task-specific baselines on the MP-20 and MPTS-52 benchmarks.

Abstract

Crystal modeling spans a family of conditional and unconditional generation tasks, including crystal structure prediction (CSP) and de novo generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across tasks. To address this limitation, we propose Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that a single MCFlow model is competitive with task-specific baselines across CSP, DNG, and structure-conditioned atom type generation.
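
One way to read the "independent time variables" idea is as two clocks, one for atom types and one for crystal structure, where each task holds or advances a different clock. The sketch below is a hypothetical illustration only: the function name, the task labels, and the convention that t = 0 is noise and t = 1 is data are assumptions, not details taken from the paper.

```python
def time_schedule(task, num_steps):
    """Return (t_type, t_struct) pairs, one per inference step.

    Assumed convention (not from the paper): t = 0 is pure noise, t = 1 is data.
    """
    grid = [k / num_steps for k in range(num_steps + 1)]
    if task == "csp":
        # Crystal structure prediction: atom types are given, so their clock
        # stays at 1 while the structure clock runs from 0 to 1.
        return [(1.0, t) for t in grid]
    if task == "dng":
        # De novo generation: both modalities are generated together.
        return [(t, t) for t in grid]
    if task == "type_from_structure":
        # Structure-conditioned atom type generation: structure is given.
        return [(t, 1.0) for t in grid]
    raise ValueError(f"unknown task: {task}")
```

Under these assumptions a single model would be queried with different (t_type, t_struct) trajectories instead of being retrained per task.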

2602.19333 2026-05-26 cs.CL cs.IR cs.SI

PerSoMed: A Large-Scale Balanced Dataset for Persian Social Media Text Classification

Isun Chehreh, Ebrahim Ansari

Affiliations: Institute for Advanced Studies in Basic Sciences (IASBS)

AI summary: The study builds the first large-scale balanced dataset for Persian social media text classification, with 36,000 posts across 9 categories, and benchmarks models including BiLSTM, XLM-RoBERTa, and TookaBERT; TookaBERT-Large performs best (F1-score 0.9621).

Comments: 10 pages, including 1 figure

Abstract

This research introduces the first large-scale, well-balanced Persian social media text classification dataset, specifically designed to address the lack of comprehensive resources in this domain. The dataset comprises 36,000 posts across nine categories (Economic, Artistic, Sports, Political, Social, Health, Psychological, Historical, and Science & Technology), each containing 4,000 samples to ensure balanced class distribution. Data collection involved 60,000 raw posts from various Persian social media platforms, followed by rigorous preprocessing and hybrid annotation combining ChatGPT-based few-shot prompting with human verification. To mitigate class imbalance, we employed undersampling with semantic redundancy removal and advanced data augmentation strategies integrating lexical replacement and generative prompting. We benchmarked several models, including BiLSTM, XLM-RoBERTa (with LoRA and AdaLoRA adaptations), FaBERT, SBERT-based architectures, and the Persian-specific TookaBERT (Base and Large). Experimental results show that transformer-based models consistently outperform traditional neural networks, with TookaBERT-Large achieving the best performance (Precision: 0.9622, Recall: 0.9621, F1-score: 0.9621). Class-wise evaluation further confirms robust performance across all categories, though social and political texts exhibited slightly lower scores due to inherent ambiguity. This research presents a new high-quality dataset and provides comprehensive evaluations of cutting-edge models, establishing a solid foundation for further developments in Persian NLP, including trend analysis, social behavior modeling, and user classification. The dataset is publicly available to support future research endeavors.
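
The balancing step can be pictured as capping every class at a fixed size. The sketch below is a simplified stand-in: it uses random undersampling, whereas the abstract describes undersampling with semantic redundancy removal, and it does not cover the augmentation needed for classes that fall below the cap. The function name and the (text, label) tuple layout are assumptions.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, per_class, seed=0):
    """Cap each label at `per_class` items via random undersampling.

    `samples` is a list of (text, label) tuples (hypothetical layout).
    Classes smaller than `per_class` are returned unchanged.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    balanced = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) > per_class:
            items = rng.sample(items, per_class)  # drop the surplus at random
        balanced.extend(items)
    return balanced
```

With nine classes and a cap of 4,000, a fully populated result would contain the 36,000 posts reported in the abstract.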

2602.18640 2026-05-26 cs.AI

Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System

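Longfei Yun, Yihan Wu, Haoran Liu, Xiaoxuan Liu, Ziyun Xu, Yi Wang, Yang Xia, Pengfei Wang, Mingze Gao, Yunxiang Wang, Changfan Chen, Wenjie Fu, Hong Yan, Junfeng Pan

Affiliations: Meta

AI summary: Proposes the GEARS framework, which packages ranking expertise as agent skills and recasts ranking optimization as an autonomous discovery process, so that systems can be steered by high-level intent while production reliability is maintained.

Comments: 12 pages, 5 figures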

Abstract

Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly bottlenecked by the engineering context constraint: the arduous process of translating ambiguous product intent into reasonable, executable, verifiable hypotheses, rather than by modeling techniques alone. We present GEARS (Generative Engine for Agentic Ranking Systems), a framework that reframes ranking optimization as an autonomous discovery process within a programmable experimentation environment. Rather than treating optimization as static model selection, GEARS leverages Specialized Agent Skills to encapsulate ranking expert knowledge into reusable reasoning capabilities, enabling operators to steer systems via high-level intent (e.g., vibe personalization). Furthermore, to ensure production reliability, the framework incorporates validation hooks to enforce statistical robustness and filter out brittle policies that overfit short-term signals. Experimental validation across diverse product surfaces demonstrates that GEARS consistently identifies superior, near-Pareto-efficient policies by synergizing algorithmic signals with deep ranking context while maintaining rigorous deployment stability.

2602.17658 2026-05-26 cs.LG cs.AI cs.IT math.IT

MARS: Margin and Semantic-Aware Data Augmentation for Reward Modeling

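Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon

Affiliations: University of Arizona; Northeastern University London

AI summary: Proposes the MARS framework, which prioritizes augmentation of low-margin preference pairs and refines the selection with semantic distance, improving reward-model quality and alignment performance.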

Abstract

Reward modeling is central to alignment pipelines such as RLHF, RLAIF, and PPO-based policy optimization, yet its reliability is constrained by limited and heterogeneous human preference data that are expensive to collect at scale. While synthetic augmentation can expand preference supervision, existing methods often augment uniformly or at the representation level, without targeting examples where the reward model is uncertain or prone to mis-ranking. In this paper, we introduce MARS (Margin and Semantic-Aware Data Augmentation for Reward Modeling), an adaptive augmentation framework that prioritizes low-margin preference pairs and uses semantic distance as a second layer for refinement to enhance the contrast between the chosen and rejected responses. Across multiple preference datasets, reward-model backbones, downstream alignment settings, and benchmarks including RewardBench and AlpacaEval, MARS improves both reward-model quality and alignment performance over existing baselines. Our results show that reward-model augmentation is most effective when guided by both model margins and semantic structure.
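
The margin-based prioritization can be sketched as ranking preference pairs by the reward gap between the chosen and rejected responses and keeping the smallest gaps. This is a minimal illustration, assuming a signed margin r(chosen) - r(rejected) and a caller-supplied `reward_fn`; both are assumptions, and the semantic-distance refinement described in the abstract is not shown.

```python
def select_low_margin_pairs(pairs, reward_fn, budget):
    """Return the `budget` preference pairs with the smallest reward margin.

    `pairs` holds (prompt, chosen, rejected) tuples and `reward_fn(prompt,
    response)` returns a scalar score; both are hypothetical interfaces.
    A negative margin means the reward model currently mis-ranks the pair.
    """
    scored = []
    for prompt, chosen, rejected in pairs:
        margin = reward_fn(prompt, chosen) - reward_fn(prompt, rejected)
        scored.append((margin, (prompt, chosen, rejected)))
    scored.sort(key=lambda item: item[0])  # lowest margin first
    return [pair for _, pair in scored[:budget]]
```

The selected pairs would then be the candidates handed to whatever augmentation procedure is in use.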

2602.17234 2026-05-26 cs.AI cs.LG

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

Zeyu Zhang, Ryan Chen, Bradly C. Stadie

Affiliations: Department of Statistics and Data Science, Northwestern University; Bridgewater AIA Labs

AI summary: Proposes the Shapley-value-based claim-level evaluation framework Shapley-DCLR and the inference-time architecture TimeSPEC for detecting and mitigating temporal contamination in LLM backtesting.

Comments: 8 pages plus appendix

Abstract

Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff knowledge. We introduce a claim-level evaluation framework that decomposes prediction rationales into atomic claims and applies Shapley values to quantify each claim's decision impact, yielding Shapley-DCLR (Shapley-weighted Decision-Critical Leakage Rate), an interpretable metric measuring what fraction of decision-driving reasoning is contaminated. We further propose TimeSPEC (Time-Supervised Prediction with Extracted Claims), an inference-time architecture that interleaves temporally filtered retrieval with claim-level supervision, producing predictions grounded entirely in pre-cutoff evidence. Across three LLMs, ablation experiments confirm that retrieval and supervision are jointly necessary, and a three-task probe further illustrates that the performance cost of temporal enforcement scales with each task's reliance on post-cutoff information.
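
A Shapley-weighted leakage rate can be illustrated on a handful of claims by computing exact Shapley values and measuring how much of the total attribution falls on claims flagged as post-cutoff. The sketch below is an assumption-laden reading of the abstract: the normalization by absolute Shapley mass, the function names, and the `value_fn` interface are hypothetical and may differ from the paper's definition.

```python
from itertools import permutations


def shapley_values(claims, value_fn):
    """Exact Shapley values by averaging marginal gains over all orderings.

    `value_fn(frozenset_of_claims)` returns a scalar decision score.
    Cost is factorial in len(claims), so this suits only small claim sets.
    """
    totals = {c: 0.0 for c in claims}
    orders = list(permutations(claims))
    for order in orders:
        used = []
        previous = value_fn(frozenset())
        for claim in order:
            used.append(claim)
            current = value_fn(frozenset(used))
            totals[claim] += current - previous
            previous = current
    return {c: totals[c] / len(orders) for c in claims}


def shapley_weighted_leakage_rate(claims, leaked, value_fn):
    """Share of absolute Shapley mass carried by claims in `leaked`."""
    phi = shapley_values(claims, value_fn)
    total = sum(abs(v) for v in phi.values())
    if total == 0:
        return 0.0  # no claim influences the decision
    return sum(abs(phi[c]) for c in claims if c in leaked) / total
```

For an additive value function the Shapley value of each claim equals its own weight, so a single leaked claim with weight 0.6 out of 1.0 gives a rate of about 0.6.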

2602.16229 2026-05-26 cs.LG

Factored Latent Action World Models

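Zizhao Wang, Chang Shi, Jiaheng Hu, Kevin Rohling, Roberto Martín-Martín, Amy Zhang, Peter Stone

Affiliations: University of Texas at Austin

AI summary: Proposes the Factored Latent Action Model (FLAM), which decomposes a scene into independent factors and learns a latent action for each, improving the accuracy of multi-entity dynamics modeling and video generation quality on action-free video.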

Abstract

Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factorized structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Based on experiments on both simulation and real-world multi-entity datasets, we find that FLAM outperforms prior work in prediction accuracy and representation quality, and facilitates downstream policy learning, demonstrating the benefits of factorized latent action models.

2602.15811 2026-05-26 cs.CV cs.AI

CARL-CXR: Continual Adapter-Based Routing for Task-Unknown Chest Radiograph Classification

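Muthu Subash Kavitha, Anas Zafar, Amgad Muneer, Jia Wu

Affiliations: Department of Imaging Physics, The University of Texas MD Anderson Cancer Center

AI summary: Proposes the CARL-CXR framework, which addresses incremental chest radiograph classification under task-unknown inference using a fixed high-capacity backbone, incrementally added lightweight task-specific adapters and classifier heads, and a latent task selector, substantially reducing catastrophic forgetting and improving routing accuracy.

Comments: 9 pages, 4 figures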

Abstract

Clinical deployment of chest radiograph classifiers requires models that can be updated as new datasets become available without retraining on previously observed data or degrading validated performance. We study a task-incremental continual learning setting for chest radiograph classification under task-unknown inference, where heterogeneous chest X-ray datasets arrive sequentially and task identity is unavailable at deployment time. We propose CARL-CXR, a continual adapter-based routing framework that maintains a fixed high-capacity backbone while incrementally introducing lightweight task-specific adapters and classifier heads. A latent task selector operates on adapter-conditioned features to dynamically route each input to the most relevant task pathway, leveraging compact task prototypes and feature-level experience replay to preserve task identity across sequential updates without storing raw images. Experiments on MIMIC-CXR and CheXpert, two large-scale datasets with distinct patient populations, imaging devices, and annotation pipelines, demonstrate that CARL-CXR achieves minimal catastrophic forgetting (0.012 AUROC drop), representing a 6× and 11× reduction over the established continual learning baselines LwF and EWC, respectively, while maintaining competitive diagnostic performance (AUROC 0.74). Under task-unknown deployment, CARL-CXR outperforms joint training by 12.5 points in routing accuracy (75.0% vs. 62.5%), unlike LwF and EWC, which require explicit task identifiers at inference and provide no routing mechanism.
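
Routing with compact task prototypes can be pictured as a nearest-prototype lookup over feature vectors. The sketch below is a generic illustration, assuming cosine similarity and one prototype vector per task; the actual latent task selector works on adapter-conditioned features and may differ. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(feature, prototypes):
    """Return the task whose prototype is most similar to `feature`.

    `prototypes` maps a task name to a prototype vector (hypothetical layout).
    The chosen task would then select which adapter and classifier head to run.
    """
    return max(prototypes, key=lambda task: cosine(feature, prototypes[task]))
```

Because only prototype vectors are stored, a lookup of this kind needs no raw images from earlier tasks.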

2602.15620 2026-05-26 cs.CL cs.AI

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

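Shiqi Liu, Zeyu He, Guojian Zhan, Letian Tao, Zhilong Zheng, Jiang Wu, Yinuo Wang, Yang Guan, Kehua Sheng, Bo Zhang, Keqiang Li, Jingliang Duan, Shengbo Eben Li

Affiliations: School of Vehicle Mobility & College of AI, Tsinghua University; Didi Voyager Labs, DiDi Autonomous Driving

AI summary: To address the training instability and performance collapse caused by rare spurious tokens when fine-tuning LLMs with reinforcement learning, proposes STAPO, which suppresses the gradient perturbations of these tokens and achieves stable training and performance gains on several mathematical reasoning benchmarks.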

Abstract

Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on heuristic techniques such as entropy regularization and reweighting to maintain stability. In practice, they often suffer from late-stage performance collapse, leading to degraded reasoning quality and unstable training. We identify a key factor behind this instability: a small fraction of tokens, termed spurious tokens (around 0.01%), which contribute little to the reasoning outcome but receive disproportionately amplified gradient updates due to inheriting the full sequence-level reward. We present a unified framework for evaluating token-level optimization impacts across spurious risk, gradient norms, and entropy changes. Building on the analysis of token characteristics that severely disrupt optimization, we propose the Silencing Spurious Tokens (S2T) mechanism to efficiently suppress their gradient perturbations. Incorporating this mechanism into a group-based objective, we propose Spurious-Token-Aware Policy Optimization (STAPO), which promotes stable and effective large-scale model refinement. Across six mathematical reasoning benchmarks using Qwen 1.7B, 8B, and 14B base models, STAPO consistently demonstrates superior entropy stability and achieves an average performance improvement of 11.49% ($\rho_{\mathrm{T}}$=1.0, top-p=1.0) and 3.73% ($\rho_{\mathrm{T}}$=0.7, top-p=0.9) over GRPO, 20-Entropy, and JustRL.

2602.11534 2026-05-26 cs.LG cs.AI

Krause Synchronization Transformers

Jingkun Liu, Yisong Yue, Max Welling, Yue Song

Affiliations: Shanghai Qi Zhi Institute; College of AI, Tsinghua University; Shanghai Jiao Tong University; California Institute of Technology; University of Amsterdam

AI summary: Proposes Krause Attention, based on bounded-confidence consensus dynamics, which replaces global softmax normalization with localized sparse interactions, mitigating representation collapse and attention sinks while achieving linear complexity and improved performance.

Comments: ICML 2026, Project page: https://jingkun-liu.github.io/krause-sync-transformers/

Abstract

Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor convergence toward a dominant mode, a behavior associated with representation collapse and attention sink phenomena. We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks. Restricting interactions to local neighborhoods also reduces runtime complexity from quadratic to linear in sequence length. Empirically, we validate Krause Attention across diverse settings, including vision (ViT on CIFAR/ImageNet), autoregressive image generation (MNIST/CIFAR-10), large language models (Llama/Qwen), and language models trained from scratch at multiple scales (100M/200M). Across these domains, Krause Attention achieves consistent performance gains while improving computational efficiency, highlighting bounded-confidence dynamics as a scalable and effective inductive bias for attention.
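
A bounded-confidence interaction can be sketched as attention in which a token only averages over neighbors that are both inside a local window and within a distance threshold. The code below is a toy, non-causal illustration in plain Python: the Gaussian-style weight `exp(-dist^2)`, the fallback to the token's own value when no neighbor qualifies, and all parameter names are assumptions rather than the paper's formulation.

```python
import math


def krause_attention(queries, keys, values, radius, window):
    """Distance-based local attention with a hard confidence bound.

    Token i interacts with token j only if |i - j| <= window and the squared
    Euclidean distance between query i and key j is at most radius**2.
    Work is O(n * window), i.e. linear in sequence length for a fixed window.
    """
    n = len(queries)
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        total = 0.0
        acc = [0.0] * dim
        for j in range(lo, hi):
            dist2 = sum((a - b) ** 2 for a, b in zip(q, keys[j]))
            if dist2 > radius ** 2:
                continue  # outside the confidence bound: no interaction
            weight = math.exp(-dist2)
            total += weight
            for d in range(dim):
                acc[d] += weight * values[j][d]
        if total == 0.0:
            outputs.append(list(values[i]))  # assumed fallback: keep own value
        else:
            outputs.append([a / total for a in acc])
    return outputs
```

In this toy version, tokens that are far apart in feature space never mix, so distinct clusters stay distinct instead of collapsing toward one global average.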

2602.11439 2026-05-26 cs.LG

Multi-Level Strategic Classification: Incentivizing Improvement through Promotion and Relegation Dynamics

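Ziyuan Huang, Lina Alkarmi, Mingyan Liu

Affiliations: Electrical and Computer Engineering Department, University of Michigan, Ann Arbor, MI 48109, USA

AI summary: Proposes a multi-level promotion-relegation framework that incentivizes honest effort from agents through the design of classifier thresholds and difficulty progression, and proves that under mild conditions agents can reach arbitrarily high levels through genuine improvement.

Comments: 9 pages, 4 figures, Accepted at ICML 2026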

Abstract

Strategic classification studies the problem where self-interested individuals or agents manipulate their response to obtain favorable decision outcomes made by classifiers, typically turning to dishonest actions when they are less costly than genuine efforts. While existing studies on sequential strategic classification primarily focus on optimizing dynamic classifier weights, we depart from these weight-centric approaches by analyzing the design of classifier thresholds and difficulty progression within a multi-level promotion-relegation framework. Our model captures the critical inter-temporal incentives driven by an agent's farsightedness, skill retention, and a leg-up effect where qualification and attainment can be self-reinforcing. We characterize the agent's optimal long-term strategy and demonstrate that a principal can design a sequence of thresholds to effectively incentivize honest effort. Crucially, we prove that under mild conditions, this mechanism enables agents to reach arbitrarily high levels solely through genuine improvement efforts.

2602.08499 2026-05-26 cs.LG cs.AI

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Xiaodong Lu, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Zhijun Chen, Yu Luo, Fuzhen Zhuang, Yikun Ban, Deqing Wang

Affiliations: School of Computer Science and Engineering, Beihang University; School of Artificial Intelligence, Beihang University; Huawei

AI summary: To address the indiscriminate and short-horizon use of rollouts in RLVR, proposes a contextual bandit framework that adaptively selects high-value rollouts, improving training efficiency and performance.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollouts in an indiscriminate and short-horizon manner: responses of heterogeneous quality within each prompt are treated uniformly, and historical rollouts are discarded after a single use. This leads to noisy supervision, poor sample efficiency, and suboptimal policy updates. We address these issues by formulating rollout scheduling in RLVR as a contextual bandit problem and proposing a unified neural scheduling framework that adaptively selects high-value rollouts throughout training. Each rollout is treated as an arm whose reward is defined by the induced performance gain between consecutive optimization steps. The resulting scheduler supports both noise-aware intra-group selection and adaptive global reuse of historical rollouts within a single principled framework. We provide theoretical justification by deriving sublinear regret bounds and showing that enlarging the rollout buffer improves the achievable performance upper bound. Experiments on six mathematical reasoning benchmarks demonstrate consistent gains in performance and training efficiency across multiple RLVR optimization methods.

2602.08426 2026-05-26 cs.CL cs.AI cs.CV

Prism: Spectral-Aware Block-Sparse Attention

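Xinghao Wang, Pengyu Wang, Xiaoran Liu, Fangxu Liu, Jason Chu, Kai Song, Xipeng Qiu

Affiliations: Fudan University; Shanghai Innovation Institute; ByteDance Inc.; OpenMOSS Team

AI summary: To address the block-selection efficiency bottleneck of block-sparse attention in long-context LLM pre-filling, proposes Prism, a training-free spectral-aware method that restores positional signals through a high-/low-frequency branch decomposition and energy-based temperature calibration, enabling purely block-level importance estimation and up to 5.1× speedup while preserving accuracy.

Comments: ICML 2026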

Abstract

Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-grained attention as a proxy for block importance estimation, but often resort to expensive token-level searching or scoring, resulting in significant selection overhead. In this work, we trace the inaccuracy of standard coarse-grained attention via mean pooling to a theoretical root cause: the interaction between mean pooling and Rotary Positional Embeddings (RoPE). We prove that mean pooling acts as a low-pass filter that induces destructive interference in high-frequency dimensions, effectively creating a "blind spot" for local positional information (e.g., slash patterns). To address this, we introduce Prism, a training-free spectral-aware approach that decomposes block selection into high-frequency and low-frequency branches. By applying energy-based temperature calibration, Prism restores the attenuated positional signals directly from pooled representations, enabling block importance estimation using purely block-level operations, thereby improving efficiency. Extensive evaluations confirm that Prism maintains accuracy parity with full attention while delivering up to $\mathbf{5.1\times}$ speedup.
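
The low-pass effect of mean pooling on rotary embeddings can be checked numerically. A RoPE channel pair at position p is rotated by an angle theta * p, so averaging over a block of B consecutive positions averages the unit phasors exp(i * theta * p), whose magnitude is |sin(B * theta / 2) / (B * sin(theta / 2))|. The sketch below compares the direct average with that closed form; the function names and the sample values of theta and B are illustrative assumptions, not settings from the paper.

```python
import cmath
import math


def pooled_magnitude(theta, block_size):
    """Magnitude of the mean of exp(i * theta * p) over p = 0..block_size-1."""
    total = sum(cmath.exp(1j * theta * p) for p in range(block_size))
    return abs(total / block_size)


def closed_form(theta, block_size):
    """Same quantity from the geometric-series (Dirichlet kernel) formula."""
    half = theta / 2
    if math.isclose(math.sin(half), 0.0, abs_tol=1e-15):
        return 1.0  # theta is a multiple of 2*pi: all phasors coincide
    return abs(math.sin(block_size * half) / (block_size * math.sin(half)))
```

For a fast-rotating channel (theta = 1.0) and a block of 64 positions the pooled magnitude is below 0.05, while for a slow channel (theta = 1e-4) it stays above 0.99, which matches the claim that pooling suppresses high-frequency dimensions and preserves low-frequency ones.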

2602.06717 2026-05-26 cs.LG cs.AI

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

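Daniil Plyusov, Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daria Korotyshova, Daniil Gavrilov

Affiliations: T-Tech

AI summary: To address rare correct trajectories being missed because of finite sampling groups in reinforcement learning, proposes F-GRPO, a difficulty-aware scaling coefficient inspired by Focal loss, which improves mathematical reasoning performance without increasing group size or computational cost.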

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, computational limits often rule out very large groups, so training proceeds with finite rollout sets that can reinforce only the correct behavior they expose. At practical group sizes, updates can miss rare-correct trajectories while still containing mixed rewards, concentrating probability on more common sampled solutions. We derive the probability of such prompt-local tail-miss events as a function of group size, showing non-monotonic behavior, and in the categorical abstraction characterize how unsampled-correct mass can shrink even as total correct mass grows. Motivated by this analysis, we propose a difficulty-aware scaling coefficient, inspired by Focal loss, that down-weights updates on high-success sampled groups. Empirically, categorical simulation illustrates the same effect in the categorical setting, Maze provides a single-solution test, and LLM experiments include a representative GRPO group-size sweep together with fixed-$N$ transfer across GRPO, DAPO, and CISPO. On Qwen2.5-7B at $N{=}8$, our method improves average math pass@256 from 64.1 $\rightarrow$ 70.3 (GRPO), 69.3 $\rightarrow$ 72.5 (DAPO), and 73.2 $\rightarrow$ 76.8 (CISPO); OOD pass@256 also improves in all three cases, without increasing group size or computational cost.
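
A Focal-loss-style coefficient that down-weights high-success groups can be sketched as (1 - p)^gamma, where p is the fraction of correct rollouts in the sampled group. The exact coefficient used in the paper is not given in the abstract, so the form below, the `gamma` parameter, the binary 0/1 rewards, and the group-normalized advantage it multiplies are all assumptions for illustration.

```python
def focal_group_weight(rewards, gamma=1.0):
    """Weight (1 - success_rate) ** gamma for one prompt's rollout group.

    `rewards` is a list of 0/1 verifier outcomes. Easy prompts (high success
    rate) get a small weight; hard prompts keep a weight close to 1.
    """
    success_rate = sum(rewards) / len(rewards)
    return (1.0 - success_rate) ** gamma


def scaled_advantages(rewards, gamma=1.0, eps=1e-6):
    """Group-normalized advantages multiplied by the focal-style weight."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    weight = focal_group_weight(rewards, gamma)
    return [weight * (r - mean) / (std + eps) for r in rewards]
```

With gamma = 2, a group with three of four rollouts correct gets weight 0.0625, while a group with one of four correct gets 0.5625, so updates concentrate on prompts where correct solutions are still rare.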

2602.06508 2026-05-26 cs.RO

World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy

Xiaokang Liu, Zechen Bai, Hai Ci, Kevin Yuchen Ma, Mike Zheng Shou

Affiliations: Show Lab, National University of Singapore

AI summary: Proposes the World-VLA-Loop framework, in which a state-aware video world model jointly predicts future frames and binary rewards and a co-evolving paradigm iteratively refines the VLA policy, reducing reliance on real-environment interaction.

Comments: 16 pages, 9 figures

Abstract

Reinforcement learning (RL) can refine Vision-Language-Action (VLA) policies beyond behavior cloning, but real-world RL remains expensive due to extensive rollouts, resets, supervision, and safety risks. Action-conditioned video world models offer an option to train in virtual environments, yet they exhibit imprecise action following, particularly on subtle near-success failures. Besides, they lack native reward signals for RL, and computing rewards from inaccurate visual predictions remains unreliable. We introduce World-VLA-Loop, structured around two foundational designs and a higher-level co-evolving paradigm. We first curate SANS, deliberately mixing successful and near-success trajectories to improve action-outcome alignment. Then, we train a state-aware video world model that jointly predicts future frames and binary rewards from diffusion latents. It couples reward estimation to the generator rather than to a separate module, which in turn benefits visual prediction. Since VLA behavior shifts during RL, a fixed simulator can become misaligned with the updated policy. World-VLA-Loop therefore closes the loop by using the refined world model for iterative VLA post-training while feeding rollouts from each improved policy back to augment and fine-tune the world model. Across simulation and real-robot experiments, World-VLA-Loop substantially improves VLA performance while reducing reliance on costly physical interaction.

2602.05052 2026-05-26 cs.LG

Learning, Solving and Optimizing PDEs with TensorGalerkin: an efficient high-performance Galerkin assembly algorithm

Shizheng Wen, Mingyuan Chi, Tianwei Yu, Ben Moseley, Mike Yan Michelis, Pu Ren, Hao Sun, Siddhartha Mishra

Affiliations: ETH Zurich, Switzerland; Imperial College London, UK; Northeastern University, USA; Renmin University of China, China

AI summary: Proposes a unified algorithmic framework based on Galerkin discretization that achieves system assembly with an O(1)-size computation graph through tensorized element-wise operations and sparse matrix multiplication, for efficient solution, constrained optimization, and physics-informed learning of variational PDEs.

Abstract

We present a unified algorithmic framework for the numerical solution, constrained optimization, and physics-informed learning of PDEs with a variational structure. Our framework is based on a Galerkin discretization of the underlying variational forms, and its high efficiency stems from a novel highly optimized and GPU-compliant TensorGalerkin framework for linear system assembly (stiffness matrices and load vectors). TensorGalerkin operates by tensorizing element-wise operations within a Python-level Map stage and then performing global reduction with a sparse matrix multiplication that carries out message passing on the mesh-induced sparsity graph. The Map and Reduce stages are co-designed inside PyTorch's autograd so that the assembly graph contains $O(1)$ nodes regardless of how the number of elements and local DoFs scale. We validate this $O(1)$-graph property by deploying TensorGalerkin downstream as i) a highly efficient numerical PDE solver, ii) an end-to-end differentiable framework for PDE-constrained optimization, and iii) a physics-informed operator learning algorithm for PDEs. With multiple benchmarks, including 2D and 3D elliptic, parabolic, and hyperbolic PDEs on unstructured meshes, we demonstrate that the proposed framework provides significant computational efficiency and accuracy gains over a variety of baselines in all the targeted downstream applications.
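
The Map and Reduce stages of Galerkin assembly can be illustrated on the smallest possible case: the stiffness matrix of -u'' on a 1D mesh with linear (P1) elements. This is a plain-Python sketch of the general technique, not the TensorGalerkin implementation: the per-element loop stands in for a batched tensor operation, and the scatter-add stands in for the sparse matrix multiplication described in the abstract. Function names are hypothetical.

```python
def element_stiffness(nodes, elements):
    """Map stage: one local 2x2 stiffness matrix per P1 element.

    For an element of length h the local matrix is (1/h) * [[1, -1], [-1, 1]].
    """
    local = []
    for a, b in elements:
        h = nodes[b] - nodes[a]
        local.append([[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]])
    return local


def assemble(nodes, elements):
    """Reduce stage: scatter-add local matrices into the global matrix."""
    n = len(nodes)
    K = [[0.0] * n for _ in range(n)]
    local = element_stiffness(nodes, elements)
    for dofs, ke in zip(elements, local):
        for i in range(2):
            for j in range(2):
                K[dofs[i]][dofs[j]] += ke[i][j]  # shared nodes accumulate
    return K
```

On a uniform mesh with nodes 0, 0.5, 1 this yields the familiar tridiagonal pattern with 4 on the interior diagonal and -2 off the diagonal. In a tensor framework the Map stage would be one batched operation and the Reduce stage one sparse product, which is what keeps the autograd graph size independent of the element count.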

2602.04279 2026-05-26 cs.CL

ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation

ECG-R1: 协议引导且模态无关的可靠心电图解读多模态大语言模型

Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, Hongyan Li, Shenda Hong

Affiliations * School of Intelligence Science and Technology, Peking University; National Institute of Health Data Science, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University; Tianjin Institute of Cardiology, the Second Hospital of Tianjin Medical University; National University of Singapore; Jarvis Lab, Tencent; HeartVoice Medical Technology

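AI summary Proposes ECG-R1, which achieves reliable ECG interpretation through protocol-guided data generation, a modality-decoupled architecture, and reinforcement learning.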

Comments Accepted to ICML 2026

Details
AI Chinese abstract

心电图(ECG)在临床实践中是一种不可或缺的诊断工具,然而现有的多模态大语言模型(MLLMs)在心电图解读方面仍不可靠,常常产生看似合理但临床错误的解读。为了解决这一问题,我们提出了ECG-R1,这是首个通过三项创新设计用于可靠心电图解读的推理型ECG MLLM。首先,我们利用协议引导的指令数据生成构建解读语料库,将解读基于可测量的ECG特征以及专著定义的定量阈值和诊断逻辑。其次,我们提出了一种模态解耦架构,采用交错模态丢弃,以提高当ECG信号或ECG图像缺失时的鲁棒性和跨模态一致性。第三,我们提出了带有ECG诊断证据奖励的强化学习,以加强基于证据的ECG解读。此外,我们系统评估了专有、开源和医疗MLLM的心电图解读能力,并首次提供了定量证据表明严重的幻觉普遍存在,这表明公众不应在没有独立验证的情况下直接信任这些输出。代码可在 https://github.com/PKUDigitalHealth/ECG-R1 获取。

English abstract

Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning ECG MLLM designed for reliable ECG interpretation via three innovations. First, we construct the interpretation corpus using Protocol-Guided Instruction Data Generation, grounding interpretation in measurable ECG features and monograph-defined quantitative thresholds and diagnostic logic. Second, we present a modality-decoupled architecture with Interleaved Modality Dropout to improve robustness and cross-modal consistency when either the ECG signal or ECG image is missing. Third, we present Reinforcement Learning with ECG Diagnostic Evidence Rewards to strengthen evidence-grounded ECG interpretation. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are widespread, suggesting that the public should not directly trust these outputs without independent verification. Code is available at https://github.com/PKUDigitalHealth/ECG-R1.

2602.04139 2026-05-26 cs.LG physics.comp-ph

Generative Neural Operators through Diffusion Last Layer

通过扩散最后一层的生成式神经算子

Sungwon Park, Anthony Zhou, Hongjoong Kim, Amir Barati Farimani

Affiliations * Korea University, Seoul, South Korea; Carnegie Mellon University, Pittsburgh, USA

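AI summary Proposes the diffusion last layer (DLL), a probabilistic output head for neural operators that models distributions efficiently via a Karhunen-Loève-style expansion and a conditional diffusion model over the coefficient space, improving distributional fidelity and uncertainty estimation on stochastic PDE benchmarks and deterministic long-horizon rollout tasks.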

Comments ICML 2026, code is available at https://github.com/sungwpark/dll-no

Details
AI Chinese abstract

神经算子为学习函数空间之间的离散化不变映射提供了强大框架,但标准确定性模型无法捕捉预测不确定性。我们引入了扩散最后一层(DLL),一种用于神经算子主干的模块化概率输出头。DLL通过受Karhunen-Loève展开启发的输入依赖低秩展开表示目标场,并在相应系数空间上学习条件扩散模型。这种设计使得在保留算子学习结构优势的同时实现高效的分布建模。在具有随机强迫的随机PDE基准测试中,DLL实现了强分布保真度,并与像素空间和传统潜在扩散基线竞争。在确定性长时滚动任务中,DLL提高了底层主干的滚动稳定性,并在复合自回归误差下提供了有用的预测不确定性估计。这些结果表明,在学习到的系数空间中进行扩散建模为不确定性感知神经算子提供了一条实用途径。

English abstract

Neural operators provide a powerful framework for learning discretization-invariant mappings between function spaces, but standard deterministic models do not capture predictive uncertainty. We introduce diffusion last layer (DLL), a modular probabilistic output head for neural operator backbones. DLL represents target fields through an input-dependent low-rank expansion inspired by the Karhunen-Loève expansion and learns a conditional diffusion model over the corresponding coefficient space. This design enables efficient distributional modeling while preserving the structural advantages of operator learning. On stochastic PDE benchmarks with random forcing, DLL achieves strong distributional fidelity and performs competitively with pixel-space and conventional latent diffusion baselines. In deterministic long-horizon rollout tasks, DLL improves rollout stability over the underlying backbone and provides useful estimates of predictive uncertainty under compounding autoregressive errors. These results suggest that diffusion modeling in learned coefficient spaces offers a practical route to uncertainty-aware neural operators.
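For orientation, the classical Karhunen-Loève expansion that the abstract cites as inspiration can be written out as below, followed by one possible reading of the input-dependent low-rank form. The second line is an assumption: the symbols $a$ (input), $r$ (rank), $\phi_k(\cdot\,; a)$ (basis functions produced by the backbone), and $p_\theta$ (the conditional diffusion model) are hypothetical notation not given in the abstract.

```latex
% Classical Karhunen-Loève expansion of a random field u with mean \bar{u};
% (\lambda_k, \phi_k) are eigenpairs of the covariance operator and the
% \xi_k are uncorrelated, zero-mean, unit-variance random variables.
u(x, \omega) = \bar{u}(x) + \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, \xi_k(\omega)\, \phi_k(x)

% Assumed input-dependent rank-r analogue (notation hypothetical):
u(x) \approx \sum_{k=1}^{r} c_k\, \phi_k(x; a),
\qquad c = (c_1, \dots, c_r) \sim p_\theta(c \mid a)
```

Under this reading, the diffusion model only has to sample the $r$ coefficients rather than a full discretized field, which would be consistent with the efficiency claim in the abstract.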

2602.02979 2026-05-26 cs.CL cs.LG

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

CPMobius: 无数据强化学习的迭代式教练-玩家推理

Ran Li, Zeyuan Liu, Yinghao Chen, Bingxiang He, Jiarui Yuan, Zixuan Fu, Weize Chen, Jinyi Hu, Chen Qian, Zhiyuan Liu, Maosong Sun

Affiliations * Tsinghua University; University of Cambridge; Shanghai Jiao Tong University

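AI summary Proposes CPMobius, a collaborative Coach-Player paradigm that improves mathematical reasoning through a cooperative optimization loop requiring no external data; on Qwen2.5-Math-7B-Instruct it improves overall accuracy by +4.9 and OOD accuracy by +5.4 on average.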

Comments Accepted to ICML 2026

Details
AI Chinese abstract

大型语言模型(LLMs)在复杂推理方面展现出强大潜力,但其进展仍从根本上受限于对大规模高质量人工策划任务和标签的依赖,无论是通过监督微调(SFT)还是基于推理特定数据的强化学习(RL)。这种依赖使得监督密集型训练范式日益不可持续,实践中已出现可扩展性减弱的迹象。为克服这一限制,我们引入了CPMöbius(CPMobius),一种用于推理模型无数据强化学习的协作式教练-玩家范式。与传统对抗性自博弈不同,CPMöbius受现实世界人类体育协作和多智能体协作启发,将教练和玩家视为独立但合作的角色。教练针对玩家的能力提出指令,并根据玩家表现的变化获得奖励,而玩家则因解决教练生成的越来越有指导性的任务而获得奖励。这种合作优化循环旨在直接提升玩家的数学推理能力。值得注意的是,CPMöbius在不依赖任何外部训练数据的情况下实现了显著改进,优于现有的无监督方法。例如,在Qwen2.5-Math-7B-Instruct上,我们的方法总体准确率平均提升4.9%,分布外(OOD)准确率平均提升5.4%,总体准确率超过RENT 1.5%,OOD准确率超过R-zero 4.2%。我们的代码库已在https://github.com/thunlp/CPMobius发布。

English abstract

Large Language Models (LLMs) have demonstrated strong potential in complex reasoning, yet their progress remains fundamentally constrained by reliance on massive high-quality human-curated tasks and labels, either through supervised fine-tuning (SFT) or reinforcement learning (RL) on reasoning-specific data. This dependence renders supervision-heavy training paradigms increasingly unsustainable, with signs of diminishing scalability already evident in practice. To overcome this limitation, we introduce CPMöbius (CPMobius), a collaborative Coach-Player paradigm for data-free reinforcement learning of reasoning models. Unlike traditional adversarial self-play, CPMöbius, inspired by real world human sports collaboration and multi-agent collaboration, treats the Coach and Player as independent but cooperative roles. The Coach proposes instructions targeted at the Player's capability and receives rewards based on changes in the Player's performance, while the Player is rewarded for solving the increasingly instructive tasks generated by the Coach. This cooperative optimization loop is designed to directly enhance the Player's mathematical reasoning ability. Remarkably, CPMöbius achieves substantial improvement without relying on any external training data, outperforming existing unsupervised approaches. For example, on Qwen2.5-Math-7B-Instruct, our method improves accuracy by an overall average of +4.9 and an out-of-distribution average of +5.4, exceeding RENT by +1.5 on overall accuracy and R-zero by +4.2 on OOD accuracy. Our codebase has been released at https://github.com/thunlp/CPMobius.

2602.02495 2026-05-26 cs.CL cs.AI cs.LG

Reward-free Alignment for Conflicting Objectives

无奖励的冲突目标对齐

Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin

Affiliations * Columbia University

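AI summary Proposes the RACO framework, which resolves conflicts among multiple objectives directly from pairwise preference data via a clipped variant of conflict-averse gradient descent, with convergence guarantees to Pareto-critical points and better Pareto trade-offs in alignment.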

Comments Accepted to ICML 2026 (Oral)

Details
AI Chinese abstract

直接对齐方法越来越多地用于将大型语言模型(LLMs)与人类偏好对齐。然而,许多现实世界的对齐问题涉及多个相互冲突的目标,简单的偏好聚合可能导致训练不稳定和糟糕的权衡。特别是,加权损失方法可能无法识别同时改善所有目标的更新方向,而现有的多目标方法通常依赖显式奖励模型,增加了额外复杂性并扭曲了用户指定的偏好。本文的贡献有两方面。首先,我们提出了一种用于冲突目标的无奖励对齐框架(RACO),该框架直接利用成对偏好数据,并通过一种新颖的冲突规避梯度下降的裁剪变体解决梯度冲突。我们提供了收敛到尊重用户指定目标权重的帕累托临界点的保证,并进一步证明在双目标设置中裁剪可以严格改善收敛速度。其次,我们使用一些启发式方法改进了我们的方法,并进行了实验,以证明所提框架在LLM对齐中的兼容性。在多个LLM家族(Qwen 3、Llama 3、Gemma 3)上的多目标摘要和安全对齐任务的定性和定量评估表明,与现有的多目标对齐基线相比,我们的方法始终能实现更好的帕累托权衡。

English abstract

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where naive aggregation of preferences can lead to unstable training and poor trade-offs. In particular, weighted loss methods may fail to identify update directions that simultaneously improve all objectives, and existing multi-objective approaches often rely on explicit reward models, introducing additional complexity and distorting user-specified preferences. The contributions of this paper are two-fold. First, we propose a Reward-free Alignment framework for Conflicted Objectives (RACO) that directly leverages pairwise preference data and resolves gradient conflicts via a novel clipped variant of conflict-averse gradient descent. We provide convergence guarantees to Pareto-critical points that respect user-specified objective weights, and further show that clipping can strictly improve convergence rate in the two-objective setting. Second, we improve our method using some heuristics and conduct experiments to demonstrate the compatibility of the proposed framework for LLM alignment. Both qualitative and quantitative evaluations on multi-objective summarization and safety alignment tasks across multiple LLM families (Qwen 3, Llama 3, Gemma 3) show that our method consistently achieves better Pareto trade-offs compared to existing multi-objective alignment baselines.

2602.01322 2026-05-26 cs.LG cs.CL

PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

PolySAE: 通过多项式解码建模稀疏自编码器中的特征交互

Panagiotis Koromilas, Andreas D. Demou, James Oldfield, Yannis Panagakis, Mihalis Nicolaou

Affiliations * The Cyprus Institute; University of Athens; University of Oxford; Archimedes AI/Athena Research Center; University of Cyprus

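AI summary Proposes PolySAE, which adds higher-order terms to the sparse autoencoder decoder to model feature interactions; low-rank tensor factorization on a shared projection subspace captures pairwise and triple interactions, improving probing F1 by about 8% while preserving interpretability and yielding compositional structure largely independent of co-occurrence frequency.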

Comments 43rd International Conference on Machine Learning (ICML 2026); Code: https://github.com/pakoromilas/PolySAE

Details
AI Chinese abstract

稀疏自编码器(SAE)通过将激活分解为字典原子的稀疏组合来解释神经网络表示。然而,SAE假设特征通过线性重建相加组合,这种假设无法捕捉组合结构:线性模型无法区分“Starbucks”是由“star”和“coffee”特征的组合还是仅由它们的共现产生。这迫使SAE为复合概念分配整体特征,而不是将其分解为可解释的组成部分。我们引入了PolySAE,它通过高阶项扩展SAE解码器以建模特征交互,同时保留对可解释性至关重要的线性编码器。通过在共享投影子空间上进行低秩张量分解,PolySAE以较小的参数开销(GPT2上为3%)捕获成对和三元特征交互。在四个语言模型和三个SAE变体上,PolySAE在保持可比重建误差的同时,探测F1平均提升约8%,并产生类别条件特征分布之间2-10倍更大的Wasserstein距离。关键的是,学习到的交互权重与共现频率的相关性可忽略不计(r = 0.06,而SAE特征协方差为r = 0.82),表明多项式项捕获了很大程度上独立于表面统计的组合结构。最后,学习到的交互方向因果性地将模型输出引导向相应的组合语义。

English abstract

Sparse autoencoders (SAEs) interpret neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the composition of "star" and "coffee" features or merely their co-occurrence. This forces SAEs to allocate monolithic features for compound concepts rather than decomposing them into interpretable constituents. We introduce PolySAE, which extends the SAE decoder with higher-order terms to model feature interactions while preserving the linear encoder essential for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions with small parameter overhead (3% on GPT2). Across four language models and three SAE variants, PolySAE achieves an average improvement of about 8% in probing F1 while maintaining comparable reconstruction error, and produces 2 to 10 times larger Wasserstein distances between class-conditional feature distributions. Critically, learned interaction weights exhibit negligible correlation with co-occurrence frequency ($r = 0.06$ vs $r = 0.82$ for SAE feature covariance), suggesting that polynomial terms capture compositional structure largely independent of surface statistics. Finally, the learned interaction directions causally steer model outputs toward the corresponding compositional semantics.

2602.01183 2026-05-26 cs.CV cs.LG

Refining Context-Entangled Content Segmentation via Curriculum Selection and Anti-Curriculum Promotion

通过课程选择与反课程促进优化上下文纠缠内容分割

Chunming He, Rihan Zhang, Fengyang Xiao, Dingming Zhang, Zhiwen Cao, Sina Farsiu

Affiliations * Duke University; Adobe

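AI summary Proposes CurriSeg, a dual-phase learning framework that combines curriculum and anti-curriculum principles, using dynamic data selection and Spectral-Blindness Fine-Tuning to improve the robustness and generalization of context-entangled content segmentation.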

Comments ICML 2026, 8 figures, 11 tables

Details
AI Chinese abstract

生物学习从简单到困难的任务逐步进行,逐渐增强感知和鲁棒性。受此原理启发,我们解决上下文纠缠内容分割(CECS)这一具有挑战性的场景,其中对象与周围环境共享内在视觉模式,如伪装目标检测。传统分割网络主要依赖架构增强,但往往忽略了在纠缠数据分布下控制鲁棒性的学习动态。我们引入CurriSeg,一个双阶段学习框架,统一了课程和反课程原则以提高表示可靠性。在课程选择阶段,CurriSeg基于样本损失的时间统计动态选择训练数据,区分困难但有信息的样本与噪声或模糊样本,从而实现稳定的能力增强。在反课程促进阶段,我们设计了频谱盲性微调,抑制高频成分以强制依赖低频结构和上下文线索,从而增强泛化能力。大量实验表明,CurriSeg在多种CECS基准上取得了一致的改进,无需增加参数或增加总训练时间,为进展与挑战如何相互作用以促进鲁棒且上下文感知的分割提供了原则性视角。代码将发布。

English abstract

Biological learning proceeds from easy to difficult tasks, gradually reinforcing perception and robustness. Inspired by this principle, we address Context-Entangled Content Segmentation (CECS), a challenging setting where objects share intrinsic visual patterns with their surroundings, as in camouflaged object detection. Conventional segmentation networks predominantly rely on architectural enhancements but often ignore the learning dynamics that govern robustness under entangled data distributions. We introduce CurriSeg, a dual-phase learning framework that unifies curriculum and anti-curriculum principles to improve representation reliability. In the Curriculum Selection phase, CurriSeg dynamically selects training data based on the temporal statistics of sample losses, distinguishing hard-but-informative samples from noisy or ambiguous ones, thus enabling stable capability enhancement. In the Anti-Curriculum Promotion phase, we design Spectral-Blindness Fine-Tuning, which suppresses high-frequency components to enforce dependence on low-frequency structural and contextual cues and thus strengthens generalization. Extensive experiments demonstrate that CurriSeg achieves consistent improvements across diverse CECS benchmarks without adding parameters or increasing total training time, offering a principled view of how progression and challenge interplay to foster robust and context-aware segmentation. Code will be released.
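The idea of suppressing high-frequency components can be illustrated with a plain low-pass filter. This is a generic sketch, not the paper's Spectral-Blindness Fine-Tuning: it assumes a 1D signal instead of an image, a hard frequency cutoff, and a naive discrete Fourier transform written with the standard library. The function `low_pass` and its `keep` parameter are hypothetical.

```python
import cmath

def low_pass(signal, keep):
    """Zero every DFT bin whose frequency index exceeds `keep`, then invert."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Bin k and bin n - k carry the same frequency, so compare min(k, n - k).
    kept = [c if min(k, n - k) <= keep else 0j for k, c in enumerate(spectrum)]
    return [
        (sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

# A constant level of 1.0 plus the fastest possible alternation of +/-0.5.
signal = [1.5, 0.5] * 4
smoothed = low_pass(signal, keep=1)  # the alternation is removed
```

After filtering, only the slowly varying part of the signal survives, so a model trained on such inputs cannot rely on fine texture and has to use coarse structure instead.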

2601.22466 2026-05-26 cs.LG

EvoEGF-Mol: Evolving Exponential Geodesic Flow for Structure-based Drug Design

EvoEGF-Mol:用于基于结构的药物设计的演化指数测地流

Yaowei Jin, Junjie Wang, Cheng Cao, Penglei Wang, Duo An, Qian Shi

Affiliations * Lingang Laboratory; School of Information Science and Technology, ShanghaiTech University

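AI summary Addresses the mismatch between Euclidean and probabilistic spaces in structure-based drug design with EvoEGF-Mol, which represents molecules in a unified way through composite exponential-family distributions and an evolving exponential geodesic flow, achieving high geometric precision and interaction fidelity.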

Comments Accepted to ICML 2026

Details
AI Chinese abstract

基于结构的药物设计(SBDD)旨在发现生物活性配体。传统方法在欧几里得空间和概率空间中分别构建连续原子坐标和离散化学类别的概率路径,导致与底层统计流形不匹配。我们通过使用复合指数族分布来表示分子来解决这个问题,其中坐标和类别在统一的自然参数空间中表示,并在Fisher-Rao度量下沿指数测地线同步演化。为了避免直接针对狄拉克分布的测地线导致的瞬时轨迹崩溃,我们提出了用于SBDD的演化指数测地流(EvoEGF-Mol),该方法用动态集中的分布替代静态狄拉克目标,并通过渐进参数细化架构进行训练。我们的模型在CrossDock上达到了参考级别的PoseBusters通过率(93.4%),展示了卓越的几何精度和相互作用保真度,同时在真实世界的MolGenBench任务中,在生物活性骨架恢复方面取得了优于基线方法的性能。代码可在https://github.com/BLEACH366/EvoEGF-Mol获取。

English abstract

Structure-Based Drug Design (SBDD) aims to discover bioactive ligands. Conventional approaches construct probability paths separately in Euclidean and probabilistic spaces for continuous atomic coordinates and discrete chemical categories, leading to a mismatch with the underlying statistical manifolds. We address this issue by representing molecules using composite exponential-family distributions, where coordinates and categories are represented within a unified natural parameter space to evolve synchronously along exponential geodesics under the Fisher-Rao metric. To avoid the instantaneous trajectory collapse induced by geodesics directly targeting Dirac distributions, we propose Evolving Exponential Geodesic Flow for SBDD (EvoEGF-Mol), which replaces static Dirac targets with dynamically concentrating distributions and is trained with a progressive-parameter-refinement architecture. Our model approaches a reference-level PoseBusters passing rate (93.4%) on CrossDock, demonstrating remarkable geometric precision and interaction fidelity, while achieving superior performance over baseline methods on real-world MolGenBench tasks for bioactive scaffold recovery. Code is available at https://github.com/BLEACH366/EvoEGF-Mol.

2601.21406 2026-05-26 cs.CV cs.LG

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

通过多表示生成增强统一多模态模型的理解能力

Zihan Su, Hongyang Wei, Kangrui Cen, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu

Affiliations * Tsinghua Shenzhen International Graduate School, Tsinghua University; AMAP, Alibaba Group; Shanghai Jiao Tong University; Southern University of Science and Technology

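AI summary Proposes UniMRG, which strengthens the understanding capabilities of unified multimodal models through auxiliary generation of multiple representations (pixel, depth, and segmentation), reducing hallucinations and improving spatial understanding.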

Comments Code: https://github.com/Sugewud/UniMRG

Details
AI Chinese abstract

统一多模态模型(UMMs)在单一框架内整合了视觉理解和生成。其最终目标是创建一个理解和生成相互促进的循环。虽然最近的后训练方法成功利用理解来增强生成,但利用生成来改善理解的逆向方向仍基本未被探索。在这项工作中,我们提出了UniMRG(统一多表示生成),一种简单而有效的架构无关的后训练方法。UniMRG通过引入辅助生成任务来增强UMMs的理解能力。具体来说,我们训练UMMs生成输入图像的多种内在表示,即像素(重建)、深度(几何)和分割(结构),同时进行标准的视觉理解目标。通过综合这些多样化的表示,UMMs捕获关于外观、空间关系和结构布局的互补信息。因此,UMMs对视觉输入形成了更深入和全面的理解。跨多种UMM架构的大量实验表明,我们的方法显著增强了细粒度感知,减少了幻觉,并改善了空间理解,同时提升了生成能力。

English abstract

Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. Their ultimate aspiration is to create a cycle where understanding and generation mutually reinforce each other. While recent post-training methods have successfully leveraged understanding to enhance generation, the reverse direction of utilizing generation to improve understanding remains largely unexplored. In this work, we propose UniMRG (Unified Multi-Representation Generation), a simple yet effective architecture-agnostic post-training method. UniMRG enhances the understanding capabilities of UMMs by incorporating auxiliary generation tasks. Specifically, we train UMMs to generate multiple intrinsic representations of input images, namely pixel (reconstruction), depth (geometry), and segmentation (structure), alongside standard visual understanding objectives. By synthesizing these diverse representations, UMMs capture complementary information regarding appearance, spatial relations, and structural layout. Consequently, UMMs develop a deeper and more comprehensive understanding of visual inputs. Extensive experiments across diverse UMM architectures demonstrate that our method notably enhances fine-grained perception, reduces hallucinations, and improves spatial understanding, while simultaneously boosting generation capabilities.

2601.21094 2026-05-26 cs.LG cs.AI cs.SY eess.SY

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

安全强化学习中的分布偏移下的安全泛化:一个糖尿病测试平台

Minjae Kwon, Josephine Lamp, Lu Feng

Affiliations * Department of Computer Science, University of Virginia

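AI summary Studies whether the training-time safety guarantees of safe reinforcement learning algorithms transfer to deployment under distribution shift, using diabetes management as a testbed; identifies a safety generalization gap and shows that test-time shielding effectively restores safety.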

Comments Accepted at ICML 2026. Camera-ready version

Details
AI Chinese abstract

安全强化学习算法通常在固定的训练条件下进行评估。我们使用糖尿病管理作为安全关键测试平台,研究训练时的安全保证是否能在分布偏移下迁移到部署中。我们在统一的临床模拟器上对安全强化学习算法进行基准测试,并揭示了一个安全泛化差距:在训练期间满足约束的策略经常在未见过的患者身上违反安全要求。我们证明,测试时屏蔽(使用学习到的动力学模型过滤不安全动作)能有效恢复跨算法和患者群体的安全性。在八种安全强化学习算法、三种糖尿病类型和三个年龄组中,屏蔽使得PPO-Lag和CPO等强基线的血糖达标时间范围提高了13-14%,同时降低了临床风险指数和血糖变异性。我们的模拟器和基准测试为研究安全关键控制领域中分布偏移下的安全性提供了一个平台。代码可在https://github.com/safe-autonomy-lab/GlucoSim 和 https://github.com/safe-autonomy-lab/GlucoAlg 获取。

English abstract

Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and patient populations. Across eight safe RL algorithms, three diabetes types, and three age groups, shielding achieves Time-in-Range gains of 13-14% for strong baselines such as PPO-Lag and CPO while reducing clinical risk index and glucose variability. Our simulator and benchmark provide a platform for studying safety under distribution shift in safety-critical control domains. Code is available at https://github.com/safe-autonomy-lab/GlucoSim and https://github.com/safe-autonomy-lab/GlucoAlg.
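Test-time shielding, as described above, filters a policy's proposed action through a dynamics model before it is executed. The following is a toy sketch under stated assumptions, not the GlucoSim or GlucoAlg code: the linear `predict_glucose` function stands in for a learned dynamics model, the 70-180 mg/dL band is the range commonly used for Time-in-Range, and the fallback rule and candidate doses are hypothetical.

```python
# Toy test-time shield: veto actions whose predicted outcome is unsafe.
# The dynamics model, fallback rule, and action set are hypothetical.

SAFE_LOW, SAFE_HIGH = 70.0, 180.0  # mg/dL

def predict_glucose(glucose, insulin_units):
    """Stand-in for a learned dynamics model (assumed linear response)."""
    return glucose - 30.0 * insulin_units

def shield(glucose, proposed, candidates):
    """Return the proposed action if predicted safe, else the nearest safe one."""
    def is_safe(action):
        return SAFE_LOW <= predict_glucose(glucose, action) <= SAFE_HIGH

    if is_safe(proposed):
        return proposed
    safe = [a for a in candidates if is_safe(a)]
    if not safe:
        return min(candidates)  # assumed fallback: the smallest dose
    return min(safe, key=lambda a: abs(a - proposed))

candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
# The policy proposes 2.0 units, which the model predicts would overshoot.
action = shield(glucose=110.0, proposed=2.0, candidates=candidates)
```

Because the shield only wraps action selection, it can be applied to any trained policy without retraining, which matches the abstract's claim that it works across algorithms.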

2601.18597 2026-05-26 cs.CV

EFSI-DETR: Efficient Frequency-Semantic Integration for Real-Time Small Object Detection in UAV Imagery

EFSI-DETR:面向无人机图像实时小目标检测的高效频率-语义集成

Yu Xia, Chang Liu, Tianqi Xiang, Zhigang Tu

Affiliations * State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing; Wuhan University; Wuhan University Shenzhen Research Institute; School of Computer Science; School of Automation Science and Engineering; South China University of Technology

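AI summary Proposes the EFSI-DETR framework, which achieves state-of-the-art real-time small object detection in UAV imagery through a Dynamic Frequency-Spatial Unified Synergy Network and an Efficient Semantic Feature Concentrator.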

Details
AI Chinese abstract

由于有限的特征表示和无效的多尺度融合,无人机图像中的实时小目标检测仍然具有挑战性。现有方法未充分利用频率信息并依赖静态卷积操作,限制了获取丰富特征表示的能力,并阻碍了深层语义特征的有效利用。为解决这些问题,我们提出EFSI-DETR,一种新颖的检测框架,将高效语义特征增强与动态频率-空间引导相结合。EFSI-DETR包含两个主要组件:(1) 动态频率-空间统一协同网络(DyFusNet),联合利用频率和空间线索进行鲁棒的多尺度特征融合;(2) 高效语义特征集中器(ESFC),以最小计算成本实现深层语义提取。此外,采用细粒度特征保留(FFR)策略,在融合过程中纳入空间丰富的浅层特征,以保留对无人机图像中小目标检测至关重要的细粒度细节。在VisDrone和CODrone基准上的大量实验表明,我们的EFSI-DETR以实时效率实现了最先进的性能,在VisDrone上AP和AP_s分别提升了1.6%和5.8%,同时在单个RTX 4090 GPU上获得188 FPS的推理速度。

English abstract

Real-time small object detection in Unmanned Aerial Vehicle (UAV) imagery remains challenging due to limited feature representation and ineffective multi-scale fusion. Existing methods underutilize frequency information and rely on static convolutional operations, which constrain the capacity to obtain rich feature representations and hinder the effective exploitation of deep semantic features. To address these issues, we propose EFSI-DETR, a novel detection framework that integrates efficient semantic feature enhancement with dynamic frequency-spatial guidance. EFSI-DETR comprises two main components: (1) a Dynamic Frequency-Spatial Unified Synergy Network (DyFusNet) that jointly exploits frequency and spatial cues for robust multi-scale feature fusion, and (2) an Efficient Semantic Feature Concentrator (ESFC) that enables deep semantic extraction with minimal computational cost. Furthermore, a Fine-grained Feature Retention (FFR) strategy is adopted to incorporate spatially rich shallow features during fusion, preserving the fine-grained details that are crucial for small object detection in UAV imagery. Extensive experiments on the VisDrone and CODrone benchmarks demonstrate that EFSI-DETR achieves state-of-the-art performance with real-time efficiency, yielding improvements of 1.6% and 5.8% in AP and AP$_{s}$ on VisDrone while reaching an inference speed of 188 FPS on a single RTX 4090 GPU.

2601.18135 2026-05-26 cs.CV

Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection

基于门控上下文聚合的前向一致性学习用于视频异常检测

Jiahao Lyu, Minghua Zhao, Xuewen Huang, Yifei Chen, Shuangli Du, Jing Hu, Cheng Shi, Zhiyong Lv

Affiliations * Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology; School of Cyber Science and Engineering, Xi’an Jiaotong University

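AI summary Proposes FoGA, a lightweight model that performs efficient video anomaly detection for resource-limited devices through forward consistency learning and gated context aggregation, outperforming existing methods while running at up to 155 FPS.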

Comments It has been submitted to the KBS journal

Journal ref Knowledge-Based Systems 2026

Details
AI Chinese abstract

作为公共安全的关键要素,视频异常检测(VAD)旨在实时监控系统中衡量各种事件与正常模式的偏差。然而,现有大多数VAD方法依赖大规模模型追求极端精度,限制了其在资源受限边缘设备上的可行性。此外,主流基于预测的VAD仅利用单帧未来预测误差检测异常,忽略了更长时域前向信息的更丰富约束。本文提出FoGA,一种轻量级VAD模型,执行基于门控上下文聚合的前向一致性学习,包含约2M参数,专为潜在边缘设备设计。具体而言,我们提出一种基于Unet的方法,对连续帧进行特征提取以生成即时预测和前向预测。然后,我们在跳跃连接中引入门控上下文聚合模块,动态融合相同空间尺度下的编码器和解码器特征。最后,模型通过新颖的前向一致性损失联合优化,并采用混合异常测量策略整合即时帧和前向帧的误差以实现更准确检测。大量实验证明了所提方法的有效性,其显著优于最先进的竞争方法,运行速度高达155 FPS。因此,我们的FoGA在性能与效率指标之间实现了出色的权衡。

English abstract

As a crucial element of public security, video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems. However, most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices. Moreover, mainstream prediction-based VAD detects anomalies using only single-frame future prediction errors, overlooking the richer constraints from longer-term temporal forward information. In this paper, we introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context Aggregation, containing about 2M parameters and tailored for potential edge devices. Specifically, we propose a Unet-based method that performs feature extraction on consecutive frames to generate both immediate and forward predictions. Then, we introduce a gated context aggregation module into the skip connections to dynamically fuse encoder and decoder features at the same spatial scale. Finally, the model is jointly optimized with a novel forward consistency loss, and a hybrid anomaly measurement strategy is adopted to integrate errors from both immediate and forward frames for more accurate detection. Extensive experiments demonstrate the effectiveness of the proposed method, which substantially outperforms state-of-the-art competing methods, running up to 155 FPS. Hence, our FoGA achieves an excellent trade-off between performance and the efficiency metric.

2601.14249 2026-05-26 cs.CL

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

哪种推理轨迹能更好地教会学生推理?一个信息对齐的简单度量

Yuming Yang, Mingyoung Lai, Wanxu Zhao, Xiaoran Fan, Zhiheng Xi, Mingqi Wu, Chiyue Huang, Jun Zhao, Haijun Lv, Jian Tong, Yunhua Zhou, Yicheng Zou, Qipeng Guo, Tao Gui, Qi Zhang, Xuanjing Huang

Affiliations * Fudan University; Shanghai AI Laboratory; University of Toronto; University of Sydney

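AI summary Proposes the Rank-Surprisal Ratio (RSR), a metric that combines alignment and informativeness to assess how suitable a reasoning trajectory is for a student model, and that consistently outperforms existing metrics, with practical use in trajectory selection and teacher selection.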

Comments Accepted to ACL 2026 (Main Conference). 31 pages. Project page: https://github.com/UmeanNever/RankSurprisalRatio

Details
AI Chinese abstract

长链思维(CoT)轨迹为从教师到学生大语言模型的推理蒸馏提供了丰富的监督信号。然而,先前工作和我们的实验均表明,来自更强教师的轨迹并不一定能产生更好的学生,凸显了蒸馏中数据-学生适配性的重要性。现有方法主要通过学生似然评估适配性,倾向于选择与学生模型当前行为高度一致的轨迹,但忽略了更具信息性的轨迹。针对这一问题,我们提出Rank-Surprisal Ratio (RSR),一个简单的度量,同时捕捉对齐性和信息性以评估推理轨迹的适用性。RSR的动机源于有效轨迹通常通过结合低绝对概率和相对高排名的token(在学生模型下)来平衡学习信号强度和行为对齐。具体而言,RSR定义为轨迹的平均token级排名与其平均负对数似然之比,计算和解释直观。在五个学生模型和来自11个不同教师的推理轨迹上,RSR与训练后推理性能强相关(平均Spearman 0.86),持续优于现有度量。我们进一步展示了其在轨迹选择和教师选择中的实际效用。

English abstract

Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, highlighting the importance of data-student suitability in distillation. Existing methods assess suitability primarily through student likelihood, favoring trajectories that align closely with the student model's current behavior but overlooking more informative ones. Addressing this, we propose Rank-Surprisal Ratio (RSR), a simple metric that captures both alignment and informativeness to assess the suitability of a reasoning trajectory. RSR is motivated by the observation that effective trajectories typically balance learning signal strength and behavioral alignment by combining low absolute probability with relatively high-ranked tokens under the student model. Concretely, RSR is defined as the ratio of a trajectory's average token-wise rank to its average negative log-likelihood, and is straightforward to compute and interpret. Across five student models and reasoning trajectories from 11 diverse teachers, RSR strongly correlates with post-training reasoning performance (average Spearman 0.86), consistently outperforming existing metrics. We further demonstrate its practical utility in both trajectory selection and teacher selection.
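The definition given in the abstract, average token-wise rank divided by average negative log-likelihood, is simple enough to sketch directly. This is an illustration rather than the authors' released code: the rank convention (1 for the student's most probable token), the handling of ties, and the input layout are assumptions, and the function name `rank_surprisal_ratio` is hypothetical.

```python
import math

def rank_surprisal_ratio(token_dists, target_ids):
    """Average rank of the target tokens divided by their average surprisal.

    token_dists: one probability list per position, from the student model.
    target_ids: index of the trajectory's actual token at each position.
    """
    ranks, surprisals = [], []
    for probs, tid in zip(token_dists, target_ids):
        p = probs[tid]
        # Rank 1 means the target is the student's most probable token.
        ranks.append(1 + sum(1 for q in probs if q > p))
        surprisals.append(-math.log(p))
    return (sum(ranks) / len(ranks)) / (sum(surprisals) / len(surprisals))

# Two positions over a three-token vocabulary; the target is token 0 both times.
dists = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
rsr = rank_surprisal_ratio(dists, [0, 0])
```

Here the ranks are 1 and 2 and the surprisals are ln 2 and 2 ln 2, so the ratio is 1.5 / (1.5 ln 2) = 1 / ln 2. In practice the per-position distributions would come from a forward pass of the student model over the teacher's trajectory.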

2601.05613 2026-05-26 cs.LG cs.AI

PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data across Nodes

PiXTime: 一种跨节点异构数据联邦时间序列预测模型

Yiming Zhou, Jiahao Wang, Mingyue Cheng, Hao Wang, Defu Lian, Enhong Chen

Affiliations * University of Science and Technology of China

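AI summary Proposes PiXTime, a Transformer-based framework whose parameter-decoupling architecture (localized personalized modules plus a globally shared backbone) handles structurally heterogeneous time series in federated forecasting, achieving state-of-the-art performance on multiple benchmarks.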

Details
AI Chinese abstract

虽然对分布式时间序列进行协同预测非常理想,但由于数据共享限制,直接合并局部数据集通常不可行。联邦学习提供了一种有前景的替代方案,但传统的联邦学习算法要求同构模型架构,这与去中心化节点中常见的结构差异(如时间分辨率不对齐、变量通道不匹配)不兼容。为弥合这一差距,我们引入了PiXTime,一种新颖的基于Transformer的框架,旨在原生适应并利用结构异构的时间数据。其核心采用参数解耦架构,将模型策略性地划分为局部个性化模块和全局聚合共享骨干。具体而言,节点特定的局部模块作为维度适配器,将不同长度的原始序列投影到统一表示空间。同时,全局同步的VE表将一致的类别标识注入特征空间,使共享骨干能够跨不一致的变量分布协同学习并泛化表示。在多个基准上的全面评估表明,PiXTime在异构联邦环境中实现了最先进的性能,同时在标准同构和集中式预测设置中保持强大的优势。

English abstract

While collaborative forecasting on distributed time series is highly desirable, directly pooling localized datasets is often impractical due to data sharing constraints. Federated learning offers a promising alternative, yet conventional federated learning algorithms require homogeneous model architectures, which are incompatible with the structural discrepancies, such as unaligned temporal resolutions and mismatched variable channels, commonly observed across decentralized nodes. To bridge this gap, we introduce PiXTime, a novel Transformer-based framework designed to natively accommodate and leverage structurally heterogeneous temporal data. At its core, PiXTime adopts a parameter-decoupling architecture, strategically partitioning the model into localized personalized modules and a globally aggregated shared backbone. Specifically, node-specific local modules act as dimensional adapters, projecting raw sequences of diverse lengths into a unified representation space. Concurrently, a globally synchronized VE Table injects consistent categorical identities into the feature space, allowing the shared backbone to collaboratively learn and generalize representations across inconsistent variable distributions. Comprehensive evaluations on multiple benchmarks demonstrate that PiXTime achieves state-of-the-art performance in heterogeneous federated environments, while maintaining robust superiority in standard homogeneous and centralized forecasting settings.
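The parameter-decoupling idea, where only the shared backbone is aggregated while node-specific modules stay local, can be sketched as a filtered federated average. This is a minimal illustration and not the PiXTime implementation: it assumes flat dictionaries of weight lists, equal weighting of clients, and a hypothetical naming scheme in which shared parameters start with `backbone.` and personalized ones with `local.`.

```python
# Parameter-decoupled aggregation: average only shared-backbone weights.
# Key names and the flat dict layout are hypothetical.

def aggregate_shared(client_states, shared_prefix="backbone."):
    """Average parameters whose name starts with shared_prefix across clients."""
    shared_keys = [k for k in client_states[0] if k.startswith(shared_prefix)]
    n = len(client_states)
    return {
        k: [sum(state[k][i] for state in client_states) / n
            for i in range(len(client_states[0][k]))]
        for k in shared_keys
    }

def broadcast(client_states, global_shared):
    """Overwrite shared weights on every client; local modules are untouched."""
    return [{**state, **global_shared} for state in client_states]

clients = [
    {"backbone.w": [1.0, 2.0], "local.adapter": [0.1]},       # node A
    {"backbone.w": [3.0, 4.0], "local.adapter": [0.2, 0.3]},  # node B
]
global_shared = aggregate_shared(clients)
clients = broadcast(clients, global_shared)
```

Note that the two nodes keep adapters of different sizes. Since those weights never enter the average, nodes with different input lengths or channel counts can still share one backbone.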

2601.05483 2026-05-26 cs.AI

MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis

MMUEChange:面向智能多模态城市环境变化分析的通用LLM智能体框架

Zixuan Xiao, Jun Ma, Siwei Zhang

Affiliations * Department of Urban Planning and Design, The University of Hong Kong

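AI summary Proposes MMUEChange, a multi-modal agent framework that flexibly integrates heterogeneous urban data and aligns modalities through a modular toolkit and a Modality Controller, improving task success rate by 46.7% across three urban case studies and effectively mitigating hallucination.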

Journal ref Applied Soft Computing 190 (2026) 114576

Details
AI Chinese abstract

理解城市环境变化对于可持续发展至关重要。然而,当前方法,特别是遥感变化检测,通常依赖于刚性的单模态分析。为克服这些限制,我们提出MMUEChange,一个多模态智能体框架,通过模块化工具包和核心模块——模态控制器实现跨模态和模态内对齐,灵活集成异构城市数据,从而支持对复杂城市变化场景的稳健分析。案例研究包括:纽约向小型社区公园的转变,反映了当地的绿地建设努力;香港各区集中水污染的扩散,指向协调的水管理;深圳露天垃圾场的显著减少,以及夜间经济活动与垃圾类型之间的对比关联,表明生活垃圾和建筑垃圾背后不同的城市压力。与性能最佳的基线相比,MMUEChange智能体在任务成功率上提升了46.7%,并有效缓解了幻觉,展示了其支持具有实际政策影响的复杂城市变化分析任务的能力。

English abstract

Understanding urban environment change is essential for sustainable development. However, current approaches, particularly remote sensing change detection, often rely on rigid, single-modal analysis. To overcome these limitations, we propose MMUEChange, a multi-modal agent framework that flexibly integrates heterogeneous urban data via a modular toolkit and a core module, the Modality Controller, for cross- and intra-modal alignment, enabling robust analysis of complex urban change scenarios. Case studies include: a shift toward small, community-focused parks in New York, reflecting local green space efforts; the spread of concentrated water pollution across districts in Hong Kong, pointing to coordinated water management; and a notable decline in open dumpsites in Shenzhen, with contrasting links between nighttime economic activity and waste types, indicating differing urban pressures behind domestic and construction waste. Compared to the best-performing baseline, the MMUEChange agent achieves a 46.7% improvement in task success rate and effectively mitigates hallucination, demonstrating its capacity to support complex urban change analysis tasks with real-world policy implications.

2601.03790 2026-05-26 cs.CL cs.AI

NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning

NeoAMT: 基于强化学习的新词感知智能机器翻译

Zhongtao Miao, Kaiyan Zhao, Masaaki Nagata, Yoshimasa Tsuruoka

Affiliations * The University of Tokyo; NTT Communication Science Laboratories, NTT, Inc.

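AI summary Proposes the NeoAMT framework, which trains a translation agent with a Wiktionary-based search toolkit and reinforcement learning to improve the translation of source sentences containing neologisms.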

Comments ACL 2026 Main. Fixed minor typos

Details
AI Chinese abstract

新词感知机器翻译旨在将包含新词的源句翻译成目标语言。与通用机器翻译相比,该领域仍未被充分探索。本文提出一个智能体框架NeoAMT,用于新词感知机器翻译,配备基于Wiktionary的搜索工具。具体而言,我们首先构建了一个专门用于新词感知机器翻译的数据集,并建立了一个基于Wiktionary的搜索工具。该数据集涵盖16种语言和75个翻译方向,源自约1000万条英文Wiktionary转储记录。搜索工具的检索语料库也来自同一转储中约300万条清洗后的记录。然后,我们利用该数据集和工具,通过强化学习训练翻译智能体,并评估新词感知机器翻译的准确性。此外,我们提出了一个强化学习训练框架,具有新颖的奖励设计和自适应展开生成策略,利用翻译难度进一步提高使用我们搜索工具的翻译智能体的翻译质量。

English abstract

Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains underexplored compared with general machine translation (MT). In this paper, we propose an agentic framework, NeoAMT, for neologism-aware machine translation equipped with a Wiktionary-based search toolkit. Specifically, we first construct a dedicated dataset for neologism-aware machine translation and build a search toolkit grounded in Wiktionary. The dataset covers 16 languages and 75 translation directions in total, derived from approximately 10 million records of an English Wiktionary dump. The retrieval corpus of the search toolkit is also constructed from around 3 million cleaned records of the same dump. We then leverage the dataset and toolkit to train a translation agent via reinforcement learning (RL) and to evaluate the accuracy of neologism-aware machine translation. Furthermore, we propose an RL training framework featuring a novel reward design and an adaptive rollout generation strategy that exploits translation difficulty to further improve the translation quality of translation agents using our search toolkit.