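arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.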

2602.20210 2026-05-26 cs.LG cs.AI

MARS: Margin and Semantic-Aware Data Augmentation for Reward Modeling

Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon

Affiliations: University of Arizona; Northeastern University London

AI summary: Proposes the MARS framework, which prioritizes augmentation of low-margin preference pairs and refines it using semantic distance, improving reward-model quality and alignment performance.

Abstract

Reward modeling is central to alignment pipelines such as RLHF, RLAIF, and PPO-based policy optimization, yet its reliability is constrained by limited and heterogeneous human preference data that are expensive to collect at scale. While synthetic augmentation can expand preference supervision, existing methods often augment uniformly or at the representation level, without targeting examples where the reward model is uncertain or prone to mis-ranking. In this paper, we introduce MARS (Margin and Semantic-Aware Data Augmentation for Reward Modeling), an adaptive augmentation framework that prioritizes low-margin preference pairs and uses semantic distance as a second layer for refinement to enhance the contrast between the chosen and rejected responses. Across multiple preference datasets, reward-model backbones, downstream alignment settings, and benchmarks including RewardBench and AlpacaEval, MARS improves both reward-model quality and alignment performance over existing baselines. Our results show that reward-model augmentation is most effective when guided by both model margins and semantic structure.
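
The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`r_chosen`, `r_rejected`, `e_chosen`, `e_rejected`), the use of cosine distance, and the choice to break ties by smallest semantic distance are all assumptions made for illustration, since the abstract does not say how the two signals are combined.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def pick_pairs_to_augment(pairs, budget):
    """Rank preference pairs for augmentation (hypothetical helper).

    Primary key: reward margin r_chosen - r_rejected, smallest first,
    so low-margin (uncertain or mis-ranked) pairs come first.
    Secondary key: semantic distance between the two responses,
    smallest first (an assumed tie-break, not taken from the paper).
    """
    def key(p):
        margin = p["r_chosen"] - p["r_rejected"]
        dist = cosine_distance(p["e_chosen"], p["e_rejected"])
        return (margin, dist)
    return sorted(pairs, key=key)[:budget]

# Toy data: pair "c" is mis-ranked, pair "b" is barely separated, pair "a" is easy.
pairs = [
    {"id": "a", "r_chosen": 2.0, "r_rejected": -1.0, "e_chosen": [1.0, 0.0], "e_rejected": [0.0, 1.0]},
    {"id": "b", "r_chosen": 0.3, "r_rejected": 0.2, "e_chosen": [1.0, 0.0], "e_rejected": [0.9, 0.1]},
    {"id": "c", "r_chosen": 0.1, "r_rejected": 0.4, "e_chosen": [1.0, 0.0], "e_rejected": [1.0, 0.2]},
]
selected = [p["id"] for p in pick_pairs_to_augment(pairs, budget=2)]
```

With a budget of two, the mis-ranked pair and the barely separated pair are selected, and the confidently ranked pair is left alone.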

2602.17234 2026-05-26 cs.AI cs.LG

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

Zeyu Zhang, Ryan Chen, Bradly C. Stadie

Affiliations: Department of Statistics and Data Science, Northwestern University; Bridgewater AIA Labs

AI summary: Proposes Shapley-DCLR, a claim-level evaluation metric based on Shapley values, and TimeSPEC, an inference-time architecture, for detecting and mitigating temporal contamination in LLM backtesting.

Comments: 8 pages plus appendix

Abstract

Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff knowledge. We introduce a claim-level evaluation framework that decomposes prediction rationales into atomic claims and applies Shapley values to quantify each claim's decision impact, yielding Shapley-DCLR (Shapley-weighted Decision-Critical Leakage Rate), an interpretable metric measuring what fraction of decision-driving reasoning is contaminated. We further propose TimeSPEC (Time-Supervised Prediction with Extracted Claims), an inference-time architecture that interleaves temporally filtered retrieval with claim-level supervision, producing predictions grounded entirely in pre-cutoff evidence. Across three LLMs, ablation experiments confirm that retrieval and supervision are jointly necessary, and a three-task probe further illustrates that the performance cost of temporal enforcement scales with each task's reliance on post-cutoff information.
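
To make the idea of a Shapley-weighted leakage rate concrete, here is a small sketch under stated assumptions. The exact Shapley computation is the textbook permutation definition; the `leakage_rate` formula (share of absolute Shapley mass on claims flagged as post-cutoff), the toy additive value function, and the claim names are hypothetical and may differ from the paper's definition of Shapley-DCLR.

```python
from itertools import permutations

def shapley_values(claims, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = {c: 0.0 for c in claims}
    orders = list(permutations(claims))
    for order in orders:
        seen = frozenset()
        for c in order:
            phi[c] += value(seen | {c}) - value(seen)
            seen = seen | {c}
    return {c: total / len(orders) for c, total in phi.items()}

def leakage_rate(phi, leaked):
    """Assumed metric: fraction of absolute Shapley mass on leaked claims."""
    total = sum(abs(v) for v in phi.values())
    return sum(abs(phi[c]) for c in leaked) / total if total else 0.0

# Toy additive value function: how strongly a subset of claims supports the prediction.
weights = {"c1": 0.5, "c2": 0.3, "c3": 0.2}
value = lambda subset: sum(weights[c] for c in subset)

phi = shapley_values(list(weights), value)
rate = leakage_rate(phi, leaked={"c1"})  # suppose only c1 relies on post-cutoff knowledge
```

Exact enumeration grows factorially with the number of claims, so a practical version would need sampling or another approximation.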

2602.16229 2026-05-26 cs.LG

Factored Latent Action World Models

Zizhao Wang, Chang Shi, Jiaheng Hu, Kevin Rohling, Roberto Martín-Martín, Amy Zhang, Peter Stone

Affiliations: University of Texas at Austin

AI summary: Proposes the Factored Latent Action Model (FLAM), which decomposes a scene into independent factors and learns a latent action for each, improving the accuracy of multi-entity dynamics modeling and video generation quality on action-free video.

Abstract

Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factorized structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Based on experiments on both simulation and real-world multi-entity datasets, we find that FLAM outperforms prior work in prediction accuracy and representation quality, and facilitates downstream policy learning, demonstrating the benefits of factorized latent action models.

2602.15811 2026-05-26 cs.CV cs.AI

CARL-CXR: Continual Adapter-Based Routing for Task-Unknown Chest Radiograph Classification

Muthu Subash Kavitha, Anas Zafar, Amgad Muneer, Jia Wu

Affiliations: Department of Imaging Physics, The University of Texas MD Anderson Cancer Center

AI summary: Proposes the CARL-CXR framework, which combines a fixed high-capacity backbone, incrementally added lightweight task-specific adapters and classifier heads, and a latent task selector to address incremental chest radiograph classification under task-unknown inference, substantially reducing catastrophic forgetting and improving routing accuracy.

Comments: 9 pages, 4 figures

Abstract

Clinical deployment of chest radiograph classifiers requires models that can be updated as new datasets become available without retraining on previously observed data or degrading validated performance. We study a task-incremental continual learning setting for chest radiograph classification under task-unknown inference, where heterogeneous chest X-ray datasets arrive sequentially and task identity is unavailable at deployment time. We propose CARL-CXR, a continual adapter-based routing framework that maintains a fixed high-capacity backbone while incrementally introducing lightweight task-specific adapters and classifier heads. A latent task selector operates on adapter-conditioned features to dynamically route each input to the most relevant task pathway, leveraging compact task prototypes and feature-level experience replay to preserve task identity across sequential updates without storing raw images. Experiments on MIMIC-CXR and CheXpert, two large-scale datasets with distinct patient populations, imaging devices, and annotation pipelines, demonstrate that CARL-CXR achieves minimal catastrophic forgetting (0.012 AUROC drop), a 6× and 11× reduction relative to the established continual learning baselines LwF and EWC, respectively, while maintaining competitive diagnostic performance (AUROC 0.74). Under task-unknown deployment, CARL-CXR outperforms joint training by 12.5 percentage points in routing accuracy (75.0% vs. 62.5%), whereas LwF and EWC require explicit task identifiers at inference and provide no routing mechanism.
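
One simple way to picture routing with compact task prototypes is nearest-prototype selection, sketched below. This is an illustrative assumption, not the CARL-CXR selector itself: the abstract does not specify the selector's form, and the function names, the cosine similarity measure, and the 2-D prototype values are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def route_to_task(feature, prototypes):
    """Pick the task whose stored prototype is most similar to the input feature."""
    return max(prototypes, key=lambda task: cosine_similarity(feature, prototypes[task]))

# Hypothetical 2-D prototypes; real features would come from the adapter-conditioned backbone.
prototypes = {"mimic_cxr": [1.0, 0.1], "chexpert": [0.1, 1.0]}
chosen_task = route_to_task([0.2, 0.9], prototypes)
```

Storing one prototype vector per task, rather than raw images, is what keeps this kind of router compatible with the no-raw-image-storage constraint described in the abstract.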

2602.15620 2026-05-26 cs.CL cs.AI

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Xiaodong Lu, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Zhijun Chen, Yu Luo, Fuzhen Zhuang, Yikun Ban, Deqing Wang

Affiliations: School of Computer Science and Engineering, Beihang University; School of Artificial Intelligence, Beihang University; Huawei

AI summary: To address the indiscriminate, short-horizon use of rollouts in RLVR, proposes a contextual bandit framework that adaptively selects high-value rollouts, improving training efficiency and performance.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollouts in an indiscriminate and short-horizon manner: responses of heterogeneous quality within each prompt are treated uniformly, and historical rollouts are discarded after a single use. This leads to noisy supervision, poor sample efficiency, and suboptimal policy updates. We address these issues by formulating rollout scheduling in RLVR as a contextual bandit problem and proposing a unified neural scheduling framework that adaptively selects high-value rollouts throughout training. Each rollout is treated as an arm whose reward is defined by the induced performance gain between consecutive optimization steps. The resulting scheduler supports both noise-aware intra-group selection and adaptive global reuse of historical rollouts within a single principled framework. We provide theoretical justification by deriving sublinear regret bounds and showing that enlarging the rollout buffer improves the achievable performance upper bound. Experiments on six mathematical reasoning benchmarks demonstrate consistent gains in performance and training efficiency across multiple RLVR optimization methods.

2602.08426 2026-05-26 cs.CL cs.AI cs.CV

Prism: Spectral-Aware Block-Sparse Attention

Xinghao Wang, Pengyu Wang, Xiaoran Liu, Fangxu Liu, Jason Chu, Kai Song, Xipeng Qiu

Affiliations: Fudan University; Shanghai Innovation Institute; ByteDance Inc.; OpenMOSS Team

AI summary: Targeting the block-selection efficiency bottleneck of block-sparse attention in long-context LLM pre-filling, proposes Prism, a training-free spectral-aware method that restores positional signals through a high/low-frequency branch decomposition and energy-based temperature calibration, enabling purely block-level importance estimation and up to 5.1× speedup while preserving accuracy.

Comments: ICML 2026

Abstract

Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-grained attention as a proxy for block importance estimation, but often resort to expensive token-level searching or scoring, resulting in significant selection overhead. In this work, we trace the inaccuracy of standard coarse-grained attention via mean pooling to a theoretical root cause: the interaction between mean pooling and Rotary Positional Embeddings (RoPE). We prove that mean pooling acts as a low-pass filter that induces destructive interference in high-frequency dimensions, effectively creating a "blind spot" for local positional information (e.g., slash patterns). To address this, we introduce Prism, a training-free spectral-aware approach that decomposes block selection into high-frequency and low-frequency branches. By applying energy-based temperature calibration, Prism restores the attenuated positional signals directly from pooled representations, enabling block importance estimation using purely block-level operations, thereby improving efficiency. Extensive evaluations confirm that Prism maintains accuracy parity with full attention while delivering up to $\mathbf{5.1\times}$ speedup.
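
The low-pass claim can be checked numerically with a small sketch. It isolates only the rotary phase: averaging the unit phasors e^{iωp} over a block of B consecutive positions gives magnitude |sin(Bω/2) / (B sin(ω/2))|, which is near 1 for slow frequencies and near 0 for fast ones. The head dimension (128), block size (64), and RoPE base (10000) are assumed values for illustration, and this is not the paper's proof or the Prism algorithm.

```python
import cmath
import math

def pooled_magnitude(omega, block_size):
    """Magnitude of the mean of unit phasors e^{i*omega*p}, p = 0..block_size-1."""
    mean = sum(cmath.exp(1j * omega * p) for p in range(block_size)) / block_size
    return abs(mean)

def rope_frequency(j, head_dim, base=10000.0):
    """Standard RoPE angular frequency for the j-th 2-D rotary pair."""
    return base ** (-2.0 * j / head_dim)

head_dim, block_size = 128, 64  # assumed sizes for illustration
high = pooled_magnitude(rope_frequency(0, head_dim), block_size)   # fastest-rotating pair
low = pooled_magnitude(rope_frequency(63, head_dim), block_size)   # slowest-rotating pair
```

In this toy setting the fastest-rotating pair is almost entirely cancelled by pooling while the slowest one passes nearly unchanged, which matches the "blind spot" for local positional information described in the abstract.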

2602.06717 2026-05-26 cs.LG cs.AI

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Daniil Plyusov, Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daria Korotyshova, Daniil Gavrilov

Affiliations: T-Tech

AI summary: To address rare correct trajectories being missed under finite sampled groups in reinforcement learning, proposes F-GRPO, a difficulty-aware scaling coefficient inspired by Focal loss, which improves mathematical reasoning performance without increasing group size or computational cost.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, computational limits often rule out very large groups, so training proceeds with finite rollout sets that can reinforce only the correct behavior they expose. At practical group sizes, updates can miss rare-correct trajectories while still containing mixed rewards, concentrating probability on more common sampled solutions. We derive the probability of such prompt-local tail-miss events as a function of group size, showing non-monotonic behavior, and in the categorical abstraction characterize how unsampled-correct mass can shrink even as total correct mass grows. Motivated by this analysis, we propose a difficulty-aware scaling coefficient, inspired by Focal loss, that down-weights updates on high-success sampled groups. Empirically, categorical simulation illustrates the same effect in the categorical setting, Maze provides a single-solution test, and LLM experiments include a representative GRPO group-size sweep together with fixed-$N$ transfer across GRPO, DAPO, and CISPO. On Qwen2.5-7B at $N{=}8$, our method improves average math pass@256 from 64.1 $\rightarrow$ 70.3 (GRPO), 69.3 $\rightarrow$ 72.5 (DAPO), and 73.2 $\rightarrow$ 76.8 (CISPO); OOD pass@256 also improves in all three cases, without increasing group size or computational cost.
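
A focal-style coefficient of the kind the abstract describes might look like the sketch below. The specific form `(1 - success_rate) ** gamma`, the `gamma` default, and the function names are assumptions made for illustration; the abstract states only that the coefficient is inspired by Focal loss and down-weights high-success groups. The group-normalized advantage is the usual GRPO-style `(r - mean) / std`.

```python
import statistics

def grpo_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std, zero if all rewards are equal."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [0.0 if std == 0 else (r - mean) / std for r in rewards]

def focal_coefficient(rewards, gamma=1.0):
    """Assumed focal-style weight (1 - success_rate) ** gamma for binary rewards."""
    success_rate = sum(rewards) / len(rewards)
    return (1.0 - success_rate) ** gamma

def scaled_advantages(rewards, gamma=1.0):
    """Advantages multiplied by the group's difficulty-aware coefficient."""
    coeff = focal_coefficient(rewards, gamma)
    return [coeff * a for a in grpo_advantages(rewards)]

easy = [1, 1, 1, 1, 1, 1, 1, 0]  # 7 of 8 rollouts correct
hard = [1, 0, 0, 0, 0, 0, 0, 0]  # 1 of 8 rollouts correct
w_easy = focal_coefficient(easy)
w_hard = focal_coefficient(hard)
```

Under this assumed form, a prompt the policy already solves most of the time contributes a much smaller update than one where a correct rollout is rare, without changing the group size.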

2602.06508 2026-05-26 cs.RO

World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy

Xiaokang Liu, Zechen Bai, Hai Ci, Kevin Yuchen Ma, Mike Zheng Shou

Affiliations: Show Lab, National University of Singapore

AI summary: Proposes the World-VLA-Loop framework, in which a state-aware video world model jointly predicts future frames and binary rewards, and a co-evolving paradigm iteratively refines the VLA policy, reducing reliance on real-environment interaction.

Comments: 16 pages, 9 figures

Abstract

Reinforcement learning (RL) can refine Vision-Language-Action (VLA) policies beyond behavior cloning, but real-world RL remains expensive due to extensive rollouts, resets, supervision, and safety risks. Action-conditioned video world models offer an option to train in virtual environments, yet they exhibit imprecise action following, particularly on subtle near-success failures. They also lack native reward signals for RL, and computing rewards from inaccurate visual predictions remains unreliable. We introduce World-VLA-Loop, structured around two foundational designs and a higher-level co-evolving paradigm. We first curate SANS, deliberately mixing successful and near-success trajectories to improve action-outcome alignment. Then, we train a state-aware video world model that jointly predicts future frames and binary rewards from diffusion latents. It couples reward estimation to the generator rather than relegating it to a separate module, which in turn benefits visual prediction. Because VLA behavior shifts during RL, a fixed simulator can become misaligned with the updated policy; World-VLA-Loop therefore closes the loop by using the refined world model for iterative VLA post-training while feeding rollouts from each improved policy back to augment and fine-tune the world model. Across simulation and real-robot experiments, World-VLA-Loop substantially improves VLA performance while reducing reliance on costly physical interaction.

2602.05052 2026-05-26 cs.LG

EFSI-DETR: Efficient Frequency-Semantic Integration for Real-Time Small Object Detection in UAV Imagery

Yu Xia, Chang Liu, Tianqi Xiang, Zhigang Tu

Affiliations: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing; Wuhan University; Wuhan University Shenzhen Research Institute; School of Computer Science; School of Automation Science and Engineering; South China University of Technology

AI summary: Proposes the EFSI-DETR framework, which uses a Dynamic Frequency-Spatial Unified Synergy Network and an Efficient Semantic Feature Concentrator to achieve state-of-the-art real-time small object detection in UAV imagery.

Abstract

Real-time small object detection in Unmanned Aerial Vehicle (UAV) imagery remains challenging due to limited feature representation and ineffective multi-scale fusion. Existing methods underutilize frequency information and rely on static convolutional operations, which constrain the capacity to obtain rich feature representations and hinder the effective exploitation of deep semantic features. To address these issues, we propose EFSI-DETR, a novel detection framework that integrates efficient semantic feature enhancement with dynamic frequency-spatial guidance. EFSI-DETR comprises two main components: (1) a Dynamic Frequency-Spatial Unified Synergy Network (DyFusNet) that jointly exploits frequency and spatial cues for robust multi-scale feature fusion, and (2) an Efficient Semantic Feature Concentrator (ESFC) that enables deep semantic extraction with minimal computational cost. Furthermore, a Fine-grained Feature Retention (FFR) strategy incorporates spatially rich shallow features during fusion to preserve the fine-grained details that are crucial for small object detection in UAV imagery. Extensive experiments on the VisDrone and CODrone benchmarks demonstrate that EFSI-DETR achieves state-of-the-art performance with real-time efficiency, yielding improvements of 1.6% and 5.8% in AP and AP$_{s}$ on VisDrone, while reaching an inference speed of 188 FPS on a single RTX 4090 GPU.

2601.18135 2026-05-26 cs.CV

Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection

Jiahao Lyu, Minghua Zhao, Xuewen Huang, Yifei Chen, Shuangli Du, Jing Hu, Cheng Shi, Zhiyong Lv

Affiliations: Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology; School of Cyber Science and Engineering, Xi’an Jiaotong University

AI summary: Proposes FoGA, a lightweight model that uses forward consistency learning and gated context aggregation for efficient video anomaly detection on resource-limited devices, outperforming existing methods while running at up to 155 FPS.

Comments: It has been submitted to the KBS journal

Journal ref: Knowledge-Based Systems 2026

DOI: 10.1016/j.knosys.2026.116118

Abstract

As a crucial element of public security, video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems. However, most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices. Moreover, mainstream prediction-based VAD detects anomalies using only single-frame future prediction errors, overlooking the richer constraints from longer-term temporal forward information. In this paper, we introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context Aggregation, contains about 2M parameters, and is tailored for potential edge devices. Specifically, we propose a U-Net-based method that performs feature extraction on consecutive frames to generate both immediate and forward predictions. Then, we introduce a gated context aggregation module into the skip connections to dynamically fuse encoder and decoder features at the same spatial scale. Finally, the model is jointly optimized with a novel forward consistency loss, and a hybrid anomaly measurement strategy integrates errors from both immediate and forward frames for more accurate detection. Extensive experiments demonstrate the effectiveness of the proposed method, which substantially outperforms state-of-the-art competing methods while running at up to 155 FPS. Hence, FoGA achieves an excellent trade-off between detection performance and efficiency.

2601.14249 2026-05-26 cs.CL

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Yuming Yang, Mingyoung Lai, Wanxu Zhao, Xiaoran Fan, Zhiheng Xi, Mingqi Wu, Chiyue Huang, Jun Zhao, Haijun Lv, Jian Tong, Yunhua Zhou, Yicheng Zou, Qipeng Guo, Tao Gui, Qi Zhang, Xuanjing Huang

Affiliations: Fudan University; Shanghai AI Laboratory; University of Toronto; University of Sydney

AI summary: Proposes the Rank-Surprisal Ratio (RSR), a metric that combines alignment and informativeness to assess how suitable a reasoning trajectory is for a student model, and that significantly outperforms existing methods in trajectory selection and teacher selection.

Comments: Accepted to ACL 2026 (Main Conference). 31 pages. Project page: https://github.com/UmeanNever/RankSurprisalRatio

Abstract

Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, highlighting the importance of data-student suitability in distillation. Existing methods assess suitability primarily through student likelihood, favoring trajectories that align closely with the student model's current behavior but overlooking more informative ones. Addressing this, we propose Rank-Surprisal Ratio (RSR), a simple metric that captures both alignment and informativeness to assess the suitability of a reasoning trajectory. RSR is motivated by the observation that effective trajectories typically balance learning signal strength and behavioral alignment by combining low absolute probability with relatively high-ranked tokens under the student model. Concretely, RSR is defined as the ratio of a trajectory's average token-wise rank to its average negative log-likelihood, and is straightforward to compute and interpret. Across five student models and reasoning trajectories from 11 diverse teachers, RSR strongly correlates with post-training reasoning performance (average Spearman 0.86), consistently outperforming existing metrics. We further demonstrate its practical utility in both trajectory selection and teacher selection.
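
The definition given in the abstract (average token-wise rank divided by average negative log-likelihood) can be sketched directly. The function name, the input layout, the use of raw unclipped ranks, and the tie handling are assumptions; the project repository linked in the comments is the place to check the actual implementation.

```python
import math

def rank_surprisal_ratio(token_ids, student_probs):
    """Average token rank divided by average negative log-likelihood.

    token_ids[t] is the trajectory token at step t; student_probs[t] is the
    student's next-token distribution at step t (a list over the vocabulary).
    Rank 1 means the token is the student's most likely choice; ties share
    the best rank (an assumed convention).
    """
    ranks, surprisals = [], []
    for tok, probs in zip(token_ids, student_probs):
        p = probs[tok]
        ranks.append(1 + sum(1 for q in probs if q > p))
        surprisals.append(-math.log(p))
    return (sum(ranks) / len(ranks)) / (sum(surprisals) / len(surprisals))

# Toy two-step trajectory over a three-token vocabulary.
probs = [[0.5, 0.25, 0.25], [0.125, 0.75, 0.125]]
rsr = rank_surprisal_ratio([0, 1], probs)
```

Read this way, a trajectory whose tokens rank well under the student yet still carry substantial surprisal gets a small ratio, which is the balance between alignment and informativeness that the abstract describes.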

2601.05613 2026-05-26 cs.LG cs.AI

PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data across Nodes

Yiming Zhou, Jiahao Wang, Mingyue Cheng, Hao Wang, Defu Lian, Enhong Chen

Affiliations: University of Science and Technology of China

AI summary: Proposes PiXTime, a Transformer-based framework whose parameter-decoupling architecture (local personalized modules plus a globally shared backbone) handles heterogeneous time series, enabling forecasting on heterogeneous data in federated learning and achieving state-of-the-art performance on multiple benchmarks.

Abstract

While collaborative forecasting on distributed time series is highly desirable, directly pooling localized datasets is often impractical due to data sharing constraints. Federated learning offers a promising alternative, yet conventional federated learning algorithms require homogeneous model architectures, which are incompatible with the structural discrepancies, such as unaligned temporal resolutions and mismatched variable channels, commonly observed across decentralized nodes. To bridge this gap, we introduce PiXTime, a novel Transformer-based framework designed to natively accommodate and leverage structurally heterogeneous temporal data. At its core, PiXTime adopts a parameter-decoupling architecture, strategically partitioning the model into localized personalized modules and a globally aggregated shared backbone. Specifically, node-specific local modules act as dimensional adapters, projecting raw sequences of diverse lengths into a unified representation space. Concurrently, a globally synchronized VE Table injects consistent categorical identities into the feature space, allowing the shared backbone to collaboratively learn and generalize representations across inconsistent variable distributions. Comprehensive evaluations on multiple benchmarks demonstrate that PiXTime achieves state-of-the-art performance in heterogeneous federated environments, while maintaining robust superiority in standard homogeneous and centralized forecasting settings.

2601.05483 2026-05-26 cs.AI

MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis

Zixuan Xiao, Jun Ma, Siwei Zhang

Affiliations: Department of Urban Planning and Design, The University of Hong Kong

AI summary: Proposes MMUEChange, a multi-modal agent framework that flexibly integrates heterogeneous urban data and performs cross-modal alignment through a modular toolkit and a Modality Controller, improving task success rate by 46.7% across three urban case studies and effectively mitigating hallucination.

Journal ref: Applied Soft Computing 190 (2026) 114576

DOI: 10.1016/j.asoc.2026.114576

Abstract

Understanding urban environment change is essential for sustainable development. However, current approaches, particularly remote sensing change detection, often rely on rigid, single-modal analysis. To overcome these limitations, we propose MMUEChange, a multi-modal agent framework that flexibly integrates heterogeneous urban data via a modular toolkit and a core module, the Modality Controller, which handles cross- and intra-modal alignment, enabling robust analysis of complex urban change scenarios. Case studies include: a shift toward small, community-focused parks in New York, reflecting local green space efforts; the spread of concentrated water pollution across districts in Hong Kong, pointing to the need for coordinated water management; and a notable decline in open dumpsites in Shenzhen, with contrasting links between nighttime economic activity and waste types, indicating differing urban pressures behind domestic and construction waste. Compared to the best-performing baseline, the MMUEChange agent achieves a 46.7% improvement in task success rate and effectively mitigates hallucination, demonstrating its capacity to support complex urban change analysis tasks with real-world policy implications.

2601.03790 2026-05-26 cs.CL cs.AI

NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning

Zhongtao Miao, Kaiyan Zhao, Masaaki Nagata, Yoshimasa Tsuruoka

Affiliations: The University of Tokyo; NTT Communication Science Laboratories, NTT, Inc.

AI summary: Proposes the NeoAMT framework, which trains a translation agent with a Wiktionary-based search tool and reinforcement learning to improve translation quality for source sentences containing neologisms.

Comments: ACL 2026 Main. Fixed minor typos

Abstract

Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains underexplored compared with general machine translation (MT). In this paper, we propose an agentic framework, NeoAMT, for neologism-aware machine translation equipped with a Wiktionary-based search toolkit. Specifically, we first construct a dedicated dataset for neologism-aware machine translation and build a search toolkit grounded in Wiktionary. The dataset covers 16 languages and 75 translation directions in total, derived from approximately 10 million records of an English Wiktionary dump. The retrieval corpus of the search toolkit is also constructed from around 3 million cleaned records of the same dump. We then leverage the dataset and toolkit to train a translation agent via reinforcement learning (RL) and to evaluate the accuracy of neologism-aware machine translation. Furthermore, we propose an RL training framework featuring a novel reward design and an adaptive rollout generation strategy that exploits translation difficulty to further improve the translation quality of translation agents using our search toolkit.
