arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.19316 2026-05-20 cs.CL

A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation

Seonjeong Hwang, Jun Seo, Hyounghun Kim, Gary Geunbae Lee

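AI summary: This paper proposes MAFIG, a multi-agent framework in which multiple LLM agents and feature-specific evaluators collaborate to generate and iteratively revise items to satisfy feature constraints, yielding more stable difficulty control.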

Comments ACL 2026 Main Conference

Abstract

Recent studies in difficulty-controlled reading comprehension item generation have leveraged large language models (LLMs) to produce items by adjusting difficulty-related features. However, existing methods typically rely on a single-agent prompting approach, which often fails to consistently satisfy specified feature constraints, resulting in items that deviate from the target difficulty level. To address this limitation, we introduce MAFIG, a Multi-agent Framework for Feature-constrained Item Generation, where multiple LLM agents and feature-specific evaluators collaborate to generate and iteratively revise items based on intended constraints. Furthermore, to verify the efficacy of MAFIG in difficulty control, we propose a method for constructing a sequence of feature constraint sets that yield items with monotonically increasing difficulty. Experimental results demonstrate that MAFIG generates items that adhere to target constraints at a significantly higher rate than baselines, achieving robust difficulty control through the difficulty-calibrated constraint sequence.

2605.19314 2026-05-20 cs.RO cs.AI

ContextFlow: Hierarchical Task-State Alignment for Long-Horizon Embodied Agents

Shuhan Guo, Kun Zhang, Haifei Liu, Xingyu Gao, Yongqi Zhang, Yaqing Wang, Quanming Yao

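AI summary: This paper studies task-state misalignment in long-horizon embodied agents and proposes ContextFlow, a framework that aligns the task frontier by representing stages as explicit contracts, converting runtime observations into evidence packets, and applying scoped updates, improving the coherence and auditability of task execution.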

Abstract

Long-horizon embodied agents increasingly delegate navigation, search, approach, and manipulation to specialist executors. As these executors become stronger, the main bottleneck shifts from local skill execution to maintaining a coherent task frontier across planning, monitoring, memory, and execution. We study task-state misalignment, a task-level consistency failure in which the planner's active stage, runtime evidence, remembered context, and delegated executor no longer justify the same next-step decision. This failure can lead to unsupported handoffs, stage lock, executor-context mismatch, and unnecessary replanning. We propose ContextFlow, an inspectable alignment framework that represents stages as explicit contracts, converts runtime observations into evidence packets, and applies scoped updates including continue, refine, transfer, promote, and repair. ContextFlow keeps specialist executors responsible for local closed-loop control while making task-frontier alignment explicit and auditable. Experiments and demonstration traces on long-horizon embodied tasks illustrate how evidence-grounded scoped updates diagnose and mitigate recurring task-state failures.

2605.19311 2026-05-20 cs.LG eess.SP

An Objective Performance Evaluation of the LSTM Networks in Time Series Classification

Sooraj Sunil, Balakumar Balasingam

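AI summary: This paper proposes an evaluation framework comparing an LSTM classifier with a model-based expectation maximization (EM) classifier on binary time-series classification. The EM classifier performs strongly when the data conform to the assumed model class, whereas the LSTM classifier needs a larger separation in noise statistics to classify reliably and falls short of the reference classifier when the models differ only in measurement noise.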

Comments Accepted in 2026 29th International Conference on Information Fusion

Abstract

The rapid adoption of deep learning has increasingly led to data-driven models replacing classical model-based algorithms, even in domains governed by well-understood physical laws. While data-driven models, such as long short-term memory (LSTM) networks, have become a popular choice for time-series analysis, their performance relative to model-based approaches in structured environments is rarely evaluated objectively. This paper presents a performance evaluation framework comparing an LSTM classifier against a model-based expectation maximization (EM) classifier for binary time-series classification. The evaluation is conducted on two scalar linear Gaussian state space models differing only in their noise statistics, where the Kalman filter likelihood ratio test with true parameters serves as a reference for the best achievable classification performance. Through Monte Carlo simulations, the classifiers are evaluated across three axes: task difficulty, controlled by the separation in process or measurement noise between the two models; sequence length; and training dataset size. The results show that the EM classifier, which exploits the known model structure, performs strongly when the data conform to the assumed model class. The LSTM classifier requires a larger separation in noise statistics to achieve reliable classification, and its performance saturates below the reference classifier when the models differ only in measurement noise, regardless of sequence length or training dataset size.
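
The reference classifier mentioned in this abstract can be illustrated with a small sketch. It assumes a scalar model x_k = a*x_{k-1} + w_k, z_k = x_k + v_k with process-noise variance q and measurement-noise variance r, a known zero initial state, and equal priors. The function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random

def simulate(n, a, q, r, rng):
    """Draw n measurements from x_k = a*x_{k-1} + w_k, z_k = x_k + v_k (x_0 = 0)."""
    x, zs = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        zs.append(x + rng.gauss(0.0, math.sqrt(r)))
    return zs

def kalman_loglik(zs, a, q, r, x0=0.0, p0=0.0):
    """Log-likelihood of zs under (a, q, r), summed over Kalman filter innovations."""
    x, p, ll = x0, p0, 0.0
    for z in zs:
        # Prediction step
        x_pred = a * x
        p_pred = a * a * p + q
        # Innovation and its variance
        nu = z - x_pred
        s = p_pred + r
        ll += -0.5 * (math.log(2.0 * math.pi * s) + nu * nu / s)
        # Measurement update
        k = p_pred / s
        x = x_pred + k * nu
        p = (1.0 - k) * p_pred
    return ll

def classify(zs, model0, model1):
    """Likelihood ratio test with equal priors: 1 if model1 is more likely, else 0."""
    return int(kalman_loglik(zs, *model1) > kalman_loglik(zs, *model0))
```

Calling `classify(zs, (0.9, 1.0, 1.0), (0.9, 1.0, 4.0))` on a simulated sequence compares two hypothetical models that differ only in measurement noise, which is the harder of the two regimes the abstract describes.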

2605.19307 2026-05-20 cs.CV

MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems

Quanxing Xu, Yuhao Tian, Ling Zhou, Xian Zhong, Xiaohua Huang, Rubing Huang, Chia-Wen Lin

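AI summary: This paper proposes MetaRA, a metamorphic-testing framework for assessing the robustness of MLLM-based visual question answering systems. By generating controlled variations of image-question inputs, it exposes model weaknesses in handling linguistic perturbations, reliance on visual cues, and multimodal reasoning.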

Abstract

Visual Question Answering (VQA), as a representative multimodal task, serves as a key benchmark for evaluating the reasoning capabilities of Multimodal Large Language Models (MLLMs). However, existing evaluations largely rely on static datasets and accuracy-based metrics, which fail to capture robustness, consistency, and generalization. Inspired by Metamorphic Testing (MT), we propose Metamorphic Robustness Assessment (MetaRA), a testing framework that employs Metamorphic Relations (MRs) to systematically probe vulnerabilities in MLLM-based VQA systems. MetaRA generates controlled variations of image-question inputs based on specific MRs and evaluates models across diverse conditions. Applying MetaRA to multiple MLLM-based VQA models across different tasks reveals nuanced failure patterns, including sensitivity to linguistic perturbations, over-reliance on superficial visual cues, and deeper weaknesses in multimodal reasoning. Experimental results demonstrate that MetaRA provides richer diagnostic insights than conventional accuracy metrics, exposing failure modes that remain hidden under standard benchmarks. Overall, this work highlights the need for systematic robustness evaluation in VQA and positions metamorphic assessment as a scalable, model-agnostic approach toward trustworthy multimodal AI.
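
The general idea of a metamorphic relation can be shown with a toy sketch: a meaning-preserving rewrite of the question should leave the answer unchanged, so no ground-truth label is needed to flag a failure. The rewrites, the stub models, and all names below are hypothetical illustrations and are not MetaRA's actual relations.

```python
from typing import Callable, List, Tuple

# A VQA system is modeled as a function (image, question) -> answer.
VQAModel = Callable[[str, str], str]

def paraphrase_followups(question: str) -> List[str]:
    """Hypothetical MR: rewrites that should not change the meaning of the question."""
    return [
        question.lower(),
        "Please answer: " + question,
        question.rstrip("?") + " ?",
    ]

def check_invariance(model: VQAModel, image: str, question: str) -> List[Tuple[str, str, str]]:
    """Return (follow-up question, source answer, follow-up answer) for each violation."""
    source_answer = model(image, question)
    violations = []
    for follow_up in paraphrase_followups(question):
        answer = model(image, follow_up)
        if answer != source_answer:
            violations.append((follow_up, source_answer, answer))
    return violations

def brittle_model(image: str, question: str) -> str:
    # Toy stand-in for an MLLM that depends on the exact surface form.
    return "red" if question.startswith("What color") else "unknown"

def robust_model(image: str, question: str) -> str:
    # Toy stand-in that ignores casing and added prefixes.
    return "red" if "color" in question.lower() else "unknown"
```

Running `check_invariance` on both stubs with the same image and question separates them by consistency alone, which is the kind of signal an accuracy score on a static dataset does not provide.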

2605.19306 2026-05-20 cs.LG math.OC

A Two-Phase Adaptive Balanced Penalty Method for Controllable Pareto Front Learning under Split Feasibility Conditions

Nguyen Viet Hoang, Dung D. Le, Tran Ngoc Thang

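AI summary: This paper proposes the Adaptive Balanced Penalty algorithm for training hypernetworks for controllable Pareto front learning under split feasibility conditions. It reformulates the constrained Pareto problem as a bi-level scalarized split problem, blends gradient components through an adaptive indicator driven by a computable lower bound, and proves full-sequence convergence under standard convexity assumptions.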

Comments 36 pages, 18 figures, 12 tables. Submitted to Neural Networks (Elsevier)

Abstract

We address the open problem of training hypernetworks for Controllable Pareto Front Learning (CPFL) under split feasibility conditions with rigorous theoretical guarantees. We reformulate the constrained Pareto problem as a Bi-Level Scalarized Split Problem (BSSP) and propose the Adaptive Balanced Penalty (ABP) algorithm, whose three gradient components (optimality, set feasibility, and image feasibility) are blended through an adaptive indicator driven by a computable lower bound. Using a novel convex surrogate technique, we prove full-sequence convergence under standard convexity and Robbins-Monro step-size assumptions. The ABP penalty structure is then translated into a two-phase, feasibility-first training strategy for Hyper-MLP and HyperTrans architectures (ABP-HyperNet). To evaluate constrained CPFL, we introduce the Expected Feasible Hypervolume (EFHV), which jointly captures solution quality and constraint satisfaction. Experiments on five multi-objective benchmarks validate the ABP solver against ground truth, while three multi-task learning datasets demonstrate that ABP-HyperNet achieves up to 2.3x higher EFHV than unconstrained baselines by raising feasibility from 36-49% to 87-100%.

2605.19304 2026-05-20 cs.CV cs.GR

MMGS: 10× Compressed 3DGS through Optimal Transport Aggregation based on Multi-view Ranking

Beizhen Zhao, Sicheng Yu, Ziran Yin, Dongxu Shen, Hao Wang

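AI summary: This paper proposes an optimal-transport aggregation method based on multi-view ranking that casts Gaussian optimization as a global geometric distribution matching problem, achieving 10x compression of 3DGS and 10x faster training while preserving high-quality rendering.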

Comments 19 pages

Abstract

While 3D Gaussian Splatting (3DGS) has revolutionized 3D reconstruction, it suffers from significant overhead due to massive redundant primitives. Existing compression methods typically rely on local sampling or fixed pruning thresholds, which often struggle to balance redundancy reduction with high-fidelity rendering. To address this, we propose a novel framework that formulates Gaussian optimization as a global geometric distribution matching problem. Specifically, our approach integrates three components: (1) we introduce a multi-view 3D Gaussian contribution ranking mechanism that filters primitives using geometric consistency instead of local heuristics; (2) we propose a global Optimal Transport (OT)-based aggregation algorithm that merges redundant primitives while preserving the underlying geometry; and (3) we design an OT-based densification operator that maintains the Gaussian's distributional properties for stable optimization. Our approach achieves state-of-the-art rendering quality with only 10% of the primitives and 10× faster training than vanilla 3DGS.

2605.19301 2026-05-20 cs.CV

iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models

Xuezhi Cui, Dongbo Zhou, Wang Guo, Zeyuan Wang, Ziyu Li, Gaozhi Zhou, Xian Li, Ling Zhao, Wentao Yang, Chao Tao, Haifeng Li

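AI summary: This paper proposes iGSP, a framework for efficient continual learning of vision-language models via implicit gradient subspace projection. It addresses the shortcomings of prior methods in parameter efficiency and cross-task alignment consistency, and substantially improves training efficiency and knowledge reuse.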

Abstract

Vision-Language Models require efficient adaptation to continually emerging downstream tasks. While Parameter-Efficient Fine-Tuning mitigates catastrophic forgetting, assigning isolated modules per task leads to parameter explosion. Conversely, recent similarity-driven sharing mechanisms falsely equate superficial visual similarity with underlying alignment consistency. This fundamental mismatch triggers severe negative transfer between visually similar but logically distinct tasks and fails to exploit alignment reuse across visually diverse ones. We argue that alignment sharing is fundamentally a geometric problem of overlapping optimization trajectories within shared low-rank subspaces. Grounded in this insight, we propose iGSP, a novel framework that achieves efficient adaptation via implicit gradient subspace projection. Leveraging the early convergence of MoE routers to establish the subspace basis, iGSP bifurcates the adaptation process into two phases. First, the Subspace Identification phase introduces candidate experts via basis pre-expansion, applies a novel subspace-constrained regularization to implicitly project new task gradients onto the historical subspace, and precisely prunes redundant dimensions by treating routing probabilities as gradient flow indicators, ultimately to maximize knowledge reuse. Second, the Orthogonal Subspace Fine-Tuning phase fixes this structural basis and removes the regularization to rapidly fit the task-specific residual loss. Extensive experiments on the MTIL benchmark demonstrate that iGSP achieves state-of-the-art accuracy while significantly improving training efficiency, reducing the average trainable parameters by 42.7% compared to current SOTA methods, and decreasing the final total parameters by 86.9% relative to counterparts. The source code is available at https://github.com/GeoX-Lab/iGSP.

2605.19299 2026-05-20 cs.LG

Cross-Paradigm Knowledge Distillation: A Comprehensive Study of Bidirectional Transfer Between Random Forests and Deep Neural Networks for Big Data Applications

Mahdi Naser Moghadasi

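AI summary: This paper studies bidirectional knowledge distillation between random forests and deep neural networks. It proposes new methods and, across 144 experiments, shows that bidirectional RF-DL distillation is competitive on classification and regression tasks while offering the complementary benefits of interpretability and expressiveness.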

Abstract

The exponential growth of big data has intensified the need for efficient and interpretable machine learning models that can handle diverse data characteristics while maintaining computational efficiency. Knowledge distillation has primarily focused on neural network-to-neural network transfer, leaving cross-paradigm knowledge transfer largely unexplored. This paper presents the first comprehensive study of bidirectional knowledge distillation between Random Forests (RF) and Deep Neural Networks (DNN), addressing critical gaps in ensemble learning and model compression for big data applications. We propose novel methodologies including progressive multi-stage distillation, multi-teacher ensemble distillation from diverse tree models, and uncertainty-aware cross-paradigm transfer mechanisms. Through 144 comprehensive experiments across 6 diverse datasets encompassing classification and regression tasks, we demonstrate that bidirectional RF-DL distillation achieves competitive performance while providing complementary benefits: interpretability from tree models and expressiveness from neural networks. Our results show that multi-teacher ensemble distillation consistently outperforms traditional approaches, with NN-COMPACT achieving 98.13% classification accuracy and NN-WIDE reaching 92.6% R^2 score in regression tasks. The proposed framework enables deployment flexibility in big data environments, allowing optimal model selection based on computational constraints and interpretability requirements. This work establishes a new research direction in cross-paradigm knowledge transfer with significant implications for interpretable AI and scalable model deployment in resource-constrained big data systems.
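
One common way to distill tree ensembles into a neural student is to average the teachers' class-probability vectors into soft targets and mix a soft-target loss with the usual hard-label loss. The sketch below shows that generic recipe only. The function names and the mixing weight `alpha` are assumptions for illustration, and the paper's progressive and uncertainty-aware mechanisms are not reproduced here.

```python
import math
from typing import List, Sequence

def ensemble_soft_targets(teacher_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several teacher models."""
    n = len(teacher_probs)
    return [sum(column) / n for column in zip(*teacher_probs)]

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the student's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits: Sequence[float],
                      soft_targets: Sequence[float],
                      hard_label: int,
                      alpha: float = 0.5) -> float:
    """alpha * CE(soft targets, student) + (1 - alpha) * CE(hard label, student)."""
    probs = softmax(student_logits)
    soft_ce = -sum(t * math.log(p) for t, p in zip(soft_targets, probs))
    hard_ce = -math.log(probs[hard_label])
    return alpha * soft_ce + (1.0 - alpha) * hard_ce
```

For a random forest teacher, each probability vector could be the fraction of trees voting for each class, so the soft targets carry the ensemble's uncertainty rather than a single hard decision.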

2605.19289 2026-05-20 cs.CV

What Makes Synthetic Data Effective in Image Segmentation

Jinjin Zhang, Xiefan Guo, Yizhou Jin, Nan Zhou, Di Huang

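AI summary: This paper studies what makes synthetic data effective for image segmentation. Analyzing synthetic images from state-of-the-art diffusion models, it finds that dense scene composition and fine instance fidelity are the key factors, and proposes SENSE, a unified framework for improving segmentation performance.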

Comments Accepted to ICML 2026

Abstract

Driven by rapid advances in large-scale generative models, synthetic data has emerged as a promising solution for visual understanding. While modern diffusion models achieve remarkable photorealistic image synthesis, their potential in complex visual segmentation tasks remains underexplored. In this work, we conduct a systematic analysis of synthetic images from state-of-the-art diffusion models to uncover the factors governing their utility. In particular, synthetic images characterized by dense scene composition and fine instance fidelity demonstrate distinctive benefits, yielding significantly more discriminative spatial representations. Building on these insights, we propose SENSE, a unified framework that leverages flexible and scalable synthetic data to substantially enhance segmentation performance. Notably, SENSE is model-agnostic, compatible with diverse architectures (e.g., DPT and Mask2Former), and scales effectively across models with varying parameter capacities. Extensive experiments on Cityscapes, COCO, and ADE20K validate the effectiveness and generalization capability of our approach. Code is available at https://github.com/zhang0jhon/SENSE.

2605.19285 2026-05-20 cs.CL cs.AI cs.CY

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

Bing Wang, Rui Miao, Ximing Li, Chen Shen, Shaotian Yan, Changchun Li, Kaiyuan Liu, Xiaosong Yuan, Jieping Ye

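AI summary: This paper studies how to fine-tune large language models (LLMs) for explainable misinformation detection. It proposes LONSREX, a new data synthesis pipeline that locates necessary and sufficient rationales, addressing the insufficient and redundant rationales that coarse-grained labels and over-verification behavior cause in existing pipelines.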

Comments Accepted by KDD 2026. 12 pages, 8 figures. Code: https://github.com/wangbing1416/LONSREX

Abstract

The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.

2605.19284 2026-05-20 cs.CL cs.LG

Language models struggle with compartmentalization

Thomas Vincent Howe, David Wingate

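AI summary: This study examines compartmentalization in large language models: when a unified concept is presented in different ways, models fail to share statistical strength across those presentations, wasting model capacity and reducing sample efficiency.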

Comments 9 pages, 8 figures, plus 9 pages of appendices. Submitted to NeurIPS 2026. Code: https://github.com/vinhowe/compartmentalization. Eval data: https://doi.org/10.5281/zenodo.20171021

Abstract

In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be expressed in both Python and Haskell; we can express propositions in both formal and natural language. We show that LLMs can exhibit compartmentalization, where they fail to identify and share statistical strength between distinct presentations of unified concepts. In the worst case, LLMs simply learn parallel internal representations of each presentation of the concept, saturating model capacity with redundancies and decreasing sample efficiency with the number of such presentations. We also demonstrate that synthetic parallel data can fail to improve this despite being easily learned itself. Under this framework, we find that, for small models, early multilingual learning is nearly entirely compartmentalized. Finally, all interventions that we study exhibit a phase transition in which their effectiveness depends on the number of distinct presentations, suggesting that the language modeling objective may only inconsistently unify representations.

2605.19283 2026-05-20 cs.LG cs.AI stat.ML

EviTrack: Selection over Sampling for Delayed Disambiguation

Omer Haq

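AI summary: This paper proposes EviTrack, a framework that performs selection over latent trajectories rather than marginal states for more effective sequential inference under delayed disambiguation. Its core is evidence- and likelihood-ratio-based selection among trajectory hypotheses, which delays commitment until the data support it and outperforms sampling-based baselines.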

Comments https://github.com/Haq94/EviTrack

Abstract

Sequential prediction is challenging in regimes of delayed disambiguation, where early observations are ambiguous and multiple latent explanations remain plausible until sufficient evidence accumulates. Standard approaches based on marginal inference struggle in this setting, either collapsing uncertainty prematurely or failing to recover once informative evidence arrives. We introduce EviTrack, a test-time inference framework that operates over latent trajectories rather than marginal states. EviTrack maintains a set of competing trajectory hypotheses and applies evidence- and likelihood-ratio-based selection to delay commitment until supported by data, drawing inspiration from hypothesis management in multiple hypothesis tracking and track-before-detect. To evaluate this setting, we construct a controlled synthetic benchmark with known latent ground truth that explicitly exhibits delayed disambiguation. At matched inference budget, EviTrack substantially outperforms sampling-based baselines, achieving faster post-disambiguation recovery. These results show that, in delayed disambiguation regimes, moderate trajectory-level selection is more effective than increasing sampling coverage, highlighting selection over sampling as a key principle for reliable sequential inference.
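
Delaying commitment with a likelihood-ratio rule can be sketched in a few lines: keep a cumulative log-likelihood per trajectory hypothesis and commit only when the best one leads the runner-up by a margin. This is a simplified illustration under assumed unit-variance Gaussian observations. The names and the `threshold` value are hypothetical, and it is not EviTrack's actual hypothesis management.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

def gaussian_logpdf(x: float, mean: float, std: float = 1.0) -> float:
    """Log density of a Gaussian observation."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2

def select_with_delay(observations: Sequence[float],
                      hypotheses: Dict[str, Callable[[int], float]],
                      threshold: float = 3.0) -> Tuple[Optional[str], Optional[int]]:
    """Each hypothesis maps a time step to the mean it predicts for that step.

    Returns (name, step) once the best hypothesis leads the second best by at
    least `threshold` in cumulative log-likelihood, or (None, None) otherwise.
    """
    scores = {name: 0.0 for name in hypotheses}
    for t, obs in enumerate(observations):
        for name, predict in hypotheses.items():
            scores[name] += gaussian_logpdf(obs, predict(t))
        ranked = sorted(scores.values(), reverse=True)
        if len(ranked) > 1 and ranked[0] - ranked[1] >= threshold:
            return max(scores, key=scores.get), t
    return None, None
```

When two hypotheses predict identical means for the first few steps and diverge afterwards, the rule stays uncommitted through the ambiguous prefix and commits only once the diverging observations arrive.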

2605.19282 2026-05-20 cs.LG

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

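AI summary: This paper studies the limitations of the Muon optimizer beyond pretraining and proposes Pion, which uses a high-pass NS iteration to improve performance on VLA and RLVR tasks.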

Abstract

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
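
The whitening effect described in this abstract can be seen on singular values alone. The classical cubic Newton-Schulz step X <- 1.5*X - 0.5*X*X^T*X keeps the singular vectors of X and maps each singular value s to 1.5*s - 0.5*s**3, so iterating that scalar map shows where each direction ends up. The sketch below uses this textbook cubic as a stand-in: optimizer implementations may use a different tuned polynomial, the example values are made up, and Pion's high-pass variant is not reproduced.

```python
def newton_schulz_scalar(s: float, steps: int) -> float:
    """Iterate the cubic Newton-Schulz map s -> 1.5*s - 0.5*s**3 on one singular value."""
    for _ in range(steps):
        s = 1.5 * s - 0.5 * s ** 3
    return s

# Hypothetical singular values of a normalized momentum matrix; the last entry
# stands in for a weak, noisy tail direction.
singular_values = [1.0, 0.3, 0.01]
whitened = [newton_schulz_scalar(s, 30) for s in singular_values]
```

After enough steps every entry of `whitened` is essentially 1, including the one that started at 0.01. That is the uniform whitening the abstract refers to, and it is why a tail direction that carries mostly noise ends up with the same weight as a dominant one.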

2605.19279 2026-05-20 cs.CV

FPED: A Functional-Network Prior-Guided Mixture-of-Experts Framework for Interpretable Brain Decoding

Yudan Ren, Pengcheng Shi, Zihan Ma, Xiaowei He, Xiao Li

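AI summary: This paper proposes FPED, a framework that models different functional brain networks as experts and uses adaptive routing to capture their complementary contributions to visual semantic understanding, enabling interpretable brain decoding.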

Comments 15 pages, 4 figures

Abstract

Visual image reconstruction from functional Magnetic Resonance Imaging (fMRI) is a fundamental task in brain decoding, providing a crucial pathway for understanding human perceptual mechanisms and developing advanced brain-computer interfaces (BCIs). However, most current methods simply flatten fMRI signals from localized visual cortices into one-dimensional (1D) vectors, mapping them directly into latent spaces such as that of Contrastive Language-Image Pre-training (CLIP). This paradigm not only disrupts the inherent network topology of the brain (leading to limited neuroscientific interpretability) but also overlooks the synergistic contributions of other distributed functional networks in processing high-level visual semantics. To address these limitations, we propose FPED, a Functional-Network Prior-Guided Mixture of Experts (MoE) framework for interpretable brain decoding. FPED explicitly models different functional brain networks as specialized experts and employs adaptive routing to capture their complementary contributions to visual semantic understanding. Unlike conventional homogeneous decoding paradigms, our framework incorporates neurobiologically grounded priors to enable structured and interpretable network-level representation learning. Experimental results demonstrate that FPED achieves highly competitive semantic reconstruction performance with only 0.68B parameters. The learned routing dynamics reveal biologically meaningful correspondence between functional brain networks and modality-specific semantic processing, providing transparent neuroscientific interpretability. This suggests that brain network-aware expert modeling is a promising direction for bridging neural decoding and biologically inspired artificial intelligence.

2605.19274 2026-05-20 cs.CL

Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

Somnath Banerjee, Pranav Jha, Rima Hazra, Animesh Mukherjee

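AI summary: This paper studies the trade-off between plausibility and faithfulness in cross-lingual explanations. English-pivot explanations agree better with human rationales, but their evidence is less causally grounded in the model's prediction. English explanations are fluent, yet their comprehensiveness degrades by up to 5.7x relative to native-language conditions even as task accuracy remains stable, and they fail to preserve pragmatic cues in socially nuanced classification. The authors recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics, and treating English rationales as communication summaries rather than faithful decision traces.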

详情
AI中文摘要

多语言部署的LLM通常通过英语解释对非英语输入进行审计。我们评估了提取性解释(模型识别输入token跨度作为证据并生成理由)并发现存在系统性的权衡:英语枢纽解释在与人类理由的一致性上表现更好,但其证据在模型预测中的因果基础较弱,这通过全面性和充分性来衡量。在三个任务、五种语言和两种多语言LLM家族中,我们发现英语解释经常产生流畅但松散锚定的理由,其全面性相对于原生语言条件下降高达5.7倍——即使在不同设置中任务准确性保持稳定。对于社会细微差别分类,英语枢纽也未能保留语义线索,从而降低忠实度和跨度一致性。我们建议在输入语言中审计解释,报告超越词法重叠的多方面忠实度指标,并将英语理由视为沟通摘要而非忠实的决策轨迹。

英文摘要

LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations (where the model identifies input token spans as evidence alongside a generated rationale) and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, we find that English explanations frequently produce fluent but loosely anchored rationales, with comprehensiveness degrading by up to 5.7x relative to native-language conditions, even as task accuracy remains stable across settings. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. We recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics beyond lexical overlap, and treating English rationales as communication summaries rather than faithful decision traces.
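
The abstract above measures faithfulness with comprehensiveness and sufficiency but does not spell out the formulas. A minimal sketch of one common erasure-based formulation follows: comprehensiveness is the drop in predicted probability when the rationale tokens are removed, and sufficiency is the drop when only the rationale tokens are kept. The function names and the `toy_predict` classifier are hypothetical stand-ins, not the paper's code, and the paper's exact definitions may differ.

```python
from typing import Callable, List, Set

Predict = Callable[[List[str], str], float]  # returns p(label | tokens)


def comprehensiveness(predict: Predict, tokens: List[str],
                      rationale: Set[int], label: str) -> float:
    """p(label | all tokens) - p(label | tokens with the rationale removed).

    A high value means the rationale was needed for the prediction.
    """
    remaining = [t for i, t in enumerate(tokens) if i not in rationale]
    return predict(tokens, label) - predict(remaining, label)


def sufficiency(predict: Predict, tokens: List[str],
                rationale: Set[int], label: str) -> float:
    """p(label | all tokens) - p(label | rationale tokens only).

    A value near zero means the rationale alone supports the prediction.
    """
    only = [t for i, t in enumerate(tokens) if i in rationale]
    return predict(tokens, label) - predict(only, label)


def toy_predict(tokens: List[str], label: str) -> float:
    # Hypothetical classifier: "positive" probability grows with the
    # number of occurrences of the cue word "great".
    hits = sum(1 for t in tokens if t == "great")
    p_positive = hits / (hits + 1)
    return p_positive if label == "positive" else 1.0 - p_positive


tokens = ["the", "food", "was", "great"]
# A rationale that points at the real cue is comprehensive and sufficient.
print(comprehensiveness(toy_predict, tokens, {3}, "positive"))
print(sufficiency(toy_predict, tokens, {3}, "positive"))
# A fluent but loosely anchored rationale scores poorly on both.
print(comprehensiveness(toy_predict, tokens, {0}, "positive"))
print(sufficiency(toy_predict, tokens, {0}, "positive"))
```

Under this formulation, a span can overlap well with a human rationale and still score low on comprehensiveness, which is the kind of gap the abstract reports for English-pivot explanations.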

2605.19270 2026-05-20 cs.CL

DECOR: Auditing LLM Deception via Information Manipulation Theory

DECOR:通过信息操纵理论审计大语言模型的欺骗行为

Linyue Cai, Samuel Yeh, Jwala Dhamala, Rahul Gupta, Sharon Li

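AI summary: This paper proposes DECOR, a framework grounded in Information Manipulation Theory that audits LLM deception at a fine-grained level, and reports strong performance on both single-turn and multi-turn deception detection.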

详情
AI中文摘要

大型语言模型可以通过微妙地操纵真实信息来欺骗,例如省略关键事实、转移焦点或模糊意义,使这种行为难以检测。现有的黑盒方法依赖于粗粒度判断,提供有限的可解释性,并未能确定哪些事实被扭曲以及如何扭曲。我们引入DECOR,一种基于信息操纵理论的多智能体框架,用于细粒度审计LLM响应中的战略性欺骗。DECOR将输入上下文分解为原子信息单元,并在四个操纵维度上对每个单元进行评分,生成可解释的操纵配置文件,并将其汇总为全局欺骗指数。我们全面评估了DECOR在单轮和多轮欺骗检测基准上,涵盖现实世界领域,并显示DECOR在两者上均达到最先进的性能,优于竞争基线。该框架在15种前沿模型上具有泛化能力,消融研究证实了每个关键设计组件的贡献。我们的发现表明,基于理论的细粒度信息操纵审计为LLM欺骗检测提供了一条有效且可解释的路径。

英文摘要

Large language models can deceive by subtly manipulating truthful information -- omitting key facts, shifting focus, or obscuring meaning -- making such behavior difficult to detect. Existing black-box methods rely on coarse-grained judgments, offering limited interpretability and failing to pinpoint which facts were distorted and how. We introduce DECOR, a multi-agent framework grounded in Information Manipulation Theory for fine-grained auditing of strategic deception in LLM responses. DECOR decomposes input contexts into atomic informational units and scores each unit against the response across four dimensions of manipulation, producing interpretable manipulation profiles that are aggregated into a global deception index. We comprehensively evaluate DECOR on both single-turn and multi-turn deception detection benchmarks spanning real-world domains, and show that DECOR achieves state-of-the-art performance on both, outperforming competitive baselines. The framework generalizes across 15 frontier models, and ablation studies confirm the contribution of each key design component. Our findings demonstrate that fine-grained, theory-grounded auditing of information manipulation offers an effective and interpretable path for LLM deception detection.

2605.19264 2026-05-20 cs.AI cs.MA

Swimming with Whales: Analysis of Power Imbalances in Stake-Weighted Governance

与鲸鱼游泳:对基于权益治理中权力不平衡的分析

Yuzhe Zhang, Manvir Schneider, Qin Wang, Davide Grossi

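AI summary: This paper studies power imbalances in stake-based voting mechanisms, using computational social choice to analyze the extent of power imbalances in stake-weighted voting, and offers both theoretical and empirical contributions.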

详情
AI中文摘要

基于权益的投票方法是权益证明(PoS)区块链的基本治理范式。这种范式已知容易产生权力扭曲:少数拥有大权益的用户可能完全控制决策,即使他们不拥有全部权益。我们通过计算社会选择的视角研究这一现象,关注在使用Penrose-Banzhaf权力指数量化权力的情况下,权益加权投票中的权力不平衡程度。我们的工作提供了分析和实证贡献。分析上,我们证明虽然权力与相对权益所有权之间的完美一致通常无法实现,但在特定条件下可以期望近似。实证上,利用现实世界链上治理系统(Project Catalyst)的数据,我们提供了当前权益加权治理系统中可能发生的权力不平衡的更细致理解。

英文摘要

Voting methods weighted by stakes are the fundamental governance paradigm in Proof-of-Stake (PoS) blockchains. Such a paradigm is known to be prone to power distortions: a few users possessing large stakes may completely control decision making, even without owning the totality of the stakes. We study this phenomenon through the lens of computational social choice, focusing on the extent of power imbalances in stake-weighted voting when power is quantified using the Penrose-Banzhaf power index. Our work presents both analytical and empirical contributions. Analytically, we demonstrate that while a perfect alignment between power and relative stake ownership is generally unattainable, it can be approximated in expectation under specific conditions. Empirically, using data from a real-world on-chain governance system (Project Catalyst), we provide a more fine-grained understanding of the power imbalances that are likely to occur in current stake-weighted governance systems.
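
To make the gap between stake and power concrete, the sketch below computes the absolute Penrose-Banzhaf index by brute force: for each voter, the fraction of coalitions of the other voters that the voter turns from losing to winning. The strict-majority quota and the stake values are assumptions chosen for illustration; the abstract does not state which voting rule the paper analyzes.

```python
from itertools import combinations
from typing import List


def banzhaf(weights: List[int], quota: int) -> List[float]:
    """Absolute Penrose-Banzhaf index of each voter in a weighted voting game.

    A coalition wins when its total weight reaches `quota`. Voter i is a
    swing for a coalition S of the other voters when S loses but S + {i} wins.
    Enumeration is exponential in the number of voters, so this only suits
    small examples.
    """
    n = len(weights)
    power = []
    for i in range(n):
        others = [w for j, w in enumerate(weights) if j != i]
        swings = 0
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                total = sum(coalition)
                if total < quota <= total + weights[i]:
                    swings += 1
        power.append(swings / 2 ** (n - 1))
    return power


# Stakes of 50/30/20 with a strict majority of 100: the two smaller
# holders end up with identical power despite different stakes.
print(banzhaf([50, 30, 20], quota=51))  # → [0.75, 0.25, 0.25]

# A "whale" with 60% of the stake holds all of the power.
print(banzhaf([60, 20, 20], quota=51))  # → [1.0, 0.0, 0.0]
```

The second case matches the abstract's observation that a large stakeholder can fully control decisions without owning all of the stake.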

2605.19260 2026-05-20 cs.AI cs.CV cs.MA

AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

AQuaUI: 用于GUI代理的视觉令牌减少方法基于自适应四叉树

Yuankai Li, Tinghui Zhu, Ha Min Son, Zhe Zhao, Xin Liu, Muhao Chen

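AI summary: This paper proposes AQuaUI, a training-free, inference-time visual token reduction method for GUI agent models. It exploits the non-uniform information density of screenshots, preserves token positions through an adaptive quadtree structure to keep position encoding consistent, and uses a conditional quadtree algorithm to improve temporal consistency across multi-step GUI interactions. Experiments show improved accuracy-efficiency trade-offs.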

详情
AI中文摘要

大型多模态模型(LMMs)最近已作为GUI代理模型的有希望的骨干出现,其中在每个迭代步骤中将高分辨率GUI截图引入提示中。然而,这些截图表现出高度非均匀的空间信息密度:大区域可能携带很少的信息且视觉上同质,而关键文本和图标可能需要高视觉保真度。现有方法要么需要额外训练,要么依赖于基于注意力的令牌压缩,忽略了GUI截图的结构布局和空间冗余。为填补这一空白,本文提出了AQuaUI,一种用于GUI代理模型的无训练推理时令牌减少方法,利用截图中的非均匀信息密度。AQuaUI在每个截图输入上构建一个自适应四叉树,并在四叉树的每个叶子节点保留一个代表性的合并令牌。AQuaUI在整个管道中保持保留令牌的空间位置,以确保所有位置编码阶段保持一致。为进一步提高多步骤GUI交互中的时间一致性,我们提出了一种条件四叉树算法,利用单个请求内连续截图之间的连续性。具体而言,它利用先前的四叉树作为参考来细化当前四叉树,帮助在静态或轻微移动的GUI状态下保留细粒度区域。我们在最先进的GUI代理模型上实现了AQuaUI,并在标准的地面和导航基准上进行了实验。AQuaUI在准确性和效率之间始终优于先前的基线。值得注意的是,在GUI-Owl-1.5-32B-Instruct上,AQuaUI实现了高达13.22%的速度提升和29.52%的更少视觉令牌,同时保留了99.06%的完整令牌性能,表明可以在不重新训练的情况下利用GUI截图的空间冗余。

英文摘要

Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iteration step. However, these screenshots exhibit highly non-uniform spatial information density: large regions may carry little information and are visually homogeneous, while key text and icons may require high visual fidelity. Existing approaches to this problem either require additional training or rely on attention-based token compression, ignoring the structured layout and spatial redundancy of GUI screenshots. To fill the gap, this paper proposes AQuaUI, a training-free inference-time token reduction method for GUI agent models that utilizes the non-uniform information density in screenshots. AQuaUI constructs an adaptive quadtree on each screenshot input and keeps one representative merged token per leaf of the quadtree. AQuaUI preserves the spatial positions of retained tokens throughout the pipeline to ensure that all position-encoding stages remain consistent. To further improve temporal consistency across multi-step GUI interactions, we propose a conditional quadtree algorithm that leverages the continuity between consecutive screenshots within a single request. Specifically, it refines the current quadtree using previous quadtrees as references, helping preserve fine-grained regions across static or mildly shifted GUI states. We implement AQuaUI on state-of-the-art GUI agent models and conduct experiments on standard grounding and navigational benchmarks. AQuaUI consistently shows improved accuracy-efficiency trade-offs over prior baselines. Notably, on GUI-Owl-1.5-32B-Instruct, AQuaUI achieves up to 13.22% speedup and 29.52% fewer visual tokens while retaining 99.06% of full-token performance, suggesting that the spatial redundancy of GUI screenshots can be exploited at inference without retraining.
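
The adaptive quadtree idea can be sketched on a small grid of scalar patch values. In the sketch below, a square region is split into four quadrants while its values vary by more than a threshold, and each leaf keeps its top-left position plus one merged value. The split criterion (value range), the merge rule (mean), and the function name are assumptions for illustration; the abstract does not specify how AQuaUI decides to split or how it merges tokens.

```python
from typing import List, Tuple

Leaf = Tuple[int, int, int, float]  # (x, y, size, merged_value)


def build_quadtree(grid: List[List[float]], x: int, y: int,
                   size: int, threshold: float) -> List[Leaf]:
    """Return the leaves covering the size-by-size square at (x, y).

    `size` is assumed to be a power of two. A homogeneous region becomes
    one leaf; a region with visible detail is split recursively.
    """
    cells = [grid[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(cells) - min(cells) <= threshold:
        return [(x, y, size, sum(cells) / len(cells))]
    half = size // 2
    leaves: List[Leaf] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += build_quadtree(grid, x + dx, y + dy, half, threshold)
    return leaves


# A 4x4 "screenshot" that is blank except for one detailed patch (an icon).
screen = [[0.0] * 4 for _ in range(4)]
screen[0][0] = 9.0

leaves = build_quadtree(screen, 0, 0, 4, threshold=0.0)
print(len(leaves))  # → 7 leaves instead of 16 patches
for leaf in leaves:
    print(leaf)
```

Each leaf carries its own coordinates, which is one simple way to keep the spatial position of a retained token available to later position-encoding stages.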

2605.19258 2026-05-20 cs.LG cs.AI

ExECG: An Explainable AI Framework for ECG models

ExECG:用于ECG模型的可解释AI框架

Jong-Hwan Jang, Yong-yeon Jo

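AI summary: This paper proposes the ExECG framework to address the lack of explainability of ECG models in clinical use, providing reusable and reproducible ECG explainability through a three-stage pipeline.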

详情
AI中文摘要

深度学习已使ECG诊断模型在如心律失常分类和异常检测等任务中表现出强大的性能。然而,仅凭准确性不足以满足临床部署的需求,因为它无法解释为何产生特定的输出,限制了验证、错误分析和信任。尽管ECG XAI已被广泛研究并持续改进,但不同研究中的实际流程和报告规范差异较大,阻碍了重用和可复现性。为了解决这些问题,我们提出了ExECG,一个Python框架,提供三阶段流程:Wrapper标准化访问异构ECG格式和中间表示,Explainer统一各种XAI方法到共享的执行协议,Visualizer支持在统一界面内一致的跨方法比较。我们通过简洁的例子和两个案例研究展示了端到端的使用,强调了可互操作和可复现的ECG可解释性。

英文摘要

Deep learning has enabled ECG diagnostic models with strong performance in tasks such as arrhythmia classification and abnormality detection. However, accuracy alone is insufficient for clinical deployment because it does not explain why a specific output was produced, limiting justification, error analysis, and trust. Although ECG XAI has been extensively investigated and steadily improved, practical pipelines and reporting conventions vary across studies, hindering reuse and reproducibility. To address these issues, we present Explainable AI framework for ECG models (ExECG), a Python framework that provides a three-stage pipeline: Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, Explainer unifies diverse XAI methods under a shared execution protocol, and Visualizer supports consistent cross-method comparison within a unified interface. We demonstrate end-to-end usage with concise examples and two case studies, highlighting interoperable and reproducible ECG explainability.

2605.19256 2026-05-20 cs.CV

Distribution Matching Distillation without Fake Score Network

无需假评分网络的分布匹配蒸馏

Youngjoong Kim, Deokyeong Lee, Jaesik Park

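AI summary: This paper proposes fake-score-network-free Distribution Matching Distillation (FSF-DMD), which replaces the conventional fake-score network with a pseudo-velocity induced by the flow-map generator itself to achieve distribution-level correction, and validates it on ImageNet-1K.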

详情
AI中文摘要

分布匹配蒸馏(DMD)为少步生成提供了有效的分布级校正,但依赖辅助的假评分网络来跟踪生成分布的演变。近期工作将DMD式目标与流图生成器结合,以利用正向发散训练和反向发散校正。假评分估计器仍是一个额外的组件,具有内存和更新开销。在本工作中,我们研究当生成器本身具有流图结构时是否可以避免显式跟踪器。我们提出无需假评分网络的DMD(FSF-DMD),一种适用于流图生成器的DMD形式,其用生成器诱导的伪速度替代传统假评分估计器。关键观察是流图生成器的端点伪速度提供了一个可计算的假速度估计代理,使生成器本身能够提供反向发散信号。基于这一观察,我们推导出一个实用的目标,扩展了流图一致的反向模拟,并引入了自教师变体以从头开始训练。在ImageNet-1K 256×256实验中,FSF-DMD改进了流图基线,达到了流图初始化设置下低于列出的DMD2比较的FID,并在流图匹配初始化和从头开始训练时仍保持有效。

英文摘要

Distribution Matching Distillation (DMD) provides an effective distribution-level correction for few-step generation, while relying on an auxiliary fake-score network to track the evolving generative distribution. Recent work combines DMD-style objectives with flow-map generators to exploit both forward-divergence training and reverse-divergence correction. The fake-score estimator remains an additional component with memory and update overhead. In this work, we study whether this explicit tracker can be avoided when the generator itself has a flow-map structure. We propose Fake-Score-network-Free DMD (FSF-DMD), a DMD formulation for flow-map generators that replaces the auxiliary fake-score estimator with a generator-induced pseudo-velocity surrogate. The key observation is that the endpoint pseudo-velocity of a flow-map generator provides a tractable proxy for fake-velocity estimation, allowing the generator itself to supply the reverse-divergence signal. Building on this observation, we derive a practical objective, extend it with flow-map-consistent backward simulation, and introduce a self-teacher variant for training from scratch. In our ImageNet-1K $256 \times 256$ experiments, FSF-DMD improves flow-map baselines, reaches lower FID than the listed DMD2 comparisons in the flow-map-initialized setting, and remains effective under flow-matching initialization and training from scratch.

2605.19255 2026-05-20 cs.RO

Bilateral Teleoperation with Compliant 6-DOF Pose-and-Force Sensing

双通道远程操作与合规6自由度位姿和力感知

Yue Feng, Weicheng Huang, I-Ming Chen

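AI summary: This paper proposes a Cartesian bilateral teleoperation framework built on the hardware-agnostic WinGs Operating Studio (WOS) middleware, using Delta6, a low-cost compliant end-effector with 6-DOF pose-and-force sensing. The system tracks stably under delays up to 120±40 ms and 1% packet loss and matches the prescribed virtual stiffness in contact.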

Comments 8 pages, 16 figures, 2 tables. Preprint

详情
AI中文摘要

现有的双通道远程操作平台仍然依赖于昂贵的刚性六轴力/扭矩传感器、紧密耦合的主从硬件和千赫兹控制回路。我们提出了一种基于硬件无关的WinGs操作系统(WOS)中间件的笛卡尔双通道框架,其中低成本的合规6自由度位姿和力感知末端执行器Delta6被安装在两侧,使得每个机械臂行为如同一个末端执行器6自由度系列弹性执行器(SEA)。主控制器运行一个仅含阻尼的顺应回路,配以6-D双二次-notch滤波器;从控制器通过基于位置的外环实现刚度-阻尼阻抗,通过PID力到位姿的映射。三个时间尺度(硬件I/O、中速阻抗/顺应、低速远程操作消息)被显式解耦,使同一应用能够驱动异构机械臂。在Lite6/FR3测试平台上以150Hz运行时,系统在高达120±40ms延迟和1%丢包率下稳定跟踪,接触时匹配规定的虚拟刚度,并在被动式测试中表现出良好的累积能量特征。

英文摘要

Existing bilateral teleoperation platforms still rely on costly rigid six-axis force/torque sensors, tightly coupled leader-follower hardware, and kilohertz control loops. We present a Cartesian bilateral framework built on the hardware-agnostic WinGs Operating Studio (WOS) middleware, in which a low-cost compliant 6-DOF pose-and-force sensing end-effector, Delta6, is mounted on both sides so that each manipulator behaves as an end-effector 6-DOF series elastic actuator (SEA). The leader runs a damping-only admittance loop with a 6-D biquad notch filter; the follower realizes a stiffness-damping impedance through a position-based outer loop with a PID wrench-to-pose mapping. Three time scales (hardware I/O, mid-rate impedance/admittance, low-rate teleoperation messages) are explicitly decoupled, enabling the same application to drive heterogeneous arms. On a Lite6/FR3 testbed at 150 Hz, the system tracks stably under delays up to $120\pm40$ ms and 1% packet loss, matches the prescribed virtual stiffness in contact, and shows a favorable cumulative energy signature in passivity-style tests.
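
The leader-side loop above includes a biquad notch filter. A minimal single-channel sketch is given below, using the widely used Audio EQ Cookbook notch coefficients. The 15 Hz notch frequency and the Q value are hypothetical; only the 150 Hz rate comes from the abstract. Running one independent filter per axis is an assumed reading of "6-D", which the abstract does not define.

```python
import math


class BiquadNotch:
    """Second-order notch filter (Audio EQ Cookbook form), direct form I."""

    def __init__(self, f0: float, fs: float, q: float) -> None:
        w0 = 2.0 * math.pi * f0 / fs
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        # Coefficients normalized so the leading denominator term is 1.
        self.b0 = 1.0 / a0
        self.b1 = -2.0 * math.cos(w0) / a0
        self.b2 = 1.0 / a0
        self.a1 = -2.0 * math.cos(w0) / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x: float) -> float:
        """Filter one sample and update the internal state."""
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


FS = 150.0     # control rate reported in the abstract
F_NOTCH = 15.0  # hypothetical resonance to suppress

# One filter per axis of a 6-D signal (assumed layout: 3 forces, 3 torques).
filters = [BiquadNotch(F_NOTCH, FS, q=2.0) for _ in range(6)]

# A tone at the notch frequency is removed once the transient has decayed,
# while a constant (DC) input passes through with unit gain.
tone = [filters[0].step(math.sin(2.0 * math.pi * F_NOTCH * n / FS))
        for n in range(400)]
steady = [filters[1].step(1.0) for _ in range(400)]
print(max(abs(v) for v in tone[-50:]))
print(steady[-1])
```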

2605.19250 2026-05-20 cs.AI

Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination

因果证据:模态冲突幻觉中的注意力头不平衡

Jinrui Jiang, Zhangtai Wu, Zhen Wu, Xinyu Dai

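AI summary: This paper investigates why multimodal large language models hallucinate under modality conflict. A causal analysis of attention heads shows that hallucination-driving heads are more widely distributed and carry greater aggregate weight, whereas hallucination-resisting heads concentrate in a few high-importance heads. The proposed MACI method reduces hallucination while preserving accuracy.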

详情
AI中文摘要

当多模态大语言模型(MLLMs)优先考虑错误的文本前提而非矛盾的视觉证据时,就会出现模态冲突幻觉。为了理解为什么视觉证据在生成过程中无法占据优势,我们从机制角度出发,考察哪些内部组件驱动或阻碍这一失败。我们通过在五个开源MLLMs上进行头部层面的因果分析,识别出两组具有相反因果作用的注意力头:驱动幻觉的头部和抑制幻觉的头部。我们发现一种一致的不对称性:驱动效应更广泛分布且具有更大的总权重,而抑制效应集中在少量重要头部。消融实验进一步证实,这些组在生成过程中产生相反效果:分布驱动影响和局部抑制共同形成不平衡的路由结构,使生成偏向于错误前提。受此发现启发,我们提出了MACI(模态冲突感知因果干预),一种条件干预方法,仅在检测到冲突时抑制因果识别出的驱动幻觉头部。在五个MLLMs上,MACI在MMMC基准测试中实现了最大的幻觉减少,同时在幻觉准确性之间取得了有利的权衡,并能够零样本转移到SCI-SemanticConflict测试。

英文摘要

Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance together form an imbalanced routing structure that biases generation toward the erroneous premise. Motivated by this finding, we propose MACI (Modality-conflict-Aware Causal Intervention), a conditional intervention that suppresses causally identified hallucination-driving heads only when conflict is detected. Across five MLLMs, MACI achieves the largest hallucination reduction among compared inference-time baselines on the MMMC benchmark with a favorable hallucination-accuracy trade-off, and transfers zero-shot to the SCI-SemanticConflict test.

2605.19249 2026-05-20 cs.LG

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting

超越外推:基于双向启发的知识利用范式用于时间序列预测

Liu Chong, Yingjie Zhou, Hao Li, Pengyang Wang, Qingsong Wen, Ce Zhu

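AI summary: This paper proposes KUP-BI, a new time-series forecasting paradigm that distills continuation-style knowledge from a training-only historical library to supply structural knowledge for bidirectional forecasting, improving forecasting performance.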

Comments Accepted to ICML 2026. 18 pages, 6 figures

详情
AI中文摘要

时间序列预测在能源、交通和公共卫生等场景中至关重要。然而,大多数现有预测模型主要依赖单向推理,即从历史映射到目标,而忽略了由修订的自然链('历史(模型输入)--目标(真实输出)--目标后延续')提供的结构信息。目标后延续记录了轨迹在目标后的发展情况,有助于稳定预测,但无法在推理时观测到。本文旨在获得当前输入的近似后延续代理,为双向预测提供结构化知识。该想法被实例化为KUP-BI(Knowledge Utilization Paradigm with Bidirectional Inspiration),一种新的时间序列建模范式,从仅训练的历史库中提炼出延续式知识(作为近似后延续代理),并将其整合到标准预测骨干中。输入流和延续代理流通过轻量级的特征级门控模块进行融合。这种设计不引入训练轨迹中已包含的信息之外的内容;相反,它提供了一种结构化的归纳偏置,帮助骨干利用典型的延续模式,而不是仅依赖参数外推。在六个公开数据集上的实验结果表明,KUP-BI在提升最先进模型的预测性能方面表现一致,且具有较小的额外开销。

英文摘要

Time-series forecasting is critical in various scenarios, such as energy, transportation, and public health. However, most existing forecasters rely primarily on one-way inference, i.e., mapping history to target, and overlook the structural information provided by a revised natural chain ("history (model input) -- target (ground-truth output) -- post-target continuation"). The post-target continuation records how trajectories evolve after the target, which can help stabilize forecasting, but it is not observable at inference time. In this work, we aim to obtain an approximate proxy of the post-target continuation for the current input, providing structural knowledge for bidirectional forecasting. This idea is instantiated as KUP-BI (Knowledge Utilization Paradigm with Bidirectional Inspiration), a new time-series modeling paradigm that distills continuation-style knowledge (as an approximate post-target continuation proxy) from a train-only historical library and integrates it into standard forecasting backbones. The input stream and the continuation-proxy stream are fused via a lightweight feature-level gating module. This design does not introduce information beyond what is already contained in the training trajectories; instead, it provides a structured inductive bias that helps backbones exploit typical continuation patterns rather than relying solely on parametric extrapolation. Experimental results on six public datasets show that KUP-BI consistently improves the forecasting performance of state-of-the-art models, with small additional overhead.

2605.19247 2026-05-20 cs.CV

Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

结构化开放端NAS:利用LLM进行半自动设计知识结构化以实现高效的神经架构搜索

Yuiko Sakuma, Masakazu Yoshimura, Marcel Gröpl, Zitang Sun, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

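AI summary: This paper proposes a semi-automatic method that uses an LLM to structure model design knowledge and guide neural architecture search. By defining a high-level structural template and introducing the FairNAD algorithm, it explores open-ended search spaces efficiently and improves performance on several datasets.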

Comments 42 pages

详情
AI中文摘要

当前的神经架构搜索(NAS)方法通常受到预定义、限制性搜索空间的限制。尽管最近的基于大语言模型(LLM)的NAS方法能够实现开放式的搜索空间,但它们往往由于偏见或低质量的设计想法而导致探索效率低下。为了解决这些问题,我们提出了一种半自动的方法来结构化模型设计知识以指导搜索过程。我们的方法首先定义了高层结构模板,然后通过分析论文,利用LLM填充此模板,从而创建了一个丰富且多样的搜索空间,该空间体现了这种结构化设计知识。为了高效地探索这个庞大的空间,我们引入了FairNAD,使用多类型突变,通过公平的想法采样、帕累托感知突变、LLM驱动的迭代突变和细粒度反馈循环实现广泛的探索。我们展示了FairNAD在发现高性能架构方面的有效性,这些架构在CIFAR-10、CIFAR-100和ImageNet16-120上分别比当前最先进的方法提高了0.84、2.17和2.35个点。

英文摘要

Current neural architecture search (NAS) methods are often limited by their predefined, restrictive search spaces. While recent large language model (LLM)-assisted NAS methods enable open-ended search spaces, they often suffer from inefficient exploration due to biased or low-quality design ideas. To address these issues, we propose to semi-automatically structure model design knowledge to guide the search process. Our approach first defines a high-level structural template of architectural attributes. An LLM then populates this template by analyzing papers, creating a rich and diverse search space that embodies this structured design knowledge. To efficiently explore this vast space, we introduce FairNAD, using a multi-type mutation that enables broad exploration through mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop. We demonstrate the effectiveness of FairNAD in discovering high-performing architectures that yield 0.84, 2.17, and 2.35 points improvement on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, compared to current state-of-the-art methods.

2605.19243 2026-05-20 cs.LG cs.AI cs.CG

Euclidean Embedding of Data Using Local Distances

利用局部距离进行数据的欧几里得嵌入

Dimitris Arabadjis

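AI summary: This paper studies the problem of recovering a globally consistent Euclidean embedding given only a local distance graph, and proposes a method that optimally represents these distances. The method operates solely on a neighborhood graph weighted by pairwise distances and requires no prior vector representation of the data. It solves a variational problem that matches local, on-graph distances to the Euclidean metric induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in coordinate-free form, so all operators can be evaluated directly from the distance graph alone. Although the equations are non-linear and lack an explicit expression for their non-linearity, they are shown to be solvable as an iteratively updated sparse linear problem. The main contributions are (a) the derivation of the functional equations governing the optimal Euclidean embedding in the continuum, (b) a representation-free formulation that needs only a neighborhood distance graph and no feature vectors, and (c) an estimation procedure based exclusively on local graph operations. Experiments on synthetic manifolds and real datasets show that the resulting non-parametric algorithm preserves local metric structure and neighboring relations while approximating the global isometric embedding.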

详情
AI中文摘要

我们研究了在仅给定局部距离图的情况下恢复全局一致的欧几里得嵌入问题,并提出了一种能够最优表示这些距离的方法。该方法仅在由成对距离加权的邻域图上操作,不需要任何先前的数据向量表示。嵌入是通过求解一个变分问题来实现的,该问题将图上的局部距离与由嵌入函数微分诱导的欧几里得度量匹配。所得的欧拉-拉格朗日方程以坐标自由形式推导,允许仅从距离图直接评估所有算子。尽管非线性和缺少非线性的显式表达式,这些方程被证明可以作为迭代更新的稀疏线性问题解决。本文的主要贡献包括:(a)推导出在连续体中支配最优欧几里得嵌入的功能方程;(b)一种不依赖于特征向量的表示形式,仅需要邻域距离图;(c)基于纯粹局部图操作的估计程序。我们在合成流形和真实数据集上实验性地评估了所得到的非参数算法,证明了在保持局部度量结构和邻近关系的同时,能够近似全局等距嵌入。

英文摘要

We study the problem of recovering a globally consistent Euclidean embedding of data, given only a local distance graph and propose a method that optimally represents these distances. The method operates solely on a neighborhood graph weighted by pairwise distances, without requiring any prior vector representation of the data. The embedding is obtained by solving a variational problem that matches local, on-graph distances to the Euclidean metric, induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in a coordinate-free form, enabling direct evaluation of all operators from the distance graph alone. Though non-linear and missing an explicit expression for their non-linearity, these equations are shown to be resolved as an iteratively updated sparse linear problem. The main contributions of the proposed approach are (a) the derivation of the functional equations governing the optimal Euclidean embedding in the continuum, (b) a representation-free formulation that requires only a neighborhood distance graph and no feature vectors and (c) an estimation procedure based exclusively on local graph operations. We experimentally evaluate the resulting non-parametric algorithm on synthetic manifolds and real datasets, demonstrating consistent preservation of local metric structure and neighboring relations, while approximating the global isometric embedding.

2605.19242 2026-05-20 cs.CV cs.AI cs.ET cs.LG cs.MM

PhyWorld: Physics-Faithful World Model for Video Generation

PhyWorld: 用于视频生成的物理忠实世界模型

Pu Zhao, Juyi Lin, Timothy Rupprecht, Arash Akbari, Chence Yang, Rahul Chowdhury, Elaheh Motamedi, Arman Akbari, Yumei He, Chen Wang, Geng Yuan, Weiwei Chen, Yanzhi Wang

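AI summary: This paper proposes PhyWorld, which uses two-stage post-training to make a video generation model more physically faithful, improving it as a world simulator that can better support Physical AI systems.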

详情
AI中文摘要

世界模拟器可以在真实世界部署前提供安全且可扩展的环境来训练物理AI系统。大型视频生成模型正成为此类模拟器的有希望的基础,因为它们能够生成多样且逼真的视觉未来。然而,将其用作世界模拟器需要物理忠实的视频延续,即生成的视频应保持由条件输入隐含的物理状态,并以符合基本物理原理的方式演变。我们提出了PhyWorld,一种视频生成世界模型,通过两阶段的后训练来生成时间上一致且物理忠实的场景延续。在第一阶段,我们通过流匹配微调改进视频到视频延续,鼓励稳定视觉属性和帧间一致的运动动态。在第二阶段,我们通过直接偏好优化(DPO)对物理偏好对进行对齐,使模型朝着更符合物理合理性的输出发展。为了评估PhyWorld,我们使用了标准视频质量基准和专门的物理忠实性基准,并对每条物理定律进行评分。实验表明,PhyWorld提高了视频一致性,其在VBench上的平均得分为0.769,比最先进的基线0.756或更低。PhyWorld还提高了物理合理性,其在我们物理忠实性基准上的平均得分为3.09,比最强基线的2.99有所提高。这些结果表明,通过延续和物理偏好信号对大型视频生成模型进行后训练,可以使其成为更有效的物理AI世界模拟器。

英文摘要

World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, using them as world simulators requires physically faithful video continuations, namely, generated videos that preserve the physical state implied by the conditioning input, and evolve in ways consistent with basic physical principles. We propose PhyWorld, a video generation world model designed to produce temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, we improve video-to-video continuation with flow matching fine-tuning, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, we align generated dynamics with physical principles using Direct Preference Optimization (DPO) over physics preference pairs, guiding the model toward outputs with higher physical plausibility. To evaluate PhyWorld, we use both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. Experiments show that PhyWorld improves video consistency, achieving an average score of 0.769 on VBench compared with 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on our physical-faithfulness benchmark compared with 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.
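
The second training stage above relies on Direct Preference Optimization. As a rough illustration, the sketch below implements the standard DPO loss for a single preference pair, written in terms of sequence log-probabilities under the trained policy and a frozen reference model. How PhyWorld adapts this objective to a video generation model is not stated in the abstract, so the function name, the inputs, and the `beta` value are assumptions.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((logp_chosen - ref_chosen)
                                - (logp_rejected - ref_rejected))))
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# With the policy equal to the reference, the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))

# Raising the likelihood of the physically plausible sample lowers the loss.
print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss depends only on how much the policy has shifted probability toward the chosen sample relative to the reference, which is why a fixed set of physics preference pairs can steer the model without a separate reward model.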

2605.19235 2026-05-20 cs.LG cs.GT

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

GAE在不完全信息自博弈强化学习中表现不足

Zhiyuan Fan, Gabriele Farina

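AI summary: This paper studies the variance of the GAE estimator in self-play reinforcement learning for imperfect-information games, and proposes Q-boosting and the VRPO algorithm to reduce variance and improve performance.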

详情
AI中文摘要

不完全信息博弈中的竞争多智能体强化学习需要智能体在部分可观测环境下对抗对手,需要随机策略。虽然使用近端策略优化(PPO)的自博弈强化学习在经验上取得了成功,但其标准优势估计器广义优势估计(GAE)由于随机未来动作的采样而产生额外的方差。在均衡自博弈中,这种方差被均衡策略的随机性放大,并且即使当批评器是精确的时仍然存在。我们通过引入基于集中动作价值批评的Q-boosting,一种方差减少的优势估计器,以及提出方差减少策略优化(VRPO),将此新估计器纳入其中。该算法用多步期望SARSA(λ)轨迹替代了采样的多步备份,每一步计算策略期望以平均动作采样噪声,同时保留PPO的裁剪目标和在线策略演员更新。经验上,VRPO在中等规模到大规模游戏,包括斗地主和头衔无限制德州扑克中都表现出强劲的性能。

英文摘要

Competitive multi-agent reinforcement learning in imperfect-information games requires agents to act under partial observability and against adversarial opponents, necessitating stochastic policies. While self-play reinforcement learning with Proximal Policy Optimization (PPO) has achieved strong empirical success, its standard advantage estimator, generalized advantage estimation, suffers from additional variance due to the sampling of stochastic future actions. This variance is amplified in equilibrium self-play because of the stochastic nature of the equilibrium policy and persists even when the critic is exact. We address this bottleneck by introducing $Q$-boosting, a variance-reduced advantage estimator based on a centralized action-value critic, and propose Variance-Reduced Policy Optimization (VRPO), incorporating this new estimator. The algorithm replaces sampled multi-step backups with a multi-step Expected SARSA$(λ)$ trace, computing policy expectations at each step to average out action-sampling noise, while retaining PPO's clipped objective and on-policy actor updates. Empirically, VRPO consistently achieves strong performance from mid-sized to large-scale games including Dou Dizhu and Heads-Up No-Limit Texas Hold'em.
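
The variance argument above can be seen in a one-step toy case. A sampled backup bootstraps from the value of whichever next action happened to be drawn, while an Expected SARSA backup averages the next action values under the policy. The sketch below is a one-step simplification, not the paper's multi-step Expected SARSA(λ) trace or its Q-boosting estimator, and the function names and numbers are illustrative.

```python
import random
from typing import List


def sampled_backup(reward: float, gamma: float, policy: List[float],
                   q_next: List[float], rng: random.Random) -> float:
    """Bootstrap from the value of one next action sampled from the policy."""
    action = rng.choices(range(len(policy)), weights=policy)[0]
    return reward + gamma * q_next[action]


def expected_backup(reward: float, gamma: float, policy: List[float],
                    q_next: List[float]) -> float:
    """Expected SARSA target: average next action values under the policy."""
    return reward + gamma * sum(p * q for p, q in zip(policy, q_next))


# A mixed (equilibrium-like) policy over two actions with exact values.
policy = [0.5, 0.5]
q_next = [1.0, -1.0]
rng = random.Random(0)

samples = [sampled_backup(0.0, 1.0, policy, q_next, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)

print(variance)                                  # close to 1: noise from action sampling alone
print(expected_backup(0.0, 1.0, policy, q_next))  # → 0.0, with no sampling noise
```

Even though `q_next` is exact here, the sampled target still fluctuates, which mirrors the abstract's point that the variance persists when the critic is exact.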

2605.19234 2026-05-20 cs.CL cs.AI

AI Technologies in Language Access: Attitudes Towards AI and the Human Value of Language Access Managers

人工智能技术在语言接入中的应用:对人工智能的态度以及语言接入管理者的人类价值

Miguel A. Jiménez-Crespo, Stephanie Rodriguez, Alejandro Jaume Losa

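AI summary: This paper examines the impact of AI on language access. Drawing on semi-structured interviews with ten US language access managers working in healthcare, court, public service, and local government settings, it finds conditional optimism towards AI together with a strong commitment to human value and human oversight in AI implementations.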

Comments 11 pages, 2 tables, Convergence Conference 2026

详情
AI中文摘要

人工智能技术的快速出现正在重塑翻译实践和理论。本文探讨了人工智能在语言接入中的影响。这一领域的特点在于需要服务于广泛且多样化的用户群体,而效率和可及性受到法律要求、伦理和商业矛盾以及安全问题的影响。本文报告了语言接入管理者对人工智能以及人工智能时代的人类价值的态度和看法。方法上,本文呈现了一项关于语言接入和技术的更大研究的子集分析,具体为对十位美国语言接入管理者进行的定性主题分析,这些管理者在医疗、法庭、公共服务和地方政府领域工作。结果表明,语言接入管理者对不可避免的人工智能实施表现出有条件乐观,对风险具有强烈意识,并对人工智能实施和输出中的人类价值和人类监督有深刻承诺。

英文摘要

The rapid emergence of AI technologies is reshaping translation practices and theory across the board. This paper deals with the impact of AI on language access. This area is characterized by the need to serve broad and diverse user populations, within a context where efficiency and access are shaped by legal mandates, ethical and commercial tensions, and safety concerns. This paper reports on the attitudes and perceptions of language access managers towards AI and towards human value in the AI age. Methodologically, this paper presents an analysis of a subset of a broader study on language access and technology, specifically a qualitative thematic analysis of ten semi-structured interviews with language access managers in the USA working in healthcare, court, public service and local government contexts. The results indicate that language access managers show conditional optimism towards the inevitable AI implementations, are strongly risk-aware, and are deeply committed to human value and human oversight of AI implementations and output.

2605.19231 2026-05-20 cs.LG stat.ML

DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

DeRegiME:用于分布偏移下概率预测的深度制度混合

Kieran Wood, Stefan Zohren, Stephen J. Roberts

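AI summary: DeRegiME uses a sparse variational Gaussian process to build regime mixtures into probabilistic forecasting, addressing the weakness of neural forecasters under distribution shift and improving the accuracy of predictive densities.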

详情
AI中文摘要

我们介绍了DeRegiME--深度制度混合专家--一种直接多时间跨度的概率预测器,它将潜在的不确定性制度与底层信号分开,并使用稀疏变分高斯过程(GP)软地将每个预测位置分配给学习到的重复制度。该过程通过共享门将非平稳制度混合核和学生t分布似然结合起来,从而得到一个单一的稀疏GP后验,而不是GP专家的混合。DeRegiME解决了神经预测器的一个关键限制:点预测丢弃残差不确定性,而概率头--无论是单边际、未解释的混合、分位数集还是扩散样本--很少暴露残差的制度结构。然而,在噪声异方差时间序列中,分布偏移可能突然、逐渐或时间依赖性出现,通常出现在残差不确定性而非条件均值中。DeRegiME提供了一个可解释的均值-残差-噪声分解,通过直接求和的特征空间表示,将制度锚定为残差相似性的聚类,其转换表现为隐含的转折点。有效制度的数量通过粘性打破门进行修剪。我们证明了核的有效性及预测密度的正确性,并在十个基准和三个编码器网格上,DeRegiME在最强大的编码器匹配基线(DeepAR/GluonTS风格的动态学生t头)上将负对数预测密度(NLPD)提高了20.3%,并在CRPS(3.0%)和MSE(4.7%)上获得并行收益。改进在所有数据集中保持一致,这些数据集涵盖了突然、逐渐和季节性偏移。

英文摘要

We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.

2605.19230 2026-05-20 cs.CV cs.LG

Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation

通过样本难度去相关性实现鲁棒的年龄依赖性混杂效应缓解

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby, Luke Whitbread, Lyle J. Palmer

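AI summary: This paper proposes a robust framework that mitigates age-dependent confounding by targeting spurious age-linked trends rather than enforcing invariance. By modeling sample difficulty and decorrelating age from the dominant age-difficulty trends, it reduces age-dependent true-positive and false-positive disparities while preserving clinically meaningful, nonlinear age information.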

Comments 10 Pages, 3 Figures

详情
AI中文摘要

医学图像分类中的年龄依赖性性能差异通常是因为年龄作为混杂因素,将成像形态与疾病流行率联系起来。在实践中,差异可能表现为在疾病流行率较高的年龄过诊断,而在流行率较低的年龄下诊断不足,并在训练测试年龄分布变化时恶化。传统缓解方法强制严格年龄不变性可能会抑制在年龄中编码的诊断性信息。因此,我们提出了一种鲁棒框架,通过针对虚假的年龄相关趋势而非强制不变性来缓解年龄依赖性混杂效应。在预热阶段后,我们表征样本难度并以标签条件方式建模其年龄依赖性趋势。通过使用鲁棒的Huber加权亲和权重去相关年龄与主导年龄难度趋势,削弱由混杂驱动的捷径,同时保留临床有意义的非线性年龄信息。我们进一步引入了一个年龄覆盖分数,通过mini-batch年龄方差缩放去相关惩罚,以确保在有限年龄多样性下稳定的优化。在两个放射学数据集中,我们的方法在最小化AUC影响的同时减少了年龄相关的真阳性与假阳性差异,并在增加的训练测试年龄分布变化下保持稳健。

英文摘要

Age-dependent performance disparities in medical image classification often arise because age acts as a confounder, linking imaging morphology with disease prevalence. In practice, disparities can manifest as overdiagnosis at ages where disease prevalence is higher and underdiagnosis at ages where prevalence is lower, and can worsen under train-test shifts in the age distribution. Conventional mitigation approaches that enforce strict age invariance may suppress diagnostically meaningful information encoded in age. We therefore propose a robust framework that mitigates the effects of age-dependent confounding by targeting spurious age-linked trends rather than enforcing invariance. Following a warm-up phase, we characterize sample difficulty and model its age-dependent trends in a label-conditioned manner. We decorrelate age from dominant age-difficulty trends using robust, Huber-weighted affinity weights, attenuating confounding-driven shortcuts while preserving clinically meaningful, nonlinear age information. We further introduce an Age Coverage Score that scales the decorrelation penalty by minibatch age variance to ensure stable optimization under limited age diversity. Across two radiology datasets, our approach reduces age-dependent true- and false-positive disparities with minimal AUC impact and remains robust to increasing train-test age distribution shifts.