arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.17538 2026-05-26 cs.RO

Novel Algorithms for Smoothly Differentiable and Efficiently Vectorizable Contact Manifold Construction

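Onur Beker, Andreas René Geist, Anselm Paulus, Georg Martius

Affiliations University of Tübingen

AI summary For optimizing robot behavior in contact-rich settings, the paper proposes a new collision detection pipeline that prioritizes smooth (twice) differentiability and massive vectorizability on GPUs. It comprises differentiable SDF representations, broad-phase and narrow-phase routines, and convex-decomposition-based contact blending.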

Comments This version adds late-breaking results in preparation for the CR2 workshop in ICRA 2026

Abstract

Generating intelligent robot behavior in contact-rich settings is a research problem where zeroth-order methods currently prevail. Developing methods that make use of first/second order information about rigid-body dynamics in the presence of contact holds great promise in terms of increasing the solution speed and computational efficiency. The main bottleneck in this research direction is the difficulty in obtaining gradients and Hessians that are actually useful for numerical optimization, due to pathologies in all three steps of a common simulation pipeline: i) collision detection, ii) contact dynamics, iii) time integration. This abstract proposes a method that aims to address the collision detection part of the puzzle, via a novel pipeline designed from scratch with smooth (i.e. twice) differentiability and massive vectorizability on GPUs as the main priorities. This is in contrast to standard collision detection routines that are instead optimized for runtime on CPUs and minimal memory footprint, but do employ logic and control flow that hinder differentiability and vectorization. The proposed pipeline consists of the following contributions: i) highly expressive and compute efficient SDF representations, ii) differentiable broad-phase and narrow-phase routines that use these representations to generate vertex-SDF and edge-SDF contacts, iii) a differentiable routine for convex decomposition based contact blending.
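To make the contrast between branching geometry code and smooth SDF queries concrete, here is a minimal sketch. It is not the authors' representation: the two-sphere geometry, the function names, and the log-sum-exp soft minimum are all assumptions chosen for illustration. A hard `min` over two signed distances has a kink where the distances cross, while the soft minimum below is smooth in both arguments (the sphere SDF itself is smooth away from the sphere centers).

```python
import math

def sdf_sphere(p, center, radius):
    """Signed distance from point p to a sphere (negative inside)."""
    return math.dist(p, center) - radius

def smooth_min(a, b, k=20.0):
    """Log-sum-exp soft minimum; approaches min(a, b) as k grows."""
    m = min(a, b)  # shift for numerical stability only; the value is unchanged
    return m - math.log(math.exp(-k * (a - m)) + math.exp(-k * (b - m))) / k

def sdf_two_spheres(p, k=20.0):
    """Smooth union of two spheres (hypothetical test geometry)."""
    d1 = sdf_sphere(p, (0.0, 0.0, 0.0), 1.0)
    d2 = sdf_sphere(p, (1.5, 0.0, 0.0), 1.0)
    return smooth_min(d1, d2, k)

# Evaluate a small batch of query points with the same code path for each.
points = [(0.75, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 0.0, 0.0)]
distances = [sdf_two_spheres(p) for p in points]
```

Because every query point follows the same arithmetic, a function of this shape is the kind that maps naturally onto batched evaluation and automatic differentiation; the soft minimum slightly underestimates the true distance, by at most log(2)/k.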

2604.17328 2026-05-26 cs.LG cs.AI

Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction

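Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Huiming Yang, Sibo Wang, Linglin Liao

Affiliations Alibaba Group; Tsinghua University

AI summary The paper argues that the length problem in sequence-level relative reinforcement learning is fundamentally a comparison-unit construction problem. On that basis it proposes EqLen, an equal-length paired training framework that builds comparable training samples through dual-track synchronous generation, prefix inheritance, and segment masking.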

Abstract

This paper investigates the length problem in sequence-level relative reinforcement learning. We observe that, although existing methods partially alleviate length-related phenomena, a more fundamental issue remains insufficiently characterized: the comparison units used during training lack inherent comparability. Building on this observation, we propose a new perspective: the length problem should not be viewed merely as a loss-scaling or normalization bias, but rather as a comparison unit construction problem. We further establish a sample-construction-based training framework that, instead of applying post-hoc corrections to unequal-length responses, proactively constructs equal-length, alignable, and comparable training segments during generation. Within this framework, we propose EqLen, a concrete method applicable to group-relative comparison algorithms such as GRPO, GSPO, and RLOO. Through dual-track synchronous generation, prefix inheritance, and segment masking, EqLen efficiently collects effective equal-length training segments and enables stable training.
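The idea of comparing equal-length units can be illustrated with a small sketch. This is not EqLen itself: the dual-track generation and prefix inheritance steps are not reproduced, and truncating every response to the shortest length in the group is only a stand-in assumption. The sketch computes group-relative advantages in the style used by GRPO-like methods and builds a segment mask so that every response contributes the same number of tokens.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def equal_length_masks(lengths, segment_len):
    """Hypothetical helper: 1 for tokens inside the shared segment, 0 elsewhere."""
    return [[1 if t < segment_len else 0 for t in range(n)] for n in lengths]

rewards = [1.0, 0.0, 0.0, 1.0]   # one scalar reward per response (made-up values)
lengths = [12, 7, 9, 20]         # response lengths in tokens (made-up values)
segment_len = min(lengths)       # assumption: compare only the shared first 7 tokens

advantages = group_relative_advantages(rewards)
masks = equal_length_masks(lengths, segment_len)
tokens_per_sample = [sum(m) for m in masks]
```

With the mask applied, each response enters the loss with the same token count, so no per-length rescaling is needed afterwards.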

2604.16778 2026-05-26 cs.LG cs.AI

Federation over Text: Insight Sharing for Multi-Agent Reasoning

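Dixi Yao, Tahseen Rabbani, Manzil Zaheer, Tian Li

Affiliations University of Chicago; Google DeepMind

AI summary Proposes FoT, a federated learning-like framework that iteratively aggregates the local reasoning processes of multiple clients into a cross-task library of metacognitive insights, without sharing problem instances or task instructions, and significantly improves reasoning effectiveness and efficiency.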

Comments 46 pages

Abstract

We propose a federated learning-like framework, Federation over Text (FoT), that enables multiple clients solving different tasks to collectively generate a shared library of metacognitive insights by iteratively federating their local reasoning processes without sharing actual problem instances or task instructions. Instead of federation over gradients (e.g., as in distributed training), FoT operates at the semantic level without any gradient optimization or supervision signal. Iteratively, each client runs an LLM agent that does local thinking and self-improvement on their specific tasks independently, and shares reasoning traces with a central server, which aggregates and distills them into a cross-task (and cross-domain) insight library that existing and future agents can leverage to improve performance on related tasks. Experiments show that FoT improves reasoning effectiveness and efficiency across a wide range of challenging applications, including mathematical problem solving, cross-domain collaboration, real-world daily tasks, and machine learning research insight discovery. Specifically, it improves average performance scores by 25% while reducing the reasoning tokens by 4% across the first three applications. In the research insight discovery application, FoT is able to generate insights that cover over 80% of the major contributions in the subsequent papers.

2604.12116 2026-05-26 cs.AI cs.SE

The A-R Behavioral Space: Execution-Level Profiling of Tool-Using Language Model Agents in Organizational Deployment

Shasha Yu, Fiona Carroll, Barry L. Bentley

Affiliations Cardiff School of Technologies, Cardiff Metropolitan University; School of Professional Studies, Clark University; Harvard Medical School, Harvard University

AI summary Proposes a two-dimensional A-R space defined by Action Rate (A) and Refusal Signal (R), together with Divergence (D), to measure execution-layer behavior, and evaluates how language model agents distribute execution and refusal across normative regimes and autonomy configurations.

Abstract

Large language models (LLMs) are increasingly deployed as tool-augmented agents capable of executing system-level operations. While existing benchmarks primarily assess textual alignment or task success, less attention has been paid to the structural relationship between linguistic signaling and executable behavior under varying autonomy scaffolds. This study introduces an execution-layer behavioral measurement approach based on a two-dimensional A-R space defined by Action Rate (A) and Refusal Signal (R), with Divergence (D) capturing coordination between the two. Models are evaluated across four normative regimes (Control, Gray, Dilemma, and Malicious) and three autonomy configurations (direct execution, planning, and reflection). Rather than assigning aggregate safety scores, the method characterizes how execution and refusal redistribute across contextual framing and scaffold depth. Empirical results show that execution and refusal constitute separable behavioral dimensions whose joint distribution varies systematically across regimes and autonomy levels. Reflection-based scaffolding often shifts configurations toward higher refusal in risk-laden contexts, but redistribution patterns differ structurally across models. The A-R representation makes cross-sectional behavioral profiles, scaffold-induced transitions, and coordination variability directly observable. By foregrounding execution-layer characterization over scalar ranking, this work provides a deployment-oriented lens for analyzing and selecting tool-enabled LLM agents in organizational settings where execution privileges and risk tolerance vary.

2604.08988 2026-05-26 cs.AI

SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

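Sihang Jiang, Lipeng Ma, Zhonghua Hong, Keyi Wang, Zhiyu Lu, Tengfei Wang, Shisong Chen, Jinghao Zhang, Tianjun Pan, Weijia Li, Jiaqing Liang, Yanghua Xiao

Affiliations Fudan University

AI summary Presents a formal definition of the Self-Evolving Agent (SEA) and its minimal sufficient architecture, the Evolutionary Flywheel, and builds SEA-Eval, the first benchmark designed specifically for evaluating SEAs, which uses a sequential task stream design to quantify evolutionary gain, stability, and implicit alignment convergence.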

Abstract

Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by static toolsets and episodic amnesia, failing to accumulate experience across task boundaries. This paper formalizes the Self-Evolving Agent (SEA) from the perspective of digital embodiment and continuous cross-task evolution, introduces the Evolutionary Flywheel as its minimal sufficient architecture, and presents SEA-Eval -- the first benchmark designed specifically for evaluating SEAs. Grounded in Flywheel theory, SEA-Eval establishes SR and T as primary metrics and, through its sequential task stream design, aims to quantify evolutionary gain, evolutionary stability, and implicit alignment convergence. Empirical evaluation reveals that, under comparable success rates, token consumption differs by up to 31.2 times between frameworks on individual tasks, with divergent evolutionary trajectories emerging under sequential analysis -- demonstrating that success rate alone creates a capability illusion and that the sequential convergence of T is the key criterion for distinguishing genuine evolution from pseudo-evolution.

2603.28128 2026-05-26 cs.LG cs.CR

ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment

Tran Duong Minh Dai, Triet Huynh Minh Le, M. Ali Babar, Van-Hau Pham, Phan The Duy

Affiliations Information Security Lab, University of Information Technology; Vietnam National University; School of Computer Science and Information Technology, Adelaide University

AI summary Proposes ORACAL, a heterogeneous multimodal graph learning framework that integrates control flow, data flow, and call graphs, enriches critical subgraphs using RAG and LLMs, and applies a causal attention mechanism and PGExplainer for robust and explainable smart contract vulnerability detection.

Comments 21 pages, version 2

Abstract

Although Graph Neural Networks (GNNs) have shown promise for smart contract vulnerability detection, they still face significant limitations. Homogeneous graph models fail to capture the interplay between control flow and data dependencies, while heterogeneous graph approaches often lack deep semantic understanding, leaving them susceptible to adversarial attacks. Moreover, most black-box models fail to provide explainable evidence, hindering trust in professional audits. To address these challenges, we propose ORACAL (Observable RAG-enhanced Analysis with CausAL reasoning), a heterogeneous multimodal graph learning framework that integrates Control Flow Graph (CFG), Data Flow Graph (DFG), and Call Graph (CG). ORACAL selectively enriches critical subgraphs with expert-level security context from Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), and employs a causal attention mechanism to disentangle true vulnerability indicators from spurious correlations. For transparency, the framework adopts PGExplainer to generate subgraph-level explanations identifying vulnerability triggering paths. Experiments on large-scale datasets demonstrate that ORACAL achieves state-of-the-art performance, outperforming MANDO-HGT, MTVHunter, GNN-SC, and SCVHunter by up to 39.6 percentage points, with a peak Macro F1 of 91.28% on the primary benchmark. ORACAL maintains strong generalization on out-of-distribution datasets with 91.8% on CGT Weakness and 77.1% on DAppScan. In explainability evaluation, PGExplainer achieves 32.51% Mean Intersection over Union (MIoU) against manually annotated vulnerability triggering paths. Under adversarial attacks, ORACAL limits performance degradation to approximately 2.35% F1 decrease with an Attack Success Rate (ASR) of only 3%, surpassing SCVHunter and MANDO-HGT which exhibit ASRs ranging from 10.91% to 18.73%.

2603.18766 2026-05-26 cs.LG

Enhancing the Parameterization of Reservoir Properties for Data Assimilation Using Deep VAE-GAN

M. A. Sampaio, P. H. Ranazzi, M. J. Blunt

Affiliations Departamento de Engenharia de Minas e de Petróleo, Escola Politécnica, Universidade de São Paulo; Department of Earth Science and Engineering, Imperial College London

AI summary Proposes combining a VAE-GAN with ESMDA to obtain high-quality reservoir descriptions and a good history match at the same time, overcoming the limitations of conventional approaches with non-Gaussian distributions and finite ensemble sizes.

Abstract

Currently, Iterative Ensemble Smoothers, especially the Ensemble Smoother with Multiple Data Assimilation (ESMDA), can be considered state of the art for history matching in petroleum reservoir simulation. However, this approach has two important limitations: the use of a finite-size ensemble to represent the distributions, and the Gaussian assumption in parameter and data uncertainties. The latter is particularly important because many reservoir properties have non-Gaussian distributions. Parameterization involves mapping non-Gaussian parameters to a Gaussian field before the update and then mapping them back to the original domain to propagate the ensemble forward through the reservoir simulator. A promising approach to parameterization is through deep learning models. Recent studies have shown that Generative Adversarial Networks (GANs) performed poorly in data assimilation but generated more geologically plausible realizations of the reservoir, while the Variational Autoencoder (VAE) performed better than the GAN in data assimilation but generated less geologically realistic models. This work is innovative in combining the strengths of both to implement a deep learning model called the Variational Autoencoder Generative Adversarial Network (VAE-GAN), integrated with ESMDA. The methodology was applied in two case studies, one with categorical values and the other with continuous values of permeability. Our findings demonstrate that by applying the VAE-GAN model we can simultaneously obtain high-quality reservoir descriptions (just like GANs) and a good history match of the production curves (just like VAEs).
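For readers unfamiliar with ESMDA, the update loop can be sketched for the simplest possible case: one scalar parameter, one scalar observation, and a linear forward model. This is a textbook-style illustration under those assumptions, not the paper's setup. In the approach described above, the updated quantity would be the Gaussian latent vector of the VAE-GAN, and the forward model would decode it and run the reservoir simulator; both are omitted here, and all names and numbers are hypothetical.

```python
import random

def esmda_scalar(prior, forward, d_obs, obs_var, alphas, rng):
    """ES-MDA for one scalar parameter and one scalar datum."""
    ens = list(prior)
    n = len(ens)
    for alpha in alphas:
        preds = [forward(m) for m in ens]
        m_bar = sum(ens) / n
        d_bar = sum(preds) / n
        # Sample cross-covariance and data variance from the ensemble.
        c_md = sum((m - m_bar) * (d - d_bar) for m, d in zip(ens, preds)) / (n - 1)
        c_dd = sum((d - d_bar) ** 2 for d in preds) / (n - 1)
        gain = c_md / (c_dd + alpha * obs_var)
        # Each member assimilates the datum perturbed with inflated noise.
        ens = [
            m + gain * (d_obs + rng.gauss(0.0, (alpha * obs_var) ** 0.5) - d)
            for m, d in zip(ens, preds)
        ]
    return ens

rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
alphas = [4.0, 4.0, 4.0, 4.0]  # inflation factors; the sum of 1/alpha must equal 1
posterior = esmda_scalar(prior, lambda m: 2.0 * m, d_obs=4.0, obs_var=0.01,
                         alphas=alphas, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The gain is built from ensemble covariances, which is where the Gaussian assumption enters; this is why a parameterization that maps non-Gaussian properties to a Gaussian space matters for the update.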

2603.16481 2026-05-26 cs.LG cs.SY eess.SY math.OC

Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function

Amon Lahr, Anna Scampicchio, Johannes Köhler, Melanie N. Zeilinger

Affiliations Institute for Dynamical Systems and Control, ETH Zurich; Department of Electrical Engineering, Chalmers University of Technology; Department of Mechanical Engineering, Imperial College London

AI summary For multi-output functions in a reproducing kernel Hilbert space under bounded noise, proposes a tight, deterministic uncertainty bound obtained through an unconstrained dual formulation that shares the structure of classic Gaussian process confidence bounds, making it easy to integrate into downstream optimization.

Comments Extended version

Abstract

Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data, and thus, a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either pose strong assumptions on the underlying noise distribution, are conservative, do not directly apply in the multi-output case, or are difficult to integrate into downstream tasks. This paper addresses these limitations by presenting a tight, deterministic bound for multi-output functions in Reproducing Kernel Hilbert Spaces (RKHSs) subject to bounded noise. It is obtained through an unconstrained, duality-based formulation, which shares the same structure as classic Gaussian process confidence bounds, and can thus be straightforwardly integrated into downstream optimization pipelines. We show that the proposed bound generalizes existing results and illustrate its application using an example inspired by quadrotor dynamics learning.

2603.16100 2026-05-26 cs.CV

Reevaluating the Intra-Modal Misalignment Hypothesis in CLIP

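Jonas Herzog, Yue Wang

Affiliations Zhejiang University

AI summary Questions the intra-modal misalignment hypothesis for CLIP, showing through theoretical analysis and experiments that the supposed degrees of freedom in image embedding distances do not exist and that performance differences on intra-modal tasks stem mainly from task ambiguity rather than misalignment.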

Comments Accepted for CVPR'26. Project Page: https://vision-kek.github.io/Is-CLIP-Really-Misaligned/

Abstract

Recent research suggested that the embeddings produced by CLIP-like contrastive language-image training are suboptimal for image-only tasks. The main theory is that the inter-modal (language-image) alignment loss ignores intra-modal (image-image) alignment, leading to poorly calibrated distances between images. In this study, we question this intra-modal misalignment hypothesis. We reexamine its foundational theoretical argument, the indicators used to support it, and the performance metrics affected. For the theoretical argument, we demonstrate that there are no such supposed degrees of freedom for image embedding distances. For the empirical measures, our findings reveal they yield similar results for language-image trained models (CLIP, SigLIP) and image-image trained models (DINO, SigLIP2). This indicates the observed phenomena do not stem from a misalignment specific to the former. Experiments on the commonly studied intra-modal tasks retrieval and few-shot classification confirm that addressing task ambiguity, not supposed misalignment, is key for best results.

2603.11583 2026-05-26 cs.CL cs.AI

UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Tasks

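Ofir Marom

Affiliations Independent Researcher

AI summary Proposes the UtilityMax Prompting framework, which formalizes multi-objective LLM tasks using influence diagrams and expected utility maximization, and improves precision and NDCG over natural language baselines on the MovieLens 1M dataset.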

Abstract

The success of a Large Language Model (LLM) task depends heavily on its prompt. Most use-cases specify prompts using natural language, which is inherently ambiguous when multiple objectives must be simultaneously satisfied. In this paper we introduce UtilityMax Prompting, a framework that specifies tasks using formal mathematical language. We reconstruct the task as an influence diagram in which the LLM's answer is the sole decision variable. A utility function is defined over the conditional probability distributions within the diagram, and the LLM is instructed to find the answer that maximises expected utility. This constrains the LLM to reason explicitly about each component of the objective, directing its output toward a precise optimization target rather than a subjective natural language interpretation. We validate our approach on the MovieLens 1M dataset across three frontier models (Claude Sonnet 4.6, GPT-5.4, and Gemini 2.5 Pro), demonstrating consistent improvements in precision and Normalized Discounted Cumulative Gain (NDCG) over natural language baselines in a multi-objective movie recommendation task.
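The expected-utility objective described above can be written out in a few lines. This is a sketch under stated assumptions, not the paper's prompt or utility function: the two objectives ("liked", "novel"), their weights, and the probabilities are made-up values, and the maximization is done in ordinary code, whereas in UtilityMax Prompting the LLM is instructed to carry out this reasoning itself.

```python
def expected_utility(outcome_probs, weights):
    """Expected additive utility: sum of weight * P(objective is satisfied)."""
    return sum(weights[name] * p for name, p in outcome_probs.items())

# Hypothetical candidates with estimated probabilities for two objectives.
candidates = {
    "Movie A": {"liked": 0.8, "novel": 0.1},
    "Movie B": {"liked": 0.6, "novel": 0.9},
}
weights = {"liked": 1.0, "novel": 0.5}  # assumed trade-off between objectives

scores = {name: expected_utility(p, weights) for name, p in candidates.items()}
best = max(scores, key=scores.get)
```

Here "Movie B" wins despite a lower probability of being liked, because the utility makes the trade-off with novelty explicit rather than leaving it to the interpretation of a natural language instruction.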

2603.10250 2026-05-26 cs.LG

GeMPO: Generalized Measure Matching for Online Diffusion Reinforcement Learning

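Haitong Ma, Chenxiao Gao, Tianyi Chen, Na Li, Bo Dai

Affiliations Harvard University; Georgia Institute of Technology

AI summary Proposes the GeMPO framework, which generalizes reweighting in diffusion RL from softmax to general monotonic functions and introduces a negative reweighting mechanism, addressing overgreedy policies and the underuse of negative samples.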

Comments 22 pages, 6 figures

Abstract

A commonly used family of RL algorithms for diffusion policies conducts softmax reweighting over samples from the behavior policy, which often induces an overgreedy policy and fails to utilize feedback from negative samples. In this work, we introduce GeMPO, a simple and unified framework that generalizes the reweighting scheme in diffusion RL from softmax to general monotonic functions. GeMPO revisits diffusion RL via a measure matching perspective: first, we construct a virtual target policy measure by solving a regularized policy optimization objective; second, we minimize the divergence between the current policy and this target measure through reweighted flow matching. This formulation offers two key advantages: i) it extends weight design beyond traditional exponential reweighting, allowing it to be tailored to diverse reward landscapes; and ii) by relaxing the non-negativity constraint on the target measure, our framework provides a principled justification for negative reweighting. We provide interpretations of how negative reweighting actively repels the policy from suboptimal actions and thus facilitates exploration. Extensive empirical evaluations demonstrate that GeMPO achieves competitive or superior performance by leveraging these flexible weighting schemes, and we provide practical guidelines for selecting reweighting methods.
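The difference between softmax reweighting and a general monotonic reweighting can be seen on a toy set of per-sample advantages. This is only an illustration of the weighting step: the choice of `tanh` as the monotonic function, the temperature, and the advantage values are assumptions, and the flow matching loss that the weights would multiply is not shown.

```python
import math

def softmax_weights(advantages, temperature=1.0):
    """Exponential reweighting: always positive, sums to 1."""
    m = max(advantages)  # subtract the max for numerical stability
    exps = [math.exp((a - m) / temperature) for a in advantages]
    z = sum(exps)
    return [e / z for e in exps]

def monotone_weights(advantages, f):
    """Apply any monotonically increasing function f to each advantage."""
    return [f(a) for a in advantages]

advantages = [2.0, 0.5, -1.0]  # made-up per-sample advantages
w_soft = softmax_weights(advantages, temperature=0.5)
w_tanh = monotone_weights(advantages, math.tanh)  # illustrative choice of f
```

The softmax weights concentrate on the best sample and can only shrink the contribution of the worst one toward zero, whereas a sign-preserving monotonic function assigns the worst sample a negative weight, which corresponds to pushing the policy away from that action.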

2603.06626 2026-05-26 cs.LG cs.AI

Grouter: Decoupling Routing from Representation for Accelerated MoE Training

Yuqi Xu, Rizhen Hu, Zihan Liu, Mou Sun, Kun Yuan

Affiliations School of Mathematical Sciences, Peking University, Beijing, China; Center for Machine Learning Research, Peking University, Beijing, China; Yuanpei College, Peking University, Beijing, China; Zhejiang Lab, Hangzhou, China

AI summary Proposes Grouter, which distills high-quality routing structures from pretrained MoE models to serve as a fixed router, decoupling structural optimization from weight updates, significantly accelerating convergence, and improving training throughput.

Abstract

Traditional Mixture-of-Experts (MoE) training typically proceeds without any structural priors, effectively requiring the model to simultaneously train expert weights while searching for an optimal routing policy within a vast combinatorial space. This entanglement often leads to sluggish convergence and training instabilities. This paper introduces Grouter, a preemptive routing method that distills high-quality structures from fully trained MoE models and serves as a fixed router for target models. By decoupling structural optimization from weight updates, Grouter significantly accelerates both the speed and quality of model convergence. To ensure the framework's versatility, we also introduce expert folding to adapt Grouter across varying model configurations and expert tuning to rebalance workloads across different data distributions. Furthermore, by leveraging the structural priors provided by preemptive routing, we can implement targeted optimizations to further enhance training throughput. Experiments demonstrate that Grouter achieves superior performance and efficiency: it boosts pre-training data utilization by 4.28x and achieves up to 33.5% throughput acceleration, establishing preemptive routing as a fundamental paradigm for scalable MoE training. We publicly release our code and pretrained Grouter checkpoints at https://github.com/JimmyAwoe/Grouter.

2603.05450 2026-05-26 cs.AI cs.CL

Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry

Yifan Zhu, Mariah Bradford, Kenneth Lai, Timothy Obiso, Videep Venkatesha, James Pustejovsky, Nikhil Krishnaswamy

Affiliations Brandeis University; Colorado State University

AI summary Proposes the Distributed Partial Information Puzzle (DPIP) task, collects a multimodal dataset, and evaluates large language models and a Dynamic Epistemic Logic approach on tracking belief states and common ground construction.

Comments 10 pages, 4 figures

Journal ref Proceedings of COLING-LREC 2026

Abstract

Establishing common ground, a shared set of beliefs and mutually recognized facts, is fundamental to collaboration, yet remains a challenge for current AI systems, especially in multimodal, multiparty settings, where the collaborators bring different information to the table. We introduce the Distributed Partial Information Puzzle (DPIP), a collaborative construction task that elicits rich multimodal communication under epistemic asymmetry. We present a multimodal dataset of these interactions, annotated and temporally aligned across speech, gesture, and action modalities to support reasoning over propositional content and belief dynamics. We then evaluate two paradigms for modeling common ground (CG): (1) state-of-the-art large language models (LLMs), prompted to infer shared beliefs from multimodal updates, and (2) an axiomatic pipeline grounded in Dynamic Epistemic Logic (DEL) that incrementally performs the same task. Results on the annotated DPIP data indicate that it poses a challenge to modern LLMs' abilities to track both task progression and belief state.

2603.00777 2026-05-26 cs.CV

DUCX: Decomposing Unfairness in Tool-Using Chest X-ray Agents

Zikang Xu, Ruinan Jin, Xiaoxiao Li

Affiliations Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Anhui, China; The University of British Columbia, Vancouver, BC V6Z 1Z4, Canada; Vector Institute, Toronto, ON M5G 1M1, Canada

AI summary Proposes the DUCK framework, which uses a stage-wise fairness decomposition to systematically audit tool exposure bias, tool transition bias, and model reasoning bias in tool-using chest X-ray agents, revealing subgroup disparities that end-to-end evaluation cannot predict.

Comments Early accepted by MICCAI 2026

Abstract

Fairness in medical agents is becoming critical as tool-using clinical AI systems orchestrate specialized vision and language modules for tasks such as chest X-ray question answering. While these medical AI agents can improve flexibility, their added pipeline complexity also creates new pathways for demographic bias beyond standalone models. We present DUCK, Decomposing Unfairness in Chest X-ray agents, a systematic audit of fairness in tool-using chest X-ray agents instantiated with MedRAX. To localize where disparities arise, we introduce a stage-wise fairness decomposition that separates end-to-end bias from three agent-specific sources: tool exposure bias, or utility gaps conditioned on tool presence; tool transition bias, or subgroup differences in tool-routing patterns; and model reasoning bias, or subgroup differences in synthesis behaviors. Extensive experiments on tool-using agentic frameworks across five driver backbones reveal that demographic gaps persist in end-to-end performance, with equalized odds up to 20.79% and the lowest fairness-utility tradeoff down to 28.65%. Intermediate behaviors, including tool usage, transition patterns, and reasoning traces, exhibit distinct subgroup disparities that are not predictable from end-to-end evaluation alone. For example, conditioned on segmentation-tool availability, the subgroup utility gap reaches as high as 50%. Our findings underscore the need for process-level fairness auditing and debiasing to ensure the equitable deployment of clinical agentic systems. Code: https://github.com/Nanboy-Ronan/DUCK.
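The equalized odds figure quoted above is a gap between subgroups in true positive and false positive rates. The abstract does not give the exact formula used, so the sketch below adopts one common convention as an assumption: the larger of the maximum TPR difference and the maximum FPR difference across groups. The records are made-up binary labels and predictions.

```python
from collections import defaultdict

def group_rates(records):
    """Return {group: (TPR, FPR)} from (group, y_true, y_pred) triples.

    Assumes binary labels and that every group has both positives and negatives.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1
    return {
        g: (c["tp"] / (c["tp"] + c["fn"]), c["fp"] / (c["fp"] + c["tn"]))
        for g, c in counts.items()
    }

def equalized_odds_gap(records):
    """Largest between-group difference in TPR or FPR (assumed convention)."""
    rates = group_rates(records)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 1), ("B", 0, 0),
]
gap = equalized_odds_gap(records)
```

A stage-wise audit of the kind described would compute such gaps not only on final answers but also on intermediate behavior, for example separately for cases where a given tool was or was not invoked.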

2603.00191 2026-05-26 cs.LG cs.CV

Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning

Lingfeng He, De Cheng, Huaijie Wang, Xi Yang, Nannan Wang, Xinbo Gao

Affiliations State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi'an, China; School of Electronic Engineering, Xidian University, Xi'an, China

AI summary Proposes LoDA, which builds general and task-specific LoRA subspaces through task-driven decomposition and combines gradient-aligned optimization with closed-form recalibration to achieve knowledge sharing and isolation, improving continual learning performance.

Comments Accepted by ICML 2026

AI中文摘要

持续学习要求模型在不遗忘旧知识的情况下顺序适应新任务。最近,低秩适应(LoRA)作为一种代表性的参数高效微调方法,在持续学习中受到越来越多的关注。几种基于LoRA的持续学习方法通过分离更新空间来减少任务间的干扰,通常从过去任务的估计零空间中构建新空间。然而,它们(i)忽略了任务共享方向,抑制了知识迁移;(ii)未能捕获真正有效的任务特定方向,因为旧任务的这些“零基”在相关任务下对新任务几乎保持不活跃。为了解决这个问题,我们从投影能量的角度研究LoRA的学习能力,并提出了低秩分解与适应(LoDA)。它通过解决两个基于能量的目标,执行任务驱动分解以构建通用和真正的任务特定LoRA子空间,解耦知识共享和隔离的方向。LoDA固定两个子空间上的LoRA下投影,并通过梯度对齐优化方法学习鲁棒的上投影。在每个任务之后,在将LoRA更新集成到主干之前,LoDA为通用更新推导出一个闭式重校准,沿着这个任务共享方向近似特征级联合最优。实验表明,LoDA优于现有的持续学习方法。我们的代码可在https://github.com/HHHLF/LoDA_ICML2026获取。

英文摘要

Continual Learning (CL) requires models to sequentially adapt to new tasks without forgetting old knowledge. Recently, Low-Rank Adaptation (LoRA), a representative Parameter-Efficient Fine-Tuning (PEFT) method, has gained increasing attention in CL. Several LoRA-based CL methods reduce interference across tasks by separating their update spaces, typically building the new space from the estimated null space of past tasks. However, they (i) overlook task-shared directions, which suppresses knowledge transfer, and (ii) fail to capture truly effective task-specific directions, since these "null bases" of old tasks can remain nearly inactive for a new task when tasks are correlated. To address this, we study LoRA learning capability from a projection energy perspective, and propose Low-rank Decomposition and Adaptation (LoDA). It performs a task-driven decomposition to build general and truly task-specific LoRA subspaces by solving two energy-based objectives, decoupling directions for knowledge sharing and isolation. LoDA fixes LoRA down-projections on two subspaces and learns robust up-projections via a Gradient-Aligned Optimization (GAO) approach. After each task, before integrating the LoRA updates into the backbone, LoDA derives a closed-form recalibration for the general update, approximating a feature-level joint optimum along this task-shared direction. Experiments indicate that LoDA outperforms existing CL methods. Our code is available at https://github.com/HHHLF/LoDA_ICML2026.

2602.23916 2026-05-26 cs.CV cs.AI

Topology-Driven Transferability Estimation of Medical Foundation Models for Segmentation

基于拓扑驱动的医学基础模型分割迁移性估计

Jiaqi Tang, Shaoyang Zhang, Xiaoqi Wang, Jiaying Zhou, Yang Liu, Qingchao Chen

发表机构 * Peking University(北京大学) Hohai University(河海大学) Beijing Normal University-Hong Kong Baptist University United International College(北京师范大学-香港浸会大学联合国际学院) National Institute of Health Data Science, Peking University(健康数据科学国家研究院,北京大学) Institute of Medical Technology, Peking University(北京大学医学技术研究院) State Key Laboratory of General Artificial Intelligence, Peking University(通用人工智能国家重点实验室,北京大学)

AI总结 提出拓扑驱动迁移性估计框架,通过全局表示拓扑散度、局部边界感知拓扑一致性和任务自适应融合,无需微调即可高效选择医学基础模型,在OpenMind基准上加权Kendall指标相对提升约31%。

AI中文摘要

大规模自监督学习(SSL)的出现产生了大量的医学基础模型。然而,为特定分割任务选择最优的医学基础模型仍然是一个计算瓶颈。现有的迁移性估计(TE)指标主要针对分类任务设计,依赖于全局统计假设,无法捕捉密集预测所需的拓扑复杂性。我们提出了一种新颖的拓扑驱动迁移性估计框架,评估流形可处理性而非统计重叠。我们的方法引入了三个组成部分:(1)全局表示拓扑散度(GRTD),利用最小生成树量化特征-标签结构同构性;(2)局部边界感知拓扑一致性(LBTC),专门在关键解剖边界评估流形可分离性;(3)任务自适应融合,根据目标任务的语义基数动态整合全局和局部指标。在跨不同解剖目标和SSL基础模型的大规模OpenMind基准上验证,我们的方法在加权Kendall指标上显著优于最先进的基线,相对提升约31%,提供了一种鲁棒的、无需训练的代理,用于高效模型选择而无需微调成本。代码将在接收后公开。

英文摘要

The advent of large-scale self-supervised learning (SSL) has produced a vast zoo of medical foundation models. However, selecting optimal medical foundation models for specific segmentation tasks remains a computational bottleneck. Existing Transferability Estimation (TE) metrics, primarily designed for classification, rely on global statistical assumptions and fail to capture the topological complexity essential for dense prediction. We propose a novel Topology-Driven Transferability Estimation framework that evaluates manifold tractability rather than statistical overlap. Our approach introduces three components: (1) Global Representation Topology Divergence (GRTD), utilizing Minimum Spanning Trees to quantify feature-label structural isomorphism; (2) Local Boundary-Aware Topological Consistency (LBTC), which assesses manifold separability specifically at critical anatomical boundaries; and (3) Task-Adaptive Fusion, which dynamically integrates global and local metrics based on the semantic cardinality of the target task. Validated on the large-scale OpenMind benchmark across diverse anatomical targets and SSL foundation models, our approach significantly outperforms state-of-the-art baselines by around 31% relative improvement in the weighted Kendall metric, providing a robust, training-free proxy for efficient model selection without the cost of fine-tuning. The code will be made publicly available upon acceptance.

2602.23872 2026-05-26 cs.CV cs.RO

Altitude-Adaptive Vision-Only Geo-Localization for UAVs in GPS-Denied Environments

GPS拒止环境下无人机的高度自适应纯视觉地理定位

Xingyu Shao, Mengfan He, Chunyu Li, Liangzheng Sun, Ziyang Meng

发表机构 * Department of Precision Instrument, Tsinghua University(清华大学精密仪器系) School of Aerospace Engineering, Beijing Institute of Technology(北京理工大学航天工程学院) School of Instrumentation Science and Opto-electronics Engineering, Beijing Information Science and Technology University(北京信息科技大学仪器科学与光电工程学院)

AI总结 针对无人机视觉位置识别中高度变化导致的尺度不匹配问题,提出一种基于单目视觉的高度自适应地理定位框架,通过频域变换估计相对高度并用于图像尺度归一化,结合分类-检索视觉位置识别模块实现粗定位,引入质量自适应边缘分类器提升检索鲁棒性。

AI中文摘要

为了解决无人机视觉位置识别中由高度大幅变化引起的尺度不匹配问题,我们提出了一种仅依赖单目视觉的高度自适应地理定位框架。该方法首先通过将输入图像转换到频域,并将高度估计建模为回归作为分类问题,从单张下视图像中估计相对高度。然后利用估计的高度将查询图像裁剪到规范尺度,之后通过分类-检索视觉位置识别模块进行粗定位。为了在图像质量变化的情况下提高检索鲁棒性,我们进一步引入了质量自适应边缘分类器,并通过加权坐标估计对最终位置进行精化,该估计基于前k个检索候选。在两个合成数据集和两个真实飞行数据集上的实验表明,相对高度估计模块在显著高度变化下,下游检索性能有显著提升。与使用相同检索流程但未进行高度归一化相比,我们的视觉位置识别模块通过高度自适应使平均R@1和R@5分别提高了41.50和56.83个百分点,完整系统在报告的工作站硬件上以13.3帧/秒运行。这些结果表明,相对高度估计为跨高度无人机地理定位提供了有效的尺度先验,并在无需辅助距离传感器或时间输入的情况下支持GPS拒止环境下的粗初始化。

英文摘要

To address the scale mismatch caused by large altitude variations in UAV visual place recognition, we propose a monocular vision-only altitude-adaptive geo-localization framework. The method first estimates relative altitude from a single downward-looking image by transforming the input into the frequency domain and formulating altitude estimation as a regression-as-classification (RAC) problem. The estimated altitude is then used to crop the query image to a canonical scale, after which a classification-then-retrieval visual place recognition module performs coarse localization. To improve retrieval robustness under varying image quality, we further introduce a quality-adaptive margin classifier (QAMC) and refine the final location by weighted coordinate estimation over the top retrieved candidates. Experiments on two synthetic datasets and two real-flight datasets show that the relative altitude estimation (RAE) module yields clear overall improvements in downstream retrieval performance under significant altitude changes. With our visual place recognition module, altitude adaptation improves average R@1 and R@5 by 41.50 and 56.83 percentage points, respectively, compared with using the same retrieval pipeline without altitude normalization, and the full system runs at 13.3 frames/s on the reported workstation hardware. These results indicate that relative altitude estimation provides an effective scale prior for cross-altitude UAV geo-localization and supports GPS-denied coarse initialization without auxiliary range sensors or temporal inputs.

2602.23217 2026-05-26 cs.CV cs.NA math.NA

Multidimensional Task Learning: A Unified Tensor Framework for Computer Vision Tasks

多维任务学习:计算机视觉任务的统一张量框架

Alaa El Ichi, Khalide Jbilou

发表机构 * Université du Littoral Côte d'Opale(滨海大学)

AI总结 提出基于广义爱因斯坦MLP的多维任务学习框架,通过张量运算统一分类、分割和检测等视觉任务,并证明其表达空间大于传统矩阵方法。

Comments This manuscript is under review at Pattern Recognition Letters

AI中文摘要

本文介绍了多维任务学习(MTL),这是一个基于广义爱因斯坦MLP(GE-MLPs)的统一数学框架,通过爱因斯坦积直接在张量上操作。我们认为当前的计算机视觉任务公式本质上受限于基于矩阵的思维:标准架构依赖于矩阵值权重和向量值偏置,需要结构展平,这限制了自然可表达任务的空间。GE-MLPs通过使用张量值参数消除了这一约束,使得能够显式控制哪些维度被保留或收缩,而不会丢失信息。通过严格的数学推导,我们证明了分类、分割和检测是MTL的特例,仅在正式定义的任务空间中的维度配置上有所不同。我们进一步证明,这个任务空间严格大于基于矩阵的公式所能原生表达的空间,从而能够实现原则性的任务配置,例如时空或跨模态预测,这些在传统方法下需要破坏性展平。这项工作为通过张量代数的视角理解、比较和设计计算机视觉任务提供了数学基础。

英文摘要

This paper introduces Multidimensional Task Learning (MTL), a unified mathematical framework based on Generalized Einstein MLPs (GE-MLPs) that operate directly on tensors via the Einstein product. We argue that current computer vision task formulations are inherently constrained by matrix-based thinking: standard architectures rely on matrix-valued weights and vector-valued biases, requiring structural flattening that restricts the space of naturally expressible tasks. GE-MLPs lift this constraint by operating with tensor-valued parameters, enabling explicit control over which dimensions are preserved or contracted without information loss. Through rigorous mathematical derivations, we demonstrate that classification, segmentation, and detection are special cases of MTL, differing only in their dimensional configuration within a formally defined task space. We further prove that this task space is strictly larger than what matrix-based formulations can natively express, enabling principled task configurations such as spatiotemporal or cross-modal predictions that require destructive flattening under conventional approaches. This work provides a mathematical foundation for understanding, comparing, and designing computer vision tasks through the lens of tensor algebra.

2602.19878 2026-05-26 cs.CL cs.LO

Axis-Aligned Semantics for ODRL: Resolving Dimensional Ambiguity in Policy Constraints

面向ODRL的轴对齐语义:解决策略约束中的维度歧义

Daham Mustafa, Diego Collarana, Sabrina Kirrane, Christoph Lange, Christoph Quix, Rafiqul Haque, Yixin Peng, Stefan Decker

发表机构 * RWTH Aachen University(亚琛工业大学) Fraunhofer FIT(弗劳恩霍夫应用信息技术研究所) University of Galway(戈尔韦大学) Vienna University of Economics and Business (WU)(维也纳经济大学)

AI总结 针对ODRL中多轴操作数导致的维度歧义问题,提出轴分解方法,将约束转化为每个轴上的标量操作,从而将冲突检测简化为盒比较,并定义三值语义,通过基准测试验证了方法的正确性和兼容性。

Comments 17 pages. Preprint. v3: expanded benchmark to 256 problems; revised semantics and profile (OAAP)

AI中文摘要

开放数字权利语言(ODRL)将策略约束表示为左操作数、运算符和值的三元组。然而,多个空间操作数涉及宽度、高度和深度等多轴域,而约束语法未提供明确的轴标识。因此,策略引擎无法确定多个约束是应用于同一轴还是不同轴,导致冲突检测不可靠或不完整。我们通过轴分解解决这一歧义,将多轴操作数替换为全序域上的轴特定标量操作数。每个约束表示每个轴上的一个区间,每个策略表示一个轴对齐的盒,从而将冲突检测简化为盒比较。我们定义了三值语义(冲突、兼容、未知),证明了分解的正确性及其与ODRL的向后兼容性,将其实例化为ODRL轴对齐配置文件(OAAP),并在包含256个ODRL策略问题的基准测试上进行了验证,每个问题以Turtle表示并编译为一阶形式(TPTP)和SMT-LIB形式,使用了Vampire、E、Z3和cvc5求解器。

英文摘要

The Open Digital Rights Language (ODRL) represents policy constraints as triples of a left operand, an operator, and a value. Several spatial operands, however, range over multi-axis domains such as width, height, and depth, while the constraint syntax provides no explicit axis identity. As a result, policy engines cannot determine whether multiple constraints apply to the same axis or different ones, making conflict detection unsound or incomplete. We resolve this ambiguity by axis decomposition, replacing multi-axis operands with axis-specific scalar operands over totally ordered domains. Each constraint then denotes an interval per axis and each policy an axis-aligned box, reducing conflict detection to box comparison. We define a three-valued semantics (Conflict, Compatible, Unknown), prove the decomposition sound and backward compatible with ODRL, instantiate it as ODRL Axis-Aligned Profile (OAAP), and validate it on a benchmark of 256 ODRL policy problems, each expressed in Turtle and compiled to first-order (TPTP) and SMT-LIB form, using Vampire, E, Z3, and cvc5.
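
The reduction of conflict detection to box comparison can be sketched as follows. This is an illustration under stated assumptions, not the OAAP definition: each policy is a dict mapping an axis name to a closed interval, a missing axis is treated as unconstrained, and a value of `None` marks an operand whose axis could not be identified. The mapping from overlap to the three verdicts is likewise assumed here: disjoint on any known axis gives Compatible, overlap on all axes gives Conflict, and anything else gives Unknown.

```python
import math

UNBOUNDED = (-math.inf, math.inf)

def compare(box_a, box_b):
    """Three-valued comparison of two axis-aligned boxes (assumed semantics)."""
    unknown = False
    for axis in set(box_a) | set(box_b):
        a = box_a.get(axis, UNBOUNDED)
        b = box_b.get(axis, UNBOUNDED)
        if a is None or b is None:
            # Axis identity is ambiguous, so overlap cannot be decided here.
            unknown = True
            continue
        if max(a[0], b[0]) > min(a[1], b[1]):
            # Disjoint on one axis means the boxes cannot intersect at all.
            return "Compatible"
    return "Unknown" if unknown else "Conflict"

permission = {"width": (0, 100), "height": (0, 50)}
prohibition = {"width": (80, 200)}
verdict = compare(permission, prohibition)
```

Because each axis is checked independently, the comparison is linear in the number of axes, which is what makes the per-axis scalar operands attractive.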

2602.18956 2026-05-26 cs.AI

INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

INDUCTION: 一阶逻辑中的有限结构概念合成

Serafim Batzoglou

发表机构 * Independent Researcher(独立研究者)

AI总结 提出INDUCTION基准,用于一阶逻辑中有限结构的概念合成,通过精确模型检查验证公式的正确性,并发现低冗余公式在未见世界上的泛化能力更强。

AI中文摘要

我们引入了INDUCTION,这是一个用于一阶逻辑中有限结构概念合成的基准。给定具有外延标记目标谓词的小型有限关系世界,模型必须输出一个单一的一阶逻辑公式,该公式统一解释所有世界中的目标,并通过精确模型检查验证其正确性。该基准包括三个设置:FullObs、CI(对比)和EC(存在性完成),并对公式冗余进行惩罚。我们发现了尖锐的难度梯度、持久的困难结构族,并观察到低冗余公式在未见世界上的泛化能力远优于高冗余公式。最新的精英模型在不同任务和性能指标上表现出定性不同的行为,暗示了它们不同的概念泛化策略。

英文摘要

We introduce INDUCTION, a benchmark for finite-structure concept synthesis in first-order logic. Given small finite relational worlds with extensionally labeled target predicates, models must output a single first-order logical formula that explains the target uniformly across worlds, with correctness verified via exact model checking. The benchmark includes three regimes, FullObs, CI (contrastive), and EC (existential completion), and penalizes formula bloat. We find sharp difficulty gradients and persistent hard structural families, and observe that low-bloat formulas generalize far better on held-out worlds. Elite recent models show qualitatively different behaviors across tasks and performance metrics, hinting at their different strategies of concept generalization.
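
Exact model checking on small finite worlds amounts to enumerating the domain. A minimal sketch, assuming a world is a dict with a `domain`, one binary relation `R`, and a labeled `target` set. This encoding and the example formulas are hypothetical, not the benchmark's actual format.

```python
def exists_successor(world, x):
    """phi(x) := exists y. R(x, y)"""
    return any((x, y) in world["R"] for y in world["domain"])

def all_successors(world, x):
    """psi(x) := forall y. R(x, y)"""
    return all((x, y) in world["R"] for y in world["domain"])

def explains(formula, worlds):
    """True iff the formula's extension equals the labeled target in every world."""
    return all(
        {x for x in w["domain"] if formula(w, x)} == w["target"]
        for w in worlds
    )

worlds = [
    {"domain": {0, 1, 2}, "R": {(0, 1), (1, 2)}, "target": {0, 1}},
    {"domain": {0, 1}, "R": {(1, 0)}, "target": {1}},
]
```

Here `explains(exists_successor, worlds)` holds while `explains(all_successors, worlds)` does not, since a candidate must match the labels in every world, not just one.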

2602.16340 2026-05-26 cs.LG stat.ML

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

Adam和Muon在光滑齐次神经网络上的隐式偏差

Eitan Gronich, Gal Vardi

发表机构 * Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel(计算机科学与应用数学系,魏茨曼科学研究院,以色列雷霍沃特)

AI总结 研究动量优化器在光滑齐次模型上的隐式偏差,证明Muon、MomentumGD和Signum在衰减学习率下近似于最速下降轨迹,并偏向于对应边际最大化问题的KKT点,同时将分析扩展到Adam和混合范数优化器。

Comments ICML 2026. 8 pages, 1 figure (with appendix: 45 pages, 3 figures)

详情
AI中文摘要

我们研究了动量优化器在光滑齐次模型上的隐式偏差。我们证明,在衰减学习率调度下,像Muon(谱范数)、MomentumGD(ℓ2范数)和Signum(ℓ∞范数)这样的动量最速下降算法是近似最速下降轨迹,从而证明这些算法偏向于对应边际最大化问题的KKT点。我们将分析扩展到Adam(不含稳定性常数),它最大化ℓ∞边际,以及Muon-Signum和Muon-Adam,它们最大化混合范数。我们的实验证实了理论,并表明最大化的边际类型取决于优化器的选择。总体而言,我们的结果扩展了早期关于齐次模型中最速下降和线性模型中动量优化器的工作线。

英文摘要

We study the implicit bias of momentum-based optimizers on smooth homogeneous models. We show that momentum steepest descent algorithms like Muon (spectral norm), MomentumGD (ℓ2 norm), and Signum (ℓ∞ norm) are approximate steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms have a bias towards KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the ℓ∞ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid norm. Our experiments corroborate the theory and show that the identity of the margin maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and momentum-based optimizers in linear models.
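
To make the ℓ∞ case concrete, here is a minimal sketch of one Signum update: an exponential moving average of the gradient, followed by a step along its sign. The sign vector is the steepest descent direction with respect to the ℓ∞ norm. The inverse-square-root schedule and all names are assumptions for illustration; the abstract only says the learning rate decays.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def signum_step(w, m, grad, lr, beta=0.9):
    """One Signum update on plain Python lists; returns the new (w, m)."""
    m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, grad)]
    w = [wi - lr * sign(mi) for wi, mi in zip(w, m)]
    return w, m

def decayed_lr(base_lr, step):
    """Assumed decaying schedule: base_lr / sqrt(step + 1)."""
    return base_lr / (step + 1) ** 0.5
```

Every coordinate moves by the same magnitude `lr` regardless of gradient scale, which is the intuition behind the ℓ∞ geometry of the margin this family is said to maximize.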

2602.11173 2026-05-26 cs.CL

Author-in-the-Loop Response Generation and Evaluation: Integrating Author Expertise and Intent in Responses to Peer Review

作者参与的回信生成与评估:将作者专业知识和意图整合到对同行评审的回复中

Qian Ruan, Iryna Gurevych

发表机构 * Ubiquitous Knowledge Processing Lab (UKP Lab)(泛在知识处理实验室) Department of Computer Science(计算机科学系) Hessian Center for AI (hessian.AI)(黑森州人工智能中心)

AI总结 提出作者参与的回信生成与评估框架,通过引入对齐的评审-回复-修订三元组数据集、支持灵活作者输入和可控生成的REspGen系统以及包含20+指标的综合评估套件REspEval,填补了作者信号利用和评估的空白。

Comments accepted to ACL 2026 Main Conference

AI中文摘要

作者回复(反驳)写作是科学同行评审的关键阶段,需要作者付出大量努力。在实践中,作者拥有领域专业知识、仅作者可用的信息和回复策略——作者专业知识和意图的具体形式——并寻求NLP辅助,将这些信号整合到作者回复生成(ARG)中。然而,这种作者参与范式缺乏正式的NLP表述和系统研究:没有数据集提供细粒度的作者信号,现有的ARG工作缺乏作者输入和控制,也没有评估指标衡量回复对作者信号的反映以及解决评审者关注点的有效性。为填补这些空白,我们引入了(i)Re3Align,第一个大规模的对齐评审-回复-修订三元组数据集,其中修订代理作者信号;(ii)REspGen,一个作者参与的ARG框架,支持灵活的作者输入、多属性控制和评估引导的细化;以及(iii)REspEval,一个包含20多个指标的全面评估套件,涵盖输入利用、可控性、回复质量和话语。使用SOTA LLMs的实验证明了作者输入和评估引导细化的好处、输入特异性对回复质量的影响以及可控性与质量之间的权衡。我们发布了我们的数据集、生成和评估工具。

英文摘要

Author response (rebuttal) writing is a critical stage of scientific peer review that demands substantial author effort. In practice, authors possess domain expertise, author-only information, and response strategies - concrete forms of author expertise and intent - and seek NLP assistance that integrates these signals into author response generation (ARG). Yet this author-in-the-loop paradigm lacks formal NLP formulation and systematic study: no dataset provides fine-grained author signals, existing ARG work lacks author inputs and controls, and no evaluation measures response reflection of author signals and effectiveness in addressing reviewer concerns. To fill these gaps, we introduce (i) Re3Align, the first large-scale dataset of aligned review-response-revision triplets, where revisions proxy author signals; (ii) REspGen, an author-in-the-loop ARG framework supporting flexible author input, multi-attribute control, and evaluation-guided refinement; and (iii) REspEval, a comprehensive evaluation suite with 20+ metrics spanning input utilization, controllability, response quality, and discourse. Experiments with SOTA LLMs demonstrate the benefits of author input and evaluation-guided refinement, the impact of input specificity on response quality, and controllability-quality trade-offs. We release our dataset, generation and evaluation tools.

2602.09130 2026-05-26 cs.LG

UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation

UniComp: 通过剪枝、量化和蒸馏对大型语言模型压缩的统一评估

Jonathan von Rad, Yong Cao, Andreas Geiger

发表机构 * University College London(伦敦大学学院) University of Tübingen(图宾根大学) Tübingen AI Center(图宾根人工智能中心)

AI总结 提出UniComp框架,统一评估剪枝、量化和知识蒸馏三种压缩方法,从性能、可靠性和效率三个维度在40个数据集上分析,发现知识偏差、性能与可靠性解耦以及任务特定校准可提升推理性能。

Comments 18 pages, 5 figures, 18 tables

AI中文摘要

模型压缩对于部署大型语言模型(LLM)日益重要,然而现有的比较研究主要集中在剪枝和量化,且主要基于知识中心的基准进行评估。因此,我们引入了UniComp,一个用于比较剪枝、量化和知识蒸馏的统一评估框架。UniComp从性能、可靠性和效率三个维度评估压缩模型,使用多样化的面向能力和安全性的基准以及硬件感知的效率分析。通过对40个数据集上的六种压缩技术进行评估,我们观察到:(i) 一致的知识偏差,即事实回忆基本保留,而多步推理、多语言和指令遵循能力下降;(ii) 性能与可靠性之间的解耦,表明保留的性能并不一致地意味着保留的可靠性;(iii) 任务特定校准可以在剪枝模型中实现高达50%的推理性能相对提升。

英文摘要

Model compression is increasingly essential for deploying large language models (LLMs), yet existing comparative studies largely focus on pruning and quantization evaluated primarily on knowledge-centric benchmarks. Thus, we introduce UniComp, a unified evaluation framework for comparing pruning, quantization, and knowledge distillation. UniComp evaluates compressed models along three dimensions: performance, reliability, and efficiency, using a diverse set of capability- and safety-oriented benchmarks together with a hardware-aware efficiency analysis. Through evaluation of six compression techniques across 40 datasets, we observe (i) a consistent knowledge bias, where factual recall is largely preserved while multi-step reasoning, multilingual, and instruction-following capabilities degrade; (ii) a decoupling between performance and reliability, indicating that retained performance does not consistently imply preserved reliability; and (iii) that task-specific calibration can yield up to 50% relative improvement of reasoning performance in pruned models.

2602.08615 2026-05-26 cs.CV

Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration

灵感种子:学习用于生成式探索的非字面视觉组合

Kfir Goldberg, Elad Richardson, Yael Vinker

发表机构 * MIT(麻省理工学院)

AI总结 提出Inspiration Seeds框架,通过CLIP稀疏自编码器提取编辑方向并隔离概念对,实现无需文本提示的两张输入图像的视觉组合生成,支持早期创意阶段的探索性构思。

Comments Project page available at https://kfirgoldberg.github.io/InspirationSeeds/

AI中文摘要

虽然生成模型已成为图像合成的强大工具,但它们通常针对执行精心设计的文本提示进行优化,对于想法形成之前常见的开放式视觉探索支持有限。相比之下,设计师经常从松散连接的视觉参考中汲取灵感,寻找能激发新想法的涌现连接。我们提出了Inspiration Seeds,这是一个将图像生成从最终执行转变为探索性构思的生成框架。给定两张输入图像,我们的模型生成多样且视觉连贯的组合,揭示输入之间的潜在关系,而无需依赖用户指定的文本提示。我们的方法是前馈式的,在完全通过视觉手段分解的合成三元组上训练:我们使用CLIP稀疏自编码器在CLIP潜在空间中提取编辑方向并隔离概念对。通过消除对语言的依赖并支持快速、直观的重组,我们的方法支持在创意工作的早期和模糊阶段进行视觉构思。

英文摘要

While generative models have become powerful tools for image synthesis, they are typically optimized for executing carefully crafted textual prompts, offering limited support for the open-ended visual exploration that often precedes idea formation. In contrast, designers frequently draw inspiration from loosely connected visual references, seeking emergent connections that spark new ideas. We propose Inspiration Seeds, a generative framework that shifts image generation from final execution to exploratory ideation. Given two input images, our model produces diverse, visually coherent compositions that reveal latent relationships between inputs, without relying on user-specified text prompts. Our approach is feed-forward, trained on synthetic triplets of decomposed visual aspects derived entirely through visual means: we use CLIP Sparse Autoencoders to extract editing directions in CLIP latent space and isolate concept pairs. By removing the reliance on language and enabling fast, intuitive recombination, our method supports visual ideation at the early and ambiguous stages of creative work.

2602.06357 2026-05-26 cs.LG

LLM-SAA: LLM-persona Generated Distributions for Decision-making

LLM-SAA:基于LLM人格生成分布的决策方法

Jackie Baek, Yunhan Chen, Ziyu Chi, Will Ma

发表机构 * Stern School of Business, New York University(纽约大学 Stern 商学院) Department of Computer Science, Columbia University(哥伦比亚大学 计算机科学系) Graduate School of Business and Data Science Institute, Columbia University(哥伦比亚大学 商学院和数据科学研究院)

AI总结 研究利用LLM生成分布(如模拟消费者支付意愿)支持下游决策,通过三个经典问题(分类优化、定价、报童模型)评估其实际效用,发现低数据场景下有效,且决策无关指标(如Wasserstein距离)可能误导。

AI中文摘要

LLM可以生成丰富的数据,从模拟人类估值和偏好的虚拟人格,到基于世界知识的需求预测。但这类LLM生成的分布对下游决策的支持程度如何?例如,在定价新产品时,企业可以提示LLM根据产品描述模拟消费者愿意支付的价格,但由此得到的分布对优化价格有多大用处?我们将这种方法称为LLM-SAA,即利用LLM构建估计分布,然后在该分布下优化决策。在本文中,我们研究基于这些分布所诱导的决策来评估其质量的指标。以三个经典决策问题(分类优化、定价和报童模型)为例,我们发现LLM生成的分布在实际中是有用的,尤其是在低数据场景下。我们还表明,在评估这些分布用于决策时,诸如Wasserstein距离等与决策无关的指标可能会产生误导。

英文摘要

LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generated distributions support downstream decision-making? For example, when pricing a new product, a firm could prompt an LLM to simulate how much consumers are willing to pay based on a product description, but how useful is the resulting distribution for optimizing the price? We refer to this approach as LLM-SAA, in which an LLM is used to construct an estimated distribution and the decision is then optimized under that distribution. In this paper, we study metrics to evaluate the quality of these LLM-generated distributions, based on the decisions they induce. Taking three canonical decision-making problems (assortment optimization, pricing, and newsvendor) as examples, we find that LLM-generated distributions are practically useful, especially in low-data regimes. We also show that decision-agnostic metrics such as Wasserstein distance can be misleading when evaluating these distributions for decision-making.
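
The pattern of estimating a distribution and then optimizing under it can be sketched for the newsvendor problem. The demand samples below stand in for LLM-persona outputs and are made up for illustration. The function names and the regret-style metric are assumptions, intended only to show what a decision-based evaluation could look like.

```python
def avg_profit(q, demand_samples, price, cost):
    """Average newsvendor profit of ordering q units over the demand samples."""
    n = len(demand_samples)
    return sum(price * min(q, d) - cost * q for d in demand_samples) / n

def saa_order(demand_samples, price, cost):
    """Sample average approximation: an optimal q lies among the sample values."""
    candidates = sorted(set(demand_samples))
    return max(candidates, key=lambda q: avg_profit(q, demand_samples, price, cost))

def decision_regret(est_samples, true_samples, price, cost):
    """Profit lost under the true demand by optimizing on the estimated demand."""
    q_est = saa_order(est_samples, price, cost)
    q_best = saa_order(true_samples, price, cost)
    return (avg_profit(q_best, true_samples, price, cost)
            - avg_profit(q_est, true_samples, price, cost))
```

Two estimated distributions at very different Wasserstein distances from the truth can induce the same order quantity and hence the same regret, which is one way a decision-agnostic metric can mislead.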

2602.04360 2026-05-26 cs.LG cs.AI cs.CY

Counterfactual Explanations for Hypergraph Neural Networks

超图神经网络的反事实解释

Fabiano Veglianti, Lorenzo Antonelli, Gabriele Tolomei

发表机构 * Department of Computer Control and Management Engineering, Sapienza University(计算机控制与管理工程系,萨皮恩扎大学) Department of Computer Science, Sapienza University(计算机科学系,萨皮恩扎大学)

AI总结 提出CF-HyperGNNExplainer方法,通过最小结构变化生成反事实超图,以解释超图神经网络的预测决策。

AI中文摘要

超图神经网络(HGNNs)有效建模了许多现实系统中的高阶交互,但仍难以解释,限制了其在高风险场景中的部署。我们引入了CF-HyperGNNExplainer,一种针对HGNNs的反事实解释方法,该方法识别改变模型预测所需的最小结构变化。该方法通过仅限于删除节点-超边关联或删除超边的可操作编辑生成反事实超图,产生简洁且结构上有意义的解释。在超图基准数据集上的大量实验表明,CF-HyperGNNExplainer生成了有效且简洁的反事实,突出了对HGNN决策最关键的高阶关系。

英文摘要

Hypergraph neural networks (HGNNs) effectively model higher-order interactions in many real-world systems but remain difficult to interpret, limiting their deployment in high-stakes settings. We introduce CF-HyperGNNExplainer, a counterfactual explanation method for HGNNs that identifies the minimal structural changes required to alter a model's prediction. The method generates counterfactual hypergraphs using actionable edits limited to removing node-hyperedge incidences or deleting hyperedges, producing concise and structurally meaningful explanations. Extensive experiments on hypergraph benchmark datasets show that CF-HyperGNNExplainer generates valid and concise counterfactuals, highlighting the higher-order relations most critical to HGNN decisions.

2602.03983 2026-05-26 cs.RO cs.CV

Efficient Long-Horizon Vision-Language-Action Models via Static-Dynamic Disentanglement

通过静态-动态解耦实现高效长程视觉-语言-动作模型

Weikang Qiu, Huashuo Lei, Tinglin Huang, Rex Ying

发表机构 * Yale University(耶鲁大学) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州))

AI总结 提出DySta框架,通过将视觉输入解耦为多级静态和动态令牌,减少上下文长度并复用KV缓存,实现高效多帧集成和推理,在基准测试和真实任务中显著提升性能。

AI中文摘要

视觉-语言-动作(VLA)模型最近成为通用机器人控制的一种有前景的范式。基于视觉-语言模型(VLM)架构,VLA模型根据视觉观察和语言指令预测动作,在任务中实现了强大的性能和泛化能力。然而,VLA模型面临两个主要挑战:输入帧的有限上下文窗口,以及由于二次注意力复杂性和大参数数量导致的低效推理。为此,我们提出了DySta,一个将视觉输入解耦为多级静态和动态令牌的框架,使得(1)在帧间保留静态令牌的单一副本以显著减少上下文长度,以及(2)通过轻量级重缓存门(仅在必要时更新)重用静态令牌的键值(KV)缓存。这种设计实现了高效的多帧集成和高效推理。此外,我们引入了一个新的基准测试,更有效地评估VLA模型的多帧集成能力。实验表明,DySta在我们的基准测试中各项指标上提高了24.5%的多帧集成能力,在真实世界记忆依赖任务中绝对成功率达到23.3%,同时在模拟基准测试中推理速度提升2.0倍(成功率+2.3%),在真实世界通用任务中推理速度提升2.2倍(成功率+10.6%)。

英文摘要

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for generalist robotic control. Built upon vision-language model (VLM) architectures, VLAs predict actions conditioned on visual observations and language instructions, achieving strong performance and generalization across tasks. However, VLAs face two major challenges: a limited context window for input frames and inefficient inference due to quadratic attention complexity and large parameter counts. To this end, we propose DySta, a framework that disentangles visual inputs into multi-level static and dynamic tokens, which enables (1) retaining a single copy of static tokens across frames to significantly reduce context length, and (2) reusing the key-value (KV) cache of static tokens through a lightweight recache gate that updates only when necessary. This design enables efficient multi-frame integration and efficient inference. In addition, we introduce a new benchmark that more effectively evaluates the multi-frame integration ability of VLAs. Experiments show that DySta improves multi-frame integration by 24.5% across metrics on our benchmark and 23.3% in absolute success rate on real-world memory-dependent tasks, while accelerating inference by 2.0x (with +2.3% success rate) on simulation benchmarks and 2.2x (with +10.6% success rate) on real-world general tasks.

2602.02843 2026-05-26 cs.CL

Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in Communication

行动还是澄清?建模沟通中对不确定性和成本的敏感性

Polina Tsvilodub, Karl Mulligan, Todd Snider, Robert D. Hawkins, Michael Franke

发表机构 * Department of Linguistics, University of Tübingen(图宾根大学语言系) Department of Linguistics, Stanford University(斯坦福大学语言系)

AI总结 提出基于预期遗憾的计算模型,研究人类在不确定性下是否选择提问澄清,取决于不确定性和错误行动成本之间的理性权衡。

Comments 6 pages, 3 figures, accepted to CogSci 2026

AI中文摘要

在不确定性下决定如何行动时,智能体可以选择行动以减少不确定性,也可以不顾不确定性而行动。在沟通场景中,减少不确定性的一个重要方式是提出澄清问题(CQs)。我们预测,是否提出CQ的决定取决于上下文不确定性和替代行动的成本,并且这些因素相互作用:当错误行动代价高昂时,不确定性最为重要。我们在一个基于预期遗憾的计算模型中形式化了这种相互作用:该模型衡量智能体在当前行动而非拥有完整信息时可能遭受的损失。我们在两个实验中测试了这些预测,一个实验考察对问题的纯语言回应,另一个扩展到在澄清和非语言行动之间的选择。综合来看,我们的结果表明一种理性权衡:人类倾向于在不确定性下行动时,根据可能遭受重大损失的风险比例来寻求澄清。

英文摘要

When deciding how to act under uncertainty, agents may choose to act to reduce uncertainty or they may act despite that uncertainty. In communicative settings, an important way of reducing uncertainty is by asking clarification questions (CQs). We predict that the decision to ask a CQ depends on both contextual uncertainty and the cost of alternative actions, and that these factors interact: uncertainty should matter most when acting incorrectly is costly. We formalize this interaction in a computational model based on expected regret: how much an agent stands to lose by acting now rather than with full information. We test these predictions in two experiments, one examining purely linguistic responses to questions and another extending to choices between clarification and non-linguistic action. Taken together, our results suggest a rational tradeoff: humans tend to seek clarification proportional to the risk of substantial loss when acting under uncertainty.
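
Expected regret, as described above, is the gap between the expected utility of acting with full information and the best expected utility of acting now. A minimal sketch, assuming a discrete belief over states and a utility function over action-state pairs. The rule of asking when regret exceeds a fixed question cost is an assumed decision rule, not necessarily the paper's model.

```python
def expected_regret(belief, utility, actions):
    """E_s[max_a u(a, s)] - max_a E_s[u(a, s)] for a discrete belief over states."""
    with_info = sum(p * max(utility(a, s) for a in actions)
                    for s, p in belief.items())
    act_now = max(sum(p * utility(a, s) for s, p in belief.items())
                  for a in actions)
    return with_info - act_now

def should_ask(belief, utility, actions, question_cost):
    """Assumed rule: ask a clarification question if regret exceeds its cost."""
    return expected_regret(belief, utility, actions) > question_cost

def make_utility(error_cost):
    """Utility 0 for the matching action, -error_cost for a mismatch."""
    return lambda action, state: 0.0 if action == state else -error_cost
```

With a uniform belief over two states, regret is half the error cost, so the same uncertainty triggers a question when mistakes are expensive and not when they are cheap. This is the interaction between uncertainty and cost that the abstract predicts.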

2602.02544 2026-05-26 cs.LG cs.AI

SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models

SPA-Cache: 扩散语言模型中的自适应缓存奇异代理

Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Zhao Jin, Jingyi Liao, Yongcheng Jing, Dacheng Tao

发表机构 * College of Computing and Data Science, Nanyang Technological University, Singapore(南洋理工大学计算与数据科学学院,新加坡)

AI总结 针对扩散语言模型因非因果特性无法使用标准KV缓存导致计算开销大的问题,提出SPA-Cache方法,通过低维奇异代理识别关键令牌并自适应分配缓存预算,实现高达8倍吞吐量提升和2-4倍加速。

Comments Accepted by ICML 2026. The code repository is available at https://github.com/wenhao728/spa-cache

AI中文摘要

尽管扩散语言模型(DLM)为自回归范式提供了一种灵活、任意顺序的替代方案,但其非因果特性排除了标准的KV缓存,迫使在每个解码步骤进行昂贵的隐藏状态重新计算。现有的DLM缓存方法通过选择性隐藏状态更新来降低这一成本;然而,它们仍然受限于(i)昂贵的逐令牌更新识别启发式方法和(ii)僵化的统一预算分配,未能考虑异构的隐藏状态动态。为了解决这些挑战,我们提出了SPA-Cache,它在DLM缓存中联合优化了更新识别和预算分配。首先,我们推导出一个低维奇异代理,能够在低维子空间中识别更新关键令牌,大幅降低更新识别的开销。其次,我们引入一种自适应策略,在不降低生成质量的情况下,为稳定层分配更少的更新。这些贡献共同显著提高了DLM的效率,相比原始解码实现了高达8倍的吞吐量提升,相比现有缓存基线实现了2-4倍的加速。

英文摘要

While Diffusion Language Models (DLMs) offer a flexible, arbitrary-order alternative to the autoregressive paradigm, their non-causal nature precludes standard KV caching, forcing costly hidden state recomputation at every decoding step. Existing DLM caching approaches reduce this cost by selective hidden state updates; however, they are still limited by (i) costly token-wise update identification heuristics and (ii) rigid, uniform budget allocation that fails to account for heterogeneous hidden state dynamics. To address these challenges, we present SPA-Cache, which jointly optimizes update identification and budget allocation in the DLM cache. First, we derive a low-dimensional singular proxy that enables the identification of update-critical tokens in a low-dimensional subspace, substantially reducing the overhead of update identification. Second, we introduce an adaptive strategy that allocates fewer updates to stable layers without degrading generation quality. Together, these contributions significantly improve the efficiency of DLMs, yielding up to an 8× throughput improvement over vanilla decoding and a 2-4× speedup over existing caching baselines.

2602.02474 2026-05-26 cs.CL cs.AI cs.LG

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

MemSkill:面向自进化智能体的可学习与进化记忆技能

Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, Wenya Wang

发表机构 * Nanyang Technological University(南洋理工大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of Illinois Chicago(伊利诺伊大学芝加哥分校) Tsinghua University(清华大学)

AI总结 提出MemSkill框架,将记忆操作转化为可学习和可进化的技能,通过控制器选择技能、执行器生成记忆、设计者进化技能集,形成闭环提升LLM智能体任务性能。

Comments Code is available at https://github.com/ViktorAxelsen/MemSkill

AI中文摘要

大多数大语言模型(LLM)智能体记忆系统依赖少量静态、手工设计的操作来提取记忆。这些固定程序硬编码了关于存储内容和如何修订记忆的人类先验知识,使其在多样化的交互模式下僵化,并在长历史记录上效率低下。为此,我们提出MemSkill,将这些操作重新定义为可学习和可进化的记忆技能,即从交互轨迹中提取、整合和修剪信息的结构化可重用例程。受智能体技能设计哲学的启发,MemSkill采用一个控制器,学习选择少量相关技能,并与基于LLM的执行器配对,生成技能引导的记忆。除了学习技能选择,MemSkill引入一个设计者,定期审查所选技能产生错误或不完整记忆的困难案例,并通过提出改进和新技能来进化技能集。共同地,MemSkill形成了一个闭环流程,改进了技能选择策略和技能集本身。在LoCoMo、LongMemEval、HotpotQA和ALFWorld上的实验表明,MemSkill在强基线上提高了任务性能,并在不同设置下具有良好的泛化能力。进一步分析揭示了技能如何进化,为LLM智能体更自适应、自进化的记忆管理提供了见解。

英文摘要

Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long histories. To this end, we present MemSkill, which reframes these operations as learnable and evolvable memory skills: structured and reusable routines for extracting, consolidating, and pruning information from interaction traces. Inspired by the design philosophy of agent skills, MemSkill employs a controller that learns to select a small set of relevant skills, paired with an LLM-based executor that produces skill-guided memories. Beyond learning skill selection, MemSkill introduces a designer that periodically reviews hard cases where selected skills yield incorrect or incomplete memories, and evolves the skill set by proposing refinements and new skills. Together, these components form a closed-loop procedure that improves both the skill-selection policy and the skill set itself. Experiments on LoCoMo, LongMemEval, HotpotQA, and ALFWorld demonstrate that MemSkill improves task performance over strong baselines and generalizes well across settings. Further analyses shed light on how skills evolve, offering insights toward more adaptive, self-evolving memory management for LLM agents.