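arXivDaily daily academic digest: syncs the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.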

2602.14279 2026-06-03 cs.LG cs.AI cs.CL cs.SI

Whom to Query for What: Adaptive Group Elicitation via Multi-Turn LLM Interactions

Ruomeng Ding, Tianwei Gao, Thomas P. Zollo, Eitan Bachmat, Richard Zemel, Zhun Deng

Affiliations: University of North Carolina at Chapel Hill; Columbia University; Ben-Gurion University of the Negev

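AI summary: Targeting the reduction of uncertainty about group properties under a limited budget, proposes an adaptive group elicitation framework that combines LLM-based expected information gain with heterogeneous graph neural network propagation to select questions and respondents jointly, significantly improving population-level response prediction on three real-world datasets.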

Comments: Published as a conference paper at ICML 2026

Abstract

Eliciting information to reduce uncertainty about latent group-level properties from surveys and other collective assessments requires allocating limited questioning effort under real costs and missing data. Although large language models enable adaptive, multi-turn interactions in natural language, most existing elicitation methods optimize what to ask with a fixed respondent pool, and do not adapt respondent selection or leverage population structure when responses are partial or incomplete. To address this gap, we study adaptive group elicitation, a multi-round setting where an agent adaptively selects both questions and respondents under explicit query and participation budgets. We propose a theoretically grounded framework that combines (i) an LLM-based expected information gain objective for scoring candidate questions with (ii) heterogeneous graph neural network propagation that aggregates observed responses and participant attributes to impute missing responses and guide per-round respondent selection. This closed-loop procedure queries a small, informative subset of individuals while inferring population-level responses via structured similarity. Across three real-world opinion datasets, our method consistently improves population-level response prediction under constrained budgets, including a >12% relative gain on CES at a 10% respondent budget.
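
The expected-information-gain criterion named above can be illustrated with a small, hedged sketch. It uses a discrete latent group property with a known prior and a hand-written table of answer likelihoods for each candidate question. In the paper these quantities come from an LLM. Here every name and number is hypothetical, and the GNN-based respondent selection is not modeled.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_information_gain(prior, likelihood):
    """EIG of one question: H(prior) - E_a[H(posterior | a)].

    prior:      prior[t] = P(theta_t) over latent hypotheses
    likelihood: likelihood[t][a] = P(answer a | theta_t)
    """
    eig = entropy(prior)
    for a in range(len(likelihood[0])):
        p_a = sum(prior[t] * likelihood[t][a] for t in range(len(prior)))  # marginal P(a)
        if p_a == 0:
            continue
        posterior = [prior[t] * likelihood[t][a] / p_a for t in range(len(prior))]
        eig -= p_a * entropy(posterior)
    return eig

def pick_question(prior, questions):
    """Return the candidate question whose answer is expected to be most informative."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))

prior = [0.5, 0.5]  # two hypotheses about a hypothetical group property
questions = {
    "informative": [[0.9, 0.1], [0.1, 0.9]],    # answers track the hypothesis
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],  # answers ignore the hypothesis
}
print(pick_question(prior, questions))  # informative
```

A question whose answers are independent of the latent property has zero expected gain, so the greedy rule never spends budget on it.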

2602.11908 2026-06-03 cs.AI cs.CL cs.LG

When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation

Shani Goren, Ido Galil, Ran El-Yaniv

Affiliations: Technion; NVIDIA

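AI summary: Addressing the tendency of LLMs to discard valuable information in long-form generation when confidence is low, proposes the Selective Abstraction framework, which replaces uncertain content with atom-level abstractions, improving accuracy and reliability while preserving meaning.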

Abstract

LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigate this risk is to equip models with uncertainty estimation mechanisms that abstain when confidence is low. However, this binary "all-or-nothing" approach is excessively restrictive in long-form settings, often discarding valuable information. We introduce Selective Abstraction (SA), a framework that enables LLMs to trade specificity for reliability by selectively reducing the detail of uncertain content. We first formalize SA through the lenses of selective risk and coverage. We then propose Atom-wise Selective Abstraction, a claim-level instantiation that decomposes responses into atomic claims (short, self-contained statements each expressing a single fact) and replaces uncertain atoms with higher confidence, less specific abstractions. To evaluate this framework, we develop a novel end-to-end pipeline for open-ended generation that instantiates risk as factual correctness and measures coverage using an information-theoretic measure of retained information. Across six open-source models on the FactScore and LongFact-Objects benchmarks, atom-wise SA consistently outperforms existing baselines, improving the area under the risk-coverage curve (AURC) by up to 27.73% over claim removal, demonstrating that reducing specificity can boost accuracy and reliability while preserving most of their original meaning.
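
The reported gain is measured in AURC, which can be sketched as follows under one simplifying assumption. Here coverage is the fraction of accepted items, whereas the paper measures coverage with an information-theoretic notion of retained information. The function name and toy data are illustrative only.

```python
def aurc(confidences, correct):
    """Area under the risk-coverage curve (lower is better).

    Items are accepted in order of decreasing confidence. At coverage k/n the
    selective risk is the error rate among the k accepted items, and AURC is
    the mean of these risks over k = 1..n.
    """
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    errors, risks = 0, []
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        risks.append(errors / k)
    return sum(risks) / len(risks)

correct = [True, True, False, False]
print(aurc([0.9, 0.8, 0.3, 0.2], correct))  # well-ranked confidences: low AURC
print(aurc([0.2, 0.3, 0.8, 0.9], correct))  # inverted confidences: high AURC
```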

2602.11804 2026-06-03 cs.CV eess.IV

Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data

Yiming Zhou, Xuenjie Xie, Panfeng Li, Albrecht Kunz, Ahmad Osman, Xavier Maldague

Affiliations: University of Cambridge

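AI summary: Proposes a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors, achieving higher segmentation accuracy than EfficientViT-SAM while training on only 11.2k samples (less than 0.1% of SA-1B).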

DOI: 10.1109/ICASSP55912.2026.11464597
Journal ref: ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1731-1735

Abstract

Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.

2602.10352 2026-06-03 cs.CL cs.AI cs.LG

Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs

Keenan Pepper, Alex McKenzie, Florin Pop, Stijn Servaes, Martin Leitgab, Mike Vaiana, Judd Rosenblatt, Michael S. A. Graziano, Diogo de Lucena

Affiliations: University of Washington

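AI summary: Training lightweight adapters on interpretability artifacts (a scalar affine adapter needs only d_model+1 parameters) while keeping the language model fully frozen yields reliable self-interpretation across tasks and model families. The approach substantially outperforms untrained baselines on sparse autoencoder feature labeling, topic identification, and decoding bridge entities in multi-hop reasoning.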

Comments: 26 pages, 18 tables, 17 figures. Code and data at https://github.com/agencyenterprise/selfie-adapters

Abstract

Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters on interpretability artifacts, while keeping the LM entirely frozen, yields reliable self-interpretation across tasks and model families. A scalar affine adapter with just d_model + 1 parameters suffices: trained adapters generate sparse autoencoder feature labels that outperform the training labels themselves (70% vs 50% generation scoring at 70B scale), identify topics with 94% recall@1 versus 1% for untrained baselines, and decode bridge entities in multi-hop reasoning that appear in neither prompt nor response, surfacing implicit reasoning without chain-of-thought. The learned bias vector alone accounts for 85% of improvement, and simpler adapters generalize better than more expressive alternatives. Controlling for model knowledge via prompted descriptions, we find self-interpretation gains outpace capability gains from 7B to 72B parameters. Our results demonstrate that self-interpretation improves with scale, without modifying the model being interpreted.
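
One reading of a scalar affine adapter with d_model + 1 parameters is a single learned scale plus a learned bias vector of size d_model. A minimal sketch under that reading follows. The point where the output is injected into the frozen model (for example, as a soft-prompt embedding) is an assumption, and so are the class and method names.

```python
class ScalarAffineAdapter:
    """Hypothetical adapter h -> scale * h + bias with d_model + 1 parameters."""

    def __init__(self, d_model, scale=1.0, bias=None):
        self.scale = scale  # 1 learned scalar
        self.bias = list(bias) if bias is not None else [0.0] * d_model  # d_model learned values
        if len(self.bias) != d_model:
            raise ValueError("bias must have length d_model")

    def num_parameters(self):
        return 1 + len(self.bias)

    def __call__(self, h):
        """Map an interpretability vector to the embedding fed to the frozen LM."""
        return [self.scale * x + b for x, b in zip(h, self.bias)]

adapter = ScalarAffineAdapter(d_model=4, scale=2.0, bias=[0.5, 0.0, 0.0, 0.0])
print(adapter.num_parameters())       # 5
print(adapter([1.0, 1.0, 1.0, 1.0]))  # [2.5, 2.0, 2.0, 2.0]
```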

2602.05302 2026-06-03 cs.AI

PieArena: Ranking and Profiling Language Agents in Realistic Negotiation Scenarios

Chris Zhu, Sasha Cui, Will Sanok Dufallo, Runzhi Jin, Zhen Xu, Linjun Zhang, Daylian Cain

Affiliations: Yale University; UC Berkeley; Bloomberg; Rutgers University

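AI summary: Introduces the PieArena benchmark, which evaluates LLM negotiation ability through multi-agent interactions, and develops a ranking model and behavioral profiles. It finds that joint-intentionality scaffolding substantially helps mid- and lower-tier models, and that a representative frontier model (GPT-5) matches or exceeds the human baseline in the evaluated negotiation settings.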

Abstract

We present an in-depth evaluation of LLMs' ability to negotiate, a central business task requiring strategic reasoning, theory of mind, and economic value creation. To do so, we introduce PieArena, a large-scale negotiation benchmark grounded in multi-agent interactions over realistic scenarios adapted from MBA negotiation courses at an elite business school. We evaluate language agents across three pairing regimes: mirror-play, cross-play, and human-LM play. We develop a ranking model for continuous negotiation payoffs that yields order-invariant, uncertainty-quantified leaderboards while correcting for systematic experimental asymmetries. We further study the effects of joint-intentionality agentic scaffolding and find asymmetric gains, with large improvements for mid- and lower-tier LMs and diminishing returns for frontier LMs. As calibration anchors, we collect human-human and human-LM negotiation data from trained business school students, finding that a representative frontier language agent (GPT-5) matches or exceeds this human baseline in our evaluation settings. Beyond deal outcomes, PieArena provides a multi-dimensional behavioral profile that reveals cross-model heterogeneity in instruction compliance, computation accuracy, as well as judge-assessed deception and reputation, illustrating the value of evaluation beyond outcome-only leaderboards.

2602.09708 2026-06-03 cs.LG cs.AI cs.CV cs.NA math.NA

Physics-informed diffusion models in spectral space

Davide Gallon, Philippe von Wurstemberger, Patrick Cheridito, Arnulf Jentzen

Affiliations: ETH Zürich

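AI summary: Proposes physics-informed spectral diffusion (PISD), which combines generative latent diffusion models with physics-informed machine learning. PDE parameters and solutions are modeled by diffusion in a latent space of spectral representations, and physical constraints and measurement conditions are enforced via diffusion posterior sampling. On the Poisson, Helmholtz, and incompressible Navier-Stokes equations, PISD shows higher accuracy and computational efficiency than existing diffusion-based solvers.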

Comments: 18 pages, 10 figures

Abstract

We propose physics-informed spectral diffusion (PISD), a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of partial differential equations (PDEs) conditioned on partial observations, which includes, in particular, forward and inverse PDE problems. We learn the joint distribution of PDE parameters and solutions via a diffusion process in a latent space of scaled spectral representations, where Gaussian noise corresponds to functions with controlled regularity. This spectral formulation enables significant dimensionality reduction compared to grid-based diffusion models and ensures that the induced process in function space remains within a class of functions for which the PDE operators are well defined. Building on diffusion posterior sampling, we enforce physics-informed constraints and measurement conditions during inference, applying Adam-based updates at each diffusion step. We evaluate the proposed approach on Poisson, Helmholtz, and incompressible Navier-Stokes equations, demonstrating improved accuracy and computational efficiency compared with existing diffusion-based PDE solvers, which are state of the art for sparse observations. Code is available at https://github.com/deeplearningmethods/PISD.
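
The abstract says Gaussian noise in a scaled spectral space corresponds to functions of controlled regularity. A hedged one-dimensional sketch shows why, using a sine basis on [0, 1] with zero boundary values. Latent coordinates z_k are i.i.d. N(0, 1) and are mapped to Fourier coefficients z_k k^(-s), so the exponent s sets how fast high frequencies decay. The basis, the exponent s, and the function names are assumptions, not PISD's exact parameterization.

```python
import math
import random

def decode(z, s, xs):
    """u(x) = sum_k k^(-s) * z_k * sin(k*pi*x): scaled spectral coords -> function values."""
    return [sum(zk * k ** -s * math.sin(k * math.pi * x) for k, zk in enumerate(z, start=1))
            for x in xs]

def sample_function(n_modes, s, xs, seed=0):
    """Decode i.i.d. standard Gaussian latent coordinates; larger s gives smoother samples."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_modes)]
    return decode(z, s, xs)

xs = [i / 100 for i in range(101)]
smooth = sample_function(64, s=2.0, xs=xs)  # high modes damped strongly
rough = sample_function(64, s=0.5, xs=xs)   # high modes damped weakly
```

Because every sample lives in the span of the chosen basis, boundary conditions and smoothness hold by construction. This is the property the abstract relies on for keeping PDE operators well defined.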

2602.08335 2026-06-03 cs.AI

Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System

Yanming Li, Xuelin Zhang, WenJie Lu, Ziye Tang, Maodong Wu, Haotian Luo, Tongtong Wu, Zijie Peng, Hongze Mi, Yibo Feng, Naiqiang Tan, Chao Huang, Lian Peng, Li Shen

Affiliations: University of Science and Technology of China

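AI summary: To address credit assignment in multi-agent systems, proposes the SHARP framework. SHARP achieves precise credit attribution through a decomposed reward mechanism (a global broadcast reward, a Shapley-based marginal-credit reward, and a tool-process reward), substantially improving reinforcement learning performance.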

Abstract

Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing methods typically rely on sparse or globally broadcast rewards, failing to capture individual contributions and leading to inefficient reinforcement learning. To address these limitations, we introduce the Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a novel framework for optimizing multi-agent reinforcement learning via precise credit attribution. SHARP effectively stabilizes training by normalizing agent-specific advantages across trajectory groups, primarily through a decomposed reward mechanism comprising a global broadcast-accuracy reward, a Shapley-based marginal-credit reward for each agent, and a tool-process reward to improve execution efficiency. Extensive experiments across various real-world benchmarks demonstrate that SHARP significantly outperforms recent state-of-the-art baselines, achieving average match improvements of 23.66% and 14.05% over single-agent and multi-agent approaches, respectively.
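
SHARP's per-agent marginal-credit reward builds on the Shapley value. As a hedged illustration only, exact Shapley credit for a tiny team can be computed by enumerating coalitions. The agent names and the reward function are hypothetical. So is the assumption that a coalition means the subset of agents whose outputs are kept, and the paper may use an approximation rather than exact enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley value of each agent under a coalition value function.

    value(frozenset_of_agents) -> team reward when only those agents contribute.
    Enumerates all 2^(n-1) coalitions per agent, so it only suits small teams.
    """
    n = len(agents)
    phi = {}
    for a in agents:
        others = [b for b in agents if b != a]
        total = 0.0
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                total += weight * (value(s | {a}) - value(s))  # marginal credit of a
        phi[a] = total
    return phi

agents = ["planner", "solver", "verifier"]

def team_reward(coalition):
    # Hypothetical task: success needs both the planner and the solver.
    return 1.0 if {"planner", "solver"} <= coalition else 0.0

phi = shapley_values(agents, team_reward)
print({a: round(v, 3) for a, v in phi.items()})  # {'planner': 0.5, 'solver': 0.5, 'verifier': 0.0}
```

Unlike a broadcast reward, which would give the idle verifier the same 1.0 as the others, the Shapley credits sum to the team reward and assign zero to an agent that never changes the outcome.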

2602.06960 2026-06-03 cs.CL cs.AI

InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Yuchen Yan, Liang Jiang, Jin Jiang, Shuaicheng Li, Zujie Wen, Zhiqiang Zhang, Jun Zhou, Jian Shao, Yueting Zhuang, Yongliang Shen

Affiliations: Tsinghua University

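AI summary: Proposes the InftyThink+ framework, which uses reinforcement learning to optimize when to summarize, what to keep, and how to resume in iterative reasoning. On DeepSeek-R1-Distill-Qwen-1.5B it improves AIME24 accuracy by 21% and reduces inference latency.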

Comments: ICML 2026: https://openreview.net/forum?id=tyul8kXaJU | Project page: https://zju-real.github.io/InftyThink-Plus | Code: https://github.com/ZJU-REAL/InftyThink-Plus | Models: https://huggingface.co/collections/yanyc/inftythink

Abstract

Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments on DeepSeek-R1-Distill-Qwen-1.5B show that InftyThink+ improves accuracy by 21% on AIME24 and outperforms conventional long chain-of-thought reinforcement learning by a clear margin, while also generalizing better to out-of-distribution benchmarks. Moreover, InftyThink+ significantly reduces inference latency and accelerates reinforcement learning training, demonstrating improved reasoning efficiency alongside stronger performance.

2602.07842 2026-06-03 cs.CL

Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers

Yuhan Wang, Shiyu Ni, Zhikai Ding, Zihang Zhan, Yuanzi Li, Keping Bi

Affiliations: State Key Laboratory of AI Safety; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Tsinghua University; Renmin University of China

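AI summary: Because existing confidence calibration methods fail on questions with multiple correct answers, proposes Semantic Confidence Aggregation (SCA). SCA aggregates confidence over multiple high-probability sampled responses and achieves state-of-the-art calibration in mixed-answer settings.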

Abstract

Confidence calibration is essential for making large language models (LLMs) reliable, yet existing training-free methods have been primarily studied under single-answer question answering. In this paper, we show that these methods break down in the presence of multiple valid answers, where disagreement among equally correct responses leads to systematic underestimation of confidence. To enable a systematic study of this phenomenon, we introduce MACE, a benchmark of 12,000 factual questions spanning six domains with varying numbers of correct answers. Experiments across 15 representative calibration methods and four LLM families (7B-72B) reveal that while accuracy increases with answer cardinality, estimated confidence consistently decreases, causing severe miscalibration for questions with mixed answer counts. To address this issue, we propose Semantic Confidence Aggregation (SCA), which aggregates confidence over multiple high-probability sampled responses. SCA achieves state-of-the-art calibration performance under mixed-answer settings while preserving strong calibration on single-answer questions.
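
The underestimation effect described above is easy to reproduce with a toy example. When probability mass is split across several equally correct answers, the confidence of any single answer looks low. The sketch uses lowercasing as a crude stand-in for semantic equivalence and sums the top-k answer frequencies as a hypothetical aggregation rule. It is not the paper's exact SCA procedure.

```python
from collections import Counter

def answer_distribution(samples):
    """Empirical probability of each distinct answer after trivial normalization."""
    counts = Counter(s.strip().lower() for s in samples)
    return {answer: c / len(samples) for answer, c in counts.items()}

def single_answer_confidence(samples):
    """Single-answer view: confidence = frequency of the most common answer."""
    return max(answer_distribution(samples).values())

def aggregated_confidence(samples, k):
    """Hypothetical aggregation: total probability of the k most frequent answers."""
    probs = sorted(answer_distribution(samples).values(), reverse=True)
    return sum(probs[:k])

# "Name a country that borders France": three correct answers share the mass.
samples = ["Spain", "spain", "Germany", "Italy", "Germany ", "Italy",
           "Spain", "Italy", "Germany", "Austria"]
print(round(single_answer_confidence(samples), 2))    # 0.3
print(round(aggregated_confidence(samples, k=3), 2))  # 0.9
```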

2602.07639 2026-06-03 cs.CL

Letting Tutor Personas Speak Up for LLMs: Learning Steering Vectors from Dialogue via Preference Optimization

Jaewook Lee, Alexander Scarlatos, Simon Woodhead, Andrew Lan

Affiliations: University of Massachusetts Amherst; Eedi

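AI summary: Trains steering vectors with preference optimization to extract tutor-persona information from human tutor-student dialogues, then uses them to control large language model behavior and produce diverse tutoring styles.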

Comments: Accepted to ACL 2026 BEA Workshop

Abstract

With the emergence of large language models (LLMs) as a powerful class of generative artificial intelligence (AI), their use in tutoring has become increasingly prominent. Prior works on LLM-based tutoring typically learn a single tutor policy and do not capture the diversity of tutoring styles. In real-world tutor-student interactions, pedagogical intent is realized through adaptive instructional strategies, with tutors varying the level of scaffolding, instructional directiveness, feedback, and affective support in response to learners' needs. These differences can all impact dialogue dynamics and student engagement. In this paper, we explore how tutor personas embedded in human tutor-student dialogues can be used to guide LLM behavior without relying on explicitly prompted instructions. We train a steering vector using preference optimization: an activation-space direction that guides model responses toward specific tutor personas. We find that this steering vector captures tutor-specific variation across dialogue contexts, improving semantic alignment with ground-truth tutor utterances and increasing preference-based evaluations, while largely preserving lexical similarity. Analysis of the learned scaling coefficients further reveals interpretable structure across tutors, corresponding to consistent differences in tutoring behavior. These results demonstrate that activation steering offers an effective and interpretable way for controlling tutor-specific variation in LLMs using signals derived directly from human dialogue data.
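
Applying a steering vector at inference time amounts to shifting a layer's hidden state along a learned direction. The minimal sketch below assumes one shared direction with a per-tutor scalar coefficient, which is one plausible reading of the learned scaling coefficients mentioned above. The layer choice, the preference-optimization training, and all names and values are hypothetical.

```python
def steer(hidden, direction, alpha):
    """Activation steering: add alpha times a direction to a hidden-state vector."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same size")
    return [h + alpha * v for h, v in zip(hidden, direction)]

# Hypothetical per-tutor scaling coefficients for one shared steering direction.
TUTOR_SCALES = {"tutor_a": 2.0, "tutor_b": -1.0}

def steer_for_tutor(hidden, direction, tutor):
    return steer(hidden, direction, TUTOR_SCALES[tutor])

direction = [0.5, 0.5, 0.0]
print(steer_for_tutor([1.0, 0.0, -1.0], direction, "tutor_a"))  # [2.0, 1.0, -1.0]
print(steer_for_tutor([1.0, 0.0, -1.0], direction, "tutor_b"))  # [0.5, -0.5, -1.0]
```

Keeping the direction shared and varying only a scalar per tutor is what would make such coefficients directly comparable across tutors.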

2511.16275 2026-06-03 cs.CL cs.AI

SeSE: Black-Box Uncertainty Quantification for Large Language Models Based on Structural Information Theory

Xingtao Zhao, Hao Peng, Dingli Su, Xianghua Zeng, Chunyang Liu, Jinzhi Liao, Philip S. Yu

Affiliations: School of Cyber Science and Technology, Beihang University; School of Computer Science and Engineering, Beihang University; Didi Chuxing; Laboratory for Big Data and Decision, National University of Defense Technology; Department of Computer Science, University of Illinois Chicago

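AI summary: Proposes SeSE, which performs black-box uncertainty quantification for large language models by constructing an optimal hierarchical abstraction of the semantic space and computing its structural entropy. SeSE theoretically generalizes semantic entropy and outperforms existing methods in long-form generation.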

Comments: Accepted by UAI 2026

Abstract

Reliable uncertainty quantification (UQ) is essential for deploying large language models (LLMs) in safety-critical scenarios, as it enables them to abstain from responding when uncertain, thereby avoiding hallucinations, i.e., plausible yet factually incorrect responses. However, while semantic UQ methods have achieved advanced performance, they overlook latent semantic structural information that could enable more precise uncertainty estimates. In this paper, we propose Semantic Structural Entropy (SeSE), a principled black-box UQ framework applicable to both open- and closed-source LLMs. To reveal the intrinsic structure of the semantic space, SeSE constructs its optimal hierarchical abstraction through an encoding tree with minimal structural entropy. The structural entropy of this encoding tree thus quantifies the inherent uncertainty within the LLM's semantic space after optimal compression. Additionally, unlike existing methods that primarily focus on simple short-form generation, we extend SeSE to provide interpretable, granular uncertainty estimation for long-form outputs. We theoretically prove that SeSE generalizes semantic entropy, the gold standard for UQ in LLMs, and empirically demonstrate its superior performance over strong baselines across 24 model-dataset combinations.
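
SeSE scores uncertainty with the structural entropy of an encoding tree over the semantic space. As a hedged sketch, the code below evaluates the standard structural-entropy formula for a depth-2 encoding tree, i.e., a partition of a weighted similarity graph, and keeps the lowest-entropy candidate. The nodes could be sampled responses and the edge weights their semantic similarity. SeSE's actual graph construction, tree depth, and optimization are not reproduced, and the graph and partitions are made up.

```python
import math
from collections import defaultdict

def structural_entropy(edges, partition):
    """Structural entropy (bits) of a depth-2 encoding tree over a weighted graph.

    edges:     {(u, v): weight}, each undirected edge listed once
    partition: list of node sets, one per internal tree node (module)
    """
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
    vol_g = sum(degree.values())
    module_of = {v: j for j, module in enumerate(partition) for v in module}
    cut = defaultdict(float)  # total weight of edges leaving each module
    for (u, v), w in edges.items():
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += w
            cut[module_of[v]] += w
    h = 0.0
    for j, module in enumerate(partition):
        vol_j = sum(degree[v] for v in module)
        if vol_j == 0:
            continue
        for v in module:  # leaf terms: locating a node inside its module
            if degree[v] > 0:
                h -= degree[v] / vol_g * math.log2(degree[v] / vol_j)
        # module term: entering module j from the rest of the graph
        h -= cut[j] / vol_g * math.log2(vol_j / vol_g)
    return h

# Hypothetical similarity graph: two triangles joined by one bridge edge.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
         (2, 3): 1.0}
candidates = {"two clusters": [{0, 1, 2}, {3, 4, 5}],
              "one cluster": [{0, 1, 2, 3, 4, 5}]}
best = min(candidates, key=lambda name: structural_entropy(edges, candidates[name]))
print(best)  # two clusters
```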

2602.06219 2026-06-03 cs.RO cs.AI

Coupled Local and Global World Models for Efficient First Order RL

Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti

Affiliations: Machines in Motion Laboratory, New York University, USA; LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Artificial and Natural Intelligence Toulouse Institute, Toulouse, France

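AI summary: Proposes training policies inside data-driven world models with a decoupled first-order gradient method that couples local and global world models for efficient gradient computation. The method significantly outperforms PPO in sample efficiency on the Push-T task and is further evaluated on a quadruped object-manipulation task.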

Comments: Project website: https://coupled-global-local-wm-rl.pages.dev/

Abstract

World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally complex to evaluate, posing a challenge for popular RL approaches that have been successfully used with simulators to solve complex locomotion tasks but still struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach through an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.

2602.05031 2026-06-03 cs.LG

Laplacian Representations for Decision-Time Planning

Dikshant Shehmar, Matthew Schlegel, Matthew E. Taylor, Marlos C. Machado

Affiliations: University of Cambridge

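AI summary: Proposes using the Laplacian representation, which captures state-space distances at multiple time scales, as the latent space for decision-time planning. Builds on it the hierarchical planning algorithm ALPS, which outperforms commonly used baselines on offline goal-conditioned reinforcement learning tasks.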

Comments: Accepted at ICML 2026

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Planning with a learned model remains a key challenge in model-based reinforcement learning (RL). In decision-time planning, state representations are critical as they must support local cost computation while preserving long-horizon structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. This representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, also mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.
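
A hedged sketch of the Laplacian representation on a toy problem follows. For a small corridor of states with known connectivity, the eigenvectors of the graph Laplacian with the smallest non-zero eigenvalues give an embedding in which Euclidean distance grows with the number of steps between states. The paper learns such representations from data and plans with them in ALPS. This exact eigendecomposition, the corridor, and the function names are illustrative assumptions, and the sketch requires NumPy.

```python
import numpy as np

def laplacian_representation(adjacency, d):
    """Embed each state with the d smallest non-trivial eigenvectors of L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]              # drop the constant eigenvector

def chain_adjacency(n):
    """Adjacency matrix of a hypothetical 1-D corridor with n states."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1.0
    return a

phi = laplacian_representation(chain_adjacency(10), d=1)

def latent_distance(i, j):
    return float(np.linalg.norm(phi[i] - phi[j]))

# Latent distances grow with the number of steps along the corridor.
print(latent_distance(0, 1) < latent_distance(0, 5) < latent_distance(0, 9))  # True
```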

2507.10419 2026-06-03 cs.LG cs.AI cs.CL stat.ML

Multiple Choice Learning of Low-Rank Adapters for Language Modeling

Victor Letzelter, Hugo Malard, Mathieu Fontaine, Gaël Richard, Slim Essid, Andrei Bursuc, Patrick Pérez

Affiliations: Institut National de la Recherche Scientifique (INRS)

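AI summary: Proposes LoRA-MCL, a training scheme that extends next-token prediction in language models with Multiple Choice Learning and low-rank adaptation so that diverse, plausible sentence continuations can be decoded at inference time.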

Comments: ICML 2026
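
Multiple Choice Learning trains K hypotheses jointly. For each example it updates only the hypothesis that fits best (winner-takes-all), which pushes the hypotheses to specialize on different plausible continuations. The sketch below computes that loss for a single token. Treating each hypothesis as one LoRA adapter follows the summary above. The function name, the toy distributions, and the use of the hard (not relaxed) winner-takes-all rule are assumptions.

```python
import math

def wta_loss(hypotheses, target):
    """Winner-takes-all Multiple Choice Learning loss for one target token.

    hypotheses: K next-token distributions (dicts token -> probability), e.g. one
    per low-rank adapter. Only the best hypothesis (lowest NLL) receives this loss.
    """
    nlls = [-math.log(max(h.get(target, 0.0), 1e-12)) for h in hypotheses]
    winner = min(range(len(nlls)), key=lambda k: nlls[k])
    return nlls[winner], winner

heads = [{"cat": 0.7, "dog": 0.3}, {"cat": 0.1, "dog": 0.9}]
loss, winner = wta_loss(heads, "dog")
print(winner, round(loss, 4))  # 1 0.1054
```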

2602.03681 2026-06-03 cs.CL cs.LG

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models

Difan Deng, Andreas Bentzen Winje, Lukas Fehring, Marius Lindauer

Affiliations: University of Copenhagen

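AI summary: Proposes the NAtS-L framework, which adaptively chooses linear or softmax attention for different tokens within the same layer to balance efficiency and expressivity.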

Comments: 21 pages, 12 figures

Abstract

The quadratic computational complexity of softmax transformers has become a bottleneck in long-context scenarios. In contrast, linear attention model families provide a promising direction towards a more efficient sequential model. These linear attention models compress past KV values into a single hidden state, thereby efficiently reducing complexity during both training and inference. However, their expressivity remains limited by the size of their hidden state. Previous work proposed interleaving softmax and linear attention layers to reduce computational complexity while preserving expressivity. Nevertheless, the efficiency of these models remains bottlenecked by their softmax attention layers. In this paper, we propose Neural Attention Search Linear (NAtS-L), a framework that applies both linear attention and softmax attention operations within the same layer on different tokens. NAtS-L automatically determines whether a token can be handled by a linear attention model, i.e., tokens that have only short-term impact and can be encoded into fixed-size hidden states, or require softmax attention, i.e., tokens that contain information related to long-term retrieval and need to be preserved for future queries. By searching for optimal Gated DeltaNet and softmax attention combinations across tokens, we show that NAtS-L provides a strong yet efficient token-level hybrid architecture.

2602.01483 2026-06-03 cs.LG cs.AI stat.ME

Causal Preference Elicitation

Edwin V. Bonilla, He Zhao, Daniel M. Steinberg

Affiliations: University of California, Berkeley

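AI summary: Proposes a Bayesian framework that concentrates the posterior over directed acyclic graphs by actively querying local edge relations, enabling expert-in-the-loop causal discovery.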
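
Given samples from a posterior over DAGs, one simple way to choose which local edge relation to ask an expert about is to pick the node pair whose relation (i→j, j→i, or no edge) is most uncertain across samples. This is a hedged heuristic sketch, not necessarily the paper's acquisition criterion. The DAG encoding as a set of (parent, child) pairs and the toy samples are hypothetical.

```python
import math
from itertools import combinations

def edge_state(dag, i, j):
    """Local relation between nodes i and j in a DAG stored as (parent, child) pairs."""
    if (i, j) in dag:
        return "i->j"
    if (j, i) in dag:
        return "j->i"
    return "none"

def most_uncertain_pair(dag_samples, nodes):
    """Node pair whose local edge relation has the highest entropy across samples."""
    best_pair, best_h = None, -1.0
    n = len(dag_samples)
    for i, j in combinations(nodes, 2):
        counts = {}
        for dag in dag_samples:
            state = edge_state(dag, i, j)
            counts[state] = counts.get(state, 0) + 1
        h = -sum(c / n * math.log(c / n) for c in counts.values())
        if h > best_h:
            best_pair, best_h = (i, j), h
    return best_pair

# Hypothetical posterior samples: A -> B is settled, the B-C relation is not.
samples = [{("A", "B"), ("B", "C")}, {("A", "B"), ("C", "B")},
           {("A", "B")}, {("A", "B"), ("B", "C")}]
print(most_uncertain_pair(samples, ["A", "B", "C"]))  # ('B', 'C')
```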

2512.00956 2026-06-03 cs.LG cs.CL

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

Jiale Chen, Vage Egiazarian, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

Affiliations: University of Tartu

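AI summary: Proposes WUSH, a non-orthogonal transform that combines a Hadamard backbone with a data-dependent second-moment component. It is derived from closed-form optimal transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers and is provably near-optimal. WUSH substantially improves low-bit quantization accuracy and admits an efficient GPU implementation.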

Comments: Published as a conference paper at the 43rd International Conference on Machine Learning (ICML 2026): https://openreview.net/forum?id=ZsECxUkbKB

Abstract

Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, covering both integer and floating-point formats. The resulting construction, WUSH, combines a Hadamard backbone with a data-dependent second-moment component to form a non-orthogonal transform that is provably near-optimal for FP and INT quantizers under mild assumptions while admitting an efficient fused GPU implementation. Empirically, WUSH improves W4A4 accuracy over the strongest Hadamard-based baselines (e.g., on Llama-3.1-8B-Instruct in MXFP4, it gains +2.8 average points with RTN and +0.7 with GPTQ) while delivering up to 5.8$\times$ per-layer throughput over BF16 via FP4 MatMul. Source code is available at https://github.com/IST-DASLab/WUSH.
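The quantizer setting can be sketched as follows. This shows only the fixed Hadamard baseline plus a symmetric INT4 round-to-nearest (RTN) quantizer with one AbsMax scale per block; the data-dependent, non-orthogonal WUSH component is not reproduced, and the block size of 8 and the INT4 range [-7, 7] are illustrative assumptions (MXFP4 uses an FP4 element format with shared power-of-two scales, which this sketch does not model).

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (self-inverse); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("block size must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in y]

def rtn_absmax_quantize(block, bits=4):
    """Symmetric INT round-to-nearest with one AbsMax scale for the whole block."""
    qmax = 2 ** (bits - 1) - 1              # 7 for INT4 (symmetric range [-7, 7])
    amax = max(abs(v) for v in block)
    scale = amax / qmax if amax > 0 else 1.0
    # Python's round() breaks exact ties to even.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

block = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, 8.0]   # one large outlier
q, s = rtn_absmax_quantize(block)
err_plain = mse(block, dequantize(q, s))                  # small entries all round to 0

rotated = fwht(block)                                     # outlier energy spread over the block
qr, sr = rtn_absmax_quantize(rotated)
err_rot = mse(block, fwht(dequantize(qr, sr)))            # rotate back before comparing
print(q, err_plain, err_rot)
```

Without rotation the single outlier sets the scale, so every small entry collapses to zero. Rotation spreads the outlier over the block; which error ends up smaller depends on the block contents.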

2510.02763 2026-06-03 cs.LG cs.AI

Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning

Nicholas LaHaye, Kelly M. Luis, Michelle M. Gierach

Affiliation: University of Colorado Boulder

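AI summary: Proposes SIT-FUSE, a self-supervised machine learning framework that fuses multi-sensor satellite reflectance with TROPOMI solar-induced fluorescence data and uses hierarchical deep clustering to produce harmful algal bloom severity and species classification products, validated for consistency with in-situ measurements in the Gulf of Mexico and Southern California.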

DOI: 10.1029/2025EA004881

Abstract

We present a self-supervised machine learning framework for detecting and mapping the severity and speciation of harmful algal blooms (HABs) using multi-sensor satellite data. By fusing reflectance data from operational polar-orbiting satellite-based instruments (VIIRS, MODIS, OLCI, and OCI) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering to segment phytoplankton cell abundance and species into interpretable classes, validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in environments where ground truth observations are limited, while enabling exploratory analysis via hierarchical embeddings, a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.

2601.23229 2026-06-03 cs.AI cs.CC

Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

Ali Asadi, Krishnendu Chatterjee, Ehsan Goharshady, Mehrdad Karrabi, Alipasha Montaseri, Carlo Pagano

Affiliations: Institute for Computer Science, Austrian Academy of Sciences; Concordia University

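AI summary: For the discounted-payoff problem on $(s,a)$-rectangular $L_\infty$ robust MDPs, proves that the policy iteration algorithm runs in strongly polynomial time for a fixed discount factor.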

Comments To Appear in The 39th Annual Conference on Learning Theory (COLT'26)

Abstract

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.
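A generic robust policy iteration for this setting can be sketched as below. It assumes the uncertainty set of each state-action pair is $\{p \in \Delta : \lVert p - \hat p_{s,a} \rVert_\infty \le \delta\}$ around a nominal kernel $\hat p$, with rewards maximized by the agent and transitions chosen adversarially. It is a textbook-style sketch, not the paper's algorithm, and it does not reproduce the strongly-polynomial analysis.

```python
def worst_case_expectation(p_hat, values, delta):
    """min of sum_i p_i * values_i over {p in simplex : max_i |p_i - p_hat_i| <= delta}.

    The feasible set is a box intersected with the simplex, so a greedy fill is optimal:
    start each p_i at its lower bound, then give the remaining mass to the
    lowest-value next states first (as in a fractional knapsack).
    """
    lo = [max(0.0, p - delta) for p in p_hat]
    hi = [min(1.0, p + delta) for p in p_hat]
    p = list(lo)
    remaining = max(0.0, 1.0 - sum(lo))
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(remaining, hi[i] - lo[i])
        p[i] += add
        remaining -= add
    return sum(pi * vi for pi, vi in zip(p, values))

def robust_q(s, a, V, P, R, gamma, delta):
    """Reward plus discounted worst-case next-state value for action a in state s."""
    return R[s][a] + gamma * worst_case_expectation(P[s][a], V, delta)

def robust_policy_evaluation(policy, P, R, gamma, delta, tol=1e-10):
    """Iterate the robust Bellman operator of a fixed policy (a gamma-contraction)."""
    V = [0.0] * len(P)
    while True:
        V_new = [robust_q(s, policy[s], V, P, R, gamma, delta) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new

def robust_policy_iteration(P, R, gamma, delta):
    policy = [0] * len(P)
    while True:
        V = robust_policy_evaluation(policy, P, R, gamma, delta)
        new_policy = []
        for s in range(len(P)):
            q = [robust_q(s, a, V, P, R, gamma, delta) for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement so that ties cannot cause cycling.
            new_policy.append(best if q[best] > q[policy[s]] + 1e-9 else policy[s])
        if new_policy == policy:
            return policy, V
        policy = new_policy

# P[s][a] is the nominal next-state distribution, R[s][a] the reward (toy numbers).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0], [2.0, 1.5]]
nominal_policy, V_nominal = robust_policy_iteration(P, R, gamma=0.9, delta=0.0)
robust_policy, V_robust = robust_policy_iteration(P, R, gamma=0.9, delta=0.1)
print(nominal_policy, V_nominal)
print(robust_policy, V_robust)
```

Because the nominal kernel lies inside every uncertainty set, the robust optimal values can never exceed the nominal ones; with `delta=0.0` the procedure reduces to ordinary policy iteration.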

2601.23169 2026-06-03 cs.LG cs.LO cs.SC

Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning

İlker Işık, Wenchao Li

Affiliations: University of California, Berkeley; Stanford University

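AI summary: Proposes a symbol-invariant Transformer mechanism that uses parallel embedding streams and aggregated attention to achieve invariance to renaming of interchangeable tokens, yielding significant performance gains on open-vocabulary tasks.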

Comments ICML 2026 Poster (Camera-Ready Version)

2601.22443 2026-06-03 cs.LG cs.CV stat.CO stat.ML

Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance

Jing Jia, Wei Yuan, Sifan Liu, Liyue Shen, Guanyang Wang

Affiliations: University of California, Berkeley; Stanford University

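AI summary: Studies the robustness of weak diffusion priors in inverse problems, using Bayesian consistency and local-correlation analysis to explain why they remain effective when measurements are highly informative.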

Comments 37 pages, ICML 2026 spotlight. Code: https://github.com/jjia131/weak-diffusion-priors-inverse-problem, Project Page: https://jjia131.github.io/weak-diffusion-priors-inverse-problem/

Abstract

Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely match the unknown signal. In practice, one often must use a mismatched or low-fidelity diffusion prior. Surprisingly, these weak priors often perform nearly as well as full-strength, in-domain baselines. We study when and why inverse solvers are robust to weak diffusion priors. Through extensive experiments, we find that weak priors succeed when measurements are highly informative (e.g., many observed pixels), and we identify regimes where they fail. To explain this behavior, we combine Bayesian-consistency theory with local-correlation analysis: the theory gives conditions under which high-dimensional measurements make the posterior concentrate near the true signal, while the correlation analysis shows that weak and stronger natural-image priors can share similar local spatial structure. These results provide a principled account of when weak diffusion priors can be used reliably. Code is available at https://github.com/jjia131/weak-diffusion-priors-inverse-problem.

2601.21683 2026-06-03 cs.LG

Can Local Learning Match Self-Supervised Backpropagation?

Wu S. Zihan, Ariane Delrocq, Wulfram Gerstner, Guillaume Bellec

Affiliation: University of Zurich

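AI summary: Through theoretical analysis and new algorithmic variants, shows that local self-supervised learning in deep nonlinear convolutional networks can approach the performance of global self-supervised learning with backpropagation, matching or surpassing the state of the art on image datasets.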

Comments Accepted at ICML 2026; Code is available at https://github.com/zihan-wu/local-SSL

Abstract

While end-to-end self-supervised learning with backpropagation (global BP-SSL) has become central for training modern AI systems, theories of local self-supervised learning (local-SSL) have struggled to build functional representations in deep neural networks. To establish a link between global and local rules, we first develop a theory for deep linear networks: we identify conditions for local-SSL algorithms (like Forward-forward or CLAPP) to implement exactly the same weight update as a global BP-SSL. Starting from the theoretical insights, we then develop novel variants of local-SSL algorithms to approximate global BP-SSL in deep non-linear convolutional neural networks. Variants that improve the similarity between gradient updates of local-SSL with those of global BP-SSL also show better performance on image datasets (CIFAR-10, STL-10, and Tiny ImageNet). The best local-SSL rule with the CLAPP loss function matches the performance of a comparable global BP-SSL with InfoNCE or CPC-like loss functions, and improves upon the state of the art for local SSL on these benchmarks.

2601.20844 2026-06-03 cs.LG cs.AI cs.IR

$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval

Zihao Wang, Hang Yin, Lihui Liu, Hanghang Tong, Yangqiu Song, Ginny Wong, Simon See

Affiliation: University of California, Berkeley

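AI summary: Studies the Minimal Embeddable Dimension (MED), proving that MED is $\Theta(k)$, independent of $m$, for inner product, Euclidean distance, and cosine similarity; for the Robust MED (RMED), derives the feasibility ceiling $\varepsilon_\star(m,k)$ and validates the theoretical results experimentally.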

Comments v2: fix broken citation. v3: ICML 2026

Abstract

This paper studies the Minimal Embeddable Dimension (MED): the least dimension in which there exists a configuration of $m$ object vectors such that every subset of size at most $k$ is exactly retrieved by score comparison. Our result shows that MED is $\Theta(k)$, independent of $m$, for inner product, Euclidean distance, and cosine similarity. We then consider Robust MED (RMED), where all vectors have unit norm and a score gap of $\varepsilon$ is required. We derive the $m$-dependent feasibility ceiling $\varepsilon_\star(m,k)=m/\sqrt{k(m-1)(m-k)}$, which approaches $1/\sqrt{k}$ when $m\gg k$, and a Gaussian centroid construction gives a robust witness upper bound in the feasible margin regime. Numerical simulations on synthetic top-$2$ retrieval with cyclic polytope and centroid query optimization confirm our theoretical claims. Experiments on the LIMIT and LIMIT-small datasets also show that simple embedding-based retrieval baselines can overfit and outperform the reported single-vector LLM embedding baseline. Both theoretical and empirical findings rule out a lack of exact geometric capacity as the obstruction.
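A short worked check of the limit of the feasibility ceiling: for $m > k \ge 1$, factoring $m^2$ out of $(m-1)(m-k)$ under the square root gives

$$
\varepsilon_\star(m,k) = \frac{m}{\sqrt{k(m-1)(m-k)}}
= \frac{1}{\sqrt{k\left(1-\frac{1}{m}\right)\left(1-\frac{k}{m}\right)}}
\;\xrightarrow[m\to\infty]{}\; \frac{1}{\sqrt{k}}.
$$

Since $(1-1/m)(1-k/m) < 1$, the ceiling approaches $1/\sqrt{k}$ from above. For example, with $k=2$ and $m=1000$, $\varepsilon_\star \approx 0.708$, already close to $1/\sqrt{2} \approx 0.707$.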

2601.12247 2026-06-03 cs.CL cs.AI cs.LG

Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models

Miao Li, Hanyang Jiang, Sikai Cheng, Hengyu Fu, Yuhang Cai, Baihe Huang, Tinghan Ye, Xuanzhou Chen, Pascal Van Hentenryck

Affiliations: Georgia Institute of Technology; University of California, Berkeley; University of Michigan

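AI summary: Proposes Plan-Verify-Fill (PVF), which performs hierarchical skeleton planning grounded by quantitative validation and uses a verification protocol for structured stopping, reducing the number of function evaluations by up to 65% while preserving accuracy.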

Abstract

Diffusion Language Models (DLMs) present a promising non-sequential paradigm for text generation, distinct from standard autoregressive (AR) approaches. However, current decoding strategies often adopt a reactive stance, underutilizing the global bidirectional context to dictate global trajectories. To address this, we propose Plan-Verify-Fill (PVF), a training-free paradigm that grounds planning via quantitative validation. PVF actively constructs a hierarchical skeleton by prioritizing high-leverage semantic anchors and employs a verification protocol to operationalize pragmatic structural stopping where further deliberation yields diminishing returns. Extensive evaluations on LLaDA-8B-Instruct and Dream-7B-Instruct demonstrate that PVF reduces the Number of Function Evaluations (NFE) by up to 65% compared to confidence-based parallel decoding across benchmark datasets, unlocking superior efficiency without compromising accuracy.

2501.17377 2026-06-03 cs.LG cs.AI

ASAP: Exploiting the Satisficing Generalization Edge in Neural Combinatorial Optimization

Han Fang, Paul Weng, Yutong Ban

Affiliation: GitHub

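AI summary: Addressing the brittleness of neural combinatorial optimization models under distribution shift, proposes ASAP, which decomposes decision-making into a proposal stage and a selection stage and uses MAML to strengthen online adaptation, improving generalization on 3D-BPP, TSP, and CVRP.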

Comments Accepted as poster of ICML-2026

Abstract

Deep Reinforcement Learning (DRL) has emerged as a promising approach for solving Combinatorial Optimization (CO) problems, such as the 3D Bin Packing Problem (3D-BPP), Traveling Salesman Problem (TSP), or Vehicle Routing Problem (VRP), but these neural solvers often exhibit brittleness when facing distribution shifts. To address this issue, we uncover the Satisficing Generalization Edge, which we validate both theoretically and experimentally: identifying a set of promising actions is inherently more generalizable than selecting the single optimal action. To exploit this property, we propose Adaptive Selection After Proposal (ASAP), a generic framework that decomposes the decision-making process into two distinct phases: a proposal policy that acts as a robust filter, and a selection policy as an adaptable decision maker. This architecture enables a highly effective online adaptation strategy where the selection policy can be rapidly fine-tuned on a new distribution. Concretely, we introduce a two-phase training framework enhanced by Model-Agnostic Meta-Learning (MAML) to prime the model for fast adaptation. Extensive experiments on 3D-BPP, TSP, and CVRP demonstrate that ASAP improves the generalization capability of state-of-the-art baselines and achieves superior online adaptation on out-of-distribution instances.
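The two-phase decision structure can be illustrated with a toy sketch in which both policies are reduced to fixed per-action scores. In ASAP both stages are learned policies and only the selection policy would be fine-tuned online; the scores, the shortlist size `k`, and the function name here are hypothetical, and the MAML training is not shown.

```python
def propose_then_select(proposal_scores, selection_scores, k):
    """Proposal stage keeps the top-k actions; selection stage picks among them."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(proposal_scores)),
                    key=lambda a: proposal_scores[a], reverse=True)
    shortlist = ranked[:k]                                      # filter: a set of promising actions
    return max(shortlist, key=lambda a: selection_scores[a])   # adaptable final choice

proposal = [0.1, 0.5, 0.3, 0.05]   # hypothetical proposal-policy scores per action
selection = [0.9, 0.2, 0.4, 0.99]  # hypothetical selection-policy scores per action
print(propose_then_select(proposal, selection, k=2))  # -> 2
```

Action 3 has the highest selection score but is filtered out by the proposal stage, so the selector only chooses within the shortlist `[1, 2]`.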

2601.17130 2026-06-03 cs.LG cs.CR

Impact of Graph Structure on Membership-Inference Risk for Graph Neural Networks

Megha Khosla

Affiliation: Delft University of Technology

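AI summary: Studies how graph structure affects node-level membership-inference risk in graph neural networks along two dimensions, training-graph construction and inference-time edge access, finding that snowball sampling hurts generalization while inference-time edge access substantially changes membership advantage.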

Comments Accepted for publication in PETS 2026

Abstract

Graph neural networks (GNNs) are widely used for tasks such as node classification and link prediction, but their use in sensitive settings raises concerns about training-data leakage. Prior work on privacy leakage in GNNs largely borrows assumptions from non-graph domains, overlooking the role of graph structure. We argue for a graph-specific analysis of privacy risk and study how graph structure affects node-level membership inference. We formalize membership inference (MI) over node-neighborhood tuples and investigate two important dimensions: (i) training-graph construction and (ii) inference-time edge access. We compare snowball sampling, a structure-aware procedure, with uniform random node sampling for constructing training graphs. Our experiments show that snowball sampling often hurts generalization relative to random sampling due to its coverage bias. In contrast, allowing access to inter-train-test edges at inference improves test accuracy and reduces the train-test gap, while also having a strong and setting-dependent effect on membership advantage. These results show that graph structure directly shapes privacy risk. We further show that the generalization gap, measured as the performance difference between training and test nodes, is an incomplete proxy for membership inference risk: membership advantage can rise or fall independently of changes in this gap, with inference-time edge access often playing a crucial role. Theoretically, we show that for node-level tasks, standard privacy-auditing results based on membership inference do not directly carry over to inductive graph settings, because training and test nodes are structurally dependent rather than interchangeable. We release the code and data at https://github.com/PriXAI/GraphStructurePrivacyAnalysis-public.
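One common way to quantify membership-inference risk is the advantage of a threshold attacker, defined as its true-positive rate on members minus its false-positive rate on non-members. The sketch below assumes per-node attack scores are already available (for example, the model's confidence on a node's true label, computed with whatever neighborhood edges the attacker can access at inference); the paper's exact attack and advantage estimator may differ.

```python
def membership_advantage(member_scores, nonmember_scores, threshold):
    """Advantage (TPR - FPR) of an attack that predicts "member" when score >= threshold."""
    tpr = sum(s >= threshold for s in member_scores) / len(member_scores)
    fpr = sum(s >= threshold for s in nonmember_scores) / len(nonmember_scores)
    return tpr - fpr

def best_advantage(member_scores, nonmember_scores):
    """Best advantage over thresholds taken from the observed scores."""
    candidates = set(member_scores) | set(nonmember_scores)
    return max(membership_advantage(member_scores, nonmember_scores, t) for t in candidates)

members = [0.9, 0.8, 0.7, 0.4]      # hypothetical attack scores on training nodes
nonmembers = [0.6, 0.3, 0.2, 0.1]   # hypothetical attack scores on held-out nodes
print(best_advantage(members, nonmembers))  # -> 0.75
```

Changing which edges are visible at inference changes the scores themselves, and therefore the advantage, even when train and test accuracy move in the same direction.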

2601.14569 2026-06-03 cs.CL cs.LG

Social Caption: Evaluating Social Understanding in Multimodal Models

Leena Mathur, Bhaavanaa Thumu, Youssouf Kebe, Louis-Philippe Morency

Affiliation: School of Computer Science, Carnegie Mellon University

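AI summary: Proposes SOCIAL CAPTION, a framework grounded in interaction theory that evaluates the social understanding of multimodal large language models along three dimensions (social inference, holistic social analysis, and directed social analysis), and analyzes the factors that affect performance.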

Comments 25 pages, 10 figures

2505.16014 2026-06-03 cs.CL

Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains

Yash Saxena, Ankur Padia, Mandar S Chaudhary, Kalpa Gunaratna, Srinivasan Parthasarathy, Manas Gaur

Affiliation: University of Washington

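AI summary: Proposes METEORA, which combines a DPO-tuned LLM that generates retrieval rationales, adaptive cutoff via statistical elbow detection, and verifier-based filtering to achieve interpretable, efficient, and robust evidence selection in sensitive domains without re-ranking.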

Comments ICML 2026

Abstract

Retrieval-Augmented Generation (RAG) systems deployed in sensitive domains must provide interpretable evidence selection and robust safeguards against data poisoning, yet current approaches rely on opaque similarity-based retrieval with arbitrary top-k cutoffs that offer no explanation for their selections and remain vulnerable to adversarial manipulation. METEORA replaces re-ranking with rationale-driven selection via three components: a DPO-tuned LLM that generates explicit retrieval rationales, an Evidence Chunk Selection Engine (ECSE) that uses those rationales with statistical elbow detection for adaptive cutoff determination, and a Verifier LLM that filters poisoned evidence using the same rationales. Across six datasets, METEORA achieves 13.41% higher recall, 21.05% higher precision (without expansion), an 80% reduction in evidence volume, a 33.34% improvement in answer accuracy, and a 4.4x improvement in adversarial robustness. Human evaluation confirms genuine interpretability (3.64/5 confidence; 86% ground-truth agreement), demonstrating that interpretability, efficiency, and robustness are synergistic rather than competing objectives. The code is available in the GitHub repository https://github.com/YashSaxena21/METEORA
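Adaptive cutoff can be illustrated with a minimal stand-in for elbow detection: sort the chunk scores and cut just before the largest drop. This largest-gap rule is an assumption chosen for simplicity, not METEORA's statistical elbow test, and the rationale-based chunk scores are taken as given.

```python
def adaptive_cutoff(scores):
    """Return how many top-ranked chunks to keep, cutting before the largest score drop."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return len(ranked)
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    elbow = max(range(len(gaps)), key=gaps.__getitem__)   # index of the largest drop
    return elbow + 1                                      # keep everything above it

scores = [0.9, 0.95, 0.3, 0.93, 0.4, 0.28, 0.35]   # hypothetical chunk relevance scores
print(adaptive_cutoff(scores))  # -> 3
```

Unlike a fixed top-k, the number of kept chunks follows the shape of the score distribution, so a query with few relevant chunks keeps few.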

2601.11667 2026-06-03 cs.LG cs.AI

Distill-then-Replace: Efficient Task-Specific Hybrid Attention Model Construction

Xiaojie Xia, Huigang Zhang, Chaoliang Zhong, Jun Sun, Yusuke Oishi

Affiliations: Fujitsu Research & Development Center CO., LTD; Fujitsu Research, FUJITSU LTD

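AI summary: Proposes Distill-then-Replace (DtR), which uses blockwise local distillation and a greedy layer-replacement strategy to efficiently convert a pretrained full-attention model into a task-specific hybrid-attention model, without retraining or neural architecture search.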

Abstract

Transformer architectures deliver state-of-the-art accuracy via dense full-attention, but their quadratic time and memory complexity with respect to sequence length limits practical deployment. Linear attention mechanisms offer linear or near-linear scaling yet often incur performance degradation. Hybrid models that integrate full and linear attention layers promise a balance between efficiency and expressiveness, but face two major challenges: training such hybrid models from scratch is computationally expensive, and manually designing the optimal placement of attention types is highly nontrivial. We propose DtR (Distill-then-Replace), which first transfers weights from the pretrained full-attention modules to their linear-attention counterparts through blockwise local distillation, and then applies a greedy layer replacement strategy that iteratively substitutes full attention blocks with linear ones while monitoring validation performance on the target task. DtR yields a task-specific hybrid model in a single efficient pass, without costly re-training or neural architecture search, and can be applied to any pretrained full-attention backbone for diverse downstream tasks.
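The greedy replacement stage can be sketched as follows, assuming the distillation stage has already produced a linear-attention counterpart for every layer. The `evaluate` callback, the acceptance rule (stay within a tolerance of the all-full-attention baseline), and the front-to-back layer order are hypothetical choices for illustration.

```python
def greedy_layer_replacement(num_layers, evaluate, tolerance, order=None):
    """Greedily swap full-attention layers for linear ones while validation stays close.

    evaluate(config) -> validation score (higher is better) for a list of
    "full"/"linear" layer types; it is a hypothetical user-supplied callback.
    A swap is kept if the score stays within `tolerance` of the all-full baseline.
    """
    config = ["full"] * num_layers
    baseline = evaluate(config)
    for i in (order if order is not None else range(num_layers)):
        trial = list(config)
        trial[i] = "linear"
        if baseline - evaluate(trial) <= tolerance:
            config = trial  # keep the replacement
    return config

# Toy validation function: each linear layer costs a fixed, made-up penalty.
penalty = [0.001, 0.05, 0.002, 0.001]
def toy_evaluate(config):
    return 1.0 - sum(p for p, kind in zip(penalty, config) if kind == "linear")

result = greedy_layer_replacement(4, toy_evaluate, tolerance=0.01)
print(result)  # -> ['linear', 'full', 'linear', 'linear']
```

Layer 1 stays full attention because replacing it alone would cost more than the tolerance; every other layer can be swapped at a small combined cost.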

2510.22491 2026-06-03 cs.LG cs.CE cs.CV

LAMP: Data-Efficient Linear Affine Weight-Space Models for Parameter-Controlled 3D Shape Generation and Extrapolation

Ghadi Nehme, Yanxia Zhang, Dule Shu, Matt Klenk, Faez Ahmed

Affiliation: GitHub

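AI summary: Proposes LAMP, which overfits signed distance function decoders from a shared initialization and aligns them in weight space, enabling controllable 3D generation and extrapolation under parameter constraints from few samples, and introduces a linearity-mismatch safety metric to ensure reliability.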

Abstract

Generating high-fidelity 3D geometries under explicit parameter constraints is central to engineering design, yet current methods often require large datasets and fail to provide reliable control beyond the training distribution. We introduce LAMP, a data-efficient framework for controllable and interpretable 3D generation that aligns signed distance function (SDF) decoders by overfitting each exemplar from a shared initialization, then generates new designs by solving a parameter-constrained affine mixing problem in the aligned weight space. To improve reliability, we propose a linearity-mismatch safety metric that detects when mixed decoders leave the valid local regime. We evaluate LAMP on DrivAerNet++, BlendedNet, and additional industry-level vehicle families, including sports cars, SUVs, and convertibles. LAMP enables controlled interpolation with as few as 50 samples, safe extrapolation up to 100% beyond training ranges, and performance-guided optimization under fixed parameters, outperforming conditional autoencoder and Deep Network Interpolation (DNI) baselines in extrapolation, data efficiency, and parameter fidelity. Our results demonstrate that LAMP advances controllable, data-efficient, and safe 3D generation for design exploration, dataset generation, and performance-driven optimization.
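The affine mixing step can be sketched for the simplest case of one scalar design parameter per exemplar: find weights that sum to one and reproduce the target parameter, then take the same affine combination of the (already aligned) decoder weights. Choosing the minimum-norm solution, the toy weights, and the single-parameter setup are assumptions for illustration; the paper's constrained problem may differ, and the linearity-mismatch safety metric is not reproduced.

```python
def affine_mixing_weights(params, target):
    """Minimum-norm alpha with sum(alpha) = 1 and sum(alpha_i * params_i) = target.

    Assumes one scalar design parameter per exemplar. Solves
    alpha = A^T (A A^T)^{-1} b with A = [1 ... 1; params] and b = [1, target].
    """
    n = len(params)
    s1 = sum(params)
    s2 = sum(p * p for p in params)
    det = n * s2 - s1 * s1            # determinant of the 2x2 matrix A A^T
    if det == 0:
        raise ValueError("need at least two distinct parameter values")
    lam1 = (s2 - s1 * target) / det
    lam2 = (n * target - s1) / det
    return [lam1 + lam2 * p for p in params]

def mix_weights(weight_vectors, alpha):
    """Affine combination of flattened decoder weights (assumed already aligned)."""
    return [sum(a * w[j] for a, w in zip(alpha, weight_vectors))
            for j in range(len(weight_vectors[0]))]

params = [1.0, 2.0, 3.0]                      # one hypothetical design parameter per exemplar
weights = [[p, 2 * p] for p in params]        # toy "decoder weights", linear in the parameter
alpha = affine_mixing_weights(params, 4.0)    # target outside the training range
mixed = mix_weights(weights, alpha)
print(alpha, mixed)
```

A negative weight signals extrapolation beyond the exemplar range. In this toy case the weights are exactly linear in the parameter, so the mix lands on the expected values; real decoders are only approximately linear, which is what a safety check would need to monitor.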
