2602.20141 2026-05-29 cs.AI

GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation

Nicolas Salvy, Hugues Talbot, Bertrand Thirion

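Affiliations: Inria, Palaiseau, France

AI summary: Targeting the hubness phenomenon in the high-dimensional embedding spaces used for generative model evaluation, proposes GICDM, a method based on the Iterative Contextual Dissimilarity Measure, which corrects neighborhood estimation through a multi-scale extension, restoring reliable metrics and aligning them with human evaluation.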

Comments Forty-third International Conference on Machine Learning, 2026

2602.15382 2026-05-29 cs.CL cs.CV cs.LG

The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems

Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He, Feijie Wu, Hoin Jung, Matt Fredrikson, Xiaoqian Wang, Jing Gao

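Affiliations: Purdue University; Contextual AI; Carnegie Mellon University; Georgia Institute of Technology

AI summary: Proposes the Vision Wormhole framework, in which a universal visual codec maps reasoning traces into a shared continuous space, enabling latent state transfer between heterogeneous VLMs without pairwise translators, reducing alignment complexity and improving efficiency.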

Comments Preprint. Work in progress

Abstract

Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain bottlenecked by discrete text communication, which imposes runtime overhead and information quantization loss. While latent state transfer offers an alternative, existing approaches either assume homogeneous sender-receiver architectures or rely on pair-specific learned translators, limiting scalability across diverse model families with disjoint manifolds. We reconceptualize the visual interface of Vision-Language Models (VLMs), trained for natural images, as a continuous communication channel between heterogeneous agents, and instantiate this idea as the Vision Wormhole: a Universal Visual Codec maps reasoning traces into a shared continuous reference space and injects them into the receiver's visual pathway, yielding cross-architecture latent state transfer without per-pair translators. The framework adopts a hub-and-spoke topology that reduces alignment complexity from $O(N^2)$ to $O(N)$, and is trained by label-free teacher-student distillation against the text channel, requiring no parallel hidden-state supervision. Extensive experiments across heterogeneous VLM families (Qwen-VL, Gemma, SmolVLM2, LFM2.5-VL) and nine reasoning benchmarks show that the Vision Wormhole reduces end-to-end wall-clock time across most evaluated settings and yields positive macro-average $\Delta$-accuracy.
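The $O(N^2)$ to $O(N)$ claim in the abstract above can be illustrated with a simple count. The sketch below is not taken from the paper: it assumes that a pairwise scheme needs one learned translator per ordered sender-receiver pair, and that a hub-and-spoke scheme needs one adapter per model into the shared reference space. The function names are hypothetical.

```python
def pairwise_translators(n: int) -> int:
    """Translators needed if every ordered (sender, receiver) pair gets its own."""
    return n * (n - 1)


def hub_adapters(n: int) -> int:
    """Adapters needed if each model only aligns to one shared reference space."""
    return n


# Compare how the two counts grow with the number of model families.
for n in (2, 4, 8, 16):
    print(f"N={n:2d}  pairwise={pairwise_translators(n):3d}  hub={hub_adapters(n):2d}")
```

Under these assumptions, adding one more model to a pool of N costs 2N new translators in the pairwise scheme but only one new adapter in the hub-and-spoke scheme.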

2602.15239 2026-05-29 cs.LG

Rare Event Analysis of Large Language Models

Jake McAllister Dorman, Edward Gillman, Dominic C. Rose, Jamie F. Mair, Juan P. Garrahan

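Affiliations: School of Physics and Astronomy, University of Nottingham

AI summary: Presents an end-to-end framework for systematically analyzing rare events in large language models, covering theory, efficient generation strategies, probability estimation, and error analysis, and illustrates its application with examples.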

Comments ICML 2026 Oral Spotlight

2602.06036 2026-05-29 cs.CL

DFlash: Block Diffusion for Flash Speculative Decoding

Jian Chen, Yesheng Liang, Zhijian Liu

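Affiliations: Z-Lab

AI summary: Proposes the DFlash framework, which uses a lightweight block diffusion model to generate drafts in parallel, conditioned on context features from the target model, achieving high-quality drafts and high acceptance rates. It delivers over 6x lossless acceleration across a range of models and tasks, with up to 2.5x higher speedup than the state-of-the-art speculative decoding method EAGLE-3.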

Comments Accepted at ICML 2026. Camera-ready version. Code: https://github.com/z-lab/dflash

Abstract

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel generation, but current diffusion models typically underperform compared with autoregressive models. In this paper, we introduce DFlash, a speculative decoding framework that employs a lightweight block diffusion model for parallel drafting. By generating draft tokens in a single forward pass and conditioning the draft model on context features extracted from the target model, DFlash enables efficient drafting with high-quality outputs and higher acceptance rates. Experiments show that DFlash achieves over 6x lossless acceleration across a range of models and tasks, delivering up to 2.5x higher speedup than the state-of-the-art speculative decoding method EAGLE-3.
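The draft-then-verify idea behind speculative decoding can be sketched in a few lines. This is a generic greedy-decoding illustration, not DFlash's implementation: `verify_draft` and `toy_target` are hypothetical names, the draft tokens are supplied by hand rather than by a block diffusion model, and the target is called in a loop for readability, whereas a real system would score all draft positions in one parallel forward pass.

```python
from typing import Callable, List


def verify_draft(
    prefix: List[int],
    draft: List[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens committed in one speculative step under greedy decoding.

    Draft tokens are accepted while they match the target's greedy choice.
    At the first mismatch the target's own token is committed and the step ends.
    If every draft token is accepted, one extra target token is appended.
    """
    committed: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            committed.append(expected)  # correction from the target model
            return committed
        committed.append(tok)
        context.append(tok)
    committed.append(target_next(context))  # bonus token after full acceptance
    return committed


def toy_target(context: List[int]) -> int:
    """Hypothetical target model: next token is (last token + 1) mod 10."""
    return (context[-1] + 1) % 10


print(verify_draft([1], [2, 3, 9, 5], toy_target))  # prints [2, 3, 4]
```

The committed tokens are exactly what greedy decoding with the target alone would have produced, which is the sense in which such schemes are lossless. The speedup comes from how many draft tokens are accepted per target pass.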

2602.05961 2026-05-29 cs.LG stat.ML

Discrete diffusion samplers and bridges: Off-policy algorithms and applications in latent spaces

Arran Carter, Sanghyeok Choi, Kirill Tamogashev, Víctor Elvira, Esmeralda S. Whitammer

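Affiliations: University of Edinburgh; CIFAR Fellow

AI summary: Proposes off-policy training techniques that improve the performance of discrete diffusion samplers, introduces data-to-energy Schrödinger bridge training for the discrete domain for the first time, and applies the samplers to data-free posterior sampling in the discrete latent spaces of image generative models.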

Comments ICML 2026. Code: https://github.com/mmacosha/offpolicy-discrete-diffusion-samplers-and-bridges

Abstract

Sampling from a distribution $p(x) \propto e^{-\mathcal{E}(x)}$ known up to a normalising constant is an important and challenging problem in statistics. Recent years have seen the rise of a new family of amortised sampling algorithms, commonly referred to as diffusion samplers, that enable fast and efficient sampling from an unnormalised density. Such algorithms have been widely studied for continuous-space sampling tasks; however, their application to problems in discrete space remains largely unexplored. Although some progress has been made in this area, discrete diffusion samplers do not take full advantage of ideas commonly used for continuous-space sampling. In this paper, we propose to bridge this gap by introducing off-policy training techniques for discrete diffusion samplers. We show that these techniques improve the performance of discrete samplers on both established and new synthetic benchmarks. Next, we generalise discrete diffusion samplers to the task of bridging between two arbitrary distributions, introducing data-to-energy Schrödinger bridge training for the discrete domain for the first time. Lastly, we showcase the application of the proposed diffusion samplers to data-free posterior sampling in the discrete latent spaces of image generative models.
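To make "known up to a normalising constant" concrete, the sketch below samples exactly from $p(x) \propto e^{-\mathcal{E}(x)}$ on a tiny discrete space by enumerating every state. It is a brute-force baseline for illustration only, not the paper's method. The energy function is a hypothetical toy (the number of disagreeing neighbouring bits), and enumeration costs $2^n$ evaluations for $n$ bits, which is the cost that learned samplers aim to avoid.

```python
import itertools
import math
import random


def energy(x):
    """Hypothetical toy energy: number of adjacent bit pairs that disagree."""
    return sum(a != b for a, b in zip(x, x[1:]))


def exact_distribution(num_bits):
    """Enumerate all bit strings and normalise exp(-energy) into probabilities."""
    states = list(itertools.product((0, 1), repeat=num_bits))
    weights = [math.exp(-energy(s)) for s in states]
    z = sum(weights)  # the normalising constant, tractable only for tiny spaces
    return states, [w / z for w in weights]


states, probs = exact_distribution(3)
sample = random.choices(states, weights=probs, k=1)[0]
print(sample)
```

Lower-energy states such as `(0, 0, 0)` receive more probability mass than higher-energy ones such as `(0, 1, 0)`.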

2602.03357 2026-05-29 cs.LG math.OC

Achieving Linear Speedup for Composite Federated Learning

Kun Huang, Shi Pu, Karl Henrik Johansson

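Affiliations: KTH Royal Institute of Technology; The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes FedNMap, a normal-map-based method that handles the nonsmooth term through normal map updates and uses a local correction strategy to mitigate data heterogeneity, achieving for the first time linear speedup with respect to both the number of clients and the number of local updates under nonconvex losses.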

Comments 38 pages, 19 figures

2602.02849 2026-05-29 cs.AI

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

Shaojie Wang, Liang Zhang

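Affiliations: Hong Kong University of Science and Technology (Guangzhou)

AI summary: Proposes a cognitively inspired two-stage post-training framework that learns generalizable strategies through Chain-of-Meta-Thought supervised learning and improves execution reliability through Confidence-Calibrated Reinforcement Learning, yielding improvements of 2.10% in-distribution and 3.86% out-of-distribution.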

Abstract

Current LLM post-training methods optimize complete reasoning trajectories through Supervised Fine-Tuning (SFT) followed by outcome-based Reinforcement Learning (RL). While effective, a closer examination reveals a fundamental gap: this approach does not align with how humans actually solve problems. Human cognition naturally decomposes problem-solving into two distinct stages: first acquiring abstract strategies (i.e., meta-knowledge) that generalize across problems, then adapting them to specific instances. In contrast, by treating complete trajectories as basic units, current methods are inherently problem-centric, entangling abstract strategies with problem-specific execution. To address this misalignment, we propose a cognitively-inspired framework that explicitly mirrors the two-stage human cognitive process. Specifically, Chain-of-Meta-Thought (CoMT) focuses supervised learning on abstract reasoning patterns without specific executions, enabling acquisition of generalizable strategies. Confidence-Calibrated Reinforcement Learning (CCRL) then optimizes task adaptation via confidence-aware rewards on intermediate steps, preventing overconfident errors from cascading and improving execution reliability. Experiments across four models and ten benchmarks show 2.10% and 3.86% improvements in-distribution and out-of-distribution respectively over standard methods, while remaining highly robust to variations in teacher model selection, optimization methods, and symbolic perturbations.

2601.21568 2026-05-29 cs.LG

Bridging Functional and Representational Similarity via Usable Information

Antonio Almudévar, Alfonso Ortega

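Affiliations: ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain

AI summary: Proposes a unified framework based on usable information that offers a theoretical and empirical synthesis along three dimensions (functional similarity, representational similarity, and the relationship between them), showing that representational similarity is sufficient but not necessary for functional similarity.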

Abstract

We present a unified framework for quantifying the similarity between representations through the lens of usable information, offering a rigorous theoretical and empirical synthesis across three key dimensions. First, addressing functional similarity, we establish a formal link between stitching performance and conditional mutual information. We further reveal that stitching is inherently asymmetric, demonstrating that robust functional comparison necessitates a bidirectional analysis rather than a unidirectional mapping. Second, concerning representational similarity, we find that reconstruction-based metrics and standard tools (e.g., CKA, RSA) act as estimators of usable information under specific constraints. Crucially, we show that similarity is relative to the capacity of the predictive family: representations that appear distinct to a rigid observer may be identical to a more expressive one. Third, we demonstrate that representational similarity is sufficient but not necessary for functional similarity. We unify these concepts through a task-granularity hierarchy: similarity on a complex task guarantees similarity on any coarser derivative, establishing representational similarity as the limit of maximum granularity: input reconstruction.

2601.21564 2026-05-29 cs.LG

Representation Unlearning: Forgetting through Information Compression

Antonio Almudévar, Alfonso Ortega

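Affiliations: ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain

AI summary: Proposes the Representation Unlearning framework, which performs unlearning directly by learning an information-bottleneck transformation in the model's representation space, without modifying model parameters, achieving reliable forgetting, utility retention, and computational efficiency.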

Abstract

Machine unlearning seeks to remove the influence of specific training data from a model, a need driven by privacy regulations and robustness concerns. Existing approaches typically modify model parameters, but such updates can be unstable, computationally costly, and limited by local approximations. We introduce Representation Unlearning, a framework that performs unlearning directly in the model's representation space. Instead of modifying model parameters, we learn a transformation over representations that imposes an information bottleneck: maximizing mutual information with retained data while suppressing information about data to be forgotten. We derive variational surrogates that make this objective tractable and show how they can be instantiated in two practical regimes: when both retain and forget data are available, and in a zero-shot setting where only forget data can be accessed. Experiments across several benchmarks demonstrate that Representation Unlearning achieves more reliable forgetting, better utility retention, and greater computational efficiency than parameter-centric baselines.

Catalyst-Agent: Autonomous heterogeneous catalyst screening with an LLM Agent

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Routing by Reaching: Composition of Pre-trained GFlowNets for Multi-Objective Generation

Recurrent Structural Policy Gradient for Partially Observable Mean Field Games

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

Who can we trust? LLM-as-a-jury for Comparative Assessment

GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation

The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems

Size Transferability of Graph Transformers with Convolutional Positional Encodings

OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search

S-MARC: Causal Streaming Reasoning for Full-Duplex Conversational Behavior Modeling

Coarse-Grained Boltzmann Generators

Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models

Beyond Transcripts: A Renewed Perspective on Audio Chaptering

Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure

Rare Event Analysis of Large Language Models

DFlash: Block Diffusion for Flash Speculative Decoding

Discrete diffusion samplers and bridges: Off-policy algorithms and applications in latent spaces

Achieving Linear Speedup for Composite Federated Learning

AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) Agents

How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning

Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations

Unsupervised Hierarchical Skill Discovery

Server-Proximal Aggregation for Federated Domain-Incremental Learning under Partial Participation: Task-Uniform Convergence and Backward Transfer

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

Bridging Functional and Representational Similarity via Usable Information

Representation Unlearning: Forgetting through Information Compression