2605.26108 2026-06-09 cs.CV (version update)

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

Yushi Huang, Xiangxin Zhou, Ruoyu Wang, Chi Zhang, Jun Zhang, Tianyu Pang

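Affiliations: Tencent Hunyuan; Hong Kong University of Science and Technology; Westlake University

AI summary: Proposes Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that combines distribution matching distillation with reward-guided reinforcement learning and achieves state-of-the-art text-to-image generation with only 4 inference steps.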

Comments Code and models are available at https://github.com/Harahan/RTDMD

Abstract

Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term. In the first stage, we introduce Ambient-Consistent Distribution Matching Distillation (AC-DMD), which performs subinterval-wise distribution matching and augments the fake score objective with a consistency regularizer to help the fake score model track the shifting generator distribution under limited updates. In the second stage, we jointly optimize both terms: for the reward maximization term, we derive a hybrid policy gradient that combines a GRPO-style estimator for the stochastic intermediate transitions with direct reward backpropagation through the deterministic final step, and further introduce step-subset GRPO (SubGRPO) to reduce variance. Experiments on SD3, SD3.5, and FLUX.2 demonstrate that RTDMD establishes new state-of-the-art results across preference, aesthetic, and compositional metrics with only 4 inference steps, outperforming previous few-step text-to-image generation methods. Code and models are available at https://github.com/Harahan/RTDMD.
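
The decomposition stated in the abstract can be checked in a few lines. The sketch below assumes the standard exponential tilting form, with $p_{\mathrm{T}}$ the teacher distribution, $r$ the reward, $\beta > 0$ a temperature, and $Z$ the normalizer. This notation is an assumption made here for illustration, and the paper's exact parameterization may differ.

```latex
\begin{align*}
p^{*}(x) &= \frac{1}{Z}\, p_{\mathrm{T}}(x)\, \exp\!\big(r(x)/\beta\big),
\qquad Z = \mathbb{E}_{x \sim p_{\mathrm{T}}}\!\big[\exp\!\big(r(x)/\beta\big)\big], \\
\mathrm{KL}\big(p_{\theta} \,\|\, p^{*}\big)
&= \mathbb{E}_{x \sim p_{\theta}}\!\big[\log p_{\theta}(x) - \log p_{\mathrm{T}}(x) - r(x)/\beta + \log Z\big] \\
&= \underbrace{\mathrm{KL}\big(p_{\theta} \,\|\, p_{\mathrm{T}}\big)}_{\text{distribution matching}}
 \;-\; \frac{1}{\beta}\,\underbrace{\mathbb{E}_{x \sim p_{\theta}}\big[r(x)\big]}_{\text{reward maximization}}
 \;+\; \log Z .
\end{align*}
```

Since $\log Z$ does not depend on the generator parameters $\theta$, minimizing the left-hand side amounts to matching the teacher while maximizing expected reward.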

2605.30226 2026-06-09 cs.RO cs.AI (version update)

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Zhongxi Chen, Yifan Han, Yanming Shao, Huanming Liu, Congsheng Xu, Xiaoyu Chen, Yao Mu, Wenzhao Lian

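Affiliations: Shanghai Jiao Tong University; CASIA (Institute of Automation, Chinese Academy of Sciences); Shanghai AI Laboratory; USTC (University of Science and Technology of China)

AI summary: Proposes the BORA framework, which builds an action-conditioned value-guidance critic offline and, online, freezes the VLA base and adds a human-in-the-loop chunk-wise residual adaptation mechanism. It targets the temporal inconsistency, sample inefficiency, and hardware risk caused by high-dimensional exploration in dexterous manipulation, and raises the average success rate by 33% across five real-world dexterous tasks.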

Comments 24 pages,11 figures

Abstract

Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding into real-world robotic manipulation. However, dexterous manipulation remains challenging for VLA policies due to high-dimensional hand control and compounding execution errors, which makes real-world RL post-training essential for bridging the gap between visually grounded action generation and physically reliable dexterous execution. However, high-dimensional dexterous exploration often triggers temporal inconsistency, sample inefficiency and hardware risks in the real world. To address these challenges, we propose BORA, an offline-to-online RL post-training framework designed for real-world dexterous VLA models. In the offline phase, BORA constructs a critic that takes both the VLM's cognition tokens and action chunks as inputs. This design enables action-conditioned value guidance, allowing the critic to evaluate dexterous hand motions beyond visual context alone. During the subsequent online phase, BORA freezes the VLA base and introduces a lightweight, Human-in-the-Loop (HiL) chunk-wise residual adaptation mechanism to mitigate real-world execution errors and further correct the offline-learned intents within the actual physical environment. By inheriting the offline critic and employing intervention-driven rewards, BORA effectively corrects execution discrepancies and adapts to real-world physical variances while preserving the pretrained policy as a stable prior. Extensive evaluations across five complex real-world dexterous tasks demonstrate that BORA significantly outperforms pure imitation learning and traditional decoupled RL baselines, achieving a 33% absolute increase in average success rate under standard settings and up to a 43% improvement in unseen object generalization.

2605.30184 2026-06-09 cs.LG physics.ao-ph (version update)

Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts

Fanny Lehmann, Firat Ozdemir, Yun Cheng, Torsten Hoefler, Sebastian Schemm, Benedikt Soja, Siddhartha Mishra

Affiliations: ETH AI Center; ETH Zurich; Swiss Data Science Center; Scalable Parallel Computing Lab; Dep. of Applied Mathematics and Theoretical Physics; University of Cambridge; Institute of Geodesy and Photogrammetry; Seminar for Applied Mathematics

AI summary: Using one-year rollouts of nine AI weather models, classifies long-term instabilities into three modes (blow-up, drift, and loss of seasonality) and finds that stability depends on how small spatiotemporal scales are handled.

2605.29920 2026-06-09 cs.LG (version update)

Midpoint Generative Models

Daniil Shlenskii, Nikita Gushchin, Lev Novitskiy, Dmitry V. Dylov, Alexander Korotin

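Affiliations: AXXX, Russia; Applied AI Institute, Russia; Kandinsky Lab, Russia

AI summary: Proposes Midpoint Generative Models (MGM), which use a symmetry of flow matching to define a midpoint divergence and train one-step generative models through a variational objective, with performance competitive with existing methods.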

Abstract

We introduce Midpoint Generative Models (MGM), a principled framework for training one-step generative models. MGM is based on a simple symmetry of Flow Matching with linear interpolation: when the two endpoint distributions coincide, the corresponding drift field vanishes at the midpoint time, $t=1/2$. We show that the norm of this field defines a valid discrepancy between distributions, which we call the Midpoint Divergence. We extend this discrepancy beyond the midpoint by introducing randomly flipped interpolations and further generalize it by replacing deterministic linear Flow Matching interpolations with symmetric stochastic interpolants, yielding a generalized Midpoint Divergence. Finally, we derive a variational formulation of our generalized divergence, yielding a tractable objective for training a one-step generator. The resulting MGM algorithm offers an effective and theoretically grounded approach to generative modeling, achieving competitive performance against existing one-step generative modeling methods.
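
The midpoint symmetry follows from a short exchangeability argument. The sketch below assumes independently sampled endpoints $x_0 \sim p_0$, $x_1 \sim p_1$ and the usual conditional-expectation form of the Flow Matching drift. The notation is an assumption made here for illustration and is not necessarily the paper's.

```latex
\begin{align*}
x_t &= (1-t)\,x_0 + t\,x_1, \qquad
v(x,t) = \mathbb{E}\big[\,x_1 - x_0 \,\big|\, x_t = x\,\big], \\
\text{if } p_0 = p_1:\quad
(x_0, x_1) &\overset{d}{=} (x_1, x_0)
\quad\text{and}\quad
x_{1/2} = \tfrac{1}{2}(x_0 + x_1) \text{ is invariant under the swap}, \\
\Rightarrow\quad
v\big(x, \tfrac12\big) &= \mathbb{E}\big[\,x_1 - x_0 \,\big|\, x_{1/2} = x\,\big]
 = \mathbb{E}\big[\,x_0 - x_1 \,\big|\, x_{1/2} = x\,\big]
 = -\,v\big(x, \tfrac12\big) = 0 .
\end{align*}
```

The converse direction, that a vanishing midpoint field implies equal endpoint distributions, is the part the paper claims to establish and is not shown here.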

2605.29823 2026-06-09 cs.AI (version update)

Quantifying and Optimizing Simplicity via Polynomial Representations

Tianren Zhang, Xiangxin Li, Minghao Xiao, Guanyu Chen, Feng Chen

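AI summary: Proposes polynomial representations as distribution-aware, low-dimensional surrogates for neural functions, approximating a network's predictive behavior with orthogonal polynomial bases, using the effective degree as a simplicity measure, and deriving a differentiable regularizer that improves generalization.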

Comments ICML 2026

Abstract

Deep networks often exhibit a preference for "simple" solutions, and such a simplicity bias is widely believed to play a key role in generalization. Yet a broadly applicable, quantitative measure of simplicity remains elusive. We introduce polynomial representations as a distribution-aware, low-dimensional surrogate for neural functions: we approximate a network's predictive behavior along data-dependent interpolation paths using orthogonal polynomial bases, yielding a compact functional representation. We show that the effective degree of this representation serves as a practical simplicity metric that is predictive of generalization across tasks and architectures, and consistently outperforms existing generalization proxies such as sharpness. Finally, polynomial representations naturally yield a differentiable simplicity regularizer, which consistently improves generalization in image and text classification, fine-tuning contrastive vision-language models, and reinforcement learning.

2504.19399 2026-06-09 cs.RO (version update)

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

Encheng Su, Jianyu Wu, Jinouwen Zhang, Qiucheng Yu, Chen Tang, Pengze Li, Lintao Wang, Aoran Wang, Xinzhu Ma, Shixiang Tang, Yizhou Wang, Houqiang Li

Affiliations: University of Science and Technology of China; Shanghai Jiao Tong University; Shanghai AI Laboratory; City University of Hong Kong; The Chinese University of Hong Kong; Fudan University; The University of Sydney; Beihang University

AI summary: Proposes the S3MEM framework, which uses structured scene-event memory and anchor-sensitive retrieval to achieve a better accuracy-efficiency trade-off than generic memory interfaces in long-horizon interactive question answering.

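AI abstract (translated from Chinese)

Long-horizon interactive agents typically accumulate large trajectory histories yet still cannot reliably answer questions about earlier events. We argue that the main bottleneck is not only context length but the trajectory-to-answer interface of long-term memory. When histories are stored as plain text chunks and queried with standard retrieval-augmented generation (RAG), the system often retrieves evidence that is locally relevant but forms an incomplete chain, especially for spatial, temporal, repeated-event, and multi-hop state questions. We propose S3MEM, a structured scene-event episodic memory framework for long-horizon interactive question answering (QA). S3MEM writes trajectories into structured memory units, retrieves evidence through anchor-sensitive retrieval, and provides a compact, token-budget-aware evidence interface for answer-time reasoning. In this sense, S3MEM is a structured evidence-utilization tool that converts agent trajectories into query-aligned support. We evaluate S3MEM on two internal headline environments (Crafter, Jericho) and two external environments (SciWorld, ALFWorld). Under a shared frozen answer-time protocol, S3MEM consistently outperforms Vanilla RAG in all four environments, exceeds Graph-NoReader on Crafter, Jericho, and ALFWorld, and matches it on SciWorld, while using significantly fewer evidence tokens. Three adapted recent baselines (A-MEM-inspired, MemoryOS-adapted, and LightMem-adapted) outperform Vanilla RAG in several settings, but none reaches S3MEM's overall accuracy-efficiency frontier. Overall, the evidence supports a bounded conclusion: under the current frozen answer-time protocol, structured writing and anchor-sensitive evidence routing offer a stronger accuracy-efficiency frontier for long-horizon interactive QA than generic memory interfaces.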

Abstract

Long-horizon memory question answering often requires sparse evidence from heterogeneous histories, including events, object states, visual observations, temporal relations, and causal steps. Existing memory interfaces expand reader context, retrieve semantically related chunks, or expose graph neighborhoods, but they are not explicitly designed to select compact evidence for a fixed reader. We propose Structured Spatiotemporal Scene--Event Memory (S3Mem), a query-time memory interface that writes textual, visual, and agent-use histories into structured scene--event units and routes compact evidence packs to the reader. Its router scores candidate units, query anchors, and anchor--support links, enabling both single-hop selection and short multi-hop evidence chains without reader fine-tuning or test-time training. Across LoCoMo, EMemBench Visual Games, and AMA-Bench, S3Mem provides a strong score--token trade-off, with the clearest gains on localized event, state, temporal, causal, or provenance evidence. On LoCoMo, S3Mem reaches $0.48$ F1 and $0.40$ BLEU with $1{,}073$ evidence tokens per question, about $15.8\times$ fewer than the LoCoMo reference. On EMemBench Visual Games, it obtains the best F1 and second-best accuracy with only $189$ tokens. On AMA-Bench, it is not the highest-scoring method, but remains competitive while using the fewest reader-visible evidence tokens.

2605.19276 2026-06-09 cs.CL cs.LG (version update)

OpenCompass: A Universal Evaluation Platform for Large Language Models

Maosong Cao, Kai Chen, Haodong Duan, Yixiao Fang, Zhiwei Fei, Tong Gao, Ge Jiaye, Mo Li, Hongwei Liu, Junnan Liu, Yuan Liu, Chengqi Lyu, Han Lyu, Ningsheng Ma, Zerun Ma, Yu Sun, Zhiyong Wu, Linchen Xiao, Zhuozhi Xiong, Jun Xu, Haochen Ye, Zhaohui Yu, Yike Yuan, Songyang Zhang, Yufeng Zhao, Fengzhe Zhou, Peiheng Zhou, Dongsheng Zhu, Lin Zhu, Jingming Zhuo

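Affiliations: OpenCompass Team; Shanghai AI Laboratory

AI summary: Proposes OpenCompass, a modular, highly compatible, flexible, and high-concurrency general-purpose LLM evaluation platform that supports a variety of task scenarios and mainstream benchmark datasets.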

Abstract

In recent years, the field of artificial intelligence has undergone a paradigm shift from task-specific small-scale models to general-purpose large language models (LLMs). With the rapid iteration of LLMs, objective, quantitative, and comprehensive evaluation of their capabilities has become a critical link in advancing technological development. Currently, the mainstream static benchmark dataset-based evaluation methods face challenges such as the diversity of task types, inconsistent evaluation criteria, and fragmentation of data and processing workflows, making it difficult to efficiently conduct cross-domain and large-scale model evaluation. To address the aforementioned issues, this paper proposes and open-sources OpenCompass, a one-stop, scalable, and high-concurrency-supported general-purpose LLM evaluation platform. Adhering to the design philosophy of modularization and component decoupling, the platform boasts three core advantages: high compatibility, flexibility, and high concurrency. The core architecture of OpenCompass comprises five key components: the Configuration System, Task Partitioning Module, Execution and Scheduling Module, Task Execution Unit, and Result Visualization Module. Its workflow provides rule-based, LLM-as-a-Judge, and cascaded evaluators to adapt to the requirements of different task scenarios. Supporting mainstream benchmark datasets across multiple domains, including knowledge, reasoning, computation, science, language, code, etc., the platform offers a unified and efficient LLM evaluation tool for both academia and industry, facilitating the accurate identification of strengths and weaknesses of LLMs as well as their subsequent optimization.
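
To make the three evaluator types concrete, here is a minimal sketch of one plausible reading of "cascaded": try a cheap rule-based check first and consult a judge only when the rule does not match. Every name below (`rule_based`, `cascaded`, `toy_judge`) is hypothetical and is not OpenCompass's API, and the toy judge is a plain string check standing in for an LLM call.

```python
from typing import Callable, Optional

# Hypothetical names throughout: this is NOT OpenCompass's API.
Judge = Callable[[str, str, str], bool]


def rule_based(prediction: str, reference: str) -> bool:
    """Rule-based check: exact match after trimming and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def cascaded(question: str, prediction: str, reference: str,
             judge: Optional[Judge] = None) -> bool:
    """Accept on a rule match; otherwise defer to the judge if one is given."""
    if rule_based(prediction, reference):
        return True
    if judge is None:
        return False
    return judge(question, prediction, reference)


def toy_judge(question: str, prediction: str, reference: str) -> bool:
    """Stand-in for an LLM judge: accepts if the reference appears in the prediction."""
    return reference.strip().lower() in prediction.lower()


if __name__ == "__main__":
    # The rule alone handles the exact answer; the judge handles the verbose one.
    print(cascaded("2 + 2 = ?", "4", "4"))
    print(cascaded("2 + 2 = ?", "The answer is 4.", "4", toy_judge))
```

The point of such a cascade is cost: the judge, which is the expensive component in practice, is only consulted for answers the rule cannot settle.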

2605.25985 2026-06-09 cs.AI (version update)

Neural Scalable Symbolic Search Framework for Complex Logical Queries with Multiple Free Variables

Weizhi Fei, Hang Yin, Zihao Wang, Shukai Zhao, Wei Zhang, Yangqiu Song

Affiliations: Department of Mathematical Sciences, Tsinghua University; Squarepoint Capital; Department of Computer Science and Engineering, Hong Kong University of Science and Technology; Department of Computer Sciences, University of Rochester

AI summary: Addresses the joint-ranking problem for complex queries with multiple free variables on knowledge graphs by proposing the Neural Scalable Symbolic Search (NS3) framework, which approximates joint ranking through budget constraints and supernode merging and significantly improves performance.

Comments 10 pages, 5 figures

Focus-Then-Listen: Exploring a Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models

Han Yin, Yang Xiao, Younghoo Kwon, Ting Dang, Jung-Woo Choi

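Affiliations: University of California, Berkeley; University of Washington

AI summary: Proposes FTL, a plug-and-play audio enhancer that separates speech from non-speech and uses a modality router to predict the target modality, producing a task-adaptive enhanced signal that improves LALM performance in noisy environments without fine-tuning.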

Comments Accepted by ICML 2026 Workshop (Machine Learning for Audio)

Abstract

Large audio language models (LALMs) are a class of foundation models for audio understanding. Existing LALMs tend to degrade significantly in real-world noisy acoustic conditions where speech and non-speech sounds interfere. While noise-aware fine-tuning can improve robustness, it requires task-specific noisy data and expensive retraining, limiting scalability. To address this issue, we propose Focus-Then-Listen (FTL), a plug-and-play audio enhancer that improves LALMs' noise robustness. Specifically, FTL first separates the input waveform into speech and non-speech, and a modality router is applied to predict the target audio modality (e.g., speech) based on the user's instruction. Finally, a modality-aware fusion block generates a task-adaptive enhanced signal for improved downstream perception and reasoning. Experiments across multiple LALMs and tasks show that FTL improves performance across different noise levels without fine-tuning on LALMs.

2605.23595 2026-06-09 cs.LG cs.AI cs.CV cs.ET cs.PF (version update)

Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

Trinh Pham, Viet Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

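Affiliations: Griffith University; Edith Cowan University; The University of Queensland

AI summary: Proposes MetaEvaluator, a model-agnostic meta-learning framework that uses a pool of reference models to evaluate new models on unlabeled data quickly, accurately, and cost-effectively.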

Comments Accepted by KDD 2026

Abstract

The rapid advancement of machine learning has led to an unprecedented expansion of model ecosystems, making it increasingly difficult to assess the reliability of newly released models on unseen and unlabeled data. Existing evaluation pipelines typically rely on costly annotation, repeated fine-tuning, or assumptions that do not generalize well to new models. We introduce MetaEvaluator, a cost-effective, model-agnostic framework for fast, label-free evaluation of unseen models across diverse architectures and modalities. MetaEvaluator meta-learns over a pool of reference models to acquire an effective initialization for accurate assessment of unseen models, thereby amortizing evaluation cost and eliminating the need for per-model retraining. To the best of our knowledge, this is the first model-agnostic framework that evaluates new models on unlabeled datasets. Extensive experiments demonstrate that MetaEvaluator delivers stable and accurate performance estimates at substantially lower cost than conventional approaches, enabling scalable benchmarking on unlabeled datasets for emerging models. The code is available at: https://github.com/phkhanhtrinh23/MetaEvaluator.

2605.23247 2026-06-09 cs.LG (version update)

Accelerating Divisible Load Processing Through Machine Learning: A Practical Framework for Large-Scale Workloads

Bharadwaj Veeravalli

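Affiliations: Department of Electrical and Computer Engineering, National University of Singapore

AI summary: Proposes the first machine learning framework that uses a feedforward neural network to predict optimal processing times in single-level tree network architectures, reaching 97-99% accuracy and 1-5% mean absolute percentage error with inference times under 1 millisecond, a 10-100x speedup over traditional methods.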

Abstract

In this paper, we introduce the first machine learning framework for predicting optimal processing times in Single-Level Tree Network (SLTN) architectures for the Divisible Load Theory (DLT) paradigm. Using a feedforward neural network (FNN) with 16 engineered features, we train a model on 100,000 synthetically generated configurations to predict optimal processing times without explicit formulation of DLT equations. The model achieves 97-99% accuracy (R-square factor) with mean absolute percentage error of 1-5%, demonstrating that neural networks can effectively learn complex load distribution relationships. Feature importance analysis reveals that the model implicitly captures DLT mathematical structure, including load conservation and simultaneous finishing constraints. With inference times under 1 millisecond, the approach serves as a viable option over traditional DLT computation, enabling applications in real-time scheduling, design space exploration, and cloud resource allocation. The method generalizes well across diverse system configurations (n = 3 to 20, load size = 1 to 100 GB) with consistent accuracy, though performance degrades slightly for very large or highly heterogeneous systems. This work demonstrates the feasibility of using machine learning to accelerate distributed computing optimization while maintaining near-optimal accuracy.
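
The abstract reports accuracy as an R-square value alongside a mean absolute percentage error. As a reminder of what those two numbers measure, here is a minimal sketch using the standard textbook definitions. The function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (targets must be nonzero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Made-up processing times (seconds), for illustration only.
    actual = [10.0, 20.0, 40.0]
    predicted = [11.0, 19.0, 40.0]
    print(f"R^2  = {r_squared(actual, predicted):.4f}")
    print(f"MAPE = {mape(actual, predicted):.2f}%")
```

The two metrics answer different questions: R-square compares residual error with the spread of the targets, while MAPE averages the relative error per configuration, so a model can score high on one and only moderately on the other.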

2605.22863 2026-06-09 cs.LG (version update)

Latent Cache Flow: Model-to-Model Communication Without Text

Maximillian Rossi, Prajwal Raghunath, Eugene Wu

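Affiliations: University of California, Berkeley; Stanford University

AI summary: Proposes Latent Cache Flow (LCF), which jointly translates and compresses key-value caches for efficient model-to-model communication; in differing-context settings it is 23% more accurate and 8.5 times faster than text-based communication.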

Comments 6 pages, 5 figures

Abstract

LLM agents today communicate via text, which incurs considerable latency and information loss due to the need to autoregressively decode the sharer model's state and encode it at the receiver model. Recent work such as Cache-to-Cache (C2C; Fu et al., 2026) seeks to exchange KV caches by learning adapters that translate sharer KV matrices to the receiver model. However, the adapters are large and expensive to train, and they translate individual tokens, which requires the target context to be identical. This is unsuitable for agent communication, where the LLMs have differing context. We introduce Latent Cache Flow (LCF). To address efficiency, we observe that keys and values can be jointly translated and compressed, reducing the adapter to about 4% of C2C's size. To address differing context, we design the adapter to transmit a summary of new information that the target model does not have. Our early experiments show that a pruned 13 MB LCF adapter can be more accurate than C2C at 956 MB in shared-context settings; for different contexts, LCF improves F1 by 7.5% and Exact Match by 23% while being 8.5 times faster than text-based communication.

2604.24594 2026-06-09 cs.CL cs.AI (version update)

Skill Retrieval Augmentation for Agentic AI

Weihang Su, Jianming Long, Qingyao Ai, Qiaozhi He, Yichen Tang, Changyue Wang, Yiteng Tu, Yingbo Wang, Yiqun Liu

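Affiliations: Department of Computer Science and Technology, Tsinghua University; ByteDance Inc.

AI summary: To address insufficient context windows and declining skill-identification accuracy as skill libraries grow in existing agent systems, proposes the Skill Retrieval Augmentation (SRA) paradigm, which improves agent performance by dynamically retrieving from an external skill library, and builds the SRA-Bench benchmark to expose bottlenecks in skill integration.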

Speech Enhancement based on Drifting Models

Liang Xu, Diego Caviedes-Nozal, W. Bastiaan Kleijn, Longfei Felix Yan, Rasmus Kongsgaard Olsson

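Affiliations: Victoria University of Wellington; Lincoln University; GN Advanced Science

AI summary: Proposes DriftSE, a speech enhancement framework based on drifting models that formulates denoising as an equilibrium problem and achieves one-step inference, enabling high-quality speech enhancement without paired data.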

Comments 6 pages, 2 figures

Abstract

We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step, outperforming multi-step diffusion baselines and establishing a new paradigm for speech enhancement.
