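arXivDaily, a daily arXiv digest: syncs the full arXiv feed and adds AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.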

2510.01565 2026-06-19 cs.LG cs.DC Version update

TetriServe: Efficiently Serving Mixed DiT Workloads

Runyu Lu, Shiqi He, Wenxuan Tan, Shenggui Li, Ruofan Wu, Jeff J. Ma, Ang Chen, Mosharaf Chowdhury

Affiliations: University of Michigan; University of Wisconsin-Madison; Nanyang Technological University

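AI summary: For heterogeneous DiT workloads with mixed resolutions and deadlines, proposes TetriServe, a serving system based on step-level sequence parallelism; round-based scheduling and adaptive parallelism raise SLO attainment by up to 32% while preserving image quality.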

Abstract

Diffusion Transformer (DiT) models excel at generating high-quality images through iterative denoising steps, but serving them under strict Service Level Objectives (SLOs) is challenging due to their high computational cost, particularly at larger resolutions. Existing serving systems use fixed-degree sequence parallelism, which is inefficient for heterogeneous workloads with mixed resolutions and deadlines, leading to poor GPU utilization and low SLO attainment. In this paper, we propose step-level sequence parallelism to dynamically adjust the degree of parallelism of individual requests according to their deadlines. We present TetriServe, a DiT serving system that implements this strategy for highly efficient image generation. Specifically, TetriServe introduces a novel round-based scheduling mechanism that improves SLO attainment by (1) discretizing time into fixed rounds to make deadline-aware scheduling tractable, (2) adapting parallelism at the step level and minimizing GPU hour consumption, and (3) jointly packing requests to minimize late completions. Extensive evaluation on state-of-the-art DiT models shows that TetriServe achieves up to 32% higher SLO attainment compared to existing solutions without degrading image quality.
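
As a rough illustration of step-level, deadline-aware parallelism, the minimal sketch below picks, for each request in one scheduling round, the smallest sequence-parallel degree whose projected finish time meets the deadline, and packs requests onto a fixed GPU pool in earliest-deadline-first order. The latency table, the `Request` fields, and the greedy EDF packing are hypothetical stand-ins chosen for illustration; TetriServe's actual round-based scheduler is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical per-step latency (seconds) of one denoising step,
# keyed by image resolution, then by sequence-parallel degree (GPU count).
STEP_LATENCY = {
    1024: {1: 0.40, 2: 0.22, 4: 0.13, 8: 0.09},
    2048: {1: 1.60, 2: 0.85, 4: 0.46, 8: 0.27},
}


@dataclass
class Request:
    rid: str
    resolution: int
    remaining_steps: int
    deadline: float  # absolute deadline in seconds


def pick_degree(req: Request, now: float) -> int:
    """Smallest degree whose projected finish time meets the deadline.

    Speedup in this (assumed) table is sub-linear, so the smallest feasible
    degree also uses the fewest GPU-seconds. If no degree is feasible,
    fall back to the fastest one to limit lateness.
    """
    table = STEP_LATENCY[req.resolution]
    slack = req.deadline - now
    for degree in sorted(table):
        if req.remaining_steps * table[degree] <= slack:
            return degree
    return min(table, key=table.get)


def schedule_round(requests, now: float, total_gpus: int) -> dict:
    """Assign GPUs for one round, earliest deadline first.

    Requests that do not fit into the remaining GPUs wait for a later round.
    """
    plan = {}
    free = total_gpus
    for req in sorted(requests, key=lambda r: r.deadline):
        degree = pick_degree(req, now)
        if degree <= free:
            plan[req.rid] = degree
            free -= degree
    return plan
```

For example, with the table above, a 1024-px request with 20 steps left and a 5 s deadline gets 2 GPUs (20 × 0.22 s = 4.4 s), since a single GPU would need 8 s.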

2508.02604 2026-06-19 cs.RO cs.SY eess.SY Version update

The Hidden Cost of Approximation in Online Mirror Descent

Ofir Schlisselberg, Uri Sherman, Tomer Koren, Yishay Mansour

Affiliations: Tel Aviv University; Google Research

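AI summary: Studies the robustness of online mirror descent (OMD) to approximation errors and finds that regularizer smoothness is closely tied to error tolerance: uniformly smooth regularizers admit tight bounds, whereas on the simplex negative entropy requires exponentially small errors, and log-barrier and Tsallis regularizers tolerate errors that are only polynomially small.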

Abstract

Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which, oftentimes, can be solved only approximately, leading to an inexact version of the algorithm. Nonetheless, existing OMD analyses typically assume an idealized error-free setting, thereby limiting our understanding of performance guarantees that should be expected in practice. In this work we initiate a systematic study into inexact OMD, and uncover an intricate relation between regularizer smoothness and robustness to approximation errors. When the regularizer is uniformly smooth, we establish a tight bound on the excess regret due to errors. Then, for barrier regularizers over the simplex and its subsets, we identify a sharp separation: negative entropy requires exponentially small errors to avoid linear regret, whereas log-barrier and Tsallis regularizers remain robust even when the errors are only polynomial. Finally, we show that when the losses are stochastic and the domain is the simplex, negative entropy regains robustness, but this property does not extend to all subsets, where exponentially small errors are again necessary to avoid suboptimal regret.
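
For intuition: with the negative-entropy regularizer on the probability simplex and linear losses, the exact OMD step has the closed-form multiplicative (exponentiated-gradient) update x_next[i] ∝ x[i] · exp(−η · g[i]). The minimal sketch below implements that exact step and a hypothetical inexact variant that adds an error vector to the exact solution and renormalizes. This additive error model is an illustrative assumption, not necessarily the one analyzed in the paper.

```python
import math


def omd_entropy_step(x, grad, eta):
    """Exact OMD step with the negative-entropy regularizer on the simplex.

    For linear losses this is the exponentiated-gradient (Hedge) update:
    x_next[i] is proportional to x[i] * exp(-eta * grad[i]).
    """
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]


def inexact_omd_entropy_step(x, grad, eta, err, floor=1e-12):
    """Hypothetical inexact step: perturb the exact solution additively.

    The error vector `err` stands in for subproblem-solver error (assumed
    model). The perturbed point is clipped to stay strictly positive, so the
    next multiplicative update stays well defined, and renormalized onto
    the simplex.
    """
    exact = omd_entropy_step(x, grad, eta)
    noisy = [max(xi + ei, floor) for xi, ei in zip(exact, err)]
    total = sum(noisy)
    return [xi / total for xi in noisy]
```

Running many rounds with error vectors of different magnitudes is one way to probe empirically how sensitive the entropic update is to solver error.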

2508.04424 2026-06-19 cs.CV Version update

Composed Object Retrieval: Object-level Retrieval via Composed Expressions

Tong Wang, Guanyu Yang, Nian Liu, Zongyan Han, Jinxing Zhou, Salman Khan, Fahad Shahbaz Khan

Affiliations: Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education, Jiangsu, China; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE

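AI summary: Introduces Composed Object Retrieval (COR), an object-level retrieval task driven by a reference object, its mask, and a retrieval text; also builds the COR125K benchmark and the CORE model, which significantly outperforms existing methods.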

Abstract

Retrieving fine-grained visual content based on user intent remains a challenge in multimodal systems. Although current Composed Image Retrieval (CIR) methods combine reference images with retrieval texts, they are constrained to image-level matching and cannot localize specific objects. To this end, we propose Composed Object Retrieval (COR), a new object-level retrieval task that retrieves target object(s) from candidate objects in a target image and grounds the retrieved result with pixel-level masks. Given a reference object, its mask, a target image, and a retrieval text describing the desired modification, COR requires models to perform composed visual-textual reasoning rather than relying on explicit category names. This setting introduces several challenges, including fine-grained compositional matching, negative-object filtering under visually similar distractors, and flexible single- or multi-object retrieval. We construct COR125K, the first large-scale COR benchmark, containing 125,541 retrieval triplets across 408 categories with base/novel splits for evaluating category-level generalization. We also present CORE, a unified end-to-end model that integrates reference region encoding, adaptive vision-text interaction, and region-level contrastive learning to align composed representations with target objects while suppressing background and distractors. Extensive experiments demonstrate that CORE significantly outperforms existing CIR-based pipelines and strong baselines in both base and novel categories, establishing a simple and effective foundation for fine-grained object-level multimodal retrieval. Code will be released publicly at https://github.com/wangtong627/COR.
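
The abstract mentions region-level contrastive learning that aligns the composed query with target objects while suppressing distractors. A common way to write such an objective is an InfoNCE-style loss over the candidate regions of a target image, allowing several positives for multi-object retrieval. The minimal sketch below shows that generic loss on plain Python lists; the embeddings, the temperature, and the multi-positive form are assumptions, not CORE's exact formulation.

```python
import math


def region_contrastive_loss(query, regions, positives, temperature=0.07):
    """Generic multi-positive InfoNCE over candidate regions (assumed form).

    query:     composed (reference object + text) embedding
    regions:   list of candidate region embeddings from the target image
    positives: indices of regions that match the query
    Loss = -log( sum_{p in positives} exp(s_p / T) / sum_j exp(s_j / T) ),
    where s_j is the cosine similarity between query and region j.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    logits = [cosine(query, r) / temperature for r in regions]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    pos = sum(exps[i] for i in positives)
    return -math.log(pos / sum(exps))
```

Distractor regions enter only through the denominator, so lowering their similarity to the query is what reduces the loss once the positives are fixed.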

2511.04260 2026-06-19 cs.CV cs.AI Version update

Proto-LeakNet: Towards Signal-Leak Aware Attribution in Synthetic Human Face Imagery

Claudio Giusti, Luca Guarnera, Sebastiano Battiato

Affiliations: Department of Mathematics and Computer Science, University of Catania

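AI summary: Proposes Proto-LeakNet, which exploits the signal-leak traces left by diffusion models and combines closed-set classification with density-based open-set evaluation for interpretable generator attribution; trained only on closed-set data, it remains effective on unseen generators.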

Comments 44 pages, 27 figures, 11 tables

DOI: 10.1016/j.cviu.2026.104848

Abstract

The growing sophistication of synthetic image and deepfake generation models has turned source attribution and authenticity verification into a critical challenge for modern computer vision systems. Recent studies suggest that diffusion pipelines unintentionally imprint persistent statistical traces, known as signal-leaks, within their outputs, particularly in latent representations. Building on this observation, we propose Proto-LeakNet, a signal-leak-aware and interpretable attribution framework that integrates closed-set classification with a density-based open-set evaluation on the learned embeddings, enabling analysis of unseen generators without retraining. Acting in the latent domain of diffusion models, our method re-simulates partial forward diffusion to expose residual generator-specific cues. A temporal attention encoder aggregates multi-step latent features, while a feature-weighted prototype head structures the embedding space and enables transparent attribution. Trained solely on closed-set data and achieving a Macro AUC of 98.13%, Proto-LeakNet learns a latent geometry that remains robust under post-processing, surpassing state-of-the-art methods, and achieves strong separability both between real images and known generators, and between known and unseen ones. The codebase is available at https://github.com/claudiunderthehood/Proto-LeakNet.
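
To make the closed-set/open-set distinction concrete, here is a minimal nearest-prototype attributor with a distance-based rejection rule: each known generator gets a prototype (the mean of its embeddings), a test embedding is attributed to the nearest prototype, and it is flagged as coming from an unseen source when even the nearest prototype is too far away. The distance threshold is a simple stand-in assumption for the paper's density-based open-set evaluation and feature-weighted prototype head, and the generator names are hypothetical.

```python
import math
from collections import defaultdict


def build_prototypes(embeddings, labels):
    """Prototype of each known generator = mean of its training embeddings."""
    sums = {}
    counts = defaultdict(int)
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + v for s, v in zip(sums[label], emb)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def attribute(embedding, prototypes, threshold):
    """Closed-set: nearest prototype. Open-set: 'unknown' if it is too far.

    The Euclidean threshold is an assumed stand-in for a density-based score.
    """
    label, dist = min(
        ((lab, math.dist(embedding, proto)) for lab, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= threshold else "unknown"
```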

2510.18784 2026-06-19 cs.LG Version update

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

Soroush Tabesh, Mher Safaryan, Andrei Panferov, Alexandra Volkova, Dan Alistarh

Affiliations: Anonymous Authors

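AI summary: Proposes CAGE, which augments the straight-through estimator with a curvature-aware correction term that balances loss minimization against the quantization constraints; it comes with convergence guarantees in the smooth non-convex setting and substantially improves the accuracy of low-bit quantization-aware training.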

Comments Accepted at MLSys 2026 (Oral). To appear in Proceedings of Machine Learning and Systems 8

Journal ref Proceedings of Machine Learning and Systems 8 (MLSys 2026)

Abstract

Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly-efficient implementation that leverages Adam statistics. CAGE significantly improves upon the prior state-of-the-art methods in terms of accuracy, for similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights-and-activations (W3A3) matches the accuracy achieved at 4-bits (W4A4) with the prior best method. The official implementation is available at https://github.com/IST-DASLab/CAGE.
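
For readers unfamiliar with the baseline CAGE builds on, the minimal sketch below shows a symmetric uniform fake-quantizer and the standard straight-through estimator (STE), which treats rounding as the identity in the backward pass and zeroes the gradient outside the clipping range. It deliberately does not implement CAGE's curvature-aware correction, whose exact form is not given in the abstract; a comment marks where such a term would enter. Bit width and scale are illustrative parameters.

```python
def fake_quantize(w: float, bits: int, scale: float) -> float:
    """Symmetric uniform quantizer: round to the nearest grid point, then clip.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = min(qmax, max(qmin, round(w / scale)))
    return q * scale


def ste_grad(w: float, grad_wrt_quantized: float, bits: int, scale: float) -> float:
    """Straight-through estimator for the weight gradient (baseline only).

    Inside the representable range, pass dL/dQ(w) through unchanged as if
    Q were the identity; outside it, the clipped weight gets zero gradient.
    A curvature-aware method such as CAGE would add its correction here;
    that correction is not implemented in this sketch.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    inside = qmin * scale <= w <= qmax * scale
    return grad_wrt_quantized if inside else 0.0
```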

2507.23534 2026-06-19 cs.LG cs.CV Version update

Continual Learning with Support Boundary Experience Blending

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Affiliations: National Taiwan University

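AI summary: Proposes the Experience Blending framework, which generates support boundary data with differential-privacy-inspired noise and trains jointly on exemplars and boundary data to regularize decision boundaries, improving continual-learning accuracy on several datasets.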

Abstract

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 14%, and 2%, respectively.
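
The noise-injection component can be sketched with the Gaussian mechanism from differential privacy: clip each latent feature vector to a fixed L2 norm, then add isotropic Gaussian noise scaled to that norm, keeping the exemplar's label. The clipping norm, the noise multiplier `sigma`, and label inheritance are assumptions chosen for illustration; the paper's exact noise recipe and its dual-model aggregation are not reproduced.

```python
import math
import random


def clip_l2(z, max_norm):
    """Scale z down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= max_norm:
        return list(z)
    return [v * max_norm / norm for v in z]


def support_boundary_batch(latents, labels, max_norm, sigma, rng):
    """Create boundary-adjacent samples from exemplar latents (assumed recipe).

    Gaussian-mechanism style: clip to max_norm, then add N(0, (sigma * max_norm)^2)
    noise to each coordinate. Each noisy latent keeps its exemplar's label.
    """
    noisy = []
    for z in latents:
        zc = clip_l2(z, max_norm)
        noisy.append([v + rng.gauss(0.0, sigma * max_norm) for v in zc])
    return noisy, list(labels)
```

In a replay loop, such samples would be mixed with the stored exemplars in each training batch.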

2510.27285 2026-06-19 cs.CV cs.CR Version update

MENTOR: Reinforcement Learning for Tool-Use Distillation with Flexible Teacher-Optimized Rewards

ChangSu Choi, Hoyun Song, Dongyeon Kim, WooHyeon Jung, Minkyung Cho, Sunjin Park, NohHyeob Bae, Seona Yu, KyungTae Lim

Affiliations: Seoul National University of Science and Technology; Korea Advanced Institute of Science and Technology; LG CNS

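AI summary: Proposes MENTOR, which uses a flexible teacher-optimized reward structure to balance behavioral alignment with downstream performance, improving the out-of-domain generalization of small models on tool-use tasks.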

Abstract

Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor out-of-domain (OOD) generalization due to its rigid alignment with static teacher trajectories. While reinforcement learning (RL) offers an alternative, the capacity limitations of SLMs pose a severe dilemma: sparse outcome rewards provide insufficient guidance, whereas strict trajectory matching imposes overly restrictive constraints. To bridge this capacity-driven gap, we propose MENTOR, which introduces a flexible yet process-aware reward structure. Instead of enforcing rigid replication, MENTOR uses the teacher's reference to guide tool-use behavior, balancing behavioral alignment with downstream performance. Extensive experiments on controlled executable-tool benchmarks demonstrate that MENTOR improves OOD tool-use performance compared to SFT and strict RL baselines. Our findings suggest that within verifiable tool-use environments, flexible tool-use alignment offers a more effective approach than strict trajectory replication for developing adaptable small models.
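
The contrast between strict trajectory matching and a more flexible, process-aware reward can be illustrated with a toy reward over tool-call sequences. In the hypothetical sketch below, strict matching gives credit only for reproducing the teacher's call sequence exactly, whereas the flexible variant adds the task outcome to partial credit for covering the tools the teacher used, regardless of order or repetition. The weighting `alpha` and the coverage measure are assumptions, not MENTOR's actual reward.

```python
def strict_match_reward(student_calls, teacher_calls):
    """Credit only for reproducing the teacher's tool-call sequence exactly."""
    return 1.0 if list(student_calls) == list(teacher_calls) else 0.0


def tool_coverage(student_calls, teacher_calls):
    """Fraction of the teacher's distinct tools that the student also called.

    Order and repetition are ignored, so alternative valid trajectories
    still earn partial credit (hypothetical measure).
    """
    teacher_tools = set(teacher_calls)
    if not teacher_tools:
        return 1.0
    return len(teacher_tools & set(student_calls)) / len(teacher_tools)


def flexible_reward(task_succeeded, student_calls, teacher_calls, alpha=0.5):
    """Hypothetical reward: outcome term plus a weighted, teacher-guided process term."""
    return float(task_succeeded) + alpha * tool_coverage(student_calls, teacher_calls)
```

A student that solves the task with a reordered call sequence gets zero reward under strict matching but full process credit under the flexible variant.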

2510.21978 2026-06-19 cs.LG cs.AI Version update

Towards General Directed Graph Contrastive Learning: A Dual-Space Perspective

Zhengyu Wu, Daohan Su, Yang Zhang, Xunkai Li, Rong-Hua Li, Guoren Wang

Affiliations: National University of Singapore; University of Science and Technology of China

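AI summary: Proposes S2-DiGCL, which performs contrastive learning on directed graphs from both complex-domain and real-domain perspectives, combining adaptive modulation of the magnetic Laplacian with path-based subgraph augmentation; it improves node classification by 4.41% and link prediction by 4.34%.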

Abstract

Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs, independent of labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the pivotal directional information that is fundamental and indispensable in real-world networks (e.g., social networks and recommendations). In this paper, we introduce S2-DiGCL, a novel framework that emphasizes spatial insights from complex- and real-domain perspectives for directed graph (digraph) contrastive learning. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving SOTA performance with 4.41% improvement in node classification and 4.34% in link prediction under both supervised and unsupervised settings.
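
The magnetic Laplacian that S2-DiGCL perturbs is a standard way to encode edge direction in a Hermitian matrix: with adjacency A, symmetrized adjacency A_s = (A + A^T)/2, degree matrix D_s of A_s, and phase matrix Θ = 2πq(A − A^T), it is L = D_s − A_s ⊙ exp(iΘ), where ⊙ is the elementwise product. The pure-Python sketch below builds the unnormalized version for a small digraph; the charge q = 0.25 is an arbitrary illustrative choice, and the paper's personalized perturbations are not included.

```python
import cmath
import math


def magnetic_laplacian(edges, n, q=0.25):
    """Unnormalized magnetic Laplacian L = D_s - A_s * exp(i * Theta).

    edges: directed edges (u, v) over nodes 0..n-1
    q:     charge parameter (illustrative default) controlling how strongly
           edge direction enters the phase
    """
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1.0
    # Symmetrized adjacency; its row sums are the degrees D_s.
    A_s = [[(A[u][v] + A[v][u]) / 2 for v in range(n)] for u in range(n)]
    L = [[0j] * n for _ in range(n)]
    for u in range(n):
        L[u][u] = complex(sum(A_s[u]))
        for v in range(n):
            # Opposite directions get opposite phases, so L stays Hermitian.
            theta = 2 * math.pi * q * (A[u][v] - A[v][u])
            L[u][v] -= A_s[u][v] * cmath.exp(1j * theta)
    return L
```

With q = 0 the phases vanish and L reduces to the ordinary Laplacian of the symmetrized graph; with q > 0 the entries for u→v and v→u are complex conjugates, so direction is preserved while the spectrum stays real and non-negative.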

2510.08807 2026-06-19 cs.RO cs.LG Version update

AAPA: Adversarially Anchored Preference Alignment for Large Language Model Post-Training

Faqiang Qian, Kang An, Weikun Zhang, Ziliang Wang, Xuhui Zheng, Liangjian Wen, Yong Dai, Mengya Gao, Yichao Wu

Affiliations: Southwestern University of Finance and Economics

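AI summary: Proposes AAPA, which uses a fixed lightweight discriminator to apply sentence-level adversarial anchoring between policy outputs and expert responses, augmenting post-training objectives such as SFT and GRPO and consistently improving performance on instruction-following benchmarks.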

Abstract

Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77% on Qwen3-0.6B and 3.75% on Qwen3-4B. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at https://github.com/IsFaqq/AAPA.

2509.19658 2026-06-19 cs.RO cs.AI Version update

RoboSSM: Scalable In-context Imitation Learning via State-Space Models

Youngju Yoo, Jiaheng Hu, Yifeng Zhu, Bo Liu, Qiang Liu, Roberto Martín-Martín, Peter Stone

Affiliations: The University of Texas at Austin; KAIST; FAIR at Meta; Amazon; Sony AI

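AI summary: Proposes RoboSSM, which replaces the Transformer with a state-space model (SSM) for in-context imitation learning (ICIL); it generalizes better to unseen and long-horizon tasks on the LIBERO benchmark and is the first to show that SSMs are an efficient and scalable backbone for ICIL.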

Comments IROS 2026

Abstract

In-context imitation learning (ICIL) enables robots to learn tasks from prompts consisting of just a handful of demonstrations. By eliminating the need for parameter updates at deployment time, this paradigm supports few-shot adaptation to novel tasks. However, recent ICIL methods rely on Transformers, which have computational limitations and tend to underperform when handling longer prompts than those seen during training. In this work, we introduce RoboSSM, a scalable recipe for in-context imitation learning based on state-space models (SSM). Specifically, RoboSSM replaces Transformers with Longhorn, a state-of-the-art SSM that provides linear-time inference and strong extrapolation capabilities, making it well suited for long-context prompts. Through diverse experiments on the LIBERO benchmark, we demonstrate the effectiveness of applying SSMs to ICIL, achieving better generalization to both unseen and long-horizon tasks than Transformer-based ICIL methods by handling longer contexts at test time. These results show for the first time that SSMs are an efficient and scalable backbone for ICIL. Our code is available at https://github.com/youngjuY/RoboSSM.
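
The linear-time property comes from the recurrent form of state-space models: each new token updates a fixed-size state, so processing T tokens takes O(T) steps instead of the O(T²) pairwise interactions of full self-attention. The sketch below runs a generic diagonal linear SSM over a scalar sequence; it is not Longhorn, whose update rule differs, and the parameters a, b, c are illustrative assumptions.

```python
def ssm_scan(xs, a, b, c):
    """Run a generic diagonal linear state-space model (not Longhorn).

    Recurrence, elementwise over the d state dimensions:
        h_t = a * h_{t-1} + b * x_t,    y_t = sum(c * h_t)
    One pass over the input: O(T * d) time and O(d) state, however long
    the prompt is.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys
```

Because the state size is fixed, a test-time prompt longer than any training prompt only lengthens the loop; memory does not grow with context length.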

2509.10416 2026-06-19 cs.RO Version update

TASC: Task-Aware Shared Control for Relational Telemanipulation

Ze Fu, Pinhao Song, Yutong Hu, Renaud Detry

Affiliations: KU Leuven, Dept. Mechanical Engineering, Research unit Robotics, Automation and Mechatronics; KU Leuven, Dept. Electrical Engineering, Research unit Processing Speech and Images

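AI summary: Proposes TASC, which builds an open-vocabulary interaction graph from visual input to infer task-level user intent and provides shared-control assistance guided by spatial constraints, improving the efficiency and generalization of relational telemanipulation.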

Comments Accepted to IROS 2026

Abstract

We present TASC, a Task-Aware Shared Control framework for relational telemanipulation that infers task-level user intent and provides assistance from motion-only input. To support prehensile relational tasks without predefined templates, TASC constructs an open-vocabulary interaction graph from visual input to represent functional object relationships, and infers user intent accordingly. A shared control policy then provides assistance during both grasping and object interaction, guided by spatial constraints predicted by a vision-language model. Our method addresses two key challenges in relational telemanipulation under shared control: (1) task-level intent inference from low-level motion commands, and (2) generalizable assistance across diverse objects and tasks. Experiments in both simulation and the real world demonstrate that TASC improves task efficiency and reduces user input effort compared to prior methods, while enabling zero-shot generalization across diverse relational telemanipulation tasks. The code that supports our experiments is publicly available at https://github.com/fitz0401/tasc.

2504.11171 2026-06-19 cs.CV cs.AI Version update

TerraMind: Large-Scale Generative Multimodality for Earth Observation

Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, Nicolas Longépé

Affiliations: IBM Research – Europe; ETH Zurich; Forschungszentrum Jülich; European Space Agency; Φ-lab; NASA IMPACT; University of Iceland

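AI summary: Proposes TerraMind, the first any-to-any generative multimodal foundation model for Earth observation, pretrained on dual-scale (token-level and pixel-level) representations; it enables zero-shot and few-shot applications, introduces a "Thinking-in-Modalities" capability, and achieves leading performance on benchmarks such as PANGAEA.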

Comments Accepted at ICCV'25

Abstract

We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM), i.e., the capability of generating additional artificial data during finetuning and inference to improve the model output, and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.

2506.14990 2026-06-19 cs.AI Version update

MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

Tristan Tomilin, Luka van den Boogaard, Samuel Garcin, Constantin Ruhdorfer, Bram Grooten, Fabrice Kusters, Yali Du, Andreas Bulling, Mykola Pechenizkiy, Meng Fang

Affiliations: Eindhoven University of Technology, The Netherlands; University of Edinburgh, UK; University of Stuttgart, Germany; King's College London, UK; University of Liverpool, UK

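AI summary: Proposes MEAL, a benchmark that uses JAX and GPU acceleration to enable training on sequences of 100 tasks, revealing failure modes that emerge over long task sequences.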

Comments To be published in the International Conference on Machine Learning (ICML) 2026

2509.00271 2026-06-19 cs.RO Version update

Learn from What We HAVE: History-Aware VErifier that Reasons about Past Interactions Online

Yishu Li, Xinyi Mao, Ying Yuan, Kyutae Sim, Ben Eisner, David Held

Affiliations: Robotics Institute, Carnegie Mellon University; Computer Science and Technology, Tsinghua University

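AI summary: Proposes HAVE, a history-aware verifier that decouples action generation from verification and uses past interactions to disambiguate online; theoretical analysis shows it improves expected action quality, and experiments in multiple simulated and real-world environments confirm its effectiveness.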

Comments CoRL 2025

Abstract

We introduce a novel History-Aware VErifier (HAVE) to disambiguate uncertain scenarios online by leveraging past interactions. Robots frequently encounter visually ambiguous objects whose manipulation outcomes remain uncertain until physically interacted with. While generative models alone could theoretically adapt to such ambiguity, in practice they obtain suboptimal performance in ambiguous cases, even when conditioned on action history. To address this, we propose explicitly decoupling action generation from verification: we use an unconditional diffusion-based generator to propose multiple candidate actions and employ our history-aware verifier to select the most promising action by reasoning about past interactions. Through theoretical analysis, we demonstrate that employing a verifier significantly improves expected action quality. Empirical evaluations and analysis across multiple simulated and real-world environments including articulated objects, multi-modal doors, and uneven object pick-up confirm the effectiveness of our method and improvements over baselines. Our project website is available at: https://liy1shu.github.io/HAVE_CoRL25/
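
The decoupled generate-then-verify loop can be sketched in a few lines: an unconditional generator proposes k candidates without looking at the history, and a separate verifier scores each candidate against the interaction history before the best one is executed. Both the generator and the toy verifier below are hypothetical placeholders (the paper uses a diffusion-based generator and a learned history-aware verifier); the toy verifier simply down-weights actions that have already failed.

```python
import random


def select_action(generator, verifier, history, k, rng):
    """Propose k candidates with a history-agnostic generator, keep the best.

    generator(rng) -> action            (hypothetical placeholder)
    verifier(action, history) -> score  (higher is better)
    """
    candidates = [generator(rng) for _ in range(k)]
    scores = [verifier(a, history) for a in candidates]
    best = max(range(k), key=lambda i: scores[i])
    return candidates[best]


def toy_verifier(action, history):
    """Hypothetical verifier: score 0 for actions that already failed, else 1."""
    failed = {a for a, succeeded in history if not succeeded}
    return 0.0 if action in failed else 1.0


if __name__ == "__main__":
    rng = random.Random(0)

    def propose(r):
        # Hypothetical actions for a visually ambiguous door.
        return r.choice(["push", "pull"])

    print(select_action(propose, toy_verifier, [("push", False)], k=8, rng=rng))
```

Keeping the generator unconditional means all use of history lives in the verifier, which is the separation the abstract describes.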
