arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.28468 2026-05-28 cs.RO

EIT-Pneumatic Hybrid Robotic Skin for Practical and Accurate Force Map Reconstruction

Junhwi Cho, Sunggyu Bae, Junghyeon Ma, Hyosang Lee, Jung Kim, Kyungseo Park

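AI summary: Proposes a hybrid robotic skin that combines electrical impedance tomography (EIT) with pneumatic tactile sensing; Tikhonov-regularized inverse reconstruction and per-pad pneumatic calibration enable accurate large-area tactile sensing and reduce sensitivity non-uniformity.

Details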
Comments
8 pages, 8 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026. J. Cho, S. Bae, J. Ma contributed equally
English abstract

We present a hybrid robotic skin that combines electrical impedance tomography (EIT) with pneumatic tactile sensing to improve force reconstruction capability. The developed robotic skin is fabricated entirely by 3D printing and spray coating, making it affordable and easy to build. A Tikhonov-regularized inverse reconstruction, paired with per-pad pneumatic calibration, enables accurate large-area tactile sensing with a simple measurement scheme. For validation, we conducted load-cell indentation experiments; the results showed consistent force reconstruction across locations within a pad. Compared with an EIT-only baseline, sensitivity non-uniformity was also reduced, with the coefficient of variation decreasing from 0.31 to 0.14, indicating that the proposed approach addresses a longstanding limitation of EIT. We further demonstrated chest-mounted integration on a humanoid robot and found that the pneumatic signals remained reliable across diverse contact scenarios, including multiple simultaneous contacts on the same sensing pad. These results indicate a practical path toward accurate, scalable whole-body tactile sensing in real robotic systems.
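
The abstract names two standard ingredients without spelling them out: a Tikhonov-regularized inverse and the coefficient of variation (CoV) used to report sensitivity non-uniformity. The following is a minimal sketch of both in plain Python, not the authors' implementation. The matrix `J`, the measurement vector `b`, and the weight `lam` are hypothetical placeholders; the paper's actual EIT sensitivity model and pneumatic calibration are not described in this digest.

```python
import statistics

def tikhonov_solve(J, b, lam):
    """Return x minimizing ||J x - b||^2 + lam * ||x||^2 (normal equations)."""
    m, n = len(J), len(J[0])
    # Build A = J^T J + lam * I and rhs = J^T b.
    A = [[sum(J[k][i] * J[k][j] for k in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(J[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - s) / A[i][i]
    return x

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)
```

With `J` set to the identity and `lam = 1`, the solution is simply `b / 2`, which shows how the regularizer shrinks the estimate. A CoV of 0.14 against 0.31 means the spread of responses relative to their mean fell by a bit more than half.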

2605.28467 2026-05-28 cs.LG

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

Avidan Shah, Jannik Brinkmann, Rico Angell

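AI summary: Proposes Activation Consistency Training (ACT), which supervises internal representations to defend reasoning models against adversarial jailbreaks and prompt injection attacks; experiments show that ACT remains robust under adaptive attacks.

Details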
English abstract

As LLMs gain stronger reasoning capabilities, their extended chain-of-thought introduces new degrees of complexity for defending against adversarial jailbreaks and prompt injection. We study consistency training, a family of fine-tuning objectives that enforce identical behavior on clean prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense and find ACT to be competitive with other training-based defenses while requiring only self-supervised pairs of clean and wrapped prompts. Our experiments also generalize both techniques within the jailbreak setting, demonstrating that ACT remains more robust to adaptive attacks. We also provide mechanistic evidence that ACT's defense against jailbreaks is encoded as a roughly linear shift in activation space at the assistant-turn boundary. After ACT training, we can recover a single steering direction that controls refusal on reasoning models with minimal effect on benign inputs. We find that ACT remains robust even when the model's chain-of-thought is replaced with a compliant trace from the undefended base model, pivoting to refuse prefilled jailbreaks. Together, these results suggest that supervising internal representations is a surprisingly effective and interpretable approach to various forms of safety training in reasoning models.

2605.28465 2026-05-28 cs.CL

Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents

Jihyeong Park, Ingeol Baek, Jeonghyun Park, Hwanhee Lee

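AI summary: Proposes the interactive benchmark MUTATE and the strategy ReDNA to evaluate and enhance the divergent thinking of LLM agents at the path and action levels, addressing the action fixation caused by immediate convergence pressure in existing frameworks.

Details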
Comments
28 pages, 16 figures, 19 tables
English abstract

Divergent thinking is a core dimension of creativity, yet existing evaluations of Large Language Models (LLMs) treat it as single-turn text generation, failing to capture how an agent reasons through iterative interaction. To address this, we introduce MUTATE, an interactive benchmark designed to evaluate agentic divergent thinking at two levels: path-level, where an agent discovers multiple alternative paths to the same goal, and action-level, where individual actions require non-typical, mechanism-shifting object uses. Unlike success-only evaluations, MUTATE scores both completed paths and off-path attempts, capturing divergent reasoning that conventional success rates discard. Our experiments with frontier LLMs reveal a structural blind spot in existing frameworks: when exposed to immediate convergence pressure, they tend to fall into immediate action fixation, failing to improve action-level divergence. To overcome this, we propose ReDNA, which separates unconstrained divergent candidate generation from convergent constraint selection. ReDNA significantly outperforms prior methods across both divergence levels and generalizes effectively to an external creativity environment. We also confirm its success stems from a qualitative enhancement of resilient divergent reasoning rather than simple environmental exploration.

2605.28464 2026-05-28 cs.CL cs.AI

The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment

Junyu Lu, Qi Wei, Peishuo Zheng, Jie Zhang, Hui Huang, Qianru Wang, Chuan Xiao, Jianbin Qin, Shuyuan Zheng

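AI summary: Proposes the Prosecution Decision Prediction (PDP) task, which classifies each case as prosecution or one of three non-prosecution decisions to cover the blind spot that Legal Judgment Prediction (LJP) leaves in criminal liability assessment, and builds the PDP-Bench benchmark; experiments show that LLMs perform substantially worse on PDP than on LJP.

Details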
Comments
24 pages, 5 figures, 22 tables
English abstract

Legal Judgment Prediction (LJP) has become a core benchmark for evaluating AI in the criminal legal domain, but it only sees criminal cases that have already passed prosecutorial review and been formally indicted. As a result, LJP leaves a substantial blind spot in assessing criminal liability, overlooking cases involving insufficient evidence, no criminal liability, or guilt exempted from punishment. To fill this gap, we propose Prosecution Decision Prediction (PDP), the first Legal AI task built around prosecutorial review, which classifies each case into prosecution or one of three non-prosecution decisions and reflects legal AI's capabilities in evidence evaluation, legal subsumption, and value-based discretion. We further construct PDP-Bench, a benchmark of 4,630 real Chinese prosecutorial decisions spanning 190 charges. Extensive experiments show that state-of-the-art LLMs perform substantially worse on PDP than on LJP and that mainstream enhancement routes fail to close the gap. Moreover, controlled RLVR interventions show that simple outcome rewards fail to produce generalizable PDP discrimination.

2605.28462 2026-05-28 cs.RO

Learning a Kinodynamic Trajectory Manifold for Impact-Aware Compliant Catching of Fast-Moving Objects

Guorui Pei, Mengshi Zhang, Xi Chen, Jinsong Wu, Jiaming Qi, Peng Zhou

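AI summary: Collects successful catching trajectories with reinforcement learning in simulation, learns a low-dimensional kinodynamic trajectory manifold, and at run time maps the estimated initial object state directly to a reference catching trajectory, combined with compliant control near contact, to achieve impact-aware catching of fast-moving objects.

Details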
English abstract

Fast catching of free-flying objects is difficult because of short reaction time, impact uncertainty, and kinodynamic constraints. We use reinforcement learning in simulation to collect successful catching trajectories and learn a low-dimensional kinodynamic trajectory manifold. At run time, the estimated object initial state is mapped directly to a reference catching trajectory without online nonlinear optimization. The trajectory is tracked with compliant control near contact for improved impact absorption and capture stability.

2605.28459 2026-05-28 cs.CV

REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection

Jun Zhou, Bingwen Hu, Yaxiong Wang, Zhedong Zheng, Yongzhen Wang, Yuchen Zhang, Ping Liu

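AI summary: Proposes the REVEAL framework, which uses reference-grounded verification and a difference-aware fusion mechanism together with a task-decoupled Mixture-of-Experts architecture to detect and localize multimodal manipulation, and supports training-free domain adaptation.

Details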
Comments
11 pages, 3 figures
English abstract

Multimodal manipulation detection aims to simultaneously identify forged image-text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning, we reformulate this task as a reference-grounded verification problem, where authenticity is assessed by comparing a query against retrieved authentic evidence. We propose REVEAL (Reference-Enabled Verification for Evidence Analysis and Localization), a framework explicitly designed for this comparative paradigm. To support this paradigm, we construct a large-scale reference library comprising 170K authentic news image-text pairs featuring over 40K public figures. Technically, REVEAL employs a difference-aware fusion mechanism to capture fine-grained discrepancies between the query and retrieved evidence. Furthermore, we introduce a task-decoupled Mixture-of-Experts (MoE) architecture to jointly execute instance-level detection and fine-grained grounding, effectively mitigating optimization conflicts between these heterogeneous objectives. Extensive experiments demonstrate that REVEAL significantly outperforms state-of-the-art methods, and notably enables training-free domain adaptation by simply updating the reference library, offering a robust and practical solution for detecting evolving misinformation. Code is available at https://anonymous.4open.science/r/REVEAL-Reference-A006.

2605.28456 2026-05-28 cs.AI cs.CV eess.AS

Diffusion Large Language Models for Visual Speech Recognition

Jeong Hun Yeo, Chae Won Kim, Hyeongseop Rha, Yong Man Ro

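AI summary: Proposes DLLM-VSR, the first visual speech recognition framework based on a Diffusion Large Language Model (DLLM). It uses iterative masked denoising with flexible-order decoding, a confidence-guided unmasking strategy, and two-stage training, and introduces length-guided candidate decoding to reduce target-length uncertainty, reaching a word error rate of 19.5% on LRS3.

Details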
Comments
Code: https://github.com/JeongHun0716/dllm-vsr
English abstract

Existing Visual Speech Recognition (VSR) systems commonly rely on left-to-right autoregressive decoding, which can force premature decisions on visually ambiguous tokens before sufficient context is available. We propose DLLM-VSR, to the best of our knowledge, the first Diffusion Large Language Model (DLLM)-based VSR framework, formulating transcription as iterative masked denoising with flexible-order decoding. With confidence-based unmasking, DLLM-VSR commits high-confidence positions early and uses the committed tokens as bidirectional context to refine ambiguous ones. To adapt DLLMs to VSR, we introduce a two-stage masked-denoising training strategy that separates visual-to-text content alignment from length modeling. We further observe a performance gap with oracle-length decoding, which assumes access to the true transcript length, indicating that reducing target-length uncertainty can improve DLLM-based VSR. To reduce this gap, we develop length-guided candidate decoding, which uses video duration to construct plausible transcript-length hypotheses, decodes under multiple hypotheses, and reranks candidates using length plausibility and decoding confidence. The proposed method achieves a state-of-the-art WER of 19.5% on LRS3 using only its labeled training data.

2605.28454 2026-05-28 cs.AI

GONDOR to the Rescue: Satisficing Planning with Low Memory

Yonatan Vernik, Alexander Tuisov, Alexander Shleyfman

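AI summary: Proposes the GONDOR algorithm, which extends GBFS under strict memory limits by periodically compressing the search tree while retaining sparse anchor states, enabling satisficing planning under low memory budgets.

Details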
English abstract

Greedy Best-First Search (GBFS) is the dominant approach for solving search problems where the goal can be estimated with a heuristic, such as planning, route finding, navigation, and pathfinding. This is especially true when the memory is tightly constrained, such as planning on edge devices. To alleviate that, we present GONDOR (Greedy Online Navigation with Dynamic Outpost-based Re-search), a memory-efficient extension of GBFS that allows search to continue under strict memory limits by periodically compressing the search tree while retaining a sparse set of anchor states, then upon reaching the goal reconstructs the path by re-searching between the sparse states. We analyze the algorithm and discuss several variants defined by different outpost selection policies. In addition, we explore using Bloom filters for compact duplicate detection in the closed list. Experiments across numeric planning domains and heuristic configurations show that GONDOR consistently improves coverage under low memory budgets compared to standard GBFS. We release the implementation of GONDOR and the Bloom-filter variant to facilitate further research on memory-efficient heuristic search.
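
For readers unfamiliar with the baseline, here is a minimal sketch of plain GBFS in Python. It is not GONDOR: there is no tree compression, no outposts, and no Bloom filter. The function names and the toy problem are illustrative assumptions. The `parent` map, which records every generated state, is the kind of structure whose growth makes memory the bottleneck.

```python
import heapq

def gbfs(start, is_goal, successors, h):
    """Greedy best-first search: always expand the open state with lowest h."""
    open_heap = [(h(start), 0, start)]
    parent = {start: None}  # also serves as the set of already generated states
    counter = 1             # tie-breaker so the heap never compares states
    while open_heap:
        _, _, state = heapq.heappop(open_heap)
        if is_goal(state):
            # Walk parent pointers back to the start to recover the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                heapq.heappush(open_heap, (h(nxt), counter, nxt))
                counter += 1
    return None  # open list exhausted without reaching a goal

def line_successors(s):
    """Toy domain (hypothetical): integer positions 0..10 on a line."""
    return [t for t in (s - 1, s + 1) if 0 <= t <= 10]
```

Calling `gbfs(0, lambda s: s == 5, line_successors, lambda s: abs(5 - s))` walks straight from 0 to 5. Because GBFS ranks states by the heuristic alone, the returned path is a solution but not necessarily the shortest one, which is what "satisficing" refers to.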

2605.28450 2026-05-28 cs.CV cs.AI

BiasEdit: A Training-Free Bias-Detect-and-Edit Framework for Learning Fair Visual Classifiers

Jungwook Seo, Yoonsik Park, Changmin Lee, Sungyong Baik

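AI summary: Proposes the BiasEdit framework, which automatically detects bias attributes through statistical dependence and mutual information analysis and generates unbiased samples with text-guided image editing, achieving fair classification without manual annotation.

Details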
Comments
Accepted to The Web Conference 2026 (formerly WWW) as an Oral presentation
English abstract

Visual data from the Web power image classifiers, which often underpin many web services, such as recommendation and content moderation. However, the raw Web data often contain spurious correlations and social biases, and neural networks are known for their tendency to learn biases present in data. This can reinforce unfairness in web services and the web data, leading to a vicious cycle. In the context of image classification, networks learn bias attributes for a specific class when a majority of images contain the same attribute only for a given class. Hence, training a fair and debiased classifier from a biased dataset demands handling the imbalance between a majority of images with bias attributes (bias-aligned samples) and a minority without (bias-conflict samples). In this work, we introduce BiasEdit, a modular framework that automatically detects bias attributes from the original dataset and edits them to construct a debiased dataset. Specifically, BiasEdit first detects unknown bias attributes via statistical dependence and mutual information analysis of visual-linguistic representations, and then explicitly edits those attributes using text-guided image editing to generate realistic bias-conflict samples. Unlike prior works that assume known bias attributes or rely on synthetic mixing, our method operates without manual annotations and can leverage off-the-shelf vision-language and editing models. BiasEdit addresses a fundamental challenge in Web-sourced visual AI, mitigating dataset-induced bias and achieving state-of-the-art debiasing performance even when training data are fully biased.

2605.28448 2026-05-28 cs.RO

A Digital Twin Framework for Virtual Visuo-Haptic Teleoperation of Complex-Shaped Optical Microrobots

Zongcai Tan, Lan Wei, Dandan Zhang

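AI summary: Presents a digital twin framework that integrates multi-trap optical manipulation, image-based pose estimation, microrobot motion simulation, and model-based haptic rendering for virtual visuo-haptic teleoperation of complex-shaped optical microrobots; experiments show that haptic feedback substantially reduces the standard deviations of contact force and position error and raises the task success rate.

Details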
Comments
Accepted by 2026 MARSS
English abstract

Optical tweezers (OT) provide piconewton-scale manipulation for delicate biomedical tasks, where visuo-haptic feedback can improve operator awareness by conveying interaction-force cues and trap-stability information. However, visuo-haptic teleoperation frameworks for complex-shaped optical microrobots remain underdeveloped, particularly in multi-trap manipulation scenarios. This paper presents a digital twin framework for virtual visuo-haptic teleoperation of complex-shaped OT-driven microrobots. The framework integrates a digital twin environment, image-based pose and depth estimation, microrobot motion simulation, and model-based haptic rendering within a Robot Operating System (ROS)-connected bimanual teleoperation system. For force modeling, we combine a Multi-Sphere Distributed Manipulation (MSDM) model with optical-force estimation from the Optical Tweezers Toolbox, enabling simulator-driven visuo-haptic feedback. The framework reproduces representative microrobot motion trends and provides haptic force rendering that is numerically consistent with the fitted optical-force model. In simulated cell-delivery tasks, haptic feedback reduced the standard deviations of the contact-force metric and the microrobot-to-trap-center distance metric by 53.2% and 55.2%, respectively, and improved task success from 30% to 80%. These results demonstrate the framework's effectiveness for evaluating visuo-haptic teleoperation strategies for complex-shaped optical microrobots.

2605.28444 2026-05-28 cs.LG

Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer

Jungyong Son, Jinwook Jung, Minhee Park, Sungyong Baik

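AI summary: To address the problem that fine-tuned knowledge cannot be reused directly after a pre-trained model is updated to a new version, proposes BiCo, a training-free framework based on bilinear coordinate alignment that estimates orthogonal Procrustes mappings from a forward-backward pass on a small calibration set, enabling effective transfer of task vectors across models.

Details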
English abstract

Fine-tuning large-scale pre-trained models has recently become a prevalent paradigm for adapting general representations to specialized tasks. However, when a new version of a pre-trained model becomes available, expertise acquired through fine-tuning cannot be directly reused because it is tied to the parameterization of the original model, requiring another costly fine-tuning. To address this inefficiency, recent work uses task vectors, defined as the parameter difference between a fine-tuned model and its base model, to transfer expertise across models. While existing methods bridge disparate models by matching activations or gradients, a significant performance gap remains relative to direct fine-tuning, suggesting that these partial correspondences are insufficient. In this work, instead of viewing a task vector merely as a parameter offset, we revisit the formation of task vectors and show that they can be derived as accumulated bilinear interactions between input-side activations and output-side gradients. Motivated by this observation, we formulate task-vector transfer as a dual-space alignment problem and propose BiCo, a training-free framework for transferring task vectors through Bilinear Coordinate alignment. BiCo estimates orthogonal Procrustes mappings in both spaces using a single forward-backward pass on a small calibration set, without any parameter update. Across extensive computer vision and natural language processing benchmarks, BiCo consistently outperforms existing transfer methods across models that differ in width, depth, and pre-training configuration.
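
The orthogonal Procrustes problem mentioned in the abstract asks for the orthogonal map that best aligns one set of vectors with another in the least-squares sense. As a hedged illustration only, the sketch below solves the two-dimensional, rotation-only special case in closed form. The general case is usually solved with an SVD, and nothing here reflects BiCo's actual code; all names are hypothetical.

```python
import math

def fit_rotation_2d(xs, ys):
    """Angle theta minimizing sum ||R(theta) x_i - y_i||^2 over 2D rotations.

    Minimizing the squared error is equivalent to maximizing
    cos(theta) * sum(x_i . y_i) + sin(theta) * sum(x_i x y_i),
    which is achieved at theta = atan2(sum of cross terms, sum of dot terms).
    """
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(xs, ys))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(xs, ys))
    return math.atan2(cross, dot)

def rotate(v, theta):
    """Apply the 2D rotation R(theta) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])
```

If `ys` is produced by rotating `xs` by a known angle, `fit_rotation_2d` recovers that angle. The appeal of a Procrustes-style map in this setting is that it is estimated from paired samples in one shot, with no gradient-based training.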

2605.28441 2026-05-28 cs.CV cs.AI

Bayesian Gated Non-Negative Contrastive Learning

Peng Cui, Jiahao Zhang, Lijie Hu

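AI summary: To address representation entanglement in contrastive learning, proposes Bayesian gated non-negative contrastive learning, which uses a probabilistic gating mechanism to dynamically filter out irrelevant features and improves semantic consistency by 142.1% on ImageNet-100.

Details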
Comments
Accepted by ICML 2026
English abstract

While Contrastive Learning (CL) has revolutionized self-supervised representation learning, its latent representations remain highly entangled and opaque, limiting their interpretability in safety-critical applications. We identify that a fundamental cause of this entanglement is the reliance on deterministic similarity measures, which treat all feature dimensions equally. In compositional scenes, this creates an Optimization Conflict: common background features, such as "blue sky", are encouraged to align in positive pairs but are simultaneously repelled in negative pairs, causing gradient oscillations that hinder precise semantic disentanglement. To address this, we propose BayesNCL (Bayesian Gated Non-Negative Contrastive Learning). Unlike standard approaches, BayesNCL introduces a probabilistic gating mechanism that dynamically filters out task-irrelevant, high-frequency common features while selectively retaining discriminative semantics. By formalizing feature selection as a variational inference problem with a sparse Bernoulli prior, our method effectively resolves the optimization conflict. Experimental results on ImageNet-100 demonstrate that BayesNCL achieves a remarkable 142.1% improvement in semantic consistency compared to state-of-the-art baselines, yielding highly interpretable representations without compromising downstream task performance. Code is available at https://github.com/Cui-Peng-624/BayesNCL.

2605.28440 2026-05-28 cs.CL cs.LG

AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates

Shaolong Chen, Madalina Ciobanu, Qingqing Mao, Ritankar Das

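AI summary: To address the gradient asymmetry in DPO, which biases the model toward avoiding bad answers rather than generating good ones, proposes the AdaDPO algorithm, which introduces adaptive coefficients based on the policy model's generation probabilities to balance the gradients of preferred and dispreferred responses; it outperforms DPO on AlpacaEval 2 and mitigates length bias.

Details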
Comments
5 figures
English abstract

DPO has become a widely adopted alternative to RLHF for aligning LLMs with human preferences, eliminating the need for a separate reward model or RL loop. Recent theoretical analysis uncovers an asymmetric gradient behavior in DPO: the loss suppresses dispreferred responses substantially faster than it promotes preferred ones, causing the model to learn to avoid bad answers rather than to generate good ones. We propose AdaDPO, a Self-Adaptive variant of the DPO algorithm that introduces per-preference-pair, stop-gradient-based coefficients derived directly from the policy model's generation probabilities, with the reference model's probabilities as an optional component. AdaDPO is constructed to enforce equality of gradient magnitudes between preferred and dispreferred probabilities; the practical implementation balances per-token gradients and applies a numerical clipping bound for stability, while retaining DPO's original hyperparameter structure. On Llama-3-8B-Instruct trained on UltraFeedback under a SimPO-like setup, AdaDPO consistently outperforms DPO on AlpacaEval 2: it achieves higher length-controlled win rates (LC) in 81% of hyperparameter combinations, attains the global best LC (48.3%) and raw win rate (46.1%), and enlarges the LC-over-WR margin in 88% of combinations, indicating effective mitigation of length bias. Additional analyses on KL divergence, reward margin, and reward accuracy confirm that AdaDPO rectifies the gradient imbalance and yields more efficient optimization. Because it operates purely at the loss level, AdaDPO can be dropped into existing preference-based alignment pipelines without changing data collection or model architectures. The method requires only a few lines of code, and the same self-adaptive principle generalizes to a broad family of pairwise contrastive preference losses including SimPO, R-DPO, IPO, CPO, and ORPO.
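
For reference, the standard DPO objective for a single preference pair fits in a few lines. The sketch below is plain DPO, not AdaDPO: the abstract does not give the form of the adaptive coefficients, so none are shown. The helper `grad_wrt_probs` is an illustrative derivation added for this sketch (an assumption, not taken from the paper). It differentiates the loss with respect to the two sequence probabilities and shows that each gradient scales as 1/p, so the lower-probability response receives the larger magnitude.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    """beta * (policy-vs-reference log-ratio of winner minus that of loser)."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss -log(sigmoid(margin)), in a numerically stable form."""
    m = dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

def grad_wrt_probs(p_w, p_l, ref_p_w, ref_p_l, beta=0.1):
    """(dL/dp_w, dL/dp_l) for sequence probabilities p_w and p_l.

    With L = -log(sigmoid(m)) and m linear in log p, the chain rule gives
    dL/dp_w = -beta * sigmoid(-m) / p_w and dL/dp_l = +beta * sigmoid(-m) / p_l.
    """
    m = dpo_margin(math.log(p_w), math.log(p_l),
                   math.log(ref_p_w), math.log(ref_p_l), beta)
    scale = beta * sigmoid(-m)
    return (-scale / p_w, scale / p_l)
```

For example, with `p_w = 0.5` and `p_l = 0.1` the gradient on the dispreferred probability is five times larger in magnitude than the gradient on the preferred one. This is one simple way to see how an imbalance between the two gradients can arise.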

2605.28438 2026-05-28 cs.CL

Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

Prasenjit K Mudi, Dahlia Devapriya, Sheetal Kalyani

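AI summary: Proposes a language-agnostic automatic alignment mechanism that makes PoS-based ASR error analysis reliable for both Latin and non-Latin scripts, and applies it to multiple writing systems to improve WER.

Details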
English abstract

Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
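
WER itself comes from a word-level edit-distance alignment between the reference and the hypothesis. The sketch below computes it with the classic dynamic program. It is a generic illustration, not the alignment mechanism proposed in the paper, and it assumes whitespace-separated tokens, which is a simplification for some writing systems.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                   # deletion
                          d[i][j - 1] + 1,                   # insertion
                          d[i - 1][j - 1] + substitution)    # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Because tokens are compared as whole Unicode strings, the same routine runs unchanged on Devanagari or Arabic text. Attributing each error to a part of speech additionally needs the alignment path itself, not just the final distance, which is where alignment quality matters.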

2605.28433 2026-05-28 cs.CL

Roles with Rails: Contract-Preserving Role Evolution in Multi-Agent Structured Reasoning

Ling-Yue Ge, Lan-Zhe Guo

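AI summary: Proposes the SERO framework, which uses a contract-preserving role evolution mechanism (credit-guided retrieval, a protected terminal aggregator, conditional validator repair, and a contextual-bandit controller) to address role drift and contract breakage in multi-agent systems, and validates it on real-world reasoning benchmarks.

Details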
Comments
33 pages, 23 figures, 12 tables
English abstract

Role-based LLM multi-agent systems need adaptive role pools, yet adapting such systems is not merely a matter of prompt optimization: roles often carry structural obligations, including capability coverage, message compatibility, validation, final-answer aggregation, and parser-compatible output protocols. Existing systems either fix the role inventory and lose adaptivity, or allow unconstrained generation to induce role drift, removing structurally necessary roles and breaking answer contracts. We formulate this as contract-preserving role evolution, requiring every committed edit to preserve five structural contracts (capability, communication, validation, aggregation, output protocol). We instantiate this formulation in SERO, a Self-Evolving Role Orchestration framework that evolves a typed role-card pool through credit-guided retrieval, a credit-ranked communication DAG with a protected terminal aggregator and conditional validator repair, and a contextual-bandit controller whose LLM-proposed edits are committed only when they preserve the contracts and improve task score. Experiments on real-world reasoning benchmarks across three LLM backbones confirm the value of contract-preserving role evolution.

2605.28428 2026-05-28 cs.CV cs.AI

Anomaly as Non-Conformity via Training-Free Graph Laplacian Energy Minimization

Jungwook Seo, Minjeong Kim, Younkwan Lee, Seungho Shin, Sungyong Baik

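AI summary: Proposes ANoCo, a training-free graph Laplacian energy optimization method that measures anomaly by the magnitude of the update needed to align a query patch with the normal manifold; it requires no learned parameters or sampling and achieves strong image-level AUROC and stable localization maps on standard benchmarks.

Details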
Comments
Accepted to CVPR 2026
English abstract

Detecting subtle visual anomalies in images remains challenging, particularly when only normal samples are available a priori. Such unsupervised anomaly detection is typically solved by measuring the feature similarity of a query patch to a memory of normal patches. However, similarity alone does not reveal how strongly a query patch violates the structure of the normal feature manifold. We propose a training-free Laplacian graph energy optimization formulation, named ANoCo, that scores Anomaly by the cost of Non-Conformity incurred when aligning a query patch with a fixed normal manifold. For each query patch, we construct a bipartite query-to-normal graph weighted by cosine affinity, explicitly removing query-query and normal-normal edges to prevent evidence dilution. We formulate anomaly scoring as a convex Laplacian energy with anchored normal nodes and solve it in closed form. In particular, we do not use the optimized features themselves; the anomaly score is the magnitude of the update required to satisfy normality constraints, reframing the graph Laplacian as a non-conformity operator rather than a smoothing prior. The proposed method introduces no learnable parameters, message passing, or sampling, and has complexity comparable to a single linear solve. Across standard benchmarks, it delivers strong image-level AUROC, stable localization maps, and improved robustness over prior methods, demonstrating the effectiveness of using optimization-induced feature drift as an anomaly measure.
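
The idea of scoring by update magnitude can be illustrated with a small sketch. It is not the paper's formulation: the energy `lam*||z - q||^2 + sum_j w_j*||z - n_j||^2`, the softmax normalization of the cosine affinities, and the parameters `lam` and `tau` are all assumptions chosen so that the minimizer has a simple closed form.

```python
import math

def cosine(a, b):
    """Cosine affinity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def nonconformity_score(query, normals, lam=1.0, tau=0.1):
    """Magnitude of the update that pulls `query` toward fixed (anchored) normal patches."""
    # Assumed weighting: softmax over cosine affinities, so the weights sum to 1.
    affinities = [cosine(query, n) / tau for n in normals]
    peak = max(affinities)
    exps = [math.exp(a - peak) for a in affinities]
    weights = [e / sum(exps) for e in exps]
    dim = len(query)
    # Affinity-weighted pull of the anchored normal nodes.
    pull = [sum(w * n[d] for w, n in zip(weights, normals)) for d in range(dim)]
    # Closed-form minimizer of lam*||z - q||^2 + sum_j w_j*||z - n_j||^2.
    z = [(lam * query[d] + pull[d]) / (lam + 1.0) for d in range(dim)]
    # The score is the size of the update, not the updated feature itself.
    return math.sqrt(sum((z[d] - query[d]) ** 2 for d in range(dim)))
```

Under these assumptions, a query that coincides with its normal neighbours barely moves and scores near zero, while a query far from them needs a large update and scores high.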

2605.28427 2026-05-28 cs.LG stat.ML

Latent Diffusion for Missing Data

缺失数据的潜在扩散模型

Alberte Heering Estad, Ignacio Peis, Jes Frellsen

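AI summary: Proposes a two-stage framework that first learns latent representations from incomplete data with a robust VAE and then trains a diffusion model on them; generation quality stays high at MCAR missing rates up to 50%, outperforming pixel-space diffusion.

Details
AI Chinese abstract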

扩散模型已成为缺失数据插补的强大生成方法,但大多数现有方法直接在数据空间中操作,当训练数据严重不完整时会退化。我们研究将扩散转移到学习到的潜在表示是否能在完全随机缺失(MCAR)损坏下提高鲁棒性。为此,我们提出一个两阶段框架:一个基于VAE的鲁棒插补器首先从不完整观测中学习紧凑的语义特征,然后在得到的潜在空间中训练扩散模型。在不同的训练缺失率下,我们在相同的不完整数据设置下与像素空间扩散模型进行受控比较。潜在扩散模型保持高样本质量,并在缺失率高达50%时保持稳定,而像素空间扩散随着缺失率增加逐渐退化。对于下游插补,潜在扩散也始终比像素空间扩散表现更好。这些发现表明,潜在空间建模减轻了零插补输入带来的伪影放大,并为不完整数据学习提供了更鲁棒的生成先验。总体而言,我们的结果支持潜在扩散作为缺失数据问题中像素空间扩散的一个强大且实用的替代方案。

English abstract

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting diffusion to a learned latent representation improves robustness under missing-completely-at-random (MCAR) corruption. To this end, we propose a two-stage framework: a robust VAE-based imputer first learns compact semantic features from incomplete observations, and a diffusion model is then trained in the resulting latent space. Across training missing rates, we perform a controlled comparison against pixel-space diffusion models under the same incomplete-data setting. The latent diffusion model maintains high sample quality and remains stable up to 50% missingness, while pixel-space diffusion degrades progressively as missingness increases. For downstream imputation, latent diffusion also achieves consistently better performance than pixel-space diffusion. These findings indicate that latent-space modeling mitigates artifact amplification from zero-imputed inputs and provides a more robust generative prior for incomplete-data learning. Overall, our results support latent diffusion as a strong and practically useful alternative to pixel-space diffusion for missing-data problems.
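
For readers unfamiliar with the corruption setting, the sketch below shows what MCAR masking and zero imputation look like on a small table of values. It only illustrates the data setup mentioned in the abstract; the function names and the per-entry Bernoulli masking are assumptions, and neither the VAE nor the diffusion model is shown.

```python
import random

def mcar_mask(n_rows, n_cols, missing_rate, seed=0):
    """Boolean mask (True = observed); each entry is dropped independently of the data."""
    rng = random.Random(seed)
    return [[rng.random() >= missing_rate for _ in range(n_cols)] for _ in range(n_rows)]

def zero_impute(data, mask):
    """Replace missing entries with 0.0, the naive input the abstract calls zero-imputed."""
    return [
        [value if observed else 0.0 for value, observed in zip(row, mask_row)]
        for row, mask_row in zip(data, mask)
    ]
```

Because the mask does not depend on the values, the missingness is completely at random; at a 0.5 rate roughly half of every input reaches the model as an artificial zero.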

2605.28424 2026-05-28 cs.CL

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Skill0.5:面向智能体强化学习中分布外泛化的联合技能内化与利用

Jiapeng Zhu, Jianxiang Yu, Yibo Zhao, Chengcheng Han, Qi Gu, Xunliang Cai, Xiang Li, Weining Qian

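AI summary: Proposes the Skill0.5 framework, which separates general skill internalization from task-specific skill utilization and uses a dynamic, difficulty-aware router, improving both in-distribution and out-of-distribution performance on ALFWorld and WebShop.

Details
AI Chinese abstract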

将显式技能赋予大型语言模型已成为使自主智能体解决复杂任务的一种有前景的范式。智能体技能可以内在地分为用于广泛认知迁移的通用技能和用于动态执行的任务特定技能。然而,现有的基于技能的强化学习方法通常强制在完全外化(导致高昂的上下文开销)和完全内化(存在过拟合和知识冲突风险)之间做出僵化选择。为了解决这一困境,我们提出了Skill0.5,一种新颖的智能体强化学习框架,通过结合通用技能内化与任务特定技能利用来明确区分技能处理方式。在动态、难度感知路由器的驱动下,Skill0.5将任务流式传输到不同的掌握层级,以应用定制的优化策略:它通过特权蒸馏内化通用技能,为困难任务构建认知基础,同时在简单任务上使用诊断性探测来惩罚捷径并强制特定技能利用。在ALFWorld和WebShop上的实验表明,Skill0.5优于基于记忆和基于技能的强化学习基线,在分布内和分布外场景中均实现了性能提升。

English abstract

Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills can be inherently divided into general skills for broad cognitive transfer and task-specific skills for dynamic execution. However, existing skill-based reinforcement learning (RL) methods typically force a rigid choice between full externalization, which incurs prohibitive context overhead, and full internalization, which risks overfitting and knowledge conflicts. To address this dilemma, we propose Skill0.5, a novel agentic RL framework that explicitly differentiates skill treatments by combining general skill internalization with task-specific skill utilization. Driven by a dynamic, difficulty-aware router, Skill0.5 streams tasks into distinct mastery tiers to apply tailored optimization strategies: it internalizes general skills via privileged distillation to build a cognitive foundation for hard tasks, while using diagnostic probing on easy tasks to penalize shortcuts and enforce specific skill utilization. Experiments on ALFWorld and WebShop demonstrate that Skill0.5 outperforms both memory-based and skill-based RL baselines, yielding performance improvements across both in-distribution and out-of-distribution scenarios.

2605.28422 2026-05-28 cs.CV cs.AI

VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs

VITAL: 视觉-语义双重监督增强可解释的医学多模态大语言模型潜在推理

Qiaoru Li, Shaotian Liang, Jintao Chen, Haoran Sun, Yuxiang Cai, Jianwei Yin, Yankai Jiang

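AI summary: Proposes VITAL, a framework that achieves interpretable latent reasoning in medical MLLMs through visual-semantic dual supervision (a text decoder reconstructs reasoning chains and a visual projector regresses ROI features), reaching state-of-the-art results on 7 benchmarks.

Details
AI Chinese abstract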

潜在推理能够对连续隐藏状态而非显式token进行推理,避免了医学VQA中思维链的语言瓶颈和推理开销。然而,现有方法存在模态崩溃、视觉监督不足以及训练-推理不匹配的问题。此外,其不透明的潜在状态缺乏可解释性,而这在临床应用中至关重要。我们提出VITAL,一个用于医学MLLM的潜在空间推理框架,具有视觉-语义双重监督:一个辅助文本解码器从潜在状态重建推理链,同时一个视觉投影器从冻结的独立医学视觉编码器回归ROI特征。两个模块在推理时被丢弃,零开销,但可以在事后重新附加以实现双重可解释性,在不牺牲效率的情况下提供推理过程的文本和视觉解释。我们构建了一个涵盖9种成像模态的61K数据集,比之前的医学视觉潜在推理数据集大一个数量级。在7个基准上的实验表明,VITAL一致且显著优于骨干模型、所有潜在推理基线以及在更大数据上训练的医学MLLM,达到了与万亿参数专有模型竞争的最先进结果。

English abstract

Latent reasoning enables reasoning over continuous hidden states rather than explicit tokens, avoiding the language bottleneck and inference overhead of chain-of-thought for medical VQA. However, existing methods suffer from modality collapse, insufficient visual supervision, and train-inference mismatch. Moreover, their opaque latent states offer no interpretability, which is critical in clinical applications. We propose VITAL, a latent-space reasoning framework for medical MLLMs with visual-semantic dual supervision: an auxiliary text decoder reconstructs reasoning chains from latent states, while a visual projector regresses ROI features from a frozen, independent medical vision encoder. Both modules are discarded at inference with zero overhead, yet can be re-attached post-hoc for dual interpretability, providing textual and visual explanations of the reasoning process without sacrificing efficiency. We construct a 61K dataset spanning 9 imaging modalities, exceeding prior medical visual latent reasoning datasets by an order of magnitude. Experiments on 7 benchmarks show that VITAL consistently and substantially outperforms the backbone, all latent reasoning baselines, and medical MLLMs trained on far larger data, achieving state-of-the-art results competitive with trillion-parameter proprietary models.

2605.28421 2026-05-28 cs.AI

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

DenoiseRL:引导推理模型从噪声前缀中恢复

Caijun Xu, Changyi Xiao, Zhongyuan Peng, Yixin Cao

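AI summary: Proposes DenoiseRL, a reinforcement learning framework that learns from the incorrect reasoning of weak models without external supervision or a stronger teacher model, improving reasoning performance and training efficiency.

Details
Comments
17 pages, 6 figures
AI Chinese abstract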

强化学习已成为推动大型语言模型推理能力发展的核心范式,然而现有方法仍依赖更强的教师模型或精心策划的困难数据集,限制了可扩展的能力提升。在本文中,我们提出DenoiseRL,一种强化学习框架,通过从弱模型的失败中恢复导向优化来替代外部监督。DenoiseRL不依赖更强的监督或精心设计的数据,而是直接从错误的推理轨迹中学习,将其转化为改进的机会,使训练更具可扩展性且更少依赖外部资源。这产生了更丰富、更多样化的学习信号,提高了从非完美模型行为中探索的效率。因此,DenoiseRL提升了推理性能和整体训练效率,同时减少了对昂贵数据整理或更强教师模型的需求。实验表明,DenoiseRL在竞争性数学和通用推理基准上持续优于强在线强化学习基线,并随着训练难度增加促进更强的自我纠正行为,突显了改进大型语言模型推理的一种有效且可扩展的替代路径。

English abstract

Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily curated difficult datasets, limiting scalable capability improvement. In this paper, we introduce DenoiseRL, a reinforcement learning framework that substitutes external supervision with recovery-oriented optimization over failures from weak models. Instead of relying on stronger supervision or carefully engineered data, DenoiseRL learns directly from incorrect reasoning traces by converting them into opportunities for improvement, making training more scalable and less dependent on external resources. This yields a richer and more diverse learning signal, improving exploration efficiency from imperfect model behavior. As a result, DenoiseRL improves reasoning performance and overall training efficiency while reducing the need for expensive data curation or stronger teacher models. Empirically, DenoiseRL consistently outperforms strong on-policy RL baselines across competitive mathematical and general reasoning benchmarks and promotes stronger self-corrective behavior as training difficulty increases, highlighting an effective and scalable alternative pathway for improving reasoning in large language models.

2605.28412 2026-05-28 cs.RO cs.LG

Tactile-Proprioceptive Sensor Fusion for Contact Wrench Estimation in Whole-Body Physical Human-Robot Interaction

触觉-本体感觉传感器融合用于全身物理人机交互中的接触力估计

Junha Min, Junghyeon Ma, Jiwung Kwon, Sunggyu Bae, Joohyung Kim, Kyungseo Park

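AI summary: Proposes a tactile-proprioceptive fusion framework that uses tactile cues from pneumatic skin pads as contact indicators, combines them with motor-current-based proprioception, and applies a temporal convolutional network to mitigate friction hysteresis, reconstructing multi-axis contact forces and improving sensitivity and responsiveness in physical human-robot interaction.

Details
Comments
8 pages, 6 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026
AI Chinese abstract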

直接物理引导是一种自然的教学和与机器人交互的方式,机器人皮肤通过实现灵敏的接触感知和定位做出关键贡献。本文提出了一种用于自然物理人机交互的触觉-本体感觉传感器融合框架。来自气动皮肤垫的触觉线索作为接触指示器,绕过了摩擦残余和施加外力之间的模糊性,实现了无需明确摩擦识别的高灵敏度接触检测。我们将这些线索与基于电机电流的本体感觉融合,以重建机器人表面的多轴接触力。为了在运动过程中保持精度,我们采用时间卷积网络(TCN)来减轻粘滑过渡期间的摩擦滞后,减少接触起始时的不确定性,并产生平滑、响应灵敏的引导。我们在集成皮肤的机器人臂上验证了该方法:(i)在静止接触中重建多轴力,以及(ii)同时进行力估计和动觉教学。结果表明,与仅触觉和仅本体感觉的基线相比,在不同接触条件下灵敏度和响应性均有提高,支持触觉-本体感觉融合作为安全、直观的物理人机交互的可靠途径。

English abstract

Direct physical guidance is a natural means of teaching and interacting with robots, and robotic skins make a key contribution by enabling sensitive contact sensing and localization. This paper presents a tactile-proprioceptive sensor fusion framework for natural physical human-robot interaction. Tactile cues from pneumatic skin pads serve as contact indicators that bypass the ambiguity between frictional residues and applied external forces, enabling highly sensitive contact detection without explicit friction identification. We fuse these cues with motor-current-based proprioception to reconstruct multi-axis contact forces on the robot surface. To maintain accuracy during motion, we employ a temporal convolutional network (TCN) to mitigate friction hysteresis during stick-slip transitions, reducing uncertainty at contact onset and yielding smooth, responsive guidance. We validate the approach on a skin-integrated robot arm: (i) multi-axis forces are reconstructed in stationary contacts, and (ii) simultaneous force estimation and kinesthetic teaching are demonstrated. Results indicate improved sensitivity and responsiveness across diverse contact conditions compared with tactile-only and proprioceptive-only baselines, supporting tactile-proprioceptive fusion as a reliable pathway to safe, intuitive physical human-robot interaction.

2605.28409 2026-05-28 cs.AI

Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning

基于离线强化学习的代码生成LLM高效后训练

Mingze Wu, Abhinav Anand, Shweta Verma, Mira Mezini

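AI summary: Explores post-training code-generating LLMs with offline reinforcement learning on existing code datasets; experiments show that the approach effectively improves model performance, especially for small models and challenging coding problems.

Details
AI Chinese abstract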

使用在线强化学习(RL)进行后训练是LLM(包括代码生成模型)的重要训练步骤。然而,用于代码生成的在线RL涉及LLM推理和生成输出的验证,这可能耗费大量时间和资源。在本文中,我们通过利用现有代码数据集,探索将离线RL应用于代码生成模型。我们的实验表明,离线RL是提升LLM性能的有效训练策略。我们证明,离线RL对于小型LLM和具有挑战性的编码问题尤其有益。

English abstract

Post-training using online reinforcement learning (RL) is an important training step for LLMs, including code-generating models. However, online RL for code generation involves LLM inference and verification of the generated output, which can take considerable time and resources. In this paper, we explore the application of offline RL to code-generating models by leveraging existing code datasets. Our experiments demonstrate that offline RL is an effective training strategy for improving LLM performance. We show that offline RL can be especially beneficial for small LLMs and challenging coding problems.

2605.28405 2026-05-28 cs.AI

Measuring Progress Toward AGI: A Cognitive Framework

衡量AGI进展:一个认知框架

Ryan Burnell, Yumeya Yamamori, Orhan Firat, Kate Olszewska, Steph Hughes-Fitt, Oran Kelly, Isaac R. Galatzer-Levy, Meredith Ringel Morris, Allan Dafoe, Alison M. Snyder, Noah D. Goodman, Matthew Botvinick, Shane Legg

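AI summary: Proposes a framework based on a cognitive taxonomy that evaluates system performance across 10 key cognitive faculties to measure progress toward AGI.

Details
Comments
32 pages, 2 figures
AI Chinese abstract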

尽管AGI被广泛讨论,但目前缺乏衡量其进展的明确框架。这种模糊性助长了主观论断,使追踪进展变得困难,并可能阻碍负责任的治理。作为解决这一问题的起点,我们提出了一个理解系统能力与人类认知能力关系的框架。借鉴心理学、神经科学和认知科学数十年的研究,我们引入了一个认知分类学,将通用智能分解为10个关键认知能力。然后,我们提出一个严格的评估协议,通过一套有针对性的、保留的认知任务来衡量系统性能,生成可用于理解系统优缺点的“认知轮廓”。我们希望这一框架能为更严格、更实证的AGI评估提供实用路线图和初步步骤。

English abstract

Despite widespread discussion of AGI, there is no clear framework for measuring progress toward it. This ambiguity fuels subjective claims, makes it difficult to track progress, and risks hindering responsible governance. As a starting point to address this gap, we present a framework for understanding system capabilities in relation to human cognitive abilities. Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties. We then propose a rigorous evaluation protocol in which a system's performance is measured across a suite of targeted, held-out cognitive tasks, generating a 'cognitive profile' that can be used to understand a system's strengths and weaknesses. We hope this framework will provide a practical roadmap and an initial step toward more rigorous, empirical evaluation of AGI.

2605.28401 2026-05-28 cs.CV

EgoRelight: Egocentric Human Capture and Illumination Recovery for Relightable and Photoreal Avatar Rendering

EgoRelight: 基于自我中心的人体捕捉与光照恢复实现可重光照和逼真化身渲染

Jianchun Chen, Yinda Zhang, Rohit Pandey, Thabo Beeler, Marc Habermann, Christian Theobalt

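AI summary: Proposes EgoRelight, which extracts depth maps from stereo down-facing cameras on a head-mounted display to drive a mesh-based avatar, uses a neural appearance model to synthesize view-dependent specular and view-independent diffuse shading separately, and recovers an HDR environment map through test-time inverse rendering, enabling full-body performance capture, photorealistic relightable appearance synthesis, and environment illumination estimation from a single HMD.

Details
AI Chinese abstract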

混合现实(MR)头戴显示器承诺了一个沉浸式远程呈现的未来,其中虚拟人无缝地融入真实或虚拟环境。实现这一愿景需要一种方法,能够从头戴显示器(HMD)的受限视角捕捉用户的运动、估计新光照下的外观并理解环境。现有方法将这些视为孤立问题:它们要么专注于驱动具有固定光照的化身,要么依赖工作室设置进行重光照。在本文中,我们提出了EgoRelight,一个用于自我中心远程呈现的整体框架,它同时捕捉全身人体性能、合成逼真且可重光照的外观,并从单个HMD估计高动态范围(HDR)环境图。首先,为了确保运动和表面重建,我们提出了一个自我中心感知模块,利用立体下视相机提取密集深度图,作为几何控制信号驱动基于网格的化身。其次,我们引入了一种新颖的神经外观模型,该模型学习分别合成视角相关的镜面反射和视角无关的漫反射。通过采用专门的射线采样策略,我们的模型能够泛化到未见过的光照,而不依赖限制性的解析BRDF先验。第三,我们通过测试时逆渲染过程实现化身无缝集成到物理世界,该过程通过将预训练化身的外观与实时自我中心相机观测匹配来恢复HDR环境图。我们通过一个社交远程呈现应用演示了我们的系统,其中远程用户根据其物理环境被一致地重光照。大量实验表明,我们的组件和集成系统在几何精度、渲染以及重光照保真度方面显著优于最先进的基线方法。

English abstract

Mixed Reality (MR) headsets promise a future of immersive telepresence where virtual humans blend indistinguishably into real or virtual surroundings. Achieving this vision requires a method for capturing a user's motion, estimating appearance under novel lighting, and understanding the environment, all from the constrained viewpoint of a head-mounted display (HMD). Existing approaches treat these as isolated problems: they either focus on driving avatars with baked-in lighting or rely on studio setups for relighting. In this paper, we present EgoRelight, a holistic framework for egocentric telepresence that simultaneously captures full-body human performance, synthesizes photorealistic and relightable appearance, and estimates high dynamic range (HDR) environment maps from a single HMD. First, to ensure motion and surface reconstruction, we propose an egocentric perception module that leverages stereo down-facing cameras to extract dense depth maps, which serve as geometric control signals to drive a mesh-based avatar. Second, we introduce a novel neural appearance model that learns to synthesize view-dependent specular and view-independent diffuse shading separately. By employing a specialized ray-sampling strategy, our model generalizes to unseen illumination without relying on restrictive analytical BRDF priors. Third, we enable seamless avatar integration into the physical world via a test-time inverse rendering process, which recovers an HDR environment map by matching the pre-trained avatar's appearance to live egocentric camera observations. We demonstrate our system through a social telepresence application, where remote users are coherently relit according to their physical environment. Extensive experiments show that our components and the integrated system significantly outperform state-of-the-art baselines in geometric accuracy as well as rendering and relighting fidelity.

2605.28398 2026-05-28 cs.AI

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

HRBench:混合推理大语言模型中思维模式切换策略的基准测试与理解

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang, Hao Liu

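AI summary: Proposes HRBench, a unified evaluation framework that systematically studies the effectiveness-efficiency trade-offs of three switching strategy families (prompt-based selection, external routing, and speculative execution) under four training regimes in hybrid-reasoning LLMs, showing how the preferred strategy varies with model scale and task domain.

Details
Comments
Under review
AI Chinese abstract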

混合推理大语言模型(LLMs)暴露了对推理努力程度的显式控制,允许用户或系统在答案质量与推理成本之间进行权衡。然而,现有的自适应思维模式选择方法通常在不同模型、数据集和实现假设下进行评估,使得比较它们的实际行为变得困难。我们引入了HRBench,一个用于研究混合推理LLM中思维模式切换的统一评估框架。HRBench沿两个轴组织设计空间:三种切换策略族(基于提示的选择、外部路由和推测执行)和四种训练机制(无训练、SFT、离线RL和在线RL),产生12种受控评估设置。我们在6个LLM(从Qwen3.5-2B到Kimi-K2.5-1.1T)和5个涵盖数学、科学和代码的推理基准上评估这些设置,并在同一流水线中重新实现了12种以上有代表性的先前方法。我们的分析表征了不同切换策略如何占据不同的效率-效果权衡区域:基于提示的方法通常提供有利的token-准确率权衡,路由方法提供更稳定的成本降低,而推测方法倾向于以更高的token成本提高准确率。我们进一步发现训练对不同策略的影响不同,且首选策略随模型规模和任务领域而变化。HRBench提供了参考实现和统一评估平台,以支持对混合推理LLM中高效推理的更受控研究。我们的数据、代码和仓库可在https://github.com/usail-hkust/HRBench获取。

English abstract

Hybrid-reasoning large language models (LLMs) expose explicit controls over reasoning effort, allowing users or systems to trade off answer quality against inference cost. However, existing methods for adaptive thinking-mode selection are typically evaluated under different models, datasets, and implementation assumptions, making it difficult to compare their practical behavior. We introduce HRBench, a unified evaluation framework for studying thinking-mode switching in hybrid-reasoning LLMs. HRBench organizes the design space along two axes: three switching strategy families (prompt-based selection, external routing, and speculative execution) and four training regimes (training-free, SFT, offline RL, and online RL), yielding 12 controlled evaluation settings. We evaluate these settings across 6 LLMs, from Qwen3.5-2B to Kimi-K2.5-1.1T, and 5 reasoning benchmarks covering mathematics, science, and code, while reimplementing 12+ representative prior methods within the same pipeline. Our analysis characterizes how different switching strategies occupy distinct effectiveness-efficiency trade-off regions: prompt-based methods often provide favorable token-accuracy trade-offs, routing methods offer more stable cost reduction, and speculative methods tend to improve accuracy at higher token cost. We further find that training affects strategies differently, and that the preferred strategy varies with model scale and task domain. HRBench provides reference implementations and a unified evaluation platform to support more controlled research on efficient reasoning in hybrid-reasoning LLMs. Our data, code, and repository are available at https://github.com/usail-hkust/HRBench.
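
The 12 controlled settings are the cross product of the two axes named in the abstract. The short sketch below enumerates them; the list names and the function `evaluation_settings` are illustrative and are not taken from the HRBench repository.

```python
from itertools import product

# Axis 1: switching strategy families listed in the abstract.
STRATEGIES = ["prompt-based selection", "external routing", "speculative execution"]
# Axis 2: training regimes listed in the abstract.
REGIMES = ["training-free", "SFT", "offline RL", "online RL"]

def evaluation_settings():
    """Return every (strategy, regime) pair, i.e. 3 x 4 = 12 settings."""
    return list(product(STRATEGIES, REGIMES))
```

Iterating over `evaluation_settings()` gives one grid cell per pair, for example `("external routing", "SFT")`.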

2605.28397 2026-05-28 cs.CV

Adaptive Temporal Gating of Longitudinal Magnetic Resonance Imaging for Alzheimer's Prediction

用于阿尔茨海默病预测的纵向磁共振成像自适应时间门控

Alireza Moayedikia, Sara Fin, Alicia Troncoso Lora, Uffe Kock Wiil

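AI summary: Proposes TAF-Net, a hybrid CNN-Transformer architecture that fuses spatiotemporal representations of longitudinal 3D MRI through adaptive temporal gating; using only structural MRI, it achieves the best performance among evaluated methods for MCI-to-AD conversion prediction and approaches methods that require multimodal data.

Details
AI Chinese abstract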

从轻度认知障碍(MCI)到阿尔茨海默病(AD)的转化预测对于早期干预至关重要。当前的深度学习范式主要依赖于横截面结构MRI,忽略了患者特定解剖轨迹中的预后价值。我们引入了时间自适应融合网络(TAF-Net),这是一种混合CNN-Transformer架构,用于建模配对的纵向3D MRI扫描。TAF-Net的核心是由自适应时间门控的时间融合模块,该模块学习患者特定的权重以合成三种时空表示:显式结构变化、区域间时间交叉注意力和双侧特征拼接。在阿尔茨海默病神经影像学倡议队列上进行的三年MCI-to-AD转化预测评估中,TAF-Net仅使用结构MRI就在所有评估方法中取得了最高的判别性能,显著优于最强基线,并接近需要PET、CSF或遗传数据的多模态方法。该架构表现出卓越的数据效率,仅用少量训练数据即可匹配基线性能。消融研究表明,纵向融合提高了判别能力,同时与单时间点评估相比,预测方差降低了48%。可解释性分析显示,空间注意力与内侧颞叶和脑室中已建立的AD病理学一致,而门控机制优先考虑与转化风险强正相关的显式体积变化。

English abstract

Predicting conversion from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is critical for early intervention. Current deep learning paradigms predominantly rely on cross-sectional structural MRI, neglecting prognostic value in patient-specific anatomical trajectories. We introduce the Temporal Adaptive Fusion Network (TAF-Net), a hybrid CNN-Transformer architecture that models paired longitudinal 3D MRI scans. Central to TAF-Net is a Temporal Fusion Module governed by an Adaptive Temporal Gate, which learns patient-specific weightings to synthesize three spatiotemporal representations: explicit structural change, region-to-region temporal cross-attention, and bilateral feature concatenation. Evaluated on the Alzheimer's Disease Neuroimaging Initiative cohort for three-year MCI-to-AD conversion prediction, TAF-Net achieved the highest discriminative performance among all evaluated methods using only structural MRI, significantly outperforming the strongest baseline and approaching multimodal methods requiring PET, CSF, or genetic data. The architecture exhibited exceptional data efficiency, matching baseline performance with a fraction of training data. Ablation studies demonstrate that longitudinal fusion improves discrimination while reducing predictive variance by 48% compared to single-timepoint evaluation. Interpretability analyses reveal spatial attention aligned with established AD pathology in the medial temporal lobe and ventricles, while the gating mechanism prioritizes explicit volumetric change with strong positive correlation to conversion risk.
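
One common way to realize a gate that weights several representations is a softmax over per-sample logits followed by a weighted sum. The sketch below assumes that form; the abstract does not specify the gate's parameterization, so `adaptive_gate_fusion` and its inputs are hypothetical, and in a real model the logits would come from a learned, patient-specific network.

```python
import math

def adaptive_gate_fusion(representations, gate_logits):
    """Fuse equal-length feature vectors with softmax gate weights.

    representations: e.g. [structural_change, cross_attention, concatenation] features
    gate_logits: one score per representation (assumed to be produced elsewhere)
    """
    peak = max(gate_logits)
    exps = [math.exp(g - peak) for g in gate_logits]  # subtract peak for stability
    weights = [e / sum(exps) for e in exps]
    dim = len(representations[0])
    fused = [sum(w * rep[d] for w, rep in zip(weights, representations)) for d in range(dim)]
    return fused, weights
```

Returning the weights alongside the fused vector makes it easy to inspect which representation the gate favours for a given patient, in the spirit of the interpretability analysis the abstract mentions.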

2605.28396 2026-05-28 cs.LG cs.AI

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation

ADWIN: 用于视野感知在线策略蒸馏的自适应窗口

Kun Liang, Chenming Tang, Clive Bai, Weijie Liu, Saiyong Yang, Yunfang Wu

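AI summary: Proposes ADWIN, a framework that uses adaptive windows to dynamically adjust rollout length in on-policy distillation, reducing training cost by up to 4.1 times while maintaining or improving accuracy.

Details
AI Chinese abstract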

在线策略蒸馏(OPD)通过沿着学生生成的轨迹训练学生模型,并利用教师反馈来迁移推理行为,但标准的全轨迹训练将每次更新与昂贵的完整轨迹绑定,并且可能过度分配监督到对当前学生边际价值较低的后半部分。我们通过有用监督视野重新审视这一假设:学生引起的轨迹可能偏离教师偏好的延续,而对齐的前缀可能已经保留了长视野OPD更新方向。我们提出ADWIN,一种用于OPD的自适应窗口框架,将轨迹长度视为在线可接受性决策,在短的教师锚定前缀上训练,同时使用延迟的全轨迹探测来审计前缀与全轨迹的对齐情况,并通过陈旧性控制自适应调整下一视野。在数学和代码推理基准测试中,包括单任务、多任务和强到弱设置,ADWIN在全轨迹OPD和基于前缀的基线方法上改善了准确率与计算成本的权衡,将端到端训练成本降低最多4.1倍,同时达到相当或更好的准确率。

English abstract

On-policy distillation (OPD) transfers reasoning behavior by training a student on teacher feedback along student-generated trajectories, but standard full-rollout training ties every update to a costly completion and can over-allocate supervision to late positions with low marginal value for the current student. We revisit this assumption through the useful supervision horizon: student-induced rollouts can drift from teacher-preferred continuations, while aligned prefixes may already preserve the long-horizon OPD update direction. We propose ADWIN, an adaptive-window framework for OPD that treats rollout length as an online admissibility decision, training on short teacher-anchored prefixes while using delayed full-rollout probes to audit prefix-full alignment and adapt the next horizon with staleness control. Across math and code reasoning benchmarks in single-task, multi-task, and strong-to-weak settings, ADWIN improves the accuracy-compute trade-off over full-rollout OPD and prefix-based baselines, reducing end-to-end training cost by up to 4.1 times while achieving comparable or better accuracy.

2605.28394 2026-05-28 cs.CV cs.GR

Sketch2Motion: Text-driven 2D Sketch to 3D Animation via Diffusion-guided Skeleton Optimization

Sketch2Motion: 文本驱动的二维草图到三维动画的扩散引导骨架优化

Gaurav Rai, Ojaswa Sharma

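AI summary: Proposes Sketch2Motion, a framework that combines a diffusion model with skeleton optimization to turn 2D sketches into 3D animations without paired motion data, supporting multiple character types.

Details
AI Chinese abstract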

二维手绘草图的动画化提供了一种有效的视觉交流媒介。然而,这些草图带来了挑战,特别是在处理遮挡和准确映射运动方面。虽然三维动画自然地解决了这些挑战,但估计三维运动仍然是一项非常复杂的任务。最近将二维草图转换为三维动画的方法主要集中在特定类型的运动上,例如双足运动和面部表情。我们提出了Sketch2Motion,一个基于扩散引导的骨架运动合成框架,它将经典的角色动画流程与深度生成先验相结合。我们的方法使用骨架变换来表示运动,通过线性混合蒙皮将其传播到网格变形。为了引导生成的动画朝向真实且语义上有意义的运动,我们通过运动感知分数蒸馏采样(MoSDS)集成了文本到视频扩散模型,从而无需配对运动数据即可进行优化。此外,我们应用物理启发的平滑性、拓扑和接触约束来稳定优化并保持运动合理性。进一步地,我们集成了一个弹簧-质量模拟器来引入次级运动效果。所提出的框架是通用的、完全可微的、模块化的,并且兼容双足、四足和非生命体铰接角色。实验表明,我们的方法生成了时间上连贯、与文本对齐的动画,其性能优于缺乏生成先验或显式物理约束的基线运动迁移方法。我们将公开我们的代码和数据集。

English abstract

Animation of 2D hand-drawn sketches provides an effective medium for visual communication. However, these sketches pose challenges, particularly in handling occlusions and accurately mapping motion. While 3D animation naturally addresses these challenges, estimating 3D motion remains a very complex task. Recent approaches to converting 2D sketches to 3D animations have mainly focused on specific types of motion, such as bipedal movements and facial expressions. We propose Sketch2Motion, a diffusion-guided framework for skeleton-based motion synthesis that combines classical character animation pipelines with deep generative priors. Our method represents motion using skeletal transformations, which are propagated to mesh deformations via linear blend skinning. To guide the resulting animation toward realistic and semantically meaningful motion, we integrate a text-to-video diffusion model via motion-aware score-distillation sampling (MoSDS), enabling optimization without paired motion data. Additionally, we apply physics-inspired smoothness, topological, and contact constraints to stabilize optimization and preserve motion plausibility. Further, we integrate a spring-mass simulator to introduce secondary motion effects. The proposed framework is generalized, fully differentiable, modular, and compatible with biped, quadruped, and non-living articulated characters. Experiments demonstrate that our approach produces temporally coherent, text-aligned animations that outperform baseline motion transfer methods that lack generative priors or explicit physical constraints. We will make our code and dataset publicly available.
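
Linear blend skinning, which the abstract uses to propagate skeletal transformations to the mesh, computes each deformed vertex as a weighted sum of that vertex transformed by every bone. The sketch below is a generic, minimal version of that standard formula; the function names and the (rotation matrix, translation) bone representation are illustrative choices, not the paper's code.

```python
def transform(matrix, translation, point):
    """Apply a 3x3 rotation matrix and a translation to a 3D point."""
    return [sum(matrix[r][c] * point[c] for c in range(3)) + translation[r] for r in range(3)]

def linear_blend_skinning(vertex, bone_transforms, weights):
    """Deform one rest-pose vertex: v' = sum_i w_i * (R_i v + t_i).

    bone_transforms: list of (matrix, translation) pairs, one per bone
    weights: skinning weights for this vertex, expected to sum to 1
    """
    out = [0.0, 0.0, 0.0]
    for (matrix, translation), w in zip(bone_transforms, weights):
        moved = transform(matrix, translation, vertex)
        out = [o + w * m for o, m in zip(out, moved)]
    return out
```

Every step here is a sum or product, so the vertex positions are differentiable with respect to the bone transforms, which is what allows an image-space or video-space loss to be backpropagated to the skeleton.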

2605.28392 2026-05-28 cs.CV

Bound-Constrained Sparse Representation for Electrical Impedance Tomography

边界约束稀疏表示用于电阻抗成像

Chun Zhang, Dong Liu

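AI summary: Proposes a bound-constrained sparse representation framework that generates conductivity from low-dimensional latent variables via an implicit composite parameterization, improving conductivity estimation in electrical impedance tomography without explicit regularization.

Details
AI Chinese abstract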

本研究提出了一种用于电阻抗成像(EIT)的边界约束稀疏表示(BC-SR)框架,旨在无需显式正则化的情况下改善电导率估计。BC-SR采用表示驱动策略,通过隐式复合参数化从低维潜变量生成电导率。利用截断图拉普拉斯基嵌入结构先验,同时通过边界保持非线性映射强制电导率处于允许范围内,并通过隐式梯度调制改善条件。该方法即使在噪声或不完整数据下也能确保鲁棒收敛。在2D/3D模拟、水箱实验和体内肺部数据上的广泛验证表明,BC-SR提高了物理一致性和结构保真度,与传统方法相比具有更强的鲁棒性。此外,BC-SR能够实现3D时差EIT重建,提供更好的空间分辨率和更连贯的3D电导率分布表示,尤其对于体内肺部数据。这表明其在EIT中具有改进性能的潜力,特别是在呼吸监测的临床应用中。

English abstract

This study proposes a bound-constrained sparse representation (BC-SR) framework for electrical impedance tomography (EIT), aimed at improving conductivity estimation without explicit regularization. BC-SR adopts a representation-driven strategy, generating conductivity from low-dimensional latent variables via an implicit composite parameterization. Structural priors are embedded using a truncated graph-Laplacian basis, while a bound-preserving nonlinear mapping enforces admissible conductivity ranges and improves conditioning through implicit gradient modulation. The approach ensures robust convergence, even under noisy or incomplete data. Extensive validation on 2D/3D simulations, tank experiments, and in-vivo lung data shows that BC-SR improves physical consistency and structural fidelity, offering enhanced robustness compared to traditional methods. Additionally, BC-SR enables 3D time-difference EIT reconstruction, offering improved spatial resolution and a more coherent representation of 3D conductivity distributions, particularly for in-vivo lung data. This suggests potential for improved performance in EIT, particularly in clinical applications for respiratory monitoring.
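
A composite parameterization of this kind can be pictured as a linear basis expansion followed by a squashing function that keeps every value inside the admissible range. The sketch below assumes a sigmoid squashing and takes the basis as a given matrix; both are illustrative assumptions, since the abstract does not state which bound-preserving mapping is used or how the truncated graph-Laplacian basis is built.

```python
import math

def bounded_conductivity(latent, basis, sigma_min, sigma_max):
    """Map latent coefficients to per-element conductivities in [sigma_min, sigma_max].

    basis: one row of basis values per mesh element (assumed precomputed,
           e.g. the leading eigenvectors of a graph Laplacian)
    """
    conductivities = []
    for row in basis:
        u = sum(b * z for b, z in zip(row, latent))   # low-dimensional expansion
        squashed = 0.5 * (1.0 + math.tanh(0.5 * u))   # sigmoid(u), written overflow-safe
        conductivities.append(sigma_min + (sigma_max - sigma_min) * squashed)
    return conductivities
```

With this construction, no optimizer step on the latent vector can produce a conductivity outside the bounds, so the constraint is enforced by the parameterization rather than by a penalty term.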

2605.28390 2026-05-28 cs.AI

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

你活不止一次:迈向分层技能元进化

Xujun Li, Kehan Zheng, Mingyuan Zhao, Yize Geng, Jinfeng Zhou, Qi Zhu, Fei Mi, Lifeng Shang, Minlie Huang, Hongning Wang

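AI summary: Proposes HiSME, a lightweight hierarchical skill meta-evolving method that jointly optimizes skills and the skill evolving strategy by learning meta-skills from agents' task execution traces, so as to continually improve deployed agentic systems across different downstream scenarios.

Details
AI Chinese abstract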

测试时技能进化被视为增强已部署智能体系统的新范式。现有工作主要关注硬编码的技能进化策略或依赖底层LLM中昂贵参数更新的参数化学习。在本文中,我们证明,对于在不同下游场景中持续改进智能体系统,对技能进化框架本身进行测试时优化是必要的,并且轻量级的算法适应是可行的。具体来说,我们提出HiSME,一种轻量级分层技能元进化解决方案,通过从智能体的任务执行轨迹中学习元技能,联合优化技能和技能进化策略。在多样化智能体基准上的实验表明,元进化可以产生比纯技能进化更高质量的技能库,并能为不同场景推导出多样化的元技能,从而促进未来的持续经验学习。我们的代码暂时公开在https://anonymous.4open.science/r/HiSME-BD45。

English abstract

Test-time skill evolving is regarded as a new paradigm for enhancing deployed agentic systems. Existing work mainly focuses on hard-coded skill evolving strategies or on parametric learning that relies on expensive parameter updates in the underlying LLMs. In this paper, we demonstrate that test-time refinement of the skill evolving framework itself is necessary for continuous improvement of agent systems in different downstream scenarios, and that lightweight algorithmic adaptation is feasible. Specifically, we propose HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from agents' task execution traces. Experiments on diverse agentic benchmarks show that meta-evolving can produce a higher-quality skill library than pure skill evolving and can derive diverse meta-skills for different scenarios, thereby facilitating future continual experience learning. Our code is temporarily public at https://anonymous.4open.science/r/HiSME-BD45.