arXivDaily: arXiv daily academic digest (updated Monday to Friday)
2606.03536 2026-06-03 cs.RO

Bionic Human-Motion Style Transfer for Physically Executable Whole-Body Control of Humanoid Robots

Tianchen Huang, Mingkuan Zhao, Yang Gao, Feiyang Yuan, Junchi Gu, Xiaohu Zhang, Dongdong Zhao, Shi Yan, Yu Wang, Wei Gao, Shiwu Zhang

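AI summary: Proposes a bionic generation-to-control framework that uses a physics-aware multi-condition latent diffusion model and a preview-based whole-body tracking policy to transfer short human style exemplars onto different motion contents, yielding executable and highly expressive whole-body motion on humanoid robots.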

Comments
Project page: https://huangtc233.github.io/bionic-style-transfer/

Abstract

Expressive whole-body motion is important for humanoid robots operating in human environments, where robots are expected to move stably while presenting readable and adjustable body behaviors. However, most expressive motions are still obtained from fixed demonstrations or manually designed scripts, making it difficult to reuse a demonstrated style across different motion contents. Inspired by the way human motion styles convey affective and intentional cues through gait rhythm, posture, arm swing and body sway, this paper proposes a bionic generation-to-control framework for exemplar-driven style transfer on humanoid robots. Given a short human style exemplar and a target content motion, the proposed framework generates a stylized whole-body reference that preserves the intended motion content while transferring the demonstrated style. A physics-aware multi-condition latent diffusion model is developed to fuse style, content and trajectory conditions, and classifier-free guidance is used to adjust the style intensity without retraining. To improve hardware executability, contact-consistency and temporal-smoothness regularization are imposed on decoded motions during training. The generated references are then converted into G1-compatible robot references and executed by a preview-based whole-body tracking policy trained with a cluster-and-distill strategy. Simulation and Unitree G1 experiments show that the proposed method can transfer short human style exemplars to diverse robot motion contents, reduce contact and jitter artifacts compared with animation-oriented style-transfer baselines, and achieve a 96.0% success rate over 125 reported real-robot trials. The results demonstrate the feasibility of using short human motion exemplars as reusable bionic sources for physically executable expressive humanoid motion.
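
The abstract states that classifier-free guidance is used to adjust style intensity without retraining. In its common form, guidance extrapolates from an unconditional (here, style-dropped) prediction toward the conditional one by a scale w. A minimal sketch, assuming plain Python lists stand in for latent tensors; `cfg_combine` and `w` are hypothetical names, and how the paper keeps or drops its content and trajectory conditions in each guidance branch is not stated in the abstract:

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    eps_uncond: prediction with the style condition dropped (assumption).
    eps_cond:   prediction with the style condition present.
    w:          hypothetical style-guidance scale chosen at sampling time.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0, -1.0]
    cond = [1.0, 1.0, 0.0]
    for w in (0.0, 1.0, 2.0):
        print(w, cfg_combine(uncond, cond, w))
```

With w = 0 the style condition is ignored, w = 1 recovers the plain conditional prediction, and w > 1 exaggerates the style. Because the scale is applied only at sampling time, changing it requires no retraining.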

2606.03535 2026-06-03 cs.IR cs.CL

Can LLM Rerankers Predict Their Own Ranking Performance?

Shiyu Ni, Keping Bi, Jiafeng Guo, Jingtong Wu, Zengxin Han, Xueqi Cheng

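AI summary: Studies whether LLM rerankers can estimate the quality of the rankings they produce through self-consistency or verbalized confidence, and proposes two supervised methods, Verb-Num and Verb-List, to improve calibration.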

Abstract

Retrieval effectiveness varies substantially across queries, making it important to estimate ranking quality before relevance judgments are available. Query performance prediction (QPP) addresses this need, but most existing methods rely on external predictors after retrieval or reranking. In this paper, we study reranker-internal QPP: can an LLM reranker estimate the quality of the ranking it has just produced? We investigate both training-free and training-based approaches. For training-free estimation, we examine metric-specific self-consistency across sampled rankings and verbalized confidence produced directly by the reranker. Experiments on TREC Deep Learning 2019–2022 with four LLMs show that self-consistency is competitive with the state-of-the-art (SOTA) approach and better calibrated in almost all settings, while direct verbalized confidence is severely overconfident. To improve verbalized confidence, we propose two supervised methods, Verb-Num and Verb-List, which enable LLM rerankers to produce calibrated ranking-quality estimates with only a few additional output tokens.
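
As a rough illustration of self-consistency as an unsupervised confidence signal, the sketch below takes several sampled rankings for one query and scores how much their top-k lists agree. It swaps in a generic agreement measure (mean pairwise top-k overlap); this is an assumption, not the paper's metric-specific formulation, and the function names and document IDs are hypothetical:

```python
from itertools import combinations


def topk_overlap(a, b, k):
    """Fraction of documents shared by the top-k of two rankings."""
    return len(set(a[:k]) & set(b[:k])) / k


def self_consistency(rankings, k=3):
    """Mean pairwise top-k overlap across sampled rankings for one query.

    Higher values mean the reranker's samples agree with each other, used
    here as an illustrative proxy for ranking quality.
    """
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two sampled rankings")
    return sum(topk_overlap(a, b, k) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    stable = [["d1", "d2", "d3", "d4"]] * 3
    unstable = [["d1", "d2", "d3", "d4"],
                ["d4", "d5", "d6", "d1"],
                ["d7", "d2", "d8", "d9"]]
    print(self_consistency(stable), self_consistency(unstable))
```

Identical samples score 1.0, while samples that disagree score close to 0. A calibrated estimator would still need to map this raw agreement onto the target metric's scale.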

2606.03532 2026-06-03 cs.LG cs.AI

When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

Haowei Guo, Baolong Bi, Ruicheng Zhang, Bingqian Sun, Wentao Zhang

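AI summary: Studies how the teacher update schedule affects stability in self on-policy distillation and proposes CGTR, which combines isolation periods with a gating mechanism, achieving zero collapse and the best final performance.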

Abstract

Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule, which governs the temporal coupling between teacher and student, has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that isolation periods, defined as complete teacher freezing between updates, are the key structural property enabling stable learning, not teacher age. To characterize these underlying training dynamics, we introduce a diagnostic framework of temporal KL structure, refresh shock, and length-tail risk. This framework further uncovers state-oblivious collapse: optimal short-horizon fixed schedules catastrophically fail under long-horizon training because a clock-driven refresh can copy a transiently drifting student into the teacher in a single, irreversible step. This failure mode is invisible under short-horizon evaluation and mechanistically distinct from EMA's chronic contamination. To address this, we propose Consolidation-Gated Teacher Refresh (CGTR), which preserves isolation periods while gating each refresh on joint evidence of reward improvement and length-tail safety, ensuring every teacher movement responds to genuine student consolidation rather than a clock signal. With a single shared parameter set and no per-dataset retuning, CGTR achieves zero collapse and the best final score on all four tasks (Chemistry, Biology, Physics, ToolUse), self-regulating its refresh frequency to each task's learning dynamics.
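
The gating idea (keep the teacher frozen during an isolation period, then refresh only on joint evidence of reward improvement and length-tail safety) can be sketched as a single predicate. This is a minimal sketch under stated assumptions: the thresholds `min_gap`, `min_gain`, and `tail_cap`, the use of the 95th-percentile response length as the tail statistic, and all function names are hypothetical and not taken from the paper:

```python
def percentile(values, q):
    """Nearest-rank percentile (integer q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100)
    return ordered[rank - 1]


def should_refresh(step, last_refresh, rewards_now, reward_at_refresh,
                   lengths, min_gap=50, min_gain=0.02, tail_cap=4096):
    """Hypothetical consolidation gate for refreshing a self-distillation teacher.

    1. Isolation period: never refresh within `min_gap` steps of the last refresh.
    2. Reward evidence: mean reward must exceed the value recorded at the last
       refresh by at least `min_gain`.
    3. Length-tail safety: the 95th-percentile response length must stay
       at or below `tail_cap` tokens.
    """
    if step - last_refresh < min_gap:
        return False
    mean_reward = sum(rewards_now) / len(rewards_now)
    improved = mean_reward - reward_at_refresh >= min_gain
    tail_safe = percentile(lengths, 95) <= tail_cap
    return improved and tail_safe
```

Unlike a clock-driven schedule, this predicate can stay `False` indefinitely while the student drifts, so a transiently degraded student is never copied into the teacher just because a timer fired.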

2606.03523 2026-06-03 cs.CR cs.AI cs.LG

High-Precision APT Malware Attribution with Out-of-Scope Resilience

Peter Williams, Adam Sobey, Erisa Karafili

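AI summary: Proposes an APT malware attribution method based on ranked binary classifiers with explicit abstention, which maintains 92% precision and 95% selective accuracy even when 87% of test samples are out of scope.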

Abstract

Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups. In operational environments, however, classifiers are likely to encounter samples from groups not represented during training. Closed-set classifiers are then forced to assign such samples to known groups, producing unsupported and potentially misleading attributions. We present a high-precision APT malware attribution method based on ranked binary classifiers with explicit abstention. Rather than training a single multi-class classifier, our approach trains and tunes two binary classifiers per APT group, ranks the classifiers by validation performance, and applies them sequentially. A sample is attributed only when a classifier provides sufficient evidence; otherwise, it abstains. We evaluate the method on the APT Malware dataset and on a larger combined dataset designed to stress-test out-of-scope behaviour. On the APT Malware dataset, the method achieves higher precision than previously published results on the same dataset. In the most challenging setting, where 87% of test samples came from 60 APT groups excluded from training, the method abstained on 94% of out-of-scope samples while maintaining 92% precision and 95% selective accuracy on the samples it classified.
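
The attribution rule (apply per-group binary classifiers in rank order and abstain when none is confident) can be sketched as a short cascade. A minimal sketch, assuming each classifier is reduced to a scoring function plus a tuned threshold; the group names, feature keys, scores, and thresholds below are hypothetical, and the list may hold two entries per group to mirror the abstract's two binary classifiers per APT group:

```python
from typing import Callable, Optional, Sequence, Tuple

# (group name, score function returning P(sample belongs to group), threshold)
Detector = Tuple[str, Callable[[dict], float], float]


def attribute(sample: dict, ranked: Sequence[Detector]) -> Optional[str]:
    """Apply binary detectors in rank order; return None to abstain.

    `ranked` is assumed to be pre-sorted by validation performance, best first.
    The first detector whose score clears its threshold attributes the sample.
    """
    for group, score, threshold in ranked:
        if score(sample) >= threshold:
            return group
    return None  # no detector provided sufficient evidence


if __name__ == "__main__":
    ranked = [
        ("GroupA", lambda s: 0.95 if s.get("c2") == "a.example" else 0.05, 0.9),
        ("GroupB", lambda s: 0.80 if s.get("packer") == "upx" else 0.10, 0.7),
    ]
    print(attribute({"c2": "a.example"}, ranked))
    print(attribute({"packer": "upx"}, ranked))
    print(attribute({"packer": "custom"}, ranked))
```

The key difference from a closed-set multi-class classifier is the final `return None`: a sample from a group absent at training time can fall through every detector instead of being forced onto a known group.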

2606.03521 2026-06-03 cs.LG cs.AI

Post-Hoc Robustness for Model-Based Reinforcement Learning

Siemen Herremans, Ali Anwar, Siegfried Mercelis

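AI summary: Proposes a post-hoc robustification method that uses a learned model and the nominal policy to perform a robust policy improvement step at inference time, improving robustness through model-predictive control over adversarial rollouts without any additional neural-network training.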

Abstract

To improve the real-world applicability of reinforcement learning (RL), the field of adversarially robust RL studies how to train agents under adversarial environment perturbations. In this setting, a protagonist agent optimizes a policy under environmental perturbations from an adversary, resulting in a zero-sum Markov game. When adversarially robust RL is combined with model-based RL, the adversary can target a learned transition model instead of the training environment. Extending this idea, this work introduces post-hoc robustification of deep RL agents at inference time. By using the learned model in combination with a trained nominal policy, our approach performs a robust policy improvement step. The goal is to improve robustness without any additional training of neural networks. Specifically, we utilize model-predictive control under adversarial rollouts, which are approximated via projected gradient descent within a bounded uncertainty set. Furthermore, these offline rollouts are performed while considering and mitigating out-of-distribution issues. The proposed methodology is validated by demonstrating significant improvements in robustness when the algorithm is evaluated in perturbed Gymnasium MuJoCo environments, while considering the computational limitations of the post-hoc inference setting.
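
The inference-time loop (approximate a worst-case model rollout with projected gradient steps inside a bounded uncertainty set, then let model-predictive control choose the action that is best under that worst case) can be illustrated in one dimension. A minimal sketch, assuming a hypothetical toy transition model `s + a + d` with an additive perturbation `d` restricted to the box [-eps, eps], a quadratic cost, and a one-step horizon; the paper's learned models, multi-step rollouts, and out-of-distribution handling are not reproduced:

```python
def model(s, a, d):
    """Toy 'learned' transition (hypothetical): next state with perturbation d."""
    return s + a + d


def cost(s_next):
    """Quadratic cost: distance of the next state from the origin."""
    return s_next ** 2


def worst_case_perturbation(s, a, eps, steps=20, lr=0.1):
    """Projected gradient ascent on d within [-eps, eps].

    For this toy model, d cost(model(s, a, d)) / dd = 2 * (s + a + d).
    Two starting points avoid getting stuck where the gradient is zero.
    """
    best_d, best_c = 0.0, float("-inf")
    for d in (-eps / 2, eps / 2):
        for _ in range(steps):
            grad = 2.0 * model(s, a, d)
            d = max(-eps, min(eps, d + lr * grad))  # ascent step, then projection
        c = cost(model(s, a, d))
        if c > best_c:
            best_d, best_c = d, c
    return best_d


def robust_action(s, candidates, eps):
    """MPC-style selection: lowest cost under the approximated worst case."""
    def worst_cost(a):
        return cost(model(s, a, worst_case_perturbation(s, a, eps)))
    return min(candidates, key=worst_cost)


if __name__ == "__main__":
    print(robust_action(1.0, [-1.5, -1.0, 0.0], eps=0.2))
```

In a full system, the candidate actions would presumably be generated around the nominal policy's action rather than listed by hand, and the gradient would come from backpropagating through the learned model.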

2606.03518 2026-06-03 cs.AI cs.CR

Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

Amjad Ibrahim, Yong Li

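AI summary: Because traditional authorization frameworks cannot handle recursive delegation, dynamic scoping, and related needs of agentic AI, proposes a compositional governance framework that defines delegation types, their permission and accountability implications, and resource scope attenuation, and introduces a composition operator that overlays agentic semantics onto existing policies without rewriting them, enabling accountable authorization.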

Comments
12 pages

Abstract

As AI systems evolve from passive models into autonomous active agents capable of initiating actions, collaborating, and delegating tasks, the traditional boundaries of software systems blur. Traditional authorization and delegation frameworks, built around fixed principals, explicit requests, and static scopes, are insufficient to govern agentic systems. Agentic AI demands richer authorization semantics: agents must inherit and delegate permissions, act under time-limited authority, and coordinate through shared protocols. Existing Identity and Access Management (IAM) systems fail to fully capture this notion of agency, lacking mechanisms for recursive delegation, contextual boundaries, and dynamic scoping as executable governance primitives. Unlike access delegation standards such as OAuth 2.0, we treat delegation as a contractual term rather than merely a static token-based consent credential. This paper proposes a compositional governance framework that introduces primitives indispensable for agentic AI. We define types of delegation and their permissions and accountability implications, and we introduce a notion of resource scope attenuation to bound agentic access envelopes. These concepts are expressed as general relational definitions that can be composed into existing authorization domains (e.g., financial systems). To operationalize this composition, we define a compositional operator that overlays new agentic semantics, such as recursive delegation chains, onto existing relational policies without rewriting them. We substantiate this framework through formal proofs and empirical evaluation, showing that it provides a formal yet practical foundation for accountable authorization in agentic AI systems.
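
Two of the primitives named above, recursive delegation chains and resource scope attenuation under time-limited authority, can be illustrated with plain sets. A minimal sketch, assuming a hypothetical `Grant` record and `effective_scope` checker; this is not the paper's relational formalism or its composition operator, only an illustration of why a delegate can never hold more than its delegator:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Grant:
    """One hop in a delegation chain (hypothetical data model)."""
    delegator: str
    delegate: str
    scope: FrozenSet[str]   # permissions, e.g. {"payments:read"}
    expires_at: int         # time-limited authority, e.g. Unix seconds


def effective_scope(chain: Sequence[Grant], now: int) -> Optional[FrozenSet[str]]:
    """Validate a delegation chain and return the attenuated scope, or None.

    Each hop must start where the previous one ended, must not have expired,
    and its scope is intersected with everything upstream (attenuation).
    """
    scope: Optional[FrozenSet[str]] = None
    previous: Optional[Grant] = None
    for grant in chain:
        if previous is not None and grant.delegator != previous.delegate:
            return None  # broken chain
        if grant.expires_at <= now:
            return None  # expired authority
        scope = grant.scope if scope is None else scope & grant.scope
        previous = grant
    return scope


if __name__ == "__main__":
    g1 = Grant("alice", "agent1", frozenset({"payments:read", "payments:write"}), 100)
    g2 = Grant("agent1", "agent2", frozenset({"payments:write", "admin"}), 50)
    print(effective_scope([g1, g2], now=10))
```

Here `agent2` requests `admin`, but the intersection with upstream grants drops it, so only `payments:write` survives; after time 50 the whole chain is invalid.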

2606.03512 2026-06-03 cs.RO cs.AI

SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts

Charbel Abi Hana, Tatiana Ghantous, Mikael Khalil, Anthony Rizk

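AI summary: Proposes a diffusion-augmented framework that, through an improved annotation tool and a new training strategy, improves the generalization and robustness of path planning while preserving real-time performance, substantially reducing pose error and FID.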

Abstract

Path planning is essential for Autonomous Mobile Robots (AMRs). Conventional methods for incorporating human preferences into planning typically rely on either complex reward engineering or hardware-intensive solutions. Recent state-of-the-art frameworks leverage imitation learning to train behavior-specific path planning models from expert demonstrations. However, these approaches face two key limitations: limited generalization to unseen environments and low robustness in demonstration collection. To address these challenges, this work introduces an enhanced framework that focuses on two main contributions: an overhauled annotation tool built on ROS 2, and a novel training strategy that integrates diffusion-based augmentation into baseline behavioral cloning models. A dataset of expert demonstrations is provided, and ablation studies on it assess the robustness of the proposed solution. The enhanced approach outperforms state-of-the-art methods with 39.1% lower Absolute Pose Error (APE) and 33.5% lower Fréchet Inception Distance (FID) while having 93.8% fewer trainable parameters. Moreover, it attains diffusion-level generalization while preserving the real-time, on-edge properties of state-of-the-art models.

2606.03509 2026-06-03 cs.CV

EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation

Zuhao Ge, Xiaosong Jia, Chao Wu, Yuchen Zhou, Zuxuan Wu, Yu-Gang Jiang

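AI summary: Proposes EvoMemNav, which builds a visual-semantic memory graph and combines a budgeted coarse-to-fine policy with reflection-driven write-back to provide efficient, self-evolving, fine-grained memory for zero-shot embodied navigation, improving multi-instance disambiguation and stop verification.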

Comments
Preprint

Abstract

Building memory is essential for long-horizon planning in zero-shot embodied navigation. Detector-centric scene graphs often compress observations into sparse nodes, discarding fine-grained visual evidence and accumulating noise, while 3D reconstruction-based methods remain computationally prohibitive. We present EvoMemNav, an efficient, self-evolving, fine-grained memory framework for zero-shot embodied navigation. EvoMemNav constructs a Visual-Semantic Memory Graph (VSMGraph) that keeps raw views as first-class memory and organizes them with lightweight semantic cues and topological relations into a room-view-object hierarchy, preserving fine-grained details for disambiguation and Stop verification. To scale to growing memory, we introduce a budgeted coarse-to-fine policy: a coarse stage compresses the search space into promising regions, and a fine stage invokes a VLM only for targeted verification and decision. Beyond static memories, EvoMemNav performs reflection-driven write-back after each subtask, updating graph-attached priors that encode accumulated environmental knowledge to refine future decisions without retraining. Experiments on GOAT-Bench and HM3D across object, text-description, and image-goal modalities show consistent gains in SR/SPL, with better multi-instance disambiguation, fewer premature stops, and stronger zero-shot generalization.
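
The budgeted coarse-to-fine policy (a cheap coarse ranking narrows the search space, and an expensive VLM check is spent only on the shortlist) can be sketched as follows. A minimal sketch, assuming hypothetical names throughout: `coarse_score` stands in for the lightweight semantic cues, `verify` stands in for a VLM query, and `top_k` and `vlm_budget` are illustrative parameters rather than values from the paper:

```python
def coarse_to_fine(regions, coarse_score, verify, top_k=3, vlm_budget=2):
    """Hypothetical budgeted coarse-to-fine search over memory regions.

    Coarse stage: rank all regions by a cheap score and keep the top_k.
    Fine stage: call the expensive `verify` on the shortlist in order,
    spending at most `vlm_budget` calls.
    Returns (matched region or None, number of expensive calls used).
    """
    shortlist = sorted(regions, key=coarse_score, reverse=True)[:top_k]
    calls = 0
    for region in shortlist:
        if calls >= vlm_budget:
            break  # budget exhausted: stop verifying
        calls += 1
        if verify(region):
            return region, calls
    return None, calls


if __name__ == "__main__":
    scores = {"kitchen": 0.9, "bedroom": 0.7, "hall": 0.2, "bath": 0.5}
    print(coarse_to_fine(list(scores), scores.get, lambda r: r == "bedroom"))
```

The cost of the expensive model is bounded by `vlm_budget` regardless of how large the memory graph grows, which is the scaling property the abstract attributes to the budgeted policy.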

2606.03508 2026-06-03 cs.CV

Structure-Guided Mixed Masked Pretraining and Spatial Continuity Regularization for Printed Circuit Board Defect Detection

Peitong Wang, Nuo Wang, Enxin Qin, Chengjin Yu, Hanyu Xuan, Yuanting Yan

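AI summary: Proposes a two-stage PCB defect detection framework that learns PCB structural priors through structure-guided mixed masked pretraining and adds spatial continuity regularization during fine-tuning to make the localization of elongated defects more compact, reaching 85.5% mAP0.5 on the DsPCBSD+ dataset.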

Comments
Preprint. 38 pages, 12 figures, 6 tables

Abstract

Printed circuit board (PCB) defect detection is an essential part of automated optical inspection (AOI), yet it remains challenging in practice because many defects are tiny, low-contrast, and embedded in dense circuit backgrounds. To address these issues, this paper presents a two-stage PCB defect detection framework that combines structure-guided mixed masked pretraining with spatial continuity regularization. In the pretraining stage, we design a sparse convolutional masked pretraining scheme to exploit unlabeled PCB images, where structure-guided mixed masking is used to construct informative masked inputs. The sparse convolutional reconstruction pipeline suppresses invalid responses from masked regions and enables the detector backbone to infer missing PCB structures from visible conductive patterns, thereby learning PCB structural priors. In the fine-tuning stage, the pretrained backbone is transferred to the downstream defect detection task. For this task, a spatial continuity regularization term is introduced during fine-tuning. This term constrains dispersed positive predictions assigned to the same defect instance and promotes more compact localization on elongated defect regions. Experiments on the DsPCBSD+ dataset show that the proposed method achieves 85.5% mAP0.5 and 52.3% mAP0.5:0.95, outperforming several strong baseline detectors. Ablation studies and qualitative results further confirm the effectiveness of the proposed framework for robust PCB defect detection in industrial AOI scenarios.
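
One way to picture a term that "constrains dispersed positive predictions assigned to the same defect instance" is a penalty on how far those predictions spread around their instance centroid. The sketch below is a hypothetical formulation chosen only for illustration (mean squared distance to the per-instance centroid, computed on plain coordinates); the paper's actual regularizer, which specifically targets elongated defects, is not given in the abstract:

```python
def dispersion_penalty(points_by_instance):
    """Mean squared distance of positive predictions from their instance centroid.

    points_by_instance: {instance_id: [(x, y), ...]} locations of positive
    predictions assigned to each ground-truth defect (hypothetical input).
    Lower values mean more compact predictions.
    """
    total, count = 0.0, 0
    for pts in points_by_instance.values():
        if not pts:
            continue
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts)
        count += len(pts)
    return total / count if count else 0.0


if __name__ == "__main__":
    compact = {1: [(0.0, 0.0), (0.0, 0.0)]}
    spread = {1: [(0.0, 0.0), (2.0, 0.0)]}
    print(dispersion_penalty(compact), dispersion_penalty(spread))
```

In a training loop, a term like this would be added to the detection loss with a weighting factor. A version aware of elongated shapes might measure spread perpendicular to the defect's main axis instead of around a single centroid.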

2606.03506 2026-06-03 cs.CV cs.GR

AvatarMix: Identity-Preserving Cross-Avatar Composition for Outfit Personalization

Zhaorong Wang, Yoshihiro Kanamori, Yuki Endo

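AI summary: Proposes AvatarMix, which transfers outfits by directly composing two high-fidelity Gaussian avatars, and uses a two-tier refinement strategy (SeamFix and FullbodyFix) to address seam artifacts and appearance fidelity after body reshaping.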

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 425-435
Comments
CVPR 2026 Findings. 16 pages, including supplementary material

Abstract

Existing 3D avatar outfit transfer methods face distinct challenges: approaches that lift 2D edits to 3D often suffer from outfit or identity quality degradation, while those that separately model body and clothing layers are prone to intersection artifacts. We introduce AvatarMix, a compositional paradigm that bypasses these issues by directly composing the head and body from two high-fidelity Gaussian avatars. While this paradigm inherently preserves outfit quality and avoids intersections, it introduces challenges in creating a seamless join and maintaining appearance fidelity after body reshaping. To this end, we propose a two-tier refinement strategy: SeamFix, a localized diffusion module that refines hair and neck to ensure an artifact-free join, and an optional full-body refinement, FullbodyFix, that restores garment appearance when retargeting degrades the clothed body. Both operate on renders from an already 3D-consistent Gaussian avatar, which limits multi-view artifacts compared to 2D-to-3D lifting. To preserve the user's body identity, our mesh-based Gaussian representation enables the adaptation of a robust mesh retargeting technique, precisely reshaping the clothed body to the user's physique and robustly handling diverse body shapes. Extensive experiments demonstrate that our method achieves state-of-the-art results in outfit fidelity and identity preservation, providing a new perspective for realistic 3D outfit personalization. Project page: https://larsph.github.io/avatarmix/

2606.03503 2026-06-03 cs.AI

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Ziyan Liu, Xueda Shen, Yuzhe Gu, Songyang Gao, Kuikun Liu, Guangran Cheng, Chengqi Lyu, Dahua Lin, Wenwei Zhang, Kai Chen

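AI summary: Proposes ThoughtFold, which uses fine-grained preference learning to penalize redundant exploration and encourage the model to directly connect key reasoning segments, folding reasoning chains into more concise paths and substantially reducing token usage while maintaining accuracy.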

Abstract

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chains-of-Thought (CoTs). However, since long CoTs naturally contain trial and error, and mainstream RLVR approaches select outcome-correct CoT trajectories for memorization, the redundant explorations in long CoTs are inevitably reinforced, which results in the overthinking problem of LRMs. Previous attempts to resolve this issue mainly give more advantage to shorter trajectories, yet their learning signals are still outcome-based and cannot reduce the memorization of redundant explorations in long CoTs. Therefore, we propose ThoughtFold, a framework that leverages fine-grained preference learning to mitigate redundant explorations for efficient reasoning. ThoughtFold employs an introspective strategy to identify redundancy within each correct trajectory, which yields a spectrum of candidate sub-trajectories. Leveraging this spectrum, we introduce a masked preference optimization objective that explicitly penalizes redundant explorations and encourages the model to directly bridge essential reasoning segments, effectively folding its reasoning chains into a more concise path. Extensive experiments show that ThoughtFold significantly enhances efficiency: it reduces the token usage of DeepSeek-R1-Distill-Qwen-7B by approximately 56% while maintaining state-of-the-art accuracy.
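
A masked preference objective can be pictured as a DPO-style loss in which a token mask decides which positions contribute to each response's log-likelihood. The sketch below is illustrative and not ThoughtFold's actual objective: it assumes per-token log-probabilities are already available, treats a folded sub-trajectory as the preferred response and the redundant original as the dispreferred one, and uses hypothetical names and a hypothetical `beta`. How ThoughtFold derives its masks is not described in the abstract:

```python
import math


def masked_logprob(token_logprobs, mask):
    """Sum per-token log-probs only where mask is 1."""
    return sum(lp for lp, m in zip(token_logprobs, mask) if m)


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def masked_dpo_loss(pol_w, ref_w, mask_w, pol_l, ref_l, mask_l, beta=0.1):
    """DPO-style loss on masked token subsets (illustrative).

    pol_* / ref_*: per-token log-probs under the policy and a frozen reference
    model for the preferred (w) and dispreferred (l) responses.
    """
    margin = beta * (
        (masked_logprob(pol_w, mask_w) - masked_logprob(ref_w, mask_w))
        - (masked_logprob(pol_l, mask_l) - masked_logprob(ref_l, mask_l))
    )
    return neg_log_sigmoid(margin)
```

When the policy equals the reference, the margin is zero and the loss is log 2. Raising the policy's likelihood of the preferred (folded) tokens relative to the reference lowers the loss below that value.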

2606.03499 2026-06-03 cs.CV

Characterizing Detectability in 3DGS Poisoning: A Stage-wise Benchmark

Quoc-Anh Bui-Huynh, Thanh Duc Ngo, Xue Geng, Kaixin Xu, Wang Zhe, Xulei Yang, Ngai-Man Cheung

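AI summary: Because 3DGS is vulnerable to many kinds of poisoning attacks, proposes Poison-3DGS, a stage-wise benchmark for systematically studying how detectability differs across pipeline stages. It finds that different attacks produce distinct forensic signals at different stages, and that later stages (e.g., training dynamics and Gaussian parameter statistics) provide strong cues that are not observable at earlier stages.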

Abstract

3D Gaussian Splatting (3DGS) has rapidly emerged as a leading representation for real-time novel view synthesis, but recent work shows it is vulnerable to diverse poisoning attacks, including illusory object injection, computation cost amplification, and post hoc model watermarking. Despite this expanding threat surface, existing studies focus mainly on attack success, while defense and detection remain underexplored. From a detection perspective, a key challenge and opportunity arise from the multi-stage nature of the 3DGS reconstruction pipeline, which produces heterogeneous intermediate representations. Forensic signals for detecting poisoning are inherently stage dependent: an attack introduced at one stage may produce signals that emerge only at later stages. This motivates a stage-wise view of detectability that goes beyond single-stage evaluation. We introduce Poison-3DGS, a benchmark for stage-wise characterization of poisoning detection in 3DGS. It exposes stage-specific artifacts, including multi-view images, geometry, training dynamics, and Gaussian parameters, across a diverse set of scenes and attacks. Using it, we conduct a systematic study of detectability across pipeline stages. Our analysis reveals several insights. First, detectability varies significantly across stages, and no single stage consistently dominates across attack types. Second, different attacks exhibit distinct stage-specific forensic signals, so detection effectiveness depends critically on where signals are observed. Third, later-stage signals such as training dynamics and Gaussian parameter statistics provide strong cues not observable at earlier stages. Overall, our work provides a principled benchmark and the first systematic characterization of stage-dependent detectability in 3DGS, offering a foundation for future research on robust and reliable 3DGS systems.

2606.03498 2026-06-03 cs.LG cs.DC

Demystifying Pipeline Parallelism: First Theory for PipeDream

Ivan Ilin, Peter Richtárik

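AI summary: Introduces the Randomized PipeDream (RPD) abstraction to give the first nonconvex convergence guarantee for PipeDream-style methods, analyzes how the steady-state delay scales with the number of stages, and compares the approach with LocalSGD.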

Comments
40 pages, 4 figures

Abstract

Training modern machine learning models increasingly requires computation to be distributed across many accelerators. Data parallelism remains the default choice and is often paired with tensor-parallel sharding, but model parallelism becomes unavoidable once parameters, activations, or optimizer states no longer fit on a single device. This paper studies pipeline model parallelism through the lens of PipeDream (PD) (Harlap et al., 2018). Our first contribution is theoretical: we introduce Randomized PipeDream (RPD), a stale block-SGD abstraction that yields, to our knowledge, the first clean nonconvex convergence guarantee for a PD-style method. Our second contribution is a scaling diagnosis: we prove that the delay induced by steady-state PD grows as $S^2 - S/2 + O(1)$ for $S$ stages, so the stale-read contribution in the convergence theorem scales as $\Theta(\gamma^2 S^4)$, equivalently as $\Theta(S^4/K)$ in the tuned-rate form. Our third contribution is a comparison with LocalSGD, whose periodic model averaging trades weight staleness for synchronization bubbles. In our reported simulated-time experiments, the better-performing method depends on the objective: PD performs better on the quadratic objective and on a small language-modeling training-loss task, while for logistic regression LocalSGD becomes superior as the number of stages increases.
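
The step from the delay bound to the two $\Theta$ forms can be made explicit as a short derivation. It rests on two assumptions not stated in this listing: that the stale-read term grows like $\gamma^2\tau^2$, where $\tau(S)$ denotes the steady-state delay, and that the tuned step size is $\gamma = \Theta(K^{-1/2})$, with $K$ presumably the number of iterations:

$$
\tau(S) = S^2 - \frac{S}{2} + O(1) = \Theta(S^2)
\quad\Longrightarrow\quad
\gamma^2 \tau(S)^2 = \Theta(\gamma^2 S^4)
\quad\xrightarrow{\;\gamma = \Theta(K^{-1/2})\;}\quad
\Theta\!\left(\frac{S^4}{K}\right)
$$

Under this reading, doubling the number of stages from $S = 4$ to $S = 8$ multiplies the leading-order stale-read term by $2^4 = 16$.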

2606.03495 2026-06-03 cs.LG

HiSE: A Lightweight Hierarchical Semantic Explainer for Heterogeneous Graph Neural Networks

Zongrui Li, Yuhang Zhao, Ying Zhao, Yuanzhao Guo, Qiang Huang, Yuan Tian

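AI summary: Proposes HiSE, a lightweight, feature-oriented explainer that produces high-fidelity, low-overhead explanations of heterogeneous graph neural networks through hierarchical semantic modeling: LASSO-based sparse feature learning at the semantic level and KL-divergence-based adaptive fusion at the cross-semantic level.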

Abstract

Heterogeneous graph neural networks (HGNNs) have demonstrated remarkable performance in modeling complex relational data; however, their interpretability in high-stakes applications remains a critical challenge. Existing explanation methods suffer from two major limitations: on the one hand, the generated explanations fail to reflect the inherent semantic hierarchy of HGNNs, resulting in a lack of fidelity to the model's internal decision-making mechanism; on the other hand, feature explanations often rely on complex search or perturbation mechanisms, leading to excessive computational complexity and poor efficiency. To address these issues, we propose HiSE, a lightweight feature-oriented interpretable model for HGNNs. HiSE achieves semantically aware feature explanations through hierarchical semantic modeling: at the semantic level, local surrogate models based on the Least Absolute Shrinkage and Selection Operator (LASSO) are employed to learn sparse feature representations under each semantic view; at the cross-semantic level, the contributions of different semantic views are adaptively characterized via KL divergence to produce a unified explanation. Extensive experiments demonstrate that HiSE outperforms existing methods in terms of fidelity, robustness, and cross-semantic explanation capability, while its lightweight framework incurs low computational overhead, enabling efficient application to large-scale, complex real-world heterogeneous graphs.
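
The abstract describes two levels: per-view LASSO surrogates that yield sparse feature coefficients, and KL-divergence-based weighting across semantic views. Assuming the per-view coefficients have already been fitted, one plausible way to fuse them is to weight each view by a softmax over the negative KL divergence between the HGNN's prediction and that view's surrogate prediction. This weighting rule, the metapath-style view labels `APA` and `APCPA`, and the function names are assumptions for illustration, not HiSE's published formulation:

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def fuse_view_explanations(model_probs, view_probs, view_coefs):
    """Weight per-view sparse explanations by surrogate fidelity, then fuse.

    model_probs: HGNN class distribution for the node being explained.
    view_probs:  {view: surrogate class distribution for that view}.
    view_coefs:  {view: {feature: LASSO coefficient}} (assumed pre-fitted).
    Lower KL to the HGNN prediction gives a view a larger weight.
    """
    scores = {v: -kl_divergence(model_probs, q) for v, q in view_probs.items()}
    top = max(scores.values())
    exp_scores = {v: math.exp(s - top) for v, s in scores.items()}  # stable softmax
    total = sum(exp_scores.values())
    weights = {v: e / total for v, e in exp_scores.items()}
    fused = {}
    for v, coefs in view_coefs.items():
        for feature, c in coefs.items():
            fused[feature] = fused.get(feature, 0.0) + weights[v] * c
    return weights, fused
```

A view whose surrogate reproduces the HGNN's prediction exactly has zero KL and receives the largest weight, so its sparse features dominate the unified explanation.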

2606.03493 2026-06-03 cs.CV cs.LG

Low-Frequency Shortcuts in Texture-Driven Visual Learning

Utku Şirin, Cathy Hou, David Alvarez-Melis, Stratos Idreos

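AI summary: Analyzes how neural networks in texture-driven domains rely on low-frequency components as shortcuts, and proposes pruning these components to eliminate the shortcuts, improving in-distribution accuracy and robustness.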

English abstract

Neural networks suffer from shortcut learning, where learned features generalize well to the training set but not to in-distribution (ID) or out-of-distribution (OOD) test sets. Existing studies are all based on a few standard benchmarks, which are shape-driven. Numerous application domains, however, are texture-driven. In this work, we present a shortcut learning analysis for texture-driven domains and compare it with that of a standard benchmark. We show that texture-driven domains suffer from low-frequency shortcuts: models make the majority of their decisions based on a few low-frequency components (LFCs) with a skewed spectral behavior, even though the classification information lies in higher-frequency, fine-grained details. Pruning LFCs from training and test sets eliminates the shortcut and provides a more balanced spectral behavior, improving ID accuracy by up to 8%. We show that low-frequency shortcuts make the models highly vulnerable to OOD corruptions, leading to accuracy drops of up to 70% relative to ID accuracy. Pruning LFCs significantly improves robustness to low-frequency corruptions, by up to 40%, and introduces a trade-off for high-frequency corruptions: the balanced spectral behavior provides better generalization, whereas the increased dependence on high-frequency features reduces it. OOD accuracy depends on the interaction between these two factors.
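
The pruning operation can be sketched as a high-pass filter in the 2D discrete Fourier domain:
- transform the image;
- zero every component whose signed frequency lies within a small radius of DC;
- invert the transform.

The square cutoff region and the naive DFT (usable only on tiny inputs) are simplifying assumptions. This is not the paper's implementation.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2D DFT of a list-of-lists image; O((h*w)^2), fine only for tiny inputs."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for x in range(h):
                for y in range(w):
                    s += img[x][y] * cmath.exp(sign * 2j * cmath.pi * (u * x / h + v * y / w))
            out[u][v] = s / (h * w) if inverse else s
    return out


def signed_freq(k, n):
    """Map DFT index k in [0, n) to a signed frequency in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n


def prune_low_freq(img, radius):
    """Zero every component with max(|fu|, |fv|) <= radius, then invert (real part)."""
    h, w = len(img), len(img[0])
    spec = dft2(img)
    for u in range(h):
        for v in range(w):
            if max(abs(signed_freq(u, h)), abs(signed_freq(v, w))) <= radius:
                spec[u][v] = 0j
    return [[z.real for z in row] for row in dft2(spec, inverse=True)]


img = [[5 + (-1) ** (x + y) for y in range(4)] for x in range(4)]  # mean + checkerboard
out = prune_low_freq(img, 0)
```

With radius 0, only the DC term (the image mean) is removed, so the high-frequency checkerboard survives intact. Larger radii strip progressively more of the coarse structure that the abstract identifies as the shortcut.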

2606.03490 2026-06-03 cs.CV

TrAction: Action Recognition with Sparse Trajectories

Jan F. Meier, Felix B. Mueller, Alexander Ecker, Timo Lüddecke

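AI summary: Proposes sparse point trajectories as an input modality, processed by a Transformer architecture with masked-trajectory pretraining, to achieve efficient action recognition at reduced computational cost, and shows that trajectory features complement appearance features.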

English abstract

Modern action recognition models operate on memory- and compute-intensive dense RGB video volumes and frequently exploit appearance and background shortcuts, for example, predicting actions from objects or scenes instead of characteristic motion. We investigate an efficient alternative input modality that is largely free of such biases by construction: sparse point trajectories. To this end, we develop a simple transformer architecture for 2.5D trajectory-based recognition together with a masked-trajectory pretraining, which we show to substantially improve downstream action recognition accuracy. Despite using only a fraction of the dense RGB input, our method reaches 45% top-1 on Something-Something V2 and 54% on EPIC-Kitchens-100, and surpasses V-JEPA on time-reversal sensitivity. More importantly, we find trajectory features to be complementary to state-of-the-art appearance-based features. Fusing our pretrained model with DINOv2 and V-JEPA 2 improves top-1 accuracy on Something-Something V2 by 8.7 and 1.6 points, respectively. Code: https://github.com/ecker-lab/TrAction
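
Masked-trajectory pretraining can be illustrated with a minimal sketch. It rests on two assumptions:
- each track is a sequence of (x, y, depth) points, which is one reading of "2.5D";
- individual (track, frame) tokens are hidden at random, and a reconstruction loss is computed only on the hidden tokens.

The paper's actual masking scheme and model are not reproduced here.

```python
import random


def mask_trajectories(tracks, mask_ratio, seed=0):
    """Hide a random subset of (track, frame) tokens.

    tracks: list of N tracks, each a list of T (x, y, depth) tuples.
    Returns (masked_tracks, mask) where mask[i][t] is True if the token is hidden.
    """
    rng = random.Random(seed)
    tokens = [(i, t) for i, tr in enumerate(tracks) for t in range(len(tr))]
    hidden = set(rng.sample(tokens, int(round(mask_ratio * len(tokens)))))
    mask = [[(i, t) in hidden for t in range(len(tr))] for i, tr in enumerate(tracks)]
    masked = [[(0.0, 0.0, 0.0) if mask[i][t] else p for t, p in enumerate(tr)]
              for i, tr in enumerate(tracks)]
    return masked, mask


def masked_mse(pred, target, mask):
    """Squared reconstruction error averaged over hidden tokens only."""
    errs = [sum((a - b) ** 2 for a, b in zip(pred[i][t], target[i][t]))
            for i in range(len(target)) for t in range(len(target[i])) if mask[i][t]]
    return sum(errs) / len(errs) if errs else 0.0


tracks = [[(float(i), float(t), 1.0) for t in range(5)] for i in range(4)]
masked, mask = mask_trajectories(tracks, 0.5)
```

A model would receive `masked` and be trained to minimize `masked_mse` between its prediction and `tracks`. Restricting the loss to hidden tokens forces it to infer motion from the visible context.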

2606.03489 2026-06-03 cs.CR cs.AI

Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

Wenqi Chen, Ziyan Zhang, Bing Wang, Lin Liu, Hengheng Zhang, Zhengsu Chen

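AI summary: Proposes Tree-like Self-Play (TSP), which models secure code generation as a fine-grained sequential decision process. It builds a decision tree that explores secure and vulnerable paths, letting the model self-correct at critical decision nodes, which substantially improves code security and generalizes across languages.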

Comments
18 pages, 3 figures
English abstract

While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), typically apply coarse-grained optimization at the sequence level. This approach often fails to address the localized nature of security flaws, where a single incorrect token choice can compromise an entire program. To bridge this gap, we introduce Tree-like Self-Play (TSP), a framework that reframes secure code generation as a fine-grained sequential decision process. Unlike standard methods that blindly maximize likelihood, TSP constructs a decision tree where the model explores branching trajectories--generating both secure "golden paths" and vulnerable variants. By treating code generation as a self-play game, the model learns to strictly discriminate against its own localized errors. This provides a dense, on-policy learning signal that forces self-correction precisely at the critical decision nodes where vulnerabilities typically emerge. Our experiments demonstrate that TSP fundamentally enhances model reliability. In Python security benchmarks, TSP boosts CodeLlama-7B's pass rate (SPR@1) to 75.8%, significantly outperforming SFT (57.0%) and unstructured self-play baselines. Crucially, TSP induces robust out-of-distribution generalization: the model not only reduces vulnerabilities in unseen categories (CWEs) by 24.5% but also successfully transfers security principles learned from C/C++ to diverse languages, including Python, Go, and JavaScript. This suggests that TSP does not merely memorize patches, but internalizes abstract, language-agnostic security logic.
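
The sketch below (not the paper's code) shows how a secure and a vulnerable completion that share a prefix can be turned into a localized training signal. It finds the token where the two completions branch, which plays the role of the critical decision node, and pairs the two continuations from that point. The tokenization, the SQL example and the dictionary format are hypothetical.

```python
def branch_point(secure, vulnerable):
    """Index of the first token where two completions diverge (the decision node)."""
    n = min(len(secure), len(vulnerable))
    for i in range(n):
        if secure[i] != vulnerable[i]:
            return i
    return n  # one sequence is a prefix of the other


def node_preference_pair(secure, vulnerable):
    """Build a localized (prefix, chosen, rejected) example at the decision node."""
    k = branch_point(secure, vulnerable)
    return {"prefix": secure[:k], "chosen": secure[k:], "rejected": vulnerable[k:]}


# Parameterized query versus string-formatted query (SQL injection, CWE-89)
secure = ["cur.execute(", "sql", ",", "(name,)", ")"]
vulnerable = ["cur.execute(", "sql", "%", "name", ")"]
pair = node_preference_pair(secure, vulnerable)
```

A sequence-level signal would compare the two completions as a whole. Anchoring the comparison at the branch concentrates the learning signal on the one token choice that introduced the flaw.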

2606.03486 2026-06-03 cs.CR cs.AI

NeuroArmor: Safe-Variant-Guided Representation Consistency for Selective Re-Anchoring in Jailbreak Defense

Zhongyang Lin, Ziran Zhao, Feifei Zhai, Pengyuan Liu

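AI summary: Proposes NeuroArmor, a white-box runtime defense that builds safe variants of each prompt as a local safety reference, performs consistency checks in hidden-state space, and routes anomalies, reducing malicious attack success rates while keeping false positive rates low.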

Comments
16 pages, 4 figures, 17 tables. Submitted to ACL ARR
English abstract

Large language models remain vulnerable to jailbreak attacks that hide harmful intent behind seemingly ordinary requests such as role-play, translation, encoding, adversarial suffixes, and multi-turn buildup. Existing defenses still struggle to handle these attacks without over-blocking benign but sensitive requests, partly because they often apply the same action to every prompt and therefore fail to balance safety and helpfulness. We propose NeuroArmor, a white-box runtime defense that uses prompt-specific safe variants as a local safety reference for deciding when intervention is needed and, once triggered, as safe targets for intervention. For each prompt, NeuroArmor builds K safe variants, compares the prompt state against this local safe reference in hidden-state space, and routes anomalies either to a refusal branch for malicious prompts or to a helpful recovery branch for borderline benign prompts. On Llama-3-8B-Instruct, NeuroArmor reduces malicious attack success rate (ASR) from 41.56% to 1.57% while lowering benign false positive rate (FPR) on the shared benign pool from 30.26% to 22.05%; matched baselines remain substantially weaker on this trade-off. External-judge and manual behavioral evaluations further show that the remaining non-blocked outputs are much less likely to be operationally harmful. Overall, NeuroArmor provides a more effective runtime strategy for jailbreak defense by combining prompt-specific consistency checking, routing, and selective intervention.
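
A minimal sketch of the check-then-route idea follows. It assumes:
- hidden states are plain vectors;
- consistency is measured as the mean cosine similarity to the K safe variants;
- a separate harm score, hypothetical here, chooses between the refusal and recovery branches.

NeuroArmor's actual metric, thresholds and intervention are not given in the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def consistency_score(prompt_state, safe_states):
    """Mean cosine similarity between a prompt's hidden state and its K safe variants."""
    return sum(cosine(prompt_state, s) for s in safe_states) / len(safe_states)


def route(prompt_state, safe_states, tau, harm_score, harm_threshold=0.5):
    """Pass consistent prompts; send anomalies to a refusal or a recovery branch.

    harm_score is a hypothetical external estimate of malicious intent in [0, 1].
    """
    if consistency_score(prompt_state, safe_states) >= tau:
        return "pass"
    return "refuse" if harm_score >= harm_threshold else "recover"


safe = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]  # toy hidden states of K = 2 safe variants
```

Only prompts that deviate from their own safe reference are touched, which is how per-prompt references can avoid applying the same action to every input.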

2606.03483 2026-06-03 cs.LG cs.AI

Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation

Ekaterina Alimaskina, Gleb Molodtsov, Aleksandr Beznosikov

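AI summary: Uses fine-grained diagnostics to show that the multi-stream residual connections in Hyper-Connections suffer from stream collapse, with signal concentrating in a dominant stream. It mitigates the problem by breaking symmetry at initialization, which improves performance.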

English abstract

Hyper-Connections (HC) replace the single Transformer residual stream with multiple streams, introducing a permutation symmetry over stream indices. We study how this symmetry is resolved in practice: whether streams specialize in a balanced way or exhibit dominant-stream usage. Using fine-grained diagnostics for HC-based language models, we trace how multi-stream representations are actually used. We find that after an early seeding stage, residual mixing often remains close to identity, limiting a core HC mechanism for exchanging information between streams. Moreover, both signal and interpretable features concentrate in a dominant stream, and the nominally multi-stream residual connection can underutilize its capacity, behaving closer to a single-stream residual pathway. Finally, we show that breaking symmetry at stream initialization reduces dominant behavior and improves performance across mHC variants. Our code is publicly available.
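
Two diagnostics in the spirit of the abstract are sketched below under stated assumptions:
- each stream's share of total activation energy; a dominant stream shows up as one share near 1;
- the Frobenius distance of a stream-mixing matrix from the identity; a small distance means little information is exchanged between streams.

The noisy replication at initialization is one hypothetical way to break the permutation symmetry. It is not necessarily the paper's scheme.

```python
import math
import random


def stream_shares(streams):
    """Fraction of total squared L2 norm carried by each residual stream."""
    energy = [sum(x * x for x in s) for s in streams]
    total = sum(energy)
    return [e / total for e in energy]


def distance_from_identity(mix):
    """Frobenius distance between an n x n stream-mixing matrix and the identity."""
    n = len(mix)
    return math.sqrt(sum((mix[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))


def replicate_streams(h, n, eps=0.0, seed=0):
    """Copy hidden state h into n streams; eps > 0 adds per-stream Gaussian noise."""
    rng = random.Random(seed)
    return [[x + eps * rng.gauss(0.0, 1.0) for x in h] for _ in range(n)]
```

With `eps=0` all streams are identical copies, so nothing distinguishes one stream index from another. Any positive `eps` makes the copies differ from the first step.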

2606.03479 2026-06-03 cs.CV cs.GR

PersistGS: Differentiable Physics for Object Permanence in 4D Gaussian Splatting

Adrian Ramlal, John S. Zelek

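AI summary: Proposes PersistGS, which couples differentiable rigid-body simulation with 3D Gaussian Splatting to predict an occluded object's SE(3) trajectory from physical laws, restoring object permanence. It also introduces a centroid silhouette loss that reduces trajectory error.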

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 4687-4696
Comments
Accepted in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 Workshop on Generative 3D Reconstruction
English abstract

Dynamic 3D Gaussian Splatting (3DGS) methods reconstruct time-varying scenes from synchronized multi-camera video using photometric supervision. When a moving object becomes fully occluded from all training cameras, this supervision vanishes: the Gaussians representing it receive no gradient signal and degrade. Existing approaches to incomplete observations in neural reconstruction rely on learned generative priors that prioritize visual plausibility over physical correctness. We propose PersistGS, a method that restores object permanence during occlusion by coupling differentiable rigid body simulation with 3D Gaussian Splatting. Our approach decomposes the scene into per-object Gaussians and collision meshes, estimates friction and velocity from the observed pre-occlusion trajectory via differentiable simulation, and uses the resulting SE(3) trajectory to position object Gaussians throughout the occlusion period. Because the predicted trajectory satisfies the governing equations of rigid body dynamics, it faithfully captures contact events (bounces, friction-based deceleration, direction changes) that kinematic extrapolation cannot model. We introduce a centroid silhouette loss that isolates positional gradients from appearance noise, yielding 40% lower trajectory error than photometric supervision. We evaluate using cameras withheld from training that observe the object during its occlusion. Experiments on synthetic scenes show that PersistGS outperforms constant velocity extrapolation by 2.46 dB PSNR and comes within 0.19 dB of a ground-truth trajectory upper bound.
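
PersistGS uses differentiable rigid-body simulation. This sketch uses a much simpler stand-in:
- it fits a 1D object sliding with Coulomb friction to pre-occlusion samples, in closed form by least squares;
- it then extrapolates through the occlusion;
- the object stops once its velocity reaches zero, which constant-velocity extrapolation cannot model.

The flat floor, 1D motion and positive initial velocity are assumptions.

```python
G = 9.81  # gravitational acceleration, m/s^2


def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            M[r] = [m - f * p for m, p in zip(M[r], M[col])]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def fit_sliding(ts, xs):
    """Least-squares fit of x(t) = x0 + v0*t - 0.5*a*t^2; returns (x0, v0, mu) with a = mu*G."""
    basis = [[1.0, t, t * t] for t in ts]
    A = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * x for b, x in zip(basis, xs)) for i in range(3)]
    c0, c1, c2 = solve3(A, rhs)
    return c0, c1, (-2.0 * c2) / G


def predict(t, x0, v0, mu):
    """Position under kinetic friction; the object stays put once its velocity reaches zero."""
    a = mu * G
    t_stop = v0 / a if a > 0 else float("inf")
    tc = min(t, t_stop)
    return x0 + v0 * tc - 0.5 * a * tc * tc


ts = [0.1 * i for i in range(6)]                  # observed before occlusion
xs = [3.0 * t - 0.5 * 0.2 * G * t * t for t in ts]  # ground truth: v0 = 3 m/s, mu = 0.2
x0, v0, mu = fit_sliding(ts, xs)
```

The fitted parameters predict that the object comes to rest at v0^2 / (2 mu G) during the occlusion. Extrapolating the last observed velocity would instead keep it moving.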

2606.03476 2026-06-03 cs.RO

Human2Humanoid: Physics-Aware Cross-Morphology Motion Retargeting for Humanoid Robots

Tianchen Huang, Feiyang Yuan, Junchi Gu, Shurui Fang, Xiaohu Zhang, Yu Wang, Wei Gao, Shiwu Zhang

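AI summary: Proposes Human2Humanoid, an unsupervised motion retargeting framework that handles unpaired data with a CycleGAN and a skeleton-aware graph convolutional network. A morphology-invariant end-effector consistency loss and physics-aware feasibility constraints give high-fidelity retargeting from human motion to humanoid robots.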

Comments
Project page: https://huangtc233.github.io/human2humanoid_website/
English abstract

Retargeting human motion to humanoid robots is critical for teleoperation, imitation learning and human-robot interaction. However, it remains challenging because of substantial morphological discrepancies between humans and robots, including differences in skeletal topology, limb proportions and degrees of freedom, as well as the scarcity of paired motion data. This paper presents Human2Humanoid, an unsupervised motion retargeting framework that transfers human motions to humanoid robot behaviors with high fidelity. To bridge the domain gap under unpaired data, we adopt a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features. To address cross-domain scale mismatches, we introduce a morphology-invariant end-effector consistency loss that aligns normalized end-effector trajectories to preserve motion semantics across embodiments. To improve physical plausibility and reduce contact artifacts, we impose explicit physics-aware feasibility constraints to encourage reproduction of the contact patterns in the source motion. Experimental results show that the proposed method successfully retargets human motion to the Unitree G1 humanoid robot without paired data, and outperforms existing methods in both downstream controllability and physical feasibility.
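
A minimal sketch of a morphology-invariant end-effector term follows. It assumes positions are expressed relative to the root and divided by a per-embodiment scale such as arm length. The paper's exact normalization is not given in the abstract.

```python
def normalize_ee(traj, root, scale):
    """End-effector positions relative to the root, divided by a body scale."""
    return [tuple((p - r) / scale for p, r in zip(pt, rt)) for pt, rt in zip(traj, root)]


def ee_consistency_loss(human_ee, human_root, human_scale,
                        robot_ee, robot_root, robot_scale):
    """Mean squared distance between normalized human and robot end-effector trajectories."""
    h = normalize_ee(human_ee, human_root, human_scale)
    r = normalize_ee(robot_ee, robot_root, robot_scale)
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(h, r)) / len(h)


zeros = [(0.0, 0.0, 0.0)] * 2
human_hand = [(0.6, 0.0, 1.0), (0.5, 0.1, 1.1)]       # arm length 0.5 m (toy)
robot_hand = [tuple(0.5 * c for c in p) for p in human_hand]  # robot at half scale
```

A robot that reproduces the human's motion at its own scale incurs no penalty, even though the raw positions differ. Only the shape of the normalized trajectory is compared.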

2606.03471 2026-06-03 cs.AI cs.MA q-bio.NC

A formal definition and meta-model for a machine theory of mind

Fabio Cuzzolin

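AI summary: Drawing on evidence from cognitive psychology, neuroscience, and artificial intelligence, proposes the first rigorous formal definition of machine theory of mind. It also builds a holistic meta-model for examining existing work and driving future breakthroughs.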

Comments
48 pages, 2 figures
English abstract

This paper proposes, for the first time, a rigorous formal definition of the concept of Machine Theory of Mind, based on principles supported by evidence from cognitive psychology, neuroscience and artificial intelligence, and uses this definition as a lens to examine the state of the art and current efforts in the field, outlining a potential agenda for further research that could "crack" the problem. It also advances a general holistic meta-model for Machine Theory of Mind and examines the state of the art in empirically benchmarking such models.

2606.03470 2026-06-03 cs.CV

Mixed-Modality Dual Face-Hair Retrieval

Quoc-Anh Bui-Huynh, Mai-Tuyen Lam, Dai-Anh-Tuan Nguyen, Thanh Duc Ngo

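AI summary: Proposes DFHR, a mixed-modality dual-reference retrieval task. It achieves identity-aware, attribute-controllable retrieval across modalities by disentangling identity and hairstyle features and fusing multimodal embeddings.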

English abstract

We introduce Dual Face-Hair Retrieval (DFHR), a new mixed-modality dual-reference task in image retrieval where a query consists of a face image specifying identity and a hairstyle reference expressed as either an image or text. Unlike prior retrieval settings, DFHR requires cross-component reasoning between two semantically independent attributes -- identity and hairstyle -- originating from heterogeneous modalities. This formulation demands localized feature disentanglement, cross-modal semantic alignment, and mixed-modality composition within a unified embedding space. We construct DFHR-Bench, the first benchmark for mixed-modality face-hair retrieval, comprising over 180K annotated triplets across dual-image and image-text settings, built via a multi-stage annotation protocol ensuring semantic and identity integrity. We further propose MFHC (Multimodal Face-Hair Combiner), a unified framework that fuses disentangled identity and hairstyle embeddings through token injection and multi-view supervision. DFHR and DFHR-Bench together establish a new paradigm for identity-aware, attribute-controllable visual retrieval across modalities.
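
MFHC fuses the two references through token injection. As a much simpler stand-in, this sketch composes a query by late fusion of normalized identity and hairstyle embeddings, then ranks a gallery by cosine similarity. The toy embeddings (first two dimensions for identity, last two for hair) and the weight `alpha` are assumptions.

```python
import math


def l2norm(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def compose_query(face_emb, hair_emb, alpha=0.5):
    """Late fusion: weighted sum of the normalized identity and hairstyle embeddings."""
    f, h = l2norm(face_emb), l2norm(hair_emb)
    return l2norm([alpha * a + (1 - alpha) * b for a, b in zip(f, h)])


def rank_gallery(query, gallery):
    """Gallery indices sorted by decreasing cosine similarity to the query."""
    sims = [sum(q * g for q, g in zip(query, l2norm(item))) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])


gallery = [[1.0, 0.0, 1.0, 0.0],   # right person, right hairstyle
           [1.0, 0.0, 0.0, 1.0],   # right person, wrong hairstyle
           [0.0, 1.0, 1.0, 0.0]]   # wrong person, right hairstyle
ranks = rank_gallery(compose_query([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]), gallery)
```

Only the candidate matching both references ranks first. Late fusion like this cannot model interactions between the two attributes, which is one motivation for a learned combiner.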

2606.03467 2026-06-03 cs.AI

StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems

Taiyu Zhu, Yifan Wu, Weilin Jin, Ying Li, Gang Huang

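AI summary: Proposes StepFinder, which encodes execution logs as temporal semantic sequences and applies temporal modeling and attention modules to locate root-cause failure steps in multi-agent systems efficiently and accurately.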

Comments
12 pages, 5 figures. Accepted by KDD 2026
English abstract

LLM-based multi-agent systems exhibit remarkable collaborative capabilities in complex multi-step tasks. However, these systems are highly sensitive to single-step execution errors that can propagate through agent interactions and lead to cascading failures. To understand the causes of failure and improve system reliability, failure attribution has been introduced as a task that aims to automatically identify the root cause step responsible for a failure. Existing failure attribution methods mainly rely on LLMs to reason over original execution trajectories, which not only incur high inference costs and latency, but also suffer from interference caused by redundant and noisy execution logs, causing LLMs to struggle in accurately identifying the true root cause step. To address this, we propose StepFinder, a lightweight failure attribution framework. We use LLMs solely during the feature construction phase to encode execution logs into temporal semantic sequences. Subsequently, a parameter-efficient combination of temporal modeling and attention modules is applied to capture the sequential evolution and cross-step dependencies of the trajectories. Finally, the step-level error score is refined through multi-scale differences and position bias, enabling precise root cause identification. Experimental results on the Who&When benchmark demonstrate that StepFinder outperforms LLM-based methods in step-level failure attribution while achieving substantially higher inference efficiency, reducing inference time by 79% compared with the fastest LLM-based method, with no text generation overhead. Our code is available at https://github.com/taiyu-zhu/StepFinder.

2606.03465 2026-06-03 cs.LG cs.AI

Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

Artur Zagitov, Alexander Miasnikov, Maxim Krutikov, Vladimir Aletov, Gleb Molodtsov, Nail Bashirov, Artem Tsedenov, Aleksandr Beznosikov

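AI summary: Systematically evaluates post-training compression with tensor decompositions on dense and MoE architectures. Empirical and theoretical analysis reveals a fundamental mismatch with the heterogeneous representations of LLMs, delineating the practical limits of these methods and their viable role in large-scale deployment.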

English abstract

Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Transformer weight structures. However, existing studies evaluate these methods in narrow settings, leaving it unclear whether tensorization is effective in large-scale deployment. We systematically evaluate tensor compression across dense and MoE architectures, establishing performance trade-offs grounded in both empirical and theoretical analysis. We identify a fundamental mismatch between the shared subspaces assumed by tensor decompositions and the heterogeneous representations learned by modern LLMs, thereby delineating their practical limits and clarifying their viable role in large-scale deployment. The code is available at https://github.com/brain-lab-research/TT-LLM.
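
To make the compactness claim concrete, the sketch below counts parameters for a weight matrix stored in tensor-train (TT) matrix format. TT is one common tensor decomposition; which formats the paper evaluates is an assumption here. A matrix of size (m_1...m_d) x (n_1...n_d) is stored as d cores, where core k has shape r_{k-1} x m_k x n_k x r_k and the boundary ranks are 1.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Entries of the full (prod(in_modes) x prod(out_modes)) weight matrix."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Entries of a TT-matrix whose k-th core (0-indexed) has shape ranks[k] x m_k x n_k x ranks[k+1]."""
    assert len(ranks) == len(in_modes) + 1 and ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * m * n * ranks[k + 1]
               for k, (m, n) in enumerate(zip(in_modes, out_modes)))


# A 4096 x 4096 projection factored as 8^4 x 8^4 with internal TT-rank 16
dense = dense_params([8] * 4, [8] * 4)
tt = tt_params([8] * 4, [8] * 4, [1, 16, 16, 16, 1])
```

Here the count drops from 16,777,216 to 34,816. A small parameter count alone says nothing about accuracy, and whether such a factorization fits the weights of real LLMs is the question the paper examines.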

2606.03463 2026-06-03 cs.AI cs.CL

DMF: A Deterministic Memory Framework for Conversational AI Agents

Matteo Stabile, Enrico Zimuel

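AI summary: Proposes DMF, a CPU-first deterministic memory framework that replaces generative memory compression with classical NLP analysis, vector geometry, and mathematical scoring. It matches Mem0's accuracy while spending zero tokens on preparing the memory context.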

Comments
21 pages, 3 figures
English abstract

Conversational AI agents require memory systems that are both scalable and semantically coherent across long interaction horizons. Existing approaches rely predominantly on large language model (LLM)-based summarisation at write time, which introduces non-determinism, escalating token costs, and opacity in pruning decisions. We present the Deterministic Memory Framework (DMF), a CPU-first approach that replaces generative memory compression with a fully deterministic pipeline grounded in classical NLP analysis, vector geometry, and mathematical scoring. DMF assigns each conversational interaction a Survival Score $\Omega$ computed from deterministic content signals, conversational cues, and structured provenance, combined through a logistic projection. An interaction-count decay law, denoted as $\Omega_{\mathrm{eff}}(\Delta n)$, governs how relevance evolves as new turns arrive, where $\Delta n$ is the number of newer interactions rather than wall-clock time, preserving full determinism. We present the mathematical formulation of DMF, its structured recall pipeline, the pruning decision procedure, and the evaluation protocol. Experiments are conducted on a purpose-built benchmark using the LoCoMo and LongMemEval datasets. We compare DMF against Mem0, a popular memory layer for AI agents. DMF achieves comparable accuracy while using zero tokens to prepare the memory context and 5x to 242x fewer tokens over the entire conversation. These results show that it is possible to eliminate LLM calls from the memory-management loop, reducing token costs to nearly zero and enabling deterministic memory systems for conversational AI agents.
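
A minimal sketch of the scoring loop follows. It assumes:
- a weighted sum of feature signals passed through a logistic function gives Ω;
- a hypothetical exponential half-life decay in Δn gives Ω_eff;
- memories are pruned below a fixed threshold.

The abstract does not give DMF's actual features, weights or decay law.

```python
import math


def survival_score(features, weights, bias):
    """Logistic projection of deterministic signals into a score in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def effective_score(omega, delta_n, half_life=20):
    """Hypothetical decay driven by the number of newer interactions, not wall-clock time."""
    return omega * 0.5 ** (delta_n / half_life)


def prune(memories, current_turn, threshold=0.2, half_life=20):
    """Keep memories whose decayed score stays at or above the threshold."""
    return [m for m in memories
            if effective_score(m["omega"], current_turn - m["turn"], half_life) >= threshold]


memories = [{"omega": 0.8, "turn": 0}, {"omega": 0.3, "turn": 0}]
kept = prune(memories, current_turn=20)
```

Every step is a pure function of the stored scores and the turn counter. The same conversation therefore always yields the same pruning decisions and calls no model.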

2606.03462 2026-06-03 cs.LG cs.SI

Topology-Aware Gaussian Graph Repair for Robust Graph Neural Networks

拓扑感知的高斯图修复用于鲁棒图神经网络

Anubha Goel, Juho Kanniainen

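AI summary: Proposes Topology-Aware Gaussian Repair (TAGR), which builds a sparse feature-neighborhood graph with an adaptive Gaussian kernel and combines it with topology-aware residual correction. This improves the robustness of graph neural networks to noisy and missing edges without changing their architectures.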

English abstract

Graph neural networks have achieved strong performance on graph-structured data, but their effectiveness depends heavily on the quality of the observed graph. In real applications, graph topology is often imperfect: noisy edges may connect unrelated nodes, while missing edges may prevent useful information from being propagated. Existing robust graph learning methods mainly address this problem by removing suspicious edges or by learning a new graph structure during training. However, edge removal alone cannot recover missing connections, and graph structure learning may introduce additional optimization complexity. In this paper, we propose Topology-Aware Gaussian Repair (TAGR), a simple graph repair framework for robust message passing in graph neural networks. Instead of learning a dense adjacency matrix, TAGR constructs a sparse feature-neighborhood graph using an adaptive Gaussian kernel and combines it with a topology-aware residual correction of the observed graph. The Gaussian repair component introduces auxiliary edges between feature-similar nodes, while the residual correction preserves and reweights the original topology according to local feature and structural consistency. The repaired graph can be used directly with standard graph neural networks without changing their architectures. Extensive experiments on benchmark citation networks show that TAGR improves the robustness of GNNs under both noisy-edge and missing-edge settings. The analysis further shows that Gaussian feature-neighborhood repair provides the main robustness gain, while topology-aware residual correction improves stability when the observed graph is incomplete. These results suggest that effective graph robustness can be achieved through lightweight sparse graph repair rather than dense graph structure learning.
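
The sketch below builds a sparse feature-neighborhood graph with an adaptive Gaussian kernel. It uses self-tuning bandwidths, where σ_i is the distance from node i to its k-th nearest neighbor; this is an assumption about what "adaptive" means here. TAGR's topology-aware residual correction is not sketched.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_knn_graph(X, k):
    """Sparse, symmetric kNN graph with self-tuning Gaussian edge weights.

    sigma_i is the distance from node i to its k-th nearest neighbour and
    w_ij = exp(-d_ij^2 / (sigma_i * sigma_j)). Returns {(i, j): w_ij} with i < j.
    """
    n = len(X)
    d2 = [[sq_dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    nbrs, sigma = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])[:k]
        nbrs.append(order)
        sigma.append(math.sqrt(d2[i][order[-1]]) or 1e-12)  # avoid a zero bandwidth
    edges = {}
    for i in range(n):
        for j in nbrs[i]:
            edges[(min(i, j), max(i, j))] = math.exp(-d2[i][j] / (sigma[i] * sigma[j]))
    return edges


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # two tight feature clusters
edges = gaussian_knn_graph(X, k=1)
```

Each node keeps only k feature neighbors, so the auxiliary graph stays sparse. Because the bandwidth adapts to local density, nodes in dense and sparse regions get comparable weights. These edges could then be merged with the observed adjacency to supply plausible missing links.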

2606.03461 2026-06-03 cs.AI

What Makes Interaction Trajectories Effective for Training Terminal Agents?

Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang Wu, Lifeng Shang, Xiaohui Li, Ngai Wong, Haoli Bai

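AI summary: Uses the Terminal-Lego pipeline to study the teaching efficacy of interaction trajectories. Trajectories from a lower-scoring agent (DeepSeek-V3.2) improve student generalization more than those from a higher-scoring agent (Claude Opus 4.6); the paper attributes this to Environment-Grounded Supervision (EGS) and also demonstrates strong data efficiency.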

English abstract

Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. Surprisingly, standalone performance does not dictate teaching efficacy: while Claude Opus 4.6 achieves higher scores on Terminal-Bench 2.0, students fine-tuned on trajectories from DeepSeek-V3.2, a lower-scoring agent, exhibit significantly stronger generalization. We attribute this "pedagogical paradox" to Environment-Grounded Supervision (EGS): trajectories that explicitly expose inspect-act-verify behaviors through harness-visible interactions allow students to internalize robust problem-solving routines rather than fragile action sequences. Scaling analysis reveals exceptional data efficiency: with only 15.3k Terminal-Lego trajectories, for example, Qwen3-32B achieves a 24.3% score on Terminal-Bench 2.0, rivaling previous SOTA performance established with over 30x the data volume. Our results suggest that the frontier of agent post-training lies beyond mere outcome-matching, shifting the focus toward "Harness Engineering", where the systematic design of environment-grounded interaction structures serves as the primary catalyst for reproducible and generalizable agentic intelligence.

2606.03460 2026-06-03 cs.CV

From 3D Perception to Safety Reasoning: A Graph-Based Framework for Real-Time Underground Mine Monitoring

Pasindu Ranasinghe, Simit Raval, Dibyayan Patra, Bikram Banerjee, Ismet Canbulat

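AI summary: Proposes a continuous monitoring framework that combines 3D semantic perception, uncertainty-based anomaly detection, rule-based checks, on-device LLM reasoning, and GraphRAG-based memory analysis, using scene and temporal graphs for structured safety reasoning. It reaches 93% coverage across 115 hazard scenarios and 92.7% perception accuracy.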

English abstract

Underground coal mining requires personnel and heavy equipment to operate within shared, confined, and poorly illuminated spaces where hazards such as equipment proximity violations, structural instabilities, and occluded blind spots are difficult to anticipate. Conventional monitoring systems, including fixed cameras and rule-based proximity alerts, can detect predefined events but lack the 3D scene understanding and contextual memory needed to identify complex or evolving hazards. This paper presents a continuous monitoring framework that converts colourised 3D point clouds into structured and traceable safety reasoning outputs. The framework combines 3D semantic perception, uncertainty-based anomaly detection, rule-based hazard checks, on-device LLM reasoning, and GraphRAG-based memory analysis to identify immediate hazards and interpret longer-term safety patterns. Scene and temporal graphs serve as the explicit knowledge structure, linking perception outputs across reasoning stages. To overcome the scarcity of labeled underground data, real roadway scans, controlled object placement, and high-fidelity longwall simulation were combined to generate diverse hazard scenarios, while self-supervised pretraining improved segmentation from limited annotations. The perception model achieved 92.7% accuracy at 30 FPS with low memory usage. Across 115 hazard scenarios, rule-based checks achieved 57% coverage, increasing to 76% with contextual LLM reasoning and 93% with memory-based reasoning using historical records. Qualitative results show that uncertainty-derived anomaly signals support the interpretation of out-of-distribution hazards beyond predefined classes. Overall, graph-based knowledge representation combined with 3D perception and layered safety reasoning provides a practical foundation for intelligent decision support in underground mine monitoring.
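
The framework layers contextual and memory-based reasoning on top of rule-based hazard checks. A check of that first kind might look like the sketch below. It assumes segmented objects arrive as labeled point sets in metres; the class names and the 2 m threshold are hypothetical.

```python
import math
from itertools import product


def min_point_distance(a, b):
    """Smallest Euclidean distance between two point sets (brute force)."""
    return min(math.dist(p, q) for p in a for q in b)


def proximity_violations(objects, min_dist=2.0):
    """Flag person/equipment pairs closer than min_dist.

    objects maps an id to (class_label, list of (x, y, z) points in metres).
    """
    people = [(oid, pts) for oid, (cls, pts) in objects.items() if cls == "person"]
    equipment = [(oid, pts) for oid, (cls, pts) in objects.items() if cls == "equipment"]
    hits = []
    for (pid, ppts), (eid, epts) in product(people, equipment):
        d = min_point_distance(ppts, epts)
        if d < min_dist:
            hits.append((pid, eid, round(d, 2)))
    return hits


objects = {
    "p1": ("person", [(0.0, 0.0, 0.0), (0.0, 0.0, 1.7)]),
    "p2": ("person", [(10.0, 0.0, 0.0)]),
    "e1": ("equipment", [(1.5, 0.0, 0.0), (4.0, 0.0, 0.0)]),
}
hits = proximity_violations(objects)
```

Closest-point distance rather than centroid distance matters for large machines, whose centroid can be far away while an edge is close. A fixed rule like this still only covers hazards that someone anticipated, which matches the 57% rule-only coverage reported above.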

2606.03459 2026-06-03 cs.SD cs.AI

Tonal parsimony in chord-sequence analysis: combining modulation cost and tonal vocabulary

François Pachet

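AI summary: Proposes tonal parsimony, which lexicographically minimizes the number of modulations and then the number of distinct tonalities. By combining dynamic programming with the fixed 24-tonality space, it reduces tonal vocabulary in chord-sequence analysis while keeping modulations optimal.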

Comments
20 pages, 1 figure
English abstract

We study the assignment of local tonalities to chord sequences, a task useful for harmonic analysis, composition, and jazz-oriented improvisation. Standard dynamic-programming approaches minimize modulations but can introduce unnecessarily many tonal centers. We compare this transition-only objective with pure minimum-vocabulary analysis and with tonal parsimony, which lexicographically minimizes first the number of modulations and then the number of distinct tonalities. Although this joint objective is combinatorially hard in general, we give exact algorithms exploiting the fixed 24-tonality major/minor universe. On 31,032 LMD Chords sequences, tonal parsimony preserves the transition optimum while reducing tonal vocabulary in 55.8% of cases. With weighted jazz-substitution closure, it lowers mean tonalities from 3.802 to 3.206 and modulations from 16.728 to 12.141. On 1,555 annotated jazz standards, it improves compatible chord-scale agreement to 95.6%, supporting tractable professional-scale harmonic analysis.
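
The lexicographic objective can be made concrete with a brute-force sketch:
- a Viterbi-style DP counts the minimum number of modulations;
- key subsets of increasing size are then tried until one still attains that optimal modulation count.

This is not the paper's exact algorithm, which the abstract says exploits the fixed 24-tonality universe. The key names and per-chord compatibility sets are toy assumptions.

```python
from itertools import combinations

INF = float("inf")


def min_modulations(compat, allowed):
    """DP over chord positions: fewest key changes using only keys in `allowed`.

    compat[i] is the set of keys compatible with chord i.
    Returns (cost, path), or (INF, None) if no admissible path exists.
    """
    best = {k: (0, [k]) for k in compat[0] if k in allowed}
    for keys in compat[1:]:
        nxt = {}
        for k in keys:
            if k not in allowed or not best:
                continue
            # extend every surviving path; staying in the same key costs 0
            nxt[k] = min(((c + (prev != k), p + [k]) for prev, (c, p) in best.items()),
                         key=lambda cp: cp[0])
        best = nxt
    if not best:
        return INF, None
    return min(best.values(), key=lambda cp: cp[0])


def tonal_parsimony(compat):
    """Minimize modulations first, then distinct keys (exact, but exponential)."""
    universe = sorted(set().union(*compat))
    opt, path = min_modulations(compat, set(universe))
    if path is None:
        return INF, None
    for size in range(1, len(universe) + 1):
        for subset in combinations(universe, size):
            cost, cand = min_modulations(compat, set(subset))
            if cost == opt:
                return cost, cand
    return opt, path


compat = [{"C", "A"}, {"B"}, {"C", "D"}]
cost, path = tonal_parsimony(compat)
```

All paths here need two modulations, so a transition-only DP may return a three-key analysis such as A, B, D. The parsimonious answer C, B, C uses two keys. Subset enumeration is feasible only for tiny key sets; the 2^24 subsets of the full universe are why the paper needs a more careful exact algorithm.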