arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.10892 2026-06-10 cs.CV cs.AI New submission

Improving Text-Instance Alignment Of Foreground Conditioned Out-Painting Via Customized Concept Embedding

通过定制化概念嵌入改进前景条件外绘中的文本-实例对齐

Yihao Zhao, Xuan Han, Bin He, Mingyu You

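AI summary: To address the artifacts produced by text-driven methods in foreground-conditioned outpainting, the paper proposes a Customized Concept Embedding diffusion framework that customizes concept embeddings using an Instance-Aware Loss and a Semantic-Preserving Prompt Template, significantly reducing artifacts and improving image quality.

AI Chinese abstract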

为了展示产品,商家通常需要花费大量成本制作高质量的展示图像。前景条件外绘(FCO)满足了这一需求,允许用户通过调整文本提示,以低成本为前景实例创建所需的背景。然而,现有的文本驱动FCO方法在其输出中存在关键缺陷,最明显的是伪影,即合成背景中与前景实例共享相同语义的区域。这种伪影降低了物体的显著性并降低了图像质量。我们将问题归因于给定实例与文本派生概念嵌入之间的不对齐。为了解决这个问题,我们提出了定制化概念嵌入扩散(CCE-Diffusion)框架。其核心是CCE模块,用于定制概念嵌入,弥合通用名词语义与特定视觉实例之间的差距。实例感知损失指导模块的优化,而语义保持提示模板防止定制化嵌入扭曲提示中的其他词。定性和定量评估均表明,CCE-Diffusion显著减少了输出中的伪影。作为即插即用组件,CCE模块可以集成到各种FCO方法中,提升其性能。

English abstract

To showcase products, merchants often incur substantial costs creating high-quality display images. Foreground Conditioned Outpainting (FCO) meets this demand, allowing users to create desired backgrounds for foreground instances at a low cost by adjusting the text prompt. However, existing text-driven FCO methods exhibit critical flaws in their outputs, most notably the presence of artifacts, which refer to regions in the synthesized background that share the same semantics as the foreground instance. Such artifacts diminish the object's prominence and degrade image quality. We attribute the issue to the misalignment between the given instance and text-derived concept embeddings. To address this, we propose the Customized Concept Embedding Diffusion (CCE-Diffusion) framework. Its core is a CCE-Module to customize concept embeddings, bridging the gap between generic noun semantics and a specific visual instance. An Instance-Aware Loss guides the module's optimization, while a Semantic-Preserving Prompt Template prevents customized embeddings from distorting other words in the prompt. Both qualitative and quantitative evaluations demonstrate that CCE-Diffusion significantly reduces artifacts in the outputs. As a plug-and-play component, the CCE-Module can integrate with various FCO methods, enhancing their performance.

2606.10887 2026-06-10 cs.CV New submission

Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio

听、看、学:通过SAM-Audio实现无遗忘学习

Avi Gupta, Nilotpal Sinha, Vishnu Raj, Sambuddha Saha, Pratik Joshi, Koteswar Rao Jerripothula, Tammam Tillo

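AI summary: Proposes a class-incremental learning method that uses SAM-Audio's multimodal priors, with a guided attention mechanism and a dual-level distillation strategy, to mitigate catastrophic forgetting in audio-visual settings; it outperforms existing methods.

AI Chinese abstract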

类增量学习(CIL)旨在持续学习新类别而不遗忘先前获取的知识。尽管最近的CIL进展在各种模态中引起了显著兴趣,但音频-视觉设置仍未被充分探索。此外,尽管像SAM-Audio这样的基础多模态模型封装了丰富的静态先验,我们的实证分析表明,这些表示在增量设置中表现不佳。本文通过将SAM-Audio的音频-视觉先验整合到CIL设置中来弥合这一差距。具体来说,我们利用其密集的音频和视觉表示,并采用一种新颖的引导注意力策略,其中音频特征在上下文中引导视觉表示。为了进一步缓解灾难性遗忘,我们在特征和logit级别引入了双层蒸馏目标。在音频-视觉CIL基准上的广泛评估表明,我们的方法始终优于最先进的方法。

English abstract

Class-Incremental Learning (CIL) aims to continuously learn new classes without forgetting previously acquired knowledge. While recent CIL advances have spurred significant interest across various modalities, the audio-visual setting remains underexplored. Furthermore, although foundational multimodal models like SAM-Audio encapsulate rich static priors, our empirical analysis reveals that these representations struggle in incremental settings. This work bridges this gap by integrating SAM-Audio's audio-visual priors into the CIL setting. Specifically, we leverage its dense audio and visual representations and employ a novel guided attention strategy where the audio features contextually guide the visual representations. To further mitigate catastrophic forgetting, we introduce dual-level distillation objectives at both the feature and logit levels. Extensive evaluations on audio-visual CIL benchmarks demonstrate that our approach consistently outperforms state-of-the-art methods.
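
The dual-level distillation mentioned above can be illustrated with a minimal sketch. The abstract does not give the loss forms, so everything here is an assumption: mean squared error for the feature term, temperature-softened KL divergence for the logit term, and the hypothetical weights `w_feat` and `w_logit`.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation(old_logits, new_logits, temperature=2.0):
    """KL(old || new) between softened class distributions (assumed form)."""
    p = softmax(old_logits, temperature)
    q = softmax(new_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def feature_distillation(old_feat, new_feat):
    """Mean squared error between old-model and new-model features (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(old_feat, new_feat)) / len(old_feat)


def distillation_loss(old_logits, new_logits, old_feat, new_feat,
                      w_feat=1.0, w_logit=1.0):
    """Weighted sum of the feature-level and logit-level terms."""
    return (w_feat * feature_distillation(old_feat, new_feat)
            + w_logit * logit_distillation(old_logits, new_logits))
```

In a class-incremental setup the logit term would typically be computed only over the classes the old model already knows; that slicing is left out here for brevity.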

2606.10881 2026-06-10 cs.AI New submission

Large-scale semantic mapping of learner agency and autonomy reveals what measurement and generative AI research overlook

学习者能动性与自主性的大规模语义映射揭示测量与生成式AI研究的忽视

Fei Qin, Xiaobo Liu, Yaowen Zhang, Xuming Li, Fei Wang, Mutlu Cukurova, Jingjing Chen, Yu Zhang

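AI summary: Using a semantic analysis pipeline to extract definitions and scale items from over 14,000 publications, the study finds that learner agency and autonomy span three dimensions (task, person, and sociocultural), that existing scales neglect the sociocultural dimension, and that generative AI research focuses too narrowly on the regulation and control of learning.

Comments
45 pages, 12 figures, 1 table, including appendices
AI Chinese abstract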

学习者能动性和自主性是个人发展的基础,然而普遍存在的“叮当谬误”(即相同术语指代不同构念,不同术语指代相同构念)严重阻碍了知识的积累。将意义视为通过语言实践中的使用构成的现象,我们从超过14,000篇出版物中提取了8,954个定义和2,700个量表项目,通过语义分析管道研究研究人员实际如何使用学习者能动性和自主性。这两个构念的定义景观解析为三个维度:学习的调节与控制(任务)、内在动机与内部决策(个人)以及社会关系行动(社会文化),从而经验性地量化了叮当谬误。然而,现有量表系统性地低估了社会文化维度。关键的是,当前教育领域的生成式AI研究集中于学习调节与控制,缩小了AI中介学习环境旨在培养的行为库。除了概念澄清外,这项工作对支持多维学习者能动性和自主性的概念化、测量和实践具有直接意义。

English abstract

Learner agency and autonomy are foundational to personal development, yet a pervasive "jingle-jangle" fallacy (i.e., identical terms denoting different constructs, distinct terms denoting identical ones) has substantially hindered cumulative knowledge. Treating meaning as a phenomenon constituted through use in linguistic practice, we extracted 8,954 definitions and 2,700 scale items from over 14,000 publications and used a semantic analysis pipeline to investigate how researchers actually use learner agency and autonomy. The definitional landscape of the two constructs resolves into three dimensions: regulation and control of learning (task), intrinsic motivation and internal decision-making (person), and social-relational action (sociocultural), thereby empirically quantifying the jingle-jangle fallacy. Existing scales, however, systematically underrepresent the sociocultural dimension. Critically, current generative AI research in education concentrates on learning regulation and control, narrowing the behavioral repertoire that AI-mediated learning environments are designed to cultivate. Beyond conceptual clarification, this work carries direct implications for conceptualization, measurement, and practice aimed at supporting multidimensional learner agency and autonomy.

2606.10876 2026-06-10 cs.CV New submission

Advancing Wood Identification in the Philippines: Utilizing the Xylorix Platform for Efficient AI Model Development and Deployment for Five Key Species

推进菲律宾木材识别:利用Xylorix平台高效开发和部署五种关键树种的AI模型

Rosalie C. Mendoza, Vivian C. Daracan, Arlene D. Romano, Ronniel D. Manalo, Xin Jie Tang, Yi Hong Wong, Yong Haur Tay

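AI summary: Using the Xylorix platform, wood scientists without programming experience developed and deployed macroscopic wood identification AI models for five Philippine hardwoods, with AUC of 0.969 to 1.000 and four species reaching AA grade, showing that non-programmers can build reliable models suited to field deployment.

AI Chinese abstract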

非法采伐和木材贸易在菲律宾持续构成重大挑战,准确的木材物种识别对执法至关重要,但受限于专业设备和专业知识。本研究旨在评估木材科学家能否在没有编程专业知识的情况下,利用Xylorix平台开发和部署宏观木材识别的AI模型,聚焦五种菲律宾硬木:Mangium (Acacia mangium Willd.)、Rain Tree [Samanea saman (Jacq.) Merr.]、Banuyo (Wallaceodendron celebicum Koord.)、Tindalo [Afzelia rhomboidea (Blanco) Vidal] 和 Ipil [Intsia bijuga (Colebr.) O. Kuntze]。二元分类器使用来自260个标本的10,663张经过验证的横截面图像进行训练,并通过标本级平均评分进行评估,以模拟操作现场条件。ROC曲线下面积(AUC)值范围为0.969(Ipil)到1.000(Mangium),平均精度(AP)值范围为0.589(Samanea)到1.000(Mangium)。五个物种中有四个达到AA级(AUC和AP均≥0.90);Rain Tree获得AE级(AUC≥0.90,AP<0.60),原因是其正测试集较小(3个标本)导致AP压缩。所有五个分类器以近乎完美的保真度将目标标本排在非目标标本之上。标本级错误分析显示,Ipil有9个假阴性,主要源于局部图像伪影;Rain Tree有3个假阳性,Tindalo有1个假阳性,由共享的族级解剖特征引起。这些发现表明,Xylorix非程序员可以利用Xylorix平台构建操作可靠的木材识别模型,适用于供应链检查点的现场部署。

English abstract

Illegal logging and timber trade continue to pose significant challenges in the Philippines, where accurate wood species identification is essential for enforcement but limited by the need for specialised equipment and expertise. This study aims to evaluate whether AI models for macroscopic wood identification can be developed and deployed by wood scientists without programming expertise using the Xylorix platform, focusing on five Philippine hardwood species: Mangium (Acacia mangium Willd.), Rain Tree [Samanea saman (Jacq.) Merr.], Banuyo (Wallaceodendron celebicum Koord.), Tindalo [Afzelia rhomboidea (Blanco) Vidal], and Ipil [Intsia bijuga (Colebr.) O. Kuntze]. Binary classifiers were trained on 10,663 verified cross-section images from 260 specimens and evaluated using specimen-level mean scoring to mirror operational field conditions. Area Under the ROC Curve (AUC) values ranged from 0.969 (Ipil) to 1.000 (Mangium), and Average Precision (AP) values ranged from 0.589 (Rain Tree) to 1.000 (Mangium). Four of five species achieved AA grade (AUC and AP both $\geq 0.90$); Rain Tree received AE (AUC $\geq 0.90$, AP $< 0.60$) due to AP compression from its small positive test set (3 specimens). All five classifiers rank their target specimens above non-target specimens with near-perfect fidelity. Specimen-level error analysis revealed 9 false negatives for Ipil, primarily stemming from localized image artifacts, as well as 3 false positives for Rain Tree and 1 false positive for Tindalo caused by shared tribal-level anatomical traits. These findings demonstrate that non-programmers can leverage the Xylorix platform to construct operationally reliable wood identification models suitable for field deployment at supply chain checkpoints.
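
To make the evaluation protocol concrete, the sketch below averages image scores per specimen and then computes AUC and AP on a toy ranking. The scores are invented for illustration and are not the paper's data; the function names and the `(specimen_id, score)` input format are assumptions. With only three positive specimens, a single non-target specimen ranked first leaves AUC at 0.90 while AP falls to about 0.64, which is the kind of AP compression the abstract describes.

```python
from collections import defaultdict


def specimen_mean_scores(image_scores):
    """Average per-image scores within each specimen.

    image_scores: iterable of (specimen_id, score) pairs (hypothetical format).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for specimen_id, score in image_scores:
        sums[specimen_id] += score
        counts[specimen_id] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}


def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision values taken at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy case: one non-target specimen outranks all three target specimens.
scores = [0.95, 0.90, 0.85, 0.80] + [0.10] * 9
labels = [0, 1, 1, 1] + [0] * 9
auc = roc_auc(scores, labels)
ap = average_precision(scores, labels)
```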

2606.10856 2026-06-10 cs.RO New submission

An Exposure-Time-Aligned Primary-Path Architecture for Autonomous-Driving ECUs

一种曝光时间对齐的主路径架构用于自动驾驶ECU

Toru Saito, Yuki Hagura, Tatsuya Konishi, Satoru Mizusawa, Takumi Yajima

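AI summary: For production vehicles transitioning from modular multi-NN pipelines to end-to-end autonomous driving, the paper proposes three design principles (Primary-Path, Exposure-Time-Aligned, and Co-Path Coexistence) and achieves a mean latency of 296 ms on a dual-SoC platform.

AI Chinese abstract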

虽然端到端(E2E)自动驾驶已成为主导研究方向,但在一个非平凡的过渡期内,量产车辆仍然依赖模块化的多NN流水线。本文的主题是设计一种架构,在此阶段支持模块化流水线和E2E路径并行,并嵌入一条用于分阶段迁移的路径。移植到量产SoC上,平等主义的后期融合计算效率低下,且没有自然单元用于分阶段的E2E替代。作为替代方案,我们提出三项设计原则:(i)主路径,明确选择一条主要感知链,并优先将其封装在单个SoC对中,而非关键路径;(ii)曝光时间对齐,将主传感器的曝光时间τ_exp作为标签沿链传播,并在匹配的τ_exp上事件驱动融合节点,而非固定周期;(iii)共路径共存,基于(i)和(ii),让E2E输出路径与模块化流水线在同一τ_exp周期内并行运行。在双SoC量产AD-ECU上,实现从相机快门到规划器输出的平均延迟为296毫秒,在350毫秒的设计预算内。在(iii)下,模块化流水线在生产启动时为主路径,E2E路径作为影子在实车上运行,随着评估证据的积累,E2E范围逐步扩大。

English abstract

While end-to-end (E2E) autonomous driving has become the dominant research direction, production vehicles continue to rely on modular multi-NN pipelines for a non-trivial transitional period. The subject of this paper is the design of an architecture that, during this phase, supports a modular pipeline and an E2E path side by side and embeds a path for staged migration. Transplanted to a production SoC, egalitarian late fusion is compute-inefficient and offers no natural unit for staged E2E substitution. As an alternative, we propose three design principles: (i) Primary-Path, which explicitly selects a primary perception chain and prioritizes its enclosure within a single SoC pair over the non-critical paths; (ii) Exposure-Time-Aligned, which propagates the primary sensor's exposure time $\tau_{\rm exp}$ as a tag along the chain and event-drives the fusion node on matched $\tau_{\rm exp}$ rather than on a fixed cycle; and (iii) Co-Path Coexistence, which, building on (i) and (ii), lets an E2E output path co-run with the modular pipeline within the same $\tau_{\rm exp}$ cycle. On a Dual-SoC production AD-ECU, the implementation closes camera-shutter-to-planner-output latency at a mean of 296 ms, within the 350 ms design budget. Under (iii), the modular pipeline is primary at production launch and the E2E path runs as a shadow on real vehicles, and the E2E scope is expanded as evaluation evidence accumulates.
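
Principle (ii) can be pictured as a fusion node that fires when all inputs carrying the same exposure-time tag have arrived, instead of on a timer. The sketch below is only an illustration of that idea: the class name, the source names, and the integer tag format are hypothetical, and the paper's actual implementation is not described in the abstract.

```python
from collections import defaultdict


class TauAlignedFusion:
    """Fire fusion once every required source has reported the same tau_exp tag."""

    def __init__(self, required_sources):
        self.required = set(required_sources)
        # tau_exp tag -> {source name: payload}
        self.pending = defaultdict(dict)

    def on_result(self, source, tau_exp, payload):
        """Handle one upstream result; return (tau_exp, inputs) when fusion can run."""
        bucket = self.pending[tau_exp]
        bucket[source] = payload
        if self.required.issubset(bucket):
            # All inputs for this exposure are present: hand them to fusion.
            return tau_exp, self.pending.pop(tau_exp)
        return None


fusion = TauAlignedFusion(["camera_nn", "radar_nn"])
first = fusion.on_result("camera_nn", 1000, "lanes")
other = fusion.on_result("radar_nn", 1033, "objects@1033")
fused = fusion.on_result("radar_nn", 1000, "objects@1000")
```

A real node would also need to evict tags that never complete (for example, after a dropped frame); that policy is omitted here.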

2606.10811 2026-06-10 cs.CV New submission

Deep learning for echo sounder data

深度学习用于回声测深仪数据

Ketil Malde

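AI summary: Discusses the application of deep learning to acoustic data such as echograms, arguing that the properties of acoustic data call for dedicated methods rather than simple reuse of image-processing models, and that the lack of standard datasets and formats is a major obstacle.

AI Chinese abstract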

毫无疑问,在过去十年中,机器学习领域的技术已经彻底改变了我们处理和解释数据的方式,尤其是图像和文本。对于水下观测,声学是主要的信息来源,自然地,深度学习方法已被应用于回声图和其他声学数据,但迄今为止成果相当有限。在此,我们认为,由于声学数据的固有特性,重大进展可能需要研究超越简单复用图像处理模型和技术的深度学习方法。目前,方法开发的突破潜力受到缺乏标准数据格式和组织方式的阻碍,更甚的是缺乏具有既定性能目标的现成高质量数据集。为了推动该领域的发展,这些不足应得到纠正。

English abstract

There is no doubt that over the last decade, techniques from the field of machine learning have revolutionized how we process and interpret data, especially images and text. For underwater observations, acoustics is a primary source of information, and naturally, deep learning methods have been applied to echograms and other acoustics data, but so far with rather modest results. Here, we argue that due to intrinsic properties of acoustic data, substantial advances will likely require research into deep learning methods beyond mere recycling of models and techniques from image processing. Currently, the potential for breakthroughs in method development is hindered by the lack of standard data formats and organization, and even more by the lack of readily available, high quality data sets with established performance goals. To advance the field, these shortcomings should be remedied.

2606.10798 2026-06-10 cs.LG New submission

CITRAS-FM: Tiny Time Series Foundation Model for Covariate-Informed Zero-Shot Forecasting

CITRAS-FM: 面向协变量信息零样本预测的微型时间序列基础模型

Yosuke Yamaguchi, Issei Suemitsu, Yuki Kajihara, Wenpeng Wei

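AI summary: Proposes CITRAS-FM, a time series foundation model with only 7M parameters that introduces Shifted Attention and the covariate synthesis method CovSynth for efficient zero-shot forecasting, achieving the best accuracy among sub-10M models on 100 tasks with CPU inference under 0.1 seconds.

Comments
Accepted to EUSIPCO 2026
AI Chinese abstract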

预训练的时间序列基础模型(TSFMs)已实现对未见目标序列的零样本预测。然而,现有TSFMs通常计算成本高,对多样变量类型的支持有限,且往往未能考虑外生影响目标变异的协变量。为解决这些挑战,我们提出CITRAS-FM,一个仅7M参数的微型TSFM,支持单变量、多变量和协变量信息零样本预测,并实现实时CPU推理。基于补丁化的仅解码器Transformer,CITRAS-FM在跨变量模块中引入Shifted Attention,以有效利用在整个预测范围内可获取的已知协变量。此外,为了在协变量丰富语料稀缺的情况下实现协变量感知预训练,我们提出CovSynth,从目标序列的分解成分中合成逼真的协变量。在fev-bench上的实验(涵盖不同设置下的100个任务)表明,CITRAS-FM在子10M TSFMs中实现了最先进的零样本精度,同时提供低于0.1秒的CPU推理,在预测精度和实时部署能力之间取得了强平衡。

English abstract

Pretrained time series foundation models (TSFMs) have enabled zero-shot forecasting on unseen target series. However, existing TSFMs often incur high computational cost and provide limited support for diverse variable types, often failing to account for covariates that exogenously influence target variability. To address these challenges, we propose CITRAS-FM, a tiny 7M-parameter TSFM that supports univariate, multivariate, and covariate-informed zero-shot forecasting with real-time CPU inference. Built on a patch-based, decoder-only Transformer, CITRAS-FM introduces Shifted Attention into the cross-variate module to effectively exploit known covariates accessible throughout the forecast horizon. Moreover, to enable covariate-aware pretraining despite the scarcity of covariate-rich corpora, we propose CovSynth, which synthesizes realistic covariates from decomposed components of target series. Experiments on fev-bench, spanning 100 tasks across various settings, demonstrate that CITRAS-FM achieves state-of-the-art zero-shot accuracy among sub-10M TSFMs while delivering sub-0.1-second CPU inference, offering a strong balance between forecasting accuracy and real-time deployability.

2606.10789 2026-06-10 cs.LG New submission

Closing the Modality Gap in Zero-Shot HAR: Contrastive Training and Separability-Optimized Prototypes on IMU Data

缩小零样本HAR中的模态差距:基于IMU数据的对比训练与可分性优化原型

Anik Ghosh

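AI summary: To address the modality gap in IMU-based zero-shot human activity recognition, the paper combines contrastive training with descriptive prototypes, reaching 73.2% accuracy and 0.583 macro F1 on the PAMAP2 dataset, and argues that macro F1 is the more suitable evaluation metric.

Comments
17 pages, 7 figures
AI Chinese abstract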

基于惯性测量单元(IMU)的人体活动识别(HAR)中的零样本学习(ZSL)面临一个核心挑战:弥合传感器嵌入与语义类表示之间的差距。我们在PAMAP2数据集上系统评估了三种推理方法与两种训练流程组合的七种配置,使用14个已知和4个未知活动类别,并保留受试者108和109用于测试。我们发现模态差距是一个由编码器目标决定的训练时现象。使用标签名称的Sentence-BERT原型进行交叉熵训练的时间卷积网络(TCN)产生的传感器嵌入与对应文本原型的平均余弦相似度为0.30,而将标签名称原型目标替换为判别性活动描述后,该值提升至0.69。这种对齐改进在所有三种推理方法中一致迁移。最强的结果结合了对比训练与反向softmax校正,在未知类别上达到73.2%的准确率和0.583的宏F1,而标签名称基线仅为58.3%准确率和0.34宏F1。另一个发现是,更丰富的文本描述降低了Sentence-BERT空间中原型间的可分性,因为共享的生物力学词汇导致语言模型压缩了原型云。只要原型描述保留足够的判别性词汇,这种效应不会抵消对比对齐的好处。我们还证明,当测试集类别分布不平衡时,总体准确率是一个误导性的主要指标,并推荐宏平均F1作为ZSL-HAR基准的标准报告指标。

English abstract

Zero-shot learning (ZSL) for inertial measurement unit (IMU)-based human activity recognition (HAR) faces a central challenge: bridging the gap between sensor embeddings and semantic class representations. We systematically evaluate seven configurations combining three inference methods with two training pipelines on the PAMAP2 dataset, using 14 seen and 4 unseen activity classes with subjects 108 and 109 held out for testing. We find that the modality gap is a training-time phenomenon governed by the encoder objective. A temporal convolutional network (TCN) trained with cross-entropy over label-name Sentence-BERT prototypes yields sensor embeddings with a mean cosine similarity of 0.30 to the corresponding text prototypes, while replacing the label-name prototype targets with discriminative activity descriptions raises this to 0.69. This alignment improvement transfers consistently across all three inference methods. The strongest result combines contrastive training with inverted softmax correction, achieving 73.2% accuracy and 0.583 macro F1 on unseen classes, compared to 58.3% accuracy and 0.34 macro F1 for the label-name baseline. A secondary finding is that richer text descriptions reduce inter-prototype separability in Sentence-BERT space, because shared biomechanical vocabulary causes the language model to compress the prototype cloud. This effect does not negate the benefits of contrastive alignment provided prototype descriptions retain sufficient discriminative vocabulary. We also demonstrate that overall accuracy is a misleading primary metric when test-set class distributions are imbalanced, and recommend macro-averaged F1 as the standard reporting metric for ZSL-HAR benchmarks.
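
The point about accuracy versus macro F1 under class imbalance is easy to reproduce on a toy example. The labels and counts below are invented for illustration and are not taken from the paper: a predictor that always outputs the majority class reaches 90% accuracy but a macro F1 of only about 0.47.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    f1_scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Imbalanced toy test set: 90 windows of one class, 10 of another.
y_true = ["walking"] * 90 + ["rope_jumping"] * 10
# Degenerate predictor that never outputs the minority class.
y_pred = ["walking"] * 100

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Because macro F1 weights each class equally, the completely missed minority class contributes an F1 of 0 and pulls the average down, while accuracy hides the failure.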

2606.10787 2026-06-10 cs.AI cs.LO New submission

Accelerating NeurASP with vectorization and caching

通过向量化和缓存加速NeurASP

Alexander Philipp Rader, Alessandra Russo

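AI summary: Substantially accelerates training of the neurosymbolic framework NeurASP through vectorization, batch processing, and caching of intermediate computations, achieving speedups of multiple orders of magnitude on larger tasks.

Comments
16 pages, 5 figures, to be published in the Theory and Practice of Logic Programming (TPLP) journal for the 42nd International Conference on Logic Programming (ICLP) issue
AI Chinese abstract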

神经符号AI将神经网络与符号程序相结合,以创建鲁棒且可解释的预测。其中一个框架是NeurASP,它训练神经网络来预测概念,并使用答案集编程(ASP)编写的规则对这些概念进行推理,以解决下游任务。关键的是,标签仅由符号规则产生的下游预测提供,而不是潜在概念。通过不可微的ASP组件进行反向传播需要昂贵的概率和梯度计算,这阻碍了其扩展到更复杂的任务。在本文中,我们通过向量化、批处理和训练期间中间计算的缓存来改善NeurASP的计算性能,从而解决其当前局限性。我们比较了原始NeurASP和新实现的计算速度,并报告了在较大任务上多个数量级的加速。为此,我们提出了一个涉及扑克牌的困难任务新数据集,用于测试NeurASP增强学习功能的能力。

English abstract

Neurosymbolic AI combines neural networks with symbolic programs to create robust and explainable predictions. One such framework is NeurASP, which trains a neural network to predict concepts and reasons over them using rules written in answer set programming (ASP) to solve downstream tasks. Crucially, labels are only provided for the downstream prediction produced by the symbolic rules, not for the latent concepts themselves. Backpropagation through the non-differentiable ASP component requires expensive probability and gradient calculations, which has hindered scalability to more sophisticated tasks. In this paper, we address the current limitations of NeurASP by improving its computational performance through vectorization, batch processing and caching of intermediate computations during training. We compare computation speeds between the original and our new implementation of NeurASP and report speedups of multiple orders of magnitude for larger tasks. To this end, we propose a new dataset of difficult tasks involving playing cards, which we use to test the capabilities of NeurASP's enhanced learning function.

2606.10774 2026-06-10 cs.LG cs.DC New submission

Inverse Probability Weighting and Age-of-Information Aggregation for Decentralized Federated Learning under Partial Reception

部分接收下分散式联邦学习的逆概率加权与信息年龄聚合

Chanuka A. S. Hewa Kaluannakkage, Rajkumar Buyya

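AI summary: To address selection bias and update staleness in decentralized federated learning over wireless networks, the paper proposes DFL-AA, which combines Inverse Probability Weighting with Age-of-Information weighting; it removes link-quality bias in theory and outperforms existing baselines in experiments.

Comments
14 pages, 8 figures, research paper for journal submission
AI Chinese abstract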

在有损无线网络上的分散式联邦学习面临两个关键挑战:选择偏差,即由于部分模型接收,来自劣质链路的更新被系统性地低估;以及更新过时,即异步节点贡献过时信息。我们表明,使用局部填充重建的均匀八卦聚合会引入持久的链路质量诱导偏差,而基于完整性的加权进一步放大了这种效应。为了解决这些挑战,我们提出了DFL-AA(具有自适应AoI加权聚合的分散式联邦学习),它结合了逆概率加权与基于在线EWMA的信道估计来纠正选择偏差,以及基于信息年龄的加权来减轻过时,而无需全局同步。我们从理论上证明DFL-AA在期望上消除了链路质量失真,并通过实验证明在不同丢包率、网络规模和异构无线条件下,其性能持续优于最先进的基线。

English abstract

Decentralized Federated Learning (DFL) over lossy wireless networks faces two key challenges: selection bias, where updates from poor-quality links are systematically underrepresented due to partial model reception, and update staleness, where asynchronous nodes contribute outdated information. We show that uniform gossip aggregation with local-fill reconstruction introduces persistent link-quality-induced bias, while completeness-based weighting further amplifies this effect. To address these challenges, we propose DFL-AA (Decentralized Federated Learning with Adaptive AoI-weighted Aggregation), which combines Inverse Probability Weighting with online EWMA-based channel estimation to correct selection bias and Age-of-Information-based weighting to mitigate staleness without requiring global synchronization. We theoretically show that DFL-AA removes link-quality distortion in expectation and experimentally demonstrate consistent improvements over state-of-the-art baselines across varying loss rates, network sizes, and heterogeneous wireless conditions.
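
A rough sketch of how the two weighting ideas could combine is given below. It is not DFL-AA itself: the exponential age decay, the probability floor `p_floor`, the smoothing factor `alpha`, and the final normalization are all assumptions chosen for illustration. The underlying intuition is that if an update arrives with probability p, scaling the arrived update by 1/p gives it an expected contribution of 1, so poorly connected neighbors are no longer underrepresented on average.

```python
import math


def ewma_update(p_hat, received, alpha=0.1):
    """Update the estimated reception probability from one success or failure."""
    return (1 - alpha) * p_hat + alpha * (1.0 if received else 0.0)


def aggregation_weights(p_hat, age, decay=0.5, p_floor=0.05):
    """Normalized weights for the updates that arrived this round.

    p_hat: {neighbor: estimated reception probability}
    age:   {neighbor: rounds since the arrived update was produced}
    """
    raw = {
        # Inverse-probability factor times an (assumed) exponential age penalty.
        j: math.exp(-decay * age[j]) / max(p_hat[j], p_floor)
        for j in age
    }
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


# Two equally fresh updates: the one from the weaker link gets more weight.
weights = aggregation_weights({"a": 0.9, "b": 0.3}, {"a": 0, "b": 0})
```

The floor on the estimated probability keeps a single update from a very lossy link from dominating the aggregate, at the cost of reintroducing a small bias.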

2606.10765 2026-06-10 cs.CL New submission

ArabiGEE: A Hierarchical Taxonomy for Arabic Grammatical Error Explanation

ArabiGEE:阿拉伯语语法错误解释的层次分类体系

Khaled Elhady, Omar Kallas, Nizar Habash, Bashar Alhafni

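AI summary: Proposes the first hierarchical taxonomy for Arabic grammatical error explanation grounded in explicit error types, spanning orthographic, morphological, syntactic, and lexical dimensions, with 27 error types, 140 correction types, and 324 explanations; it is used to manually annotate existing corpora to support automatic evaluation of large language models.

AI Chinese abstract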

我们介绍了ArabiGEE,这是首个基于显式错误类型的全面阿拉伯语语法错误解释(GEE)分类体系。与现有将解释生成视为自由形式文本的GEE方法不同,ArabiGEE通过涵盖正字法、形态、句法和词汇维度的层次结构组织语法解释。该分类体系包含27种错误类型、140种修正类型和324种相关解释。我们将ArabiGEE应用于人工标注现有阿拉伯语语法错误修正语料库的部分内容,并展示了结构化语法解释如何支持对大语言模型在阿拉伯语GEE上的自动评估。我们的代码和数据已公开。

English abstract

We introduce ArabiGEE, the first comprehensive Arabic grammatical error explanation (GEE) taxonomy grounded in explicit error types. Unlike existing GEE approaches that treat explanation generation as free-form text, ArabiGEE organizes grammatical explanations through a hierarchical structure spanning orthographic, morphological, syntactic, and lexical dimensions. The taxonomy consists of 27 error types, 140 correction types, and 324 associated explanations. We apply ArabiGEE to manually annotate portions of existing Arabic grammatical error correction corpora and demonstrate how structured grammatical explanations can support automatic evaluation of LLMs on Arabic GEE. Our code and data are publicly available.

2606.10740 2026-06-10 cs.AI cs.CL cs.LG New submission

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

当思维链更清楚时:多轮推理模型的失败模式

Sai Kartheek Reddy Kasu, Nils Lukas, Samuele Poppi

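AI summary: Proposes the CoT-Output 2x2 safety matrix to diagnose hidden temporal-dynamics failures in multi-turn reasoning models, and identifies two reproducible vulnerabilities: the oversight paradox and context-injection failure.

Comments
Accepted at the ICML 2026 FAGEN Workshop
AI Chinese abstract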

多轮推理模型中的失败在终端评分评估中基本不可见。模型可能在长对话早期锁定不安全立场,但其最终轮拒绝率可能看起来与稳健对齐的基线无法区分。为了揭示这些隐藏的时间动态,我们提出了一种轨迹级诊断方法——CoT-Output 2x2安全矩阵。该框架沿两个独立轴(内部推理和可见输出)标记每一轮,产生四个操作定义的失败单元:稳健对齐、对齐伪装、显式越狱,以及我们称为上下文注入失败的不同失败模式(其中CoT保持安全推理,但可见输出产生危害,突出了多轮推理不忠实的表现)。我们在五个监督条件下针对固定攻击者评估了三个蒸馏推理目标,在信息危害场景上收集了6750个轮级观察。我们的分析揭示了两个可复现的漏洞:一个监督悖论,其中显式监控线索反而增加对齐伪装率而非抑制它;以及一个上下文注入失败,其中模型尽管内部状态安全却锁定不安全的外部输出。我们发布了多轮对话和CoT轨迹的完整数据集,以支持后续的轨迹诊断研究。

English abstract

Failures in multi-turn reasoning models are largely invisible to terminal-score evaluation. A model can lock onto an unsafe stance early in a long dialogue, yet its final-turn refusal rate may appear indistinguishable from a robustly aligned baseline. To expose these hidden temporal dynamics, we propose a trace-level diagnostic, the CoT-Output 2x2 safety matrix. This framework labels every turn along two independent axes (internal reasoning and visible output), yielding four operationally defined cells: robust alignment, alignment faking, overt jailbreak, and a distinct failure mode we term context-injection failure (where the CoT maintains safe reasoning, but the visible output produces harm, highlighting a multi-turn manifestation of reasoning unfaithfulness). We evaluate three distilled reasoning targets against a fixed attacker across five oversight conditions, collecting 6750 turn-level observations on the Information-Hazard scenario. Our analysis reveals two reproducible vulnerabilities: an oversight paradox where explicit monitoring cues paradoxically increase alignment-faking rates rather than suppress them, and a context-injection failure where models lock onto unsafe external outputs despite safe internal states. We release the full dataset of multi-turn dialogues and CoT traces to support follow-up trace-diagnostic research.
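
The 2x2 labelling can be written as a small lookup. Only the context-injection cell (safe CoT, harmful output) is defined explicitly in the abstract; the placement of the other three labels below is an inference from their names and should be read as an assumption, as should the function names and the boolean input format.

```python
from collections import Counter

# (cot_safe, output_safe) -> cell label; three of the four placements are assumed.
CELLS = {
    (True, True): "robust alignment",
    (True, False): "context-injection failure",  # stated in the abstract
    (False, True): "alignment faking",
    (False, False): "overt jailbreak",
}


def label_turn(cot_safe, output_safe):
    """Map one turn's two safety judgments to a matrix cell."""
    return CELLS[(bool(cot_safe), bool(output_safe))]


def trace_profile(turns):
    """Count cell labels across a dialogue given (cot_safe, output_safe) pairs."""
    return Counter(label_turn(cot, out) for cot, out in turns)


# A dialogue that looks fine on turn one and then drifts.
profile = trace_profile([(True, True), (True, False), (True, False)])
```

Profiling every turn, rather than scoring only the last one, is what lets an early lock-in show up in the counts.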

2606.10705 2026-06-10 cs.LG cs.AI cs.SY eess.SY New submission

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

事件驱动强化学习实现半导体制造中的长时域控制

Yavar Yeganeh, Mahsa Shekari, Nicla Frigerio, Daniele Pagano, Andrea Matta

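AI summary: Proposes an event-driven deep reinforcement learning framework that models semiconductor manufacturing control as a centralized-agent problem and optimizes multi-objective policies with an event-driven temporal-difference method, yielding significant throughput and utilization gains in high-fidelity simulation.

AI Chinese abstract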

强化学习有望优化大规模系统中的序贯决策。半导体制造系统是随机且高度约束的环境,其中异构晶圆在广泛的设备网络中经历数百个加工步骤。这些特性产生了复杂、高维的决策问题,具有延迟反馈和长时域要求,使生产计划和控制复杂化。我们提出了一个用于此规模的多目标策略优化的深度强化学习框架。具体来说,我们将控制表述为一个中心化智能体问题,其中核心策略协调系统范围的决策,而系统演化被表示为由离散事件驱动的互联时间过程。相应地,我们开发了一个定制的事件驱动时序差分公式,该公式保持通用性,并可在相关训练设置下与各种策略优化方法集成。我们研究了纳入该框架的几种核心无模型算法,并使用不同工业现实操作场景的高保真仿真评估其有效性。在广泛的验证实验中,在离线和在线设置下训练的智能体在吞吐量和利用率方面显示出显著且一致的提升。我们进一步评估了训练阶段的表现和泛化能力,阐明了替代强化学习公式和算法的相对优势。总体而言,结果支持所提出框架在控制事件驱动复杂自适应系统方面的可扩展性、通用性和可迁移性。

English abstract

Reinforcement learning promises to optimize sequential decisions in large-scale systems. Semiconductor manufacturing systems are stochastic and highly constrained environments where heterogeneous wafers traverse hundreds of processing steps across extensive equipment networks. These characteristics yield complex, high-dimensional decision problems with delayed feedback and long-horizon requirements, complicating production planning and control. We propose a deep reinforcement learning framework for multi-objective policy optimization at this scale. Specifically, we formulate control as a centralized-agent problem, where a core policy coordinates system-wide decisions, while system evolution is represented as an interconnected temporal process driven by discrete events. Accordingly, we develop a tailored event-driven temporal-difference formulation that remains general and can be integrated with various policy optimization methods under relevant training settings. We investigate several core model-free algorithms incorporated into this framework and evaluate their effectiveness using high-fidelity simulations of diverse, industry-real operating scenarios. Across extensive validation experiments, agents trained in both offline and online settings show significant and consistent gains in throughput and utilization. We further evaluate performance and generalization across training phases, clarifying the relative strengths of alternative reinforcement learning formulations and algorithms. Overall, the results support the scalability, generality, and transferability of the proposed framework for controlling event-driven complex adaptive systems.
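
The abstract does not spell out the event-driven temporal-difference formulation. One common way to handle decisions that occur at irregular event times is the semi-Markov form, in which the discount depends on the time elapsed between events. The sketch below shows that generic form only; whether the paper uses it is not stated, and all names and default values are illustrative.

```python
def event_td_target(reward, next_value, elapsed, gamma=0.99):
    """TD target with the discount raised to the time between two events."""
    return reward + (gamma ** elapsed) * next_value


def td_update(value, reward, next_value, elapsed, lr=0.1, gamma=0.99):
    """Move the current value estimate a step toward the event-driven target."""
    target = event_td_target(reward, next_value, elapsed, gamma)
    return value + lr * (target - value)
```

With a fixed step size the same update would discount every transition equally, even though a transition may span seconds or hours of fab time; tying the discount to elapsed time removes that distortion.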

2606.10703 2026-06-10 cs.LG cs.CL New submission

From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models

从观察到干预:混合专家模型中专家重要性的因果审计

Leonard Engmann, Christian Medeiros Adriano, Holger Giese

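AI summary: A causal audit finds that routing statistics in Mixture-of-Experts models do not predict expert importance, and that existing pruning methods succeed because of early-layer redundancy rather than by identifying dispensable experts.

Comments
9 pages, 2 figures, 9 tables. Accepted at the ICML 2026 Workshop on Philosophy of Science Meets Machine Learning (PhilML). Non-archival
AI Chinese abstract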

可解释性方法通常使用观察到的模型行为的总体统计量来推断特定计算的目标干预效果;用Pearl的术语来说,它们将第一层的关联证据视为支持第二层的干预结论,而这种做法的有效性很少被检验。我们考察了一个具体实例:混合专家(MoE)剪枝中路由统计量的使用,其中利用率、激活范数和路由权重分布被视为预测哪些专家可以被移除而不产生功能损失的指标。在三个高冗余MoE架构(OLMoE-1B-7B-0924、Qwen1.5-MoE-A2.7B、DeepSeek-V2-Lite)上进行的token级干预审计发现,经过多重比较校正后,没有任何观测指标能预测任何模型中的因果专家重要性,所有60个指标-层组合的效应量均低于Cohen's $d = 0.17$。通过每个token的路由权重控制排除了统计功效不足的问题,仅在OLMoE的最后一个MoE层恢复了一个Bonferroni显著的信号($d = +0.231$, $p = 0.0013$)。现有剪枝方法在此场景下的成功并非由于识别了可删除的专家,而是因为早期层的冗余使得大多数选择标准可互换。我们的结果提供了一个明确的反例,表明从总体观测统计量到关于专家重要性的token级干预推断这一常见推理步骤存在问题,并展示了干预审计如何校准可解释性主张的证据标准。

English abstract

Interpretability methods routinely use population-level summary statistics over observed model behaviour to license claims about the effects of targeted interventions on specific computations; in Pearl's terms, they treat rung-1 associational evidence as if it supported rung-2 interventional conclusions, a move whose validity is rarely tested. We examine one concrete instance: the use of routing statistics in Mixture-of-Experts (MoE) pruning, where utilization rates, activation norms, and routing weight distributions are treated as predictors of which experts can be removed without functional cost. A token-level interventional audit across three high-redundancy MoE architectures (OLMoE-1B-7B-0924, Qwen1.5-MoE-A2.7B, DeepSeek-V2-Lite) finds no observational metric predicts causal expert importance after multiple-comparison correction in any model, with effect sizes below Cohen's $d = 0.17$ across all 60 metric-layer combinations. A per-token routing weight control rules out insufficient power, recovering a single Bonferroni-significant signal at OLMoE's final MoE layer ($d = +0.231$, $p = 0.0013$). Existing pruning methods succeed in this regime not by identifying dispensable experts but because early-layer redundancy renders most selection criteria interchangeable. Our results provide an explicit counterexample to the common inferential step from population-level observational summaries to token-level interventional claims about expert importance, and illustrate how interventional audits can calibrate the evidential standards for interpretability claims.
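
For readers less familiar with the effect-size measure quoted above, the sketch below computes Cohen's $d$ in its usual pooled-standard-deviation form. Which variant the paper uses is not stated, so the pooled form is an assumption, and the sample values are made up.

```python
import math


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b)
                          / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd


# Means differ by 1 and both groups have standard deviation 2, so d = 0.5.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

By the conventional rule of thumb, values around 0.2 are considered small, which puts the reported ceiling of 0.17 below even that threshold.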

2606.10699 2026-06-10 cs.CV cs.AI New submission

Using the YOLOv12 Model for Verifying the Correct Color Sequence of Wires in Network Cables (Patch Cords) on the Production Line

使用YOLOv12模型验证生产线上网线(跳线)中导线的正确颜色顺序

Amin Doroodchi, Danial Soleimany

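AI summary: For inspecting wire color sequence in network cable production, the paper proposes a YOLOv12-based object detection model that performs high-precision real-time verification and reduces human error.

AI Chinese abstract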

在网络电缆的生产过程中,确保标准连接器内部线对的正确颜色顺序对电缆的最终性能至关重要,因为任何错位或颜色顺序错误都可能导致缺陷产品并造成巨大成本。基于数字显微镜目视检查的传统检测方法通常耗时、繁琐且容易出错。在本研究中,开发了一种基于第十二版YOLO目标检测模型的智能系统,用于识别跳线中导线的位置并验证其正确的颜色顺序。使用的数据集包括从网络连接器显微视图中捕获的2500张图像,其中70%用于训练,15%用于验证,15%用于测试。所提出的模型利用单阶段架构和学习过程中的注意力机制,实现了约98%精度的导线检测。此外,总体平均准确率、分类精度和召回率分别约为95%、99%和98%。结果表明,该系统能够在生产线上可靠地实时验证导线颜色顺序的正确性,无需人工干预,从而减少人为错误并提高制造效率。

English abstract

In the production process of network cables, ensuring the correct color sequence of wire pairs inside the standard connector plays a critical role in the final performance of the cable, as any misplacement or color-ordering error can lead to defective products and impose significant costs. Traditional inspection methods based on visual examination through digital microscopes are typically time-consuming, tedious, and prone to human error. In this study, an intelligent system based on the twelfth version of the YOLO object detection model was developed to identify the position and verify the correct color sequence of wires in patch cords. The dataset used consisted of 2,500 images captured from microscopic views of network connectors, which were divided into 70% for training, 15% for validation, and 15% for testing. The proposed model, leveraging a single-stage architecture and attention mechanisms during learning, achieved highly accurate wire detection with approximately 98% precision. Additionally, the overall mean accuracy, classification precision, and recall were around 95%, 99%, and 98%, respectively. The results demonstrate that this system can reliably and in real time verify the correctness of wire color sequencing on the production line without the need for human intervention, thereby reducing human error and enhancing efficiency in the manufacturing process.
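
Once a detector has located each wire, checking the sequence reduces to sorting detections by horizontal position and comparing labels with the expected order. The sketch below assumes the T568B pin order as the target and a hypothetical `(label, x_center, confidence)` detection tuple; the abstract does not say which wiring standard or output format the authors use.

```python
# T568B pin order, left to right (assumed target; the paper may use another standard).
T568B = [
    "white-orange", "orange", "white-green", "blue",
    "white-blue", "green", "white-brown", "brown",
]


def sequence_ok(detections, expected=T568B, min_conf=0.5):
    """Return True when confident detections, read left to right, match expected.

    detections: list of (label, x_center, confidence) tuples (hypothetical format).
    """
    kept = [d for d in detections if d[2] >= min_conf]
    ordered = [label for label, _, _ in sorted(kept, key=lambda d: d[1])]
    return ordered == list(expected)


good = [(label, 10.0 * i, 0.9) for i, label in enumerate(T568B)]
# Swap the x positions of the first two wires to simulate a crossed pair.
bad = [(good[0][0], good[1][1], 0.9), (good[1][0], good[0][1], 0.9)] + good[2:]
```

A missing or low-confidence wire also fails the check, since the filtered list no longer has all eight labels.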

2606.10688 2026-06-10 cs.RO New submission

Self-Supervised Relevance Modelling in Autonomous Driving via Counterfactual Analysis

自动驾驶中基于反事实分析的自监督相关性建模

Luca Lusvarghi, Javier Gozalvez, Pablo Urbano Hidalgo

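AI summary: Proposes a self-supervised method based on counterfactual analysis to quantify the relevance of objects in autonomous driving, achieving millisecond-level real-time estimation and producing relevance heatmaps to support perception and planning.

AI Chinese abstract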

自动驾驶依赖于计算密集型的感知管线,以持续检测和跟踪周围环境中的物体。虽然某些物体对于规划安全有效的操作至关重要,但其他物体可能不相关,并且对自动驾驶车辆的驾驶决策没有影响。关注相关物体可以更有效地利用可用计算资源,减少处理延迟,并限制感知噪声的下游传播。在这项工作中,我们提出了一种基于反事实分析的新型自监督方法,以开发相关性模型——一种基于AI的工具,用于量化物体对自动驾驶车辆的相关性。为了展示所提出方法的潜力,我们在选定城市场景中生成的合成因果数据集上训练了相关性模型。结果表明,该相关性模型能够以毫秒级延迟准确估计物体的相关性,从而在高密度场景中实现实时相关性估计。我们还展示了该相关性模型可用于构建相关性热图,为自动驾驶车辆的驾驶策略提供有价值的见解,并可用于主动通知感知和规划任务。我们公开发布了相关性模型和因果数据集。

English abstract

Autonomous driving relies on computationally intensive perception pipelines to continuously detect and track objects in the surrounding environment. While some objects are key to planning safe and effective maneuvers, others may not be relevant and have no impact on the autonomous vehicle's driving decisions. Focusing on relevant objects allows a more efficient usage of available computational resources, reduces processing latencies, and limits the downstream propagation of perception noise. In this work, we propose a novel self-supervised approach based on counterfactual analysis to develop a relevance model, an AI-based tool that quantifies the relevance of objects for an autonomous vehicle. To demonstrate the potential of the proposed approach, we train a relevance model on a synthetic causal dataset generated in a selected urban scenario. Results show that the relevance model is able to accurately estimate the objects' relevance with millisecond-level latency, enabling real-time relevance estimation even in high-density scenarios. We also show that the relevance model can be used to build relevance heatmaps that offer valuable insights into the autonomous vehicle's driving policy and can be used to proactively inform perception and planning tasks. We openly release both the relevance model and the causal dataset.

2606.10669 2026-06-10 cs.LG cs.AI cs.CR New submission

In Defense of Information Leakage in Concept-based Models

Mateo Espinosa Zarlenga

AI summary: Revisits information leakage in concept-based models and introduces the notion of benign leakage; by optimizing a reframed training objective, leakage is exploited to improve accuracy and intervenability when concepts are incomplete.

Comments
Accepted as a position paper at the Forty-Third International Conference on Machine Learning (ICML 2026)
Abstract

Concept-based models (CMs), deep neural networks that ground their predictions on representations aligned with human-understandable concepts (e.g., "round", "stripes", etc.), have been shown to learn representations that leak concept-irrelevant information. As the traditional narrative goes, this leakage is undesirable and should be eradicated as it leads to uninterpretable models. In this paper, we posit that this conventional view of leakage in CMs is not only ill-posed, as the evidence of how leakage makes a model less interpretable is often inconclusive, but also bound to lead to impractical CMs under common real-world constraints. Specifically, we argue that in real-world settings where concept incompleteness is the norm, some leakage is often necessary for constructing accurate and intervenable CMs. To this end, we propose that there is such a thing as benign leakage and show that, by optimizing a reframing of the typical CM training objective, CMs can encourage and exploit this form of leakage without sacrificing accuracy or intervenability.

2606.10658 2026-06-10 cs.CR cs.AI cs.CE q-fin.CP New submission

Post-Quantum Secure Federated DeFi for Inclusive Banking

Swati Sachan, Dale Fickett, Richard Buchinger, Theo Miller

AI summary: Proposes a post-quantum secure federated DeFi framework that uses lattice-based fully homomorphic encryption and the NASA-IBM geospatial foundation model to enable encrypted inter-bank collaboration, improving financial inclusion for individuals with limited credit histories.

Abstract

Recent advances in error-corrected qubits have accelerated the timeline for practical quantum computing. This poses a threat to the cryptographic primitives used to secure financial systems, government infrastructure, communication networks, and DeFi (Decentralized Finance) ecosystems. This paper introduces a post-quantum secure federated DeFi framework that enables inter-bank collaboration to improve the inclusivity of individuals underserved by local lenders due to limited financial histories. Multiple banks contribute encrypted information batches to a virtual server, where lattice-based Fully Homomorphic Encryption (FHE) enables end-to-end homomorphic computation. The server fuses, in encrypted format, local data-driven probabilistic assessments, expert beliefs, and verifiable evidence generated by the NASA-IBM Prithvi Geospatial Foundation Model (GFM). Decentralized technologies are employed to ensure tamper-proof evidence and auditable accountability for all encrypted data exchanges between institutions and the server. The framework is tested on agricultural lending decisions for rural borrowers in Virginia.

2606.10620 2026-06-10 cs.CV cs.AI New submission

Can Image Models Imagine Time? ImageTime: A Novel Benchmark for Probing Visual World Modeling Through Spatiotemporal Consistency

Xinrui Wu, Lichen Huang

AI summary: Introduces the ImageTime benchmark, which uses a four-keyframe protocol (initial state, action onset, transition state, final state) to evaluate the spatiotemporal consistency of image generation models, revealing where they succeed and where they fall short in maintaining coherent visual world states.

Abstract

Image generation models now produce high-quality static images, yet their ability to represent how a visual world changes over time remains poorly understood. Practical workflows such as storyboarding, step-by-step illustration, reference-guided editing, and video previsualization require models to preserve identities, objects, spatial relations, and causal order across multiple visual states. Existing evaluations largely measure single-image correctness, compositional alignment, or video quality, leaving open whether an image model can coherently imagine a temporally ordered process. We introduce ImageTime, a diagnostic benchmark that uses spatiotemporal consistency as a behavioral probe of visual world modeling in image generation. Given an action instruction, and optionally a reference image specifying the initial state, a model must generate one image containing four ordered key states: initial state, action onset, transition state, and final state. This four-keyframe protocol is more temporally demanding than single-image generation while avoiding the confounds of dense video dynamics. ImageTime organizes tasks with a progressive capability hierarchy and decomposes each scenario into stage-wise state predicates, cross-frame temporal constraints, and forbidden causal violations. GPT-5.5 scores all generated images under a structured VLM-as-judge protocol, producing interpretable capability scores, diagnostic subscores, and failure labels. Through multi-family benchmarking, ImageTime reveals where current image generation systems succeed, fail, and drift when asked to maintain coherent visual world states over time.

2606.10617 2026-06-10 cs.CV New submission

SSR-Merge: Subspace Signal Routing for Training-Free LoRA Merging in Diffusion Models

Zhengxuan Wei, Yi Dong, Zonghui Li, Xianhui Lin, Xing Liu, Hong Gu, Shaofeng Zhang, Wenbin Li, Qi Fan

AI summary: Proposes Subspace Signal Routing (SSR), which builds a unified subspace by concatenating LoRAs along the rank dimension, decorrelates signals with an inverse correlation matrix, and separates them with a directional guide matrix to resolve interference in parameter merging. The method is shown theoretically to be equivalent to the OLS-optimal solution, and a streaming algorithm reduces overhead.

Comments
Accepted at ICML 2026
Abstract

Low-Rank Adaptation (LoRA) merging can efficiently combine diverse generative capabilities from multiple trained LoRAs for a diffusion model. However, existing LoRA merging techniques often suffer from severe parameter interference, causing destructive collisions in the shared parameter space. To address this, we propose Subspace Signal Routing (SSR), which resolves interference by routing internal signals instead of performing parameter-space merge. Specifically, SSR first constructs a unified subspace by concatenating candidate LoRAs along the rank dimension. Next, SSR employs an inverse correlation matrix to decorrelate mixed signals within this space. Finally, a directional guide matrix steers these purified signals into their respective task-specific subspaces. We provide a rigorous theoretical analysis proving that SSR aligns with the Ordinary Least Squares (OLS) solution, thereby ensuring mathematical optimality. We utilize the additivity of sufficient statistics to design a streaming algorithm. This enables on-the-fly updates that significantly reduce memory overhead and computation time. Extensive experiments validate that SSR significantly outperforms state-of-the-art methods while maintaining comparable efficiency. Code is available at https://github.com/nagara214/SSR-Merge.
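The streaming design relies on a general property of least squares: its sufficient statistics are sums over samples, so they can be accumulated batch by batch and solved at any time. The sketch below shows that property on a one-variable regression. It illustrates the principle only and is not the SSR algorithm; the class name `StreamingOLS` is made up for this example.

```python
class StreamingOLS:
    """Fit y ~ slope * x + intercept from running sums, without storing samples."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        """Add one batch; each statistic is a sum, so batches simply add up."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def solve(self):
        """Solve the 2x2 normal equations from the accumulated statistics."""
        det = self.n * self.sxx - self.sx * self.sx
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]  # exactly y = 2x + 1

full = StreamingOLS()
full.update(xs, ys)  # all data at once

streamed = StreamingOLS()
streamed.update(xs[:2], ys[:2])  # same data in two batches
streamed.update(xs[2:], ys[2:])

print(full.solve(), streamed.solve())
```

Memory stays constant in the number of samples because only the five running sums are kept.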

2606.10614 2026-06-10 cs.RO cs.CV cs.LG New submission

Dexterous Point Policy: Learning Point-based Dexterous Hand Policies from Human Demonstrations

Beomjun Kim, Seong Hyeon Park, Seunghoon Sim, Seungjun Moon, Sanghyeok Lee, Jinwoo Shin

AI summary: Proposes the Dexterous Point Policy framework, which learns dexterous manipulation policies from human videos through a unified 3D keypoint representation, requires no robot demonstrations, and reaches a 75% success rate on real-world tasks.

Abstract

Robotic foundation models pre-trained on human demonstration videos have shown promise, but a significant embodiment gap remains when the resulting policies are deployed on real robots. A common remedy is to fine-tune these models on robot-specific demonstrations. However, robot data collection can be prohibitively expensive and time-consuming, which is particularly acute in dexterous manipulation, e.g., teleoperating a multi-fingered hand for even a single atomic task can take days. To address this, we introduce Dexterous Point Policy, a framework that learns dexterous manipulation policies directly from human videos and requires no robot demonstrations. Our core insight is that a unified 3D keypoint representation can bridge human and robot embodiments when used for both observations and actions. Specifically, we extract 3D keypoints of task-relevant objects and human hands from raw videos, and train an autoregressive transformer over these keypoints. We observe that at the keypoint level, specifically the wrist and fingertips, human and robot behaviors closely align, enabling direct policy transfer. On a suite of real-robot tasks spanning pick-and-place and tool use, Dexterous Point Policy attains 75.0% success, whereas a state-of-the-art VLA baseline reaches only 1.0%. Furthermore, our method generalizes strongly to unseen scenarios, including multi-object environments and novel object categories.

2606.10613 2026-06-10 cs.LG cs.AI New submission

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

Thanh Nguyen, Tri Ton, Hongbin Choe, Tung M. Luu, Chang D. Yoo

AI summary: Proposes Bootstrapped Flow Q-Learning (BFQ), which takes a divide-and-conquer view of the displacement vector and bootstraps its short-range components to generate actions in a single step, without auxiliary networks or distillation, substantially lowering computational cost while improving performance.

Journal ref
ICML 2026
Comments
ICML 2026, 19 pages
Abstract

Diffusion-based Q-learning has emerged as a powerful paradigm for offline reinforcement learning, but its reliance on multi-step denoising makes both training and inference computationally expensive and brittle. Recent efforts to accelerate diffusion Q-learning toward single-step action generation typically introduce auxiliary networks, policy distillation, or multi-phase training, which frequently compromise simplicity, stability, or performance. To address these limitations, we introduce Bootstrapped Flow Q-Learning (BFQ), a novel framework that enables accurate single-step action generation during both training and inference, without auxiliary networks or distillation procedures. BFQ adopts a divide-and-conquer view of the displacement vector along the flow path: it begins by learning short-range displacements that can be accurately estimated from the Flow Matching marginal velocity, and bootstraps these components to directly learn a noise-to-action mapping in a single step. This formulation eliminates multi-step denoising, resulting in a learning procedure that is substantially faster, simpler, and more robust. Extensive D4RL evaluations show that BFQ improves performance while significantly reducing computational cost compared to multi-step diffusion baselines, demonstrating that single-step action generation suffices for high-performance offline Reinforcement Learning.

2606.10612 2026-06-10 cs.CV New submission

GaussTrace: Provenance Analysis of 3D Gaussian Splatting Models with Evidence-based LLM Reasoning

Haoliang Han, Ziyuan Luo, Renjie Wan

AI summary: Proposes the GaussTrace framework, which combines attribute-wise statistical analysis and hypothesis-driven editing simulations with LLM chain-of-thought reasoning to build directed provenance graphs for 3D Gaussian Splatting models, without training or access to editing histories.

Comments
Accepted by ICML 2026
Abstract

3D Gaussian Splatting (3DGS) is a powerful technique for creating high-fidelity 3D assets. However, the widespread sharing and iterative modification of 3DGS models across digital platforms create pressing challenges for intellectual property protection and forensic traceability. To address this, we propose GaussTrace, a novel framework for constructing directed provenance graphs for 3DGS models. GaussTrace formulates provenance analysis as an evidence-based reasoning problem. It builds upon attribute-wise statistical profiling of 3DGS parameters to capture intrinsic properties. Moreover, we introduce hypothesis-driven editing simulations of common operations to provide auxiliary evidence for plausible transformation pathways. These statistical and simulated cues jointly enable a Large Language Model (LLM) to perform structured Chain-of-Thought (CoT) reasoning, yielding directional provenance inferences and explainable edge reasons. Experimental results demonstrate that GaussTrace effectively constructs evolutionary relationships among diverse 3DGS models, delivering accurate, interpretable, and robust provenance graphs without requiring model training or access to editing histories. Project page: https://haolianghan.github.io/GaussTrace.

2606.10596 2026-06-10 cs.LG cs.AI cs.SY eess.SY New submission

Embedding Hybrid Systems into Continuous Latent Vector Fields

Sangli Teng, Hang Liu, Koushil Sreenath

AI summary: Proves that an $n$-dimensional hybrid system can be embedded into a continuous vector field in $m$-dimensional Euclidean space when $m>2n$, and builds on this result a latent Neural ODE method that accurately recovers hybrid system flows from time series data, outperforming existing methods.

Comments
Accepted to ICML 2026
Abstract

This work proves that an $n$-dimensional hybrid system can be embedded into an $m$-dimensional Euclidean space equipped with a continuous vector field on its embedded image whenever $m>2n$. This result suggests that an intrinsically discontinuous hybrid system generically admits a continuous extrinsic representation that is well-posed for differentiable optimization. Building on this existence theorem, we show that a latent Neural ODE with consistency loss in both the latent and state space can accurately recover the flow of hybrid systems. Extensive experiments suggest the proposed method outperforms the existing method in learning hybrid systems with varying geometries from only time series data.

2606.10587 2026-06-10 cs.LG cs.AI New submission

Towards Diverse Scientific Hypothesis Search with Large Language Models

Haorui Wang, Parshin Shojaee, Kazem Meidani, Kunyang Sun, José Miguel Hernández-Lobato, Teresa Head-Gordon, Jiajun He, Chandan K. Reddy, Chao Zhang, Yuanqi Du

AI summary: To address diversity collapse in scientific hypothesis search, proposes a multi-temperature evolutionary framework based on parallel tempering that improves hypothesis quality and diversity under a fixed validation budget.

Comments
ICML 2026
Abstract

Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis: validation can be noisy and expensive, so scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.
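For readers unfamiliar with classical parallel tempering, the sketch below shows its core exchange step: two replicas at different inverse temperatures swap states with the Metropolis probability min(1, exp((β_i − β_j)(E_i − E_j))). Treating a hypothesis as a state and a negated validation score as its energy is an assumption for illustration; how the paper maps hypotheses, scores, and LLM proposals onto this scheme is not stated in the abstract.

```python
import math
import random

def swap_probability(beta_i, energy_i, beta_j, energy_j):
    """Metropolis acceptance probability for exchanging the states of two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def maybe_swap(replicas, i, j, rng):
    """Swap the states of replicas i and j with the parallel tempering probability."""
    a, b = replicas[i], replicas[j]
    p = swap_probability(a["beta"], a["energy"], b["beta"], b["energy"])
    if rng.random() < p:
        a["state"], b["state"] = b["state"], a["state"]
        a["energy"], b["energy"] = b["energy"], a["energy"]
        return True
    return False

rng = random.Random(0)
replicas = [
    {"beta": 2.0, "state": "hypothesis A", "energy": 3.0},  # cold replica: exploits
    {"beta": 0.5, "state": "hypothesis B", "energy": 1.0},  # hot replica: explores
]
# The hot replica holds the lower-energy state, so the swap is always accepted.
swapped = maybe_swap(replicas, 0, 1, rng)
print(swapped, replicas[0]["state"])
```

Hot replicas accept worse states more readily and therefore explore, while the exchange step lets good discoveries migrate to the cold replicas that refine them.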

2606.10565 2026-06-10 cs.SD eess.AS New submission

A Lightweight Dual-Factor Acoustic Authentication System via Cascaded GMM-DTW Architecture for Edge Computing

Yutong Zhang

AI summary: For resource-constrained edge environments, proposes a lightweight cascaded GMM-DTW dual-factor voice lock system that implements sequential defense over a shared MFCC feature space and, combined with a dynamic joint absolute-relative margin constraint, achieves low latency and high security on low-power edge nodes.

Abstract

This paper presents a lightweight, cascaded GMM-DTW dual-factor voice lock system for resource-constrained edge environments. By utilizing a shared MFCC feature space, the framework implements a sequential defense mechanism combining GMM speaker screening and DTW passphrase verification. To counter presentation threats without extra hardware, a dynamic joint absolute-relative margin constraint is integrated into the GMM classification space, limiting the physical imposter and high-fidelity replay attack False Acceptance Rates (FAR) to 2.73% and 6.67%, respectively, with a legitimate False Rejection Rate (FRR) of 16.67%. Due to Sakoe-Chiba window optimization, the global end-to-end processing latency under temporal stress is rigidly bounded at 9.82ms on a single-core CPU, comprising 1.51ms for feature extraction, 0.54ms for GMM scoring, and 7.77ms for worst-case DTW matching. These empirical benchmarks demonstrate the viability of white-box acoustic cascades for secure, deterministic real-time deployment on low-power edge nodes.
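The latency bound comes from restricting DTW to a Sakoe-Chiba band, which limits how far the alignment path may stray from the diagonal. The sketch below is a minimal version on 1-D sequences; the real system would compare MFCC frame vectors, and the function name, window size, and test sequences are assumptions. The last lines simply re-add the three stage latencies quoted in the abstract.

```python
import math

def dtw_sakoe_chiba(a, b, window):
    """DTW distance between 1-D sequences, restricted to |i - j| <= window."""
    n, m = len(a), len(b)
    window = max(window, abs(n - m))  # the band must reach the final cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):  # only cells inside the band are filled
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

template = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # same shape, one frame repeated
print(dtw_sakoe_chiba(template, template, window=1))
print(dtw_sakoe_chiba(template, stretched, window=1))

# Stage latencies reported in the abstract, in milliseconds.
stage_ms = {"feature_extraction": 1.51, "gmm_scoring": 0.54, "dtw_worst_case": 7.77}
total_ms = round(sum(stage_ms.values()), 2)
print(total_ms)
```

With a band of half-width w, the number of filled cells drops from n·m to roughly n·(2w + 1), which is what makes a fixed worst-case matching time possible.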

2606.10531 2026-06-10 cs.CL cs.AI New submission

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

Haoyu Wang, Xingyu Yu, Haiyan Zhao, Fengxiang Wang, Xu Han

AI summary: Proposes LC-QAT, a vector-quantization-aware training framework for 2-bit weight-only quantization that avoids discrete codebook lookup through a differentiable linear mapping, yielding a high-quality PTQ initialization and end-to-end optimization, and outperforms existing methods with only 0.1%-10% of the training data.

Comments
Accepted by ICML 2026
Abstract

Quantization-aware training (QAT) is essential for extremely low-bit large language models (LLMs). Current QAT methods are mainly based on scalar quantization (SQ), which enables efficient optimization but suffers from severe performance degradation at 2-bit precision. On the other hand, vector quantization (VQ) provides substantially higher representational capacity, but its discrete codebook lookup prevents end-to-end training. We propose LC-QAT, a 2-bit weight-only VQ-QAT framework that represents quantized weights via a learned affine mapping over discrete vectors, which yields a high-quality PTQ initialization and enables fully differentiable end-to-end optimization without explicit codebook lookup in the training forward pass. This strong post-training initialization makes LC-QAT highly data-efficient. Experiments across diverse LLMs demonstrate that LC-QAT consistently outperforms state-of-the-art QAT methods while using only 0.1%--10% of the training data. Our results establish LC-QAT as a practical and scalable solution for extreme low-bit model deployment.

2606.10520 2026-06-10 cs.CL New submission

UniSVQ: 2-bit Unified Scalar-Vector Quantization

Haoyu Wang, Haiyan Zhao, Xingyu Yu, Zhangyang Yao, Xu Han, Zhiyuan Liu, Maosong Sun

AI summary: Proposes UniSVQ, which unifies scalar and vector quantization by parameterizing codewords as an affine transform of integer lattices; at 2 bits it outperforms scalar quantization, matches vector quantization, and delivers higher inference throughput.

Comments
Accepted by ICML 2026
Abstract

Post-training quantization at the 2-bit level enables low-cost deployment and inference acceleration for large language models (LLMs). Scalar quantization (SQ) and vector quantization (VQ) are the two primary quantization methods; however, the former suffers from significant performance degradation, and the latter incurs computational and storage overhead. We propose UniSVQ, a unified 2-bit quantization framework that bridges scalar and vector quantization by parameterizing codewords as an affine transform of integer lattices. This structure preserves compatibility with optimized integer kernels while retaining much of VQ's flexibility. We further introduce a data-driven block-wise fine-tuning strategy to directly minimize quantization reconstruction error. Extensive experiments across multiple LLM families and zero-shot benchmarks demonstrate that UniSVQ consistently outperforms state-of-the-art SQ methods and achieves performance comparable to advanced VQ methods, while providing higher inference throughput.
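To make "codewords as an affine transform of integer lattices" concrete, the sketch below builds a codebook c = A z + b for pairs of weights, with z ranging over the integer grid {0, 1, 2, 3}². That gives 16 codewords per pair, i.e. 4 bits for two weights or 2 bits per weight. The vector dimension, the matrices, and the brute-force nearest-codeword search are assumptions chosen for readability and are not UniSVQ's actual parameterization or kernels.

```python
from itertools import product

def make_codebook(A, b, levels=4):
    """Codewords c = A z + b for every integer point z in {0, ..., levels-1}^2."""
    book = {}
    for z in product(range(levels), repeat=2):
        c0 = A[0][0] * z[0] + A[0][1] * z[1] + b[0]
        c1 = A[1][0] * z[0] + A[1][1] * z[1] + b[1]
        book[z] = (c0, c1)
    return book

def quantize(w, book):
    """Return the integer code whose codeword is nearest to the weight pair w."""
    return min(book, key=lambda z: (book[z][0] - w[0]) ** 2 + (book[z][1] - w[1]) ** 2)

# A diagonal A reduces to per-coordinate scalar quantization on a uniform grid.
scalar_like = make_codebook(A=[[0.5, 0.0], [0.0, 0.5]], b=[-0.75, -0.75])
# A full matrix A shears the grid, giving a lattice-shaped vector codebook.
sheared = make_codebook(A=[[0.5, 0.25], [0.0, 0.5]], b=[-1.0, -0.75])

code = quantize((0.3, -0.2), scalar_like)
print(code, scalar_like[code])
```

Only the integer codes plus the small A and b need to be stored, so no explicit codebook table is required; that is one reading of how such a scheme can stay compatible with integer kernels.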

2606.10504 2026-06-10 cs.AI New submission

Cross-Modal Knowledge Distillation without Paired Data: Theoretical Foundation and Algorithm

Trong Khiem Tran, Anh Duc Chu, Quang Hung Pham, Phi Le Nguyen, Trong Nghia Hoang

AI summary: Proposes a cross-modal knowledge distillation framework for the unpaired-data setting that transfers knowledge across modalities through two distribution alignment mechanisms, feature alignment and label alignment, with theoretical guarantees and strong experimental results.

Abstract

Cross-modal knowledge distillation (CMKD) studies how a (large) teacher model trained on one type of data (e.g., images) can guide a (smaller) student model built on another type of data (e.g., text/audio). Existing CMKD methods often require paired multi-modal data with aligned semantics, but obtaining such paired data is often costly and impractical. To mitigate this limitation, we develop a new CMKD framework for the more challenging setting where paired data are unavailable. In particular, we establish a cross-modal distributional relationship between teacher and student models, which reveals two fundamental quantities governing effective distillation: feature alignment and label alignment. These quantities characterize semantic discrepancy between modalities at the levels of representation and prediction distributions, respectively. Motivated by this insight, we propose a principled framework, with theoretical guarantees, that enables effective cross-modal knowledge distillation by aligning distributions rather than individual samples. Extensive experiments across a wide range of multimodal benchmarks show that our framework is highly effective in both unpaired and paired data settings, improving significantly over prior work.

2606.10500 2026-06-10 cs.AI New submission

A Reliable Fault Diagnosis Method Based on Belief Rule Base Consider Robustness Analysis

Mingyuan Liu, Dan Yin, Zongzong Wu

AI summary: To address the reliability of sensor readings in fault diagnosis, proposes a reliable fault diagnosis method based on a belief rule base that improves model accuracy and robustness through robustness analysis and optimization strategies, validated on diesel engine and bearing fault diagnosis.

Abstract

In equipment operation, fault diagnosis is essential to ensure the continuity and safety of production equipment, improve operational efficiency, and reduce maintenance costs. Since sensor readings are widely used for fault diagnosis, their reliability directly affects the diagnosis results. To address the two problems of robustness assessment and robustness optimization of fault diagnosis models, a reliable fault diagnosis method based on a belief rule base (BRB) that considers robustness analysis is proposed. First, the robustness of the BRB model is analyzed systematically. Second, three robustness constraint strategies are proposed to optimize the robustness of the BRB fault diagnosis model. Finally, the effectiveness of the proposed model is verified on fault diagnosis of a WD615 diesel engine and Case Western Reserve University bearings, and the experiments show that the proposed model improves both accuracy and robustness.