arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.29765 2026-05-29 cs.LG

MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion

Ali Abusaleh, Bhuvanesh Verma, Alexander Mehler

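Affiliations: Text Technology Lab (TTLab), Goethe University Frankfurt

AI summary: Proposes MMTM, a modular pipeline that integrates speech recognition, audio and visual embeddings, and BERTopic clustering through similarity-gated fusion, substantially improving topic quality in long-form video topic discovery.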

Comments Submitted to EMNLP 2026

Abstract

We introduce MMTM, a modular pipeline for topic discovery in long-form video that integrates speech recognition, audio and visual embeddings, and BERTopic clustering through a deterministic similarity-gated fusion. Evaluated cross-lingually on German (Tagesschau) and English (NBC) broadcast news, joint tri-modal modeling substantially improves topic quality: noise drops from 0.27 to 0.06, transition rate from 0.70 to 0.21, and normalized entropy rises from 0.84 to 0.92, indicating more coherent and temporally stable topics. Cluster validity (Calinski-Harabasz) improves by 5-12X across embedding spaces. Lexical coherence (NPMI) rises from 0.77 to 0.86 on German but is corpus-dependent and does not transfer to the shorter NBC broadcasts. We release the pipeline code and a human-validated 54-hour multimodal video topic corpus with dual-annotator visual evaluation and LLM-assisted labeling.
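The segment-level metrics quoted in this abstract can be illustrated with a small sketch. The definitions below are assumptions, since the abstract does not spell them out: noise is taken as the share of segments assigned to the outlier topic (BERTopic labels outliers as -1), transition rate as the share of adjacent segment pairs whose topic changes, and normalized entropy as the Shannon entropy of the topic distribution divided by the log of the number of topics. The function names are illustrative and not taken from the MMTM code.

```python
import math
from collections import Counter

def noise_rate(labels, outlier=-1):
    """Share of segments assigned to the outlier topic."""
    return sum(1 for t in labels if t == outlier) / len(labels)

def transition_rate(labels):
    """Share of adjacent segment pairs whose topic label differs."""
    if len(labels) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes / (len(labels) - 1)

def normalized_entropy(labels, outlier=-1):
    """Shannon entropy of the non-outlier topic distribution, scaled to [0, 1]."""
    counts = Counter(t for t in labels if t != outlier)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))

# Hypothetical topic labels for eight consecutive video segments.
labels = [0, 0, 1, 1, -1, 2, 2, 2]
print(noise_rate(labels), transition_rate(labels), normalized_entropy(labels))
```

Under these definitions, lower noise and transition rate together with higher normalized entropy correspond to the pattern the abstract reports: fewer unassigned segments, fewer topic flips between neighbours, and a more even spread over topics.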

2605.29762 2026-05-29 cs.CV

GeoMag: Geometric-Aware Video Motion Magnification via State Space Model

Kecheng Han, Yuchen Zhang, Bingqing Liu, Boqiang Guo, Wenbin Zheng, Shiyuan Pei

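Affiliations: School of Software Engineering, Xi'an Jiaotong University; Xi'an Jiaotong University

AI summary: Proposes GeoMag, a framework that uses state space models for globally consistent motion magnification, and builds the Geo-200K dataset to diversify training; it outperforms existing methods in visual fidelity and computational efficiency.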

Comments ICME 2026 Spotlight

Abstract

Video Motion Magnification (VMM) reveals imperceptible dynamics but often suffers from structural inconsistencies under complex geometric transformations. Existing learning-based methods generally face a trade-off between the limited global context of CNNs and the high computational cost of Transformers. In addition, current training protocols, largely dominated by simple linear motion, fail to capture the geometric and imaging complexities encountered in real-world videos. To address these issues, we propose GeoMag, a geometric-aware VMM framework built upon State Space Models to achieve globally consistent motion amplification with linear complexity. We further construct Geo-200K, a large-scale synthetic dataset that introduces rich geometric transformations together with sensor-realistic degradations, improving the diversity and realism of training signals. Extensive experiments on synthetic and real-world benchmarks show that GeoMag consistently outperforms prior methods in visual fidelity and computational efficiency, while producing fewer artifacts and better structural consistency.

2605.29761 2026-05-29 cs.CV cs.CG

S2MDF: A Plug-And-Play Layer for Intersection-Free Multi-Object Signed Distance Fields

Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot, Pascal Fua

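Affiliations: CVLab, Ecole Polytechnique Fédérale de Lausanne (EPFL)

AI summary: Proposes S2MDF, a module that enforces a hard constraint on vector-valued signed distance fields to prevent geometric intersections between objects. It requires no architectural changes, can be used during training or as post-processing, and reduces intersections to numerical precision while preserving reconstruction quality.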

Abstract

Compositional implicit surface representations model scenes as collections of objects, each encoded by a Signed Distance Field (SDF). A fundamental limitation of this approach is that multiple SDFs can produce geometries that interpenetrate, violating physical plausibility. Existing mitigation strategies rely on soft penalty terms that reduce but do not eliminate intersections, and require careful loss weighting. To truly prevent interpenetration, we propose a hard constraint on vector-valued SDFs and introduce S2MDF, a lightweight plug-and-play module that enforces the constraint on any object-compositional SDF representation without architectural modifications. It introduces negligible computational overhead and is compatible with linearly-interpolated standard meshing algorithms such as Marching Cubes. It can be applied during training or as a post-processing step. Experiments on multiple state-of-the-art compositional methods show that S2MDF reduces intersections to numerical precision while preserving reconstruction quality, outperforming existing mitigation strategies.
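To make the failure mode concrete: under the common convention that a signed distance is negative inside an object, two objects interpenetrate wherever more than one SDF is negative at the same point. The sketch below only counts such points for two analytic spheres. It is not the S2MDF layer, whose constraint the abstract does not detail, and all names are hypothetical.

```python
import math

def sphere_sdf(center, radius):
    """Return an SDF for a sphere (negative inside, positive outside)."""
    def sdf(point):
        return math.dist(point, center) - radius
    return sdf

def count_intersections(sdfs, points):
    """Count sample points that lie inside more than one object."""
    return sum(1 for p in points if sum(1 for f in sdfs if f(p) < 0) > 1)

# Two unit spheres whose centers are 1.5 apart, so they overlap.
a = sphere_sdf((0.0, 0.0, 0.0), 1.0)
b = sphere_sdf((1.5, 0.0, 0.0), 1.0)
points = [(0.75, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
overlap = count_intersections([a, b], points)
print(overlap)
```

A soft penalty would try to push this count down through a loss term, whereas the abstract describes a hard constraint that drives intersections to numerical precision.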

2605.29756 2026-05-29 cs.AI

LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

Jung Hyun Lee, June Yong Yang, Jungwook Choi, Eunho Yang

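Affiliations: Kim Jaechul Graduate School of AI, KAIST, Daejeon, South Korea; LG AI Research, Seoul, South Korea; Hanyang University, Seoul, South Korea

AI summary: To counter the quality loss of low-bit quantized LLMs on generative tasks, proposes quantizing the final Transformer block by minimizing the cross-entropy between the logits of the FP model and those of the quantized model, improving accuracy on complex generation tasks.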

Comments Accepted to ICML 2026

Abstract

As large language models continue to scale, low-bit weight-only post-training quantization (PTQ) offers a practical solution to their memory-efficient deployment. Although block-wise PTQ is capable of matching the full-precision (FP) baseline on basic language modeling and understanding, its quality is degraded for generative tasks -- especially at longer responses and extended chains of thought, which is critical in boosting task accuracy. We attribute this shortfall to two factors: (i) the omission of the unembedding layer (the LM head) in block-wise optimization and (ii) the reliance on the mean squared error (MSE) objective. Both factors cause the token probability distribution of the quantized model to misalign with that of the FP model, yielding notable accuracy drops on text generation benchmarks. To rectify the discrepancy, we introduce Logit-aware Final-block Quantization (LFQ), a simple yet effective enhancement to block-wise PTQ that quantizes the final Transformer block by minimizing the cross-entropy between the logits of the FP model and those of its quantized counterpart. By aligning token probabilities at the logit level in the final block, LFQ consistently improves the accuracy of complex generation tasks over state-of-the-art block-wise PTQ across diverse model families, while maintaining parity with FP baselines on language modeling and understanding.
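The logit-level objective described here can be sketched for a single token position. This is one reading of "cross-entropy between the logits", assumed for illustration: the FP logits are turned into a target distribution with softmax, and the quantized logits are scored against it with log-softmax. The function names are hypothetical, and the actual LFQ optimization over the final block's quantization parameters is not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def logit_cross_entropy(fp_logits, q_logits):
    """Cross-entropy of the quantized model's token distribution
    against the full-precision model's token distribution."""
    target = softmax(fp_logits)
    log_pred = log_softmax(q_logits)
    return -sum(p * lq for p, lq in zip(target, log_pred))

fp = [2.0, 0.5, -1.0]   # hypothetical FP logits for one position
q = [1.0, 1.0, -1.0]    # hypothetical quantized-model logits
print(logit_cross_entropy(fp, q), logit_cross_entropy(fp, fp))
```

The loss reaches its minimum when the two distributions coincide, which is why it targets the token-probability mismatch directly, unlike an MSE on intermediate activations.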

2605.29754 2026-05-29 cs.AI

Benchmarking Positional Encoding Strategies for Transformer-Based EEG Foundation Models

Ayse Betul Yuce, Sebastian Stober

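Affiliations: Department of Computer Science, Otto von Guericke University

AI summary: Benchmarks five positional encoding strategies within the CBraMod backbone, evaluating motor imagery classification and emotion recognition under linear probing and fine-tuning protocols, and finds that the best strategy is task-dependent.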

Abstract

Electroencephalography (EEG) is a widely used non-invasive technique for measuring brain activity in brain-computer interface (BCI) applications. Supervised EEG decoding models often struggle to generalize across tasks, subjects, and datasets, motivating transformer-based EEG foundation models trained with self-supervised learning. Since transformers are permutation-invariant, they require explicit positional information. Unlike textual tokens, EEG electrodes are spatially distributed across the scalp, raising the question of how electrode positions should be encoded in transformer-based EEG models. In this study, we benchmark five positional encoding strategies within the CBraMod backbone and evaluate them under linear probing and fine-tuning protocols on motor imagery classification and emotion recognition. Our results show that no single strategy consistently outperforms across tasks. Spherical Positional Encoding (SPE) yields strong representations for motor imagery but underperforms on emotion recognition, while Asymmetric Conditional Positional Encoding (ACPE) demonstrates more consistent performance across tasks. These findings suggest that the optimal positional encoding strategy is task-dependent, with no universal solution across EEG decoding scenarios.

2605.29744 2026-05-29 cs.AI cs.CL cs.LG cs.MA

Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

Yanan Wang, Shuaicong Hu, Jian Liu, Guohui Zhou, Aiguo Wang, Cuiwei Yang

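Affiliations: Anthropic AI

AI summary: Proposes HetMedAgent, a heterogeneous multi-agent framework that combines generalist LLMs with domain specialist models through conflict-aware evidence fusion, uncertainty-driven clinician intervention triggering, and adaptive threshold calibration; results on three clinical decision-making tasks support the irreplaceable value of specialist models in modality-specific analysis.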

Comments Accepted at ICML 2026. 12 pages main text, 16 pages appendix

Abstract

The impressive performance of generalist large language models (LLMs) such as GPT and Claude in healthcare raises a critical question: will domain-specific medical specialist models become obsolete? We argue that the future of medical artificial intelligence (AI) lies not in building monolithic medical foundation models, nor in replacing human expertise, but in orchestrating collaboration among generalist LLMs, domain-specific specialist models, and clinicians. We propose HetMedAgent, a heterogeneous medical multi-agent framework that enables conflict-aware evidence fusion, uncertainty-based clinician intervention triggering, and adaptive threshold calibration. Experiments on three real-world clinical decision-making tasks demonstrate that the synergy between generalist LLMs and domain-specific specialist models significantly outperforms using either type of model alone, validating the irreplaceable value of specialist models in modality-specific analysis. HetMedAgent represents a shift from building medical LLMs or foundation models to multi-agent collaboration, achieving a balance between general reasoning capabilities and domain-specific precision.

2605.29742 2026-05-29 cs.AI

Citation-Closure Retrieval and Per-Rule Attribution for Real-World Regulatory Compliance Question Answering

Yeong-Joon Ju, Seong-Whan Lee

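Affiliations: Department of Artificial Intelligence, Korea University

AI summary: To address citation tracing across multi-tiered authority structures in regulatory compliance QA, proposes RegOps-Bench, a benchmark built on an Operational Knowledge Graph, and RefWalk, a unified framework that traverses cross-document citations from a shared topic anchor, fuses multi-view candidates, and enforces per-rule attribution, substantially improving retrieval recall and citation accuracy.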

Comments Under Review

Abstract

Deploying Large Language Models (LLMs) for regulatory compliance demands rigorous traceability via comprehensive citations across multi-tiered authority structures. Unlike traditional multi-hop or legal QA, this task requires structured procedural lookups and evidence-set closure rather than entity resolution or case-law reasoning. Existing RAG systems struggle here due to flattened citation edges, fragmented retrieval expansions, and fragile post-hoc attribution. We formalize Regulatory Compliance QA with RegOps-Bench, a novel benchmark featuring an Operational Knowledge Graph derived from complex national R&D regulations. To address these bottlenecks, we propose RefWalk, a unified framework driven by a shared topic anchor. RefWalk traverses cross-document citations, fuses multi-view candidates via max-based aggregation, and enforces per-rule attribution to explicitly map claims to sources. We establish a strong baseline with substantial improvements in retrieval recall and citation accuracy. Finally, a contrastive evaluation on a U.S. health compliance dataset (HIPAA) reveals that existing systems exhibit saturation on flat-structure rules, underscoring the need for RegOps-Bench. Our code is available at https://github.com/yeongjoonJu/RefWalk.
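The "max-based aggregation" step can be illustrated generically. In the sketch below, each retrieval view returns a score per candidate rule and the fused score is the maximum across views. The view contents, rule identifiers, and function name are hypothetical, and this is not taken from the RefWalk repository.

```python
def fuse_max(views):
    """Fuse per-view candidate scores by keeping each candidate's best score.

    `views` is a list of dicts mapping candidate id -> relevance score.
    Returns (candidate, score) pairs sorted by descending score, then id.
    """
    fused = {}
    for scores in views:
        for candidate, score in scores.items():
            if candidate not in fused or score > fused[candidate]:
                fused[candidate] = score
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))

# Two hypothetical views, e.g. a direct query match and a citation-walk match.
views = [
    {"rule_a": 0.9, "rule_b": 0.2},
    {"rule_b": 0.7, "rule_c": 0.4},
]
ranking = fuse_max(views)
print(ranking)
```

Taking the maximum, unlike averaging, keeps a candidate that is strongly supported by only one view, which matters when a cited rule is reachable only through the citation graph.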

2605.29741 2026-05-29 cs.CL

AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation

Idris Abdulmumin, Tajuddeen Gwadabe, Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Nomonde Khalo, Ibrahim Said Ahmad, Abiodun Modupe, Anina Mumm, Sibusiso Biyela, Michelle Rabie, Johanna Havemann, Marek Rei, Jade Abbott, Vukosi Marivate

Affiliations: Data Science for Social Impact, University of Pretoria; Masakhane Research Foundation; Imperial College London; Mila, McGill University; Canada CIFAR AI Chair; University of Cape Town; University of Wisconsin - Stevens Point; Independent Consultant; University of South Africa; Independent Researcher; Access 2 Perspectives; Lelapa AI

AI summary: To address the lack of scientific terminology in African languages, builds AfriScience-MT, a parallel corpus covering 6 African languages and 11 scientific domains, and evaluates machine translation systems and large language models in zero-shot, few-shot, and fine-tuned settings.

Abstract

The dominance of colonial languages in African education and scientific communication limits how hundreds of millions of speakers of African languages access and produce scientific knowledge. A core obstacle is the lack of established scientific terminology in these languages. We introduce AfriScience-MT, a parallel corpus covering six African languages (Amharic, Hausa, Luganda, Northern Sotho, Yorùbá, and isiZulu) across 11 scientific domains. Professional translators, working with expert science communicators, translated plain-language summaries of scientific papers into each target language and created new terms where none existed. We benchmark machine translation systems and large language models in zero-shot, few-shot, and fine-tuned settings. Our results show that closed-source models outperform all open-source models at both the sentence and document levels: GPT-5.4 and Gemini-3.1-Flash-Lite lead with average sentence-level COMET scores of 68.3 and 68.0, respectively, and tie at an average document-level COMET of 48.3. Among open systems, fine-tuned NLLB-1.3B reaches 67.3 at the sentence level, and TranslateGemma-12B reaches 44.0 at the document level with 1-shot in-context learning. We release AfriScience-MT to support benchmarking and document-level scientific MT for African languages.

2605.29738 2026-05-29 cs.CL cs.AI

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

Volodymyr Ovcharov

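Affiliations: SecondLayer

AI summary: Proposes Multi-Legal-Bench, the first cross-jurisdictional legal benchmark, evaluating LLMs across 6 countries, 4 language families, and 134 million court decisions. Findings: few-shot effects replicate across jurisdictions, no single model dominates every language, cross-lingual transfer does not follow language proximity, and tokenizer efficiency does not significantly predict cross-lingual accuracy.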

Comments 14 pages, 5 figures, 8 tables. Dataset: https://huggingface.co/datasets/overthelex/multi-legal-bench

Abstract

Legal NLP benchmarks overwhelmingly evaluate a single language or aggregate tasks that differ fundamentally across jurisdictions, making cross-lingual comparison impossible. We introduce Multi-Legal-Bench, the first cross-jurisdictional legal benchmark that evaluates identical tasks across six countries (Ukraine, France, Netherlands, Poland, Czech Republic, Lithuania), four language families, and 134 million court decisions. The benchmark defines five tasks (court-type classification, judgment form classification, case-outcome prediction, legal norm extraction, and cause category prediction) mapped to structured metadata from national court registries, forming a deliberately sparse 5x6 task-jurisdiction matrix (20 of 30 cells filled). We evaluate 7 frontier LLMs under zero-shot and 3-shot prompting via AWS Bedrock, with 4 additional small/medium models (3-12B) for scaling analysis. Our results reveal that: (1) task-dependent few-shot effects discovered in Ukrainian replicate across all jurisdictions; (2) no single model dominates any language, as rankings shift with both task and jurisdiction; (3) cross-lingual few-shot transfer does not follow language proximity: UA->FR (Romance, -2.1 pp) transfers better than UA->PL (Slavic, -13.7 pp), with label-set alignment predicting transfer quality better than language family; and (4) tokenizer fertility, despite a 2.3x spread, does not significantly predict cross-lingual accuracy (r=-0.27, p=0.14), suggesting that model architecture and pretraining data dominate tokenizer efficiency. We release all data, prompts, and model predictions.

2605.29734 2026-05-29 cs.CL

HTAM: Hierarchical Transition-Attended Memory for Operator Optimization

Yining Zhang, Mingyang Yi, Chen Wang, Xuwen Xiang, Tianhe Jia, Zedong Dan, Chengqing Zong, Yue Wang

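Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences; Zhongguancun Academy; Renmin University of China

AI summary: Proposes HTAM, a framework that builds a Hierarchical Transition Graph (HTG) to organize coarse global directions and detailed local strategies, addressing the granularity mismatch LLMs face in GPU operator optimization and improving correctness and speedup.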

Comments 24 pages, 5 figures

Abstract

High-performance GPU kernels are essential for efficient LLM deployment, yet optimizing them remains expertise-intensive. Recent LLM-based code generation makes automatic GPU operator generation promising, but operator optimization remains a hardware-aware search problem. Existing LLM-based methods face a granularity mismatch: coarse hints are reusable but hard to execute, whereas detailed memories are actionable but enlarge the search space and obscure optimization bottlenecks. The key challenge is therefore to organize optimization experience at an appropriate granularity. To address this issue, this paper proposes HTAM (Hierarchical Transition-Attended Memory), a coarse-to-fine framework for LLM-based operator optimization. HTAM builds a two-level Hierarchical Transition Graph (HTG) to organize coarse global directions, detailed local strategies, and transition experience between optimization steps. During each evolution step, HTAM selects a global direction from the current state and recent optimization history, retrieves the corresponding local strategy memory, and uses it to guide concrete CUDA code generation. Experiments on the full KernelBench suite demonstrate that HTAM consistently improves correctness, fast-solution rate, and speedup over LLM-based baselines, while backend and Robust-KBench studies indicate transferable benefits from structured memory.

2605.29733 2026-05-29 cs.AI

Uncertainty-Aware Transfer Learning for Cross-Building Energy Forecasting: Toward Robust and Scalable District-Level Energy Management

Shadmehr Zaregarizi, Khashayar Yavari

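Affiliations: Politecnico di Torino

AI summary: Proposes an uncertainty-aware transfer learning framework based on the Temporal Fusion Transformer; by introducing a Transfer Robustness Index and a probe-only fine-tuning strategy, it achieves robust cross-building transfer and uncertainty quantification for energy forecasting.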

Comments 5 pages, 3 figures, 2 tables. Accepted at BALANCES'26 (6th ACM International Workshop on Big Data and Machine Learning for Smart Buildings and Cities), Banff, Alberta, Canada, June 22, 2026. This is the author's accepted manuscript; final published version DOI will be activated after June 22, 2026

Abstract

Scaling data-driven energy forecasting to district level requires models that can be re-used across buildings with minimal target-domain data and honest uncertainty estimates. We present an uncertainty-aware transfer learning (TL) framework for cross-building energy forecasting based on the Temporal Fusion Transformer (TFT), evaluated on a newly released high-resolution real sub-meter dataset: an educational building at Aalborg University, Denmark (source) and the multi-typology NEST building at EMPA, Switzerland (target). We introduce the Transfer Robustness Index (TRI), an architecture-agnostic metric for quantifying generalization quality across domain gaps. A four-strategy layer-freezing ablation shows that Probe-Only fine-tuning, updating only 455 output-layer parameters out of 806K, achieves the best transfer quality (TRI = 3,097), outperforming full fine-tuning and suggesting that TFT encoders learn transferable temporal representations. Monte Carlo Dropout yields a prediction interval coverage probability of 93.2%, close to the nominal 95% target. A data-scarcity analysis further shows monotonic improvement with increasing target-domain data, providing practical guidance for district energy deployment.
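The prediction interval coverage probability (PICP) cited here is the share of true values that fall inside their predicted intervals. The sketch below assumes intervals are built per time step as mean ± 1.96 standard deviations over repeated stochastic forward passes (Monte Carlo Dropout samples). The abstract does not state how the intervals are constructed, so that choice, the sample values, and the function names are illustrative only.

```python
import statistics

def interval_from_samples(samples, z=1.96):
    """Gaussian-style interval from stochastic forward-pass samples (assumed form)."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return mean - z * std, mean + z * std

def picp(y_true, intervals):
    """Fraction of observations that lie inside their prediction interval."""
    covered = sum(1 for y, (lo, hi) in zip(y_true, intervals) if lo <= y <= hi)
    return covered / len(y_true)

# Hypothetical MC Dropout samples for four time steps, and observed loads.
mc_samples = [
    [9.0, 10.0, 11.0],
    [19.0, 20.0, 21.0],
    [29.0, 30.0, 31.0],
    [39.0, 40.0, 41.0],
]
y_true = [10.5, 25.0, 30.0, 41.0]
intervals = [interval_from_samples(s) for s in mc_samples]
coverage = picp(y_true, intervals)
print(coverage)
```

A PICP close to the nominal level, such as the reported 93.2% against a 95% target, indicates intervals that are only slightly too narrow.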

2605.29729 2026-05-29 cs.LG

Realistic honeypot evaluations for scheming propensity

Victoria Krakovna, David Lindner, Lewis Ho, Sebastian Farquhar, Rohin Shah

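Affiliations: Google DeepMind

AI summary: Proposes a framework that uses coding tasks in Google's alignment research codebases as honeypots to test whether models pursue instrumental goals when given the opportunity. Gemini models show no unprompted scheming in a real deployment setting, but sometimes scheme or attempt sabotage under specific prompts.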

Abstract

We introduce scheming honeypot evaluations, a framework for testing whether models will pursue instrumental goals if given the opportunity. Our scheming honeypot evaluations take the form of coding tasks in Google's alignment research codebases. In a real internal deployment setting, Gemini models do not demonstrate unprompted scheming. If prompts explicitly encourage agency (situational awareness or goal-directedness) and/or give the model a hidden goal, models sometimes scheme or attempt sabotage. Validating the realism of our setting, models show low rates of evaluation awareness, usually due to agency prompts rather than the environments.

2605.29727 2026-05-29 cs.LG

Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting

Soowon Oh, Nam Cao, Yujin Kim, Hojung Jung, Huzama Ahmad, Sangmin Bae, Se-Young Yun

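Affiliations: KAIST AI; Samsung Advanced Institute of Technology

AI summary: Proposes BASTION, a budget-aware speculative decoding framework that dynamically builds query-dependent trees by balancing draft quality against hardware constraints. It is training-free, preserves the target model's distribution, and reaches up to a 6.61x speedup.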

Abstract

Block-diffusion drafters have recently emerged as a powerful alternative for speculative decoding by predicting multiple future-token distributions in a single parallel step. However, since these parallel predictions are sampled from position-wise marginals rather than fully conditioned sequences, committing to a single greedy path often fails to capture the target model's preferred trajectory. To address this, we propose BASTION, a budget-aware speculative decoding framework with tree-based diffusion drafting. Unlike existing methods that rely on static tree topologies, BASTION dynamically constructs query-dependent trees by balancing draft quality against hardware constraints. Our framework integrates three synergistic components: (1) an acceptance surrogate that estimates expected accepted length via path confidence, (2) an online latency estimator that calibrates a hardware-aware roofline model, and (3) an adaptive best-first expansion that grows the tree until marginal gains no longer justify incremental verification costs. BASTION is training-free, preserves the target model's distribution, and requires no per-setting tuning. Across diverse benchmarks and GPU architectures, BASTION achieves up to a 6.61x speedup over standard autoregressive decoding, outperforming state-of-the-art block-diffusion baselines by 39%.
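The best-first expansion with a stopping rule can be sketched in a few lines. The sketch rests on two simplifying assumptions that are not from the paper: a node's path confidence (the product of draft probabilities along its path) stands in for its marginal gain in expected accepted length, and verification cost is a fixed per-node constant rather than a calibrated roofline estimate. All names are hypothetical.

```python
import heapq

def best_first_tree(children_of, cost_per_node, max_nodes=64):
    """Grow a draft tree by always adding the most confident frontier node.

    children_of(path) returns (token, prob) candidates that extend `path`.
    Expansion stops once the best remaining path confidence falls below
    the assumed per-node verification cost, or the node budget is reached.
    """
    # Heap entries are (-path_confidence, path) so the best path pops first.
    frontier = [(-prob, (tok,)) for tok, prob in children_of(())]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(tree) < max_nodes:
        neg_conf, path = heapq.heappop(frontier)
        conf = -neg_conf
        if conf < cost_per_node:
            break  # every remaining node is at most this confident
        tree.append((path, conf))
        for tok, prob in children_of(path):
            heapq.heappush(frontier, (-(conf * prob), path + (tok,)))
    return tree

def toy_children(path):
    """Toy drafter: two candidates per position, up to depth 3."""
    if len(path) >= 3:
        return []
    return [("a", 0.8), ("b", 0.2)]

tree = best_first_tree(toy_children, cost_per_node=0.3)
print([path for path, _ in tree])
```

Because a child's confidence never exceeds its parent's, the first frontier node that fails the cost test implies all later ones fail too, so stopping at that point is safe within this simplified model.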

2605.29726 2026-05-29 cs.CV

SLAD: Shared LoRA Adapters for Task Specific Distillation

Reda Bensaid, Yassir Bendou, Vincent Gripon, François Leduc-Primeau

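Affiliations: IMT Atlantique; Polytechnique Montréal

AI summary: Proposes SLAD, which aligns teacher and student feature representations by sharing low-rank adapter parameters, enabling efficient knowledge distillation and reaching state-of-the-art performance on multiple classification and segmentation datasets.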

Comments CVPR Findings 2026

Abstract

In the context of resource-constrained environments such as embedded systems, adapting reduced-size foundation models to downstream tasks has become increasingly popular. This has recently motivated the emerging setting of task-specific distillation, where a larger and a smaller version of the same foundation model are both adapted to the same downstream task, with the goal of transferring knowledge from the former to the latter. Recent work has demonstrated the benefits of using a larger version of the same foundation model to assist the adaptation of a smaller one. Typically, the larger model (teacher) is first adapted via fine-tuning or linear probing before its knowledge is distilled into the smaller model (student). While fine-tuning the teacher often increases its performance, recent work showed that probing it leads to better knowledge distillation to the student. Our findings show that this is mainly due to a mis-alignment in feature representation between the teacher and the student which occurs during the teacher's fine-tuning. Inspired by existing efforts to preserve previously learned knowledge, we first propose leveraging low-rank adaptation, resulting in better feature alignment and therefore better knowledge transfer. Drawing from this insight, we further enhance the feature alignment through a parameter-sharing strategy of the adapters between the two encoders during joint training. Our proposed method, SLAD, shows better feature alignment between the teacher and student, which results in increased performance for not only the student but also the teacher model, while being 2x faster to train than fine-tuning. Through extensive experiments on multiple classification and segmentation datasets, we demonstrate the improved accuracy and transfer efficiency of our method, achieving state-of-the-art performance in the task-specific distillation framework.

2605.29720 2026-05-29 cs.CV cs.LG

Efficient, Validation-Free Intrinsic Quality Estimation for Large-Scale Face Recognition Datasets

Zhichao Chen, Yongle Zhao, Kaicheng Yang, Meng Yang, Yin Xie, Ziyong Feng

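Affiliations: School of Cyber Science and Technology, University of Science and Technology of China

AI summary: Proposes Intrinsic Quality (IQ), a metric requiring no full-scale training that estimates a face recognition dataset's potential to yield high-performance models from a neighbor-consistency score and global representation subspace complexity, enabling fast dataset diagnosis and curation.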

Comments ICML 2026

Abstract

We propose Intrinsic Quality (IQ), a validation-free metric designed to estimate the inherent potential of face recognition (FR) datasets to produce high-performance models without the need for full-scale training. IQ integrates two components: (i) a Neighbor-Consistency Score that quantifies local identity label agreement via nearest neighbors, and (ii) Global Representation Subspace Complexity (Effective Rank, ER), which captures the underlying embedding geometry and dataset diversity. IQ allows for rapid evaluation using lightweight proxy models or data subsets, facilitating dataset diagnosis and curation prior to resource-intensive full-scale training. We describe an experimental protocol tailored to clean, noisy, and mixed-quality FR datasets, and outline evaluation methodologies to validate IQ's predictive power for downstream performance.
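Both IQ components can be illustrated on toy data. The sketch below assumes the widely used entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized singular values) and a plain k-nearest-neighbour label agreement under Euclidean distance. The abstract does not give the exact formulas or how the two scores are combined, so these choices and the function names are illustrative only.

```python
import math

def effective_rank(singular_values):
    """exp(entropy) of the singular values normalized to sum to 1."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)

def neighbor_consistency(embeddings, labels, k=1):
    """Average share of each sample's k nearest neighbours with the same label."""
    scores = []
    for i, emb in enumerate(embeddings):
        dists = sorted(
            (math.dist(emb, other), j)
            for j, other in enumerate(embeddings)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        scores.append(sum(labels[j] == labels[i] for j in neighbors) / k)
    return sum(scores) / len(scores)

# Hypothetical 2-D embeddings; the last identity label is deliberately noisy.
embeddings = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["x", "x", "y", "x"]
print(neighbor_consistency(embeddings, labels), effective_rank([1.0, 1.0, 1.0, 1.0]))
```

In this toy case the mislabelled pair lowers the consistency score to one half, while a flat singular-value spectrum gives the maximum effective rank for its dimension.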

2605.29716 2026-05-29 cs.AI

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

Shuaidi Wang, Zhan Zhuang, Ruping Huang, Yu Zhang

Affiliations: Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Department of Computer Science, City University of Hong Kong, Hong Kong, China; Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

AI summary: For diffusion LLMs, proposes Noise-aware Low-Rank Adaptation (NaRA), in which a noise-conditioned hypernetwork generates a low-rank core matrix so that the update matrices vary continuously along the denoising trajectory; it outperforms noise-agnostic baselines on commonsense reasoning, mathematical reasoning, and code generation benchmarks.

详情
AI中文摘要

扩散大语言模型(dLLMs)已成为一种有前途的非自回归生成范式。鉴于全微调的计算成本过高,参数高效微调(PEFT)已成为标准方法。然而,现有的PEFT方法(如LoRA)最初是为自回归模型设计的,依赖于静态参数,对噪声水平不敏感。因此,它们忽略了扩散过程的内在动态性,其中输入分布和生成难度沿去噪轨迹显著变化,使得它们对dLLMs而言是次优的。为了解决这个问题,我们提出了噪声感知低秩适配(NaRA),它引入了一个由轻量级、全局共享的超网络根据噪声水平生成的低秩核心矩阵。这种设计使得更新矩阵能够沿扩散过程连续变化,同时保持参数和延迟开销可忽略不计。我们为所提出的NaRA框架提供了理论依据,并在常识推理、数学推理和代码生成基准上实证证明了其相对于噪声无关基线的持续改进。我们的代码可在https://github.com/generaldi/NaRA获取。

英文摘要

Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive generative paradigm. Given the prohibitive computational cost of full fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) has become the standard approach. However, existing PEFT methods (e.g., LoRA), originally tailored for autoregressive models, rely on static parameters that are agnostic to the noise level. Consequently, they ignore the intrinsic dynamics of the diffusion process, where input distributions and generation difficulty shift significantly along the denoising trajectory, rendering them suboptimal for dLLMs. To address this, we propose Noise-aware Low-Rank Adaptation (NaRA), which introduces a low-rank core matrix generated by a lightweight, globally shared hypernetwork conditioned on the noise level. This design enables the update matrices to vary continuously along the diffusion process while keeping parameter and latency overhead negligible. We provide a theoretical justification for the proposed NaRA framework and empirically demonstrate consistent improvements over noise-agnostic baselines across commonsense reasoning, mathematical reasoning, and code generation benchmarks. Our code is available at https://github.com/generaldi/NaRA.
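A minimal sketch of a noise-conditioned low-rank update, under stated assumptions: the abstract only says that a low-rank core matrix is generated by a shared hypernetwork conditioned on the noise level, so the factorisation ΔW(t) = B · C(t) · A and the toy "hypernetwork" C(t) = base + t · slope below are hypothetical stand-ins, not the authors' architecture. All names are illustrative.

```python
def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def core_from_noise(t, base, slope):
    """Toy stand-in for the hypernetwork: an r x r core that varies linearly with t."""
    return [[b + t * s for b, s in zip(rb, rs)] for rb, rs in zip(base, slope)]

def delta_w(t, B, A, base, slope):
    """Assumed update form: delta_W(t) = B @ C(t) @ A, with B and A fixed across noise levels."""
    return matmul(matmul(B, core_from_noise(t, base, slope)), A)

# Rank-2 adapter for a 3 x 3 weight matrix (toy values).
B = [[1, 0], [0, 1], [1, 1]]        # 3 x 2
A = [[1, 0, 0], [0, 1, 0]]          # 2 x 3
base = [[1, 0], [0, 1]]             # core at t = 0
slope = [[1, 0], [0, -1]]           # how the core changes with noise level

update_clean = delta_w(0, B, A, base, slope)   # identical to a static LoRA update B @ A
update_noisy = delta_w(1, B, A, base, slope)   # same factors, different core
```

At t = 0 the update reduces to the static product B · A; changing t alters only the small r × r core, which is why the added parameter count stays small relative to the adapted layer.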

2605.29715 2026-05-29 cs.CL

User-Aware Active Knowledge Acquisition for Emotional Support Dialogue

面向情感支持对话的用户感知主动知识获取

Mufan Xu, Kehai Chen, Jiahao Hu, Xinchao Xu, Muyun Yang, Tiejun Zhao, Min Zhang

发表机构 * Harbin Institute of Technology, China(哈尔滨工业大学) Baidu Inc., Beijing, China(百度公司)

AI总结 提出用户感知主动知识获取(UKA)框架,通过心智理论(Theory-of-Mind)不确定性估计和主动学习,在情感支持对话中高效获取用户对齐的对话知识,提升对话质量和用户对齐。

详情
AI中文摘要

情感支持在对话系统中扮演重要角色,其成功取决于在多轮交互中适应用户不断变化且隐含的需求,同时利用大语言模型的强大推理能力。然而,由于用户需求的信号通常微弱、间接,且只能通过多轮交互来消除歧义,现有的情感支持方法往往难以高效获取和泛化相关的对话知识。为弥补这一差距,我们引入了用户感知主动知识获取(UKA),这是一种无梯度的主动对话学习框架,明确表示用户需求的不确定性,并将主动学习融入知识获取和响应选择中。我们提出了一种心智理论(Theory-of-Mind)不确定性估计机制,使模型能够优先选择响应,从而引发更多信息性的用户反馈。UKA能够在训练期间高效探索用户对齐的对话知识,同时在测试时保持鲁棒性。在多个对话基准和模型架构上的实验表明,我们的方法在对话质量和用户对齐方面始终优于强基线。

英文摘要

Emotional support plays an important role in dialogue systems, and its success depends on adapting to a user's evolving and implicit needs across multi-turn interactions while leveraging the strong reasoning capacity of large language models. However, since signals about user needs are often weak, indirect, and can only be disambiguated through multi-turn interaction, existing emotional support methods often struggle to acquire and generalize relevant conversational knowledge efficiently. To bridge this gap, we introduce User-Aware Active Knowledge Acquisition (UKA), a gradient-free active dialogue learning framework that explicitly represents uncertainty about user needs and incorporates active learning into both knowledge acquisition and response selection. We propose a Theory-of-Mind uncertainty estimation mechanism that allows the model to prioritize responses, thereby eliciting more informative user feedback. UKA is capable of efficiently exploring user-aligned conversational knowledge during training while maintaining robustness at test time. Experiments across multiple dialogue benchmarks and model architectures demonstrate that our approach consistently outperforms strong baselines in dialogue quality and user alignment.

2605.29714 2026-05-29 cs.CL

Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation

利用混合专家模型中的路由动态实现高效语言适配

Aditi Khandelwal, Marius Mosbach, Verna Dankers, Siva Reddy, Golnoosh Farnadi

发表机构 * Mila – Quebec AI Institute & McGill University(魁北克AI研究院与麦吉尔大学)

AI总结 研究英语中心混合专家模型在多语言持续预训练中的路由动态,发现早期和中间层路由分散且语言无关,最终层出现语言专化,并提出仅更新最终层语言特定和共享专家的参数高效适配策略。

详情
AI中文摘要

混合专家(MoE)模型被广泛用于扩展语言模型,但其专家路由行为和多语言环境下的适配仍未被充分探索。在这项工作中,我们研究了在英语中心的MoE模型上使用多语言语料库进行持续预训练时的多语言路由动态,分析了专家使用如何随语言变化。我们发现,持续的多语言预训练导致早期和中间层出现分散的、与语言无关的路由,而语言专化主要出现在最终层。我们还表明,语言之间的token级词汇重叠在路由方式中起着重要作用。受这些发现启发,我们提出了一种参数高效的适配策略,仅更新最终MoE层中的语言特定和共享专家。在MultiBLiMP和Belebele上的实验表明,我们的方法实现了强大的性能-效率权衡,在更新不到2%参数的情况下,达到了与微调整个最终层相竞争的性能。总体而言,我们的发现揭示了在持续预训练期间MoE中语言专化出现的位置和方式,并为低资源多语言适配提供了实用见解。我们的代码可在https://github.com/aditi184/moe-routing-adaptation获取。

英文摘要

Mixture-of-Experts (MoE) models are widely used to scale language models, yet their expert routing behavior and adaptation in a multilingual setting remain underexplored. In this work, we study multilingual routing dynamics during continual pre-training of an English-centric MoE model on a multilingual corpus, analyzing how expert usage varies across languages. We find that continual multilingual pre-training leads to diffused, language-agnostic routing in early and middle layers, with language specialization primarily emerging in the final layers. We also show that token-level vocabulary overlap between languages plays an important role in how languages are routed. Motivated by these findings, we propose a parameter-efficient adaptation strategy that updates language-specific and shared experts in the final MoE layers. Experiments on MultiBLiMP and Belebele show that our method achieves a strong performance-efficiency trade-off, attaining competitive performance relative to fine-tuning complete final layers, while updating less than 2% of the parameters. Overall, our findings provide insights into where and how language specialization emerges in MoEs during continual pre-training and provide practical insights for low-resource multilingual adaptation. Our code is available at https://github.com/aditi184/moe-routing-adaptation.

2605.29713 2026-05-29 cs.LG cs.AI

The Little Book of Generative AI Foundations: An Intuitive Mathematical Primer

生成式AI基础小书:直观数学入门

Tianhua Chen

发表机构 * School of Computing and Engineering(计算与工程学院)

AI总结 本书通过推导导向的方式,从PCA到能量模型,系统介绍现代生成式人工智能的数学基础,旨在使生成建模结构更易理解。

Comments Preprint version, 178 pages. Comments and corrections are welcome

详情
AI中文摘要

本书提供了对现代生成式人工智能数学基础的紧凑、推导导向的介绍。它不是调查每一个最近的架构或实现细节,而是通过连接主要生成模型家族的思想发展出一条连贯的路线,从PCA、概率PCA、变分自编码器和扩散模型到归一化流、自回归分解、GANs、Wasserstein GANs和基于能量的模型。目的是使生成建模的结构更易理解,同时不失去理解这些模型如何推导和关联所需的数学实质。本书旨在为具有数学好奇心的研究人员、从业者和学生提供基础构建的入门读物。

英文摘要

This book provides a compact, derivation-oriented introduction to the mathematical foundations of modern generative artificial intelligence. Rather than surveying every recent architecture or implementation detail, it develops a coherent route through the ideas connecting major families of generative models, from PCA, probabilistic PCA, variational autoencoders, and diffusion models to normalising flows, autoregressive factorisations, GANs, Wasserstein GANs, and energy-based models. The aim is to make the structure of generative modelling more accessible without removing the mathematical substance needed to understand how these models are derived and related. The book is intended as a foundation-building primer for mathematically curious researchers, practitioners, and students.

2605.29712 2026-05-29 cs.CL cs.AI

Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies

教会语言模型使用人类应试策略检查基于事实的声明真实性

Yuxuan Ye, Raul Santos-Rodriguez, Edwin Simpson

发表机构 * Intelligent Systems Laboratory(智能系统实验室) University of Bristol(布里斯托大学)

AI总结 将基于事实的声明真实性检查建模为真假阅读理解任务,通过提示语言模型使用明确的应试策略进行高效推理,并训练小语言模型以降低推理成本。

Comments ACL 2026 Main

详情
AI中文摘要

基于事实的声明真实性检查对于大型语言模型(LLM)应用(如检索增强生成)非常重要,因为它帮助用户评估生成输出的正确性。现有的使用蕴含分类器的指标需要针对数据集调整阈值,而基于LLM的方法通常使用直接提示,这未能充分利用LLM的推理能力。我们通过将基于事实的声明真实性检查建模为真假阅读理解任务,并提示LLM使用明确的应试策略进行高效推理来解决这一问题。与无引导的开放式推理相比,我们的方法减少了超过80%的令牌使用量,并在两个真实性基准测试中取得了与更昂贵替代方案竞争的性能,在一个基准上达到了新的最先进水平。为了进一步降低推理成本,我们训练小语言模型(SLM)来替代检查流程中的LLM。通过监督微调(SFT)和自我修正机制,SLM学会了改进其真实性判断。实验结果表明,生成的SLM在性能上与强基线相当,结合了低推理成本和生成支持理由以支持可解释性。代码和数据集将在接收后发布。

英文摘要

Grounded claim factuality checking is important for large language model (LLM) applications such as retrieval-augmented generation, as it helps users assess the correctness of generated outputs. Existing metrics using entailment classifiers require dataset-specific threshold tuning, while LLM-based approaches often use direct prompting, which underutilises the reasoning capabilities of LLMs. We address this by formulating grounded claim factuality checking as a true/false reading comprehension task and prompting LLMs with explicit test-taking strategies for efficient reasoning. Our method reduces token usage by over 80% compared to unguided open-ended reasoning, and achieves competitive performance to more expensive alternatives across two factuality benchmarks, setting a new state of the art on one. To further reduce inference cost, we train small language models (SLMs) to replace LLMs in the checking pipeline. Using supervised fine-tuning (SFT) and a self-revision mechanism, the SLMs learn to improve their factuality judgements. Experimental results show that the resulting SLMs perform on par with strong baselines, combining low inference costs with generating supporting rationales to support interpretability. Code and datasets will be released upon acceptance.

2605.29711 2026-05-29 cs.CL cs.AI

Personalized Turn-Level User Conversation Satisfaction Benchmark

个性化轮级用户对话满意度基准

Zhefan Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang, Quanjia Yan, Hengliang Luo

发表机构 * Department of Computer Science and Technology, Tsinghua University, Beijing, China.(清华大学计算机科学与技术系,北京,中国) Institute for AI Industry Research, Tsinghua University, Beijing, China.(清华大学人工智能产业研究院,北京,中国) Meituan(美团)

AI总结 针对AI助手响应的个性化满意度评估问题,提出结合用户记忆与目标轮上下文的满意度评估器,并构建PersTurnBench基准,通过回放实现生成模型的受控比较。

详情
AI中文摘要

用户对AI助手的满意度高度个性化:同一响应可能满足一个用户但令另一个失望,取决于每个用户的期望以及他们之前询问的内容。现有的自动评估方法大多衡量通用响应质量,难以判断某个响应在特定轮次是否满足用户。我们将此问题作为个性化轮级用户对话满意度评估进行研究。我们构建了一个对话满意度评估器,将紧凑的用户记忆与目标轮上下文相结合,生成满意度分数和不满意的理由。与人类满意度标注的元评估表明,个性化记忆和事后分数校准在有序一致性和不满意轮次检测上优于监督式、检索式和通用LLM作为评判者的基线。我们进一步引入了PersTurnBench,这是一个个性化轮级用户对话满意度基准,通过回放使用经过验证的评估器来评估生成模型。通过固定回放状态,PersTurnBench能够在无需为每个候选模型收集新人工标签的情况下,对通用生成模型和记忆增强的个性化系统进行受控比较。该评估器和基准让研究人员能够在无需为每个模型收集新用户反馈的情况下,比较候选生成模型在个性化满意度上的表现。

英文摘要

User satisfaction with AI assistants is highly personalized: the same response may satisfy one user but disappoint another depending on what each user expects and what they have asked for before. Existing automatic evaluation methods mostly measure generic response quality, making it difficult to judge whether a response satisfies a user at a specific turn. We study this problem as personalized turn-level user conversation satisfaction evaluation. We build a conversation satisfaction evaluator that combines compact user memories with target-turn context to produce satisfaction scores and dissatisfaction-oriented rationales. Meta-evaluation against human satisfaction annotations shows that personalized memory and post-hoc score calibration improve ordinal agreement and dissatisfied-turn detection over supervised, retrieval-based, and generic LLM-as-a-judge baselines. We further introduce PersTurnBench, a personalized turn-level user conversation satisfaction benchmark that uses the verified evaluator to assess generation models via replay. By holding the replay state fixed, PersTurnBench enables controlled comparison of generic generation models and memory-augmented personalized systems without new human labels for every candidate model. The evaluator and benchmark let researchers compare candidate generation models on personalized satisfaction without collecting new user feedback for every model.

2605.29710 2026-05-29 cs.RO

PhAIL: A Real-Robot VLA Benchmark and Distributional Methodology

PhAIL:一个真实机器人VLA基准测试与分布性方法论

Sergey Arkhangelskiy

发表机构 * Positronic Robotics(Positronic机器人公司)

AI总结 针对现有VLA策略评估中样本量小、统计比较不可靠的问题,提出PhAIL基准测试,采用时间-成功累积分布函数作为评估基元,通过人类相对吞吐量评分和Kolmogorov-Smirnov显著性检验,在少量rollout下实现更可靠的模型比较。

Comments 22 pages, 10 figures, 8 tables. Dataset, analysis pipeline, and paper source: https://phail.ai and https://github.com/Positronic-Robotics/phail-paper

详情
AI中文摘要

视觉-语言-动作(VLA)策略的真实世界评估仍然依赖于固定超时下的二元成功率,每个条件最多进行$N \le 25$次rollout,几乎总是没有置信区间或配对统计比较;这些队列规模难以可靠地解决接近的比较。我们引入了PhAIL(物理AI排行榜,https://phail.ai),这是一个基于Franka FR3的开放真实机器人基准测试(包括数据集、每次rollout的工件和端到端参考实现),采用分布性评估方法论:以时间-成功累积分布函数(CDF)作为评估基元,分为两个独立任务。第一个是通过人类相对吞吐量(HRT)进行评分,这是一个具有bootstrap置信区间的无量纲标量,锚定于同一设备的远程操作。第二个是显著性检验(Kolmogorov-Smirnov,按对象计算并在对象间进行宏观平均)。在四个公开可用的VLA上,宏观平均KS检验在每(模型,对象)单元$N \le 30$次rollout下解决了两个接近的比较(GR00T vs. ACT,OpenPI vs. ACT),而二元阈值指标无法做到;最接近的一对(OpenPI vs. GR00T)在我们的预算内仍未解决。评估中最佳的VLA每次操作比人类参考慢约$7\times$(RMST比率)。

英文摘要

Real-world evaluation of vision-language-action (VLA) policies still rests on binary success rate at a fixed timeout with $N \le 25$ rollouts per condition, almost always without confidence intervals or paired statistical comparison; these cohort sizes struggle to resolve close comparisons reliably. We introduce PhAIL (Physical AI Leaderboard, https://phail.ai), an open real-robot benchmark on a Franka FR3 (dataset, per-rollout artifacts, and end-to-end reference implementation) of a distributional evaluation methodology: the time-to-success cumulative distribution function (CDF) as the evaluation primitive, with two separated jobs. The first is scoring via Human-Relative Throughput (HRT), a dimensionless scalar with bootstrap confidence intervals, anchored to same-fixture human teleoperation. The second is a significance test (Kolmogorov-Smirnov, computed per-object and macro-averaged across objects). On four publicly-available VLAs, the macro-averaged KS test resolves two close comparisons (GR00T vs. ACT, OpenPI vs. ACT) at $N \le 30$ rollouts per (model, object) cell where binary-threshold metrics do not; the closest pair (OpenPI vs. GR00T) remains unresolved within our budget. The best evaluated VLA is $\sim 7\times$ slower per operation (RMST ratio) than the human reference.
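The contrast between a binary success rate and a comparison of time-to-success CDFs can be sketched as follows. This is an illustration under assumptions, not the PhAIL reference implementation: failed rollouts are encoded as `math.inf` so they never contribute to the CDF, only the two-sample Kolmogorov-Smirnov statistic is computed (no p-value), and the per-object results are combined with a plain mean, which may differ from the paper's macro-averaging. All names and rollout times are hypothetical.

```python
import math

def success_cdf(times, t):
    """Fraction of rollouts that succeeded by time t; failures are math.inf."""
    return sum(x <= t for x in times) / len(times)

def ks_statistic(times_a, times_b):
    """Largest gap between the two time-to-success CDFs over observed success times."""
    grid = sorted({x for x in times_a + times_b if math.isfinite(x)})
    return max(
        (abs(success_cdf(times_a, t) - success_cdf(times_b, t)) for t in grid),
        default=0.0,
    )

def macro_ks(per_object):
    """Plain mean of per-object KS statistics (assumed form of macro-averaging)."""
    stats = [ks_statistic(a, b) for a, b in per_object.values()]
    return sum(stats) / len(stats)

# Hypothetical rollout times in seconds for two policies on one object.
policy_a = [10, 12, 15, math.inf]
policy_b = [20, 25, math.inf, math.inf]

gap = ks_statistic(policy_a, policy_b)
same = ks_statistic(policy_a, policy_a)
macro = macro_ks({"cup": (policy_a, policy_b), "box": (policy_a, policy_a)})
```

In this toy case the final success rates are 0.75 and 0.50, while the largest CDF gap is 0.75 (at 15 s, where one policy has already finished three rollouts and the other none), so the distributional view registers a speed difference that the end-of-timeout success rate compresses.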

2605.29708 2026-05-29 cs.CL

Understanding Safety-Sensitive Expert Behavior in Mixture-of-Experts LLMs

理解混合专家大语言模型中的安全敏感专家行为

Zhibo Zhang, Yuxi Li, Zhen Ouyang, Ling Shi, Kailong Wang

发表机构 * Huazhong University of Science and Technology, Wuhan, China(华中科技大学,武汉,中国) Nanyang Technological University, Singapore(南洋理工大学,新加坡)

AI总结 通过提出RASET框架,研究混合专家大语言模型中安全对齐与路由专家专业化之间的关系,发现路由模式主要由主题驱动,而安全行为可通过调整少数专家改变而不影响路由路径。

Comments 11 pages, 4 figures

详情
AI中文摘要

混合专家(MoE)大语言模型依赖于稀疏的、由路由器驱动的专家激活,然而安全对齐如何与路由专家专业化相互作用仍未被充分探索。一种常见的直觉是,安全行为可能通过将有害请求路由到不同的拒绝导向专家来控制。在这项工作中,我们为不同的情况提供了经验证据:对齐的MoE大语言模型中的路由模式主要是主题驱动的,而安全行为可以在不改变模型固有路由路径的情况下被改变。基于这一观察,我们提出了**RASET**(**R**outer-**A**gnostic **S**afety-critical **E**xpert **T**uning,路由器无关的安全关键专家微调),这是一个红队框架,用于探测集中在少数专家中的安全执行,同时保持模型固有的路由行为。**RASET**通过对比路由敏感性标准识别安全关键专家,并仅对选定的专家应用参数高效微调,从而相对于路由器干预最小化语义干扰。这些结果揭示了独特的MoE安全风险,强调了需要专家感知的对齐机制。

英文摘要

Mixture-of-Experts (MoE) LLMs rely on sparse, router-driven expert activation, yet how safety alignment interacts with routed expert specialization remains underexplored. A common intuition is that safety behavior may be controlled by routing harmful requests to distinct refusal-oriented experts. In this work, we provide empirical evidence for a different picture: routing patterns in aligned MoE LLMs are largely topic-driven, while safety behavior can be altered with little change to the model's intrinsic routing path. Motivated by this observation, we present **RASET** (**R**outer-**A**gnostic **S**afety-critical **E**xpert **T**uning), a red-teaming framework that probes safety enforcement that is localized in a small subset of experts while preserving the model's intrinsic routing behavior. **RASET** identifies safety-critical experts via a contrastive routing-sensitivity criterion and applies parameter-efficient tuning only to the selected experts, minimizing semantic disruption relative to router-steering interventions. These results reveal a distinct MoE safety risk, highlighting the need for expert-aware alignment mechanisms.

2605.29707 2026-05-29 cs.CL

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Domino: 在推测解码中将因果建模与自回归草稿解耦

Jianuo Huang, Yaojie Zhang, Qituan Zhang, Hao Lin, Hanlin Xu, Linfeng Zhang

发表机构 * EPIC Lab, Shanghai Jiao Tong University(上海交通大学EPIC实验室) School of Software Engineering, HUST(华中科技大学软件学院) UESTC(电子科技大学) Fudan University(复旦大学) Huawei(华为公司)

AI总结 提出Domino框架,通过并行草稿骨干和轻量级Domino头解耦因果依赖建模与自回归草稿执行,结合基础锚定训练课程,在Qwen3模型上实现高达5.49倍端到端加速和5.8倍吞吐量加速。

详情
AI中文摘要

推测解码通过草拟多个令牌并与目标模型并行验证来加速LLM推理。然而,其实际加速受限于草稿质量与草稿成本之间的权衡:自回归草稿器建模草稿令牌间的因果依赖但引入顺序开销,而并行草稿器降低草稿成本但削弱块内依赖建模。本文提出Domino,一种将因果依赖建模与昂贵的自回归草稿执行解耦的推测解码框架。Domino首先使用并行草稿骨干为整个块生成初步草稿分布,然后应用轻量级Domino头以前缀依赖的因果信息对其进行细化。为稳定教师强制因果编码,我们进一步引入基础锚定训练课程,首先强化并行骨干,然后逐步将优化转向因果修正的最终分布。在Qwen3模型上的实验表明,Domino在Transformers后端下实现高达5.49倍的端到端加速,在SGLang服务下实现高达5.8倍的吞吐量加速。

英文摘要

Speculative decoding accelerates LLM inference by drafting multiple tokens and verifying them in parallel with the target model. However, its practical speedup is constrained by the trade-off between draft quality and drafting cost: autoregressive drafters model causal dependencies among draft tokens but incur sequential overhead, while parallel drafters reduce drafting cost but weaken intra-block dependency modeling. In this paper, we propose Domino, a speculative decoding framework that decouples causal dependency modeling from expensive autoregressive draft execution. Domino first uses a parallel draft backbone to produce preliminary draft distributions for the entire block, and then applies a lightweight Domino head to refine them with prefix-dependent causal information. To stabilize teacher-forced causal encoding, we further introduce a base-anchored training curriculum that first strengthens the parallel backbone and then gradually shifts optimization toward the causally corrected final distribution. Experiments on Qwen3 models show that Domino achieves up to $5.49\times$ end-to-end speedup under the Transformers backend and up to $5.8\times$ throughput speedup under SGLang serving.

2605.29705 2026-05-29 cs.AI

BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices

BitTP:面向边缘设备的轻量级轨迹预测模型与BitLLM

Mincheol Kang, Hyunjin Lim, Bomin Kang, Daehee Park

发表机构 * KAIST, Republic of Korea(韩国科学技术院) DGIST, Republic of Korea(大邱庆北科学技术院)

AI总结 提出BitTP,通过将LLM轨迹预测器转换为1.58比特轻量架构,在保持或提升预测质量的同时大幅降低内存和计算需求,实现边缘设备部署。

Comments Camera-ready version. Accepted as a findings paper at CVPR 2026. 8 pages, 4 figures

详情
AI中文摘要

轨迹预测是自主系统的一项基本任务,需要对多智能体交互和意图进行复杂推理。大型语言模型(LLM)最近被用于此任务,因为它们提供了强大的上下文推理和可解释的、基于语言的轨迹表示。然而,这些基于LLM的预测器极其消耗内存和计算资源,难以部署在资源受限的边缘设备上,例如自主机器人的车载计算机。为弥合这一差距,我们提出BitTP,它将基于LLM的轨迹预测器转换为轻量级比特线性架构。我们证明,仅权重量化到1.58比特(BitTP-Weight)是最优的。关键在于,激活值必须保持全精度,因为量化它们会导致时空推理的严重退化和不稳定性。实验表明,BitTP-Weight不仅保持了全精度(BF16)LLM基线的预测质量,还提升了质量,平均ADE降低14.29%,FDE降低20.97%,同时相比其他量化方法减少了内存使用和推理延迟。这些结果表明,精心设计的量化可作为有效的正则化器,使得基于LLM的复杂推理能够在边缘设备上实际部署。代码地址:https://github.com/MintCat98/BitTP。

英文摘要

Trajectory prediction is a fundamental task for autonomous systems, requiring complex reasoning about multi-agent interactions and intents. Large language models (LLMs) have recently been adopted for this task, as they provide strong contextual reasoning and interpretable, language-based trajectory representations. However, these LLM-based predictors are extremely memory- and compute-intensive, making them difficult to deploy on resource-constrained edge devices such as on-board computers in autonomous robots. To bridge this gap, we propose BitTP, which converts an LLM-based trajectory predictor into a lightweight bitlinear architecture. We demonstrate that weight-only quantization to 1.58-bit (BitTP-Weight) is optimal. Crucially, activations must remain in full precision, as quantizing them leads to severe degradation and instability in spatio-temporal reasoning. Empirically, BitTP-Weight not only preserves but improves prediction quality over the full-precision (BF16) LLM baseline, reducing ADE by 14.29% and FDE by 20.97% on average, while simultaneously reducing memory usage and inference latency relative to other quantization methods. These results demonstrate that carefully designed quantization acts as an effective regularizer, enabling the practical deployment of sophisticated LLM-based reasoning on edge devices. Code is available at: https://github.com/MintCat98/BitTP.
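Weight-only ternary quantization can be sketched in a few lines. The abstract does not spell out the quantizer, so this assumes the absmean scheme commonly associated with 1.58-bit BitNet-style layers: scale by the mean absolute weight, round, and clip to {-1, 0, 1}, with activations left in full precision. Function names and values are hypothetical, and Python's `round` uses round-half-to-even, which may differ from a given framework's rounding.

```python
def ternary_quantize(weights, eps=1e-8):
    """Assumed absmean quantizer: map each weight to -1, 0, or +1 plus one shared scale."""
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps          # mean absolute weight
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale

def bitlinear(x, q, scale):
    """y = scale * (q @ x); the activations x are not quantized."""
    return [scale * sum(qi * xi for qi, xi in zip(row, x)) for row in q]

# Toy 2 x 3 weight matrix and a full-precision activation vector.
W = [[0.9, -0.1, 0.0], [-1.2, 0.4, 0.4]]
q, scale = ternary_quantize(W)
y = bitlinear([1.0, 2.0, 3.0], q, scale)
```

Each weight now takes one of three values (log2(3) ≈ 1.58 bits), and the matrix product reduces to additions and subtractions of activations followed by a single multiplication by the scale.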

2605.29704 2026-05-29 cs.RO

FLIP: Real-Time and Resilient Formation Planning for Large-Scale DIstributed Swarms via Point Cloud Registration

FLIP:通过点云配准实现大规模分布式集群的实时弹性编队规划

Yuan Zhou, Guangtong Xu, Zhenyu Hou, Jialiang Hou, Fei Gao

发表机构 * Institute of Cyber-Systems and Control, College of Control Science and Engineering, Zhejiang University(浙江大学控制科学与工程学院智能系统与控制研究所) Huzhou Institute, Zhejiang University(浙江大学湖州研究院)

AI总结 提出将最优编队位置序列计算转化为时空点云配准问题,利用带离群点剔除的PCR方法实现大规模分布式集群的弹性、高效轨迹规划。

详情
AI中文摘要

传统的大规模编队规划要么过度简化编队表示导致性能不佳,要么采用完全协作关系导致计算负载过大。为了实现高性能和大规模编队规划,我们将最优编队位置序列(OFPS)计算问题转化为时空点云配准(PCR)问题。每个智能体通过分布式计算自身当前位置与所有其他智能体期望编队位置之间的匹配结果来获得OFPS。然后每个智能体利用OFPS优化协作编队轨迹。我们利用带离群点剔除的PCR方法快速执行大规模编队位置配准。这可以防止次优轨迹和故障智能体通过协作网络传播并影响更多智能体。因此,我们统一实现了大规模集群的弹性、高效和分布式轨迹规划。通过120架无人机编队的大规模仿真以及与最先进(SOTA)方法的严格基准测试,证明了所提方法的有效性和优越性。

英文摘要

Traditional large-scale formation planning methods either oversimplify the formation representation, which leads to poor performance, or employ complete collaborative relationships, which results in excessive computational load. To achieve high-performance and large-scale formation planning, we transform the Optimal Formation Position Sequence (OFPS) calculation problem into a spatiotemporal Point Cloud Registration (PCR) problem. Each agent derives its OFPS by distributively computing the matching result between current positions and the desired formation positions of all other agents. Then each agent optimizes the cooperative formation trajectory by using OFPS. We leverage the PCR method with outlier rejection to rapidly perform large-scale formation position registration. This prevents suboptimal trajectories and failed agents from propagating through the cooperative network and affecting more agents. Consequently, we achieve resilient, efficient, and distributed trajectory planning for large-scale swarms in a unified manner. The effectiveness and superiority of the proposed method are demonstrated through large-scale simulations of a 120-drone formation and rigorous benchmarking against state-of-the-art (SOTA) methods.

2605.29698 2026-05-29 cs.LG physics.chem-ph

A Systematic Evaluation of Molecular Mixture Behavior Prediction

分子混合物行为预测的系统评估

Roel J. Leenhouts, Nathan K. Morgan, William Green, Jan G. Rittig, Florence H. Vermeire

发表机构 * KU Leuven(鲁汶大学) MIT(麻省理工学院) RWTH Aachen University(亚琛工业大学)

AI总结 提出一个将混合物性质误差分解为纯组分和相互作用成分的评估框架,并基于七个匹配数据集发现绝对精度可能掩盖非理想混合行为的恢复能力。

详情
AI中文摘要

分子性质预测的机器学习主要集中在纯化合物上,尽管许多实际应用依赖于具有分子间相互作用的混合物。最近的工作扩大了混合物数据集的可用性,但评估仍然主要关注绝对精度。然而,混合物中的绝对误差将纯组分贡献与理想混合的偏差混为一谈。我们提出了一个评估框架,将混合物性质误差分解为纯化合物和相互作用(非理想)成分。该框架结合了泄漏感知分割协议、理想混合物基线和过量性质指标。为了支持可重复的基准测试,我们整理了七个匹配的纯和混合物物理化学性质数据集。在多个混合物性质任务和模型家族中,我们发现强绝对精度可能掩盖对非理想混合物行为的恢复能力,并且在严格分子分割下性能显著下降。这些结果将向未见分子的迁移识别为分子混合物机器学习中的核心挑战,并推动超越绝对精度的评估。

英文摘要

Machine learning for molecular property prediction has focused largely on pure compounds, even though many practical applications depend on mixtures with intermolecular interactions. Recent work has expanded the availability of mixture datasets, but evaluation still focuses mainly on absolute accuracy. However, absolute errors in mixtures conflate pure-component contributions with deviations from ideal mixing. We propose an evaluation framework that decomposes mixture-property error into pure-compound and interaction (non-ideal) components. The framework combines leakage-aware split protocols, ideal-mixture baselines, and excess-property metrics. To support reproducible benchmarking, we curate seven matched pure and mixture physicochemical property datasets. Across multiple mixture-property tasks and model families, we find that strong absolute accuracy can mask poor recovery of non-ideal mixture behavior, and that performance drops substantially under strict molecule splits. These results identify transfer to unseen molecules as a central challenge in molecular mixture machine learning and motivate evaluation beyond absolute accuracy alone.
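The decomposition of mixture error into a pure-component part and an interaction part can be sketched as below. This is an illustration under assumptions: the ideal-mixture baseline is taken to be a linear mole-fraction average of pure-component values, which holds for some properties but not all, and the paper's exact metrics are not given in the abstract. Names and numbers are hypothetical.

```python
def ideal_mixture(fractions, pure_values):
    """Assumed ideal-mixing baseline: mole-fraction-weighted average of pure values."""
    return sum(x * y for x, y in zip(fractions, pure_values))

def decompose_error(fractions, pure_true, pure_pred, mix_true, mix_pred):
    """Split (mix_pred - mix_true) into a pure-component error and an excess error."""
    ideal_true = ideal_mixture(fractions, pure_true)
    ideal_pred = ideal_mixture(fractions, pure_pred)
    pure_err = ideal_pred - ideal_true
    # Excess property = mixture value minus ideal-mixing value.
    excess_err = (mix_pred - ideal_pred) - (mix_true - ideal_true)
    return pure_err, excess_err

# A model that knows both pure components exactly but simply predicts ideal mixing.
pure_err, excess_err = decompose_error(
    fractions=[0.5, 0.5],
    pure_true=[10.0, 20.0],
    pure_pred=[10.0, 20.0],
    mix_true=14.0,     # real mixture deviates from the ideal value of 15.0
    mix_pred=15.0,
)
```

The two parts always sum to the total mixture error. In the toy case the total error of 1.0 looks small next to a property value of 14.0, yet all of it sits in the excess term: the model has recovered none of the non-ideal behaviour.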

2605.29697 2026-05-29 cs.AI

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

超越轨迹奖励:通过图建模实现智能搜索的步骤级信用分配

Yuchen Liu, Yingjie Feng, Lixiong Qin, Jiasi Chen, Jianing Yu, Sheng Gao, Sheng Yang, Weiran Xu

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) Li Auto Inc.(理想汽车)

AI总结 针对智能搜索中轨迹级奖励无法量化单步行为贡献的问题,提出基于图距离贡献奖励(GDCR)的步骤级过程奖励,并结合步骤优势策略优化(SAPO)在四个基准上验证有效性。

Comments 15 pages, 8 figures

详情
AI中文摘要

在智能搜索中,轨迹级结果奖励无法量化单个步骤的行为贡献,而现有的步骤级奖励方法通常依赖于代价高昂的树采样。我们将世界知识视为潜在的世界图,并将每个信息搜索任务视为在潜在任务图中的搜索,其中有效步骤应朝着答案节点进行图进展。基于这一先验,我们提出图距离贡献奖励(GDCR),这是一种步骤级过程奖励,通过训练时实体-关系(ER)图中实体到答案节点的距离对新检索和引用的实体进行评分。我们进一步提出步骤优势策略优化(SAPO),它将GDCR转换为步骤级优势,并与轨迹级结果优势相结合。在四个具有挑战性的基准上的实验验证了我们方法的有效性。

英文摘要

In Agentic Search, trajectory-level outcome rewards fail to quantify the behavioral contributions of individual steps, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each information-seeking (IS) task as search within a latent task graph, where effective steps should make graph progress toward the answer node. Based on this prior, we propose Graph-Distance Contribution Reward (GDCR), a step-level process reward that scores newly-retrieved and newly-cited entities by their distance to the answer node in a training-time Entity-Relation (ER) graph. We further propose Step Advantage Policy Optimization (SAPO), which converts GDCR into step-level advantages and combines them with trajectory-level outcome advantages. Experiments on four challenging benchmarks validate the effectiveness of our method.
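A graph-distance step reward of the kind described can be sketched with a breadth-first search. The abstract states only that entities are scored by their distance to the answer node, so the scoring rule 1 / (1 + distance), the treatment of unreachable entities as contributing nothing, and the toy entity graph are all assumptions for illustration, not the GDCR definition.

```python
from collections import deque

def distances_to(graph, answer):
    """Breadth-first hop counts from the answer node over an undirected adjacency dict."""
    dist = {answer: 0}
    queue = deque([answer])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def step_reward(new_entities, dist):
    """Hypothetical scoring: closer entities earn more; unreachable ones earn nothing."""
    return sum(1.0 / (1 + dist[e]) for e in new_entities if e in dist)

# Toy entity-relation graph: q - a - b - ans, plus an unrelated entity x.
graph = {
    "q": ["a"],
    "a": ["q", "b"],
    "b": ["a", "ans"],
    "ans": ["b"],
    "x": [],
}
dist = distances_to(graph, "ans")
near = step_reward(["b"], dist)        # one hop from the answer
far = step_reward(["a", "x"], dist)    # two hops away, plus an off-graph entity
```

A step that surfaces an entity adjacent to the answer scores higher than one that surfaces a more distant or unrelated entity, giving a per-step signal without sampling a tree of alternative trajectories.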

2605.29695 2026-05-29 cs.AI cs.CE cs.LG math.PR

FHRFormer: A Self-Supervised Masked Transformer Framework for Fetal Heart Rate Time-Series Inpainting and Forecasting

FHRFormer: 一种用于胎儿心率时间序列修复和预测的自监督掩码Transformer框架

Kjersti Engan, Neel Kanwal, Anita Yeconia, Ladislaus Blacy, Yuda Munyaw, Estomih Mduma, Hege Ersdal

发表机构 * University of Stavanger(斯塔万格大学) Haydom Lutheran Hospital(海多姆路德医院) Stavanger University Hospital(斯塔万格大学医院)

AI总结 针对胎儿心率监测中信号丢失问题,提出基于掩码Transformer的自监督自编码器方法,通过捕获局部时间和频率成分来修复和预测缺失信号,具有鲁棒性并支持AI风险算法开发。

Comments Submitted to Frontiers in Digital Health. arXiv admin note: substantial text overlap with arXiv:2509.20852

详情
AI中文摘要

大约10%的新生儿出生时需要帮助才能开始呼吸,约5%需要通气支持。胎儿心率(FHR)监测在产前护理中评估胎儿健康状况方面起着关键作用,能够检测异常模式并支持及时产科干预以减轻分娩期间的胎儿风险。应用人工智能(AI)方法分析具有不同结局的连续FHR监测大数据集,可能为预测需要呼吸辅助或干预的风险提供新见解。可穿戴FHR监测仪的最新进展实现了在不影响母亲活动能力的情况下进行连续胎儿监测。然而,母亲运动期间的传感器移位以及胎儿或母亲位置的变化常常导致信号丢失,造成记录的FHR数据出现缺口。这种缺失数据限制了有意义信息的提取,并使基于AI的自动化分析复杂化。传统的缺失数据处理方法,如简单插值技术,往往无法保留信号的频谱特性。在本文中,我们提出了一种基于掩码Transformer的自编码器方法,通过捕获数据的局部时间和频率成分来重建缺失的FHR信号。所提出的方法在不同缺失数据时长下表现出鲁棒性,可用于信号修复和预测。该方法可回顾性地应用于研究数据集,以支持基于AI的风险算法开发。未来,该方法可集成到可穿戴FHR监测设备中,实现更早、更稳健的风险检测。

英文摘要

Approximately 10% of newborns require assistance to initiate breathing at birth, and around 5% need ventilation support. Fetal heart rate (FHR) monitoring plays a crucial role in assessing fetal well-being during prenatal care, enabling the detection of abnormal patterns and supporting timely obstetric interventions to mitigate fetal risks during labor. Applying artificial intelligence (AI) methods to analyze large datasets of continuous FHR monitoring episodes with diverse outcomes may offer novel insights into predicting the risk of needing breathing assistance or interventions. Recent advances in wearable FHR monitors have enabled continuous fetal monitoring without compromising maternal mobility. However, sensor displacement during maternal movement, as well as changes in fetal or maternal position, often lead to signal dropout, resulting in gaps in recorded FHR data. Such missing data limits the extraction of meaningful insights and complicates automated (AI-based) analysis. Traditional approaches to handling missing data, such as simple interpolation techniques, often fail to preserve the spectral characteristics of the signals. In this paper, we propose a masked transformer-based autoencoder approach to reconstruct missing FHR signals by capturing both local temporal and frequency components of the data. The proposed method demonstrates robustness across varying durations of missing data and can be used for signal inpainting and forecasting. The proposed approach can be applied retrospectively to research datasets to support the development of AI-based risk algorithms. In the future, the proposed method could be integrated into wearable FHR monitoring devices to achieve earlier and more robust risk detection.

2605.29693 2026-05-29 cs.LG cs.RO

Momentum Based Reward Design for Low Emission Traffic Signal Control

基于动量的低排放交通信号控制奖励设计

Chinmay Mundane, Amith Manoharan, Arun Singh

发表机构 * Institute of Technology, University of Tartu(塔尔图大学技术学院)

AI总结 提出一种基于动量的奖励函数(MBRF),通过鼓励车辆持续移动而非单纯惩罚拥堵,在SUMO仿真中实现更好的吞吐量-排放权衡和更稳定的学习行为。

详情
AI中文摘要

城市交通拥堵是一个日益严重的全球性问题,导致通勤时间延长和环境污染加剧。传统的交通信号控制系统往往难以适应动态交通状况。自适应交通信号控制可以在不改变道路基础设施的情况下改善城市交通。深度强化学习(DRL)在此任务中表现出色,但现有的基于延误和队列的奖励常常产生短视或不稳定的策略。本文提出了一种基于动量的奖励函数(MBRF),鼓励车辆持续移动,而非仅惩罚拥堵。该方法在SUMO(城市交通仿真)中使用标准交通指标(如等待时间、队列长度、吞吐量和CO2排放)进行评估。结果表明,与基于延误或队列的奖励以及经典控制器(如Max Pressure和LQF)相比,所提出的奖励实现了更好的吞吐量-排放权衡和更稳定的学习行为。

英文摘要

Urban traffic congestion is a growing global issue that contributes significantly to long commute times and environmental pollution. Traditional traffic signal control systems often fail to adapt to dynamic traffic conditions. Adaptive traffic signal control can improve urban traffic without changing road infrastructure. Deep Reinforcement Learning (DRL) has shown strong performance for this task, but existing delay- and queue-based rewards often produce short-sighted or unstable policies. This paper proposes a Momentum-Based Reward Function (MBRF) that encourages vehicles to keep moving rather than penalizing congestion alone. The method is evaluated in SUMO (Simulation of Urban MObility) using standard traffic metrics such as waiting time, queue length, throughput, and CO2 emissions. Results show that the proposed reward produces better throughput-emission trade-offs and more stable learning behavior than delay- or queue-based rewards, as well as classical controllers such as Max Pressure and LQF.