arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.22357 2026-05-22 cs.CV cs.AI

VEELA: A Clinically-Constrained Benchmark for Liver Vessel Segmentation in Computed Tomography Angiography

Ziya Ata Yazıcı, N. Sinem Gezer, İlkay Öksüz, İlker Özgür Koska, Tuğçe Toprak, Pervin Bulucu, Ufuk Beşenk, A. Emre Kavur, Pierre-Henri Conze, Hazım Kemal Ekenel, Oğuz Dicle, Mustafa Ege Şeker, Mustafa Said Kartal, Ariorad Moniri, Orhan Özkan, Osman Faruk Bayram, Hakan Polat, Musa Balcı, Ece Tuğba Cebeci, Baran Cılga, Kardelen Peçenek, M. Alper Selver

Affiliations: Department of Radiology, Dokuz Eylul University; Department of Computer Engineering, Istanbul Technical University; Institute of Natural and Applied Sciences, Dokuz Eylul University; Department of Electrical and Electronics Engineering, Dokuz Eylul University; Department of Radiology, University of Wisconsin-Madison; School of Medicine, Sivas Cumhuriyet University; School of Medicine, Acibadem Mehmet Ali Aydinlar University; Department of Artificial Intelligence Engineering, Bahçeşehir University; Faculty of Pharmacy, Sivas Cumhuriyet University

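AI summary: This paper presents VEELA, a dataset for hepatic and portal vessel segmentation in CT angiography. Rigorous manual annotation under multi-expert consensus keeps the labels clinically realistic and accurate, and several complementary metrics are used to evaluate vessel segmentation from multiple perspectives.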

Comments 27 pages, 25 figures, 5 tables

Abstract

Accurate segmentation of hepatic and portal vessels in contrast-enhanced computed tomography angiography (CTA) remains challenging due to complex vascular topology, peripheral visibility limitations, and acquisition-induced ambiguities. While existing public datasets offer valuable benchmarks, few include clinically realistic annotation constraints. We introduce VEELA (Vessel Extraction and Extrication for Liver Analysis), a rigorously curated liver vessel dataset derived from 40 CTA scans inherited from the CHAOS grand-challenge cohort. All vessels were manually delineated slice-by-slice under multi-expert consensus, using a strict visibility-driven annotation policy and avoiding anatomically inferred interpolation. This design explicitly captures anatomical variability and imaging-related uncertainty. As a continuation of the CHAOS challenge, VEELA enables reproducible cross-benchmark evaluation while extending the scope to fine-grained hepatic and portal vessel segmentation. We further establish a standardized benchmarking framework and analyze complementary evaluation metrics, including topology-aware (clDice), overlap-based (IoU), boundary-sensitive (NSD), and geometry-aware (area, length) measures. Our results demonstrate that different metrics capture distinct aspects of vascular integrity, underscoring the necessity of multi-perspective evaluation for clinically meaningful vessel segmentation. VEELA is publicly released to facilitate reproducible research and support the development of robust vascular segmentation methods. Researchers can access the evaluation metrics, dataset, and submission platform at https://www.synapse.org/Synapse:syn65471967.
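
The claim that different metrics capture different aspects of vascular integrity can be illustrated with a toy case. The sketch below is not VEELA's evaluation code: it assumes a vessel flattened to a 1-D binary mask, and `iou` and `count_segments` are hypothetical helpers, with the segment count standing in only loosely for a topology-aware measure such as clDice.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def count_segments(mask):
    """Number of contiguous runs of 1s in a 1-D mask (a crude connectivity check)."""
    runs, previous = 0, 0
    for value in mask:
        if value and not previous:
            runs += 1
        previous = value
    return runs


# Hypothetical example: the prediction misses one voxel in the middle of a vessel.
truth = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pred = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

print(iou(pred, truth))        # 0.9: overlap is barely affected
print(count_segments(truth))   # 1: the reference vessel is one connected piece
print(count_segments(pred))    # 2: the predicted vessel is broken in two
```

A single missing voxel leaves the overlap score high while splitting the vessel, which is the kind of disagreement that motivates reporting overlap, boundary, and topology measures side by side.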

2605.22356 2026-05-22 cs.CL

Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

Nicola Milano, Davide Marocco

Affiliations: University of Naples Federico II; Natural and Artificial Cognition Laboratory; Department of Humanities

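AI summary: This paper studies how behavioral fine-tuning can model pathology-like behavioral patterns in language models. Models are fine-tuned on structured decision-making tasks and show stable shifts in their generative distributions across contexts, indicating that behavioral optimization affects the distributional properties of language generation.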

Abstract

Large language models are increasingly used as computational tools for modeling human-like behavior. We introduce a behavioral induction framework that modifies model policies through fine-tuning on structured decision-making tasks: using synthetic datasets inspired by maladaptive behavioral patterns, including depression and paranoia, we train transformer-based language models to consistently select specific classes of actions across diverse contexts. We then test whether this behavioral optimization produces systematic changes in generative distributions. Across two architectures, fine-tuned models show stable, context-general shifts in next-token probability distributions, including increased probability assigned to negative and threat-related interpretations in open-ended language tasks. These effects generalize beyond training contexts and are detectable in qualitative completions, psychometric-style evaluations, and quantitative distributional metrics such as Jensen-Shannon divergence. Induced behavioral profiles also show partial specificity. Models optimized for different behavioral patterns exhibit dissociable response tendencies across evaluation probes, suggesting that structured behavioral training produces differentiated policy-level biases rather than generic distributional skew. We interpret these findings as evidence that consistent behavioral optimization in LLMs can generate stable behavioral and distributional patterns consistent with altered latent priors, linking action selection and language generation. More broadly, the results support a view of LLMs as policy-based systems in which behavioral constraints shape emergent representational structure, highlighting their potential as controlled testbeds for studying the relationship between behavior, interpretation, and generative language in computational models of cognition.
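
The abstract names Jensen-Shannon divergence as one of its distributional metrics. As a minimal sketch of how such a shift could be measured, the code below compares two next-token distributions; the three-token vocabulary and the probabilities are invented for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded between 0 and 1."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical next-token probabilities over ["fine", "unsure", "threatened"].
base_model = [0.5, 0.5, 0.0]
tuned_model = [0.0, 0.5, 0.5]

print(js_divergence(base_model, base_model))   # 0.0: identical distributions
print(js_divergence(base_model, tuned_model))  # 0.5: half of the mass has moved
```

Because the mixture distribution is nonzero wherever either input is nonzero, the measure stays finite even when one model assigns zero probability to a token, which makes it convenient for comparing a base model with a fine-tuned one.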

2605.22355 2026-05-22 cs.CL cs.AI cs.LG

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Hanyu Guo, Jiedong Yang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu

Affiliations: Alibaba Group; AMAP

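AI summary: This paper presents TransitLM, a dataset of more than 13 million public transit route planning records for map-free transit route generation, and shows that models trained on the data can generate valid routes.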

Abstract

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.

2605.22351 2026-05-22 cs.CV

QuantSR+: Pushing the Limit of Quantized Image Super-Resolution Networks

Haotong Qin, Xudong Ma, Xianglong Liu, Jie Luo, Jinyang Guo, Michele Magno, Yulun Zhang

Affiliations: ETH Zurich; Shanghai Jiao Tong University; Beihang University

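AI summary: This paper proposes QuantSR+, a framework that improves quantization operators, network design, and training optimization to reach a better balance between accuracy and efficiency. To counter the performance drop at ultra-low precision, it makes three key technical contributions: Redistribution-driven Bit Determination, a Quantized Slimmable Architecture, and Slimming-guided Function-localized Distillation.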

Abstract

Low-bit quantization is widely used to compress super-resolution (SR) models and reduce storage and computation costs for deployment on resource-limited devices. However, when SR models are pushed to ultra-low precision (2-4 bits), performance can drop sharply due to diminished representational capacity and the detail-sensitive nature of SR. To address these issues, we propose QuantSR+, a unified framework that improves quantization operators, network design, and training optimization, achieving better trade-offs between accuracy and efficiency than prior low-bit SR methods. QuantSR+ mainly relies on three technical contributions: (1) Redistribution-driven Bit Determination (RBD), which reshapes quantization distributions in both forward and backward passes to preserve representation fidelity; (2) Quantized Slimmable Architecture (QSA), which begins with an over-parameterized model and progressively prunes less critical blocks to meet efficiency budgets while pushing the accuracy performance; and (3) Slimming-guided Function-localized Distillation (SFD), which enforces block-aware feature alignment via a direct loss and a progressive, function-local training schedule to capture quantization effects better and speed up convergence. Extensive experiments show that QuantSR+ achieves state-of-the-art performance against both specialized quantized SR methods and generic quantization approaches. For SwinIR-S on Urban100 (x4), it improves PSNR by 0.29 dB over the 2-bit SOTA baseline. Meanwhile, it delivers strong efficiency gains at 2-bit, reducing operations by up to 87.9% and storage by 89.4%. QuantSR+ is effective for both convolutional and transformer-based SR models, indicating broad applicability.
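
To make the loss of representational capacity at 2-4 bits concrete, here is a minimal sketch of plain uniform min-max quantization. It is a generic baseline and not the RBD operator from the paper; `quantize_uniform`, `mse`, and the weight values are hypothetical.

```python
def quantize_uniform(values, bits):
    """Round each value to one of 2**bits evenly spaced levels between min and max."""
    steps = 2 ** bits - 1
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / steps
    return [lo + round((v - lo) / step) * step for v in values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical layer weights, chosen only for illustration.
weights = [-1.0, -0.6, -0.1, 0.0, 0.2, 0.7, 1.0]

for bits in (2, 4, 8):
    quantized = quantize_uniform(weights, bits)
    print(bits, len(set(quantized)), mse(weights, quantized))
```

At 2 bits only four distinct values are available, so the reconstruction error is far larger than at 4 or 8 bits. Closing that gap is what the operator, architecture, and distillation changes described above aim at.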

2605.22344 2026-05-22 cs.CV cs.AI cs.MM

Bernini: Latent Semantic Planning for Video Diffusion

Bernini Team, Chenchen Liu, Junyi Chen, Lei Li, Lu Chi, Mingzhen Sun, Zhuoying Li, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan

Affiliations: Bernini Team

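AI summary: This paper proposes Bernini, a framework that uses a multimodal large language model for semantic planning and a diffusion model for pixel generation, giving a unified approach to video generation and editing that generalizes better on editing tasks.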

Comments Project Page: https://bernini-ai.github.io/

Abstract

Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize images and videos with photorealistic fidelity. We argue that these two families can be unified through a simple division of labor: MLLMs perform semantic planning, while diffusion models render pixels from high-level semantic guidance and low-level visual features. Building on this idea, we propose Bernini, a unified framework for video generation and editing. An MLLM-based planner predicts the target semantic representation directly in the ViT embedding space, and a DiT-based renderer synthesizes pixels conditioned on this plan, augmented by text features and, for editing, source VAE features for detail preservation. Because semantics serve as the interface, the planner and renderer can be trained separately and only lightly co-trained, preserving the pretrained strengths of both components while keeping training efficient. To better handle multiple visual inputs, we introduce Segment-Aware 3D Rotary Positional Embedding (SA-3D RoPE), and further incorporate chain-of-thought reasoning in the planner to better transfer understanding into generation. Bernini achieves state-of-the-art performance across a wide range of video generation and editing benchmarks, with the MLLM's pretrained understanding translating into strong generalization on challenging editing tasks.

2605.22342 2026-05-22 cs.CV cs.AI

4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting

Sifan Zhou, Hang Zhang, Yuhang Wang, Ming Li

Affiliations: Southeast University; Guangdong Laboratory of Artificial Intelligence and Digital Economy; University of Chinese Academy of Sciences

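AI summary: This paper proposes 4D-GSW, a kinematic-aware, spatio-temporally consistent watermarking technique that embeds robust copyright information in 4D Gaussian Splatting while preserving high spatio-temporal consistency.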

Comments 9 pages main paper, 7 figures, 18 pages in total

Abstract

While 4D Gaussian Splatting (4DGS) has revolutionized high-fidelity dynamic reconstruction, safeguarding the intellectual property of these assets remains an open challenge. Conventional steganographic techniques often neglect the underlying kinematic manifolds, triggering non-physical artifacts such as severe temporal flickering and "FVD collapse". To address this, we propose 4D-GSW, a kinematic-aware watermarking framework designed to embed robust copyright information while preserving high spatio-temporal consistency. Unlike prior 4D steganography that primarily focuses on opacity-guided invisibility, our approach explicitly addresses the physical coherence of motion trajectories. We introduce a Spatio-Temporal Curvature (STC) metric to identify "Dynamic Instants," adaptively gating watermark gradient injection to shield critical motion manifolds from non-physical perturbations. To ensure global coherence across complex deformations, we formulate a joint HMM-MRF energy minimization model that synchronizes watermark phases within both temporal trajectories and spatial neighborhoods. Furthermore, an anisotropic gradient routing mechanism ensures that watermark embedding remains strictly decoupled from photometric reconstruction fidelity. Extensive experiments have demonstrated the superior performance of our method in robustly hiding watermarks while resisting various attacks and maintaining high rendering quality and spatiotemporal consistency.

2605.22341 2026-05-22 cs.LG cond-mat.dis-nn

A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification

Marcel Kühn, Yoon Thelge, Bernd Rosenow

Affiliations: Institute for Theoretical Physics, Leipzig University; ScaDS.AI Dresden/Leipzig

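AI summary: This paper studies a boundary-layer mechanism by which the mismatch between a smooth surrogate loss and discrete labels produces power-law learning curves in an online teacher-student model. It shows that the test loss and the generalization error both scale as α^{-1/3} and that learning-rate schedules can improve the generalization error.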

Comments 20 pages, 7 figures

Abstract

Hard-label classification is usually trained with smooth surrogate losses, most prominently softmax cross-entropy. We isolate an asymptotic mechanism by which this mismatch between smooth surrogate and discrete labels produces power-law learning curves in an online teacher-student model. After subtracting the mean logit, the thermodynamic-limit dynamics close in centered variables: a growing centered student-teacher alignment $D$ and the residual student variance $\Delta$. At late times, examples away from teacher decision boundaries are already classified confidently and contribute exponentially little. Only boundary layers of width $O(D^{-1})$ remain active, while the noise of fixed-learning-rate online gradient descent maintains a nonzero $\Delta$. As a function of the training time $\alpha$, the late-time solution yields an $\alpha^{-1/3}$ power law not only for the test loss but also for the generalization error $\varepsilon_g$, i.e., one minus test accuracy. This is much slower than the $\alpha^{-1}$ Bayes-optimal reference for the same model. We further show that learning-rate schedules can improve the generalization error towards an $\varepsilon_g \sim \alpha^{-1/2}$ power law. Simulations support the predicted order parameter dynamics and learning curves. Controlled experiments with correlated Gaussian inputs and whitened pretrained features show that data structure can dominate transients. Therefore, our result is an asymptotic, complementary mechanism rather than an alternative to spectral explanations of neural scaling laws.
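
To give a feel for how much slower a -1/3 exponent is than -1/2 or -1, the sketch below estimates an exponent from a learning curve by a log-log least-squares fit and computes how much extra training time halves the error. The curve is synthetic, generated from the exponent itself, so it illustrates the bookkeeping only and reproduces no result from the paper; both function names are hypothetical.

```python
import math


def fit_power_law_exponent(alphas, errors):
    """Least-squares slope of log(error) against log(alpha)."""
    xs = [math.log(a) for a in alphas]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def alpha_factor_to_halve(k):
    """If error ~ alpha**(-k), multiplying alpha by this factor halves the error."""
    return 2 ** (1 / k)


# Synthetic curve following alpha**(-1/3); not data from the paper.
alphas = [10.0 ** n for n in range(1, 7)]
errors = [a ** (-1 / 3) for a in alphas]

print(fit_power_law_exponent(alphas, errors))  # close to -1/3
for k in (1 / 3, 1 / 2, 1.0):
    print(k, alpha_factor_to_halve(k))         # about 8, 4, and 2
```

Under these assumptions, halving the error costs roughly eight times more training at the fixed-learning-rate exponent, four times at the scheduled exponent, and only twice at the Bayes-optimal reference.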

2605.22340 2026-05-22 cs.LG

From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching

Siyu Pu, Qingqing Long, Xiaohan Huang, Haotian Chen, Jiajia Wang, Meng Xiao, Xiao Luo, Hengshu Zhu, Yuanchun Zhou, Xuezhi Wang

Affiliations: Computer Network Information Center, Chinese Academy of Sciences; University of California, Los Angeles

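AI summary: This paper proposes single-cell Flow Matching (scFM), which learns single-cell gene expression dynamics through conditional flow matching. It addresses the gaps between discrete time points and the distribution drift in long-horizon prediction, improving the accuracy and temporal coherence of trajectory inference.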

Abstract

Single-cell RNA sequencing (scRNA-seq) provides high-dimensional profiles of cellular states, enabling data-driven modeling of cellular dynamics over time. In practice, time-resolved scRNA-seq is collected at only a few discrete time points as unpaired snapshot populations, leaving substantial temporal gaps. This motivates trajectory inference at unmeasured time points. Existing methods mainly follow two directions: optimal-transport (OT) alignment provides distribution-level matching between observed snapshots, while continuous-time generative models support forecasting via learned dynamics. However, two challenges remain: (i) unpaired snapshots render local transitions between adjacent time points ambiguous, leading to unstable supervision; and (ii) long-horizon prediction relies on repeated integration, where small modeling errors compound and cause distribution drift. To address these challenges, we propose single-cell Flow Matching (scFM), a latent generative framework based on coupling-conditioned flow matching. First, we compute entropically regularized OT couplings between adjacent snapshots and use them to construct soft, weighted flow-matching targets for learning time-dependent velocity fields. Second, we learn bidirectional velocity fields and leverage their consistency to refine couplings and improve temporal coherence under sparse supervision. Third, we introduce distribution-level alignment and latent dynamic regularization to anchor long rollouts and mitigate drift. Experiments on real-world time-series scRNA-seq datasets show that scFM consistently improves distributional prediction performance for both temporal interpolation and extrapolation. Moreover, scFM yields more accurate trajectory reconstruction and temporally coherent visualizations where intermediate time points are absent, indicating a more faithful recovery of underlying temporal gene expression dynamics.
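
The first step of the method, entropic OT couplings turned into soft flow-matching targets, can be sketched on a toy problem. The code below is an assumption-laden illustration, not the scFM implementation: it uses plain Sinkhorn iterations, 1-D "cells", a squared-distance cost, and straight-line velocity targets `x1 - x0`, which is a common conditional flow matching choice that the abstract does not confirm. All names and values are hypothetical.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, iters=200):
    """Entropically regularized OT coupling between weight vectors a and b."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def soft_velocity_targets(plan, x0, x1):
    """Coupling-weighted average of straight-line velocities x1[j] - x0[i] per source cell."""
    targets = []
    for i, row in enumerate(plan):
        mass = sum(row)
        targets.append(sum(w * (x1[j] - x0[i]) for j, w in enumerate(row)) / mass)
    return targets


# Two hypothetical cells per snapshot, described by a single latent coordinate.
x0 = [0.0, 1.0]
x1 = [0.1, 1.2]
cost = [[(p - q) ** 2 for q in x1] for p in x0]
plan = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])

print(plan)
print(soft_velocity_targets(plan, x0, x1))
```

Most of the coupling mass lands on the nearby pairs, but each source cell keeps some weight on the other target, so the resulting velocity targets are soft averages rather than hard pairings. Lowering `eps` sharpens the coupling at the cost of slower Sinkhorn convergence.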

2605.22338 2026-05-22 cs.LG

Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field Reconstruction

Ziyuan Zhu, Keyu Hu, Zhifei Chen, Yuhao Shi, Ming Bao, Jing Zhao, Gang Wang, Haitan Xu, Jiadong Li, Qijun Zhao, Xiaodong Li, Minghui Lu, Yanfeng Chen

Affiliations: School of Advanced Manufacturing Engineering, Nanjing University; National Laboratory of Solid State Microstructures, Nanjing University; Suzhou Acoustics Industry Technology Research Institute Co., Ltd.; School of Mechanical and Electric Engineering, Soochow University; Shishan Laboratory, Nanjing University

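AI summary: This paper proposes a physics-informed generative solver that separates stable prior learning from inference-time enforcement of conservation laws. It reconstructs continuous physical fields from sparse measurements and achieves efficient, stable field reconstruction in acoustics and meteorology.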

Abstract

Reconstructing continuous physical fields from sparse measurements is a central inverse problem, but data-driven generative models can produce states that violate governing dynamics. We introduce a physics-informed generative solver that separates stable prior learning from inference-time enforcement of conservation laws. Martingale-Regularized Score Matching regularizes score pretraining with a Score Fokker-Planck constraint, yielding a dynamically stable prior. Physics-Informed Implicit Score Sampling then guides denoising trajectories by gradients of physical residuals, projecting samples toward admissible manifolds without retraining. In acoustics, the method co-generates pressure and particle velocity from sparse sensors, enabling dense virtual arrays that suppress spatial aliasing. The same framework generalizes to real-world ERA5 meteorological fields under extreme sparsity. Together, this work establishes a rigorous and generalizable paradigm for solving high-dimensional inverse problems, bridging the gap between generative artificial intelligence and first-principles science.

2605.22335 2026-05-22 cs.LG

Learning Causal Orderings for In-Context Tabular Prediction

Sascha Xu, Sarah Mameche, Jilles Vreeken

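AI summary: This paper studies how to infer and enforce causal structure, in the form of topological variable orderings, at the same time in tabular prediction. The proposed model, TabOrder, uses causal order-constrained attention so that predictions rely only on features that precede the target under a learned causal order. It learns the optimal variable ordering without supervision through a likelihood-based objective and examines how sample missingness affects the identification of causal direction.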

Abstract

In-context learning for tabular data sets strong predictive standards in observational settings; however, it relies primarily on correlational structure, which becomes unreliable under distribution shift or intervention. While established methods for discovering causal structure exist, they often focus on structure identifiability and are decoupled from the predictive architectures that could benefit from them. To bridge these perspectives, we study how to simultaneously infer causal structure, in the form of topological variable orderings, and enforce it in tabular prediction. Unlike standard architectures, our model TabOrder uses causal order-constrained attention, basing predictions only on features that precede a target under a learned causal order. Similar to causal discovery methods, TabOrder learns the optimal variable ordering in an unsupervised manner through a likelihood-based objective. We justify this choice under standard functional model classes and also study how sample missingness, a common challenge in tabular data, interacts with causal direction identification. Empirically, we confirm that TabOrder recovers accurate variable orderings while addressing prediction and imputation tasks, and that it gives insight into real-world biological data under intervention.
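
The idea of order-constrained attention can be sketched as a boolean mask derived from a variable ordering. This is an illustration under stated assumptions rather than TabOrder's code: `order_mask` is a hypothetical helper, the ordering is fixed by hand instead of learned, and how the mask enters the attention computation is left out.

```python
def order_mask(order):
    """mask[i][j] is True when feature j precedes feature i in the given ordering."""
    position = {feature: rank for rank, feature in enumerate(order)}
    n = len(order)
    return [[position[j] < position[i] for j in range(n)] for i in range(n)]


# Hypothetical ordering over three features: feature 2 first, then 0, then 1.
order = [2, 0, 1]
mask = order_mask(order)

for target, row in enumerate(mask):
    allowed = [j for j, ok in enumerate(row) if ok]
    print(target, allowed)
# 0 [2]
# 1 [0, 2]
# 2 []
```

Each target may attend only to features earlier in the ordering, so the first variable in the order has no admissible inputs and the last one can use all the others.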

2605.22334 2026-05-22 cs.LG

Riemannian geometry meets fMRI: the advantages of modeling correlation manifolds and eigenvector subspaces

Mario Severino, Manuela Moretto, Robert A. McCutcheon, Mattia Veronese

Affiliations: Department of Information Engineering, University of Padova; Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King’s College London; Department of Psychiatry, University of Oxford; Oxford Health NHS Foundation Trust, Warneford Hospital; Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King’s College London

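AI summary: This paper proposes a scalable geometric framework that combines the Off-log metric with Grassmannian subspace discrimination to improve the analysis of fMRI data, increasing sensitivity and predictive performance.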

Abstract

Correlation matrices are fundamental summaries of functional brain networks, yet standard analyses often treat entries independently, ignoring the curved geometry of correlation space. Existing geometric methods frequently lack closed-form operations or depend on arbitrary region ordering, limiting scalability. We introduce a scalable geometric framework with two components: (i) the Off-log metric, a smooth transformation mapping correlation matrices to symmetric zero-diagonal matrices, which enables closed-form expressions for distances, Fréchet means, and linear models and allows standard statistical modeling without complex manifold optimization; and (ii) Grassmannian subspace discrimination, which compares subjects via principal-angle distances between eigenvector subspaces, resolving inherent sign and basis ambiguities. Both components integrate into standard machine-learning workflows for inference, regression, and classification. Validated across two clinical cohorts (Parkinson's and psychosis) and three ageing fMRI datasets, the Off-log metric increased sensitivity in permutation tests and matched or exceeded Riemannian and Euclidean baselines in classification. Brain-age prediction performance was comparable, with Riemannian metrics excelling in two of three cohorts. The Grassmannian method consistently outperformed Euclidean baselines, highlighting disease-relevant networks. Overall, geometry-aware representations improve sensitivity and predictive performance while remaining straightforward to deploy at scale.
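
The sign ambiguity that the Grassmannian component resolves is easiest to see in the simplest case, a one-dimensional subspace spanned by a single eigenvector. The sketch below assumes that case only (larger subspaces would need an SVD of the product of orthonormal bases) and uses a hypothetical helper `principal_angle`; it is not the paper's pipeline.

```python
import math


def principal_angle(u, v):
    """Angle between the lines spanned by u and v, insensitive to their signs."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = min(1.0, abs(dot) / norms)  # clamp against rounding above 1
    return math.acos(cosine)


def euclidean_distance(u, v):
    """Ordinary distance between the two vectors, which does depend on sign."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# An eigenvector and the same eigenvector with flipped sign, as a solver may return.
u = [1.0, 0.0]
u_flipped = [-1.0, 0.0]
w = [1.0, 1.0]

print(euclidean_distance(u, u_flipped))  # 2.0: looks maximally different
print(principal_angle(u, u_flipped))     # 0.0: the same subspace
print(principal_angle(u, w))             # pi/4, about 0.785
```

An eigenvector and its negation describe the same network pattern, so a subspace distance that returns zero for them avoids spurious differences between subjects that a Euclidean comparison of raw eigenvectors would report.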

2605.22331 2026-05-22 cs.LG cs.AI cs.DC

SepsisAI Orchestrator: A Containerized and Scalable Platform for Deploying AI Models and Real-Time Monitoring in Early Sepsis Detection

Santiago Ospitia, John Sanabria, John Garcia-Henao

Affiliations: School of Systems Engineering and Computing, University of Valle; Digital Medicine Unit, Balgrist University Hospital; Nucleus-AI Research

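AI summary: This paper presents the SepsisAI-Orchestrator platform, which integrates HL7 FHIR-inspired Clinical Document Architecture (CDA) preprocessing, NoSQL storage, a containerized LightGBM classifier, and a Streamlit clinical dashboard to address the challenges of deploying AI models for early sepsis detection. Load testing reveals a U-shaped scaling behavior.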

Comments 13 pages, 5 figures. Submitted to BioCARLA 2025 Workshop

Abstract

Despite strong predictive results in the clinical machine learning literature, the translation of these models into bedside use remains limited by systems-level barriers: heterogeneous data representations, the absence of standardized deployment workflows, and a mismatch between research prototypes and the concurrency and latency requirements of hospital environments. We present the SepsisAI-Orchestrator, an open-source modular platform that addresses this deployment gap for early sepsis detection. The platform integrates HL7 FHIR-inspired Clinical Document Architecture (CDA) preprocessing, NoSQL storage, a containerized LightGBM classifier served via REST APIs, and a Streamlit clinical dashboard, orchestrated with Docker and Kubernetes. A previously validated LightGBM model (F1 0.87-0.94 on PhysioNet 2019) is reused without modification; the contribution lies in the surrounding infrastructure and its empirical characterization under load. Using k6 with 50-1000 concurrent virtual users, we find that replica count must be matched to the physical CPU thread count of the host: scaling from 3 to 12 replicas on a 12-thread CPU reduces p95 latency from 3.3 s to 1.41 s (57.3% reduction) and eliminates all request failures, while over-provisioning to 24 or 48 replicas degrades performance due to scheduler contention. To our knowledge, this U-shaped scaling behavior has not been quantified previously for clinical AI inference workloads. We do not claim prospective clinical validation. Source code and deployment manifests are available at https://github.com/nucleusai/sepsisai-orchestrator.
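
The latency figures above can be checked with a few lines of arithmetic. The sketch below computes a p95 with the nearest-rank rule, which is an assumption (load-testing tools may interpolate instead), and reproduces the reported 57.3% reduction from the two latencies given in the abstract. The sample latencies and function names are hypothetical.

```python
def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(samples)
    # Ceiling of pct/100 * n, done in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def percent_reduction(before, after):
    """Relative drop from before to after, in percent."""
    return (before - after) / before * 100


# Hypothetical request latencies in seconds: mostly fast, with a slow tail.
latencies = [0.2] * 18 + [1.4, 5.0]

print(percentile_nearest_rank(latencies, 50))    # 0.2
print(percentile_nearest_rank(latencies, 95))    # 1.4
print(round(percent_reduction(3.3, 1.41), 1))    # 57.3, matching the abstract
```

The p95 is reported instead of the mean because a small number of slow requests, like the 5.0 s outlier here, dominate what users experience under load while barely moving the median.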

2605.22328 2026-05-22 cs.CV

3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes

Narges Takhtkeshha, Aldino Rizaldy, Markus Hollaus, Juha Hyyppä, Fabio Remondino, Gottfried Mandlburger

Affiliations: 3D Optical Metrology (3DOM) Unit, Bruno Kessler Foundation (FBK); Department of Geodesy and Geoinformation, TU Wien; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Helmholtz Institute Freiberg for Resource Technology (HIF); Freie Universität Berlin, Remote Sensing and Geoinformatics; Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, The National Land Survey of Finland

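AI summary: This paper presents a 3D land use/land cover classification approach based on multispectral LiDAR and deep learning. It introduces NMCA-aligned L1 and L2 classification schemes and a new multispectral LiDAR benchmark dataset, evaluates seven state-of-the-art deep learning models, and shows that spectral information improves classification performance.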

详情
AI中文摘要

土地利用/覆盖(LULC)分类对于国家3D制图、地理空间分析和可持续规划至关重要。多光谱(MS)LiDAR提供同步的空谱信息,深度学习(DL)能够实现3D点云语义分割;然而,其应用受限于缺乏与国家制图和地籍机构(NMCAs)分类方案对齐的公开可用的城市和郊区MS LiDAR数据集。本文通过引入L1和L2 NMCA对齐的LULC分类方案和一个新的多光谱LiDAR数据集来填补这些空白。我们评估了七种最先进的深度学习模型,并在两个细节层次上进行了光谱消融研究。结果表明,Point Transformer V3在使用双波长LiDAR系统(532 nm和1064 nm)时,分别在L1(8类)和L2(20类)上实现了79.4%和58.9%的mIoU。消融结果表明,多光谱信息在几何信息基础上提升了性能,分别在L1和L2上提升了1.1个百分点和7.8个百分点。这些结果突显了LiDAR反射率在细粒度材料识别中的价值,并支持NMCA LULC方案向更高语义细节演进。Loosdorf-MSL数据集为一致的国家和国际LULC制图提供了新的基准。

英文摘要

Land Use Land Cover (LULC) classification is essential for national 3D mapping, geospatial analysis, and sustainable planning. Multispectral (MS) LiDAR provides synchronized spatial-spectral information, and deep learning (DL) enables 3D point cloud semantic segmentation; however, adoption is limited by the lack of publicly available urban and suburban MS LiDAR datasets aligned with National Mapping and Cadastral Agencies (NMCAs) classification schemes. This study addresses these gaps by introducing L1 and L2 NMCA-aligned LULC classification schemes and a new benchmark MS LiDAR dataset. We evaluate seven state-of-the-art DL models and perform spectral ablation studies at both levels of detail. Results show that Point Transformer V3 achieves the best performance, with mIoU of 79.4% (L1, 8 classes) and 58.9% (L2, 20 classes) using a dual-wavelength LiDAR system (532 nm and 1064 nm). Ablation results show that multispectral information improves performance over geometry-only inputs, with gains of 1.1 percentage points at L1 and 7.8 points at L2. These results highlight the value of LiDAR reflectance for fine-grained material discrimination and support the evolution of NMCA LULC schemes toward higher semantic detail. The Loosdorf-MSL dataset contributes a new benchmark for consistent national and international LULC mapping.
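For readers unfamiliar with the mIoU figures quoted above, here is a minimal sketch of how mean intersection-over-union is commonly computed from a confusion matrix. The 3-class matrix is invented for illustration; the paper's own evaluation code and class lists are not reproduced here.

```python
def mean_iou(confusion):
    """confusion[i][j] = number of points of true class i predicted as class j."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix is empty")
    return sum(ious) / len(ious)


# Hypothetical 3-class confusion matrix (not results from the paper).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
toy_miou = mean_iou(toy_confusion)
print(round(toy_miou, 3))
```

Because every class contributes equally to the mean, rare fine-grained classes weigh as much as common ones, which is one reason a 20-class scheme tends to score lower than an 8-class scheme.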

2605.22327 2026-05-22 cs.CV physics.med-ph

Robustness of breast lesion segmentation under MRI undersampling improves with k-space-aware deep learning

在MRI欠采样下,基于k空间的深度学习改进了乳腺病变分割的鲁棒性

Lukas T. Rotkopf, Marco Schlimbach, Julius C. Holzschuh, Heinz-Peter Schlemmer, Jens Kleesiek, Moritz Rempe

Affiliations: Institute for AI in Medicine (IKIM), University Hospital Essen; Department of Physics, Technical University Dortmund; Division of Radiology, German Cancer Research Center (DKFZ); Cancer Research Center Cologne Essen (CCCE), University Medicine Essen; RACOON Study Group, Site Essen; German Cancer Consortium (DKTK), Partner Site Essen; Medical Faculty and Faculty of Computer Science, University of Duisburg-Essen

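AI summary: This paper investigates whether learning breast lesion segmentation directly from acquired MRI k-space improves robustness under acceleration or noise. Comparing several models, it finds that k-space-aware deep learning performs better under undersampling and noise.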

详情
AI中文摘要

目的:评估是否可以直接从获得的MRI k空间学习乳腺病变分割,并判断在数据加速或噪声情况下这种学习方式是否能提高鲁棒性。材料和方法:本回顾性研究使用了公开的乳腺动态对比增强MRI(DCE-MRI)数据集,包含获得的和合成的k空间,以及数据集内的合成对照。我们比较了四种3D U-Net变体:混合k空间到图像模型、原生k空间模型以及幅度和复数图像空间基线。模型在增加的欠采样和添加复数高斯k空间噪声下进行评估。主要结果是交叉验证下的患者级Dice相似性系数,其中混合模型被预设为主要比较对象,与幅度图像空间基线进行比较。结果:在完全采样下,混合模型和图像空间模型表现相似。随着加速增加,混合模型在欠采样水平中保持了显著的分割准确性,并在中等至高欠采样水平上显著优于幅度图像空间基线。当直接向k空间添加噪声时,相同模式被观察到:混合模型退化更慢,而图像空间基线在更重噪声下失败。这种优势在数据集内的合成对照中被重复验证。特征分析表明,k空间阶段和图像空间阶段发挥了互补作用,频率域过滤集中在图像域病变定位之前。结论:基于k空间的深度学习在MRI欠采样和k空间噪声下提高了乳腺病变分割的鲁棒性,同时在完全采样下与图像空间方法相当。

英文摘要

Purpose: To assess whether breast lesion segmentation can be learned directly from acquired MRI k-space, and whether doing so improves robustness when data are accelerated or noisy. Materials and Methods: This retrospective study used public breast dynamic contrast-enhanced MRI (DCE-MRI) datasets with acquired and synthetic k-space, together with a within-dataset synthetic control. We compared four 3D U-Net variants: a hybrid k-space-to-image model, a native k-space model, and magnitude and complex image-space baselines. Models were evaluated under increasing undersampling and added complex Gaussian k-space noise. The primary outcome was patient-level Dice similarity coefficient under cross-validation, with the hybrid model prespecified as the main comparison against the magnitude image-space baseline. Results: At full sampling, the hybrid and image-space models performed similarly. As acceleration increased, the hybrid model retained substantially more segmentation accuracy and significantly outperformed the magnitude image-space baseline across moderate to high undersampling levels. The same pattern was observed when noise was added directly to k-space: the hybrid model degraded more slowly, whereas the image-space baseline failed under heavier noise. This advantage was reproduced in the within-dataset synthetic control. Feature analysis suggested that the k-space stage and image-space stage played complementary roles, with frequency-domain filtering concentrated before image-domain lesion localization. Conclusion: K-space-aware deep learning improves the robustness of breast lesion segmentation under MRI undersampling and k-space noise, while matching image-space methods at full sampling.
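The primary outcome above is the patient-level Dice similarity coefficient. A minimal sketch of that metric on flattened binary masks follows. The masks are toy data, and scoring two empty masks as 1.0 is an assumption of this sketch, not a convention stated in the abstract.

```python
def dice(pred, truth):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


def patient_level_mean(scores):
    """Average of per-patient Dice scores."""
    return sum(scores) / len(scores)


# Toy masks for two hypothetical patients.
patient_scores = [
    dice([1, 1, 0, 0], [1, 0, 1, 0]),
    dice([0, 1, 1, 0], [0, 1, 1, 0]),
]
print(patient_scores, patient_level_mean(patient_scores))
```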

2605.22322 2026-05-22 cs.RO

How can reasoning capability empower the AI copilot robot in endoscopic surgery

推理能力如何赋能内窥手术中的AI助手机器人

Guankun Wang, Long Bai, Hongliang Ren

Affiliations: Department of Electronic Engineering

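AI summary: This paper examines the role of reasoning capability in AI copilot robots for endoscopic surgery, arguing that integrating multimodal cues, interpreting surgical intent, and inferring hidden tissue dynamics can improve surgical precision, safety, and sustainability.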

Comments Accepted by npj digital medicine

详情
AI中文摘要

推理能力已显著提升了复杂逻辑推理和机器人决策制定在一般领域的能力。然而,其在人工智能(AI)助手机器人——特别是基于视觉-语言-动作(VLA)模型实现——在内窥手术中的潜力仍待探索。有效的推理应使AI助手机器人能够整合多模态线索、解读手术意图并推断隐藏的组织动态,从而缓解术中不确定性和对外科医生的认知负担。正确实施的推理驱动自主性可以将AI助手机器人从被动执行者转变为认知合作者,从而在临床实践中提高精确性、安全性和可持续性。

英文摘要

Reasoning capability has significantly advanced complex logical inference and robotic decision-making in general domains. However, its potential for Artificial Intelligence (AI) copilot robots, particularly those built on Vision-Language-Action (VLA) models, remains unexplored in endoscopic surgery. Effective reasoning should enable AI copilot robots to integrate multimodal cues, interpret surgical intent, and infer hidden tissue dynamics, thereby alleviating intraoperative uncertainty and the cognitive burden on surgeons. Properly implemented, reasoning-driven autonomy can transform AI copilot robots from reactive executors into cognitive collaborators, enhancing precision, safety, and sustainability in clinical practice.

2605.22311 2026-05-22 cs.CV

PIU: Proximity-guided Identity Unlearning in ID-Conditioned Diffusion Models

PIU:基于接近性的身份去学习在ID条件化的扩散模型中

Jose Edgar Hernandez Cancino Estrada, Mauro Díaz Lupone, Žiga Emeršič, Vitomir Štruc, Peter Peer, Darian Tomašević

Affiliations: University of Ljubljana, Faculty of Computer and Information Science; University of Ljubljana, Faculty of Electrical Engineering

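AI summary: This paper studies identity unlearning in ID-conditioned diffusion models and proposes PIU, a proximity-guided identity unlearning framework. PIU removes an identity by reassigning the source identity to a selected anchor identity in the learned identity space, combines this with an anchor selection strategy based on the geometry of ArcFace representations, and achieves effective unlearning by locally fine-tuning a small number of identity-sensitive cross-attention layers.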

详情
AI中文摘要

身份条件化的扩散模型能够生成高质量且身份一致的面部图像,但它们也引发了严重的隐私问题,因为模型可能在个人被遗忘后仍继续合成个体。尽管机器去学习已被广泛研究用于概念和数据删除,但身份去学习仍鲜有探索,特别是在直接基于身份嵌入而非文本提示的模型中。在本文中,我们研究了Arc2Face,一个最先进的身份条件化的潜在扩散模型用于面部生成,并引入了基于接近性的身份去学习(PIU),一种锚点引导的身份去学习框架。具体而言,我们将身份移除建模为身份替换目标,该目标将源身份重新分配到学习身份空间中选定的锚身份,并补充了受ArcFace表示几何启发的基于接近性的锚点选择策略。我们进一步表明,通过局部微调少量身份敏感的交叉注意力层可以实现有效的去学习。在许多目标身份上的实验表明,我们的框架能够有效抑制目标身份的生成,同时保持保留身份的真实性和身份一致性,这通过改进的去学习和图像质量指标以及定性评估得到验证。PIU框架的源代码可在https://github.com/edgarcancinoe/piu_unlearning 公开获取。

英文摘要

Identity-conditioned diffusion models enable high-quality and identity-consistent face generation, but they also raise severe privacy concerns, as models may continue to synthesize individuals despite their right to be forgotten. While machine unlearning has been extensively studied for concept and data removal, identity unlearning remains largely unexplored, particularly in models conditioned directly on identity embeddings rather than text prompts. In this work, we study identity unlearning in Arc2Face, a state-of-the-art identity-conditioned latent diffusion model for face generation, and introduce Proximity-guided Identity Unlearning (PIU), an anchor-guided framework for identity unlearning. Specifically, we formulate identity removal as an identity replacement objective that reassigns the source identity to a selected anchor identity in the learned identity space, and we complement it with a proximity-based anchor selection strategy motivated by the geometry of ArcFace representations. We further show that effective unlearning can be achieved through localized fine-tuning of a small subset of identity-sensitive cross-attention layers. Experiments across many target identities show that our framework effectively suppresses generation of the target identity while preserving realism and identity consistency for retained identities, as validated by improved performance on unlearning and image-quality metrics, together with qualitative evaluation. The source code for the PIU framework is publicly available at https://github.com/edgarcancinoe/piu_unlearning .
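To make the idea of proximity-based anchor selection concrete, the sketch below picks an anchor by cosine similarity in a toy embedding space. It is an assumption-laden illustration: the abstract does not state the exact selection rule, so "choose the most similar candidate" is a guess, and the 3-dimensional vectors and names such as `anchor_a` are hypothetical stand-ins for real ArcFace embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_anchor(source, candidates):
    """Return the candidate name closest to `source` (assumed rule, not the paper's)."""
    return max(candidates, key=lambda name: cosine(source, candidates[name]))


# Hypothetical identity embeddings; real ArcFace embeddings are much higher-dimensional.
source_identity = [1.0, 0.0, 0.0]
candidate_anchors = {
    "anchor_a": [0.9, 0.1, 0.0],
    "anchor_b": [0.0, 1.0, 0.0],
    "anchor_c": [-1.0, 0.0, 0.0],
}
chosen = select_anchor(source_identity, candidate_anchors)
print(chosen)
```

Cosine similarity is used here because ArcFace-style embeddings are usually compared by angle rather than by Euclidean distance.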

2605.22310 2026-05-22 cs.CL

Pattern-and-root inflectional morphology: the Arabic broken plural

词形和根的屈折形态:阿拉伯语的破碎复数

Alexis Amid Neme, Eric Laporte

Affiliations: LIGM, Université Paris-Est; DLL, Universidade Federal do Espírito Santo

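AI summary: This paper presents a descriptive model of the inflectional morphology of Arabic nouns, with a focus on how Arabic-speaking linguists manage dictionaries and other language resources. Its breakthrough is reversing the traditional root-and-pattern Semitic model into a pattern-and-root model that gives precedence to patterns. The model covers broken plurals (BPs), i.e. plurals formed by modifying the stem, and builds on the traditional notions of root and pattern in Semitic morphology. Compared with traditional Arabic morphology, it keeps the formal description of inflection separate from derivation and semantics. As in traditional Arabic dictionaries, the updatable dictionary is organized into lexical entries for lemmas, and the reference spelling is fully diacritized. Morphological analysis of Arabic text is performed directly with the dictionary, without morphophonological rules. The taxonomy of noun inflection is simple, orderly, and detailed. The taxonomy of singular patterns is simplified by specifying vowel quantity as v or vv and ignoring vowel quality. Root alternations and orthographic variations are encoded independently of patterns and in a factual way, without deep roots or morphophonological or orthographic rules. Nouns with a triliteral BP are classified according to 22 patterns subdivided into 90 classes, and nouns with a quadriliteral BP according to 3 patterns subdivided into 70 classes. These 160 classes become 300 inflectional classes once inflectional variations affecting only the singular are taken into account. The paper provides a straightforward encoding scheme, applied to 3,200 entries of BP nouns.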

Journal ref Language Sciences, 2013, 40, pp.221-250

详情
AI中文摘要

我们提出了一种实施程度较高的模型,用于描述阿拉伯语名词的屈折形态,特别关注阿拉伯语学者在管理词典和其他语言资源时的处理方式。突破点在于将传统的根-词形塞语模型反转为词形-根模型,优先考虑词形。我们的模型包括破碎复数(BPs),即通过修改词干形成的复数。它基于传统的塞语形态学中的根和词形概念。然而,与传统阿拉伯语形态学相比,它将屈折的正式描述与派生和语义分开。如同传统阿拉伯语词典,可更新的词典以词形的词典条目结构进行组织,并且参考拼写完全带变音符号。在我们的模型中,阿拉伯语文本的形态分析直接使用词典进行,而无需形态音律规则。我们的名词屈折分类法简单、有序且详细。我们通过指定元音数量为v或vv,并忽略元音质量来简化单数词形的分类。根交替和正字法变化是独立于词形并以事实方式编码的,而不涉及深根或形态音律或正字法规则。具有三重词干BPs的名词根据22个词形细分到90个类别中进行分类,而具有四重词干BPs的名词根据3个词形细分到70个类别中进行分类。这些160个类别在考虑只影响单数的屈折变化时,变为300个屈折类别。我们提供了一种直接的编码方案,该方案应用于3200个BPs名词条目。

英文摘要

We present a substantially implemented model for describing the inflectional morphology of Arabic nouns, with special attention to the management of dictionaries and other language resources by Arabic-speaking linguists. The breakthrough lies in the reversal of the traditional root-and-pattern Semitic model into pattern-and-root, giving precedence to patterns over roots. Our model includes broken plurals (BPs), i.e. plurals formed by modifying the stem. It is based on the traditional notions of root and pattern of Semitic morphology. However, as compared to traditional Arabic morphology, it keeps the formal description of inflection separate from that of derivation and semantics. As in traditional Arabic dictionaries, the updatable dictionary is structured in lexical entries for lemmas, and the reference spelling is fully diacritized. In our model, morphological analysis of Arabic text is performed directly with a dictionary of words and without morphophonological rules. Our taxonomy for noun inflection is simple, orderly and detailed. We simplify the taxonomy of singular patterns by specifying vowel quantity as v or vv, and ignoring vowel quality. Root alternations and orthographical variations are encoded independently from patterns and in a factual way, without deep roots or morphophonological or orthographical rules. Nouns with a triliteral BP are classified according to 22 patterns subdivided into 90 classes, and nouns with a quadriliteral BP according to 3 patterns subdivided into 70 classes. These 160 classes become 300 inflectional classes when we take into account inflectional variations that affect only the singular. We provide a straightforward encoding scheme that we applied to 3,200 entries of BP nouns.
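Two ideas in this abstract lend themselves to a small illustration: describing a pattern by vowel quantity only (v versus vv), and analyzing a form by direct dictionary lookup rather than by rules. The sketch below is hypothetical throughout. It uses a Latin transliteration with doubled letters for long vowels, an invented skeleton notation, and a two-entry lexicon; none of this is the authors' actual encoding scheme.

```python
VOWELS = set("aiu")


def quantity_skeleton(word):
    """Map a transliterated word to C / v / vv slots, ignoring which vowel is used."""
    slots = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            long_vowel = i + 1 < len(word) and word[i + 1] == word[i]
            slots.append("vv" if long_vowel else "v")
            i += 2 if long_vowel else 1
        else:
            slots.append("C")
            i += 1
    return "".join(slots)


# Hypothetical full-form lexicon: surface form -> (lemma, number).
LEXICON = {
    "kitaab": ("kitaab", "singular"),
    "kutub": ("kitaab", "plural"),  # broken plural: the stem itself changes
}


def analyze(form):
    """Dictionary lookup only; no morphophonological rules are applied."""
    return LEXICON.get(form)


print(quantity_skeleton("kitaab"), quantity_skeleton("kutub"), analyze("kutub"))
```

In this toy notation, kitaab (book) and any other singular with a short vowel followed by a long one share a skeleton regardless of vowel quality, which is the kind of simplification the abstract describes.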

2605.22304 2026-05-22 cs.AI cs.DB cs.LG

Evaluation of Pipelines for Data Integration into Knowledge Graphs

数据整合到知识图谱的管道评估

Marvin Hofer, Erhard Rahm

Affiliations: ScaDS.AI Dresden/Leipzig; Leipzig University

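AI summary: This paper proposes the KGI-Bench benchmark for evaluating pipelines that integrate different kinds of input data into an existing knowledge graph. It assesses the quality of the resulting knowledge graph with three metrics (coverage, correctness, and consistency) and provides benchmark datasets in the movie domain to evaluate the performance of 12 pipelines.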

详情
AI中文摘要

将新数据整合到知识图谱(KG)通常涉及在工作流或管道中执行的不同任务。对于特定的整合问题,有许多可能的管道,但目前尚无通用方法来评估此类管道的整体质量和性能,以确定最佳选择。因此,我们提出一个新的基准KGI-Bench,用于评估将不同类型的输入数据整合到现有KG的管道。我们通过分析输出,即更新后的KG,使用三个互补的质量度量:覆盖度、正确性和一致性来评估管道。我们还提供了基准数据集(种子KG、三种格式的重叠输入数据、参考KG作为地面真实值)用于电影领域。为了展示所提基准的适用性和有用性,我们比较评估了12种管道,并分析了它们在不同输入数据格式和设计选择下的行为。

英文摘要

Integrating new data into knowledge graphs (KGs) typically involves different tasks that are executed within workflows or pipelines. There are many possible pipelines for a specific integration problem, but there is not yet a general approach to evaluate the overall quality and performance of such pipelines in order to determine the best choices. We therefore propose a new benchmark, KGI-Bench, to evaluate integration pipelines that ingest different kinds of input data into an existing KG. We evaluate pipelines by analyzing their output, i.e., the updated KG, with three complementary quality metrics: coverage, correctness, and consistency. We also provide benchmark datasets (seed KG, overlapping input data of three formats, reference KG as a ground truth) for the movie domain. To demonstrate the applicability and usefulness of the proposed benchmark, we comparatively evaluate 12 pipelines and analyze their behavior across different input data formats and design choices.
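The three quality metrics can be pictured with set operations over triples. The definitions below are assumptions made for illustration (recall-like coverage, precision-like correctness, and a functional-property check for consistency); the benchmark's actual formulas may differ, and the movie triples are invented.

```python
def coverage(output, reference):
    """Assumed definition: share of reference triples present in the output KG."""
    return len(output & reference) / len(reference)


def correctness(output, reference):
    """Assumed definition: share of output triples confirmed by the reference KG."""
    return len(output & reference) / len(output)


def consistency(output, functional_predicates):
    """Assumed definition: share of (subject, predicate) pairs on single-valued
    predicates that really carry exactly one object."""
    objects = {}
    for subject, predicate, obj in output:
        if predicate in functional_predicates:
            objects.setdefault((subject, predicate), set()).add(obj)
    if not objects:
        return 1.0
    single = sum(1 for values in objects.values() if len(values) == 1)
    return single / len(objects)


# Hypothetical movie-domain triples.
reference_kg = {
    ("m1", "title", "Alien"), ("m1", "year", "1979"),
    ("m2", "title", "Heat"), ("m2", "year", "1995"),
}
output_kg = {
    ("m1", "title", "Alien"), ("m1", "year", "1979"),
    ("m1", "year", "1980"),  # conflicting value produced by a faulty pipeline step
    ("m2", "title", "Heat"),
}
scores = (
    coverage(output_kg, reference_kg),
    correctness(output_kg, reference_kg),
    consistency(output_kg, {"title", "year"}),
)
print(scores)
```

The toy output shows why the metrics are complementary: a pipeline can recover most reference facts while still introducing a contradictory value.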

2605.22300 2026-05-22 cs.AI cs.LG cs.MA

Cross-domain benchmarks reveal when coordinated AI agents improve scientific inference from partial evidence

跨领域基准测试揭示协调AI代理在部分证据下提升科学推断何时有效

Fiona Y. Wong, Markus J. Buehler

Affiliations: Laboratory for Atomistic and Molecular Mechanics (LAMM); Department of Biological Engineering; Department of Mechanical Engineering; Department of Civil and Environmental Engineering; Center for Computational Science and Engineering, Schwarzman College of Computing; Massachusetts Institute of Technology

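AI summary: Using a cross-domain benchmark, this paper examines when coordinated AI agents improve scientific inference from partial evidence. It finds that cross-channel composites outperform single-channel baselines when different disciplines each capture only part of a phenomenon, but that decomposition does not always improve top-line performance.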

详情
AI中文摘要

科学证据通常跨越仪器、数据库和学科,因此没有单一来源能完整记录现象。这使得确定协调AI代理何时能超越简单科学工作流变得困难。我们通过涵盖四个科学任务的跨领域基准测试评估了这一问题:将分子结构映射到音乐表示、检测科学历史范式转变、识别媒介传播疾病爆发以及验证行星凌星候选体。每个案例均使用冻结评估小组、预定义评分协议、明确基线、消融或零对照,以及声明的限制。结果定义了三个操作模式。当不同学科各自只捕捉现象部分时,跨通道复合方法优于单一通道基线:气候-媒介爆发达到AUROC 0.944,行星凌星验证达到AUROC 0.955。然而,行星凌星工作流与强联合摘要基线几乎持平,表明分解不总能提升整体性能。当一个信号主导时,如范式转变检测,协调主要提升解释和可追溯性。对于分子音乐化,收益是表征而非预测性的。ScienceClaw x Infinite提供了此评估的可审计艺术ifacts和来源层。因此,该基准测试仅在对应的性能、来源或表征主张有明确比较器支持时才赋予协调价值。

英文摘要

Scientific evidence often spans instruments, databases, and disciplines, so no single source records the full phenomenon. This makes it difficult to determine when coordinated AI agents add value over simpler scientific workflows. We evaluate this question with a cross-domain benchmark spanning four scientific tasks: mapping molecular structure into musical representations, detecting historical paradigm shifts in science, identifying vector-borne disease emergence, and vetting transiting-exoplanet candidates. Each case uses a frozen evaluation panel, predefined scoring protocols, explicit baselines, ablations or null controls, and stated limitations. The results define three operating regimes. When different disciplines each capture only part of the phenomenon, cross-channel composites improve over single-channel baselines: climate-vector emergence reaches AUROC 0.944 and exoplanet vetting reaches AUROC 0.955. However, the exoplanet workflow is effectively tied with a strong combined-summary baseline, showing that decomposition does not always improve top-line performance. When one signal dominates, as in paradigm-shift detection, coordination mainly improves interpretation and traceability. For molecular sonification, the gain is representational rather than predictive. ScienceClaw x Infinite provides the auditable artifact and provenance layer for this evaluation. The benchmark therefore assigns value to coordination only when the corresponding performance, provenance, or representation claim is supported by explicit comparators.
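The AUROC values reported above measure how often a positive case is ranked above a negative one. A minimal pairwise implementation is sketched below on invented scores; it is quadratic in the number of samples and intended only to show what the number means, not to mirror the benchmark's scoring code.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical detector scores and ground-truth labels.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auroc = auroc(toy_scores, toy_labels)
print(toy_auroc)
```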

2605.22291 2026-05-22 cs.LG

Long-term Fairness with Selective Labels

长期公平性与选择性标签

Giovani Valdrighi, Isabel Valera, Marcos Medeiros Raimundo

Affiliations: Department of Computer Science, Saarland University, Saarbrücken, Germany

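AI summary: This paper studies long-term fairness in the selective labels setting. It proposes a new framework that combines observed data with a label predictor model to estimate the true fairness measure, and a new reinforcement learning algorithm for effective long-term fair decision-making.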

详情
AI中文摘要

长期公平性算法旨在通过考虑决策政策与人口行为之间的动态关系,满足超越静态和短期观念的公平性。大多数先前的方法从可观察特征和标签评估性能和公平性度量,其中标签被假设为完全可观测。然而,在招聘或贷款等场景中,标签(例如偿还贷款的能力)是选择性标签,因为它们仅在积极决定(例如贷款被批准时)后才被揭示。在本文中,我们研究了选择性标签设置下的长期公平性,并分析表明,朴素的解决方案无法保证公平性。为了解决这一差距,我们引入了一个新的框架,利用观测数据和标签预测模型来估计真实的公平性度量值,将其分解为观测公平性和标签预测中的偏差。这使我们能够通过使用预测模型的置信度来推导出满足真实公平性的充分条件。最后,我们依赖我们的理论结果,提出了一种新的强化学习算法,以实现有效长期公平决策。在半合成环境中,所提出的算法在公平性和性能方面与具有oracle访问真实标签的智能体相当。

英文摘要

Long-term fairness algorithms aim to satisfy fairness beyond static and short-term notions by accounting for the dynamics between decision-making policies and population behavior. Most previous approaches evaluate performance and fairness measures from observable features and a label, which is assumed to be fully observed. However, in scenarios such as hiring or lending, the labels (e.g., ability to repay the loan) are selective labels as they are only revealed based on positive decisions (e.g., when a loan is granted). In this paper, we study long-term fairness in the selective labels setting and analytically show that naive solutions do not guarantee fairness. To address this gap, we then introduce a novel framework that leverages both the observed data and a label predictor model to estimate the true fairness measure value by decomposing it into the observed fairness and bias from label predictions. This allows us to derive sufficient conditions to satisfy true fairness from observable quantities by using the confidence in the predictor model. Finally, we rely on our theoretical results to propose a novel reinforcement learning algorithm for effective long-term fair decision-making with selective labels. In semisynthetic environments, the proposed algorithm reached comparable fairness and performance to an agent with oracle access to the true labels.
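The decomposition into an observed part and a predictor-dependent part can be illustrated with a deliberately simplified example. Everything below is an assumption of this sketch rather than the paper's derivation: the fairness measure is a gap in qualification rates between two groups, true labels are used for accepted individuals, predicted probabilities fill in for rejected ones, and `eps` is a hypothetical bound on the predictor's mean absolute error over rejected individuals.

```python
def estimated_rate(records):
    """records: (accepted, observed_label_or_None, predicted_probability) tuples."""
    total = 0.0
    for accepted, label, probability in records:
        # Observed labels exist only for accepted individuals (selective labels).
        total += label if accepted else probability
    return total / len(records)


def slack(records, eps):
    """Worst-case rate error if the predictor errs by at most eps on rejected cases."""
    rejected = sum(1 for accepted, _, _ in records if not accepted)
    return rejected / len(records) * eps


def certified_fair(group_a, group_b, eps, tolerance):
    """Sufficient (not necessary) check that the true gap is within tolerance."""
    gap = abs(estimated_rate(group_a) - estimated_rate(group_b))
    return gap + slack(group_a, eps) + slack(group_b, eps) <= tolerance


# Hypothetical loan decisions for two groups.
group_a = [(True, 1, 0.9), (True, 0, 0.4), (False, None, 0.5), (False, None, 0.3)]
group_b = [(True, 1, 0.8), (True, 1, 0.7), (False, None, 0.2), (False, None, 0.2)]
print(estimated_rate(group_a), estimated_rate(group_b))
print(certified_fair(group_a, group_b, eps=0.1, tolerance=0.3))
```

The check follows from the triangle inequality: the true gap cannot exceed the estimated gap plus each group's worst-case estimation error, so a less trustworthy predictor (larger `eps`) makes the condition harder to satisfy.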

2605.22287 2026-05-22 cs.AI

SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules

SciCore-Mol: 通过可插拔分子认知模块增强大语言模型

Yuxuan Chen, Changwei Lv, Yunduo Xiao, Zhongjing Du, Daquan Zhou, Yukun Yan, Zheni Zeng, Zhiyuan Liu

Affiliations: School of Electronic and Computer Engineering, Peking University, Shenzhen, China; Tsinghua University, Beijing, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China

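AI summary: This paper proposes SciCore-Mol, a framework that uses three deeply integrated pluggable cognitive modules to bridge the semantic gap that large language models face when handling heterogeneous scientific data such as molecules, improving overall performance on molecular understanding, generation, reaction prediction, and chemistry knowledge.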

Comments 15 pages, 4 figures, 9 tables. Preprint

详情
AI中文摘要

大型语言模型(LLMs)是实现万物智能范式的中心,但处理异构科学数据如分子时面临根本性挑战:离散语言符号与拓扑分子或连续反应数据之间的固有差距导致文本推理中的信息丢失和语义噪声。我们提出了SciCore-Mol,一个模块化框架,通过三个深度集成的可插拔认知模块弥合这一差距:拓扑感知感知模块、基于潜在扩散的分子生成模块以及反应感知推理模块。每个模块通过学习的表示接口连接到LLM主干,使信息交换比仅使用文本工具反馈更丰富。我们在多样化的化学任务上的实验表明,SciCore-Mol在分子理解、生成、反应预测和一般化学知识方面实现了强大的综合性能,8B参数开源系统在多个维度上可与甚至超越专有大模型竞争。这项工作为通过解耦、可插拔和灵活编排的模块系统为LLM提供科学专业知识提供了系统蓝图,对药物设计、化学合成和更广泛的科学发现有直接意义。

英文摘要

Large Language Models (LLMs) are central to the one-for-all intelligent paradigm, but they face a fundamental challenge when dealing with heterogeneous scientific data such as molecules: the inherent gap between discrete linguistic symbols and topological molecular or continuous reaction data leads to significant information loss and semantic noise in text-based reasoning. We propose SciCore-Mol, a modular framework that bridges this gap through three deeply integrated pluggable cognitive modules: a topology-aware perception module, a latent diffusion-based molecular generation module, and a reaction-aware reasoning module. Each module is coupled to the LLM backbone through learned representation interfaces, enabling richer information exchange than is possible with text-only tool feedback. Our experiments on diverse chemical tasks demonstrate that SciCore-Mol achieves strong comprehensive performance across molecular understanding, generation, reaction prediction, and general chemistry knowledge, with an 8B-parameter open-source system that is competitive with and in several dimensions surpasses proprietary large models. This work provides a systematic blueprint for equipping LLMs with scientific expertise through decoupled, pluggable, and flexibly orchestrated modules, with direct implications for drug design, chemical synthesis, and broader scientific discovery.

2605.22286 2026-05-22 cs.LG cs.AI

EmoTrack: Robust Depression Tracking from Counseling Transcripts across Session Regimes

EmoTrack: 从咨询记录中跨会话制度实现稳健的抑郁跟踪

Zhaomin Wu, Jiayi Li, Bingsheng He

Affiliations: Department of Computer Science, National University of Singapore

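AI summary: This paper studies robust depression tracking from counseling transcripts across single-session and multi-session regimes. It introduces LongCounsel, a multi-session counseling dataset, and EmoTrack, a framework that combines LLM-extracted clinical signals with frozen turn-level semantic embeddings, trains symptom-specific predictors, and can incorporate prior sessions through a compact cross-session memory. Experiments show strong performance on the real single-session benchmark.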

详情
AI中文摘要

基于文本的咨询是人工智能心理健康支持的重要接口,其中记录可能用于监控抑郁严重程度并标记需要及时人工审查的会话。然而,跨会话制度实现稳健的PHQ-8预测仍然具有挑战性:基于微调的方法可以利用更丰富的监督但可能在数据稀缺时泛化能力差,而基于提示的LLM方法数据高效但通常将每个记录整体处理,对纵向上下文支持有限。我们研究了从咨询记录中跨单次会话和多会话制度进行稳健抑郁跟踪。我们引入了LongCounsel多会话咨询数据集,具有会话级PHQ-8监督,用于评估在部分症状披露和跨会话连续性下的重复会话跟踪。我们进一步提出了EmoTrack,一种PHQ-8预测框架,结合LLM提取的临床信号与冻结的轮次级语义嵌入,并在得到的记录表示上训练症状特定预测器。当先前会话可用时,EmoTrack可通过紧凑的跨会话记忆进一步结合它们。在LongCounsel和DAIC-WOZ上的实验表明,EmoTrack在真实单次会话基准上实现了明显优势,包括在最强DAIC-WOZ基线上的MAE相对减少13.5%,并在LongCounsel上与最强的纵向基线保持竞争力。

英文摘要

Text-based counseling is an important interface for AI mental-health support, where transcripts may be used to monitor depression severity and flag sessions requiring timely human review. However, robust PHQ-8 prediction across session regimes remains challenging: fine-tuning-based methods can exploit richer supervision but may generalize poorly under data scarcity, while prompt-based LLM methods are data-efficient but usually treat each transcript holistically and provide limited support for longitudinal context. We study robust depression tracking from counseling transcripts across single-session and multi-session regimes. We introduce LongCounsel, a multi-session counseling dataset with session-level PHQ-8 supervision for evaluating repeated-session tracking under partial symptom disclosure and cross-session continuity. We further propose EmoTrack, a PHQ-8 prediction framework that combines LLM-extracted clinical signals with frozen turn-level semantic embeddings and trains symptom-specific predictors over the resulting transcript representation. When prior sessions are available, EmoTrack can further incorporate them through compact cross-session memory. Experiments on LongCounsel and DAIC-WOZ show that EmoTrack achieves a clear gain on the real single-session benchmark, including a 13.5% relative MAE reduction over the strongest DAIC-WOZ baseline, and remains competitive with the strongest longitudinal baseline on LongCounsel.
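As background for the numbers above: the PHQ-8 has eight items, each scored 0 to 3, so totals range from 0 to 24. The sketch below shows one plausible way to turn per-symptom predictions into a total and to compute MAE and a relative MAE reduction. Summing clipped, rounded item predictions is an assumption of this sketch, and all numeric inputs (including the two MAE values) are invented rather than taken from the paper.

```python
def phq8_total(item_predictions):
    """Sum eight item scores after rounding and clipping each to the 0-3 range."""
    if len(item_predictions) != 8:
        raise ValueError("PHQ-8 has exactly eight items")
    return sum(min(3, max(0, round(p))) for p in item_predictions)


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction(baseline_mae, new_mae):
    """Fractional MAE improvement over a baseline."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical outputs of eight symptom-specific predictors for one session.
items = [0.2, 1.1, 2.8, 3.4, 0.0, 1.0, 2.0, 0.9]
total = phq8_total(items)

# Hypothetical session totals and MAE values chosen only to illustrate the arithmetic.
session_mae = mean_absolute_error([11, 5, 18], [12, 5, 15])
gain = relative_reduction(4.0, 3.46)
print(total, session_mae, round(gain * 100, 1))
```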

2605.22283 2026-05-22 cs.RO

Spatial Memory for Out-of-Vision Manipulation in Vision-Language-Action

视觉-语言-动作中视界外操作的空间记忆

Pengteng Li, Weiyu Guo, He Zhang, Tiefu Cai, Xiao He, Yandong Guo, Hui Xiong

Affiliations: Thrust of Artificial Intelligence, The Hong Kong University of Science

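AI summary: This paper proposes SOMA, a framework that addresses out-of-vision manipulation in vision-language-action models. By building a persistent spatial memory, it lets the model reason beyond its current field of view, improving task success rates and the quality of manipulation behavior.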

Comments Accepted by ICML 2026

详情
AI中文摘要

我们引入SOMA,即视觉-语言-动作(VLA)模型中的空间记忆框架,用于视界外操作。现有VLAs通常隐式假设任务相关物体始终可见,当目标物体处于相机视野外时,会导致行为脆弱和反应迟钝。SOMA通过为VLAs配备由移动头部相机获取的多视角观察构建的持久空间记忆,解决了这一限制。该框架包含三个组件:空间记忆构建,通过扫描将角度方向的观察聚合为统一的空间-语义表示;动态记忆细化,保持时间上的全局一致性;以及情境记忆检索,激活操作过程中与指令相关的空间线索。我们评估SOMA在五个具有挑战性的现实世界视界外操作任务上,包括多步骤和双臂场景,其中目标物体最初不可见。实验结果表明,SOMA不仅提高了任务成功率,还诱导了质不同操作行为,具有更快的目标定位、减少视角搜索和近似单次抓取在部分可观察性条件下。在RoboCasa GR1和SimplerEnv上的额外实验进一步验证了SOMA记忆设计在传统完全可观察设置下的有效性。代码将很快发布。

英文摘要

We introduce SOMA, the Spatial Memory framework for Out-of-Vision Manipulation in Vision-Language-Action (VLA) models. Most existing VLAs implicitly assume that task-relevant objects are always visible, leading to brittle and reactive behaviors when targets fall outside the camera's field of view. SOMA addresses this limitation by equipping VLAs with a persistent spatial memory constructed from multi-view observations acquired via a movable head camera, enabling reasoning beyond the current visual frustum. The framework consists of three components: Spatial Memory Construction, which aggregates angular-wise observations into a unified spatial-semantic representation through scanning; Dynamic Memory Refinement, which maintains global consistency over time; and Contextual Memory Retrieval, which activates instruction-relevant spatial cues during manipulation. We evaluate SOMA on five challenging real-world out-of-vision manipulation tasks, including multi-step and dual-arm scenarios where target objects are initially invisible. Experimental results show that SOMA not only improves task success rates, but also induces qualitatively different manipulation behaviors, with faster target localization, reduced viewpoint search, and near one-shot grasping under partial observability. Additional experiments on RoboCasa GR1 and SimplerEnv further validate the effectiveness of SOMA's memory design under conventional fully observable settings. Code will be released soon.
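The three components named above (construction, refinement, retrieval) can be pictured with a toy data structure. This is a loose sketch under strong assumptions, not SOMA's design: memory is a dictionary keyed by pan-angle bins, refinement simply overwrites a bin with the latest view, and retrieval is keyword matching against the instruction. The class and method names are hypothetical.

```python
class SpatialMemory:
    """Toy angular-bin memory of object labels seen by a panning head camera."""

    def __init__(self, bin_deg=30):
        self.bin_deg = bin_deg
        self.cells = {}  # bin index -> {label: step at which it was last seen}

    def _bin(self, pan_deg):
        return int((pan_deg % 360) // self.bin_deg)

    def observe(self, pan_deg, labels, step):
        # Construction and refinement in one step: the newest view of a bin
        # replaces whatever was stored there before.
        self.cells[self._bin(pan_deg)] = {label: step for label in labels}

    def retrieve(self, instruction):
        """Return (label, bin-center angle) for stored objects named in the instruction."""
        words = set(instruction.lower().split())
        hits = []
        for index, objects in self.cells.items():
            for label in objects:
                if label in words:
                    hits.append((label, index * self.bin_deg + self.bin_deg / 2))
        return hits


memory = SpatialMemory()
memory.observe(10, ["cup"], step=0)
memory.observe(200, ["drawer", "sponge"], step=1)
targets = memory.retrieve("pick up the sponge")
print(targets)
```

Even in this toy form, the benefit is visible: once the sponge has been seen during scanning, the agent knows which direction to turn without searching again.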

2605.22275 2026-05-22 cs.LG

Adaptive Measurement Allocation for Learning Kernelized SVMs Under Noisy Observations

适应性测量分配用于在噪声观测下学习核化SVM

Artur Miroszewski

Affiliations: Φ-lab, European Space Agency (ESA/ESRIN), Frascati, Italy

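AI summary: This paper proposes an adaptive measurement-allocation strategy for learning kernelized support vector machines from noisy observations. By combining geometric sensitivity with active-set instability, it concentrates measurements on the decision-critical regions of the kernel matrix, improving support-vector recovery, margin estimation, and decision-function accuracy.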

Comments 20 pages, 9 figures

详情
AI中文摘要

核方法通常是在假设能够精确获取Gram矩阵的情况下进行建模的。然而,在新兴领域如量子机器学习中,每个核元素必须从噪声观测中推断出来,其准确性取决于如何分配有限的测量预算。尽管如此,现有方法大多依赖于均匀分配,这虽然平等地降低了估计方差,但忽略了核化分类器对Gram矩阵的高度非均匀依赖。在本文中,我们提出了一种适应性测量分配策略,用于从噪声伯努利观测中学习核化支持向量机。我们的方法结合了两个互补原则:(i) 几何敏感性,捕捉单个核元素扰动对分类器边距的影响,以及 (ii) 主动集不稳定性,量化由测量噪声引起的支持向量成员身份的离散变化概率。这些信号定义了一个任务感知的分配方案,将测量集中在核矩阵中最关键的决策区域。我们提供了理论分析,表明适应性分配的益处由诱导核重要结构的异质性决定,导致在不同情况下适应性或均匀策略更优。在合成数据集上的实验证明,在固定测量预算下,适应性分配显著提高了支持向量恢复、边距估计和决策函数准确性。双系数稳定性准则进一步使早停成为可能,仅使用少量测量成本即可达到近最优性能。此外,在从真实数据导出的量子核上的额外实验揭示了与已知现象如核集中度相一致的领域依赖行为。

英文摘要

Kernel methods are typically formulated under the assumption of exact, noise-free access to the Gram matrix. However, in emerging settings such as quantum machine learning, each kernel entry must be inferred from noisy observations, and its accuracy depends on how a limited measurement budget is allocated. Despite this, existing approaches overwhelmingly rely on uniform allocation, which equalizes estimator variance but ignores the highly non-uniform dependence of kernelized classifiers on the Gram matrix. In this work, we introduce an adaptive measurement-allocation strategy for learning kernelized Support Vector Machines (SVMs) from noisy Bernoulli observations. Our approach combines two complementary principles: (i) geometric sensitivity, capturing how perturbations of individual kernel entries affect the classifier margin, and (ii) active-set instability, quantifying the probability of discrete changes in support-vector membership induced by measurement noise. These signals define a task-aware allocation scheme that concentrates measurements on the most decision-critical regions of the kernel matrix. We provide a theoretical analysis showing that the benefit of adaptive allocation is governed by the heterogeneity of the induced kernel importance structure, leading to distinct regimes in which adaptive or uniform strategies are preferable. Empirical evaluations on synthetic datasets demonstrate that adaptive allocation significantly improves support-vector recovery, margin estimation, and decision-function accuracy under fixed measurement budgets. A dual-coefficient stability criterion further enables early stopping, achieving near-optimal performance while using only a fraction of the measurement cost. Additional experiments on quantum kernels derived from real-world data reveal a regime-dependent behavior aligned with known phenomena such as kernel concentration. Together...
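The contrast between uniform and task-aware allocation can be sketched in a few lines. The code below is a simplified illustration, not the paper's algorithm: the importance weights are supplied by hand instead of being derived from geometric sensitivity or active-set instability, and the proportional rule with a per-entry floor is an assumption of this sketch.

```python
import math


def allocate(importance, budget, floor=1):
    """Split `budget` measurements across kernel entries in proportion to importance."""
    n = len(importance)
    if budget < floor * n:
        raise ValueError("budget too small for the per-entry floor")
    weights = importance if sum(importance) > 0 else [1] * n
    total = sum(weights)
    remaining = budget - floor * n
    shots = [floor + int(remaining * w / total) for w in weights]
    # Hand out shots lost to rounding, most important entries first.
    leftover = max(0, budget - sum(shots))
    for i in sorted(range(n), key=lambda i: -weights[i])[:leftover]:
        shots[i] += 1
    return shots


def bernoulli_std_error(p, shots):
    """Standard error of a Bernoulli mean estimated from `shots` samples."""
    return math.sqrt(p * (1 - p) / shots)


# Hypothetical importance scores for four kernel entries.
adaptive = allocate([4, 1, 1, 0], budget=100, floor=10)
uniform = allocate([1, 1, 1, 1], budget=100, floor=10)
print(adaptive, uniform)
print(bernoulli_std_error(0.5, adaptive[0]), bernoulli_std_error(0.5, uniform[0]))
```

When all weights are equal the two schemes coincide, which matches the abstract's point that the benefit of adaptivity depends on how heterogeneous the importance structure is.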

2605.22269 2026-05-22 cs.CV cs.AI cs.MM

MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

MuKV:多粒度KV缓存压缩用于长流视频问答

Junbin Xiao, Jiajun Chen, Tianxiang Sun, Xun Yang, Angela Yao

Affiliations: University of Science and Technology of China; National University of Singapore

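AI summary: This paper proposes MuKV, a multi-grained KV cache compression method that, together with a semi-hierarchical retrieval approach, improves the efficiency and accuracy of long streaming video question answering. Experiments show that it outperforms baselines in answer accuracy, memory usage, and online QA efficiency.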

Comments To appear at CVPR'26. Code is available at https://github.com/IMBALDY/MuKV

详情
AI中文摘要

长流视频问答仍面临挑战,由于视觉token数量增加和大语言模型(LLM)推理长度有限。KV缓存通过LLM预填充存储历史token的Key-Value(KV),从而实现更高效的流式问答。然而,现有方法缓存每个或每两个帧,导致内存使用冗余并丢失帧内或跨帧的细粒度空间细节。本文提出MuKV,一种具有多粒度KV缓存压缩模块和半分层检索方法的方法,以提高长流视频问答的效率和准确性。对于离线KV缓存,MuKV在patch、frame和segment级别提取视觉表示。多个粒度层次保留了局部线索和全局时间上下文,同时通过自注意力和频率引导的双信号token压缩机制保持效率。对于在线问答,MuKV设计了一种半分层检索方法以检索相关KV缓存用于答案生成。在长流视频问答基准测试中,MuKV显著提高了答案准确率,而无需牺牲内存和在线问答效率。此外,我们的压缩机制本身在答案准确率、内存和问答效率方面均对基线方法带来了持续的改进,展示了高度有效的贡献。

英文摘要

Long streaming video QA remains challenging due to the growing number of visual tokens and the limited reasoning length of large language models (LLMs). KV caching stores the Key-Value (KV) pairs of historical tokens via LLM prefill and enables more efficient streaming QA. However, existing methods cache every one or two frames, causing redundant memory usage and losing fine-grained spatial details within frames or temporal context across frames. This paper proposes MuKV, a method that features a multi-grained KV cache compression module and a semi-hierarchical retrieval approach to improve both efficiency and accuracy for long streaming VideoQA. For the offline KV cache, MuKV extracts visual representations at the patch, frame, and segment levels. The multiple levels of granularity preserve both local cues and global temporal context, while maintaining efficiency with a dual-signal token compression mechanism guided by self-attention and frequency. For online QA, MuKV designs a semi-hierarchical retrieval method to retrieve relevant KV caches for answer generation. Experiments on long-streaming VideoQA benchmarks show that MuKV significantly improves answer accuracy without sacrificing memory or online QA efficiency. Moreover, the compression mechanism alone brings consistent benefits in answer accuracy, memory, and QA efficiency over baselines, showing that it is an effective contribution in its own right.
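One way to picture retrieval over a multi-grained cache is a coarse-to-fine lookup: rank segments first, then rank frames inside the best segments. The sketch below implements only that generic two-stage idea on toy vectors. Whether MuKV's semi-hierarchical retrieval works this way is not stated in the abstract, so treat the structure, the dot-product scoring, and all names as assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, segments, top_segments=1, top_frames=2):
    """Pick the best segments by their summary key, then the best frames inside them."""
    ranked = sorted(segments, key=lambda s: dot(query, s["key"]), reverse=True)
    frames = [frame for s in ranked[:top_segments] for frame in s["frames"]]
    frames.sort(key=lambda frame: dot(query, frame[1]), reverse=True)
    return [frame_id for frame_id, _ in frames[:top_frames]]


# Hypothetical cache: each segment has a summary key and per-frame keys.
cache = [
    {"key": [0.9, 0.1], "frames": [("a0", [1.0, 0.0]), ("a1", [0.2, 0.8]), ("a2", [0.7, 0.3])]},
    {"key": [0.1, 0.9], "frames": [("b0", [0.95, 0.0])]},
]
selected = retrieve([1.0, 0.0], cache)
print(selected)
```

Note the trade-off this toy exposes: frame `b0` matches the query well but is never scored, because its segment was pruned in the first stage.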

2605.22266 2026-05-22 cs.LG cs.AI

Detecting Atypical Clients in Federated Learning via Representation-Level Divergence

通过表示层面的分歧检测联邦学习中的非典型客户端

Cristian Pérez-Corral, Jose I. Mestre, Alberto Fernández-Hernández, Manuel F. Dolz, Enrique S. Quintana-Ortí

Affiliations: Universitat Politècnica de València; Universitat Jaume I

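AI summary: This paper proposes a lightweight geometric signal that quantifies the functional deviation between a client and the global model in order to detect atypical clients in federated learning. By measuring changes in the activation-induced partition of the input space, it distinguishes stable but heterogeneous clients from clients that diverge significantly from the global regime.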

详情
AI中文摘要

联邦学习使分布式客户端在异质数据上进行协作训练,但这种异质性常常导致更新不稳定和全局性能下降。此外,在实际部署中,客户端更新可能偏离预期行为,不仅由于良性非独立同分布的数据分布,还由于分布偏移或异常输入,这引发了对聚合过程可靠性的担忧。在本工作中,我们提出了一种轻量级的几何信号来量化客户端相对于全局模型的功能偏差。与比较模型参数或梯度不同,我们的方法衡量每个客户端本地训练如何改变激活诱导的输入空间分区,该评估基于共享的探测集。这产生了一个置换不变、可解释的客户端-全局分歧度量,捕捉了模型处理数据方式的差异。我们展示该信号能有效识别导致非典型功能变化的客户端,区分稳定但异质的客户端与那些更新显著偏离全局范式的客户端。因此,所提出的度量提供了一个简单的工具用于监控客户端行为,并在联邦学习系统中实现风险感知的聚合策略。

英文摘要

Federated learning enables collaborative training across distributed clients with heterogeneous data, but such heterogeneity often leads to unstable updates and degraded global performance. Moreover, in practical deployments, client updates may deviate from the expected behavior not only due to benign non-i.i.d. distributions, but also due to distributional shifts or anomalous inputs, raising concerns about the reliability of the aggregation process. In this work, we propose a lightweight geometric signal to quantify the functional deviation of a client with respect to the global model. Instead of comparing model parameters or gradients, our approach measures how the local training of each client alters the activation-induced partition of the input space, evaluated on a shared probe set. This yields a permutation-invariant, interpretable metric of client-global divergence that captures differences in how data is processed by the model. We show that this signal effectively identifies clients that induce atypical functional changes, distinguishing stable yet heterogeneous clients from those whose updates significantly diverge from the global regime. As a result, the proposed metric provides a simple tool for monitoring client behavior and enabling risk-aware aggregation strategies in federated learning systems.
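A small sketch may help make "activation-induced partition" concrete. For a ReLU layer, each input switches some units on and others off, and inputs with the same on/off pattern fall in the same cell of the partition. The code below compares two models by asking, for every pair of probe inputs, whether both models agree on the pair sharing a cell. This pairwise disagreement score is an assumption chosen for illustration (it is unaffected by reordering units); the paper's actual metric is not specified in the abstract, and the single-layer models are toy examples.

```python
from itertools import combinations


def relu_pattern(weights, biases, x):
    """On/off state of each unit of one ReLU layer for input x."""
    return tuple(
        sum(w * v for w, v in zip(row, x)) + b > 0
        for row, b in zip(weights, biases)
    )


def partition_divergence(model_a, model_b, probes):
    """Fraction of probe pairs on which the two models disagree about sharing a cell."""
    patterns_a = [relu_pattern(*model_a, x) for x in probes]
    patterns_b = [relu_pattern(*model_b, x) for x in probes]
    pairs = list(combinations(range(len(probes)), 2))
    disagreements = sum(
        (patterns_a[i] == patterns_a[j]) != (patterns_b[i] == patterns_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# Hypothetical models given as (weights, biases) and a shared probe set.
global_model = ([[1, 0], [0, 1]], [0, 0])
permuted_client = ([[0, 1], [1, 0]], [0, 0])   # same function, units swapped
drifted_client = ([[1, 0], [0, 1]], [-1.5, 0])  # one unit's threshold has shifted
probes = [(1, 1), (2, 1), (-1, 1), (-1, -1)]

print(partition_divergence(global_model, permuted_client, probes))
print(partition_divergence(global_model, drifted_client, probes))
```

The permuted client scores zero because swapping units does not change which inputs share a cell, while the drifted client regroups the probes and therefore receives a positive score.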

2605.22263 2026-05-22 cs.LG cs.AI

Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

Hongbin Zhang, Chaozheng Wang, Kehai Chen, Youcheng Pan, Yang Xiang, Jinpeng Wang, Min Zhang

Affiliations Institute of Computing and Intelligence, Harbin Institute of Technology; Peng Cheng Laboratory; Keeta AI, Meituan

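AI summary This paper proposes Direction-Adaptive Self-Distillation (DASD), which improves LLM reasoning through entropy-routed directional supervision. Its analysis finds that uniform teacher supervision suppresses exploration, and DASD achieves the best results across six mathematical reasoning benchmarks.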

Comments Under Review

Abstract

On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token-level supervision on its own rollouts. However, recent studies show that OPSD degrades complex reasoning by suppressing predictive uncertainty, which supports exploration and hypothesis revision. Our token-level analysis shows that this failure arises from applying a uniform direction of teacher supervision across tokens with different uncertainty levels: conformity to the privileged self-teacher suppresses exploration at high entropy, while deviation from the teacher degrades step accuracy at low entropy. Accordingly, we propose Direction-Adaptive Self-Distillation (DASD), which reframes privileged self-distillation from uniform teacher imitation into entropy-routed directional supervision: high-entropy tokens are pushed away from the privileged teacher to preserve exploration, while low-entropy tokens are pulled toward the teacher to stabilize step-level execution. Across six mathematical reasoning benchmarks, DASD achieves the best macro Avg@16 over strong RLVR and self-distillation baselines. Pass@k, reasoning-health, and generalization analyses show that these average gains come from preserving exploration without sacrificing step-level execution.
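
The routing rule described above can be illustrated with a small sketch. This is not the DASD objective, whose exact form the abstract does not state. It assumes that routing uses the student's own token entropy, that a single fixed threshold separates "low" from "high" entropy, and that the supervision term is KL(teacher || student) with its sign flipped for high-entropy tokens. All names and numbers are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def routed_loss(student_probs, teacher_probs, threshold):
    """Mean signed KL term: pull low-entropy tokens, push high-entropy tokens."""
    total = 0.0
    directions = []
    for s, t in zip(student_probs, teacher_probs):
        pull = entropy(s) < threshold      # assumption: route on student entropy
        sign = 1.0 if pull else -1.0       # +KL moves toward the teacher, -KL away
        total += sign * kl(t, s)
        directions.append("pull" if pull else "push")
    return total / len(student_probs), directions

# Two illustrative token positions over a three-word vocabulary.
student = [[0.97, 0.02, 0.01], [0.40, 0.35, 0.25]]
teacher = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]]
loss, directions = routed_loss(student, teacher, threshold=0.5)
```

Minimizing this quantity shrinks the KL term at the confident first position and enlarges it at the uncertain second one. The negated term is unbounded, so a practical version would presumably need clipping or weighting that this sketch omits.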

2605.22262 2026-05-22 cs.SD cs.LG eess.AS

Automatic Contextual Audio Denoising

Diep Luong, Konstantinos Drossos, Mikko Heikkinen, Tuomas Virtanen

Affiliations Tampere University; Nokia

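AI summary This paper proposes an automatic contextual audio denoising method that infers the acoustic scene class to separate relevant from irrelevant sound components, thereby improving denoising.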

Abstract

Audio context determines which sound components and sources are relevant and which can be perceived as irrelevant (noise) by listeners. For example, traffic noise is informative in urban surveillance but is noise for a phone call at the same location. Most current audio denoising systems apply fixed target-noise definitions, often removing useful components in one context while failing to suppress irrelevant components. To address this, we introduce the concept of automatic contextual audio denoising (ACAD), which defines target and noise based on the inferred context. In this work, we restrict context to be associated with an acoustic scene class. We label sound events outside the event distribution of a scene class (noise) as out-of-context (OC) and events typical for that scene as in-context (IC). We implement a deep learning method that automatically infers the context of the audio signal and removes OC components, and benchmark it against three variants: without context inference, with oracle context, and with separately provided uninformative context. On paired clean/noisy data across diverse contexts, where OC components in one context may be IC in another, our proposed method outperforms the other approaches across standard objective metrics, indicating that the model can infer context and that context-dependent processing can enhance denoising.

2605.22259 2026-05-22 cs.LG cs.CV cs.RO

An Evidence Hierarchy for Bayesian Object Classification via OSINT-Aided Heterogeneous Sensor Fusion

Jan Nausner, Michael Hubner

Affiliations Center for Digital Safety & Security, Austrian Institute of Technology GmbH (AIT)

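AI summary This paper proposes an OSINT-aided heterogeneous sensor fusion method that establishes a new evidence hierarchy and combines contextual information with domain knowledge to improve the classification of CBRNE threats. Experimental results show that the method is robust to clutter and prior mismatch, with classification accuracy of up to 95%.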

Comments 6 pages, 1 figure; © 2026 The Authors. Submitted to the 2026 IEEE International Conference on Multisensor Fusion and Integration (MFI 2026). Under review

Abstract

Heterogeneous sensor fusion is vital for detecting, localizing, and classifying CBRNE threats. However, individual sensors are often only capable of detecting a subset of relevant threats with varying reliability, or can even provide only indirect threat indications, making threat classification challenging. Furthermore, high clutter rates on the sensor side present a great challenge for fusion systems. Additionally, the limited availability of high-quality datasets hinders the advancement of learning-based detection and classification models in smart sensors. To mitigate these sensor-related shortcomings, a context-aware and domain knowledge-enhanced fusion process is proposed. First, a novel evidence hierarchy is established that enables modeling of direct, indicative, and contextual information. Second, contextual information about the environment is introduced into the fusion process by collecting, processing, and exploiting OSINT inputs. Third, all levels of the evidence hierarchy are used to craft a Bayesian threat type classification mechanism with domain knowledge-informed priors. The proposed methodology is evaluated in simulated scenarios, and the results demonstrate the benefit of the proposed fusion approach in terms of robustness to clutter and prior mismatch, with an overall classification accuracy of up to 95%.
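
To make the Bayesian step concrete, here is a minimal sketch of how a context-informed prior might be combined with direct and indicative evidence. It is a plain naive-Bayes update that treats the evidence sources as conditionally independent given the threat type, which is an assumption of this example and not a description of the paper's model. The class names, the prior, and every likelihood value are invented for illustration.

```python
def posterior(prior, likelihoods):
    """Bayes update: posterior is proportional to prior times each evidence likelihood."""
    unnormalized = dict(prior)
    for likelihood in likelihoods:
        for threat_class in unnormalized:
            unnormalized[threat_class] *= likelihood[threat_class]
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("evidence is impossible under every class")
    return {c: v / evidence for c, v in unnormalized.items()}

# Hypothetical prior, standing in for context derived from OSINT inputs.
context_prior = {"chemical": 0.2, "radiological": 0.1, "benign": 0.7}

# Hypothetical P(observation | class) for one direct and one indicative sensor reading.
direct_evidence = {"chemical": 0.8, "radiological": 0.1, "benign": 0.05}
indicative_evidence = {"chemical": 0.6, "radiological": 0.3, "benign": 0.2}

belief = posterior(context_prior, [direct_evidence, indicative_evidence])
most_likely = max(belief, key=belief.get)
```

In this toy setting the prior favors "benign", yet two moderately informative readings are enough to shift the posterior toward "chemical". A mismatched prior would shift the starting point of the same update, which is the kind of sensitivity the abstract's robustness evaluation refers to.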

2605.22258 2026-05-22 cs.CL

Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting

Jingyi Kang, Junyu Lu, Bo Xu, Hongbo Wang, Linlin Zong, Roy Ka-Wei Lee, Hongfei Lin

Affiliations Dalian University of Technology; The University of Tokyo; Singapore University of Technology and Design

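AI summary This study proposes CITA, a framework for Chinese toxicity attacks that generates attack samples through implicit enhancement and obfuscation rewriting. It exposes the weaknesses of existing detectors in identifying implicitly toxic content and shows that training a defense model on such data improves robustness.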

Comments 16 pages, 5 figures

Abstract

Large language models (LLMs) require robust toxicity evaluation beyond explicit wording. This setting remains underexplored in Chinese, where toxicity may combine semantic indirectness with surface obfuscation. We introduce Chinese Implicit Toxicity Attack (CITA), a controlled red-team evaluation and defense-data generation framework, not a deployable evasion tool. CITA uses three stages: (i) Harmful Intent Learning, (ii) Implicit Toxicity Enhancement, and (iii) Obfuscation Variant Rewriting, to preserve harmful intent, increase implicitness, and add controlled surface variants. On CITA-generated evaluation samples, the seven tested detectors exhibit substantial missed-detection risks, reaching an average ASR of 69.48%; human evaluation further confirms preserved harmfulness and increased implicitness/evasiveness. As a downstream defense application, we fine-tune a Chinese Implicit Toxicity Defense model (CITD) with CITA-generated red-team data, showing that such data can improve robustness through additional training.