arXivDaily: arXiv daily digest, updated Monday to Friday
2501.12709 2026-06-17 quant-ph cs.AI cs.CR cs.DC

Experimentally validated quantum-secure federated learning over a multi-user quantum network

Zhi-Ping Liu, Xiao-Yu Cao, Hao-Wen Liu, Xiao-Ran Sun, Yu Bao, Jian-Yu Shen, Yu-Shuo Lu, Hua-Lei Yin, Zeng-Bing Chen

Affiliations: National Laboratory of Solid State Microstructures and School of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China; School of Physics and Key Laboratory of Quantum State Construction and Manipulation (Ministry of Education), Renmin University of China, Beijing 100872, China

AI summary: This paper proposes the QuNetQFL protocol, which masks local model updates with distributed quantum keys to achieve information-theoretically secure aggregation. The protocol is validated experimentally on a four-client quantum network, improves classification accuracy, and shows scalability in language tasks and large-scale simulations.

Comments 25 pages, 7 figures, 7 tables, Accepted by Research

Journal ref
Research 9, 1299 (2026)

Abstract

Federated learning enables decentralized, privacy-preserving training but remains vulnerable to privacy leakage in the quantum era. Quantum federated learning (QFL) offers a promising path towards enhanced security and efficiency. However, a practical and experimentally validated QFL protocol utilizing near-term quantum techniques to address data privacy has been lacking. Here we present QuNetQFL, a QFL protocol implemented on quantum networks, in which local model updates are masked with distributed quantum secret keys, offering information-theoretic security during aggregation. We experimentally validate the protocol on a four-client quantum network and benchmark its performance using the generated keys on quantum and real-world datasets. Adding a single quantum client significantly improves global accuracy for classifying multipartite entangled and non-stabilizer quantum datasets. For language tasks, we apply QuNetQFL to sentiment analysis by federated fine-tuning of a hybrid classical-quantum language model, achieving comparable and robust performance in simulation and on real quantum hardware. Large-scale simulations further demonstrate scalability to 200 clients for handwritten-digit recognition, with rapid convergence and a $75\%$ reduction in communication cost via model compression. Our work establishes a practical and scalable route to quantum-secure federated learning for the emerging quantum internet.
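
As a rough illustration of the masking idea described in the abstract, the sketch below sums integer-quantized client updates under pairwise additive masks. It is not the QuNetQFL protocol itself: the key layout (one shared key vector per client pair), the modulus, and the names `pairwise_keys`, `mask_update`, `aggregate`, and `MODULUS` are all assumptions, and Python's `secrets` module is only a classical stand-in for keys that the paper distributes over a quantum network.

```python
import secrets

MODULUS = 2**32  # assumed ring size for integer-quantized updates


def pairwise_keys(n_clients, dim):
    """One random key vector per unordered client pair (classical stand-in for shared secret keys)."""
    return {
        (i, j): [secrets.randbelow(MODULUS) for _ in range(dim)]
        for i in range(n_clients)
        for j in range(i + 1, n_clients)
    }


def mask_update(client, update, keys, n_clients):
    """Add keys shared with higher-indexed peers and subtract keys shared with lower-indexed peers."""
    masked = list(update)
    for peer in range(n_clients):
        if peer == client:
            continue
        key = keys[(min(client, peer), max(client, peer))]
        sign = 1 if client < peer else -1
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, key)]
    return masked


def aggregate(masked_updates):
    """Sum the masked updates; each pairwise key enters once with + and once with -, so it cancels."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Four clients, matching the network size in the abstract; the update values are made up.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [5, 5, 5]]
keys = pairwise_keys(len(updates), dim=3)
masked = [mask_update(i, u, keys, len(updates)) for i, u in enumerate(updates)]
total = aggregate(masked)
```

The aggregator only ever sees `masked`, yet `total` equals the element-wise sum of the raw updates, because every key is added by one client of a pair and subtracted by the other.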

2604.13662 2026-06-17 cond-mat.mes-hall cs.CV cs.LG

Automatic Charge State Tuning of 300 mm FDSOI Quantum Dots Using Neural Network Segmentation of Charge Stability Diagram

Peter Samaha, Amine Torki, Ysaline Renaud, Sam Fiette, Emmanuel Chanrion, Pierre-Andre Mortemousque, Yann Beilliard

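Affiliations: CEA-Leti, Univ. Grenoble Alpes

AI summary: This paper proposes a deep-learning semantic-segmentation pipeline that automates quantum-dot charge tuning by identifying transition lines in charge stability diagrams, improving the efficiency of high-throughput charge tuning for silicon quantum-dot qubits.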

Comments 10 pages, 6 figures, supplementary materials available

Abstract

Tuning of gate-defined semiconductor quantum dots (QDs) is a major bottleneck for scaling spin qubit technologies. We present a deep learning (DL) driven semantic-segmentation pipeline that performs charge auto-tuning by locating transition lines in full charge stability diagrams (CSDs) and returns gate voltage targets for the single-charge regime. We assemble and manually annotate a large, heterogeneous dataset of 1015 experimental CSDs measured from silicon QD devices, spanning nine design geometries, multiple wafers, and fabrication runs. A U-Net style convolutional neural network (CNN) with a MobileNetV2 encoder is trained and validated through five-fold group cross-validation. Our model achieves an overall offline tuning success of 80.0% in locating the single-charge regime, with peak performance exceeding 88% for some designs. We analyze dominant failure modes and propose targeted mitigations. Finally, wide-range diagram segmentation also naturally enables scalable physics-based feature extraction that can feed back to fabrication and design workflows, and we outline a roadmap for real-time integration in a cryogenic wafer prober. Overall, our results show that neural network (NN) based wide-diagram segmentation is a practical step toward automated, high-throughput charge tuning for silicon QD qubits.
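
The five-fold group cross-validation mentioned in the abstract can be pictured with the minimal sketch below, which keeps all samples of one group in the same fold so that no group appears in both training and validation. The abstract does not say what defines a group; the wafer-style labels, the round-robin fold assignment, and the name `group_k_fold` are assumptions for illustration only.

```python
def group_k_fold(groups, n_splits=5):
    """Yield (train_indices, val_indices) such that no group is split across the two sets."""
    unique = sorted(set(groups))
    if len(unique) < n_splits:
        raise ValueError("need at least as many groups as folds")
    # Round-robin assignment of whole groups to folds (a simple, unbalanced heuristic).
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, val


# Hypothetical group labels, one per charge stability diagram.
groups = ["w1", "w1", "w2", "w3", "w3", "w4", "w5", "w5", "w6", "w7"]
splits = list(group_k_fold(groups))
```

Splitting by group rather than by sample avoids an optimistic estimate when diagrams from the same group look alike.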

2506.07917 2026-06-17 cs.GR cs.CV

SpeeDe3DGS: Speedy Deformable 3D Gaussian Splatting with Temporal Pruning and Motion Grouping

Allen Tu, Haiyang Ying, Alex Hanson, Yonghan Lee, Tom Goldstein, Matthias Zwicker

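Affiliations: University of Maryland, College Park

AI summary: This paper proposes SpeeDe3DGS, which uses Temporal Sensitivity Pruning, Temporal Sensitivity Sampling, and the GroupFlow module to substantially improve the rendering and training efficiency of 3DGS while maintaining high-quality reconstruction.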

Comments Project Page: https://speede3dgs.github.io/

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 26083-26093

Abstract

Dynamic extensions of 3D Gaussian Splatting (3DGS) achieve high-quality reconstructions through neural motion fields, but per-Gaussian neural inference makes these models computationally expensive. Building on DeformableGS, we introduce Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), which bridges this efficiency-fidelity gap through three complementary modules: Temporal Sensitivity Pruning (TSP) removes low-impact Gaussians via temporally aggregated sensitivity analysis, Temporal Sensitivity Sampling (TSS) perturbs timestamps to suppress floaters and improve temporal coherence, and GroupFlow distills the learned deformation field into shared SE(3) transformations for efficient groupwise motion. On the 50 dynamic scenes in MonoDyGauBench, integrating TSP and TSS into DeformableGS accelerates rendering by 6.78$\times$ on average while maintaining neural-field fidelity and using 10$\times$ fewer primitives. Adding GroupFlow culminates in 13.71$\times$ faster rendering and 2.53$\times$ shorter training, surpassing all baselines in speed while preserving superior image quality.
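
To make the notion of shared SE(3) transformations concrete, the sketch below moves every Gaussian center in one group with a single rotation and translation instead of a per-Gaussian network query. It is only an illustration of groupwise rigid motion: how GroupFlow forms groups or fits transforms is not described in the abstract, orientation and covariance updates are omitted, and the helper names (`rot_z`, `se3_apply`, `move_group`) are hypothetical.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix for a rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def se3_apply(rotation, translation, point):
    """Rigid transform p -> R p + t."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def move_group(centers, rotation, translation):
    """Move all centers of one group with the same (R, t)."""
    return [se3_apply(rotation, translation, p) for p in centers]


centers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = move_group(centers, rot_z(math.pi / 2), (0.0, 0.0, 1.0))
```

Because the transform is rigid, distances between centers within the group are preserved, and the per-frame cost is one small matrix product per Gaussian.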

2603.19801 2026-06-17 eess.IV cs.AI cs.CV

Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive

Robin Spanier, Thorsten Hoeser, John Truckenbrodt, Felix Bachofer, Claudia Kuenzer

Affiliations: German Remote Sensing Data Center, Earth Observation Center, EOC of the German Aerospace Center, DLR; Institute for Geography and Geology, Department of Remote Sensing, University of Würzburg

AI summary: Using Sentinel-1 data and deep learning, this paper studies offshore platform dynamics in the North Sea, the Gulf of Mexico, and the Persian Gulf, revealing changes in platform numbers and a structural shift in the sector, and providing data support for monitoring marine infrastructure.

Comments 16 pages, 10 figures, 1 table

Journal ref
Big Earth Data, 2026, 1-27

Abstract

The increasing use of marine spaces by offshore infrastructure, including oil and gas platforms, underscores the need for consistent, scalable monitoring. Offshore development has economic, environmental, and regulatory implications, yet maritime areas remain difficult to monitor systematically due to their inaccessibility and spatial extent. This study presents an automated approach to the spatiotemporal detection of offshore oil and gas platforms based on freely available Earth observation data. Leveraging Sentinel-1 archive data and deep learning-based object detection, a consistent quarterly time series of platform locations was created for the period 2017-2025 for three major production regions: the North Sea, the Gulf of Mexico, and the Persian Gulf. In addition, platform size, water depth, distance to the coast, national affiliation, and installation and decommissioning dates were derived. In 2025, 3,728 offshore platforms were identified: 356 in the North Sea, 1,641 in the Gulf of Mexico, and 1,731 in the Persian Gulf. While expansion was observed in the Persian Gulf until 2024, the Gulf of Mexico and the North Sea saw a decline in platform numbers from 2018 to 2020. At the same time, a pronounced dynamic was apparent. More than 2,700 platforms were installed or relocated to new sites, while a comparable number were decommissioned or relocated. Furthermore, the increasing number of platforms with short lifespans points to a structural change in the offshore sector associated with the growing importance of mobile offshore units such as jack-ups or drillships. The results highlight the potential of freely available Earth observation data and deep learning for consistent, long-term monitoring of marine infrastructure. The derived dataset is public and provides a basis for offshore monitoring, maritime planning, and analyses of the transformation of the offshore energy sector.

2602.00473 2026-06-17 quant-ph cs.AI cs.LG

Quantum Phase Recognition via Quantum Attention Mechanism

Jin-Long Chen, Xin Li, Zhang-Qi Yin

Affiliations: Center for Quantum Technology Research and Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements (MOE), School of Physics, Beijing Institute of Technology, Beijing 100081, China

AI summary: This paper proposes a hybrid quantum-classical attention model that uses swap tests and a parameterized quantum circuit to extract correlations in quantum states and classify ground states. On the cluster-Ising model with 9 and 15 qubits, it shows high accuracy and robustness.

Comments 10 pages, 7 figures

Journal ref
Phys. Rev. A 113, 062403 (2026)

Abstract

Quantum phase transitions in many-body systems are fundamentally characterized by complex correlation structures, which pose computational challenges for conventional methods in large systems. To address this, we propose a hybrid quantum-classical attention model. This model uses an attention mechanism, realized through swap tests and a parameterized quantum circuit, to extract correlations within quantum states and perform ground-state classification. Benchmarked on the cluster-Ising model with system sizes of 9 and 15 qubits, the model achieves high classification accuracy with fewer than 100 training samples and demonstrates robustness against variations in the training set. Further analysis reveals that the model successfully captures phase-sensitive features and characteristic physical length scales, offering a scalable and data-efficient approach for quantum phase recognition in complex many-body systems.
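
For readers unfamiliar with the swap test named in the abstract, the sketch below evaluates its textbook outcome statistics for two pure states: the ancilla is found in |0> with probability 1/2 + |<psi|phi>|^2 / 2. How the paper wires swap tests into its attention mechanism is not stated in the abstract, so this shows only the standard relation on classical state vectors; the function names are hypothetical.

```python
def swap_test_p0(psi, phi):
    """Probability of measuring the ancilla in |0> in an ideal swap test on pure states."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return 0.5 + 0.5 * abs(overlap) ** 2


def fidelity_from_p0(p0):
    """Invert the relation to recover |<psi|phi>|^2 from the ancilla statistics."""
    return 2.0 * p0 - 1.0


# Single-qubit example states as amplitude lists.
zero = [1 + 0j, 0j]
one = [0j, 1 + 0j]
plus = [2 ** -0.5 + 0j, 2 ** -0.5 + 0j]

p_same = swap_test_p0(zero, zero)        # identical states
p_orthogonal = swap_test_p0(zero, one)   # orthogonal states
p_partial = swap_test_p0(zero, plus)     # overlap squared is 1/2
```

On hardware the probability is estimated from repeated shots, so the recovered overlap carries sampling noise that this noiseless sketch ignores.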

2511.03876 2026-06-17 eess.IV cs.CV cs.LG physics.med-ph

Computed Tomography (CT)-derived Cardiovascular Flow Estimation Using Physics-Informed Neural Networks Improves with Sinogram-based Training: A Simulation Study

Jinyuxuan Guo, Gurnoor Singh Khurana, Alejandro Gonzalo Grande, Juan C. del Alamo, Francisco Contijoch

Affiliations: Dept. of Bioengineering, University of California San Diego; Dept. of Computer Science Engineering, University of California San Diego; Dept. of Mechanical Engineering, Univ of Washington; Depts of Mechanical Engineering and Cardiology, Univ. of Washington; Depts. of Bioengineering, Radiology, University of California San Diego

AI summary: This study evaluates the impact of CT imaging on blood flow estimation based on physics-informed neural networks (PINNs) and proposes an improved framework, SinoFlow, which estimates blood flow directly from sinogram data. The results show that SinoFlow performs better by avoiding errors introduced by filtered backprojection.

Abstract

Background: Non-invasive imaging-based assessment of blood flow plays a critical role in evaluating heart function and structure. Computed Tomography (CT) is a widely-used imaging modality that can robustly evaluate cardiovascular anatomy and function, but direct methods to estimate blood flow velocity from movies of contrast evolution have not been developed. Purpose: This study evaluates the impact of CT imaging on Physics-Informed Neural Networks (PINN)-based flow estimation and proposes an improved framework, SinoFlow, which uses sinogram data directly to estimate blood flow. Methods: We generated pulsatile flow fields in an idealized 2D vessel bifurcation using computational fluid dynamics and simulated CT scans with varying gantry rotation speeds, tube currents, and pulse mode imaging settings. We compared the performance of PINN-based flow estimation using reconstructed images (ImageFlow) to SinoFlow. Results: SinoFlow significantly improved flow estimation performance by avoiding propagating errors introduced by filtered backprojection. SinoFlow was robust across all tested gantry rotation speeds and consistently produced lower mean squared error and velocity errors than ImageFlow. Additionally, SinoFlow was compatible with pulsed-mode imaging and maintained higher accuracy with shorter pulse widths. Conclusions: This study demonstrates the potential of SinoFlow for CT-based flow estimation, providing a more promising approach for non-invasive blood flow assessment. The findings aim to inform future applications of PINNs to CT images and provide a solution for image-based estimation, with reasonable acquisition parameters yielding accurate flow estimates.

2508.10908 2026-06-17 physics.ao-ph cs.LG

Data-driven global ocean model resolving ocean-atmosphere coupling dynamics

Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham

Affiliations: Center for Climate and Carbon Cycle Research, Korea Institute of Science and Technology, Seoul, Republic of Korea; Department of Environment and Energy, Jeonbuk National University, Jeonju, Republic of Korea; School of Earth and Environmental Sciences, Seoul National University, Seoul, Republic of Korea; Department of Environmental Management, Seoul National University, Seoul, Republic of Korea

AI summary: This paper proposes the KIST-Ocean model, which uses a U-shaped visual attention adversarial network architecture and improves ocean prediction through partial convolution, adversarial training, and transfer learning. It accurately simulates Kelvin and Rossby wave propagation in the tropical Pacific and wind-stress-induced vertical motions, demonstrating its ability to represent coupling mechanisms in climate phenomena.

Comments The manuscript contains 4 main figures. The Extended Data contains 7 figures and 3 tables. The Supplementary Information contains 3 text sections, 7 figures, 1 table

Journal ref
Sci. Adv. 12, eaed1225 (2026)

Abstract

Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)-based ocean-atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model using a U-shaped visual attention adversarial network architecture. KIST-Ocean integrates partial convolution, adversarial training, and transfer learning to address coastal complexity and predictive distribution drift in auto-regressive models. Comprehensive evaluations confirmed the model's robust ocean predictive skill and efficiency. Moreover, it accurately captures realistic ocean responses, such as Kelvin and Rossby wave propagation in the tropical Pacific and vertical motions induced by cyclonic and anticyclonic wind stress, demonstrating its ability to represent key ocean-atmosphere coupling mechanisms underlying climate phenomena, including the El Niño-Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models and support extending DL-based approaches to broader Earth system modeling, offering potential for enhancing climate prediction capabilities.

2506.08654 2026-06-17 physics.med-ph cs.LG

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

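Affiliations: Institute of Biomedical Engineering; Karlsruhe Institute of Technology; Department of Experimental and Clinical Medicine

AI summary: This paper proposes a cross-silo federated learning framework for CBCT-to-synthetic-CT translation in the head and neck region, achieving cross-institution generalization while preserving data privacy.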

Journal ref
Frontiers in Digital Health, 8:1812254, June 2026

Abstract

Shortened abstract: Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite protocol and scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
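
The cross-silo setup in the abstract implies that each center trains locally and only model parameters are combined. The abstract does not name the aggregation rule, so the sketch below assumes plain FedAvg (a sample-size-weighted mean of parameters), a common baseline; the function name `fed_avg`, the parameter vectors, and the center sizes are made up for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors (FedAvg, assumed here)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Three hypothetical centers with toy two-parameter models and local dataset sizes.
center_weights = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
center_sizes = [100, 100, 200]
global_weights = fed_avg(center_weights, center_sizes)
```

Only `center_weights` and `center_sizes` leave each site in this scheme; the images themselves never do.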

2501.15351 2026-06-17 cs.CY cs.LG

Fairness in LLM-Generated Surveys

Andrés Abeliuk, Vanessa Gaete, Naim Bro

Affiliations: Department of Computer Science, University of Chile; National Center for Artificial Intelligence (CENIA); School of Government, Adolfo Ibáñez University; Millennium Institute for Foundational Research on Data (IMFD)

AI summary: The study analyzes LLM performance across different populations and finds that LLMs perform better on U.S. datasets but show fairness problems caused by biased training data. It proposes a new measurement framework to improve model fairness.

Journal ref
EPJ Data Science (2026)

Abstract

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, serving as an alternative to traditional surveys. However, their global applicability remains questionable due to unexplored biases across socio-demographic and geographic contexts. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates from the U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward ensuring fairer and more equitable model performance across diverse socio-cultural contexts.

2408.15188 2026-06-17 eess.AS cs.CL cs.SD

Infusing Acoustic Pause Context into Text-Based Dementia Assessment

Franziska Braun, Sebastian P. Bayerl, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

Affiliations: Technische Hochschule Nürnberg; Technische Hochschule Rosenheim; Klinik für Psychiatrie und Psychotherapie, Universitätsklinik der Paracelsus Medizinischen Privatuniversität, Klinikum Nürnberg, Germany; KST Institut GmbH, Bad Emstal, Germany

AI summary: This paper studies the use of pause-enriched transcripts in transformer language models to distinguish subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia, and examines how pause information and acoustic context affect different tasks.

Comments Accepted at INTERSPEECH 2024

Journal ref
Proceedings of Interspeech 2024

Abstract

Speech pauses, alongside content and structure, offer a valuable and non-invasive biomarker for detecting dementia. This work investigates the use of pause-enriched transcripts in transformer-based language models to differentiate the cognitive states of subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia based on their speech from a clinical assessment. We address three binary classification tasks: onset, monitoring, and dementia exclusion. The performance is evaluated through experiments on a German Verbal Fluency Test and a Picture Description Test, comparing the model's effectiveness across different speech production contexts. Starting from a textual baseline, we investigate the effect of incorporating pause information and acoustic context. We show that the test should be chosen depending on the task and that, similarly, lexical pause information and acoustic cross-attention contribute differently.

2308.08306 2026-06-17 eess.AS cs.SD

Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

Affiliations: Technische Hochschule Nürnberg; Friedrich-Alexander-Universität Erlangen-Nürnberg; Klinik für Psychiatrie und Psychotherapie, Universitätsklinik der Paracelsus Medizinischen Privatuniversität, Klinikum Nürnberg, Germany; KST Institut GmbH, Bad Emstal, Germany

AI summary: Through cross-corpus experiments, this paper performs three-class classification of speech (HC vs. MCI vs. DEM) using text, audio, and emotion embeddings, and examines the effect of depression as a secondary diagnosis on the classifiers.

Comments Accepted at INTERSPEECH 2023

Journal ref
Proceedings of Interspeech 2023

Abstract

Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression has shared symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single dataset. In this work, we apply established baseline systems to discriminate cognitive impairment in speech from the semantic Verbal Fluency Test and the Boston Naming Test using text, audio and emotion embeddings in a 3-class classification problem (HC vs. MCI vs. DEM). We perform cross-corpus and mixed-corpus experiments on two independently recorded German datasets to investigate generalization to larger populations and different recording conditions. In a detailed error analysis, we look at depression as a secondary diagnosis to understand what our classifiers actually learn.

2201.06574 2026-06-17 eess.IV cs.CV

Neural Computed Tomography

Kunal Gupta, Brendan Colvert, Francisco Contijoch

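Affiliations: University of California San Diego

AI summary: This paper proposes the NeuralCT framework, which uses a neural implicit approach to generate time-resolved images free from motion artifacts and is suited to complex motion such as that in cardiac imaging.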

Comments https://kunalmgupta.github.io/projects/NeuralCT.html

Abstract

Motion during acquisition of a set of projections can lead to significant motion artifacts in computed tomography reconstructions despite fast acquisition of individual views. In cases such as cardiac imaging, motion may be unavoidable and evaluating motion may be of clinical interest. Reconstructing images with reduced motion artifacts has typically been achieved by developing systems with faster gantry rotation or using algorithms which measure and/or estimate the displacements. However, these approaches have had limited success due to both physical constraints and the challenge of estimating or measuring non-rigid, temporally varying, and patient-specific motions. We propose a novel reconstruction framework, NeuralCT, to generate time-resolved images free from motion artifacts. Our approach uses a neural implicit representation and does not require estimation or modeling of the underlying motion. Instead, boundaries are represented using a signed distance metric and a neural implicit framework. We utilize 'analysis-by-synthesis' to identify a solution consistent with the acquired sinogram as well as spatial and temporal consistency constraints. We illustrate the utility of NeuralCT in three progressively more complex scenarios: translation of a small circle, heartbeat-like change in an ellipse's diameter, and complex topological deformation. Without hyperparameter tuning or change to the architecture, NeuralCT provides high quality image reconstruction for all three motions, as compared to filtered backprojection, using mean-square-error and Dice metrics.

2106.09539 2026-06-17 eess.AS cs.LG cs.SD

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen

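Affiliations: Unit of Computing Sciences, Tampere University, Finland; Department of Clinical Medicine, University of Turku, Finland; Department of Signal Processing and Acoustics, Aalto University, Finland

AI summary: This paper studies how to analyze the emotional content of speech in neonatal recordings with an automatic speech emotion recognition system, examines the effectiveness of cross-corpus generalization, WGAN-based domain adaptation, and active learning for deployment in a new domain, and reports a classification performance of 73.4% UAR.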

Abstract

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques to deploy a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for binary classification of valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.
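
The unweighted average recall (UAR) reported in the abstract is the mean of per-class recalls, so each class counts equally regardless of how many samples it has. The sketch below computes it next to plain accuracy on made-up labels to show how the two diverge on imbalanced data; the function name and the example labels are illustrative and not taken from the paper.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Imbalanced toy example: eight samples of class 1, two of class 0.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 8 + [1, 0]  # one of the two minority samples is missed

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
```

Here accuracy is high because the majority class dominates, while UAR drops to reflect the 50% recall on the minority class.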

2606.18111 2026-06-17 cs.LG cs.AI (new submission)

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

Umer Siddique, Peilang Li, Yongcan Cao

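AI summary: To address the problem that fixed user preferences cannot yield a diverse set of policies in multi-objective reinforcement learning, this paper proposes multi-policy methods based on the generalized Gini welfare function that learn a set of fair Pareto-optimal policies.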

Comments Accepted at the Reinforcement Learning Conference (RLC) 2025. 12 pages main + appendix, 8 figures, 4 tables

详情
AI中文摘要

公平性是多目标强化学习(MORL)决策中的一个重要方面,策略必须确保在多个潜在冲突的目标上既达到最优又实现公平。虽然单策略MORL方法可以使用福利函数(如广义基尼福利函数GGF)为固定的用户偏好学习公平策略,但它们无法提供动态或未知用户偏好所需的多样的策略集。为解决这一局限性,我们形式化了多策略MORL中的公平优化问题,其目标是学习一组帕累托最优策略,确保在所有可能的用户偏好下实现公平。我们的关键技术贡献有三点:(1)我们证明对于凹的、分段线性的福利函数(例如GGF),公平策略仍然在凸覆盖集(CCS)中,CCS是线性标量化下的近似帕累托前沿。(2)我们证明非平稳策略(通过累积奖励历史增强)和随机策略通过动态适应历史不公平性来改善公平性。(3)我们提出了三种新算法,包括将GGF与多策略多目标Q学习(MOQL)集成、用于学习非平稳策略的状态增强多策略MOQL,以及用于学习随机策略的新扩展。我们在多个领域评估了我们的算法,并将我们的方法与最先进的MORL基线进行了比较。实验结果表明,我们的方法学习了一组公平策略,能够适应不同的用户偏好。

英文摘要

Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximated Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies, augmented with accrued reward histories, and stochastic policies improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms, which include integrating GGF with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and its novel extension for learning stochastic policies. We evaluate our algorithms across various domains and compare our methods against the state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
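
To make the welfare function named in the abstract above concrete, here is a minimal sketch of the generalized Gini welfare function (GGF) in its standard form: objective values are sorted in ascending order and combined with non-increasing weights, so the worst-off objective receives the largest weight. The weight vector and return vectors are hypothetical and are not taken from the paper.

```python
def ggf(values, weights):
    """Generalized Gini welfare: non-increasing weights applied to ascending values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    return sum(w * v for w, v in zip(weights, sorted(values)))


# Hypothetical weights; halving each step is one common illustrative choice.
weights = [1.0, 0.5, 0.25]

# Two return vectors with the same total (12) but different spread.
balanced = ggf([4.0, 4.0, 4.0], weights)
skewed = ggf([10.0, 1.0, 1.0], weights)
```

The balanced vector scores higher than the skewed one even though both sum to the same total, which is the equity preference the function is meant to encode. Because the values are sorted first, the score does not depend on which objective holds which value.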

2606.17692 2026-06-17 cs.LG 新提交

Delta-Based Target Reformulation for Short-Term Electricity Load Forecasting Using LSTM and Transformer Models

基于Delta目标重构的LSTM与Transformer短期电力负荷预测

Vansh Bansal

AI总结 针对电力负荷非平稳性,提出Delta目标重构方法,让LSTM和Transformer预测负荷变化量而非绝对值,在小时级预测中MAE和MAPE降低超50%。

Comments 8 pages, 3 tables

详情
AI中文摘要

准确的短期电力负荷预测对于现代电力系统的可靠和经济运行至关重要,尤其是在天气变化、日历效应和消费模式演变导致的非平稳性下。尽管LSTM和Transformer等深度学习模型表现出色,但大多数现有研究侧重于直接预测绝对负荷,而未明确解决目标非平稳性。受ARIMA模型中经典时间序列差分技术的启发,本文研究了一种基于Delta的目标重构方法,用于深度学习的短期电力负荷预测。该方法不直接预测绝对负荷值,而是训练模型预测连续时间步之间的负荷变化,最终预测通过最后一次观测负荷重建。这旨在稳定学习目标并降低预测难度。利用印度多年逐小时真实电力负荷数据,辅以NASA POWER项目的气象变量和日历特征,本研究评估了LSTM和Transformer在两种公式下的表现,并以LightGBM作为基准。实验针对小时前和日前预测范围进行,通过平均绝对误差(MAE)和平均绝对百分比误差(MAPE)评估性能。结果表明,Delta重构在所有评估模型的小时前预测中持续提高预测精度,与绝对公式相比,MAPE降低超过50%。对于日前预测,Delta目标特别有利于深度序列模型(LSTM和Transformer),而LightGBM在绝对公式下仍具有竞争力。这些发现表明,Delta重构是神经网络的一种强大归纳偏置,但其效果依赖于模型和预测范围。

英文摘要

Accurate short-term electricity load forecasting is critical for the reliable and economic operation of modern power systems, under non-stationarity arising from weather variability, calendar effects, and evolving consumption patterns. While deep learning models such as LSTMs and Transformers show promising performance, most existing studies focus on direct absolute load prediction without explicitly addressing target non-stationarity. Motivated by classical time-series differencing techniques in ARIMA models, this paper investigates a delta-based target reformulation for short-term electricity load forecasting using deep learning. Instead of directly predicting absolute load values, the proposed formulation trains models to predict the change in load between consecutive time steps, with final forecasts reconstructed using the last observed load. This aims to stabilize the learning target and reduce forecasting difficulty. Using multi-year, hourly real-world electricity load data from India, augmented with meteorological variables from the NASA POWER project and calendar features, this study evaluates LSTM and Transformer models under both formulations, benchmarking them against LightGBM. Experiments are conducted for hour-ahead and day-ahead horizons, assessing performance via Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Results show that delta-based reformulation consistently improves forecasting accuracy for hour-ahead prediction across all evaluated models, yielding MAPE reductions of over 50% compared to absolute formulations. For day-ahead forecasting, delta targets specifically benefit deep sequence models (LSTM and Transformer), while LightGBM remains competitive under the absolute formulation. These findings indicate that while delta reformulation is a powerful inductive bias for neural networks, its efficacy is model- and horizon-dependent.
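
The delta-based reformulation described above can be sketched in a few lines: the training target becomes the change between consecutive time steps, and the final forecast is rebuilt by adding the predicted change to the last observed load. The load values are hypothetical, and the averaged recent delta is only a stand-in for an LSTM or Transformer prediction.

```python
def make_delta_targets(load):
    """Turn an absolute load series into step-to-step changes."""
    return [curr - prev for prev, curr in zip(load, load[1:])]


def reconstruct(last_observed, predicted_delta):
    """Rebuild an absolute forecast from the last observation and a predicted change."""
    return last_observed + predicted_delta


# Hypothetical hourly load readings (arbitrary units).
hourly_load = [100.0, 104.0, 110.0, 108.0]
deltas = make_delta_targets(hourly_load)

# Stand-in for a trained model: the mean of the two most recent deltas.
predicted_delta = sum(deltas[-2:]) / 2
forecast = reconstruct(hourly_load[-1], predicted_delta)
```

The delta series fluctuates around zero even when the absolute load trends upward, which is the stabilizing effect that differencing provides in ARIMA-style models.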

2606.17603 2026-06-17 cs.LG 新提交

Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

扩展SPHERE-JEPA:超球面上的统计正则化器家族

Léo Nicollier, Enric Meinhardt-Llopis, Max Dunitz, Marc Pic, Pablo Musé, Gabriele Facciolo

AI总结 为解决自监督学习中切片统计正则化器因蒙特卡洛采样引入投影方差导致优化不稳定和收敛慢的问题,提出全维MMD、KSD和KL散度正则化器,并采用旋转不变核,在ImageNet和Galaxy10上实现更稳定优化和一致改进。

详情
AI中文摘要

在自监督学习(SSL)中,通过在单位超球面上显式强制均匀分布来防止表示坍缩已被证明是有效的。然而,当前的框架通常依赖于切片统计正则化器,如SIGReg(用于LeJEPA)和SUSReg(用于SPHERE-JEPA),这些正则化器通过沿随机一维方向的蒙特卡洛采样来近似这一连续目标。这种随机性将投影方差注入训练梯度,破坏优化稳定性,并阻碍收敛。在这项工作中,我们首先证明,解析地积分掉这些随机投影自然地产生一个确定性的最大均值差异(MMD),从而避免了切片方法的方差。受此等价性的启发,我们直接在球面上制定了MMD、核斯坦因差异(KSD)和KL散度的全维目标,以强制均匀分布。为了防止空间偏差,我们通过谱理论构造旋转不变核来装备这些检验,并系统评估了两个典型族:平滑指数衰减(热核)和严格频率截止(带限)滤波器。实验上,去除投影引起的噪声导致更稳定的优化、更快的收敛,并在ImageNet和Galaxy10上相对于随机切片正则化器取得一致改进。此外,我们揭示了统计检验的选择塑造了学习潜在空间的几何结构:MMD和KSD有利于适用于以对象为中心的领域的局部聚类组织,而基于连续KDE的KL散度促进了细粒度的实例分离,在非聚类的程序化纹理检索上取得了最强结果。

英文摘要

In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven to be effective. However, current frameworks typically rely on sliced statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions. This stochasticity injects projection variance into the training gradients, destabilizing optimization, and hindering convergence. In this work, we first show that analytically integrating out these random projections natively yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods. Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback-Leibler (KL) divergence directly on the sphere to enforce a uniform distribution. To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters. Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10. Furthermore, we reveal that the choice of the statistical test shapes the geometry of the learned latent space: MMD and KSD favor locally clustered organization suitable for object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.

2606.17579 2026-06-17 cs.LG cs.AI cs.CL cs.SI 新提交

LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

LLM特征可能损害GNN:同配图基准上的拼接干扰

Zhongyuan Wang, Pratyusha Vemuri

AI总结 本文发现将LLM特征通过纯输入拼接(而非联合训练)引入图神经网络时,会在同配基准上系统性地降低准确率,并提出了一个基于LLM单独判别性指标Delta_sig来预测拼接效果。

Comments 29 pages, 8 figures

详情
AI中文摘要

将LLM生成的节点特征添加到图神经网络(GNN)中,被广泛报道能提高标准基准的准确率。我们记录了一个相反的观察:当LLM特征通过纯输入拼接(而非联合训练、蒸馏或提示条件)引入时,它们会在相同的同配基准上系统地降低准确率,而端到端LLM流水线在这些基准上却能成功。使用MLP骨干网络、Planetoid公共划分和词袋原始特征,拼接SBERT编码的GPT-4o-mini TAPE特征导致PubMed测试准确率下降-17.0±0.3个百分点,Cora下降-4.3±0.6个百分点(CiteSeer下降-0.6±0.8个百分点,在种子噪声范围内)。当我们放宽每个条件(GCN/GCNII/GAT骨干网络、随机划分、更小编码器)时,下降幅度减弱,并在中等同配的WikiCS(+4.4个百分点)和ogbn-arxiv(+11.7个百分点)上逆转。为了预测拼接何时有益或有害,我们报告了一个简单的LLM单独判别性指标Delta_sig。在9个数据集上,Delta_sig与拼接成本的相关系数(r^2=0.38)强于同配性(r^2=0.06;N=9,bootstrap置信区间重叠)。bootstrap最佳变点为tau=13.8个百分点,规则“Delta_sig <= tau预测非正拼接成本”正确分类了7/9个数据集;由于60%的bootstrap样本将tau置于[5,30]个百分点之间,我们将Delta_sig视为解释性透镜而非精确过滤器。在PubMed上进行的维度控制消融实验将LLM特征下降置于同源PCA(-2.3个百分点)和同维高斯噪声(-37.3个百分点)之间,排除了维度和权重衰减的影响。九个PubMed配置拟合出幂律|Delta_concat| ∝ (sqrt(d_l/n))^1.31,r^2=0.97;低Delta_sig、小n的角落正是标题中-17个百分点PubMed缺陷出现的位置。

英文摘要

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rather than joint training, distillation, or prompt-conditioning), they can systematically degrade accuracy on the same homophilous benchmarks where end-to-end LLM pipelines succeed. With an MLP backbone on the Planetoid public split and bag-of-words original features, concatenating SBERT-encoded GPT-4o-mini TAPE features changes PubMed test accuracy by -17.0 +/- 0.3 pp and Cora by -4.3 +/- 0.6 pp (CiteSeer -0.6 +/- 0.8 pp, within seed noise). The drop attenuates as we relax each condition (GCN / GCNII / GAT backbones, random splits, smaller encoders) and reverses on medium-homophily WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). To predict when concatenation helps versus hurts, we report a simple measure of LLM-alone discriminability, Delta_sig. Across 9 datasets, Delta_sig correlates with the concatenation cost more strongly than homophily at point estimate (r^2 = 0.38 vs. 0.06; N=9, bootstrap CIs overlap). The bootstrap-best change-point is tau = 13.8 pp, and the rule "Delta_sig <= tau predicts non-positive concat cost" classifies 7/9 datasets correctly; since 60% of bootstrap samples place tau in [5, 30] pp, we treat Delta_sig as an interpretive lens rather than a precision filter. A dimension-controlled ablation on PubMed places the LLM-feature drop between same-source PCA (-2.3 pp) and same-dim Gaussian noise (-37.3 pp), ruling out dimensionality and weight-decay artifacts. Nine PubMed configurations fit a power law |Delta_concat| proportional to (sqrt(d_l/n))^1.31 with r^2 = 0.97; the low-Delta_sig, small-n corner is exactly where the headline -17 pp PubMed deficit appears.

2606.17572 2026-06-17 cs.LG cs.SY eess.SY 新提交

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

当动力学模型读取错误的时间步:无标签事件信用重锚定以实现鲁棒的全局读出

Yifan Wang

AI总结 针对序列到全局接口中的时间信用稀释问题,提出无训练无标签的CREST方法,通过事件核心估计与对比重锚定,减少分布外误差并恢复事件信用。

Comments 7 pages, 6 figures

详情
AI中文摘要

学习到的动力学模型通常通过将每步特征序列池化为一个读出向量来回答全局物理问题,如故障严重性或冲击刚度。这种序列到全局的接口产生了一个未被充分研究的时间信用问题:在仅有轨迹级监督的情况下,模型可以在训练条件下准确预测,同时从丰富的平滑相关物而非决定目标的短暂物理事件中读取信息。我们将这种失败称为时间信用稀释。它不会被训练损失暴露,也不会被标准的物理信息残差消除,因为错误在于全局读出分配功能信用的位置。我们引入了Credit-in-Event,一种接口级探针,用于测量池化信用落在事件步上的程度,并闭式证明当事件分数缩小时,池化线性读取器将信用路由到虚假的背景通道。然后我们提出了CREST,一种无训练且无标签的读出方法,它从学习到的特征中估计瞬态事件核心,并通过事件与其余部分的对比重锚定池化表示。在模拟齿轮和冲击系统、循环和注意力编码器以及公共轴承振动数据上,CREST减少了分布外误差,同时恢复了事件信用。消融实验表明,稳定步选择和感受野缩小失败,证实了增益来自事件核心信用重锚定,而非通用的局部性或稳定性先验。

英文摘要

Learned dynamics models often answer global physical questions, such as fault severity or impact stiffness, by pooling a per-step feature sequence into one readout vector. This sequence-to-global interface creates an under-studied temporal credit problem: with only trajectory-level supervision, a model can predict accurately in training conditions while reading from abundant smooth correlates rather than the brief physical events that determine the target. We call this failure temporal credit dilution. It is not exposed by the training loss and is not removed by standard physics-informed residuals, because the error lies in where the global readout assigns functional credit. We introduce Credit-in-Event, an interface-level probe for measuring how much pooled credit lands on event steps, and prove in closed form that a pooled linear reader routes credit to a spurious background channel as the event fraction shrinks. We then propose CREST, a training-free and label-free readout that estimates a transient event core from learned features and re-anchors the pooled representation through event-versus-rest contrast. Across simulated gear and impact systems, recurrent and attention encoders, and public bearing vibration data, CREST reduces out-of-distribution error while restoring event credit. Ablations show that stable-step selection and receptive-field shrinking fail, confirming that the gain comes from event-core credit re-anchoring rather than a generic locality or stability prior.

2606.17451 2026-06-17 cs.LG cs.RO 新提交

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

操作设计域转移下自动驾驶汽车责任的可信度加权定价

Doyeon Jang

AI总结 针对自动驾驶系统部署中经验稀疏、ODD转移及风险非平稳问题,提出分层贝叶斯可信度框架,通过ODD相似性核进行部分池化,在Waymo数据上验证其有效性。

详情
AI中文摘要

自动驾驶系统的部署带来了一个基础性的费率制定挑战:稀疏的经验、不断变化的操作设计域以及跨软件版本的非平稳风险。我们提出了一个分层贝叶斯可信度框架,通过学习的ODD相似性核汇集城市、软件版本和区域的信息,将Buhlmann-Straub作为极限情况嵌套其中。基于NHTSA Standing General Order数据库中美国四个大都市区的648起Waymo已验证碰撞事件与1.16亿匹配里程的演示表明,城市聚合可信度权重适中(0.12-0.46),部分池化明显优于无池化,且功效分析显示,学习核的优势在大约十二个部署城市时变得可检测。

英文摘要

Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains (ODDs), and non-stationary risk across software releases. We propose a hierarchical Bayesian credibility framework that pools information across cities, software versions, and territories via a learned ODD-similarity kernel and nests Bühlmann-Straub as a limiting case. In a demonstration on 648 verified-engaged Waymo crashes across four U.S. metros from the NHTSA Standing General Order database, set against 116 million matched miles, city-aggregate credibility weights are moderate (0.12-0.46), partial pooling decisively outperforms no pooling, and a power analysis shows that the learned kernel's advantage becomes detectable at approximately twelve deployed cities.
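
The abstract above names Bühlmann-Straub as the limiting case of its framework. As a hedged illustration of that classical model only (not of the paper's hierarchical kernel), the sketch below computes the credibility weight Z = w / (w + k) and blends a city's own rate with the pooled rate. Here w is exposure and k is the ratio of expected process variance to the variance of hypothetical means. All numbers are hypothetical and are not taken from the paper.

```python
def credibility_weight(exposure, k):
    """Classical Buhlmann-Straub weight Z = w / (w + k) for exposure w."""
    return exposure / (exposure + k)


def credibility_rate(own_rate, pooled_rate, exposure, k):
    """Blend a unit's own experience with the pooled rate using weight Z."""
    z = credibility_weight(exposure, k)
    return z * own_rate + (1.0 - z) * pooled_rate


# Hypothetical city: 25 units of exposure (e.g. million miles), k = 75.
z = credibility_weight(exposure=25.0, k=75.0)

# Hypothetical crash rates per unit of exposure.
blended = credibility_rate(own_rate=8.0, pooled_rate=4.0, exposure=25.0, k=75.0)
```

With little exposure the weight stays small and the blended rate sits close to the pooled rate. As exposure grows, Z approaches 1 and the unit's own experience dominates.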

2606.16878 2026-06-17 cs.LG 新提交

Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM

集成营销归因:基于贝叶斯框架的隐私安全粒度测量,锚定于MMM

Meghana R. Bhat, Ankit Umare, Utsav Aggarwal, Richard Vecsler, Arunkumar Mani, Karthik Nair, Chandhu Nair

AI总结 提出集成营销归因(IMA)框架,结合营销组合模型(MMM)与贝叶斯归因模型,从聚合数据中推导出活动级效果,实现隐私安全且粒度精细的归因。

详情
AI中文摘要

零售营销测量日益需要精细的活动级洞察,而无需依赖用户级跟踪。然而,两种主流方法——营销组合模型(MMM)和多触点归因(MTA)——常常产生碎片化的洞察。MMM在渠道级规划中隐私安全且稳健,但对于活动优化过于粗糙;而MTA提供精细归因,但在日益增加的隐私限制下变得不太可靠。我们提出集成营销归因(IMA),一个统一框架,将MMM与特定渠道的贝叶斯归因模型相结合,从聚合数据中推导活动级效果。通过利用MMM信息先验,IMA提供精细、隐私安全的归因,同时保持与MMM的一致性。

英文摘要

Retail marketing measurement increasingly requires granular campaign-level insights without relying on user-level tracking. However, the two dominant approaches, Marketing Mix Modeling (MMM) and Multi-Touch Attribution (MTA), often produce fragmented insights. MMM is privacy-safe and robust for channel-level planning but is too coarse for campaign optimization, while MTA provides granular attribution but has become less reliable under increasing privacy restrictions. We propose Integrated Marketing Attribution (IMA), a unified framework that combines MMM with channel-specific Bayesian attribution models to derive campaign-level effects from aggregated data. By leveraging MMM-informed priors, IMA delivers granular, privacy-safe attribution while preserving consistency with MMM.

2606.12867 2026-06-17 cs.LG 新提交

SMGFM: Spectral Multimodal Graph Pretraining for Multimodal-Attributed Graphs

SMGFM: 面向多模态属性图的谱多模态图预训练

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

AI总结 提出SMGFM框架,利用图频谱分解区分结构诱导语义与模态特有语义,通过频带路由实现跨模态融合,在图级和模态级任务上取得最优性能。

详情
AI中文摘要

多模态属性图(MAGs)将图拓扑结构与来自文本、图像等模态的节点语义相结合。传统的图学习通过耦合拓扑与节点特征来上下文化节点语义。然而,这种耦合设计在MAGs中变得棘手,因为结构诱导和模态固有的语义可能对下游任务产生不同贡献。结构诱导语义通过平滑拓扑变化促进关系一致性,而模态固有语义通常编码局部、细粒度的区分,不应被统一平滑或对齐。因此,关键挑战在于跨模态融合前识别语义角色。为此,我们利用图频率变化作为先验,其中低频分量捕获拓扑一致语义,高频分量保留模态特定语义。基于这一直觉,我们提出SMGFM,一种谱多模态图预训练框架,将每个模态特定的节点信号分解为图频带,并在跨模态交互前分配频带级语义角色。具体地,SMGFM使用可扩展的切比雪夫滤波器构建频率解析的模态令牌,通过拓扑条件路由估计其耦合可靠性,并在融合前进行频带-模态交互。其频率路由目标在平滑共识路由的同时保留模态特定路由,减轻空间域纠缠和统一跨模态对齐。在MAG数据集上的大量实验表明,SMGFM在图级和模态级任务上均达到最先进性能。

英文摘要

Multimodal-attributed graphs (MAGs) couple graph topology with node semantics from text, images, and other modalities. Traditional graph learning contextualizes node semantics by coupling topology with node features. However, this coupling design becomes troublesome in MAGs, where structure-induced and modality-intrinsic semantics may contribute differently to downstream tasks. Structure-induced semantics promote relational consistency through smooth topological variation, whereas modality-intrinsic semantics often encode local, fine-grained distinctions that should not be uniformly smoothed or aligned. Therefore, the key challenge is to identify semantic roles before cross-modal fusion. To this end, we leverage graph-frequency variation as a prior, where low-frequency components capture topology-consistent semantics and high-frequency components preserve modality-specific semantics. Based on this intuition, we propose SMGFM, a spectral multimodal graph pretraining framework that decomposes each modality-specific node signal into graph-frequency bands and assigns band-level semantic roles before cross-modal interaction. Concretely, SMGFM constructs frequency-resolved modality tokens with scalable Chebyshev filters, estimates their coupling reliability through topology-conditioned routing, and performs band-modality interaction before fusion. Its frequency-routed objectives align smooth consensus routes while preserving modality-specific routes, mitigating spatial-domain entanglement and uniform cross-modal alignment. Extensive experiments conducted on the MAG datasets demonstrate that SMGFM achieves state-of-the-art performance across graph-level and modality-level tasks.

2502.17518 2026-06-17 cs.LG cs.AI q-fin.CP stat.ML 版本更新

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

通过分类器模型进行集成强化学习:在交易策略中增强风险回报权衡

Zheli Xiong

AI总结 将A2C、PPO、SAC等强化学习算法与SVM、决策树、逻辑回归等分类器结合,构建集成交易策略;集成方法在风险调整后收益上往往优于基础模型,但其优势依赖于方差阈值τ、分类器组、RL智能体组合和市场范围。

Comments 23 pages, 10 figures, 9 tables

详情
AI中文摘要

本文对集成强化学习(RL)模型在金融交易策略中的应用进行了全面研究,并利用分类器模型来提升性能。通过将A2C、PPO和SAC等强化学习算法与支持向量机(SVM)、决策树和逻辑回归等传统分类器相结合,我们研究了如何整合不同的分类器组以改善风险回报权衡。研究评估了各种集成方法的有效性,并在累计回报率、夏普比率(SR)、卡玛比率(Calmar Ratio)和最大回撤(MDD)等关键金融指标上将其与单个RL模型进行比较。原始实验结果表明,集成方法在风险调整后收益方面往往优于基础模型,回撤管理和整体稳定性更好。然而,原始分析和本版本新增的复现实验均表明,集成性能对方差阈值τ、分类器组、RL智能体组合和市场范围的选择敏感。复现证据强化了分类器辅助的集成选择能够提高稳健性这一结论,同时也表明这种优势是有条件的,并非在所有数据集上自动成立。本研究强调了将强化学习与分类器结合用于自适应决策的价值,对金融交易、机器人和其他动态环境具有启示。

英文摘要

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold $\tau$, classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
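
The abstract above compares strategies on Sharpe ratio, Calmar ratio, and maximum drawdown. The sketch below uses the common textbook definitions of these metrics. The paper's exact conventions (risk-free rate, annualization factor) are not stated in the abstract, so the defaults here are assumptions, and the return series is hypothetical toy data.

```python
import math


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return over its sample standard deviation.

    Needs at least two returns with non-zero variance.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of compounded wealth, as a positive fraction."""
    wealth = peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst


def calmar_ratio(returns, periods_per_year=252):
    """Annualized compounded return divided by maximum drawdown."""
    total = math.prod(1.0 + r for r in returns)
    annualized = total ** (periods_per_year / len(returns)) - 1.0
    drawdown = max_drawdown(returns)
    return math.inf if drawdown == 0 else annualized / drawdown


# Hypothetical per-period returns: a gain, a large loss, a partial recovery.
toy_returns = [0.10, -0.50, 0.20]
mdd = max_drawdown(toy_returns)
```

Sharpe penalizes all volatility, while Calmar and maximum drawdown focus on the worst cumulative loss, which is why the abstract reports them side by side.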

2602.22159 2026-06-17 cs.CV 版本更新

CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness

CASR:一种鲁棒的循环框架,用于任意大尺度超分辨率,具有分布对齐和自相似性意识

Wenhao Guo, Zhaoran Zhao, Peng Lu, Sheng Li, Qian Qiao, DeRui Li

AI总结 CASR通过分布对齐和自相似性意识,解决大尺度超分辨率中的分布漂移和扩散不一致问题,实现稳定推理和高效单模型处理。

详情
AI中文摘要

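任意尺度图像超分辨率(ASISR)从根本上受限于跨尺度分布偏移:一旦推理尺度超出训练范围,噪声、模糊和伪影就会急剧累积。我们从跨尺度分布转移的视角重新审视这一挑战,提出CASR,一个简单而高效的循环超分辨率框架,将超大倍率放大重新表述为一系列分布内的尺度转移。这一设计只需单个模型即可在任意尺度上实现稳定推理。CASR解决了两个主要瓶颈:迭代间的分布漂移和分块扩散的不一致性。所提出的SSAM模块通过超像素聚合对齐结构分布,防止误差累积;SARM模块则通过相关性引导的一致性约束恢复高频纹理,并通过相关性对齐保留自相似结构。尽管只使用单个模型,我们的方法仍显著减少了分布漂移,保持了长程纹理一致性,并在极端放大倍率下也取得了更优的泛化能力。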

英文摘要

Arbitrary-scale image super-resolution (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and artifacts accumulate sharply. We revisit this challenge from a cross-scale distribution transition perspective and propose CASR, a simple yet highly efficient cyclic SR framework that reformulates ultra-magnification as a sequence of in-distribution scale transitions. This design ensures stable inference at arbitrary scales while requiring only a single model. CASR tackles two major bottlenecks: distribution drift across iterations and patch-wise diffusion inconsistencies. The proposed SSAM module aligns structural distributions via superpixel aggregation, preventing error accumulation, while the SARM module restores high-frequency textures by enforcing correlation-guided consistency and preserving self-similarity structure through correlation alignment. Despite using only a single model, our approach significantly reduces distribution drift, preserves long-range texture consistency, and achieves superior generalization even at extreme magnification.

2605.09313 2026-06-17 cs.CV 版本更新

Attention Sinks in Diffusion Transformers: A Causal Analysis

扩散变换器中的注意力 sinks:一种因果分析

Fangzheng Wu, Brian Summa

AI总结 研究探讨了扩散变换器中注意力 sinks 的作用,通过动态识别并抑制注意力接收者,发现其对文本-图像对齐和偏好代理影响有限,但强干预下出现特定边界。

详情
AI中文摘要

Attention sinks, tokens that receive disproportionate attention mass, are assumed to be functionally important in autoregressive language models, but their role in diffusion transformers remains unclear. We present a causal analysis in text-to-image diffusion, dynamically identifying dominant attention recipients per timestep and suppressing them via paired, training-free interventions on the score and value paths. Across 553 GenEval prompts on Stable Diffusion 3 (with SDXL corroboration), removing these sinks does not degrade text-image alignment (CLIP-T) or preference proxies (ImageReward, HPS-v2) at $k=1$; only under stronger interventions ($k \geq 10$) does HPS-v2 exhibit a metric-dependent boundary, while CLIP-T remains robust throughout. The perceptual shifts induced by suppression are nonetheless sink-specific, $\sim 6\times$ larger than equal-budget random masking, revealing an empirical dissociation between trajectory-level perturbation and semantic alignment in diffusion transformers. Code available at https://github.com/wfz666/ICML26-attention-sink.

英文摘要

Attention sinks, tokens that receive disproportionate attention mass, are assumed to be functionally important in autoregressive language models, but their role in diffusion transformers remains unclear. We present a causal analysis in text-to-image diffusion, dynamically identifying dominant attention recipients per timestep and suppressing them via paired, training-free interventions on the score and value paths. Across 553 GenEval prompts on Stable Diffusion 3 (with SDXL corroboration), removing these sinks does not degrade text-image alignment (CLIP-T) or preference proxies (ImageReward, HPS-v2) at $k=1$; only under stronger interventions ($k \geq 10$) does HPS-v2 exhibit a metric-dependent boundary, while CLIP-T remains robust throughout. The perceptual shifts induced by suppression are nonetheless sink-specific, $\sim 6\times$ larger than equal-budget random masking, revealing an empirical dissociation between trajectory-level perturbation and semantic alignment in diffusion transformers. Code available at https://github.com/wfz666/ICML26-attention-sink.
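
As a rough illustration of the idea in the abstract above, the sketch below finds the key tokens that receive the most total attention in one attention matrix and suppresses them. It is a simplification under stated assumptions: the paper intervenes on the score and value paths of a diffusion transformer, whereas this toy version masks an already-normalized attention matrix and renormalizes each row. The matrix and function names are hypothetical.

```python
def dominant_recipients(attn, k=1):
    """Indices of the k key tokens with the largest total received attention."""
    n_keys = len(attn[0])
    mass = [sum(row[j] for row in attn) for j in range(n_keys)]
    return sorted(range(n_keys), key=lambda j: mass[j], reverse=True)[:k]


def suppress(attn, sinks):
    """Zero the attention paid to sink tokens, then renormalize each query row."""
    masked = []
    for row in attn:
        kept = [0.0 if j in sinks else a for j, a in enumerate(row)]
        total = sum(kept)
        masked.append([a / total for a in kept] if total > 0 else kept)
    return masked


# Hypothetical 3x3 attention matrix (rows: queries, columns: keys).
attn = [
    [0.7, 0.2, 0.1],
    [0.6, 0.1, 0.3],
    [0.8, 0.1, 0.1],
]
sinks = dominant_recipients(attn, k=1)
masked = suppress(attn, set(sinks))
```

Recomputing the recipients for every denoising timestep, rather than fixing them once, would correspond to the "dynamically identifying" part of the abstract. A random-masking control would pick the same number of columns at random.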

2601.01762 2026-06-17 cs.RO cs.CV 版本更新

AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving

AlignDrive: 用于端到端自动驾驶的对齐横向-纵向规划

Yanhao Wu, Haoyang Zhang, Fei He, Rui Wu, Yanhu Shan, Congpei Qiu, Liang Gao, Wei Ke, Tong Zhang

AI总结 本文提出一种级联框架,通过将纵向规划转化为路径条件推理过程,提升自动驾驶的协调性和安全性。方法引入基于锚点的回归设计和面向规划的数据增强策略,在 Bench2Drive 上达到 SOTA 性能。

Comments Under review

详情
AI中文摘要

实用的自动驾驶需要能够通过时空可能性推理来排除不安全结果的模型。尽管最先进的方法使用并行规划架构,但它们未能明确将速度决策与路径上的代理行为联系起来,导致协调不优。为此,我们提出了一种级联框架,将纵向规划从独立预测任务转化为路径条件推理过程。在模型方面,我们引入基于锚点的回归设计,将纵向预测条件于横向驾驶路径,并将纵向规划重新表述为路径上的 1D 位移预测。这减少了几何不确定性,并使模型更专注于由交互驱动的动力学。在数据方面,我们引入了规划导向的数据增强策略,通过程序性插入代理和重标记纵向目标来模拟罕见的安全关键事件。在具有挑战性的 Bench2Drive 基准上评估,我们的方法在驾驶分数为 89.07 和成功率为 73.18% 的情况下实现了 SOTA 性能,证明了显著改进的协调性和安全性。进一步在 Fail2Drive 上的评估证实了在平行公式通常失败的罕见边缘情况下具有强大的泛化能力。项目页面:https://yanhaowu.github.io/AlignDrive/.

英文摘要

Practical autonomous driving requires models that generalize by reasoning through spatial-temporal possibilities to exclude unsafe outcomes. While state-of-the-art (SOTA) methods use parallel planning architectures, they fail to explicitly couple speed decisions with agent behavior along the driving path, leading to suboptimal coordination. To address this, we propose a cascaded framework that transforms longitudinal planning from an independent prediction task into a path-conditioned reasoning process. On the model side, we introduce an anchor-based regression design that conditions longitudinal prediction on the lateral drive path, and reformulate longitudinal planning as 1D displacement prediction along the path. This reduces geometric uncertainty and sharpens the model's focus on interaction-driven dynamics. On the data side, we introduce a planning-oriented data augmentation strategy that simulates rare safety-critical events by programmatically inserting agents and relabeling longitudinal targets to enforce collision avoidance. Evaluated on the challenging Bench2Drive benchmark, our method achieves SOTA performance with a driving score of 89.07 and a success rate of 73.18%, demonstrating significantly improved coordination and safety. Further evaluation on Fail2Drive confirms strong generalization to rare edge cases where parallel formulations typically fail. Project page: https://yanhaowu.github.io/AlignDrive/
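
The reformulation of longitudinal planning as 1D displacement along the path can be pictured with a small geometric sketch: once a lateral path is fixed, each future waypoint is determined by a single arc-length value. The code below maps such displacements back to 2D points on a polyline. The path, the displacement values, and the function name are hypothetical, and this is not the paper's model, only the geometry it relies on.

```python
import math


def point_along_path(path, s):
    """Return the 2D point at arc length s along a polyline, clamped to its ends."""
    if s <= 0:
        return path[0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and s <= seg:
            t = s / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        s -= seg
    return path[-1]


# Hypothetical lateral path with segments of length 5 and 6.
lateral_path = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# Hypothetical predicted 1D displacements for three future time steps.
waypoints = [point_along_path(lateral_path, s) for s in (2.5, 5.0, 8.0)]
```

Because every waypoint lies on the path by construction, a speed decision only changes how far along the path the vehicle travels, which is one way to read the reduced geometric uncertainty mentioned in the abstract.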

2605.08827 2026-06-17 cs.AI 版本更新

Mental Health AI Safety Claims Must Preserve Temporal Evidence

心理健康AI的安全性主张必须保留时间证据

Srimonti Dutta, Ratna Kandala

AI总结 本文指出,心理健康AI的安全性评估常忽略时间维度,提出SCOPE-MH原则以确保评估保留时间证据,揭示对话中逐步恶化等机制,强调时间证据对安全部署的必要性。

详情
AI中文摘要

心理健康AI的安全性往往在错误的时间尺度上被评判。当前评估通常仅评分孤立响应、终点结果或对话质量总和,而临床重要失败可能源于交互顺序和累积,包括延迟升级、重复强化、依赖形成、失败修复和逐步恶化的跨轮次。本文认为这种不匹配不仅是评估覆盖的限制,更是无效安全结论的来源。我们引入了时间安全不可识别性,即为何依赖序列、时间、累积或恢复的安全属性无法通过丢弃这些特征的协议认证。从这一形式化中,我们开发了SCOPE(安全主张基于保留证据)作为对齐安全主张与评估实际保留证据的一般原则,并将其实例化为SCOPE-MH,即心理健康领域的这一报告标准。我们通过AnnoMI数据集上的概念验证,揭示了单轮行为评分无法代表的失败机制。我们提出SCOPE-MH作为现有评估基础设施的诊断补充,并论证保留时间证据对安全关键的心理健康AI部署是必要而非可选的。

英文摘要

The safety of mental health AI is often judged at the wrong temporal scale. Current evaluations typically score isolated responses, endpoint outcomes, or aggregate dialogue quality, while clinically consequential failures may arise from the order and accumulation of interactions themselves, including delayed escalation, repeated reinforcement, dependency formation, failed repair, and gradual deterioration across turns. This paper argues that this mismatch is not merely a limitation of evaluation coverage but a source of invalid safety conclusions. We introduce Temporal Safety Non-Identifiability, a formal account of why safety properties that depend on sequence, timing, accumulation, or recovery cannot be certified by protocols that discard those features. From this formalization, we develop SCOPE (Safety Claims Over Preserved Evidence) as a general principle for aligning safety claims with the evidence an evaluation actually retains, and instantiate it for mental health as the SCOPE-MH reporting standard. We operationalize SCOPE-MH through a proof-of-concept on the AnnoMI dataset of expert-annotated motivational interviewing conversations, which reveals mechanisms of failure that per-turn behavior scoring does not represent. We propose SCOPE-MH as a diagnostic complement to existing evaluation infrastructure and argue that evaluation preserving temporal evidence is necessary, not optional, for safety-critical mental health AI deployment.

2605.08077 2026-06-17 cs.CL 版本更新

Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration

Conformal Path Reasoning: 通过路径级校准实现可信的知识图谱问答

Shuhang Lin, Chuhao Zhou, Xiao Lin, Zihan Dong, Kuan Lu, Zhencan Peng, Jie Yin, Dimitris N. Metaxas

AI总结 提出Conformal Path Reasoning (CPR)框架,通过查询级共形校准和残差共形价值网络(RCVNet)学习判别性路径级非一致性分数,在保证有效覆盖的同时将预测集大小平均减少52%。

Comments 13 pages, 3 figures, 2 tables

详情
AI中文摘要

知识图谱问答(KGQA)提供了基于事实的、可解释的推理,但现有方法通常无法对检索到的答案提供可靠的覆盖保证。虽然共形预测(CP)为生成具有统计保证的预测集提供了原则性框架,但先前的共形KGQA方法存在两个关键缺陷:由于无效校准导致的覆盖保证被违反,以及弱分数判别性导致预测集过大。我们提出Conformal Path Reasoning (CPR),一个基于两个关键创新的新型可信KGQA框架。首先,路径级分数上的查询级共形校准保持可交换性以确保有效的覆盖保证。其次,我们引入残差共形价值网络(RCVNet),这是一个通过PUCT引导探索训练的轻量级模块,用于学习判别性的路径级非一致性分数。大量实验表明,与基准数据集上的共形基线相比,CPR将经验覆盖率平均提高45%,同时将预测集大小平均减少52%,突显了其在知识图谱上进行可靠共形推理的有效性。

英文摘要

Knowledge Graph Question Answering (KGQA) offers grounded, interpretable reasoning, but existing methods often fail to provide reliable coverage guarantees over retrieved answers. While Conformal Prediction (CP) offers a principled framework for producing prediction sets with statistical guarantees, prior conformal KGQA methods suffer from two critical pitfalls: violated coverage guarantees due to invalid calibration, and weak score discriminability that yields excessively large prediction sets. We propose Conformal Path Reasoning (CPR), a novel trustworthy KGQA framework built on two key innovations. First, query-level conformal calibration over path-level scores preserves exchangeability to ensure valid coverage guarantees. Second, we introduce the Residual Conformal Value Network (RCVNet), a lightweight module trained via PUCT-guided exploration to learn discriminative path-level nonconformity scores. Extensive experiments show that CPR significantly improves the Empirical Coverage Rate by 45% while reducing prediction set size by 52% on average over conformal baselines across benchmark datasets, highlighting its effectiveness for reliable conformal reasoning over knowledge graphs.
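
For readers unfamiliar with the conformal prediction framework that the abstract above builds on, here is a minimal sketch of generic split conformal calibration. A threshold is taken from held-out nonconformity scores, and every candidate scoring at or below it enters the prediction set. This is the textbook procedure, not CPR's query-level calibration or RCVNet. The scores and path names are hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(calibration_scores)[rank - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return {name for name, score in candidate_scores.items() if score <= threshold}


# Hypothetical nonconformity scores of the true answers on 9 calibration queries.
calibration_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
threshold = conformal_threshold(calibration_scores, alpha=0.2)

# Hypothetical candidate reasoning paths for a new query (lower = more conforming).
candidates = {"path_a": 0.3, "path_b": 0.8, "path_c": 0.95}
selected = prediction_set(candidates, threshold)
```

The coverage guarantee of this procedure depends on calibration and test scores being exchangeable, which is the property the abstract says earlier conformal KGQA methods violate. More discriminative scores shrink the set without changing the threshold rule.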

2504.11837 2026-06-17 cs.CL cs.AI 版本更新

EmoFSM: A Finite State Machine for Emotional Support Conversation

EmoFSM:一种用于情感支持对话的有限状态机

Yue Zhao, Qingqing Gu, Xiaoyu Wang, Teng Chen, Zhonglin Jiang, Yong Chen, Hongyan Li, Luo Ji

AI总结 针对情感支持对话中长期满意度不足的问题,提出EmoFSM框架,利用有限状态机引导大语言模型进行规划与自我推理,在多个数据集上优于多种基线方法。

Comments 15 pages, 4 figures. PAKDD 2026

详情
AI中文摘要

情感支持对话旨在通过有效对话缓解人们的情感困扰。尽管大语言模型在ESC方面取得了显著进展,但大多数研究可能未从状态模型角度定义图,从而为长期满意度提供了次优解决方案。为解决此问题,我们利用有限状态机在LLM上提出名为EmoFSM的框架。我们的框架允许单个LLM在ESC期间引导规划,并在每个对话轮次中自我推理求助者的情绪、支持策略以及最终回应。在ESC数据集上的大量实验表明,EmoFSM优于许多基线方法,包括直接推理、自我微调、思维链、微调和外部支持方法,甚至那些参数更多的模型。

英文摘要

Emotional support conversation (ESC) aims to alleviate people's emotional distress through effective conversations. Although large language models (LLMs) have made remarkable progress in ESC, most of these studies do not explicitly define the conversation as a state diagram, and so may provide a suboptimal solution for long-term satisfaction. To address this issue, we apply a Finite State Machine (FSM) to LLMs and propose a framework called EmoFSM. Our framework allows a single LLM to bootstrap the planning during ESC and to self-reason about the seeker's emotion, the support strategy, and the final response at each conversation turn. Extensive experiments on ESC datasets suggest that EmoFSM outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and externally supported methods, even those with many more parameters.

2604.24696 2026-06-17 cs.CV version update

NeuroClaw Technical Report

NeuroClaw 技术报告

Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Carl Yang, Lifang He, Lichao Sun, Xiang Li, Yixuan Yuan

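AI summary To address multimodal data, long pipelines, and reproducibility challenges in neuroimaging, this work proposes NeuroClaw, a multi-agent research assistant that achieves executable, reproducible neuroimaging analysis through data-grounded decisions, environment management, and a three-tier skill architecture, and that substantially outperforms direct agent invocation on the NeuroBench benchmark.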

Details
AI Chinese abstract

代理型人工智能系统有望加速科学工作流程,但神经影像学面临独特挑战:异质模态(sMRI、fMRI、dMRI、EEG)、长多阶段流水线以及持续的可重复性风险。为解决这一差距,我们提出了NeuroClaw,一个面向可执行和可复现神经影像研究的领域专用多智能体研究助手。NeuroClaw直接操作跨格式和模态的原始神经影像数据,将决策基于数据集语义和BIDS元数据,因此用户无需准备精选输入或定制模型代码。该平台结合了工具工程与端到端环境管理,包括固定Python环境、Docker支持、常见神经影像工具的自动安装程序以及GPU配置。在实践中,这一层强调检查点、执行后验证、结构化审计追踪和受控运行时设置,使工具链更加透明,同时提高可重复性和可审计性。三层技能/智能体层次结构将用户交互、高层编排和底层工具技能分离,将复杂工作流分解为安全、可重用的单元。除了NeuroClaw框架,我们还引入了NeuroBench,一个系统级基准测试,用于评估可执行性、工件有效性和可重复性准备情况。在多个多模态LLM上,与直接调用智能体相比,启用NeuroClaw的运行产生了一致且显著的分数提升。项目主页:https://cuhk-aim-group.github.io/NeuroClaw/index.html

English abstract

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent research assistant for executable and reproducible neuroimaging research. NeuroClaw operates directly on raw neuroimaging data across formats and modalities, grounding decisions in dataset semantics and BIDS metadata so users need not prepare curated inputs or bespoke model code. The platform combines harness engineering with end-to-end environment management, including pinned Python environments, Docker support, automated installers for common neuroimaging tools, and GPU configuration. In practice, this layer emphasizes checkpointing, post-execution verification, structured audit traces, and controlled runtime setup, making toolchains more transparent while improving reproducibility and auditability. A three-tier skill/agent hierarchy separates user-facing interaction, high-level orchestration, and low-level tool skills to decompose complex workflows into safe, reusable units. Alongside the NeuroClaw framework, we introduce NeuroBench, a system-level benchmark for executability, artifact validity, and reproducibility readiness. Across multiple multimodal LLMs, NeuroClaw-enabled runs yield consistent and substantial score improvements compared with direct agent invocation. Project homepage: https://cuhk-aim-group.github.io/NeuroClaw/index.html

2605.00725 2026-06-17 cs.LG version update

Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

组合复形上的Weisfeiler-Lehman测试:拓扑神经网络的泛化表达能力

Jiawen Chen, Qi Shao, Zhiqiang Ge, Duxin Chen, Wenwu Yu

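AI summary Proposes the Combinatorial Complex Weisfeiler-Lehman (CCWL) framework, which unifies the expressive-power analysis of topological neural networks through four structural neighborhoods, proves that under specific conditions the refinement can be reduced to using only lower- and upper-adjacency bridge information, instantiates the reduced form as the CCIN network, and validates it experimentally.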

Details
AI Chinese abstract

拓扑神经网络已成为建模超图、单纯复形和胞腔复形等超越成对图的高阶关系结构的有效工具。然而,现有的Weisfeiler-Leman类型表达能力分析通常在不同的结构域上开发,并依赖于特定域的邻域系统,使得它们的表达能力难以在统一形式下进行比较。本文提出了组合复形Weisfeiler-Lehman(CCWL)框架,这是在组合复形上定义的一种统一的表达能力细化。通过利用组合复形表示集合类型关系和部分-整体层次结构的能力,CCWL通过四个结构邻域进行拓扑颜色细化:边界、共边界、下邻接和上邻接。我们证明,在指定的提升映射下,CCWL可以模拟多个特定域的WL类型细化,从而为分析拓扑消息传递提供了共同的理论基线。我们进一步研究了邻域充分性问题,并证明在显式覆盖条件下,仅使用下邻接和上邻接桥信息的简化细化保留了完整四邻域CCWL细化的区分能力。基于这一理论结果,我们将简化细化实例化为组合复形同构网络(CCIN)。在合成和真实世界基准上的实验表明,CCIN在代表性图和拓扑神经网络基线上取得了有竞争力的性能。消融研究和资源效率分析进一步支持了所提出的下/上邻域设计的有效性。

English abstract

Topological neural networks have emerged as effective tools for modeling higher-order relational structures beyond pairwise graphs, including hypergraphs, simplicial complexes, and cell complexes. However, existing Weisfeiler-Leman type expressivity analyses are typically developed on different structural domains and rely on domain-specific neighborhood systems, making their expressive powers difficult to compare within a common formalism. In this paper, we introduce the Combinatorial Complex Weisfeiler-Leman (CCWL) framework, a unified expressive power refinement defined on combinatorial complexes. By exploiting the ability of combinatorial complexes to represent both set-type relations and part-whole hierarchies, CCWL performs topological color refinement through four structural neighborhoods: boundary, co-boundary, lower adjacency, and upper adjacency. We show that, under specified lifting maps, CCWL can simulate several domain-specific WL-type refinements, thereby providing a common theoretical baseline for analyzing topological message passing. We further study the neighborhood sufficiency problem and prove that, under explicit coverage conditions, a reduced refinement using only lower- and upper-adjacent bridge information preserves the distinguishing power of the full four-neighborhood CCWL refinement. Guided by this theoretical result, we instantiate the reduced refinement as the Combinatorial Complex Isomorphism Network (CCIN). Experiments on synthetic and real-world benchmarks demonstrate that CCIN achieves competitive performance against representative graph and topological neural network baselines. Ablation studies and resource-efficiency analyses further support the effectiveness of the proposed lower/upper-neighborhood design.
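To make the four-neighborhood color refinement concrete, the following sketch runs a WL-style refinement over boundary, co-boundary, lower-adjacency, and upper-adjacency maps on a toy complex (a triangle with one filled face plus a dangling edge). It is a simplified illustration, not the CCWL definition from the paper: the toy complex, the helper names, and the restriction of boundaries to cells one rank below are all assumptions.

```python
def refine_colors(cells, neighborhoods, rounds=10):
    """WL-style colour refinement over several neighbourhood maps."""
    colors = {c: 0 for c in cells}  # every cell starts with the same colour
    for _ in range(rounds):
        # Signature: own colour plus one sorted colour multiset per neighbourhood.
        signatures = {
            c: (colors[c],) + tuple(
                tuple(sorted(colors[n] for n in nbrs.get(c, ())))
                for nbrs in neighborhoods
            )
            for c in cells
        }
        # Relabel distinct signatures with small integers.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {c: palette[signatures[c]] for c in cells}
        # A round can only split colour classes, so an equal count means stability.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
    return colors

def adjacency(incidence, reverse):
    """Cells that share at least one incident cell (excluding the cell itself)."""
    return {
        c: sorted({n for i in incs for n in reverse[i] if n != c})
        for c, incs in incidence.items()
    }

# Toy complex (hypothetical): vertices a-d, edges ab/bc/ca/cd, one face abc.
cells = ["a", "b", "c", "d", "ab", "bc", "ca", "cd", "abc"]
boundary = {"ab": ["a", "b"], "bc": ["b", "c"], "ca": ["c", "a"],
            "cd": ["c", "d"], "abc": ["ab", "bc", "ca"]}
coboundary = {}
for cell, faces in boundary.items():
    for f in faces:
        coboundary.setdefault(f, []).append(cell)
lower = adjacency(boundary, coboundary)   # cells sharing a boundary cell
upper = adjacency(coboundary, boundary)   # cells sharing a co-boundary cell

colors = refine_colors(cells, [boundary, coboundary, lower, upper])
```

On this toy complex the refinement cannot separate `a` from `b` or `bc` from `ca`, which matches the symmetry that swaps `a` and `b`. Passing only `[lower, upper]` runs a two-neighborhood variant, but that variant does not encode the bridge information or the coverage conditions the abstract refers to.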