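arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.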

2605.08238 2026-05-12 cs.CV cs.AI cs.ET cs.LG

Resource-Aware Evolutionary Neural Architecture Search for Cardiac MRI Segmentation

Farhana Yasmin, Mahade Hasan, Haipeng Liu, Amjad Ali, Ghulam Muhammad, Yu Xue

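AI summary: This study proposes CardiacNAS, a resource-aware evolutionary neural architecture search method for cardiac magnetic resonance (CMR) segmentation. It combines a UNet-like supernet with a search space designed for cardiac segmentation and uses an evolutionary algorithm to jointly optimize segmentation accuracy and model efficiency under a fixed compute budget. Experiments on the ACDC dataset show high segmentation accuracy at low computational cost, indicating a good balance between accuracy and efficiency.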

DOI: 10.1109/ICCIT68739.2025.11491084
Journal ref: F. Yasmin et al., "Resource-Aware Evolutionary Neural Architecture Search for Cardiac MRI Segmentation," 28th International Conference on Computer and Information Technology (ICCIT), 2025, pp. 2819-2824

Abstract

Cardiac magnetic resonance (CMR) segmentation underpins quantitative assessment of ventricular structure and function, yet reliable delineation remains difficult due to low tissue contrast, fuzzy boundaries, and inter-scan variability. We present CardiacNAS, an evolutionary neural architecture search (NAS) framework that couples a UNet-like supernet with a cardiac-aware search space spanning depth, width, kernel size, filter size, attention, fusion, activation, dropout, and residual scaling. The search is explicitly resource-aware, jointly optimizing Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (HD95) versus model size and floating-point operations (FLOPs) under fixed compute budgets. Candidate architectures are instantiated from the supernet, trained with proxy budgets, and evolved through crossover, mutation, and elitist selection. We evaluate on the ACDC dataset and compare against six state-of-the-art methods, using qualitative comparisons, learning-curve analyses, and design-factor correlation studies. The resulting model attains 93.22% average DSC and 4.73 mm HD95 with 3.58M parameters and 14.56 GFLOPs, demonstrating a favorable accuracy-efficiency trade-off. Analyses indicate that searched attention and fusion choices, together with residual scaling, contribute to improved boundary fidelity and stability. CardiacNAS offers a principled, resource-aware approach to deployable CMR segmentation with transparent reporting of architectural complexity and compute budgets.
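For readers unfamiliar with the DSC metric reported in this abstract, the following is a minimal sketch of the standard Dice similarity coefficient on binary masks. The function name, the flat-list mask representation, and the empty-mask convention are illustrative assumptions; the paper's own evaluation code is not shown in this listing.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Masks are flat sequences of 0/1 values (an assumption made for
    illustration; real pipelines typically use image arrays).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
print(f"DSC = {dice_coefficient(pred, target):.4f}")
```

Here two of the three foreground pixels in each mask overlap, so the score is 2·2/(3+3).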

2605.08237 2026-05-12 cs.LG stat.ML

Distributional Spectral Diagnostics for Localizing Grokking Transitions

Ziyue Wang, Yufeng Ying, Takafumi Kanamori

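AI summary: This study examines the transition from memorizing training data to generalizing in the "grokking" phenomenon and proposes a method based on distributional spectral analysis to localize that transition. By mapping task-relevant observables to Wasserstein/quantile coordinates and applying Hankel dynamic mode decomposition, it constructs diagnostics including residuals, spectral features, and effective rank. Experiments on modular-addition Transformers show that the method distinguishes grokking from non-grokking runs and gives early warning at a fixed threshold, with high detection performance and practical utility.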

2605.08234 2026-05-12 cs.LG cs.AI

When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression

Ruijie Zhang, Haozhe Liang, Da Chang, Li Hu, Fanqi Kong, Huaxiao Yin, Yu Li

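AI summary: This study investigates how value-aware key-value (KV) cache eviction can improve cache compression in long-context large language model inference. The authors propose a fixed-contract diagnostic that analyzes how selectors behave at different decision points, allowing a more accurate assessment of how a compression policy affects task performance. Experiments show that the method separates support recovery, output-value ranking, and edge effects in cache compression, providing a useful diagnostic tool for non-monotone cache compression.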

2605.08232 2026-05-12 cs.LG physics.comp-ph physics.flu-dyn

Hierarchical Multi-Fidelity Learning for Predicting Three-Dimensional Flame Wrinkling and Turbulent Burning Velocity

Saghar Zolfaghari, Yu Xie, Junfeng Yang, Safa Jamali

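AI summary: To address the difficulty of obtaining high-fidelity experimental data, this study proposes a hierarchical multi-fidelity neural network framework (MuFiNNs) for predicting the three-dimensional wrinkling dynamics and turbulent burning velocity of turbulent premixed flames. The method combines sparse high-fidelity experimental data with structured low-fidelity models that encode dominant physical trends, and uses hierarchical construction and nonlinear correction to learn the coupling between flame geometry and reactive behavior. The framework retains good predictive ability when data are sparse, noisy, or experimentally hard to obtain, offering a scalable and physically grounded approach to combustion modeling with limited data.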

Abstract

High-fidelity experimental characterization of turbulent premixed flames remains limited by the cost and complexity of advanced diagnostics, particularly under elevated pressures and intense turbulence where measurements of coupled flame morphology and burning dynamics are sparse. Here, we develop a hierarchical multi-fidelity neural network framework (MuFiNNs) to address this challenge by integrating sparse high-fidelity experimental data with structured low-fidelity representations encoding dominant physical trends. The framework combines hierarchical low-fidelity construction with nonlinear multi-fidelity correction to learn coupled geometric and reactive flame behavior while recovering discrepancies that simplified models alone cannot capture. The methodology is applied to expanding turbulent premixed flames to predict three-dimensional flame wrinkling dynamics and turbulent mass burning velocity across varying fuels, pressures, and turbulence intensities. Using experimentally informed low-fidelity trend models with sparse high-fidelity measurements, MuFiNNs accurately reconstruct observed flame behavior, enable interpolation across unseen operating conditions, and demonstrate robust extrapolation beyond the training domain. Importantly, the framework remains effective in noisy, weakly structured, or experimentally inaccessible regimes where conventional data-driven approaches often fail. These results show that hierarchical multi-fidelity learning provides a scalable and physically grounded strategy for predictive combustion modeling in data-limited regimes. More broadly, this work establishes multi-fidelity scientific machine learning as a practical framework for extracting physically meaningful predictive models from sparse experiments, particularly for instability-dominated and turbulence-sensitive reactive flows where high-fidelity data acquisition is demanding.

2605.08231 2026-05-12 cs.LG cs.AI cs.AR

TRAM: Training Approximate Multiplier Structures for Low-Power AI Accelerators

Chang Meng, Hanyu Wang, Yuyang Ye, Mingfei Yu, Wayne Burleson, Giovanni De Micheli

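AI summary: As the power consumption of AI accelerators becomes an increasingly pressing problem, this paper proposes TRAM, which jointly optimizes approximate multiplier structures and AI model parameters to substantially reduce power with only a small accuracy loss. Unlike prior work that designs approximate multipliers independently, TRAM integrates multiplier structure design with model training, achieving more efficient power optimization. Experiments show power reductions of up to 25.05% on CIFAR-10 and 27.09% on ImageNet.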

2605.08230 2026-05-12 cs.LG stat.AP

Social Determinants of Health and Fentanyl Overdose Mortality Across US Counties: An XGBoost and SHAP Analysis Identifying Silent Risk Counties and Treatment Deserts

Kabi Raj Tiruwa, Abhisan Ghimire, Anuj Kumar Shah

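AI summary: This study uses XGBoost and SHAP to analyze the relationship between social determinants of health and fentanyl overdose mortality across US counties, aiming to identify high-risk but overlooked "silent risk counties" and "treatment deserts". Integrating multiple public health data sources, it finds that disability rate, hypertension, smoking, and limited transportation access are key predictors of overdose mortality, and that mortality risk is significantly elevated in treatment-desert counties. The results support targeted interventions, emphasizing priority expansion of substance use disorder treatment resources and early intervention in high-risk areas.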

Comments 21 pages, 7 figures, 4 tables

2605.08226 2026-05-12 cs.CV

SPECTRA-Net: Scalable Pipeline for Explainable Cross-domain Tensor Representations for AI-generated Images Detection

Sarra Arab, Anfal Achouri, Seif Eddine Bouziane

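AI summary: With the rapid growth of AI-generated images, safeguarding the integrity of digital information is a major challenge. This paper proposes SPECTRA-Net, a scalable and explainable pipeline of cross-domain tensor representations for detecting AI-generated images. It combines global semantic features from vision foundation models with spectral analysis, local patch anomaly detection, and statistical descriptors, achieving state-of-the-art detection in both in-domain and cross-domain settings. It also provides explainability by localizing anomalous regions, offering a more reliable approach to content verification in real-world settings.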

Comments 13 pages, 2 figures, submitted to a journal

2605.08223 2026-05-12 cs.LG

A Simulated Federated Analysis of MS-Induced Brain Lesions

Evelyn Trautmann, Joël Federer-Gsponer, Markus C. Elze, José-Tomás Prieto

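AI summary: This paper presents a simulation framework for studying multi-center federated analysis of data from multiple sclerosis (MS) patients. The framework comprises an image segmentation task and a clinical data analysis task, and employs federated survival analysis and principal component analysis. By building a high-fidelity synthetic dataset and incorporating real imaging data, it simulates key aspects of real federated studies such as data governance, local preprocessing, and model training. The study provides a realistic testbed for developing and evaluating federated learning methods in MS research.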

Comments Accepted for publication at The 39th IEEE International Symposium on Computer-Based Medical Systems

2605.08222 2026-05-12 cs.CV cs.AI cs.IR

From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline

Sarah Binta Alam Shoilee, Victor de Boer, Jacco van Ossenbruggen, Susan Legêne

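AI summary: This study proposes a modular, provenance-aware pipeline that converts images of handwritten historical tables into knowledge graphs in support of human-machine collaboration. The pipeline has three stages (table reconstruction, information extraction, and knowledge graph construction) and retains intermediate representations at each stage for human inspection and correction. Its core contribution is the systematic integration of provenance information at every processing stage, so that all extracted entities and literals can be traced back to their original visual and textual sources, improving transparency and control.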

Comments Shorter version of this paper has been accepted in the 5th International Conference on Hybrid Human-Artificial Intelligence (HHAI 2026)

2605.08221 2026-05-12 cs.LG cs.AI

NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

Michael Jerge, David Evans

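AI summary: This paper proposes NoisyCoconut, a method that improves the reliability of large language models by injecting controlled noise into their internal representations at inference time. It requires no retraining; instead it generates diverse reasoning paths and uses their agreement as a confidence signal, letting the model abstain when uncertain. Experiments show an effective accuracy-coverage trade-off on several reasoning benchmarks and a significant reduction in error rate, with accuracy above 95% on mathematical reasoning tasks.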
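The agreement-as-confidence idea in this summary can be illustrated with a generic majority-vote rule that abstains when the sampled answers disagree too much. This is a sketch only: the function name, the `min_agreement` threshold, and the use of final answer strings are assumptions for illustration, not the paper's actual aggregation procedure.

```python
from collections import Counter


def consensus_or_abstain(answers, min_agreement=0.6):
    """Return the majority answer, or None (abstain) if agreement is too low.

    `answers` holds the final answers from several independently perturbed
    reasoning paths; `min_agreement` is a hypothetical threshold.
    """
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)
    return answer if agreement >= min_agreement else None


print(consensus_or_abstain(["42", "42", "42", "17", "42"]))  # agreement 0.8
print(consensus_or_abstain(["1", "2", "3", "1"]))            # agreement 0.5
```

Raising the threshold trades coverage (how often an answer is returned) for accuracy on the answers that are returned.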

2605.08220 2026-05-12 cs.AI cs.CE cs.CL cs.CV cs.SE

Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

Andrei Lazarev, Dmitrii Sedov, Alexander Galkin

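AI summary: This paper studies how to improve the accuracy of multimodal large language models when extracting data from scientific charts, comparing high-level semantic guidance with low-level spatial guidance. Semantic approaches such as a metadata-first framework and chain-of-thought did not significantly improve performance, whereas a simple spatial approach that overlays a coordinate grid on the chart image significantly reduced extraction error. On a synthetic dataset, the grid method lowered SMAPE from 25.5% to 19.5%, indicating that spatial context is more effective and more reliable for current multimodal models.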

Comments This is the version of the article accepted for publication in SUMMA 2025 after peer review. The final, published version is available at IEEE Xplore: https://doi.org/10.1109/SUMMA68668.2025.11302248
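The SMAPE figures quoted for this paper refer to the symmetric mean absolute percentage error. A minimal sketch of one common definition follows; several SMAPE variants exist, and this listing does not say which one the authors use, so the formula and the zero-denominator convention below are assumptions.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Uses the common form mean(|p - a| / ((|a| + |p|) / 2)) * 100.
    A pair where both values are zero contributes zero error
    (an assumed convention).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        terms.append(0.0 if denom == 0 else abs(p - a) / denom)
    return 100.0 * sum(terms) / len(terms)


# Values read off a chart versus the true plotted values (made-up numbers).
print(f"SMAPE = {smape([100, 200], [110, 180]):.2f}%")
```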

2605.08218 2026-05-12 cs.LG cs.CV

Deep Dreams Are Made of This: Visualizing Monosemantic Features in Diffusion Models

Adam Szokalski, Mateusz Modrzejewski

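AI summary: This paper proposes LVO, a mechanistic interpretability method that extends feature-visualization optimization from convolutional neural networks to the latent space of diffusion models, using sparse autoencoders (SAEs) to disentangle polysemantic features into monosemantic ones. Applied to Stable Diffusion 1.5, LVO produces clearly recognizable visualizations of concepts such as people, roses, and waterfall foam, and the study verifies that pixel-space regularization techniques remain effective in latent space. Compared with conventional dataset examples and feature steering, the method reveals more directly what activates a feature.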

2605.08217 2026-05-12 cs.LG cs.IR

Retrieval Mechanisms Surpass Long-Context Scaling in Time Series Forecasting

Rishi Ahuja, Kumar Prateek, Simranjit Singh, Vijay Kumar

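AI summary: This paper examines whether long context actually improves time series forecasting. Experiments show that in highly stochastic domains, overly long histories introduce noise and increase forecast error. The authors therefore propose a retrieval-augmented forecasting method (RAFT) that selectively retrieves relevant historical segments as exogenous variables, significantly improving accuracy while reducing compute and pointing to a new direction for the design of time series foundation models.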

2605.08214 2026-05-12 cs.SD cs.AI eess.AS

Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization

Mohammed Aman Bhuiyan, Md Sazzad Hossain Adib, Samiul Basir Bhuiyan, Amit Chakraborty, Aritra Islam Saswato, Ahmed Faizul Haque Dhrubo, Mohammad Ashrafuzzaman Khan

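AI summary: This paper addresses the challenges of long-form speech recognition and speaker diarization for Bangla, proposing improvements based on Whisper and PyAnnote. Fine-tuning the Whisper model and the PyAnnote segmentation module, combined with data augmentation and training on a custom dataset, significantly improves both tasks. On the test set, the system achieves a word error rate (WER) of 0.2441 and a diarization error rate (DER) of 0.2392, outperforming the original pretrained models.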

Comments 3 figures and 5 tables
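The WER reported for this paper is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard computation; whitespace tokenization and the absence of text normalization are simplifying assumptions, and the authors' scoring setup may differ.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In this example there is one substitution and one deletion against six reference words.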

2605.08213 2026-05-12 cs.CV

Low-Cost Stereo Vision for Robust 3D Positioning of Thin Radiata Pine Branches in Autonomous Drone Pruning

Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

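AI summary: This paper studies how a low-cost stereo vision system can provide accurate three-dimensional positioning of thin radiata pine branches to support autonomous drone pruning. It proposes a two-stage method, branch segmentation followed by depth estimation, and compares several state-of-the-art algorithms on a custom dataset. The core contribution is coupling stereo segmentation with a triangulation algorithm that uses median-absolute-deviation outlier rejection, addressing sparse texture, thin structures, and noisy depth in forest scenes and offering a feasible route to autonomous pruning without expensive sensors.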

Abstract

Manual pruning of radiata pine, a species of major economic importance to New Zealand forestry, is hazardous, labour-intensive, and increasingly constrained by workforce shortages. Existing autonomous pruning platforms typically rely on expensive sensors such as LiDAR and are limited to thick branches, which restricts their wider adoption. This paper investigates whether a single low-cost stereo camera mounted on a drone can provide sufficiently accurate branch detection and three-dimensional positioning to support autonomous pruning of branches as thin as 10 mm, thereby removing the need for auxiliary depth sensors. The proposed pipeline comprises two stages: branch segmentation and depth estimation. For segmentation, Mask R-CNN variants and the YOLOv8 and YOLOv9 families are compared on a custom dataset of 71 stereo image pairs captured with a ZED Mini camera; YOLOv8 and YOLOv9 are selected as representative state-of-the-art real-time segmentors at the time of data collection, and the framework is designed to remain compatible with newer YOLO releases. For depth estimation, a traditional method (SGBM with WLS filtering) and deep-learning-based methods (PSMNet, ACVNet, GWCNet, MobileStereoNet, RAFT-Stereo, and NeRF-Supervised Deep Stereo) are evaluated, including cross-dataset fine-tuning experiments that expose the domain gap between urban driving benchmarks and natural forestry scenes. The main novelty of this work lies in coupling stereo segmentation with a centroid-based triangulation algorithm and Median-Absolute-Deviation outlier rejection that converts a segmentation mask and disparity map into a single robust branch-to-camera distance, addressing the challenges of sparse texture, thin structures, and noisy disparity values typical of forest scenes. Qualitative evaluations at distances of 1-2 m show that the learning-based stereo methods produce more coherent depth es...
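The idea of turning the disparity values inside a branch mask into one robust distance can be sketched as follows, using Median-Absolute-Deviation (MAD) rejection followed by the standard stereo relation Z = f·B/d. The function name, the cutoff `k`, the 1.4826 scaling constant, averaging the inliers, and the camera values in the example are all illustrative assumptions; the paper's centroid-based algorithm is not reproduced here.

```python
from statistics import mean, median


def robust_branch_distance(disparities_px, focal_px, baseline_m, k=3.0):
    """Estimate branch-to-camera distance (metres) from masked disparities.

    disparities_px: disparity values (pixels) sampled inside the branch mask.
    focal_px, baseline_m: camera focal length (pixels) and stereo baseline (metres).
    k: hypothetical rejection cutoff in units of scaled MAD.
    """
    valid = [d for d in disparities_px if d > 0]  # drop invalid disparities
    if not valid:
        raise ValueError("no valid disparity values inside the mask")
    med = median(valid)
    mad = median([abs(d - med) for d in valid])
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    cutoff = k * 1.4826 * mad
    inliers = [d for d in valid if abs(d - med) <= cutoff]
    # Standard pinhole stereo triangulation: depth = focal * baseline / disparity.
    return focal_px * baseline_m / mean(inliers)


# Illustrative values only: four consistent disparities plus two outliers.
samples = [50.0, 51.0, 49.0, 50.0, 5.0, 120.0]
print(f"{robust_branch_distance(samples, focal_px=700.0, baseline_m=0.063):.3f} m")
```

With these made-up numbers the two outliers (5 px and 120 px) are rejected and the remaining disparities average 50 px.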

2605.08212 2026-05-12 cs.LG cs.CL gr-qc hep-th

LLMs with in-context learning for Algorithmic Theoretical Physics

Anamaria Hell, Leander Thiele

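AI summary: As the demand for algorithmic computation in theoretical physics grows, this paper explores the feasibility of combining large language models (LLMs) with computer algebra systems (CAS) for algorithmic tasks. The authors interface Claude with Maple and apply it to cosmological perturbation calculations in modified gravity theories, showing how the approach performs on real problems, its common failure modes, and directions for improvement. The results indicate that frontier LLMs equipped with sufficient examples can solve most of the test problems, suggesting a new approach to automated computation in theoretical physics.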

Comments 8 pages, 2 figures

2605.08210 2026-05-12 cs.CV

Harmonized Feature Conditioning and Frequency-Prompt Personalization for Multi-Rater Medical Segmentation

Sanaz Karimijafarbigloo, Armin Khosravi, Alireza Kheyrkhah, Reza Azad, Mauricio Reyes, Dorit Merhof

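AI summary: Addressing annotation disagreement in multi-rater medical image segmentation, this study proposes a probabilistic framework that combines harmonized feature conditioning with frequency-prompt personalization to better reflect the uncertainty of clinical diagnosis. With adaptive feature conditioning and a frequency-domain personalization module, the model can separate device noise from rater disagreement and produce more anatomically consistent segmentations. Experiments show leading segmentation performance and uncertainty estimation on multiple datasets, especially under heavy noise.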

Comments Accepted in main CVPR 2026

2605.08209 2026-05-12 cs.LG

Learngene Search Across Multiple Datasets for Building Variable-Sized Models

Boyu Shi, Junbo Zhou, Chang Liu, Xu Yang, Qiufeng Wang, Xin Geng

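AI summary: This paper proposes Learngene Search Across Multiple Datasets (LSAMD) for building variable-sized deep learning models. The method expands an ancestry model (Ans-Net) into a super ancestry network containing dataset-specific modules and adapters, enabling architecture search across datasets, and extracts the most frequently used modules as "learngenes" to initialize descendant models of different sizes. Experiments show that LSAMD significantly reduces storage and training costs while maintaining model performance.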

2605.08207 2026-05-12 cs.CV

A Breast Vision Pathology Foundation Model for Real-world Clinical Utility

Yingxue Xu, Zhengyu Zhang, Xiuming Zhang, Mengwei Xu, Fengtao Zhou, Yihui Wang, Jiabo Ma, Yi Xin, Danyi Li, Chengyu Lu, Zhijian Cen, Ying Tan, Qingbing Yao, Qi Wang, Zizhao Gao, Yong Zhang, Jingjing Chen, Feifei Liu, Qian Xu, Yi Dai, Hongxuan Tan, Cheng Jin, Huajun Zhou, Zhengrui Guo, Ling Liang, Hongyi Wang, Yingcong Chen, Xi Wang, Zhenhui Li, Ronald Cheong Kin Chan, Ning Mao, Muyan Cai, Zhe Wang, Li Liang, Hao Chen

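AI summary: This study presents BRAVE, a breast pathology foundation model intended to support pathological diagnosis and decision-making in real clinical settings. The model was developed and evaluated on more than 100,000 breast whole-slide images from 32 sources across Asia, Europe, and North America, and can be used across clinical stages including pre-operative biopsy, intra-operative frozen section, and post-operative resection. Experiments show that BRAVE offers clear advantages in excluding low-risk cases, helping recover missed cases, and improving pathologists' diagnostic accuracy and efficiency, and that it independently predicts disease-free and overall survival.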

Comments 60 pages

Abstract

Pathology foundation models have shown strong retrospective performance, but whether such systems can support clinically relevant use remains unclear. This challenge is particularly important in breast cancer, where pathological assessment serves as the gold standard for diagnosis and guides treatment planning, surgical decision-making and risk stratification across pre-, intra- and post-operative stages. Here we present BRAVE, a breast-adaptive pathology foundation model developed and evaluated using a total resource of 101,638 breast whole-slide images from 32 sources across Asia, Europe and North America. We assessed BRAVE across 34 tasks in 82 cohorts spanning pre-operative biopsy, intra-operative frozen section and post-operative resection, using an evidence chain comprising retrospective benchmarking, clinically challenging scenarios, workflow-oriented clinical impact simulations, prospective observational validation with the thresholds locked in the retrospective cohorts and crossover pathologist-AI interaction studies. Across these settings, BRAVE supported practical roles in the clinical workflow, including safe exclusion of low-risk cases from routine review, AI-assisted second-review rescue of initially missed positives and prioritization of cases for further assessment. In prospective validation across three centres, BRAVE excluded 76.9% of negative biopsy cases (NPV 0.953) and 70.1% of negative frozen-section cases (NPV 0.973), and triaged 78.8% of post-operative subtyping cases as high-confidence clear-cut cases (NPV 1.000). In reader studies, AI assistance improved balanced accuracy from 88.5% to 95.1% (OR 3.14, P<0.001), with better efficiency, confidence and inter-rater agreement. BRAVE-derived scores also independently predicted disease-free survival (adjusted HR 4.79, P<0.001) and overall survival (adjusted HR 8.14, P<0.001).

2605.08202 2026-05-12 cs.LG cs.AI

Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning

Qingjun Wang, Hongtu Zhou, Hang Yu, Junqiao Zhao, Yanping Zhao, Chen Ye, Ziqiao Wang, Guang Chen

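AI summary: A key challenge in offline reinforcement learning is overestimation of the value of out-of-distribution (OOD) actions. Existing methods usually mitigate this by penalizing unseen samples, but they struggle to identify OOD actions accurately and may suppress beneficial exploration. This paper proposes DOSER, a framework that uses diffusion models to capture the behavior policy and state distribution, takes single-step denoising reconstruction error as a reliable OOD indicator, and distinguishes beneficial from harmful OOD actions during policy optimization, selectively suppressing risky actions while encouraging exploration of high-potential ones. It outperforms existing methods on multiple benchmarks.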

Comments 10 pages, 5 figures. Accepted to ICLR 2026

2605.08201 2026-05-12 cs.LG cs.AI cs.CV

Weakly Supervised Concept Learning for Object-centric Visual Reasoning

Sparsh Tiwari, Bettina Finzel, Gesina Schwalbe

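AI summary: This paper studies concept learning for object-centric visual reasoning under weak supervision. It proposes a method combining a slot-based architecture with a variational autoencoder (VAE), using self-supervision and concept guidance to achieve interpretable grounding of perceptual symbols in latent space. The method can discover complex abstract rules using only 1% of the labels, shows superior robustness under domain shift, and even outperforms current state-of-the-art foundation models in few-shot settings.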

2605.08200 2026-05-12 cs.AI cs.CV cs.LG

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

Logan Mann, Ajit Saravanan, Ishan Dave, Shikhar Shiromani, Saadullah Ismail, Yi Xia, Emily Huang

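AI summary: Using a unified mechanistic analysis framework, this study investigates where reliability actually resides in vision-language models (VLMs), challenging the intuitive assumption that sharper attention maps mean a more trustworthy model. It finds that attention structure is almost unrelated to correctness, whereas the geometry of hidden states and sparse circuits in late layers reflect reliability more dependably. Reliability is also distributed very differently across model architectures, with important implications for model design and monitoring.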

Comments 15 pages, 4 figures, 10 tables. Accepted at the ICLR 2026 Workshop on Multimodal Reasoning. Code and probe-training pipelines: https://github.com/itsloganmann/VLM-Reliability-Probe

2605.08198 2026-05-12 cs.LG cs.AI cs.CY

FairHealth: An Open-Source Python Library for Trustworthy Healthcare AI in Low-Resource Settings

Farjana Yesmin

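AI summary: This paper introduces FairHealth, an open-source Python library that provides a unified, modular framework for trustworthy healthcare AI in low-resource settings, with a particular focus on low-income countries such as Bangladesh. Targeting four shortcomings of existing healthcare AI tools, the library offers modules for fairness auditing, privacy-preserving federated learning, low-bandwidth explainability tools, and support for Global South healthcare datasets. It can be installed directly with pip.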

Comments 8 pages, open-source Python library

2605.08197 2026-05-12 cs.LG cs.AI

ReplaySCM: A Benchmark for Executable Causal Mechanism Induction from Interventions

Serafim Batzoglou

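AI summary: ReplaySCM is a benchmark for evaluating the induction of executable causal mechanisms from limited interventional evidence, comprising 1,300 binary worlds generated from latent, fully observed Boolean structural causal models. Systems must output a mechanism graph in a specified Boolean DSL and are evaluated by replaying it on training and test intervention scenarios rather than by comparing formula strings. Frontier language models perform well in some information settings but degrade significantly when the causal order or roots are hidden, indicating that causal mechanism induction remains challenging.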

2605.08196 2026-05-12 cs.CV

Survey on Disaster Management Datasets for Remote Sensing Based Emergency Applications

Alain P. Ndigande, Josiah Wiggins, Sedat Ozer

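AI summary: This paper surveys disaster management datasets for remote-sensing-based emergency applications, focusing on publicly available image datasets that support computer vision and remote sensing tasks across the pre-, during-, and post-disaster phases. It aims to give researchers and practitioners a centralized reference to high-quality datasets, accelerating the development and deployment of remote-sensing-based disaster response solutions.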

Comments This work has been accepted for publication at IEEE Transactions on Geoscience and Remote Sensing

2605.08195 2026-05-12 cs.LG

ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

Mergen Nachin, Digant Desai, Sicheng Stephen Jia, Chen Lai, Mengwei Liu, Jacob Szwejbka, Raziel Alvarez, RJ Ascani, Dave Bort, Manuel Candales, Andrew Caples, Yanan Cao, Zhengxu Chen, Soumith Chintala, Gregory Comer, Tanvir Islam, Songhao Jia, Tarun Karuturi, Jack Khuu, Abhinay Kukkadapu, Tugsbayasgalan Manlaibaatar, Andrew Or, Kimish Patel, Siddartha Pothapragada, Lucy Qiu, Supriya Rao, Orion Reblitz-Richardson, Max Ren, Scott Roy, Anthony Shoumikhin, Scott Wolchok, Guang Yang, Angela Yi, Martin Yuan, Hansong Zhang, Jack Zhang, Jerry Zhang, Shunting Zhang, C. Cagatay Bilgin

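AI summary: ExecuTorch is a unified PyTorch deployment framework that addresses hardware fragmentation when running AI models on edge devices. It supports heterogeneous compute environments ranging from microcontrollers to complex SoCs, and provides features such as quantization optimization and pluggable execution backends while preserving PyTorch semantics. ExecuTorch lets researchers validate model deployment within the PyTorch ecosystem, narrowing the gap between research and production.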

2605.08194 2026-05-12 cs.SD eess.AS eess.SP

ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels

Mark Shipton, Valentino Denona, Đula Nađ, Roee Diamant

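AI summary: This paper introduces ShipEcho, an interactive web geographic information system (GIS) for real-time global mapping of vessel underwater radiated noise (V-URN). The tool uses community-sourced Automatic Identification System (AIS) data together with established vessel acoustic models and bathymetry data for propagation modeling, producing noise maps that include sound pressure level and sound exposure level in different frequency bands. The study demonstrates ShipEcho's potential to support environmental assessment, decision-making, and policy development, and validates the maps against real acoustic recordings.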

Comments 34 pages

2605.08191 2026-05-12 cs.CV cs.AI

A Robust Out-of-Distribution Detection Framework via Synergistic Smoothing

Maria Stoica, Abdelrahman Hekal, Alessio Lomuscio

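AI summary: This paper proposes ROSS, a robust out-of-distribution detection framework intended to make machine learning systems more reliable under adversarial attack. Its core idea is to median-smooth a baseline detection score, using generated noise samples to quantify the local instability of the original score and thereby separate in-distribution from out-of-distribution samples. The method performs well on multiple datasets, achieves symmetric robustness against both types of adversarial attack, and significantly outperforms existing methods.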

Comments Accepted to CVPR Findings 2026
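Median smoothing of a detection score, as mentioned in the summary for this paper, can be sketched generically: evaluate the base score on several noisy copies of the input and take the median. Everything below is an illustrative assumption (Gaussian noise, the `sigma` and `n_samples` values, the function names); it is not the ROSS implementation, and it omits the instability measure the summary describes.

```python
import random
from statistics import median


def median_smoothed_score(score_fn, x, sigma=0.1, n_samples=101, seed=0):
    """Median of a base detection score over noisy copies of the input.

    score_fn: hypothetical base OOD score mapping a feature vector to a float.
    x: input feature vector (list of floats).
    sigma, n_samples: assumed noise scale and sample count.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    scores = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        scores.append(score_fn(noisy))
    return median(scores)


def toy_score(features):
    """Toy base score: squared distance from the origin."""
    return sum(f * f for f in features)


print(median_smoothed_score(toy_score, [1.0, 2.0, 3.0]))
```

Because the median ignores extreme values, a few noisy copies that produce outlying scores do not move the smoothed score.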

2605.08190 2026-05-12 cs.LG cs.SY eess.SY

Synergistic Simplex: Cooperative Runtime Assurance for Safety-Critical Autonomous Systems

Ayoosh Bansal, Mikael Yeghiazaryan, Artyom Khachatryan, Tianyi Zhu, Hunmin Kim, Naira Hovakimyan, Lui Sha

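AI summary: As autonomous systems increasingly rely on machine learning components for safety-critical tasks, ensuring their reliability has become an important problem. This paper proposes Synergistic Simplex (SS), a cooperative runtime assurance architecture that allows bidirectional interaction between the safety monitor and the machine learning component, improving system performance while retaining formal safety guarantees. Its key innovation is removing a restriction of conventional runtime assurance systems so that the safety monitor can use machine learning outputs, with safety established by formal analysis. Experiments validate the approach on obstacle detection for autonomous driving.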

2605.08188 2026-05-12 cs.CV cs.AI

Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers

Mathis Immertreu, Fitim Abdullahu, Thomas Kinfe, Helmut Grabner, Patrick Krauss, Achim Schilling

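AI summary: This study investigates how visual interestingness is encoded in multimodal transformers, using human interest ratings from the Flickr platform to analyze the representational structure of the vision and language components of Qwen3-VL-8B. Information related to visual interest can be linearly decoded from final-layer embeddings and emerges progressively across the intermediate vision transformer layers and language model layers, indicating that the model encodes visual interestingness in a structured way without supervision. Concept vectors extracted by different methods converge in higher layers, offering a new perspective on the relationship between human attention and interest mechanisms in AI systems.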