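arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.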

2605.09857 2026-05-12 stat.ML cs.LG

Unified Approach for Weakly Supervised Multicalibration

Futoshi Futami, Takashi Ishida

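AI summary: This paper studies multicalibration under weak supervision, that is, how to keep a model's predicted scores consistent with the true label probabilities across subgroups and score-dependent tests when clean labels are unavailable. To address this, the authors propose a unified framework that combines contamination-matrix risk rewriting with witness-based calibration constraints, enabling multicalibration error estimation and post-processing correction in weakly supervised settings, and they present a general weak-label multicalibration boosting algorithm (WLMC). Experiments show that the method is effective across a range of weak-supervision scenarios and offer new empirical insights into uncertainty estimation.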

2605.09834 2026-05-12 stat.ML cs.LG

Supercharging Bayesian Inference with Reliable AI-Informed Priors

Jongwoo Choi, Sean O'Hagan

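AI summary: This paper studies how to use the beliefs of modern predictive systems as prior information for statistical inference, improving inference when data are limited. To keep prediction-model errors from propagating into the posterior, the authors propose a framework that corrects the law of the AI-generated data in order to build more reliable AI priors. The method significantly reduces bias, improves credible-interval coverage, and is validated on a real skin-disease classification task.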

2605.09833 2026-05-12 cs.IT cs.LG math.IT

Cross-Domain Lossy Compression via Constrained Minimum Entropy Coupling

Nam Nguyen, Hassan Tavakoli, An Vuong, Thinh Nguyen, Bella Bose

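AI summary: This paper studies cross-domain lossy compression under rate and classification constraints and proposes an optimization approach based on minimum entropy coupling (MEC), which aims to establish a stronger information coupling between the source and reconstruction domains rather than minimizing per-sample distortion. Introducing a deterministic form of the coupling simplifies the intermediate representation, and a closed-form solution is derived for the Bernoulli source case. Experiments show that increasing the compression rate improves classification accuracy and yields more informative reconstructions.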

2605.09830 2026-05-12 cs.IR cs.CV

Loom: Hybrid Retrieval-Scoring Outfit Recommendation with Semantic Material Compatibility and Occasion-Aware Embedding Priors

Anushree Berlia

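AI summary: Loom is an outfit recommendation system that combines neural embedding retrieval with structured domain scoring to generate complete, coordinated outfits from a fashion catalog. It performs constrained retrieval over FashionCLIP embeddings and applies a multi-objective scoring function that weighs embedding similarity, color harmony, formality consistency, occasion suitability, and other factors. The work introduces two techniques, semantic material weights and occasion-aware embedding priors, which improve material-compatibility judgments and occasion suitability respectively. Experiments show that the system significantly outperforms a random baseline in outfit quality and violation rate and can quickly generate diverse outfits on commodity hardware.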

Comments Code: https://github.com/anushreeberlia/loom

2605.09822 2026-05-12 cs.CR cs.AI

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston, Ron F. Del Rosario, Timothy Lynar

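AI summary: This paper presents a new attack, "Oracle Poisoning", which tampers with the structured knowledge-graph data that an AI agent queries at runtime so that the agent reaches wrong conclusions through otherwise correct reasoning. The study demonstrates six attack scenarios on a real code knowledge graph with 42 million nodes, the first empirical demonstration of a knowledge-graph poisoning attack against a production-grade agentic system. Experiments show that, at moderate attack sophistication, every tested model trusted the poisoned data 100% of the time. They also reveal that attack effectiveness depends closely on variables such as delivery method and prompt framing, and they evaluate the effectiveness of five defenses.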

Comments 26 pages, 3 figures, 16 tables

2605.09810 2026-05-12 q-bio.BM cs.LG

TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

Hanqun Cao, Aastha Pal, Sophia Tang, Yinuo Zhang, Jingjie Zhang, Pheng Ann Heng, Pranam Chatterjee

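AI summary: This work proposes TD3B, a sequence-based generative framework for designing allosteric binders with specified agonist or antagonist behavior. TD3B uses a transition-directed control objective and combines target-aware directional guidance, soft binding-affinity gating, and amortized fine-tuning of a pretrained discrete diffusion model to achieve directed binder generation that is decoupled from binding affinity. The method effectively distinguishes agonist from antagonist behavior, addressing a gap in conventional structure-based optimization methods for directional functional control.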

Comments Published as a Spotlight at ICML 2026 (Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea)

2605.09803 2026-05-12 cs.HC cs.AI

Insight: Enhancing Mobile Accessibility for Blind and Visually Impaired Users with LLMs

Joshua Owusu Ansah, Anuj Kapoor, Ayush Khanna, Manvika Vinod, Precious Njeck, Shuai Gao

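AI summary: This paper examines the limitations of current mobile assistive technologies such as TalkBack for blind and visually impaired users and proposes Insight, a new assistive service based on large language models (LLMs) that improves the user experience through natural-language interaction and real-time screen summaries. Experiments show that Insight outperforms the conventional approach in reducing users' cognitive load and task time and was preferred by users, but they also expose a need for interruption management. The study demonstrates the potential of LLMs for improving mobile accessibility and points toward hybrid solutions that combine gesture and conversational modes.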

Comments 10 pages, 5 figures

2605.09790 2026-05-12 cs.DC cs.AI cs.LG

Multi-Tier Labeling and Physics-Informed Learning for Orbital Anomaly Detection at Scale

Yong Fu

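AI summary: This paper studies how to efficiently detect orbital anomaly events, such as maneuvers, atmospheric decay, and attitude perturbations, in large volumes of low-Earth-orbit satellite data. To address label scarcity, the author proposes a multi-tier weakly supervised labeling framework that combines physics rules, filtering algorithms, and calibration methods to label data automatically at scale. Using 60 years of two-line element (TLE) data, the method produces a large number of labeled sequences and trains a high-recall Transformer model, which provides a basis for downstream event screening and sets a direction for building orbital world models based on neural differential equations.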

2605.09781 2026-05-12 cs.NE cs.AI cs.CL cs.LG

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution

Dongxin Guo, Jikun Wu, Siu Ming Yiu

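AI summary: To address the limited diversity of large language model (LLM) outputs, this work proposes QD-LLM, a parameter-efficient neuroevolution framework that evolves prompt embeddings to steer a frozen LLM toward diverse outputs. Without fine-tuning the model, it evolves prompt embeddings with gradient-free optimization and characterizes behavior using both semantic and explicit features, significantly improving the diversity and quality of generated content. Experiments show that QD-LLM achieves higher coverage and quality-diversity scores on several benchmarks and improves the quality of test generation and fine-tuning data.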

Comments 11 pages, 3 figures, 7 tables, 1 algorithm, 1 theorem. Accepted to GECCO 2026

2605.09777 2026-05-12 cs.NE cs.AI cs.CL cs.LG

EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent

Dongxin Guo, Jikun Wu, Siu Ming Yiu

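AI summary: This paper proposes EvoPref, a new preference-optimization method based on multi-objective evolutionary algorithms, to increase the diversity of large language model (LLM) alignments. Using the non-dominated sorting genetic algorithm (NSGA-II) and an archive mechanism, it optimizes low-rank adapters (LoRA) over multiple objectives such as helpfulness, harmlessness, and honesty, avoiding the preference collapse seen in gradient-descent methods. Experiments show that EvoPref significantly improves preference coverage and lowers the collapse rate on standard benchmarks while maintaining good alignment quality, supporting the effectiveness of evolutionary optimization for diverse LLM alignment.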
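
The non-dominated sorting that NSGA-II relies on can be illustrated with a minimal sketch. This is not EvoPref's implementation: the score tuples, the maximization convention, and the function names `dominates` and `pareto_front` are assumptions made for illustration, and only the first (non-dominated) front is computed.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the indices of score vectors that no other vector dominates.

    This corresponds to the first front in non-dominated sorting.
    """
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (helpfulness, harmlessness, honesty) scores for four adapters.
candidates = [(0.9, 0.2, 0.5), (0.5, 0.5, 0.5), (0.4, 0.4, 0.4), (0.2, 0.9, 0.5)]
front = pareto_front(candidates)
```

Keeping every member of the front, rather than a single weighted-sum winner, is what lets a multi-objective method retain several distinct trade-offs at once.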

Comments 10 pages, 2 figures, 6 tables, 1 algorithm. Accepted to GECCO 2026

2605.09772 2026-05-12 eess.SY cs.RO cs.SY math.OC

Safe Exploration for Nonlinear Processes Using Online Gaussian Process Learning

Stefano Tonini, Soroush Rastegarpour, Hamid Reza Feyzmahdavian, Nicola Bastianello, Karl Henrik Johansson

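AI summary: This paper proposes a safe data-driven control framework for nonlinear systems that requires only a stabilizable linear approximation of the system to maintain stability and constraint satisfaction during online learning. A Gaussian process residual learned in real time captures the unmodeled nonlinear dynamics, and a probabilistic control-invariant set is constructed from Lyapunov theory to ensure safety. The control input is computed by a convex quadratic program that maximizes information gain while satisfying the safety constraints, and experimental results confirm safe and effective exploration under model uncertainty.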

Comments Accepted in 23rd IFAC World Congress

2605.09764 2026-05-12 cs.NE cs.AI

LEVI: Stronger Search Architectures Can Substitute for Larger LLMs in Evolutionary Search

Temoor Tanveer

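AI summary: This paper proposes LEVI, an evolutionary search framework that aims to use a stronger search architecture to substitute for, or surpass, the role of larger language models in evolutionary search. LEVI improves the efficiency and diversity of evolutionary search through an improved solution database, intelligent mutation routing, and rank-preserving proxy benchmarks. Experiments show that LEVI outperforms existing methods on several systems-research benchmarks with a smaller compute budget, demonstrating clear advantages in resource efficiency and effectiveness.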

2605.09755 2026-05-12 math.NA cs.DS cs.LG cs.NA stat.ML

Accelerating Power Method with Fast Sketching for Stronger Low-Rank Approximation

Shabarish Chenakkod, Michał Dereziński

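AI summary: This paper studies how to accelerate the power method to obtain stronger low-rank approximations. To address the high computational cost of the classical power method at large target ranks, it proposes an acceleration framework based on fast random projections (sketching). The method shows efficient and numerically stable performance on tasks such as singular value decomposition, low-rank factorization, and Nyström approximation. Its core innovation is a theory of regularized spectral approximation, which provides a more flexible analysis tool for generalizing the power method.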
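
For readers unfamiliar with the baseline being accelerated, the following is a minimal sketch of textbook randomized subspace (block power) iteration in NumPy. It is not the paper's algorithm: a dense Gaussian test matrix stands in for whatever fast sketching operator the authors use, and the function name and the `q` and `oversample` defaults are assumptions chosen for illustration.

```python
import numpy as np


def sketched_power_iteration(A, k, q=2, oversample=5, seed=0):
    """Rank-k approximation of A via randomized subspace iteration.

    Returns (U, s, Vt) such that U @ diag(s) @ Vt approximates A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Gaussian sketch: an assumed stand-in for a fast sketching operator.
    omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    for _ in range(q):
        # One power step, re-orthonormalizing each half-step for stability.
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Project A onto the captured subspace and take a small dense SVD.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]


# Usage on an exactly rank-3 matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
U, s, Vt = sketched_power_iteration(A, k=3)
```

Each pass through the loop multiplies by `A` and `A.T` once, so the cost grows with both `q` and the sketch width `k + oversample`, which is the expense that motivates faster sketches at large target ranks.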

2605.09754 2026-05-12 cs.IT cs.DC cs.LG math.IT

Learning from Acceptance: Cumulative Regret in the Game of Coding

Hanzaleh Akbari Nodehi, Parsa Moradi, Mohammad Ali Maddah-Ali

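AI summary: This paper studies the game between a data collector and a strategic adversary in open decentralized systems, where the adversary may submit data consistent enough to be accepted while degrading the quality of the system's estimate. Unlike prior work that assumes the data collector fully knows the adversary's strategy, this paper considers the incomplete-information setting, proposes an algorithm that learns the adversary's strategy through repeated interaction, and proves that its cumulative regret grows sublinearly. Experiments confirm its effectiveness.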

2605.09735 2026-05-12 cs.AR cs.AI cs.DC cs.OS

KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

Zhiqing Zhong, Zhijing Ye, Jian Zhang, Weijian Zheng, Bolun Sun, Xiaodong Yu

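AI summary: Static-graph large language model (LLM) decoders offer predictable launch, fixed tensor shapes, and low submission overhead, but during online decoding the key-value cache (KV-cache) behaves highly irregularly, causing over-reserved memory and bursty latency. This paper proposes KV-RM, a runtime design that regularizes KV-cache movement beneath a static-graph decoder: it decouples logical history from physical storage, tracks active state with a block page table, and commits each decode step through a single descriptor, which increases runtime flexibility. Experiments show that KV-RM improves on static-graph baselines in mixed-length decoding throughput, tail latency, and memory usage, mitigating latency spikes in production settings.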

Comments 14 pages, 7 figures, 7 tables

2605.09734 2026-05-12 cs.SE cs.AI cs.MA

Trajectory Supervision for Continual Tool-Use Learning in LLMs

Vishnu Vardhan Reddy, Sagnik Chatterjee, Soumik Bhatta

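AI summary: This paper studies the effect of trajectory supervision on large language model performance in continual tool-use learning. Fine-tuning Llama 3.1 8B on the API-Bank dataset, the experiments compare two conditions: one that strips the history of API-call trajectories and one that trains on the full trajectories. The trajectory-preserving condition improves final API-call accuracy by 17.7 percentage points, indicating that trajectory information is valuable for learning the tool-use process.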

2605.09721 2026-05-12 cs.CR cs.AI

Security Risks in Tool-Enabled AI Agents: A Systematic Analysis of Privileged Execution Environments

Hardik Goel

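AI summary: This paper systematically analyzes the security risks of tool-enabled AI agents in cloud environments, pointing out that these agents can introduce a range of security hazards when they act through privileged tools. The study proposes a risk taxonomy, illustrates the risks with three representative scenarios, and discusses mitigation strategies and their trade-offs. The results indicate that many risks stem not from novel vulnerabilities but from over-privileged tools, mismatches between capability and intent, and privilege leakage in the execution environment. Based on this, the paper offers design guidelines for safer cloud deployment.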

Comments Extended author preprint. A shortened version has been accepted as a short paper at IEEE COMPSAC 2026. 7 pages, 3 figures/tables

2605.09718 2026-05-12 stat.ML cs.LG math.PR math.ST stat.TH

Learning stochastic multiscale models through normalizing flows

Anan Saha, Arnab Ganguly

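AI summary: This paper studies how to learn effective dynamical models of multiscale stochastic systems from a single observed trajectory. The authors propose a trajectory-based framework that models the dynamics with coupled multiscale stochastic differential equations and reduces the model through stochastic averaging. Because the reduced model depends on the hard-to-compute invariant distribution of the fast variables, they parameterize that distribution with a normalizing flow and learn the model parameters by end-to-end optimization. They also use Bayesian variational inference for uncertainty quantification, capturing epistemic uncertainty in multiscale systems.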

Comments 17 pages, 4 figures

Abstract

Many systems in physics, engineering, and biology exhibit multiscale stochastic dynamics, where low-dimensional slow variables evolve under the influence of high-dimensional fast processes. In practice, observations are often limited to a single trajectory of the slow component, while the fast dynamics remain unobserved, making statistical learning challenging. Approaches based on partial differential equations (PDE), such as Fokker-Planck formulations, aim to characterize the evolution of probability densities, typically requiring dense space-time data or grid-based solvers. In contrast, we adopt a trajectory-based perspective and develop a data-driven framework for learning effective stochastic dynamics from a single observed path. We model the dynamics by coupled multiscale stochastic differential equations (SDEs) and first obtain a principled model reduction through stochastic averaging. Unlike generic model reduction techniques such as PCA, this respects the dynamical structure of the original system and explicitly incorporates the interaction between slow and fast scales. A central challenge, however, is that the reduced model depends on the invariant distribution of the fast process, which is a solution to an intractable and often unknown PDE. We introduce a novel learning framework that parameterizes the invariant distribution using normalizing flows, enabling expressive density modeling in the latent fast-variable space. The flow is trained end-to-end by optimizing a penalized likelihood objective induced by the reduced stochastic dynamics. Furthermore, we develop a Bayesian variational inference procedure for uncertainty quantification, employing a second normalizing flow to approximate the posterior distribution over model parameters. This yields a scalable approach to capturing epistemic uncertainty in multiscale systems.

2605.09702 2026-05-12 stat.ME cs.CL

Calibrate, Don't Curate: Label-Efficient Estimation from Noisy LLM Judges

Yanran Li

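AI summary: This paper studies how to efficiently estimate large language model performance in a multi-judge evaluation setup with noisy labels. Conventional practice tends to improve evaluation by selecting only high-accuracy judges, but the author finds that when the target is a calibrated probabilistic estimate, keeping all judges performs better. The study shows that even judges with below-average accuracy can help calibration as long as their bias is learnable and their information is not redundant, so when labeled calibration data are available, weak judges should not be dropped on accuracy alone.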

2605.09699 2026-05-12 eess.IV cs.CV cs.GR cs.LG

A Real-Calibrated Synthetic-First Data Engine

Yukang Shen

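AI summary: Modern computer vision systems are often limited in data-scarce domains, and although synthetic data generation is promising, applying it directly tends to give unstable results because of data-quality problems and inadequate feedback mechanisms. This paper proposes a "real-calibrated, synthetic-first" data engine that uses a unified pipeline of controllable diffusion models and multi-stage filtering to systematically improve the practicality and reliability of synthetic data augmentation. Experiments show that on tasks such as human pose estimation, combining synthetic and real data improves performance, underscoring the value of data-centric strategies in low-data regimes.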

Comments 7 pages, 6 figures

2605.09684 2026-05-12 cs.CR cs.AI

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

Monika Jotautaitė, Maria Angelica Martinez, Ollie Matthews, Tyler Tracy

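AI summary: This paper presents MonitoringBench, a semi-automated red-teaming method for evaluating the security of monitoring systems for coding agents, and argues that current monitoring evaluations may underestimate attack risk and overestimate monitor effectiveness. By building an attack taxonomy, decomposing attack generation, and introducing a semi-automated pipeline, the work addresses mode collapse in attack generation, inconsistency between ideation and execution, and the high cost of manual testing. Applied to the BashArena environment, the method yields a benchmark dataset of 2,644 attack trajectories. Experiments show that the detection rate of current state-of-the-art monitors drops markedly against optimized attacks, revealing weaknesses in recognizing suspicious behavior and in score calibration and suggesting practical directions for improvement.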

2605.09654 2026-05-12 stat.ML cs.LG stat.CO

Metropolis-Adjusted Diffusion Models

Kevin H. Lam, Tyler Farghly, Christopher Williams, Jun Yang, Yee Whye Teh, Arnaud Doucet

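AI summary: This paper studies sampling bias in score-based diffusion models and proposes a correction based on Metropolis-Hastings (MH) or Barker accept-reject steps to reduce the bias introduced by time discretization and score-function approximation. The authors introduce an exact correction based on the two-coin Bernoulli factory algorithm and an efficient approximation based on Simpson's rule, which significantly improves sample quality. Experiments show better sample generation on both synthetic data and image datasets, with notable gains in FID.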
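
The two accept-reject rules named in the summary differ only in how they map the target ratio r to an acceptance probability: MH uses min(1, r) and Barker uses r / (1 + r). The sketch below shows both inside a plain random-walk sampler. It is a generic illustration, not the paper's diffusion corrector: the Gaussian random-walk proposal, the function names, and the standard-normal target in the usage lines are all assumptions.

```python
import math
import random


def mh_accept_prob(log_ratio):
    """Metropolis-Hastings acceptance probability min(1, r), r = exp(log_ratio)."""
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def barker_accept_prob(log_ratio):
    """Barker acceptance probability r / (1 + r), evaluated as a stable logistic."""
    if log_ratio >= 0:
        return 1.0 / (1.0 + math.exp(-log_ratio))
    e = math.exp(log_ratio)
    return e / (1.0 + e)


def random_walk_sampler(log_target, x0, n_steps, step=1.0,
                        accept_prob=mh_accept_prob, seed=0):
    """Random-walk sampler with a pluggable accept-reject rule."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        # Symmetric proposal, so the ratio reduces to target(y) / target(x).
        y = x + rng.gauss(0.0, step)
        if rng.random() < accept_prob(log_target(y) - log_target(x)):
            x = y
        samples.append(x)
    return samples


# Usage: sample a standard normal with each rule.
log_normal = lambda x: -0.5 * x * x
mh_samples = random_walk_sampler(log_normal, 0.0, 20000)
barker_samples = random_walk_sampler(log_normal, 0.0, 20000,
                                     accept_prob=barker_accept_prob)
```

Barker's rule never accepts with a higher probability than MH for the same ratio, but it is a smooth function of r, which is the property Bernoulli-factory constructions typically exploit when r can only be simulated rather than computed.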

2605.09652 2026-05-12 cs.NE cs.AI

RDEx-CASK: Cauchy Mutation, Archive, and Stagnation Kick for RDEx-CSOP

Dikshant, Dikshit Chauhan, Chen Hao, Anupam Trivedi, Harikumar Kandath, Senthilnath Jayavelu

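AI summary: This paper proposes RDEx-CASK, an improved variant of the RDEx-CSOP algorithm that targets stagnation and late-stage variance during optimization. It strengthens exploration and convergence efficiency by sampling from a truncated Cauchy distribution, maintaining a small archive of feasible solutions, and adding a per-individual stagnation trigger. Experimental results show that RDEx-CASK is competitive in feasibility-aware optimization quality and improves time-to-target on most problems.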

Comments 5 pages, 2 tables, 1 algorithm. Technical report for the CEC 2026 CSOP competition track

2605.09623 2026-05-12 cs.DC cs.AI cs.LG cs.NI cs.PF

Adaptive DNN Partitioning and Offloading in Heterogeneous Edge-Cloud Continuum

Akuen Akoi Deng, Eimantas Butkus, Alfreds Lapkovskis, Praveen Kumar Donta

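AI summary: Artificial intelligence is increasingly deployed on resource-constrained IoT devices, but existing deep neural network (DNN) partitioning and offloading methods are mostly static and adapt poorly to runtime dynamics. This paper proposes a dynamic partitioning framework that adaptively adjusts the placement of network layers across a heterogeneous edge-cloud environment according to runtime conditions, and validates it on a real hardware testbed. Compared with static partitioning, the framework reduces energy consumption by 27.09% to 35.82% and end-to-end latency by 6.34% to 22.92%, supporting the advantage of the adaptive approach.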

2605.09610 2026-05-12 cs.MA cs.AI cs.CE cs.LG cs.PL cs.SE

SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications

Abhinav Goel, Agostino Capponi, Alfio Gliozzo, Chaitya Shah

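AI summary: This paper introduces SmartEval, a benchmark for systematically evaluating the quality of Solidity smart contracts that large language models (LLMs) generate from natural-language specifications. SmartEval provides 9,000 generated contracts with their corresponding expert implementations, includes evaluation criteria along five dimensions, and is validated for reliability through several empirical studies. The study identifies typical failure modes of generated contracts and quantifies their advantages relative to real-world implementations, providing a reproducible foundation for empirical research on the quality of LLM smart-contract synthesis.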

2605.09606 2026-05-12 cs.CR cs.CV

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia, Zeren Luo, Jingyi Zheng, Zifan Peng, Ke Li, Yifan Liao, Zhen Sun, Jiaheng Wei, Yang Liu, Zhuo Ma, Xinlei He

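AI summary: This paper studies the risk that image-to-3D models generate harmful geometry and how to mitigate it, showing that current models, given malicious inputs, may reconstruct 3D structures that are physical hazards, risky components, or deceptive replicas. A systematic evaluation of several open-source and commercial models finds that existing models are quite capable of generating harmful geometry, while existing safeguards have limited effect. The study further proposes a multi-layer defense that effectively lowers the share of harmful outputs but still incurs a high false-positive rate, highlighting the shortcomings of current systems in geometry-aware safety.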

Abstract

Recent advances in image-to-3D models have significantly improved the fidelity and accessibility of 3D content creation. Such a powerful reconstruction capability that enables creative design can also be misused by the adversary to generate harmful geometries, which can be further fabricated via 3D printers and pose real-world risks. However, such risks are largely underexplored: it remains unclear how well current image-to-3D models can produce these harmful geometries, and whether existing safeguards can reliably prevent such generation. To fill this gap, we conduct a systematic measurement study of harmful geometry generation and mitigation. We first describe this risk through three kinds of unsafe categories: direct-use physical hazards, risky templates or components, and deceptive replicas. Each category is instantiated with representative objects. We evaluate both open-source and commercial image-to-3D models under original, degraded, viewpoint-shifted, and semantically camouflaged inputs. We consider different evaluation metrics, including geometric validity, multi-view VLM-based semantic scoring, targeted human validation, and controlled physical fabrication. The results reveal a concerning reality that current image-to-3D models can effectively reconstruct the harmful geometries, while fewer than 0.3% of such geometries trigger commercial moderation flags. As a first step toward mitigation, we evaluate three representative safeguard families, including input moderation, model-level benign alignment, and output-level filtering. We find that existing safeguards have distinct weaknesses. We further develop a stacked defense that can reduce harmful retention to <1%, but still at 11% overall false-positive cost. Taken together, our findings demonstrate the risk in current systems and encourage better geometry-aware safeguards for moderation.

2605.09588 2026-05-12 cs.GT cs.AI cs.LG

Efficient Ensemble Selection from Binary and Pairwise Feedback

Tzeh Yuan Neoh, Nicholas Teh, Je Qin Chooi, Paul W. Goldberg, Milind Tambe

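AI summary: This paper studies the problem of selecting a high-performing ensemble of experts from multiple AI systems, modeling it as a distributed multiwinner voting problem. For binary feedback and pairwise feedback, it proposes corresponding objectives and algorithms. Under binary feedback it designs a failure-conditioned greedy algorithm that reduces the number of queries while preserving performance. Under pairwise feedback it introduces a weighted ordinal-coverage relaxation that is submodular and provides a θ-type guarantee. Experiments confirm that the proposed methods reduce query counts and improve ensemble performance.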
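
As a point of reference for the binary-feedback setting, here is a minimal sketch of plain greedy coverage selection, the standard heuristic for submodular coverage objectives. It is not the paper's query-efficient algorithm: it assumes all binary outcomes are already known, and the `success` mapping, the expert names, and the function name `greedy_ensemble` are hypothetical.

```python
def greedy_ensemble(success, k):
    """Pick up to k experts so that as many queries as possible are solved
    by at least one selected expert.

    success maps an expert id to the set of query ids it answers correctly.
    """
    chosen, covered = [], set()
    for _ in range(min(k, len(success))):
        # Select the expert that solves the most not-yet-covered queries.
        best = max(
            (e for e in success if e not in chosen),
            key=lambda e: len(success[e] - covered),
        )
        if not success[best] - covered:
            break  # no remaining expert adds coverage
        chosen.append(best)
        covered |= success[best]
    return chosen, covered


# Hypothetical binary feedback for four experts on five queries.
feedback = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5}}
team, solved = greedy_ensemble(feedback, k=2)
```

Each round only counts queries on which the current team still fails, so an expert that duplicates existing coverage (here `c`) is never chosen even if it is accurate on its own.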

2605.09575 2026-05-12 eess.IV cs.CV

Annotation-free deep learning for detection and segmentation of fetal germinal matrix-intraventricular hemorrhage in brain MRI

Mingxuan Liu, Yingqi Hao, Yi Liao, Juncheng Zhu, Haoxiang Li, Hongjia Yang, Yifei Chen, Yijin Li, Kasidit Anmahapong, Zihan Li, Jialan Zheng, Min Kang, Yan Song, Hua Lai, Xiaoling Zhou, Nan Sun, Rong Hu, Gang Ning, Haibo Qu, Qiyuan Tian

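AI summary: This study proposes FreeHemoSeg, an annotation-free deep learning framework for automatically detecting and segmenting germinal matrix-intraventricular hemorrhage (GMH-IVH) in fetal brain MRI. The method trains on pseudo-lesion images generated with the help of medical priors, which sidesteps the difficulty of obtaining annotated data. Experimental results show that FreeHemoSeg achieves superior detection and segmentation performance in both internal and external validation and significantly improves radiologists' diagnostic efficiency and accuracy.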

Abstract

Background: Prenatal germinal matrix-intraventricular hemorrhage (GMH-IVH) is a leading cause of infant mortality and neurodevelopmental impairment. Manual diagnosis and lesion segmentation are labor-intensive and error-prone. Deep learning models offer potential for automation but typically require large annotated datasets, which are challenging to obtain. Purpose: To develop and validate an annotation-free deep learning framework for automated detection and segmentation of GMH-IVH on brain MRI. Materials and Methods: This retrospective study analyzed 2D T2-weighted MRI data from pregnant women collected from October 2015 to October 2023 at one hospital (internal validation) and two hospitals (external validation). Eligible participants included healthy fetuses and those with GMH-IVH. FreeHemoSeg was developed and trained using pseudo GMH-IVH images synthesized from normal fetal data guided by medical priors. Primary outcomes included diagnostic accuracy (area under the ROC curve [AUROC], sensitivity, specificity) and segmentation accuracy (Dice similarity coefficient [DSC]). A reader study evaluated clinical utility. Results: A total of 1674 stacks from 558 pregnant women were analyzed. FreeHemoSeg achieved the highest performance in both internal (sensitivity: 0.914, 95% CI 0.869-0.945; specificity: 0.966, 95% CI 0.946-0.978; DSC: 0.559, 95% CI 0.546-0.571) and external validation (sensitivity: 0.824, 95% CI 0.739-0.885; specificity: 0.943, 95% CI 0.913-0.964; DSC: 0.512, 95% CI 0.497-0.526), outperforming supervised and unsupervised methods. FreeHemoSeg assistance improved radiologists' sensitivity (from 0.882 to 0.941-1.000) and diagnostic confidence while reducing interpretation time by 16.0-52.7%. Conclusion: FreeHemoSeg accurately detects and localizes fetal brain hemorrhages without annotated training data, enabling earlier diagnosis and supporting timely clinical management.
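
The outcome measures named in the abstract follow standard definitions, which a short sketch makes concrete. This is an illustration only, assuming flattened binary masks and per-case binary labels; the function names are hypothetical, the toy inputs are not study data, and confidence intervals and AUROC are not computed.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient, 2 * |P and T| / (|P| + |T|), for flat binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total


def sensitivity_specificity(pred_labels, true_labels):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(pred_labels, true_labels))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy inputs for illustration only.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
sens, spec = sensitivity_specificity([1, 0, 1, 0, 0], [1, 1, 0, 0, 0])
```

Dice penalizes every mismatched voxel relative to lesion size, so small lesions can yield moderate DSC values even when case-level sensitivity and specificity are high.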

2605.09552 2026-05-12 math.OC cs.LG stat.ML

Phases of Muon: When Muon Eclipses SignSGD

Elliot Paquette, Noah Marshall, Lucas Benigni, Guangyuan Wang, Atish Agarwala, Courtney Paquette

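AI summary: This paper studies the behavior of Muon and related spectral optimization methods on high-dimensional matrix least-squares problems, and clarifies their relationship to stochastic optimizers such as SignSVD and SignSGD. By deriving a deterministic dynamical model, the analysis shows that at large batch sizes Muon acts like square-root preconditioning of the data covariance spectrum, whereas at small batch sizes it behaves like SGD and converges more slowly. The study also finds that SignSVD and SignSGD differ markedly in performance on anisotropic data, and it identifies three distinct performance phases in a power-law covariance model.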
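
The contrast between the two sign-type updates can be made concrete with a minimal NumPy sketch. It assumes the usual reading of the names: SignSGD takes the entrywise sign of the gradient matrix, while SignSVD takes the matrix sign, setting every singular value to 1. Muon is commonly described as approximating this orthogonalization with a Newton-Schulz iteration applied to momentum; an exact SVD is used here for clarity, and the function names are assumptions rather than the paper's code.

```python
import numpy as np


def signsgd_direction(G):
    """Entrywise sign of the gradient matrix."""
    return np.sign(G)


def signsvd_direction(G):
    """Matrix sign via SVD: G = U S V^T is mapped to U V^T."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


# Usage on a random 4 x 3 gradient.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
D_entry = signsgd_direction(G)
D_spec = signsvd_direction(G)
```

The entrywise sign depends on the coordinate basis, whereas the spectral sign depends only on the singular vectors of `G`, which is one way to see why the two can behave differently on anisotropic data.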

2605.09534 2026-05-12 cs.CR cs.AI

Governing AI-Assisted Security Operations: A Design Science Framework for Operational Decision Support

Elyson A. De La Cruz, Rishikesh Sahay, Md Rasel Al Mamun

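AI summary: This paper studies how to bring technologies such as generative AI, retrieval-augmented generation, and coding agents into high-risk operational environments to support security-operations decisions while preserving accountability, privacy, cost control, and auditability. It proposes a design-science framework and builds a governed AI query-agent tool using mechanisms such as separating AI planning from operational execution, structured retrieval, approval templates, policy validation, and auditable agent traces. The core contribution is a governance framework for guiding AI-assisted operational decision support in high-risk digital infrastructure.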

Comments 28 pages, 1 listing, 1 figure, 20 Tables