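arXivDaily: a daily digest of academic papers, synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.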

2603.24946 2026-05-11 cs.SE cs.LG

MobileDev-Bench: A Benchmark for Issue Resolution in Mobile Application Development

Moshood A. Fakorede, Krishna Upadhyay, A. B. Siddique, Umar Farooq

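AI summary: This paper introduces MobileDev-Bench, a benchmark for evaluating how well large language models resolve issues in mobile application development. It comprises 407 issue-resolution tasks drawn from 19 real-world mobile apps across Android native (Java/Kotlin), React Native (TypeScript), and Flutter (Dart). Each task pairs a developer-reported issue with an executable test patch, enabling automated validation of model-generated fixes in mobile build environments. Experiments show that current mainstream LLMs achieve end-to-end resolution rates far below those reported on existing benchmarks, underscoring the complexity and difficulty of issue resolution in mobile development.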

Comments 30 pages, 14 figures, 12 tables

2603.24755 2026-05-11 cs.SE cs.AI cs.CL

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Gabriel Orlanski, Devjeet Roy, Alexander Yun, Changho Shin, Alex Gu, Albert Ge, Dyah Adila, Nicholas Roberts, Frederic Sala, Aws Albarghouthi

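AI summary: SlopCodeBench is a benchmark for evaluating how coding agents degrade over long-horizon iterative tasks. It contains 36 problems and 196 checkpoints that require an agent to keep extending its own solution. Unlike earlier iterative benchmarks, it places explicit demands on the agent's architectural decisions while leaving the internal structure free, so changes in code quality are reflected more realistically. The study finds that none of the tested agents fully solves any problem and that the code gradually accumulates structural erosion and redundancy across iterations, indicating that current agents still suffer marked code-quality decline on long-running development tasks.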

Comments Code and Leaderboards are located at https://www.scbench.ai

2603.16025 2026-05-11 cond-mat.mes-hall cs.CV quant-ph

3D tomography of exchange phase in a Si/SiGe quantum dot device

Dylan Albrecht, Sarah Thompson, N. Tobias Jacobson, Ryan Jock

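AI summary: This paper studies three-dimensional imaging of the exchange interaction in a Si/SiGe quantum dot device, as a step toward accurately extracting the exchange coupling coefficient $J(\mathbf{V})$ as a function of gate voltages from experimental data. To address the ambiguity of inverting a cosine and the noise sensitivity of phase unwrapping, the authors combine a phase-shifting digital holography technique with max-flow/min-cut phase unwrapping to reconstruct the accumulated phase volume in 3D voltage space. The method improves measurement robustness and provides a basis for systematic optimization of qubit control and for analysis of device performance.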

Comments 11 pages, 6 figures; updated acknowledgements

English abstract

The exchange interaction is a foundational building block for the operation of spin-based quantum processors. Extracting the exchange interaction coefficient $J(\mathbf{V})$, as a function of gate electrode voltages, is important for understanding disorder, faithfully simulating device performance, and operating spin qubits with high fidelity. Typical coherent measurements of exchange in spin qubit devices yield a modulated cosine of an accumulated phase, which in turn is the time integral of exchange. As such, extracting $J(\mathbf{V})$ from experimental data is difficult due to the ambiguity of inverting a cosine, the sensitivity to noise when unwrapping phase, as well as the problem of inverting the integral. As a step toward obtaining $J(\mathbf{V})$, we tackle the first two challenges to reveal the accumulated phase, $\phi(\mathbf{V})$. We incorporate techniques from a wide range of fields to robustly extract and model a 3D phase volume for spin qubit devices from a sequence of 2D measurements. In particular, we present a measurement technique to obtain the wrapped phase, as done in phase-shifting digital holography, and utilize the max-flow/min-cut phase unwrapping method (PUMA) to unwrap the phase in 3D voltage space. We show this method is robust to the minimal observed drift in the device, which we confirm by increasing scan resolution. Upon building a model of the extracted phase, we optimize over the model to locate a minimal-gradient $\pi$ exchange pulse point in voltage space. Our measurement protocol may provide detailed information useful for understanding the origins of device variability governing device yield, enable calibrating device models to specific devices during operation for more sophisticated error attribution, and enable a systematic optimization of qubit control. We anticipate that the methods presented here may be applicable to other qubit platforms.
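
The wrapped-phase step mentioned in the abstract can be illustrated with the textbook four-step formula from phase-shifting holography. This is only a sketch under stated assumptions: the signal model `A + B*cos(phi + delta)`, the four shifts, and the names `wrapped_phase`, `offset`, and `contrast` are illustrative and not taken from the paper, and `np.unwrap` is a simple 1D stand-in rather than the PUMA max-flow/min-cut method.

```python
import numpy as np

def wrapped_phase(i0, i90, i180, i270):
    """Four-step phase-shifting estimate of the wrapped phase.

    Assumes each measurement has the form A + B*cos(phi + delta) with
    delta = 0, pi/2, pi, 3*pi/2 (an illustrative assumption, not the
    paper's stated protocol). Returns phi wrapped into [-pi, pi].
    """
    # i270 - i90 = 2B*sin(phi) and i0 - i180 = 2B*cos(phi), so A and B cancel.
    return np.arctan2(i270 - i90, i0 - i180)

# Synthetic 1D voltage sweep: a smooth accumulated phase spanning several cycles.
phi_true = np.linspace(0.0, 6.0 * np.pi, 200)
offset, contrast = 0.5, 0.4  # hypothetical background and visibility
frames = [offset + contrast * np.cos(phi_true + d)
          for d in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]

phi_wrapped = wrapped_phase(*frames)
# np.unwrap is a simple 1D stand-in; it is not the max-flow/min-cut PUMA method.
phi_unwrapped = np.unwrap(phi_wrapped)
```

Using four shifted measurements removes the sign ambiguity of a single cosine, because both the sine and the cosine of the phase are recovered.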

2603.03096 2026-05-11 eess.AS cs.CL

Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features

Kyle Janse van Rensburg, Benjamin van Niekerk, Herman Kamper

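AI summary: This study examines how speech models trained with self-supervised learning encode speaker characteristics in their feature representations. Applying principal component analysis (PCA) to utterance-averaged representations shows that the leading principal components capture speaker characteristics such as pitch and gender, while other components relate to intensity, noise level, formants, and similar properties. The study further shows that these properties are relatively independent across feature dimensions and that speech characteristics can be changed by adjusting the corresponding dimensions.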
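
The analysis described above (PCA over utterance-averaged features, then moving along one direction) can be sketched roughly as follows. The input format, the function names `utterance_pca` and `shift_along`, and the synthetic data are all assumptions made for illustration; the paper's actual models and features are not reproduced here.

```python
import numpy as np

def utterance_pca(utterances, n_components=2):
    """PCA over mean-pooled utterance features.

    utterances: list of (num_frames, dim) arrays, e.g. frame-level outputs
    of a self-supervised speech model (hypothetical input format).
    Returns (mean, components), with components as rows of unit length.
    """
    pooled = np.stack([u.mean(axis=0) for u in utterances])  # (N, dim)
    mean = pooled.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    return mean, vt[:n_components]

def shift_along(frames, component, delta):
    """Move every frame by `delta` along one principal direction."""
    return frames + delta * component

# Synthetic check: one hidden "speaker" factor varies along axis 0.
rng = np.random.default_rng(0)
dim, direction = 8, np.eye(8)[0]
utterances = [rng.normal(0.0, 3.0) * direction + rng.normal(0.0, 0.1, (20, dim))
              for _ in range(50)]
mean, components = utterance_pca(utterances)
alignment = abs(components[0] @ direction)  # close to 1 if PC1 finds the factor
```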

Comments 5 pages, 7 figures, submitted to IEEE Signal Processing Letters

2602.16928 2026-05-11 cs.GT cs.AI cs.MA

Discovering Multiagent Learning Algorithms with Large Language Models

Zun Li, John Schultz, Daniel Hennes, Marc Lanctot

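AI summary: This study explores using large language models (LLMs) to automatically discover algorithms for multiagent reinforcement learning (MARL), particularly in imperfect-information games. Using the AlphaEvolve framework, it searches the algorithm design space within two paradigms, counterfactual regret minimization (CFR) and policy space response oracles (PSRO), and arrives at two strong algorithms, VAD-CFR and SHOR-PSRO. Further distillation yields simplified variants with simpler structure and better generalization, WOP-CFR and PM-PSRO, offering a clear methodology for algorithm discovery with LLMs.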

Comments More experiments and analysis on algorithmic distillation

2602.15189 2026-05-11 cs.IR cs.AI cs.CL

ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation

William Brach, Francesco Zuppichini, Marco Vinciguerra, Lorenzo Padoan

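AI summary: This study introduces ScrapeGraphAI-100k, a dataset supporting structured generation by large language models under a specified JSON Schema. It contains 93,695 deduplicated and balanced structured-extraction instances spanning more than 18,000 distinct schemas and 15 languages; each instance includes the page content, prompt, schema, and model response. The study also analyzes how schema complexity affects generation quality and uses fine-tuning experiments to demonstrate the dataset's value for training and evaluating structured-generation models.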

2602.10024 2026-05-11 cs.IR cs.CL

Overview of the TREC 2025 RAGTIME Track

Dawn Lawrie, Sean MacAvaney, James Mayfield, Luca Soldaini, Eugene Yang, Andrew Yates

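AI summary: The TREC 2025 RAGTIME track studies report generation from multilingual source documents and provides a document collection of Arabic, Chinese, English, and Russian news. The track comprises three tasks (multilingual report generation, English report generation, and multilingual information retrieval) and drew 125 runs from 13 teams. This paper gives an overview of the three tasks and presents the results.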

Comments 14 pages, 3 figures, final version of the RAGTIME 2025 overview paper

2602.09457 2026-05-11 stat.ML cs.DS cs.LG

From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model

Shinsaku Sakaue, Yuichi Yoshida

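AI summary: This paper studies online learning in the random-order model, where the set of loss functions is chosen by an adversary but presented in random order. Extending an existing batch-to-online conversion, the authors propose a new analysis framework that turns the approximation guarantee, average sensitivity, and stability of an offline algorithm into small-loss regret bounds in the online setting. The approach applies to a range of problems, including online clustering and low-rank approximation, and yields concrete results for tasks such as submodular function minimization and ℓ₁ regression, showing that sparsification techniques can achieve small-loss regret bounds without structural assumptions on the loss functions.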

2602.09034 2026-05-11 q-bio.NC cs.AI

Latent-Space Causal Discovery from Indirect Neuroimaging Observations

Sangyoon Bae, Miruna Oprescu, David Keetae Park, Shinjae Yoo, Jiook Cha

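AI summary: This study aims to discover causal relations in latent space from indirect neuroimaging observations, overcoming the distortions that hemodynamics and volume conduction impose on the signals. It proposes a conditional framework based on physical models and nonstationary latent dynamics and derives an upper bound on inverse error propagation. Building on this, the authors design INCAMA, which combines physics-aware inverse modeling with a delay-aware Mamba encoder and exploits mechanism shifts to improve estimation of the causal graph structure. Experiments show that the method significantly outperforms existing approaches on both simulated and real fMRI data and, in motor tasks in particular, accurately recovers the classic visuo-motor pathway.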

Comments 9 pages, 2 figures

2602.01621 2026-05-11 cs.CR cs.LG

CGF-Softmax: A Cumulant-Based Softmax Reformulation for Efficient Inference under Homomorphic Encryption

Hanjun Park, Byeongseo Min, Jiheon Woo, Min-Wook Jeong, Jongho Shin, Yongwoo Lee, Young-Sik Kim, Yongjune Kim

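AI summary: Homomorphic encryption (HE) provides an important framework for privacy-preserving machine learning, but efficiently evaluating softmax under HE, a key component of transformer-based models, remains challenging. This paper proposes CGF-Softmax, which reformulates the softmax denominator using the cumulant generating function (CGF), eliminating homomorphic division and explicit maximum subtraction. This substantially reduces multiplicative depth while preserving the core properties of softmax. Experiments show that, on vision transformers and large language models, the method attains inference accuracy close to that of high-depth exact methods at markedly lower computational cost.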
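
To give a feel for how a cumulant generating function can stand in for the softmax denominator, here is a plaintext sketch. It uses the identity log(sum(exp(x))) = log(n) + K(1), where K is the CGF of the empirical logit distribution, and truncates K(1) after the second cumulant. The truncation order and the function names are assumptions made for illustration; the paper's actual construction and its encrypted evaluation may differ.

```python
import numpy as np

def softmax_exact(x):
    """Reference softmax with the usual max subtraction and division."""
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_cumulant2(x):
    """Softmax with log-sum-exp replaced by a second-order cumulant expansion.

    log(sum(exp(x))) = log(n) + K(1), where K is the cumulant generating
    function of the empirical logit distribution. Truncating K(1) to
    mean + variance / 2 is an illustrative choice, accurate only when the
    logits have a small spread; it is not necessarily the paper's construction.
    """
    n = x.size
    k1 = x.mean()   # first cumulant
    k2 = x.var()    # second cumulant (population variance)
    # No max search and no data-dependent division: 1/n is a known constant.
    return np.exp(x - k1 - 0.5 * k2) / n

logits = np.array([0.1, -0.2, 0.05, 0.0])
approx, exact = softmax_cumulant2(logits), softmax_exact(logits)
```

The truncated version needs only means, squares, and one exponential, which is the kind of operation mix that keeps multiplicative depth low; its outputs sum to one only approximately.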

2602.00716 2026-05-11 stat.ML cond-mat.dis-nn cs.LG

Emergence of Distortions in High-Dimensional Guided Diffusion Models

Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello

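AI summary: This paper studies how classifier-free guidance (CFG) distorts generated samples in high-dimensional guided diffusion models. Using tools from statistical physics, the authors analyze the mismatch between the CFG sampling distribution and the true conditional distribution and, in an analytically tractable setting, show how data dimension and the number of classes affect the degree of distortion. They find that significant distortion arises in high-dimensional Gaussian mixture models when the number of classes grows exponentially with the data dimension, whereas under subexponential growth the distortion vanishes because of a dynamical phase transition. The authors also propose a new guidance schedule that improves class separability and sample diversity.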

Comments 41 pages, 21 figures

2602.00474 2026-05-11 stat.ML cs.LG cs.NA math.NA

Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients

Yang Xu, Vaneet Aggarwal

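AI summary: This paper studies fixed-policy evaluation for finite Markov chains, particularly for cases where irreducibility and periodicity come into play. Traditional methods cannot accurately separate persistent behavior from transient effects when decomposing gain and bias. By identifying the real peripheral invariant subspace of the transition matrix, the paper proposes a minimal peripheral quotient decomposition that removes the non-decaying modes, leaving residual dynamics that are strictly stable. The method uniquely decomposes the reward into a persistent-mode part and a transient part, accurately reconstructs finite-horizon returns, and provides stable estimates under a generative model.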

2601.22246 2026-05-11 cs.CR cs.AI

MirrorMark: Generalizable Mirrored Sampling for Multi-bit LLM Watermarking

Ya Jiang, Massieh Kordi Boroujeny, Surender Suresh Kumar, Kai Zeng

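AI summary: As large language models play a growing role in applications such as question answering and content generation, reliable content attribution has become essential. This paper proposes MirrorMark, a multi-bit LLM watermarking method whose core idea is to decouple the symbol-mapping rule from the underlying watermark sampler: multi-bit embedding is achieved by mapping each symbol to a mod-1 mirror transformation of a pseudorandom object that the detector can reproduce. The method improves watermark detectability and accuracy while preserving the quality of the generated text, and it introduces a context-anchored balanced scheduler to support practical payload embedding.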

2601.21839 2026-05-11 cs.CY cs.AI cs.GT cs.LG

Test-Time Compute Games

Ander Artola Velasco, Dimitrios Rontogiannis, Stratis Tsirtsis, Manuel Gomez-Rodriguez

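AI summary: This paper studies the market-efficiency problems raised by test-time compute in large language models (LLMs), arguing that cloud service providers may overuse compute to increase revenue even though this yields limited gains in output quality. The authors propose a reverse second-price auction in which providers bid on their price and expected quality, and users pay according to the winner's marginal value, thereby improving market efficiency. Experiments on mathematics and science benchmark datasets show that the mechanism is effective in practice.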

2601.16130 2026-05-11 cs.HC cs.AI

Replicating Human Motivated Reasoning Studies with LLMs

Neeley Pate, Adiba Mahbub Proma, Hangfeng He, James N. Druckman, Daniel C. Molden, Gourab Ghoshal, Ehsan Hoque

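AI summary: This study investigates whether large language models (LLMs) exhibit motivated reasoning similar to that of humans. Replicating four prior experiments on political motivated reasoning, it finds that base LLMs behave differently from what is expected of humans and that different models share certain behaviors, such as declining to answer questions and incorporating the supplied arguments into their stated opinions. The results suggest that base LLMs may not emulate human motivated reasoning, which matters for research that relies on LLMs for opinion replication and argument evaluation.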

2601.15356 2026-05-11 eess.IV cs.AI

Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing

Xiang Li, Xueheng Li, Yu Wang, Xuanhua He, Zhangchi Hu, Weiwei Yu, Chengjun Xie

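AI summary: This paper proposes Q-Probe, an agentic image quality assessment framework designed to address the difficulty of capturing local degradation details when assessing high-resolution images. By introducing a context-aware probing mechanism and a staged training strategy, Q-Probe avoids the "cropping implies degradation" bias of existing methods and achieves more accurate assessment in high-resolution settings. The study also builds Vista-Bench, the first benchmark dedicated to fine-grained degradation analysis at high resolution, and reports significantly improved performance across resolutions.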

2601.02602 2026-05-11 cs.CR cs.LG

SWaRL: Safeguard Code Watermarking via Reinforcement Learning

Neusha Javidnia, Ruisi Zhang, Ashish Kundu, Farinaz Koushanfar

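AI summary: This paper proposes SWaRL, a code watermarking framework that protects the intellectual property of code LLMs by embedding verifiable, unique signatures in generated programs. The method uses a reinforcement-learning-based co-training framework, relying on compiler feedback to ensure functional correctness and on a jointly trained confidential verifier as the reward signal to keep the watermark detectable. Experiments show that SWaRL achieves higher watermark detection accuracy than existing methods while preserving code functionality and is robust to refactoring and adversarial attacks.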

Comments Preprint

2512.23927 2026-05-11 stat.ML cs.LG

Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

Lars van der Laan, Nathan Kallus

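AI summary: This paper studies what stabilizes soft fitted Q-iteration (soft FQI) without Bellman completeness and proposes a stability analysis based on local alignment with the stationary distribution. Analyzing the soft Bellman operator near the soft-optimal fixed point, the authors find that it is a contraction in the stationary state-action norm and, on that basis, design a stationary-reweighted soft FQI algorithm that achieves local linear convergence with finite samples. The study also shows that plain soft FQI is locally stable under on-policy stationary sampling and explains temperature annealing as a continuation strategy for reaching the region of convergence.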

2512.23805 2026-05-11 stat.ML cs.LG

Fitted $Q$ Evaluation Without Bellman Completeness via Stationary Weighting

Lars van der Laan, Nathan Kallus

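AI summary: This paper studies a fitted Q evaluation (FQE) method that does not rely on Bellman completeness. It weights the regression step by the stationary state-action distribution of the target policy, replacing the projection in the behavior-distribution norm used by standard FQE. While keeping the modular supervised-learning form, this aligns the fitted projection with an operator that is a contraction in the $L^2$ norm induced by the target policy, giving linear convergence to the stationary projected Bellman fixed point with finite samples and separating iteration, statistical, approximation, and weight-estimation errors. Experiments show that the method stabilizes FQE and reduces value-estimation error.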
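
The idea of putting weights into the FQE regression step can be sketched with linear features and weighted least squares. This is a minimal illustration, not the authors' implementation: the function name `weighted_fqe`, the linear feature model, and the toy two-state chain are assumptions, and the weights are taken as given even though estimating stationary density ratios is a separate problem.

```python
import numpy as np

def weighted_fqe(phi, phi_next, rewards, weights, gamma, n_iters=60):
    """Fitted Q evaluation with a weighted least-squares regression step.

    phi:      (n, d) features of observed (state, action) pairs
    phi_next: (n, d) features of (next state, target-policy action)
    weights:  (n,) nonnegative regression weights; here they stand in for
              estimated stationary density ratios (assumed given).
    Returns the coefficient vector theta with Q(s, a) ~ phi(s, a) @ theta.
    """
    sqrt_w = np.sqrt(weights)[:, None]
    theta = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        targets = rewards + gamma * (phi_next @ theta)  # bootstrapped Bellman target
        # Weighted least squares: scale rows by sqrt(w), then solve ordinary LS.
        theta, *_ = np.linalg.lstsq(sqrt_w * phi, sqrt_w[:, 0] * targets, rcond=None)
    return theta

# Two-state chain under a fixed policy: s0 -> s1 (reward 1), s1 -> s0 (reward 0).
phi = np.eye(2)
phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])
rewards = np.array([1.0, 0.0])
weights = np.array([0.3, 0.7])  # hypothetical stationary weights
theta = weighted_fqe(phi, phi_next, rewards, weights, gamma=0.5)
```

In this tabular toy case the weights do not change the answer, and the iteration converges to the true values Q(s0) = 4/3 and Q(s1) = 2/3; the weighting only matters once the feature class cannot represent every Bellman target exactly.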

2512.23694 2026-05-11 stat.ML cs.LG econ.EM

Bellman Calibration for $V$-Learning in Offline Reinforcement Learning

Lars van der Laan, Nathan Kallus

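AI summary: In offline reinforcement learning, reliable long-horizon value prediction is challenging because fitted value methods involve bootstrapping, function approximation, and distribution shift, and standard guarantees typically require Bellman completeness or realizability. This paper proposes Bellman calibration, a weaker reliability criterion requiring that states with similar predicted values have consistent average Bellman targets. Building on it, the authors introduce iterated Bellman calibration, which post-processes a value predictor by fitting a one-dimensional map of the original predictions. Without Bellman completeness or value-function realizability, the method guarantees with finite samples that calibration error is controlled at a one-dimensional nonparametric rate, and it decomposes value error into statistical estimation, finite-iteration, and approximation errors, clarifying when calibration improves predictive performance.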

2512.09682 2026-05-11 eess.SY cs.AI cs.GT cs.MA cs.SY

Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies

Mika Persson, Jonas Lidman, Jacob Ljungberg, Samuel Sandelius, Adam Andersson

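AI summary: This paper studies multi-agent reinforcement learning (MARL) for cooperative delivery of critical data packets by UAV swarms, addressing one-time data delivery by small, sparse swarms in dynamic environments. It introduces a class of deterministic games as a model problem for MARL scaling studies and proposes a robust baseline policy that restricts UAV motion using Dijkstra's shortest-path algorithm. Experiments show that two off-the-shelf MARL algorithms perform close to the baseline in small scenarios but face scalability challenges as the number of agents grows.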

Comments Accepted to the 2026 IFAC World Congress

2511.22893 2026-05-11 eess.SY cs.AI cs.SY

Switching-time bioprocess control with pulse-width-modulated optogenetics

Sebastián Espinel-Ríos

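AI summary: This study explores using pulse-width-modulated optogenetics for dynamic control of bioprocesses to improve biomanufacturing efficiency. Because light-intensity control offers limited tunability under steep dose-response relationships, the study proposes smoothing the gene-expression response by alternately switching the light on and off, and formulates this as a switching-time optimal control problem with a binary input. To avoid the rapid growth in decision variables that conventional discrete optimization incurs on fine control grids, the author introduces a reinforcement learning approach that parameterizes the duty cycle to control switching times continuously, improving process controllability while keeping the light input binary.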
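
The link between a continuous duty cycle and binary switching times can be made concrete with a small sketch. The names `switching_times` and `light_input`, the fixed period, and the convention that the light turns on at the start of each period are assumptions chosen for illustration, not details from the paper.

```python
def switching_times(duty_cycles, period):
    """Convert per-period duty cycles into (on_time, off_time) pairs.

    duty_cycles: fractions in [0, 1], one per PWM period (the continuous
    decision variables). period: PWM period length in arbitrary time units.
    The light is assumed to switch on at the start of each period.
    """
    times = []
    for k, d in enumerate(duty_cycles):
        if not 0.0 <= d <= 1.0:
            raise ValueError("duty cycle must lie in [0, 1]")
        start = k * period
        times.append((start, start + d * period))
    return times

def light_input(t, duty_cycles, period):
    """Binary light input u(t) in {0, 1} implied by the duty cycles."""
    k = int(t // period)
    if t < 0 or k >= len(duty_cycles):
        return 0
    return 1 if (t - k * period) < duty_cycles[k] * period else 0

schedule = switching_times([0.25, 0.5, 1.0], period=10.0)
```

Each duty cycle is a single real number per period, so a learning agent can adjust switching times smoothly without enumerating on/off decisions on a fine time grid.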

Comments Accepted conference paper: IFAC World Congress 2026

2511.09016 2026-05-11 eess.SY cs.LG cs.SY

Assumed Density Filtering and Smoothing with Neural Network Surrogate Models

Simon Kuang, Xinfan Lin

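AI summary: This paper studies how neural network surrogate models can deliver accurate state estimation and smoothing in nonlinear dynamical systems. The authors propose using recent analytic formulas for the mean and covariance of a deep neural network under Gaussian input to propagate uncertainty, and they argue for cross-entropy rather than root-mean-square error as the metric for filtering and smoothing accuracy. Experiments show superior estimation performance on a stochastic Lorenz system and a Wiener system, and improved linear quadratic regulation based on the state estimates.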

Comments To appear at Learning for Decision and Control 2026

2511.03182 2026-05-11 cs.SE cs.LG

Understanding Robustness of Model Editing in Code LLMs

Vinaik Chhetri, Moghis Fereidouni, A. B. Siddique, Umar Farooq

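AI summary: This paper studies the robustness of model editing in code LLMs, in particular whether edited models migrate correctly to updated APIs while retaining their original performance. The authors build a controlled benchmark based on HumanEval, MBPP, and APPS, containing 2,040 problems and 140 synthetic API modifications, and evaluate it in an execution sandbox. Experiments show that existing editing methods suffer significant degradation in generalization and specificity under both single-edit and sequential-edit settings, revealing that model editing still faces many challenges in practice.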

Comments 26 pages, 14 figures, 20 tables

2510.24736 2026-05-11 q-bio.QM cs.LG q-bio.BM

RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics

Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang, Scott Youlten, Srikar Krishna Gopinath, Haejeong Lee, Ethan C. Strayer, Antonio J. Giraldez, Smita Krishnaswamy

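AI summary: RNAGenScape is an mRNA sequence generation framework based on manifold Langevin dynamics that aims to generate optimized mRNA sequences with specified biological properties. It learns the latent manifold of real data and performs constrained optimization on that manifold, keeping generated sequences biologically plausible and functionally effective. The study combines an autoencoder, a property predictor, and a property-guided optimization procedure, significantly improving the performance metrics of generated sequences while maintaining high generation efficiency.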

Comments ICML 2025 Generative AI and Biology (GenBio) Workshop, Oral presentation

2510.18516 2026-05-11 q-bio.NC cs.LG

Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining

Sangyoon Bae, Mehdi Azabou, Blake Richards, Jiook Cha

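AI summary: This study addresses heterogeneity in neural recordings caused by cell-type differences, circuit dynamics, and stochastic stimulus responses, and proposes POYO-CAP, a biologically informed pretraining method. It identifies neurons with strong statistical regularity, pretrains on them with masked reconstruction and auxiliary supervision, and then fine-tunes on more stochastic neuronal populations to improve performance. Experiments on the Allen Brain Observatory dataset show a 12-13% improvement over training from scratch and stable scaling with model size, turning neural heterogeneity into a scalable learning advantage.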

2510.02371 2026-05-11 cs.CR cs.AI cs.DC

Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids

Bochra Al Agha, Razane Tajeddine

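AI summary: This paper studies the detection of passive attacks in smart grids, in which attackers eavesdrop on communication links to obtain sensitive information such as grid topology and consumption patterns. Because the signals seen at a single node are faint, short-lived, and easily missed, it proposes a federated spatiotemporal graph learning method that fuses physical-layer and behavioral features, building ego-centric star subgraphs on local devices and extracting spatiotemporal features for attack detection. The model is trained in a non-IID federated learning setting, offering robustness and privacy protection, and achieves high detection accuracy with a low false-positive rate on a synthetic dataset.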

English abstract

Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to more severe targeted attacks. Detecting this threat is difficult because the signals it produces are faint, short-lived, and often disappear when traffic is examined by a single node or along a single timeline. This paper introduces a graph-centric, multimodal detector that fuses physical-layer and behavioral indicators over ego-centric star subgraphs and short temporal windows to detect passive attacks. To capture stealthy perturbations, a two-stage encoder is introduced: graph convolution aggregates spatial context across ego-centric star subgraphs, while a bidirectional GRU models short-term temporal dependencies. The encoder transforms heterogeneous features into a unified spatio-temporal representation suitable for classification. Training occurs in a federated learning setup under FedProx, improving robustness to heterogeneous local raw data and contributing to the trustworthiness of decentralized training; raw measurements remain on client devices. A synthetic, standards-informed dataset is generated to emulate heterogeneous HAN/NAN/WAN communications with wireless-only passive perturbations, event co-occurrence, and leak-safe splits. The model achieves a testing accuracy of 98.32% per-timestep ($F1_{\text{attack}}=0.972$) and 93.35% per-sequence at 0.15% FPR using a simple decision rule with run-length $m=2$ and threshold $\tau=0.55$. The results demonstrate that combining spatial and temporal context enables reliable detection of stealthy reconnaissance while maintaining low false-positive rates, making the approach suitable for non-IID federated smart-grid deployments.
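
One plausible reading of the "simple decision rule with run-length m=2 and threshold 0.55" is that a sequence is flagged once m consecutive per-timestep scores reach the threshold. The abstract does not spell the rule out, so the sketch below, including the function name `sequence_alarm`, is an assumption rather than the paper's definition.

```python
def sequence_alarm(probabilities, threshold=0.55, run_length=2):
    """Flag a sequence when `run_length` consecutive scores reach `threshold`.

    probabilities: per-timestep attack scores in [0, 1].
    The exact rule used in the paper is not spelled out in the abstract;
    this consecutive-run reading of m and tau is an assumption.
    """
    run = 0
    for p in probabilities:
        run = run + 1 if p >= threshold else 0  # reset on any score below threshold
        if run >= run_length:
            return True
    return False
```

Requiring a short run instead of a single exceedance trades a little detection delay for fewer alarms caused by isolated noisy scores.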

2509.00398 2026-05-11 cs.CY cs.AI

A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI

Cheonsu Jeong, Seunghyun Lee, Seonhee Jeong, Sungsu Kim

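AI summary: This paper examines the ethical and trustworthiness challenges arising from the rapid development of generative AI and proposes a systematic evaluation framework. Noting that current evaluation methods focus on performance and accuracy, the study builds detailed indicators and assessment methods along dimensions such as fairness, transparency, and safety, and compares policies across several countries. The framework spans the AI lifecycle and integrates technical assessment with multidisciplinary perspectives, providing a theoretical foundation and practical guidance for the responsible advancement of generative AI.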

Comments 22 pages, 3 figures, 6 tables

DOI: 10.47852/bonviewAIA62027463
Journal ref: Artificial Intelligence and Applications, 2026

English abstract

This study provides an in-depth analysis of the ethical and trustworthiness challenges emerging alongside the rapid advancement of generative artificial intelligence (AI) technologies and proposes a comprehensive framework for their systematic evaluation. While generative AI, such as ChatGPT, demonstrates remarkable innovative potential, it simultaneously raises ethical and social concerns, including bias, harmfulness, copyright infringement, privacy violations, and hallucination. Current AI evaluation methodologies, which mainly focus on performance and accuracy, are insufficient to address these multifaceted issues. Thus, this study emphasizes the need for new human-centered criteria that also reflect social impact. To this end, it identifies key dimensions for evaluating the ethics and trustworthiness of generative AI (fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability) and develops detailed indicators and assessment methodologies for each. Moreover, it provides a comparative analysis of AI ethics policies and guidelines in South Korea, the United States, the European Union, and China, deriving key approaches and implications from each. The proposed framework applies across the AI lifecycle and integrates technical assessments with multidisciplinary perspectives, thereby offering practical means to identify and manage ethical risks in real-world contexts. Ultimately, the study establishes an academic foundation for the responsible advancement of generative AI and delivers actionable insights for policymakers, developers, users, and other stakeholders, supporting the positive societal contributions of AI technologies.

2504.02382 2026-05-11 eess.IV cs.AI cs.CV

Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge

Yudi Sang, Yanzhen Liu, Sutuke Yibulayimu, Yunning Wang, Benjamin D. Killeen, Mingxu Liu, Ping-Cheng Ku, Ole Johannsen, Karol Gotkowski, Maximilian Zenk, Klaus Maier-Hein, Fabian Isensee, Peiyan Yue, Yi Wang, Haidong Yu, Zhaohong Pan, Yutong He, Xiaokun Liang, Daiqi Liu, Fuxin Fan, Artur Jurgas, Andrzej Skalski, Yuxi Ma, Jing Yang, Szymon Płotka, Rafał Litka, Gang Zhu, Yingchun Song, Mathias Unberath, Mehran Armand, Dan Ruan, S. Kevin Zhou, Qiyong Cao, Chunpeng Zhao, Xinbao Wu, Yu Wang

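AI summary: This paper summarizes the PENGWIN 2024 challenge, which evaluated techniques for segmenting pelvic fractures in CT and X-ray images. Targeting the difficulty of segmenting bone fragments in medical images, the challenge used a dataset of 150 CT scans and a large set of simulated X-ray images to evaluate algorithms from 16 international teams on multiple metrics. CT segmentation performed well, with the top algorithm reaching an average fragment-wise IoU of 0.930, whereas X-ray segmentation was weaker, with the best algorithm reaching an IoU of 0.774, indicating that fragment overlap in projection imaging remains the main challenge. The study also reveals diversity in algorithm design and suggests that interactive segmentation may be important for improving clinical applicability.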

Comments PENGWIN 2024 Challenge Report

DOI: 10.1109/TMI.2025.3650126

English abstract

The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture segmentation by benchmarking state-of-the-art algorithms on these complex tasks. A diverse dataset of 150 CT scans was collected from multiple clinical centers, and a large set of simulated X-ray images was generated using the DeepDRR method. Final submissions from 16 teams worldwide were evaluated under a rigorous multi-metric testing scheme. The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, demonstrating satisfactory accuracy. However, in the X-ray task, the best algorithm achieved an IoU of 0.774, which is promising but not yet sufficient for intra-operative decision-making, reflecting the inherent challenges of fragment overlap in projection imaging. Beyond the quantitative evaluation, the challenge revealed methodological diversity in algorithm design. Variations in instance representation, such as primary-secondary classification versus boundary-core separation, led to differing segmentation strategies. Despite promising results, the challenge also exposed inherent uncertainties in fragment definition, particularly in cases of incomplete fractures. These findings suggest that interactive segmentation approaches, integrating human decision-making with task-relevant information, may be essential for improving model reliability and clinical applicability.
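
For readers unfamiliar with the headline metric, a fragment-wise intersection over union can be computed along these lines. This is a simplified sketch: it assumes predicted and reference fragments already share integer label ids, whereas the challenge's actual evaluation scheme (including any instance matching) is not described in the abstract. The function name `fragment_iou` and the toy masks are illustrative.

```python
import numpy as np

def fragment_iou(pred, truth, labels):
    """Per-fragment intersection over union for integer label maps.

    pred, truth: arrays of the same shape, 0 = background, k > 0 = fragment k.
    Assumes predicted and reference fragments already share label ids;
    a real evaluation would first need a matching step between instances.
    """
    scores = {}
    for k in labels:
        p, t = pred == k, truth == k
        union = np.logical_or(p, t).sum()
        scores[k] = np.logical_and(p, t).sum() / union if union else float("nan")
    return scores

truth = np.array([[1, 1, 0, 2],
                  [1, 1, 0, 2]])
pred = np.array([[1, 1, 0, 2],
                 [1, 0, 2, 2]])
scores = fragment_iou(pred, truth, labels=[1, 2])
```

In this toy case fragment 1 scores 3/4 (one pixel missed) and fragment 2 scores 2/3 (one extra pixel predicted); averaging such per-fragment scores gives a fragment-wise mean.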

2503.17656 2026-05-11 q-bio.QM cs.AI cs.LG

Pretraining a Foundation Model for Small-Molecule Natural Products

Yuheng Ding, Bo Qiang, Shaoning Li, Yiran Zhou, Jie Yu, Qi Li, Cheng Shi, Liangren Zhang, Yusong Wang, Nanning Zheng, Zhenming Liu

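AI summary: Motivated by the importance of natural products in drug discovery, this study presents a pretrained foundation model dedicated to small-molecule natural products. By incorporating contrastive learning and masked graph learning, the model captures evolutionary information from molecular scaffolds as well as side-chain features, overcoming the limited generality and task adaptability of existing methods. Experiments show state-of-the-art performance on natural product taxonomy classification, gene-level and microbial-level analysis, and virtual screening, providing a useful tool for natural product research and drug development.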

Comments Accepted by Nature Machine Intelligence (2026)

DOI: 10.1038/s42256-026-01226-8
Journal ref: Nature Machine Intelligence (2026)

English abstract

Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such a one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well suited to the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy that is specifically tailored to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results in various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification with synthesized molecule-focused baselines to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, through a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Finally, we apply our method to virtual screening, illustrating informative natural product representations that can lead to more effective identification of potential drug candidates.
