arXivDaily: arXiv daily academic digest, updated Monday through Friday

Science and Medicine

AI for Science

AI for scientific discovery: proteins, molecules, drugs, materials, meteorology, physics, and mathematics.

Indexed for today/current date: 5. Signal sources: cs.LG, q-bio, physics, cond-mat, math, stat.ML
2604.14906 2026-06-18 physics.bio-ph cs.LG Version update 95%

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

Mariia Ivonina, Jakub Rydzewski

Affiliations: Platform of Inter/Transdisciplinary Energy Research, Kyushu University; Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University

Topic match: AI drug discovery. Uses machine learning to study the mechanism of drug-RNA binding.

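AI summary: This study uses spectral map, a thermodynamics-driven machine learning method, to learn collective variables from all-atom molecular dynamics trajectories. It shows that ligand binding destabilizes the SARS-CoV-2 RNA pseudoknot in a topology-selective way and identifies protonation state as a key factor in modeling RNA-targeted drug action.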

Abstract

The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-1$ PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.
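
As a rough illustration of how a free-energy landscape can be derived from a CV, the sketch below histograms one-dimensional CV samples and applies F = -RT ln p. It does not implement spectral map or reproduce the paper's analysis; the function name, bin count, and temperature are assumptions chosen for illustration.

```python
import math

# Molar gas constant in kJ/(mol*K); gives free energies in kJ/mol.
GAS_CONSTANT_KJ_PER_MOL_K = 0.008314462618


def free_energy_profile(cv_samples, n_bins=10, temperature=300.0):
    """Return (bin_centers, free_energies) with the minimum shifted to zero.

    Hypothetical helper: bins that no sample visited get math.inf.
    """
    lo, hi = min(cv_samples), max(cv_samples)
    if hi == lo:
        raise ValueError("CV samples must span a nonzero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in cv_samples:
        # Clamp so the maximum sample falls into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    rt = GAS_CONSTANT_KJ_PER_MOL_K * temperature
    total = len(cv_samples)
    raw = [-rt * math.log(c / total) if c > 0 else math.inf for c in counts]
    f_min = min(raw)
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centers, [f - f_min for f in raw]
```

Calling `free_energy_profile(samples, n_bins=2)` on samples that spend 90% of the time near one CV value and 10% near another yields a free-energy gap of RT ln 9 between the two bins.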

2506.13196 2026-06-18 cs.LG Version update 95%

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

Affiliations: Department of Computer Science, City University of Hong Kong; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University; School of Software, Shandong University; College of Informatics, Harbin Institute of Technology (Shenzhen)

Topic match: AI drug discovery. Predicts protein-ligand binding affinity for drug discovery.

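AI summary: Proposes KEPLA, a framework that integrates prior knowledge from Gene Ontology and ligand properties, using global representation alignment and local cross attention to improve protein-ligand binding affinity prediction. It outperforms existing methods on two benchmark datasets.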

Abstract

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.
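
To make the cross-attention step concrete, here is a minimal sketch of scaled dot-product cross attention in plain Python, where protein-side local embeddings act as queries and ligand-side local embeddings act as keys and values. This is generic attention without learned projections, not KEPLA's actual architecture; the function names and toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Returns (outputs, weights); weights[i][j] is how strongly query i
    (e.g. a protein residue) attends to key j (e.g. a ligand atom).
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights
```

The returned weight matrix is the kind of object an attention-map interpretability analysis would inspect: each row sums to one and shows which ligand positions a given protein position draws on.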

2606.10376 2026-06-18 cs.AI cs.IT math.IT Cross-list 90%

Belief-Space Control for Personalized Cancer Treatment via Active Inference

Deniz Sargun, H. Bugra Tulay, C. Emre Koksal

Affiliations: American Association for Cancer Research; AACR Project GENIE registry; AACR Project GENIE Biopharma Collaborative

Topic match: AI drug discovery. Applies active inference to personalized cancer treatment.

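AI summary: Uses active inference to model cancer treatment as a belief-space planning problem that unifies goal-directed control and information acquisition under a measurement budget, achieving patient categorization together with high treatment efficacy.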

Comments: 11 pages including appendix

Abstract

Cancer treatment is at its core a sequential decision-making problem with partial observability, latent patient heterogeneity, and explicit constraints on the budget for medical measurements. Unlike standard Reinforcement Learning (RL) approaches that control state trajectories, cancer treatments permanently modify patients' transition dynamics, changing how states evolve over time. We model cancer treatment as a belief-space planning problem using active inference, deriving an expected free-energy objective that unifies goal-directed control and information acquisition under measurement budgets without. We implement this framework using real clinical cancer data from the AACR Project GENIE Biopharma Collaborative dataset. Results on clinical data demonstrate simultaneous patient categorization and high treatment efficacy under real measurement and treatment constraints.
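
Belief-space planning operates on a probability distribution over latent states rather than on the states themselves. The sketch below shows only the basic ingredient, a Bayesian belief update over hypothetical latent patient types after one measurement, plus the belief entropy that an information-seeking term would try to reduce. It is not the paper's expected free-energy objective; the function names and all numbers are invented for illustration.

```python
import math


def update_belief(belief, likelihoods):
    """Bayes rule: posterior[i] is proportional to belief[i] * P(obs | type i)."""
    unnorm = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return [u / total for u in unnorm]


def entropy(belief):
    """Shannon entropy in nats; lower means a more certain belief."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


# Two hypothetical patient types, initially equally likely.
prior = [0.5, 0.5]
# Assumed probability of the observed measurement under each type.
posterior = update_belief(prior, [0.9, 0.1])
```

Because each measurement consumes budget, a planner in this setting has to weigh the expected drop in entropy from measuring against the value of acting on the current belief.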

2606.18390 2026-06-18 cs.LG q-bio.QM New submission 80%

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

Yingxu Wang, Kunyu Zhang, Nan Yin, Yu Li, Eran Segal

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; Zhengzhou University; The Education University of Hong Kong; The Chinese University of Hong Kong; Weizmann Institute of Science

Topic match: AI drug discovery. Proposes a multimodal molecular representation learning framework for property prediction.

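AI summary: Proposes MOLAR, a framework that separates clean-property inference from label observation and uses residual evidence from the graph and text modalities to learn multimodal molecular representations from noisy labels. It outperforms baselines on naturally noisy and label-flipping benchmarks.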

Abstract

Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.
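
One common way to model a categorical label-observation channel is a row-stochastic matrix whose entry (y, r) is the probability that a molecule with clean label y is recorded with label r. The sketch below uses that generic formulation to map a clean-property distribution to a recorded-label distribution and to compute a posterior reliability for a recorded label. It is an assumption about the general technique, not MOLAR's actual parameterization; the names and numbers are illustrative.

```python
def recorded_label_distribution(clean_probs, channel):
    """Push a clean-label distribution through the noise channel.

    channel[y][r] = assumed P(recorded = r | clean = y); each row sums to 1.
    """
    n_labels = len(channel[0])
    return [
        sum(p * row[r] for p, row in zip(clean_probs, channel))
        for r in range(n_labels)
    ]


def label_reliability(clean_probs, channel, recorded):
    """Posterior probability that the clean label equals the recorded one."""
    joint = [p * row[recorded] for p, row in zip(clean_probs, channel)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("recorded label has zero probability under the model")
    return joint[recorded] / total


# Model believes the molecule is class 0 with probability 0.8.
clean = [0.8, 0.2]
# Assumed channel: class 0 is flipped 10% of the time, class 1 20% of the time.
channel = [[0.9, 0.1], [0.2, 0.8]]
observed = recorded_label_distribution(clean, channel)
reliability = label_reliability(clean, channel, recorded=1)
```

Training against `observed` rather than `clean` lets the loss absorb label noise, while `reliability` flags recorded labels that the model considers likely to be corrupted.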

2606.18785 2026-06-18 cs.LG cs.AI New submission 75%

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

Lennert Saerens, Bram Silue, Eleni Litsa, Peter Vrancx, Pieter Libin

Affiliations: imec Data Science Institute, Interuniversity Institute of Biostatistics and Statistical Bioinformatics, UHasselt

Topic match: AI drug discovery. Addresses multi-objective molecular discovery.

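AI summary: Proposes Top-Two Pareto Front Thompson Sampling (TTPFTS), the first anytime multi-objective multi-armed bandit algorithm for Pareto set identification, validates it on synthetic environments and an ultra-large molecular library, and introduces an uncertainty quantification metric.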

Comments: 26 pages, 13 figures

Abstract

Identifying Pareto optimal solutions is critical to support multi-objective decision-making. We introduce the first anytime Multi-Objective Multi-Armed Bandit algorithm for the Pareto Set Identification problem, taking a Bayesian approach: Top-Two Pareto Front Thompson Sampling (TTPFTS). We benchmark TTPFTS against state-of-the-art fixed-budget Pareto Set Identification algorithms on synthetic environments. Next, we demonstrate its practical utility in a challenging multi-objective molecular discovery setting by efficiently exploring an ultra-large synthesis-on-demand molecular library. Furthermore, we introduce a novel uncertainty quantification metric that estimates our algorithm's confidence in the predicted Pareto set. We demonstrate that this metric effectively proxies true performance, yielding a robust methodology for monitoring learning progress in complex settings. Finally, we complement these empirical findings with a theoretical proof of the algorithm's asymptotic correctness.
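
For readers unfamiliar with Pareto set identification, the sketch below shows the two generic building blocks involved: computing the Pareto set of arms from their mean reward vectors, and drawing one posterior sample per arm to obtain a sampled Pareto set, as Thompson-style methods do. It is not the TTPFTS algorithm; the independent Gaussian posterior and all names are assumptions made for illustration.

```python
import random


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, m in enumerate(means)
        if not any(dominates(o, m) for j, o in enumerate(means) if j != i)
    ]


def sampled_pareto_set(post_means, post_stds, rng):
    """Draw one value per arm and objective from independent Gaussian
    posteriors (an assumed model) and return the Pareto set of the draw."""
    draw = [
        [rng.gauss(mu, sd) for mu, sd in zip(mus, sds)]
        for mus, sds in zip(post_means, post_stds)
    ]
    return pareto_set(draw)


# Four arms, two objectives; arm 3 is dominated by arm 2.
means = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]]
front = pareto_set(means)
```

Repeating `sampled_pareto_set` with a seeded `random.Random` and counting how often each arm appears gives a rough sense of how uncertain the current Pareto set estimate is.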