arXivDaily: arXiv daily academic digest, updated Monday through Friday
q-bio.MN Molecular Networks (4)
2606.12219 2026-06-11 q-bio.GN q-bio.MN New submission

m6A-FORM: A Foundation Model for Decoding N6-methyladenosine Biology

Tinghe Zhang, Sumin Jo, Shou-Jiang Gao, Yufei Huang

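AI summary: Proposes m6A-FORM, a Transformer-based foundation model that uses MeRIP-seq peaks as priors. After pretraining and fine-tuning, it predicts m6A sites, outperforms existing methods, and supports regulator binding-site prediction and analysis of tissue-conserved sites.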

English abstract

N6-methyladenosine (m6A) is the most abundant internal modification in eukaryotic mRNA. However, most existing predictors use adenosine-centered formulations that are computationally inefficient and prone to false positives. Here we present m6A-FORM, a transformer-based foundation model for RNA methylation that uses MeRIP-seq peaks as methylation-enriched priors and is pretrained on approximately 22 million peak-derived sequences from 143 human MeRIP-seq studies. After fine-tuning with high-confidence single-nucleotide m6A annotations from m6A-Atlas v2.0 and GLORI, m6A-FORM-sites achieves state-of-the-art m6A site prediction performance, with a PR-AUC of 0.635 and ROC-AUC of 0.988, improving PR-AUC by at least 0.14 over existing methods while enabling substantially faster inference. Task-specific adaptation further supports prediction of binding sites for 19 m6A-associated regulators and identification of YTHDF2-bound m6A sites associated with mRNA degradation. Applying m6A-FORM across 67 datasets from 24 human tissues identifies 19,631 tissue-conserved sites with distinct localization, clustering, methylation, expression, RBP-interaction, and decay-associated signatures.
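The gap between the reported PR-AUC (0.635) and ROC-AUC (0.988) is the kind of pattern that commonly appears when positives are rare, which is plausibly the case for m6A sites among candidate adenosines. The sketch below illustrates the effect on synthetic scores only. It does not use m6A-FORM outputs; the function names, the 1% prevalence, and the Gaussian score model are all assumptions made for illustration, and average precision stands in as a common estimator of PR-AUC.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outscores a random negative."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of a tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision; ties in score are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / sum(labels)


def synthetic_benchmark(n=20000, prevalence=0.01, shift=2.5, seed=0):
    """Hypothetical imbalanced task: positives score N(shift, 1), negatives N(0, 1)."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.gauss(shift if y else 0.0, 1.0) for y in labels]
    return roc_auc(labels, scores), average_precision(labels, scores)


if __name__ == "__main__":
    roc, ap = synthetic_benchmark()
    print(f"ROC-AUC={roc:.3f}  average precision={ap:.3f}")
```

Because precision is computed against the small positive class, even a few high-scoring negatives pull average precision down while barely moving ROC-AUC, which is why the two metrics are usually reported together for site prediction.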

2606.11486 2026-06-11 physics.chem-ph q-bio.MN New submission

Elucidating the Size of Chemical Space with Assembly Theory

Juan Carlos Morales Parra, Keith Y Patarroyo, Abhishek Sharma, David Obeh Alobo, Leroy Cronin

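AI summary: Uses Assembly Theory, with molecular complexity quantified by the assembly index, to estimate the size of chemical space from first principles for the first time. Finds that the space grows at least super-exponentially and at most double-exponentially with complexity, reaching about 10^117 molecules under drug-like constraints.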

Comments
26 pages, 10 figures, 31 references

English abstract

Chemical space is unimaginably vast, with common heuristic estimates suggesting that there are ca. 10^60 'drug-like' molecules possible below a molecular mass of 500 Da. However, these estimates largely ignore the structural and synthetic complexity of the molecules enumerated. Here we present a first-principles estimate of the size of chemical space using Assembly Theory, which quantifies the amount of causation required to form a molecule, captured in the Assembly Index. This is a measurable molecular complexity measure derived from the minimum number of recursive bond-joining operations required to construct a molecular graph. Assembly Theory partitions chemical space into levels defined by Assembly Index, allowing bounds to be placed on its growth as molecular complexity increases. We show that chemical space (the accumulated Assembly Index level sets) grows at least super-exponentially, and at most double-exponentially, with respect to the Assembly Index. Using the GDB-13 database as a reference for growth-rate estimation, we model how chemical space expands under increasing complexity and contracts under structural constraints, including atom and bond types, number of rings, ring size, and chemical motifs. Under constraints comparable to standard drug-like estimates, including molecular mass below 500 Da, our analysis yields a chemical space of approximately 10^117 molecules at Assembly Index 25. Finally, we constrain chemical space by biologically relevant motifs and identify structurally relevant molecules near the accessible boundaries of these assembly-defined spaces.
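One possible way to read the growth claims in this abstract is written out below. The notation N(a) for the number of molecules with Assembly Index at most a, the constants c_1 and c_2, and the limit-based definition of super-exponential growth are assumptions chosen for illustration; the paper's own bounds may be stated differently. The last line is plain arithmetic on the two figures quoted in the abstract.

```latex
\begin{align*}
% N(a): number of molecules with Assembly Index at most a (notation assumed here)
% Super-exponential lower bound: N(a) eventually outgrows every exponential c^a
\lim_{a \to \infty} \frac{\log N(a)}{a} &= \infty \\
% Double-exponential upper bound, for some constants c_1, c_2 > 1
N(a) &\le c_1^{\,c_2^{\,a}} \\
% Ratio of the estimate at Assembly Index 25 to the common heuristic estimate
\frac{10^{117}}{10^{60}} &= 10^{57}
\end{align*}
```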

2511.10223 2026-06-11 math.PR q-bio.MN Updated version

Stochastic Reaction Networks Within Interacting Compartments with Content-Dependent Fragmentation

David F. Anderson, Aidan S. Howells, Diego Rojas La Luz

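AI summary: Studies a stochastic reaction network model in which the fragmentation rate of a compartment depends on the abundance of designated species inside it. Shows that the earlier explosivity characterization fails under content-dependent fragmentation, and gives new sufficient conditions for non-explosivity and positive recurrence.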

Comments
25 pages; corrected a typo (present in all previous versions) in Step 3 of the proof of Proposition 3.12

English abstract

Stochastic reaction networks with mass-action kinetics provide a useful framework for understanding processes, biochemical and otherwise, in homogeneous environments. However, cellular reactions are often compartmentalized, either at the cell level or within cells, and hence non-homogeneous. We investigate a model of compartmentalization in which the rate of fragmentation of a compartment depends on the abundance of some designated species inside that compartment. The particular model of study is part of a general framework for compartmentalized chemistry with dynamic compartments that was proposed in (Duso and Zechner, PNAS, 2020). This paper builds on (Anderson and Howells, Bull. Math. Biol., 2023), which studied mathematically the special case in which the compartment dynamics do not depend on their contents. In particular, we demonstrate that the explosivity characterization from (Anderson and Howells, Bull. Math. Biol., 2023) fails in this setting and provide new sufficient conditions for non-explosivity and positive recurrence, under the assumption that the underlying CRN admits a linear Lyapunov function. These results extend the theoretical foundation for modeling content-mediated compartment dynamics, with implications for systems such as cell division and intracellular transport.
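To give a feel for the kind of model this abstract describes, here is a minimal Gillespie-style sketch of a single species S living in compartments whose fragmentation rate is proportional to their S content. Everything specific is an assumption made for illustration and not taken from the paper: the reactions (0 -> S and S -> 0 with mass-action rates), the linear fragmentation rate, the fair binomial split, the omission of coagulation, and all parameter names and values.

```python
import random


def split_contents(x, rng):
    """Send each of x molecules to one of two daughters with probability 1/2."""
    left = sum(1 for _ in range(x) if rng.random() < 0.5)
    return left, x - left


def simulate(t_end, inflow=0.2, exit_rate=0.3, birth=1.0, death=0.5,
             frag=0.1, seed=0, max_events=100_000):
    """Return the S counts per compartment at time t_end (requires inflow > 0)."""
    rng = random.Random(seed)
    comps = [0]  # one initially empty compartment
    t = 0.0
    for _ in range(max_events):  # hard cap in case the chosen rates blow up
        n, total_x = len(comps), sum(comps)
        rates = [
            inflow,             # a new empty compartment arrives
            exit_rate * n,      # a uniformly chosen compartment exits
            birth * n,          # 0 -> S inside a uniformly chosen compartment
            death * total_x,    # S -> 0, compartment picked in proportion to content
            frag * total_x,     # content-dependent fragmentation
        ]
        t += rng.expovariate(sum(rates))
        if t > t_end:
            break
        event = rng.choices(range(5), weights=rates)[0]
        if event == 0:
            comps.append(0)
        elif event == 1:
            comps.pop(rng.randrange(n))
        elif event == 2:
            comps[rng.randrange(n)] += 1
        else:
            i = rng.choices(range(n), weights=comps)[0]
            if event == 3:
                comps[i] -= 1
            else:
                comps[i], daughter = split_contents(comps[i], rng)
                comps.append(daughter)
    return comps


if __name__ == "__main__":
    print(simulate(50.0, seed=1))
```

Because fragmentation here is driven by content, the number of compartments and the molecule counts feed back on each other; the `max_events` cap is a practical guard for parameter choices where that feedback makes the simulated process grow very quickly.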

2604.25701 2026-06-11 physics.bio-ph physics.data-an q-bio.BM q-bio.MN q-bio.PE Updated version

Bayesian Rate Inference for Sequence Motif Dynamics in Systems of Reactive Nucleic Acids

Johannes Harth-Kitzerow, Ulrich Gerland, Torsten A. Enßlin

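AI summary: Proposes a Bayesian inference framework that infers the parameters of motif rate equations from ligation count data produced by strand reactor simulations, providing a way to match simplified models to complex simulations and a step towards inferring reaction rate constants directly from experimental data.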

Comments
18 pages, 8 figures, pre-submission

English abstract

The RNA world hypothesis suggests a pathway by which life emerged on early Earth. It assumes that life started with RNA-based systems capable of storing, transmitting, and replicating information, envisioning that monomers and short RNA oligomers interact to form longer strands, eventually becoming catalytically active ribozymes. Key reactions in RNA pools are hybridization, dehybridization, templated ligation, and cleavage. Those reactions depend on many environmental parameters and on the wide range of possible configurations among interacting strands. In order to scan such high-dimensional parameter spaces, efficient descriptions are needed. Motif rate equations project complex strand reactor dynamics onto sequence motif space. Here we present a Bayesian inference framework to infer their parameters from ligation count data produced by strand reactor simulations. This provides a framework to match the simpler motif rate equations to more complex simulations. Additionally, it is a step towards inferring reaction rate constants directly from experimental data, including rigorous uncertainty estimation. This could be an essential procedure to connect theory and experiment, and deepen our understanding of the essential features necessary for life to emerge.
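As a toy illustration of inferring a rate constant from count data with an uncertainty estimate, the sketch below uses the textbook Gamma-Poisson conjugate update for a single constant rate. This is not the framework of the paper, which concerns motif rate equations with many coupled parameters; the Poisson count model, the Gamma(1, 1) prior, and all function names are assumptions made for illustration.

```python
import math
import random


def gamma_posterior(counts, exposures, alpha=1.0, beta=1.0):
    """Posterior Gamma(shape, rate) for a Poisson rate under a Gamma(alpha, beta) prior."""
    return alpha + sum(counts), beta + sum(exposures)


def posterior_mean_sd(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


def simulate_counts(true_rate, exposure, n_windows, seed=0):
    """Count events of a constant-rate process in n_windows windows of equal length."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_windows):
        t, k = rng.expovariate(true_rate), 0
        while t <= exposure:
            k += 1
            t += rng.expovariate(true_rate)
        counts.append(k)
    return counts


if __name__ == "__main__":
    counts = simulate_counts(true_rate=2.0, exposure=1.0, n_windows=200)
    shape, rate = gamma_posterior(counts, [1.0] * len(counts))
    mean, sd = posterior_mean_sd(shape, rate)
    print(f"posterior rate estimate: {mean:.3f} +/- {sd:.3f}")
```

The posterior standard deviation shrinks roughly as one over the square root of the total count, which is the simplest instance of the uncertainty quantification that a full Bayesian treatment provides.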