arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.20127 2026-05-20 q-bio.NC cs.AI cs.LG

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

Ken Nakamura, Tomoya Nakai, Ryuto Yashiro, Ayumu Yamashita, Kaoru Amano

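AI summary: This paper proposes a method for evaluating model-brain alignment that analyzes which reproducibly predictable dimensions of the target response space are recovered, revealing aspects of alignment that prediction accuracy alone does not capture.
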
Comments
34 pages, 12 figures, 5 tables
Abstract

Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduce a unified framework for evaluating both model-brain and brain-brain alignment by identifying the response dimensions recovered by prediction. Using repeated fMRI measurements, we first identify target-brain response dimensions that can be reproducibly predicted across independent trial splits. We then predict target-brain responses from either another subject's brain responses or a vision model's internal representations, and quantify how strongly each of these reproducible response dimensions is recovered. Applying this framework to a subset of the Natural Scenes Dataset, in which eight subjects viewed the same natural images during fMRI, we find that the early-to-intermediate visual-cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons identify which of these dimensions are consistently recoverable from other subjects' brains, providing a diagnostic human reference rather than only a scalar benchmark. In some cases, pretrained and randomly initialized models achieve similar prediction accuracy while showing distinct recovery profiles across these response dimensions. These results show that prediction accuracy alone can mask model-brain mismatches. By making explicit which reproducible brain response dimensions are recovered by prediction, our framework provides a more diagnostic evaluation of alignment between artificial vision models and the human visual cortex.

2605.20103 2026-05-20 q-bio.PE

Face morphometric profiles of groups as early markers for certain diseases?

Roberto Herrero, Yoanna Martinez-Diaz, Heydi Mendez-Vazquez, Joan Nieves, Augusto Gonzalez

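AI summary: This study analyzes face morphometric profiles of the Cuban population and explores their potential as early biomarkers for multifactorial diseases such as Alzheimer's disease.
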
Journal ref
Int J Oral Craniofac Sci 9(2): 008-015 (2023)
Abstract

Background: Face morphometry has been shown to work as a diagnostic tool in a set of syndromes. Face similarities are usually indications of more complete genetic similarities. Purpose: To show preliminary results on the face morphometry profile of the Cuban population and to argue that it could be used to define early markers for diseases such as Alzheimer's disease. Methods: A dataset composed of photos of 200,000 men is processed. Facial landmarks are extracted by means of the DLIB library and distances between them are computed. By clustering samples with similar facial traits, groups are formed and their densities inside the population are computed. Results: The face morphometry profiles for two age cohorts are obtained, showing the population dynamics. Genes involved in facial development are shown to be related to Alzheimer's disease. Conclusions: Late multifactorial diseases develop against the genetic background of each individual, which is expressed by their face morphometry. The latter can thus be considered a risk marker.

2605.19962 2026-05-20 q-bio.PE cs.DM

Computing the Arc-Deletion Distance to Orchard Networks is NP-hard

Peng Li, Zhiwei Liu, Yangjing Long

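AI summary: This paper studies the problem of computing the arc-deletion distance to orchard networks and proves it NP-hard via a polynomial-time reduction from the Degree-3 Vertex Cover problem, establishing the computational intractability of this proximity measure.
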
Comments
20 pages, 5 figures
Abstract

Phylogenetic networks generalize phylogenetic trees by allowing reticulate evolutionary events such as horizontal gene transfer and hybridization. Among the many subclasses of phylogenetic networks, orchard networks have attracted increasing attention due to their structural and algorithmic properties. In this paper, we study the arc-deletion distance to orchard networks, defined as the minimum number of reticulate arcs whose deletion transforms a phylogenetic network into an orchard network. We prove that computing this distance is NP-hard via a polynomial-time reduction from the Degree-3 Vertex Cover problem. Our result establishes the computational intractability of this proximity measure and contributes to the complexity theory of phylogenetic network transformations.

2605.19902 2026-05-20 cs.LG q-bio.QM

Hierarchical Contrastive Learning for Multi-Domain Protein-Ligand Binding

Shuo Zhang, Rongqi Hong, Huifeng Zhang, Jian K. Liu

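AI summary: This study proposes HCLBind, a hierarchical contrastive learning framework for predicting binding affinity between ligands and multi-domain proteins. It decouples geometric representation learning from affinity regression, uses a novel hierarchical decoy strategy, and combines a domain-gated graph attention network with cross-modal attention to prioritize domain interfaces. Experiments indicate that HCLBind learns discriminative interface features and provides robust uncertainty estimates.
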
Comments
Accepted by ISBRA2026
Abstract

Predicting protein-ligand binding affinity remains intractable for multi-domain proteins, where inter-domain dynamics govern molecular recognition. Existing geometric deep learning methods typically treat proteins as monolithic static graphs, suffering from rigid-body assumptions and aleatoric noise in flexible regions. To address this, we introduce HCLBind, a self-supervised framework that decouples geometric representation learning from affinity regression. HCLBind leverages a general-to-specific pre-training paradigm on the Q-BioLiP database to learn a robust physical grammar of binding. We propose a novel hierarchical decoy strategy: the model learns local physicochemical constraints through protein coordinate perturbation in single-domain proteins and global conformational geometry through inter-domain rotation in multi-domain complexes. Our hybrid architecture integrates a domain-gated graph attention network and cross-modal attention to explicitly prioritize domain interfaces. Furthermore, we employ LoRA on protein and ligand foundation models, ensuring efficient optimization while preserving evolutionary knowledge. Experiments on PDBBind demonstrate that HCLBind effectively learns discriminative interface features and provides robust uncertainty estimation, overcoming the limitations of standard supervised learning. The code is available at https://github.com/jiankliu/HCLBind.
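
The two decoy types named in the abstract (coordinate perturbation for single-domain proteins, inter-domain rotation for multi-domain complexes) can be illustrated with a minimal geometric sketch. Everything below is an assumption made for illustration: the function names, the Gaussian noise model, and the choice of a z-parallel rotation axis through a hinge point are hypothetical and are not taken from the HCLBind code.

```python
import math
import random


def perturb(coords, sigma, rng):
    """Local decoy: add independent Gaussian noise to every coordinate.

    Assumption: isotropic noise with standard deviation `sigma`.
    """
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in coords]


def rotate_domain(coords, domain, angle_rad, hinge):
    """Global decoy: rigidly rotate one domain, leaving the rest in place.

    `domain` is a set of atom indices. The rotation axis is assumed to be
    parallel to z and to pass through `hinge`, a hypothetical hinge point.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    hx, hy, _ = hinge
    rotated = []
    for idx, (x, y, z) in enumerate(coords):
        if idx in domain:
            dx, dy = x - hx, y - hy
            x = hx + cos_a * dx - sin_a * dy
            y = hy + sin_a * dx + cos_a * dy
        rotated.append((x, y, z))
    return rotated
```

A rigid rotation preserves all distances inside the rotated domain and changes only inter-domain geometry, which is why such a decoy could probe global conformation rather than local chemistry.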

2605.19816 2026-05-20 q-bio.NC

Performance of low vision individuals when selecting a target with head-pointing in virtual reality

Camille Bordeau, Célia Passerel, Ambre Denis-Noël, Jean-Baptiste Melmi, Marianne Vaugoyeau, Carlos Aguilar, Iliana Huyet, Caroline Topart, François Devin, Frédéric Matonti, Pierre Kornprobst, Eric Castet

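AI summary: This study examines how well low vision individuals perform a visually guided pointing task in virtual reality and finds that enlarging the pointer activation zone brings their performance close to that of normally sighted controls.
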
Abstract

Purpose: To investigate psychophysically the ability of low vision individuals with central visual field loss (CFL) to perform a visually-guided pointing task in a virtual reality environment. Methods: Patients with CFL (n=25, ages = 67-90 years) and normally-sighted controls (n=26, ages = 67-85 years) had to select a target (2° diameter dot) with a head-contingent cursor (6° diameter reticle). Target selection occurred when the target was validly pointed at for 1.5 seconds. Pointing was valid when the target was inside an invisible pointer activation zone (PAZ) centered on the reticle. Task difficulty was decreased by increasing PAZ diameter from 0.5° to 8°. Performance was assessed by measuring the time needed to select the target. The task was also performed with an array of three simultaneously-displayed cursors. Results: Selection times decreased (from 14.1 and 8.4 seconds for patients and controls respectively) with increasing PAZ diameter and reached a similar asymptote for both groups (1.4 seconds). The rate of this decrease was smaller for patients so that PAZ diameter needed for their best performance was much larger than PAZ diameter needed for controls' best performance (average: 3.48° vs 1.32°). In the three-reticle condition, both groups tended to use the cursor closer to the target. Conclusions: Patients with CFL are able to point at a 2° target thanks to head-pointing. Their performance can get close to controls' best performance by increasing PAZ size. Translational relevance: This research suggests guidelines to improve the accessibility of visually-guided pointing tools for human-machine interfaces designed for low vision individuals.
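
The selection rule in the Methods (pointing is valid while the target lies inside the PAZ, and selection occurs after 1.5 seconds of valid pointing) can be sketched as follows. Two points are assumptions, since the abstract does not specify them: "inside" is taken to mean that the target centre lies within the PAZ radius, and the dwell is taken to be continuous, so the timer resets when pointing becomes invalid. The function names and the sample format are hypothetical.

```python
import math

DWELL_S = 1.5  # dwell time required for selection, from the abstract


def is_valid(reticle, target, paz_diameter_deg):
    """True when the target centre lies inside the PAZ centred on the reticle.

    Assumption: positions are (x, y) pairs in degrees of visual angle.
    """
    dx = target[0] - reticle[0]
    dy = target[1] - reticle[1]
    return math.hypot(dx, dy) <= paz_diameter_deg / 2.0


def selection_time(samples, target, paz_diameter_deg, dwell_s=DWELL_S):
    """Return the time at which the target is selected, or None.

    `samples` is a time-ordered iterable of (t_seconds, (x, y)) reticle
    positions. Assumption: the dwell timer resets on invalid pointing.
    """
    dwell_start = None
    for t, reticle in samples:
        if is_valid(reticle, target, paz_diameter_deg):
            if dwell_start is None:
                dwell_start = t
            if t - dwell_start >= dwell_s:
                return t
        else:
            dwell_start = None
    return None
```

Under these assumptions a larger PAZ tolerates more head jitter before the timer resets, which is consistent with the reported drop in selection time as PAZ diameter grows.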

2605.19789 2026-05-20 q-bio.TO

Charting an embryological path to cancer cure: A discussion of disease hallmarks

Jaime Cofre

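AI summary: This article explores a view of cancer as an embryological and evolutionary phenomenon, highlights similarities between cancer and embryonic development, and discusses the implications for therapeutic strategies.
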
Comments
12 pages, 1 figure, Keywords: Cancer; Embryology; Evolution; Metazoa; Neoplasia; Oncology
Abstract

Embryology has long played a foundational role in shaping our scientific understanding of animal evolution. In recent decades, growing evidence has also highlighted its role in cancer. Despite the indisputable similarities between embryonic development and cancer, there has been limited discussion on the profound embryological implications for the disease. This article explores the understanding of cancer as an embryological and evolutionary phenomenon, offering a fresh perspective on the disease and discussing immediate consequences in the search for therapeutic approaches.

2601.10397 2026-05-20 q-bio.NC

Reshaping Neural Representation via Associative, Presynaptic Short-Term Plasticity

Genki Shimizu, Taro Toyoizumi

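AI summary: This study proposes an information-theoretic theory of associative short-term synaptic plasticity. Extending Fisher-information-based learning to Tsodyks-Markram synapses, it derives learning rules for baseline weight and release probability that maximize stimulus information under resource constraints, and identifies release-probability plasticity as a principled substrate for rapidly reconfigurable temporal coding.
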
Abstract

Short-term synaptic plasticity (STP) is often regarded as a presynaptic filter of spikes, independent of postsynaptic activity. Recent experiments, however, indicate an associative STP that depends on pre- and postsynaptic coactivation. We develop a normative, information-theoretic theory of associative STP. Extending Fisher-information-based learning to Tsodyks-Markram synapses, we derive learning rules for baseline weight and release probability that maximize stimulus information under resource constraints. The rules split into a postsynaptic term tracking local firing and a presynaptic, phase-advanced term that selectively detects stimulus onset. For slowly varying inputs, this onset sensitivity favors anti-causal connectivity and enhances response offset during drive and reverse replay after drive removal in recurrent circuits. Linear-response analysis shows that STP yields frequency-dependent phase selectivity and that release-probability constraints tune temporal asymmetry. These results identify release-probability plasticity as a principled substrate for rapidly reconfigurable temporal coding.
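
To make concrete why release probability matters for temporal coding, here is a minimal event-based simulation of a Tsodyks-Markram synapse. It does not implement the learning rules derived in the paper. It assumes one common convention of the model (the facilitation variable decays to zero between spikes and jumps by U(1 - u) at each spike), and the parameter values are illustrative only.

```python
import math


def tm_release(spike_times, U=0.2, tau_f=0.5, tau_d=0.2):
    """Released resource fraction at each spike of a Tsodyks-Markram synapse.

    U is the baseline release probability; tau_f and tau_d are the
    facilitation and depression time constants in seconds. The resource
    variable x recovers to 1 and is depleted by the released amount u * x.
    """
    u, x = 0.0, 1.0
    last_t = None
    released = []
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            u *= math.exp(-dt / tau_f)                   # facilitation decays
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_d)  # resources recover
        u += U * (1.0 - u)   # spike-triggered facilitation
        r = u * x            # fraction of resources released by this spike
        x -= r               # depletion
        released.append(r)
        last_t = t
    return released
```

With a high U and slow recovery the second spike of a pair releases less than the first (depression), whereas a low U with slow facilitation decay gives the opposite (facilitation), so changing U alone shifts which part of a spike train is transmitted most strongly.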

2508.02928 2026-05-20 q-bio.QM cs.NA math.DS math.NA

A Nonstandard Finite Difference Scheme for an SEIQR Epidemiological PDE Model

Achraf Zinihi, Matthias Ehrhardt, Moulay Rchid Sidi Ammi

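AI summary: This paper presents a nonstandard finite difference (NSFD) method for a reaction-diffusion SEIQR epidemiological model that captures the spatiotemporal dynamics of infectious disease transmission. The model, a system of semilinear parabolic PDEs, extends classical compartmental models with spatial diffusion to account for population movement and spatial heterogeneity. The NSFD discretization is designed to preserve essential qualitative features of the continuous model, such as positivity, boundedness, and stability, which standard finite difference methods often compromise. The paper analyzes well-posedness, constructs a structure-preserving scheme, and studies its convergence and local truncation error; numerical simulations support the theoretical findings.
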
Abstract

This paper introduces a nonstandard finite difference (NSFD) approach to a reaction-diffusion SEIQR epidemiological model, which captures the spatiotemporal dynamics of infectious disease transmission. Formulated as a system of semilinear parabolic partial differential equations (PDEs), the model extends classical compartmental models by incorporating spatial diffusion to account for population movement and spatial heterogeneity. The proposed NSFD discretization is designed to preserve the continuous model's essential qualitative features, such as positivity, boundedness, and stability, which are often compromised by standard finite difference methods. We rigorously analyze the model's well-posedness, construct a structure-preserving NSFD scheme for the PDE system, and study its convergence and local truncation error. Numerical simulations validate the theoretical findings and demonstrate the scheme's effectiveness in preserving biologically consistent dynamics.
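
The positivity claim can be illustrated on a much simpler system than the paper's. The sketch below is not the authors' SEIQR PDE scheme: it applies one standard NSFD idea (a nonlocal, semi-implicit treatment of loss terms) to a plain SIR ODE system without diffusion, and compares it with explicit Euler. The model, parameter values, and function names are assumptions chosen for illustration.

```python
def nsfd_step(s, i, r, beta, gamma, h):
    """One NSFD step for S' = -beta*S*I, I' = beta*S*I - gamma*I, R' = gamma*I.

    Loss terms are evaluated at the new time level, so every update is a
    ratio or sum of non-negative quantities for any step size h > 0.
    """
    s_new = s / (1.0 + h * beta * i)
    i_new = (i + h * beta * s_new * i) / (1.0 + h * gamma)
    r_new = r + h * gamma * i_new
    return s_new, i_new, r_new


def euler_step(s, i, r, beta, gamma, h):
    """One explicit Euler step for the same system, for comparison."""
    new_infections = beta * s * i
    return (s - h * new_infections,
            i + h * (new_infections - gamma * i),
            r + h * gamma * i)


def simulate(step, state, beta, gamma, h, n_steps):
    """Iterate `step` from `state` = (S, I, R) and return the trajectory."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(*state, beta, gamma, h)
        trajectory.append(state)
    return trajectory
```

With a deliberately large step (for example h = 5, beta = 2, gamma = 0.5, starting from S = 0.8, I = 0.2), explicit Euler drives S negative after one step, while the NSFD update stays non-negative and conserves S + I + R.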

2504.05454 2026-05-20 cs.LG cs.AI cs.CE q-bio.GN q-bio.QM

GraphPINE: Graph Importance Propagation for Interpretable Drug Response Prediction

Yoshitaka Inoue, Tianfan Fu, Augustin Luna

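AI summary: This paper proposes GraphPINE, a graph neural network architecture that uses domain-specific prior knowledge to initialize node importance, improving the interpretability of drug response prediction. An importance propagation layer jointly updates the feature matrix and node importance and propagates feature values with GNN-based graph propagation, supporting informed feature learning and better graph representations.
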
Abstract

Explainability is necessary for many tasks in biomedical research. Recent explainability methods have focused on attention, gradients, and Shapley values. These do not handle data with strong associated prior knowledge and fail to constrain explainability results based on known relationships between predictive features. We propose GraphPINE, a graph neural network (GNN) architecture leveraging domain-specific prior knowledge to initialize node importance optimized during training for drug response prediction. Typically, a manual post-prediction step examines literature (i.e., prior knowledge) to understand returned predictive features. While node importance can be obtained for gradient and attention after prediction, node importance from these methods lacks complementary prior knowledge; GraphPINE seeks to overcome this limitation. GraphPINE differs from other GNN gating methods by utilizing an LSTM-like sequential format. We introduce an importance propagation layer that unifies 1) updates of the feature matrix and node importance and 2) GNN-based graph propagation of feature values. This initialization and updating mechanism allows for informed feature learning and improved graph representation. We apply GraphPINE to cancer drug response prediction using drug screening and gene data collected for over 5,000 gene nodes included in a gene-gene graph with a drug-target interaction (DTI) graph for initial importance. The gene-gene graph and DTIs were obtained from curated sources and weighted by article count discussing relationships between drugs and genes. GraphPINE achieves a PR-AUC of 0.894 and ROC-AUC of 0.796 across 952 drugs. Code is available at https://anonymous.4open.science/r/GraphPINE-40DE.

2605.19677 2026-05-20 cs.LG q-bio.QM

Agentic Discovery of Cryomicroneedle Formulations

Hao Li, Lifu Du, Nurul Hameed, Shemonti Saha Authai, Zlata Stefanovic, Chenjie Xu

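AI summary: This study presents a closed-loop workflow for discovering cryoprotectant formulations for cryomicroneedles that combines literature curation, Gaussian-process surrogate modelling, Bayesian optimization, and sequential wet-lab validation; iterative wet-lab validation improved the model's predictive accuracy.
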
Abstract

Cryomicroneedles offer a route to minimally invasive intradermal delivery of living cells, but their cryogenic formulations must reconcile cell protection with constraints on toxicity and device fabrication. Here we report an AI-assisted, closed-loop workflow for cryomicroneedle cryoprotectant discovery that combines literature curation, Gaussian-process surrogate modelling, Bayesian optimization, and sequential wet-lab validation. A curated dataset of 198 mesenchymal stem-cell cryopreservation formulations from 42 studies was converted into 21 ingredient features and used to train an uncertainty-aware literature prior. This model captured moderate structure in the literature data but failed prospectively, motivating iterative wet-lab correction. Across ten validation iterations and 106 wet-lab observations, the model progressively adapted to cryomicroneedle-specific outcomes: batch RMSE decreased from 41.21 to 6.86 percentage points, later-stage rank correlations became consistently positive, and the cumulative wet-lab predicted-versus-measured summary reached R² = 0.942. The best validated formulation achieved 95.15% post-thaw viability with low DMSO, ectoin, ethylene glycol, and fetal bovine serum. However, high viability alone did not ensure intact cryomicroneedle formation, highlighting the need for future multi-objective optimization. These results demonstrate that agent-assisted computational infrastructure can make data-efficient formulation discovery more accessible to labs with minimal data expertise in-house. Project code is available at https://github.com/baitmeister/ML-for-CryoMN.
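
The abstract does not say which acquisition function drives the Bayesian optimization step. As one common choice, the sketch below computes expected improvement (EI) from a surrogate's posterior mean and standard deviation for each candidate formulation. The use of EI, the function names, and the candidate values are all assumptions for illustration and are not taken from the project code.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for maximisation, given posterior mean `mu` and std `sigma`.

    `best` is the best outcome measured so far (e.g. viability in percent).
    """
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def pick_next(candidates, best):
    """Return the name of the candidate with the highest EI.

    `candidates` maps a hypothetical formulation name to (mu, sigma).
    """
    return max(candidates,
               key=lambda name: expected_improvement(*candidates[name], best))
```

For example, with a best measured viability of 85, a candidate predicted at 80 ± 10 has a higher EI than one predicted at 84 ± 0.5, so the more uncertain formulation would be tested next. This exploration behaviour is what lets a closed loop correct a surrogate that, as reported, initially failed prospectively.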

2605.19646 2026-05-20 q-bio.NC cs.LG

BCI-sift: An automated feature selection toolbox for Brain Computer Interface applications

Elena C Offenberg, Dirk Keller, Mariska J Vansteensel, Zachary V Freudenburg, Nick F Ramsey, Julia Berezutskaya

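AI summary: This paper presents BCI-sift, a toolbox that integrates advanced optimization methods to automate feature selection for brain-computer interface tasks, improving classification accuracy and interpretability.
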
Comments
19 pages, 12 figures
Abstract

Advancements in clinical Brain-Computer Interfaces (BCIs) depend on precise and reliable signal interpretation. However, the high-dimensional and noisy nature of data captured from both implanted and non-implanted BCIs poses significant challenges, motivating the use of feature selection algorithms. We introduce BCI-sift (BCI Systematic and Interpretable Feature Tuning), a Python-based toolbox designed to streamline the application of diverse optimization algorithms to BCI datasets for identifying the most relevant features in machine learning tasks. Our scikit-learn-compatible toolbox (github.com/UMCU-RIBS/BCI-sift) simplifies feature selection in BCI tasks by integrating advanced optimization methods. We validated the toolbox on high-density electrocorticography (HD ECoG) data from eight able-bodied participants with 64-128 electrodes implanted over the sensorimotor cortex, who repeatedly spoke 12 words. BCI-sift identified informative neural features across electrode, temporal, and frequency dimensions. The anatomical locations of electrode selections were consistent across participants and aligned with known functional organization of the sensorimotor cortex. Relevant time points clustered around speech production, and the high-frequency band was identified as most informative, in line with prior work. Feature selection improved classification accuracy compared to using all features. BCI-sift provides an accessible and versatile platform for feature selection in BCI research, enabling improved decoding performance, automated feature analysis, and enhanced interpretability. While validated on HD ECoG data, the approach is broadly applicable to other BCI modalities. By enhancing classification accuracy and interpretability, BCI-sift addresses key challenges in developing efficient and transparent BCI systems.

2605.19352 2026-05-20 q-bio.NC cs.AI cs.LG

Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay

Subba Reddy Oota, Anant Khandelwal, Khushbu Pahwa, Satya Sai Srinath Namburi, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

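AI summary: This paper studies how reasoning and action representations from vision-language models and large-action models align with brain activity during naturalistic gameplay, and finds that action-focused and reasoning-focused prompts affect how well the models' internal representations align with fMRI activity.
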
Comments
21 pages, 11 figures
Abstract

Understanding how humans and artificial intelligence systems predict and plan by interacting with their environment is a fundamental challenge at the intersection of neuroscience and machine learning. Most brain-encoding studies focus on aligning artificial models with brain activity during language comprehension or passive visual processing, while interactive brain-alignment studies have to date been largely limited to reinforcement-learning (RL) agents and theory-based models. To address this gap, we study brain alignment of representative models from two foundation-model families, namely vision-language models (VLMs) and large-action models (LAMs), using fMRI recordings from participants playing naturalistic Atari-style video games. Specifically, we examine how action-focused and reasoning-focused prompts shape models' internal representations and their alignment with fMRI brain activity. First, we find that both VLMs and LAMs exhibit significantly higher voxel-wise encoding performance than RL baselines, with the advantage holding even under matched feature dimensionality. Second, prompt-driven gains scale with the cortical processing hierarchy: the largest improvements appear in frontal-parietal and motor-planning regions, while early visual cortex gains roughly half as much. Third, variance partitioning reveals a qualitatively different representational organization: VLM is prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning), whereas LAM is prompt-asymmetric (27% unique action vs. -5% unique reasoning), with the asymmetry strongest in frontal-motor cortex. Together, these results demonstrate that action-specialized fine-tuning reorganizes multimodal representations toward action-relevant neural computations even when whole-brain prediction accuracy is statistically equivalent between VLM and LAM.
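
The unique-variance figures quoted above come from variance partitioning. A minimal sketch of the usual two-feature-space arithmetic is given below, assuming the standard definition in which unique variance is the joint model's R² minus the R² of the other feature space alone. Whether the paper normalizes these quantities in the same way is not stated in the abstract, and the function name is hypothetical.

```python
def partition_variance(r2_a, r2_b, r2_joint):
    """Split explained variance between two feature spaces A and B.

    r2_a, r2_b: R^2 of encoding models fit on A alone and on B alone.
    r2_joint:   R^2 of a model fit on the concatenation of A and B.
    The three parts always sum to r2_joint.
    """
    return {
        "unique_a": r2_joint - r2_b,
        "unique_b": r2_joint - r2_a,
        "shared": r2_a + r2_b - r2_joint,
    }
```

Because R² is typically estimated on held-out data, the joint model can score below a single-feature model, in which case a unique component comes out negative. That is one plausible reading of a value such as -5% unique reasoning, though the abstract itself does not explain the sign.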

2605.19333 2026-05-20 q-bio.BM q-bio.PE

Deep-time consistency in proteome elemental composition across cellular and viral life

L. Felipe Benites, Louie Slocombe, Sara I. Walker

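AI summary: This study examines deep-time consistency in proteome elemental composition across cellular and viral life. Despite evolutionary divergence and large differences in proteome size and gene content, proteomes show strikingly consistent elemental composition, suggesting that common biochemical constraints shape proteome organization across life.
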
Abstract

Proteins are constructed from a limited alphabet of ~20 amino acids, yet the origins and selection of this specific alphabet are unresolved. One largely overlooked aspect is whether elemental composition constrains the range of viable proteomes. Here, we analyze the elemental composition of thousands of proteomes spanning cellular domains and viral realms. Despite evolutionary divergence and orders-of-magnitude variation in proteome size and gene content, proteomes exhibit strikingly consistent elemental composition. This consistency is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone. Viral proteomes occupy the same elemental composition space observed in cellular organisms despite the absence of a single viral common ancestor, suggesting common biochemical constraints shape proteome organization across life. To investigate the evolutionary origins of this pattern, we compare modern proteomes with multiple independent reconstructions of the Last Universal Common Ancestor (LUCA) and with synthetic reduced-alphabet proteomes generated from primordial amino acid alphabets. LUCA proteomes occupy the same constrained elemental composition space observed in modern Bacteria and Archaea, whereas reduced primordial-like alphabets systematically generated alternative elemental regimes outside the modern range despite retaining high sequence similarity to extant proteins. Reduced alphabets disrupt fold space and reorganize relationships between elemental composition and predicted protein structural organization. Our results suggest that constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution and may have contributed to the selection and stabilization of the modern amino acid alphabet.
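
As a rough illustration of what the elemental composition of a proteome means computationally, the sketch below sums the C, H, N, O, and S atoms of each protein from its sequence, removing one water per peptide bond. It is a simplified assumption about the calculation: post-translational modifications, protonation state, and non-standard residues are ignored, and the abstract does not state how the authors computed their values. The function names are hypothetical.

```python
from collections import Counter

ELEMENTS = ("C", "H", "N", "O", "S")

# Formulas of the 20 standard free amino acids as (C, H, N, O, S) counts.
AMINO_ACIDS = {
    "A": (3, 7, 1, 2, 0), "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0), "C": (3, 7, 1, 2, 1), "E": (5, 9, 1, 4, 0),
    "Q": (5, 10, 2, 3, 0), "G": (2, 5, 1, 2, 0), "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0), "T": (4, 9, 1, 3, 0), "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}


def protein_composition(sequence):
    """Atom counts of a linear peptide given as one-letter codes."""
    counts = Counter({element: 0 for element in ELEMENTS})
    for residue in sequence:
        for element, n in zip(ELEMENTS, AMINO_ACIDS[residue]):
            counts[element] += n
    bonds = max(len(sequence) - 1, 0)
    counts["H"] -= 2 * bonds   # each peptide bond releases one H2O
    counts["O"] -= bonds
    return dict(counts)


def proteome_fractions(sequences):
    """Atomic fraction of each element over a non-empty set of proteins."""
    total = Counter()
    for sequence in sequences:
        total.update(protein_composition(sequence))
    n_atoms = sum(total.values())
    return {element: total[element] / n_atoms for element in ELEMENTS}
```

For instance, the dipeptide "GA" (glycylalanine) gives C5H10N2O3. Comparing the fraction vectors of many proteomes is then a matter of measuring how tightly they cluster.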

2605.19252 2026-05-20 q-bio.BM astro-ph.EP q-bio.MN

Elemental Stoichiometry as an Ecological Biosignature with Applications to Life Detection

Pilar C. Vergeli, Cole Mathis, John F. Malloy, L. Felipe Benites, Christopher P. Kempes, Elizabeth Trembath-Reichert, Hilairy E. Hartnett, Sara I. Walker

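AI summary: This paper proposes analyzing the statistical structure of elemental composition in chemical space through elemental stoichiometry as an ecological biosignature of life, and shows its potential for discriminating biotic from abiotic chemical signatures in planetary science mission data.
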
Abstract

The vast chemical space of possible small molecules, estimated at 10^60 compounds for molecules composed of just C, N, O, and S, is only sparsely occupied by biology. We propose that where life selects molecules within this space constitutes a detectable ecological signature: a fingerprint not of specific compounds, but of the statistical structure of elemental composition across molecules sampled from ecological systems. Here we introduce a framework combining Van Krevelen diagrams and element scaling laws to characterize the elemental composition of regions of chemical space occupied by biological systems and contrast them with other chemical systems. Applying this framework to 11,834 microbial metagenomic samples, we show that microbial metabolisms occupy a region of chemical space that is enriched in heteroatoms such as P, S, N, and O relative to C and shifted toward higher O:C and H:C ratios. We observe sublinear element scaling with system size, yielding insights into how elemental constraints dictate how biological systems occupy chemical space. These patterns are distinct from a sample of 18,000 compounds from the comprehensive Reaxys synthetic chemical database. Critically, datasets from molecules detected in planetary science mission data occupy statistically distinct regions from both terrestrial biological and Reaxys distributions, demonstrating that with standardized methods for data collection, the approach could be developed to discriminate biotic from abiotic chemical signatures in small molecule data from planetary science missions. Our work shows how a combination of Van Krevelen fingerprinting and elemental scaling laws can provide a new class of ecological biosignatures for life detection leveraging mass spectrometric data from planetary missions, which could generalize beyond Earth's specific biochemistry.
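
A Van Krevelen diagram places each molecule at its (O:C, H:C) atomic ratios. The sketch below computes those two ratios from a molecular formula string. The parser is a deliberately simple assumption that handles flat formulas such as C6H12O6 but not parentheses, charges, or isotopes, and the function names are hypothetical.

```python
import re


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C6H12O6'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen(formula):
    """Return the (H:C, O:C) atomic ratios of a carbon-containing molecule."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("Van Krevelen ratios are undefined without carbon")
    return counts.get("H", 0) / carbon, counts.get("O", 0) / carbon
```

Glucose (C6H12O6) maps to H:C = 2.0 and O:C = 1.0, while palmitic acid (C16H32O2) maps to H:C = 2.0 and O:C = 0.125, so carbohydrates and lipids separate along the O:C axis. A collection of molecules then becomes a point cloud whose distribution can be compared across sample types.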

2605.19071 2026-05-20 q-bio.GN cond-mat.stat-mech q-bio.MN q-bio.QM

Informational blueprints reveal condition-dependent gene regulatory architectures

Doruk Efe Gökmen, Rosalind Wenshan Pan, Tom Röschinger, Stephen Quake, Hernan Garcia, Rob Phillips, Vincenzo Vitelli

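AI summary: This study proposes an information blueprint algorithm that compresses global sequence information to identify transcription factor binding sites that are active under specific environmental conditions, and illustrates its deployment at scale across growth conditions.
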
English abstract

While coding regions in the genome have a direct interpretation in terms of protein products, significant fractions are non-coding and yet control essential biological functions. Unlike the genetic code, there is no "lookup table" that identifies where regulatory proteins, known as transcription factors (TFs), bind. Here, we extract these binding sites by distilling sequences of nucleotide letters into collective coordinates (hyperletters) representing the binding sites that are active under specific environmental conditions. Going beyond local information footprints between individual bases and expression levels, our information blueprint algorithm compresses the global information by optimising filters that simultaneously scan an entire promoter sequence. Inspired by renormalisation-group techniques, we identify TF binding sites as coarse-grained variables combining groups of correlated mutations with the highest collective impact on gene expression. We validate our approach on experimental data for E. coli and discover novel regulatory elements, illustrating its deployment at scale across growth conditions.

2605.19050 2026-05-20 cs.LG physics.chem-ph q-bio.QM

Generative Pseudo-Force Fields for Molecular Generation

生成伪力场用于分子生成

Stefaan Simon Pierre Hessmann, Khaled Kahouli, Stefan Gugler, Michael Plainer, Frank Noé, Klaus-Robert Müller, Niklas Wolf Andreas Gebauer

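AI summary: This paper proposes generative pseudo-force fields (GPFFs) to address the tradeoff in molecular generation between the physical realism of energy-based relaxation and the sampling efficiency of data-driven generative models. An MLFF is trained on a quadratic pseudo-potential energy surface defined relative to reference equilibrium structures, enabling efficient generation of stable molecular conformations.

Details
AI Chinese abstract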

生成稳定的分子构象通常需要在基于物理的能量放松的物理真实性和数据驱动生成模型的采样效率之间做出权衡。虽然机器学习力场(MLFFs)可以通过根据物理力放松分子几何结构来采样稳定的构象,但它们需要昂贵的从头计算训练数据。相反,扩散模型(DMs)仅从平衡数据学习,但依赖于噪声调度和时间步长条件。在本文中,我们提出生成伪力场(GPFFs)以弥合这些范式,通过在参考平衡结构上的二次伪势能面上训练MLFF。由于不需要对扰动几何进行从头计算,非平衡训练数据可以通过对平衡结构添加高斯噪声实时生成。我们证明GPFFs是方差爆炸扩散模型的时间步长无关变种:分数来自预测的伪力,但力的大小隐含地编码了噪声水平,因此不需要时间步长条件。我们的GPFF因此可以作为标准扩散采样(祖先、Heun)中的直接替换,也可以促进更高效、自适应的变种和一个受MLFF启发的直接去噪方案。我们提出的采样算法支持任意的结构先验和几何约束。在QM9数据集上,GPFF在256个神经函数评估(NFE)时有100%的有效性,在仅6个NFE时超过50%,优于所有扩散基线。结合自定义先验,我们在分子编辑器中展示了我们的方法在药物设计设置中的快速和准确的生成过程,其中分子在实时中生成。

English abstract

Generating stable molecular conformations typically forces a tradeoff between the physical realism of energy-based relaxation and the sampling efficiency of data-driven generative models. While machine learning force fields (MLFFs) can sample stable conformations by relaxing molecular geometries according to physical forces, they require costly ab-initio training data. Conversely, diffusion models (DMs) learn from equilibrium data alone but are dependent on noise schedules and time-step conditioning. In this work, we propose generative pseudo-force fields (GPFFs) to bridge these paradigms by training an MLFF on a quadratic pseudo-potential energy surface relative to reference equilibrium structures. Because no ab-initio calculations are required for the perturbed geometries, non-equilibrium training data can be generated on the fly by perturbing the equilibria with Gaussian noise. We show that GPFFs constitute a time-step-agnostic variant of variance-exploding DMs: the score comes from the predicted pseudo-forces, but because force magnitudes implicitly encode the noise level, no time-step conditioning is needed. Our GPFF can hence be used as a drop-in replacement in standard diffusion sampling (ancestral, Heun) but also facilitates more efficient, adaptive variants and an MLFF-inspired direct denoising scheme. Our proposed sampling algorithms support arbitrary structural priors and geometric constraints. On QM9, GPFF has 100% validity at 256 neural function evaluations (NFE) and over 50% at just 6 NFE, outperforming diffusion baselines across all samplers. Combined with custom priors, we showcase the fast and accurate generation process of our method in a molecular editor for a drug design setting, where a molecule is generated in real time.
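
The training-data idea in this abstract can be illustrated with a small sketch. It assumes the pseudo-potential is E(x) = 0.5 * ||x - x0||^2 (the prefactor is an assumption; the abstract only says "quadratic"), so the pseudo-force is x0 - x. For a Gaussian perturbation of scale sigma, the conditional score is then the pseudo-force divided by sigma^2, and the root-mean-square force component estimates sigma, which is one way to read the statement that force magnitudes encode the noise level. All function names and the toy coordinates are hypothetical.

```python
import math
import random

def perturb(x0, sigma, rng):
    """Add isotropic Gaussian noise of scale sigma to every coordinate."""
    return [[c + sigma * rng.gauss(0.0, 1.0) for c in atom] for atom in x0]

def pseudo_energy(x, x0):
    """Assumed quadratic pseudo-potential E = 0.5 * ||x - x0||^2."""
    return 0.5 * sum((c - c0) ** 2 for a, a0 in zip(x, x0) for c, c0 in zip(a, a0))

def pseudo_force(x, x0):
    """F = -dE/dx = x0 - x: points from the noisy geometry back to the reference."""
    return [[c0 - c for c, c0 in zip(a, a0)] for a, a0 in zip(x, x0)]

def estimate_sigma(force):
    """RMS force component, which estimates sigma for Gaussian perturbations."""
    comps = [c for atom in force for c in atom]
    return math.sqrt(sum(c * c for c in comps) / len(comps))

rng = random.Random(0)
# Toy "equilibrium" coordinates (2000 atoms x 3), not a real molecule.
x0 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2000)]
sigma = 0.3
x = perturb(x0, sigma, rng)          # non-equilibrium geometry generated on the fly
force = pseudo_force(x, x0)          # regression target for the force model
score = [[f / sigma**2 for f in atom] for atom in force]  # Gaussian-kernel score
print(f"true sigma = {sigma}, estimated from forces = {estimate_sigma(force):.3f}")
```

No quantum-chemistry call appears anywhere in the loop, which is the point the abstract makes about avoiding ab-initio labels for perturbed geometries.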

2605.19048 2026-05-20 q-bio.NC

Conserved Kinematic Representations enable Zero-Shot Decoding in Handwriting BCIs

保守的运动表示使手写BCI实现零样本解码

Srinivas Ravishankar, Virginia de Sa

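AI summary: This study proposes a framework based on conserved kinematic representations for zero-shot decoding in handwriting BCIs, showing that neural representations of kinematic strokes are robust across character contexts and offering a new perspective on complex motor control.

Details
AI Chinese abstract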

尽管侵入性脑机接口(iBCIs)在解码想象的手写文本方面已实现了高通信速率,但它们依赖于在训练过程中观察每个字母,这在需要数千个字符类别的表意文字(如中文、日文)中提出了挑战。这一限制突显了运动神经科学中的一个基本问题:运动皮层是否通过共享的运动学基本单元的组合来表示手写?我们提出了一种计算框架,用于将神经活动对齐到大数据集中的想象运动学,从而训练出能够解码未见过的字符的零样本机器学习算法。我们的模型在未见过的字母上实现了64%的hits@3检索率,表明神经运动轨迹的表示在不同字符上下文中具有鲁棒性。本研究为在大规模侵入性数据集中解析保守的神经动态提供了框架,并提供了复杂运动控制的组合基础的有力证据。此外,它还建立了一种新的开放词汇iBCI通信范式,对用户来说校准负担很小,这对在表意文字中增加神经假体的采用至关重要。

English abstract

While intracortical Brain-Computer Interfaces (iBCIs) that decode imagined handwriting have achieved high communication rates for Latin scripts, they rely on observing every character in the alphabet during training. This poses a challenge in scaling to logographic languages (e.g., Chinese, Japanese), where the character set exceeds thousands of classes. The limitation highlights a fundamental question in motor neuroscience: does the motor cortex represent handwriting through the composition of shared kinematic primitives that can be exploited by decoders? We introduce a computational framework for aligning neural activity to imagined kinematics in large datasets, enabling the training of a zero-shot capable machine learning algorithm for decoding unseen characters. Our model achieves 64% hits@3 retrieval on unseen letters, suggesting that neural representations of kinematic strokes are robustly conserved across different character contexts. This study provides a framework for dissecting conserved neural dynamics in large-scale intracortical datasets and offers strong evidence for a compositional basis of complex motor control. It also establishes a new paradigm for open-vocabulary iBCI communication with minimal recalibration burden on the user, crucial to increasing adoption of neuroprosthetics in logographic languages.
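
For readers unfamiliar with the hits@3 metric quoted in this abstract, here is a minimal sketch of how a hits@k retrieval score is usually computed: a trial counts as a hit when the true class is among the k highest-scoring candidates. The score dictionaries and labels below are made up for illustration and are unrelated to the paper's data.

```python
def hits_at_k(score_rows, true_labels, k=3):
    """Fraction of queries whose true label is among the k highest-scoring candidates."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        # Sort candidate labels by descending score and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in top_k
    return hits / len(true_labels)

# Hypothetical decoder similarity scores for three trials over four candidate letters.
scores = [
    {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.1},
    {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1},
    {"a": 0.1, "b": 0.2, "c": 0.6, "d": 0.7},
]
truth = ["a", "d", "c"]
print(hits_at_k(scores, truth, k=3))  # two of three trials are hits
```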

2605.18616 2026-05-20 physics.soc-ph cs.GT q-bio.NC

Toward an Origin of Human Randomness: Interaction-Driven Enhancement in the Rock-Paper-Scissors Game

向人类随机性的起源:在石头-剪刀-布游戏中由交互驱动的增强

Song-Ju Kim, Shoma Ohara, Hiroaki Kurokawa

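AI summary: This study examines how human-generated randomness is constrained by cognitive, motor, and strategic biases and how interaction with another person modifies those constraints. Analyzing repeated rock-paper-scissors data from 9 participants, it finds that interaction can shape the complexity of human behavior.

Details
Comments
30 pages, 7 figures
AI Chinese abstract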

人类生成的随机性受到认知、运动和战略偏见的限制。本研究探讨这些限制如何出现在个体行为中,以及通过与另一人互动如何改变。我们分析了9名参与者的重复石头-剪刀-布游戏数据,得到108个人与人对战和216个个体玩家序列。利用Lempel-Ziv复杂度(LZC),我们比较了人与人序列与RNG对手条件。在RNG对手条件下,人类LZC最大值为84,作为经验参考。在人与人条件下,大多数序列仍低于此值,但一小部分超过了它,产生了一个小的高复杂度尾部,这在RNG对手条件下不存在。我们引入了一种敏感度度量,捕捉玩家是否通过选择击败对手最近最频繁移动的移动来回应对手最近的频率偏见。部分回归显示,焦点玩家敏感度正预测对手移动序列未来熵值,在控制对手当前熵值后。环移替代分析表明,这种关系在对手处于低熵状态时最为明显,此时最近的移动分布包含清晰的频率偏见。这些结果表明,人类随机性不仅是孤立的个体能力,还可以通过交互在状态依赖的方式下塑造。研究确定了交互如何破坏偏见行为并增加熵的局部机制,为未来因果实验和高复杂度人类行为生成模型提供了具体基础。

English abstract

Human-generated randomness is constrained by cognitive, motor, and strategic biases. This study examines how these constraints appear in individual behavior and how they may be modified through interaction with another human. We analyzed repeated rock-paper-scissors data from 9 participants, yielding 108 human-human matches and 216 individual player sequences. Using Lempel-Ziv complexity (LZC), we compared human-human sequences with the RNG-opponent condition. In the RNG-opponent condition, the maximum human LZC value was 84, which we used as an empirical reference. In the human-human condition, most sequences remained below this value, but a small number exceeded it, producing a small high-complexity tail that was not present in the RNG-opponent condition. We introduced a sensitivity measure that captures whether a player responds to the opponent's recent frequency bias by choosing the move that beats the opponent's most frequent recent move. Partial regression showed that focal-player sensitivity positively predicted future entropy in the opponent's move sequence after controlling for the opponent's current entropy. Circular-shift surrogate analyses indicated that this relation was most clearly interaction-specific when the opponent was in a low-entropy state, where the recent move distribution contained a clear frequency bias. These results suggest that human randomness is not only an isolated individual capacity, but can be shaped by interaction in a state-dependent manner. The findings identify a local mechanism by which interaction may destabilize biased behavior and increase entropy, providing a concrete basis for future causal experiments and generative models of high-complexity human behavior.
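
The abstract does not say which Lempel-Ziv variant was used, so the following is only a sketch of one common choice, the phrase count of the 1976 Lempel-Ziv exhaustive parsing. The function name and the move encoding (R, P, S) are assumptions, and values from this sketch should not be compared with the reference value of 84 reported above, which depends on the authors' sequence length and definition.

```python
def lz76_complexity(seq):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of seq."""
    s = "".join(seq)
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from text that starts
        # earlier (overlap with the phrase itself is allowed, as in LZ76).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6
# A perfectly repetitive player versus a cycling player (single-letter move codes).
print(lz76_complexity(["R"] * 8), lz76_complexity(list("RPSRPSRPSRPS")))
```

Low values indicate sequences that are largely copies of their own past, which is the sense in which biased play has low complexity.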

2605.08659 2026-05-20 cs.CE q-bio.BM

Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization

通过超组相对策略优化推动生物分子效用-多样性前沿

Xinwu Ye, He Cao, Hao Li, Bin Feng, Zijing Liu, Xiangru Tang, Yu Li, Shenghua Gao

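AI summary: This paper proposes SGRPO, a flexible GRPO-style framework that constructs rewards directly from set-level diversity to maintain generation diversity, expanding the utility-diversity Pareto frontier in de novo small-molecule design, pocket-based small-molecule design, and de novo protein design.

Details
AI Chinese abstract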

生物分子生成器通常通过奖励反馈进行适应以提高任务特定的效用,但仅推动效用会导致生成集中在狭窄的候选家族中。维持多样性困难,因为样本多样性是集合层面的属性。我们引入了超组相对策略优化(SGRPO),一种灵活的GRPO风格框架,直接从集合层面的多样性构建奖励。对于每个条件,SGRPO采样候选集合的超组,比较其在相同条件下的多样性,并通过留一法多样性贡献将组多样性奖励重新分配给个体轨迹,然后将其与轨迹层面的效用结合。这种设计使SGRPO与特定生成器、效用奖励或多样性度量解耦,并允许使用不同的GRPO风格方法进行实例化。我们在从头小分子设计、基于口袋的小分子设计和从头蛋白质设计中评估SGRPO,使用GRPO和耦合GRPO在自回归和离散扩散生成器上进行实例化。在解码扫描中,SGRPO扩展了效用-多样性帕累托前沿,并在预训练生成器、GRPO和记忆辅助GRPO适用时实现了最佳前沿级指标。我们的分析进一步表明,直接的集合层面多样性奖励在小组中仍有效,并有助于在训练后保持更广泛的生成分布覆盖。代码可在https://github.com/IDEA-XL/SGRPO上获得。

English abstract

Biomolecular generators are often adapted with reward feedback to improve task-specific utility, but pushing utility alone can concentrate generation on a narrow family of candidates. Maintaining diversity is difficult because sample diversity is a set-level property. We introduce Supergroup Relative Policy Optimization (SGRPO), a flexible GRPO-style framework that directly constructs rewards from set-level diversity. For each condition, SGRPO samples a supergroup of candidate sets, compares their diversity under the same condition, and redistributes the group diversity reward to individual rollouts through leave-one-out diversity contributions before combining it with rollout-level utility. This design decouples SGRPO from a particular generator, utility reward, or diversity metric, and allows instantiation with different GRPO-style approaches. We evaluate SGRPO on de novo small-molecule design, pocket-based small-molecule design, and de novo protein design, instantiating it with both GRPO and Coupled-GRPO across autoregressive and discrete diffusion generators. Across decoding sweeps, SGRPO expands the utility-diversity Pareto frontier and achieves the best frontier-level metrics relative to pretrained generators, GRPO, and memory-assisted GRPO when applicable. Our analyses further show that direct set-level diversity rewards remain effective with small groups and help preserve broader generation-distribution coverage during post-training. The code is available at https://github.com/IDEA-XL/SGRPO.
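
The leave-one-out redistribution described in this abstract can be sketched as follows. This is not the SGRPO implementation: the diversity metric (mean pairwise Jaccard distance over characters), the function names, and the toy strings are all assumptions, and the supergroup comparison, normalization, and combination with rollout-level utility are omitted. The sketch only shows how a set-level quantity can be attributed to individual members.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the character sets of two strings."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def set_diversity(items, dist=jaccard_distance):
    """Mean pairwise distance; 0.0 when there are fewer than two items."""
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def leave_one_out_contributions(items, dist=jaccard_distance):
    """Diversity of the full set minus diversity of the set without item i."""
    full = set_diversity(items, dist)
    return [full - set_diversity(items[:i] + items[i + 1:], dist)
            for i in range(len(items))]

# Toy candidate set: two duplicates and one distinct string.
group = ["CCO", "CCO", "c1ccccc1N"]
print(set_diversity(group))
print(leave_one_out_contributions(group))
```

In this toy set the distinct candidate receives a positive contribution and the duplicates receive negative ones, which is the qualitative behavior a per-rollout diversity credit is meant to have.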

2603.06740 2026-05-20 q-bio.QM cs.AI

ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins

ViroGym: 用于评估病毒蛋白的现实大规模基准

Yichen Zhou, Jonathan Golob, Amir Karimi, Stefan Bauer, Patrick Schwab

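AI summary: This paper introduces ViroGym, a comprehensive benchmark for evaluating protein language models (pLMs) on viral proteins across three tasks: 79 deep mutational scanning assays, 21 influenza neutralisation tasks, and a real-world pandemic prediction task for SARS-CoV-2. The ProGen2 family performs best across all three tasks.

Details
AI Chinese abstract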

蛋白质语言模型(pLMs)在零样本预测错义变异效应方面显示出强大潜力,但对病毒蛋白的系统性基准评估仍然有限,这在需要提前预测新兴突变的工具方面是一个关键缺口。本文介绍ViroGym,一个全面的基准,评估pLMs在三个任务上的表现:79个覆盖真核病毒的深入突变扫描(DMS)实验,包含7个表型读数,552,065个突变序列;21个流感中和任务;以及SARS-CoV-2的现实世界大流行预测任务。我们对已建立的pLMs在适应度景观、抗原多样性及大流行预测任务上进行了基准测试,并发现ProGen2家族在所有三个任务中均表现最佳。关键的是,DMS和中和性能可靠地识别出能够泛化到现实世界突变的模型,即使它们所揭示的突变集几乎不重叠,这表明互补的体外基准能够捕捉到现实突变预测所需进化的约束条件。

English abstract

Protein language models (pLMs) have shown strong potential for zero-shot prediction of missense variant effects, yet systematic benchmarking on viral proteins remains limited, a critical gap given the need for proactive tools that can anticipate emerging mutations ahead of experimental validation. Here we introduce ViroGym, a comprehensive benchmark evaluating pLMs across three tasks: 79 deep mutational scanning (DMS) assays covering eukaryotic viruses with 552,065 mutated sequences across 7 phenotypic readouts, 21 influenza neutralisation tasks, and a real-world pandemic prediction task for SARS-CoV-2. We benchmark well-established pLMs on fitness landscapes, antigenic diversity, and pandemic forecasting, and find that the ProGen2 family consistently achieves the strongest performance across all three tasks. Crucially, DMS and neutralisation performance reliably identifies models that generalise to real-world emergence, even though the mutation sets they surface barely overlap, revealing that complementary in vitro benchmarks capture the evolutionary constraints needed for real-world mutation forecasting.

2602.07570 2026-05-20 q-bio.NC cs.AI cs.CV cs.LG

How does longer temporal context enhance multimodal narrative video processing in the brain?

更长的时间上下文如何增强大脑对多模态叙事视频的处理?

Prachi Jindal, Anant Khandelwal, Manish Gupta, Bapi S. Raju, Subba Reddy Oota, Tanmoy Chakraborty

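AI summary: This study investigates how video clip duration and narrative-task prompting shape brain-model alignment during naturalistic movie watching, finding that longer clips substantially improve brain alignment for multimodal large language models (MLLMs), whereas unimodal video models show little to no gain.

Details
Comments
22 pages, 15 figures
AI Chinese abstract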

理解人类和人工智能系统如何处理复杂的叙事视频是一个在神经科学和机器学习交汇处的基本挑战。本研究调查了视频片段的时间上下文长度(3-24秒片段)和叙事任务提示如何影响自然电影观看过程中大脑模型的对齐情况。利用受试者观看完整电影的fMRI记录,我们研究了对叙事上下文敏感的大脑区域如何在不同时间尺度上动态表示信息,以及这些神经模式如何与模型派生的特征对齐。我们发现,增加片段持续时间显著提高了多模态大语言模型(MLLMs)的大脑对齐程度,而单模态视频模型则几乎没有提升。进一步地,较短的时间窗口与感知和早期语言区域对齐,而较长的窗口则更倾向于与更高阶整合区域对齐,这在MLLMs中表现为层到皮层的层次结构。最后,使用四个叙事任务提示的实验显示,这些提示会引发任务特定、区域依赖性的大脑对齐模式,并在更高阶区域引起上下文依赖的片段级调谐变化。我们的工作将长篇叙事电影定位为研究长时间尺度时间整合在长上下文MLLMs中的原理性测试平台,以及其与叙事理解过程中皮层响应关系的桥梁。

English abstract

Understanding how humans and artificial intelligence systems process complex narrative videos is a fundamental challenge at the intersection of neuroscience and machine learning. This study investigates how the temporal context length of video clips (3-24 s clips) and narrative-task prompting shape brain-model alignment during naturalistic movie watching. Using fMRI recordings from participants viewing full-length movies, we examine how brain regions sensitive to narrative context dynamically represent information over varying timescales and how these neural patterns align with model-derived features. We find that increasing clip duration substantially improves brain alignment for multimodal large language models (MLLMs), whereas unimodal video models show little to no gain. Further, shorter temporal windows align with perceptual and early language regions, while longer windows preferentially align higher-order integrative regions, mirrored by a layer-to-cortex hierarchy in MLLMs. Finally, experiments with four narrative-task prompts show that they elicit task-specific, region-dependent brain alignment patterns and context-dependent shifts in clip-level tuning in higher-order regions. Our work positions long-form narrative movies as a principled testbed for studying long-timescale temporal integration in long-context MLLMs and its relationship to cortical responses during narrative comprehension.

2601.04122 2026-05-20 q-bio.GN

Tool Choice Matters: Evaluating edgeR vs. DESeq2 for Sensitivity, Robustness, and Cross-Study Performance

工具选择至关重要:评估edgeR与DESeq2在灵敏度、稳健性和跨研究性能方面的表现

Mostafa Rezapour

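AI summary: This study compares two widely used differential gene expression tools, edgeR and DESeq2, evaluating sensitivity to sample size, robustness to outliers, and cross-study performance, and finds that edgeR yields more robust and generalizable gene sets for downstream classification and cross-study replication.

Details
AI Chinese abstract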

差异基因表达(DGE)分析是转录组研究的基础,但工具选择可能显著影响结果。本研究通过真实和半模拟的bulk RNA-Seq数据集,全面比较了两种广泛使用的DGE工具edgeR和DESeq2。我们评估了工具在三个关键维度上的性能:(1)对样本量的灵敏度和对异常值的稳健性;(2)在发现数据集中唯一识别的基因集的分类性能;(3)工具特定基因集在独立研究中的泛化能力。首先,两种工具对模拟异常值的响应相似,随着更多异常值的加入,DEG集之间的Jaccard相似度降低。其次,基于工具特定基因训练的分类模型显示,edgeR在9个对比中实现了更高的F1分数,并更频繁达到完美或接近完美的精确度。Dolan-More性能曲线进一步表明,edgeR在更多数据集中保持了接近最优的性能。第三,在使用四个独立的SARS-CoV-2数据集进行跨研究验证时,edgeR唯一识别的基因集在分类持有数据集时表现出更高的AUC、精确度和召回率。这种模式在所有折叠中保持一致,某些测试案例使用edgeR特定基因实现了完美的分离。相比之下,DESeq2特定基因在不同研究中的表现较低且变化较大。总体而言,我们的发现强调,尽管DESeq2在严格显著性条件下可能识别更多DEGs,但edgeR在下游分类和跨研究复制中的基因集更稳健和可泛化,这突显了在转录组分析中工具选择的关键权衡。

English abstract

Differential gene expression (DGE) analysis is foundational to transcriptomic research, yet tool selection can substantially influence results. This study presents a comprehensive comparison of two widely used DGE tools, edgeR and DESeq2, using real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. We evaluated tool performance across three key dimensions: (1) sensitivity to sample size and robustness to outliers; (2) classification performance of uniquely identified gene sets within the discovery dataset; and (3) generalizability of tool-specific gene sets across independent studies. First, both tools showed similar responses to simulated outliers, with Jaccard similarity between the DEG sets from perturbed and original (unperturbed) data decreasing as more outliers were added. Second, classification models trained on tool-specific genes showed that edgeR achieved higher F1 scores in 9 of 13 contrasts and more frequently reached perfect or near-perfect precision. Dolan-Moré performance profiles further indicated that edgeR maintained performance closer to optimal across a greater proportion of datasets. Third, in cross-study validation using four independent SARS-CoV-2 datasets, gene sets uniquely identified by edgeR yielded higher AUC, precision, and recall in classifying samples from held-out datasets. This pattern was consistent across folds, with some test cases achieving perfect separation using edgeR-specific genes. In contrast, DESeq2-specific genes showed lower and more variable performance across studies. Overall, our findings highlight that while DESeq2 may identify more DEGs even under stringent significance conditions, edgeR yields more robust and generalizable gene sets for downstream classification and cross-study replication, which underscores key trade-offs in tool selection for transcriptomic analyses.
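
The Jaccard similarity used in the outlier experiment above is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the gene sets are hypothetical examples chosen for illustration, not results from the study.

```python
def jaccard_similarity(a, b):
    """|A & B| / |A | B|; two empty sets are treated as identical (1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical DEG sets before and after injecting outliers.
original = {"IFI27", "ISG15", "OAS1", "MX1"}
perturbed = {"IFI27", "ISG15", "CXCL10"}
print(jaccard_similarity(original, perturbed))  # 2 shared genes out of 5 distinct: 0.4
```

A value that falls as outliers are added means the perturbed analysis recovers fewer of the original DEGs, calls new ones, or both.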

2512.20581 2026-05-20 q-bio.BM physics.bio-ph

MERGE-RNA: a physics-based model to predict RNA secondary structure ensembles with chemical probing

MERGE-RNA: 一种基于物理的模型,用于预测RNA二级结构集的化学探针

Giuseppe Sacco, Jianhui Li, Redmond P. Smyth, Guido Sanguinetti, Giovanni Bussi

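AI summary: This study presents MERGE-RNA, a physics-based model that predicts RNA secondary structure ensembles from chemical probing data, overcoming the limitations of conventional methods in interpreting such data and yielding more accurate and interpretable ensemble predictions.

Details
AI Chinese abstract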

RNA的功能与其二级结构密切相关,通过动态且异质的结构集发挥作用。尽管当前分析工具通常输出单一静态结构或平均接触图,但化学探针方法如DMS能够捕捉到核苷酸分辨率的信号,代表完整的结构集,但这些信号仍难以结构上解释。为此,我们提出了MERGE-RNA框架,该框架描述并输出RNA为结构集。通过建模实验流程的物理特性,MERGE-RNA学习了一小组可转移且可解释的参数,使不同分子、探针浓度和重复实验的测量能够在单一优化中整合,以提高鲁棒性。我们的模型采用最大熵原理来预测热力学分布,仅需最小的调整即可使结构集与实验数据对齐。我们验证了MERGE-RNA在多种RNA上的有效性,证明其结构准确性超过了标准伪自由能方法,并产生了能更好地再现测量DMS反应性的结构集。应用于V. vulnificus腺苷核糖开关时,MERGE-RNA恢复了NMR解析的构象及其配体诱导的重排,其群体转移与NMR得出的K_d相符。在我们报告的新DMS数据的定制RNA构造中,MERGE-RNA分解混合状态,揭示了参与链置换的瞬态中间群体,这些动态在传统分析方法中不可见。

English abstract

RNA function is tied to secondary structure, operating through dynamic and heterogeneous structural ensembles. While current analysis tools typically output single static structures or averaged contact maps, chemical probing methods like DMS capture nucleotide-resolution signals representing the full structural ensemble, which remain difficult to interpret structurally. To address this, we present MERGE-RNA, a framework that describes and outputs RNA as a structural ensemble. By modeling the physics of the experimental pipeline, MERGE-RNA learns a small set of transferable and interpretable parameters, enabling the integration of measurements across different molecules, probe concentrations, and replicates in a single optimization to improve robustness. Our model employs a maximum-entropy principle to predict thermodynamic populations, with the minimal adjustments necessary to align the ensemble with experimental data. We validate MERGE-RNA on diverse RNAs, showing that it achieves structural accuracy surpassing standard pseudo-free-energy methods and yields ensembles better recapitulating measured DMS reactivity. Applied to the V. vulnificus adenine riboswitch, MERGE-RNA recovers the NMR-resolved conformations and their ligand-induced rearrangement, with population shifts matching the NMR-derived K_d. In a designed RNA construct for which we report new DMS data, MERGE-RNA deconvolves mixed states to reveal transient intermediate populations involved in strand displacement, dynamics invisible to traditional analysis methods.
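
As background for the phrase "thermodynamic populations": at equilibrium, the population of each structure in an ensemble follows from its free energy by Boltzmann weighting. The sketch below shows only that textbook relation. It does not implement MERGE-RNA's maximum-entropy fitting or its model of the probing experiment, and the free-energy values and function name are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(free_energies, temperature_k=310.15):
    """Equilibrium populations p_i proportional to exp(-G_i / RT), G_i in kcal/mol."""
    rt = R_KCAL * temperature_k
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical free energies (kcal/mol) for three candidate secondary structures.
pops = boltzmann_populations([-10.0, -9.0, -9.0])
print([round(p, 3) for p in pops])
```

Because populations depend exponentially on free energy, small energy adjustments can shift an ensemble noticeably, which is why fitting a few energy-like parameters to probing data can change the predicted mixture of structures.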

2512.15236 2026-05-20 physics.bio-ph q-bio.MN

Modeling Plant Action Potentials under Photoperiod Stress via Hodgkin-Huxley Dynamics

通过Hodgkin-Huxley动力学建模光周期压力下的植物动作电位

Imen Bekkari, Maurizio Magarini, Hamdan Awan

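AI summary: This study uses the Hodgkin-Huxley framework to model the dynamics of plant action potentials under photoperiod stress, characterizing how bioelectric signals arise during photoperiod transitions and the features of the POCE and NETO phenomena.

Details
Comments
7 pages, 9 figures, accepted for IEEE TRANSACTIONS ON MOLECULAR, BIOLOGICAL, AND MULTI-SCALE COMMUNICATIONS
AI Chinese abstract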

植物表现出动态生物电信号特性,有助于跨组织的信息传递。本研究利用定制设计的生长室中的生物信号放大器和环境传感器,记录了烟草(Nicotiana tabacum)中的动作电位(APs)。在受控的12小时人工光照周期下,观察到一致的光诱导和暗诱导动作电位在光周期转换期间出现。为了理解这些生物电信号反应,采用基于Hodgkin-Huxley框架的数学模型。从番茄(Solanum lycopersicum)的电生理测量中发现,在自然光照条件下,仅观察到光诱导的动作电位,而在人工光周期快速转换期间,光诱导和暗诱导动作电位的耦合动态现象被唯一激发。这些不同的现象分别被表征为延长振荡气候参与(POCE)和灵活环境转换振荡(NETO)。该模型在两个框架中成功再现了关键特征,同时通过电压无关的速率参数保持了计算效率。

English abstract

Plants exhibit dynamic bioelectric properties that facilitate information transfer across tissues. This study investigates action potentials (APs) in Nicotiana tabacum recorded within a custom-designed growth chamber using a biosignal amplifier and environmental sensors. Consistent light- and dark-induced APs were observed during photoperiod transitions under controlled 12-hour artificial illumination cycles. To understand these bioelectric responses, a mathematical model based on the Hodgkin-Huxley framework is used. Electrophysiological measurements from Solanum lycopersicum revealed that under natural light conditions only light-induced APs are observed, whereas coupled light- and dark-induced AP dynamics are elicited exclusively during rapid transitions in artificial photoperiods. These distinct phenomena are characterized as Prolonged Oscillatory Climatic Engagement (POCE) and Nimble Environmental Transition Oscillation (NETO), respectively. The model successfully reproduces the key features in both frameworks while maintaining computational efficiency through voltage-independent rate parameters.

2512.00281 2026-05-20 cs.CV q-bio.NC

Beyond Size and Growth: Rethinking Lung Cancer Screening with AI Based Nodule Detection and Diagnosis

超越尺寸和增长:利用AI进行肺结节检测与诊断的肺癌筛查再思考

Sylvain Bodard, Pierre Baudot, Benjamin Renoust, Charles Voyton, Gwendoline De Bie, Ezequiel Geremia, Van-Khoa Le, Danny Francis, Pierre-Henri Siot, Yousra Haddou, Vincent Bobin, Jean-Christophe Brisset, Carey C. Thomson, Valerie Bourdes, Benoit Huet

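AI summary: This paper presents an integrated AI system that performs nodule detection and malignancy assessment directly at the nodule level from low-dose CT scans, moving beyond conventional size- and growth-based screening criteria to improve the accuracy and efficiency of lung cancer screening.

Details
Comments
25 pages, 8 figures, with supplementary information containing 11 figures
AI Chinese abstract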

早期检测恶性肺结节仍然受到基于尺寸和生长的筛查标准的限制,常常延迟诊断。我们提出了一种集成的AI系统,该系统在统一的CADe/CADx框架内,从低剂量CT扫描中联合执行结节检测和恶性评估。与传统将检测和诊断分开的流程不同,我们的方法直接针对恶性结节,重新定义了临床决策点的评估。为了解决数据集规模和可解释性限制,系统由一个大型集成模型(LEM)组成,结合了浅层深度学习和基于特征的模型。该系统在25,709例扫描中训练和评估,其中69,449个结节被标注,并在独立队列上进行了外部验证。其内部AUC为0.98,外部AUC为0.945,优于所有基于生长的指标、Lung RADS尺寸基于的分流、欧洲体积和VDT基于的筛查标准、放射科医生和领先的AI模型。该模型在低假阳性率下保持高灵敏度,对小和早期阶段的癌症表现出色,并能对不确定和缓慢生长的结节在一年内更早地评估恶性性。这种方法有潜力优化肺癌筛查流程,支持更早、更可行的临床决策。

English abstract

Early detection of malignant lung nodules remains constrained by size- and growth-based screening criteria, often delaying diagnosis. We present an integrated AI system that jointly performs nodule detection and malignancy assessment directly at the nodule level from low-dose CT scans, within a unified CADe/CADx framework. Unlike conventional pipelines separating detection and diagnosis, our approach targets malignant nodules directly, redefining evaluation at the point where clinical decisions are made. To address limitations in dataset scale and explainability, the system consists of a Large Ensemble Model (LEM) combining ensembles of shallow deep learning and feature-based models. It was trained and evaluated on 25,709 scans with 69,449 annotated nodules, with external validation on an independent cohort. It achieved an AUC of 0.98 internally and 0.945 externally, outperforming all growth-based metrics, Lung-RADS size-based triage, European volume- and VDT-based screening criteria, radiologists, and leading AI models. The model maintains high sensitivity at low false-positive rates, excels for small and early-stage cancers, and enables malignancy assessment up to one year earlier than radiologists for indeterminate and slow-growing nodules. This approach has the potential to streamline lung cancer screening workflows and support earlier, more actionable clinical decision making.

2511.07847 2026-05-20 q-bio.CB math.DS

Matters of Life and Death in Computational Cell Biology

计算细胞生物学中的生死问题

Connor McShaffrey, Eran Agmon, Randall D. Beer

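AI summary: This paper argues for the importance of the life-death boundary in computational cell biology, proposes geometric structures that separate regions of similar survival outcomes as new global organizing principles for cell fate, and argues that idealized models help in understanding life's intrinsic limits.

Details
Comments
10 pages, 6 figures
AI Chinese abstract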

几乎所有的细胞模型都显式或隐式地处理了必须被尊重的生物物理约束,以确保生命得以延续。尽管如此,这些约束的实现却缺乏系统性,我们对细胞动态如何与这些约束相互作用以及这些约束如何在实际生物学中产生缺乏系统性的理解。计算细胞生物学只有将生死边界作为核心概念,建立细胞存活理论,才能克服这些担忧。我们通过展示特定几何结构如何在模型中分离出具有相似生存结果区域,为细胞命运提供新的全局组织原理。我们还论证了理想化模型为理解生命固有限制提供了一种可操作的方法。

English abstract

Nearly all cell models explicitly or implicitly deal with the biophysical constraints that must be respected for life to persist. Despite this, there is almost no systematicity in how these constraints are implemented, and we lack a principled understanding of how cellular dynamics interact with them and how they originate in actual biology. Computational cell biology will only overcome these concerns once it treats the life-death boundary as a central concept, creating a theory of cellular viability. We lay the foundation for such a development by demonstrating how specific geometric structures can separate regions of qualitatively similar survival outcomes in our models, offering new global organizing principles for cell fate. We also argue that idealized models of emergent individuals offer a tractable way to begin understanding life's intrinsically generated limits.

2511.02622 2026-05-20 q-bio.BM physics.bio-ph physics.comp-ph

Machine Learning for RNA Secondary Structure Prediction: a review of current methods and challenges

利用机器学习进行RNA二级结构预测:当前方法和挑战的综述

Giuseppe Sacco, Giovanni Bussi, Guido Sanguinetti

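AI summary: This review surveys current methods and challenges in RNA secondary structure prediction, focusing on the application of machine learning and deep learning, the generalization crisis associated with data scarcity, and open problems such as predicting complex structures and modeling dynamic ensembles.

Details
Journal ref
RNA 32(4): 443-456 (2026)
Comments
22 pages, 3 figures. Updated version of the article published in RNA 32(4):443-456 (2026); the Foundation Models section has been revised to reflect developments since publication. The remainder of the manuscript is unchanged apart from formatting
AI Chinese abstract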

预测RNA的二级结构是计算生物学中的核心挑战,对于理解分子功能和设计新型治疗剂至关重要。该领域从基础但准确性有限的热力学方法发展到由机器学习和深度学习主导的数据驱动范式。这些模型直接从数据中学习折叠模式,带来了显著的性能提升。本文综述了这些方法的现代景观,涵盖了单序列、基于进化和混合模型,这些模型结合了机器学习与生物物理。一个核心主题是该领域面临的“泛化危机”,即强大的模型在新的RNA家族上失效,促使社区转向更严格、基于同源性的基准测试。为应对数据稀缺的根本挑战,RNA基础模型应运而生,通过大规模、未标记的序列语料库学习,以提高泛化能力。最后,我们展望了下一阶段的主要障碍,包括准确预测复杂结构如假结,扩展到千碱基长度的信使RNA,纳入修饰核苷酸的化学多样性,以及将预测目标从静态结构转向动态集合,以更好地捕捉生物学功能。我们还强调了需要一个标准化、前瞻性基准测试系统,以确保无偏验证并加速进展。

English abstract

Predicting the secondary structure of RNA is a core challenge in computational biology, essential for understanding molecular function and designing novel therapeutics. The field has evolved from foundational but accuracy-limited thermodynamic approaches to a new data-driven paradigm dominated by machine learning and deep learning. These models learn folding patterns directly from data, leading to significant performance gains. This review surveys the modern landscape of these methods, covering single-sequence, evolutionary-based, and hybrid models that blend machine learning with biophysics. A central theme is the field's "generalization crisis," where powerful models were found to fail on new RNA families, prompting a community-wide shift to stricter, homology-aware benchmarking. In response to the underlying challenge of data scarcity, RNA foundation models have emerged, learning from massive, unlabeled sequence corpora to improve generalization. Finally, we look ahead to the next set of major hurdles, including the accurate prediction of complex motifs like pseudoknots, scaling to kilobase-length transcripts, incorporating the chemical diversity of modified nucleotides, and shifting the prediction target from static structures to the dynamic ensembles that better capture biological function. We also highlight the need for a standardized, prospective benchmarking system to ensure unbiased validation and accelerate progress.

2410.09447 2026-05-20 physics.bio-ph q-bio.MN

Evolutionary origin of the bipartite architecture of dissipative cellular networks

耗散细胞网络双分结构的进化起源

Bowen Shi, Long Qian, Qi Ouyang

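AI summary: This study explores the evolutionary origin of the bipartite architecture of dissipative cellular networks. Evolutionary simulations show that networks tend to decouple high-energy molecules, acting as fuels, from the functional module during evolution, which increases overall dissipation and enhances robustness.

Details
Comments
12 pages, 6 figures
AI Chinese abstract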

最近,大量研究已探讨能量耗散在生物网络中的作用,大多数研究聚焦于耗散与功能的关系。然而,网络科学的发展促使我们深入理解生物网络的系统结构及其进化优势。我们发现生物耗散网络的耗散与其结构密切相关。通过分析这些适应良好的网络,我们发现能量产生模块在所有情况下相对孤立。我们对经典耗散网络的早期网络进行了进化模拟和分析,包括动能证明阅读、激活-抑制振荡器和两种典型适应性响应模型。我们发现尽管仅对网络功能施加选择压力,网络仍倾向于将高能分子作为燃料与功能模块解耦,以在进化过程中实现更高的整体耗散。此外,我们发现解耦的燃料模块可以增强网络对参数或结构扰动的鲁棒性。我们对动能证明阅读网络和能量驱动网络的一般情况提供了理论分析。我们发现燃料解耦可以保证更高的耗散,并在大多数情况下考虑耗散网络时具有更高的性能。我们得出结论,燃料解耦是一种进化结果,并在进化过程中带来益处。

English abstract

Recently, much research has examined the role of energy dissipation in biological networks, most of it focusing on the relationship between dissipation and functionality. However, the development of network science urges us to understand the systematic architecture of biological networks and their evolutionary advantages. We find that the dissipation of biological dissipative networks is closely related to their structure. By interrogating these well-adapted networks, we find that the energy-producing module is relatively isolated in all cases. We applied evolutionary simulation and analysis to premature networks of classic dissipative networks, namely kinetic proofreading, the activator-inhibitor oscillator, and two typical adaptive response models. We found that, although selection was imposed only on network function, the networks tended to decouple high-energy molecules, as fuels, from the functional module, achieving higher overall dissipation over the course of evolution. Furthermore, we find that decoupled fuel modules can increase the robustness of the networks to parameter or structure perturbations. We provide a theoretical analysis of kinetic proofreading networks and of the general case of energy-driven networks. We find that fuel decoupling can guarantee higher dissipation and, in most cases for dissipative networks, higher performance. We conclude that fuel decoupling is an evolutionary outcome that confers benefits during evolution.

2402.17086 2026-05-20 q-bio.QM cs.NA math.DS math.NA physics.bio-ph

Multicellular simulations with shape and volume constraints using optimal transport

具有形状和体积约束的多细胞模拟:基于最优传输理论

Antoine Diez, Jean Feydy

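AI summary: This paper proposes a new framework based on optimal transport theory for simulating particle systems with arbitrary dynamical shapes and deformability properties. By handling the volume exclusion constraint automatically, it enables efficient modeling of complex systems such as cell aggregates and tissues.

Details
AI Chinese abstract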

许多生物和物理系统,如细胞团块、组织或细菌群落,表现出非传统粒子系统的行为,这些系统受到体积排除和形状相互作用的强烈约束。理解这些约束如何导致宏观自组织结构是发育生物学等领域的基本问题。为此,已开发了各种计算模型。本文介绍了一种基于最优传输理论的新框架,用于建模具有任意动态形状和可变形性质的粒子系统。我们的方法基于Brenier关于不可压缩流体的开创性工作及其在材料科学中的最新应用。它允许指定单个细胞的形状和体积,并支持多种相互作用机制,同时以可接受的数值成本自动处理体积排除约束。我们通过再现计算生物学中的几个经典系统来展示该方法的多功能性。我们的Python代码可在https://iceshot.readthedocs.io/免费获取。

English abstract

Many living and physical systems such as cell aggregates, tissues or bacterial colonies behave as unconventional systems of particles that are strongly constrained by volume exclusion and shape interactions. Understanding how these constraints lead to macroscopic self-organized structures is a fundamental question in e.g. developmental biology. To this end, various types of computational models have been developed. Here, we introduce a new framework based on optimal transport theory to model particle systems with arbitrary dynamical shapes and deformability properties. Our method builds upon the pioneering work of Brenier on incompressible fluids and its recent applications to materials science. It lets us specify the shapes and volumes of individual cells and supports a wide range of interaction mechanisms, while automatically taking care of the volume exclusion constraint at an affordable numerical cost. We showcase the versatility of this approach by reproducing several classical systems in computational biology. Our Python code is freely available at https://iceshot.readthedocs.io/.

2210.09286 2026-05-20 math.PR q-bio.PE

An interacting particle system for the front of an epidemic advancing through a susceptible population

An interacting particle system for the spread of an epidemic through a susceptible population

Eliana Fausti, Andreas Sojmark

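AI summary This paper proposes an interacting particle system that models the spread of an epidemic through heterogeneous diffusive dynamics, rather than through the exogenous population-level contact and transmission rates of classical compartmental models. Each individual has a one-dimensional level of shielding whose evolution is driven by a stochastic differential equation reflected at the epidemic front. The front is driven by cumulative infections, and collisions with it represent at-risk situations that may lead to infection, depending on a non-Markovian mechanism involving the local time, the intrinsic transmissibility, and the current contagiousness within the population. The paper gives a rigorous construction of the system and develops two key technical tools: a compensated martingale property for the infected proportion, and a general result on how local time transforms under a random time-dependent bijection. The former yields a decomposition of the expected number of new infections, analogous to the corresponding decomposition in the SIR model. The latter allows the law of each particle, after suitable conditioning, to be represented as a generalised elastic Brownian motion with drift.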

Comments
38 pages, 3 figures
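AI-generated abstract (translated from Chinese)

We introduce an interacting particle system that models the spread of an epidemic through heterogeneous diffusive dynamics, rather than through the exogenous population-level contact and transmission rates of classical compartmental models. Each individual has a one-dimensional level of shielding whose evolution is driven by a stochastic differential equation reflected at the epidemic front. The front is driven by cumulative infections, and collisions with it represent at-risk situations that may lead to infection, depending on a non-Markovian mechanism involving the local time, the intrinsic transmissibility, and the current contagiousness within the population. We give a rigorous construction of the system and develop two key technical tools: a compensated martingale property for the infected proportion, and a general result on how local time transforms under a random time-dependent bijection. The former yields a decomposition of the expected number of new infections, analogous to the corresponding decomposition in the SIR model. The latter allows the law of each particle, after suitable conditioning, to be represented as a generalised elastic Brownian motion with drift.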

English abstract

We introduce an interacting particle system that models the spread of an epidemic in terms of heterogeneous diffusive dynamics, rather than exogenous contact and transmission rates at the population level as in classical compartmental models. Each individual has a one-dimensional level of shielding that evolves according to a stochastic differential equation reflected at the advancing front of the epidemic. The front is driven by cumulative infections, and collisions with it represent at-risk situations which may lead to infection depending on a non-Markovian mechanism that involves the local time, the intrinsic transmissibility, and the current contagiousness within the population. We give a rigorous construction of the system and develop two key technical tools: a compensated martingale property for the infected proportion and a general result on how local time transforms under a random time-dependent bijection of the state space. The former yields a decomposition of the expected number of new infections that parallels a corresponding decomposition in the SIR model. The latter allows us to represent the law of each particle, after suitable conditioning, as a generalised elastic Brownian motion with drift.