arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.22755 2026-05-22 q-bio.QM

Assessing global drivers of forest transpiration using clustered machine learning models

Morgan Thornwell, David Yang, Cheng-Wei Huang, Peyman Abbaszadeh, Samantha Hartzell

AI summary: This paper uses clustered machine learning models to analyze the drivers of global forest transpiration rates. It finds that the key environmental predictors differ markedly across biomes and plant functional types, shedding light on how transpiration is controlled under different climate conditions.

Abstract

Understanding the environmental drivers of forest transpiration is critical for improving global predictions of water availability and ecosystem health. Due to many competing controls on plant water stress and ecosystem transpiration, however, these drivers may vary widely across tree species which have adapted hydraulically to local climate conditions. Here, clustered machine learning models were used to analyze global drivers of forest transpiration rates using the SAPFLUXNET database. Sap flux data from a total of ninety-five sites spanning seven biomes were grouped using two clustering strategies: by biome and by plant functional type. Two supervised machine learning algorithms, a random forest algorithm and a neural network algorithm, were used to predict rates of sap flux for each cluster. The performance and feature importance in each model were analyzed and compared to evaluate the environmental variables that control each cluster's performance. By defining site clusters, these models are able to predict transpiration and its environmental drivers across a wide variety of geographical sites and tree species. Unlike models trained on the entire dataset, high-performing clustered models achieved R$^2$ values to measurement data in the range of 0.74 to 0.90, with the highest performance being achieved in mid-sized clusters of up to thirty-six sites. There was high variance in feature importance between clusters, indicating that key predictors of transpiration varied strongly across both plant functional type and biome. Overall, water-limited climates tended to be more controlled by soil moisture, whereas climates with high mean annual temperature tended to be more controlled by solar radiation and less dependent on air temperature. These findings provide insights into how forest transpiration responds to environmental factors across a wide range of climate types and tree species.

2605.22665 2026-05-22 q-bio.PE

Fitness Inference in Presence of Migrations between Coupled Evolving Populations

Yu-Han Huang, Bastien Dumont, Hong-Li Zeng, John Barton, Erik Aurell

AI summary: This paper studies fitness inference in coupled evolving populations connected by migration. Extending QLE theory and using whole-genome time-series data generated with FFPopSim, it shows that the QLE phase persists at sufficiently low migration rates and derives analytical inference relations for accurately estimating additive fitness and epistatic interactions.

Comments
23 pages, 7 figures

Abstract

The phase of Quasi-Linkage Equilibrium (QLE) in evolutionary populations is analogous to the thermal equilibrium state in statistical mechanics, a concept pioneered by Kimura in 1965 for two-locus two-allele models. QLE describes a stationary state maintained by the interplay of selection, mutation, recombination and genetic drift. Here we extend QLE theory to populations connected by migration, a fundamental evolutionary force that couples the evolutionary dynamics of interacting subpopulations. Specifically, we examine two populations interacting via symmetric or asymmetric migration while evolving under multi-locus selection. Using whole-genome time-series data generated through FFPopSim, we demonstrate that the QLE phase persists under conditions of sufficiently low migration rates. In this regime, we derive analytical inference relations that allow for the accurate and quantitative estimation of both additive fitness and epistatic interactions.

2605.22588 2026-05-22 q-bio.PE

Changes in behaviour when adherers to an intervention experience a different epidemic than non-adherers

Yuan Liu, Michael Sieber, Bin Wu, Arne Traulsen

AI summary: This paper studies how the behavior of adherers and non-adherers to an intervention affects disease spread during an epidemic. It proposes an SIR-based model that couples behavior and disease dynamics and analyzes infections under fixed and switching strategies.

Abstract

Non-pharmaceutical interventions (NPIs), including mask-wearing, physical distancing, and hygiene measures, provide the primary means of reducing transmission in the early stages of an epidemic. Individuals adopt one of two strategies: adherence (A) or non-adherence (N) to NPIs. These strategies influence the transmission rate and thus the number of infections, but they also come with inherent costs and benefits. We propose a model coupling behavior and disease dynamics in adherers and non-adherers based on the SIR framework. This gives rise to six behavioral-epidemiological compartments. Using numerical simulations and analytical considerations, we first examine the case where strategies are fixed. Stronger NPIs and more initial adherers lead to fewer infections, and adherers consistently experience lower infection risk than non-adherers. We then introduce behavioral switching based on the benefits and costs of the two strategies. When NPIs are effective, higher transmission rates promote adherence, resulting in fewer infections. Strikingly, in high-severity outbreaks, even modestly effective NPIs can significantly reduce infections. These findings highlight the critical role of the coupling between behavior and disease dynamics, and underscore how individual choices can compromise or compensate for public health interventions.
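
The fixed-strategy case described in this abstract can be illustrated with a minimal sketch. The abstract does not give the model equations, so everything below is an assumption: homogeneous mixing, equal infectiousness in both groups, an NPI `efficacy` that scales down the susceptibility of adherers only, and the parameter names and default values. It is not the authors' model, only a six-compartment SIR system (S, I, R for adherers and for non-adherers) of the kind the abstract describes.

```python
def simulate_fixed_strategies(beta=0.4, gamma=0.2, efficacy=0.5,
                              adherer_fraction=0.5, i0=1e-3,
                              dt=0.05, t_end=400.0):
    """Forward-Euler integration of a six-compartment SIR sketch.

    All parameters are hypothetical. Returns the fraction ever infected
    within adherers and within non-adherers at time t_end.
    """
    a = adherer_fraction  # must lie strictly between 0 and 1
    # Compartments are population fractions: adherers (A), non-adherers (N).
    s_a, i_a, r_a = a * (1 - i0), a * i0, 0.0
    s_n, i_n, r_n = (1 - a) * (1 - i0), (1 - a) * i0, 0.0
    for _ in range(int(t_end / dt)):
        # Assumed: both groups mix homogeneously and are equally infectious.
        force = beta * (i_a + i_n)
        # Assumed: NPIs reduce the susceptibility of adherers only.
        new_a = (1 - efficacy) * force * s_a * dt
        new_n = force * s_n * dt
        rec_a = gamma * i_a * dt
        rec_n = gamma * i_n * dt
        s_a, i_a, r_a = s_a - new_a, i_a + new_a - rec_a, r_a + rec_a
        s_n, i_n, r_n = s_n - new_n, i_n + new_n - rec_n, r_n + rec_n
    return (i_a + r_a) / a, (i_n + r_n) / (1 - a)
```

Under these assumptions, calling `simulate_fixed_strategies()` returns a lower attack rate for adherers than for non-adherers, and raising `adherer_fraction` or `efficacy` lowers the overall number of infections, which matches the qualitative statements in the abstract. Behavioral switching is not modeled here.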

2605.22523 2026-05-22 q-bio.NC

Learning sequence timing and control of replay speed in networks of spiking neurons

Melissa Lober, Younes Bouhadjar, Markus Diesmann, Tom Tetzlaff

AI summary: This paper proposes a mechanism in which the duration of sequence elements is represented by the sequential activation of element-specific neuronal populations, allowing the model to encode sequences across a wide range of timescales. It also shows that oscillatory background input can act as a clock signal, providing a flexible mechanism for controlling replay speed.

Abstract

Processing sequential inputs is a fundamental brain function, underlying tasks such as sensory perception, language, and motor control. A challenge in sequence processing is to represent not only the order of events, but also their precise timing. While existing computational models can learn sequential structure, many lack biologically plausible mechanisms to encode element-specific timing and to flexibly control the speed of sequence replay. The spiking Temporal Memory (sTM) model, a biologically inspired network model, provides a framework for key aspects of sequence processing. In the sTM model, each sequence element is represented by a small set of neurons firing synchronously, where the set of active neurons encodes the element's identity in its sequential context. In its original version, however, the sTM model learns the order but not the timing of sequence elements. Further, it remains an open question in neuroscience how the speed of sequence replay can be flexibly modulated. We propose a mechanism where the duration of sequence elements is represented by a sequential activation of element-specific neuronal populations, enabling the model to encode sequences across a wide range of timescales. This provides a biologically plausible basis for learning and replaying complex temporal patterns. Additionally, we show that oscillatory background inputs can serve as a clock signal and provide a robust and flexible mechanism for controlling the speed of sequence replay. Our findings suggest that elapsed time is encoded by unique and sparse spatiotemporal patterns of neural activity, and that the speed of sequence replay during wakefulness and sleep is correlated with the characteristics of global oscillatory activity observed in EEG or LFP recordings.

2605.22401 2026-05-22 cs.LG cs.NE q-bio.NC

Cross-Species RSA Reveals Conserved Early Visual Alignment but Divergent Higher-Area Rankings Across Human fMRI and Macaque Electrophysiology

Nils Leutenegger

AI summary: Through a cross-species comparison, this study finds that early visual alignment is conserved between humans and macaques, whereas higher-area alignment is modulated by model capacity and stimulus domain.

Comments
9 pages, 6 figures

Abstract

Does the relationship between learning rules and brain alignment generalize across species? We extend our prior finding that untrained CNNs match backpropagation at human V1 by testing the same five learning rules against macaque electrophysiology. The rules are backpropagation (BP), feedback alignment (FA), predictive coding (PC), spike-timing-dependent plasticity (STDP), and an untrained random-weights baseline. The macaque data come from two datasets: MajajHong2015 (V4/IT, 3,200 stimulus presentations, 88/168 neurons) and FreemanZiemba2013 (V1/V2, 135 stimuli, 102/103 neurons). Using RSA with identical model weights from our human study, we find: (1) all models achieve higher alignment with macaque early visual cortex (rho = 0.15-0.30 at V1/V2) than with human fMRI (rho = 0.01-0.08), consistent with the higher signal-to-noise ratio of electrophysiology; (2) STDP and PC produce the highest macaque V1/V2 alignment (rho ~ 0.30 and 0.28), consistent with their leading position among trained rules in human V1; (3) at IT, learning rule rankings show no detectable correlation across species (Kendall's tau = 0.00, p = 1.00), though this null result is expected given that n = 5 provides power only at tau = +/-1.0, and is further confounded by stimulus set differences; (4) a pretrained ResNet-50 (ImageNet) achieves rho = 0.25 at macaque IT, substantially above all custom CNN conditions (rho = 0.07-0.14), suggesting IT alignment is limited by model capacity and training data rather than by the learning rule. Noise ceilings, multi-seed variability (5 seeds), and a stimulus-control analysis are reported. These results demonstrate that early visual alignment is robust across species, while higher-area alignment is modulated by model capacity and stimulus domain.
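
The remark that n = 5 "provides power only at tau = +/-1.0" can be checked by brute force. The sketch below is an illustration, not the author's analysis code: it assumes an exact two-sided permutation test on rankings without ties (the abstract does not say which test was used), and the function names are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau(x, y):
    """Kendall's tau-a for two equally long sequences without ties."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 concordant pair, -1 discordant pair
    return score / (n * (n - 1) / 2)


def exact_two_sided_p(tau_obs, n):
    """Share of all n! rankings whose |tau| is at least |tau_obs|."""
    ref = list(range(n))
    taus = [kendall_tau(ref, perm) for perm in permutations(ref)]
    hits = sum(abs(t) >= abs(tau_obs) - 1e-12 for t in taus)
    return hits / len(taus)
```

With five learning rules there are only 120 possible rankings. Under this assumed test, a perfect agreement or perfect reversal has p = 2/120, the next strongest value tau = 0.8 already has p = 10/120 (above 0.05), and tau = 0 gives p = 1, consistent with the values quoted in the abstract.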

2605.22352 2026-05-22 q-bio.PE math.ST stat.AP stat.CO stat.ME stat.TH

Spatiotemporal dynamics and ecological risk factors of highly pathogenic avian influenza A(H5N1) in Canadian wildlife: A One Health surveillance analysis

Hammed Olawale Fatoyinbo, Hoyeon Jeong

AI summary: This study analyzes Canadian wildlife H5N1 surveillance data from 2022 to 2026 to characterize the virus's spatiotemporal dynamics and the factors associated with detection counts. Using descriptive epidemiology, spatial clustering methods, and negative binomial mixed models, it finds that reassortant Eurasian-North American lineages dominate detections and identifies year, season, and lineage as key predictors.

Abstract

Highly pathogenic avian influenza A(H5N1) has expanded geographically and ecologically, affecting wild birds, mammalian wildlife, domestic animals, and humans. Wildlife surveillance provides critical early warning for One Health preparedness, yet national-scale analyses integrating host ecology, spatial patterns, seasonality, viral lineage, and risk factors remain limited. This study analysed Canadian wildlife HPAI A(H5N1) surveillance records from 2022 to 2026 to characterise spatiotemporal dynamics and identify factors associated with detection counts. A retrospective analysis of 2,657 detections across 13 provinces and territories was conducted using descriptive epidemiology, spatial clustering methods, and Negative Binomial mixed models. Detections were predominantly avian, with waterfowl and raptors as the major host groups, while mammals accounted for a smaller but epidemiologically important proportion. Detection burden was highest in 2022, with increased activity in autumn and spring. Ontario, Alberta, and British Columbia were identified as major hotspots, with evidence of local clustering in parts of the Prairie region. Reassortant Eurasian-North American lineages dominated detections and were strongly associated with higher detection counts. Modelling results identified year, season, and lineage as key predictors. These findings support risk-based One Health surveillance prioritising high-burden regions, migration-associated periods, key avian host groups, reassortant viral lineages, and continued monitoring of mammalian wildlife.

2605.22145 2026-05-22 q-bio.OT

Persistent Homology as a Morphological Signature of Fibrin Networks

Thomas Burnett, Theresa Reinhold, Bea Bleile, Sophie Raynor, Freya Jensen, Martin Hermann, Tua Gyldenholm, Yossi Bokor Bleile

AI summary: This paper investigates the applicability of topological data analysis (TDA) to high-resolution confocal microscopy images of fibrin network structures from patients with oesophageal cancer undergoing surgery. It finds no significant differences in fibrin network topology across the perioperative period and no consistent structural differences between the standard and intervention groups.

Comments
12 pages, 5 figures

Abstract

We present an investigation of the applicability of topological data analysis (TDA) to the study of high-resolution confocal microscopy images of fibrin network structures from patients with oesophageal cancer undergoing intended curative surgery. Investigation of clot structure brings new knowledge about blood coagulation, risk of bleeding, and thrombosis in this group of patients. Images of fibrin network formation in the collected blood samples were captured by confocal microscopy and three-dimensional z-stacks were analysed. Each z-stack was cropped to a centre region for analysis, the validity of which is assessed in detail. Overall, we found no significant differences in fibrin network topology across the perioperative period, and no consistent differences in network structure between the standard and intervention groups.

2605.00024 2026-05-22 q-bio.NC eess.SP

Self-organized criticality enables conscious integration through brain-body resonance

Ahmed Gamal Eldin

AI summary: This study presents self-organized criticality maintained by brain-body resonance as a mechanism for conscious integration. It reports that conventional preprocessing removes integrative dynamics, whereas critical dynamics in the raw data support the coupling between large-scale neural coordination and event-related processing.

Abstract

The "binding problem" of how distributed neural activity unifies into conscious experience has remained an open challenge since its articulation in 1890. We present evidence that conscious integration relies on self-organized criticality maintained by brain-body resonance, placing human cognition within the universality class of critical systems. Using 64-channel EEG data, we demonstrate that conventional preprocessing inadvertently eliminates the very integrative dynamics it seeks to measure. Removing physiological signals conventionally treated as "artifacts" drastically reduces the shared variance between global phase synchronization and stimulus-evoked amplitude, an effect highly specific to physiological components. We trace this to a fundamental brain-body resonance at 78 milliseconds that establishes zero-lag synchronization driven by robust bidirectional causality. Crucially, raw data exhibits heavy-tailed avalanche dynamics indicative of a near-critical regime, whereas conventionally cleaned data definitively rejects power-law distributions, signaling an artificial shift to subcriticality. Finally, we show these critical dynamics enable holographic information encoding, evidenced by a significant emergence of spatial interference patterns post-resonance. Together, these findings indicate that physiological signals actively and selectively support the coupling between large-scale neural coordination and event-related processing.

2604.24365 2026-05-22 q-bio.QM q-bio.NC

Persistent and anti-persistent stride-to-stride fluctuations: an ARFIMA decomposition consistent with closed-loop sensorimotor control

Philippe Terrier

AI summary: This paper studies the fractal correlation structure of stride-to-stride fluctuations in human walking, which reverses sign under external cueing: self-paced gait is persistent, whereas metronomic or visually cued gait is anti-persistent. Because conventional DFA cannot distinguish genuine long-memory dynamics from short-memory ARMA processes, the paper fits ARFIMA(1,d,1) models and finds that long-memory specifications dominate under both persistent and anti-persistent conditions, indicating that cued-gait anti-persistence is a genuine fractional phenomenon.

Comments
Main article: pp. 1-42 (5 figures, 3 tables). Supplementary Materials appended: S1 - Effect of series length on ARFIMA and DFA outcomes (Hausdorff Tier 3), pp. 43-47; S2 - Morris elementary-effects screening of the ARFIMA/DFA pipeline, pp. 48-59. Reproduction archive: doi:10.5281/zenodo.19676064

Abstract

Stride-to-stride fluctuations in human walking carry a fractal correlation structure that reverses sign under external cueing: self-paced gait is persistent, whereas metronomic or visually cued gait is anti-persistent. Three decades of detrended fluctuation analysis (DFA) have established this reversal as a scaling-exponent shift, but DFA cannot distinguish genuine long-memory dynamics from short-memory autoregressive moving-average (ARMA) processes that produce the same apparent exponent. We fit the full eight-model ARFIMA(1,d,1) family to stride interval and stride speed series from three datasets (N = 70 subjects) spanning overground walking, fixed-speed treadmill walking, metronomic and visual cueing, and graded positional constraint. Model evidence is aggregated through BIC-based Schwarz weights, and the fractional differencing parameter d together with the autoregressive and moving-average coefficients phi and theta are estimated by Bayesian model averaging. Three findings emerge. (i) Long-memory specifications decisively outweigh ARMA alternatives under both persistent and anti-persistent conditions, establishing cued gait anti-persistence as a genuine fractional phenomenon. (ii) DFA alpha overestimates d + 0.5 by 0.25 to 0.34 units, a discrepancy jointly attributable to short-memory components that DFA conflates with long-memory persistence and to a finite-sample negative bias inherent to exact ML-ARFIMA estimation. (iii) The estimated (d, phi, theta) parameters are consistent with a corrective sensorimotor model in which a fractal intrinsic generator, a reactive feedback correction, and a motor-delay component together shape stride-to-stride fluctuations. Whether a single mechanistic model can account quantitatively for the observed parameter ranges across rhythmic, spatial, and unconstrained conditions is a question that the present analysis motivates but cannot alone resolve.
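
The role of the fractional differencing parameter d can be made concrete with the standard binomial expansion of the fractional integration filter, $(1 - B)^{-d} = \sum_k \psi_k B^k$ with $\psi_0 = 1$ and $\psi_k = \psi_{k-1}(k - 1 + d)/k$. The sketch below only illustrates that textbook recursion; it is not the paper's estimation pipeline, and the function names are hypothetical.

```python
def fractional_ma_weights(d, n_terms):
    """Coefficients psi_k of (1 - B)^(-d), the long-memory part of ARFIMA."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Standard recursion: psi_k = psi_{k-1} * (k - 1 + d) / k
        weights.append(weights[-1] * (k - 1 + d) / k)
    return weights


def nominal_dfa_alpha(d):
    """Nominal correspondence alpha = d + 0.5 between d and the DFA exponent."""
    return d + 0.5
```

For d > 0 all weights beyond lag 0 are positive and decay slowly, so past innovations keep pushing the series in the same direction (persistence). For -0.5 < d < 0 they are all negative (anti-persistence), and d = 0 leaves only the short-memory ARMA part. The abstract reports that empirical DFA alpha exceeds the nominal value d + 0.5 by 0.25 to 0.34 units.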

2604.12930 2026-05-22 cond-mat.soft nlin.AO physics.bio-ph q-bio.SC

Building and maintaining a System of Intracellular Compartments

Amit Kumar, Madan Rao

AI summary: Using a dynamical systems approach, this study explores the nonequilibrium assembly and embedded size control of Golgi cisternae and endosomes amid a continuous flux of membrane traffic. It shows that the two competing models of Golgi organization are two phases of the same nonequilibrium process and proposes a strategy for controlling cisternal number and chemical identity by modulating glycosylation enzymes and membrane fission-fusion dynamics.

Comments
58 pages, 27 figures. Supplementary movies available upon request

Abstract

Organelle patterning and its heritability remain central mysteries in cell biology, highlighting the fundamental tension between genetic inheritance and self-assembly. Here, we explore the nonequilibrium assembly and embedded size control of the Golgi cisternae and endosomes, amid a continuous flux of membrane traffic, within a stochastic framework of mechanochemical fusion-fission cycles that violate detailed balance. Using a dynamical systems approach, we identify distinct, robust regimes, ranging from fixed points to limit cycles with definite phase relations between cisternae. We identify these dynamical regimes with diverse phenotypes, from stable cisternae to periodic, cell-cycle-dependent dissolution/reassembly of cisternae to cisternal progression. We analyse the system's dynamic response to systematic perturbations or driving protocols and make definite predictions that may be tested experimentally. Our analysis reveals that the two competing models of Golgi organization - vesicular transport and cisternal progression - are, in fact, two phases of the same underlying nonequilibrium process. We see that cisternal size homeostasis is brought about by a size-dependent embedded control system driven by fusion-fission kernels. Finally, our framework offers a strategy for controlling cisternal number and chemical identity by modulating the interplay between glycosylation enzymes and membrane fission-fusion dynamics.

2603.26974 2026-05-22 q-bio.BM

Recent advances in modeling and simulation of biological phenomena in crowded and cellular environments

Apoorva Mathur, Vanessa Regina Miranda, Ariane Nunes-Alves

AI summary: This paper reviews recent computational methods for studying biological phenomena in crowded intracellular environments, including newly developed methods for simulating crowded systems that reach up to 200 microseconds of simulation time, with the aim of improving understanding of how these phenomena occur in vivo.

Comments
Updated with comments from reviewers

Abstract

While experiments and computer simulations to study biological phenomena are usually performed in dilute in vitro conditions, such phenomena happen inside the cell, an environment densely packed with diverse macromolecules. Here, we review recent computational methods to investigate crowded and cellular environments. Protein crowders, inert crowders and small molecules were used to mimic crowding. Simulations were performed for models of the cytoplasm. New methods were developed to simulate crowded systems, reaching up to 200 microseconds of simulation time. Despite the challenges, modeling and simulation of biological phenomena inside cells is a growing field with considerable potential to improve our understanding of how such phenomena happen in vivo.

2602.22270 2026-05-22 cs.LG q-bio.PE

Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting

Sijie Ruan, Jinyu Li, Jia Wei, Zenghao Xu, Jie Bao, Junshi Xu, Junyang Qiu, Shuliang Wang, Xiaoxiao Wang, Hanning Yuan

AI summary: This paper proposes STOEP, a novel hybrid framework that combines implicit spatio-temporal priors with explicit expert priors. It improves spatio-temporal epidemic forecasting accuracy by dynamically adjusting regional dependencies, amplifying weak signals, and applying mechanistic forecasting.

Comments
12 pages, 10 figures, accepted to IJCAI 2026

Abstract

Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at a provincial CDC in China to facilitate downstream applications.

2510.16590 2026-05-22 cs.LG cs.AI q-bio.BM

Atom-anchored LLMs speak Chemistry: A Retrosynthesis Demonstration

Alan Kai Hassen, Andrius Bernatavicius, Antonius P. A. Janssen, Mike Preuss, Gerard J. P. van Westen, Djork-Arné Clevert

AI summary: This study proposes a framework for molecular reasoning with general-purpose large language models that anchors chain-of-thought reasoning to the molecular structure through atomic identifiers, requires no task-specific model training, and achieves high success rates on single-step retrosynthesis.

Comments
Alan Kai Hassen and Andrius Bernatavicius contributed equally to this work

Abstract

Applications of machine learning in chemistry are often limited by the scarcity and expense of labeled data, restricting traditional supervised methods. In this work, we introduce a framework for molecular reasoning using general-purpose Large Language Models (LLMs) that operates without requiring task-specific model training. Our method anchors chain-of-thought reasoning to the molecular structure by using unique atomic identifiers. First, the LLM performs a zero-shot task to identify relevant fragments and their associated chemical labels or transformation classes. In an optional second step, this position-aware information is used in a few-shot task with provided class examples to predict the chemical transformation. We apply our framework to single-step retrosynthesis, a task where LLMs have previously underperformed. Across academic benchmarks and expert-validated drug discovery molecules, our work enables LLMs to achieve high success rates in identifying chemically plausible reaction sites ($\geq90\%$), named reaction classes ($\geq40\%$), and final reactants ($\geq74\%$). Ultimately, our work establishes a general blueprint for applying LLMs to challenges where molecular reasoning and molecular transformations are key, positioning atom-anchored LLMs as a powerful solution for data-scarce chemistry domains.

2505.22749 2026-05-22 q-bio.NC cs.AI cs.LG cs.NE

Self-orthogonalizing attractor neural networks emerging from the free energy principle

Tamas Spisak, Karl Friston

AI summary: Based on the free energy principle, this paper studies how self-organizing attractor dynamics emerge from first principles in random dynamical systems. The approach needs no explicitly imposed learning and inference rules and yields efficient, biologically plausible dynamics that amount to a multi-level Bayesian active inference process. Analytically and in simulation, it shows that the proposed networks favor approximately orthogonalized attractor representations, enhancing generalization and the mutual information between hidden causes and observable effects.

Journal ref
Neurocomputing (2026): 133472
Comments
27 pages main text, 8 pages appendix, 7 figures; interactive manuscript available at: https://pni-lab.github.io/fep-attractor-network Associated GitHub repository: https://github.com/pni-lab/fep-attractor-network

Abstract

Attractor dynamics are a hallmark of many complex systems, including the brain. Understanding how such self-organizing dynamics emerge from first principles is crucial for advancing our understanding of neuronal computations and the design of artificial intelligence systems. Here we formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules and identifies emergent, but efficient and biologically plausible inference and learning dynamics for such self-organizing systems. These result in a collective, multi-level Bayesian active inference process. Attractors on the free energy landscape encode prior beliefs; inference integrates sensory data into posterior beliefs; and learning fine-tunes couplings to minimize long-term surprise. Analytically and via simulations, we establish that the proposed networks favor approximately orthogonalized attractor representations, a consequence of simultaneously optimizing predictive accuracy and model complexity. These attractors efficiently span the input subspace, enhancing generalization and the mutual information between hidden causes and observable effects. Furthermore, while random data presentation leads to symmetric and sparse couplings, sequential data fosters asymmetric couplings and non-equilibrium steady-state dynamics, offering a natural generalization of conventional Boltzmann Machines. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.

2605.22075 2026-05-22 cs.LG q-bio.QM

Can Breath Biomarkers Causally Influence Blood Glucose? Investigating VOC-Mediated Modulation in Diabetes

Varsha Sharma, Prasanta K. Guha, Avik Ghose

AI summary: This study explores a non-invasive, data-driven framework that uses volatile organic compounds (VOCs) and lifestyle variables to identify individuals at risk of diabetes. It applies causal inference techniques to estimate the impact of VOCs such as acetone, isopropanol, isoprene, and ethanol on blood glucose levels, designs a classifier to distinguish diabetics from non-diabetics, and builds a risk-based ranking system and a Gaussian Mixture Model to identify natural clusters.

Journal ref
Proceedings of the IJCAI workshop on Advanced Neural Systems for Next-Generation Biomedical Intelligence, 2025

Abstract

Diabetes is a global health burden, and early detection is critical for timely intervention. This study explores a non-invasive, data-driven framework to identify individuals at risk of diabetes using Volatile Organic Compounds (VOCs) and lifestyle variables. We use causal inference techniques to estimate the impact of VOCs such as acetone, isopropanol, isoprene, and ethanol on blood glucose levels. Additionally, we designed a classifier to distinguish diabetics from non-diabetics using non-invasive markers. We created a risk-based ranking system for individuals in the "gray zone" and identified natural clusters in the population using a Gaussian Mixture Model. Our results suggest that specific VOCs exhibit a strong causal influence on glucose levels and that machine learning models can reliably classify and stratify individuals at high risk. This integrated causal-explainable analysis can support the development of tools for non-invasive early screening of diabetes.

2605.22009 2026-05-22 cs.CE physics.med-ph q-bio.QM

SDFStent: Real-time interactive virtual stenting via SDF deformation fields

Bohan J. Li, Nicholas C. Dorn, Andras Lasso, Matthew A. Jolley, Jeffrey A. Feinstein, Doug L. James, Alison L. Marsden

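AI summary: This paper proposes an SDF-based mesh deformation method for quickly generating post-stenting simulation models. The method runs at real-time interactive speeds while maintaining mesh integrity and avoiding self-intersections, and its accuracy is validated against clinical measurements.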

Details
Comments
39 pages, 12 figures, 4 tables. Under review at Computer Methods and Programs in Biomedicine
Abstract

Stenting is among the most common transcatheter interventions for congenital heart disease (CHD). Patient-specific computational fluid dynamics (CFD) simulations can predict hemodynamic outcomes of intervention scenarios but require post-operative vascular geometries that reflect stent-induced shape changes, which existing tools either model inadequately or require extensive time or manual effort to generate. We present SDFStent, a signed distance function (SDF) based mesh deformation method for virtual stenting that operates in real time, maintains mesh integrity, and preserves junction geometry. The stent is modeled as a pipe surface composed of piecewise-capsule SDFs joined by a smooth-minimum operator. Mesh vertices near the expanding SDF surface are displaced along the SDF gradient with a compactly supported fall-off function and an alpha blending mask. SDFStent was benchmarked against three existing approaches and validated on three tetralogy of Fallot (ToF) patients and three coarctation of the aorta (CoA) patients using rigid-wall steady-state CFD simulations against clinical catheterization measurements. Against a prescribed diameter of 6.0 mm, the method produced a mean stented diameter of 5.92 $\pm$ 0.08 mm in 1.5 s, over 100$\times$ faster than the best stenting-specific comparator. All output meshes were watertight and self-intersection-free. CFD-simulated post-operative pressure drops agreed with clinical measurements within 4 mmHg (mean error 2 mmHg). SDFStent produces simulation-ready post-stent models that match prescribed stent dimensions at interactive speeds, from pre-operative anatomy and catheterization data alone. The implementation is open-source and available in 3D Slicer. Its scriptable architecture enables automated generation of large synthetic cohorts for data-driven surrogate modeling.
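
The deformation idea in this abstract can be illustrated with a minimal sketch. It is not the SDFStent implementation: the polynomial smooth minimum, the smoothstep fall-off, the single-step expansion from `r0` to `r1`, and all function names are assumptions chosen for illustration, and the alpha blending mask is omitted.

```python
import math

def capsule_sdf(p, a, b, r):
    """Signed distance from point p to a capsule with axis a-b and radius r."""
    pa = [pi - ai for pi, ai in zip(p, a)]
    ba = [bi - ai for bi, ai in zip(b, a)]
    t = sum(x * y for x, y in zip(pa, ba)) / sum(x * x for x in ba)
    t = max(0.0, min(1.0, t))  # clamp the projection to the segment
    return math.dist(pa, [t * x for x in ba]) - r

def smooth_min(d1, d2, k):
    """Polynomial smooth minimum with blend width k > 0 (an assumed choice)."""
    h = max(k - abs(d1 - d2), 0.0) / k
    return min(d1, d2) - 0.25 * k * h * h

def stent_sdf(p, centerline, r, k=0.5):
    """Pipe surface: smooth union of capsules between consecutive centerline points."""
    d = None
    for a, b in zip(centerline, centerline[1:]):
        c = capsule_sdf(p, a, b, r)
        d = c if d is None else smooth_min(d, c, k)
    return d

def unit_gradient(f, p, eps=1e-5):
    """Central-difference gradient of f at p, normalised to unit length."""
    g = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    norm = math.hypot(*g) or 1.0
    return [x / norm for x in g]

def falloff(d, support):
    """Weight 1 on or inside the surface, decaying smoothly to 0 at distance `support`."""
    if d <= 0.0:
        return 1.0
    if d >= support:
        return 0.0
    t = d / support
    return 1.0 - t * t * (3.0 - 2.0 * t)

def displace(vertex, centerline, r0, r1, support):
    """Move a vertex along the gradient of the unexpanded stent SDF (radius r0)."""
    f = lambda q: stent_sdf(q, centerline, r0)
    step = (r1 - r0) * falloff(f(vertex), support)
    g = unit_gradient(f, vertex)
    return [v + step * gi for v, gi in zip(vertex, g)]

centerline = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
on_wall = displace([2.5, 1.0, 0.0], centerline, r0=1.0, r1=1.5, support=2.0)
far_away = displace([2.5, 5.0, 0.0], centerline, r0=1.0, r1=1.5, support=2.0)
```

In this toy setup a vertex on the unexpanded surface moves out by the full radius change, while a vertex beyond the support distance stays put. A steep fall-off relative to the expansion could fold the mesh, which is presumably one reason a real method needs more care than this sketch takes.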

2605.21945 2026-05-22 q-bio.MN

A Characterization of Level-k Realizability for Clustering Systems

Shilong Dai, Yangjing Long

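AI summary: This paper studies level-k realizability of clustering systems. It gives a Hasse-diagram characterization of when a clustering system is the hardwired clustering system of a rooted level-k network, with a criterion based on a parameter μ(B) defined for each non-trivial block.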

Details
Comments
33 pages
Abstract

We give a Hasse-diagram characterization of when a clustering system $\mathcal C$ on a finite taxa set $X$ is the hardwired clustering system $C_N$ of a rooted level-$k$ network. For each non-trivial block $B$ of $H=\mathcal H[\mathcal C]$, we define a parameter $\mu(B)$ using minimum families of clusters that generate all overlap-intersections inside $B$. The main theorem proves that there exists a rooted level-$k$ network $N$ with $C_N=\mathcal C$ if and only if $\mu(B)\le k$ for every non-trivial block $B$ of $H$. The necessity proof shows that overlap-intersection pieces must be represented by non-root hybrid vertices in any realizing block. The sufficiency proof is constructive: starting from the Hasse diagram, it iteratively splits selected hybrid vertices, preserves the hardwired clustering system, and terminates with a realization whose level is bounded by the block-wise values of $\mu$.

2605.21859 2026-05-22 q-bio.PE cs.LG q-bio.QM

PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference

Yasha Ektefaie, Leo Cui, Shrey Jain, Marinka Zitnik, Pardis Sabeti

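AI summary: This work proposes PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in Billera-Holmes-Vogtmann tree space, improving the efficiency and accuracy of phylogenetic inference.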

Details
Comments
9 pages, 3 figures
Abstract

Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and topological changes as motion across shared lower-dimensional boundaries. We introduce PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in BHV tree space. PhylaFlow is trained on BHV geodesic paths from random starting trees to short-run posterior samples, coupling continuous branch-length motion within orthants with learned boundary events and discrete topology transitions. We evaluate the learned geometry operationally: if the flow reaches posterior-relevant regions, finite-budget Bayesian refinement initialized from, or guided by, its terminal trees should recover posterior-supported topologies more efficiently. Across DS1-DS8 phylogenetic posterior benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to classical initializers. After finite-budget MrBayes refinement, direct PhylaFlow improves early and intermediate topology-recovery trajectories on most datasets, while split-guided PhylaFlow-MCMC obtains the strongest hard-case results. The best PhylaFlow variant outperforms short-warmup on seven of eight datasets and PhyloGFN on five of eight under the same refinement budget. In a joint sequence-conditioned experiment, sequence embeddings steer posterior split recovery, although exact posterior topology recovery remains preliminary. These results show that hybrid flow matching can learn actionable transport in BHV tree space and provide a geometry-aware proposal mechanism for Bayesian phylogenetic inference.

2605.21787 2026-05-22 q-bio.PE

Drivers of Transient Dynamics and Persistence in Dengue: Insights from Sensitivity and Stochastic Modeling

Cesar Alberto Rosales-Alcantar, Marcos A. Capistrán

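AI summary: Using sensitivity analysis and stochastic modeling, this paper studies how key epidemiological parameters affect seasonal epidemics and the persistence of dengue transmission, produces a ranking of parameter importance, and provides a basis for prioritizing public health policies.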

Details
Abstract

We investigate how key epidemiological parameters shape both seasonal epidemics and the persistence of dengue transmission. Our findings confirm known mechanistic drivers of epidemic variability and introduce a ranking of parameter importance in our dengue model, which in turn informs the prioritization of public health policies. We propose a stochastic vector-host model with waning immunity, exogenous infection, and vertical transmission. To assess parameter influence, we first qualitatively analyze the macroscopic model. We then perform a multivariate Sobol sensitivity analysis of epidemic summary statistics, and examine the variance of the endemic equilibrium as a function of model parameters. We show that the macroscopic model is well posed, vertical transmission lowers the threshold for persistence, and low spatial coupling increases infectious endemic equilibria. The vector-host population ratio and host recovery rate have the largest first-order and total sensitivity indices, surpassing the contact rates; this implies that control measures during seasonal dengue should prioritize protecting infectious hosts from mosquito bites. Finally, we show that the covariance of hosts and vectors at the endemic equilibrium is asynchronous in the contact-rate plane. This robust pattern has epidemiological, ecological, and evolutionary interpretations. A dengue strain has two niches to exploit during the endemic regime, and coexisting strains have two niches each. Moreover, large fluctuations in a given strain during the endemic regime provide a mechanistic explanation for high vertical transmission, enabling viral reservoirs that can hatch and trigger outbreaks in the following season. We argue that our model and results can be adapted to address specific public health questions to guide dengue control using field data.

2605.21725 2026-05-22 q-bio.PE math.CO

Regularizing and Normalizing DAGs and Phylogenetic Networks

Marc Hellmuth, Anna Lindeberg, Vincent Moulton

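AI summary: This paper studies how to simplify DAGs and phylogenetic networks through regularization and normalization. The core approach is a simplification procedure based on LCAs and visibility; the main contributions are the i-regularization method and a unification of different simplification frameworks.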

Details
Abstract

Phylogenetic networks and, more generally, directed acyclic graphs (DAGs) represent hierarchical structure beyond trees, for instance in the presence of reticulate evolutionary events such as hybridization or horizontal gene transfer. A central question is which parts of such graphs are essential with respect to leaf-observable information, and which parts can be removed without changing this information. Resolving this question can lead to principled simplification methods for phylogenetic networks, such as the recent normalization approach of Francis et al. In this paper, we study this question from three related perspectives: clusters displayed by a DAG $G$, least common ancestors (LCAs) of subsets of its leaf set, and visibility, a path-based property of vertices. We first introduce an LCA-based simplification procedure called $i$-regularization. For a DAG $G$ and $i\geq 1$, the DAG $\operatorname{reg}_i(G)$ retains precisely those vertices that occur as unique LCAs of leaf subsets of size at most $i$, removes the remaining non-leaf vertices by a graph-editing operation $\ominus$, and then deletes shortcuts. We show that $\operatorname{reg}_i(G)$ preserves all such LCAs, is $i$-lca-relevant, and admits a cluster-level description: it is regular, i.e., isomorphic to the Hasse diagram of the corresponding lca-clusters. We then compare LCA-based regularization with normalization. Using the same $\ominus$-operator, we describe the cover construction underlying normalization, identify visible vertices that are nevertheless removed, and characterize when regularization and normalization coincide. Together, these results provide a unified framework for cluster-based, LCA-based, and visibility-based simplifications of DAGs and phylogenetic networks.

2605.21634 2026-05-22 q-bio.GN

bioETH-PRS: Confidential Polygenic Risk Scoring without a Trusted Evaluator via Fully Homomorphic Encryption on a Programmable Blockchain

Kimon Antonios Provatas, Christos Galanopoulos, Ilias Georgakopoulos-Soares

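AI summary: This study proposes bioETH-PRS, a protocol on a programmable blockchain that uses fully homomorphic encryption to compute polygenic risk scores confidentially without a trusted evaluator. It computes the PRS dot product within the encrypted domain using the integer-exact TFHE scheme, protecting the privacy of both genotype dosage vectors and GWAS weight vectors.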

Details
Comments
12 pages, 6 figures
Abstract

Polygenic risk scores (PRSs) aggregate genetic effect estimates to predict disease susceptibility, yet clinical deployment often exposes raw genotype data to third-party compute infrastructure. Prior homomorphic-encryption approaches still require trust in a designated evaluator. We present bioETH-PRS, a protocol that replaces that evaluator role with immutable smart contracts on a blockchain supporting Fully Homomorphic Encryption (fhEVM). Using the integer-exact TFHE scheme, bioETH-PRS computes the PRS dot product entirely within the encrypted domain, keeping both genotype dosage vectors and GWAS weight vectors hidden from external parties throughout execution. We introduce a three-step fixed-point quantisation scheme for representing signed GWAS weights as unsigned 64-bit integers, achieving machine-epsilon reconstruction accuracy on validated fixtures. A four-contract architecture separates data custody, model publication, computation, and output release, and supports both a classic chunked path and a streaming path, with the latter reducing mock-measured gas by 37%. An on-chain noisy output oracle emits an encrypted noisy-score handle and a publicly decryptable ternary category, reducing raw score exposure and probing risk. Prototype evaluation on real GWAS fixtures confirms linear gas scaling and suggests that the approach may be cost-competitive in low-gas deployment environments.
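
The abstract does not spell out its three-step quantisation scheme, so the following is only one plausible reading: scale, round, then shift into the unsigned range. The `SCALE` and `OFFSET` constants and both function names are hypothetical, and plain Python integers stand in for ciphertexts.

```python
SCALE = 10**6        # hypothetical fixed-point scale (6 decimal places)
OFFSET = 2**32       # hypothetical shift that makes every scaled weight non-negative
U64_MAX = 2**64 - 1

def quantise(weights):
    """Scale each weight, round to an integer, then shift into the unsigned range."""
    q = [round(w * SCALE) + OFFSET for w in weights]
    assert all(0 <= x <= U64_MAX for x in q), "weight outside representable range"
    return q

def prs_from_quantised(q_weights, dosages):
    """Integer-only dot product, followed by removal of the offset and the scale."""
    # The accumulation is the part an encrypted circuit would evaluate.
    acc = sum(q * g for q, g in zip(q_weights, dosages))
    assert acc <= U64_MAX, "accumulator would overflow 64 bits"
    # Each term carried OFFSET * dosage extra, so subtract OFFSET * sum(dosages).
    return (acc - OFFSET * sum(dosages)) / SCALE

weights = [0.125, -0.5, 0.03125]   # signed effect sizes (toy values)
dosages = [0, 1, 2]                # allele dosages in {0, 1, 2}
score = prs_from_quantised(quantise(weights), dosages)
```

Shifting by a constant keeps every operand unsigned during the dot product, at the cost of needing the dosage sum to undo the shift afterwards. How the actual protocol handles that correction under encryption is not stated in the abstract.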

2605.21522 2026-05-22 q-bio.QM cs.AI cs.CE cs.LG stat.ML

Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery

Kingsley Yeon, Xuefeng Liu, Promit Ghosal

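AI summary: This paper proposes an interpretable framework for protein-protein interaction (PPI) discovery that recasts the task as an interpretable search problem with explicit reasoning, using embedding-space flow matching and Tree-of-Thoughts search to improve prediction accuracy and interpretability.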

Details
Abstract

Protein-protein interactions (PPIs) govern nearly all cellular processes, yet computational methods for identifying binding partners typically produce ranked predictions without mechanistic justification. This creates a fundamental barrier to adoption because biologists cannot assess whether predictions reflect genuine biochemical insight or spurious correlations. We present Protein Thoughts, a framework that reformulates PPI discovery as an interpretable search problem with explicit reasoning. The system decomposes binding evidence into four biologically meaningful signals: sequence similarity reflecting evolutionary relationships, structural complementarity capturing geometric fit, interface balance, and chemical compatibility encoding residue-level interactions. Rather than collapsing these signals into an opaque score, we preserve their individual contributions through a transparent value function that enables both ranking and auditing. To navigate large candidate spaces efficiently, we introduce hypothesis-guided entropy-regularized Tree-of-Thoughts search. A fine-tuned language model generates search directives from embedding-derived features, classifying candidates as high-priority, exploratory, or skippable. These directives condition a Boltzmann policy that balances exploitation with entropy-driven exploration, while hypothesis-aware pruning prevents premature abandonment of promising candidates. For candidates exhibiting score disagreement, hypothesis-conditioned embedding-space flow matching transports protein embeddings toward the binder manifold. On the SHS148k benchmark, Protein Thoughts achieves a mean best-binder rank of 11.2 versus 47.7 for an entropic tree search baseline, a 76% improvement, and for binding prediction the trained value function achieves $91.08 \pm 0.19$ Micro-F1, outperforming existing PPI methods on the same dataset.
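
A directive-conditioned Boltzmann policy of the kind mentioned above can be sketched as a temperature-scaled softmax. How the paper actually conditions on directives is not given in the abstract; the additive `DIRECTIVE_BIAS` table and both function names below are assumptions for illustration.

```python
import math

# Hypothetical additive biases for the three directive classes named in the abstract.
DIRECTIVE_BIAS = {"high-priority": 1.0, "exploratory": 0.0, "skippable": -1.0}

def boltzmann_policy(values, directives, temperature):
    """Selection probabilities proportional to exp((value + bias) / temperature)."""
    logits = [(v + DIRECTIVE_BIAS[d]) / temperature for v, d in zip(values, directives)]
    top = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

values = [0.6, 0.5, 0.4]
directives = ["exploratory", "high-priority", "skippable"]
cold = boltzmann_policy(values, directives, temperature=0.1)   # near-greedy
hot = boltzmann_policy(values, directives, temperature=5.0)    # near-uniform
```

A low temperature concentrates probability on the best-scoring candidate (exploitation), while a high temperature flattens the distribution and raises its entropy (exploration).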

2605.21506 2026-05-22 q-bio.NC cs.NE

Canonical Functionalism: Defining Functional Structure without Observer-Relative Semantic Maps

Ryota Kanai, Shuqin Ma

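AI summary: This paper proposes a mathematical refinement of functionalism that avoids reliance on observer-relative interpretations of physical systems by defining a system's canonical functional structure. The structure is obtained by identifying internal states with identical future behavior, rather than from arbitrary input-output mappings or externally imposed computational descriptions.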

Details
Abstract

Computational functionalism about consciousness is often criticized for relying on observer-relative interpretations of physical systems. This paper proposes a mathematical refinement of functionalism that avoids this problem. The central idea is that consciousness-relevant functional organization should be identified not with arbitrary input-output mappings, semantic labels, or externally imposed computational descriptions, but with a system's canonical functional structure: the minimal state-transition structure obtained by identifying internal states that have identical future behavior under all possible continuations. On this view, a state is functionally defined by its complete counterfactual role: how the system would evolve and respond from that state under possible future interactions. We call this position canonical functionalism. The framework does not claim to identify which systems are conscious, nor to show that functional organization is sufficient for consciousness. Rather, it identifies the canonical object over which functionalist theories of consciousness should be formulated: the task is to specify consciousness-relevant invariants, measures, or structural conditions over canonical functional structures, rather than over arbitrary semantic interpretations or superficial behavioral profiles. This reframes familiar objections about lookup tables, simulations, unfolding, and observer-relative computation: such cases do not by themselves refute functionalism, but force the functionalist to specify whether the relevant canonical structure is preserved, and if not, which additional structural features are missing.
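
For a finite deterministic machine, "identifying internal states that have identical future behavior under all possible continuations" corresponds to classical state minimisation. The sketch below uses Moore-style partition refinement on a toy machine. The paper's formal setting may be more general, and the function name, state labels, and transition table are all hypothetical.

```python
def canonical_structure(states, inputs, delta, output):
    """Partition states into classes with identical future input-output behaviour.

    delta maps (state, input) -> next state; output maps state -> observable output.
    Returns a dict mapping each state to the id of its equivalence class.
    """
    block = {s: output[s] for s in states}          # start from immediate outputs
    while True:
        # A state's signature: its own class plus the class reached under each input.
        signature = {
            s: (block[s], tuple(block[delta[s, a]] for a in inputs)) for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(set(refined.values())) == len(set(block.values())):
            return refined                          # no class was split: stable
        block = refined

states = ["A", "B", "C", "D", "E"]
inputs = [0, 1]
output = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}
delta = {
    ("A", 0): "B", ("A", 1): "C",
    ("B", 0): "A", ("B", 1): "D",
    ("C", 0): "C", ("C", 1): "A",
    ("D", 0): "D", ("D", 1): "B",
    ("E", 0): "C", ("E", 1): "C",
}
classes = canonical_structure(states, inputs, delta, output)
```

In this toy machine A and B collapse into one class, as do C and D. E shares A's immediate output but diverges after one step, so it stays separate. The quotient machine over these classes is the minimal state-transition structure in the finite deterministic case.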

2605.21502 2026-05-22 q-bio.MN cs.AI cs.LG

Graph neural network explanations reveal a topological signature of disease-associated hubs in biological networks

Kyle Higgins, Ivan Laponogov, Dennis Veselkov, Kirill Veselkov

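AI summary: This paper studies how explanation methods for graph neural networks identify disease-relevant structure in biological networks. It finds that different explainers behave differently on sparse single-node drivers and distributed pathway-like signals, and proposes a framework combining a shell-based hub score with consensus ranking across explainers, improving prioritization of cancer genes and recovery of biologically relevant molecules.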

Details
Comments
25 pages (excluding supplement), 7 figures, 7 supplementary tables
Abstract

Graph neural networks (GNNs) are increasingly used to model biological systems, yet the reliability of post-hoc explanation methods for recovering meaningful molecular mechanisms remains unclear. Here, we systematically evaluate four widely used approaches: Saliency Attribution (SA), Integrated Gradients (IG), GNNExplainer, and Layer-wise Relevance Propagation (LRP) for identifying disease-relevant structure in breast cancer RNA-seq data projected onto a protein-protein interaction network. Using synthetic benchmarks with known ground-truth motifs, we show that explanation methods recover distinct signal organizations: SA performs best for sparse single-node drivers, whereas IG and LRP preferentially recover distributed pathway-like and cascade-like signals. In TCGA BRCA data, we identify a consistent topological signature of disease-associated hubs in which attribution peaks in the immediate 1-hop neighborhood and decays across successive network shells, a pattern most pronounced for IG and LRP and associated with strong enrichment of known cancer hubs. We further observe a trade-off between local hub enrichment and global gene ranking performance, with IG optimizing local enrichment and SA achieving superior global discrimination. Motivated by these complementary behaviors, we introduce a framework combining a shell-based hub score with consensus ranking across explainers. Consensus scores improve prioritization of canonical cancer genes (TP53, BRCA1, ESR1, MYC), reduce dependence on node degree, and, especially when tuned, outperform individual methods. Pathway enrichment further reveals improved recovery of biologically coherent cancer programs, including ERBB2, RTK, MAPK, immune, and cytokine signaling. Together, these results demonstrate that topology-aware integration of graph explanations can improve biological interpretability and biologically relevant molecular recovery.

2605.16108 2026-05-22 stat.ME q-bio.QM stat.AP

Estimating Association Between Paired Outcomes in Clustered Data with Informative Subgroup Size

Owen Visser, Somnath Datta

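AI summary: This paper proposes three weighted estimating approaches for the marginal association between paired outcomes in clustered data. The weights are derived from within-cluster resampling and extend inverse cluster-size and subgroup-size weighting; the paper also modifies an existing ISS testing procedure with Stouffer's method to reduce computational burden.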

Details
Abstract

Informative cluster size (ICS) and informative subgroup size (ISS) can distort marginal association estimates when the number of observed units, or their distribution across outcome-defined categories, is related to the outcomes under study. This issue is especially relevant for paired outcomes, where the observed association can depend on cluster size, paired-category composition, and the process by which units become available for analysis. We propose three weighted estimating approaches for marginal association between paired outcomes in clustered data. The weights are derived from within-cluster resampling arguments and extend inverse cluster-size and subgroup-size weighting to paired outcome categories. We also modify an existing ISS testing procedure by utilizing Stouffer's method to reduce computational burden. To evaluate the methods, we develop a simulator for clustered paired outcomes that separates unit-level association, latent cluster-level association, and outcome-dependent retention. Simulations show that pair-based weighting can reduce bias when association arises through unit-level dependence and subgroup composition is informative, but can attenuate association carried by latent cluster-level structure. Typical inverse-cluster weighting remains more stable when the association is primarily cluster-level. Application to NHANES oral-health data shows small positive periodontal and caries associations overall, with filled-surface outcomes showing stronger ISS evidence and greater sensitivity to pair-based weighting than decayed-surface outcomes. These results indicate that marginal association under ICS and ISS should be interpreted in relation to the source of association, observed-unit structure, and assumptions used to choose the weighting scheme.
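
Stouffer's method, which the abstract uses to lighten the ISS test, combines independent z-scores into a single statistic. The sketch below shows the standard weighted form only. How the authors plug it into their testing procedure is not described in the abstract, and the function names are illustrative.

```python
import math

def stouffer_z(z_scores, weights=None):
    """Combined statistic Z = sum(w_i * z_i) / sqrt(sum(w_i^2)) for independent z-scores."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))

def upper_tail_p(z):
    """One-sided p-value P(N(0,1) >= z), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

combined = stouffer_z([1.0, 1.0, 1.0, 1.0])   # four modest, concordant signals
p_value = upper_tail_p(combined)
```

With equal weights the combined statistic grows like the square root of the number of tests, so several individually weak but concordant z-scores can yield a clearly significant combined result.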

2603.21743 2026-05-22 cs.LG q-bio.QM

CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning

Dongxia Wu, Shiye Su, Yuhui Zhang, Elaine Sui, Emma Lundberg, Emily B. Fox, Serena Yeung-Levy

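AI summary: This paper proposes CellFluxRL, which post-trains a virtual cell model with reinforcement learning so that its outputs better respect biological function, structural validity, and morphological correctness, making virtual cell modeling more biologically meaningful.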

Details
Abstract

Building virtual cells with generative models to simulate cellular behavior in silico is emerging as a promising paradigm for accelerating drug discovery. However, prior image-based generative approaches can produce implausible cell images that violate basic physical and biological constraints. To address this, we propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond "visually realistic" generations towards "biologically meaningful" ones.

2602.01604 2026-05-22 physics.bio-ph q-bio.SC

Thermodynamic cost-controllability tradeoff in metabolic currency coupling

Jumpei F. Yamagishi, Tetsuhiro S. Hatakeyama

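AI summary: This paper studies how coupling between the energetic states of metabolic currency metabolites affects metabolic regulation, efficiency, and evolution, and presents a theoretical model that reveals a fundamental tradeoff between metabolic controllability and thermodynamic cost.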

Details
Comments
17 pages, 9 figures, 2 tables
Abstract

Cellular metabolism is globally regulated by various currency metabolites such as ATP, GTP, and NAD(P)H. These metabolites cycle between charged (high-energy) and uncharged (low-energy) states to mediate energy transfer. While distinct currency metabolites are associated with different metabolic functions, their charged and uncharged forms are generally interchangeable via biochemical reactions such as ${\rm ATP{\,+\,}GDP{\,\rightleftharpoons\,}ADP{\,+\,}GTP}$ and $\rm NADP^+{\,+\,}NADH{\,\rightleftharpoons\,}NADPH{\,+\,}NAD^+ $. Thus, their energetic states are generally coupled and influence each other, which would hinder the independent regulation of different currency metabolites. Despite the extensive knowledge of the molecular biology of individual currency metabolites, it remains poorly understood how the coordination of various coupled currency metabolites shapes metabolic regulation, efficiency, and ultimately the evolution of organisms. Here, we present a minimal theoretical model of metabolic currency coupling and reveal a fundamental tradeoff relationship between metabolic controllability and thermodynamic cost: increasing the capacity to independently regulate multiple currency metabolites generally requires comparable abundances of those metabolites, which in turn incurs a higher entropy production rate. The tradeoff suggests that in complex environments, organisms evolutionarily favor an equal abundance of currency metabolites to enhance metabolic controllability at the expense of a higher thermodynamic cost; conversely, in simple environments, organisms evolve to have imbalanced amounts of them to reduce heat dissipation. These considerations also offer a hypothesis regarding evolutionary trends in nucleotide-pool balance and genomic GC content.

2601.14577 2026-05-22 q-bio.QM

FBApro: A fast, simple linear transformation for diverse metabolic modeling tasks

Ariel Bruner, Mona Singh

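AI summary: This paper proposes FBApro, a general alternative to optimization for metabolic modeling under linear steady-state constraints. It is computed efficiently as a fast linear transformation, requires no cellular objective function, and is validated on synthetic and real data.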

Details
Comments
23 pages, 6 figures
Abstract

Constraint-based metabolic modeling is the predominant framework for simulating cellular metabolism. The central assumption of these models is that metabolism operates at a steady state, meaning that the production and consumption rates of each metabolite are balanced. This assumption imposes linear constraints on the fluxes of biochemical reactions. Flux Balance Analysis (FBA), a fundamental method in the field, is formulated as an optimization problem maximizing a cellular objective (e.g., growth) over the resulting linear subspace of steady-state fluxes. Many other methods in the field either are formulated as modifications of FBA or use FBA as a black box within an algorithm. Here, we propose a general alternative to optimization called FBApro. For any given vector of reference fluxes, FBApro finds the closest flux vector within the steady-state subspace, and accounts for both partially given reference fluxes and exact constraints on reactions. While FBApro is the solution to a quadratic program, we show that it can be implemented as a single linear operation using orthogonal projections to corresponding affine spaces and sets of linear equations. The overall approach is computationally efficient, does not require a cellular objective, and is easy to implement. We formally derive the closed-form expressions for FBApro and simpler variants, and validate it on both synthetic and real cancer cell line data.
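
The simplest case described above, finding the steady-state flux vector closest to a full reference vector, is an orthogonal projection onto the null space of the stoichiometric matrix $S$: $v^* = v_{\mathrm{ref}} - S^\top (S S^\top)^{-1} S\, v_{\mathrm{ref}}$. The sketch below assumes $S$ has full row rank and uses a toy linear pathway. It does not cover partially given references or exact reaction constraints, and the function names are illustrative rather than the paper's API.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def project_to_steady_state(S, v_ref):
    """Closest vector to v_ref (Euclidean norm) satisfying S v = 0; S must have full row rank."""
    m, n = len(S), len(v_ref)
    Sv = [sum(S[i][j] * v_ref[j] for j in range(n)) for i in range(m)]
    SSt = [[sum(S[i][k] * S[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    lam = solve(SSt, Sv)                               # (S S^T) lam = S v_ref
    return [v_ref[j] - sum(S[i][j] * lam[i] for i in range(m)) for j in range(n)]

# Toy pathway: uptake -> A -> B -> export (rows: metabolites A, B; columns: 3 reactions).
S = [[1, -1, 0],
     [0, 1, -1]]
v_ref = [3.0, 2.0, 1.0]            # unbalanced reference fluxes
v_star = project_to_steady_state(S, v_ref)
```

For this pathway the only steady states have equal flux through all three reactions, so the projection averages the reference values. Because the map from `v_ref` to `v_star` is linear, it can be precomputed once as a matrix and reused across many reference vectors.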

2511.04838 2026-05-22 cs.LG math.SP q-bio.MN

SPECTRA: Spectral Domain-Aware Graph Generation for Imbalanced Molecular Property Regression

SPECTRA: Spectral-domain-aware graph generation for imbalanced molecular property regression

Brenda Nogueira, Gisela A. Gonzalez-Montiel, Meng Jiang, Nitesh V. Chawla, Nuno Moniz

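AI summary: This paper proposes SPECTRA, which combines a rarity-aware budgeting scheme, target-neighbor graph alignment, and interpolation of Laplacian spectra to improve the prediction of relevant but data-scarce molecular property values. In relevant target ranges it is competitive with state-of-the-art methods while requiring roughly 4x less computation time.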

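Details
AI abstract (translated from Chinese)

Molecular property regression struggles in chemically relevant target ranges that are underrepresented in datasets. Standard approaches that minimize average error underperform in these highly relevant cases, while oversampling approaches produce meaningless molecular representations. This paper proposes SPECTRA, a spectral, domain-aware graph generation method designed to improve the prediction of relevant but data-scarce molecular property values. It combines a rarity-aware budgeting scheme that focuses generation on data-scarce regions, target-neighbor graph alignment that establishes structural correspondence, and interpolation of Laplacian spectra, node features, and targets. Coupled with a spectral graph neural network that uses edge-aware Chebyshev convolutions, SPECTRA performs well on property prediction benchmarks, remaining competitive with state-of-the-art methods in relevant target ranges while requiring roughly 4x less computation time.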

English abstract

Molecular property regression struggles with cases in chemically relevant target ranges that are underrepresented in datasets. Standard average error minimization approaches underperform in these highly relevant cases, and oversampling approaches lead to meaningless molecular representations. In this paper, we propose SPECTRA, a spectral, domain-aware graph generation method designed to improve the prediction of underrepresented but relevant molecular property values. It combines a rarity-aware budgeting scheme to focus generation where data are scarce, target-neighbors graph alignment to establish structural correspondence, and interpolation of Laplacian spectra, node features, and targets. Coupled with spectral GNN using edge-aware Chebyshev convolutions, SPECTRA shows its effectiveness in property prediction benchmarks with competitive performance over leading state-of-the-art methods in relevant target ranges, while requiring ~4x less computational time.
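
For readers unfamiliar with Chebyshev convolutions on graphs, the following is a minimal sketch of a plain Chebyshev polynomial filter over the normalized graph Laplacian. It is not SPECTRA's edge-aware variant, which the abstract does not specify. The function names, the scalar coefficients `theta`, and the toy path graph are assumptions made for illustration, and the graph is assumed to be undirected with at least one edge.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5  # isolated nodes keep a zero scaling factor
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def chebyshev_filter(A, X, theta):
    """Compute sum_k theta[k] * T_k(L_tilde) @ X with the Chebyshev recurrence."""
    X = np.asarray(X, dtype=float)
    L = normalized_laplacian(A)
    lam_max = np.linalg.eigvalsh(L).max()
    # Rescale the spectrum from [0, lam_max] to [-1, 1].
    L_tilde = (2.0 / lam_max) * L - np.eye(len(L))
    T_prev, T_curr = X, L_tilde @ X            # T_0 X and T_1 X
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        # T_k = 2 * L_tilde * T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2.0 * (L_tilde @ T_curr) - T_prev
        out = out + theta[k] * T_curr
    return out

# Hypothetical toy graph: a path with 3 nodes and one-hot node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)
H = chebyshev_filter(A, X, theta=[0.5, 0.25, 2.0])
```

A filter with K coefficients mixes information from nodes up to K-1 hops apart without ever computing an eigendecomposition of the features. In a trained network the coefficients would typically be learned weight matrices rather than the fixed scalars used here.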

2509.06503 2026-05-22 cs.AI q-bio.QM

An AI system to help scientists write expert-level empirical software

An AI system that helps scientists write expert-level empirical software

Eser Aygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston, Anton Kast, Cory Y. McLean, Peter Norgaard, Zahra Shamsi, David Smalling, James Thompson, Subhashini Venugopalan, Brian P. Williams, Chujun He, Sarah Martinson, Martyna Plomecka, Lai Wei, Yuchen Zhou, Qian-Ze Zhu, Matthew Abraham, Erica Brand, Anna Bulanova, Jeffrey A. Cardille, Chris Co, Scott Ellsworth, Grace Joseph, Malcolm Kane, Ryan Krueger, Johan Kartiwa, Dan Liebling, Jan-Matthis Lueckmann, Paul Raccuglia, Xuefei Wang, Katherine Chou, James Manyika, Yossi Matias, John C. Platt, Lizzie Dorfman, Shibl Mourad, Michael P. Brenner

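AI summary: This paper presents Empirical Research Assistance (ERA), a system that uses a large language model and tree search to automatically create high-quality scientific software, speeding up the development of computational experiments and thereby improving research efficiency.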

Details
Comments
78 pages, 31 figures, 22 tables
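AI abstract (translated from Chinese)

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a large language model (LLM) and tree search (TS) to systematically improve the quality metric and intelligently navigate the space of possible solutions. ERA achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a variety of tasks. In bioinformatics, ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, ERA generated 14 models that outperformed the CDC ensemble and all other individual models at forecasting COVID-19 hospitalizations. ERA also produced expert-level software for geospatial analysis, neural activity prediction in zebrafish, and numerical solution of integrals, as well as a novel rule-based construction for time series forecasting. By designing and implementing novel solutions to diverse tasks, ERA represents a significant step toward accelerating scientific progress.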

English abstract

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments\cite{hannay2009how}. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS)\cite{silver2016mastering} to systematically improve the quality metric and intelligently navigate the large space of possible solutions. ERA achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a diverse range of tasks. In bioinformatics, ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, ERA generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. ERA also produced expert-level software for geospatial analysis, neural activity prediction in zebrafish, and numerical solution of integrals, and a novel rule-based construction for time series forecasting. By devising and implementing novel solutions to diverse tasks, ERA represents a significant step towards accelerating scientific progress.
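
The abstract does not say which tree search variant ERA uses, so the following is only a generic best-first search sketch of the overall pattern: propose variations of a candidate, score each one with a quality metric, and expand the most promising candidate next. Everything here is hypothetical. `propose` stands in for an LLM that rewrites a program, `score` stands in for running the program and measuring its quality, and the integer candidates are a toy replacement for source code.

```python
import heapq
from itertools import count

def best_first_search(root, propose, score, budget):
    """Expand the highest-scoring candidate first until `budget` evaluations are used."""
    tie = count()  # tie-breaker so the heap never has to compare candidates directly
    best, best_score = root, score(root)
    frontier = [(-best_score, next(tie), root)]  # max-heap via negated scores
    evaluated = 1
    while frontier and evaluated < budget:
        _, _, node = heapq.heappop(frontier)
        for child in propose(node):
            s = score(child)
            evaluated += 1
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, (-s, next(tie), child))
            if evaluated >= budget:
                break
    return best, best_score

# Toy stand-ins: candidates are integers and quality peaks at 3.
def propose(x):
    return [x - 1, x + 1]

def score(x):
    return -(x - 3) ** 2

best, best_score = best_first_search(root=0, propose=propose, score=score, budget=20)
print(best, best_score)  # prints: 3 0
```

The evaluation budget matters because scoring a candidate is usually the expensive step when each candidate is a program that must be executed. This sketch never prunes or deduplicates the frontier, which a practical system would likely need.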