arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.18701 2026-05-19 cs.LG q-bio.QM

Learning Normal Representations for Blood Biomarkers

Aashna P. Shah, Michelle M. Li, Yash Lal, Seffi Cohen, Liat F. Antwarg, Morgan Sanchez, James A. Diao, Chirag J. Patel, Ben Y. Reis, Ran D. Balicer, Noa Dagan, Arjun K. Manrai

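AI summary: This study proposes NORMA, a framework that generates more precise reference intervals by combining a patient's history with population-level data, improving personalized interpretation of blood biomarkers while avoiding the misdiagnosis risk caused by over-personalization.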

Abstract

Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, inter-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risking delayed disease detection. To remedy this, there have been increasing efforts to personalize blood biomarker interpretation using individual testing histories. However, these methods may overfit to sparse data, inflating false-positive rates and unnecessary follow-up, and can also unwittingly include unrecognized or subclinical disease. Here, we leverage nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, to show that while laboratory values are highly individual, purely personalized intervals routinely overfit, classifying up to 68% of measurements as abnormal, without corresponding associations with adverse clinical outcomes. We then introduce NORMA, a conditional transformer-based framework that generates reference intervals by conditioning on both a patient's history and population-level data about "normal" variation. NORMA-derived intervals achieve higher precision for predicting outcomes, including mortality, acute kidney injury, and chronic disease. These findings caution against over-personalization in laboratory medicine and demonstrate that anchoring individual trajectories to population-level priors outperforms either approach alone. To promote transparency, we publicly release the model, code, and an interactive user interface for accessible, individualized laboratory interpretation.
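
The idea of anchoring an individual trajectory to a population-level prior can be illustrated with a much simpler stand-in than the conditional transformer described above. The sketch below is not NORMA: it uses a conjugate normal-normal update, and every name and number in it (`anchored_interval`, `pop_mean`, `pop_sd`, `within_sd`, the example values) is a hypothetical choice made for illustration.

```python
from statistics import mean

def anchored_interval(history, pop_mean, pop_sd, within_sd, z=1.96):
    """Reference interval centred on a personal mean shrunk toward the population mean.

    Assumes (for illustration only) a normal prior N(pop_mean, pop_sd^2) on the
    patient's set point and normal measurement noise with known sd within_sd.
    """
    n = len(history)
    prior_prec = 1.0 / pop_sd ** 2          # precision of the population prior
    data_prec = n / within_sd ** 2          # precision contributed by the history
    post_var = 1.0 / (prior_prec + data_prec)
    personal_mean = mean(history) if n else pop_mean
    post_mean = post_var * (prior_prec * pop_mean + data_prec * personal_mean)
    # Spread of the next measurement: set-point uncertainty plus measurement noise.
    pred_sd = (post_var + within_sd ** 2) ** 0.5
    return post_mean - z * pred_sd, post_mean + z * pred_sd

# Hypothetical analyte with population mean 1.0 and a patient who runs low.
population = dict(pop_mean=1.0, pop_sd=0.2, within_sd=0.1)
sparse = anchored_interval([0.7, 0.72], **population)       # two prior tests
dense = anchored_interval([0.7, 0.72] * 10, **population)   # twenty prior tests
print(sparse, dense)
```

With only two prior tests the interval stays pulled toward the population mean; with twenty it moves toward the patient's own baseline and narrows, which is the qualitative behaviour the abstract argues for.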

2605.18685 2026-05-19 q-bio.QM

Multi-objective Bayesian inference in an agent-based model of zebrafish patterns via topological data analysis

Yue Liu, Alexandria Volkening

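AI summary: This paper proposes a multi-objective approach that combines topological techniques with Bayesian computation to infer parameters of detailed models. Applied to an agent-based model of zebrafish patterns, it achieves practical identifiability in several case studies and, by introducing extended prior distributions, reframes parameter inference as rule inference to find an alternative, simpler model consistent with the data.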

Comments
28 pages, 12 figures
Abstract

Spatial patterns arising from the collective behavior of individual agents are present across biological systems. While agent-based models offer a natural framework for uncovering unknown agent (e.g., cell) interactions, these stochastic models face significant challenges. For spatial patterns, agent-based modeling often involves manual tuning to attain qualitative consistency with multiple experiments. This process limits predictive power and raises questions about parameter identifiability and model uniqueness. Combining topological techniques and Bayesian computation, we present a multi-objective methodology for parameter inference in detailed models. We illustrate our approach by inferring parameters in an agent-based model of zebrafish patterns, achieving practical identifiability in several case studies. By introducing extended prior distributions, we then reframe parameter inference as rule inference, allowing us to search across over 80 candidate agent-based rules to identify an alternative, simpler model consistent with our data.

2605.18588 2026-05-19 stat.ME q-bio.QM

OSSMM: An Open-Source Sleep Monitor and Modulator

Jonny Giordano, Fergal Stapleton, Gabriel Palma, Barak A. Pearlmutter

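AI summary: This paper presents OSSMM, an open-source hardware and software platform for accessible sleep research. It pairs a small wearable headband, built from low-cost 3D prints and commercial off-the-shelf components, with an Android application. Without conductive gels, disposable electrodes, or specialized equipment, it captures multiple biosignals over a wireless connection for sleep stage classification and potential sleep modulation.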

Comments
8 pages
Abstract

We present the Open-Source Sleep Monitor and Modulator (OSSMM), an open-source hardware and software platform for accessible sleep research. The OSSMM comprises a small wearable headband built from 3D prints and affordable commercial-off-the-shelf (COTS) components at a material cost under 40 euros, supported by a companion Android application. The system requires no conductive gels, disposable electrodes, or specialized equipment, and captures multiple biosignals, namely movement, pulse, electrooculography (EOG), and putative electroencephalography (EEG), with wireless connectivity for data storage and potential sleep modulation capability via an onboard vibration motor. A proof-of-concept single-participant evaluation across 15 nights demonstrated that the captured biosignals support four-stage sleep classification (Wake, Light Sleep, Deep Sleep, REM) using conventional machine learning methods, with the best-performing model achieving a Macro F1-score of 0.770 and accuracy of 0.776 against a validated non-contact sleep monitor ($\kappa = 0.63$ with PSG). Two technical findings are of particular note. First, inexpensive, reusable conductive thermoplastic polyurethane (CTPU) electrodes from commercial fitness chest straps captured a differential signal whose spectral properties in canonical EEG frequency bands, including signatures consistent with sleep spindles, are the principal features driving classification. Second, this signal is obtained from just two frontal electrodes without a dedicated ground reference, suggesting that practical sleep staging is achievable with simpler configurations than typically employed. All hardware designs, software, and build instructions are openly available to support replication and modification by the research community.

2605.18571 2026-05-19 q-bio.PE q-bio.QM

Incorporating vaccine effects into epidemiological models: common pitfalls and solutions

Casey E. Middleton, Oliver Eales, James M. McCaw, Freya M. Shearer

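AI summary: This paper examines how to incorporate vaccine effects correctly into epidemiological models, identifies a mismatch between vaccine efficacy estimates and model parameters, and proposes an adjusted parameterization approach to improve model accuracy.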

Abstract

Incorporating vaccination into mathematical models appears deceptively simple: models integrate vaccine-derived protections, such as reduced susceptibility to infection, using parameters informed by empirical estimates of vaccine efficacy or effectiveness (VE). In practice, however, empirical VE estimates often do not correspond directly to the parameters of epidemiological models. Here, we extend previous work to demonstrate that in order to accurately parameterize a model, one must consider both a vaccine's mechanism of action and the statistic used to infer VE from empirical data. When a vaccine confers leaky protection (that is, vaccination partially rather than completely reduces individual infection risk), we show that common empirical VE estimation methods do not provide directly applicable values for model parameters. Naive (i.e., direct) incorporation of these VE estimates into models results in an underestimate of population-level vaccine impact. To make progress when these estimates are the only available sources for VE, we introduce a parameterization approach which more accurately aligns the modeled effect of vaccination with empirical estimates. Under this adjusted parameterization approach, models predict fewer total infections and lower herd immunity thresholds for leaky vaccines than would be predicted under current parameterization practices. Our parameterization guidelines and adjustment approach can be used to improve accuracy in models that are used in vaccine decision making and public health planning.
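
One textbook instance of this mismatch can be worked through numerically. The sketch below assumes a constant force of infection with no epidemic feedback, and a leaky vaccine that multiplies the per-exposure hazard by `1 - eps`; under those assumptions, a VE measured as one minus the ratio of cumulative risks falls below `eps` as exposure accumulates. The function names and the value `eps_true = 0.6` are hypothetical, and the inversion shown is only the standard algebra for this simplified setting, not necessarily the adjustment proposed by the authors.

```python
import math

def cumulative_incidence_ve(eps, cum_hazard):
    """VE measured as 1 - risk ratio after cumulative exposure cum_hazard
    (force of infection multiplied by follow-up time)."""
    risk_unvacc = 1.0 - math.exp(-cum_hazard)
    risk_vacc = 1.0 - math.exp(-(1.0 - eps) * cum_hazard)
    return 1.0 - risk_vacc / risk_unvacc

def leaky_eps_from_ve(ve, risk_unvacc):
    """Invert the relation above: per-exposure reduction implied by a
    risk-ratio VE and the attack rate observed in the unvaccinated group."""
    risk_vacc = (1.0 - ve) * risk_unvacc
    return 1.0 - math.log(1.0 - risk_vacc) / math.log(1.0 - risk_unvacc)

eps_true = 0.6  # hypothetical per-exposure reduction in susceptibility
for cum_hazard in (0.05, 0.5, 2.0):
    ve = cumulative_incidence_ve(eps_true, cum_hazard)
    attack = 1.0 - math.exp(-cum_hazard)
    recovered = leaky_eps_from_ve(ve, attack)
    print(f"cum_hazard={cum_hazard}: measured VE={ve:.3f}, recovered eps={recovered:.3f}")
```

Plugging the measured VE straight into the model as `eps` would therefore understate the protection per exposure, which is consistent with the direction of bias the abstract reports.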

2605.18557 2026-05-19 cs.LG cs.NE q-bio.NC

Self-supervised local learning rules learn the hidden hierarchical structure of high-dimensional data

Ariane Delrocq, Wu S. Zihan, Guillaume Bellec, Wulfram Gerstner

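AI summary: This paper studies local learning rules on the Random Hierarchy Model. Rules of the first type fail, a failure traced to input-specific nonlinearities ('masking'), whereas rules of the second type learn the hierarchical structure effectively while being data-efficient and biologically plausible.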

Abstract

The brain learns abstract representations of high-dimensional sensory input, but the plasticity rules that enable such learning are unknown. We study biologically plausible algorithms on the Random Hierarchy Model (RHM), an artificial dataset designed to investigate how deep neural networks learn the intrinsic hierarchical structure of high-dimensional data. We focus on two types of local learning rules that avoid both a long convergence time and the use of a symmetric error network. The first type uses direct feedback signals to approximate error propagation from the output layer. The second type uses layerwise self-supervised contrastive or non-contrastive loss functions that do not explicitly approximate errors at the output layer. We show that all rules of the first type fail to solve the tasks of the RHM and trace this failure back to input-specific nonlinearities ('masking') that are implemented in full backpropagation and are essential for learning complex tasks. However, algorithms of the second type are able to learn the hierarchical hidden structure of the RHM tasks and are as data-efficient as supervised backpropagation training, while being compatible with known rules of synaptic plasticity in cortex.

2605.18552 2026-05-19 cs.LG q-bio.BM q-bio.QM

Protein Fold Classification at Scale: Benchmarking and Pretraining

Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt

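AI summary: This paper introduces TEDBench, a large-scale, non-redundant benchmark for protein fold classification built from the Encyclopedia of Domains and Foldseek-clustered AlphaFold structures. On this benchmark, the authors propose Masked Invariant Autoencoders (MiAE), a framework that learns protein structure representations with a high masking ratio and an SE(3)-invariant encoder, and that achieves strong performance on TEDBench.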

Comments
Accepted at ICML 2026 (spotlight)
Abstract

Classifying protein topology is essential for deciphering biological function, but progress is held back by the lack of large-scale benchmarks that avoid duplicates and by models that do not scale well. We introduce TEDBench, a large-scale, non-redundant benchmark for protein fold classification constructed from the Encyclopedia of Domains (TED) and Foldseek-clustered AlphaFold structures. We show that on TEDBench, current protein representation learning methods either require very large models or fail to deliver strong performance. To address this challenge, we propose Masked Invariant Autoencoders (MiAE), a self-supervised framework for protein structure representation learning. MiAE uses an extremely high masking ratio of up to 90% with an $\mathrm{SE(3)}$-invariant encoder and a lightweight decoder that reconstructs backbone coordinates from the latent representation and mask tokens. MiAE scales well and outperforms supervised counterparts and state-of-the-art baselines on TEDBench, establishing a strong recipe for protein fold classification. To test transfer beyond AlphaFold structures, we further benchmark on a curated dataset from experimental structures of CATH v4.4. TEDBench is available at https://github.com/BorgwardtLab/TEDBench.

2605.18251 2026-05-19 eess.SP cs.LG q-bio.NC

Subject-Specific Analysis of Self-Initiated Attention Shifts from EEG with Controlled Internal and External Attention Conditions

Yuwen Zeng, Dengzhe Hou, Zhang Zhang, Sai Sun, Yongsong Huang, Chia-huei Tseng, Satoshi Shioiri

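AI summary: This paper studies the neural mechanisms of self-initiated attention shifts, using EEG feature analysis and machine learning to show the value of subject-specific information under controlled experimental conditions.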

Abstract

Self-initiated attention shifts play a critical role in voluntary behavior but are difficult to study due to the absence of explicit temporal markers. While previous studies have examined their neural correlates, it remains unclear how multi-dimensional electroencephalography (EEG) features contribute to their characterization within an interpretable computational framework. In this study, we build on an experimental paradigm developed in our previous work, which enables controlled comparison between task-constrained self-initiated shifts and externally instructed shifts under identical visual stimulation. Within this setting, we investigate whether preparatory EEG activity can distinguish these two types of attention shifts. We adopt a machine learning-based approach and conduct two complementary analyses: (1) a performance-oriented assessment of frequency-specific topographic patterns, and (2) a model-based feature attribution analysis using SHapley Additive exPlanations (SHAP). These analyses provide a structured view of how spectral features across regions of interest contribute to model behavior. Our results demonstrate reliable within-subject classification performance, indicating that preparatory EEG activity contains subject-specific discriminative information within this paradigm. The analysis shows that higher-frequency bands and frontal regions contribute strongly to model decisions, although such contributions should be interpreted cautiously due to the potential influence of non-neural artifacts in high-frequency EEG signals. Overall, this work highlights the value of interpretable machine learning for analyzing subject-specific EEG signal patterns in a controlled experimental setting, with potential applications in personalized and asynchronous brain-machine interface systems.

2605.15401 2026-05-19 math.PR q-bio.MN q-bio.PE

Noise Tradeoffs, Stationary Information Flow, and Structural Balance in Unit-Birth Networks

David F. Anderson

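AI summary: This paper studies noise tradeoffs, stationary information flow, and structural balance in unit-birth networks. It proves that, under specific conditions, the Fano factor is bounded below by 1, and shows that within the signed-monotone subclass sub-Poissonian noise requires a frustrated interaction topology.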

Comments
36 pages, 2 figures
Abstract

In 2019, Paulsson and collaborators conjectured that stochastic biochemical control networks have fundamental limits on how much intrinsic noise can be simultaneously suppressed across multiple components. Ripsman, Kell, and Hilfinger recently proposed a formal proof strategy for unit-birth models based on a stationary information-theoretic decomposition. Here, we provide a rigorous mathematical justification for this argument. We consider continuous-time Markov chains on $\mathbb{Z}^N_{\ge 0}$ in which each component is degraded linearly and produced in unit births at a state-dependent rate depending on the other components but not on itself. Noise in component $i$ is measured by the Fano factor $F_{X_i}$, the ratio of stationary variance to mean, with Poisson value $1$ as baseline. Our first contribution is to isolate explicit hypotheses on moments, mean birth rates, and total-rate growth under which the formal information-flow identities can be rigorously justified. Following the proof outline of Ripsman, Kell, and Hilfinger, we then prove the conjecture. Our second contribution is to make these hypotheses checkable: a uniform positive lower bound on the birth rates and at-most-linear total growth dominated by the weakest degradation rate suffices, via Foster-Lyapunov methods. Our third contribution is a structural strengthening. Under a signed monotonicity condition on the rate functions, satisfied by structurally balanced signed interaction networks, we prove that the stationary distribution is associated with respect to the corresponding signed partial order. This upgrades the global tradeoff to the termwise bound $F_{X_i}\ge 1$ for every $i$. Hence, within the signed-monotone subclass, sub-Poissonian noise requires a frustrated interaction topology.
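
For reference, the noise measure and the termwise bound described above can be written out as follows. The symbol $\pi$ for the stationary distribution is notation introduced here for illustration and does not appear in the abstract.

```latex
F_{X_i} \;=\; \frac{\operatorname{Var}_{\pi}(X_i)}{\mathbb{E}_{\pi}[X_i]},
\qquad
F_{X_i} \ge 1 \quad \text{for every } i \in \{1,\dots,N\}
\ \text{(signed-monotone subclass)}.
```

A Poisson stationary law has variance equal to its mean, which gives the baseline value $F_{X_i} = 1$; sub-Poissonian noise corresponds to $F_{X_i} < 1$.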

2502.07360 2026-05-19 q-bio.QM cs.CV

Supervised contrastive learning for cell stage classification of animal embryos

Yasmine Hachani, Patrick Bouthemy, Elisa Fromont, Sylvie Ruffini, Ludivine Laffont, Alline de Paula Reis

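AI summary: This paper proposes a deep learning method based on supervised contrastive learning and focal loss to automatically classify the cell stages of animal embryos. It addresses low-quality images, class ambiguity, and imbalanced data distribution, and outperforms existing methods on bovine and mouse embryo datasets.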

Journal ref
Scientific Reports, 2026
Abstract

Videomicroscopy, when combined with machine learning, offers a promising approach for studying the early development of in vitro produced (IVP) embryos. However, manually annotating developmental events, and more specifically cell divisions, is time-consuming for a biologist and cannot scale up for practical applications. We aim to automatically classify the cell stages of embryos from 2D time-lapse microscopy videos with a deep learning approach. We focus on the analysis of bovine embryonic development using video microscopy, as we are primarily interested in the application of cattle breeding, and we have created a Bovine Embryos Cell Stages (ECS) dataset. The challenges are three-fold: (1) low-quality images and bovine dark cells that make the identification of cell stages difficult, (2) class ambiguity at the boundaries of developmental stages, and (3) imbalanced data distribution. To address these challenges, we introduce CLEmbryo, a novel method that leverages supervised contrastive learning combined with focal loss for training, and the lightweight 3D neural network CSN-50 as an encoder. We also show that our method generalizes well. CLEmbryo outperforms state-of-the-art methods on both our Bovine ECS dataset and the publicly available NYU Mouse Embryos dataset.
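
The focal loss named above is commonly used against imbalanced data because it down-weights examples the model already classifies confidently. A minimal single-sample sketch in plain Python follows; the value `gamma=2.0` is a commonly used default assumed here rather than taken from the abstract, and the supervised contrastive term and the way CLEmbryo combines the two losses are not reproduced.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss for one sample: -(1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss([4.0, 0.0, 0.0], target=0)  # confident and correct
hard = focal_loss([0.0, 1.0, 0.0], target=0)  # another class is favoured
print(easy, hard)
```

Setting `gamma=0` recovers ordinary cross-entropy, so the exponent controls how strongly easy examples are discounted.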

2605.18118 2026-05-19 q-bio.NC

Functional Whole-Brain Models: A New Framework for Unifying Brain Structure and Cognitive Function

Mario Senden, Leonardo Dalla Porta, Jan Fousek, Jorge F. Mejias, Gorka Zamora-López

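AI summary: This paper proposes functional whole-brain models (fWBMs), which aim to integrate structural and dynamical realism with task-performing capacity, and lays out a three-pillar roadmap across short-, mid-, and long-term horizons to provide new tools and cross-scale hypotheses for brain research.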

Abstract

Contemporary computational neuroscience features two prominent modeling traditions. Bottom-up whole-brain modeling (WBM) builds biophysically detailed simulations of brain structure and dynamics, whereas top-down neuroconnectionism optimizes deep neural networks for functional performance. Each has achieved remarkable success yet remains incomplete, with WBMs lacking functional competence and neuroconnectionist models showing limited biological grounding. Here we propose functional whole-brain models (fWBMs) as a unified modeling paradigm that integrates structural and dynamical realism with task-performing capacity. fWBMs are defined by four minimal criteria: structural grounding in empirical connectomes and regional biology, continuous-time dynamical realism, functional competence across cognitive domains, and mappable observables to neuroimaging, electrophysiological and behavioral data. To formalize this integration, we establish a three-pillar roadmap across short-, mid-, and long-term horizons, and outline the scientific and clinical opportunities this paradigm enables. We argue that the disciplined pursuit of this integrative vision will generate the tools, common language, and cross-scale hypotheses needed to advance our understanding of the brain.

2605.17975 2026-05-19 q-bio.PE

M-SDT: A modelling framework for dengue transmission, forecasting, and intervention strategies in Ahmedabad Municipal Corporation

Sourav Roy, Rajendra Gadhavi, Bhavin Solanki, Chirag Shah, Raj C. Sharma, Indrajit Ghosh

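AI summary: This paper proposes a data-driven compartmental framework for simulating dengue transmission dynamics, generating forecasts, and evaluating intervention strategies, with a focus on zone-level heterogeneity and seasonal variability in the Ahmedabad Municipal Corporation.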

Comments
38 pages, 17 figures
Abstract

Dengue fever poses a persistent public health challenge in rapidly urbanizing Indian cities such as Ahmedabad, where spatial heterogeneity and seasonal variability complicate forecasting and control. In this study, we develop a data-driven compartmental framework to simulate transmission dynamics, generate forecasts, and evaluate intervention strategies across the Ahmedabad Municipal Corporation (AMC). We employ a Mechanistic Seasonal Dengue Transmission (M-SDT) model that incorporates symptomatic and asymptomatic infections. We calibrated the proposed model using zone-wise dengue case data during 2020–2024. Parameter uncertainty is rigorously quantified using a bootstrap sampling framework with negative binomial noise. The calibrated model reveals pronounced spatial heterogeneity across AMC zones, with persistent hotspots and distinct transmission regimes. Forecasts for 2026–2028 indicate continued endemic circulation with moderate inter-annual variability. Sensitivity analysis identifies the mosquito biting rate and vector mortality as dominant drivers of long-term disease burden, highlighting the central role of vector ecology in shaping epidemic outcomes. Evaluating seasonal vector control strategies shows a notable difference in operation; periodic fogging has a cumulative effect over the years, while sustained residual spraying can quickly curb outbreaks and decrease incidence by over 80%. The zone-wise analysis reveals that the mosquito-to-human ratio governs not only the baseline outbreak potential but also each zone's responsiveness to control strategies. Overall, the M-SDT modelling framework enables reconstruction of unobserved dynamics, rigorous uncertainty quantification, and evaluation of targeted, zone-specific interventions, underscoring the importance of integrating fine-scale surveillance data with mechanistic modelling for adaptive urban dengue control.

2605.17899 2026-05-19 cs.LG cs.AI q-bio.QM

DCFold: Efficient Protein Structure Generation with Single Forward Pass

Zhe Zhang, Yuanning Feng, Yuxuan Song, Keyue Qiu, Hao Zhou, Wei-Ying Ma

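AI summary: This paper proposes DCFold, a single-step generative model that attains AlphaFold3-level accuracy. A Dual Consistency training framework with a novel Temporal Geodesic Matching (TGM) scheduler yields a 15x inference speedup while maintaining predictive fidelity, validated on structure prediction and binder design benchmarks.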

Abstract

AlphaFold3 introduces a diffusion-based architecture that elevates protein structure prediction to all-atom resolution with improved accuracy. This state-of-the-art performance has established AlphaFold3 as a foundation model for diverse generation and design tasks. However, its iterative design substantially increases inference time, limiting practical deployment in downstream settings such as virtual screening and protein design. We propose DCFold, a single-step generative model that attains AlphaFold3-level accuracy. Our Dual Consistency training framework, which incorporates a novel Temporal Geodesic Matching (TGM) scheduler, enables DCFold to achieve a 15x acceleration in inference while maintaining predictive fidelity. We validate its effectiveness across both structure prediction and binder design benchmarks.

2605.17781 2026-05-19 cond-mat.stat-mech nlin.CG q-bio.PE

Universal interface fluctuations in absorbing-state phase transitions

Yohsuke T. Fukai, Keiichi Tamai, Tetsuya Hiraiwa

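AI summary: In (2+1)-dimensional models of the directed percolation (DP) and compact directed percolation (CDP) classes, the fluctuations of (1+1)-dimensional interfaces cross over from short-time fluctuations governed by absorbing phase transitions (APTs) to long-time KPZ fluctuations. After time and length are rescaled, the cumulants of the interface height distribution collapse onto a single scaling function, indicating that the KPZ growth parameters are determined solely by fundamental properties of the APT.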

Comments
9 pages, 4 figures
Abstract

Despite similarities between models exhibiting absorbing phase transitions (APTs) and those showing Kardar-Parisi-Zhang (KPZ) growth, the relationship between these universal fluctuations has remained elusive. We numerically study (1+1)-dimensional interfaces of (2+1)-dimensional models showing APTs of directed percolation (DP) and compact directed percolation (CDP) classes with an active boundary, finding a universal crossover from short-time APT-governed fluctuations to long-time KPZ fluctuations. Upon rescaling time and length by the APT correlation time and length, the cumulants of the interface height distributions collapse onto a single scaling function. The fluctuation properties of the discrete Domany-Kinzel model and the continuum stochastic Fisher-Kolmogorov-Petrovsky-Piskunov (sFKPP) equation coincide, indicating that the KPZ growth parameters are determined solely by fundamental properties of the APT. For the CDP sFKPP equation, a dimensionless parameter tunes both the critical interface distribution and the KPZ parameters, with the interface properties of the biased voter model recovered in a limiting case. These results uncover a universal crossover in which KPZ fluctuations emerge from APT fluctuations at long times, linking paradigmatic universality classes of nonequilibrium scale-invariant phenomena.

2605.17559 2026-05-19 stat.ME cs.AI q-bio.QM stat.ML

Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels

Binyamin Perets, Shie Mannor

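AI summary: This paper proposes a reproducing-kernel framework for controlling the false discovery rate in arbitrarily structured hypothesis spaces. By recasting structured FDR control as a regularized learning problem, it handles continuous domains, graphs, and hierarchies in a unified way and increases discovery power.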

Comments
9 pages
Abstract

Large-scale hypothesis testing is central to modern science, where controlling the False Discovery Rate (FDR) has become the standard approach to managing false positives across many simultaneous tests. Hypotheses rarely exist in isolation; they often exhibit structure through proximity, connectivity, or hierarchy. This structure represents both a challenge and an opportunity: while classical methods treat these dependencies as obstacles requiring conservative correction, leveraging them can substantially increase discovery power. Here, we reframe structured FDR control as a regularized learning problem. By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules which we prove to control the FDR. We validate our method on two sources: spatial locations derived from high-dimensional real-world datasets, and a differential gene expression task utilizing protein-protein interaction graphs.
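
As background for the classical, structure-agnostic approach that the abstract contrasts with, the Benjamini-Hochberg step-up procedure can be sketched in a few lines. This is the standard baseline, not the kernel-based method of the paper; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k whose p-value is at most alpha * k / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])

p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
print(rejected)
```

The procedure treats every hypothesis identically and ignores proximity, connectivity, or hierarchy, which is the information a structured method aims to exploit.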

2605.17399 2026-05-19 q-bio.NC cs.NE

Von Economo neurons enable reliable social skill acquisition in recurrent spiking neural networks: a computational account with clinical predictions

Esila Keskin

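AI summary: This study examines the computational role of Von Economo neurons (VENs) in social learning by training a spiking neural network (the VENCircuit) that contains VEN-like projection neurons. Removing the VENs markedly impairs learning; the authors propose that VENs act as acquisition scaffolds and offer falsifiable clinical predictions.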

Comments
21 pages, 5 figures, 4 tables
Abstract

Von Economo neurons (VENs) are selectively lost in behavioural-variant frontotemporal dementia (bvFTD) and reduced in autism spectrum conditions (ASC), yet their computational role in social learning remains unexplained. We train a spiking neural network (the VENCircuit) embedding VEN-like projection neurons (K=40, 2% of total) in a recurrent pyramidal circuit across 50 matched random initialisations with and without VENs. The network is trained on a controlled binary classification task; we make no claim to model social cognition directly. VEN-intact networks converged in 49/50 cases (98%) versus 35/50 (70%) for VEN-ablated networks (Fisher's exact OR=21.0, 95% CI 2.7-167, p=8.7e-5). Failed ablated networks showed complete absence of learning, inconsistent with a speed-of-learning account. Phase-ablation experiments show VEN removal is most disruptive during mid-training (epochs 5-25), when a co-adaptive dependency forms in the pyramidal circuit. We derive a formal account showing VENs provide a direct gradient pathway immune to Jacobian instabilities affecting the recurrent circuit. Inference-time VEN ablation caused a significant performance drop (Wilcoxon p=0.022), ranging from no change (16/20 networks) to catastrophic collapse (0.989 to 0.620). VENs function as acquisition scaffolds whose developmental absence produces stochastic learning failure - a computational analogue of variable social skill acquisition in ASC - with falsifiable predictions for organoid and electrophysiology studies.
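The reported odds ratio can be checked directly from the convergence counts (49/50 intact versus 35/50 ablated). The sketch below computes the sample odds ratio and a one-sided Fisher exact tail probability from the hypergeometric distribution. The function names are illustrative, and the abstract does not state which sidedness or software the authors used, so the tail probability is shown only as a sanity check.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) under the hypergeometric null with fixed margins."""
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "successes"
    n = a + b + c + d     # grand total
    hi = min(row1, col1)  # largest feasible top-left count
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, hi + 1))
    return tail / total


# Rows: VEN-intact, VEN-ablated; columns: converged, failed (counts from the abstract).
table = (49, 1, 35, 15)
print(odds_ratio(*table))        # sample odds ratio
print(fisher_one_sided(*table))  # one-sided exact tail probability
```

The sample odds ratio is (49 × 15) / (1 × 35) = 21.0, matching the value quoted in the abstract.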

2605.12485 2026-05-19 q-bio.NC q-bio.QM

Letting the neural code speak: Automated characterization of monkey visual neurons through human language

让神经代码发声:通过人类语言自动表征猴子视觉神经元

Vedang Lad, Katrin Franke, Tamar Rott Shaham, Surya Ganguli, Andreas S. Tolias, Sophia Sanborn, Nikos Karantzas

AI总结 该研究通过人类语言表征猴子视觉皮层神经元的选择性,利用生成模型和神经数字孪生技术,实现了对神经功能的可解释、可测试的大规模描述,推动了智能科学发现。

详情
AI中文摘要

理解单个神经元编码什么是神经科学的核心问题。在初级视觉皮层(V1)中,数学模型(如Gabor函数)可以刻画神经选择性,但更高级的脑区没有类似的框架。我们表明自然语言可以填补这一角色:在猕猴V1和V4中,大多数神经元的选择性可以用简洁、可验证的语义描述来刻画。利用V1和V4的数字孪生,我们开发了一个闭环框架,将每个神经元的高激活和低激活图像转换为密集描述,生成语义假设和合成图像,并以计算机模拟(in silico)方式验证假设。描述范围从V1中的定向边缘和空间频率到V4中形状、颜色和纹理的组合。在V4中,由激活假设和抑制假设生成的图像分别使96.1%的神经元响应高于自然图像响应的第95百分位数、97.6%的神经元响应低于第5百分位数(随机图像约为10%);V1的激活结果与V4相当,而V1的抑制更难用语言描述。表征相似性分析显示神经活动、视觉嵌入和语言嵌入之间存在部分对齐,其中视觉嵌入与神经活动最为对齐;在文本瓶颈中丢失的对齐,在把假设重新渲染为图像后得以恢复,表明语言压缩是有损的,但在语义上是忠实的。这些结果表明,将生成模型与神经数字孪生相结合,可以大规模地给出可解释、可检验的神经功能描述,朝着智能体式(agentic)科学发现迈进。

英文摘要

Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the selectivity of most neurons is captured by concise, verifiable semantic descriptions. Using digital twins of V1 and V4, we develop a closed-loop framework that translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses, respectively (vs. ~10% for random images); V1 activation results matched V4, while V1 suppression was less describable in language. Representational similarity analysis reveals partial alignment between neural activity, vision embeddings, and language embeddings, with vision most aligned to neural activity; alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images, showing that linguistic compression is lossy yet semantically faithful. Together, these results show that combining generative models with neural digital twins enables interpretable, testable descriptions of neural function at scale, toward agentic scientific discovery.

2605.10978 2026-05-19 q-bio.QM

VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

VibeProteinBench: 一种用于语言接口的蛋白质设计评估基准

Hyunjin Seo, Hongjoon Ahn, Jimin Park, Sungjun Han, Gyubok Lee, Soojung Yang, Joseph S Brown, Leo Chen, Gina El Nesr, Feyisayo Eweje, Sarah Gurev, Hyejin Lee, Cheng-Hao Liu, Junlang Liu, Zhihui Qi, Gyu Rie Lee, Sungsoo Ahn, Jamin Shin, Sangwon Jung

AI总结 本文提出VibeProteinBench,一种用于评估语言接口蛋白质设计能力的基准,通过三个互补阶段验证模型在蛋白质设计中的通用能力,揭示当前LLM在蛋白质设计方面的挑战。

详情
AI中文摘要

蛋白质设计旨在生成能够折叠成稳定三维结构并满足目标功能性质的氨基酸序列。该领域正逐渐转向vibe蛋白质设计,其中单个模型被期望生成新序列、工程现有蛋白质并通过灵活的自然语言约束推理蛋白质特性。大型语言模型(LLMs)已在此领域成为主导范式。然而,现有评估基准往往局限于蛋白质设计的某些方面,而其他则限制设计目标到结构化的输入模式,缺乏一个能够评估开放意图下蛋白质设计广泛能力的综合框架。为此,我们提出了VibeProtein设计基准(VibeProteinBench),一种语言接口的基准,通过三个互补阶段模拟计算蛋白质设计工作流程:识别、工程和生成。每个阶段均基于专家整理的机理理由和多方面的计算机验证,以计算方式验证模型输出是否具有生物学合理性。在多样化的通用和领域专用LLMs上的评估表明,没有任何模型在所有三个阶段都表现出强性能,这表明通用蛋白质设计仍然是当前LLM的重大挑战。

英文摘要

Protein design aims to compose amino-acid sequences that fold into stable three-dimensional structures while satisfying targeted functional properties. The field is increasingly shifting toward vibe protein design, where a single model is expected to generate novel sequences, engineer existing proteins, and reason about protein characteristics through flexible natural-language constraints. Large language models (LLMs) have emerged as a leading paradigm in this space. However, existing evaluation benchmarks often limit their scope to a partial aspect of protein design, while others restrict design objectives to structured input schemas, lacking an integrated framework that evaluates the broad spectrum of protein design competence under open-ended intents. To this end, we present Vibe Protein design Benchmark (VibeProteinBench), a language-interfaced benchmark that probes generalist capabilities through three complementary stages mirroring a computational protein design workflow: recognition, engineering, and generation. Each stage is grounded in expert-curated mechanistic rationales and multi-faceted in silico validation, to computationally verify whether model outputs are biologically plausible. Evaluations across diverse general-purpose and domain-specialized LLMs reveal that no model achieves strong performance across all three stages, suggesting that generalist protein design remains a substantial open challenge for current LLMs.

2604.11852 2026-05-19 q-bio.QM cs.AI cs.LG

Limitations of Sequence-Based Protein Representations for Parkinson's Disease Classification: A Leakage-Free Benchmark

序列基蛋白质表示在帕金森病分类中的局限性:一种无泄漏的基准测试

César Jesús Núñez-Prado, Grigori Sidorov, Liliana Chanona-Hernández

AI总结 本文研究了序列基蛋白质表示在帕金森病分类中的局限性,通过无泄漏的基准测试评估了多种基于蛋白质初级序列的表示方法,发现单一序列信息对疾病分类的判别能力有限,需引入更丰富的生物学特征。

详情
Comments
36 pages, 10 figures, 9 tables. Updated title, abstract, figures, and revised experimental discussion
AI中文摘要

可靠分子生物标志物的鉴定仍因帕金森病的多因素性质而具有挑战性。尽管蛋白质序列是基础且广泛可用的生物信息来源,但其单独判别能力用于复杂疾病分类仍不明确。本文提出了一个受控且无泄漏的评估,评估了多种仅基于蛋白质初级序列的表示方法,包括氨基酸组成、k-mer、物理化学描述符、混合表示以及来自蛋白质语言模型的嵌入,所有均在嵌套分层交叉验证框架下评估以确保性能估计的无偏性。表现最佳的配置(ProtBERT + MLP)达到F1分数为0.704 ± 0.028和ROC-AUC为0.748 ± 0.047,表明判别性能仅中等。传统表示如k-mer达到相似的F1值(最高约0.667),但表现出高度不平衡的行为,召回率接近0.98,精度约0.50,反映出对正样本预测的强烈偏倚。在各种表示中,性能差异仍保持在狭窄范围内(F1在0.60到0.70之间),而无监督分析揭示没有与类别标签对齐的内在结构,统计检验(Friedman检验,p = 0.1749)不显示模型间的显著差异。这些结果表明类别之间有显著重叠,并表明仅凭初级序列信息对帕金森病分类的判别能力有限。本研究建立了一个可重复的基线,并提供了实证证据,表明更丰富的生物学特征,如结构、功能或相互作用描述符,对于稳健的疾病建模是必需的。

英文摘要

The identification of reliable molecular biomarkers for Parkinson's disease remains challenging due to its multifactorial nature. Although protein sequences constitute a fundamental and widely available source of biological information, their standalone discriminative capacity for complex disease classification remains unclear. In this work, we present a controlled and leakage-free evaluation of multiple representations derived exclusively from protein primary sequences, including amino acid composition, k-mers, physicochemical descriptors, hybrid representations, and embeddings from protein language models, all assessed under a nested stratified cross-validation framework to ensure unbiased performance estimation. The best-performing configuration (ProtBERT + MLP) achieves an F1-score of 0.704 +/- 0.028 and ROC-AUC of 0.748 +/- 0.047, indicating only moderate discriminative performance. Classical representations such as k-mers reach comparable F1 values (up to approximately 0.667), but exhibit highly imbalanced behavior, with recall close to 0.98 and precision around 0.50, reflecting a strong bias toward positive predictions. Across representations, performance differences remain within a narrow range (F1 between 0.60 and 0.70), while unsupervised analyses reveal no intrinsic structure aligned with class labels, and statistical testing (Friedman test, p = 0.1749) does not indicate significant differences across models. These results demonstrate substantial overlap between classes and indicate that primary sequence information alone provides limited discriminative power for Parkinson's disease classification. This work establishes a reproducible baseline and provides empirical evidence that more informative biological features, such as structural, functional, or interaction-based descriptors, are required for robust disease modeling.
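The imbalance described for k-mer features can be made concrete with the F1 definition (the harmonic mean of precision and recall). The sketch below plugs in the approximate values quoted in the abstract (precision around 0.50, recall close to 0.98). The function name is illustrative, and the inputs are rounded figures rather than the paper's per-fold results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values from the abstract: near-perfect recall, chance-level precision.
print(round(f1_score(0.50, 0.98), 3))
```

With these inputs the F1 comes out near 0.66, which shows how a classifier that predicts the positive class almost everywhere can still post an F1 comparable to the best configuration.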

2603.09089 2026-05-19 stat.CO math.PR q-bio.NC

Sampling on Discrete Spaces with Temporal Point Processes

在离散空间中使用时间点过程进行采样

Cameron A. Stewart, Maneesh Sahani

AI总结 本文提出了一种基于时间点过程的离散空间采样方法,通过构造多变量时间点过程,使其在固定长度滑动窗口内的事件计数向量收敛于目标分布,同时引入辅助随机性将采样器转化为退化出生-死亡过程,并在多个目标分布上验证了其优越性。

详情
Comments
20 pages, 1 figure. Minor revisions to wording, notation, and formatting. No substantive changes
AI中文摘要

时间点过程为从离散分布中采样提供了一个强大的框架,但在现有文献中仍未得到充分利用。我们展示了如何为任何具有向下封闭支撑集的多变量计数目标分布构造一个多变量时间点过程,使其在固定长度滑动窗口内的事件计数向量在时间趋于无穷时依分布收敛于目标分布。该采样器被构造为一组可能相互耦合的、服务时间确定的无限服务器队列,表现出一种离散形式的动量,能够抑制随机游走行为。可容许的过程族既包括可逆动态,也包括不可逆动态。作为应用,我们推导出一个循环随机神经网络,其动态实现了基于采样的计算,并表现出一些具有生物合理性的特征,包括相对不应期和振荡。引入辅助随机性会把采样器约化为生灭过程,从而表明后者是具有相同极限分布的退化情形。在63个目标分布上的模拟中,我们的采样器始终优于这些生灭过程,并且在多变量有效样本量上经常优于Zanella过程;按CPU时间归一化后还有进一步增益。

英文摘要

Temporal point processes offer a powerful framework for sampling from discrete distributions, yet they remain underutilized in existing literature. We show how to construct, for any target multivariate count distribution with downward-closed support, a multivariate temporal point process whose event-count vector in a fixed-length sliding window converges in distribution to the target as time tends to infinity. Structured as a system of potentially coupled infinite-server queues with deterministic service times, the sampler exhibits a discrete form of momentum that suppresses random-walk behaviour. The admissible families of processes permit both reversible and non-reversible dynamics. As an application, we derive a recurrent stochastic neural network whose dynamics implement sampling-based computation and exhibit some biologically plausible features, including relative refractory periods and oscillations. The introduction of auxiliary randomness reduces the sampler to a birth-death process, establishing the latter as a degenerate case with the same limiting distribution. In simulations on 63 target distributions, our sampler always outperforms these birth-death processes and frequently outperforms Zanella processes in multivariate effective sample size, with further gains when normalized by CPU time.

2603.03190 2026-05-19 cs.AI q-bio.NC

Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity

期望与听觉神经网络表示增强从脑活动识别音乐

Shogo Noguchi, Taketo Akama, Tai Nakamura, Shun Minamikawa, Natalia Polouliakh

AI总结 本研究通过区分听觉和期望相关的神经网络表示作为教师目标,提高了基于EEG的音乐识别性能,展示了表示学习可以由神经编码引导,并为预测音乐认知和神经解码的发展提供了新方向。

详情
Comments
47 pages, 12 figures
AI中文摘要

在音乐聆听过程中,皮层活动编码了听觉和期望相关信息。先前工作已表明,ANN表示类似于皮层表示,并可作为EEG识别的监督信号。本文显示,将听觉和期望相关的ANN表示作为教师目标进行区分,能提高基于EEG的音乐识别性能。预训练以预测任一表示的模型优于非预训练基线,且结合它们可获得互补增益,超过通过不同随机初始化形成的强种子集合。这些发现表明,教师表示类型影响下游性能,且表示学习可以由神经编码引导。本工作为预测音乐认知和神经解码的发展指明了方向。我们的期望表示直接从原始信号计算得出,无需人工标签,反映了超越起始或音高的预测结构,使能够研究跨多样刺激的多层预测编码。其可扩展性表明,未来可能开发出基于皮层编码原理的通用EEG模型。

英文摘要

During music listening, cortical activity encodes both acoustic and expectation-related information. Prior work has shown that ANN representations resemble cortical representations and can serve as supervisory signals for EEG recognition. Here we show that distinguishing acoustic and expectation-related ANN representations as teacher targets improves EEG-based music identification. Models pretrained to predict either representation outperform non-pretrained baselines, and combining them yields complementary gains that exceed strong seed ensembles formed by varying random initializations. These findings show that teacher representation type shapes downstream performance and that representation learning can be guided by neural encoding. This work points toward advances in predictive music cognition and neural decoding. Our expectation representation, computed directly from raw signals without manual labels, reflects predictive structure beyond onset or pitch, enabling investigation of multilayer predictive encoding across diverse stimuli. Its scalability to large, diverse datasets further suggests potential for developing general-purpose EEG models grounded in cortical encoding principles.

2602.21787 2026-05-19 q-bio.BM math-ph math.MP nlin.PS physics.bio-ph

Sub-residue sharpness of protein helix-coil transitions reveals a spatial-spectral uncertainty limit

蛋白质螺旋-无序转变的亚残基锐度揭示了空间-频谱不确定性极限

Yiquan Wang

AI总结 通过分析蛋白质螺旋-无序转变的亚残基锐度,揭示了空间-频谱不确定性极限,研究发现螺旋段呈现近可积分、低熵的孤子状状态,而无序区域则表现出宽带构象噪声,统计分析表明几何转变宽度仅为0.145个残基,提供了Zimm-Bragg模型高热力学协同性的独立动力学对应。

详情
AI中文摘要

协同性螺旋-无序转变的边界直接影响蛋白质别构和构象动力学,但这些结构界面处长期存在的一到两个残基的指认模糊性,其物理起源仍未解决。我们应用离散Hasimoto映射,将三维蛋白质主链几何结构转化为一维离散非线性薛定谔有效势,并分析其空间频率涨落。螺旋段显示近可积、低熵的类孤子状态,而无序区域则表现出宽带构象噪声。对1,986个蛋白质中超过19,000个边界的统计分析显示,几何转变宽度的中位数仅为0.145个残基,为Zimm-Bragg模型的高热力学协同性提供了一个独立的运动学对应。这种亚残基尺度的空间狭窄性表明存在一种由Gabor不确定性原理支配的内在观测约束,即任何宏观的频谱探针都倾向于模糊微观的相边界,这提示结构生物学中的边界模糊性不仅仅是算法问题,而是反映了生物聚合物晶格固有的物理分辨率极限。

英文摘要

The boundaries of cooperative helix-coil transitions directly affect protein allostery and conformational dynamics, yet the physical origin of the persistent one-to-two-residue assignment ambiguity at these structural interfaces remains unresolved. We apply the discrete Hasimoto map to translate three-dimensional protein backbone geometry into a one-dimensional discrete nonlinear Schrödinger effective potential and analyze its spatial-frequency fluctuations. Helical segments display near-integrable, low-entropy soliton-like states, while coil regions exhibit broadband conformational noise. Statistical analysis of over 19,000 boundaries across 1,986 proteins reveals a median geometric transition width of only 0.145 residues, providing an independent kinematic counterpart to the high thermodynamic cooperativity of the Zimm-Bragg model. This sub-residue spatial narrowness indicates an intrinsic observational constraint governed by the Gabor uncertainty principle, whereby any macroscopic spectral probe tends to blur the microscopic phase boundary, suggesting that the boundary ambiguity in structural biology is not merely algorithmic but reflects a physical resolution limit inherent to the biopolymer lattice.

2602.10644 2026-05-19 q-bio.BM

Towards Universal Spatial Transcriptomics Super-Resolution: A Generalist Physically Consistent Flow Matching Framework

迈向通用空间转录组超分辨率:一种通用的物理一致流匹配框架

Xinlei Huang, Weihao Dai, Zijun Qin, Xin Yu, Di Wang, Yanran Liu, Lixin Cheng, Xubin Zheng

AI总结 本文提出SRast框架,通过策略性解耦架构和物理先验,解决空间转录组超分辨率中的异质性和物理一致性问题,实现跨物种、组织和平台的高超分辨率性能。

详情
Journal ref
SIGKDD 2026
AI中文摘要

空间转录组学为解析组织空间异质性提供了前所未有的视角。然而,高分辨率空间转录组技术仍然受到基因覆盖有限、技术复杂和成本高昂的限制。现有的基于低分辨率数据的空间转录组超分辨率方法存在两个根本性局限:一是因忽视固有的生物异质性而导致分布外泛化能力差,二是缺乏物理一致性。为应对这些挑战,我们提出了SRast,一种新的物理约束通用框架,用于稳健的空间转录组超分辨率。针对异质性,SRast采用策略性解耦架构,明确将基因语义表示与空间几何反卷积解耦,并利用自监督学习对齐潜在分布,以缓解跨样本偏移。在物理先验方面,SRast将任务重新表述为单纯形(simplex)上的比例预测,使用流匹配模型学习基于最优传输的几何变换,从而严格保证局部质量守恒。在多种物种、组织和平台上的大量实验表明,SRast达到了最先进的性能,表现出优越的零样本泛化能力,并在恢复细粒度生物结构时保证物理一致性。

英文摘要

Spatial transcriptomics provides an unprecedented perspective for deciphering tissue spatial heterogeneity. However, high-resolution spatial transcriptomic technology remains constrained by limited gene coverage, technical complexity, and high cost. Existing methods for spatial transcriptomics super-resolution from low-resolution data suffer from two fundamental limitations: poor out-of-distribution generalization stemming from a neglect of inherent biological heterogeneity, and a lack of physical consistency. To address these challenges, we propose SRast, a novel physically constrained generalist framework designed for robust spatial transcriptomics super-resolution. To tackle heterogeneity, SRast employs a strategic decoupling architecture that explicitly decouples gene semantics representation from spatial geometry deconvolution, utilizing self-supervised learning to align latent distributions and mitigate cross-sample shifts. Regarding physical priors, SRast reformulates the task as ratio prediction on the simplex, using a flow matching model to learn optimal transport-based geometric transformations that strictly enforce local mass conservation. Extensive experiments across diverse species, tissues, and platforms demonstrate that SRast achieves state-of-the-art performance, exhibiting superior zero-shot generalization capabilities and ensuring physical consistency in recovering fine-grained biological structures.
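One way to see why predicting ratios on the simplex conserves mass: if a low-resolution spot's expression is split among its sub-spots using non-negative weights that sum to one, the sub-spot values always add back up to the original total. The sketch below uses a softmax as a stand-in parameterisation. That choice, the function names, and the example numbers are assumptions made for illustration; they are not SRast's flow matching model.

```python
from math import exp


def softmax(logits):
    """Map arbitrary real scores to a point on the probability simplex."""
    m = max(logits)                       # subtract the max for numerical stability
    weights = [exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def split_spot(total_count, logits):
    """Distribute one spot's count over its sub-spots using simplex ratios."""
    ratios = softmax(logits)
    return [total_count * r for r in ratios]


# Hypothetical spot with expression 120.0 split across four sub-spots.
sub_spots = split_spot(120.0, [0.2, -1.0, 0.5, 0.0])
print(sub_spots, sum(sub_spots))
```

Whatever scores a model produces, the constraint holds by construction, so local mass conservation does not have to be learned or enforced through a penalty term.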

2512.22098 2026-05-19 stat.ME math.PR math.ST q-bio.PE stat.CO stat.TH

Exact inference via quasi-conjugacy in two-parameter Poisson-Dirichlet hidden Markov models

通过准共轭性在双参数泊松-狄利克雷隐马尔可夫模型中实现精确推断

Marco Dalla Pria, Matteo Ruggiero, Dario Spanò

AI总结 本文针对由无标签划分组成的离散时间数据,提出一种以双参数泊松-狄利克雷扩散为潜在过程的非参数隐马尔可夫模型,利用该扩散与划分上纯死亡过程之间的对偶性得到前向和后向推断的闭式递归更新,无需MCMC或序列蒙特卡罗即可获得精确后验和预测分布。

详情
Comments
Final accepted version. To appear in JASA
AI中文摘要

我们介绍了一种非参数模型,用于从由无标签划分组成的离散时间数据中推断随时间演变的未观察概率分布。潜在过程是一个双参数泊松-狄利克雷扩散过程,观测通过可交换抽样产生。应用包括社会和遗传数据,其中仅观察到聚类汇总信息。为了解决似然难以处理的问题,我们开发了一个可计算的推断框架,避免了标签枚举和对潜在状态的直接模拟。我们利用扩散过程与划分上的纯死亡过程之间的对偶性,以及编码新数据影响的凝聚算子,从而得到前向和后向推断的闭式递归更新。我们计算了任意时间点潜在状态的精确后验分布,以及未来或插值划分的预测分布。这使我们能够进行在线和离线的推断与预测,并完整量化不确定性,绕过MCMC和序列蒙特卡罗方法。与粒子滤波相比,我们的方法具有更高的准确性、更低的方差和显著的计算增益。我们通过合成实验和一个社会网络应用展示了该方法,恢复了随时间变化的杂合度中的可解释模式。

英文摘要

We introduce a nonparametric model for inferring time-evolving, unobserved probability distributions from discrete-time data consisting of unlabelled partitions. The latent process is a two-parameter Poisson-Dirichlet diffusion, and observations arise via exchangeable sampling. Applications include social and genetic data where only aggregate clustering summaries are observed. To address the intractable likelihood, we develop a tractable inferential framework that avoids label enumeration and direct simulation of the latent state. We exploit a duality between the diffusion and a pure-death process on partitions, together with coagulation operators that encode the effect of new data. These yield closed-form, recursive updates for forward and backward inference. We compute exact posterior distributions of the latent state at arbitrary times and predictive distributions of future or interpolated partitions. This enables online and offline inference and forecasting with full uncertainty quantification, bypassing MCMC and sequential Monte Carlo. Compared to particle filtering, our method achieves higher accuracy, lower variance, and substantial computational gains. We illustrate the methodology with synthetic experiments and a social network application, recovering interpretable patterns in time-varying heterozygosity.

2511.03483 2026-05-19 q-bio.MN

A Gene Ranking Framework Enhances the Design Efficiency of Genome-Scale Constraint-Based Metabolic Networks under Time Limits

一种基因排序框架提高了在时间限制下的基因组规模约束代谢网络设计效率

Yier Ma, Takeyuki Tamura

AI总结 本文提出了一种基因排序框架,通过加速混合整数线性规划问题的求解来提高基因组规模约束代谢网络设计的效率,该方法在相同时间内提升了成功率37%至186%。

详情
AI中文摘要

基因组规模约束代谢网络的设计已经稳步发展,越来越多的成功案例实现了生长耦合生产,其中关键代谢物的生物合成与细胞生长相关。然而,设计失败的主要原因是无法在现实时间内找到解决方案。因此,开发在指定计算时间内实现高成功率的方法至关重要。在本研究中,我们提出了一种用于排序单个基因重要性的框架,以加速约束模型设计中原始混合整数线性规划(MILP)问题的求解。在所提出的方法中,预分配高重要性基因的值后,MILPs被并行求解为一系列互斥的子问题。发现我们的框架能够恢复原始方法所识别的大多数成功案例,并在相同的时间限制内将成功率提高了37%至186%。对MILP求解过程的分析表明,所提出的方法减小了子问题的规模并减少了分支定界树中的节点数量。这种基因重要性排序框架可以直接应用于各种基于MILP的算法中,用于约束代谢网络的设计。开发的脚本可在https://github.com/MetNetComp/Gene-Ranked-RatGene上获取。

英文摘要

The design of genome-scale constraint-based metabolic networks has steadily advanced, with an increasing number of successful cases achieving growth-coupled production, in which the biosynthesis of key metabolites is linked to cell growth. However, a major cause of design failures is the inability to find solutions within realistic time limits. Therefore, it is essential to develop methods that achieve a high success rate within the specified computation time. In this study, we propose a framework for ranking the importance of individual genes to accelerate the solution of the original mixed-integer linear programming (MILP) problems in the design of constraint-based models. In the proposed method, after pre-assigning values to highly important genes, the MILPs are solved in parallel as a series of mutually exclusive subproblems. Our framework recovered most of the successful cases identified by the original approach and increased the success rate by 37% to 186% relative to the original method within the same time limits. Analysis of the MILP solution process revealed that the proposed method reduced the sizes of subproblems and decreased the number of nodes in the branch-and-bound tree. This gene-importance ranking framework is directly applicable to a range of MILP-based algorithms for the design of constraint-based metabolic networks. The developed scripts are available at https://github.com/MetNetComp/Gene-Ranked-RatGene.

2509.16255 2026-05-19 q-bio.TO eess.IV physics.med-ph

Segmentation of spinal rootlets across MRI contrasts with RootletSeg

跨MRI对比度的脊髓神经根丝分割:RootletSeg

Katerina Krejci, Jiri Chmelik, Sandrine Bedard, Falk Eippert, Ulrike Horn, Virginie Callot, Julien Cohen-Adad, Jan Valosek

AI总结 本文提出了一种深度学习方法RootletSeg,用于在不同MRI扫描中自动分割脊髓神经根丝,通过在多个数据集上训练和测试,验证了该方法在不同MRI对比度下的分割性能,并展示了其在确定脊髓节段水平方面的应用价值。

详情
Journal ref
K. Krejci et al., Segmentation of spinal rootlets across MRI contrasts with RootletSeg, Scientific Reports, May 2026, doi: 10.1038/s41598-026-49164-0
Comments
26 pages, 6 figures, 4 tables
AI中文摘要

目的:开发一种深度学习方法,用于在多种MRI扫描中自动分割脊髓神经根丝。材料和方法:本回顾性研究纳入了来自两个开放数据集和一个私有数据集的MRI扫描,包括3D各向同性3T TSE T2加权(T2w)和7T MP2RAGE(T1加权[T1w] INV1和INV2,以及UNIT1)MRI扫描。开发了深度学习模型RootletSeg,用于分割C2-T1背侧和腹侧脊髓神经根丝。在76例扫描上进行训练,在17例扫描上进行测试。使用Dice评分将模型性能与一种现有开源方法进行比较。通过Bland-Altman分析,将由RootletSeg分割结果得到的脊髓节段水平与由椎间盘定义的椎体水平进行比较。结果:RootletSeg模型基于50名健康成人(平均年龄28.70岁±6.53 [SD];男性28人[56%],女性22人[44%])的93例MRI扫描开发,其平均Dice评分(±SD)在T1w-INV2对比度下为0.67±0.09,在UNIT1下为0.65±0.11,在T2w下为0.64±0.08,在T1w-INV1下为0.62±0.10。脊髓节段与椎体水平的对应关系显示出沿头尾方向逐渐增大的偏移,Bland-Altman偏差范围为0.00至8.15 mm(各水平中点之间的中位数差异)。结论:RootletSeg能够在不同MRI对比度下准确分割C2-T1脊髓神经根丝,从而可以直接从MRI扫描中确定脊髓节段水平。该方法是开源的,可用于多种下游分析,包括病变分类、神经调控治疗和功能性MRI组水平分析。

英文摘要

Purpose: To develop a deep learning method for the automatic segmentation of spinal nerve rootlets on various MRI scans. Material and Methods: This retrospective study included MRI scans from two open-access and one private dataset, consisting of 3D isotropic 3T TSE T2-weighted (T2w) and 7T MP2RAGE (T1-weighted [T1w] INV1 and INV2, and UNIT1) MRI scans. A deep learning model, RootletSeg, was developed to segment C2-T1 dorsal and ventral spinal rootlets. Training was performed on 76 scans and testing on 17 scans. The Dice score was used to compare the model performance with an existing open-source method. Spinal levels derived from RootletSeg segmentations were compared with vertebral levels defined by intervertebral discs using Bland-Altman analysis. Results: The RootletSeg model developed on 93 MRI scans from 50 healthy adults (mean age, 28.70 years ± 6.53 [SD]; 28 [56%] males, 22 [44%] females) achieved a mean ± SD Dice score of 0.67 ± 0.09 for T1w-INV2, 0.65 ± 0.11 for UNIT1, 0.64 ± 0.08 for T2w, and 0.62 ± 0.10 for T1w-INV1 contrasts. Spinal-vertebral level correspondence showed a progressively increasing rostrocaudal shift, with Bland-Altman bias ranging from 0.00 to 8.15 mm (median difference between level midpoints). Conclusion: RootletSeg accurately segmented C2-T1 spinal rootlets across MRI contrasts, enabling the determination of spinal levels directly from MRI scans. The method is open-source and can be used for a variety of downstream analyses, including lesion classification, neuromodulation therapy, and functional MRI group analysis.
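The Dice score used to evaluate the segmentations measures overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. A minimal sketch on flattened binary masks follows. The function name and the convention of returning 1.0 when both masks are empty are assumptions; this is not the RootletSeg evaluation code.

```python
def dice_score(pred, truth):
    """Dice overlap for two equal-length sequences of 0/1 voxel labels."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / size_sum


# Toy 4-voxel example: one voxel overlaps, each mask labels two voxels.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

For thin structures such as rootlets, a one-voxel disagreement removes a large share of the overlap, so Dice values in the 0.6 to 0.7 range can still correspond to visually close segmentations.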

2507.21035 2026-05-19 cs.AI cs.LG cs.MA q-bio.GN

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

GenoMAS:通过代码驱动的基因表达分析进行科学发现的多智能体框架

Haoyang Liu, Yijiang Li, Haohan Wang

AI总结 该研究提出GenoMAS多智能体框架,通过类型消息传递协议协调六个专门的LLM代理,以实现基因表达数据的高效处理和科学发现,其在数据预处理和基因识别任务上均优于现有方法。

详情
Comments
51 pages (14 pages for the main text, 10 pages for references, and 27 pages for the appendix)
AI中文摘要

基因表达分析对于许多生物医学发现至关重要,但从原始转录组数据中提取见解仍然极具挑战性,这归因于多个大型半结构化文件的复杂性和对大量领域专业知识的需求。当前的自动化方法往往受到不灵活的工作流或完全自主代理的限制,这些代理缺乏进行严谨科学探究所需的精确度。GenoMAS则另辟蹊径,通过集成结构化工作流的可靠性与自主代理的适应性,提出了一支基于LLM的科学家团队。GenoMAS通过类型消息传递协议协调六个专门的LLM代理,每个代理都为共享的分析画布贡献互补的强项。GenoMAS的核心是一个引导规划框架:编程代理将高层任务指南展开为动作单元,并在每个节点选择前进、修订、绕过或回溯,从而在保持逻辑一致性的同时,灵活适应基因组数据的特性。在GenoTEX基准测试中,GenoMAS在数据预处理方面达到了89.13%的复合相似度相关性,在基因识别方面达到了60.48%的F1分数,分别超过了最佳现有方法10.61%和16.85%。除了指标外,GenoMAS还揭示了由文献支持的生物合理基因-表型关联,同时调整了潜在混杂因素。代码可在https://github.com/Liu-Hy/GenoMAS上获得。

英文摘要

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F1 of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

2506.23287 2026-05-19 cs.LG q-bio.QM

HDTree: Generative Modeling of Cellular Hierarchies for Robust Lineage Inference

HDTree: 用于鲁棒谱系推断的细胞层次生成建模

Zelin Zang, WenZhe Li, Yongjie Xu, Chang Yu, Changxi Chi, Jingbo Zhou, Zhen Lei, Stan Z. Li

AI总结 本文提出HDTree,一种用于鲁棒谱系推断的生成建模框架,通过统一的层次代码库和量化扩散过程捕捉细胞层次关系,提升稳定性与可扩展性,并在通用和单细胞数据集上验证了其在谱系推断准确性、重建质量和层次一致性方面的优越性。

详情
Comments
accepted by ICML26
AI中文摘要

在单细胞研究中,追踪和分析高通量单细胞分化轨迹对于理解生物过程至关重要。关键在于对支配细胞发育的层次结构的稳健建模。传统方法在计算成本、性能和稳定性方面存在局限。基于VAE的方法虽有所进展,但仍需要分支特定的网络模块,限制了其可扩展性和稳定性,同时常遭遇后验崩溃问题。为克服这些挑战,我们引入HDTree,一种用于稳健谱系推断的生成建模框架。HDTree通过统一的层次代码库在层次化潜在空间中捕捉树状关系,并利用量化扩散过程建模连续细胞状态转换。通过将生成过程与Waddington景观对齐,该方法不仅提高了稳定性和可扩展性,还增强了推断谱系的生物学合理性。HDTree的有效性通过在通用和单细胞数据集上的比较得到验证,其在谱系推断准确性、重建质量和层次一致性方面均优于现有方法。这些贡献使细胞分化路径的准确高效建模成为可能,为生物学发现提供可靠见解。代码可在 https://github.com/zangzelin/code_HDTree_icml 获取。

英文摘要

In single-cell research, tracing and analyzing high-throughput single-cell differentiation trajectories is crucial for understanding biological processes. Key to this is the robust modeling of hierarchical structures that govern cellular development. Traditional methods face limitations in computational cost, performance, and stability. VAE-based approaches have made strides but still require branch-specific network modules, limiting their scalability and stability, while often suffering from posterior collapse. To overcome these challenges, we introduce HDTree, a generative modeling framework designed for robust lineage inference. HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and employs a quantized diffusion process to model continuous cell state transitions. By aligning the generative process with the Waddington landscape, this method not only improves stability and scalability but also enhances the biological plausibility of inferred lineages. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets, where it outperforms existing methods in lineage inference accuracy, reconstruction quality, and hierarchical consistency. These contributions enable accurate and efficient modeling of cellular differentiation paths, offering reliable insights for biological discovery. Code is available at https://github.com/zangzelin/code_HDTree_icml.

2411.12123 2026-05-19 q-bio.CB math.DS q-bio.TO

Optimisation of neoadjuvant pembrolizumab therapy for locally advanced MSI-H/dMMR colorectal cancer using data-driven delay integro-differential equations

利用数据驱动的延迟积分微分方程优化新辅助帕博利珠单抗治疗局部晚期微卫星不稳定性高/错配修复缺陷型结直肠癌

Georgio Hawi, Peter S. Kim, Peter P. Lee

AI总结 本研究利用数据驱动的延迟积分微分方程模型,优化新辅助帕博利珠单抗治疗局部晚期微卫星不稳定性高/错配修复缺陷型结直肠癌的方案,通过分析肿瘤细胞、免疫细胞和免疫检查点之间的相互作用,提高治疗效果和患者生活质量。

详情
Journal ref
Journal of Theoretical Biology 613 (2025) 112231
Comments
102 pages in total with 63 pages for the main body and 39 pages for the supporting information. Minor edits made. Published in the Journal of Theoretical Biology
AI中文摘要

结直肠癌(CRC)发病率不断上升,尤其是在年轻人群中,已成为重大的公共卫生挑战。微卫星不稳定性高(MSI-H)CRC和错配修复缺陷(dMMR)CRC占所有CRC的15%,对免疫治疗,特别是PD-1抑制剂表现出显著的反应性。尽管如此,仍有必要优化免疫治疗方案,以最大限度地提高临床疗效和患者生活质量。为此,我们采用一种由延迟积分微分方程驱动的新框架,对局部晚期MSI-H/dMMR CRC(laMCRC)中癌细胞、免疫细胞和免疫检查点之间的相互作用进行建模。其中若干组分是首次在癌症中以确定性方式建模,为深入理解复杂的免疫动态奠定了基础。我们考虑两个隔室,即肿瘤部位和肿瘤引流淋巴结(TDLN),并纳入树突状细胞(DC)迁移、T细胞增殖以及CD8+ T细胞耗竭和再激活等现象。参数值和初始条件由实验数据得出,整合了多项药代动力学、生物分析和放射学研究,以及对TCGA COADREAD和GSE26571数据集的批量(bulk)RNA测序数据的去卷积。最后,我们优化了帕博利珠单抗(一种广泛使用的PD-1抑制剂)的新辅助治疗,以在laMCRC患者中平衡疗效、效率和毒性。我们从机制上分析了影响治疗成功的因素,并改进了目前FDA批准的转移性MSI-H/dMMR CRC治疗方案,表明单次中等至高剂量的帕博利珠单抗可能足以有效清除肿瘤,同时兼具高效、安全和实用性。

英文摘要

Colorectal cancer (CRC) poses a major public health challenge due to its increasing prevalence, particularly among younger populations. Microsatellite instability-high (MSI-H) CRC and deficient mismatch repair (dMMR) CRC constitute 15% of all CRC and exhibit remarkable responsiveness to immunotherapy, especially with PD-1 inhibitors. Despite this, there is a significant need to optimise immunotherapeutic regimens to maximise clinical efficacy and patient quality of life. To address this, we employ a novel framework driven by delay integro-differential equations to model the interactions among cancer cells, immune cells, and immune checkpoints in locally advanced MSI-H/dMMR CRC (laMCRC). Several of these components are being modelled deterministically for the first time in cancer, paving the way for a deeper understanding of the complex underlying immune dynamics. We consider two compartments, the tumour site and the tumour-draining lymph node (TDLN), taking into account phenomena such as DC migration, T cell proliferation, and CD8+ T cell exhaustion and reinvigoration. Parameter values and initial conditions are derived from experimental data, integrating various pharmacokinetic, bioanalytical, and radiographic studies, along with deconvolution of bulk RNA-sequencing data from the TCGA COADREAD and GSE26571 datasets. We finally optimised neoadjuvant treatment with pembrolizumab, a widely used PD-1 inhibitor, to balance efficacy, efficiency, and toxicity in laMCRC patients. We mechanistically analysed factors influencing treatment success and improved upon currently FDA-approved therapeutic regimens for metastatic MSI-H/dMMR CRC, demonstrating that a single medium-to-high dose of pembrolizumab may be sufficient for effective tumour eradication while being efficient, safe, and practical.

2605.17313 2026-05-19 q-bio.PE

Dose-limited interventions in an epidemiological model

Dose-limited interventions in an epidemic model

Annour Saad Abdramane, Hippolyte Djimramadji, Mahamat Saleh Daoussa Haggar, Patrick Mimphis Tchepmo Djomegni, Julien Arino

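AI summary: This paper studies an epidemic model that incorporates interventions such as vaccination and treatment, analyses mathematically how a limited supply of treatment doses affects epidemic spread, and illustrates the impact of budgetary constraints on the outcome.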

Details
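AI abstract (translated from Chinese)

We consider an SLIARS mathematical epidemic model that includes interventions such as vaccination and treatment. Unlike classical models, the availability of treatment doses is assumed to be limited. Mathematically, we prove that most scenarios in fact reduce to classical, known ones: an insufficient supply of doses is equivalent to having no doses, whereas being able to restore stocks is generally equivalent to the classical case with vaccination and treatment. We also carry out a computational analysis showing how certain transient and stochastic dynamics deviate from the deterministic long-term behaviour, as well as the effect of budgetary constraints.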

English abstract

We consider an SLIARS mathematical epidemiology model including intervention in the form of vaccination and treatment. Contrary to classical models, it is assumed that treatment doses can be limited in availability. Mathematically, we show that most scenarios actually reduce to classic well-known scenarios: having an unreplenished number of doses is akin to having none, while being able to restore stocks is (often) equivalent to the classic situation with vaccination and treatment. We also perform a computational analysis, illustrating some of the transient and stochastic dynamics that diverge from deterministic long-term behaviour, as well as the impact of budgetary constraints.
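
The idea of a finite treatment stock can be illustrated with a minimal sketch. The code below is not the SLIARS model of the paper: it is a reduced SIR model (no latent or asymptomatic classes, no loss of immunity, no vaccination), and every name and value in it (`simulate`, `tau`, `doses`, `restock`, the rates) is an assumption made for illustration. Treatment removes infectious individuals at rate `tau * I`, but only while doses remain in stock.

```python
def simulate(beta=0.3, gamma=0.1, tau=0.2, doses=0.0, restock=0.0,
             s0=0.99, i0=0.01, dt=0.01, t_end=300.0):
    """Forward-Euler integration of an SIR model with a finite treatment stock.

    All parameter names and values are illustrative assumptions.
    Quantities are population fractions; `doses` is the initial stock and
    `restock` is the rate at which new doses arrive.
    Returns (S, I, R, remaining stock, total doses used) at time t_end.
    """
    s, i, r, stock, used = s0, i0, 0.0, doses, 0.0
    for _ in range(int(round(t_end / dt))):
        # Demand for treatment this step is tau * I * dt, capped by the
        # doses on hand plus those delivered during the step.
        treated = min(tau * i * dt, stock + restock * dt)
        new_infections = beta * s * i * dt
        natural_recovery = gamma * i * dt
        s -= new_infections
        i += new_infections - natural_recovery - treated
        r += natural_recovery + treated
        stock += restock * dt - treated
        used += treated
    return s, i, r, stock, used


if __name__ == "__main__":
    scenarios = {
        "no doses": dict(doses=0.0),
        "finite stock, never replenished": dict(doses=0.05),
        "replenished stock": dict(doses=0.0, restock=1.0),
    }
    for label, kwargs in scenarios.items():
        s, i, r, stock, used = simulate(**kwargs)
        print(f"{label}: S={s:.3f} I={i:.3f} R={r:.3f} used={used:.3f}")
```

Because this sketch has no return flow from R to S, it only shows the capping mechanism. It cannot reproduce the long-term equivalences stated in the abstract, which concern a model where individuals become susceptible again.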

2605.17220 2026-05-19 q-bio.PE nlin.AO physics.bio-ph

Skewed weak and Pareto-tailed strong interactions accompany community diversity and complexity

Skewed weak interactions and Pareto-tailed strong interactions accompany community diversity and complexity

Takuya Hojo, Taiko Arakaki, Koichi Fujimoto

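AI summary: The study characterises the distribution of weak and strong interactions in ecological communities, finds that it is skewed towards weak interactions with a Pareto tail of strong ones, and examines how this distribution relates to community diversity and complexity.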

Details
Comments
31 pages, 4 figures, 1 table, Supporting Information
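AI abstract (translated from Chinese)

Ecological communities often exhibit many weak and a few strong interspecific interactions, but their quantitative structure, generative basis, and links to community-level properties remain poorly understood. Using empirical datasets from two plant-animal networks, we show that the strength distributions of trophic and mutualistic interactions are skewed-weak and Pareto-tailed (SWAPS), as quantified by positive skewness and extreme value theory, respectively. We further find that interaction strengths are taxon-specific and largely constrained within taxa. Community assembly simulations based on a generalized Lotka-Volterra model show that this taxonomic conservatism, together with multiple interaction types beyond trophic and mutualistic ones, is necessary for SWAPS distributions to emerge. Notably, SWAPS distributions emerge not only at the species level but also across lineages, and are accompanied by increases in community diversity and complexity. These results identify SWAPS distributions as a previously unrecognized interaction signature of ecological communities and offer a new perspective on the organization of community-level properties.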

English abstract

Ecological communities are often characterized by many weak and few strong interspecific interactions, yet their quantitative structure, generative basis, and links to community-level properties remain poorly understood. Using two empirical datasets of plant–animal networks, we show that both trophic and mutualistic interaction strengths distribute skewed weak and Pareto-strong tails (SWAPS), as quantified by positive skewness and extreme value theory, respectively. We further find that interaction strengths are taxon-specific and largely constrained within taxa. In community assembly simulations based on a generalized Lotka–Volterra model, this taxonomic conservatism, together with multiple interaction types beyond trophic and mutualistic ones, is required for the emergence of SWAPS distribution. Notably, SWAPS distribution emerges not only at the species level but also across lineages, and its emergence accompanies increases in community diversity and complexity. Together, these results identify SWAPS distribution as a previously unrecognized interaction signature of ecological communities and provide a new perspective on the organization of community-level properties.
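
The two diagnostics named in the abstract, positive skewness and a Pareto tail, can be sketched on a list of positive interaction strengths as follows. The abstract does not say which extreme-value estimator the authors use; the Hill estimator below is one standard choice and is an assumption here, as are the function names `skewness` and `hill_tail_index` and the synthetic sample.

```python
import math
import random


def skewness(xs):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def hill_tail_index(xs, k):
    """Hill estimate of the Pareto tail exponent from the k largest values.

    Requires positive values and 0 < k < len(xs). The (k+1)-th largest
    value serves as the tail threshold; if the k largest values all equal
    the threshold, the estimate is undefined (division by zero).
    """
    top = sorted(xs, reverse=True)[: k + 1]
    threshold = top[k]
    mean_log_excess = sum(math.log(x / threshold) for x in top[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    # Synthetic stand-in for interaction strengths: Pareto with exponent 2.
    rng = random.Random(0)
    strengths = [rng.paretovariate(2.0) for _ in range(20000)]
    print("skewness:", skewness(strengths))
    print("Hill tail exponent (k=500):", hill_tail_index(strengths, k=500))
```

The Hill estimate depends on `k`, the number of upper order statistics treated as the tail, so in practice it is usually inspected over a range of `k` rather than at a single value.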