arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.22269 2026-03-24 q-bio.BM

Computational modeling of RNA-protein binding interactions under an external force

Danielle Wampler, Ralf Bundschuh

Abstract

RNA binding proteins play a crucial role in post-transcriptional gene regulation by controlling the transport, processing, and translation of their target RNAs. Post-transcriptional gene regulation leads to the differential expression of genetic material, and loss of regulation or over-regulation is linked to a wide range of cancers and other diseases, many of which have been directly associated with RNA binding proteins and their target RNAs. To understand RNA, RNA binding proteins, and how they function in gene expression, it is essential to characterize how RNA binding proteins interact with their target RNAs. Here, we aim to assess the potential of single-molecule force spectroscopy experiments for characterizing RNA-protein binding by investigating to what extent a change in extension due to RNA-protein binding is experimentally measurable and what aspects of the interaction can be deduced from such measurements. We predict the effect of protein binding on RNA force-extension measurements using the open-source ViennaRNA package, which we have modified to simultaneously consider an external force, protein binding, and RNA secondary structure. We find protein concentration-dependent responses to external forces, with discernible differences in predicted extensions around biologically relevant concentrations, and a connection to protein binding domain geometry for several RNA binding proteins.

2603.22150 2026-03-24 q-bio.PE physics.soc-ph

Epidemic reproduction numbers in spatial networks

Zahra Ghadiri, Jari Saramäki, Takayuki Hiraoka

Abstract

The basic and effective reproduction numbers are widely used metrics for characterizing the dynamics of infectious disease epidemics. However, the interpretation of these numbers is based on the assumption of homogeneous mixing and may not hold in real-world populations where the contact patterns deviate from that assumption. In this paper, we present a network-based framework to compare reproduction numbers in populations with and without spatial structure, while other parameters of the disease remain fixed. Using this framework, we show that in homogeneously mixed populations, in the absence of external interventions, the effective reproduction number decreases exponentially as the susceptible population declines. In contrast, in spatially structured populations, the basic reproduction number is smaller, and the effective reproduction number initially decreases faster but eventually converges to unity. We show that the reproduction number is determined by the level of competition between infectious nodes, which is governed by the network structure. Our results suggest that without knowledge of the network structure, reproduction numbers may not be informative for parameterizing the contagiousness of the disease or predicting the behavior of epidemic spreading.

2603.21634 2026-03-24 math.PR q-bio.PE

Individual-based stochastic model with unbounded growth, birth and death rates: a tightness result

Virgile Brodu

Comments 52 pages, 6 figures, 1 table

Abstract

We study population dynamics through a general growth/degrowth-fragmentation process, with resource consumption and unbounded growth/degrowth, birth and death rates. Our model is structured by a positive trait called energy (a proxy for any biological parameter such as size, age, mass, or protein quantity), and the jump rates of the process can be arbitrarily high depending on individual energies, which has not yet been considered in the literature. After a preliminary study to construct well-defined objects (which, unlike in similar works, is necessary here because individual rates can explode), we consider a classical sequence of renormalizations of the underlying process and obtain a tightness result for the associated laws in large-population asymptotics. We characterize the accumulation points of this sequence as solutions of an integro-differential system of equations, which proves the existence of measure solutions to this system. Furthermore, if such a measure solution is unique, then our tightness result becomes a convergence result towards this unique process. We illustrate our work with the case of allometric rates (i.e. rates assumed to be power functions) and finally present numerical simulations in this allometric setting.

2603.21542 2026-03-24 q-bio.NC

Brain Learning Principles Utilizing Non-Ideal Factors in Neural Circuits

Da-Zheng Feng, Hao-Xuan Du

Abstract

The human brain achieves its remarkable computational prowess not despite its inherent non-ideal factors (noise, heterogeneity, structural irregularities, decentralized plasticity, systematic errors, and chaotic dynamics) but precisely because of them. This paper systematically demonstrates that these traits, long dismissed as imperfections in classical neuroscience and eliminated in digital engineering, are evolutionary design principles that endow the brain with robustness, adaptability, and creativity.

2603.21503 2026-03-24 q-bio.BM

Persistent local Laplacian prediction of protein-ligand binding affinities

Jian Liu, Hongsong Feng

Abstract

Accurate prediction of protein-ligand binding affinity remains a central challenge in structure-based drug discovery. The effectiveness of machine learning models critically depends on the quality of molecular descriptors, for which advanced mathematical frameworks provide powerful tools. In this work, we employ a novel mathematical theory, termed the persistent local Laplacian (PLL), to construct molecular descriptors that capture localized geometric and topological features of biomolecular structures. The PLL framework addresses key limitations of traditional topological data analysis methods, such as persistent homology and the persistent Laplacian, which are often insensitive to local structural variations, while maintaining high computational efficiency. The resulting molecular descriptors are integrated with advanced machine learning algorithms to develop accurate predictive models for protein-ligand binding affinity. The proposed models are systematically evaluated on three well-established benchmark datasets, demonstrating consistently strong and competitive predictive performance. Computational results show that the PLL-based models outperform existing approaches, highlighting their potential as a powerful tool for drug discovery, protein engineering, and broader applications in science and engineering.

2603.19814 2026-03-24 math.AP math.DS q-bio.PE

Stability analysis and long-time convergence of a partial differential equation model of two-phase ageing

Luce Breuil

Abstract

Recent biological evidence suggests the presence of a two-phase ageing process in several species. We introduce a system of two age-structured partial differential equations (PDE) representing two phases of ageing of a wild population. The model couples both equations through births and transitions between phases, and includes non-linearities due to competition. We show the existence, positivity and uniqueness of weak solutions in a general setting. For a simplified system of ordinary differential equations (ODE), we show existence and uniqueness of a strictly positive steady state attracting all trajectories. We study another simplification, a coupled PDE-ODE model, for which we prove existence, uniqueness and local asymptotic stability of a strictly positive steady state. Under further assumptions, but without assuming weak non-linearities, we show the global asymptotic stability of that steady state. The uniqueness of steady states and absence of oscillations in these systems show that the proportion of individuals in each phase at equilibrium is a unique feature of the model. This paves the way for ecological applications, as experimentally measuring this proportion could provide insight into the health of a wild population.

2602.23624 2026-03-24 q-bio.PE q-bio.MN

Sex chromosome stability and turnover across vertebrates: a developmental gene regulatory network perspective

Wen-Juan Ma, Ricard Fontserè, Tristan Cornelis, Paris Veltsos, Qi Zhou

Comments 22 pages, 2 figures, GBE invited review article

Abstract

Sex chromosomes have evolved repeatedly across the Tree of Life, yet their evolutionary fates differ strikingly. In sharp contrast to mammals and birds, which have degenerated, stable Y/W chromosomes, most amphibians, teleosts, non-avian reptiles and flowering plants have sex chromosomes that remain largely homomorphic and undergo frequent turnover. Explanations such as the evolutionary trap hypothesis, sexually antagonistic selection, mutation load, genetic drift and selfish genetic elements focus on population genetic processes and do not fully explain this pattern. Here we propose the developmental gene regulatory network (GRN) lock-in hypothesis. We compile case studies of turnover across vertebrates and synthesise comparative developmental data on sex determination and dosage compensation (DC). In mammals and birds, sex is determined by an early, fully penetrant master signal, initiated in somatic cells, that acts within a narrow, thermally buffered embryonic window. This signal operates within highly canalised GRNs, coupled to chromosome-scale dosage compensation, with alternative splicing events playing little or no causal role in primary sex determination. This configuration makes it difficult for new master sex-determining loci to invade without generating deleterious intermediate states. By contrast, many ectothermic vertebrates possess flexible, integrative threshold GRNs in which genetic, germ-cell and environmental inputs interact over a prolonged sensitive embryonic period, with absent or largely gene-by-gene DC and environmentally responsive splicing near key regulatory nodes, providing many entry points for sex-determining loci to evolve. We outline empirical predictions and highlight how integrating developmental biology, molecular mechanisms and population genetics can yield testable models for when sex chromosomes become evolutionarily locked in versus when they undergo repeated turnover.

2602.18889 2026-03-24 math.AT q-bio.QM

Topological shape transform for thymus structures

Haochen Yang, Vadim Lebovici, Andreas Tarcevski, Liliana Tchernev, Saulius Zuklys, Georg A. Holländer, Helen M. Byrne, Heather A. Harrington

Comments 41 pages, 13 figures

Abstract

The Euler characteristic transform (ECT) is an emerging and powerful framework within topological data analysis for quantifying the geometry of shape. The applicability of ECT has been limited by its sensitivity to noisy data. Here, we introduce SampEuler, a novel ECT-based shape descriptor designed to achieve enhanced robustness to perturbations. We provide a theoretical analysis establishing the stability of SampEuler, validate these properties empirically through pairwise similarity analyses on a benchmark dataset, and showcase it on a thymus dataset. The thymus is a primary lymphoid organ that is essential for the maturation and selection of self-tolerant T cells. Within the thymus, thymic epithelial cells are organized in complex three-dimensional architectures, yet the principles governing their formation, functional organization, and remodeling during age-related involution remain poorly understood. Addressing these questions requires robust and informative shape descriptors capable of capturing subtle architectural changes across developmental stages. We develop and apply SampEuler to a newly generated two-dimensional imaging dataset of mouse thymi spanning multiple age groups, where SampEuler outperforms both persistent homology-based methods and deep learning models in detecting subtle, localized morphological differences associated with aging. To facilitate interpretation, we develop a vectorization and visualization framework for SampEuler, which preserves rich morphological information and enables identification of structural features that distinguish thymi across age groups. Collectively, our results demonstrate that SampEuler provides a robust and interpretable approach for quantifying thymic architecture and reveals age-dependent structural changes that offer new insights into thymic organization and involution.
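The plain ECT sweeps a shape along many directions and records the Euler characteristic (vertices minus edges plus faces) of each growing sublevel set. The abstract does not say how SampEuler samples or aggregates these curves, so what follows is only a minimal sketch of an ordinary ECT for a 2D binary image. It assumes that each foreground pixel is a closed unit square (so diagonal neighbors touch) and that heights are pixel-center coordinates projected onto a direction. All function names are hypothetical.

```python
import numpy as np


def euler_characteristic(mask: np.ndarray) -> int:
    """Euler characteristic V - E + F of a binary image.

    Assumption: every foreground pixel is a closed unit square, so pixels
    sharing only a corner are connected (8-connectivity of the foreground).
    """
    img = np.asarray(mask, dtype=bool)
    h, w = img.shape
    p = np.zeros((h + 2, w + 2), dtype=bool)  # zero border around the image
    p[1:-1, 1:-1] = img
    # A lattice vertex exists if any of the four pixels around it is on.
    vertices = p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]
    # A horizontal edge exists if the pixel above or below it is on.
    h_edges = p[:-1, 1:-1] | p[1:, 1:-1]
    # A vertical edge exists if the pixel left or right of it is on.
    v_edges = p[1:-1, :-1] | p[1:-1, 1:]
    return int(vertices.sum()) - int(h_edges.sum()) - int(v_edges.sum()) + int(img.sum())


def _centred_coords(shape):
    """Pixel-center coordinates (x = column, y = row) relative to the image center."""
    rows, cols = np.indices(shape)
    return cols - (shape[1] - 1) / 2, rows - (shape[0] - 1) / 2


def euler_curve(mask, direction, thresholds):
    """Euler characteristic of the sublevel sets {pixel : height <= t}."""
    img = np.asarray(mask, dtype=bool)
    x, y = _centred_coords(img.shape)
    dx, dy = direction
    height = x * dx + y * dy
    return np.array([euler_characteristic(img & (height <= t)) for t in thresholds])


def euler_characteristic_transform(mask, n_directions=8, n_thresholds=16):
    """Stack Euler curves for evenly spaced directions; shape (n_directions, n_thresholds)."""
    img = np.asarray(mask, dtype=bool)
    x, y = _centred_coords(img.shape)
    radius = np.hypot(x, y).max()  # one shared threshold range for all directions
    thresholds = np.linspace(-radius, radius, n_thresholds)
    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    return np.stack([euler_curve(img, (np.cos(a), np.sin(a)), thresholds) for a in angles])
```

For a 3×3 ring of pixels swept left to right, the curve goes 1, 1, 0: a line, then a U shape, then a closed loop whose hole cancels the single component. Stacking such curves over directions gives the raw descriptor that robust variants such as SampEuler aim to stabilize.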

2511.17695 2026-03-24 q-bio.QM

SynCell: Contextualized Drug Synergy Prediction

Keqin Peng, Guangxin Su, Qinshan Shi, Shuai Gao, Ren Wang, Can Chen, Jun Wen

Comments 12 pages, 1 figure

Abstract

Drug synergy is profoundly influenced by cellular context, as variations in protein interaction landscapes and pathway activities across cell types reshape how drugs act in combination. Most existing models overlook this heterogeneity, relying on static or bulk-level protein-protein interaction (PPI) networks that ignore cell-specific molecular wiring. The availability of large-scale transcriptomic data now enables the reconstruction of cell-line-resolved interactomes, offering a new foundation for contextualized drug synergy modeling. Here we present SynCell, a Contextualized Drug Synergy framework that integrates drug-protein, protein-protein, and protein-cell line relations within a unified graph architecture. SynCell leverages cell-line-specific PPI networks to embed the molecular context in which drugs act, and employs graph convolutional learning to model how pharmacological effects propagate through cell-specific signaling networks. This formulation treats synergy prediction as a cell-line-contextualized drug-drug interaction problem. Across the large-scale DrugCombDB benchmark, SynCell consistently outperforms state-of-the-art baselines - including DeepSynergy, HypergraphSynergy, HERMES, BAITSAO, DTF, and NHP - particularly in predicting synergies involving unseen drugs or novel cell lines. When benchmarked against these seven methods, SynCell demonstrates substantial gains in generalization and biological interpretability, confirming that contextualizing PPIs with cell-line resolution is indispensable for accurate synergy prediction.
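The abstract does not give SynCell's layer definitions. As a stand-in, here is a minimal NumPy sketch of the widely used Kipf-Welling graph-convolution step, which shows how node features propagate between neighbors in one layer. The graph could be, for example, a hypothetical drug-protein-cell-line graph. The function names and the ReLU nonlinearity are assumptions, not SynCell's published architecture.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, adding self-loops."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One propagation step: ReLU(A_hat @ H @ W).

    Each node's new representation is a degree-weighted average of its own
    and its neighbors' features, followed by a shared linear map.
    """
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


# Hypothetical toy graph: node 0 = drug, nodes 1-2 = proteins, node 3 = cell line.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = np.eye(4)  # one-hot node features
weights = np.random.default_rng(0).normal(size=(4, 2))
embeddings = gcn_layer(adj, features, weights)  # shape (4, 2)
```

Stacking k such layers lets information from a drug node reach nodes up to k hops away. This is the intuition behind modeling how pharmacological effects propagate through a cell-line-specific network, although SynCell may differ in the details.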

2509.19988 2026-03-24 stat.ML cs.LG q-bio.QM

BioBO: Biology-informed Bayesian Optimization for Perturbation Design

Yanke Li, Tianyu Cui, Tommaso Mansi, Mangal Prakash, Rui Liao

Comments ICLR 2026

Abstract

Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25-40%, and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.
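BioBO is compared against conventional Bayesian optimization. The following is a minimal sketch of such a baseline, assuming a zero-mean Gaussian-process surrogate with an RBF kernel and the upper-confidence-bound (UCB) acquisition function. Candidate perturbations are represented as points in a hypothetical gene-embedding space. BioBO's multimodal embeddings and enrichment-based priors are not reproduced, and all names and defaults are illustrative.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_cand, noise=1e-6, length_scale=1.0):
    """Zero-mean GP posterior mean and standard deviation at candidate points."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_train, x_cand, length_scale)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance of this RBF kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def select_ucb(x_train, y_train, x_cand, beta=2.0, **gp_kwargs):
    """Index of the candidate maximizing mean + beta * std (the next perturbation to test)."""
    mean, std = gp_posterior(x_train, y_train, x_cand, **gp_kwargs)
    return int(np.argmax(mean + beta * std))
```

With `beta=0` the rule greedily exploits the best predicted candidate. Larger `beta` favors candidates far from anything already measured. Biology-informed priors, as described in the abstract, would bias this trade-off toward promising genes.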

2508.15077 2026-03-24 q-bio.PE

Modelling the transmission and impact of Omicron variants of Covid-19 in different ethnicity groups in Aotearoa New Zealand

Samik Datta, Vincent X Lomas, Nicole Satherley, Andrew Sporle, Michael J Plank

Journal ref
Epidemics (2026), 55: 100905
Abstract

Previous pandemics, including influenza pandemics and Covid-19, have disproportionately impacted Māori and Pacific populations in Aotearoa New Zealand. The reasons for this are multi-faceted, including differences in socioeconomic deprivation, housing conditions and household size, vaccination rates, access to healthcare, and prevalence of pre-existing health conditions. Many mathematical models that were used to inform the response to the Covid-19 pandemic did not explicitly include ethnicity or other socioeconomic variables. This limited their ability to predict, understand and mitigate inequitable impacts of the pandemic. Here, we extend a model that was developed during the Covid-19 pandemic to support the public health response by stratifying the population into four ethnicity groups: Māori, Pacific, Asian and European/other. We include three ethnicity-specific components in the model: vaccination rates, clinical severity parameters, and contact patterns. We compare model results to ethnicity-specific data on Covid-19 cases, hospital admissions and deaths between 1 January 2022 and 30 June 2023, under different model scenarios in which these ethnicity-specific components are present or absent. We find that differences in vaccination rates explain only part of the observed disparities in outcomes. While no model scenario is able to fully capture the heterogeneous temporal dynamics, our results suggest that differences between ethnicities in the per-infection risk of clinically severe disease are an important factor. Our work is an important step towards models that are better able to predict inequitable impacts of future pandemics and emerging disease threats, and to investigate the ability of interventions to mitigate these.

2508.14936 2026-03-24 q-bio.QM cs.AI cs.LG stat.AP stat.ML

Can synthetic data reproduce real-world findings in epidemiology? A replication study using adversarial random forests

Jan Kapar, Kathrin Günther, Lori Ann Vallis, Klaus Berger, Nadine Binder, Hermann Brenner, Stefanie Castell, Beate Fischer, Volker Harth, Bernd Holleczek, Timm Intemann, Till Ittermann, André Karch, Thomas Keil, Lilian Krist, Berit Lange, Michael F. Leitzmann, Katharina Nimptsch, Nadia Obi, Iris Pigeot, Tobias Pischon, Tamara Schikowski, Börge Schmidt, Carsten Oliver Schmidt, Anja M. Sedlmair, Justine Tanoey, Harm Wienbergen, Andreas Wienke, Claudia Wigmann, Marvin N. Wright

Abstract

Synthetic data holds substantial potential to address practical challenges in epidemiology arising from restricted data access and privacy concerns. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies for synthetic data often fail to directly reflect statistical utility and do not measure privacy risks sufficiently. Against this background, a critical underexplored question is whether synthetic data can reliably reproduce key findings from epidemiological research while preserving privacy. We propose adversarial random forests (ARF) as an efficient and convenient method for synthesizing tabular epidemiological data. To evaluate its performance, we replicated statistical analyses from six epidemiological publications covering blood pressure, anthropometry, myocardial infarction, accelerometry, loneliness, and diabetes, from the German National Cohort (NAKO Gesundheitsstudie), the Bremen STEMI Registry U45 Study, and the Guelph Family Health Study. We further assessed how dataset dimensionality and variable complexity affect the quality of synthetic data, and contextualized ARF's performance by comparison with commonly used tabular data synthesizers in terms of utility, privacy, generalisation, and runtime. Across all replicated studies, results on ARF-generated synthetic data consistently aligned with original findings. Even for datasets with relatively low sample size-to-dimensionality ratios, replication outcomes closely matched the original results across descriptive and inferential analyses. Reduced dimensionality and variable complexity further enhanced synthesis quality. Relative to other synthesizers, ARF demonstrated favourable performance regarding utility, privacy preservation, and generalisation, as well as superior computational efficiency.

2508.06719 2026-03-24 q-bio.PE

Speciation by local adaptation and isolation by distance in extended environments

Lara D. Hissa, Flavia M. D. Marquitti, Marcus A. M. de Aguiar

Comments 26 pages, 5 figures, revised

Abstract

Speciation is often associated with geographical barriers that limit gene flow. However, species can also emerge in continuous homogeneous environments through isolation by distance. When the environment is not homogeneous, natural selection contributes to differentiation by local adaptation and tends to facilitate speciation. To explore how isolation by distance and adaptation combine to determine species diversity, we implemented a model regulated by these two components. The first is implemented via mating restrictions based on spatial proximity and genetic similarity. The second is realized by an ecological phenotype subjected to adaptation by natural selection. We consider scenarios where the environment is either homogeneous, with a single ecological optimum, or heterogeneous, with two distinct optima. We show that the interplay between selection and isolation by distance affects not only species formation but also phenotypic distributions and the speed of speciation. In homogeneous environments, speciation occurs only under restrictive mating, and it takes longer if selection is weak. In contrast, in heterogeneous environments with two local optima and strong selection, species well adapted to each of the optima emerge along the spatial structure, leading to the formation of groups with distinct phenotypes. Permissive mating leads to the formation of only two species, each occupying one of the optima; restrictive mating leads to several species per optimum, in a much faster speciation process. Interestingly, when selection is weak and mating is restrictive, several species form, but the process is slow. Moreover, species average phenotypes do not remain constant over generations, causing the phenotypic distribution to oscillate and never reach a stationary pattern.

2505.15054 2026-03-24 cs.CL cs.AI cs.LG q-bio.BM

MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

Feiyang Cai, Jiahui Bai, Tao Tang, Guijuan He, Joshua Luo, Tianyu Zhu, Srikanth Pilla, Gang Li, Ling Liu, Feng Luo

Comments ICLR-2026 Camera-Ready version

Abstract

Precise recognition, editing, and generation of molecules are essential prerequisites for both chemists and AI systems tackling various chemical tasks. We present MolLangBench, a comprehensive benchmark designed to evaluate fundamental molecule-language interface tasks: language-prompted molecular structure recognition, editing, and generation. To ensure high-quality, unambiguous, and deterministic outputs, we construct the recognition tasks using automated cheminformatics tools, and curate editing and generation tasks through rigorous expert annotation and validation. MolLangBench supports the evaluation of models that interface language with different molecular representations, including linear strings, molecular images, and molecular graphs. Evaluations of state-of-the-art models reveal significant limitations: the strongest model (GPT-5) achieves $86.2\%$ and $85.5\%$ accuracy on recognition and editing tasks, which are intuitively simple for humans, and performs even worse on the generation task, reaching only $43.0\%$ accuracy. These results highlight the shortcomings of current AI systems in handling even preliminary molecular recognition and manipulation tasks. We hope MolLangBench will catalyze further research toward more effective and reliable AI systems for chemical applications. The dataset and code can be accessed at https://huggingface.co/datasets/ChemFM/MolLangBench and https://github.com/TheLuoFengLab/MolLangBench, respectively.

2502.01178 2026-03-24 math.PR q-bio.PE

Genetic contribution of advantaged ancestors in the biparental Moran model -- finite selection

Camille Coron, Yves Le Jan

Abstract

We study a population of $N$ individuals evolving according to a biparental Moran model with two types, one being advantaged compared to the other. The advantage is conferred by a Mendelian mutation, which reduces the death probability of individuals carrying it. We assume that a proportion $a$ of individuals initially carry this mutation, which therefore eventually gets fixed with high probability. After a long time, we sample a gene uniformly from the population, at a new locus, independent of the locus under selection, and calculate the probability that this gene originated from one of the initially advantaged individuals, when the population size is large. Our theorem provides quantitative insights, such as the observation that under strong viability selection, if only $1\%$ of the individuals are initially advantaged, up to $19\%$ of the population's genome will originate from them after a long time.

2407.03239 2026-03-24 q-bio.QM cs.CV

Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network

Rui Li, Mikhail Kudryashev, Artur Yakimovich

Comments 17 pages, 8 figures

Journal ref
2024. In European Conference on Computer Vision (pp. 378-395). Cham: Springer Nature Switzerland
Abstract

Optic deconvolution in light microscopy (LM) refers to recovering the object details from images, revealing the ground truth of samples. Traditional explicit methods in LM rely on the point spread function (PSF) during image acquisition. Yet these approaches often fall short due to inaccurate PSF models and noise artifacts, hampering the overall restoration quality. In this paper, we approached optic deconvolution as an inverse problem. Motivated by the nonstandard-form compression scheme introduced by Beylkin, Coifman, and Rokhlin (BCR), we proposed an innovative physics-informed neural network, the Multi-Stage Residual-BCR Net (m-rBCR), to approximate optic deconvolution. We validated the m-rBCR model on four microscopy datasets: two simulated microscopy datasets from ImageNet and BioSR, real dSTORM microscopy images, and real widefield microscopy images. Compared with explicit deconvolution methods (e.g. Richardson-Lucy) and other state-of-the-art NN models (U-Net, DDPM, CARE, DnCNN, ESRGAN, RCAN, Noise2Noise, MPRNet, and MIMO-U-Net), the m-rBCR model outperforms the other candidates in PSNR and SSIM on the two real microscopy datasets and the simulated BioSR dataset. On the simulated ImageNet dataset, m-rBCR ranks second (right after MIMO-U-Net). Because its backbone is derived from optical physics, m-rBCR achieves better performance with far fewer trainable parameters (from ~30 times fewer than the benchmark MIMO-U-Net to ~210 times fewer than ESRGAN). This enables m-rBCR to achieve a shorter runtime (from ~3 times faster than MIMO-U-Net to ~300 times faster than DDPM). In summary, by leveraging physics constraints, our model significantly reduces the potentially redundant parameters of expertise-oriented NN candidates and achieves high efficiency with superior performance.
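The abstract uses Richardson-Lucy as the representative explicit, PSF-based baseline. Below is a minimal 1D sketch of that iteration, assuming a known Gaussian PSF and zero-padded "same"-size convolution. It shows only the classical method that m-rBCR is compared against, not m-rBCR itself. Dividing by the PSF's column sums is a standard correction so that samples near the padded borders are not biased, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_psf(size: int, sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian point spread function of odd length `size`."""
    x = np.arange(size) - size // 2
    psf = np.exp(-0.5 * (x / sigma) ** 2)
    return psf / psf.sum()


def richardson_lucy(observed: np.ndarray, psf: np.ndarray, n_iter: int = 100,
                    eps: float = 1e-12) -> np.ndarray:
    """Richardson-Lucy deconvolution of a non-negative 1D signal blurred by `psf`."""
    psf_mirror = psf[::-1]  # correlation with the PSF is the adjoint of convolution
    # Column sums of the blur operator: less than 1 near the zero-padded borders.
    norm = np.convolve(np.ones_like(observed, dtype=float), psf_mirror, mode="same")
    estimate = np.full(observed.shape, observed.mean(), dtype=float)
    for _ in range(n_iter):
        reblurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (reblurred + eps)  # where the current estimate under/over-explains data
        estimate *= np.convolve(ratio, psf_mirror, mode="same") / norm
    return estimate


# Two point sources on a faint background, blurred by the same PSF.
truth = np.full(64, 0.1)
truth[20], truth[40] = 5.0, 3.0
psf = gaussian_psf(9, 1.5)
blurred = np.convolve(truth, psf, mode="same")
restored = richardson_lucy(blurred, psf, n_iter=200)
```

The multiplicative update keeps the estimate non-negative. It also shows why an inaccurate PSF hurts: `psf` enters every iteration, so any mismatch with the true optics is propagated directly into the reconstruction.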

2307.14436 2026-03-24 eess.IV cs.CV q-bio.QM

Phenotype-preserving metric design for high-content image reconstruction by generative inpainting

Vaibhav Sharma, Artur Yakimovich

Comments 8 pages, 3 figures, conference proceedings

Journal ref
In Emerging Topics in Artificial Intelligence (ETAI) 2023 (Vol. 12655, pp. 7-14). SPIE
Abstract

In the past decades, automated high-content microscopy demonstrated its ability to deliver large quantities of image-based data powering the versatility of phenotypic drug screening and systems biology applications. However, as the sizes of image-based datasets grew, it became infeasible for humans to control, avoid and overcome the presence of imaging and sample preparation artefacts in the images. While novel techniques like machine learning and deep learning may address these shortcomings through generative image inpainting, when applied to sensitive research data this may come at the cost of undesired image manipulation. Undesired manipulation may be caused by phenomena such as neural hallucinations, to which some artificial neural networks are prone. To address this, here we evaluate the state-of-the-art inpainting methods for image restoration in a high-content fluorescence microscopy dataset of cultured cells with labelled nuclei. We show that architectures like DeepFill V2 and Edge Connect can faithfully restore microscopy images upon fine-tuning with relatively little data. Our results demonstrate that the area of the region to be restored matters more than its shape. Furthermore, to control for the quality of restoration, we propose a novel phenotype-preserving metric design strategy. In this strategy, the size and count of the restored biological phenotypes like cell nuclei are quantified to penalise undesirable manipulation. We argue that the design principles of our approach may also generalise to other applications.

2306.02929 2026-03-24 q-bio.QM

Microscopy image reconstruction with physics-informed denoising diffusion probabilistic model

Rui Li, Gabriel della Maggiora, Vardan Andriasyan, Anthony Petkidis, Artsemi Yushkevich, Mikhail Kudryashev, Artur Yakimovich

Comments 16 pages, 5 figures

Journal ref
Communications Engineering 3, no. 1 (2024): 186
Abstract

Light microscopy is a widespread and inexpensive imaging technique facilitating biomedical discovery and diagnostics. However, the light diffraction barrier and imperfections in optics limit the level of detail of the acquired images. The lost details can be reconstructed by deep learning models, among other approaches. Yet deep learning models are prone to introducing artefacts and hallucinations into the reconstruction. Recent state-of-the-art image synthesis models like denoising diffusion probabilistic models (DDPMs) are no exception to this. We propose to address this by incorporating the physical problem of microscopy image formation into the model's loss function. To overcome the lack of microscopy data, we train this model with synthetic data, which we obtain by simulating the effects of the microscope optics through the theoretical point spread function and by varying the noise levels. Furthermore, we incorporate the physical model of a light microscope into the reverse process of a conditioned DDPM, proposing a physics-informed DDPM (PI-DDPM). We show consistent improvement and artefact reductions when compared to model-based methods, deep-learning regression methods and regular conditioned DDPMs.
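The synthetic-data step described above (blur by the optics, then noise at varying levels) can be sketched as a simple image-formation forward model. The paper uses a theoretical PSF. Here a 2D Gaussian is assumed as a common stand-in, the convolution is circular (FFT-based), and noise is modeled as Poisson shot noise plus Gaussian read noise. All parameter names, such as `peak_photons` and `read_noise_std`, are hypothetical.

```python
import numpy as np


def gaussian_psf_2d(shape, sigma):
    """Normalized 2D Gaussian PSF centered in an array of the given shape."""
    rows, cols = np.indices(shape)
    r0, c0 = shape[0] // 2, shape[1] // 2
    psf = np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * sigma**2))
    return psf / psf.sum()


def blur(obj, sigma):
    """Circular convolution of `obj` with the Gaussian PSF via the FFT."""
    psf = np.fft.ifftshift(gaussian_psf_2d(obj.shape, sigma))  # move PSF center to (0, 0)
    return np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)).real


def simulate_acquisition(obj, sigma, peak_photons, read_noise_std, rng):
    """Blur a non-zero object, scale to a photon budget, add shot and read noise."""
    expected = np.clip(blur(obj, sigma), 0.0, None)  # FFT round-off can give tiny negatives
    expected *= peak_photons / expected.max()
    shot = rng.poisson(expected).astype(float)  # photon-counting (shot) noise
    return shot + rng.normal(0.0, read_noise_std, size=obj.shape)  # camera read noise


rng = np.random.default_rng(0)
obj = np.zeros((32, 32))
obj[10, 12], obj[20, 8] = 1.0, 2.0
# Lowering peak_photons gives noisier training inputs for the same ground truth.
low_noise = simulate_acquisition(obj, sigma=2.0, peak_photons=1000.0, read_noise_std=1.0, rng=rng)
high_noise = simulate_acquisition(obj, sigma=2.0, peak_photons=20.0, read_noise_std=1.0, rng=rng)
```

Pairs of `(obj, simulated image)` produced this way form supervised training data. The same `blur` operator is also the kind of physical model that can be reused in a loss term or in a reverse diffusion step.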

2603.21201 2026-03-24 q-bio.GN

A harmonized benchmarking framework for implementation-aware evaluation of 46 polygenic risk score tools across binary and continuous phenotypes

Muhammad Muneeb, David B. Ascher

Abstract

Polygenic risk score (PRS) tools differ substantially in statistical assumptions, input requirements, and implementation complexity, making direct comparison difficult. We developed a harmonized, implementation-aware benchmarking framework to evaluate 46 PRS tools across seven binary UK Biobank phenotypes and one continuous trait under three model configurations: null, PRS-only, and PRS plus covariates. The framework integrates standardized preprocessing, tool-specific execution, hyperparameter exploration, and unified downstream evaluation using five-fold cross-validation on high-performance computing infrastructure. In addition to predictive performance, we assessed runtime, memory use, input dependencies, and failure modes. A Friedman test across 40 phenotype-fold combinations confirmed significant differences in tool rankings ($\chi^2 = 102.29$, $p = 2.57 \times 10^{-11}$), with no single method universally optimal. These findings provide a reproducible framework for comparative PRS evaluation and demonstrate that tool performance is shaped not only by statistical methodology but also by phenotype architecture, preprocessing choices, covariate structure, computational demands, software robustness, and practical implementation constraints.
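The Friedman test treats each phenotype-fold combination as a block and asks whether the tools' within-block ranks differ more than chance would allow. For $n$ blocks and $k$ methods with rank sums $R_j$, the statistic is $\chi^2_F = \frac{12}{nk(k+1)}\sum_j R_j^2 - 3n(k+1)$. It is compared against a $\chi^2$ distribution with $k - 1$ degrees of freedom (45 for 46 tools). The sketch below computes only the statistic from a blocks × methods score matrix, without the tie correction. It does not reproduce the reported value, and the function names are hypothetical.

```python
import numpy as np


def average_ranks(row: np.ndarray) -> np.ndarray:
    """Rank values within one block (1 = smallest); tied values share their mean rank."""
    order = np.argsort(row, kind="mergesort")
    ranks = np.empty(len(row), dtype=float)
    ranks[order] = np.arange(1, len(row) + 1)
    for value in np.unique(row):
        tied = row == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def friedman_statistic(scores) -> float:
    """Friedman chi-square for a (blocks x methods) score matrix, without tie correction."""
    scores = np.asarray(scores, dtype=float)
    n_blocks, k = scores.shape
    rank_sums = np.vstack([average_ranks(row) for row in scores]).sum(axis=0)
    return 12.0 / (n_blocks * k * (k + 1)) * np.sum(rank_sums**2) - 3.0 * n_blocks * (k + 1)


# Four blocks in which every block orders three methods identically:
# the statistic reaches its maximum n * (k - 1) = 8.
consistent = np.array([[0.1, 0.2, 0.3]] * 4)
stat = friedman_statistic(consistent)
```

For p-values and tie handling, SciPy's `scipy.stats.friedmanchisquare(*scores.T)` (one argument per method) is the usual choice. The hand-rolled version is meant only to make the ranking logic explicit.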

2603.21025 2026-03-24 q-bio.PE

Pattern Formation in a Spatial Public Goods Dilemma due to Diffusive or Directed Motion

Yuxuan Zhao, Kaisheng Zhu, Yefei Zhang, Daniel B. Cooney

Abstract

The costly provision of public goods serves as a model problem for the evolution of cooperative behavior, presenting a social dilemma between the collective benefits of shared resources and the individual incentive to free-ride in resource production. The spatial structure of populations can also affect cooperation over public goods, as diffusion of public goods and intentional motion of individuals towards regions with greater resources can interact with population and public goods dynamics to produce heterogeneous spatial distributions of strategies and resources. In this paper, we build on a model introduced by Young and Belmonte for the reaction dynamics of interacting individuals and an explicit public good, deriving a system of PDEs that describes the spatial profiles of strategies and the public good in the presence of both diffusive motion of individuals and resources and chemotaxis-like directed motion of individuals in response to gradients in public goods concentration. Through linear stability analysis, we show that spatial patterns in strategy and public goods profiles can emerge either through a Turing instability driven by high defector diffusivity or through a directed-motion instability driven by strong sensitivity of cooperators to increasing resource concentration. We further explore the emergent spatial patterns with a mix of weakly nonlinear stability analysis and numerical simulation, showing that diffusion-driven instability appears to increase cooperation and public goods across the spatial domain, while directed motion of cooperators towards regions with greater public goods provision tends to decrease cooperation and environmental quality across the environment.
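The Turing mechanism mentioned above can be checked numerically through the dispersion relation: a spatially uniform steady state that is stable without motion becomes unstable when some wavenumber q gives J − q²D an eigenvalue with positive real part. The sketch below does this for a generic two-component reaction-diffusion system with an assumed Jacobian `J` and diffusion coefficients `(d1, d2)`; it is not the paper's cooperator/defector/public-good system, and the example values are purely illustrative.

```python
import cmath


def max_growth_rate(jac: list[list[float]], diff: tuple[float, float],
                    q_max: float = 5.0, steps: int = 2000) -> float:
    """Largest real part of the eigenvalues of J - q^2 D over a grid of wavenumbers q."""
    (a, b), (c, d) = jac
    d1, d2 = diff
    best = -float("inf")
    for i in range(steps + 1):
        q2 = (q_max * i / steps) ** 2
        # Linearized operator for a perturbation ~ exp(i q x): J - q^2 * diag(d1, d2)
        m11, m22 = a - d1 * q2, d - d2 * q2
        tr = m11 + m22
        det = m11 * m22 - b * c
        # Larger root of lambda^2 - tr*lambda + det = 0 (complex if the discriminant < 0)
        lam = (tr + cmath.sqrt(tr * tr - 4.0 * det)) / 2.0
        best = max(best, lam.real)
    return best
```

With `J = [[1.0, -1.0], [3.0, -2.0]]` (trace −1, determinant 1, so stable without diffusion) and `d1 = 1`, the uniform state stays stable for `d2 = 1` but becomes unstable once `d2 > 4 + 2√3 ≈ 7.46`, the qualitative signature of a diffusion-driven instability.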

2603.21020 2026-03-24 q-bio.QM

Characterizing Long-Range Dependencies in Knee Joint Contact Mechanics: A Comparison of Topology Diffusion, Global Routing, and Hybrid Graph Neural Networks

Zhengye Pan, Jianwei Zuo, Jiajia Luo

Abstract

Finite element analysis of knee joint contact mechanics is computationally expensive, which has motivated the development of graph neural network surrogate models. However, effectively representing long-range dependencies in joint mechanical responses remains challenging. This study systematically compared topology diffusion, global routing, and their hybridization for surrogate modeling of knee joint contact mechanics. Using kinematic and force data from nine soccer players performing change-of-direction maneuvers, finite element simulations were used to generate graph-structured samples for training and evaluation under a grouped three-fold cross-subject evaluation framework. Five architectures were compared: standard MeshGraphNet, hierarchical MeshGraphNet, a routing-only transformer, a topology-biased routing transformer, and a hybrid model. The hybrid model achieved the best overall performance, yielding the lowest full-field error and peak stress error, together with the highest spatial agreement for high-risk regions. Among the non-hybrid models, the standard topology-diffusion model performed best overall, whereas routing-only strategies were less effective. These findings indicate that topology diffusion provides a robust basis for surrogate modeling of knee joint contact mechanics within the present benchmark, while the addition of global routing can further improve reconstruction of clinically relevant high-stress patterns.

2603.20988 2026-03-24 cs.AI q-bio.NC

Can we automatize scientific discovery in the cognitive sciences?

Akshay K. Jagadish, Milena Rmus, Kristin Witte, Marvin Mathony, Marcel Binz, Eric Schulz

Abstract

The cognitive sciences aim to understand intelligence by formalizing underlying operations as computational models. Traditionally, this follows a cycle of discovery where researchers develop paradigms, collect data, and test predefined model classes. However, this manual pipeline is fundamentally constrained by the slow pace of human intervention and a search space limited by researchers' background and intuition. Here, we propose a paradigm shift toward a fully automated, in silico science of the mind that implements every stage of the discovery cycle using Large Language Models (LLMs). In this framework, experimental paradigms exploring conceptually meaningful task structures are directly sampled from an LLM. High-fidelity behavioral data are then simulated using foundation models of cognition. The tedious step of handcrafting cognitive models is replaced by LLM-based program synthesis, which performs a high-throughput search over a vast landscape of algorithmic hypotheses. Finally, the discovery loop is closed by optimizing for "interestingness", a metric of conceptual yield evaluated by an LLM-critic. By enabling a fast and scalable approach to theory development, this automated loop functions as a high-throughput in silico discovery engine, surfacing informative experiments and mechanisms for subsequent validation in real human populations.

2603.20848 2026-03-24 cs.CV cs.CE q-bio.TO

GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit

Chad Vanderbilt, Gabriele Campanella, Siddharth Singi, Swaraj Nanda, Jie-Fu Chen, Ali Kamali, Amir Momeni Boroujeni, David Kim, Mohamed Yakoub, Jamal Benhamida, Meera Hameed, Neeraj Kumar, Gregory Goldgof

Abstract

Computational biomarkers (CBs) are histopathology-derived patterns extracted from hematoxylin-eosin (H&E) whole-slide images (WSIs) using artificial intelligence (AI) to predict therapeutic response or prognosis. Recently, slide-level multiple-instance learning (MIL) with pathology foundation models (PFMs) has become the standard baseline for CB development. While these methods have improved predictive performance, computational pathology lacks standardized intermediate data formats, provenance tracking, checkpointing conventions, and reproducible evaluation metrics required for clinical-grade deployment. We introduce GOLDMARK (https://artificialintelligencepathology.org), a standardized benchmarking framework built on a curated TCGA cohort with clinically actionable OncoKB level 1-3 biomarker labels. GOLDMARK releases structured intermediate representations, including tile coordinate maps, per-slide feature embeddings from canonical PFMs, quality-control metadata, predefined patient-level splits, trained slide-level models, and evaluation outputs. Models are trained on TCGA and evaluated on an independent MSKCC cohort with reciprocal testing. Across 33 tumor-biomarker tasks, mean AUROC was 0.689 (TCGA) and 0.630 (MSKCC). Restricting to the eight highest-performing tasks yielded mean AUROCs of 0.831 and 0.801, respectively. These tasks correspond to established morphologic-genomic associations (e.g., LGG IDH1, COAD MSI/BRAF, THCA BRAF/NRAS, BLCA FGFR3, UCEC PTEN) and showed the most stable cross-site performance. Differences between canonical encoders were modest relative to task-specific variability. GOLDMARK establishes a shared experimental substrate for computational pathology, enabling reproducible benchmarking and direct comparison of methods across datasets and models.
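The mean AUROC figures above are averages of per-task AUROCs. As a minimal sketch (assuming binary labels and an unweighted mean over tasks, which the abstract does not spell out), AUROC can be computed from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names are hypothetical.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as P(score_pos > score_neg) + 0.5 * P(tie), over all positive/negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(tasks: dict[str, tuple[list[int], list[float]]]) -> float:
    """Unweighted mean of per-task AUROCs (e.g. over tumor-biomarker tasks)."""
    values = [auroc(y, s) for y, s in tasks.values()]
    return sum(values) / len(values)
```

The pairwise loop is O(P·N) and is meant for clarity; a sort-based rank computation gives the same value faster on large cohorts.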

2603.20707 2026-03-24 q-bio.PE

Coexistence coalitions in propagule disperser quasi-communities

Leonardo Aguirre, José A. Capitán, David Alonso

Comments 35 pages (17 pages Appendix)

Abstract

Many natural ecosystems harbor large numbers of coexisting species competing for far fewer distinct resources, in apparent defiance of the competitive exclusion principle. Various mechanisms have been proposed to explain this apparent paradox, among the most prominent being competition--colonization trade-offs, environmental heterogeneity, and ecological neutrality. We develop a unified stochastic model class that combines all three coexistence narratives in the context of propagule disperser communities and show that this setting encompasses several important classical models. We then prove a general theorem on coexistence at macroscopic equilibria and provide an algorithm that determines equilibrium coalitions solely from readily available matrix spectra, thereby bypassing the costly computation of exact equilibrium states. Using illustrative examples, we demonstrate the potential of this approach for quantifying the relative merits of different coexistence narratives and for studying their synergistic effects.

2603.20680 2026-03-24 q-bio.NC cs.LG

Hierarchical Multiscale Structure-Function Coupling for Brain Connectome Integration

Jianwei Chen, Zhengyang Miao, Wenjie Cai, Jiaxue Tang, Boxing Liu, Yunfan Zhang, Yuhang Yang, Hao Tang, Carola-Bibiane Schönlieb, Zaixu Cui, Du Lei, Shouliang Qi, Chao Li

Abstract

Integrating structural and functional connectomes remains challenging because their relationship is non-linear and organized over nested modular hierarchies. We propose a hierarchical multiscale structure-function coupling framework for connectome integration that jointly learns individualized modular organization and hierarchical coupling across structural connectivity (SC) and functional connectivity (FC). The framework includes: (i) Prototype-based Modular Pooling (PMPool), which learns modality-specific multiscale communities by selecting prototypical ROIs and optimizing a differentiable modularity-inspired objective; (ii) an Attention-based Hierarchical Coupling Module (AHCM) that models both within-hierarchy and cross-hierarchy SC-FC interactions to produce enriched hierarchical coupling representations; and (iii) a Coupling-guided Clustering loss (CgC-Loss) that regularizes SC and FC community assignments with coupling signals, allowing cross-modal interactions to shape community alignment across hierarchies. We evaluate the model's performance across four cohorts for predicting brain age, cognitive score, and disease classification. Our model consistently outperforms baselines and other state-of-the-art approaches across three tasks. Ablation and sensitivity analyses verify the contributions of key components. Finally, the visualizations of learned coupling reveal interpretable differences, suggesting that the framework captures biologically meaningful structure-function relationships.
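PMPool is described as optimizing a differentiable modularity-inspired objective, but the exact objective is not given here. The sketch below therefore shows the standard Newman modularity generalized to soft community assignments, Q = (1/2m) Σ_ij B_ij ⟨s_i, s_j⟩ with B_ij = A_ij − k_i k_j/(2m), as one common differentiable choice; `soft_modularity` is a hypothetical name, and a pooling layer would typically maximize Q (equivalently, minimize −Q) with respect to the assignment matrix.

```python
def soft_modularity(adj: list[list[float]], assign: list[list[float]]) -> float:
    """Modularity with soft assignments S (each row of `assign` sums to 1).

    Q = (1 / 2m) * sum_ij B_ij * <S_i, S_j>,  where B_ij = A_ij - k_i * k_j / (2m).
    """
    deg = [sum(row) for row in adj]  # (weighted) node degrees k_i
    two_m = sum(deg)                 # 2m = total degree
    q = 0.0
    for i in range(len(adj)):
        for j in range(len(adj)):
            b_ij = adj[i][j] - deg[i] * deg[j] / two_m
            overlap = sum(si * sj for si, sj in zip(assign[i], assign[j]))
            q += b_ij * overlap
    return q / two_m
```

For hard (one-hot) assignments this reduces to ordinary Newman modularity, e.g. 0.5 for two disconnected triangles split into their natural communities.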

2602.15677 2026-03-24 cs.LG q-bio.QM

CAMEL: An ECG Language Model for Forecasting Cardiac Events

Neelay Velingker, Alaia Solko-Breslin, Mayank Keoliya, Seewon Choi, Jiayi Xin, Anika Marathe, Alireza Oraii, Rajat Deo, Sameed Khatana, Rajeev Alur, Mayur Naik, Eric Wong

Comments 24 pages, 6 figures

Abstract

Electrocardiograms (ECGs) are electrical recordings of the heart that are critical for diagnosing cardiovascular conditions. ECG language models (ELMs) have recently emerged as a promising framework for ECG classification accompanied by report generation. However, current models cannot forecast future cardiac events, despite the immense clinical value of such forecasts for planning earlier intervention. To address this gap, we propose CAMEL, the first ELM capable of inference over longer signal durations, which enables its forecasting capability. Our key insight is a specialized ECG encoder that enables cross-understanding of ECG signals and text. We train CAMEL using established LLM training procedures, combining LoRA adaptation with a curriculum learning pipeline. Our curriculum includes ECG classification, metric calculations, and multi-turn conversations to elicit reasoning. CAMEL demonstrates strong zero-shot performance across 6 tasks and 9 datasets, including ECGForecastBench, a new benchmark that we introduce for forecasting arrhythmias. CAMEL is on par with or surpasses ELMs and fully supervised baselines both in- and out-of-distribution, achieving SOTA results on ECGBench (+7.0% absolute average gain) as well as ECGForecastBench (+12.4% over fully supervised models and +21.1% over zero-shot ELMs).
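LoRA adaptation freezes a pretrained weight W₀ and learns only a low-rank update, y = W₀x + (α/r)·BAx, with A randomly initialized and B initialized to zero so that training starts exactly from the pretrained model. The pure-Python sketch below illustrates this for a single linear layer; the class name, rank, and α value are illustrative assumptions, not CAMEL's configuration.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W0 x + (alpha / r) * B (A x); W0 is frozen, only A (r x k) and B (d x r) are trained."""

    def __init__(self, w0: list[list[float]], rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d, k = len(w0), len(w0[0])
        self.w0 = w0  # frozen pretrained weight, d x k
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(rank)]  # random init
        self.b = [[0.0] * rank for _ in range(d)]  # zero init: the update starts at 0
        self.scale = alpha / rank

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * dy for y, dy in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

For a d × k layer the adapter trains r(d + k) parameters instead of d·k, which is what makes fine-tuning a large model on a curriculum of ECG tasks comparatively cheap.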

2512.03497 2026-03-24 q-bio.QM cs.AI q-bio.CB

Cell-cell Communication Inference and Analysis: Biological Mechanisms, Computational Approaches, and Future Opportunities

Xiangzheng Cheng, Haili Huang, Ye Su, Qing Nie, Xiufen Zou, Suoqin Jin

Comments Published in CSIAM Transactions on Life Sciences (2026)

Abstract

In multicellular organisms, cells coordinate their activities through cell-cell communication (CCC), which is crucial for development, tissue homeostasis, and disease progression. Recent advances in single-cell and spatial omics technologies provide unprecedented opportunities to systematically infer and analyze CCC from these omics data, either by integrating prior knowledge of ligand-receptor interactions (LRIs) or through de novo approaches. A variety of computational methods have been developed, focusing on methodological innovations, accurate modeling of complex signaling mechanisms, and investigation of broader biological questions. These advances have greatly enhanced our ability to analyze CCC and generate biological hypotheses. Here, we introduce the biological mechanisms and modeling strategies of CCC, and provide a focused overview of more than 140 computational methods for inferring CCC from single-cell and spatial transcriptomic data, emphasizing the diversity in methodological frameworks and biological questions. Finally, we discuss the current challenges and future opportunities in this rapidly evolving field, and summarize available methods in an interactive online resource (https://cellchat.whu.edu.cn) to facilitate more efficient method comparison and selection.

2511.17685 2026-03-24 q-bio.QM cs.AI cs.CV cs.LG

Dual-Path Knowledge-Augmented Contrastive Alignment Network for Spatially Resolved Transcriptomics

Wei Zhang, Jiajun Chu, Xinci Liu, Chen Tong, Xinyue Li

Comments AAAI 2026 Oral, extended version

Journal ref
Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12807-12815. 2026
Abstract

Spatial Transcriptomics (ST) is a technology that measures gene expression profiles within tissue sections while retaining spatial context. It reveals localized gene expression patterns and tissue heterogeneity, both of which are essential for understanding disease etiology. However, its high cost has driven efforts to predict spatial gene expression from whole slide images. Despite recent advancements, current methods still face significant limitations, such as under-exploitation of high-level biological context, over-reliance on exemplar retrievals, and inadequate alignment of heterogeneous modalities. To address these challenges, we propose DKAN, a novel Dual-path Knowledge-Augmented contrastive alignment Network that predicts spatially resolved gene expression by integrating histopathological images and gene expression profiles through a biologically informed approach. Specifically, we introduce an effective gene semantic representation module that leverages an external gene database to provide additional biological insights, thereby enhancing gene expression prediction. Further, we adopt a unified, one-stage contrastive learning paradigm, seamlessly combining contrastive learning and supervised learning to eliminate reliance on exemplars, complemented by an adaptive weighting mechanism. Additionally, we propose a dual-path contrastive alignment module that employs gene semantic features as dynamic cross-modal coordinators to enable effective heterogeneous feature integration. Through extensive experiments across three public ST datasets, DKAN demonstrates superior performance over state-of-the-art models, establishing a new benchmark for spatial gene expression prediction and offering a powerful tool for advancing biological and clinical research.
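DKAN's exact contrastive objective and adaptive weighting are not specified in the abstract. As a generic illustration of cross-modal contrastive alignment only, the sketch below computes a one-directional InfoNCE (CLIP-style) loss in which each image embedding should be most similar to the gene embedding of the same spot; the function name and temperature are assumptions.

```python
import math


def info_nce(img: list[list[float]], gene: list[list[float]], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss (image -> gene): row i of `img` should match row i of `gene`."""
    def normalize(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    img = [normalize(v) for v in img]
    gene = [normalize(v) for v in gene]
    total = 0.0
    for i, u in enumerate(img):
        # Cosine similarities to every gene embedding, scaled by the temperature
        logits = [sum(a * b for a, b in zip(u, g)) / temperature for g in gene]
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))  # stable logsumexp
        total += log_sum_exp - logits[i]  # cross-entropy with the matched pair as the target
    return total / len(img)
```

Averaging this loss with its gene-to-image counterpart gives the symmetric variant; a supervised regression term would be added separately in a one-stage setup.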

2509.11545 2026-03-24 q-bio.NC

Representational drift under spontaneous activity -- self-organized criticality enhances representational reliability

Zhuda Yang, Junhao Liang, Wing Ho Yung, Changsong Zhou

Abstract

Neural systems face the challenge of maintaining reliable representations amid variations from plasticity and spontaneous activity. In particular, the spontaneous dynamics of neuronal circuits is known to operate near a highly variable critical state, which intuitively contrasts with the requirement of reliable representation. It is therefore intriguing to understand how reliable representation could be maintained, or even enhanced, by critical spontaneous states. We first examined the coexistence of scale-free avalanches in the spontaneous activity of mouse visual cortex with a restricted representational geometry, which manifests representational reliability amid representational drift with respect to the visual stimulus. To explore how the critical spontaneous state influences neural representation, we built an excitation-inhibition network with homeostatic plasticity, which self-organizes to the critical spontaneous state. This model successfully reproduced both the representational drift and the restricted representational geometry observed experimentally, in contrast with randomly shuffled plasticity, which causes accumulated drift of representational geometry. We further showed that the self-organized critical state enhances the cross-session low-dimensional representation, compared to the non-critical state, by restricting the synaptic weights to a low-variation space. Our findings suggest that spontaneous self-organized criticality serves not only as a ubiquitous property of neural systems but also as a functional mechanism for maintaining reliable information representation in continuously changing networks, providing a potential explanation of how the brain maintains consistent perception and behavior despite ongoing synaptic rewiring.
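Scale-free avalanches are conventionally identified from binned population activity: an avalanche is a maximal run of non-empty time bins bounded by empty bins, and its size is the total number of events it contains; criticality is then assessed by whether the size distribution is approximately a power law. The sketch below implements only this segmentation step under that conventional definition; the bin width and the paper's exact procedure are not given here, and the function name is hypothetical.

```python
def avalanche_sizes(counts: list[int]) -> list[int]:
    """Split binned population activity into avalanches.

    An avalanche is a maximal run of non-empty bins bounded by empty bins;
    its size is the total event count within the run.
    """
    sizes, current = [], 0
    for c in counts:
        if c > 0:
            current += c
        elif current > 0:  # an empty bin closes the running avalanche
            sizes.append(current)
            current = 0
    if current > 0:  # close an avalanche that reaches the end of the recording
        sizes.append(current)
    return sizes
```

A histogram of the returned sizes on log-log axes is the usual first check for the scale-free signature.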

2507.06358 2026-03-24 q-bio.PE cs.LG

Multi-scale species richness estimation with deep learning

Victor Boussange, Bert Wuyts, Philipp Brun, Johanna T. Malle, Gabriele Midolo, Jeanne Portier, Théophile Sanchez, Niklaus E. Zimmermann, Irena Axmanová, Helge Bruelheide, Milan Chytrý, Stephan Kambach, Zdeňka Lososová, Martin Večeřa, Idoia Biurrun, Klaus T. Ecker, Jonathan Lenoir, Jens-Christian Svenning, Dirk Nikolaus Karger

Comments 31 pages

Abstract

Biodiversity assessments depend critically on the spatial scale at which species richness is measured. How species richness accumulates with sampling area is influenced by natural and anthropogenic processes whose effects vary across spatial scales. These accumulation dynamics, described by the species-area relationship (SAR), are challenging to assess because most biodiversity surveys cover sampling areas far smaller than the scales at which these processes operate. Here, we combine sampling theory with deep learning to estimate species richness at arbitrary spatial scales across geographic space from existing ecological surveys. We apply our model, named MuScaRi, to ~350k vegetation surveys across Europe. Validated against independent regional plant inventories, MuScaRi reduces root mean squared error of vascular plant richness estimates by 61% relative to conventional estimators, yields substantially less biased predictions, and produces multi-scale richness maps alongside spatially explicit estimates of the species accumulation rate, a key indicator for biodiversity conservation. By encompassing the full spectrum of ecologically relevant spatial scales within a single unified framework, MuScaRi provides an essential tool for robust biodiversity assessments and forecasts under global change.
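The species-area relationship is classically summarized by the power law S = cA^z. The sketch below fits this classical baseline by least squares in log-log space; it is not MuScaRi (which combines sampling theory with deep learning), and the names are hypothetical. For a pure power law, the local slope d log S / d log A equals z, which is one simple way to express how quickly richness accumulates with area.

```python
import math


def fit_power_sar(areas: list[float], richness: list[float]) -> tuple[float, float]:
    """Least-squares fit of log S = log c + z log A (classical power-law SAR); returns (c, z)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in richness]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    z = sxy / sxx              # slope in log-log space
    c = math.exp(my - z * mx)  # back-transformed intercept
    return c, z
```

Because the fit is done on log-transformed values, it minimizes relative rather than absolute errors in richness.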