arXivDaily: arXiv daily academic digest, updated Monday to Friday
2604.18559 2026-04-21 q-bio.BM cs.LG

ConforNets: Latents-Based Conformational Control in OpenFold3

Minji Lee, Colin Kalicki, Minkyu Jeon, Aymen Qabel, Alisia Fadini, Mohammed AlQuraishi

Abstract

Models from the AlphaFold (AF) family reliably predict one dominant conformation for most well-ordered proteins but struggle to capture biologically relevant alternate states. Several efforts have focused on eliciting greater conformational variability through ad hoc inference-time perturbations of AF models or their inputs. Despite their progress, these approaches remain inefficient and fail to consistently recover major conformational modes. Here, we investigate both the optimal location and manner-of-operation for perturbing latent representations in the AF3 architecture. We distill our findings in ConforNets: channel-wise affine transforms of the pre-Pairformer pair latents. Unlike previous methods, ConforNets globally modulate AF3 representations, making them reusable across proteins. On unsupervised generation of alternate states, ConforNets achieve state-of-the-art success rates on all existing multi-state benchmarks. On the novel supervised task of conformational transfer, ConforNets trained on one source protein can induce a conserved conformational change across a protein family. Collectively, these results introduce a mechanism for conformational control in AF3-based models.
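As a rough illustration of what a channel-wise affine transform of pair latents could look like, the sketch below scales and shifts each channel of a toy pair tensor. The function name `channelwise_affine`, the nested-list tensor layout, and all numeric values are assumptions made for illustration; this is not the authors' implementation or the OpenFold3 API.

```python
from typing import List

Pair = List[List[List[float]]]  # assumed layout: [residue i][residue j][channel c]


def channelwise_affine(pair: Pair, scale: List[float], shift: List[float]) -> Pair:
    """Apply z'[i][j][c] = scale[c] * z[i][j][c] + shift[c] to every residue pair."""
    return [
        [[s * v + b for v, s, b in zip(channels, scale, shift)] for channels in row]
        for row in pair
    ]


# Toy 2x2 pair tensor with 3 channels (illustrative values only).
pair = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 2.0, 3.0]],
]
scale = [1.0, 2.0, 0.5]   # one multiplier per channel
shift = [0.0, -1.0, 1.0]  # one offset per channel
out = channelwise_affine(pair, scale, shift)
```

In this sketch `scale` and `shift` hold one entry per channel and none per residue, so the same parameters can be applied to a pair tensor of any sequence length, which is one way to read the abstract's remark about reuse across proteins.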

2604.18548 2026-04-21 cs.LG q-bio.QM

Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems

William Lavery, Jodie A. Cochrane, Christian Olesen, Dagim S. Tadele, John T. Nardini, Sara Hamis

Abstract

Physics-informed neural networks (PINNs) provide a powerful framework for learning governing equations of dynamical systems from data. Biologically-informed neural networks (BINNs) are a variant of PINNs that preserve the known differential operator structure (e.g., reaction-diffusion) while learning constitutive terms via trainable neural subnetworks, enforced through soft residual penalties. Existing BINN studies are limited to $1\mathrm{D}{+}t$ reaction-diffusion systems and focus on forward prediction, using the governing partial differential equation as a regulariser rather than an explicit identification target. Here, we extend BINNs to $2\mathrm{D}{+}t$ systems within a PINN framework that combines data preprocessing, BINN-based equation learning, and symbolic regression post-processing for closed-form equation discovery. We demonstrate the framework's real-world applicability by learning the governing equations of lung cancer cell population dynamics from time-lapse microscopy data, recovering $2\mathrm{D}{+}t$ reaction-diffusion models from experimental observations. The proposed framework is readily applicable to other spatio-temporal systems, providing a practical and interpretable tool for fast analytic equation discovery from data.

2604.18470 2026-04-21 math.NA cs.NA q-bio.NC

High-fidelity and Network-based Spatio-temporal Mathematical Models of Alzheimer's Disease Progression and their Validation Against PET-SUVR Imaging Data

Beatrice Caon, Mattia Corti, Francesca Bonizzoni, Paola F. Antonietti

Abstract

Alzheimer's disease is the most common neurodegenerative disorder. Its pathological development is connected with the misfolding and accumulation of two toxic proteins: amyloid-beta and tau. Mathematical models provide a valuable quantitative tool for monitoring disease progression. In this work, we propose and compare two approaches to modeling the spatio-temporal dynamics of amyloid-beta and tau proteins: one on three-dimensional patient-specific geometries, the other on reduced network-based models defined on the brain connectome. More specifically, a high-fidelity biophysical model is proposed on three-dimensional brain geometries reconstructed from magnetic resonance imaging, whereas a network-based reduced formulation is defined on the brain connectome. For both approaches, a suitable numerical discretisation is proposed. A sensitivity analysis is presented to quantify the influence of model parameters on protein concentration patterns and to compare the quality of the predictions. For both approaches, the results are validated against PET-SUVR clinical data using [18F]AZD4694 for amyloid-beta and [18F]MK6240 for tau protein. The results indicate that the three-dimensional model provides the most accurate and biologically consistent description of the disease progression, but remains computationally demanding. The reduced graph-based model, on the other hand, is cheaper but does not always achieve reliable results.

2604.18345 2026-04-21 q-bio.PE

Effect of antibiotic spectrum on the abundance of resistant bacteria in multispecies communities

Magnus Aspenberg, Erik Andreas Martens, Kristofer Wollein Waldetoft

Comments 5 figures

Abstract

Antibiotic resistance is a major threat to global health. It emerges in multispecies microbial communities under antibiotic exposure. This makes antibiotic spectrum -- a drug's distribution of effects across species -- a potential key parameter in resistance management. However, we currently lack evolutionary theory for resistance dynamics in a multispecies setting. Analysing established community ecology theory, we develop a simple mathematical measure for how one taxon (strain or species) affects another taxon through all direct and indirect interactions in a complex interaction network. Using this, we derive the expected effects of different antibiotic spectra on the abundance of resistant taxa in microbial communities. This furthers our understanding of microbial evolutionary ecology in multispecies communities, and provides a formal theoretical basis for empirical work on optimal antibiotic choice.

2604.18230 2026-04-21 q-bio.QM cond-mat.mtrl-sci cond-mat.soft

ToFiE, a Topology-aware Fiber Extraction workflow for 3D reconstruction of dense and heterogeneous biological fiber networks from microscopy images

Risa Togo, Sara Cardona, Irène Nagle, Gijsje H. Koenderink, Behrooz Fereidoonnezhad, Mathias Peirlinck

Abstract

Fibrous networks are ubiquitous structural components in biology, spanning cellulose in plant cell walls, fibrin in blood clots, and collagen in the extracellular matrix of animal tissues. Theoretical models predict that network connectivity critically influences their mechanical behavior. However, accurately reconstructing network topology from 3D image data remains a major challenge as current segmentation methods are not designed to preserve network topology and often rely on intensity-based thresholding, which can fragment fibers and distort junction connectivity. Here, we introduce ToFiE, an open-source topology-aware fiber extraction workflow for reconstructing dense and heterogeneous fibrous networks from high resolution microscopy images while preserving connectivity in three dimensions. We validate ToFiE using synthetic fluorescence microscopy images of fiber networks with varying topologies and signal-to-noise ratios. We further demonstrate its performance by reconstructing the fiber networks of a library of collagen gels with various microstructures, imaged using confocal fluorescence microscopy. Altogether, the results establish ToFiE as a practical semi-automated framework for extracting mechanically relevant network information from imaging data across a broad range of fibrous materials.

2604.18185 2026-04-21 physics.bio-ph q-bio.MN

Noise-Driven Differentiation via Gene Frustration and Epigenetic Fixation

Davey Plugers, Kunihiko Kaneko

Comments 9 pages, 5 figures

Abstract

Gene expression in cells is stochastic, yet differentiation is robust. We propose a mechanism in which frustrated genes with weakly stable intermediate expression undergo noise-driven switching between basins of attraction, followed by irreversible fate fixation through slow epigenetic feedback. Regulatory interactions amplify effective noise and promote differentiation. We derive an analytic expression for the logarithmic dependence of differentiation time on noise strength and input-dependent cell-fate selection, and demonstrate homeorhesis, the dynamical robustness of the epigenetic landscape.

2604.18031 2026-04-21 cs.CL cs.LG q-bio.BM

How Creative Are Large Language Models in Generating Molecules?

Wen Tao, Yiwei Wang, Peng Zhou, Bryan Hooi, Wanlong Fang, Tianle Zhang, Xiao Luo, Yuansheng Liu, Alvin Chan

Abstract

Molecule generation requires satisfying multiple chemical and biological constraints while searching a large and structured chemical space. This makes it a non-binary problem, where effective models must identify non-obvious solutions under constraints while maintaining exploration to improve success by escaping local optima. From this perspective, creativity is a functional requirement in molecular generation rather than an aesthetic notion. Large language models (LLMs) can generate molecular representations directly from natural language prompts, but it remains unclear what type of creativity they exhibit in this setting and how it should be evaluated. In this work, we study the creative behavior of LLMs in molecular generation through a systematic empirical evaluation across physicochemical, ADMET, and biological activity tasks. We characterize creativity along two complementary dimensions, convergent creativity and divergent creativity, and analyze how different factors shape these behaviors. Our results indicate that LLMs exhibit distinct patterns of creative behavior in molecule generation, such as an increase in constraint satisfaction when additional constraints are imposed. Overall, our work is the first to reframe the abilities required for molecule generation as creativity, providing a systematic understanding of creativity in LLM-based molecular generation and clarifying the appropriate use of LLMs in molecular discovery pipelines.

2604.18022 2026-04-21 q-bio.BM cond-mat.stat-mech cs.LG stat.ML

Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment

Sanzo Miyazawa

Comments A manuscript of 11 pages including 3 figures and 3 tables, and a supplementary material of 9 pages including 8 figures. The program and multiple sequence alignments employed here are available from https://gitlab.com/sanzo.miyazawa/BM/ and https://github.com/Sanzo-Miyazawa/BM/

Abstract

The inverse Potts problem, which estimates evolutionary single-site fields and pairwise couplings in homologous protein sequences from the single-site and pairwise amino acid frequencies observed in their multiple sequence alignment, remains a useful method in the study of protein structure and evolution. Since the reproducibility of fields and couplings is most important, the Boltzmann machine method is employed here, although it is computationally intensive. To reduce the computational time required for the Boltzmann machine, a parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. Stochastic gradient descent methods are also used to reduce the computational time of each learning step. Another problem is how to adjust the values of hyperparameters; there are two regularization parameters for evolutionary fields and couplings. The precision of contact residue pair prediction is often used to adjust the hyperparameters. However, it is not sensitive to these regularization parameters. Here, they are adjusted for the fields and couplings to satisfy a specific condition that is appropriate for protein conformations. This method has been applied to eight protein families.

2604.17960 2026-04-21 q-bio.NC cs.LG

The Umwelt Representation Hypothesis: Rethinking Universality

Victoria Bosch, Rowan Sommers, Adrien Doerig, Tim C Kietzmann

Comments preprint v1

Abstract

Recent studies reveal striking representational alignment between artificial neural networks (ANNs) and biological brains, leading to proposals that all sufficiently capable systems converge on universal representations of reality. Here, we argue that this claim of Universality is premature. We introduce the Umwelt Representation Hypothesis (URH), proposing that alignment arises not from convergence toward a single global optimum, but from overlap in ecological constraints under which systems develop. We review empirical evidence showing that representational differences between species, individuals, and ANNs are systematic and adaptive, which is difficult to reconcile with Universality. Finally, we reframe ANN model comparison as a method for mapping clusters of alignment in ecological constraint space rather than searching for a single optimal world model.

2604.17926 2026-04-21 q-bio.PE

Information on hidden birth events restores identifiability in phylodynamic inference

Tobias Dieselhorst, Tanja Stadler

Abstract

The parameters of many classes of birth-death processes cannot be inferred uniquely from phylogenetic trees: infinitely many parameter combinations yield the same distribution of phylogenetic trees. Here, we show that parameter identifiability can be recovered even for the most general cases of time-dependent rates when additional information on hidden birth events along branches of the reconstructed tree is available. This holds both for models in which individuals are sampled at a single point in time or through time at a time-dependent rate. Moreover, we prove that when mutations occur at birth - assuming two different models for the accumulation of mutations at a birth event - then information about hidden birth events is available in the sequences and thus all parameters of time-dependent birth-death models become identifiable. Thus, phylodynamic inference is identifiable whenever evolutionary models with mutation accumulation at birth (such as at speciation, transmission, or cell division) are plausible.

2603.19761 2026-04-21 math.OC q-bio.NC q-bio.QM

Multimodal branched transport infers anatomically aligned brain reaction maps

Cristian Mendico

Abstract

How external stimulation is transformed into distributed reaction patterns remains unresolved at the level of propagation architecture. Existing large-scale control models quantify transition costs on prescribed networks but do not infer the routing map itself from source and target activity. Here we combine task-related blood-oxygen-level-dependent responses, source-reconstructed electrophysiology and tractography-derived anisotropy to estimate stimulation and reaction measures, define an anatomical transport cost, and infer a branched propagation architecture by variational optimisation. Unlike standard transport formulations, branched transport favours aggregation of signal into shared neural highways before redistribution. We further attach a stochastic graph-induced dynamics to the inferred map and quantify the trade-off between geometric efficiency and dynamical controllability. We show that multimodal data generate anatomically aligned brain reaction maps, that anisotropic costs qualitatively reshape routing backbones relative to isotropic baselines, and that hybrid geometric--dynamical optimisation reveals non-trivial rank reversals across branching regimes.

2512.15948 2026-04-21 cs.AI q-bio.NC

Subjective functions

Samuel J. Gershman

Abstract

Where do objective functions come from? How do we select what goals to pursue? Human intelligence is adept at synthesizing new objective functions on the fly. How does this work, and can we endow artificial systems with the same ability? This paper proposes an approach to answering these questions, starting with the concept of a subjective function, a higher-order objective function that is endogenous to the agent (i.e., defined with respect to the agent's features, rather than an external task). Expected prediction error is studied as a concrete example of a subjective function. This proposal has many connections to ideas in psychology, neuroscience, and machine learning.

2509.25872 2026-04-21 q-bio.QM q-bio.BM

Marginal Girsanov Reweighting: Stable Variance Reduction for Long-Timescale Dynamics from Biased Simulation

Yan Wang, Hao Wu, Simon Olsson

Abstract

Recovering unbiased kinetic and thermodynamic observables from enhanced sampling simulations is a central challenge in rare-event sampling. Classical Girsanov Reweighting (GR) offers a principled solution by yielding exact pathwise probability ratios between biased and unbiased processes. However, the variance of GR weights grows rapidly with time, rendering it impractical for long-horizon reweighting. We introduce Marginal Girsanov Reweighting (MGR), which mitigates variance explosion by marginalizing over intermediate paths, producing stable and scalable weights for long-timescale dynamics. Experiments on various molecular dynamics systems demonstrate that MGR accurately recovers unbiased kinetic properties from trajectories generated under both umbrella sampling and metadynamics biases.
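For intuition about the variance growth of classical GR weights, the worked expression below gives the pathwise log-weight in a simplified setting. The setting is an assumption made for illustration and is not taken from the paper: one-dimensional overdamped Langevin dynamics with unbiased drift $f$, added bias drift $b$, constant noise amplitude $\sigma$, Euler-Maruyama time step $\Delta t$, and standard normal increments $\eta_k$ as drawn in the biased simulation. The symbols $f$, $b$, $\sigma$, $\eta_k$ and $w_{0:K}$ are introduced here for the sketch only.

```latex
% Biased update actually simulated:
%   x_{k+1} = x_k + \bigl(f(x_k) + b(x_k)\bigr)\,\Delta t + \sigma\sqrt{\Delta t}\,\eta_k
% Ratio of unbiased to biased Gaussian transition densities, accumulated over K steps:
\log w_{0:K}
  = -\sum_{k=0}^{K-1}
    \left[
      \frac{b(x_k)}{\sigma}\,\sqrt{\Delta t}\;\eta_k
      + \frac{1}{2}\left(\frac{b(x_k)}{\sigma}\right)^{2}\Delta t
    \right]
```

Under these assumptions the log-weight is a sum of $K$ noisy terms, so its spread widens as the trajectory gets longer. This is consistent with the abstract's statement that full-path weights become impractical at long horizons; the expression describes classical GR only and says nothing about how MGR performs its marginalization.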

2506.22178 2026-04-21 q-bio.PE math.AP math.DS nlin.PS physics.bio-ph

Vegetation Patterning Can Both Impede and Trigger Critical Transitions from Savanna to Grassland

Jelle van der Voort, Mara Baudena, Ehud Meron, Max Rietkerk, Arjen Doelman

Comments 24 pages, 8 figures

Journal ref
Environmental Research Letters, 2025, Volume 20, Number 9
Abstract

Tree-grass coexistence is a defining feature of savanna ecosystems, which play an important role in supporting biodiversity and human populations worldwide. While recent advances have clarified many of the underlying processes, how these mechanisms interact to shape ecosystem dynamics under environmental stress is not yet understood. Here, we present and analyze a minimalistic spatially extended model of tree-grass dynamics in dry savannas. We incorporate tree facilitation of grasses through shading and grass competing with trees for water, both varying with tree life stage. Our model shows that these mechanisms lead to grass-tree coexistence and bistability between savanna and grassland states. Moreover, the model predicts vegetation patterns consisting of trees and grasses, particularly under harsh environmental conditions, which can persist in situations where a non-spatial version of the model predicts ecosystem collapse from savanna to grassland instead (a phenomenon called "Turing-evades-tipping"). Additionally, we identify a novel "Turing-triggers-tipping" mechanism, where unstable pattern formation drives tipping events that are overlooked when spatial dynamics are not included. These transient patterns act as early warning signals for ecosystem transitions, offering a critical window for intervention. Further theoretical and empirical research is needed to determine when spatial patterns prevent tipping or drive collapse.

2506.03157 2026-04-21 q-bio.BM cs.LG

UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules

Ziyang Yu, Wenbing Huang, Yang Liu

Comments ICML 2025 poster

Abstract

Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar molecular systems. Therefore, we propose Unified Simulator (UniSim), which leverages cross-domain knowledge to enhance the understanding of atomic interactions. First, we employ a multi-head pretraining approach to learn a unified atomic representation model from a large and diverse set of molecular data. Then, based on the stochastic interpolant framework, we learn the state transition patterns over long timesteps from MD trajectories, and introduce a force guidance module for rapidly adapting to different chemical environments. Our experiments demonstrate that UniSim achieves highly competitive performance across small molecules, peptides, and proteins.

2604.17786 2026-04-21 q-bio.CB

Spatial dynamic modelling to understand how dendritic cell clustering affects T cell activation

Domenic P. J. Germano, Federico Frascoli, Robyn P. Araujo, Peter P. Lee, Peter S. Kim

Abstract

The coordination of the immune system and its components is essential for the body to maintain a healthy status. Recent clinical studies show that breast cancer patients with high Dendritic cell clustering in tumour draining lymph nodes have improved survival outcomes, compared to those with a lower degree of clustering. These results suggest that a specific form of Dendritic cell clustering promotes T cell activation. However, the mechanistic effects of this spatial organisation are unclear. We develop a spatially dynamic model of T cells interacting with Dendritic cells within the lymph node. We present a novel probabilistic agent-based model (ABM) of T cells, and use it to derive the deterministic, phenotypically structured partial differential equation (PS-PDE) of T cell activation and motion. Using the PS-PDE, we derive analytic approximations of the expected T cell stimulation distribution, based on the topology and level of clustering of a given Dendritic cell population. Our analytic approximation enables us to identify the T cell characteristics that benefit most from Dendritic cell clustering and so yield an enhanced stimulation distribution. We also perform a sensitivity analysis with our models to identify T cell characteristics that result in desirable T cell activation characteristics, such as rapid T cell activation, and robust heterogeneous T cell activation. Our key findings show that T cells with an intermediate level of stimulation uptake benefit most from higher levels of Dendritic cell clustering, activating with a comparable or greater abundance, and greater heterogeneity, when compared to T cells of a similar characteristic but with a lower level of Dendritic cell clustering.

2604.17581 2026-04-21 cs.LG cs.AI q-bio.NC

How Much Data is Enough? The Zeta Law of Discoverability in Biomedical Data, featuring the enigmatic Riemann zeta function

Paul M. Thompson

Comments 25 pages, 5 figures

Abstract

How much data is enough to make a scientific discovery? As biomedical datasets scale to millions of samples and AI models grow in capacity, progress increasingly depends on predicting when additional data will substantially improve performance. In practice, model development often relies on empirical scaling curves measured across architectures, modalities, and dataset sizes, with limited theoretical guidance on when performance should improve, saturate, or exhibit cross-over behavior. We propose a scaling-law framework for cross-modal discoverability based on spectral structure of data covariance operators, task-aligned signal projections, and learned representations. Many performance metrics, including AUC, can be expressed in terms of cumulative signal-to-noise energy accumulated across identifiable spectral modes of an encoder and cross-modal operator. Under mild assumptions, this accumulation follows a zeta-like scaling law governed by power-law decay of covariance spectra and aligned signal energy, leading naturally to the appearance of the Riemann zeta function. Representation learning methods such as sparse models, low-rank embeddings, and multimodal contrastive objectives improve sample efficiency by concentrating useful signal into earlier stable modes, effectively steepening spectral decay and shifting scaling curves. The framework predicts cross-over regimes in which simpler models perform best at small sample sizes, while higher-capacity or multimodal encoders outperform them once sufficient data stabilizes additional degrees of freedom. Applications include multimodal disease classification, imaging genetics, functional MRI, and topological data analysis. The resulting zeta law provides a principled way to anticipate when scaling data, improving representations, or adding modalities is most likely to accelerate discovery.
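To make the "zeta-like" accumulation concrete, the sketch below sums a power-law spectrum over the first few modes and compares the partial sum with its limit. It assumes, purely for illustration, that the useful energy in mode $k$ is exactly $k^{-\alpha}$ and picks $\alpha = 2$ because the limit $\zeta(2) = \pi^2/6$ has a known closed form; the function name `cumulative_energy` is hypothetical and the paper's actual law may include additional factors.

```python
import math


def cumulative_energy(n_modes: int, alpha: float) -> float:
    """Sum k**(-alpha) over modes k = 1..n_modes (a partial zeta sum)."""
    return sum(k ** -alpha for k in range(1, n_modes + 1))


ZETA_2 = math.pi ** 2 / 6  # closed form of the Riemann zeta function at 2

# Fraction of the limiting energy captured as more modes become usable.
for n in (1, 10, 100, 1000):
    captured = cumulative_energy(n, 2.0) / ZETA_2
    print(f"{n:5d} modes -> {captured:.4f} of the limiting energy")
```

In this toy setting most of the energy sits in the first few modes and later modes add progressively less, which is the saturation behaviour the abstract describes; a steeper decay (larger `alpha`) would concentrate the signal even earlier.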

2604.17361 2026-04-21 q-bio.QM physics.med-ph

3D-DXA Cortical and Trabecular Parameters: Agreement Between Hologic Densitometers in Clinical Practice

Marta I. Bracco, Jorge Malouf, Laurent Maimoun, Xavier Nogues, Jean Paul Roux, François DuBoeuf, Ludovic Humbert

Comments 17 pages, 2 tables, 4 figures

Abstract

Background: Three-dimensional dual-energy X-ray absorptiometry reconstructs three-dimensional maps of the proximal femur's density distribution from standard hip scans, enabling the estimation of trabecular and cortical bone parameters. The aim of this study was to assess the agreement of these three-dimensional cortical and trabecular femur parameters across different series and models of Hologic densitometers. Methodology: The study cohort was composed of 103 women and men recruited from four clinical centers in Spain and France. Subjects had duplicated hip scans using different Hologic scanners from the Horizon, Discovery, and QDR4500 series. Analyses were performed using 3D-Shaper software. Inter-scanner agreement was evaluated using Deming regression and Bland-Altman analysis. Results: The parameters demonstrated strong inter-device agreement across all clinical centers and scanner models, with coefficients of determination greater than 0.91. Absolute biases were less than 2.5 mg/cm$^3$ for integral volumetric bone mineral density, less than 2.9 mg/cm$^3$ for trabecular volumetric bone mineral density, and less than 1.7 mg/cm$^2$ for cortical surface bone mineral density. No statistically significant bias was found between parameters obtained from different scanners. Furthermore, the observed bias was lower than the expected least significant change, indicating that inter-scanner variability across these devices is not clinically significant. Conclusions: This study demonstrated excellent agreement for standard and three-dimensional derived bone parameters at the hip across Hologic densitometers. These findings support their suitability for clinical use.
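For readers unfamiliar with Bland-Altman analysis, the sketch below computes the bias (mean paired difference) and the conventional 95% limits of agreement (bias ± 1.96 standard deviations). The function name `bland_altman` and the paired values are made-up toy numbers for illustration, not data from the study, and the Deming regression step mentioned in the abstract is not covered.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # systematic offset between the two devices
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired density readings from two scanners (toy values only).
scanner_a = [250.0, 310.0, 275.0, 290.0, 265.0]
scanner_b = [248.0, 312.0, 274.0, 291.0, 266.0]
bias, lower, upper = bland_altman(scanner_a, scanner_b)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

A bias close to zero with narrow limits suggests the two devices can be used interchangeably for that parameter; whether the limits are narrow enough is a clinical judgment, which the abstract frames in terms of the least significant change.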

2604.17291 2026-04-21 q-bio.NC

Poisson Flow Model of Cortical Folding Pattern

Moo K. Chung, Luigi Maccotta, Aaron Struck

Comments Published in IEEE EMBC 2026

Abstract

Cortical folding reflects coordinated neurodevelopmental processes and provides a sensitive marker of neurological disease. In juvenile myoclonic epilepsy (JME), structural abnormalities are subtle and spatially distributed, limiting the sensitivity of conventional morphometric measures such as cortical thickness. We introduce a Poisson flow model derived from gradients of the mean curvature field on the cortical surface. The method yields a smooth scalar field obtained from a Poisson equation, whose surface gradient defines a flow representation of folding organization. This representation enables spatially coherent characterization of sulcal--gyral patterns and provides a principled geometric framework for studying distributed cortical alterations in JME.

2604.11824 2026-04-21 q-bio.QM

Patterns in Individual Blood Count Trajectories in the UK Biobank Characterise Disease-Specific Signatures and Anticipate Pan-Cancer Risk

Riya Nagar, Abicumaran Uthamacumaran, Adelaide de Vecchi, Hector Zenil

Comments 22 pages, 6 figures

Abstract

We investigate the longitudinal behaviour of blood markers from common haematological tests as a marker of disease and as a function of disease progression in a variety of conditions including cancer, cardiovascular disease, and infections. We study confounding and non-confounding factors to allow for the earlier detection of disease and conditions based on their longitudinal signatures in biomarker patterns measured by common, scalable blood tests used in routine clinical practice, in particular the Complete Blood Count (CBC or FBC). Our analysis, using normalised temporal profiles and machine learning techniques, demonstrates that analyte-group patterns found in blood testing are disease sensitive and disease specific even before any symptoms appear. We demonstrate that CBC markers contribute the majority of the predictive signal, while biochemistry and other blood panels provide only a modest additional gain, mostly associated with the individual disease for which the test was designed (e.g. CRP, liver enzymes, blood sugar). Our results demonstrate how regular monitoring, computational intelligence, and machine learning applied to longitudinal CBC data can converge to uncover disease patterns, advancing the potential for precision healthcare and predictive medicine on a mass scale by leveraging an existing and pervasive blood test.

2603.06778 2026-04-21 q-bio.MN math.DS

A cocktail of chemical reaction networks and mathematical epidemiology tools for positive ODE stability problems

Florin Avram, Rim Adenane, Andrei-Dan Halanay

Comments Section 3 corrected

Abstract

We continue recent attempts to put together concepts and results of Chemical Reaction Networks theory (CRNT) and Mathematical Epidemiology (ME), for solving problems of stability of positive ODEs. We provide first an elegant CRN-flavored generalization of the most cited result in ME, the Next Generation Matrix (NGM) theorem. We review next the "symbolic-numeric" approach of Vassena and Stadler, which tackles bifurcation problems by viewing the characteristic polynomial of the Jacobian at fixed points as a formal polynomial in the "symbolic reactivities", and identifies its coefficients as "Child Selection minors of the stoichiometric matrix". We also review two applications of this approach using the Mathematica package Epid-CRN tools from both CRNT and ME.

2602.08280 2026-04-21 q-bio.GN

ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language-Guided Analysis

Osho Rawal, Rex Lu, Edgar Gonzalez-Kozlova, Sacha Gnjatic, Zeynep H. Gümüş

Abstract

High-dimensional omics datasets are routinely visualized as heatmaps, where color intensities reveal co-expression patterns and correlations. However, modern omics technologies increasingly generate matrices so large that existing visual exploration tools require down-sampling or filtering, causing loss of biologically important patterns. Additional barriers arise from tools that require command-line expertise, or fragmented workflows for downstream biological interpretation. We present ClusterChirp, a web-based platform for real-time exploration of large-scale data matrices. The platform combines GPU-accelerated rendering and parallelized hierarchical clustering using multiple CPU cores. Built on deck.gl and multi-threaded clustering algorithms, ClusterChirp supports on-the-fly clustering, multi-metric sorting, feature search and interactive visualization controls within a single interface. Uniquely, a natural language interface powered by a Large Language Model allows users to perform complex operations and build reproducible workflows through conversational commands. ClusterChirp further enables within-cluster correlation network analysis in 2D or 3D, and integrates functional enrichment through biological knowledge bases. Developed with iterative user feedback and adhering to FAIR4S principles, ClusterChirp enables users to extract insights from high-dimensional omics data with unprecedented ease and speed. It is freely available at clusterchirp.mssm.edu without login and is also distributed as a Dockerized application at ghcr.io/gumuslab/clusterchirp.

2601.17808 2026-04-21 cs.NE q-bio.GN

Motif Diversity in Human Liver ChIP-seq Data Using MAP-Elites

Alejandro Medina, Mary Lauren Benton

Comments Accepted Companion Paper to the GECCO 2026 Conference

Abstract

Motif discovery is a core problem in computational biology, traditionally formulated as a likelihood optimization task that returns a single dominant motif from a DNA sequence dataset. However, regulatory sequence data admit multiple plausible motif explanations, reflecting underlying biological heterogeneity. In this work, we frame motif discovery as a quality-diversity problem and apply the MAP-Elites algorithm to evolve position weight matrix motifs under a likelihood-based fitness objective while explicitly preserving diversity across biologically meaningful dimensions. We evaluate MAP-Elites using three complementary behavioral characterizations that capture trade-offs between motif specificity, compositional structure, coverage, and robustness. Experiments on human CTCF liver ChIP-seq data aligned to the human reference genome compare MAP-Elites against a standard motif discovery tool, MEME, under matched evaluation criteria across stratified dataset subsets. Results show that MAP-Elites recovers multiple high-quality motif variants with fitness comparable to MEME's strongest solutions while revealing structured diversity obscured by single-solution approaches.
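The quality-diversity idea behind MAP-Elites can be sketched in a few lines: keep one best solution ("elite") per behaviour bin, and repeatedly mutate a randomly chosen elite. The example below is a generic toy version under stated assumptions: genomes are short lists of floats rather than position weight matrices, the fitness and behaviour functions are placeholders rather than the likelihood objective or behavioural characterizations used in the paper, and all names and parameter values are hypothetical.

```python
import random


def map_elites(fitness, behavior, n_bins, genome_len, iterations, seed=0):
    """Minimal MAP-Elites: archive maps a behaviour bin to its best (fitness, genome)."""
    rng = random.Random(seed)
    archive = {}

    def try_insert(genome):
        # Discretise the behaviour descriptor (assumed to lie in [0, 1]) into a bin.
        b = min(int(behavior(genome) * n_bins), n_bins - 1)
        f = fitness(genome)
        # Keep the candidate only if its bin is empty or it beats the current elite.
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, genome)

    # Seed the archive with random genomes.
    for _ in range(20):
        try_insert([rng.random() for _ in range(genome_len)])

    # Main loop: pick a random elite, apply Gaussian mutation, try to insert the child.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) for g in parent]
        try_insert(child)
    return archive


# Placeholder objective: prefer genomes near 0.5 in every coordinate.
fitness = lambda g: -sum((x - 0.5) ** 2 for x in g)
# Placeholder behaviour descriptor: the first coordinate.
behavior = lambda g: g[0]

archive = map_elites(fitness, behavior, n_bins=5, genome_len=3, iterations=500)
for b in sorted(archive):
    print(b, round(archive[b][0], 4))
```

Unlike a single-solution optimizer, the returned archive holds a separate elite for each occupied behaviour bin, which mirrors the abstract's contrast between multiple motif variants and one dominant motif.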

2601.09173 2026-04-21 cs.LG cs.CL q-bio.QM stat.ML

Geometric Stability: The Missing Axis of Representations

Prashant C. Raju

Abstract

Representational similarity analysis and related methods have become standard tools for comparing the internal geometries of neural networks and biological systems. These methods measure what is represented, the alignment between two representational spaces, but not whether that structure is robust. We introduce geometric stability, a distinct dimension of representational quality that quantifies how reliably a representation's pairwise distance structure holds under perturbation. Our metric, Shesha, measures self-consistency through split-half correlation of representational dissimilarity matrices constructed from complementary feature subsets. A key formal property distinguishes stability from similarity: Shesha is not invariant to orthogonal transformations of the feature space, unlike CKA and Procrustes, enabling it to detect compression-induced damage to manifold structure that similarity metrics cannot see. Spectral analysis reveals the mechanism: similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum. Across 2463 encoder configurations in seven domains -- language, vision, audio, video, protein sequences, molecular profiles, and neural population recordings -- stability and similarity are empirically uncorrelated ($\rho=-0.01$). A regime analysis shows this independence arises from opposing effects: geometry-preserving transformations make the metrics redundant, while compression makes them anti-correlated, canceling in aggregate. Applied to 94 pretrained models across 6 datasets, stability exposes a "geometric tax": DINOv2, the top-performing model for transfer learning, ranks last in geometric stability on 5/6 datasets. Contrastive alignment and hierarchical architecture predict stability, providing actionable guidance for model selection in deployment contexts where representational reliability matters.
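
The split-half idea described above can be sketched in a few lines. This is a rough illustration, not the published Shesha definition: the Euclidean distance, the Spearman-style rank correlation, the number of random splits, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def condensed_rdm(rows, cols):
    """Pairwise Euclidean distances between items, using only the given feature columns."""
    return [
        math.sqrt(sum((rows[i][c] - rows[j][c]) ** 2 for c in cols))
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
    ]

def ranks(values):
    """Average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_stability(rows, n_splits=20, seed=0):
    """Mean rank correlation between RDMs built from two disjoint halves of the features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    scores = []
    for _ in range(n_splits):
        cols = list(range(n_features))
        rng.shuffle(cols)
        half = n_features // 2
        rdm_a = condensed_rdm(rows, cols[:half])
        rdm_b = condensed_rdm(rows, cols[half:])
        scores.append(pearson(ranks(rdm_a), ranks(rdm_b)))
    return sum(scores) / len(scores)

# Synthetic check: features that share one latent variable vs. independent noise.
rng = random.Random(1)
latent = [rng.gauss(0, 1) for _ in range(30)]
structured = [[z + rng.gauss(0, 0.1) for _ in range(16)] for z in latent]
noise = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(30)]

stable_score = split_half_stability(structured)
noise_score = split_half_stability(noise)
print(round(stable_score, 3), round(noise_score, 3))
```

In this toy setting the structured representation yields nearly the same distance structure from either feature half, whereas the noise representation does not. That is the self-consistency the abstract refers to.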

2509.01038 2026-04-21 q-bio.BM cs.LG

Learning residue level protein dynamics with multiscale Gaussians

Mihir Bafna, Bowen Jing, Bonnie Berger

Comments ICLR 2026

Abstract

Many methods have been developed to predict static protein structures; however, understanding the dynamics of protein structure is essential for elucidating biological function. While molecular dynamics (MD) simulations remain the in silico gold standard, their high computational cost limits scalability. We present DynaProt, a lightweight, SE(3)-invariant framework that predicts rich descriptors of protein dynamics directly from static structures. By casting the problem through the lens of multivariate Gaussians, DynaProt estimates dynamics at two complementary scales: (1) per-residue marginal anisotropy as $3 \times 3$ covariance matrices capturing local flexibility, and (2) joint scalar covariances encoding pairwise dynamic coupling across residues. From these dynamics outputs, DynaProt achieves high accuracy in predicting residue-level flexibility (RMSF) and, remarkably, enables reasonable reconstruction of the full covariance matrix for fast ensemble generation. Notably, it does so using orders of magnitude fewer parameters than prior methods. Our results highlight the potential of direct protein dynamics prediction as a scalable alternative to existing methods.
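
The link between a per-residue $3 \times 3$ covariance and RMSF is a standard identity: the RMSF of a residue is the square root of the trace of its positional covariance. The short sketch below checks this on a made-up four-frame ensemble. The helper names and the coordinates are hypothetical and are not taken from DynaProt.

```python
import math

def covariance_from_samples(positions):
    """Empirical 3x3 covariance of one residue's coordinates across ensemble frames."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    return [
        [sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in positions) / n for b in range(3)]
        for a in range(3)
    ]

def rmsf_from_covariance(cov):
    """RMSF = sqrt(trace(covariance)): total positional variance summed over x, y, z."""
    return math.sqrt(cov[0][0] + cov[1][1] + cov[2][2])

def rmsf_direct(positions):
    """RMSF computed from its definition, for comparison."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    return math.sqrt(sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions) / n)

# Hypothetical frames: the residue moves mostly along x (anisotropic motion).
frames = [(0.0, 0.0, 0.0), (2.0, 0.1, 0.0), (0.0, -0.1, 0.0), (2.0, 0.0, 0.0)]
cov = covariance_from_samples(frames)
print(rmsf_from_covariance(cov), rmsf_direct(frames))
```

The covariance carries strictly more information than the RMSF: its off-diagonal terms and eigenvectors describe the direction of motion, which the scalar RMSF discards.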

2310.07464 2026-04-21 eess.IV cs.LG q-bio.QM

Multi-Beholder: Biomarker Prediction for Low-Grade Glioma with Multiple Instance Learning and One-Class Classification

Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Jun Tan, Yongbing Zhang, Hong Shen

Comments 14 pages, 5 figures

Abstract

Biomarker detection is an indispensable part of the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, named Multi-Biomarker Histomorphology Discoverer (Multi-Beholder), to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images. Specifically, Multi-Beholder incorporates one-class classification into the multiple instance learning framework to achieve accurate instance-level pseudo-labeling, thereby complementing slide-level labels and improving prediction performance. Multi-Beholder demonstrates high performance on two LGG cohorts with diverse races and scanning protocols, with an area under the receiver operating characteristic curve of up to 0.973 on the internally validated TCGA-LGG dataset and 0.820 on the externally validated Xiangya cohort. Moreover, the interpretability of Multi-Beholder allows for discovering quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients, but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression. Code can be accessed at https://github.com/Vison307/Multi-Beholder.

2604.17151 2026-04-21 q-bio.NC

Causality as a Minimum Energy Principle

Moo K. Chung, D. Vijay Anand, Anass B El-Yaagoubi, Jae-Hun Jung, Anqi Qiu, Hernando Ombao

Comments Published in IEEE Engineering in Medicine and Biology Society Annual Conference (EMBC) 2026

Abstract

Classical causal models, such as Granger causality and structural equation modeling, are largely restricted to acyclic interactions and struggle to represent cyclic and higher-order dynamics in complex networks. We introduce a causal framework grounded in a variational principle, interpreting causality as directional energy flow from high- to low-energy states along network connections. Using Hodge theory, network flows are decomposed into dissipative components and a persistent harmonic component that captures stable cyclic interactions. Applied to resting-state fMRI connectivity, our variational framework reveals robust cyclic causal patterns that are not detected by conventional causal models, highlighting the value of variational principles for causality.
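
As a rough illustration of the decomposition mentioned above, the sketch below splits an edge flow on a small graph into a gradient part (flow from high to low potential) and a divergence-free cyclic remainder. It is a simplification under stated assumptions, not the paper's method: the graph has no filled triangles, so no separate curl component appears and the remainder plays the role of the harmonic part. The Gauss-Seidel solver and the example flow are hypothetical choices.

```python
def hodge_split(n_nodes, edges, flow, sweeps=500):
    """Least-squares split of an edge flow into gradient and cyclic parts.

    edges: list of oriented pairs (u, v); flow[e] > 0 means flow from u to v.
    The gradient part on (u, v) is phi[u] - phi[v], i.e. high-to-low potential.
    """
    phi = [0.0] * n_nodes
    for _ in range(sweeps):
        # Gauss-Seidel sweep on the normal equations of min ||flow - grad(phi)||^2.
        for k in range(n_nodes):
            total, degree = 0.0, 0
            for (u, v), f in zip(edges, flow):
                if u == k:
                    total += f + phi[v]
                    degree += 1
                elif v == k:
                    total += phi[u] - f
                    degree += 1
            if degree:
                phi[k] = total / degree
    gradient = [phi[u] - phi[v] for u, v in edges]
    cyclic = [f - g for f, g in zip(flow, gradient)]
    return phi, gradient, cyclic

def divergence(n_nodes, edges, flow):
    """Net outflow at each node."""
    div = [0.0] * n_nodes
    for (u, v), f in zip(edges, flow):
        div[u] += f
        div[v] -= f
    return div

# Hypothetical 3-node loop: potentials (2, 1, 0) plus a circulation of 0.5 around 0 -> 1 -> 2 -> 0.
edges = [(0, 1), (1, 2), (2, 0)]
flow = [1.5, 1.5, -1.5]
phi, gradient, cyclic = hodge_split(3, edges, flow)
print([round(g, 6) for g in gradient], [round(c, 6) for c in cyclic])
```

An acyclic model can only represent the gradient part. The constant circulation recovered in `cyclic` is the kind of persistent loop that a purely potential-driven description would miss.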

2604.17036 2026-04-21 q-bio.PE

Evolution as fitness landscape navigation: Concepts, Measures, and Emerging Questions

Malvika Srivastava, Claudia Bank, Joachim Krug, Suman G. Das

Comments 27 pages, 2 figures

Abstract

Fitness landscapes are mappings between genotypes, phenotypes, and fitness that shape evolution. In recent years, empirical work and theoretical models have greatly advanced our understanding of how populations navigate rugged fitness landscapes. Here, we provide a timely review of this field. Its rapidly growing literature employs a wide range of terms, which are sometimes used ambiguously or inconsistently. We therefore begin by defining the major concepts and the field's vocabulary, highlighting our own terminology choices wherever needed. We then review key results on the relationships between epistasis, ruggedness, accessibility, and navigability for genotype-fitness maps, highlighting several complex and sometimes counterintuitive connections that have emerged. Further, we review how the conserved structural properties of the underlying genotype-phenotype map -- that lead to the formation of large connected neutral networks of genotypes -- influence dynamics on fitness landscapes. We then compare the two levels at which landscape navigation can be studied: the level of genotype-phenotype maps and the level of genotype-fitness maps. Our review leads us to propose a new measure of navigability, based on evolutionary outcomes, that is broadly applicable and overcomes limitations of existing measures. Finally, we review the smaller body of work that relaxes the common assumption of fitness-monotonic paths on static landscapes, and discuss how this can fundamentally change the nature of fitness landscape navigation. Throughout the review, we identify directions for future work to fill existing gaps and to synthesize the disparate strands of research within the field.
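
Two of the concepts named above, ruggedness and accessibility, can be illustrated on a two-locus toy landscape. The sketch counts local fitness peaks and fitness-monotonic mutational paths. The fitness values are made up for the example, and these simple counts are standard textbook measures, not the new navigability measure the review proposes.

```python
from functools import lru_cache

def neighbours(genotype):
    """All binary genotypes one point mutation away."""
    return [genotype[:i] + (1 - genotype[i],) + genotype[i + 1:] for i in range(len(genotype))]

def local_peaks(fitness):
    """Genotypes fitter than every single-mutant neighbour (a simple ruggedness count)."""
    return [g for g in fitness if all(fitness[g] > fitness[n] for n in neighbours(g))]

def count_accessible_paths(fitness, start, target):
    """Number of single-mutation paths from start to target with strictly increasing fitness."""
    @lru_cache(maxsize=None)
    def count(g):
        if g == target:
            return 1
        return sum(count(n) for n in neighbours(g) if fitness[n] > fitness[g])
    return count(start)

# Hypothetical landscapes over two biallelic loci.
additive = {(0, 0): 1.0, (1, 0): 1.5, (0, 1): 1.2, (1, 1): 2.0}
# Reciprocal sign epistasis: each mutation is harmful alone but beneficial together.
reciprocal = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 2.0}

for name, landscape in [("additive", additive), ("reciprocal sign epistasis", reciprocal)]:
    peaks = local_peaks(landscape)
    paths = count_accessible_paths(landscape, (0, 0), (1, 1))
    print(name, "peaks:", peaks, "accessible paths:", paths)
```

On the additive landscape both mutation orders are accessible and there is a single peak. Under reciprocal sign epistasis the wild type becomes a second peak and no monotonic path reaches the fitter double mutant.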

2604.16896 2026-04-21 q-bio.QM cs.AI

ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design

Yutang Ge, Guojiang Zhao, Sihang Li, Zheng Cheng, Zifeng Zhao, Hanchen Xia, Guolin Ke, Linfeng Zhang, Zhifeng Gao, Yuguang Wang

Comments 25 pages, 11 figures. Accepted to Findings of ACL 2026

Abstract

Designing proteins that satisfy natural language functional requirements is a central goal in protein engineering. A straightforward baseline is to fine-tune generic instruction-tuned LLMs as direct text-to-sequence generators, but this is data- and compute-hungry. With limited supervision, LLMs can produce coherent plans in text yet fail to reliably realize them as sequences. This plan-execute gap motivates ProtoCycle, an agentic framework for protein design that uses LLMs primarily to drive a multi-round, feedback-driven decision cycle. ProtoCycle couples an LLM planner with a lightweight tool environment designed to emulate the iterative workflow of human protein engineering and uses LLM-driven reflection on tool feedback to revise plans. Trained with supervised trajectories and online reinforcement learning, ProtoCycle achieves strong language alignment while maintaining competitive foldability, and ablations show that reflection substantially improves sequence quality.

2604.16851 2026-04-21 cs.LG cs.AI cs.CV q-bio.BM q-bio.QM

Applications of deep generative models to DNA reaction kinetics and to cryogenic electron microscopy

Chenwei Zhang

Comments PhD Thesis

Abstract

This dissertation explores how deep generative models can advance the analysis of challenging biological problems by integrating domain knowledge with deep learning. It focuses on two areas: DNA reaction kinetics and cryogenic electron microscopy (cryo-EM). In the first part, we present ViDa, a biophysics-informed framework leveraging variational autoencoders (VAEs) and geometric scattering transforms to generate biophysically-plausible embeddings of DNA reaction kinetics simulations. These embeddings are reduced to a two-dimensional space to visualize DNA hybridization and toehold-mediated strand displacement reactions. ViDa preserves structure and clusters trajectory ensembles into reaction pathways, making simulation results more interpretable and revealing new mechanistic insights. In the second part, we address key challenges in cryo-EM density map interpretation and protein structure modeling. We provide a comprehensive review and benchmarking of deep learning methods for atomic model building, with improved evaluation metrics and practical guidance. We then present Struc2mapGAN, a generative adversarial network that synthesizes high-fidelity experimental-like cryo-EM density maps from protein structures. Finally, we present CryoSAMU, a structure-aware multimodal U-Net that enhances intermediate-resolution cryo-EM maps by integrating density features with structural embeddings from protein language models via cross-attention. Overall, these contributions demonstrate the potential of deep generative models to interpret DNA reaction mechanisms and advance cryo-EM density map analysis and protein structure modeling.