arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.08910 2026-02-10 cond-mat.dis-nn cond-mat.stat-mech q-bio.NC q-bio.PE

Structural coarse-graining enables noise-robust functional connectivity and reveals hidden inter-subject variability

Izaro Fernandez-Iriondo, Antonio Jimenez-Marin, Jesus Cortes, Pablo Villegas

Comments 10 Pages, 4 Figures and Supplementary Information

Abstract

Functional connectivity estimates are highly sensitive to analysis choices and can be dominated by noise when the number of sampled time points is small relative to network dimensionality. This issue is particularly acute in fMRI, where scan resolution is limited. Because scan duration is constrained by practical factors (e.g., motion and fatigue), many datasets remain statistically underpowered for high-dimensional correlation estimation. We introduce a framework that combines diffusion-based structural coarse-graining with spectral noise filtering to recover statistically reliable functional networks from temporally limited data. The method reduces network dimensionality by grouping regions according to diffusion-defined communication. This produces coarse-grained networks with dimensions compatible with available time points, enabling random matrix filtering of noise-dominated modes. We benchmark three common FC pipelines against our approach. We find that raw-signal correlations are strongly influenced by non-stationary fluctuations that can reduce apparent inter-subject variability under limited sampling conditions. In contrast, our pipeline reveals a broader, multimodal landscape of inter-subject variability. These large-scale organization patterns are largely obscured by standard pipelines. Together, these results provide a practical route to reliable functional networks under realistic sampling constraints. This strategy helps separate noise-driven artifacts from reproducible patterns of human brain variability.
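As a rough illustration of why network dimensionality matters for spectral noise filtering, the sketch below computes the Marchenko-Pastur eigenvalue bounds for a pure-noise correlation matrix. Using the Marchenko-Pastur law is an assumption here, since the abstract only says "random matrix filtering". The function names and the region and time-point counts are hypothetical and not taken from the paper.

```python
import math

def mp_bounds(n_regions, n_timepoints):
    """Marchenko-Pastur eigenvalue bounds for the correlation matrix of
    uncorrelated, unit-variance noise, with q = n_regions / n_timepoints.
    This sketch only handles q <= 1 (assumption made for simplicity)."""
    q = n_regions / n_timepoints
    if q > 1:
        raise ValueError("more regions than time points: correlation matrix is rank-deficient")
    root = math.sqrt(q)
    return (1 - root) ** 2, (1 + root) ** 2

def signal_modes(eigenvalues, n_regions, n_timepoints):
    """Keep only eigenvalues above the upper noise edge (hypothetical filter rule)."""
    _, upper = mp_bounds(n_regions, n_timepoints)
    return [lam for lam in eigenvalues if lam > upper]

# Hypothetical numbers: 400 time points and 100 coarse-grained regions, so q = 0.25.
low, high = mp_bounds(100, 400)
print(low, high)
print(signal_modes([0.3, 1.0, 2.0, 5.0], 100, 400))
```

With 1,000 regions and the same 400 time points, q would be 2.5 and the sketch refuses to filter. That is the regime the abstract's coarse-graining step is meant to avoid.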

2602.08897 2026-02-10 q-bio.BM

A Mathematical Theory of Redox Biology

James N. Cobley, Michalis G. Nikolaidis

Comments 41 pages, 4 figures, 2 boxes

Abstract

Redox biology underpins signalling, metabolism, immunity, and adaptation, yet lacks a unifying theoretical framework capable of formalising structure, function, and dynamics. Current interpretations rely on descriptive catalogues of molecules and reactions, obscuring how redox behaviour emerges from constrained biochemical organisation. Here, we present a mathematical theory of redox biology that resolves this gap by treating redox systems as finite, compositional, dynamical, and spatially embedded objects. We define a structured redox state space in which admissible molecular transformations form a neutral algebra of possibilities. Biological function emerges when this structure is embedded within a wider molecular network and interpreted through weighted flux distributions. Time-dependent reweighting of these transformations generates redox dynamics, while spatial embedding enforces locality and causality, yielding a distributed redox field. Within this framework, context dependence, nonlinearity, hysteresis, and memory arise naturally from bounded state spaces and irreversible transformations, without requiring ad hoc assumptions. This theory provides a working, predictive interpretative basis for redox biology: it constrains admissible states and trajectories, clarifies the meaning of redox measurements, and links chemical transformation to biological behaviour. Redox biology emerges as a geometric, dynamical process governed by lawful organisation.

2602.08832 2026-02-10 q-bio.QM

Oxi-Shapes: Tropical geometric analysis of bounded redox proteomic state spaces

James N. Cobley

Comments 24 pages, 4 figures

Abstract

Redox proteomics generates bounded biochemical measurements that are categorically mismatched to conventional linear algebraic formalisms. This work introduces Oxi-Shapes, a tropical geometric framework for the measurement-native analysis of bounded redox proteomic data. Oxi-Shapes represents cysteine oxidation as a scalar field over a discrete lattice, enabling global and site-wise analysis without rescaling, interpolation, or kinetic assumptions. At the global level, the framework yields internal redox entropy, lattice curvature, and derived energy functionals that characterise the geometric structure of the redox proteome. At the site level, Oxi-Shapes defines a bounded change space that makes explicit hard geometric constraints on admissible redox transitions and enables a normalised signed representation of site-wise change as a fraction of available redox freedom. Applied to an ageing mouse brain dataset, Oxi-Shapes reveals that a small decrease in mean oxidation arises from a profound redistribution of site-wise redox states, with thousands of residues shifting toward the reduced absorbing boundary. These results demonstrate that categorically correct algebraic representations expose structure in proteomic data that is inaccessible to mean-centric or unbounded analyses.

2602.08779 2026-02-10 q-bio.CB

How negative feedback from filamentous actin affects cell shapes and motility

Jack M. Hughes, Jupiter Algorta, Leah Edelstein-Keshet

Comments Submitted to the Journal of Mathematical Biology in their special issue on Mathematical Modeling of Cell Motility Across Scales

Abstract

The crawling motility of many eukaryotic cells is driven by filamentous actin (F-actin) and regulated by a network of signaling proteins and lipids (including small GTPases). The tangle of positive and negative feedback loops gives rise to various experimentally observed dynamic patterns ("actin waves"). Here we consider a recent prototypical model for actin waves in which F-actin exerts negative feedback onto a GTPase. Guided by recent numerical PDE bifurcation analysis in Hughes (2025) and Hughes et al. (2026), we explore cell shapes and motility associated with polar, oscillatory, and traveling wave solutions of a mass-conserved partial differential equation (PDE) model. We use Morpheus (cellular Potts) simulations to investigate the implications of such regimes of behavior on the shapes and motion of cells, and on transitions between modes of behavior. The model demonstrates various cell states, including resting (spatially uniform GTPase), polar cells (static "zones" of GTPase), and traveling waves along the cell edge. In some parameter regimes, such states can coexist, so that cells can transition from one behavior to another in response to noisy stimuli.

2602.08656 2026-02-10 q-bio.PE

Ecosystems in the Anthropocene: transformative drivers

Clara de Goes Monteiro de Carvalho Guimaraes, Pablo Jose Francisco Pena Rodrigues

Comments 16 pages, 1 figure, 1 table

Abstract

Human activity has an enormous impact on Earth, changing organisms, environments and landscapes, leading to the decline of original ecosystems and irreversible changes that create new combinations of living beings and materials. As a result, ecosystems with new properties and new species pools are emerging. Here, we explore a set of transformative drivers, which can act either individually or in synergy. The expansion of novel ecosystems (hybrids of natural and agricultural systems) is a sign of irreversible, human-induced change. Human growth, adaptation to climate change, urban expansion and geoengineering are powerful transformative drivers which are expected to have a high impact, creating novel ecosystems. In contrast, less transformative drivers such as degrowth, biocentrism, ecological restoration and low-impact agriculture can mitigate human impacts, leading to adaptation, resilience and sustainability, while conserving original ecosystems. This requires a new approach, incorporating new ecological, ethical and cultural perspectives, to keep ecosystems functional and healthy.

2602.08641 2026-02-10 q-bio.PE cond-mat.dis-nn q-bio.BM

Modeling Protein Evolution via Generative Inference: From Monte Carlo Chains to Population Genetics

Leonardo Di Bari, Thierry Mora, Andrea Pagnani, Aleksandra M. Walczak, Francesco Zamponi, Saverio Rossi

Abstract

Generative models derived from large protein sequence alignments define complex fitness landscapes, but their utility for accurately modeling non-equilibrium evolutionary dynamics remains unclear. In this work, we perform a rigorous comparative analysis of three simulation schemes, designed to mimic evolution in silico by local sampling of the probability distribution defined by a generative model. We compare standard independent Markov Chain Monte Carlo, Monte Carlo on a phylogenetic tree, and population genetics dynamics, benchmarking their outputs against deep sequencing data from four distinct in vitro evolution experiments. We find that standard Monte Carlo fails to reproduce the correct phylogenetic structure and generates unrealistic, gradual mutational sweeps. Performing Monte Carlo on a tree inferred from data improves phylogenetic fidelity and historical accuracy. The population genetics scheme successfully captures phylogenetic correlations, mutational abundances, and selective sweeps as emergent properties, without the need to infer additional information from data. However, the latter choice comes at the price of not sampling the proper generative model distribution at long times. Our findings highlight the crucial role of phylogenetic correlations and finite-population effects in shaping evolutionary trajectories on fitness landscapes. These models therefore provide powerful tools for predicting complex adaptive paths and for reliably extrapolating evolutionary dynamics beyond current experimental limitations.

2602.08213 2026-02-10 cs.LG cs.AI cs.CL q-bio.QM

DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning

Haoran Liu, Zheni Zeng, Yukun Yan, Yuxuan Chen, Yunduo Xiao

Abstract

Molecule generation and optimization is a fundamental task in the chemical domain. The rapid development of intelligent tools, especially large language models (LLMs) with powerful knowledge reserves and interactive capabilities, has provided new paradigms for it. Nevertheless, the intrinsic challenge for LLMs lies in the complex implicit relationship between molecular structure and pharmacological properties and the lack of corresponding labeled data. To bridge this gap, we propose DrugR, an LLM-based method that introduces explicit, step-by-step pharmacological reasoning into the optimization process. Our approach integrates domain-specific continual pretraining, supervised fine-tuning via reverse data engineering, and self-balanced multi-granular reinforcement learning. This framework enables DrugR to effectively improve key ADMET properties while preserving the original molecule's core efficacy. Experimental results demonstrate that DrugR achieves comprehensive enhancement across multiple properties without compromising structural similarity or target binding affinity. Importantly, its explicit reasoning process provides clear, interpretable rationales for each optimization step, yielding actionable design insights and advancing toward automated, knowledge-driven scientific discovery. Our code and model checkpoints are open-sourced to foster future research.

2602.08188 2026-02-10 q-bio.PE physics.pop-ph

The Great Filter hypothesis -- a new Great Filter?

Darren J. Dougan

Comments 8 pages, 11 figures, keywords depopulation, exodemography, ecospecies, exodepopulation, Great Filter, Fermi paradox

Abstract

The Great Filter hypothesis is an extension of the Fermi Paradox: "If life is so common in the universe, why don't we see it?" The Great Filter theory posits there are multiple obstacles or filters life must pass through, which ultimately sift out intelligent life. This paper identifies a new filter: depopulation. As an exospecies advances and reaches the top of the food chain on its planet, Darwinian evolution selects the species to breed fewer offspring due to a lack of predation. As the species evolves intelligence, this leads to medicines and most notably contraception, enabling the species to reduce infant mortality while controlling reproduction. Finally, economic, social and educational factors add to the conscious decision of the intelligent life to slow reproduction. These factors are currently contributing to a human global population peak mid-century with subsequent population collapse in less than 500 years. Noting that population growth and decline are exponential, our modelling forecasts human extinction thresholds being tested sometime after the year 2500. There is no reason to assume depopulation dynamics (exodepopulation) would not apply to exocivilizations (exodemography), thus providing a possible resolution of the Fermi Paradox. Furthermore, as machines and AI inevitably supplement humans as depopulation accelerates, the Fermi Paradox can be restated as "Why don't we see machines and AI colonising the galaxy?" A plausible answer is machines will not become conscious and will continue to operate only as tools, tools that will cease operating once humanity is extinct. The Fermi Paradox can then be restated as "Machines will not become conscious, otherwise we would see them colonising the galaxy".

2602.06020 2026-02-10 cs.LG q-bio.BM

Mechanisms of AI Protein Folding in ESMFold

Kevin Lu, Jannik Brinkmann, Stefan Huber, Aaron Mueller, Yonatan Belinkov, David Bau, Chris Wendler

Comments Our code, data, and results are available at https://folding.baulab.info

Abstract

How do protein structure prediction models fold proteins? We investigate this question by tracing how ESMFold folds a beta hairpin, a prevalent structural motif. Through counterfactual interventions on model latents, we identify two computational stages in the folding trunk. In the first stage, early blocks initialize pairwise biochemical signals: residue identities and associated biochemical features such as charge flow from sequence representations into pairwise representations. In the second stage, late blocks develop pairwise spatial features: distance and contact information accumulate in the pairwise representation. We demonstrate that the mechanisms underlying structural decisions of ESMFold can be localized, traced through interpretable representations, and manipulated with strong causal effects.

2602.03824 2026-02-10 q-bio.PE cs.CV q-bio.QM

Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity

Jiao Sun

Comments Readers from the field of computer science may be interested in Sections 2.1, 2.2, 3.1, 4.1, and 4.2. These sections discuss interpretability and representation learning, especially the texture vs. shape problem, and highlight our model's ability to overcome texture bias and capture overall shape features (although they are included to prove the biological validity of the model).

Abstract

The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often involve subjective biases in the selection and coding of morphological traits. This study employs deep learning techniques, utilising a ResNet34 model capable of recognising over 10,000 bird species, to explore avian morphological evolution. We extract weights from the model's final fully connected (fc) layer and investigate the semantic alignment between the high-dimensional embedding space learned by the model and biological phenotypes. The results demonstrate that the high-dimensional embedding space encodes phenotypic convergence. Subsequently, we assess the morphological disparity among various taxa and evaluate the association between morphological disparity and species richness, demonstrating that species richness is the primary driver of morphospace expansion. Moreover, the disparity-through-time analysis reveals a visual "early burst" after the K-Pg extinction. While mainly aimed at evolutionary analysis, this study also provides insights into the interpretability of Deep Neural Networks. We demonstrate that hierarchical semantic structures (biological taxonomy) emerged in the high-dimensional embedding space despite being trained on flat labels. Furthermore, through adversarial examples, we provide evidence that our model in this task can overcome texture bias and learn holistic shape representations (body plans), challenging the prevailing view that CNNs rely primarily on local textures.

2601.19002 2026-02-10 q-bio.GN

Y-Trim: Evidence-gated Adaptase tail trimming for single-stranded bisulfite sequencing

Yihan Fang

Abstract

Background: Single-stranded whole-genome bisulfite sequencing (ssWGBS) enables DNA methylation profiling in low-input and highly fragmented material, including cell-free DNA. In widely used post-bisulfite protocols, Adaptase-mediated tailing adds stochastic, template-free end sequence. Unlike adapter-defined junctions, these tails lack a fixed sequence template, so trimming must be decided from FASTQ-stage observables under intrinsic uncertainty. Results: We show that bisulfite-induced compositional degeneracy implies a strictly positive error floor for any fixed per-read boundary rule under a finite nucleotide alphabet. Guided by this limit, we introduce Y-Trim, an evidence-gated framework that separates admission (should we trim) from inference (where to trim). For Read 2, Y-Trim performs per-read adaptive cut placement via a fixed, chemistry-typed matrix-linear texture scoring scheme; for Read 1, it uses automated sample-level anchoring when read-level localization is feasibility-limited. Across modules, Y-Trim is an explicit, chemistry-specific decision rule with interpretable operating points. On a curated 34-run public cohort (CCGB-34) and simulator stress tests with known latent boundaries, Y-Trim exhibits stable Read 2 operating behavior and Read 1 feasibility-limited behavior consistent with conditional read-through. Conclusions: Template-free Adaptase tail trimming is best viewed as an evidence-limited FASTQ-stage decision rather than a generic preprocessing knob. By making admissibility and abstention explicit and exposing interpretable genomic-retention versus residual-carryover trade-offs, Y-Trim provides a practical uncertainty-aware preprocessing strategy for ssWGBS.

2510.21280 2026-02-10 eess.AS cs.AI cs.LG cs.SD q-bio.QM

WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation

Christiaan M. Geldenhuys, Günther Tonitz, Thomas R. Niesler

Journal ref SATNAC 2025, ISBN 978-1-0492-3850-0 (2025)

Abstract

While recent sound event detection (SED) systems can identify baleen whale calls in marine audio, challenges related to false positive and minority-class detection persist. We propose the boundary proposal network (BPN), which extends an existing lightweight SED system. The BPN is inspired by work in image object detection and aims to reduce the number of false positive detections. It achieves this by using intermediate latent representations computed within the backbone classification model to gate the final output. When added to an existing SED system, the BPN achieves a 16.8 % absolute increase in precision, as well as 21.3 % and 9.4 % improvements in the F1-score for minority-class d-calls and bp-calls, respectively. We further consider two approaches to the selection of post-processing hyperparameters: a forward-search and a backward-search. By separately optimising event-level and frame-level hyperparameters, these two approaches lead to considerable performance improvements over parameters selected using empirical methods. The complete WhaleVAD-BPN system achieves a cross-validated development F1-score of 0.475, which is a 9.8 % absolute improvement over the baseline.

2508.04747 2026-02-10 q-bio.GN cs.LG

GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation

Tianxiang Hu, Chenyi Zhou, Jiaxiang Liu, Jiongxin Wang, Ruizhe Chen, Haoxiang Xia, Gaoang Wang, Jian Wu, Zuozhu Liu

Comments 10 pages, 6 figures

Abstract

Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. In practice, human experts often rely on the structure revealed by principal component analysis (PCA) followed by $k$-nearest neighbor ($k$-NN) graph construction to guide annotation. While effective, this process is labor-intensive and does not scale to large datasets. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. By aligning scRNA-seq profiles with natural language descriptions, models like LangCell enable zero-shot annotation. While LangCell demonstrates decent zero-shot performance, its predictions remain suboptimal. In this paper, we propose a principled inference-time paradigm for zero-shot cell type annotation (GRIT) which bridges the scalability of pre-trained foundation models with the structural robustness relied upon in human expert annotation workflows. Specifically, we enforce local consistency of the zero-shot CLIP logits over the task-specific PCA-based $k$-NN graph. We evaluate our approach on 14 annotated human scRNA-seq datasets from 4 distinct studies, spanning 11 organs and over 200,000 single cells. Our method consistently improves zero-shot annotation accuracy, achieving accuracy gains of up to 10%. Further analysis showcases the mechanism by which GRIT effectively propagates correct signals through the graph, pulling back mislabeled cells toward more accurate predictions. The method is training-free, model-agnostic, and serves as a simple yet effective plug-in for enhancing zero-shot cell type annotation.
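The idea of enforcing local consistency of logits over a $k$-NN graph can be sketched with plain neighbor averaging. This is not GRIT's actual formulation, which the abstract does not spell out. It is a generic smoothing rule under stated assumptions: the low-dimensional coordinates stand in for PCA scores that are already computed, and `knn_graph`, `smooth_logits`, `lam`, and the toy data are all hypothetical.

```python
def knn_graph(points, k):
    """Brute-force k-nearest-neighbor lists using squared Euclidean distance."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def smooth_logits(logits, neighbors, lam=0.5, n_iter=10):
    """Repeatedly mix each cell's original logits with the mean of its
    neighbors' current logits: z_i <- (1 - lam) * z0_i + lam * mean_j z_j."""
    current = [row[:] for row in logits]
    n_classes = len(logits[0])
    for _ in range(n_iter):
        updated = []
        for i, nbrs in enumerate(neighbors):
            mean = [sum(current[j][c] for j in nbrs) / len(nbrs) for c in range(n_classes)]
            updated.append([(1 - lam) * z0 + lam * m for z0, m in zip(logits[i], mean)])
        current = updated
    return current

def predict(logits):
    """Index of the largest logit per cell."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

# Toy data: two well-separated clusters; cell 3 has misleading zero-shot logits.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
logits = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [0.0, 0.5],
          [0.0, 2.0], [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]]

before = predict(logits)
after = predict(smooth_logits(logits, knn_graph(points, k=3)))
print(before)
print(after)
```

In this toy case, the cell with the outlying logits is pulled toward the label of its graph neighbors, which is the qualitative behavior the abstract describes. The mixing weight `lam` controls how strongly the graph overrides the original prediction.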

2506.03640 2026-02-10 q-bio.NC cond-mat.dis-nn cond-mat.stat-mech

Robust Scaling in Human Brain Dynamics Despite Latent Variables and Limited Sampling Distortions

Rubén Calvo, Carles Martorell, Adrián Roig, Miguel A. Muñoz

Journal ref Phys. Rev. Lett. 136, 068402 (2026)

Abstract

The idea that information-processing systems operate near criticality to enhance computational performance is supported by scaling signatures in brain activity. However, external signals raise the question of whether this behavior is intrinsic or input-driven. We show that autocorrelated inputs and temporal resolution influence observed scaling exponents in simple neural models. We also demonstrate analytically that under subsampling, non-critical systems driven by independent autocorrelated signals can exhibit strong signatures of apparent criticality. To address these pitfalls, we develop a robust framework and apply it to pooled neural data, revealing that resting-state brain activity at the population level is slightly sub-critical yet near-critical. Notably, the extracted critical exponents closely match predictions from a simple recurrent firing-rate model, supporting the emergence of near-critical dynamics from reverberant network activity, with potential implications for information processing and artificial intelligence.

2505.15709 2026-02-10 nlin.CD math-ph math.MP physics.comp-ph physics.soc-ph q-bio.PE

Composing $α$-Gauss and logistic maps: Gradual and sudden transitions to chaos

Marcelo A. Pires, Constantino Tsallis, Evaldo M. F. Curado

Comments 11 pages and 12 figures. This updated version, accepted by Physical Review E, presents a more comprehensive set of analytical results

Journal ref Physical Review E 112, 034209, 2025

Abstract

We introduce the $α$-Gauss-Logistic map, a new nonlinear dynamical system constructed by composing the logistic and $α$-Gauss maps. Explicitly, our model is given by $x_{t+1} = f_L(x_t) x_t^{-α} - \lfloor f_L(x_t) x_t^{-α} \rfloor$, where $f_L(x_t) = r x_t (1-x_t)$ is the logistic map and $\lfloor \ldots \rfloor$ is the integer part function. Our investigation reveals a rich phenomenology depending solely on two parameters, $r$ and $α$. For $α < 1$, the system exhibits multiple period-doubling cascades to chaos as the parameter $r$ is increased, interspersed with stability windows within the chaotic attractor. In contrast, for $1 \leq α < 2$, the onset of chaos is abrupt, occurring without any prior bifurcations, and the resulting chaotic attractors emerge without stability windows. For $α \geq 2$, regular behavior is absent. The special case of $α = 1$ allows an analytical treatment, yielding a closed-form formula for the Lyapunov exponent and conditions for an exact uniform invariant density, using the Perron-Frobenius equation. Chaotic regimes for $α = 1$ can exhibit gaps or be gapless. Surprisingly, the golden ratio $Φ$ marks the threshold for the disappearance of the largest gap in the regime diagram. Additionally, at the edge of chaos in the abrupt transition regime, the invariant density approaches a $q$-Gaussian with $q=2$, which corresponds to a Cauchy distribution.
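The map defined above is simple to iterate numerically. The following is a minimal sketch, assuming an initial condition in (0, 1) and positive $r$. The function names, the early stop when the orbit lands exactly on 0, and the finite-time Lyapunov estimate are illustrative choices, not the authors' code.

```python
import math

def step(x, r, alpha):
    """One iteration: fractional part of f_L(x) * x**(-alpha), with f_L(x) = r*x*(1-x)."""
    y = r * x * (1.0 - x) * x ** (-alpha)
    return y - math.floor(y)

def orbit(x0, r, alpha, n_steps):
    """Iterate from x0 in (0, 1); stop early if the orbit lands exactly on 0,
    where x**(-alpha) is undefined."""
    xs = [x0]
    for _ in range(n_steps):
        if xs[-1] == 0.0:
            break
        xs.append(step(xs[-1], r, alpha))
    return xs

def lyapunov_estimate(xs, r, alpha):
    """Finite-time average of ln|g'(x)| for g(x) = r * x**(1-alpha) * (1-x),
    skipping points where the derivative is zero or undefined."""
    logs = []
    for x in xs:
        if x == 0.0:
            continue
        slope = r * x ** (-alpha) * ((1.0 - alpha) * (1.0 - x) - x)
        if slope != 0.0:
            logs.append(math.log(abs(slope)))
    return sum(logs) / len(logs)

# Hypothetical parameter choice: r = 2.5, alpha = 1, x0 = 0.3.
xs = orbit(0.3, 2.5, 1.0, 1000)
print(xs[1], lyapunov_estimate(xs, 2.5, 1.0))
```

For $α = 1$ the expression simplifies to the fractional part of $r(1 - x_t)$, which is piecewise linear with slope $-r$, so the estimate should sit close to $\ln r$. That observation follows directly from the formula above and is not a statement of the paper's closed-form result.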

2503.06286 2026-02-10 q-bio.NC

A 7T fMRI dataset of synthetic images for out-of-distribution modeling of vision

Alessandro T. Gifford, Radoslaw M. Cichy, Thomas Naselaris, Kendrick Kay

Journal ref Nature Communications, 2026

Abstract

Now published in Nature Communications (DOI: https://doi.org/10.1038/s41467-026-69345-9). Large-scale visual neural datasets such as the Natural Scenes Dataset (NSD) are boosting computational neuroscience research by enabling models of the brain with performances beyond what was possible just a decade ago. However, because the stimuli of these datasets typically live within a common naturalistic visual distribution, they do not allow for strict out-of-distribution (OOD) generalization tests, which are crucial for the development of more robust models. Here, we address this limitation by releasing NSD-synthetic, a dataset consisting of 7T fMRI responses from the same eight NSD participants for 284 synthetic images. We show that NSD-synthetic's fMRI responses reliably encode stimulus-related information and are OOD with respect to NSD. Furthermore, we provide a proof of principle that OOD generalization tests on NSD-synthetic reveal differences between models of the brain that are not detected with the original NSD data; we demonstrate that the degree of OOD (quantified as the distance between a set of responses and the training data used for modeling) is predictive of the magnitude of model failures; and we show that less strict OOD generalization tests can be usefully applied even within the domain of naturalistic stimuli. These results showcase how NSD-synthetic enables OOD generalization tests that facilitate the development of more robust models of visual processing and the formulation of more accurate theories of human vision.

2501.17207 2026-02-10 cs.NE cs.AI cs.LG q-bio.NC

Rethinking Functional Brain Connectome Analysis: Do Graph Deep Learning Models Help?

Keqi Han, Yao Su, Lifang He, Liang Zhan, Sergey Plis, Vince Calhoun, Carl Yang

Comments Published version. See journal for final typeset version

Journal ref npj Artificial Intelligence (2026)

Abstract

Graph deep learning models, a class of AI-driven approaches employing a message aggregation mechanism, have gained popularity for analyzing the functional brain connectome in neuroimaging. However, their actual effectiveness remains unclear. In this study, we re-examine graph deep learning versus classical machine learning models based on four large-scale neuroimaging studies. Surprisingly, we find that the message aggregation mechanism, a hallmark of graph deep learning models, does not help with predictive performance as typically assumed, but rather consistently degrades it. To address this issue, we propose a hybrid model combining a linear model with a graph attention network through dual pathways, achieving robust predictions and enhanced interpretability by revealing both localized and global neural connectivity patterns. Our findings urge caution in adopting complex deep learning models for functional brain connectome analysis, emphasizing the need for rigorous experimental designs to establish tangible performance gains and, perhaps more importantly, to pursue improvements in model interpretability.

2409.16016 2026-02-10 eess.IV cs.CV q-bio.TO

VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images

Jose Vargas Quiros, Bart Liefers, Karin van Garderen, Jeroen Vermeulen, Eyened Reading Center, Sinergia Consortium, Caroline Klaver

Abstract

We introduce VascX models, a comprehensive set of model ensembles for analyzing retinal vasculature from color fundus images (CFIs). Annotated CFIs were aggregated from public datasets. Additional CFIs, mainly from the population-based Rotterdam Study, were annotated by graders for arteries and veins at pixel level, resulting in a dataset diverse in patient demographics and imaging conditions. VascX models demonstrated superior segmentation performance across datasets, image quality levels, and anatomic regions when compared to existing, publicly available models, likely due to the increased size and variety of our training set. Important improvements were observed in artery-vein and disc segmentation performance, particularly in segmentations of these structures on CFIs of intermediate quality, common in large cohorts and clinical datasets. Importantly, these improvements translated into significantly more accurate vascular features when we compared features extracted from VascX segmentation masks with features extracted from segmentation masks generated by previous models. With VascX models we provide a robust, ready-to-use set of model ensembles and inference code aimed at simplifying the implementation and enhancing the quality of automated retinal vasculature analyses. The precise vessel parameters generated by the model can serve as starting points for the identification of disease patterns in and outside of the eye.

2306.08135 2026-02-10 physics.soc-ph cond-mat.stat-mech cs.MA q-bio.PE q-bio.QM

Tricritical behavior in epidemic dynamics with vaccination

Marcelo A. Pires, Cesar I. N. Sampaio Filho, Hans J. Herrmann, José S. Andrade

Comments 12 pages, 6 figures and 2 tables. Version closer to the published paper

Journal ref Chaos, Solitons & Fractals, 2023

Abstract

We scrutinize the phenomenology arising from a minimal vaccination-epidemic (MVE) dynamics using three methods: mean-field approach, Monte Carlo simulations, and finite-size scaling analysis. The mean-field formulation reveals that the MVE model exhibits either a continuous or a discontinuous active-to-absorbing phase transition, accompanied by bistability and a tricritical point. However, on square lattices, we detect no signs of bistability, and we disclose that the active-to-absorbing state transition has a scaling invariance and critical exponents compatible with the continuous transition of the directed percolation universality class. Additionally, our findings indicate that the tricritical and crossover behaviors of the MVE dynamics belong to the universality class of mean-field tricritical directed percolation.

2602.08101 2026-02-10 q-bio.PE

From Stochastic Shocks to Macroscopic Tails: The Moyal Distribution as a Unified Framework for Epidemic Dynamics

Jose de Jesus Bernal-Alvarado, David Delepine

Comments 12 pages, 6 figures

Abstract

Traditional epidemiological models often fail to characterize the extreme volatility and heavy-tailed "Dragon King" events observed in real-world outbreaks. We propose a unified framework that bridges microscopic agent-based simulations with macroscopic wave decomposition using the Moyal probability density function. By treating viral transmission as a stochastic collision process, we derive a Moyal-Poisson mixture that describes secondary case distributions. Our model successfully recovers the extreme "superspreading" events in SARS, MERS, and COVID-19 data that standard Negative Binomial models systematically miss. Furthermore, we apply spectral decomposition to pandemic waves in Germany, demonstrating that the macroscopic "Social Friction" ($β$) is a direct emergent property of microscopic "Collision Shocks". This framework provides a useful descriptive tool for public health planning, emphasizing the need to manage extreme volatility rather than deterministic averages.
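For readers unfamiliar with the Moyal probability density function, the sketch below evaluates its standard form with a location and a scale parameter. It shows only the base density. The Moyal-Poisson mixture derived in the paper is not specified in the abstract and is not reproduced here, and the parameter names `loc` and `scale` are assumptions.

```python
import math

def moyal_pdf(x, loc=0.0, scale=1.0):
    """Moyal density: exp(-(z + exp(-z)) / 2) / (scale * sqrt(2*pi)),
    with z = (x - loc) / scale."""
    z = (x - loc) / scale
    return math.exp(-0.5 * (z + math.exp(-z))) / (scale * math.sqrt(2.0 * math.pi))

# The density falls off much more slowly to the right of the mode than to the left.
for x in (-3.0, 0.0, 3.0, 6.0):
    print(x, moyal_pdf(x))
```

The right tail decays roughly like exp(-z/2), while the left side is cut off double-exponentially. This asymmetry is the property that makes the distribution a candidate for describing rare, very large events.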

2602.08061 2026-02-10 cs.AI q-bio.OT

Securing Dual-Use Pathogen Data of Concern

Doni Bloomfield, Allison Berke, Moritz S. Hanke, Aaron Maiwald, James R. M. Black, Toby Webster, Tina Hernandez-Boussard, Oliver M. Crook, Jassi Pannu

Comments 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Biosecurity Safeguards for Generative AI

Abstract

Training data is an essential input into creating competent artificial intelligence (AI) models. AI models for biology are trained on large volumes of data, including data related to biological sequences, structures, images, and functions. The type of data used to train a model is intimately tied to the capabilities it ultimately possesses--including those of biosecurity concern. For this reason, an international group of more than 100 researchers at the recent 50th anniversary Asilomar Conference endorsed data controls to prevent the use of AI for harmful applications such as bioweapons development. To help design such controls, we introduce a five-tier Biosecurity Data Level (BDL) framework for categorizing pathogen data. Each level contains specific data types, based on their expected ability to contribute to capabilities of concern when used to train AI models. For each BDL tier, we propose technical restrictions appropriate to its level of risk. Finally, we outline a novel governance framework for newly created dual-use pathogen data. In a world with widely accessible computational and coding resources, data controls may be among the most high-leverage interventions available to reduce the proliferation of concerning biological AI capabilities.

2602.08008 2026-02-10 q-bio.CB

Randomness-aware multiscale models of glioma invasion and treatment

Martina Conte, Sandesh Hiremath, Christina Surulescu

Comments 26 pages, 10 figures

Abstract

In this work, we develop a stochastic multiscale model for glioma growth and invasion in the brain, incorporating the effects of therapeutic interventions. The model accounts for tumor cell migration influenced by brain tissue heterogeneity and anti-crowding mechanisms, while explicitly addressing treatment-related uncertainties through stochastic processes. Starting from a microscopic description of individual cell dynamics, we derive the corresponding system of macroscopic random reaction-diffusion-taxis equations governing cell density and tissue evolution. Finally, we conduct several numerical experiments to assess the efficacy of different treatment protocols, evaluated with respect to both established and newly proposed clinical criteria and measurable outcomes.

2602.07735 2026-02-10 cs.LG q-bio.BM

TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations

Matteo Rossi, Ryan Pederson, Miles Wang-Henderson, Ben Kaufman, Edward C. Williams, Carl Underkoffler, Owen Lewis Howell, Adrian Layer, Stephan Thaler, Narbe Mardirossian, John Anthony Parkhill

Comments 31 pages, 14 figures

Abstract

We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction that achieves 26-fold faster inference than state-of-the-art methods while improving affinity prediction accuracy by $\sim$20\%. Current deep learning approaches to structure-based drug design rely on expensive all-atom diffusion to generate 3D coordinates, creating inference bottlenecks that render large-scale compound screening computationally intractable. We challenge this paradigm with a critical hypothesis: full all-atom resolution is unnecessary for accurate small molecule pose and binding affinity prediction. TerraBind tests this hypothesis through a coarse pocket-level representation (protein C$_β$ atoms and ligand heavy atoms only) within a multimodal architecture combining COATI-3 molecular encodings and ESM-2 protein embeddings that learns rich structural representations, which are used in a diffusion-free optimization module for pose generation and a binding affinity likelihood prediction module. On structure prediction benchmarks (FoldBench, PoseBusters, Runs N' Poses), TerraBind matches diffusion-based baselines in ligand pose accuracy. Crucially, TerraBind outperforms Boltz-2 by $\sim$20\% in Pearson correlation for binding affinity prediction on both a public benchmark (CASP16) and a diverse proprietary dataset (18 biochemical/cell assays). We show that the affinity prediction module also provides well-calibrated affinity uncertainty estimates, addressing a critical gap in reliable compound prioritization for drug discovery. Furthermore, this module enables a continual learning framework and a hedged batch selection strategy that, in simulated drug discovery cycles, achieves 6$\times$ greater affinity improvement of selected molecules over greedy-based approaches.

2602.07709 2026-02-10 q-bio.QM cs.NE

Generative structural elucidation from mass spectra as an iterative optimization problem

Mrunali Manjrekar, Runzhong Wang, Samuel Goldman, Jenna C. Fromer, Connor W. Coley

Abstract

Liquid chromatography tandem mass spectrometry (LC-MS/MS) is a critical analytical technique for molecular identification across metabolomics, environmental chemistry, and chemical forensics. A variety of computational methods have emerged for structural annotation of spectral features of interest, but many of these features cannot be confidently annotated with reference structures or spectra. Here, we introduce FOAM (Formula-constrained Optimization for Annotating Metabolites), a computational workflow that poses structure elucidation from LC-MS/MS as an iterative optimization problem. FOAM couples a formula-constrained graph genetic algorithm with spectral simulation to explore candidate annotations given an experimental spectrum. We demonstrate FOAM's performance on the NIST'20 and MassSpecGym datasets as both a standalone elucidation pipeline and as a complement to existing inverse models. This work establishes iterative optimization as an effective and extensible paradigm for structural elucidation.

2602.07648 2026-02-10 q-bio.GN

Alteration of the Brain's Microbiome and Neuroinflammation Associated with Ventricular Catheters

Zihan Zhu, Dipankar Biswas, Michael Meggyesy, Di Cao, Gwendolyn Williams, Richard Um, Farzad Maroufi, Ryan P. Lee, Jun Hua, Liangliang Zhang, Jeffrey Capadona, Horst V. Recum, Mark G. Luciano

Comments 24 pages, 10 figures

Abstract

Background and Objectives: Proximal catheter obstruction is the leading cause of ventriculoperitoneal shunt failure, yet the biological triggers of peri-catheter inflammation and tissue ingrowth remain poorly defined. Evidence of bacterial ribosomal RNA in human brain tissue suggests that low-biomass microbial exposure may influence the inflammatory microenvironment surrounding implants. This study examined if microbial signal is detectable in unaltered brain tissue and if catheter implantation produces microbial shifts relevant to shunt dysfunction. Methods: Twenty-nine female mice were assigned to unaltered control (UC), trauma control (TC), plain silicone catheter (PSC), or antibiotic-impregnated catheter (AIC) groups. Brain and cecum tissues were harvested at postoperative days 7 and 28 for 16S rRNA sequencing. Microbial composition and predicted functional pathways were analyzed. A separate cohort underwent longitudinal MRI to assess edema, glial scar formation, and macrophage-associated susceptibility signal. Results: Low-level microbial signal was detected in unaltered brain tissue. Catheter implantation induced material-dependent shifts in brain-associated microbial composition. PSC was associated with enrichment of pro-inflammatory taxa, whereas AIC favored immune-regulatory taxa. Predicted short-chain fatty acid biosynthesis was highest in AIC and lowest in PSC, while predicted lipopolysaccharide biosynthesis trended higher in PSC. MRI showed similar edema resolution but higher macrophage-associated susceptibility signal in PSC animals. Conclusion: Intracranial catheter implantation produces material-dependent shifts in low-biomass brain-associated microbial signal that parallel differential neuroimmune activation. These findings suggest catheter material may shape a biologically relevant peri-catheter niche with implications for chronic gliosis and proximal shunt obstruction.

2602.07547 2026-02-10 q-bio.NC cs.AI cs.CL cs.LG

Linguistic properties and model scale in brain encoding: from small to compressed language models

Subba Reddy Oota, Vijay Rowtula, Satya Sai Srinath Namburi, Khushbu Pahwa, Anant Khandelwal, Manish Gupta, Tanmoy Chakraborty, Bapi S. Raju

Comments 40 pages, 33 figures

Abstract

Recent work has shown that scaling large language models (LLMs) improves their alignment with human brain activity, yet it remains unclear what drives these gains and which representational properties are responsible. Although larger models often yield better task performance and brain alignment, they are increasingly difficult to analyze mechanistically. This raises a fundamental question: what is the minimal model capacity required to capture brain-relevant representations? To address this question, we systematically investigate how constraining model scale and numerical precision affects brain alignment. We compare full-precision LLMs, small language models (SLMs), and compressed variants (quantized and pruned) by predicting fMRI responses during naturalistic language comprehension. Across model families up to 14B parameters, we find that 3B SLMs achieve brain predictivity indistinguishable from larger LLMs, whereas 1B models degrade substantially, particularly in semantic language regions. Brain alignment is remarkably robust to compression: most quantization and pruning methods preserve neural predictivity, with GPTQ as a consistent exception. Linguistic probing reveals a dissociation between task performance and brain predictivity: compression degrades discourse, syntax, and morphology, yet brain predictivity remains largely unchanged. Overall, brain alignment saturates at modest model scales and is resilient to compression, challenging common assumptions about neural scaling and motivating compact models for brain-aligned language modeling.

2602.07539 2026-02-10 q-bio.NC cs.CL

Training-Driven Representational Geometry Modularization Predicts Brain Alignment in Language Models

Yixuan Liu, Zhiyuan Ma, Likai Tang, Runmin Gan, Xinche Zhang, Jinhao Li, Chao Xie, Sen Song

Abstract

How large language models (LLMs) align with the neural representation and computation of human language is a central question in cognitive science. Using representational geometry as a mechanistic lens, we addressed this by tracking entropy, curvature, and fMRI encoding scores throughout Pythia (70M-1B) training. We identified a geometric modularization where layers self-organize into stable low- and high-complexity clusters. The low-complexity module, characterized by reduced entropy and curvature, consistently better predicted human language network activity. This alignment followed heterogeneous spatial-temporal trajectories: rapid and stable in temporal regions (AntTemp, PostTemp), but delayed and dynamic in frontal areas (IFG, IFGorb). Crucially, reduced curvature remained a robust predictor of model-brain alignment even after controlling for training progress, an effect that strengthened with model scale. These results link training-driven geometric reorganization to temporal-frontal functional specialization, suggesting that representational smoothing facilitates neural-like linguistic processing.

2602.07475 2026-02-10 cs.LG q-bio.GN

Bipartite Graph Attention-based Clustering for Large-scale scRNA-seq Data

Zhuomin Liang, Liang Bai, Xian Yang

Abstract

scRNA-seq clustering is a critical task for analyzing single-cell RNA sequencing (scRNA-seq) data, as it groups cells with similar gene expression profiles. Transformers, as powerful foundational models, have been applied to scRNA-seq clustering. Their self-attention mechanism automatically assigns higher attention weights to cells within the same cluster, enhancing the distinction between clusters. Existing methods for scRNA-seq clustering, such as graph transformer-based models, treat each cell as a token in a sequence. Their computational and space complexities are $\mathcal{O}(n^2)$ with respect to the number of cells, limiting their applicability to large-scale scRNA-seq datasets. To address this challenge, we propose a Bipartite Graph Transformer-based clustering model (BGFormer) for scRNA-seq data. We introduce a set of learnable anchor tokens as shared reference points to represent the entire dataset. A bipartite graph attention mechanism is introduced to learn the similarity between cells and anchor tokens, bringing cells of the same class closer together in the embedding space. BGFormer achieves linear computational complexity with respect to the number of cells, making it scalable to large datasets. Experimental results on multiple large-scale scRNA-seq datasets demonstrate the effectiveness and scalability of BGFormer.
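
The complexity argument in this abstract can be illustrated with a toy sketch of cell-to-anchor attention: each of the n cells scores itself against m anchors only, so the work is O(n·m) rather than O(n²). This is not BGFormer itself. The function `anchor_attention`, the fixed (not learned) anchors, the scaled dot-product scoring, and the absence of projection matrices are all simplifying assumptions made here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def anchor_attention(cells, anchors):
    """Attend from every cell to every anchor: n * m score evaluations.

    Returns (outputs, weights); weights[i][j] is the attention that
    cell i places on anchor j, and outputs[i] is the weighted anchor mix.
    """
    dim = len(anchors[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for cell in cells:
        w = softmax([dot(cell, anchor) / scale for anchor in anchors])
        weights.append(w)
        outputs.append([sum(wj * a[d] for wj, a in zip(w, anchors)) for d in range(dim)])
    return outputs, weights

# Hypothetical toy data: three "cells" in a 2-D embedding and two anchors.
cells = [[5.0, 0.0], [0.0, 5.0], [4.5, 0.2]]
anchors = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = anchor_attention(cells, anchors)
```

Cells with similar embeddings place most of their weight on the same anchor, which pulls their outputs together. With m fixed, doubling the number of cells doubles the cost.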

2602.07426 2026-02-10 math.CO q-bio.PE

Maximally probable tree topologies with $r$-furcation

Emily H. Dickey, Noah A. Rosenberg

Abstract

For a specific rooted labeled tree topology, a labeled history is a sequence of branchings that give rise to that labeled topology as it unfolds over time. Here, for $r$-furcating trees, we use a connection with Huffman trees from information theory to identify maximally probable rooted trees -- unlabeled $r$-furcating topologies whose labelings each have a number of labeled histories greater than or equal to those of all other labeled topologies. Our characterization of the unique maximally probable $r$-furcating unlabeled topology generalizes the Harding--Hammersley--Grimmett result identifying the maximally probable bifurcating unlabeled topology, and it provides a new proof for that result. We present a conjecture for the maximally probable $r$-furcating unlabeled topology if labeled histories are tabulated allowing for simultaneous branching events across multiple internal nodes of a tree.
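
As a small illustration of the quantity this abstract counts, the sketch below computes the number of labeled histories of a rooted labeled topology when branching events occur one at a time. It uses the standard hook-length formula for orderings of internal nodes in which every node precedes its descendants: m! divided by the product, over internal nodes, of the number of internal nodes in each node's subtree. The nested-tuple encoding and the function names are assumptions made here, and the simultaneous-branching setting of the conjecture is not covered.

```python
from math import factorial

def internal_count(tree, sizes):
    """Return the number of internal nodes in `tree`.

    Leaves are any non-tuple value; internal nodes are tuples of children.
    The internal-node count of every subtree is appended to `sizes`.
    """
    if not isinstance(tree, tuple):
        return 0
    total = 1 + sum(internal_count(child, sizes) for child in tree)
    sizes.append(total)
    return total

def labeled_histories(tree):
    """Count orderings of branching events consistent with the topology."""
    sizes = []
    m = internal_count(tree, sizes)
    denom = 1
    for s in sizes:
        denom *= s
    return factorial(m) // denom

# Hypothetical example trees with four leaves (bifurcating) and seven leaves (trifurcating).
caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
trifurcating = (("a", "b", "c"), ("d", "e", "f"), "g")
```

Here `labeled_histories(caterpillar)` is 1 and `labeled_histories(balanced)` is 2: the two cherries of the balanced tree can form in either order, which is why more balanced shapes tend to admit more histories.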

2602.07261 2026-02-10 q-bio.NC cs.AI cs.ET

Cognitive algorithms and systems of episodic memory, semantic memory and their learnings

Qi Zhang

Comments 33 pages, 6 figures, 6 tables

Journal ref Book chapter in Perception-action cycle: Models, Architectures, and Hardware. Springer, 2011

Abstract

Declarative memory, the memory that can be "declared" in words or languages, is made up of two dissociated parts: episodic memory and semantic memory. This dissociation has its neuroanatomical basis: episodic memory is mostly associated with the hippocampus and semantic memory with the neocortex. The two memories, on the other hand, are closely related. Lesions in the hippocampus often result in various impairments of explicit memory, e.g., anterograde, retrograde and developmental amnesias, and semantic learning deficits. These impairments provide opportunities for us to understand how the two memories may be acquired, stored and organized. This chapter reviews several cognitive systems designed to mimic explicit memory, and other systems that are neuroanatomically based and are implemented to simulate the memory impairments mentioned above. The review covers the structures of the computational systems, their learning rules, and their simulations of memory acquisition and impairments.