arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.06598 2026-05-08 q-bio.QM q-bio.CB

Mathematical Modeling of Early Embryonic Cell Cycles of Drosophila melanogaster

Meskerem Abebaw Mebratie, Benedikt Drebes, Katja Kapp, Arno Müller, Werner M. Seiler

Comments 20 pages, 7 figures

Abstract

In the early stages of development, Drosophila melanogaster embryos possess very fast and well-coordinated cell cycles. In the cell cycle, CDK activity is essentially regulated by binding CDK and CycB to form an active complex and by phosphorylating CDK via CDC25 and dephosphorylating it via Wee1. We develop a mathematical model for the embryonic cell cycle which is biochemically sound and which can be rigorously analysed after a model reduction. We show that there exists a region in the parameter space where the model describes oscillations. We then focus on the role of two parameters: the CycB synthesis and the activation coefficient of APC. Our main biological hypothesis is that the first one is responsible for the period lengthening over the first 14 cycles which can be experimentally observed and this hypothesis is supported by numerical simulations of our model: if the CycB synthesis is made time-dependent with a prescribed dynamics, then our simulations show qualitatively a very similar behavior to experimental data reported in the literature.

2605.06562 2026-05-08 cs.LG q-bio.GN

Feature Dimensionality Outweighs Model Complexity in Breast Cancer Subtype Classification Using TCGA-BRCA Gene Expression Data

Meena Al Hasani

Comments 8 pages, 4 figures, 3 tables. Independent research study using TCGA-BRCA RNA-seq data

Abstract

Accurate classification of breast cancer subtypes from gene expression data is critical for diagnosis and treatment selection. However, such datasets are characterized by high dimensionality and limited sample size, posing challenges for machine learning models. In this study, we evaluate the impact of model complexity and feature selection on subtype classification performance using TCGA-BRCA gene expression data. Logistic regression, random forest, and support vector machine (SVM) models were trained using varying numbers of highly variable genes (50 to 20,518). Performance was evaluated using stratified 5-fold cross-validation and assessed with accuracy and macro F1 score. While all models achieved high accuracy, macro F1 analysis revealed substantial differences in subtype-level performance. Logistic regression demonstrated the most stable and balanced performance across subtypes, including improved detection of rare classes. Random forest underperformed on minority subtypes despite strong overall accuracy, while SVM showed sensitivity to feature dimensionality. These findings highlight the importance of model simplicity, evaluation metrics, and feature selection in high-dimensional biological classification tasks.
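The gap between accuracy and macro F1 described in this abstract can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not TCGA-BRCA results, and the helper names `accuracy` and `macro_f1` are introduced only for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 18 samples of a common subtype, 2 of a rare one.
y_true = ["common"] * 18 + ["rare"] * 2
# A classifier that never predicts the rare subtype.
y_pred = ["common"] * 20

print(accuracy(y_true, y_pred))   # high, because the common class dominates
print(macro_f1(y_true, y_pred))   # much lower, because the rare class scores 0
```

In this toy case the classifier misses every rare sample yet keeps 90% accuracy, while macro F1 drops below 0.5. That is the kind of subtype-level difference the abstract says accuracy alone can hide.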

2605.06456 2026-05-08 cond-mat.stat-mech q-bio.MN q-bio.SC

Activation in Vesicle-Mediated Signaling Shaped by Batch Arrival Statistics

Jan Hauke, Julian B. Voits, Ulrich S. Schwarz

Comments 15 pages, 7 figures, supplement with 16 pages

Abstract

Vesicle-mediated secretion of ions or molecules is a central mechanism of cellular communication, for example in processes such as neurotransmission or hormone release. These events are inherently stochastic: vesicle fusions lead to bursts of variable sizes, releasing discrete packets of transmitters that are subsequently cleared or degraded. The dynamics break time-reversal symmetry due to the interplay of spontaneous bursts and continuous degradation. Using generating functions and a recursion relation, we derive an exact solution for the full time-dependent probability distribution of a general batch arrival-degradation model. This framework also enables a full analysis of first-passage times to a concentration threshold representing downstream activation. We show that activation kinetics are not determined by mean dynamics alone, but depend sensitively on the temporal statistics of arrival events, batch-size variability, and degradation. In particular, different arrival processes with identical mean rates can lead to qualitatively distinct first-passage behavior, reflecting the role of time-asymmetric fluctuations. We also discuss extensions incorporating vesicle depletion. Our results provide a transparent link between stochastic release dynamics and activation timing in vesicle-mediated signaling.

2605.06420 2026-05-08 q-bio.NC

Beyond Object-Level Alignment: Do Brains and DNNs Preserve the Same Transformations?

Yukiyasu Kamitani

Abstract

Brain-DNN alignment is usually assessed through stimulus-level correspondence or stimulus-set geometry. Inspired by category theory, we operationalize a different question: do brain and model preserve the same candidate transformations among stimuli? We formalize this as approximate naturality: if a proxy-defined stimulus change is propagated through the brain side and then translated to the model side, the result should match translating first and then propagating, so that the naturality square approximately commutes. We quantify deviations from commutativity by a Naturality Violation Score (NVS) normalized to a permutation null, shifting alignment from per-stimulus sameness to preservation of structure under an explicitly chosen comparison map. As a proof of concept, a controlled five-factor synthetic setting shows that NVS separates complementary alignment failures that aggregate object- and geometry-level scalars cannot resolve. Applied to fMRI responses from the GOD dataset (5 subjects), 3 vision DNNs, and 3 World-Model proxy embeddings, the axis-resolved analysis reveals a hierarchy crossover: semantic axes align most strongly toward HVC and deeper DNN layers (NVS^animacy = 0.39 vs 0.52 for the next-best axis and 1.0 for the permutation-null baseline), whereas low- and mid-level visual axes align toward earlier visual cortex and shallower layers. Supporting analyses (a 15-axis appendix atlas, dissociation tests against RSA/CKA and encoding/decoding accuracy, and a W-less anchor-ablation control) confirm that the alignment is selective over candidate morphism families rather than uniform. NVS thereby turns brain-DNN comparison into a test of jointly preserved candidate transformations, relative to an explicit proxy space and permutation null, and opens a path to richer proxy spaces and controlled world-side transformations.

2605.06304 2026-05-08 q-bio.NC

A multi-scale information geometry reveals the structure of mutual information in neural populations

Simone Azeglio, Steeve Laquitaine, Ulisse Ferrari, Matthew Chalk

Abstract

Understanding how neural population responses represent sensory information is a central problem in systems neuroscience. One approach is to define a representational geometry on stimulus space in which distances reflect how reliably stimuli can be distinguished from neural activity. However, different constructions of these distances can lead to qualitatively different conclusions about the neural code. Here, we show that a unique Riemannian representational geometry emerges from first principles governing how distances contract as stimulus resolution is lost through coarse-graining. This results in a multi-scale extension of the Fisher information metric, capturing encoding structure from fine stimulus details to coarse global distinctions. The resulting geometry is exactly related to the mutual information encoded by the population: well encoded stimulus directions - those contributing more to mutual information - are expanded, whereas poorly encoded directions are contracted. The metric tensor can be estimated using diffusion models, making the framework practical for large neural populations and high-dimensional stimuli. Applied to visual cortical responses to natural images, the eigenvectors of the metric tensor identify stimulus variations that contribute most to information transmission, yielding interpretable features that are robust to modelling choices. Together, these results provide a principled, information-theoretic framework for characterising neural population codes.

2605.06301 2026-05-08 q-bio.PE cond-mat.stat-mech physics.soc-ph

Higher-order interactions in ecology can be hidden in plain sight

Violeta Calleja-Solanas, Santiago Lamata-Otín, Carlos Gómez-Ambrosi, Jesús Gómez-Gardeñes, Sandro Meloni

Comments 8 pages, 4 figures

Abstract

Higher-order interactions are increasingly recognized as a key component of ecological dynamics. However, we show that higher-order Lotka-Volterra dynamics can, in some scenarios, be accurately reproduced by effective pairwise models fitted to the same abundance time series. Consequently, higher-order interactions cannot, in general, be inferred from time-series data alone. We further identify a fundamental problem of mechanistic identifiability, whereby distinct interaction mechanisms generate nearly indistinguishable dynamics, potentially leading to accurate yet misleading ecological interpretations. Our results highlight the need to complement time-series data with additional ecological information to infer interaction structure reliably.

2605.06243 2026-05-08 math.CO q-bio.PE

A $μ$-distance for semidirected orchard phylogenetic networks

Gerard Ribas, Joan Carles Pons, Cécile Ané

Abstract

In evolutionary biology, phylogenetic networks are now widely used to represent the historical relationships between species and populations, when this history includes reticulation events such as hybridization, gene flow and admixture between populations. Semidirected phylogenetic networks are appropriate models when the direction of some edges and the root position are not identifiable from data. Comparing semidirected networks is important in many applications. For rooted and directed networks, a $μ$-representation was originally introduced to distinguish tree-child networks, and has since been extended in two different directions: to the larger class of orchard directed networks by adding an extra component that counts paths to reticulations; and to semidirected networks, through an edge-based variant. However, the latter does not provide a distance between semidirected and orchard networks. We introduce here a new edge-based $μ$-representation capable of distinguishing distinct orchard binary semidirected networks. For this class, we provide a reconstruction algorithm and therefore obtain a true distance that is computable in polynomial time.

2605.05985 2026-05-08 cs.AI cs.MA q-bio.QM

BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine

Remigiusz Kinas, Joanna Krawczyk, Rafał Powalski, Przemysław Pietrzak, Agnieszka Kowalewska, Krzysztof Kolmus, Maciej Sypetkowski, Łukasz Smoliński, Tomasz Jetka

Comments 5 pages (main text), 21 pages (appendix), 8 figures, 11 tables

Abstract

Translational medicine turns underspecified development goals into evidence synthesis that must combine literature, trials, patents, and quantitative multi-omics analysis while preserving identifiers, uncertainty, and retrievable provenance. General-purpose foundation models and off-the-shelf tool-augmented or multi-agent systems are not built for this: they tend to produce single-shot answers or run open-endedly, and fall short on the auditable, scenario-specific workflows that heterogeneous biomedical sources demand. This paper introduces Ingenix BioResearcher, a scenario-guided multi-agent system that maps queries to versioned research playbooks, delegates to specialized subagents over 30+ tools and machine-learning endpoints, mixes structured database access with sandboxed code for genome-scale analyses, and applies claim-level multi-model reconciliation before editorial assembly. We evaluate BioResearcher across unit-level capabilities, open-ended biomedical reasoning, and end-to-end clinical discovery. It leads evaluated baselines on 109 single-step tests (83.49% pass rate; 0.892 average score), achieves strong biomedical benchmark performance (89.33% on BixBench-Verified-50 and the top 0.758 mean score on BaisBench Scientific Discovery), and leads on a 30-query clinical end-to-end benchmark with the highest positive hit rate (74.7% $\pm$ 3.3%) and negative clear rate (96.8% $\pm$ 0.2%). These results show broad, competitive performance across unit-level, open-ended, and end-to-end clinical evaluations.

2605.05966 2026-05-08 q-bio.PE

Towards a unified framework for multiple stable states in ecological systems

Jennifer Paige, Denis D. Patterson, Alan Hastings

Comments 30 pages, 4 figures, 2 tables

Abstract

Multiple stable states - the coexistence of two or more distinct ecological configurations under identical environmental conditions - have attracted sustained interest in ecology, yet the field still lacks a unified framework connecting ecological mechanisms to dynamical models. Here, we review empirical and theoretical approaches to multiple stable states, synthesising perspectives on stability, tipping, hysteresis, and transient dynamics, and contextualise these within a common mathematical framework. Drawing on examples of well-known ecosystem models, we highlight the central and necessary role of positive feedback loops and identify other common, unifying features of ecological systems that exhibit multiple stable states. We further discuss the relationship between stable and transient dynamics, the roles of spatial and temporal scales in feedback identification, and the implications for ecological restoration and management. We conclude with open questions and challenges for the field, including extending multistability theory to persistent-transient frameworks and harnessing emerging data-collection technologies to sharpen empirical inference.

2604.06269 2026-05-08 q-bio.QM cs.AI

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

Yehui Yang, Zelin Zang, Xienan Zheng, Yuzhe Jia, Changxi Chi, Jingbo Zhou, Chang Yu, Jinlin Wu, Fuji Yang, Jiebo Luo, Zhen Lei, Stan Z. Li

Abstract

Automated single-cell annotation is difficult when the most abundant genes are not the most discriminative ones, or when a target state is poorly covered by a fixed reference atlas. GPTCelltype-style one-shot prompting allows large language models (LLMs) to produce plausible labels from generic expression signals, while reference-based annotators can force unfamiliar states into the nearest known category. We propose MAT-Cell, a prompt-driven framework for batch-level single-cell annotation that separates evidence grounding from label decision. MAT-Cell first uses Reverse Verification Query (RVQ) to combine tissue context, observed differentially expressed genes, and LLM-elicited biological priors into structured candidate-specific premises. Verifier agents then convert these premises into explicit premise-to-claim reasoning trees, and bounded multi-round debate compares, challenges, and revises the resulting claims before consensus or final adjudication. The returned Syllogistic Derivation Tree (SDT) provides an auditable debate trace rather than a formal proof of the annotation. In open-candidate benchmarks across five datasets, a locally deployed Qwen3-30B model with MAT-Cell achieves 75.5% average accuracy, compared with 64.2% for the strongest evaluated CoT baseline and 51.9% for the strongest evaluated scPilot variant. In oracle-candidate benchmarks across three species, MAT-Cell remains competitive across backbones, and local inference substantially reduces monetary cost for batch annotation. Code is available at: https://anonymous.4open.science/r/MATCell-4067

2603.12278 2026-05-08 q-bio.OT cs.AI cs.LG

Unsupervised Anomaly Detection in Wearable Foot Sensor Data: A Baseline Feasibility Study Towards Diabetic Foot Ulcer Prevention

Md Tanvir Hasan Turja

Comments 36 pages, 19 figures. Published in Biomedical Signal Processing and Control, Vol. 123, Part A, 110416, September 2026. https://doi.org/10.1016/j.bspc.2026.110416

Journal ref
Biomedical Signal Processing and Control, Vol. 123, Part A, 110416 (2026)
Abstract

Diabetic foot ulcers (DFUs) are a severe complication of diabetes associated with significant morbidity, amputation risk, and healthcare burden. Developing effective continuous monitoring frameworks requires first establishing reliable baseline models of normal foot biomechanics. This paper presents a feasibility study of an anomaly detection framework applied to time-series data from wearable foot sensors, specifically NTC thin-film thermocouples for temperature and FlexiForce A401 pressure sensors for plantar load monitoring. Data were collected from healthy adult subjects across 312 capture sessions on an instrumented pathway, generating 93,790 valid multi-sensor readings spanning September 2023 to June 2024. Two unsupervised algorithms, Isolation Forest and K-Nearest Neighbors using Local Outlier Factor (KNN/LOF), were applied to detect statistical deviations in foot temperature and pressure signals. Results show that Isolation Forest is more sensitive to subtle, distributed anomalies, while KNN/LOF identifies concentrated extreme deviations but flags a higher proportion of sessions not corroborated by Isolation Forest. Since no clinical ground truth is available, this difference is interpreted as lower specificity under the shared 5 percent contamination assumption rather than a confirmed false-positive rate. A mild positive correlation (0.41-0.48) between pressure and temperature features supports the case for combined multi-modal monitoring. These findings establish a validated baseline analytical pipeline and provide a methodological foundation for future clinical validation studies involving diabetic patients, where the relationship between detected anomalies and DFU-related pathophysiology can be directly assessed.

2603.10302 2026-05-08 cs.LG q-bio.QM

How to make the most of your masked language model for protein engineering

Calvin McCarter, Nick Bhattacharya, Sebastian W. Ober, Hunter Elliott

Comments Accepted into the GEM Workshop, ICLR 2026

Abstract

A plethora of protein language models have been released in recent years. Yet comparatively little work has addressed how to best sample from them to optimize desired biological properties. We fill this gap by proposing a flexible, effective sampling method for masked language models (MLMs), and by systematically evaluating models and methods both in silico and in vitro on actual antibody therapeutics campaigns. Firstly, we propose sampling with stochastic beam search, exploiting the fact that MLMs are remarkably efficient at evaluating the pseudo-perplexity of the entire 1-edit neighborhood of a sequence. Reframing generation in terms of entire-sequence evaluation enables flexible guidance with multiple optimization objectives. Secondly, we report results from our extensive in vitro head-to-head evaluation for the antibody engineering setting. This reveals that choice of sampling method is at least as impactful as the model used, motivating future research into this under-explored area.
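The "1-edit neighborhood" mentioned in this abstract can be enumerated with a short sketch. It assumes that one edit means a single amino-acid substitution over the 20 standard residues, which the abstract does not spell out. The function name `one_edit_neighborhood` and the example sequence are hypothetical, and the scoring model is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def one_edit_neighborhood(seq, alphabet=AMINO_ACIDS):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, original in enumerate(seq):
        for residue in alphabet:
            if residue != original:
                yield seq[:i] + residue + seq[i + 1:]


# Hypothetical 4-residue sequence; real antibody sequences are far longer.
neighbors = list(one_edit_neighborhood("ACDK"))
print(len(neighbors))  # length of sequence times 19 substitutions per position
```

Under this assumption the neighborhood grows linearly with sequence length (19 candidates per position), so a beam search could, in principle, score the whole neighborhood at each step.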

2602.06640 2026-05-08 q-bio.PE cond-mat.stat-mech

Habitat heterogeneity and dispersal network structure as drivers of metacommunity dynamics

Davide Bernardi, Alice Doimo, Giorgio Nicoletti, Prajwal Padmanabha, Andrea Rinaldo, Samir Suweis, Sandro Azaele, Amos Maritan

Comments 35 pages, 6 figures

Abstract

Spatial structure and species interactions jointly shape the dynamics and biodiversity of ecological systems, yet most theoretical models either neglect spatial heterogeneity or sacrifice analytical tractability. Here, we provide a unified microscopic, mechanistic framework for deriving effective metapopulation and metacommunity models from individual-based ecological dynamics on arbitrary dispersal networks. The resulting coarse-grained description features an effective dispersal kernel that encodes both microscopic dynamical parameters and network topology. Based on this framework, we demonstrate exact analytical results for species persistence in both homogeneous and heterogeneous landscapes, including a generalization of the classical concept of metapopulation capacity to non-uniform local extinction rates. Incorporating stochasticity arising from finite carrying capacities, we obtain a reduced one-dimensional description that reveals universal finite-size scaling laws for extinction times and fluctuations. Extending the approach to multiple competing species, we prove that in homogeneous environments monodominance can be avoided only in a fine-tuned, marginally stable coexistence state, and that the classic metapopulation capacity gives only a necessary but not sufficient condition for persistence. We demonstrate that heterogeneous habitats can support stable coexistence, but only above a critical level of heterogeneity. Finally, we outline how additional ecological processes can be systematically incorporated within the same formalism. Together, these results provide analytical benchmarks and a general route for constructing spatially explicit ecological theories based on an interpretable underlying mechanistic foundation.

2602.01839 2026-05-08 cs.LG cs.AI q-bio.GN

DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis

Ru Zhang, Xunkai Li, Yaxin Deng, Sicheng Liu, Daohan Su, Qiangqiang Dai, Hongchao Qin, Rong-Hua Li, Guoren Wang, Jia Li

Comments 34 pages, 4 figures

Abstract

Recently, data-centric AI methodology has been a dominant paradigm in single-cell transcriptomics analysis, which treats data representation rather than model complexity as the fundamental bottleneck. In the review of current studies, earlier sequence methods treat cells as independent entities and adapt prevalent ML models to analyze their directly inherited sequence data. Despite their simplicity and intuition, these methods overlook the latent intercellular relationships driven by the functional mechanisms of biological systems and the inherent quality issues of the raw sequencing data. Therefore, a series of structured methods has emerged. Although they employ various heuristic rules to capture intricate intercellular relationships and enhance the raw sequencing data, these methods often neglect biological prior knowledge. This omission incurs substantial overhead and yields suboptimal graph representations, hindering the utility of ML models. To address these issues, we propose DOGMA, a data-centric framework designed for the structural reshaping and semantic enhancement of raw data through multi-level biological prior knowledge. Transcending reliance on purely data-driven heuristics, DOGMA provides a prior-guided graph construction pipeline that integrates statistical alignment with Cell Ontology and phylogenetic structure for biologically grounded cell-graph construction and robust cross-species alignment. Furthermore, Gene Ontology is utilized to bridge the feature-level semantic gap by incorporating functional priors. In complex multi-species and multi-organ benchmarks, DOGMA exhibits strong robustness in strict zero-shot cell-type evaluation and sample efficiency while using substantially lower GPU memory and inference time in downstream evaluation.

2512.00254 2026-05-08 q-bio.PE

Self-organized vegetation patterns promote persistence of plant-pollinator mutualisms under environmental stress

Matheus Bongestab, David Pinto-Ramos, Ricardo Martinez-Garcia

Comments 27 pages, 4 figures

Abstract

Mutualisms are key for structuring ecological communities, but they are sensitive to environmental change and fluctuations in population size. Consequently, how mutualisms achieve stability remains an open question in ecological theory. Motivated by previous results in competitive and predator-prey interactions, we hypothesize that self-organized pattern formation can act as a key stabilizing mechanism of mutualistic interactions. We test this hypothesis using a two-species reaction-diffusion model of a plant-pollinator system that incorporates non-local plant competition and local mutualistic interactions. We first perform a linear stability analysis to determine the conditions under which non-local competition can trigger vegetation pattern formation. We then compute the bifurcation diagrams for both spatial and homogeneous solutions and find that pattern formation enables coexistence at mutualistic strengths below the threshold required in well-mixed populations. This stability gain increases as environmental conditions worsen, because local maxima in vegetation density create the conditions for community persistence despite globally harsh conditions. Moreover, in the strong mutualism limit, the spatial system exhibits multistability between patterned and homogeneous solutions, creating alternative stable configurations that can buffer against fluctuations in population abundance. Spatial self-organization thus stabilizes mutualistic communities through spatial patterns, potentially driving plant-pollinator persistence in stressed environments, including arid ecosystems.

2511.16802 2026-05-08 q-bio.PE math.DS

A model for mosquito-borne epidemic outbreaks with information-dependent protective behaviour

Simone De Reggi, Andrea Pugliese, Mattia Sensi, Cinzia Soresina

Comments 54 pages, 15 figures

Abstract

We investigate a model for a mosquito-borne epidemic in which human hosts may adopt protective behaviour against vector bites in response to information on both past and current disease prevalence. Assuming that mosquitoes can also feed on non-competent hosts (i.e., hosts that do not contribute to disease transmission), we first revisit existing results and show that behaviour-driven protection may either decrease or increase the basic reproduction number, depending on the interaction between behavioural response, host composition, and transmission parameters. Assuming that opinion dynamics evolves on a much faster time scale than disease transmission, we then apply Geometric Singular Perturbation Theory to effectively reduce the original two-group model to a model for a homogeneous host population. The reduced system enables a detailed investigation of the impact of information-induced behavioural changes on the transient dynamics of the epidemic, including scenarios in which protective measures lead to outbreaks with low attack rates. Our analysis shows that behavioural responses may either facilitate epidemic control or prolong disease persistence, potentially generating recurrent damped epidemic waves. Numerical simulations are provided to illustrate and support the analytical findings.

2409.15641 2026-05-08 math.OC q-bio.PE

A minimal compact description of the diversity index polytope

Martin Frohn, Kerry Manson

Comments 31 pages, 5 Figures

Abstract

A phylogenetic tree is an edge-weighted binary tree, with leaves labelled by a collection of species, that represents the evolutionary relationships between those species. For such a tree, a phylogenetic diversity index is a function that apportions the biodiversity of the collection across its constituent species. The diversity index polytope is the convex hull of the images of phylogenetic diversity indices. We study the combinatorics of phylogenetic diversity indices to provide a minimal compact description of the diversity index polytope. Furthermore, we discuss extensions of the polytope to expand the study of biodiversity measurement.
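As an illustration of an index that apportions a tree's diversity across its leaves, the sketch below computes Fair Proportion, a commonly used index of this kind. The abstract names no specific index, so this choice is an assumption made for illustration. The tree encoding (`parent` and `length` dictionaries) and the example tree are hypothetical.

```python
def fair_proportion(parent, length):
    """Fair Proportion index for a rooted tree.

    parent: maps each non-root node to its parent node.
    length: maps each non-root node to the length of the edge above it.
    Returns a dict mapping each leaf to its index value.
    """
    internal = set(parent.values())
    leaves = [node for node in parent if node not in internal]

    # Count the leaves below each edge (edges are keyed by their child node).
    below = {node: 0 for node in parent}
    for leaf in leaves:
        node = leaf
        while node in parent:
            below[node] += 1
            node = parent[node]

    # Each edge length is split equally among the leaves below it.
    index = {}
    for leaf in leaves:
        node, total = leaf, 0.0
        while node in parent:
            total += length[node] / below[node]
            node = parent[node]
        index[leaf] = total
    return index


# Hypothetical tree in Newick notation: ((a:1,b:1)u:2,c:3)root
parent = {"a": "u", "b": "u", "u": "root", "c": "root"}
length = {"a": 1.0, "b": 1.0, "u": 2.0, "c": 3.0}
fp = fair_proportion(parent, length)
print(fp)
```

The values sum to the total edge length of the tree, which is the sense in which such an index apportions the collection's diversity among its species.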

2605.05907 2026-05-08 q-bio.NC

Decoding Alignment without Encoding Alignment: A critique of similarity analysis in neuroscience

Johannes Bertram, Luciano Dyballa, T. Anderson Keller, Savik Kinger, Steven W. Zucker

Comments 40 pages, 27 figures

Abstract

Decoding approaches are widely used in neuroscience and machine learning to compare stimulus representations across neural systems, such as different brain regions, organisms, and deep learning models. Popular methods include decoding (perceptual) manifolds and alignment metrics such as Representational Similarity Analysis (RSA) and Dynamic Similarity Analysis (DSA), where similarity in decoding representations is interpreted as evidence for similar computation. This paper demonstrates a fundamental weakness behind this approach: it is misleading to assume that representational geometry is representative of a neuronal population as a whole, when such representations may actually be shaped by a very small subset of neurons. We show that the complementary encoding paradigm addresses this issue directly: it characterizes how neurons are organized globally in terms of their responses to a set of data, providing insight into how the decoding representation is implemented by neurons within a population. We demonstrate across experiments in biological systems and deep learning models that (i) surprisingly, similar decoding behavior and high representational alignment can arise from small, non-representative subpopulations of neurons; and critically, (ii) alignment metrics are insensitive to encoding manifold topology (how function is distributed across neurons), despite this being a key signature of differentiation across biological systems. A controlled MNIST experiment provides causal evidence: decoding metrics remain unchanged even when encoding topology is causally manipulated via the training loss. Overall, similarity in decoding behavior, as measured by classic alignment metrics, does not imply similarity in function or computation, motivating the use of encoding manifolds as a complementary tool for comparing neural systems.

2605.05829 2026-05-08 q-bio.BM

MP2D: Constrained Monte Carlo Tree-Guided Diffusion for Multi-Objective Protein Sequence Design

Zitai Kong, Yifan Dong, Yixuan Wu, Zhaokang Liang, Jian Wu, Hongxia Xu

Comments 16 pages, 4 figures, 7 tables, accepted by the 35th International Joint Conference on Artificial Intelligence

Abstract

Designing functional protein sequences that satisfy multiple desired properties is a core research focus of protein engineering. Prior methods struggle with inability or inefficiency when dealing with numerous, often conflicting, properties. We propose Multi-Property Protein Diffusion (MP2D), a unified framework for multi-objective protein sequence optimization that integrates conditional discrete diffusion with constrained MCTS and global iterative refinement. MP2D formulates diffusion denoising as a constrained sequential decision-making process and employs MCTS to explore diverse denoising trajectories guided by Pareto-based rewards. A global iterative refinement strategy further enables repeated remasking and re-optimization of candidate sequences, while a dynamic Pareto constraint prevents candidate bloat and maintains balanced trade-offs across objectives. We evaluate MP2D on two challenging multi-objective protein design tasks: antimicrobial peptide and protein binder optimization, involving four to five conflicting properties. Experimental results demonstrate that MP2D consistently outperforms existing multi-objective baselines, achieving robust and balanced improvements across all objectives without retraining generative models. These results highlight MP2D as a practical and scalable solution for multi-objective functional protein design.
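The Pareto-based comparison of candidates mentioned in this abstract rests on the standard notion of Pareto dominance, sketched below. This is a generic illustration, not MP2D's reward or constraint. It assumes every objective is to be maximized, and the scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical two-objective scores for five candidate sequences.
scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.9, 0.1)]
front = pareto_front(scores)
print(front)
```

Here `(0.4, 0.4)` and `(0.9, 0.1)` are dropped because another candidate beats them on at least one objective without losing on the other. Pruning to the non-dominated set is one simple way to keep a candidate pool from growing without bound.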

2605.05706 2026-05-08 cs.AI q-bio.QM

Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine

Peisong Zhang, Manqiang Peng, Yuxuan Wu, Pawit Phadungsaksawasdi, Wesley Yeung, Ye Zhang, Trang Nguyen, Qiang Zhang, Nan Liu, Meng Wang, Kee Yuan Ngiam, Yih-Chung Tham, Ching-Yu Cheng, Tianfan Fu, Qingyu Chen, Rosemary Ke, Chang Li, Wenzhuo Yang, Zhenghao Lu, Chunyou Lai, Yu Zhang, Sheng Zhong, Hao Deng, Dianbo Liu

Abstract

Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global adversarial balancing with subset-level matching. We instantiate this approach in a framework for counterfactual outcome prediction with attribution-grounded interpretability. Across two large-scale ICU cohorts (n = 27,783), our framework improves accuracy under distribution shift, reducing error by up to 11.5% and substantially increasing recall in high-risk tasks. Mechanistic analyses show that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.

2605.05385 2026-05-08 q-bio.PE math.DS

Chapter 2: Geometry of the Fitness Surface and Trajectory Dynamics of Replicator Systems

A. S. Bratus, S. Drozhzhin, T. Yakushkina

Details
English abstract

We study the geometry of the mean fitness surface of replicator systems and its relationship to evolutionary trajectory dynamics. Using the symmetric–antisymmetric decomposition of the fitness landscape matrix, we derive an explicit formula for the rate of change of mean fitness and establish necessary conditions for its monotonicity along trajectories. In general, replicator trajectories do not reach the maximum of the fitness surface, even in the presence of a unique asymptotically stable equilibrium. We characterise, in terms of the symmetric and antisymmetric parts of the fitness matrix, the precise conditions under which an equilibrium coincides with a local extremum of the fitness surface. Circulant matrices are identified as a natural and nontrivial class satisfying these conditions. We establish a two-way connection between fitness surface maxima and evolutionarily stable states: evolutionary stability implies a local fitness maximum, and the converse holds under the identified structural conditions. When the unique asymptotically stable equilibrium is a local maximum, it is evolutionarily stable and realises the global maximum of the fitness surface; an unstable equilibrium forces the global maximum to the boundary of the simplex. The framework is extended to general Lotka–Volterra systems, where an analogue of mean fitness is shown to share the same extremal properties. Results are illustrated through six examples spanning autocatalytic and hypercyclic replication, a parametric family exhibiting Andronov–Hopf bifurcation and heteroclinic cycles, and the Eigen quasispecies model.
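
As a small numerical illustration of the decomposition mentioned in this abstract, the sketch below assumes the standard replicator form dx_i/dt = x_i((Ax)_i - x·Ax) with mean fitness x·Ax; the abstract itself does not state the equations, and the matrix `A`, the state `x`, and the helper names are made up for illustration. It shows that the antisymmetric part of the matrix contributes nothing to the mean fitness, so the fitness surface depends only on the symmetric part.

```python
def split_symmetric(A):
    """Return (S, K) with A = S + K, S symmetric and K antisymmetric."""
    n = len(A)
    S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    K = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
    return S, K

def quad_form(M, x):
    """Quadratic form x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def replicator_rhs(A, x):
    """Right-hand side of the (assumed) replicator equation at state x."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * Ax[i] for i in range(n))
    return [x[i] * (Ax[i] - mean_fitness) for i in range(n)]

# Illustrative fitness matrix and a point on the simplex (values are arbitrary).
A = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]
x = [0.2, 0.5, 0.3]

S, K = split_symmetric(A)
mean_fitness = quad_form(A, x)       # equals quad_form(S, x)
antisym_part = quad_form(K, x)       # vanishes for any x
velocity = replicator_rhs(A, x)      # components sum to zero on the simplex
```

The antisymmetric part still shapes the trajectories through `replicator_rhs`, which is why an equilibrium of the dynamics need not sit at an extremum of the surface defined by `S` alone.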

2605.05259 2026-05-08 q-bio.BM cond-mat.mtrl-sci cs.AI q-bio.QM

Enhancing Cryo-EM Density Map Segmentation in Phenix for Improved Atomic Model Building

Chenwei Zhang

Comments 10 pages, 4 figures, 2 tables

Details
English abstract

We introduce PhenixCraft, a fully automated pipeline for building atomic models from cryo-EM density maps. By integrating AlphaFold predictions, we enhance the map-segmentation step in Phenix during model building, addressing challenges posed by noise and artifacts that traditionally hinder this step. Our results demonstrate PhenixCraft's superior performance in TM-scores and sequence accuracy, significantly improving upon the limitations and inefficiencies of traditional model building using Phenix.

2605.05254 2026-05-08 q-bio.MN q-bio.QM

Modularity Emerges from Action-Functional Constraints in Marine Metabolic Networks: A Biology-Scale Validation of the Network-Weighted Action Principle

Martin G. Frasch

Comments 49 pp, 10 figs. Companion papers: Frasch 2026a (J Physiol, DOI:10.1113/JP290762), 2026b (arXiv:2603.16951), 2026c (arXiv:2604.24805). Code: https://github.com/martinfrasch/tara-modularity

Details
English abstract

Biological systems operate under simultaneous energetic and informational constraints, yet direct evidence that such constraints shape real metabolic networks is limited. The Network-Weighted Action Principle predicts that networks under these constraints should organize toward high modularity. We tested this prediction in marine microbiome metabolic networks reconstructed from Tara Oceans metagenomes using two complementary approaches. Composite metrics of protein-deployment efficiency and functional-repertoire complexity (n=10) failed under causal-inference diagnostics, with apparent structure dominated by shared-component bias. In contrast, network modularity (n=7) was high (Q ~ 0.987), but this value was shown to arise from sparsity alone. The biologically meaningful signal is the excess over null models: modularity exceeded configuration-model, label-permutation, and bipartite-incidence nulls by Delta Q ~ 0.15-0.40 (p < 0.001), with the largest effect under the bipartite-incidence control. Fine-grained communities recovered by the network partition are not arbitrary: 25% recur across samples, and the most consistent modules map to known functional units, including enzyme subunits, biosynthetic sequences, and transporter complexes. Together, these results show that modularity excess - rather than absolute modularity - is the appropriate signature of biological organization, and that such excess is consistent with cost-minimization principles operating at the scale of natural metabolic networks.

2605.05213 2026-05-08 cs.LG q-bio.QM

Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models

Sicong Chang, Yidan Shen, Justina Varghese, Akshay R Prabhakar, Sebastian Guadarrama-Sistos-Vazquez, Jiefu Chen, Masayoshi Takashima, Omar G. Ahmed, Renjie Hu, Xin Fu

Comments Sicong Chang and Yidan Shen are co-first authors. This paper has been accepted to the IEEE Engineering in Medicine and Biology Society (EMBC) 2026 conference.

Details
English abstract

Chronic rhinosinusitis (CRS) is a common heterogeneous inflammatory disorder that causes substantial morbidity and healthcare costs. CRS is difficult to identify early from routine encounters, as symptom presentations overlap with common conditions such as allergic rhinitis, and heterogeneous phenotypes further obscure risk patterns. Prior predictive studies often rely on single-institutional cohorts, which limits population-level generalizability. To overcome this, we leveraged nationwide longitudinal EHR data from the All of Us Research Program to predict CRS diagnosis using two years of pre-diagnostic history. To address extreme feature sparsity and dimensionality in coded EHR data, we implemented a hybrid feature-selection pipeline that combines prevalence-based statistical screening with model-based importance ranking, compressing approximately 110,000 candidate codes into 100 interpretable features. To capture demographic heterogeneity, we trained demographic-stratified models across six adult sex and life-stage subgroups with subgroup-specific hyperparameter tuning. Our framework achieved an overall AUC of 0.8461, improving discrimination by 0.0168 over the best baseline. These results demonstrate that routinely collected EHR data may support population-representative CRS risk stratification and inform earlier triage and referral prioritization in primary care.

2605.04088 2026-05-08 q-bio.NC math.PR nlin.CD physics.bio-ph

Noise-accelerated Kramers Escape and Coherence Resonance in a 5D Neural Manifold

Yefan Wu

Comments 12 pages, 7 figures, revised version with more rigorous stability derivations. Currently under review at Physical Review E

Details
English abstract

Intrinsic channel noise is fundamental to neural processing, yet its state-dependent nature, when constrained by strict Feller boundary conditions, is often overlooked. Here, we demonstrate that this bounded multiplicative noise is not merely a source of jitter but an active dynamical force that fundamentally reshapes neural excitability. Investigating a 5D Hodgkin-Huxley-type cortical pacemaker model, we utilize a full-truncation semi-implicit Euler scheme to ensure rigorous probability conservation and domain-preserving integration. Through comprehensive parameter sweeps, we uncover a rich triphasic landscape of noise-induced transitions dictated by the underlying bifurcation structure. Deep in the subthreshold regime, multiplicative noise acts as a constructive force, triggering stochastic awakening via Kramers escape. Near the subcritical Hopf bifurcation, this evolves into highly robust coherence resonance (CR). Crucially, in the supra-threshold oscillatory regime, our framework reveals a striking dynamical shift: a generalized, noise-accelerated Kramers escape. Under extreme multiplicative noise - characteristic of sparse channel populations - strictly bounded fluctuations actively amplify escape rates from the hyperpolarized slow manifold, transforming regular pacing into high-frequency, irregular bursting. Conductance perturbation experiments confirm the profound biological robustness of this transition. These findings establish a physically rigorous mechanism for how boundary-constrained noise drives high-dimensional oscillators toward states of pathological hyperexcitability.

2605.03061 2026-05-08 stat.ML cs.LG q-bio.QM stat.ME

Dynamic Vine Copulas: Detecting and Quantifying Time-Varying Higher-Order Interactions

Houman Safaai, Alessandro Marin Vargas

Details
English abstract

Time-varying dependence is often modeled with dynamic correlations or Gaussian graphical models, but multivariate systems can change through tail behavior, asymmetry, or conditional structure even when correlations are nearly stable. We introduce Dynamic Vine Copulas (DVC), a temporal vine-copula framework for estimating and diagnosing sequence-wide non-Gaussian dependence. DVC fixes a chosen vine factorization for comparability; the framework applies to C-, D-, and R-vines, and our experiments use fixed-root-order C-vines. Pair-copula states evolve through smooth parameter trajectories or temporally regularized family-switching paths. The main diagnostic is a held-out comparison between a full vine and its matched 1-truncated version, which separates flexible first-tree pairwise dependence from evidence contributed by higher-tree conditional terms. At the population level, under a correct fixed vine and the simplifying assumption, this contrast equals the higher-tree component of a vine total-correlation decomposition; in finite samples, it is a predictive diagnostic. In controlled benchmarks, DVC detects Student-t degrees-of-freedom changes, Clayton-to-Gumbel switches, and recurrent conditional-interaction episodes missed or conflated by Gaussian dynamic baselines. The higher-tree score remains near zero in pairwise-only regimes and rises during conditional-interaction regimes. On Allen Visual Behavior Neuropixels data, DVC identifies a reproducible time-indexed higher-tree signal that is positive across held-out splits and vanishes under a decorrelated null, indicating simultaneous cross-area dependence. DVC therefore provides a flexible temporal copula model and an interpretable test of whether temporal dependence changes are pairwise or conditional.

2603.16281 2026-05-08 cs.LG q-bio.NC

Laya: A LeJEPA Approach to EEG via Latent Prediction over Reconstruction

Saarang Panchavati, Uddhav Panchavati, Hiroki Nariai, Corey Arnold, William Speier

Details
English abstract

Electroencephalography (EEG) is a widely used tool for studying brain function, with applications in clinical neuroscience, diagnosis, and brain-computer interfaces (BCIs). Recent EEG foundation models trained on large unlabeled corpora aim to learn transferable representations, but their effectiveness remains unclear; reported improvements over smaller task-specific models are often modest, sensitive to downstream adaptation and fine-tuning strategies, and limited under linear probing. We hypothesize that one contributing factor is the reliance on signal reconstruction as the primary self-supervised learning (SSL) objective, which biases representations toward high-variance artifacts rather than task-relevant neural structure. To address this limitation, we explore an SSL paradigm based on Joint Embedding Predictive Architectures (JEPA), which learn by predicting latent representations instead of reconstructing raw signals. We introduce Laya, the first EEG foundation model based on LeJEPA. We show that latent prediction yields representations that encode semantic structure in EEG: Laya embeddings track clinically meaningful state changes such as seizure onset, are resilient to noise, and achieve the strongest mean clinical accuracy under frozen linear probing, with particular gains on tasks where relevant neural patterns are subtle and easily obscured by artifacts. Controlled ablations against matched MAE variants confirm that the choice of pretraining objective, rather than architecture or data, is the primary driver of these gains.

2603.11344 2026-05-08 eess.IV q-bio.QM

Hybrid eTFCE-GRF: Exact Cluster-Size Retrieval with Analytical p-Values for Voxel-Based Morphometry

Don Yin, Hao Chen, Takeshi Miki, Enyu Yang

Comments 25 pages, 7 figures, 3 tables. Submitted to NeuroImage. Open-source package: https://github.com/Don-Yin/pytfce

Details
English abstract

Threshold-free cluster enhancement (TFCE) integrates cluster extent across thresholds to improve voxel-wise neuroimaging inference, but permutation testing makes it prohibitively slow for large datasets. Probabilistic TFCE (pTFCE) uses analytical Gaussian random field (GRF) p-values but discretises the threshold grid. Exact TFCE (eTFCE) eliminates discretisation via a union-find data structure but still requires permutations. We combine eTFCE's union-find for exact cluster-size retrieval with pTFCE's analytical GRF inference. The union-find builds the cluster hierarchy in one pass over sorted voxels and enables exact size queries at any threshold; GRF theory then converts these sizes to analytical p-values without permutations. Validation on synthetic phantoms (64^3, 80 subjects): FWER controlled at nominal level (0/200 null rejections, 95% CI [0.0%, 1.9%]); power matches baseline pTFCE (Dice >= 0.999); smoothness error below 1%; concordance r > 0.99. On UK Biobank (N=500) and IXI (N=563), significance maps form strict subsets of reference R pTFCE, which supports conservative error control. Implemented in pytfce (pip install pytfce): baseline completes whole-brain VBM in ~5s (75x faster than R pTFCE), hybrid in ~85s (4.6x faster) with exact cluster sizes; both >1000x faster than permutation TFCE.
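
The one-pass union-find idea described in this abstract can be sketched on a toy 1D signal. This is a simplified illustration, not the pytfce API: the function name `cluster_size_at_own_height`, the 1D neighbourhood, and the assumption of distinct values are all choices made for this example. Positions are activated from highest to lowest value and merged with already-active neighbours, so the size recorded for each position is the exact extent of its supra-threshold cluster when the threshold equals its own value.

```python
def cluster_size_at_own_height(values):
    """For each index i, return the size of the connected cluster containing i
    when thresholding at values[i] (1D neighbours, distinct values assumed)."""
    n = len(values)
    parent = list(range(n))
    size = [1] * n
    active = [False] * n

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra          # attach the smaller tree to the larger one
        size[ra] += size[rb]

    result = [0] * n
    # Single pass over positions sorted by descending value.
    for i in sorted(range(n), key=lambda k: values[k], reverse=True):
        active[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                union(i, j)
        result[i] = size[find(i)]
    return result

sizes = cluster_size_at_own_height([1.0, 3.0, 2.0, 0.5, 4.0])
# sizes == [3, 1, 2, 5, 1]
```

A 3D version would only change the neighbour enumeration; converting the recorded sizes to p-values via GRF theory is outside the scope of this sketch.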

2510.08410 2026-05-08 q-bio.PE

Intermediate stages in the origin of metabolism at a phosphorylating hydrothermal vent

Natalia Mrnjavac, Nadja K. Hoffmann, Manon L. Schlikker, Maximilian Burmeister, Loraine Schwander, Carolina Garcia Garcia, Max Brabender, Mike Steel, Daniel H. Huson, Sabine Metzger, Quentin Dherbassy, Bernhard Schink, Mirko Basen, Joseph Moran, Harun Tueysuez, Martina Preiner, William F. Martin

Comments 70 pages, 14 figures

Details
English abstract

The origin of life required the emergence of metabolism, an autocatalytic network of enzymatic reactions that synthesize amino acids, nucleotides and cofactors. At the origin of metabolism there were no enzymes, so how did it start? Empirical studies addressing early metabolic evolution are lacking. Harnessing protein structures for metabolic enzymes, we identify intermediate states in primordial metabolic assembly. We show that enzymatic metabolism in the universal common ancestor was incomplete, undergoing final assembly independently in the lineages leading to Bacteria and Archaea. Native transition metals (Fe0, Co0, Ni0, Pd0) served as the catalytic forerunners of both enzymes and cofactors at metabolic origin while phosphite supplied energy, as it phosphorylates AMP to ADP and serine to phosphoserine using native metal catalysts in water. Phosphite and native metals occur in serpentinizing hydrothermal systems, identifying an energy-supplying, catalytic site of metabolic origin. Cofactors liberated nascent metabolism from native metal catalysts, engendering its autocatalytic state.

2509.15832 2026-05-08 q-bio.NC

Overcoming Output Dimension Collapse: When Sparsity Enables Zero-shot Brain-to-Image Reconstruction at Small Data Scales

Kenya Otsuka, Yoshihiro Nagano, Yukiyasu Kamitani

Details
Journal ref
Transactions on Machine Learning Research, 2026
English abstract

Advances in brain-to-image reconstruction are enabling us to externalize the subjective visual experiences encoded in the brain as images. A key challenge in this task is data scarcity: a translator that maps brain activity to latent image features is trained on a limited number of brain-image pairs, making the translator a bottleneck for zero-shot reconstruction beyond the training stimuli. In this paper, we mathematically analyze the behavior of two translators commonly used in recent reconstruction pipelines: naive multivariate linear regression and sparse multivariate linear regression. We define the data scale as the ratio of the number of training samples to the latent feature dimensionality and characterize the behavior of each model across data scales. Building on a standard structural property of naive multivariate regression, we first show that the resulting "output dimension collapse" can become a practical generalization bottleneck in brain-to-image reconstruction. We introduce the best prediction diagnostic, which is computable without brain activity, to quantify the practical impact of this collapse. We then analyze sparse linear regression models in a student–teacher framework and derive expressions for the prediction error in terms of data scale and other sparsity-related parameters. Our analysis clarifies when variable selection can reduce prediction error at small data scales by exploiting the sparsity of the brain-to-feature mapping. Our findings provide quantitative guidelines for diagnosing output dimension collapse and for designing effective translators and feature representations for zero-shot reconstruction.