arXivDaily: arXiv daily academic digest, updated Monday to Friday
2604.20488 2026-04-23 q-bio.GN

Conditional Monte Carlo Tree Diffusion for Designing Cell-Type-Specific and Biologically Faithful Regulatory DNA

Animesh Awasthi, Raphael Bednarsky, Moritz Schaefer, Christoph Bock

Abstract

Designing regulatory DNA elements with precise cell-type-specific activity is broadly relevant for cell engineering and gene therapy. Deep generative models can generate functional gene-regulatory elements, but existing methods struggle to achieve high specificity against undesired cell types while adhering to the genome's natural regulatory grammar. Here, we introduce DNA-CRAFT, a generative framework that integrates class-conditioned discrete diffusion with Monte Carlo tree search to design cell-type-specific and biologically faithful regulatory elements. First, we train a discrete diffusion model on the ENCODE registry of 3.2 million candidate regulatory elements. Second, we condition the model to learn class-specific regulatory grammars of naturally occurring DNA sequences, including enhancers and promoters. Third, we employ conditional Monte Carlo tree guidance, an inference-time alignment algorithm designed to maximize the differential regulatory activity between desired and undesired cell types. By benchmarking DNA-CRAFT on regulatory sequence design tasks for human cell lines and immune cell types, we demonstrate that our model generates sequences with high predicted cell-type-specific activity and biological fidelity, achieving the best trade-off between the two compared with methods that use diffusion, autoregressive models, and gradient-based optimization.
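
A minimal sketch of the tree-search side of this idea follows. It works in a toy setting where sequences are built base by base. Rollouts complete a sequence uniformly at random; this stands in for DNA-CRAFT's class-conditioned diffusion model, which the code does not implement. The reward is assumed to be predicted activity in the desired cell type minus the highest predicted activity among undesired ones. The oracles `target` and `off_targets` are hypothetical placeholders for trained activity predictors.

```python
import math
import random

ALPHABET = "ACGT"


def differential_activity(seq, target_oracle, off_target_oracles):
    """Predicted activity in the desired cell type minus the strongest undesired one."""
    return target_oracle(seq) - max(oracle(seq) for oracle in off_target_oracles)


class Node:
    """A partial sequence (prefix) in the search tree."""

    def __init__(self, prefix, parent=None):
        self.prefix = prefix
        self.parent = parent
        self.children = {}
        self.visits = 0
        self.value_sum = 0.0

    def ucb1(self, c=1.4):
        # Mean reward plus an exploration bonus for rarely visited children.
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts_design(length, reward_fn, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node("")
    best_seq, best_reward = None, -math.inf
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCB1.
        while len(node.prefix) < length and len(node.children) == len(ALPHABET):
            node = max(node.children.values(), key=lambda child: child.ucb1())
        # Expansion: append one untried base.
        if len(node.prefix) < length:
            base = rng.choice([b for b in ALPHABET if b not in node.children])
            node.children[base] = Node(node.prefix + base, parent=node)
            node = node.children[base]
        # Rollout: complete uniformly at random (stand-in for the diffusion model).
        tail = "".join(rng.choice(ALPHABET) for _ in range(length - len(node.prefix)))
        seq = node.prefix + tail
        reward = reward_fn(seq)
        if reward > best_reward:
            best_seq, best_reward = seq, reward
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_seq, best_reward


# Toy oracles (hypothetical): "activity" is just the fraction of one base.
def target(s):
    return s.count("G") / len(s)


def off_target_c(s):
    return s.count("C") / len(s)


def off_target_a(s):
    return s.count("A") / len(s)


off_targets = [off_target_c, off_target_a]
seq, score = mcts_design(8, lambda s: differential_activity(s, target, off_targets))
print(seq, score)
```

Taking the maximum over off-target predictions means a design is only rewarded if it is quiet in every undesired cell type, not just on average.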

2604.20477 2026-04-23 q-bio.PE

Emergence biases in molecular evolution

Timothy Fuqua, Nikolaos Vakirlis

Comments 14 pages, 4 figures, perspective piece submitted to a peer-reviewed journal

Abstract

Biases in molecular evolution can significantly influence evolutionary trajectories. They have been described in a variety of contexts, such as development and mutation, but not for the acquisition of new functions (i.e., emergence). Here, we formalize the term emergence bias as the molecular predisposition that, upon mutation, biases a genetic sequence towards or against gaining new functions or causing new phenotypes. Such biases have been observed in previous studies of the emergence of promoters, enhancers, and de novo proteins, but have never been formally characterized as such. In this Perspective piece, we describe these studies, synthesize their findings through the prism of a unifying term, emergence bias, to provide support for this new concept, and speculate on its molecular underpinnings. We believe that emergence biases may play an important role in evolutionary innovations.

2604.20469 2026-04-23 math.AP q-bio.PE

Indirect Prey-taxis VS a Shortwave External Signal in Multiple Dimensions

Andrey Morgulis, Karrar Malal

Comments 30 pages, 1 figure

Abstract

We study short-wave asymptotics for a class of quasilinear second-order PDE systems involving cross-diffusion described by the so-called Patlak--Keller--Segel law. These equations are commonly used to model predator--prey communities with prey-taxis. Prey-taxis refers to interactions between two species (of particles, cells, or other agents) in which one species, the "predators," moves directionally while searching for the other, the "prey." Here, however, we assume that the predators are sensitive not to the prey density but to a driving signal produced by the prey. In addition, the production of this driving signal is assumed to depend on the intensity of an external field that is independent of the community state; this is what we call the external signal. It may arise from spatiotemporal inhomogeneity of the environment due to natural or artificial causes. We assume that the external signal has a general short-wave form and construct a complete asymptotic expansion for the short-wave solutions, with no restrictions on the spatial dimension or on the kinetics of the inter- and intraspecific reactions. We then apply the short-wave asymptotics to study the stability or instability induced by the external signal, following Kapitza's theory of the inverted pendulum. Applying the general results to some special classes of external signals, we obtain three kinds of examples: taxis-driven transport is suppressed; the species equilibrium is robust to the signal; or, conversely, the boundary in parameter space between the regions of stability and instability of this equilibrium becomes blurred. These results help fill a gap in the literature, since the theory and techniques for the asymptotic integration of such systems remain a weakly charted area.
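
For reference, the mechanical result behind the mention of Kapitza can be summarized with the textbook pendulum. It has length $l$, swings in gravity $g$, and its pivot oscillates vertically with small amplitude $a$ and high frequency $\omega$; $\theta$ is measured from the downward vertical. This is only the classical analogue, not the paper's PDE analysis. Averaging over the fast oscillation gives an effective potential, and expanding it about $\theta = \pi$ shows when the inverted position becomes stable.

$$
\ddot{\theta} = -\frac{g - a\omega^{2}\cos\omega t}{l}\,\sin\theta,
\qquad
U_{\mathrm{eff}}(\theta) = m g l\left(-\cos\theta + \frac{a^{2}\omega^{2}}{4 g l}\,\sin^{2}\theta\right)
$$

$$
\theta = \pi + \psi:\quad
U_{\mathrm{eff}} \approx m g l\left[\,1 + \left(\frac{a^{2}\omega^{2}}{4 g l} - \frac{1}{2}\right)\psi^{2}\right]
\quad\Longrightarrow\quad
\text{inverted position stable} \iff a^{2}\omega^{2} > 2 g l .
$$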

2604.20263 2026-04-23 q-bio.QM cs.AI cs.LG

AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling

Zhenyu Wang, Geyan Ye, Wei Liu, Man Tat Alexander Ng

Comments Accepted to ACL 2026 as a Findings paper. Zhenyu Wang and Geyan Ye are equal contributors; Geyan Ye is the corresponding author and project lead

Abstract

Virtual cell modeling predicts molecular state changes under genetic perturbations in silico, which is essential for biological mechanism studies. However, existing approaches suffer from unconstrained reasoning, uninterpretable predictions, and retrieval signals that are weakly aligned with regulatory topology. To address these limitations, we propose AROMA, an Augmented Reasoning Over a Multimodal Architecture for virtual cell genetic perturbation modeling. AROMA integrates textual evidence, graph-topology information, and protein sequence features to model perturbation-target dependencies, and is trained with a two-stage optimization strategy to yield predictions that are both accurate and interpretable. We also construct two knowledge graphs and a perturbation reasoning dataset, PerturbReason, containing more than 498k samples, as reusable resources for the virtual cell domain. Experiments show that AROMA outperforms existing methods across multiple cell lines, and remains robust under zero-shot evaluation on an unseen cell line, as well as in knowledge-sparse, long-tail scenarios. Overall, AROMA demonstrates that combining knowledge-driven multimodal modeling with evidence retrieval provides a promising pathway toward more reliable and interpretable virtual cell perturbation prediction. Model weights are available at https://huggingface.co/blazerye/AROMA. Code is available at https://github.com/blazerye/AROMA.

2604.20003 2026-04-23 q-bio.QM cs.AI cs.LG

scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics

Qifeng Zhou, Lei Yu, Yuzhi Guo, Yuwei Miao, Hehuan Ma, Wenliang Zhong, Lin Xu, Junzhou Huang

Abstract

The integration of single-cell proteomic data is often hindered by the fragmented nature of targeted antibody panels. To address this limitation, we introduce scpFormer, a transformer-based foundation model designed for single-cell proteomics. Pre-trained on over 390 million cells, scpFormer replaces standard index-based tokenization with a continuous, sequence-anchored approach. By combining Evolutionary Scale Modeling (ESM) with value-aware expression embeddings, it dynamically maps variable panels into a shared semantic space without artificial discretization. We demonstrate that scpFormer generates global cell representations that perform competitively in large-scale batch integration and unsupervised clustering. Moreover, its open-vocabulary architecture facilitates in silico panel expansion, assisting in the reconstruction of biological manifolds in sparse clinical datasets. Finally, this learned protein co-expression logic is transferable to bulk-omics tasks, supporting applications like cancer drug response prediction. scpFormer provides a versatile, panel-agnostic framework to facilitate scalable biomarker discovery and precision oncology.

2604.19852 2026-04-23 q-bio.CB

Multi-stage volume exclusion models for cell proliferation

John Carlo Dimaculangan, Cameron A. Smith, Christian A. Yates

Comments 55 pages, 20 figures, submitted to Physical Review E

Abstract

Cell proliferation and cell movement are fundamentally stochastic processes which lead to variability in the growth and spatial structure of cell populations in many biological settings, such as cell invasion, wound healing, and tumour growth. We develop stochastic, on-lattice agent-based models (ABMs) which incorporate volume exclusion, random movement, and multi-stage representations of the cell cycle. The multi-stage framework enables a more realistic representation of true cell cycle time distributions. We also introduce a novel form of myopic behaviour, in which cells sense their local environment when attempting to proliferate. For each ABM, we derive a corresponding continuum partial differential equation (PDE) description under the mean-field approximation. Using numerical simulations, we investigate how different proliferation mechanisms influence population-level dynamics in both the discrete and continuum models. In particular, we consider biologically relevant contexts of growth-to-confluence assays (using uniform initial conditions) and travelling wave behaviour associated with cell invasion. We examine how the PDE solutions compare with the behaviour of the corresponding ABMs averaged over many realisations.
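
The multi-stage idea can be illustrated with a small sketch under simplifying assumptions:

- a 1D lattice rather than the paper's setting;
- no cell movement;
- a cell whose division attempt is aborted simply restarts its cycle.

Each cycle passes through `n_stages` exponential stages, so the total cycle time is Erlang distributed. Adding stages keeps the same mean but lowers the variability. The `myopic` flag mimics cells that sense their neighbourhood before proliferating. All function names are hypothetical.

```python
import heapq
import random


def sample_cycle_time(n_stages, mean_cycle, rng):
    """Sum of n_stages exponential stages, each with rate n_stages / mean_cycle.

    The total is Erlang distributed with mean `mean_cycle` and coefficient of
    variation 1 / sqrt(n_stages), so more stages give less variable cycle times.
    """
    rate = n_stages / mean_cycle
    return sum(rng.expovariate(rate) for _ in range(n_stages))


def grow(lattice_size, n_stages, mean_cycle, t_end, myopic=False, seed=0):
    """1D on-lattice proliferation with volume exclusion; returns the occupied-site count.

    Movement is omitted, and a cell whose division attempt is aborted restarts
    its cycle (both are assumptions of this sketch).
    """
    rng = random.Random(seed)
    occupied = [False] * lattice_size
    start = lattice_size // 2
    occupied[start] = True
    events = [(sample_cycle_time(n_stages, mean_cycle, rng), start)]  # (time, site)
    while events:
        t, site = heapq.heappop(events)
        if t > t_end:
            break
        neighbours = [s for s in (site - 1, site + 1) if 0 <= s < lattice_size]
        if myopic:
            # Myopic cells sense their neighbourhood and choose only among empty sites.
            empty = [s for s in neighbours if not occupied[s]]
            target = rng.choice(empty) if empty else None
        else:
            # Non-myopic cells pick a neighbour blindly; the attempt fails if it is occupied.
            target = rng.choice(neighbours)
            if occupied[target]:
                target = None
        if target is not None:
            occupied[target] = True
            heapq.heappush(events, (t + sample_cycle_time(n_stages, mean_cycle, rng), target))
        # The parent starts a new cycle whether or not division succeeded.
        heapq.heappush(events, (t + sample_cycle_time(n_stages, mean_cycle, rng), site))
    return sum(occupied)


print(grow(50, n_stages=4, mean_cycle=1.0, t_end=20.0))
print(grow(50, n_stages=4, mean_cycle=1.0, t_end=20.0, myopic=True))
```

With `n_stages=1` the cycle time is exponential, the memoryless assumption that multi-stage models are meant to relax.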

2604.19850 2026-04-23 cs.ET cs.LG cs.NE q-bio.MN q-bio.QM

What Makes a Bacterial Model a Good Reservoir Computer? Predicting Performance from Separability and Similarity

Laura Alonso Bartolomé, Jean-Loup Faulon, Xavier Hinaut

Abstract

Biological systems are promising substrates for computation because they naturally process environmental information through complex internal dynamics. In this study, we investigate whether bacterial metabolic models can act as physical reservoirs and whether their computational performance can be predicted from dynamical properties linked to separability and similarity. We simulated the growth dynamics of five bacterial species, one yeast species, and 29 Escherichia coli single-gene deletion mutants using dynamic flux balance analysis (dFBA), with glucose and xylose concentrations as inputs and growth curves as reservoir states. Computational performance was assessed on random nonlinear classification tasks using a linear readout, while reservoir properties linked to separability and similarity were characterised through kernel and generalisation ranks computed from growth-curve state matrices. Several microbial models achieved high classification accuracy, showing that bacterial metabolic dynamics can support nonlinear computation. Clear differences were observed between species, with some models converging more rapidly and others reaching higher maximum accuracy, revealing a trade-off between convergence speed and peak performance. In contrast, all E. coli mutants were dominated by the wild-type model, suggesting that gene deletions reduce the dynamical richness required for efficient computation. The difference between kernel and generalisation ranks was generally associated with improved accuracy, but deviations across models and sensitivity at low rank values limited its predictive power in practice. Overall, these results show that bacterial metabolic models constitute promising substrates for reservoir computing and provide a first step towards identifying microbial strains with favourable computational properties for future experimental implementations.
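
The two rank measures can be sketched as follows, assuming the usual reservoir-computing definitions:

- The kernel rank is the numerical rank of the state matrix collected for many distinct inputs. It is a proxy for separability; higher is better.
- The generalisation rank is the rank for slightly perturbed copies of one input. It is a proxy for similarity; lower is better.

The `make_toy_reservoir` map is a hypothetical stand-in for the dFBA growth curves used in the study. The tolerance in `matrix_rank` is an arbitrary choice.

```python
import math
import random


def matrix_rank(rows, tol=1e-3):
    """Numerical rank via Gaussian elimination with partial pivoting (absolute tolerance)."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot left in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def make_toy_reservoir(n_features=30, seed=0):
    """Hypothetical stand-in for a dFBA model: a fixed random nonlinear map from
    (glucose, xylose) inputs to an n_features-long state vector."""
    rng = random.Random(seed)
    weights = [(rng.gauss(0, 2), rng.gauss(0, 2), rng.gauss(0, 1)) for _ in range(n_features)]

    def reservoir(glucose, xylose):
        return [math.tanh(a * glucose + b * xylose + c) for a, b, c in weights]

    return reservoir


reservoir = make_toy_reservoir()
rng = random.Random(1)
# Kernel rank: states for 20 distinct random inputs (higher = better separability).
distinct_states = [reservoir(rng.random(), rng.random()) for _ in range(20)]
# Generalisation rank: states for 20 small perturbations of one input (lower = better).
noisy_states = [reservoir(0.5 + rng.gauss(0, 0.005), 0.5 + rng.gauss(0, 0.005))
                for _ in range(20)]
kernel_rank = matrix_rank(distinct_states)
generalisation_rank = matrix_rank(noisy_states)
print(kernel_rank, generalisation_rank, kernel_rank - generalisation_rank)
```

The printed difference is the quantity the abstract relates to accuracy. As noted there, it becomes unreliable at low rank values, where a single pivot crossing the tolerance changes the result.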

2604.19842 2026-04-23 q-bio.OT

Energy gradients as potential drivers of pre-cellular chemical organization

Arturo Tozzi

Comments 14 pages, 5 figures

Abstract

The onset of life is often framed around membrane-bound compartments and encoded metabolism, leaving unresolved how spatial organization arose before stable boundaries. In this context, environmental gradients are usually treated as boundary conditions rather than as variables structuring chemical dynamics. We ask whether spatial localization and functional coupling can emerge under realistic environmental gradients in the absence of membranes, proposing that spatial variations in energy availability act as organizing variables that bias transport and reaction. We introduce a reaction-diffusion model in which interacting chemical species evolve within an externally imposed activity landscape defined by coupled gradients in pH, redox potential and temperature, integrating diffusion, gradient-driven drift and position-dependent reaction kinetics. We performed simulations across a range of gradient strengths representative of hydrothermal-vent-like conditions. Our results suggest that sufficiently strong gradients induce spontaneous accumulation of reactants, spatial alignment of reaction maxima and the emergence of stable, confined chemical states. Localization arises above a threshold at which gradient-driven transport overcomes diffusive and degradative losses. We conclude that spatially structured energy landscapes can support organized chemical dynamics without predefined compartments, providing a mechanism for coupling and persistence in continuous media. Potential applications include experimental platforms for studying prebiotic chemistry, microfluidic systems with controlled gradients and the design of chemically responsive materials.
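
A one-dimensional sketch of the ingredients named above follows: diffusion, gradient-driven drift, degradation and production. It assumes a drift velocity $v(x) = -\gamma (x - 1/2)$ pointing toward the middle of the domain, as a stand-in for the coupled pH, redox and temperature landscape. It also assumes zero-flux boundaries and a uniform source, and uses an explicit upwind finite-volume scheme. All parameter values are illustrative. With no drift the profile stays flat, whereas strong drift concentrates material near the centre.

```python
def simulate(gamma, D=0.1, k=0.1, source=1.0, n=50, dt=5e-4, t_end=2.0):
    """Explicit upwind finite-volume scheme on [0, 1] for
        c_t = D c_xx - (v c)_x - k c + source,   v(x) = -gamma (x - 0.5),
    with zero-flux boundaries and c = 0 initially.

    dt is chosen so that dt * (2 D / dx^2 + max|v| / dx) < 1 for gamma <= 20.
    """
    dx = 1.0 / n
    c = [0.0] * n
    for _ in range(int(round(t_end / dt))):
        flux = [0.0] * (n + 1)  # flux[i] sits at x = i * dx; flux[0] = flux[n] = 0
        for i in range(1, n):
            v = -gamma * (i * dx - 0.5)
            upwind = c[i - 1] if v > 0 else c[i]  # take c from the upstream cell
            flux[i] = -D * (c[i] - c[i - 1]) / dx + v * upwind
        c = [c[i] - dt * (flux[i + 1] - flux[i]) / dx - dt * k * c[i] + dt * source
             for i in range(n)]
    return c


flat = simulate(gamma=0.0)
peaked = simulate(gamma=20.0)
print(flat[25] / flat[0], peaked[25] / peaked[0])  # flat profile vs. centre-to-edge ratio >> 1
```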

2604.19840 2026-04-23 cs.LG q-bio.QM

Graph-Theoretic Models for the Prediction of Molecular Measurements

Anna Niane, Prudence Djagba

Abstract

Graph-theoretic approaches offer simplicity, interpretability, and low computational cost for molecular property prediction. Among these, the model proposed by Mukwembi and Nyabadza, based on the external activity $D(G)$ and internal activity $ζ(G)$ indices, achieved strong results on a small flavonoid dataset. However, its ability to generalize to larger and chemically diverse datasets has not been tested. This study evaluates the baseline $D(G)$-$ζ(G)$ polynomial model on five benchmark datasets from MoleculeNet, covering biological activity (BACE, 1,513 molecules), lipophilicity (LogP synthetic, 14,610 molecules; LogP experimental, 753 molecules), aqueous solubility (ESOL, 1,128 molecules), and hydration free energy (SAMPL, 642 molecules). The baseline model achieves an average $R^2 = 0.24$, confirming limited transferability. To address this, a systematic enhancement framework is proposed, progressively incorporating Ridge regularization, additional graph descriptors, physicochemical properties, ensemble learning with Gradient Boosting, Lasso feature selection, and a hybrid approach combining topological indices with Morgan fingerprints. The enhanced models raise the average best $R^2$ to 0.79, with individual improvements ranging from 165% to 274%. All improvements are statistically significant ($p < 0.001$). A direct comparison with a Graph Convolutional Network under identical experimental conditions shows that the enhanced classical models match or outperform deep learning on all five datasets. Comparison with the recent GNN+PGM hybrid of Djagba et al. further confirms competitiveness, with the enhanced models achieving the best results on two datasets and tying on one. The entire framework requires no GPU, trains in under five minutes, and uses only open-source tools, making it accessible for researchers in resource-limited settings.
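
As a rough illustration of the first enhancement step, Ridge regularization of a polynomial in two graph descriptors, the closed-form fit is sketched below. The degree-2 basis is an assumption, since the abstract does not state the form of the baseline polynomial. `x1` and `x2` stand in for $D(G)$ and $ζ(G)$ values computed elsewhere, and the data are synthetic. In practice a library such as scikit-learn's `Ridge` would be used; this standard-library version only makes the normal equations explicit.

```python
def poly_features(x1, x2):
    """Hypothetical degree-2 basis in two descriptors (e.g. D(G) and zeta(G))."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Closed-form ridge, (X^T X + lam I) beta = X^T y, with the intercept unpenalized."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # column 0 is the intercept
        xtx[i][i] += lam
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, X):
    return [sum(w * f for w, f in zip(beta, row)) for row in X]


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data on a 5 x 5 grid of descriptor values (not real molecules).
grid = [(a, b) for a in range(5) for b in range(5)]
X = [poly_features(a, b) for a, b in grid]
y = [1.0 + 2.0 * a - b + 0.5 * a * b for a, b in grid]
beta = ridge_fit(X, y, lam=1e-3)
print(r_squared(y, predict(beta, X)))
```

Larger values of `lam` shrink the non-intercept coefficients toward zero, trading a little training fit for stability when descriptors are correlated.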

2604.19805 2026-04-23 q-bio.PE

Modeling of Pneumococcal and Respiratory Syncytial Virus Pneumonia: An Epidemiological Review, with Statistical Inference

Rupchand Sutradhar, Anuj Mishra, Malay Banerjee, Subhra Sankar Dhar

Abstract

Infectious diseases continue to pose significant public health challenges worldwide, requiring effective prevention and control strategies to mitigate their negative impact. Infectious diseases can be broadly classified into two groups: vaccine-preventable diseases (e.g., measles, polio, influenza, hepatitis B, pneumonia) and non-vaccine-preventable diseases (e.g., HIV/AIDS). Models of vaccine-preventable diseases are essential tools for understanding infectious disease dynamics, evaluating intervention strategies, and guiding public health policies. In this review article, we explore recent advances in modeling two particular vaccine-preventable infectious diseases. We consider both deterministic and stochastic models to capture comprehensively the complexity of disease transmission, vaccine efficacy, and population-level immunity. We highlight the application of these models to bacterial and viral pneumonia caused, respectively, by the bacterium Streptococcus pneumoniae (S. pneumoniae) and the respiratory syncytial virus (RSV). Pneumonia carries a substantial global burden, and modeling has played a crucial role in assessing vaccine impacts and optimizing immunization strategies to minimize this burden. By synthesizing recent methodologies and findings, this review provides insights for future research and policy decisions aimed at improving the control of pneumonia caused by S. pneumoniae and RSV.

2604.19799 2026-04-23 cs.HC cs.AI cs.CY q-bio.NC

Measuring Creativity in the Age of Generative AI: Distinguishing Human and AI-Generated Creative Performance in Hiring and Talent Systems

Yigal Rosen, Ilia Rushkin

Comments Research Paper Presented at the BIG.AI@MIT Conference, April 2, 2026

Abstract

Generative AI is rapidly transforming how organizations create value and evaluate talent. While large language models enhance baseline output quality, they simultaneously introduce ambiguity in assessing human creativity, as observable artifacts may be partially or fully AI-generated. This paper reconceptualizes creativity as a distributional and process-based property that emerges under shared constraints and competitive incentives. We introduce a quantitative framework for measuring creativity as novelty in synthesis, operationalized through idea generation and idea transformation within embedding space. Empirical evaluation demonstrates that the proposed metrics align with intuitive judgments of creativity while capturing distinctions that surface-level quality assessments miss. We further identify a structural shift toward bimodal distributions of creative output in AI-mediated environments, with implications for hiring, leadership, and competitive strategy. The findings suggest that in the age of generative AI, distinctiveness rather than fluency becomes the primary signal of human creative capability.
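
One hypothetical way to operationalize novelty in embedding space, in the distributional spirit described above, is to score an idea against the pool of ideas produced under the same constraints. Here novelty is 1 minus the maximum cosine similarity to that pool, and transformation is the cosine distance between an idea and its revision. These definitions are illustrative and not necessarily the paper's metrics. The sketch also assumes that ideas have already been embedded as vectors by some text-embedding model.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def novelty(idea, pool):
    """1 - max cosine similarity between an idea and others produced under the same constraints."""
    return 1.0 - max(cosine(idea, other) for other in pool)


def transformation(original, revised):
    """Cosine distance between an idea and its transformed (revised) version."""
    return 1.0 - cosine(original, revised)


# Toy 3-d "embeddings" (hypothetical; real ones would come from a text-embedding model).
pool = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(novelty([1.0, 0.0, 0.0], pool))  # 0.0: duplicates an existing idea
print(novelty([0.0, 0.0, 1.0], pool))  # 1.0: orthogonal to everything in the pool
```

Because novelty is measured relative to the pool, the same idea scores lower when many participants (or AI tools) converge on it.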

2601.11505 2026-04-23 cs.LG cs.AI cs.SY eess.SY q-bio.QM

MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management

Miriam K. Wolff, Peter Calhoun, Eleonora Maria Aiello, Yao Qin, Sam F. Royston

Comments 30 pages, 5 figures, 1 Table, 10 supplementary figures, 3 supplementary tables, submitted to JDST

Abstract

Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substantially in structure and are time-consuming to access and process, which impedes data integration and reduces the comparability and generalizability of algorithmic developments. This work aims to establish a unified and accessible data resource for T1D algorithm development. Multiple publicly available T1D datasets were consolidated into a unified resource, termed the MetaboNet dataset. Inclusion required the availability of both continuous glucose monitoring (CGM) data and corresponding insulin pump dosing records. Additionally, auxiliary information such as reported carbohydrate intake and physical activity was retained when present. The MetaboNet dataset comprises 3135 subjects and 1228 patient-years of overlapping CGM and insulin data, making it substantially larger than existing standalone benchmark datasets. The resource is distributed as two parts: a fully public subset, available for immediate download at https://metabo-net.org/, and a Data Use Agreement (DUA)-restricted subset, accessible through the respective application processes of the source datasets. For the datasets in the latter subset, processing pipelines are provided to automatically convert the data into the standardized MetaboNet format. A consolidated public dataset for T1D research is presented, and the access pathways for both its unrestricted and DUA-governed components are described. The resulting dataset covers a broad range of glycemic profiles and demographics and thus can yield more generalizable algorithmic performance than individual datasets.
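
Figures such as patient-years of overlapping CGM and insulin data can be computed by intersecting each subject's coverage intervals. The sketch below assumes a hypothetical record layout: lists of `(start_day, end_day)` coverage intervals for each data stream. This is not MetaboNet's actual format. It uses 365.25 days per year.

```python
def merge(intervals):
    """Merge overlapping (start_day, end_day) intervals so nothing is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_days(cgm, insulin):
    """Days on which both CGM and insulin data are available."""
    total = 0.0
    for a_start, a_end in merge(cgm):
        for b_start, b_end in merge(insulin):
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def patient_years(subjects):
    return sum(overlap_days(s["cgm"], s["insulin"]) for s in subjects) / 365.25


subjects = [  # hypothetical records, not MetaboNet's schema
    {"cgm": [(0, 10), (20, 30)], "insulin": [(5, 25)]},
    {"cgm": [(0, 365.25)], "insulin": [(0, 365.25)]},
]
print(overlap_days(subjects[0]["cgm"], subjects[0]["insulin"]))  # 10.0
print(patient_years(subjects))
```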

2601.05367 2026-04-23 q-bio.PE

The rights and wrongs of rescaling in population genetics simulations

Parul Johri, Fanny Pouyet, Brian Charlesworth

Abstract

Computer simulations of complex population genetic models are an essential tool for making sense of the large-scale datasets of multiple genome sequences from a single species that are becoming increasingly available. A widely used approach for reducing computing time is to simulate populations that are much smaller than the natural populations that they are intended to represent, by using parameters such as selection coefficients and mutation rates whose products with the population size correspond to those of the natural populations. This approach has come to be known as rescaling, and is justified by the theory of the genetics of finite populations. Recently, however, there have been criticisms of this practice, which have brought to light situations in which it can lead to erroneous conclusions. This paper reviews the theoretical basis for rescaling, and relates it to current practice in population genetics simulations. It shows that some population genetic statistics are scalable while others are not. Additionally, it shows that there are likely to be problems with rescaling when simulating large chromosomal regions, due to the non-linear relation between the physical distance between a pair of separate nucleotide sites and the frequency of recombination between them. Other difficulties with rescaling can arise in connection with simulations of selection on complex traits, and with populations that reproduce partly by self-fertilization or asexual reproduction. A number of recommendations are made for good practice in relation to rescaling.
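
The rescaling rule, and one way it breaks down, can be made concrete. Dividing the population size $N$ by a factor $Q$ while multiplying per-generation rates by $Q$ keeps the products $Ns$, $N\mu$ and $Nr$ fixed. For recombination, however, the frequency between two sites is a non-linear, saturating function of their distance. Haldane's map function, $r = \tfrac{1}{2}(1 - e^{-2d})$ for map distance $d$ in Morgans, is used here as one standard example; this is an assumption, and the paper's treatment may differ. For distant sites, scaling $r$ linearly can then push it past its maximum of 0.5.

```python
import math


def rescale(params, Q):
    """Divide N by Q and multiply per-generation rates by Q, keeping N*s, N*mu and N*r fixed."""
    return {
        "N": params["N"] / Q,
        "s": params["s"] * Q,
        "mu": params["mu"] * Q,
        "r": params["r"] * Q,
    }


def haldane_r(map_distance_morgans):
    """Haldane map function: recombination frequency for a given map distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * map_distance_morgans))


params = {"N": 10_000, "s": 1e-4, "mu": 1e-8, "r": 1e-8}
scaled = rescale(params, Q=100)
print(params["N"] * params["s"], scaled["N"] * scaled["s"])  # both approximately 1.0
# Two sites 0.01 Morgans apart: naive linear scaling by Q = 100 vs. the map function at 1 Morgan.
print(100 * haldane_r(0.01), haldane_r(1.0))  # ~0.990 (impossible, > 0.5) vs. ~0.432
```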

2512.15808 2026-04-23 q-bio.QM cs.AI cs.CV cs.LG

Foundation Models in Biomedical Imaging: Turning Hype into Reality

Amgad Muneer, Kai Zhang, Ibraheem Hamdi, Rizwan Qureshi, Muhammad Waqas, Shereen Fouad, Hazrat Ali, Syed Muhammad Anwar, Jia Wu

Comments 9 figures and 3 tables

Abstract

Foundation models (FMs) are driving a prominent shift in biomedical imaging from task-specific models to unified backbone models for diverse tasks. This opens an avenue to integrate imaging, pathology, clinical records, and genomics data into a composite system. However, this vision contrasts sharply with modern medicine's trajectory toward more granular sub-specialization. This tension, coupled with data scarcity, domain heterogeneity, and limited interpretability, creates a gap between benchmark success and real-world clinical value. We argue that the immediate role of FMs lies in augmenting, not replacing, clinical expertise. To separate hype from reality, we introduce REAL-FM (Real-world Evaluation and Assessment of Foundation Models), a multi-dimensional framework for assessing data, technical readiness, clinical value, workflow integration, and responsible AI. Using REAL-FM, we find that while FMs excel in pattern recognition, they fall short in causal reasoning, domain robustness, and safety. Clinical translation is hindered by scarce representative data for model training, unverified generalization beyond oversimplified benchmark settings, and a lack of prospective outcome-based validation. We further examine FM reasoning paradigms, including sequential logic, spatial understanding, and symbolic domain knowledge. We envision that the path forward lies not in a monolithic medical oracle, but in coordinated subspecialist AI systems that are transparent, safe, and clinically grounded.

2510.21742 2026-04-23 q-bio.NC cond-mat.dis-nn cs.NE hep-th physics.bio-ph

Statistics of correlations in nonlinear recurrent neural networks

German Mato, Facundo Rigatuso, Gonzalo Torroba

Comments 39 pages, 9 figures

Abstract

The statistics of correlations are central quantities characterizing the collective dynamics of recurrent neural networks. We derive exact expressions for the statistics of correlations of nonlinear recurrent networks in the limit of a large number N of neurons, including systematic 1/N corrections, in the regime of Gaussian quenched disorder. Our approach uses a path-integral representation of the network stochastic dynamics, which reduces the description to a few collective variables and enables efficient computation. This generalizes previous results on linear networks to include a wide family of nonlinear activation functions, which enter as interaction terms in the path integral. These interactions can resolve the instability of the linear theory and yield a strictly positive participation dimension. We present explicit results for power-law activations, revealing scaling behavior controlled by the network coupling. In addition, we introduce a class of activation functions based on Padé approximants and provide analytic predictions for their correlation statistics. Numerical simulations confirm our theoretical results with excellent agreement. We also compare with previous works that have studied the complementary case with annealed disorder, and based on this we propose a new self-consistent equation for the more general case of colored noise.
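
The dimensionality measure mentioned above is commonly defined as the participation ratio of the covariance eigenvalues, $\mathrm{PR} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$. Assuming that definition, it can be computed without an eigendecomposition as $(\operatorname{tr} C)^2 / \operatorname{tr}(C^2)$. A minimal sketch on sampled activities follows; the function names are hypothetical.

```python
def covariance(samples):
    """Sample covariance of neuron activities; `samples` is a list of time points,
    each a list of N activities."""
    T, N = len(samples), len(samples[0])
    means = [sum(s[i] for s in samples) / T for i in range(N)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (T - 1)
             for j in range(N)] for i in range(N)]


def participation_ratio(C):
    """(tr C)^2 / tr(C^2), equal to (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    n = len(C)
    trace = sum(C[i][i] for i in range(n))
    trace_of_square = sum(C[i][j] * C[j][i] for i in range(n) for j in range(n))
    return trace ** 2 / trace_of_square


identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(participation_ratio(identity))  # 3.0: variance spread evenly over 3 directions
locked = covariance([[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0]])
print(participation_ratio(locked))  # ~1.0: two perfectly correlated neurons
```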

2509.17260 2026-04-23 q-bio.NC cs.OH stat.AP

A tutorial on electrogastrography using low-cost hardware and open-source software

Evgeniya Anisimova, Sameer N. B. Alladin, Styliani Tsamaz, Edwin S. Dalmaijer

Abstract

Electrogastrography is the recording of changes in electric potential caused by the stomach's pacemaker region, typically through several cutaneous sensors placed on the abdomen. It is a worthwhile technique in medical and psychological research, but also relatively niche. Here we present a tutorial on the acquisition and analysis of the human electrogastrogram. Because dedicated equipment and software can be prohibitively expensive, we demonstrate how data can be acquired using a low-cost OpenBCI Ganglion amplifier. We also present a processing pipeline that minimises attrition, which is particularly helpful for low-cost equipment but also applicable to top-of-the-line hardware. Our approach comprises outlier rejection, frequency filtering, movement filtering, and noise reduction using independent component analysis. Where traditional approaches include a subjective step in which only one channel is manually selected for further analysis, our pipeline recomposes the electrogastrogram from all recorded channels after automatic rejection of nuisance components. The main benefits of this approach are reduced attrition, retention of data from all recorded channels, and reduced influence of researcher bias. In addition to our tutorial on the method, we offer a proof-of-principle in which our approach leads to reduced data rejection compared to established methods. We aimed to describe each step in sufficient detail to be implemented in any programming language. In addition, we made an open-source Python package freely available for ease of use.
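
The frequency-analysis step can be sketched as follows. The normal gastric slow wave is around 3 cycles per minute (0.05 Hz), so a common check is to find the dominant frequency within a normogastric band, assumed here to be 2 to 4 cycles per minute. This covers only one step of the pipeline described above, with no outlier rejection, movement filtering or ICA. It uses a naive discrete Fourier transform for self-containment and is not the API of the authors' Python package. The 1 Hz sampling rate is an assumption.

```python
import cmath
import math


def dominant_frequency(signal, fs, f_lo=2 / 60, f_hi=4 / 60):
    """Frequency (Hz) with the largest DFT power in [f_lo, f_hi] (default: 2-4 cycles/min)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset
    best_f, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if not f_lo <= f <= f_hi:
            continue
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_f, best_power = f, power
    return best_f


# Synthetic 10-minute recording at 1 Hz: a 3 cpm gastric rhythm plus a
# faster 0.2 Hz component that lies outside the normogastric band.
fs = 1.0
signal = [math.sin(2 * math.pi * 0.05 * t) + 0.3 * math.sin(2 * math.pi * 0.2 * t)
          for t in range(600)]
f = dominant_frequency(signal, fs)
print(f, f * 60)  # dominant frequency in Hz and in cycles per minute
```

Restricting the search to the normogastric band keeps faster components, such as the 0.2 Hz one here, from being reported as the gastric rhythm.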

2509.02060 2026-04-23 q-bio.BM cs.LG

Morphology-Aware Peptide Discovery via Masked Conditional Generative Modeling

Nuno Costa, Julija Zavadlav

Comments 46 pages, 4 figures, 6 tables

Abstract

Peptide self-assembly prediction offers a powerful bottom-up strategy for designing biocompatible, low-toxicity materials for large-scale synthesis in a broad range of biomedical and energy applications. However, screening the vast sequence space for categorization of aggregate morphology remains intractable. We introduce PepMorph, an end-to-end peptide discovery pipeline that generates novel sequences that are not only prone to aggregation but whose self-assembly is steered toward fibrillar or spherical morphologies by conditioning on isolated peptide descriptors that serve as morphology proxies. To this end, we compiled a new dataset by leveraging existing aggregation propensity datasets and extracting geometric and physicochemical descriptors. This dataset is then used to train a Transformer-based Conditional Variational Autoencoder with a masking mechanism, which generates novel peptides under arbitrary conditioning. We filtered the generated sequences against the design specifications and validated them through coarse-grained molecular dynamics (CG-MD) simulations. PepMorph then achieved an 83% success rate for the targeted class under our CG-MD validation protocol and morphology criterion, showcasing its promise as a framework for application-driven peptide discovery.

2507.07800 2026-04-23 q-bio.QM cs.CV

A novel attention mechanism for noise-adaptive and robust segmentation of microtubules in microscopy images

Achraf Ait Laydi, Louis Cueff, Mewen Crespo, Yousef El Mourabit, Hélène Bouvrais

Abstract

Segmenting cytoskeletal filaments in microscopy images is essential for studying their roles in cellular processes. However, this task is highly challenging due to the fine, densely packed, and intertwined nature of these structures. Imaging limitations further complicate analysis. While deep learning has advanced segmentation of large, well-defined biological structures, its performance often degrades under such adverse conditions. Additional challenges include obtaining precise annotations for curvilinear structures and managing severe class imbalance during training. We introduce a novel noise-adaptive attention mechanism that extends the Squeeze-and-Excitation (SE) module to dynamically adjust to varying noise levels. Integrated into a U-Net decoder with residual encoder blocks, this yields ASE_Res_UNet, a lightweight yet high-performance model. We also developed a synthetic dataset generation strategy that ensures accurate annotations of fine filaments in noisy images. We systematically evaluated loss functions and metrics to mitigate class imbalance, ensuring robust performance assessment. ASE_Res_UNet effectively segmented microtubules in noisy synthetic images, outperforming its ablated variants. It also demonstrated superior segmentation compared to models with alternative attention mechanisms or distinct architectures, while requiring fewer parameters, making it efficient for resource-constrained environments. Evaluation on a newly curated real microscopy dataset and a recently reannotated dataset highlighted ASE_Res_UNet's effectiveness in segmenting microtubules beyond synthetic images. For these datasets, ASE_Res_UNet was competitive with a recent synthetic data-driven approach that shares two cytoskeleton pretrained models. Importantly, ASE_Res_UNet showed strong transferability to other curvilinear structures (blood vessels and nerves) across diverse imaging conditions.
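
For readers unfamiliar with the module being extended, a standard Squeeze-and-Excitation block is sketched below on plain nested lists. It applies global average pooling per channel ("squeeze"), then a bottleneck of two fully connected layers with ReLU and sigmoid ("excitation"), then channel-wise rescaling. The noise-adaptive modification in ASE_Res_UNet is not described in the abstract and is not reproduced here. The weights `W1` and `W2` would normally be learned; the values used below are arbitrary.

```python
import math


def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def squeeze_excite(feature_map, W1, W2):
    """Standard SE block on a C x H x W nested list.

    W1 has shape (C // r) x C and W2 has shape C x (C // r) for reduction ratio r.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields channel weights in (0, 1).
    hidden = [max(0.0, h) for h in matvec(W1, z)]
    s = [1.0 / (1.0 + math.exp(-a)) for a in matvec(W2, hidden)]
    # Scale: multiply every spatial position of channel c by s[c].
    scaled = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(feature_map, s)]
    return scaled, s


fm = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 4.0], [4.0, 4.0]]]  # C=2, H=W=2
out, weights = squeeze_excite(fm, W1=[[-0.5, 0.5]], W2=[[1.0], [-1.0]])
print(weights)  # channel 0 is up-weighted relative to channel 1
```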

2506.14103 2026-04-23 stat.ME q-bio.QM

A Robust Nonparametric Framework for Detecting Repeated Spatial Patterns

Rajitha Senanayake, Pratheepa Jeganathan

Comments 39 pages including an Appendix of 17 pages, 39 figures

Abstract

Identifying spatially contiguous clusters and repeated spatial patterns (RSP) characterized by similar underlying distributions that are spatially apart is a key challenge in modern spatial statistics. Existing constrained clustering methods enforce spatial contiguity but are limited in their ability to identify RSP. We propose a novel nonparametric framework that addresses this limitation by combining constrained clustering with a post-clustering reassignment step based on the maximum mean discrepancy (MMD) statistic. We employ a block permutation strategy within each cluster that preserves local attribute structure when approximating the null distribution of the MMD. We also show that the MMD$^2$ statistic is asymptotically consistent under second-order stationarity and spatial mixing conditions. This two-stage approach enables the detection of clusters that are both spatially distant and similar in distribution. Through simulation studies that vary spatial dependence, cluster sizes, shapes, and multivariate dimensionality, we demonstrate the robustness of our proposed framework in detecting RSP. We further illustrate its applicability through an analysis of spatial proteomics data from patients with triple-negative breast cancer. Overall, our framework presents a methodological advancement in spatial clustering, offering a flexible and robust solution for spatial datasets that exhibit repeated patterns.
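
The second stage can be sketched with a biased (V-statistic) estimate of $\mathrm{MMD}^2$ under a Gaussian kernel. The null distribution comes from a permutation test that shuffles contiguous blocks of observations rather than single points. How blocks are formed within clusters, the choice of kernel, and the bandwidth are all assumptions here. Observations are treated as attribute vectors listed in spatial order.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth):
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD with a Gaussian kernel."""
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in X for b in X) / len(X) ** 2
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in Y for b in Y) / len(Y) ** 2
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in X for b in Y) / (len(X) * len(Y))
    return k_xx + k_yy - 2.0 * k_xy


def block_permutation_pvalue(X, Y, block_size, n_perm=199, bandwidth=1.0, seed=0):
    """Permutation p-value for mmd2 that reshuffles contiguous blocks of
    observations (taken within each cluster) instead of single observations."""
    rng = random.Random(seed)
    observed = mmd2(X, Y, bandwidth)
    blocks = [X[i:i + block_size] for i in range(0, len(X), block_size)]
    blocks += [Y[i:i + block_size] for i in range(0, len(Y), block_size)]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(blocks)
        pooled = [obs for block in blocks for obs in block]
        if mmd2(pooled[:len(X)], pooled[len(X):], bandwidth) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy 1-d attributes in spatial order: two clusters with clearly different distributions.
X = [(0.1 * i,) for i in range(10)]
Y = [(5.0 + 0.1 * i,) for i in range(10)]
print(mmd2(X, Y), block_permutation_pvalue(X, Y, block_size=2))
```

With only a few blocks per cluster, the number of distinct block arrangements is small, which limits how small the p-value can become.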

2411.00063 2026-04-23 q-bio.QM

Logistic Regression Analysis on the Dietary Behavior and the Risk of Nutritional Deficiency Dermatosis: The Case of Bicol Region, Philippines

John Ben S Temones

Comments 11 pages

Abstract

This study explores the link between dietary behavior and the risk of nutritional deficiency dermatoses (NDD) in the Bicol region, where malnutrition remains a concern. Using logistic regression on FNRI data, it examines food purchase patterns, with a focus on riboflavin intake. The prevalence of NDD risk is 15.75%, with Masbate and Camarines Sur together accounting for over half of the cases. Although rice (1590.93 g/day) and plant-based foods (523.30 g/day) are not rich in riboflavin, each additional gram still reduces the odds of NDD by 0.3%. Riboflavin-rich foods such as meat, eggs, and dairy lower the odds of NDD by up to 3% per gram. The logistic regression model performed strongly (Nagelkerke R² = 0.765, accuracy = 94.1%, precision = 84.5%). These findings highlight the need for nutrition interventions, including enriched rice, better market access, and food-diversity education, to improve riboflavin intake and mitigate NDD risk.
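
Because logistic-regression effects act multiplicatively on the odds, per-gram percentages compound rather than add. The sketch below assumes the reported "0.3% per gram" is an odds ratio of 0.997 per gram (that is, a coefficient beta = ln 0.997) and also shows the standard Nagelkerke R² formula; the function names are hypothetical and the log-likelihood values in the checks are made up.

```python
import math


def percent_change_in_odds(beta: float, units: float = 1.0) -> float:
    """Percent change in odds when a predictor with log-odds coefficient beta rises by `units`."""
    return (math.exp(beta * units) - 1.0) * 100.0


def nagelkerke_r2(loglik_null: float, loglik_model: float, n: int) -> float:
    """Nagelkerke R^2: Cox-Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


# Assumption: "0.3% lower odds per gram" means an odds ratio of 0.997 per gram.
beta_rice = math.log(0.997)
per_gram = percent_change_in_odds(beta_rice)          # about -0.3 %
per_100g = percent_change_in_odds(beta_rice, 100.0)   # about -26 %, not -30 %
```

The compounding matters at realistic intakes: 100 extra grams multiply the odds by 0.997^100, roughly 0.74, rather than reducing them by a flat 30%.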

2404.06459 2026-04-23 q-bio.PE

A hybrid discrete-continuum modelling approach for the interactions of the immune system with oncolytic viral infections

David Morselli, Marcello E. Delitala, Adrianne L. Jenner, Federico Frascoli

Comments 32 pages, 12 figures. Supplementary material available at https://doi.org/10.5281/zenodo.18340945

Journal ref
J. Theor. Biol. (2026), 627, p. 112462
Abstract

Oncolytic virotherapy, which uses genetically modified viruses to combat cancer and trigger anti-cancer immune responses, has garnered significant attention in recent years. In our previous work arXiv:2305.12386, we developed a stochastic agent-based model elucidating the spatial dynamics of infected and uninfected cells within solid tumours. Building on this foundation, we present a novel stochastic agent-based model that describes the intricate interplay between the virus and the immune system; the agents' dynamics are coupled with a balance equation for the concentration of the chemoattractant that guides the movement of immune cells. We formally derive the continuum limit of the model and carry out a systematic quantitative comparison between this system of PDEs and the individual-based model in two spatial dimensions. Furthermore, we describe the travelling waves of the three populations, with the uninfected proliferative cells trying to escape from the infected cells while immune cells infiltrate the tumour. Simulations show good agreement between the agent-based approach and numerical solutions of the continuum model. Some parameter ranges give rise to oscillations in cell numbers in both models, in line with the behaviour of the corresponding nonspatial model, which exhibits Hopf bifurcations. Nevertheless, in some situations the behaviours of the two models may differ significantly, suggesting that stochasticity plays a key role in the dynamics. Our results indicate that an overly rapid immune response, occurring before the infection is well established, appears to decrease the efficacy of the therapy; thus, some care is needed when oncolytic virotherapy is combined with immunotherapy. This further suggests the clinical importance of modulating the immune response according to the tumour's characteristics and the immune capabilities of each patient.

2604.20824 2026-04-23 cs.LG q-bio.QM

Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples

Ana Sanchez-Fernandez, Thomas Pinetz, Werner Zellinger, Günter Klambauer

Abstract

The central problem in biomedical imaging is batch effects: systematic technical variations unrelated to the biological signal of interest. These batch effects critically undermine experimental reproducibility and are the primary cause of failure of deep learning systems on new experimental batches, preventing their practical use in the real world. Despite years of research, no method has succeeded in closing this performance gap for deep learning models. We propose Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that exploits negative control samples. These unperturbed reference images are present in every experimental batch by design and serve as stable context for adaptation. We validate our method on Mechanism-of-Action (MoA) classification, a crucial task for drug discovery, using the large-scale JUMP-CP dataset. The accuracy of standard ResNets drops from 0.939 $\pm$ 0.005 on the training domain to 0.862 $\pm$ 0.060 on data from new experimental batches. Foundation models, even after Typical Variation Normalization, fail to close this gap. We are the first to show that meta-learning approaches close the domain gap, achieving 0.935 $\pm$ 0.018. If the new experimental batches exhibit strong domain shifts, for example because they were generated in a different lab, meta-learning approaches can be stabilized with control samples, which are always available in biomedical experiments. Our work shows that batch effects in bioimaging data can be effectively neutralized through principled in-context adaptation, which in turn makes deep learning systems practically usable and efficient.
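
One idea behind using negative controls as adaptation context, namely estimating batch-normalization statistics from a new batch's controls, can be illustrated with a simplified sketch. This is not the authors' meta-learning procedure: it assumes per-feature statistics computed only from the control samples of one batch, a purely affine technical shift between batches, and hypothetical function names.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def control_statistics(controls: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-feature mean and (biased) variance computed only from negative-control samples."""
    n, dim = len(controls), len(controls[0])
    mean = [sum(c[d] for c in controls) / n for d in range(dim)]
    var = [sum((c[d] - mean[d]) ** 2 for c in controls) / n for d in range(dim)]
    return mean, var


def normalize_with_controls(samples: List[Vector], controls: List[Vector],
                            gamma: Optional[Vector] = None, beta: Optional[Vector] = None,
                            eps: float = 1e-5) -> List[Vector]:
    """Batch-normalize every sample in a batch using statistics from that batch's controls."""
    mean, var = control_statistics(controls)
    dim = len(mean)
    gamma = gamma if gamma is not None else [1.0] * dim
    beta = beta if beta is not None else [0.0] * dim
    return [
        [gamma[d] * (s[d] - mean[d]) / math.sqrt(var[d] + eps) + beta[d] for d in range(dim)]
        for s in samples
    ]


def technical_shift(v: Vector) -> Vector:
    """Hypothetical batch effect: every feature is scaled by 2 and offset by 10."""
    return [2.0 * x + 10.0 for x in v]


# Same biology in two batches; batch B differs only by the technical shift.
controls_a = [[0.0, 1.0], [2.0, 3.0]]
treated_a = [[4.0, 5.0]]
controls_b = [technical_shift(v) for v in controls_a]
treated_b = [technical_shift(v) for v in treated_a]

norm_a = normalize_with_controls(treated_a, controls_a)
norm_b = normalize_with_controls(treated_b, controls_b)
```

Because controls are unperturbed by design, their statistics reflect only the batch's technical state, so normalizing against them maps the same treated sample to nearly the same features in both batches.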

2604.20629 2026-04-23 math.PR q-bio.PE

Rates of forgetting for the sequentially Markov coalescent

Jonathan Terhorst

Abstract

The sequentially Markov coalescent (SMC) is a Markov jump process which models correlations in local genealogies across a chromosome. It has been used as a theoretical tool for studying linkage disequilibrium and identity-by-descent, and it also forms the basis of a class of statistical procedures for estimating population history and inferring ancestry. In this paper, we study the rate at which the SMC forgets its initial condition in the pairwise setting. For the embedded jump chain, we prove geometric ergodicity in total variation, with explicit constants. For the continuous process, by contrast, the total variation distance from stationarity decays as $\asymp 1/\ell$ in genetic distance $\ell$. We obtain analogous results for the closely related SMC' process using a novel time-change argument. One application of these results is to justify heuristic approximations used in the literature that treat distant loci as evolving independently.
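
Geometric ergodicity in total variation means the distance to stationarity shrinks by at least a constant factor per step. The toy two-state chain below illustrates that notion only; it is not the SMC jump chain, and the transition probabilities `a` and `b` are arbitrary assumptions. For this chain started in state 0, the distance after t steps has the closed form (a/(a+b))·|1−a−b|^t, which the code reproduces by direct iteration.

```python
from typing import List

Dist = List[float]
Matrix = List[List[float]]


def step(dist: Dist, p: Matrix) -> Dist:
    """One step of the chain: new_j = sum_i dist_i * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def tv_distance(mu: Dist, nu: Dist) -> float:
    """Total variation distance between two distributions on a finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def tv_trajectory(p: Matrix, start: Dist, pi: Dist, steps: int) -> List[float]:
    """TV distance to the stationary law pi after 1, 2, ..., steps transitions."""
    dist, out = list(start), []
    for _ in range(steps):
        dist = step(dist, p)
        out.append(tv_distance(dist, pi))
    return out


a, b = 0.3, 0.1                       # hypothetical switching probabilities
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]       # stationary distribution
traj = tv_trajectory(P, [1.0, 0.0], pi, 10)
# Closed form for this chain: (a / (a + b)) * |1 - a - b| ** t, i.e. 0.75 * 0.6 ** t
```

Consecutive distances differ by the constant factor 0.6, which is the geometric decay; the abstract's contrast is that the continuous-time SMC instead decays only like 1/ℓ.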

2604.20626 2026-04-23 q-bio.PE cs.AI

Centering Ecological Goals in Automated Identification of Individual Animals

Lukas Picek, Timm Haucke, Lukáš Adam, Ekaterina Nepovinnykh, Lasha Otarashvili, Kostas Papafitsoros, Tanya Berger-Wolf, Michael B. Brown, Tilo Burghardt, Vojtech Cermak, Daniela Hedwig, Justin Kitzes, Sam Lapp, Subhransu Maji, Daniel Rubenstein, Arjun Subramonian, Charles Stewart, Silvia Zuffi, Sara Beery

Abstract

Recognizing individual animals over time is central to many ecological and conservation questions, including estimating abundance, survival, movement, and social structure. Recent advances in automated identification from images and even acoustic data suggest that this process could be greatly accelerated, yet their promise has not translated well into ecological practice. We argue that the main barrier is not the performance of the automated methods themselves, but a mismatch between how those methods are typically developed and evaluated, and how ecological data is actually collected, processed, reviewed, and used. Future progress, therefore, will depend less on algorithmic gains alone than on recognizing that the usefulness of automated identification is grounded in ecological context: it depends on what question is being asked, what data are available, and what kinds of mistakes matter. Only by centering these questions can we move toward automated identification of individuals that is not only accurate but also ecologically useful, transparent, and trustworthy.

2604.20524 2026-04-23 q-bio.NC cond-mat.dis-nn cs.NE

Response time of lateral predictive coding and benefits of modular structures

Guanghui Cai, Zhen-Ye Huang, Weikang Wang, Hai-Jun Zhou

Comments 16 pages, under review in Physica A

Abstract

Lateral predictive coding (LPC) is a simple theoretical framework for understanding feature detection in biological neural circuits. Recent theoretical work [Huang et al., Phys. Rev. E 112, 034304 (2025)] constructed optimal LPC networks capable of extracting non-Gaussian hidden input features by imposing a tradeoff between energetic cost and information robustness, but the resulting dynamical systems of recurrent interactions can respond very slowly to external inputs. In the present paper, we investigate how to reduce this response time. We find that the characteristic response time of the LPC system can be brought close to its lower bound without compromising the mean predictive error (energetic cost) or the information robustness of signal transmission. We further demonstrate that optimal LPC networks with a modular structure, which have a greatly reduced number of lateral interactions, perform as well as fully connected (all-to-all) networks in terms of feature detection performance, response time, energetic cost, and information robustness.
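
To make "characteristic response time" concrete, consider a generic linear recurrent network τ dx/dt = −x − Wx + s with symmetric lateral weights W. This is an illustrative assumption, not necessarily the LPC dynamics used in the paper. In such a network each eigenmode of I + W relaxes at rate λ/τ, so the slowest mode sets the response time τ/λ_min(I + W). The sketch computes this for a 2 × 2 example and checks by forward-Euler integration that the network settles at (I + W)⁻¹ s.

```python
import math
from typing import List

Matrix = List[List[float]]


def response_time_2x2(w: Matrix, tau: float = 1.0) -> float:
    """Slowest relaxation time tau / lambda_min(I + W) for a symmetric 2 x 2 lateral matrix W."""
    a, b, d = 1.0 + w[0][0], w[0][1], 1.0 + w[1][1]
    half_trace = (a + d) / 2.0
    lam_min = half_trace - math.sqrt(half_trace ** 2 - (a * d - b * b))
    if lam_min <= 0.0:
        raise ValueError("I + W must be positive definite for a stable response")
    return tau / lam_min


def simulate(w: Matrix, s: List[float], tau: float = 1.0,
             dt: float = 1e-3, t_end: float = 40.0) -> List[float]:
    """Forward-Euler integration of tau dx/dt = -x - W x + s, starting from x = 0."""
    n = len(s)
    x = [0.0] * n
    for _ in range(int(t_end / dt)):
        dx = [(-x[i] - sum(w[i][j] * x[j] for j in range(n)) + s[i]) / tau for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x


w = [[0.0, 0.5], [0.5, 0.0]]       # eigenvalues of I + W are 1.5 and 0.5
t_resp = response_time_2x2(w)      # tau / 0.5 = 2.0
x_final = simulate(w, [1.0, 0.0])  # input excites the slow mode; steady state [4/3, -2/3]
```

Under this toy model, shortening the response time amounts to raising the smallest eigenvalue of I + W, which is one way to read the trade-off the abstract describes between speed and the other network objectives.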

2407.01621 2026-04-23 cs.LG q-bio.QM stat.ME stat.ML

Deciphering interventional dynamical causality from non-intervention complex systems

Jifan Shi, Yang Li, Juan Zhao, Siyang Leng, Rui Bao, Kazuyuki Aihara, Luonan Chen, Wei Lin

Abstract

Detecting and quantifying causality is a focal topic in science, engineering, and interdisciplinary studies. Causal studies of non-intervention systems attract much attention but remain extremely challenging. The delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC), in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, which includes Granger causality, transfer entropy, and convergent cross mapping, measures causality by constructing a dynamical model without considering interventions. We propose a computational criterion, Interventional Embedding Entropy (IEE), to measure causal strength in an interventional manner. IEE is an interventional causal information flow defined in the delay-embedding space. Furthermore, IEE enables, both theoretically and numerically, the deciphering of IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of the dynamical models or real interventions in the system under consideration. In particular, IEE can be applied to rank causal effects by importance and to construct causal networks from data. Our numerical experiments demonstrate that IEE finds causal edges accurately, eliminates the effects of confounding, and quantifies causal strength more robustly than traditional indices. We also applied IEE to real-world tasks, where it proved to be an accurate and robust tool for causal analysis based solely on observational data. The IntDC framework and IEE algorithm provide an efficient approach to studying causality from time series in diverse non-intervention complex systems.
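
IEE is defined in a delay-embedding space. The standard delay-coordinate construction that such methods build on maps a scalar series to vectors (x_t, x_{t−τ}, …, x_{t−(E−1)τ}). The sketch shows only that embedding step plus a nearest-neighbour lookup of the kind cross-mapping methods use; it does not implement IEE itself, and the embedding dimension `dim`, lag `lag`, and function names are user-chosen assumptions.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, lag: int) -> List[Tuple[float, ...]]:
    """Return delay vectors (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag}) for every valid t."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive")
    start = (dim - 1) * lag
    if start >= len(series):
        raise ValueError("series too short for this dim and lag")
    return [tuple(series[t - k * lag] for k in range(dim)) for t in range(start, len(series))]


def nearest_neighbours(points: List[Tuple[float, ...]], query_index: int, k: int) -> List[int]:
    """Indices of the k nearest delay vectors to points[query_index] (Euclidean, excluding itself)."""
    q = points[query_index]
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, q)), i)
        for i, p in enumerate(points)
        if i != query_index
    )
    return [i for _, i in ranked[:k]]


embedded = delay_embed([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], dim=3, lag=2)
# Each vector stacks the current value with lagged copies of the same series.
```

Neighbourhoods in this reconstructed space stand in for neighbourhoods of the unobserved system state, which is what allows causal quantities to be estimated from observational series alone.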