arXivDaily: arXiv Daily Academic Digest (updated Monday to Friday)
2604.12884 2026-04-15 cs.NE q-bio.PE

An abstract model of nonrandom, non-Lamarckian mutation in evolution using a multivariate estimation-of-distribution algorithm

Liudmyla Vasylenko, Adi Livnat

Comments 62 pages, 8 figures

Abstract

At the fundamental conceptual level, two alternatives have traditionally been considered for how mutations arise and how evolution happens: 1) random mutation and natural selection, and 2) Lamarckism. Recently, the theory of Interaction-based Evolution (IBE) has been proposed, according to which mutations are neither random nor Lamarckian, but are influenced by information accumulating internally in the genome over generations. Based on the estimation-of-distribution algorithms framework, we present a simulation model that demonstrates nonrandom, non-Lamarckian mutation concretely while capturing indirectly several aspects of IBE: selection, recombination, and nonrandom, non-Lamarckian mutation interact in a complementary fashion; evolution is driven by the interaction of parsimony and fit; and random bits do not directly encode improvement but enable generalization by the manner in which they connect with the rest of the evolutionary process. Connections are drawn to Darwin's observations that changed conditions increase the rate of production of heritable variation; to the causes of bell-shaped distributions of traits and how these distributions respond to selection; and to computational learning theory, where analogizing evolution to learning in accord with IBE casts individuals as examples and places the learned hypothesis at the population level. The model highlights the importance of incorporating internal integration of information through heritable change in both evolutionary theory and evolutionary computation.
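To make the estimation-of-distribution idea concrete, here is a minimal sketch of the simplest member of that family: a univariate marginal distribution algorithm (UMDA) on the toy OneMax problem. It illustrates only the generic sample, select, and re-estimate loop. It is not the authors' model, which is multivariate and built around IBE, whereas this sketch treats every bit independently. The function name, the OneMax fitness, and all parameter values are hypothetical.

```python
import random


def umda_onemax(n_bits=20, pop_size=60, n_select=30, generations=60, seed=0):
    """Hypothetical UMDA sketch: sample, select, re-estimate a per-bit model."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # marginal P(bit i == 1); starts uninformative
    lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits  # clamp so no bit value is lost for good
    best = None
    for _ in range(generations):
        # Sample a population from the product-of-Bernoullis model.
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        # Truncation selection on OneMax fitness (number of ones).
        pop.sort(key=sum, reverse=True)
        selected = pop[:n_select]
        if best is None or sum(selected[0]) > sum(best):
            best = selected[0]
        # Re-estimate each marginal from the selected individuals only.
        probs = [min(hi, max(lo, sum(ind[i] for ind in selected) / n_select))
                 for i in range(n_bits)]
    return best, probs


best, probs = umda_onemax()
print(sum(best), [round(p, 2) for p in probs])
```

The printed fitness and final marginals depend on the seed, so no output is shown.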

2604.12825 2026-04-15 q-bio.NC

The illusory simplicity of the feedforward pass: evidence for the dynamical nature of stimulus encoding along the primate ventral stream

Daniel Anthes, Sushrut Thorat, Anna Mitola, Paolo Papale, Peter König, Tim C Kietzmann

Abstract

In studying primate vision, a large body of work focuses on the first feedforward sweep. During this initial time window, information is thought to pass through ventral stream regions in a stage-like fashion in an effort to extract high-level information from the retinal input. Consequently, electrophysiological analyses commonly focus on spatial response patterns, either by averaging data in time, or by applying decoders in a temporally local fashion. By analysing data recorded simultaneously across multiple arrays placed along the macaque ventral stream, we here show that this prior approach may be missing key aspects of information encoding. First, time-resolved, multivariate analyses of information transfer between V4 and IT reveal temporally and semantically varied information content as being exchanged within the first 100ms of processing. Second, by employing recurrent neural network (RNN) decoding techniques that extend across the temporal domain, we demonstrate that the neural pattern dynamics themselves carry categorical information far beyond the spatially encoded information available at any given time point. These findings challenge the prevailing view of a single, stage-like feedforward process and suggest that even the earliest parts of visual processing are better characterised as a spatiotemporally evolving process that encodes information in its dynamics rather than purely spatial response patterns.

2604.12683 2026-04-15 cs.CV q-bio.NC

Brain-DiT: A Universal Multi-state fMRI Foundation Model with Metadata-Conditioned Pretraining

Junfeng Xia, Wenhao Ye, Xuanye Pan, Xinke Shen, Mo Wang, Quanying Liu

Abstract

Current fMRI foundation models primarily rely on a limited range of brain states and mismatched pretraining tasks, restricting their ability to learn generalized representations across diverse brain states. We present Brain-DiT, a universal multi-state fMRI foundation model pretrained on 349,898 sessions from 24 datasets spanning resting, task, naturalistic, disease, and sleep states. Unlike prior fMRI foundation models that rely on masked reconstruction in the raw-signal space or a latent space, Brain-DiT adopts metadata-conditioned diffusion pretraining with a Diffusion Transformer (DiT), enabling the model to learn multi-scale representations that capture both fine-grained functional structure and global semantics. Across extensive evaluations and ablations on 7 downstream tasks, we find consistent evidence that diffusion-based generative pretraining is a stronger proxy than reconstruction or alignment, with metadata-conditioned pretraining further improving downstream performance by disentangling intrinsic neural dynamics from population-level variability. We also observe that downstream tasks exhibit distinct preferences for representational scale: ADNI classification benefits more from global semantic representations, whereas age/sex prediction comparatively relies more on fine-grained local structure. Code and parameters of Brain-DiT are available at https://github.com/REDMAO4869/Brain-DiT.

2604.12671 2026-04-15 q-bio.QM eess.SP

Differentiating Physical and Psychological Stress Using Wearable Physiological Signals and Salivary Cortisol

Ozan Kaya, Nikoletta Athanassopoulou, George G. Malliaras, Marco Vinicio Alban-Paccha

Comments 8 pages, 4 figures, 3 tables

Abstract

Objective: This study aimed to assess how wearable physiological signals, alone and combined with salivary cortisol, distinguish physical and psychological stress and their recovery states. Methods: Six healthy adults completed three laboratory sessions on separate days: rest, physical stress (high-intensity cycling), or psychological stress (modified Trier Social Stress Test). Heart rate, heart rate variability, electrodermal activity, and wrist accelerometry were recorded continuously, and salivary cortisol was sampled at five time points. Features were extracted in non-overlapping 10-minute windows and labelled as rest, physical stress, physical recovery, psychological stress, or psychological recovery. A gradient boosting classifier was trained using wearable features alone and with five additional cortisol features per window. Performance was evaluated using leave-one-participant-out cross-validation. Results: Wearable-only classification achieved 77.8% overall accuracy, with high accuracy for physical stress and recovery but frequent misclassification of psychological stress and recovery (recall 50.0% and 54.2%). Including cortisol improved overall accuracy (94.4%), particularly for psychological states, increasing recall to 83.3% and 87.5%. Cortisol also reduced misclassification between psychological stress and rest. Conclusion: Wearable signals alone were insufficient to reliably distinguish psychological stress from rest and recovery. Integrating salivary cortisol improved classification of psychological stress and recovery and reduced confusion with rest, highlighting the value of endocrine context alongside wearable physiology. Significance: These findings support multimodal stress monitoring and motivate larger, ecologically valid studies and scalable alternatives to repeated cortisol sampling.
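Leave-one-participant-out cross-validation holds out all windows from one person in each fold, so the classifier is always scored on a participant it never saw during training. Below is a minimal sketch of the split logic, assuming that each 10-minute window carries a participant ID. The function name and labels are hypothetical. scikit-learn's `LeaveOneGroupOut` provides the same splitting behavior.

```python
from collections import defaultdict


def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices), one fold per participant."""
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)
    for held_out in sorted(by_participant):
        test_idx = by_participant[held_out]
        # Every window from the other participants is available for training.
        train_idx = [i for i, pid in enumerate(participant_ids) if pid != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical window-level participant labels.
windows = ["P1", "P1", "P2", "P3", "P3", "P3"]
for held_out, train_idx, test_idx in leave_one_participant_out(windows):
    print(held_out, train_idx, test_idx)
```

With six participants, as in the study, this procedure yields six folds.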

2511.13790 2026-04-15 q-bio.QM cs.AI

GeoPl@ntNet: A Platform for Exploring Essential Biodiversity Variables

Lukas Picek, César Leblanc, Alexis Joly, Pierre Bonnet, Rémi Palard, Maximilien Servajean

Comments 4 pages, 5 figures, and 2 tables

Abstract

This paper describes GeoPl@ntNet, an interactive web application designed to make Essential Biodiversity Variables accessible and understandable to everyone through dynamic maps and fact sheets. Its core purpose is to allow users to explore high-resolution AI-generated maps of species distributions, habitat types, and biodiversity indicators across Europe. These maps, developed through a cascading pipeline involving convolutional neural networks and large language models, provide an intuitive yet information-rich interface to better understand biodiversity, with resolutions as precise as 50x50 meters. The website also enables exploration of specific regions, allowing users to select areas of interest on the map (e.g., urban green spaces, protected areas, or riverbanks) to view local species and their coverage. Additionally, GeoPl@ntNet generates comprehensive reports for selected regions, including insights into the number of protected species, invasive species, and endemic species.

2508.19420 2026-04-15 q-bio.QM

Using PyBioNetFit to Leverage Qualitative and Quantitative Data in Biological Model Parameterization and Uncertainty Quantification

Ely F. Miller, Abhishek Mallela, Jacob Neumann, Yen Ting Lin, William S. Hlavacek, Richard G. Posner

Comments 45 pages, 7 main figures, 4 supplemental figures. Main text, figures, tables, all captions, and supplemental material included

Abstract

Data generated in studies of cellular regulatory systems are often qualitative. For example, measurements of signaling readouts in the presence and absence of mutations may reveal a rank ordering of responses across conditions but not the precise extents of mutation-induced differences. Qualitative data are often ignored by mathematical modelers or are considered in an ad hoc manner, as in the study of Kocieniewski and Lipniacki (2013) [Phys Biol 10: 035006], which was focused on the roles of MEK isoforms in ERK activation. In this earlier study, model parameter values were tuned manually to obtain consistency with a combination of qualitative and quantitative data. This approach is not reproducible, nor does it provide insights into parametric or prediction uncertainties. Here, starting from the same data and the same ordinary differential equation (ODE) model structure, we generate formalized statements of qualitative observations, making these observations more reusable, and we improve the model parameterization procedure by applying a systematic and automated approach enabled by the software package PyBioNetFit. We also demonstrate uncertainty quantification (UQ), which was absent in the original study. Our results show that PyBioNetFit enables qualitative data to be leveraged, together with quantitative data, in parameterization of systems biology models and facilitates UQ. These capabilities are important for reliable estimation of model parameters, for model analyses in studies of cellular regulatory systems, and for reproducibility.
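One common way to combine the two data types, in the spirit of the approach described above, is to add a penalty for each violated qualitative inequality to the usual sum of squared residuals, so that one optimizer can use both kinds of data. The sketch below assumes that form. The function, the `(lhs, rhs)` encoding of an inequality `lhs < rhs`, and the weight are hypothetical; they do not reproduce PyBioNetFit's API or its exact objective.

```python
def combined_objective(sim_quant, data_quant, inequalities, penalty_weight=1.0):
    """Squared-error fit to quantitative data plus a static inequality penalty.

    inequalities: (lhs, rhs) pairs of simulated readouts that should obey lhs < rhs,
    e.g. a mutant's signaling response expected to be lower than wild type.
    """
    sse = sum((s - d) ** 2 for s, d in zip(sim_quant, data_quant))
    # A satisfied inequality contributes 0; a violated one costs its violation size.
    penalty = sum(penalty_weight * max(0.0, lhs - rhs) for lhs, rhs in inequalities)
    return sse + penalty


# One satisfied and one violated inequality: 0.25 (fit) + 2.0 * 1.0 (penalty).
print(combined_objective([1.0, 2.5], [1.0, 2.0], [(0.5, 1.0), (1.5, 0.5)], 2.0))  # 2.25
```

This simple form does not distinguish strict from non-strict inequalities. A practical setup would also need to choose the penalty weight relative to the scale of the quantitative residuals.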

2412.07238 2026-04-15 cs.CL q-bio.NC

Speaker effects in language comprehension: An integrative model of language and speaker processing

Hanlin Wu, Zhenguang G. Cai

Journal ref
Psychon Bull Rev 33, 138 (2026)
Abstract

The identity of a speaker influences language comprehension through modulating perception and expectation. This review explores speaker effects and proposes an integrative model of language and speaker processing that integrates distinct mechanistic perspectives. We argue that speaker effects arise from the interplay between bottom-up perception-based processes, driven by acoustic-episodic memory, and top-down expectation-based processes, driven by a speaker model. We show that language and speaker processing are functionally integrated through multi-level probabilistic processing: prior beliefs about a speaker modulate language processing at the phonetic, lexical, and semantic levels, while the unfolding speech and message continuously update the speaker model, refining broad demographic priors into precise individualized representations. Within this framework, we distinguish between speaker-idiosyncrasy effects arising from familiarity with an individual and speaker-demographics effects arising from social group expectations. We discuss how speaker effects serve as indices for assessing language development and social cognition, and we encourage future research to extend these findings to the emerging domain of artificial intelligence (AI) speakers, as AI agents represent a new class of social interlocutors that are transforming the way we engage in communication.

2604.12546 2026-04-15 q-bio.PE cond-mat.stat-mech physics.soc-ph

Predicting success of cooperators across arbitrary heterogeneous environmental landscapes

Amir Kargaran, Kamran Kaveh, Krishnendu Chatterjee

Comments 34 pages, 6 figures in main text, 10 figures in supplementary material

Abstract

Cooperation is central to the organization of complex biological and social systems. Most theoretical models assume homogeneous environments; in reality, populations inhabit spatially varying landscapes in which the payoffs of cooperation differ across space. Here, we introduce a general framework for the evolution of cooperation in complex, heterogeneous environments where the benefit of cooperation depends on local environmental quality. Cooperators in environmentally rich sites confer greater benefits than those on poor sites. We show that whether heterogeneity promotes or suppresses cooperation is determined primarily by the spatial organization of environmental states. Across arbitrary environmental landscapes, a single quantity, the spatial correlation index (SCI), predicts the fixation probability of cooperators. Under weak selection, segregated environments enhance cooperation, whereas highly intermixed, checkerboard-like landscapes suppress it. Beyond fixation probabilities, environmental organization also controls evolutionary timescales: segregated landscapes generate long-lived metastable coexistence, whereas intermixed landscapes lead to faster but less successful fixation of cooperators. Together, these results provide a unifying description of how spatial environmental heterogeneity shapes the evolution of cooperation and suggest measurable predictors of cooperative success in biological and social settings.

2604.12387 2026-04-15 q-bio.GN

oxo-call: Documentation-grounded Skill Augmentation for Accurate Bioinformatics Command-line Generation with Large Language Models

Yun Peng, Yujun Sun, Jia Ding, Bin Yan, Zhangyu Wang, Chunyang Wang, Chenyang Shu, Jian-Guo Zhou, Shixiang Wang

Comments 19 pages, 4 figures

Abstract

Command-line bioinformatics tools remain essential for genomic analysis, yet their diversity in syntax and parameterization presents a persistent barrier to productive research. We present oxo-call, a Rust-based command-line assistant that translates natural-language task descriptions into accurate tool invocations through two complementary strategies: documentation-first grounding, which provides the large language model (LLM) with the complete, version-specific help text of each target tool, and curated skill augmentation, which primes the model with domain-expert concepts, common pitfalls, and worked examples. oxo-call (v0.10) ships >150 built-in skills covering 44 analytical categories, from variant calling and genome assembly to single-cell transcriptomics, compiled into a single, statically linked binary. Every generated command is logged with provenance metadata to support reproducible research. oxo-call also provides a DAG-based workflow engine, extensibility through user-defined and community skills via the Model Context Protocol, and support for local LLM inference to address data-privacy requirements. oxo-call is freely available for academic use at https://traitome.github.io/oxo-call/.

2604.12294 2026-04-15 q-bio.QM stat.AP

The IQ-Motion Confound in Multi-Site Autism fMRI May Be Inflated by Site-Correlated Measurement Uncertainty

Kareem Soliman

Comments 14 pages, 4 figures, 2 tables

Abstract

Multi-site autism neuroimaging studies routinely control for the confound between full-scale IQ and head motion by regressing framewise displacement against IQ scores and removing shared variance. This procedure assumes that ordinary least squares (OLS) provides an unbiased estimate of the confound magnitude. We tested this assumption on the ABIDE-I phenotypic dataset (n=935 subjects across 19 international scanning sites) using Probability Cloud Regression, an errors-in-variables (EIV) estimator that models per-observation measurement uncertainty in both variables. IQ measurement error was derived from published Wechsler test-retest reliability coefficients; response-side uncertainty was represented by a site-level proxy equal to the within-site standard deviation of mean framewise displacement. Three findings emerged. First, OLS overestimates the IQ-motion slope by a factor of 4.67 relative to the EIV-corrected estimate when the bias factor is computed from the full-precision fitted coefficients (OLS -0.00125, EIV -0.00027 mm per IQ point after rounding for display). Second, under leave-site-out cross-validation a single pooled predictor of raw FD produces negative out-of-sample R^2 at all 19 sites (overall R^2 = -0.074), indicating that the pooled predictor does not transport cleanly across sites once site information is removed. Third, the direction of the EIV-corrected slope is robust across all 64 configurations of an 8x8 sensitivity grid spanning 12-fold ranges of each noise parameter. These results suggest that pooled OLS may overstate the IQ-motion association in ABIDE-I, but direct downstream consequences for motion-correction pipelines remain to be quantified using raw motion traces and connectivity-level re-analysis. Formal EIV methods appear to remain uncommon in multi-site neuroimaging confound estimation.
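A negative out-of-sample R^2 means that, on a held-out site, the pooled predictor's squared error exceeds the squared error of simply predicting that site's own mean. Below is a minimal sketch of the computation, assuming R^2 = 1 - SSE/SST with SST taken around the held-out site's mean. Conventions vary; some authors center SST on the training mean instead. The function name and numbers are illustrative.

```python
def out_of_sample_r2(y_true, y_pred):
    """R^2 = 1 - SSE/SST, with SST measured around the held-out data's own mean."""
    mean = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - sse / sst


# A predictor that is systematically offset for this site scores below zero.
print(out_of_sample_r2([1.0, 2.0, 3.0], [3.0, 3.0, 3.0]))  # -1.5
```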

2604.12164 2026-04-15 q-bio.PE cs.DS math.OC

Phylogenetic Inference under the Balanced Minimum Evolution Criterion via Semidefinite Programming

P. Skums

Abstract

In this study, we investigate the application of Semidefinite Programming (SDP) to phylogenetics. SDP is a powerful optimization framework that seeks to optimize a linear objective function over the cone of positive semidefinite matrices. As a convex optimization problem, SDP generalizes linear programming and provides tight relaxations for many combinatorial optimization problems. However, despite its many applications, SDP remains largely unused in computational biology. We argue that SDP relaxations are particularly well suited for phylogenetic inference. As a proof of concept, we focus on the Balanced Minimum Evolution (BME) problem, a widely used model in distance-based phylogenetics. We propose an algorithm combining an SDP relaxation with a rounding scheme that iteratively converts relaxed solutions into valid tree topologies. Experiments on simulated and empirical datasets show that the method enables accurate phylogenetic reconstruction. The approach is sufficiently general to be extendable to other phylogenetic problems.
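The BME criterion scores a tree topology with Pauplin's formula: the sum over taxon pairs of d_ij * 2^(1 - tau_ij), where tau_ij is the number of edges on the path between taxa i and j. The topology with the smallest score is preferred. For four taxa, the three unrooted topologies can simply be enumerated, as in the sketch below. This only illustrates the objective. The paper's SDP relaxation and rounding are what target larger instances, where exhaustive search fails because the number of unrooted binary topologies grows as (2n - 5)!!. The names and the distance matrix are hypothetical.

```python
from itertools import combinations

TAXA = ("a", "b", "c", "d")
# The three unrooted quartet topologies, written as splits xy|zw.
SPLITS = ((("a", "b"), ("c", "d")), (("a", "c"), ("b", "d")), (("a", "d"), ("b", "c")))


def quartet_bme_lengths(dist):
    """BME (Pauplin) length of each quartet topology.

    dist maps frozenset({i, j}) -> distance between taxa i and j.
    """
    lengths = {}
    for left, right in SPLITS:
        total = 0.0
        for i, j in combinations(TAXA, 2):
            # Cherry pairs are 2 edges apart; pairs across the split are 3 apart.
            tau = 2 if {i, j} in (set(left), set(right)) else 3
            total += dist[frozenset((i, j))] * 2.0 ** (1 - tau)
        lengths["".join(left) + "|" + "".join(right)] = total
    return lengths


# Distances from a tree ab|cd with every edge of length 1.
d = {frozenset(p): 3.0 for p in combinations(TAXA, 2)}
d[frozenset(("a", "b"))] = d[frozenset(("c", "d"))] = 2.0
scores = quartet_bme_lengths(d)
print(scores, min(scores, key=scores.get))  # ab|cd scores 5.0, the others 5.5
```

For an additive distance matrix, the true topology's BME score equals its total edge length (here, five unit edges).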

2604.12075 2026-04-15 cs.CV cs.AI cs.LG q-bio.QM

OpenTME: An Open Dataset of AI-powered H&E Tumor Microenvironment Profiles from TCGA

Maaike Galama, Nina Kozar-Gillan, Christina Embacher, Todd Dembo, Cornelius Böhm, Evelyn Ramberger, Julika Ribbat-Idel, Rosemarie Krupar, Verena Aumiller, Miriam Hägele, Kai Standvoss, Gerrit Erdmann, Blanca Pablos, Ari Angelo, Simon Schallenberg, Andrew Norgan, Viktor Matyas, Klaus-Robert Müller, Maximilian Alber, Lukas Ruff, Frederick Klauschen

Abstract

The tumor microenvironment (TME) plays a central role in cancer progression, treatment response, and patient outcomes, yet large-scale, consistent, and quantitative TME characterization from routine hematoxylin and eosin (H&E)-stained histopathology remains scarce. We introduce OpenTME, an open-access dataset of pre-computed TME profiles derived from 3,634 H&E-stained whole-slide images across five cancer types (bladder, breast, colorectal, liver, and lung cancer) from The Cancer Genome Atlas (TCGA). All outputs were generated using Atlas H&E-TME, an AI-powered application built on the Atlas family of pathology foundation models, which performs tissue quality control, tissue segmentation, cell detection and classification, and spatial neighborhood analysis, yielding over 4,500 quantitative readouts per slide at cell-level resolution. OpenTME is available for non-commercial academic research on Hugging Face. We will continue to expand OpenTME over time and anticipate it will serve as a resource for biomarker discovery, spatial biology research, and the development of computational methods for TME analysis.

2604.12060 2026-04-15 cs.LG cs.AI q-bio.GN

Interpretable DNA Sequence Classification via Dynamic Feature Generation in Decision Trees

Nicolas Huynh, Krzysztof Kacprzyk, Ryan Sheridan, David Bentley, Mihaela van der Schaar

Comments AISTATS 2026

Abstract

The analysis of DNA sequences has become critical in numerous fields, from evolutionary biology to understanding gene regulation and disease mechanisms. While deep neural networks can achieve remarkable predictive performance, they typically operate as black boxes. In contrast to these black boxes, axis-aligned decision trees offer a promising direction for interpretable DNA sequence analysis, yet they suffer from a fundamental limitation: considering individual raw features in isolation at each split limits their expressivity, which results in prohibitive tree depths that hinder both interpretability and generalization performance. We address this challenge by introducing DEFT, a novel framework that adaptively generates high-level sequence features during tree construction. DEFT leverages large language models to propose biologically informed features tailored to the local sequence distributions at each node and to iteratively refine them with a reflection mechanism. Empirically, we demonstrate that DEFT discovers human-interpretable and highly predictive sequence features across a diverse range of genomic tasks.

2604.12026 2026-04-15 cs.LG q-bio.BM q-bio.QM

TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction

Seungik Cho

Abstract

Predicting the functional impact of single amino acid substitutions (SAVs) is central to understanding genetic disease and engineering therapeutic proteins. While protein language models and structure-based methods have achieved strong performance on this task, they systematically neglect protein dynamics; residue flexibility, correlated motions, and allosteric coupling are well-established determinants of mutational tolerance in structural biology, yet have not been incorporated into supervised variant effect predictors. We present TriFit, a multimodal framework that integrates sequence, structure, and protein dynamics through a four-expert Mixture-of-Experts (MoE) fusion module with trimodal cross-modal contrastive learning. Sequence embeddings are extracted via masked marginal scoring with ESM-2 (650M); structural embeddings from AlphaFold2-predicted C-alpha geometries; and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and residue-residue cross-correlations. The MoE router adaptively weights modality combinations conditioned on the input, enabling protein-specific fusion without fixed modality assumptions. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit achieves AUROC 0.897 +/- 0.0002, outperforming all supervised baselines including Kermut (0.864) and ProteinNPT (0.844), and the best zero-shot model ESM3 (0.769). Ablation studies confirm that dynamics provides the largest marginal contribution over pairwise modality combinations, and TriFit achieves well-calibrated probabilistic outputs (ECE = 0.044) without post-hoc correction.
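Expected calibration error (ECE) bins predictions by confidence and averages the gap between predicted probability and observed frequency, weighting each bin by its size. Below is a minimal sketch for binary outputs, assuming 10 equal-width bins on the predicted positive-class probability. The abstract does not state TriFit's binning scheme or ECE variant, so the function and its defaults are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary labels (0/1) and predicted P(label == 1)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    n = len(probs)
    ece = 0.0
    for contents in bins:
        if not contents:
            continue
        mean_conf = sum(p for p, _ in contents) / len(contents)
        observed = sum(y for _, y in contents) / len(contents)
        ece += len(contents) / n * abs(observed - mean_conf)
    return ece


print(expected_calibration_error([0.25, 0.25, 0.75, 0.75], [0, 1, 1, 0]))  # 0.25
```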

2604.12004 2026-04-15 q-bio.PE cond-mat.stat-mech

Fixation probabilities for multi-allele Moran dynamics with weak selection

Ian Braga, Lucas Wardil, Ricardo Martinez-Garcia

Comments 11 pages, 3 figures

Abstract

Fixation probabilities are essential for characterizing stochastic evolutionary dynamics, but analytical results remain limited mainly to systems with two competing types. We develop a perturbative framework to compute fixation probabilities in multi-allele Moran processes under weak selection. Exploiting the general structure of the backward Fokker-Planck operator in this regime, we show that fixation probabilities admit a systematic expansion around their neutral solution. We first introduce the framework in a general case with $M$ competing alleles and arbitrary fitness functions, and then apply it to three biologically motivated examples: a simple model of three competing alleles with a constant fitness function, a coordination game in which allele fitness increases with its frequency in the population, and a model of clonal interference between mutualistic alleles. These results extend the analytical understanding of fixation probabilities beyond pairwise interactions, establishing a framework for investigating multi-strategy stochastic evolutionary dynamics.
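For the two-type case that this framework generalizes, the Moran process has a closed form. A single mutant with constant relative fitness r in a population of size N fixes with probability (1 - 1/r)/(1 - 1/r^N), which reduces to the neutral value 1/N at r = 1. Writing r = 1 + s and expanding gives 1/N + (N - 1)s/(2N) + O(s^2), a first-order weak-selection correction around the neutral solution. The sketch below checks that expansion numerically. It covers only two alleles with constant fitness, not the authors' M-allele method, and the function names are hypothetical.

```python
def moran_fixation(r, N):
    """Fixation probability of one mutant with relative fitness r among N individuals."""
    if r == 1.0:
        return 1.0 / N  # neutral case
    return (1.0 - 1.0 / r) / (1.0 - r ** (-N))


def weak_selection_approx(s, N):
    """First-order expansion in s around the neutral solution 1/N (with r = 1 + s)."""
    return 1.0 / N + (N - 1) * s / (2.0 * N)


N, s = 100, 1e-4
print(moran_fixation(1.0 + s, N), weak_selection_approx(s, N))
```

For N = 100 and s = 1e-4, the two values agree to within about 1e-7, consistent with the neglected O(s^2) term.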

2604.11944 2026-04-15 cs.LG q-bio.QM

A unified data format for managing diabetes time-series data: DIAbetes eXchange (DIAX)

Elliott C. Pryor, Marc D. Breton, Anas El Fathi

Comments 7 pages, 2 figures

Abstract

Diabetes devices, including Continuous Glucose Monitoring (CGM), Smart Insulin Pens, and Automated Insulin Delivery systems, generate rich time-series data widely used in research and machine learning. However, inconsistent data formats across sources hinder sharing, integration, and analysis. We present DIAX (DIAbetes eXchange), a standardized JSON-based format for unifying diabetes time-series data, including CGM, insulin, and meal signals. DIAX promotes interoperability, reproducibility, and extensibility, particularly for machine learning applications. An open-source repository provides tools for dataset conversion, cross-format compatibility, visualization, and community contributions. DIAX is a translational resource, not a data host, ensuring flexibility without imposing data-sharing constraints. Currently, DIAX is compatible with other standardization efforts and supports major datasets (DCLP3, DCLP5, IOBP2, PEDAP, T1Dexi, Loop), totaling over 10 million patient-hours of data. https://github.com/Center-for-Diabetes-Technology/DIAX

2604.11915 2026-04-15 cs.LG cs.AI cs.NE q-bio.PE

Can AI Detect Life? Lessons from Artificial Life

Ankit Gupta, Christoph Adami

Comments 6 pages, 7 figures. Alife 2026

Abstract

Modern machine learning methods have been proposed to detect life in extraterrestrial samples, drawing on their ability to distinguish biotic from abiotic samples based on training models using natural and synthetic organic molecular mixtures. Here we show using Artificial Life that such methods are easily fooled into detecting life with near 100% confidence even if the analyzed sample is not capable of life. This is due to modern machine learning methods' propensity to be easily fooled by out-of-distribution samples. Because extra-terrestrial samples are very likely out of the distribution provided by terrestrial biotic and abiotic samples, using AI methods for life detection is bound to yield significant false positives.
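The underlying failure mode shows up even in a linear classifier. Its logit grows without bound as an input moves away from the training data, so its predicted confidence saturates toward 1 far outside anything it was trained on. Below is a minimal sketch with hypothetical weights and a hypothetical two-feature "biotic vs abiotic" model. It is not a model from the paper and does not reproduce its Artificial Life experiments.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical "biotic vs abiotic" classifier with two features.
WEIGHTS, BIAS = (1.5, -0.8), 0.2


def p_biotic(x):
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)


print(p_biotic((0.5, 0.5)))     # near the (assumed) training range: moderate confidence
print(p_biotic((50.0, -20.0)))  # far out of distribution: confidence close to 1
```

High confidence here says nothing about whether the input resembles the training data. A far-away input is simply pushed deep into one side of the decision boundary.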

2604.11818 2026-04-15 q-bio.PE math.PR

Scale-dependent Temporal Signatures of Arboviral Transmission in Urban Environments

Marcílio Ferreira dos Santos, Cleiton de Lima Ricardo

Comments 5 figures, 13 pages

Abstract

Understanding epidemic dynamics in urban environments requires models that capture interactions across space and time while incorporating biological constraints. In this work, we propose a probabilistic spatiotemporal framework based on pairwise interaction kernels to analyze arboviral transmission using large-scale georeferenced data from Recife, Brazil. The model describes interactions as a function of spatial distance and temporally delayed influence, with parameters estimated via maximum likelihood. Our results reveal a marked asymmetry between spatial and temporal components. The spatial parameter systematically collapses, indicating that spatial proximity does not provide discriminatory information between diseases at the urban scale. In contrast, temporal dynamics exhibit scale-dependent behavior: statistical differentiation between dengue, Zika, and chikungunya emerges only beyond a critical temporal window. We show that unconstrained models primarily capture short-term co-occurrence, leading to apparent but non-robust differences, while biologically constrained models reveal a common underlying transmission structure. Additionally, reconstructed transmission networks exhibit localized and structured interaction patterns consistent with plausible epidemic propagation. These findings demonstrate that epidemic differentiation is not intrinsic, but an emergent phenomenon dependent on temporal scale, highlighting the importance of biologically grounded and scale-aware modeling in spatiotemporal epidemic analysis.

2603.29977 2026-04-15 cs.LG cs.AI q-bio.QM

Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

Iain Swift, JingHua Ye, Ruairi O'Reilly

Comments 8 pages, 1 figure, under review at XAI 2026 LBW

Abstract

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards models and applies it to quantify cross-modal interactions in glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), we evaluate four fusion architectures combining whole-slide image (WSI) and RNA-seq features. Our central finding is an inverse relationship between predictive performance and measured interaction: architectures achieving superior discrimination (C-index 0.64 → 0.82) exhibit equivalent or lower cross-modal interaction (4.8% → 3.0%). Variance decomposition reveals stable additive contributions across all architectures (WSI ≈ 40%, RNA ≈ 55%, Interaction ≈ 4%), indicating that performance gains arise from complementary signal aggregation rather than learned synergy. These findings provide a practical model auditing tool for comparing fusion strategies, reframe the role of architectural complexity in multimodal fusion, and have implications for privacy-preserving federated deployment.
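With only two modalities, the Shapley values and the pairwise Shapley interaction index have short closed forms over the four coalitions of {WSI, RNA}; InterSHAP builds on that index. The sketch below assumes a value function v that gives the model's output (or performance) for each coalition, with absent modalities masked. It does not reproduce InterSHAP's normalization or its Cox-model adaptation. The function name and numbers are hypothetical.

```python
WSI, RNA = frozenset({"WSI"}), frozenset({"RNA"})
NONE, BOTH = frozenset(), WSI | RNA


def two_modality_shapley(v):
    """Shapley values and pairwise interaction index for coalitions of {WSI, RNA}."""
    phi_wsi = 0.5 * ((v[WSI] - v[NONE]) + (v[BOTH] - v[RNA]))
    phi_rna = 0.5 * ((v[RNA] - v[NONE]) + (v[BOTH] - v[WSI]))
    # Zero for a purely additive v; positive values indicate synergy.
    interaction = v[BOTH] - v[WSI] - v[RNA] + v[NONE]
    return phi_wsi, phi_rna, interaction


additive = {NONE: 0.0, WSI: 0.4, RNA: 0.55, BOTH: 0.95}
synergistic = {NONE: 0.0, WSI: 0.0, RNA: 0.0, BOTH: 1.0}
print(two_modality_shapley(additive))
print(two_modality_shapley(synergistic))  # (0.5, 0.5, 1.0)
```

In the additive case, each modality's Shapley value equals its solo contribution and the interaction is zero. In the purely synergistic case, all credit comes from the interaction and is split evenly between the two modalities.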

2603.24626 2026-04-15 q-bio.GN cs.LG stat.ML

A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data

Yuichiro Iwashita, Ahtisham Fazeel Abbasi, Koichi Kise, Andreas Dengel, Muhammad Nabeel Asim

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) enables gene expression profiling at cellular resolution but is inherently affected by sparsity caused by dropout events, where expressed genes are recorded as zeros due to technical limitations. These artifacts distort gene expression distributions and compromise downstream analyses. Numerous imputation methods have been proposed to recover latent transcriptional signals. These methods range from traditional statistical models to deep learning (DL)-based methods. However, their comparative performance remains unclear, as existing benchmarks evaluate only a limited subset of methods, datasets, and downstream analyses. Results: We present a comprehensive benchmark of 15 scRNA-seq imputation methods spanning 7 methodological categories, including traditional and DL-based methods. Methods are evaluated across 30 datasets from 10 experimental protocols on 6 downstream analyses. Results show that traditional methods, such as model-based, smoothing-based, and low-rank matrix-based methods, generally outperform DL-based methods, including diffusion-based, GAN-based, GNN-based, and autoencoder-based methods. In addition, strong performance in numerical gene expression recovery does not necessarily translate into improved biological interpretability in downstream analyses, including cell clustering, differential expression analysis, marker gene analysis, trajectory analysis, and cell type annotation. Furthermore, method performance varies substantially across datasets, protocols, and downstream analyses, with no single method consistently outperforming others. Conclusions: Our findings provide practical guidance for selecting imputation methods tailored to specific analytical objectives and underscore the importance of task-specific evaluation when assessing imputation performance in scRNA-seq data analysis.
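One common way to score numerical expression recovery is masking: hide a random subset of observed nonzero entries, run the imputer, and measure the error on the hidden entries only. The sketch below assumes that protocol and uses a deliberately naive gene-mean imputer. Neither is claimed to be this benchmark's procedure, and both function names are hypothetical. As the abstract notes, a good score on this kind of metric need not carry over to downstream analyses.

```python
import math
import random


def masked_rmse(matrix, impute, mask_frac=0.1, seed=0):
    """Hide a fraction of nonzero entries (cells x genes), impute, score RMSE on them."""
    rng = random.Random(seed)
    nonzero = [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v != 0]
    k = max(1, int(mask_frac * len(nonzero)))
    hidden = rng.sample(nonzero, k)
    corrupted = [row[:] for row in matrix]
    for i, j in hidden:
        corrupted[i][j] = 0  # simulate an artificial dropout
    imputed = impute(corrupted)
    se = sum((imputed[i][j] - matrix[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(se / k)


def gene_mean_impute(m):
    """Naive baseline: replace zeros in each gene (column) with its nonzero mean."""
    means = []
    for j in range(len(m[0])):
        vals = [row[j] for row in m if row[j] != 0]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[v if v != 0 else means[j] for j, v in enumerate(row)] for row in m]


m = [[2.0, 5.0], [2.0, 5.0], [2.0, 5.0], [2.0, 5.0]]
print(masked_rmse(m, gene_mean_impute, mask_frac=0.25))  # 0.0 for this toy matrix
```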

2603.03866 2026-04-15 cond-mat.stat-mech cond-mat.mes-hall q-bio.MN

Ising Models of Cooperativity in Muscle Contraction

Elaheh Saadat, Matthieu Caruel, Stefano Gherardini, Ilaria Morotti, Matteo Marcello, Marco Caremani, Marco Linari, Ivan Latella, Stefano Ruffo

Comments 11 pages, 6 figures

Journal ref
Phys. Rev. E 113 (4), 044408 (2026)
Abstract

Regulation of contraction in striated muscle is controlled by a dual mechanism involving both thin filaments containing actin and thick filaments containing myosin. The thin filament is activated by calcium ions binding to troponin, which leads to an azimuthal displacement of tropomyosin that allows the activation of a regulatory unit (composed of one troponin, one tropomyosin and seven actin monomers) that exposes the actin sites for interaction with the myosin motors. Motor attachment to actin contributes to spreading activation within and beyond a regulatory unit along the thin filament through a cooperative mechanism. We introduce a one-dimensional Ising model to elucidate the mechanism of cooperativity in thin filament activation in relation to the force generated by the attached myosin motor. The model characterizes thin filament activation and cooperativity using only two parameters: one related to calcium concentration and the other to the force exerted by the attached myosin motor, which is modulated by temperature. At any force, the model determines the extent of actin-myosin interactions through a correlation length ranging from two to seven actin monomers, in addition to the seven actin monomers of the regulatory unit. Our theoretical predictions were successfully tested on experimental data, including the condition of hindered filament activation caused by the specific drug Omecamtiv Mecarbil (OM). According to our model, the effect of OM results in an anti-cooperativity mechanism that accounts for the experimental data.
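
As background for the Ising description, the Python sketch below evaluates the textbook transfer-matrix solution of an infinite one-dimensional Ising chain with nearest-neighbour coupling J and external field h, both in units of k_B T. Reading h as the calcium-related parameter, J as the force-related cooperativity, and spin +1 as an activated site is a hypothetical mapping for illustration only; it is not the authors' parametrization, and their model includes regulatory-unit structure that this toy chain omits.

```python
import math
from typing import Tuple


def ising_chain(J: float, h: float) -> Tuple[float, float]:
    """Mean spin and correlation length of an infinite 1D Ising chain.

    Energy convention: E = -J * sum(s_i * s_{i+1}) - h * sum(s_i), in units of k_B T.
    """
    # Eigenvalues of the transfer matrix [[e^(J+h), e^(-J)], [e^(-J), e^(J-h)]]
    a = math.exp(J) * math.cosh(h)
    root = math.sqrt(math.exp(2 * J) * math.sinh(h) ** 2 + math.exp(-2 * J))
    lam_plus, lam_minus = a + root, a - root
    # Mean spin per site in the thermodynamic limit
    m = math.sinh(h) / math.sqrt(math.sinh(h) ** 2 + math.exp(-4 * J))
    # Connected correlations decay as |lam_minus / lam_plus|^r over r lattice sites
    ratio = abs(lam_minus) / lam_plus
    xi = 0.0 if ratio == 0.0 else -1.0 / math.log(ratio)
    return m, xi
```

For example, `ising_chain(1.0, 0.0)` gives zero mean spin and a correlation length of -1/ln(tanh 1), about 3.67 sites. At fixed h > 0 the mean spin grows with J, since the e^(-4J) term in its denominator shrinks, which is the sense in which the coupling encodes cooperativity.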

2602.14863 2026-04-15 q-bio.PE

Quasilocalization under coupled mutation-selection dynamics

C. J. Palpal-latoc, Ian Vega

Comments 25 pages, 12 figures, comments welcome

Abstract

When mutations are rampant, quasispecies theory, or Eigen's model, predicts that the fittest type in a population may not dominate. Beyond a critical mutation rate, the population may even become completely delocalized from the peak of the fitness landscape, and the fittest type is, ironically, lost. Extensive efforts have been made to understand this exceptional scenario, but in general there is no simple prescription that predicts the eventual degree of localization for arbitrary fitness landscapes and mutation rates. Here, we derive a simple and general relation linking the quasispecies' Hill numbers, which are diversity metrics in ecology, to the ratio of an effective fitness variance to the square of the mean mutation rate. This ratio, which we call the localization factor, emerges from mean approximations of decomposed surprisal or stochastic entropy change rates. On the application side, the relation defines a combination of Hill numbers that may complement other complexity or diversity measures for real viral quasispecies, with the advantage that it has an underlying biological interpretation under Eigen's model.
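
Hill numbers themselves have a standard closed form, D_q = (Σ_i p_i^q)^(1/(1-q)), with the limits D_1 = exp(-Σ_i p_i ln p_i) at q = 1 and D_∞ = 1/max_i p_i. The Python sketch below computes them for a frequency vector such as an equilibrium quasispecies distribution. It does not reproduce the localization factor or the particular combination of Hill numbers derived in the paper, which the abstract does not spell out.

```python
import math
from typing import Sequence


def hill_number(p: Sequence[float], q: float) -> float:
    """Hill number (effective number of types) of order q for frequencies p."""
    # Drop absent types and renormalize so the frequencies sum to one
    probs = [x for x in p if x > 0]
    total = sum(probs)
    probs = [x / total for x in probs]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in probs))
    if math.isinf(q):
        # Limit q -> infinity: inverse of the largest frequency
        return 1.0 / max(probs)
    return sum(x ** q for x in probs) ** (1.0 / (1.0 - q))
```

A population localized on a single master sequence gives D_q close to 1 for every q, whereas a population spread uniformly over n types gives D_q = n for every q.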

2602.05971 2026-04-15 cs.CL cs.LG q-bio.NC

Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Felipe D. Toro-Hernández, Jesuino Vieira Filho, Rodrigo M. Cabral-Carvalho

Comments 10 pages, 6 figures (excluding refs/appendix). Accepted to ICLR 2026

Journal ref
International Conference on Learning Representations (ICLR) 2026
Abstract

Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. Using different transformer text embedding models, we construct participant-specific semantic trajectories based on cumulative embeddings and extract geometric and dynamical metrics, including distance to next, distance to centroid, entropy, velocity, and acceleration. These measures capture both scalar and directional aspects of semantic navigation, providing a computationally grounded view of semantic representation search as movement in a geometric space. We evaluate the framework on four datasets in different languages, spanning different property generation tasks: Neurodegenerative, Swear verbal fluency, and property listing tasks in Italian and in German. Across these contexts, our approach distinguishes between clinical groups and concept types, offering a mathematical framework that requires minimal human intervention compared to typical labor-intensive linguistic pre-processing methods. Comparison with a non-cumulative approach reveals that cumulative embeddings work best for longer trajectories, whereas shorter ones may provide too little context, favoring the non-cumulative alternative. Critically, different embedding models yielded similar results, highlighting similarities between different learned representations despite different training pipelines. By framing semantic navigation as a structured trajectory through embedding space, this work bridges cognitive modeling with learned representations and establishes a pipeline for quantifying semantic representation dynamics, with applications in clinical research, cross-linguistic analysis, and the assessment of artificial cognition.
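
The trajectory construction can be sketched in a few lines of Python. In this hedged sketch, the cumulative embedding at step t is assumed to be the running mean of the first t word embeddings, and all distances are assumed to be Euclidean; the paper's exact definitions may differ, and its entropy metric is omitted. With a unit time step, velocity reduces to the distance to the next point, so only acceleration is computed separately.

```python
import math
from typing import Dict, List, Sequence

Point = List[float]


def cumulative_trajectory(embeddings: Sequence[Sequence[float]]) -> List[Point]:
    """Point t is the running mean of the first t+1 word embeddings (assumed definition)."""
    trajectory: List[Point] = []
    running = [0.0] * len(embeddings[0])
    for t, emb in enumerate(embeddings, start=1):
        running = [r + x for r, x in zip(running, emb)]
        trajectory.append([r / t for r in running])
    return trajectory


def trajectory_metrics(points: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Per-step geometric metrics of a trajectory, using Euclidean distance."""
    dim, n = len(points[0]), len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(dim)]
    # Distance between consecutive points (equals speed for a unit time step)
    to_next = [math.dist(a, b) for a, b in zip(points, points[1:])]
    to_centroid = [math.dist(p, centroid) for p in points]
    # Magnitude of the discrete second difference x[t+2] - 2 x[t+1] + x[t]
    accel = [
        math.sqrt(sum((c - 2 * b + a) ** 2 for a, b, c in zip(p0, p1, p2)))
        for p0, p1, p2 in zip(points, points[1:], points[2:])
    ]
    return {
        "distance_to_next": to_next,
        "distance_to_centroid": to_centroid,
        "acceleration": accel,
    }
```

Passing the output of `cumulative_trajectory` to `trajectory_metrics`, instead of the raw embeddings, gives the cumulative variant; comparing the two mirrors the paper's cumulative versus non-cumulative comparison in spirit only.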

2512.24427 2026-04-15 q-bio.MN nlin.AO nlin.CD physics.bio-ph q-bio.QM

Epigenetic feedback reshapes dynamical landscapes in gene regulatory networks

Sascha H. Hauck, Sandip Saha, Narsis A. Kiani, Jesper N. Tegner

Comments 18 pages, 7 figures

Abstract

Understanding how gene regulatory networks (GRNs) give rise to stable and dynamic cellular states remains a central challenge in theoretical biology, particularly when slow epigenetic feedback reshapes the underlying regulatory landscape. While experimental approaches such as single-cell transcriptomics reveal rich dynamical behaviour, developing a tractable theoretical framework that links gene expression, epigenetic control, and collective dynamics remains challenging. Here, we develop an extended Dynamical Mean Field Theory (DMFT) framework for GRNs that incorporates epigenetic modifications as slow, feedback-driven variables. Building on the analogy between Hopfield networks and spin glass systems, we derive effective stochastic equations that reduce high-dimensional dynamics to a tractable form across multiple timescales. This formulation enables quantitative characterization of both stable and oscillatory regimes and reveals how epigenetic feedback reshapes the effective potential landscape governing cell fate decisions. Our model shows how epigenetic feedback regulation dynamically reshapes the Waddington landscape. Our results and methodology provide a unified theoretical framework for understanding developmental dynamics and epigenetic reprogramming in complex biological systems.

2511.21124 2026-04-15 q-bio.GN

Moonshine.jl: a Julia package for genome-scale model-based ancestral recombination graph inference

Patrick Fournier, Fabrice Larribe

Comments Multiple revisions

Journal ref
Front. Genet. 17 (2026)
Abstract

The ancestral recombination graph (ARG) is the model of choice in statistical genetics for representing population ancestries. Software capable of simulating ARGs on a genome scale within a reasonable amount of time is now widely available for most practical use cases. While the inverse problem of inferring ancestries from a sample of haplotypes has seen major progress in the last decade, it has not enjoyed the same level of advancement as its counterpart. Until recently, even moderately sized samples could only be handled using heuristics. In recent years, model-based inference for datasets closer to "real world" scenarios has become possible, largely due to the development of threading-based samplers. This article introduces Moonshine.jl, a Julia package that can, among other things, infer ARGs for samples of thousands of human haplotypes with lengths on the order of hundreds of megabases within a reasonable amount of time. On recent hardware, our package infers an ARG for samples of up to 10,000 densely haplotyped (over one marker per kilobase) human chromosomes in well under a day on data simulated with msprime. Scaling up simulations on a compute cluster is straightforward thanks to a strictly single-threaded implementation. Although model-based, the package does not resort to threading; instead, it places restrictions on probability distributions typically used in simulation software in order to enforce sample consistency. In addition to efficiency, a strong emphasis is placed on ease of use and integration into the biostatistical software ecosystem.

2509.10547 2026-04-15 q-bio.NC cs.AI cs.LG

Pursuit of biomarkers of brain diseases: Beyond cohort comparisons

Pascal Helson, Arvind Kumar

Abstract

Despite the diversity and volume of brain data acquired and the advanced AI-based algorithms available to analyze them, brain features are rarely used in clinics for diagnosis and prognosis. Here we argue that the field continues to rely on cohort comparisons to seek biomarkers, despite the well-established degeneracy of brain features. Using a thought experiment (Brain Swap), we show that more data and more powerful algorithms will not be sufficient to identify biomarkers of brain diseases. We argue that instead of comparing patients with healthy controls using a single data type, we should use multimodal (e.g., brain activity, neurotransmitters, neuromodulators, brain imaging) and longitudinal brain data to guide the grouping before defining multidimensional biomarkers for brain diseases.

2509.02648 2026-04-15 q-bio.GN cs.LG q-bio.QM stat.AP

Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data

John Zobolas, Anne-Marie George, Alberto López, Sebastian Fischer, Marc Becker, Tero Aittokallio

Comments 52 pages, 5 figures, 9 Supplementary Figures, 1 Supplementary Table

Journal ref
BioData Mining (2026)
Abstract

Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.
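
The two aggregation steps can be illustrated with a small Python sketch. It assumes plain approval voting, where a feature's score is the number of (model, subsample) runs that selected it, and keeps the Pareto-optimal (number of features, error) pairs, where error could be, for example, 1 minus Harrell's C-index. This is not the mlr3fselect implementation, and the rule hEFS uses to pick a single point on the front without user-defined thresholds is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def approval_ranking(selected_sets: Iterable[Set[str]]) -> List[Tuple[str, int]]:
    """Rank features by how many runs selected them (simple approval voting)."""
    votes = Counter()
    for selected in selected_sets:
        votes.update(selected)  # each run approves every feature it selected
    # Highest vote count first; ties broken alphabetically for determinism
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


def pareto_front(points: Iterable[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Keep (n_features, error) pairs that no sparser, no-worse pair dominates."""
    front: List[Tuple[int, float]] = []
    best_err = float("inf")
    # Scan from sparsest to densest; a point survives only if it lowers the error
    for n, err in sorted(points):
        if err < best_err:
            front.append((n, err))
            best_err = err
    return front
```

For example, with (features, error) pairs (1, 0.40), (2, 0.30), (3, 0.35), (4, 0.20) and (5, 0.20), the front is (1, 0.40), (2, 0.30), (4, 0.20): the 3-feature and 5-feature models are dominated by sparser models that are at least as accurate.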

2508.12260 2026-04-15 cs.AI q-bio.QM

Mantis: A Foundation Model for Mechanistic Disease Forecasting

Carson Dudley, Reiden Magdaleno, Christopher Harding, Ananya Sharma, Emily Martin, Marisa Eisenberg

Comments 11 pages, 4 figures

Abstract

Infectious disease forecasting in novel outbreaks or low-resource settings is hampered by the need for large disease and covariate data sets, bespoke training, and expert tuning, all of which can hinder rapid generation of forecasts for new settings. To help address these challenges, we developed Mantis, a foundation model trained entirely on mechanistic simulations, which enables out-of-the-box forecasting across diseases, regions, and outcomes, even in settings with limited historical data. We evaluated Mantis against 78 forecasting models across sixteen diseases with diverse modes of transmission, assessing both point forecast accuracy (mean absolute error) and probabilistic performance (weighted interval score and coverage). Despite using no real-world data during training, Mantis achieved lower mean absolute error than all models in the CDC's COVID-19 Forecast Hub when backtested on early pandemic forecasts which it had not previously seen. Across all other diseases tested, Mantis consistently ranked in the top two models across evaluation metrics. Mantis further generalized to diseases with transmission mechanisms not represented in its training data, demonstrating that it can capture fundamental contagion dynamics rather than memorizing disease-specific patterns. These capabilities illustrate that purely simulation-based foundation models such as Mantis can provide a practical foundation for disease forecasting: general-purpose, accurate, and deployable where traditional models struggle.
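
For readers unfamiliar with the probabilistic metric, the sketch below implements the weighted interval score in the form commonly used in COVID-19 Forecast Hub evaluations: WIS = [½|y - m| + Σ_k (α_k/2) IS_{α_k}] / (K + ½), where m is the predictive median and IS_α is the interval score of the central (1 - α) prediction interval. It is a generic Python illustration, not the evaluation code used for Mantis.

```python
from typing import Sequence, Tuple


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) prediction interval for observation y."""
    score = upper - lower  # sharpness: narrower intervals score better
    if y < lower:
        score += 2.0 / alpha * (lower - y)  # penalty for falling below the interval
    elif y > upper:
        score += 2.0 / alpha * (y - upper)  # penalty for falling above the interval
    return score


def weighted_interval_score(
    median: float, intervals: Sequence[Tuple[float, float, float]], y: float
) -> float:
    """WIS for K central intervals given as (alpha, lower, upper) triples."""
    total = 0.5 * abs(y - median)  # weight 1/2 on the absolute error of the median
    for alpha, lower, upper in intervals:
        total += alpha / 2.0 * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)
```

For instance, a forecast with median 10, 50% interval [8, 12] and 80% interval [5, 15] scores 1.0 if the observation is 11 and 8.0 if it is 20, reflecting the penalty for missing both intervals.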

2505.15653 2026-04-15 stat.ME q-bio.QM

Quantifying structural uncertainty in chemical reaction network inference

Yong See Foo, Adriana Zanca, Jennifer A. Flegg, Ivo Siekmann

Comments 35 pages, 12 figures

Abstract

Dynamical systems in biology are complex, and one often does not have comprehensive knowledge about the interactions involved. Chemical reaction network (CRN) inference aims to identify, from observing species concentrations over time, the unknown reactions between the species. Existing approaches such as sparse regularisation largely focus on identifying a single, most likely CRN, without addressing uncertainty about the network structure. However, it is important to quantify structural uncertainty to have confidence in our inference and predictions. In this work, we explore how effective sparse regularisation methods are for quantifying structural uncertainty. Locally optimal solutions to sparse regularisation are mapped to CRN structures; however, it is unclear whether this approach encompasses all plausible CRNs. We find that inducing sparsity with nonconvex penalty functions results in better coverage of the plausible CRNs compared to the popular lasso regularisation. To validate our approach, we apply our methods to real-world data examples, and are able to simultaneously recover reactions proposed across multiple literature sources for a reaction system. Our emphasis on network-level probabilities enables a novel, hierarchical representation of structural ambiguities in the space of CRNs. This representation translates into alternative reaction pathways suggested by the available data, thus guiding the efforts of future experimental design.
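
The contrast between the lasso and nonconvex penalties shows up in their scalar proximal (thresholding) operators, the per-coefficient update applied in proximal-gradient solvers. The Python sketch below uses the minimax concave penalty (MCP) as one example of a nonconvex penalty. The abstract does not say which nonconvex penalties the authors used, and reading each coefficient as the rate of a candidate reaction is an illustrative assumption.

```python
import math


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b| (lasso): shrinks every surviving coefficient by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def mcp_threshold(z: float, lam: float, gamma: float = 3.0) -> float:
    """Proximal operator (unit step size) of the minimax concave penalty."""
    if gamma <= 1.0:
        raise ValueError("MCP with unit step size requires gamma > 1")
    if abs(z) <= lam:
        return 0.0  # small coefficients are set exactly to zero, as with the lasso
    if abs(z) <= gamma * lam:
        # Intermediate coefficients are shrunk less aggressively than by the lasso
        return math.copysign((abs(z) - lam) / (1.0 - 1.0 / gamma), z)
    return z  # large coefficients are left unbiased
```

For example, with lam = 1 a coefficient estimate of 5 becomes 4 under the lasso but stays 5 under MCP, while both operators set 0.5 to zero. Because the MCP objective is nonconvex, solvers started from different points can converge to different local optima, that is, to different sparsity patterns.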

2505.10517 2026-04-15 q-bio.QM

A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl

Yuganthi R. Liyanage, Omar Saucedo, Necibe Tuncer, Gerardo Chowell

Abstract

Structural identifiability is the theoretical ability to uniquely recover model parameters from ideal, noise-free data and is a prerequisite for reliable parameter estimation in epidemic modeling. Despite its importance for calibration and inference, structural identifiability analysis remains underused and inconsistently applied in infectious disease modeling. This paper presents a user-oriented methodological tutorial demonstrating how global structural identifiability analysis can be systematically integrated into epidemic modeling workflows. We provide a reproducible framework for conducting structural identifiability analysis of ordinary differential equation models using the Julia package StructuralIdentifiability.jl. The workflow is illustrated across commonly used epidemic models, including SEIR variants with asymptomatic and presymptomatic transmission, vector-borne disease models, and systems incorporating hospitalization and disease-induced mortality. We also introduce a visual communication strategy that embeds identifiability results directly into compartmental diagrams, facilitating interpretation and interdisciplinary communication. Our results show that identifiability depends critically on model structure, the choice of observed variables, and assumptions about initial conditions, and that identifiable parameter combinations may exist even when individual parameters are not globally identifiable. Emphasizing transparent implementation, interpretation, and communication, this work provides practical guidance and comparative insights across model classes. The tutorial is designed as both a reference and a teaching resource for researchers and educators seeking to incorporate structural identifiability analysis into epidemic model development. All code and annotated diagrams are publicly available to ensure reproducibility and reuse.