arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.02212 2026-04-03 q-bio.NC

Phase estimation with autoregressive padding (PEAP): addressing inaccuracies and biases in EEG analysis

Miriam Kirchhoff, Johanna Rösch, Maria Ermolova, Oskari Ahola, Sarah Harders, Juliana Hougland, Ulf Ziemann

Abstract

Accurate phase estimation at the edge of data segments is crucial for EEG applications such as EEG-TMS in offline and real-time data analysis. Our research evaluates the phase estimation performance of four commonly used methods (Phastimate, SSPE, ETP, and PhastPadding) for accuracy and systematic biases, using data from young and elderly healthy controls and chronic stroke participants. To address the identified limitations of the established methods, we introduce Phase Estimation with Autoregressive Padding (PEAP), a method that prevents strong bandpass filtering-induced artifacts. Contrary to the established methods, PEAP does not show significant biases and improves accuracy by 3.2 to 9.2% for the continuous phase estimation. Our offline analysis demonstrates how established methods are systematically biased towards some estimates and how they induce phase shifts. We also show that differences between methods do not vary between clinical and control populations, supporting their translatability. This work indicates that systematic biases in established phase estimation methods may compromise the validity and comparability of phase-dependent findings. PEAP addresses these limitations and thus offers a more reliable and more accurate alternative method.

2604.02166 2026-04-03 physics.ins-det physics.bio-ph q-bio.BM

Data Sieving for Scalable Real-Time Multichannel Nanopore Sensing

Matteo Cartiglia, Natan Biesmans, Wannes Peeters, Wouter Botermans, Koen Ongena, Liam Vandekerckhove, Wouter Renckens, Eric Beamish, Elizabeth Skelly, Kirill A. Afonin, Pol van Dorpe, Sanjin Marion

Comments 28 pages, 5 figures

Abstract

High-throughput solid-state nanopore experiments generate continuous MHz-rate data streams in which only a small fraction of data contains informative molecular information. This creates storage and processing bottlenecks that limit experimental scalability. We introduce Data Sieving, a GPU-accelerated acquisition framework that integrates real-time event detection directly into the measurement pipeline and selectively stores and allows real-time analysis of snapshots around molecular translocations. The system employs a lightweight rolling-average and min-max trigger to identify event candidates in parallel across channels. This architecture reduces stored data volume by up to 98% while preserving complete molecular signatures across a wide temporal range, from microsecond-scale protein dynamics to second-scale nucleic acid nanoparticle events. Continuous baseline monitoring enables autonomous closed-loop actuation; in high-concentration DNA experiments, automatic declogging restored pore conductance, reducing the time spent in a non-productive clogged state to near-zero and without interrupting parallel measurements. Validated across DNA, protein, and nucleic acid nanoparticle measurements, Data Sieving links data storage directly to molecular information content rather than experiment duration, enabling scalable, real-time operation of parallel nanopore sensors. The approach provides a hardware-agnostic foundation for long-duration, high-bandwidth single-molecule experiments and other event-driven sensing platforms. By using algorithms intrinsically compatible with low-latency digital architectures, this framework provides a clear path toward high-bandwidth, highly multiplexed recording across hundreds of individual nanopore channels in both solid-state and biological pores.
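
The rolling-average and min-max trigger mentioned in this abstract can be illustrated with a minimal single-channel sketch. This is not the authors' GPU implementation: the function name `detect_events`, the window sizes, and the threshold are all hypothetical choices made for illustration, and the abstract does not specify how the two statistics are combined.

```python
from collections import deque


def detect_events(samples, baseline_window=50, trigger_window=5, threshold=3.0):
    """Return sample indices where a short min-max window departs from a rolling baseline.

    All names and default values here are illustrative assumptions.
    """
    baseline = deque(maxlen=baseline_window)  # recent quiet samples for the rolling average
    recent = deque(maxlen=trigger_window)     # short window for the min-max check
    baseline_sum = 0.0
    events = []
    for i, x in enumerate(samples):
        recent.append(x)
        if len(baseline) == baseline_window:
            mean = baseline_sum / baseline_window
            # Trigger when either extreme of the short window leaves the baseline band.
            excursion = max(abs(max(recent) - mean), abs(min(recent) - mean))
            if excursion > threshold:
                events.append(i)
                continue  # keep event samples out of the baseline estimate
        if len(baseline) == baseline_window:
            baseline_sum -= baseline[0]  # value about to be evicted by append
        baseline.append(x)
        baseline_sum += x
    return events


# Hypothetical trace: flat baseline with one spike at index 60.
trace = [0.0] * 60 + [10.0] + [0.0] * 20
print(detect_events(trace))
```

Because the short window still contains the spike for a few samples after it occurs, the flagged indices form a small snapshot around the event rather than a single point, which is the kind of region one would store.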

2603.28764 2026-04-03 cs.LG cs.AI math.DG q-bio.NC

Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds

N Alex Cayco-Gajic, Arthur Pellegrino

Abstract

Similarity measures are widely used to interpret the representational geometries used by neural networks to solve tasks. Yet, because existing methods compare the extrinsic geometry of representations in state space, rather than their intrinsic geometry, they may fail to capture subtle yet crucial distinctions between fundamentally different neural network solutions. Here, we introduce metric similarity analysis (MSA), a novel method which leverages tools from Riemannian geometry to compare the intrinsic geometry of neural representations under the manifold hypothesis. We show that MSA can be used to i) disentangle features of neural computations in deep networks with different learning regimes, ii) compare nonlinear dynamics, and iii) investigate diffusion models. Hence, we introduce a mathematically grounded and broadly applicable framework to understand the mechanisms behind neural computations by comparing their intrinsic geometries.

2603.16880 2026-04-03 eess.SP cs.CL cs.LG q-bio.NC

NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectro-Spatial Grounding and Temporal State-Space Reasoning

Guoan Wang, Shihao Yang, Jun-en Ding, Hao Zhu, Feng Liu

Abstract

Electroencephalography (EEG) provides a non-invasive window into neural dynamics at high temporal resolution and plays a pivotal role in clinical neuroscience research. Despite this potential, prevailing computational approaches to EEG analysis remain largely confined to task-specific classification objectives or coarse-grained pattern recognition, offering limited support for clinically meaningful interpretation. To address these limitations, we introduce NeuroNarrator, the first generalist EEG-to-text foundation model designed to translate electrophysiological segments into precise clinical narratives. A cornerstone of this framework is the curation of NeuroCorpus-160K, the first harmonized large-scale resource pairing over 160,000 EEG segments with structured, clinically grounded natural-language descriptions. Our architecture first aligns temporal EEG waveforms with spatial topographic maps via a rigorous contrastive objective, establishing spectro-spatially grounded representations. Building on this grounding, we condition a Large Language Model through a state-space-inspired formulation that integrates historical temporal and spectral context to support coherent clinical narrative generation. This approach establishes a principled bridge between continuous signal dynamics and discrete clinical language, enabling interpretable narrative generation that facilitates expert interpretation and supports clinical reporting workflows. Extensive evaluations across diverse benchmarks and zero-shot transfer tasks highlight NeuroNarrator's capacity to integrate temporal, spectral, and spatial dynamics, positioning it as a foundational framework for time-frequency-aware, open-ended clinical interpretation of electrophysiological data.

2510.16082 2026-04-03 q-bio.QM cs.AI cs.LG

BIOGEN: Evidence-Grounded Multi-Agent Reasoning Framework for Transcriptomic Interpretation in Antimicrobial Resistance

Elias Hossain, Mehrdad Shoeibi, Ivan Garibay, Niloofar Yousefi

Abstract

Interpreting gene clusters from RNA sequencing (RNA-seq) remains challenging, especially in antimicrobial resistance studies where mechanistic insight is important for hypothesis generation. Existing pathway enrichment methods can summarize co-expressed modules, but they often provide limited cluster-specific explanations and weak connections to supporting literature. We present BIOGEN, an evidence-grounded multi-agent framework for post hoc interpretation of RNA-seq transcriptional modules. BIOGEN combines biomedical retrieval, structured reasoning, and multi-critic verification to generate traceable cluster-level explanations with explicit evidence and confidence labels. On a primary Salmonella enterica dataset, BIOGEN achieved strong biological grounding, including BERTScore 0.689, Semantic Alignment Score 0.715, KEGG Functional Similarity 0.342, and a hallucination rate of 0.000, compared with 0.100 for an LLM-only baseline. Across four additional bacterial RNA-seq datasets, BIOGEN also maintained zero hallucination under the same fixed pipeline. In comparisons with representative open-source agentic AI baselines, BIOGEN was the only framework that consistently preserved zero hallucination across all five datasets. These findings suggest that retrieval alone is not enough for reliable biological interpretation, and that evidence-grounded orchestration is important for transparent and source-traceable transcriptomic reasoning.

2509.07013 2026-04-03 cs.LG q-bio.PE stat.ME

Generalized Machine Learning for Fast Calibration of Agent-Based Epidemic Models

Sima Najafzadehkhoei, George Vega Yon, Derek S. Meyer, Bernardo Modenesi

Abstract

Agent-based models (ABMs) are widely used to study infectious disease dynamics, but their calibration is often computationally intensive, limiting their applicability in time-sensitive public health settings. We propose DeepIMC (Deep Inverse Mapping Calibration), a machine learning-based calibration framework that directly learns the inverse mapping from epidemic time series to epidemiological parameters. DeepIMC trains a bidirectional Long Short-Term Memory (BiLSTM) neural network on synthetic epidemic trajectories generated from agent-based models such as the Susceptible-Infected-Recovered (SIR) model, enabling rapid parameter estimation without repeated simulation at inference time. We evaluate DeepIMC through an extensive simulation study comprising 5,000 heterogeneous epidemic scenarios and benchmark its performance against Approximate Bayesian Computation (ABC) using likelihood-free Markov Chain Monte Carlo. The results show that DeepIMC substantially improves parameter recovery accuracy, produces sharp and well-calibrated predictive intervals, and reduces computational time by more than an order of magnitude relative to ABC. Although structural parameter identifiability constraints limit the precise recovery of all model parameters simultaneously, the calibrated models reliably reproduce epidemic trajectories and support accurate forward prediction with their estimated parameters. DeepIMC is implemented in the open-source R package epiworldRCalibrate, facilitating practical adoption for real-time epidemic modeling and policy analysis. Overall, our findings demonstrate that DeepIMC provides a scalable, operationally effective alternative to traditional simulation-based calibration methods for agent-based epidemic models.

2503.03485 2026-04-03 cs.LG q-bio.QM

TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology

Alexis Chevalier, Soumya Ghosh, Urvi Awasthi, James Watkins, Julia Bieniewska, Nichita Mitrea, Olga Kotova, Kirill Shkura, Andrew Noble, Michael Steinbaugh, Vijay Sadashivaiah, George Dasoulas, Julien Delile, Christoph Meier, Leonid Zhukov, Iya Khalil, Srayanta Mukherjee, Judith Mueller

Comments ICML 2025 Generative AI and Biology (GenBio) Workshop

Abstract

Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the TEDDY family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and donors not seen during training, and probing the learned representations for known biology. Our models showed substantial improvement over existing works, and scaling experiments showed that performance improved predictably with both data volume and parameter count.

2405.14690 2026-04-03 q-bio.QM stat.AP

Beyond Scalar Metrics: Functional Data Analysis of Postprandial Continuous Glucose Monitoring in the AEGIS Study

Marcos Matabuena, Joe Sartini, Francisco Gude

Abstract

Postprandial glucose collected through continuous glucose monitoring (CGM) provides critical information for assessing metabolic capacity and guiding dietary recommendations. Traditional approaches summarize these data into scalar measures, such as 2-hour AUC or peak glucose, potentially overlooking temporal dynamics. We propose analyzing entire CGM trajectories using multilevel functional data analysis (FDA), which accounts for the smooth, hierarchical nature of glucose responses. Applying these methods to AEGIS study participants without diabetes, we illustrate how FDA characterizes variability in postprandial responses and links dietary/patient characteristics to glucose dynamics. We further extend the r-square metric to hierarchical functional models to quantify explanatory power. Our results show that dietary effects vary across the 6-hour postprandial window: for example, fiber blunts responses after 90 minutes, while fats reduce early rises within 50 minutes. Moreover, metabolic responses differ between normoglycemic and prediabetic individuals. These findings demonstrate that functional approaches reveal temporal and stratified insights into postprandial glucose regulation that scalar methods cannot capture.

2604.02057 2026-04-03 q-bio.NC cond-mat.dis-nn nlin.AO physics.bio-ph physics.data-an

Thermodynamic connectivity reveals functional specialization and multiplex organization of extrasynaptic signaling

Giridhar Sunil, Habib Benali, Elkaïoum M. Moutuou

Abstract

Neural communication operates on both fast synaptic transmission and slower, diffusive extrasynaptic signaling, yet how these two modes jointly organize brain function remains unclear. Here, using the complete synaptic and neuropeptidergic connectomes of Caenorhabditis elegans, we develop a unified multiplex framework linking anatomical wiring to functional communication. We infer structure-derived functional connectivity from the synaptic connectome using equilibrium principles from statistical physics, yielding a probabilistic map of information flow across all synaptic pathways, and compare this functional layer directly with the extrasynaptic connectome. This reveals a principled functional specialization across four communication regimes: (i) a topology-dependent layer that reinforces and stabilizes synaptic motor circuits, (ii) a topology-resilient modulatory layer supporting global regulation and behavioral state control, (iii) a purely extrasynaptic network sustaining survival and homeostasis, and (iv) a purely synaptic regime mediating rapid, low-latency sensorimotor processing. Together, these findings reveal that synaptic and extrasynaptic signaling form complementary architectures optimized for speed, modulation, robustness, and survival, and provide a general strategy for integrating structural and modulatory connectomes to understand how distinct communication modes cooperate to sustain coherent brain function.

2604.01990 2026-04-03 q-bio.QM

Evaluating Deep Surrogate Models for Knee Joint Contact Mechanics Under Input-Limited Conditions

Zhengye Pan, Jianwei Zuo, Jiajia Luo

Abstract

Background and Objective: Accurate surrogate modeling of knee joint contact mechanics is important for reconstructing stress distributions and identifying risk-relevant regions, yet the relative suitability of different modeling paradigms under practically relevant input-limited conditions remains unclear. Methods: Nine male soccer players performed 90° change-of-direction trials. Finite element simulations driven by subject-specific joint posture and reaction forces were converted into graph-structured samples. Five surrogate architectures representing local diffusion, history-context enhancement, hierarchical multi-scale modeling, explicit global interaction, and local-global hybridization were compared using three-fold cross-subject validation under full, pose-corrupted, load-corrupted, and minimal-input conditions. Performance was evaluated using full-field error, high-stress error, high-risk region overlap, and hotspot localization metrics. Results: The hybrid model achieved the best overall performance under full inputs and remained the most robust under pose- and load-corrupted conditions. Under minimal inputs, no single model dominated all metrics: the history-context model yielded lower overall and high-stress errors, the hybrid model better preserved high-risk region reconstruction, and the hierarchical model showed an advantage in hotspot localization. Conclusion: Evaluation of surrogate models for knee joint contact mechanics should shift from accuracy comparisons under ideal inputs to a comprehensive assessment of the preservation of risk-relevant information under realistic input constraints. Although the local-global hybrid model showed the best overall robustness, the optimal model under minimal-input conditions remained task-dependent.

2604.01734 2026-04-03 q-bio.QM stat.ME

A Novel Multi-view Mixture Model Framework for Longitudinal Clustering with Application to ANCA-Associated Vasculitis

Shen Jia, David Selby, Mark A Little, Tin Lok James Ng

Abstract

Effectively modeling irregularly sampled longitudinal data is essential for understanding disease progression and improving risk prediction. We propose a two-view mixture model that integrates static baseline covariates and longitudinal biomarker trajectories within a unified probabilistic clustering framework. Temporal patterns are modeled using Neural Ordinary Differential Equations. Model training uses an EM algorithm with a sparsity-inducing log-penalty for interpretable subgroup discovery. Application of the model to an Irish cohort of ANCA-associated vasculitis patients reveals subgroups with heterogeneous serum creatinine trajectories and variation in end-stage kidney disease outcomes.

2604.01435 2026-04-03 cond-mat.soft cond-mat.stat-mech q-bio.SC

Osmotically Induced Shape Changes in Membrane Vesicles

Rajiv G Pereira, Biswaroop Mukherjee, Sanjeev Gautam, Mattiangelo D'Agnese, Subhadip Biswas, Rachel Meeker, Buddhapriya Chakrabarti

Comments 13 pages, 9 figures

Abstract

We develop a self-consistent free-energy framework in which membrane shape and osmotic pressure are determined simultaneously in a finite reservoir by minimizing bending elasticity and solute entropy. Solute conservation makes osmotic pressure a thermodynamic variable rather than an externally prescribed parameter, producing a nonlinear coupling between membrane mechanics and solvent entropy. This coupling modifies the classical stability condition for spherical vesicles: instability emerges from global free-energy competition rather than the linear Helfrich stability criterion. The resulting critical pressures differ by orders of magnitude from Helfrich predictions and agree with simulations for small and large unilamellar vesicles. The framework is relevant to cellular environments involving biomolecular condensate confinement as well as synthetic vesicles and the development of osmotic-pressure-driven encapsulation platforms.

2604.01385 2026-04-03 q-bio.QM

Strategies for tumor elimination and control under immune evasion and chemotherapy resistance

Nazanin Mokari, Bryce Morsky

Comments 27 pages, 11 figures. Mathematical oncology model of immune evasion and chemotherapy resistance

Abstract

The evolutionary and ecological dynamics of tumors under immune responses and therapeutic interventions pose major challenges to long-term treatment success. Although treatment may initially achieve short-term disease control, resistant cancer cell subpopulations often arise, leading to relapse with more aggressive and treatment-resistant forms of the disease. Here, we develop and analyze mathematical models describing the interactions among effector cells, chemo-resistant tumor cells, and immuno-resistant tumor cells under distinct immune-evasion strategies. The models incorporate competition and cooperation between resistant and sensitive tumor subpopulations. We identify threshold conditions governing tumor persistence, elimination, and phenotype dominance under varying therapeutic intensities. These findings provide a theoretical framework for designing targeted and combination therapies and offer insights into strategies for mitigating treatment resistance.

2604.01362 2026-04-03 cs.IT eess.SP math.IT q-bio.QM

Multipath Channel Metrics and Detection in Vascular Molecular Communication: A Wireless-Inspired Perspective

Timo Jakumeit, Lukas Brand, Josep M. Jornet, Robert Schober, Maximilian Schäfer, Sebastian Lotter

Comments 7 pages, 3 figures; This paper has been submitted to the IEEE Global Communications Conference (GLOBECOM) 2026

Abstract

Motivated by classical communications engineering, early works in molecular communication (MC) largely adopted established modeling and signal processing concepts from wireless electromagnetic communication systems. In the context of the human cardiovascular system (CVS), MC channel models evolved from simple unbounded and single-duct environments mimicking individual blood vessels to complex vessel network (VN) topologies, generally at the expense of analytical tractability. Up until now, this has largely prohibited rigorous communication-theoretic analysis of large-scale VNs. In this work, we leverage a recently established closed-form analytical channel model for VNs, named mixture of inverse Gaussians for hemodynamic transport (MIGHT), to conduct the first systematic communication-theoretic study of MC in complex, large-scale VNs. Based on MIGHT, we derive a Poisson channel noise model and unveil structural analogies between multipath wireless communications (MWC) and advective-diffusive MC in VNs. In particular, we establish classical MWC metrics, namely the root mean squared (RMS) delay spread, the mean excess delay, and the coherence bandwidth, for MC in VNs and derive closed-form expressions for the channel frequency response and power delay profile (PDP). Building on this characterization, we propose a VN-adapted, coherent decision-feedback (DF) detector and show how the derived multipath metrics can inform the choice of critical system parameters like the symbol duration, the sampling time, and the memory length. Additionally, we evaluate the detector's performance in different VNs exhibiting inter-symbol interference (ISI). Together, these contributions open the door to a systematic, MWC-inspired MC system design for large-scale VNs.
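
For orientation, the classical multipath wireless metrics named in this abstract are usually defined from a power delay profile $P(\tau)$ as sketched below. These are the textbook wireless definitions, not the closed-form expressions the paper derives for vessel networks, which the abstract does not state. The coherence-bandwidth line is a common rule of thumb (roughly the 0.5 frequency-correlation level) and is an assumption about which convention applies.

```latex
% Mean excess delay: first moment of the power delay profile
\bar{\tau} = \frac{\int_0^\infty \tau \, P(\tau)\, \mathrm{d}\tau}{\int_0^\infty P(\tau)\, \mathrm{d}\tau}

% Second moment and RMS delay spread
\overline{\tau^2} = \frac{\int_0^\infty \tau^2 \, P(\tau)\, \mathrm{d}\tau}{\int_0^\infty P(\tau)\, \mathrm{d}\tau},
\qquad
\sigma_\tau = \sqrt{\overline{\tau^2} - \bar{\tau}^{\,2}}

% Coherence bandwidth (rule of thumb, inversely proportional to the delay spread)
B_c \approx \frac{1}{5\,\sigma_\tau}
```

A larger RMS delay spread means more inter-symbol interference at a given symbol duration, which is why such metrics can guide the choice of symbol duration and detector memory length.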

2604.01357 2026-04-03 math.AP q-bio.CB

Cell Migration Boundary Motion in Drosophila Egg Chambers: A Combined Phase Field and Chemoattractant Model

Naghmeh Akhavan, Alexander George, Michelle Starz-Gaiano, Bradford E. Peercy

Comments 15 pages, 7 figures

Abstract

In the Drosophila melanogaster egg chamber, the collective migration of border cells toward the oocyte is guided by spatial gradients of chemoattractants. While cellular responses to these cues are well characterized, the spatial distribution of chemoattractant within the tissue remains difficult to measure experimentally due to imaging limitations and extracellular complexity. In this study, we develop a spatially resolved mathematical framework to model local chemoattractant concentrations during border cell migration. We use a phase-field approach to represent the egg chamber geometry and define a diffusion-reaction system with spatially heterogeneous diffusivity that accounts for confinement by cellular domains. This framework allows chemoattractant diffusion to be restricted to extracellular space while remaining excluded from the interiors of nurse cells, the border cell cluster, and the oocyte, similar to what we observe in vivo. We simulate secretion from the oocyte and degradation throughout the domain, showing how geometry shapes the distribution of signaling molecules. We further couple this chemical field to a mechanical model of cluster migration that includes a tangential interface migration (TIM) force, allowing the cluster to respond to both chemoattractant gradients and cell-cell contact. Our results show that signal localization and tissue geometry jointly influence directional persistence and the speed of migration. Notably, geometric bottlenecks and intersections can flatten local gradients and slow migration, consistent with experimental observations. This modeling framework offers a tool to investigate how biophysical constraints shape signaling environments and guide collective cell movement in vivo.

2604.01252 2026-04-03 q-bio.QM

A Data-Driven Measure of REM Sleep Propensity for Human and Rodent Sleep

Naghmeh Akhavan, Alexander G. Ginsberg, Madelyn E. C. Cruz, Yunxi Yan, Shelby R. Stowe, Dinesh Pal, Franz Weber, Cecilia G. Diniz Behn, Victoria Booth

Comments 33 pages, 15 figures

Abstract

Mammalian sleep is characterized by multiple alternations between episodes of rapid-eye-movement sleep (REMS) and non-REM sleep (NREMS). While the mechanisms governing the timing of these ultradian NREMS-REMS cycles remain poorly understood, the phenomenon of REMS pressure, namely a drive for REMS that builds up between REMS episodes, is thought to be a contributing factor. Prior analyses of NREMS-REMS cycles in mice have suggested that time in NREMS is a primary contributor to REMS pressure. Building on that finding, we previously introduced a REMS propensity measure defined as the probability to enter REMS before the accumulation of an additional amount of NREMS. Analyzing mouse ultradian cycle data, we showed that REMS propensity at REMS onset was positively correlated with REMS bout duration and with the probability of the occurrence of a REMS bout followed by a short inter-REMS interval, called a sequential REMS cycle. In this paper, we extend our analyses of REMS propensity to human and rat ultradian NREMS-REMS cycle data. We show that, as in mice, human and rat sleep contain both short NREMS-REMS sequential cycles and longer single NREMS-REMS cycles, though there are some differences in the relative distributions of cycle durations. Although rodents exhibit polyphasic sleep in contrast with the consolidated sleep of humans, the calculated REMS propensity measures in all three species show similar profiles as functions of time spent in NREMS: specifically, REMS propensity increases with time spent in NREMS until it reaches a peak value, and then it decays with additional time in NREMS. Positive correlations of REMS propensity at REMS onset with REMS bout duration were present in both human and rat data as in mouse data, suggesting that time spent in NREMS also influences REMS duration in these species.
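
One way to read the verbal definition of REMS propensity in this abstract is as an empirical conditional probability: given that an amount `t` of NREMS has accumulated without REMS, how often does REMS begin before a further `delta` accumulates? The sketch below follows that reading only. The function name, the strict/non-strict inequalities, and the sample data are assumptions, and the authors' actual estimator may differ.

```python
def rems_propensity(nrems_amounts, t, delta):
    """Empirical P(T <= t + delta | T > t).

    T is the NREMS amount accumulated before REMS onset in each cycle.
    Returns None when no cycle lasted longer than t (estimate undefined).
    """
    at_risk = [x for x in nrems_amounts if x > t]
    if not at_risk:
        return None
    entered = sum(1 for x in at_risk if x <= t + delta)
    return entered / len(at_risk)


# Hypothetical NREMS amounts (arbitrary time units) preceding each REMS onset.
cycles = [1, 2, 3, 4, 6, 8]
print(rems_propensity(cycles, t=2, delta=2))
```

Evaluating the function over a grid of `t` values would trace out a propensity profile as a function of time spent in NREMS.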

2603.02491 2026-04-03 cs.LG cs.AI cs.RO q-bio.NC stat.ML

What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty

Aran Nayebi

Comments 23 pages; added PSR recovery (Theorems 3 & 4), and updated related work

Abstract

As artificial agents become increasingly capable, what internal structure is *necessary* for an agent to act competently under uncertainty? Classical results show that optimal control can be *implemented* using belief states or world models, but not that such representations are required. We prove quantitative "selection theorems" showing that strong task performance (low *average-case regret*) forces world models, belief-like memory and -- under task mixtures -- persistent variables resembling core primitives associated with emotion, along with informational modularity under block-structured tasks. Our results cover stochastic policies, partial observability, and evaluation under task distributions, without assuming optimality, determinism, or access to an explicit model. Technically, we reduce predictive modeling to binary "betting" decisions and show that regret bounds limit probability mass on suboptimal bets, enforcing the predictive distinctions needed to separate high-margin outcomes. In fully observed settings, this yields approximate recovery of the interventional transition kernel; under partial observability, it implies necessity of predictive state and belief-like memory, addressing an open question in prior world-model recovery work.

2511.22828 2026-04-03 cs.AI q-bio.NC

Fast dynamical similarity analysis

Arman Behrad, Mitchell Ostrow, Mohammad Taha Fakharian, Ila Fiete, Christian Beste, Shervin Safavi

Abstract

Understanding how nonlinear dynamical systems (e.g., artificial neural networks and neural circuits) process information requires comparing their underlying dynamics at scale, across diverse architectures and large neural recordings. While many similarity metrics exist, current approaches fall short for large-scale comparisons. Geometric methods are computationally efficient but fail to capture governing dynamics, limiting their accuracy. In contrast, traditional dynamical similarity methods are faithful to system dynamics but are often computationally prohibitive. We bridge this gap by combining the efficiency of geometric approaches with the fidelity of dynamical methods. We introduce fast dynamical similarity analysis (fastDSA), a computationally efficient and accurate metric for measuring (dis)similarity between nonlinear dynamical systems. FastDSA leverages modern computational tools, including random matrix theory to determine optimal system rank, novel optimization pipelines for aligning system flow fields, and Koopman embeddings. Across benchmark nonlinear systems and recurrent network models, fastDSA is robust to arbitrary coordinate choices while remaining sensitive to meaningful dynamical differences, capturing variations in system evolution that geometric methods may miss and traditional methods detect only at high computational cost. To our knowledge, fastDSA is the fastest method that retains accuracy in comparing nonlinear dynamical systems. It enables scalable, statistical analyses across diverse systems, significantly expanding the practical applicability of dynamical similarity analysis.

2509.04718 2026-04-03 stat.ME physics.data-an q-bio.QM

When correcting for regression to the mean is worse than no correction at all

José F. Fontanari, Mauro Santos

Journal ref Am. Nat. (2026)
Abstract

The ubiquitous regression to the mean (RTM) effect complicates statistical inference regarding the relationship between baseline levels of a biological variable and its subsequent change. We demonstrate that common RTM correction methods are problematic: the Berry et al. method, popularized by Kelly & Price in The American Naturalist, is unreliable for hypothesis testing or effect-size estimation, leading to systematic bias and inflated error rates. Conversely, while the Blomqvist method is theoretically unbiased, its high sampling variance limits its practical utility in small-to-moderate datasets. Using a structural linear model, we show that the most robust approach to navigating RTM is not to correct the data, but to evaluate the uncorrected crude slope against a structural null expectation derived from measurement repeatability, the proportion of total variance attributable to true individual differences. We illustrate this approach using empirical data from studies on lizard thermal physiology and bird telomere dynamics. Ultimately, we argue that any conclusion regarding a differential treatment effect is statistically unfounded without a clear understanding of the experiment's repeatability.
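
To see why repeatability fixes a null expectation for the crude slope, consider the simplest measurement-error setting. The derivation below is a sketch under stated assumptions (no true change between measurements, independent errors with equal variance at both time points); the paper's structural linear model may be more general, and the symbols are introduced here for illustration.

```latex
% Two measurements of the same true value T with independent errors
X_1 = T + e_1, \qquad X_2 = T + e_2, \qquad
\operatorname{Var}(T) = \sigma_T^2, \quad \operatorname{Var}(e_1) = \operatorname{Var}(e_2) = \sigma_e^2

% Repeatability: share of total variance due to true individual differences
r = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_e^2}

% Crude slope of change on baseline under the null of no true change
\beta_0
= \frac{\operatorname{Cov}(X_2 - X_1,\, X_1)}{\operatorname{Var}(X_1)}
= \frac{\sigma_T^2 - (\sigma_T^2 + \sigma_e^2)}{\sigma_T^2 + \sigma_e^2}
= r - 1
```

Under these assumptions a negative crude slope of about $r - 1$ is expected even when nothing has changed, so an observed slope is informative only relative to that value, not relative to zero.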

2507.20598 2026-04-03 stat.ME q-bio.GN stat.AP

Nullstrap-DE: A General Framework for Calibrating FDR and Preserving Power in DE Methods, with Applications to DESeq2 and edgeR

Chenxin Jiang, Changhu Wang, Jingyi Jessica Li

Details
English abstract

Differential expression (DE) analysis is a key task in RNA-seq studies, aiming to identify genes with expression differences across conditions. A central challenge is balancing false discovery rate (FDR) control with statistical power. Parametric methods such as DESeq2 and edgeR achieve high power by modeling gene-level counts using negative binomial distributions and applying empirical Bayes shrinkage. However, these methods may suffer from FDR inflation when model assumptions are mildly violated, especially in large-sample settings. In contrast, non-parametric tests like Wilcoxon offer more robust FDR control but often lack power and do not support covariate adjustment. We propose Nullstrap-DE, a general add-on framework that combines the strengths of both approaches. Designed to augment tools like DESeq2 and edgeR, Nullstrap-DE calibrates FDR while preserving power, without modifying the original method's implementation. It generates synthetic null data from a model fitted under the gene-specific null (no DE), applies the same test statistic to both observed and synthetic data, and derives a threshold that satisfies the target FDR level. We show theoretically that Nullstrap-DE asymptotically controls FDR while maintaining power consistency. Simulations confirm that it achieves reliable FDR control and high power across diverse settings, where DESeq2, edgeR, or Wilcoxon often show inflated FDR or low power. Applications to real datasets show that Nullstrap-DE enhances statistical rigor and identifies biologically meaningful genes.
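The thresholding step described above can be sketched generically. This is an illustrative assumption, not the Nullstrap-DE implementation: it estimates the false discovery proportion at a cutoff as the number of synthetic-null statistics at or above the cutoff divided by the number of observed statistics at or above it, and returns the smallest cutoff whose estimate does not exceed the target level. The published method may include additional correction terms; the function name and toy numbers are hypothetical.

```python
def fdr_threshold(observed, synthetic_null, q):
    """Smallest cutoff t (taken from observed values) with estimated FDP <= q.

    Estimated FDP(t) = #{synthetic_null >= t} / max(#{observed >= t}, 1).
    Returns None when no cutoff meets the target level.
    """
    best = None
    # Scan every candidate cutoff; the estimate need not be monotone in t.
    for t in sorted(set(observed), reverse=True):
        n_null = sum(s >= t for s in synthetic_null)
        n_obs = sum(s >= t for s in observed)
        if n_null / max(n_obs, 1) <= q:
            best = t
    return best

# Toy per-gene test statistics (hypothetical values, larger = stronger evidence).
observed = [9.0, 8.0, 7.5, 7.0, 1.2, 0.8, 0.5, 0.3]
synthetic_null = [1.0, 0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.05]

cutoff = fdr_threshold(observed, synthetic_null, q=0.25)
discoveries = [s for s in observed if cutoff is not None and s >= cutoff]
print(cutoff, len(discoveries))
```

Because the same statistic is computed on observed and synthetic data, a sketch like this would leave the underlying test (for example, one from DESeq2 or edgeR) untouched and only change how its output is thresholded.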

2503.03126 2026-04-03 physics.bio-ph q-bio.CB q-bio.QM q-bio.TO

Controlling tissue size by active fracture

Wei Wang, Brian A. Camley

Comments 21 pages, 13 figures, 1 table

Details
Journal ref
Phys. Rev. E 113, 034405 (2026)
English abstract

Groups of cells, including clusters of cancerous cells, multicellular organisms, and developing organs, may both grow and break apart. What physical factors control these fractures? In these processes, what sets the eventual size of clusters? We first develop a one-dimensional framework for understanding cell clusters that can fragment due to cell motility using an active particle model. We compute analytically how the break rate of cell-cell junctions depends on cell speed, cell persistence, and cell-cell junction properties. Next, we find the cluster size distributions, which differ depending on whether all cells can divide or only the cells on the edge of the cluster divide. Cluster size distributions depend solely on the ratio of the break rate to the growth rate, allowing us to predict how cluster size and variability depend on cell motility and cell-cell mechanics. Our results suggest that organisms can achieve better size control when cell division is restricted to the cluster boundaries or when fracture can be localized to the cluster center. Additionally, we derive a universal survival probability for an intact cluster $S(t)=\mathrm{e}^{-k_d t}$ at steady state if all cells can divide, which is independent of the rupture kinetics and depends solely on the cell division rate $k_d$. Finally, we further corroborate the one-dimensional analytics with two-dimensional simulations, finding quantitative agreement with some, but not all, elements of the theory across a wide range of cell motility. Our results link the general physics problem of a collective active escape over a barrier to size control, providing a quantitative measure of how motility can regulate organ or organism size.
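As a short worked consequence of the exponential survival law quoted above: if $S(t)$ is read as the probability that a cluster is still intact at time $t$ (an interpretation assumed here, with $\tau$ and $t_{1/2}$ introduced only for illustration), the time to first fracture is exponentially distributed with rate $k_d$, so its mean and median follow directly.

```latex
\begin{align}
  S(t) &= \mathrm{e}^{-k_d t}, \\
  \langle \tau \rangle &= \int_0^{\infty} S(t)\,\mathrm{d}t = \frac{1}{k_d}, \\
  S(t_{1/2}) = \tfrac{1}{2} \;&\Longrightarrow\; t_{1/2} = \frac{\ln 2}{k_d}.
\end{align}
```

Under this reading, both timescales are set by the division rate alone, consistent with the abstract's statement that the survival probability is independent of the rupture kinetics.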

2410.11548 2026-04-03 q-bio.PE

Bayesian inference of mixed Gaussian phylogenetic models

Bayu Brahmantio, Krzysztof Bartoszek, Etka Yapar

Details
Journal ref
BMC Bioinformatics 27 (Suppl 1), 77 (2026)
English abstract

Background: The evolution of continuous traits in a group of taxa that are correlated through a phylogenetic tree is commonly modelled using parametric stochastic differential equations, which represent deterministic change of a trait through time while incorporating noise that represents different unobservable evolutionary pressures. A heterogeneous Gaussian process that consists of multiple parametric sub-processes is often used when the observed data come from a very diverse set of taxa. In the maximum likelihood setting, challenges arise when exploring the likelihood surface and when interpreting the uncertainty around the parameters. Results: We extend the methods to tackle inference problems for mixed Gaussian phylogenetic models (MGPMs) by implementing a Bayesian scheme that can take into account biologically relevant priors. The posterior inference method is based on the Population Monte Carlo (PMC) algorithm, which is easily parallelized, and uses an efficient algorithm to calculate the likelihood of phylogenetically correlated observations. A model evaluation method that is based on the proximity of the posterior predictive distribution to the observed data is also implemented. A simulation study is conducted to test the inference and evaluation capability of the method. Finally, we test our method on a real-world dataset. Conclusion: We implement the method in the R package bgphy, available at github.com/bayubeta/bgphy. The simulation study demonstrates that the method is able to infer parameters and evaluate models properly, while its application to the real-world dataset indicates that a carefully selected model of evolution based on naturally occurring classifications results in a better fit to the observed data.

2410.07006 2026-04-03 q-bio.GN

The Mitochondrial Genome of Cathaya argyrophylla Reaches 18.99 Mb: Analysis of Super-Large Mitochondrial Genomes in Pinaceae

Kerui Huang, Wenbo Xu, Haoliang Hu, Xiaolong Jiang, Lei Sun, Wenyan Zhao, Binbin Long, Shaogang Fan, Zhibo Zhou, Ping Mo, Xiaocheng Jiang, Jianhong Tian, Aihua Deng, Peng Xie, Yun Wang

Comments 22 pages, 9 figures

Details
English abstract

Mitochondrial genomes in the Pinaceae family are notable for their large size and structural complexity. In this study, we sequenced and analyzed the mitochondrial genome of Cathaya argyrophylla, an endangered and endemic Pinaceae species, uncovering a genome size of 18.99 Mb, making it the largest mitochondrial genome reported to date. To investigate the mechanisms behind this exceptional size, we conducted comparative analyses with other Pinaceae species possessing both large and small mitochondrial genomes, as well as with other gymnosperms. We focused on repeat sequences, transposable element activity, RNA editing events, chloroplast-derived sequence transfers (mtpts), and sequence homology with nuclear genomes. Our findings indicate that while Cathaya argyrophylla and other extremely large Pinaceae mitochondrial genomes contain substantial amounts of repeat sequences and show increased activity of LINEs and LTR retrotransposons, these factors alone do not fully account for the genome expansion. Notably, we observed a significant incorporation of chloroplast-derived sequences in Cathaya argyrophylla and other large mitochondrial genomes, suggesting that extensive plastid-to-mitochondrial DNA transfer may play a crucial role in genome enlargement. Additionally, large mitochondrial genomes exhibited distinct patterns of RNA editing and limited similarity with nuclear genomes compared to smaller genomes. These results suggest that the massive mitochondrial genomes in Pinaceae are likely the result of multiple contributing factors, including repeat sequences, transposon activity, and extensive plastid sequence incorporation. Our study enhances the understanding of mitochondrial genome evolution in plants and provides valuable genetic information for the conservation and study of Cathaya argyrophylla.