arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.08345 2026-03-10 stat.ME q-bio.QM

Amortized Phylodynamic Inference with Neural Bayes Estimators and Recursive Neural Networks

Alexander E. Zarebski, Thomas Williams, Louis du Plessis

Abstract

Phylodynamics is used to estimate epidemic dynamics from phylogenetic trees or genomic sequences of pathogens, but the likelihood calculations needed can be challenging for complex models. We present a neural Bayes estimator (NBE) for key epidemic quantities: the reproduction number, prevalence, and cumulative infections through time. By performing quantile regression over tree space, the NBE allows us to estimate posterior medians and credible intervals directly from a reconstructed tree. Our approach uses a recursive neural network as a tree embedding network with a prediction network conditioned on time and quantile level to generate the estimates. In simulation studies, the NBE achieves good predictive performance, with conservative uncertainty estimates. Compared with a BEAST2 fixed-tree analysis, the NBE gives less biased estimates of time-varying reproduction numbers in our test setting. Under a misspecified sampling model, the NBE performance degrades (as expected) but remains reasonable, and fine-tuning a pre-trained model yields estimates comparable to those from a model trained from scratch, at substantially lower computational cost.
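
The quantile-regression idea in this abstract can be illustrated with the pinball (quantile) loss, whose minimiser at level tau is the tau-quantile. The sketch below is not the authors' network: it assumes a toy set of scalar samples and a grid search over constant predictions, and the names `pinball_loss` and `best_constant` are hypothetical.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at level tau in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def best_constant(samples, tau, grid):
    """Grid value that minimises the pinball loss over the samples."""
    losses = [pinball_loss(samples, c, tau) for c in grid]
    return float(grid[int(np.argmin(losses))])

# Toy "posterior" samples (an assumption made purely for illustration).
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=5000)
grid = np.linspace(-2.0, 6.0, 801)

median_hat = best_constant(samples, 0.5, grid)    # estimate of the median
lower_hat = best_constant(samples, 0.025, grid)   # lower end of a 95% interval
upper_hat = best_constant(samples, 0.975, grid)   # upper end of a 95% interval
```

Minimising the same loss at tau = 0.025, 0.5 and 0.975 yields an interval and a median without evaluating a likelihood, which is the property the abstract relies on.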

2603.08300 2026-03-10 cond-mat.soft cond-mat.mtrl-sci cond-mat.stat-mech q-bio.BM

A thermodynamic metric quantitatively predicts disordered protein partitioning and multicomponent phase behavior

Zhuang Liu, Beijia Yuan, Mihir Rao, Gautam Reddy, William M. Jacobs

Comments: Includes Supplementary Information

Abstract

Intrinsically disordered regions (IDRs) of proteins mediate sequence-specific interactions underlying diverse cellular processes, including the formation of biomolecular condensates. Although IDRs strongly influence condensate compositions, quantitative frameworks that predict and explain their phase behavior in complex mixtures remain lacking. Here we introduce a thermodynamic model that quantitatively predicts the behavior of arbitrary combinations of IDRs across a wide range of concentrations, with accuracy comparable to state-of-the-art simulations. The model learns low-dimensional, context-independent representations of IDR sequences that combine to form mixture representations, producing context-dependent interactions. These representations define a thermodynamic metric space in which distances between IDRs correspond directly to differences in their thermodynamic properties. We show that the model predicts multicomponent phase diagrams in quantitative agreement with molecular simulations without being trained on free-energy or phase-coexistence data. The metric space provides geometrically intuitive predictions of IDR partitioning, multicomponent condensation, and context-dependent mutational effects, addressing several central problems in IDR biophysics within a single model. Systematic interrogation of the learned representations reveals how amino-acid composition and sequence patterning jointly determine mixture thermodynamics. Together, our results establish a unified and interpretable framework for predicting and understanding the behavior of complex mixtures of IDRs and other sequence-dependent biomolecules.

2602.23885 2026-03-10 q-bio.PE

Bounds on $R_0$ and final epidemic size when the next-generation matrix $M$ is only partially known

Andrea Bizzotto, Frank Ball, Tom Britton

Abstract

We study a multitype SIR epidemic model where individuals are categorized into different types, and where infection spread is characterized by a next-generation matrix $M=\{m_{ij}\}$ with community fractions $\{π_j\}$ for the different types of individuals. We analyse two key quantities: the basic reproduction number $R_0$ and the final epidemic outcome of the different types $\{τ_i\}$. We consider the situation where $M$ is only partly known, through the row sums $\{r_i\}$ or the column sums $\{c_j\}$, and treat both a general $M$ and the special but common situation where $M$ is proportional to a contact matrix satisfying detailed balance. For a general $M$, which is partially observed through $\{r_i\}$ or $\{c_j\}$, we obtain sharp upper and lower bounds of $R_0$ and $\{τ_i\}$, but for the case where $M$ satisfies detailed balance the problem is harder: our obtained bounds for $R_0$ are narrower than the general case but still not sharp, and bounds for the final size are only obtained when there are two types of individual.
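
As background for this abstract, $R_0$ is the spectral radius of the next-generation matrix $M$, and the classical Perron-Frobenius result bounds the spectral radius of any nonnegative matrix between its smallest and largest row sums (and likewise its column sums). The sketch below checks that classical bound on a made-up $3 \times 3$ matrix. It is not the paper's construction, and the paper's sharp bounds, which also involve the community fractions, may differ.

```python
import numpy as np

def basic_reproduction_number(M):
    """R_0 as the spectral radius (largest eigenvalue modulus) of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def sum_bounds(M, axis):
    """Smallest and largest sums along an axis (axis=1: row sums, axis=0: column sums)."""
    sums = M.sum(axis=axis)
    return float(sums.min()), float(sums.max())

# Hypothetical next-generation matrix with nonnegative entries.
M = np.array([[1.2, 0.4, 0.1],
              [0.3, 0.9, 0.5],
              [0.2, 0.6, 1.5]])

r0 = basic_reproduction_number(M)
row_lo, row_hi = sum_bounds(M, axis=1)
col_lo, col_hi = sum_bounds(M, axis=0)
```

Knowing only the row sums (or only the column sums) therefore already confines $R_0$ to an interval, which is the kind of partial-information setting the abstract describes.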

2602.01347 2026-03-10 q-bio.NC cs.HC

Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions

Veith Weilnhammer, Kevin YC Hou, Lennart Luettgau, Christopher Summerfield, Raymond Dolan, Matthew M Nour

Abstract

Millions of users turn to consumer AI chatbots to discuss mental health and behavioral concerns. While this presents unprecedented opportunities to deliver population-level support, it also highlights an urgent need for rigorous and scalable safety evaluations. Here we introduce SIM-VAIL, an AI chatbot auditing framework that captures how harmful chatbot responses manifest across a range of mental health contexts. SIM-VAIL pairs a simulated user, harboring a distinct psychiatric vulnerability and conversational intent, with a frontier AI chatbot. It scores conversation turns on 13 clinically relevant risk dimensions, enabling context-dependent, temporally resolved safety assessment. Across 810 conversations, encompassing over 90,000 turn-level ratings and 30 psychiatric user profiles, we found evidence of concerning chatbot behavior across virtually all user phenotypes and most of the 9 consumer AI chatbots audited, albeit reduced in newer models. Rather than arising abruptly, concerning behavior accumulated over multiple turns. Risk profiles were phenotype-dependent and exhibited trade-offs, indicating that chatbot behaviors that appear supportive in general settings can become maladaptive when they align with mechanisms that sustain a user's vulnerability. These findings identify a systematic failure mode in human-AI interactions, which we term Vulnerability-Amplifying Interaction Loops (VAILs), and underscore the need for multidimensional approaches to risk quantification. SIM-VAIL provides a scalable framework for quantifying how mental health risk is distributed across user phenotypes, conversational trajectories, and clinically grounded behavioral dimensions, offering a new foundation for targeted safety improvements.

2601.15502 2026-03-10 physics.optics physics.bio-ph q-bio.CB q-bio.QM

Optical Manipulation of Erythrocytes via Evanescent Waves: Assessing Glucose-Induced Mobility Variations

T. Troncoso Enríquez, J. Staforelli-Vivanco, I. Bordeu, M. González-Ortiz

Comments: 5 pages, pre-print

Abstract

This study investigates the dynamics of red blood cells (RBCs) under the influence of evanescent waves generated by total internal reflection (TIR). Using a 1064 nm laser system and a dual-chamber prism setup, we quantified the mobility of erythrocytes in different glucose environments. Our methodology integrates automated tracking via TrackMate© to analyze over 60 trajectory sets. The results reveal a significant decrease in mean velocity, from 11.8 μm/s in 5 mM glucose to 8.8 μm/s in 50 mM glucose (p = 0.019). These findings suggest that evanescent waves can serve as a non-invasive tool to probe the mechanical properties of cell membranes influenced by biochemical changes.

2601.14205 2026-03-10 physics.optics physics.bio-ph q-bio.QM

Three-Dimensional Volumetric Reconstruction of Native Chilean Pollen via Lens-Free Digital In-line Holographic Microscopy

J. Staforelli-Vivanco, V. Salamanca-Levi, R. Jofré-Cerda, M. Rondanelli-Reyes, I. Lamas

Comments: 5 pages, pre-print article

Abstract

This study presents a robust methodology for the three-dimensional (3D) volumetric reconstruction and morphological characterization of native Chilean pollen grains using a lens-free Digital In-line Holographic Microscopy (DLHM) system. Utilizing a 532 nm laser point-source configuration and a 3.45 $μ$m pixel pitch CMOS sensor, we achieved a geometric magnification of 50x, resulting in an effective lateral resolution of approximately 69 nm at the object plane. The complex wavefronts of \textit{Anthemis cotula} (chamomile), \textit{Gevuina avellana} (hazel), and \textit{Conium maculatum} (hemlock) were numerically reconstructed via the Kirchhoff-Helmholtz transform to generate high-fidelity 3D refractive index maps. Biophysical parameters were extracted with nanometric precision, with volumes ranging from $3780.2 \pm 18$ $μ$m$^3$ to $4320.5 \pm 15$ $μ$m$^3$. Morphological quantification identified \textit{A. cotula} as the least spherical species ($Ψ= 0.76 \pm 0.03$) due to its characteristic echinate (spiny) exine, while \textit{G. avellana} exhibited the highest sphericity index of $0.89 \pm 0.02$. These results demonstrate that the label-free retrieval of "digital fingerprints" provides a scalable alternative for automated melissopalynology and viability assessment, filling critical geographic data gaps in South American biodiversity hotspots.
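
The 69 nm figure quoted in this abstract is consistent with dividing the sensor pixel pitch by the geometric magnification. The short sketch below reproduces that arithmetic and adds a sphericity helper that assumes the common Wadell definition, $Ψ = π^{1/3}(6V)^{2/3}/A$. The abstract does not state which sphericity definition the authors use, so that formula and the function name `wadell_sphericity` are assumptions.

```python
import math

# Values taken from the abstract.
pixel_pitch_um = 3.45
magnification = 50.0

# Effective pixel size at the object plane, converted from micrometres to nanometres.
effective_pixel_nm = pixel_pitch_um / magnification * 1000.0

def wadell_sphericity(volume, surface_area):
    """Wadell sphericity (assumed definition): 1 for a sphere, smaller otherwise."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / surface_area

# Sanity check on a sphere of radius 10 (arbitrary units).
radius = 10.0
sphere_psi = wadell_sphericity(4.0 / 3.0 * math.pi * radius ** 3,
                               4.0 * math.pi * radius ** 2)
```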

2512.05731 2026-03-10 q-bio.GN

DeeDeeExperiment: Building an infrastructure for integrating and managing omics data analysis results in R/Bioconductor

Najla Abassi, Lea Schwarz, Edoardo Filippi, Federico Marini

Comments: 1 figure

Abstract

Summary: Modern omics experiments now involve multiple conditions and complex designs, producing an increasingly large set of differential expression and functional enrichment analysis results. However, no standardized data structure exists to store and contextualize these results together with their metadata, leaving researchers with an unmanageable and potentially non-reproducible collection of results that are difficult to navigate and/or share. Here we introduce DeeDeeExperiment, a new S4 class for managing and storing omics data analysis results, implemented within the Bioconductor ecosystem, which promotes interoperability, reproducibility and good documentation. This class extends the widely used SingleCellExperiment object by introducing dedicated slots for Differential Expression Analysis (DEA) and Functional Enrichment Analysis (FEA) results, allowing users to organize, store, and retrieve information on multiple contrasts and associated metadata within a single data object, ultimately streamlining the management and interpretation of many omics datasets. Availability and implementation: DeeDeeExperiment is available on Bioconductor under the MIT license (https://bioconductor.org/packages/DeeDeeExperiment), with its development version also available on GitHub (https://github.com/imbeimainz/DeeDeeExperiment).

2503.21403 2026-03-10 math.PR q-bio.PE

Bounds for survival probabilities in supercritical Galton-Watson processes and applications to population genetics

Reinhard Bürger

Journal ref: Journal of Mathematical Biology 92, 40 (2026)
Abstract

Population genetic processes, such as the adaptation of a quantitative trait to directional selection, may occur on longer time scales than the sweep of a single advantageous mutation. To study such processes in finite populations, approximations for the time course of the distribution of a beneficial mutation were derived previously by branching process methods. The application to the evolution of a quantitative trait requires bounds for the probability of survival $S^{(n)}$ up to generation $n$ of a single beneficial mutation. Here, we present a method to obtain a simple, analytically explicit, either upper or lower, bound for $S^{(n)}$ in a supercritical Galton-Watson process. We prove the existence of an upper bound for offspring distributions including Poisson, binomial, and negative binomial. They are constructed by bounding the given generating function, $φ$, by a fractional linear one that has the same survival probability $S^\infty$ and yields the same rate of convergence of $S^{(n)}$ to $S^\infty$ as $φ$. For distributions with at most three offspring, we characterize when this method yields an upper bound, a lower bound, or only an approximation. Because for many distributions it is difficult to get a handle on $S^\infty$, we derive an approximation by series expansion in $s$, where $s$ is the selective advantage of the mutant. We briefly review well-known asymptotic results that generalize Haldane's approximation $2s$ for $S^\infty$, as well as less well-known results on sharp bounds for $S^\infty$. We apply them to explore when bounds for $S^{(n)}$ exist for a family of generalized Poisson distributions. Numerical results demonstrate the accuracy of our and of previously derived bounds for $S^\infty$ and $S^{(n)}$. Finally, as an application we determine the response of a quantitative trait caused by new beneficial mutations to prolonged directional selection.
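
The quantity $S^{(n)}$ discussed in this abstract can be computed numerically by iterating the offspring generating function $φ$: the probability of extinction by generation $n$ is the $n$-fold composition of $φ$ evaluated at 0, and $S^{(n)}$ is its complement. The sketch below does this for a Poisson offspring distribution with mean $1+s$ and compares the limit with Haldane's $2s$. It illustrates the standard recursion only, not the paper's fractional-linear bounds, and the function names are hypothetical.

```python
import math

def survival_probabilities(pgf, n_generations):
    """Return [S^(1), ..., S^(n)], where S^(n) = 1 - pgf composed n times at 0."""
    q = 0.0  # probability that the lineage is already extinct at generation 0
    survival = []
    for _ in range(n_generations):
        q = pgf(q)               # extinction probability one generation later
        survival.append(1.0 - q)
    return survival

def poisson_pgf(mean):
    """Probability generating function of a Poisson offspring distribution."""
    return lambda x: math.exp(mean * (x - 1.0))

s = 0.05                                   # selective advantage (illustrative value)
S = survival_probabilities(poisson_pgf(1.0 + s), 2000)
S_inf = S[-1]                              # numerically converged survival probability
haldane = 2.0 * s                          # Haldane's approximation
```

For small $s$ the converged value sits slightly below $2s$, and the sequence $S^{(n)}$ decreases monotonically towards it, which is why explicit bounds on the approach to $S^\infty$ are useful.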

2502.13606 2026-03-10 q-bio.NC cs.AI cs.CL cs.CV cs.LG

LaVCa: LLM-assisted Visual Cortex Captioning

Takuya Matsuyama, Shinji Nishimoto, Yu Takagi

Comments: Accepted to ICLR 2026. Website: https://sites.google.com/view/lavca-llm/

Abstract

Understanding the properties of neural populations (or voxels) in the human brain can advance our comprehension of human perceptual and cognitive processing capabilities and contribute to developing brain-inspired computer models. Recent encoding models using deep neural networks (DNNs) have successfully predicted voxel-wise activity. However, interpreting the properties that explain voxel responses remains challenging because of the black-box nature of DNNs. As a solution, we propose LLM-assisted Visual Cortex Captioning (LaVCa), a data-driven approach that uses large language models (LLMs) to generate natural-language captions for images to which voxels are selective. By applying LaVCa to image-evoked brain activity, we demonstrate that LaVCa generates captions that describe voxel selectivity more accurately than the previously proposed method. Furthermore, the captions generated by LaVCa quantitatively capture more detailed properties than the existing method at both the inter-voxel and intra-voxel levels. In addition, a more detailed analysis of the voxel-specific properties generated by LaVCa reveals fine-grained functional differentiation within regions of interest (ROIs) in the visual cortex and voxels that simultaneously represent multiple distinct concepts. These findings offer profound insights into human visual representations by assigning detailed captions throughout the visual cortex while highlighting the potential of LLM-based methods in understanding brain representations.

2502.03569 2026-03-10 cs.LG q-bio.GN q-bio.PE

Controllable Sequence Editing for Biological and Clinical Trajectories

Michelle M. Li, Kevin Li, Yasha Ektefaie, Ying Jin, Yepeng Huang, Shvat Messica, Tianxi Cai, Marinka Zitnik

Comments: ICLR 2026

Abstract

Conditional generation models for longitudinal sequences can produce new or modified trajectories given a conditioning input. However, they often lack control over when the condition should take effect (timing) and which variables it should influence (scope). Most methods either operate only on univariate sequences or assume that the condition alters all variables and time steps. In scientific and clinical settings, interventions instead begin at a specific moment, such as the time of drug administration or surgery, and influence only a subset of measurements while the rest of the trajectory remains unchanged. CLEF learns temporal concepts that encode how and when a condition alters future sequence evolution. These concepts allow CLEF to apply targeted edits to the affected time steps and variables while preserving the rest of the sequence. We evaluate CLEF on 8 datasets spanning cellular reprogramming, patient health, and sales, comparing against 9 state-of-the-art baselines. CLEF improves immediate sequence editing accuracy by 16.28% (MAE) on average against their non-CLEF counterparts. Unlike prior models, CLEF enables one-step conditional generation at arbitrary future times, outperforming their non-CLEF counterparts in delayed sequence editing by 26.73% (MAE) on average. We test CLEF under counterfactual inference assumptions and show up to 62.84% (MAE) improvement on zero-shot conditional generation of counterfactual trajectories. In a case study of patients with type 1 diabetes mellitus, CLEF identifies clinical interventions that generate realistic counterfactual trajectories shifted toward healthier outcomes.

2603.08062 2026-03-10 cs.LG q-bio.GN

Adversarial Domain Adaptation Enables Knowledge Transfer Across Heterogeneous RNA-Seq Datasets

Kevin Dradjat, Massinissa Hamidi, Blaise Hanczar

Comments: 7 pages, 5 figures. Submitted to ECCB 2026

Abstract

Accurate phenotype prediction from RNA sequencing (RNA-seq) data is essential for diagnosis, biomarker discovery, and personalized medicine. Deep learning models have demonstrated strong potential to outperform classical machine learning approaches, but their performance relies on large, well-annotated datasets. In transcriptomics, such datasets are frequently limited, leading to over-fitting and poor generalization. Knowledge transfer from larger, more general datasets can alleviate this issue. However, transferring information across RNA-seq datasets remains challenging due to heterogeneous preprocessing pipelines and differences in target phenotypes. In this study, we propose a deep learning-based domain adaptation framework that enables effective knowledge transfer from a large general dataset to a smaller one for cancer type classification. The method learns a domain-invariant latent space by jointly optimizing classification and domain alignment objectives. To ensure stable training and robustness in data-scarce scenarios, the framework is trained with an adversarial approach with appropriate regularization. Both supervised and unsupervised approach variants are explored, leveraging labeled or unlabeled target samples. The framework is evaluated on three large-scale transcriptomic datasets (TCGA, ARCHS4, GTEx) to assess its ability to transfer knowledge across cohorts. Experimental results demonstrate consistent improvements in cancer and tissue type classification accuracy compared to non-adaptive baselines, particularly in low-data scenarios. Overall, this work highlights domain adaptation as a powerful strategy for data-efficient knowledge transfer in transcriptomics, enabling robust phenotype prediction under constrained data conditions.

2603.07710 2026-03-10 cs.LG q-bio.BM

Reverse Distillation: Consistently Scaling Protein Language Model Representations

Darius Catrina, Christian Bepler, Samuel Sledzieski, Rohit Singh

Comments: Proceedings of ICLR 2026

Abstract

Unlike the predictable scaling laws in natural language processing and computer vision, protein language models (PLMs) scale poorly: for many tasks, models within the same family plateau or even decrease in performance, with mid-sized models often outperforming the largest in the family. We introduce Reverse Distillation, a principled framework that decomposes large PLM representations into orthogonal subspaces guided by smaller models of the same family. The resulting embeddings have a nested, Matryoshka-style structure: the first k dimensions of a larger model's embedding are exactly the representation from the smaller model. This ensures that larger reverse-distilled models consistently outperform smaller ones. A motivating intuition is that smaller models, constrained by capacity, preferentially encode broadly-shared protein features. Reverse distillation isolates these shared features and orthogonally extracts additional contributions from larger models, preventing interference between the two. On ProteinGym benchmarks, reverse-distilled ESM-2 variants outperform their respective baselines at the same embedding dimensionality, with the reverse-distilled 15 billion parameter model achieving the strongest performance. Our framework is generalizable to any model family where scaling challenges persist. Code and trained models are available at https://github.com/rohitsinghlab/plm_reverse_distillation.
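
The nested structure described in this abstract can be sketched with plain linear algebra. The code below is an assumption-laden illustration, not the released implementation: it regresses the large model's embeddings on the small model's by least squares, keeps the leading principal components of the residual, and concatenates them after the small embeddings. The function name `nested_embedding` and the random toy data are hypothetical.

```python
import numpy as np

def nested_embedding(small, large):
    """Stack `small` with the part of `large` it cannot explain linearly.

    small: (n, k) embeddings from the smaller model.
    large: (n, d) embeddings from the larger model, with d > k.
    Returns an (n, d) array whose first k columns equal `small`.
    """
    k = small.shape[1]
    d = large.shape[1]
    # Least-squares fit of `large` from `small`; the residual is orthogonal
    # to the column space of `small`.
    coef, *_ = np.linalg.lstsq(small, large, rcond=None)
    residual = large - small @ coef
    # Leading principal directions of the residual give the extra coordinates.
    u, sing, _ = np.linalg.svd(residual, full_matrices=False)
    extra = u[:, : d - k] * sing[: d - k]
    return np.hstack([small, extra])

# Toy data: the large embedding partly re-encodes the small one.
rng = np.random.default_rng(0)
small = rng.normal(size=(200, 4))
large = np.hstack([small @ rng.normal(size=(4, 4)), rng.normal(size=(200, 6))])
emb = nested_embedding(small, large)
```

Because the extra coordinates are orthogonal to the small model's features, a linear probe on the nested embedding can do no worse on the training data than one on the small embedding alone, which mirrors the consistency property the abstract claims.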

2603.07369 2026-03-10 q-bio.NC cs.CV

Task learning increases information redundancy of neural responses in macaque visual cortex

Shizhao Liu, Anton Pletenev, Ralf M. Haefner, Adam C. Snyder

Comments: published in Science, accepted manuscript prior to editing, main text: 33 pages, 5 figures, 39 supplementary pages, 22 supplementary figures, 7 supplementary tables

Journal ref: Science, 391(6789), 1029-1035 (2026)
Abstract

How does the brain optimize sensory information for decision-making in new tasks? One hypothesis suggests learning reduces redundancy in neural representations to improve efficiency, while another, based on Bayesian inference, predicts learning increases redundancy by distributing information across neurons. We tested these hypotheses by tracking population responses in macaque cortical area V4 as monkeys learned visual discrimination tasks. We found strong support for the Bayesian predictions: task learning increased redundancy in neural responses over weeks of training and within single trials. This redundancy did not reduce information but instead increased the information carried by individual neurons. These insights suggest sensory processing in the brain reflects a generative rather than discriminative inference process.

2603.07364 2026-03-10 q-bio.QM cs.HC q-bio.NC

Neural Control and Learning of Simulated Hand Movements With an EMG-Based Closed-Loop Interface

Balint K. Hodossy, Dario Farina

Abstract

The standard engineering approach when facing uncertainty is modelling. Mixing data from a well-calibrated model with real recordings has led to breakthroughs in many applications of AI, from computer vision to autonomous driving. This type of model-based data augmentation is now beginning to show promising results in biosignal processing as well. However, while these simulated data are necessary, they are not sufficient for virtual neurophysiological experiments. Simply generating neural signals that reproduce a predetermined motor behaviour does not capture the flexibility, variability, and causal structure required to probe neural mechanisms during control tasks. In this study, we present an in silico neuromechanical model that combines a fully forward musculoskeletal simulation, reinforcement learning, and sequential, online electromyography synthesis. This framework provides not only synchronised kinematics, dynamics, and corresponding neural activity, but also explicitly models feedback and feedforward control in a virtual participant. In this way, online control problems can be represented, as the simulated human adapts its behaviour via a learned RL policy in response to a neural interface. For example, the virtual user can learn hand movements robust to perturbations or the control of a virtual gesture decoder. We illustrate the approach using a gesturing task within a biomechanical hand model, and lay the groundwork for using this technique to evaluate neural controllers, augment training datasets, and generate synthetic data for neurological conditions.

2603.01396 2026-03-10 cs.AI cs.CE q-bio.QM

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun

Comments: 18 pages total (8 pages main text + appendix), 6 figures

Abstract

Single-cell perturbation studies face dual heterogeneity bottlenecks: (i) semantic heterogeneity--identical biological concepts encoded under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity--distribution shifts from biological variation demanding dataset-specific inductive biases. We propose HarmonyCell, an end-to-end agent framework resolving each challenge through a dedicated mechanism: an LLM-driven Semantic Unifier autonomously maps disparate metadata into a canonical interface without manual intervention; and an adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures with optimal statistical inductive biases for distribution shifts. Evaluated across diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieves a 95% valid execution rate on heterogeneous input datasets (versus 0% for general agents) while matching or even exceeding expert-designed baselines in rigorous out-of-distribution evaluations. This dual-track orchestration enables scalable automatic virtual cell modeling without dataset-specific engineering.

2602.22263 2026-03-10 q-bio.BM cs.AI eess.IV q-bio.QM

CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints

Fuyao Huang, Xiaozhu Yu, Kui Xu, Qiangfeng Cliff Zhang

Comments: Published as a conference paper at ICLR 2026

Abstract

High-resolution structure determination by cryo-electron microscopy (cryo-EM) requires the accurate fitting of an atomic model into an experimental density map. Traditional refinement pipelines such as Phenix.real_space_refine and Rosetta are computationally expensive, demand extensive manual tuning, and present a significant bottleneck for researchers. We present CryoNet.Refine, an end-to-end deep learning framework that automates and accelerates molecular structure refinement. Our approach utilizes a one-step diffusion model that integrates a density-aware loss function with robust stereochemical restraints, enabling rapid optimization of a structure against experimental data. CryoNet.Refine provides a unified and versatile solution capable of refining protein complexes as well as DNA/RNA-protein complexes. In benchmarks against Phenix.real_space_refine, CryoNet.Refine consistently achieves substantial improvements in both model-map correlation and overall geometric quality metrics. By offering a scalable, automated, and powerful alternative, CryoNet.Refine aims to serve as an essential tool for next-generation cryo-EM structure refinement. Web server: https://cryonet.ai/refine; Source code: https://github.com/kuixu/cryonet.refine.

2512.00126 2026-03-10 q-bio.QM cs.AI

RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding

Jin Han, Tianfan Fu, Wu-Jun Li

Abstract

Protein inverse folding, the design of an amino acid sequence based on a target protein structure, is a fundamental problem of computational protein engineering. Existing methods either generate sequences without leveraging external knowledge or rely on protein language models (PLMs). The former omits the knowledge stored in natural protein data, while the latter is parameter-inefficient and inflexible to adapt to ever-growing protein data. To overcome the above drawbacks, in this paper we propose a novel method, called retrieval-augmented denoising diffusion (RadDiff), for protein inverse folding. In RadDiff, a novel retrieval-augmentation mechanism is designed to capture the up-to-date protein knowledge. We further design a knowledge-aware diffusion model that integrates this protein knowledge into the diffusion process via a lightweight module. Experimental results on the CATH, TS50, and PDB2022 datasets show that RadDiff consistently outperforms existing methods, improving sequence recovery rate by up to 19%. Experimental results also demonstrate that RadDiff generates highly foldable sequences and scales effectively with database size.

2511.20859 2026-03-10 cs.GT cs.AI cs.MA econ.TH q-bio.PE

Computing Evolutionarily Stable Strategies in Multiplayer Games

Sam Ganzfried

Comments: Reverting to original title after fixing Google scholar merge

Abstract

We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.

2511.08996 2026-03-10 q-bio.GN

Partial domain adaptation enables cross domain cell type annotation between scRNA-seq and snRNA-seq

Xiran Chen, Quan Zou, Qinyu Cai, Xiaofeng Chen, Weikai Li, Yansu Wang

Abstract

Accurate cell type annotation across datasets is a key challenge in single-cell analysis. snRNA-seq enables profiling of frozen or difficult-to-dissociate tissues, complementing scRNA-seq by capturing fragile or rare cell types. However, cross-annotation between these two data types remains largely unexplored, as existing methods treat them independently. We introduce ScNucAdapt, a method designed for cross-annotation between paired and unpaired scRNA-seq and snRNA-seq datasets. To address distributional and cell composition differences, ScNucAdapt employs partial domain adaptation. Experiments across both unpaired and paired scRNA-seq and snRNA-seq datasets show that ScNucAdapt achieves robust and accurate cell type annotation, outperforming existing approaches. Therefore, ScNucAdapt provides a practical framework for cross-domain cell type annotation between scRNA-seq and snRNA-seq data.

2510.01089 2026-03-10 cs.LG q-bio.QM

Double projection for reconstructing dynamical systems: between stochastic and deterministic regimes

Viktor Sip, Martin Breyton, Spase Petkoski, Viktor Jirsa

Abstract

Learning stochastic models of dynamical systems from observed data is of interest in many scientific fields. Here, we propose a new method for this task within the family of dynamical variational autoencoders. The proposed double projection method estimates both the system state trajectories and the noise time series from data. This approach naturally allows us to perform multi-step system evolution and to learn models with a comparatively low-dimensional state space. We evaluate the performance of the method on six benchmark problems, including both simulated and experimental data. We further illustrate the effects of the teacher forcing interval of the multi-step scheme on the nature of the internal dynamics and compare the resulting behavior to that of deterministic models of equivalent architecture.

2508.01920 2026-03-10 q-bio.NC q-bio.QM stat.AP

CITS: Nonparametric Statistical Causal Modeling for High-Resolution Neural Time Series

Rahul Biswas, SuryaNarayana Sripada, Somabha Mukherjee, Reza Abbasi-Asl

Comments: arXiv admin note: text overlap with arXiv:2312.09604

Abstract

Identifying causal interactions in complex dynamical systems is a fundamental challenge across the computational sciences. Existing functional connectivity methods capture correlations but not causation. While addressing directionality, popular causal inference tools such as Granger causality and the Peter-Clark algorithm rely on restrictive assumptions that limit their applicability to high-resolution time-series data, such as the large-scale recordings now standard in neuroscience. Here, we introduce CITS (Causal Inference in Time Series), a nonparametric framework for inferring statistically causal structure from multivariate time series. CITS models dynamics using a structural causal model of arbitrary Markov order and statistical tests for lagged conditional independence. We prove consistency under mild assumptions and demonstrate superior accuracy over state-of-the-art baselines across simulated linear, nonlinear, and recurrent neural network benchmarks. Applying CITS to large-scale neuronal recordings from the mouse visual cortex, thalamus, and hippocampus, we uncover stimulus-specific causal pathways and inter-regional hierarchies that align with known anatomy while revealing new functional insights. We further highlight the ability of CITS to accurately identify conditional dependencies within small inferred neuronal motifs. These results establish CITS as a theoretically grounded and empirically validated method for discovering interpretable statistically causal networks in neural time series. Beyond neuroscience, the framework is broadly applicable to causal discovery in complex temporal systems across domains.

2506.07842 2026-03-10 q-bio.PE

Simulating nationwide coupled disease and fear spread in an agent-based model

Joy Kitson, Prescott C. Alexander, Joseph Tuccillo, David J. Butts, Christa Brelsford, Abhinav Bhatele, Sara Y. Del Valle, Timothy C. Germann

Comments 21 pages, 8 figures, 2 tables

Details
Journal ref
Scientific Reports, 15(1), 42235 (2025)
English abstract

Human cognitive responses, behavioral responses, and disease dynamics co-evolve over the course of any disease outbreak, and can result in complex feedbacks. We present a dynamic agent-based model that explicitly couples the spread of disease with the spread of fear surrounding the disease, implemented within the EpiCast simulation framework. EpiCast models transmission across a realistic synthetic population, capturing individual-level interactions. In our model, fear propagates through both in-person contact and broadcast media, prompting individuals to adopt protective behaviors that reduce disease spread. In order to better understand these coupled dynamics, we create and compare a range of compartmental surrogate models to analyze the impact of including various disease states. Additionally, we compare a range of behavioral scenarios within EpiCast, varying the level and intensity of fear and behavioral change. Our results show that the addition of asymptomatic, exposed, and pre-symptomatic disease states can impact both the rate at which an outbreak progresses and its overall trajectory. Moreover, the combination of non-local fear spread via broadcasters and strong behavioral responses by fearful individuals generally leads to multiple epidemic waves, an outcome that occurs only within a narrow parameter range when fear spreads purely through local contact. Accounting for the coupled spread of fear and disease is critical for understanding disease dynamics and designing timely, targeted responses to emerging infectious threats.

2503.20817 2026-03-10 q-bio.QM

Label-free pathological subtyping of non-small cell lung cancer using deep classification and virtual immunohistochemical staining

Zhenya Zang, David A Dorward, Katherine E Quiohilag, Andrew DJ Wood, James R Hopgood, Ahsan R Akram, Qiang Wang

Comments Main article: 27 pages, 6 figures, and 1 table. Supplementary information: 12 figures and 6 tables. Accepted by NPJ Digital Medicine

Details
English abstract

The differentiation between pathological subtypes of non-small cell lung cancer (NSCLC) is an essential step in guiding treatment options and prognosis. However, current clinical practice relies on multi-step staining and labelling processes that are time-intensive and costly, requiring highly specialised expertise. In this study, we propose a label-free methodology that facilitates autofluorescence imaging of unstained NSCLC samples and deep learning (DL) techniques to distinguish between non-cancerous tissue, adenocarcinoma (AC), squamous cell carcinoma (SqCC), and other subtypes (OS). We conducted DL-based classification and generated virtual immunohistochemical (IHC) stains, including thyroid transcription factor-1 (TTF-1) for AC and p40 for SqCC, and evaluated these methods using two types of autofluorescence imaging: intensity imaging and lifetime imaging. The results demonstrate the exceptional ability of this approach for NSCLC subtype differentiation, achieving an area under the curve above 0.981 and 0.996 for binary- and multi-class classification. Furthermore, this approach produces clinical-grade virtual IHC staining which was blind-evaluated by three experienced thoracic pathologists. Our label-free NSCLC subtyping approach enables rapid and accurate diagnosis without conventional tissue processing and staining. Both strategies can significantly accelerate diagnostic workflows and support efficient lung cancer diagnosis, without compromising clinical decision-making.

2503.19935 2026-03-10 q-bio.QM

CAN-STRESS: A Real-World Multimodal Dataset for Understanding Cannabis Use, Stress, and Physiological Responses

Reza Rahimi Azghan, Nicholas C. Glodosky, Ramesh Kumar Sah, Carrie Cuttler, Ryan McLaughlin, Michael J. Cleveland, Hassan Ghasemzadeh

Details
English abstract

Coping with stress is one of the most frequently cited reasons for chronic cannabis use. Therefore, it is hypothesized that cannabis users exhibit distinct physiological stress responses compared to non-users, and that these differences would be more pronounced during moments of consumption. However, there is a scarcity of publicly available datasets that allow such hypotheses to be tested in real-world environments. This paper introduces a dataset named CAN-STRESS, collected using Empatica E4 wristbands. The dataset includes physiological measurements such as skin conductance, heart rate, and skin temperature from 82 participants (39 cannabis users and 43 non-users) as they went about their daily lives. Additionally, the dataset includes self-reported surveys in which participants documented moments of cannabis consumption and exercise and rated their perceived stress levels during those moments. In this paper, we publicly release the CAN-STRESS dataset, which we believe serves as a highly reliable resource for examining the impact of cannabis on stress and its associated physiological markers.

2503.05031 2026-03-10 eess.IV cs.AI cs.CV q-bio.NC

Enhancing Alzheimer's Diagnosis: Leveraging Anatomical Landmarks in Graph Convolutional Neural Networks on Tetrahedral Meshes

Yanxi Chen, Mohammad Farazi, Zhangsihao Yang, Yonghui Fan, Nicholas Ashton, Eric M Reiman, Yi Su, Yalin Wang

Details
English abstract

Alzheimer's disease (AD) is a major neurodegenerative condition that affects millions around the world. As one of the main biomarkers in the AD diagnosis procedure, brain amyloid positivity is typically identified by positron emission tomography (PET), which is costly and invasive. Brain structural magnetic resonance imaging (sMRI) may provide a safer and more convenient solution for AD diagnosis. Recent advances in geometric deep learning have facilitated sMRI analysis and early diagnosis of AD. However, determining AD pathology, such as brain amyloid deposition, in the preclinical stage remains challenging, as the observable morphological changes are less pronounced. As a result, few AD classification models generalize to the brain amyloid positivity classification task. Blood-based biomarkers (BBBMs), on the other hand, have recently achieved remarkable success in predicting brain amyloid positivity and identifying individuals at high risk of being brain amyloid positive. However, individuals in the medium-risk group still require gold-standard tests such as amyloid PET for further evaluation. Inspired by the recent success of transformer architectures, we propose a transformer-based geometric deep learning model that is both scalable and robust to variations in input volumetric mesh size. Our work introduces a novel tokenization scheme for tetrahedral meshes, incorporating anatomical landmarks generated by a pre-trained Gaussian process model. Our model achieved superior classification performance in the AD classification task. In addition, we showed that the model also generalizes to brain amyloid positivity prediction for individuals in the medium-risk class, where BBBMs alone cannot achieve a clear classification. Our work may enrich geometric deep learning research and improve AD diagnosis accuracy without using expensive and invasive PET scans.

2412.21159 2026-03-10 q-bio.QM

UNISEP: A Unified Sensor Placement Framework for Human Motion Capture and Wearables

Julius Welzel, Sein Jeung, Lara Godbersen, Seyed Yahya Shirazi

Comments 14 pages, 2 tables. GitHub repository and page are available from the code availability section

Details
English abstract

The proliferation of wearable sensors and monitoring technologies has created a need for standardized sensor placement protocols. While existing standards like the Surface Electromyography for Non-Invasive Assessment of Muscles (SENIAM) recommendations for electromyography (EMG) and the 10-20 system for electroencephalography (EEG) address modality-specific applications, no comprehensive framework spans different sensing modalities and applications. We present the Unified Sensor Placement (UNISEP) framework to facilitate reproducible handling of human movement and physiological data across various systems and research domains. The framework provides a method to describe coordinate systems and placement protocols based on anatomical landmarks, and is designed to complement existing data-sharing standards such as the Brain Imaging Data Structure (BIDS) and Hierarchical Event Descriptors (HED). Even during its proposal stage, the UNISEP approach has been adopted by the EMG-BIDS extension (BIDS version 1.11.0), confirming the community need for a unified, machine-readable sensor placement framework. The UNISEP framework facilitates consistency, reproducibility, and interoperability in applications ranging from lab-based clinical biomechanics to continuous health monitoring in everyday life.

2403.14629 2026-03-10 physics.bio-ph physics.data-an q-bio.OT

Physics-based signal analysis of genome sequences: GenomeBits overview

E. Canessa

Comments To appear in Microorganisms (2023). 5 figures, 12 pages

Details
Journal ref
Microorganisms 11 (2023) 2733
English abstract

A comprehensive overview of the recent physics-inspired genome analysis tool, GenomeBits, is presented. The tool is based on traditional signal processing methods such as the Discrete Fourier Transform (DFT). GenomeBits can be used to extract underlying genomic features from the distribution of nucleotides, and can be further used to analyze the mutation patterns in viral genomes. Examples of the main GenomeBits findings, outlining the intrinsic signal organization of genomic sequences for different SARS-CoV-2 variants over the pandemic years 2020-2022 and for Monkeypox cases in 2021, are presented to show the usefulness of GenomeBits. GenomeBits results for the DFT of SARS-CoV-2 genomes in different geographical regions are discussed together with the GenomeBits analysis of complete genome sequences for the first coronavirus variants reported: Alpha, Beta, Gamma, Epsilon and Eta. Interesting features of the Delta and Omicron variants, in the form of a unique "order-disorder" transition, are uncovered from these samples as well as from their cumulative distribution function and scatter plots. This class of transitions might reveal the cumulative outcome of mutations on the spike protein. A salient feature of GenomeBits is the mapping of the nucleotide bases (A,T,C,G) into an alternating spin-like numerical sequence via a series having binary (0,1) indicators for each A,T,C,G. This mapping yields a set of statistical distribution curves. Furthermore, the quantum-based extension of the GenomeBits model to an analogous probability measure is shown to identify properties of genome sequences as wavefunctions via a superposition of states. An association between the integral of the GenomeBits coding and a binding-like energy can in principle also be established. The relevance of these different results in Bioinformatics is analyzed.
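
The alternating spin-like mapping described in this abstract can be illustrated with a minimal Python sketch. It assumes that each position contributes a 0/1 indicator for the chosen base multiplied by a sign that alternates with position, that the running sum of these terms forms the series, and that a plain DFT is applied to the signed values. The function names and the sign convention (positive at the first position) are hypothetical choices for illustration and are not taken from the GenomeBits paper.

```python
import cmath
from itertools import accumulate


def signed_indicator_series(sequence, base):
    """Per-position values: +1 or -1 (alternating with position) where
    the nucleotide equals `base`, and 0 elsewhere.
    Assumed convention: the first position carries a positive sign."""
    values = []
    for j, nucleotide in enumerate(sequence.upper()):
        indicator = 1 if nucleotide == base else 0  # binary (0,1) indicator
        sign = 1 if j % 2 == 0 else -1              # alternating sign
        values.append(sign * indicator)
    return values


def running_sums(values):
    """Cumulative sum of the signed indicator values along the sequence."""
    return list(accumulate(values))


def dft_magnitudes(values):
    """Magnitudes of the discrete Fourier transform, computed directly
    from the definition (O(n^2), fine for short illustrative inputs)."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values)))
        for k in range(n)
    ]


if __name__ == "__main__":
    seq = "ATAAGA"  # toy sequence, not real genome data
    signed = signed_indicator_series(seq, "A")
    print(signed)
    print(running_sums(signed))
    print(dft_magnitudes(signed))
```

For real genomes, the direct DFT loop would be replaced by an FFT routine; the pure standard-library version is used here only to keep the sketch self-contained.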

2303.02157 2026-03-10 eess.IV eess.SP q-bio.QM

Expectation-maximization for structure determination directly from cryo-EM micrographs

Shay Kreymer, Amit Singer, Tamir Bendory

Details
English abstract

A single-particle cryo-electron microscopy (cryo-EM) measurement, called a micrograph, consists of multiple two-dimensional tomographic projections of a three-dimensional (3-D) molecular structure at unknown locations, taken under unknown viewing directions. All existing cryo-EM algorithmic pipelines first locate and extract the projection images, and then reconstruct the structure from the extracted images. However, if the molecular structure is small, the signal-to-noise ratio (SNR) of the data is very low, making it challenging to accurately detect projection images within the micrograph. Consequently, all standard techniques fail in low-SNR regimes. To recover molecular structures from measurements of low SNR, and in particular small molecular structures, we devise an approximate expectation-maximization algorithm to estimate the 3-D structure directly from the micrograph, bypassing the need to locate the projection images. We corroborate our computational scheme with numerical experiments and present successful structure recoveries from simulated noisy measurements.

2203.11578 2026-03-10 q-bio.CB

Mathematical modeling of glioma invasion and therapy approaches via kinetic theory of active particles

Martina Conte, Yvonne Dzierma, Sven Knobe, Christina Surulescu

Comments 30 pages, 12 figures

Details
Journal ref
Mathematical Models and Methods in Applied Sciences, 33(5): 1009-1051 (2023)
English abstract

We propose here a multiscale model to study the effect of combined therapies on glioma spread in the brain under the influence of vascularization. The model accounts for the interplay between the different components of the neoplasm and the healthy tissue, and it investigates and compares various therapy approaches. Specifically, these involve radio- and chemotherapy in a concurrent or adjuvant manner together with anti-angiogenic therapy affecting the vascular component of the system. We assess tumor growth and spread on the basis of DTI data, which allows us to reconstruct a realistic brain geometry and tissue structure, and we apply our model to real glioma patient data. In this latter case, a space-dependent radiotherapy description is considered using data about the corresponding isodose curves.

2603.07279 2026-03-10 q-bio.QM

Learning When to Look: On-Demand Keypoint-Video Fusion for Animal Behavior Analysis

Weihan Li, Jingyang Ke, Yule Wang, Chengrui Li, Anqi Wu

Details
English abstract

Understanding animal behavior from video is essential for neuroscience research. Modern laboratories typically collect two complementary data streams: skeletal keypoints from pose estimation tools and raw video recordings. Keypoint-based methods are efficient but suffer from geometric ambiguity, environmental blindness, and sensitivity to occlusions. Video-based methods capture rich context but require processing every frame, making them impractical for the hundreds of hours of recordings that modern experiments produce. We introduce LookAgain, a multimodal framework that combines the efficiency of keypoints with the representational power of video through on-demand visual grounding. During training, LookAgain uses dense visual features to pretrain a motion encoder and to train a gating module that learns which frames require visual context. During inference, this gating module activates visual processing only when keypoint signals are ambiguous, while maintaining performance comparable to using all frames. Experiments on single-animal and multi-animal benchmarks show that LookAgain achieves strong performance with significantly reduced computational cost, enabling high-quality behavior analysis on long-duration recordings.