arXivDaily: arXiv daily academic digest, updated Monday to Friday
2601.16384 2026-01-26 q-bio.CB

Modeling tumor progression in heterogeneous microenvironments: A cellular automata approach

Yue Deng, Mingjing Li, Jinzhi Lei

Comments 20 pages, 11 figures, 4 tables

Understanding how microenvironmental heterogeneity influences tumor progression is essential for advancing both cancer biology and therapeutic strategies. In this study, we develop a cellular automata (CA) model to simulate tumor growth under varying microenvironmental conditions and genetic mutation rates, addressing a gap in existing studies that rarely integrate these two factors to explain tumor dynamics. The model explicitly incorporates the cellular heterogeneity of stem and non-stem cells, dynamic cell-cell interactions, and tumor-microenvironment crosstalk. Using computational simulations, we examine the synergistic effects of gene mutation rate, initial tumor burden, and microenvironmental state on tumor progression. Our results demonstrate that lowering the mutation rate significantly mitigates tumor expansion and preserves microenvironmental integrity. Interestingly, the initial tumor burden has a limited impact, whereas the initial condition of the microenvironment critically shapes tumor dynamics. A supportive microenvironment promotes proliferation and spatial invasion, while inhibitory conditions suppress tumor growth. These findings highlight the key role of microenvironmental modulation in tumor evolution and provide computational insights that may inform more effective cancer therapies.
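To make "cellular automata approach" concrete, here is a minimal sketch of a lattice CA with stem and non-stem cells and a per-site microenvironment field. It is not the authors' model: the update rules, the parameter names `p_div`, `p_sym`, `p_death`, and the `support` grid (1 = supportive, 0 = inhibitory) are illustrative assumptions, and mutations are omitted for brevity.

```python
import random

EMPTY, STEM, NONSTEM = 0, 1, 2
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # von Neumann neighbourhood


def step(grid, support, rng, p_div=0.5, p_sym=0.2, p_death=0.05):
    """One sweep in random order; support[i][j] in [0, 1] scales division (assumed rule)."""
    n = len(grid)
    sites = [(i, j) for i in range(n) for j in range(n) if grid[i][j] != EMPTY]
    rng.shuffle(sites)  # random update order avoids directional bias
    for i, j in sites:
        state = grid[i][j]
        if state == NONSTEM and rng.random() < p_death:
            grid[i][j] = EMPTY  # in this sketch only non-stem cells die
            continue
        if rng.random() >= p_div * support[i][j]:
            continue  # no division this step
        free = [(i + di, j + dj) for di, dj in NEIGHBOURS
                if 0 <= i + di < n and 0 <= j + dj < n
                and grid[i + di][j + dj] == EMPTY]
        if not free:
            continue  # contact inhibition: no room for a daughter cell
        ni, nj = rng.choice(free)
        if state == STEM:
            # symmetric (two stem cells) or asymmetric (stem + non-stem) division
            grid[ni][nj] = STEM if rng.random() < p_sym else NONSTEM
        else:
            grid[ni][nj] = NONSTEM


def simulate(n=21, steps=20, support_level=1.0, seed=0):
    """Start from one central stem cell and return the final number of tumour cells."""
    rng = random.Random(seed)
    grid = [[EMPTY] * n for _ in range(n)]
    grid[n // 2][n // 2] = STEM
    support = [[support_level] * n for _ in range(n)]  # homogeneous here; could vary per site
    for _ in range(steps):
        step(grid, support, rng)
    return sum(cell != EMPTY for row in grid for cell in row)
```

With `support_level=0.0` nothing divides and the single stem cell persists; with `support_level=1.0` the population expands, mirroring (qualitatively only) the supportive-versus-inhibitory contrast described in the abstract.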

2601.16378 2026-01-26 cs.CV cs.AI q-bio.NC

Cognitively-Inspired Tokens Overcome Egocentric Bias in Multimodal Models

Bridget Leonard, Scott O. Murray

Multimodal language models (MLMs) perform well on semantic vision-language tasks but fail at spatial reasoning that requires adopting another agent's visual perspective. These errors reflect a persistent egocentric bias and raise questions about whether current models support allocentric reasoning. Inspired by human spatial cognition, we introduce perspective tokens, specialized embeddings that encode orientation through either (1) embodied body-keypoint cues or (2) abstract representations supporting mental rotation. Integrating these tokens into LLaVA-1.5-13B yields improved performance on level-2 visual perspective-taking tasks. Across synthetic and naturalistic benchmarks (Isle Bricks V2, COCO, 3DSRBench), perspective tokens improve accuracy, with rotation-based tokens generalizing to non-human reference agents. Representational analyses reveal that fine-tuning enhances latent orientation sensitivity already present in the base model, suggesting that MLMs contain precursors of allocentric reasoning but lack appropriate internal structure. Overall, embedding cognitively grounded spatial structure directly into token space provides a lightweight, model-agnostic mechanism for perspective-taking and more human-like spatial reasoning.
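Level-2 perspective-taking asks how a scene looks from another agent's viewpoint, which geometrically is a "mental rotation" of the scene into that agent's frame. The sketch below shows only that geometry; it is not the paper's perspective-token mechanism, and the 2D top-down frame and heading convention are assumptions.

```python
import math


def side_of_object(agent_xy, agent_heading, object_xy):
    """Return 'left', 'right' or 'ahead/behind' from the agent's own viewpoint.

    agent_heading: facing direction in radians, counter-clockwise from the +x axis
    of a shared (allocentric) top-down frame.
    """
    dx = object_xy[0] - agent_xy[0]
    dy = object_xy[1] - agent_xy[1]
    # Rotate the offset by -heading so that the agent faces +x in its own frame.
    c, s = math.cos(-agent_heading), math.sin(-agent_heading)
    lateral = s * dx + c * dy  # y-coordinate in the agent frame; +y is the agent's left
    if abs(lateral) < 1e-9:
        return "ahead/behind"
    return "left" if lateral > 0 else "right"
```

An object east of the origin is on the right for a viewer facing north (`heading = pi/2`) but on the left for an agent facing south (`heading = -pi/2`); an egocentric model that always answers from the viewer's frame gets the second case wrong.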

2601.15341 2026-01-26 q-bio.MN cs.LG

Latent Causal Diffusions for Single-Cell Perturbation Modeling

Lars Lorch, Jiaqi Zhang, Charlotte Bunne, Andreas Krause, Bernhard Schölkopf, Caroline Uhler

Perturbation screens hold the potential to systematically map regulatory processes at single-cell resolution, yet modeling and predicting transcriptome-wide responses to perturbations remains a major computational challenge. Existing methods often underperform simple baselines, fail to disentangle measurement noise from biological signal, and provide limited insight into the causal structure governing cellular responses. Here, we present the latent causal diffusion (LCD), a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise. LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens while simultaneously learning a mechanistic dynamical system of gene regulation. To interpret these learned dynamics, we develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes modeled by the diffusion. CLIPR provably identifies causal effects under a linear drift assumption and recovers causal structure in both simulated systems and a genome-wide perturbation screen, where it clusters genes into coherent functional modules and resolves causal relationships that standard differential expression analysis cannot. The LCD-CLIPR framework bridges generative modeling with causal inference to predict unseen perturbation effects and map the underlying regulatory mechanisms of the transcriptome.
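Why a linear drift makes causal effects identifiable from perturbation responses can be checked numerically. This is a minimal sketch, not CLIPR itself: it assumes dx = (A x + b) dt + noise, so the stationary mean is mu = -A^{-1} b, and assumes that perturbing gene k shifts the intercept b_k by a known `delta`. The response matrix is then -delta * A^{-1}, which can be inverted to recover A.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 5
# Hypothetical ground truth: A_true[i, j] is the direct effect of gene j on gene i.
# A strongly negative diagonal keeps the (assumed) system stable and stationary.
off_diag = 0.3 * rng.standard_normal((n_genes, n_genes)) * (1 - np.eye(n_genes))
A_true = -2.0 * np.eye(n_genes) + off_diag
b = rng.standard_normal(n_genes)


def stationary_mean(A, b):
    """Additive noise does not move the mean: it solves A mu + b = 0."""
    return np.linalg.solve(A, -b)


mu0 = stationary_mean(A_true, b)
delta = 1.0
# Mean shift after perturbing each gene k (assumed intervention: b_k -> b_k + delta).
responses = np.column_stack([
    stationary_mean(A_true, b + delta * np.eye(n_genes)[:, k]) - mu0
    for k in range(n_genes)
])
# responses = -delta * inv(A)  =>  A = -delta * inv(responses)
A_hat = -delta * np.linalg.inv(responses)
```

In real screens the responses are noisy estimates from cells and not every gene is perturbed, which is where a learned model such as LCD is needed; the sketch only shows the identification argument.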

2601.14795 2026-01-26 cs.SI q-bio.PE

Validating Behavioral Proxies for Disease Risk Monitoring via Large-Scale E-commerce Data

Naomi Sasaya, Shigefumi Kishida, Ryo Kikuchi, Akira Tajima

Comments 12 pages, 6 figures. Cross-domain validation of behavioral disease proxies using large-scale e-commerce data. Minor revision to the abstract for clarity

Digital traces of daily activities, such as e-commerce (EC) purchase histories, provide scalable signals for public health surveillance, yet their epidemiological validity remains unclear. This study validates a behavioral proxy for disease onset, defined as transitions from regular to therapeutic diets, by comparing large-scale EC data (N=55,645) against independent insurance-derived clinical records. Using feline lower urinary tract disease (FLUTD) as a case study, the proxy showed strong agreement with clinical data for ingredient-level risk patterns (r=0.74) and seasonal dynamics (r=0.82). Furthermore, analysis using EC data alone reproduced the established protective association of wet food consumption. These results demonstrate that validated behavioral signals from EC data can serve as cost-effective complements to traditional surveillance, with potential applicability to monitoring lifestyle-related diseases in human populations.
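The proxy and the agreement statistic can be sketched as follows. The data layout (a time-ordered list of `(month, kind)` purchases per customer, with kind `"regular"` or `"therapeutic"`) is a hypothetical assumption, not the study's schema; the code only shows how a regular-to-therapeutic transition could be turned into monthly onset counts and compared with another monthly series by Pearson's r.

```python
import math


def onset_month(purchases):
    """Month of the first therapeutic purchase preceded by a regular one, else None."""
    seen_regular = False
    for month, kind in purchases:
        if kind == "regular":
            seen_regular = True
        elif kind == "therapeutic" and seen_regular:
            return month
    return None  # never transitioned (or started directly on a therapeutic diet)


def monthly_onsets(customers):
    """Count proxy onsets per calendar month (1..12) across customers."""
    counts = [0] * 12
    for purchases in customers:
        month = onset_month(purchases)
        if month is not None:
            counts[month - 1] += 1
    return counts


def pearson(x, y):
    """Pearson correlation, e.g. between EC-derived and clinical monthly series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```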

2601.02530 2026-01-26 cs.LG q-bio.QM

Multi-scale Graph Autoregressive Modeling: Molecular Property Prediction via Next Token Prediction

Zhuoyang Jiang, Yaosen Min, Peiran Jin, Lei Chen

We present Connection-Aware Motif Sequencing (CamS), a graph-to-sequence representation that enables decoder-only Transformers to learn molecular graphs via standard next-token prediction (NTP). For molecular property prediction, SMILES-based NTP scales well but lacks explicit topology, whereas graph-native masked modeling captures connectivity but risks disrupting the pivotal chemical details (e.g., activity cliffs). CamS bridges this gap by serializing molecular graphs into structure-rich causal sequences. CamS first mines data-driven connection-aware motifs. It then serializes motifs via scaffold-rooted breadth-first search (BFS) to establish a stable core-to-periphery order. Crucially, CamS enables hierarchical modeling by concatenating sequences from fine to coarse motif scales, allowing the model to condition global scaffolds on dense, uncorrupted local structural evidence. We instantiate CamS-LLaMA by pre-training a vanilla LLaMA backbone on CamS sequences. It achieves state-of-the-art performance on MoleculeNet and the activity-cliff benchmark MoleculeACE, outperforming both SMILES-based language models and strong graph baselines. Interpretability analysis confirms that our multi-scale causal serialization effectively drives attention toward cliff-determining differences.
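The scaffold-rooted BFS and fine-to-coarse concatenation can be illustrated on a toy motif graph. This is a sketch under assumptions, not CamS: motif mining and connection-aware tokens are omitted, the motif names are made up, and the `<scaleK>`/`<eos>` separator tokens are hypothetical.

```python
from collections import deque


def bfs_serialize(adjacency, root):
    """Breadth-first order of motif nodes from the scaffold root.

    Neighbours are visited in sorted order, so the same graph always yields the
    same sequence (a stable core-to-periphery ordering).
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in sorted(adjacency[node]):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order


def multiscale_sequence(scales):
    """Concatenate per-scale BFS sequences, ordered fine -> coarse.

    scales: list of (adjacency, root) pairs; later (coarser) tokens can then be
    predicted conditioned on all finer local structure by a causal model.
    """
    tokens = []
    for level, (adjacency, root) in enumerate(scales):
        tokens.append(f"<scale{level}>")
        tokens.extend(bfs_serialize(adjacency, root))
    tokens.append("<eos>")
    return tokens
```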

2511.18142 2026-01-26 q-bio.PE math.DS

SEIR models with host heterogeneity: theoretical aspects and applications to seasonal influenza dynamics

Tamás Tekeli, Andrea Pugliese, Cinzia Soresina

Population heterogeneity is a key factor in epidemic dynamics, influencing both transmission and final epidemic size. While heterogeneity is often modelled through age structure, spatial location, or contact patterns, differences in host susceptibility have recently gained attention, particularly during the COVID-19 pandemic. Building on the framework of Diekmann and Inaba (Journal of Mathematical Biology, 2023), we focus on the special case of SEIR epidemic models, assuming that at the epidemic start there is no pre-existing immunity. Under two distinct assumptions linking susceptibility and infectiousness, we obtain a closed system of three ODEs that can be easily simulated and admits some analytical results. In particular, we prove that heterogeneity in susceptibility reduces the epidemic final size compared to homogeneous models with the same basic reproduction number $R_0$. We then specialise to the case where susceptibility follows a gamma or extended Beta distribution, showing how the epidemic final size depends on the variance of the distribution. For gamma-distributed susceptibility, the resulting model is a system of ODEs with just one more parameter than the classical SEIR model, which makes it practical for fitting epidemic data. We illustrate its use by fitting data on seasonal influenza in Italy and comparing the results with those obtained from simple SEIR models with pre-existing immunity.
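Why gamma susceptibility adds only one parameter: if susceptibility has mean 1 and variance v (gamma with shape 1/v), the susceptible fraction after cumulative force of infection Λ is (1 + vΛ)^(-1/v), which gives S' = -β I S^(1+v). The sketch below integrates that reduced SEIR system; it assumes infectiousness is independent of susceptibility (we do not know which of the paper's two assumptions this matches), and the parameter values are illustrative only.

```python
def seir_gamma(beta, sigma, gamma, v, i0=1e-4, t_end=400.0, dt=0.1):
    """Attack rate 1 - S(t_end) for SEIR with gamma-distributed susceptibility.

    v is the susceptibility variance (mean normalised to 1); v = 0 recovers the
    homogeneous SEIR model. R0 = beta / gamma in both cases.
    """
    def rhs(s, e, i):
        infection = beta * i * s ** (1.0 + v)  # S' = -beta * I * S^(1 + v)
        return -infection, infection - sigma * e, sigma * e - gamma * i

    y = (1.0 - i0, 0.0, i0)
    for _ in range(int(t_end / dt)):
        # classical fourth-order Runge-Kutta step
        k1 = rhs(*y)
        k2 = rhs(*[x + 0.5 * dt * k for x, k in zip(y, k1)])
        k3 = rhs(*[x + 0.5 * dt * k for x, k in zip(y, k2)])
        k4 = rhs(*[x + dt * k for x, k in zip(y, k3)])
        y = tuple(x + dt / 6.0 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(y, k1, k2, k3, k4))
    return 1.0 - y[0]
```

With R0 = 2 (for example beta = 0.5, gamma = 0.25), the homogeneous final size solves z = 1 - exp(-2z), about 0.797, whereas v = 1 gives S(∞) = 1/(1 + 2(1 - S(∞))), i.e. an attack rate of 0.5, consistent with the claim that susceptibility heterogeneity lowers the final size at fixed R0.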

2510.13018 2026-01-26 cs.LG q-bio.QM

Escaping Local Optima in the Waddington Landscape: A Two-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis

Francis Boabang, Samuel Asante Gyamerah

Comments 17 pages, 6 figures, 8 tables

Modeling cellular responses to genetic and chemical perturbations remains a central challenge in single-cell biology. Existing data-driven frameworks have advanced perturbation prediction through variational autoencoders, chemically conditioned autoencoders, and large-scale transformer pretraining. However, most existing models rely exclusively on either in silico perturbation data or experimental perturbation data but rarely integrate both, limiting their ability to generalize and validate predictions across simulated and real biological contexts in a digital twin system. Moreover, these models are prone to local optima in the nonconvex Waddington landscape of cell fate decisions, where poor initialization can trap trajectories in spurious lineages. In this work, we introduce a two-stage reinforcement learning algorithm for modeling single-cell perturbation. We first compute an explicit natural gradient update using Fisher-vector products and a conjugate gradient solver, scaled by a KL trust-region constraint to provide a safe, curvature-aware first step for the policy. Starting from these preconditioned parameters, we then apply a second phase of proximal policy optimization (PPO) with a KL penalty, exploiting minibatch efficiency to refine the policy. We demonstrate that this initialization strategy substantially improves generalization on single-cell RNA sequencing (scRNA-seq) perturbation analysis in a digital twin system.
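The first stage (natural gradient via Fisher-vector products, conjugate gradient, KL trust-region scaling) follows the standard TRPO recipe and can be sketched generically. This is not the authors' code: `fvp` is any function returning F v, `max_kl` is an assumed default, and the policy network and the PPO refinement phase are omitted.

```python
import numpy as np


def conjugate_gradient(fvp, g, iters=50, tol=1e-10):
    """Solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = np.zeros_like(g)
    r = g.copy()          # residual g - F x (x starts at zero)
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def trust_region_step(fvp, g, max_kl=0.01):
    """Natural-gradient step scaled so that the quadratic KL estimate 0.5 s^T F s = max_kl."""
    x = conjugate_gradient(fvp, g)
    scale = np.sqrt(2.0 * max_kl / (x @ fvp(x)))
    return scale * x
```

Because CG only needs `fvp`, the Fisher matrix never has to be formed explicitly, which is what makes this first step affordable for neural-network policies.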

2508.09871 2026-01-26 q-bio.PE

Inference of germinal center evolutionary dynamics via simulation-based deep learning

Duncan K Ralph, Athanasios G Bakis, Jared Galloway, Ashni A Vora, Tatsuya Araki, Gabriel D Victora, Yun S Song, William S DeWitt, Frederick A Matsen

B cells and the antibodies they produce are vital to health and survival, motivating research on the details of the mutational and evolutionary processes in the germinal centers (GC) from which mature B cells arise. It is known that B cells with higher affinity for their cognate antigen (Ag) will, on average, tend to have more offspring. However, the exact form of this relationship between affinity and fecundity, which we call the "affinity-fitness response function", is not known. Here we use deep learning and simulation-based inference to learn this function from a unique experiment that replays a particular combination of GC conditions many times. All code is freely available at https://github.com/matsengrp/gcdyn, while datasets and inference results can be found at https://doi.org/10.5281/zenodo.15022130.

2504.03732 2026-01-26 cs.AR cs.DC q-bio.GN

SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis

Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu

Comments To appear in HPCA 2026

Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields. Given its importance and the exponentially growing volumes of genomic sequence data, there are extensive efforts to accelerate genome sequence analysis. In this work, we demonstrate a major bottleneck that greatly limits the benefits of state-of-the-art genome sequence analysis accelerators: the data preparation bottleneck, where genomic sequence data is stored in compressed form and needs to be first decompressed and formatted before an accelerator can operate on it. To mitigate this bottleneck, we propose SAGe, an algorithm-architecture co-design for highly-compressed storage and high-performance access of large-scale genomic sequence data. The key challenge is to improve data preparation performance while maintaining high compression ratios (comparable to genomic-specific compression algorithms) at low hardware cost. We address this challenge by leveraging key properties of genomic datasets to co-design (i) a lossless (de)compression algorithm, (ii) hardware that decompresses data with lightweight operations and efficient streaming accesses, (iii) storage data layout, and (iv) interface commands to access data. SAGe is highly versatile, as it supports datasets from different sequencing technologies and species. Due to its lightweight design, SAGe can be seamlessly integrated with a broad range of hardware accelerators for genome sequence analysis to mitigate their data preparation bottlenecks. Our results demonstrate that SAGe improves the average end-to-end performance and energy efficiency of two state-of-the-art genome sequence analysis accelerators by 3.0x-32.1x and 13.0x-34.0x, respectively, compared to when the accelerators rely on state-of-the-art software and hardware decompression tools.
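As a flavour of "decompression with lightweight operations", the sketch below packs nucleotides at 2 bits per base and unpacks them with shifts and masks only. It is a generic illustration, not SAGe's algorithm: real SAGe exploits further properties of genomic datasets to reach much higher compression ratios, and this toy version assumes an A/C/G/T-only alphabet (an `N` raises `KeyError`).

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte (2 bits each)."""
    out = bytearray((len(seq) + 3) // 4)
    for idx, base in enumerate(seq):
        out[idx // 4] |= CODE[base] << (2 * (idx % 4))
    return len(seq), bytes(out)


def unpack(length, data):
    """Inverse of pack: one shift and one mask per base, suited to simple streaming hardware."""
    return "".join(BASES[(data[idx // 4] >> (2 * (idx % 4))) & 0b11]
                   for idx in range(length))
```

The length is stored separately because the last byte may be only partially filled.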

2502.18710 2026-01-26 q-bio.NC cs.AI

Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts

Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla

Understanding convergent learning (the degree to which independently trained neural systems, whether multiple artificial networks or brains and models, arrive at similar internal representations) is crucial for both neuroscience and AI. Yet the literature remains narrow in scope: studies typically examine just a handful of models on one dataset, rely on a single alignment metric, and evaluate networks at a single post-training checkpoint. We present a large-scale audit of convergent learning, spanning dozens of vision models and thousands of layer-pair comparisons, to close these long-standing gaps. First, we pit three alignment families against one another: linear regression (affine-invariant), orthogonal Procrustes (rotation-/reflection-invariant), and permutation/soft-matching (unit-order-invariant). We find that orthogonal transformations align representations nearly as effectively as more flexible linear ones, and although permutation scores are lower, they significantly exceed chance, indicating a privileged representational basis. Tracking convergence throughout training further shows that nearly all eventual alignment crystallizes within the first epoch, well before accuracy plateaus, indicating that it is largely driven by shared input statistics and architectural biases rather than by the final task solution. Finally, when models are challenged with a battery of out-of-distribution images, early layers remain tightly aligned, whereas deeper layers diverge in proportion to the distribution shift. These findings fill critical gaps in our understanding of representational convergence, with implications for neuroscience and AI.
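Two of the three alignment families can be written in a few lines of NumPy, which makes the "orthogonal is nearly as good as linear" comparison concrete. These are common textbook definitions, assumed here rather than taken from the paper: both inputs are centred and scaled to unit Frobenius norm, linear alignment is the least-squares R^2, and the Procrustes score is the nuclear norm of X^T Y (1 exactly when Y is an orthogonal transform of X).

```python
import numpy as np


def _normalise(X):
    """Centre each unit (column) and scale the whole matrix to unit Frobenius norm."""
    X = X - X.mean(axis=0)
    return X / np.linalg.norm(X)


def linear_r2(X, Y):
    """Affine-invariant score: R^2 of the least-squares map from X to Y."""
    X, Y = _normalise(X), _normalise(Y)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return 1.0 - np.sum((Y - X @ W) ** 2) / np.sum(Y ** 2)


def procrustes_score(X, Y):
    """Rotation/reflection-invariant score in [0, 1].

    max over orthogonal R of trace(R^T X^T Y) equals the sum of singular values of X^T Y.
    """
    X, Y = _normalise(X), _normalise(Y)
    return np.linalg.svd(X.T @ Y, compute_uv=False).sum()
```

A rotated copy scores 1 under both metrics, while an anisotropic rescaling still scores 1 linearly but clearly below 1 under Procrustes. Permutation matching could be added with `scipy.optimize.linear_sum_assignment` on a unit-by-unit correlation matrix.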

2412.16111 2026-01-26 q-bio.NC

How random connectivity shapes the fluctuating dynamics of finite-size neural populations

Nils E. Greven, Jonas Ranft, Tilo Schwalger

Journal ref PRX Life, 4(1), 013007 (2026)

Mesoscopic models of finite-size neuronal populations are crucial to understand the dynamics of neural networks in the brain, especially their fluctuations and response to stimuli. However, current theories to derive such models are based on homogeneous all-to-all (full) connectivity. This assumption neglects the variance in the connectivity of biologically realistic networks with connection probabilities $p<1$ (non-full connectivity). To gain insight into the different fluctuation mechanisms underlying neural variability at the population level, we derive and analyze a stochastic mean-field model for finite-size networks of Poisson neurons with random connectivity (including non-full connectivity), external noise and disordered mean inputs. We treat the quenched disorder of the connectivity by an annealed approximation enabling a doubly stochastic description of synaptic inputs for finite network size. A further reduction leads to a low-dimensional closed system of coupled Langevin equations for the mean and variance of the membrane potentials as well as a variable capturing finite-size fluctuations. Compared to microscopic simulations, the mesoscopic model describes the fluctuations and nonlinearities well and outperforms previous theories that neglected the variance in the connectivity. The joint effect of connectivity disorder and finite network size can be analytically understood by a softening of the effective nonlinearity and the multiplicative character of spiking noise. The mesoscopic theory shows that quenched disorder can stabilize the asynchronous state, and it correctly predicts large quantitative and non-trivial qualitative effects of connection probability on the variance of the population firing rate and its dependence on stimulus strength. In conclusion, our theory elucidates how disordered connectivity shapes nonlinear dynamics and fluctuations of neural populations.
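The "variance in the connectivity" that full-connectivity theories neglect has a simple form. With fixed presynaptic rates r_j and Bernoulli(p) connections c_ij, the input h_i = w Σ_j c_ij r_j has mean w p Σ r_j and across-neuron (quenched) variance w^2 p(1-p) Σ r_j^2, which vanishes only at p = 1. The sketch checks this numerically; the weights, rates and network sizes are illustrative assumptions, and this is only the static ingredient of the paper's mesoscopic theory.

```python
import numpy as np


def input_stats(p, n_pre=500, n_post=4000, w=0.1, seed=0):
    """Across-neuron mean and variance of h_i = w * sum_j c_ij r_j, c_ij ~ Bernoulli(p)."""
    rng = np.random.default_rng(seed)
    rates = rng.uniform(0.0, 20.0, size=n_pre)           # fixed presynaptic rates (Hz)
    C = (rng.random((n_post, n_pre)) < p).astype(float)  # one frozen (quenched) draw
    h = (w * C) @ rates
    predicted_var = w**2 * p * (1 - p) * np.sum(rates**2)
    return h.mean(), h.var(), predicted_var
```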

2410.18024 2026-01-26 q-bio.MN math.CT q-bio.QM

A mathematical framework to study organising principles in graphical representations of biochemical processes

Adittya Chaudhuri, Ralf Köhl, Olaf Wolkenhauer

The complexity of molecular and cellular processes forces experimental studies to focus on subsystems. To study the functioning of biological systems across levels of structural and functional organisation, we require tools to compose and organise networks with different levels of detail and abstraction. Systems Biology Graphical Notation (SBGN) is a standardised notational system that visualises biochemical processes as networks. Despite their widespread adoption, SBGN languages remain purely visual and lack an underlying mathematical framework, limiting their compositional analysis, abstraction, and integration with formal modelling approaches. SBGN comprises three complementary visual languages, Process Description (SBGN-PD), Activity Flow (SBGN-AF), and Entity Relationship (SBGN-ER), each operating at a different level of abstraction. In this manuscript, we introduce a category-theoretic formalism for SBGN-PD, a visual language to describe biochemical processes as biochemical reaction networks. Using the theory of structured cospans, we construct a symmetric monoidal double category whose horizontal 1-morphisms correspond to SBGN-PD diagrams. We also analyse how a designated subnetwork influences the surrounding network and how external entities, in turn, affect the internal reactions of the subnetwork. Our work addresses a key gap between biological visualisation and mathematical structure. It provides precise organising principles for SBGN-PD, including compositionality, enabling the construction of large biochemical reaction networks from smaller ones, and zooming out, allowing the abstraction of detailed biochemical mechanisms while preserving their functional interfaces. Throughout the paper, the proposed framework is illustrated using standard SBGN-PD examples, demonstrating its applicability to large-scale biochemical reaction networks.
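The intuition behind composing networks through interfaces (the role played by structured cospans) can be shown with a plain data structure. This is a deliberately simplified sketch, not the paper's category-theoretic construction: `OpenNetwork` and `compose` are hypothetical names, species are identified by name, and gluing is only allowed along the shared interface, which is what the pushout in cospan composition would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenNetwork:
    """A reaction network with designated input and output species (its interfaces)."""
    reactions: tuple  # each reaction: (reactants tuple, products tuple)
    inputs: tuple
    outputs: tuple

    def species(self):
        names = set(self.inputs) | set(self.outputs)
        for reactants, products in self.reactions:
            names |= set(reactants) | set(products)
        return names


def compose(first, second):
    """Glue first's outputs to second's inputs; no other species may be identified."""
    if first.outputs != second.inputs:
        raise ValueError("interfaces do not match")
    shared = (first.species() & second.species()) - set(first.outputs)
    if shared:
        raise ValueError(f"species shared outside the interface: {shared}")
    return OpenNetwork(first.reactions + second.reactions, first.inputs, second.outputs)
```

Composing a toy `glucose -> G6P` network with a `G6P -> F6P` network yields a two-reaction network whose interface is `glucose` in and `F6P` out, with `G6P` becoming internal.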

2309.15566 2026-01-26 q-bio.NC

Simultaneity of consciousness with physical reality: the key that unlocks the mind-matter problem

John Sanfey

Comments Scheduled for publication in Frontiers in Psychology

The problem of explaining the relationship between subjective experience and physical reality remains difficult and unresolved. In most explanations, consciousness is epiphenomenal, without causal power. The most notable exception is Integrated Information Theory (IIT), which provides a causal explanation for consciousness. However, IIT relies on an identity between subjectivity and a particular type of physical structure, namely an information structure that has intrinsic causal power greater than the sum of its parts. Any theory that relies on a psycho-physical identity must eventually appeal to panpsychism, which undermines that theory's claim to be fundamental. IIT has recently pivoted towards a strong version of causal emergence, but macroscopic causal structures cannot be causally stronger than their microscopic parts without some new physical law or governing principle. The approach taken here is designed to uncover such a principle. The decisive argument is entirely deductive from initial premises that are phenomenologically certain. If correct, the arguments prove that conscious experience is sufficient to create additional degrees of causal freedom independently of the content of experience, and in a manner that is unpredictable and unobservable by any temporally sequential means. This provides a fundamental principle about consciousness, and a conceptual bridge between it and the physics describing what is experienced. The principle makes testable predictions about brain function, with notable differences from IIT, some of which are also empirically testable.

2106.07292 2026-01-26 q-bio.PE cs.CG q-bio.GN q-bio.QM

Ultrafast topological data analysis reveals pandemic-scale dynamics of convergent evolution

Michael Bleher, Lukas Hahn, Maximilian Neumann, Zachary Ardern, Juan Angel Patino-Galindo, Mathieu Carriere, Ulrich Bauer, Raul Rabadan, Andreas Ott

Comments substantial revision

Genome variants which re-occur independently across evolutionary lineages are key molecular signatures of adaptation. Inferring the dynamics of such genetic changes from pandemic-scale genomic datasets is now possible, which opens up unprecedented insight into evolutionary processes. However, existing approaches depend on the construction of accurate phylogenetic trees, which remains challenging at scale. Here we present EVOtRec, an organism-agnostic, fast and scalable Topological Data Analysis approach that enables the inference of convergently evolving genomic variants over time directly from topological patterns in the dataset, without requiring the construction of a phylogenetic tree. Using data from both simulations and published experiments, we show that EVOtRec can robustly identify variants under positive selection and performs orders of magnitude faster than state-of-the-art phylogeny-based approaches, with comparable results. We apply EVOtRec to three large viral genome datasets: SARS-CoV-2, influenza virus A subtype H5N1 and HIV-1. We identify key convergent genome variants and demonstrate how EVOtRec facilitates the real-time tracking of high fitness variants in large datasets with millions of genomes, including effects modulated by varying genomic backgrounds. We envision our Topological Data Analysis approach as a new framework for efficient comparative genomics.
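Why do convergent (recurrent) mutations show up as topological patterns? A mutation that arises independently on two lineages creates a loop among genomes, whereas purely tree-like descent does not. The sketch counts such loops as the rank of H1 of the Vietoris-Rips complex at scale 1. It is an illustration, not EVOtRec: it assumes genomes are encoded as equal-length binary strings (mutation present/absent relative to a reference), for which the complex at scale 1 has no triangles, so H1 reduces to the graph's cycle rank.

```python
from itertools import combinations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def loops_at_scale_one(sequences):
    """Rank of H1 of the Vietoris-Rips complex at scale 1 for binary sequences.

    A single Hamming step flips parity, so no three binary sequences are pairwise at
    distance 1: the complex has no triangles, and H1 = edges - vertices + components.
    """
    seqs = sorted(set(sequences))
    parent = list(range(len(seqs)))

    def find(x):  # union-find with path halving, to count connected components
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) == 1:
            edges += 1
            parent[find(i)] = find(j)
    components = len({find(i) for i in range(len(seqs))})
    return edges - len(seqs) + components
```

With genomes `00`, `01`, `11` (tree-like) there is no loop; adding `10`, which requires the second-site mutation to arise twice or recombination, closes a 4-cycle and yields one loop.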