arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2602.06649 2026-02-09 math.PR q-bio.PE

Growth Models Under Uniform Catastrophes

Joan Amaya, Valdivino V. Junior, Fábio P. Machado, Alejandro Roldán-Correa

Comments 17 pages, 2 figures

Abstract

We consider stochastic growth models for populations organized in colonies and subject to uniform catastrophes. To assess population viability, we analyze scenarios in which individuals adopt dispersion strategies after catastrophic events. For these models, we derive explicit expressions for the survival probability and the mean time to extinction, both with and without spatial constraints. We complement this analysis by comparing uniform catastrophes with binomial and geometric catastrophes in models with dispersion and no spatial restrictions. Here, the terms uniform, binomial and geometric refer to the probability distributions governing the number of individuals that survive immediately after a catastrophe. This comparison allows us to quantify the impact of different types of catastrophic events on population persistence.
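
To make the distinction between the three catastrophe types concrete, the following is a minimal sketch of how the number of survivors might be sampled. The function name `survivors_after_catastrophe`, the parameter `p`, and the exact conventions for each distribution (uniform on 0..n, independent survival with probability p, and sequential killing that stops at the first survivor) are assumptions made for illustration; the paper's own definitions may differ.

```python
import random


def survivors_after_catastrophe(n, kind, p=0.5, rng=random):
    """Sample how many of n individuals survive one catastrophe.

    All three conventions below are illustrative assumptions,
    not definitions taken from the paper.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if kind == "uniform":
        # Assumed: every survivor count from 0 to n is equally likely.
        return rng.randint(0, n)
    if kind == "binomial":
        # Assumed: each individual survives independently with probability p.
        return sum(1 for _ in range(n) if rng.random() < p)
    if kind == "geometric":
        # Assumed: individuals are exposed one at a time, each dying with
        # probability 1 - p, and the killing stops at the first survivor.
        deaths = 0
        while deaths < n and rng.random() >= p:
            deaths += 1
        return n - deaths
    raise ValueError(f"unknown catastrophe type: {kind}")
```

Under these assumptions, a colony of 10 individuals always ends with between 0 and 10 survivors, and setting `p=1.0` leaves the binomial and geometric cases with no deaths at all.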

2602.06418 2026-02-09 cs.LG q-bio.BM

Adaptive Protein Tokenization

Rohit Dilip, Ayush Varshney, David Van Valen

Abstract

Tokenization is a promising path to multi-modal models capable of jointly understanding protein sequences, structure, and function. Existing protein structure tokenizers create tokens by pooling information from local neighborhoods, an approach that limits their performance on generative and representation tasks. In this work, we present a method for global tokenization of protein structures in which successive tokens contribute increasing levels of detail to a global representation. This change resolves several issues with generative models based on local protein tokenization: it mitigates error accumulation, provides embeddings without sequence-reduction operations, and allows task-specific adaptation of a tokenized sequence's information content. We validate our method on reconstruction, generative, and representation tasks and demonstrate that it matches or outperforms existing models based on local protein structure tokenizers. We show how adaptive tokens enable inference criteria based on information content, which boosts designability. We validate representations generated from our tokenizer on CATH classification tasks and demonstrate that non-linear probing on our tokenized sequences outperforms equivalent probing on representations from other tokenizers. Finally, we demonstrate how our method supports zero-shot protein shrinking and affinity maturation.

2602.06394 2026-02-09 cs.AI cs.CE q-bio.GN q-fin.CP

Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization

Arvid E. Gollwitzer, Paridhi Latawa, David de Gruijl, Deepak A. Subramanian, Adrián Noriega de la Colina

Abstract

Current tokenization methods process sequential data without accounting for signal quality, limiting their effectiveness on noisy real-world corpora. We present QA-Token (Quality-Aware Tokenization), which incorporates data reliability directly into vocabulary construction. We make three key contributions: (i) a bilevel optimization formulation that jointly optimizes vocabulary construction and downstream performance, (ii) a reinforcement learning approach that learns merge policies through quality-aware rewards with convergence guarantees, and (iii) an adaptive parameter learning mechanism via Gumbel-Softmax relaxation for end-to-end optimization. Our experimental evaluation demonstrates consistent improvements in genomics (a 6.7 percentage point F1 gain in variant calling over BPE) and in finance (a 30% Sharpe ratio improvement). At foundation scale, we tokenize a pretraining corpus comprising 1.7 trillion base-pairs and achieve state-of-the-art pathogen detection (94.53 MCC) while reducing token count by 15%. We unlock noisy real-world corpora, spanning petabases of genomic sequences and terabytes of financial time series, for foundation model training with zero inference overhead.

2602.06296 2026-02-09 cs.RO q-bio.QM

Internalized Morphogenesis: A Self-Organizing Model for Growth, Replication, and Regeneration via Local Token Exchange in Modular Systems

Takeshi Ishida

Abstract

This study presents an internalized morphogenesis model for autonomous systems, such as swarm robotics and micro-nanomachines, that eliminates the need for external spatial computation. Traditional self-organizing models often require calculations across the entire coordinate space, including empty areas, which is impractical for resource-constrained physical modules. Our proposed model achieves complex morphogenesis through strictly local interactions between adjacent modules within the "body." By extending the "Ishida token model," modules exchange integer values using an RD-inspired discrete analogue without solving differential equations. The internal potential, derived from token accumulation and aging, guides autonomous growth, shrinkage, and replication. Simulations on a hexagonal grid demonstrated the emergence of limb-like extensions, self-division, and robust regeneration capabilities following structural amputation. A key feature is the use of the body boundary as a natural sink for information entropy (tokens) to maintain a dynamic equilibrium. These results indicate that sophisticated morphological behaviors can emerge from minimal, internal-only rules. This framework offers a computationally efficient and biologically plausible approach to developing self-repairing, adaptive, and autonomous hardware.

2601.02885 2026-02-09 eess.SY cs.SY q-bio.NC

A Mathematical Formalization of Self-Determining Agency

Yoshiyuki Ohmura, Earnest Kota Carr, Yasuo Kuniyoshi

Abstract

Defining agency is an extremely important challenge for cognitive science and artificial intelligence. Physics generally describes mechanical happenings, but there remains an unbridgeable gap between these and the acts of agents. To discuss the morality and responsibility of agents, it is necessary to model acts; whether such responsible acts can be fully explained by physical determinism remains an ongoing debate. Although we have already proposed a physical agent determinism model that appears to go beyond mere mechanical happenings, we have not yet established a strict mathematical formalism to eliminate ambiguity. Here, we explain why a physical system can follow coarse-graining agent-level determination without violating physical laws by formulating supervenient causation. Generally, supervenience including coarse graining does not change without a change in its lower base; therefore, a single supervenience alone cannot define supervenient causation. We define supervenient causation as the causal efficacy from the supervenience level to its lower base level. Although an algebraic expression composed of multiple supervenient functions supervenes on the base, an index sequence that determines the algebraic expression does not supervene on the base. Therefore, the sequence can possess unique dynamical laws that are independent of the lower base level. These independent dynamics create the possibility for temporally preceding changes at the supervenience level to cause changes at the lower base level. Such a dual-laws system is considered useful for modeling self-determining agents such as humans.

2511.08855 2026-02-09 q-bio.GN stat.ML

Path Signatures Enable Model-Free Mapping of RNA Modifications

Maud Lemercier, Paola Arrubarrena, Salvatore Di Giorgio, Julia Brettschneider, Thomas Cass, Valerie Griesche, Isabel S. Naarmann-de Vries, Anastasia Papavasiliou, Alessia Ruggieri, Irem Tellioglu, Chia Ching Wu, F. Nina Papavasiliou, Terry Lyons

Abstract

Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts for which ample training data exists. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data. For each nanopore read, our approach extracts robust, modification-sensitive features from the raw ionic current signal at a site using the signature transform, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values to enable anomaly detection at both individual read and site levels. Validation on densely modified E. coli rRNA demonstrates that our approach detects known sites harboring diverse modification types, without prior training on these modifications. We further applied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. For DENV sfRNA, it revealed a novel 2'-O-methylated site, which we validated orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNAs and datasets generated with different nanopore sequencing chemistries.
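
The scoring step described in this abstract (compare a feature vector to its nearest neighbors in an unmodified reference set, then convert the score to a p-value) can be sketched as follows. This is a rough illustration only: the signature-transform feature extraction is omitted and replaced by synthetic Gaussian vectors, and the names `knn_anomaly_score` and `empirical_p_value`, the choice of mean k-nearest-neighbor distance, and the use of a held-out calibration set for the null distribution are all assumptions rather than the authors' procedure.

```python
import math
import random


def knn_anomaly_score(x, reference, k=5):
    """Mean Euclidean distance from x to its k nearest reference vectors."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


def empirical_p_value(score, null_scores):
    """Share of null scores at least as large as score (with +1 smoothing)."""
    exceed = sum(1 for s in null_scores if s >= score)
    return (1 + exceed) / (1 + len(null_scores))


rng = random.Random(0)


def draw_feature():
    # Synthetic stand-in for a feature vector of an unmodified read.
    return [rng.gauss(0.0, 1.0) for _ in range(4)]


reference = [draw_feature() for _ in range(200)]    # unmodified reference set
calibration = [draw_feature() for _ in range(100)]  # held-out unmodified reads
null_scores = [knn_anomaly_score(v, reference) for v in calibration]

typical = [0.0, 0.0, 0.0, 0.0]  # lies in the bulk of the reference cloud
shifted = [6.0, 6.0, 6.0, 6.0]  # far from every reference vector
p_typical = empirical_p_value(knn_anomaly_score(typical, reference), null_scores)
p_shifted = empirical_p_value(knn_anomaly_score(shifted, reference), null_scores)
```

In this toy setting the shifted vector receives the smallest p-value the calibration set allows, while the typical vector does not look anomalous.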

2510.25814 2026-02-09 q-bio.QM cs.LG

Optimizing Mirror-Image Peptide Sequence Design for Data Storage via Peptide Bond Cleavage Prediction

Yilong Lu, Si Chen, Songyan Gao, Han Liu, Xin Dong, Wenfeng Shen, Guangtai Ding

Comments 8 pages, 4 figures; accepted by BIBM 2025

Abstract

Traditional non-biological storage media, such as hard drives, face limitations in both storage density and lifespan due to the rapid growth of data in the big data era. Mirror-image peptides composed of D-amino acids have emerged as a promising biological storage medium due to their high storage density, structural stability, and long lifespan. The sequencing of mirror-image peptides relies on de novo technology. However, its accuracy is limited by the scarcity of tandem mass spectrometry datasets and the challenges that current algorithms encounter when processing these peptides directly. This study is the first to propose improving sequencing accuracy indirectly by optimizing the design of mirror-image peptide sequences. In this work, we introduce DBond, a deep neural network-based model that integrates sequence features, precursor ion properties, and mass spectrometry environmental factors for the prediction of mirror-image peptide bond cleavage. In this process, sequences with a high peptide bond cleavage ratio, which are easy to sequence, are selected. The main contributions of this study are as follows. First, we constructed MiPD513, a tandem mass spectrometry dataset containing 513 mirror-image peptides. Second, we developed the peptide bond cleavage labeling algorithm (PBCLA), which generated approximately 12.5 million labeled samples based on MiPD513. Third, we proposed a dual prediction strategy that combines multi-label and single-label classification. On an independent test set, the single-label classification strategy outperformed other methods in both single and multiple peptide bond cleavage prediction tasks, offering a strong foundation for sequence optimization.

2510.11670 2026-02-09 physics.med-ph cs.NA math.NA q-bio.TO

Collagen and myocyte interplay in cardiac volume overload: a multi-constituent growth and remodeling framework

Ludovica Maga, Mathias Peirlinck, Lise Noël

Abstract

Hearts subjected to volume overload (VO) are prone to detrimental anatomical and functional changes in response to elevated mechanical loading, ultimately leading to heart failure. Experimental findings now emphasize that organ-scale changes following VO cannot be explained by myocyte growth alone, as traditionally proposed in the literature. Collagen degradation, in particular, has been associated with VO and assumed to play a central role in both its acute and chronic stages. This hypothesis, however, remains to be substantiated by comprehensive mechanistic evidence, and each constituent's contribution to myocardial growth and remodeling (G&R) processes is yet to be quantified. In this work, we present a multi-constituent G&R framework that integrates a mixture-based constitutive model within the kinematic growth formulation. This framework enables us to mechanistically assess the relative contributions of collagen and myocyte changes to alterations in tissue properties, ventricular dimensions, and growth phenotype. Our numerical results confirm that collagen remodeling affects the passive mechanical response of the myocardium, whereas myocytes predominantly influence the extent and phenotype of VO-induced growth. Importantly, collagen degradation exacerbates myocyte hypertrophy, revealing a synergistic interplay that accelerates the left ventricular eccentric growth and thereby promotes systolic dysfunction. This work constitutes an important step towards an integrated characterization of the early compensatory stages of VO-induced cardiac G&R.

2510.06781 2026-02-09 q-bio.OT

The Epigenetic Tapestry: A Review of DNA Methylation and Non-Coding RNA's Interplay with Genetic Threads, Weaving a Network Impacting Gene Expression and Disease Manifestations

Yu-Li He, Youshin Loh

Comments 31 pages, unaffiliated review article

Abstract

The emerging field of epigenetics has recently unveiled a dynamic landscape in which gene expression is not determined solely by genetic sequences but also by intricate regulatory mechanisms. This review examines the interactions between these regulatory mechanisms, including DNA methylation and non-coding RNAs (ncRNAs), that orchestrate gene expression fine-tuning for cellular homeostasis and the pathogenesis of a multitude of diseases. We explore long non-coding RNAs (lncRNAs) such as telomeric repeat-containing RNA (TERRA) and Fendrr, highlighting their role in protein regulation to ensure proper gene activation or silencing. Additionally, we explain the therapeutic potential of brain-derived neurotrophic factor (BDNF)-related microRNA 132, which has shown promise in treating chronic illnesses by restoring BDNF levels. Finally, this review covers the role of DNA methyltransferases and ncRNAs in cancer, focusing on how lncRNAs contribute to X chromosome inactivation and interact with chromatin-modifying complexes and DNA methyltransferase inhibitors to reduce cancer cell aggressiveness. By bringing together the wide array of research in this field, we aim to provide glimpses into the complex entanglement of genetics and environment as they control gene expression.

2507.13501 2026-02-09 cs.CL math.RA q-bio.NC

Encoding syntactic objects and Merge operations in function spaces

Matilde Marcolli, Robert C. Berwick

Comments 48 pages, LaTeX, 4 png figures; v2: expository changes

Abstract

We provide a mathematical argument showing that, given a representation of lexical items as functions (wavelets, for instance) in some function space, it is possible to construct a faithful representation of arbitrary syntactic objects in the same function space. This space can be endowed with a commutative non-associative semiring structure built using the second Renyi entropy. The resulting representation of syntactic objects is compatible with the magma structure. The resulting set of functions is an algebra over an operad, where the operations in the operad model circuits that transform the input wave forms into a combined output that encodes the syntactic structure. The action of Merge on workspaces is faithfully implemented as action on these circuits, through a coproduct and a Hopf algebra Markov chain. The results obtained here provide a constructive argument showing the theoretical possibility of a neurocomputational realization of the core computational structure of syntax. We also present a particular case of this general construction where this type of realization of Merge is implemented as a cross frequency phase synchronization on sinusoidal waves. This also shows that Merge can be expressed in terms of the successor function of a semiring, thus clarifying the well known observation of its similarities with the successor function of arithmetic.

2503.05591 2026-02-09 q-bio.QM

IUPAC-Induced Computational Approaches for Identifying Boosters of Small Biomolecule Functionality: A Case Study of Human Tyrosyl-DNA Phosphodiesterase 1 (TDP1) Inhibitors

Mariya L. Ivanova, Nicola Russo, Gueorgui Mihaylov, Konstantin Nikolic

Comments 18 pages, 7 figures, 7 tables

Journal ref 2026, Computers in Biology and Medicine

Abstract

This paper introduces several proof-of-concept (PoC) computational methods intended to offer biochemical researchers straightforward, time- and cost-effective strategies to accelerate their work. While Machine Learning (ML) models were developed, the study's central purpose was to explore approaches for identifying desirable functional groups/fragments in small biomolecules with respect to a specific functionality, which, in this case, was human tyrosyl-DNA phosphodiesterase 1 (TDP1) inhibition. This was achieved primarily by tokenising IUPAC names to generate features. Additionally, the applicability of the CID_SID ML model to predicting TDP1 activity was explored. Since these computational approaches were not experimentally validated due to a lack of appropriate laboratory facilities, they are presented as open proposals for further laboratory investigation.

2408.07551 2026-02-09 cond-mat.soft q-bio.CB

Tissue-Intrinsic Shape Mechanics in Growing Pre-Migratory Tumor Spheroids

Urban Železnik, Matej Krajnc, Tanmoy Sarkar

Comments 13 pages, 4 figures, 1 supplementary document, and 9 supplementary movies

Abstract

One of the hallmarks of pre-migratory tumors is the progressive loss of compact morphology. To investigate how tumors may intrinsically regulate their shape during growth, we employ a three-dimensional (3D) vertex model of multicellular aggregates that incorporates key structural features of tumor spheroids, including their surface, a proliferative rim, and a necrotic core. Focusing exclusively on tumor-intrinsic mechanical interactions, we examine how their collective effects guide morphological evolution en route to metastasis. We show that spheroids acquire lobulated morphologies through an interplay between differential tensions at the spheroid surface and the living-necrotic interface (LNI), together with differential growth within the proliferative rim. In addition, spheroid shapes can be substantially modulated by tissue rheological properties emerging from active, cell-scale forces. Our cell- and tissue-scale simulations of tumor morphologies are enabled by a computational framework that overcomes a major limitation of 3D vertex models, the lack of cell division, by introducing a graph-based polyhedral-division algorithm within the Graph Vertex Model (GVM).

2602.06200 2026-02-09 q-bio.PE nlin.AO physics.soc-ph

Threshold Resource Redistribution in Spatially-Structured Kinship Networks

Alina Kochocki

Comments 39 pages, 8 figures. Submitted to Journal of Theoretical Biology

Abstract

We present a model for a threshold-based resource redistribution process in a spatially-explicit population, characterizing the relation between kinship network structure, local interactions and persistence. We find that population survival becomes possible for lower resource densities, but leads to increased network heterogeneity and locally centralized clusters. We interpret this in relation to a feedback between the kinship network structure and reproduction ability. Agents receive stochastic resources and solicit additional resources from connected individuals when below a minimum, with each agent contributing a fraction of their excess based on relatedness. We first analyze a fully-connected population with uniform redistribution fraction and discuss mean field expectations as well as finite size corrections. We extend this model to a hub-and-spoke network, exploring the impact of network asymmetry or centrality on resource distribution. We then develop a spatially-limited population model with diffusion, local pairing, reproduction and mortality. Redistribution is introduced as a function of relatedness (generational distance through most-recent common ancestor) and distance. Redistribution-dependent populations exhibit a higher level of relational closeness with increased clustering for agents of highest node strength. These results highlight the interaction of resource density, cooperation and kinship in a spatially-limited regime.
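
As a rough illustration of the threshold rule in the fully connected case with a uniform redistribution fraction, a single redistribution step might look like the sketch below. The function name `redistribute`, the pooling of contributions, and the equal split of the pool among agents below the minimum are assumptions introduced here; the abstract does not specify how solicited resources are divided.

```python
def redistribute(resources, threshold, fraction):
    """One redistribution step in a fully connected population.

    Assumed rule: every agent above the threshold gives up the same
    fraction of its excess, and the pooled contributions are split
    equally among the agents below the threshold.
    """
    excess = [max(r - threshold, 0.0) for r in resources]
    needy = [i for i, r in enumerate(resources) if r < threshold]
    if not needy:
        # Nobody solicits resources, so nothing moves.
        return list(resources)
    pool = fraction * sum(excess)
    share = pool / len(needy)
    updated = []
    for i, r in enumerate(resources):
        if r < threshold:
            updated.append(r + share)
        else:
            updated.append(r - fraction * excess[i])
    return updated
```

With `resources=[0.0, 4.0, 6.0]`, `threshold=2.0`, and `fraction=0.5`, the two donors contribute 1.0 and 2.0, so the single needy agent ends with 3.0 and the total is conserved.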

2602.06106 2026-02-09 physics.soc-ph q-bio.PE

To clean or not to clean: The free-rider problem in sequentially shared resources

Alexander Feigel, Alexandre V. Morozov

Comments 12 pages, 4 figures in the main text, 1 figure in the Supplement

Abstract

Shared resources enhance productivity yet at the same time provide channels for biological and digital contamination, turning physical or digital hygiene into a cooperation dilemma prone to free-riding. Here we introduce a game of sequential sharing of common resources, an empirically parameterized evolutionary model of population dynamics in sequential-use settings such as gyms and shared workspaces. The success of the strategies implemented in the model, such as cleaning equipment before or after use, is based on the trade-offs between cleaning costs, contamination risk, and social incentives to mitigate disease transmission. We find that cooperative hygiene can be achieved by lowering the effective costs of cleaning, strengthening pro-social incentives, and monitoring population-level noncompliance. Remarkably, stability of fully altruistic populations is primarily affected by the cleaning costs. In contrast, increasing effective infection costs, for example through punishment, appears less important in this case. The model's evolutionary dynamics exhibit multi-stability, hysteresis, and abrupt shifts in strategy composition, broadly consistent with empirical observations from shared-use facilities. Our framework offers testable predictions and is amenable to quantitative calibration with behavioral and environmental data. Our predictions can be used to inform the design of cost-effective public health and digital security policies.

2602.00261 2026-02-09 physics.bio-ph q-bio.MN

You ain't seen nothing, and yet: Future biochemical concentrations can be predicted with surprisingly high accuracy

Ketevan Danelia, Sean A. Ridout, Ilya Nemenman

Comments 13 pages, 3 figures

Abstract

Accurate sensing of chemical concentrations is essential for numerous biological processes. The accuracy of this sensing, for small numbers of molecules, is limited by shot noise. Corresponding theoretical limits on sensing precision, as a function of sensing duration, have been well-studied in the context of quasi-static and randomly fluctuating concentrations. However, during development and in many other cases, concentration profiles are not random but exhibit predictable spatiotemporal patterns. We propose that leveraging prior knowledge of these structured profiles can improve and accelerate concentration sensing by utilizing information from current molecular binding events to predict future concentrations. By framing the constrained sensing problem as Bayesian inference over an allowed class of spatiotemporal profiles, we derive new theoretical limits on sensing accuracy. Our analysis reveals that maximum a posteriori (MAP) estimation can outperform the classical Berg-Purcell and maximum-likelihood (Poisson counting) limits, achieving a sensing precision of $\delta c/c = 1/\sqrt{a^2 N}$, where $N$ is the number of binding events, and $a > 1$ in certain cases. Thus knowledge of the statistical structure of concentration profiles enhances sensing precision, providing a potential explanation for the rapid yet highly accurate cell fate decisions observed during development.
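
The precision formula quoted in this abstract is easy to evaluate numerically. The sketch below assumes that setting a = 1 corresponds to the Poisson counting limit 1/sqrt(N); the function name `relative_sensing_error` and the example values of N and a are chosen here purely for illustration and do not come from the paper.

```python
import math


def relative_sensing_error(n_events, a=1.0):
    """Relative error 1 / sqrt(a^2 * N) for N binding events.

    a = 1 is taken here to be the Poisson counting limit; a > 1 stands
    for the gain from prior knowledge of the profile (values hypothetical).
    """
    if n_events <= 0 or a <= 0:
        raise ValueError("n_events and a must be positive")
    return 1.0 / math.sqrt(a * a * n_events)


poisson_limit = relative_sensing_error(100)        # a = 1
with_prior = relative_sensing_error(100, a=2.0)    # hypothetical a = 2
```

For 100 binding events the counting limit is a 10% relative error, and a hypothetical a = 2 would halve it to 5%, which is the same precision that plain counting would need 400 events to reach.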

2506.00602 2026-02-09 stat.AP q-bio.QM

Assessing Honey Bee Colony Health Using Temperature Time Series

Karina Arias-Calluari, Theotime Colin, Tanya Latty, Mary Myerscough, Eduardo G. Altmann

Comments 14 pages, 7 figures and 1 repository

Journal ref J R Soc Interface 23 (235): 202e50505 (2026)

Abstract

Honey bees face an increasing number of stressors that disrupt the natural behaviour of colonies and, in extreme cases, can lead to their collapse. Quantifying the status and resilience of colonies is essential to measure the impact of stressors and to identify colonies at risk. In this manuscript, we present and apply new methodologies to efficiently diagnose the status of a honey bee colony from widely available time series of hive and environmental temperature. Healthy hives have a remarkable ability to control temperature near the brood area. Our method exploits this fact and quantifies the status of a hive by measuring how resilient it is to extreme environmental temperatures, which act as natural stressors. Analysing 22 hives during different times of the year, including 3 hives that collapsed, we find the statistical signatures of stress that reveal whether honey bees are doing well or are at risk of failure. Based on these analyses, we propose a simple scale of hive status (stable, warning, and collapse) that can be determined based on a few temperature measurements. Our approach offers a lower-cost and practical bee-monitoring solution, providing a non-invasive way to track hive conditions and trigger interventions to save the hives from collapse.

2502.01879 2026-02-09 math.OC q-bio.PE

Optimizing Impulsive Releases: A Species Competition Model

Jéssica C. S. Alves, Sergio M. Oliva, Christian E. Schaerer

Comments 22 pages, 16 figures

Journal ref Applied Mathematical Modelling 151 (2026) 116517

Abstract

This study focuses on optimizing the release of species $S_2$ to control the population of species $S_1$ through impulsive release strategies. We investigate the conditions required to remove species $S_1$, which is equivalent to the establishment of $S_2$. The research includes a theoretical analysis that examines the positivity, existence, and uniqueness of solutions, the conditions ensuring global stability, and a sufficient condition for controlling the $S_1$-free solution. In addition, we formulate an optimal control problem to maximize the effectiveness of $S_2$ releases, manage the population of $S_1$, and minimize the costs associated with this intervention strategy. Numerical simulations are conducted to validate the proposed theories and allow visualization of population dynamics under various release scenarios.