arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.19852 2026-01-28 q-bio.OT

Hyperdisorder in tumor growth

Arturo Tozzi

Comments 10 pages, 1 figure

Tumor growth is constrained by spatial, mechanical, and metabolic factors whose alignment progressively breaks down across cellular, mesoscopic, and tissue scales as tumors expand. We hypothesize that this misalignment drives tumors toward a distinct architectural regime, termed hyperdisorder. Hyperdisorder is not defined by increased heterogeneity alone, but by the coexistence of elevated disorder across scales and spatial nonstationarity within the same tumor. Unlike ordinary randomness, where independent fluctuations diminish under spatial averaging, disorder here persists, reorganizes, or even amplifies with increasing observation scale, preventing convergence toward a stable architectural description. Using hematoxylin and eosin stained whole-slide images of gastric cancer from The Cancer Genome Atlas, we quantify tumor architecture using tile-based metrics that capture complementary aspects of organization, including texture entropy, microstructural fragmentation, orientation isotropy, and multiscale entropy variation. These measures are combined into a standardized hyperdisorder index, enabling unsupervised comparison across spatial regions. We find that architectural disruption is unevenly distributed and partially decoupled across scales within individual slides, consistent with growth-driven multiscale incoherence rather than uniform stochastic variability. Testable consequences include anomalous scaling of heterogeneity with sampling size, failure of coarse graining to converge, and systematic differences between tumor cores and invasive fronts. In diagnostic and clinical contexts, this framework clarifies when measurements from limited tissue samples are representative of the whole tumor and when they are dominated by scale- and location-dependent effects.
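
As a rough illustration of how tile-level metrics could be combined into a standardized index, the sketch below z-scores each metric across tiles and averages the z-scores per tile. The metric names, the equal weighting, and the sample values are assumptions made for illustration; the abstract does not specify how the authors combine the measures.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def hyperdisorder_index(tiles):
    """Average the per-metric z-scores of each tile (equal weights are an assumption)."""
    metrics = sorted(tiles[0].keys())
    columns = {m: zscores([t[m] for t in tiles]) for m in metrics}
    return [mean(columns[m][i] for m in metrics) for i in range(len(tiles))]


# Hypothetical tile measurements; names and values are illustrative only.
tiles = [
    {"texture_entropy": 4.1, "fragmentation": 0.20, "isotropy": 0.55, "multiscale_var": 0.10},
    {"texture_entropy": 4.6, "fragmentation": 0.35, "isotropy": 0.70, "multiscale_var": 0.18},
    {"texture_entropy": 5.3, "fragmentation": 0.60, "isotropy": 0.90, "multiscale_var": 0.40},
]
index = hyperdisorder_index(tiles)
print(index)
```

Because every metric is standardized within the slide, the index is relative: it ranks regions of one slide against each other rather than giving an absolute disorder score.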

2601.19681 2026-01-28 q-bio.PE

Long-term evolution of regulatory DNA sequences. Part 1: Simulations on global, biophysically-realistic genotype-phenotype maps

Elia Mascolo, Réka Borbély, Santiago Herrera-Álvarez, Calin C Guet, Justin Crocker, Gašper Tkačik

Comments Invited review (Part I of a two-part series), submitted to Current Opinion in Genetics & Development

Promoters and enhancers are cis-regulatory elements (CREs), DNA sequences that bind transcription factor (TF) proteins to up- or down-regulate target genes. Decades-long efforts yielded TF-DNA interaction models that predict how strongly an individual TF binds arbitrary DNA sequences and how individual binding events on the CRE combine to affect gene expression. These insights can be synthesized into a global, biophysically-realistic, and quantitative genotype-phenotype (GP) map for gene regulation, a "holy grail" for the application of evolutionary theory. A global map provides a rare opportunity to simulate long-term evolution of regulatory sequences and pose several fundamental questions: How long does it take to evolve CREs de novo? How many non-trivial regulatory functions exist in sequence space? How connected are they? For which regulatory architecture is CRE evolution most rapid and evolvable? In this article, the first of a two-part series, we briefly review the pertinent modeling and simulation efforts for a unique system that enables close, quantitative, and mechanistic links between biophysics, as well as systems, synthetic, and evolutionary biology.

2601.19384 2026-01-28 cs.AR q-bio.GN

GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping

Julien Eudine, Chu Li, Zhuo Cheng, Renzo Andri, Can Firtina, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Konstantina Koliogeorgi, Anirban Nag, Arash Tavakkol, Haiyu Mao, Onur Mutlu, Shai Bergman, Ji Zhang

Genome sequencing has become a central focus in computational biology. A genome study typically begins with sequencing, which produces millions to billions of short DNA fragments known as reads. Read mapping aligns these reads to a reference genome. Read mapping for short reads comes in two forms: single-end and paired-end, with the latter being more prevalent due to its higher accuracy and support for advanced analysis. Read mapping remains a major performance bottleneck in genome analysis due to expensive dynamic programming. Prior efforts have attempted to mitigate this cost by employing filters to identify and potentially discard computationally expensive matches and leveraging hardware accelerators to speed up the computations. While partially effective, these approaches have limitations. In particular, existing filters are often ineffective for paired-end reads, as they evaluate each read independently and exhibit relatively low filtering ratios. In this work, we propose GenPairX, a hardware-algorithm co-designed accelerator that efficiently minimizes the computational load of paired-end read mapping while enhancing the throughput of memory-intensive operations. GenPairX introduces: (1) a novel filtering algorithm that jointly considers both reads in a pair to improve filtering effectiveness, and a lightweight alignment algorithm to replace most of the computationally expensive dynamic programming operations, and (2) two specialized hardware mechanisms to support the proposed algorithms. Our evaluations show that GenPairX delivers substantial performance improvements over state-of-the-art solutions, achieving 1575x and 1.43x higher throughput per watt compared to leading CPU-based and accelerator-based read mappers, respectively, all without compromising accuracy.

2601.19257 2026-01-28 q-bio.BM cs.AI cs.LG

PCEvo: Path-Consistent Molecular Representation via Virtual Evolutionary

Kun Li, Longtao Hu, Yida Xiong, Jiajun Yu, Hongzhi Zhang, Jiameng Chen, Xiantao Cai, Jia Wu, Wenbin Hu

Comments 10 pages, 4 figures, 5 tables

Molecular representation learning aims to learn vector embeddings that capture molecular structure and geometry, thereby enabling property prediction and downstream scientific applications. In many AI for science tasks, labeled data are expensive to obtain and therefore limited in availability. Under the few-shot setting, models trained with scarce supervision often learn brittle structure-property relationships, resulting in substantially higher prediction errors and reduced generalization to unseen molecules. To address this limitation, we propose PCEvo, a path-consistent representation method that learns from virtual paths through dynamic structural evolution. PCEvo enumerates multiple chemically feasible edit paths between retrieved similar molecular pairs under topological dependency constraints. It transforms the labels of the two molecules into stepwise supervision along each virtual evolutionary path. It introduces a path-consistency objective that enforces prediction invariance across alternative paths connecting the same two molecules. Comprehensive experiments on the QM9 and MoleculeNet datasets demonstrate that PCEvo substantially improves the few-shot generalization performance of baseline methods. The code is available at https://anonymous.4open.science/r/PCEvo-4BF2.

2601.19205 2026-01-28 q-bio.BM cs.AI cs.LG

EnzyPGM: Pocket-conditioned Generative Model for Substrate-specific Enzyme Design

Zefeng Lin, Zhihang Zhang, Weirong Zhu, Tongchang Han, Xianyong Fang, Tianfan Fu, Xiaohua Xu

Comments 9 pages, 4 figures, under review

Designing enzymes with substrate-binding pockets is a critical challenge in protein engineering, as catalytic activity depends on the precise interaction between pockets and substrates. Currently, generative models dominate functional protein design but cannot model pocket-substrate interactions, which limits the generation of enzymes with precise catalytic environments. To address this issue, we propose EnzyPGM, a unified framework that jointly generates enzymes and substrate-binding pockets conditioned on functional priors and substrates, with a particular focus on learning accurate pocket-substrate interactions. At its core, EnzyPGM includes two main modules: a Residue-atom Bi-scale Attention (RBA) that jointly models intra-residue dependencies and fine-grained interactions between pocket residues and substrate atoms, and a Residue Function Fusion (RFF) that incorporates enzyme function priors into residue representations. We also curate EnzyPock, an enzyme-pocket dataset comprising 83,062 enzyme-substrate pairs across 1,036 four-level enzyme families. Extensive experiments demonstrate that EnzyPGM achieves state-of-the-art performance on EnzyPock. Notably, EnzyPGM reduces the average binding energy by 0.47 kcal/mol relative to EnzyGen, showing its superior performance on substrate-specific enzyme design. The code and dataset will be released later.

2601.19125 2026-01-28 q-bio.NC math.DS nlin.PS

Stroboscopic motion reversals in delay-coupled neural fields

Noah Parks, Zachary P Kilpatrick

Comments 29 pages, 7 figures

Visual illusions provide a window into the mechanisms underlying visual processing, and dynamical neural circuit models offer a natural framework for proposing and testing theories of their emergence. We propose and analyze a delay-coupled neural field model that explains stroboscopic percepts arising from the subsampling of a moving, often rotating, stimulus, such as the wagon-wheel illusion. Motivated by the role of activity propagation delays in shaping visual percepts, we study neural fields with both uniform and spatially dependent delays, representing the finite time required for signals to travel along axonal projections. Each module is organized as a ring of neurons encoding angular preference, with instantaneous local coupling and delayed long-range coupling strongest between neurons with similar preference. We show that delays generate a family of coexisting traveling bump solutions with distinct, quantized propagation speeds. Using interface-based asymptotic methods, we reduce the neural field dynamics to a low-dimensional system of coupled delay differential equations, enabling a detailed analysis of speed selection, stability, entrainment, and state transitions. Regularly pulsed inputs induce transitions between distinct speed states, including motion opposite to the forcing direction, capturing key features of visual aliasing and stroboscopic motion reversal. These results demonstrate how delayed neural interactions organize perception into discrete dynamical states and provide a mechanistic explanation for stroboscopic visual illusions.

2601.19064 2026-01-28 q-bio.PE cs.DS

LvD: A New Algorithm for Computing the Likelihood of a Phylogeny

David Bryant, Celine Scornavacca, David Swofford

There are few, if any, algorithms in statistical phylogenetics which are used more heavily than Felsenstein's 1973 pruning method for computing the likelihood of a tree. We present LvD (Likelihood via Decomposition), an alternative to Felsenstein's algorithm based on a different decomposition of the underlying phylogeny. It works for all standard nucleotide models. The new algorithm allows updates of the likelihood calculation in worst-case $O(\log n)$ time with $n$ taxa, as opposed to worst-case $O(n)$ time for existing methods. In practice this leads to appreciable improvements in likelihood calculations, the extent of speed-up depending on how balanced or unbalanced the trees are. We explore implications for parallel computing, and show that the approach allows likelihoods to be computed in $O(\log n)$ parallel time per site, compared to (worst-case) $O(n)$ time. We implemented and applied the algorithm to large numbers of simulated and empirical data sets and showed that these theoretical advances lead to a significant practical speed-up, although the extent of the improvement depends on how balanced the phylogenies already are.
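
For context, the baseline that LvD is compared against can be sketched in a few lines. The code below is a minimal version of Felsenstein's pruning recursion for a single site under the Jukes-Cantor (JC69) model; it is not LvD, whose decomposition the abstract does not describe. The tree encoding (a leaf is a base string, an internal node is a list of `(subtree, branch_length)` pairs) and the example tree are assumptions chosen for brevity.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of base i changing to base j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node):
    """Conditional likelihoods L[x] = P(observed leaves below node | state x at node)."""
    if isinstance(node, str):  # leaf: the observed base
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:  # internal node: list of (subtree, branch length)
        child_l = partial(child)
        for x, bx in enumerate(BASES):
            result[x] *= sum(
                jc69_prob(bx, by, t) * child_l[y] for y, by in enumerate(BASES)
            )
    return result


def site_likelihood(tree):
    """Sum over root states, weighted by the uniform JC69 stationary distribution."""
    return sum(0.25 * l for l in partial(tree))


# Hypothetical three-taxon tree: ((A:0.1, C:0.1):0.05, A:0.2)
tree = [([("A", 0.1), ("C", 0.1)], 0.05), ("A", 0.2)]
likelihood = site_likelihood(tree)
print(likelihood)
```

Changing one branch length in this scheme forces recomputation of every partial vector on the path to the root, which is the worst-case $O(n)$ update cost mentioned in the abstract.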

2601.19037 2026-01-28 cs.LG q-bio.QM stat.ML

XIMP: Cross Graph Inter-Message Passing for Molecular Property Prediction

Anatol Ehrlich, Lorenz Kummer, Vojtech Voracek, Franka Bause, Nils M. Kriege

Accurate molecular property prediction is central to drug discovery, yet graph neural networks often underperform in data-scarce regimes and fail to surpass traditional fingerprints. We introduce cross-graph inter-message passing (XIMP), which performs message passing both within and across multiple related graph representations. For small molecules, we combine the molecular graph with scaffold-aware junction trees and pharmacophore-encoding extended reduced graphs, integrating complementary abstractions. While prior work is either limited to a single abstraction or non-iterative communication across graphs, XIMP supports an arbitrary number of abstractions and both direct and indirect communication between them in each layer. Across ten diverse molecular property prediction tasks, XIMP outperforms state-of-the-art baselines in most cases, leveraging interpretable abstractions as an inductive bias that guides learning toward established chemical concepts, enhancing generalization in low-data settings.

2601.18640 2026-01-28 cs.LG q-bio.MN

TwinPurify: Purifying gene expression data to reveal tumor-intrinsic transcriptional programs via self-supervised learning

Zhiwei Zheng, Kevin Bryson

Advances in single-cell and spatial transcriptomic technologies have transformed tumor ecosystem profiling at cellular resolution. However, large-scale studies on patient cohorts continue to rely on bulk transcriptomic data, where variation in tumor purity obscures tumor-intrinsic transcriptional signals and constrains downstream discovery. Many deconvolution methods report strong performance on synthetic bulk mixtures but fail to generalize to real patient cohorts because of unmodeled biological and technical variation. Here, we introduce TwinPurify, a representation learning framework that adapts the Barlow Twins self-supervised objective, representing a fundamental departure from the deconvolution paradigm. Rather than resolving the bulk mixture into discrete cell-type fractions, TwinPurify instead learns continuous, high-dimensional tumor embeddings by leveraging adjacent-normal profiles within the same cohort as "background" guidance, enabling the disentanglement of tumor-specific signals without relying on any external reference. Benchmarked against multiple large cancer cohorts across RNA-seq and microarray platforms, TwinPurify outperforms conventional representation learning baselines like auto-encoders in recovering tumor-intrinsic and immune signals. The purified embeddings improve molecular subtype and grade classification, enhance survival model concordance, and uncover biologically meaningful pathway activities compared to raw bulk profiles. By providing a transferable framework for decontaminating bulk transcriptomics, TwinPurify extends the utility of existing clinical datasets for molecular discovery.
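
For readers unfamiliar with the objective being adapted, the standard Barlow Twins loss is reproduced below. The symbols ($z^A$, $z^B$ for the batch-centered embeddings of two views, $b$ for the batch index, $\lambda$ for the trade-off weight) follow the usual presentation and are not taken from this abstract; how TwinPurify modifies the objective or constructs its two views is not stated here.

```latex
\mathcal{C}_{ij} =
  \frac{\sum_{b} z^{A}_{b,i}\, z^{B}_{b,j}}
       {\sqrt{\sum_{b} \bigl(z^{A}_{b,i}\bigr)^{2}}\;
        \sqrt{\sum_{b} \bigl(z^{B}_{b,j}\bigr)^{2}}},
\qquad
\mathcal{L}_{\mathrm{BT}} =
  \sum_{i} \bigl(1 - \mathcal{C}_{ii}\bigr)^{2}
  + \lambda \sum_{i} \sum_{j \neq i} \mathcal{C}_{ij}^{2}
```

The first term pulls matching embedding dimensions of the two views toward perfect correlation; the second pushes different dimensions toward decorrelation.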

2601.18604 2026-01-28 cs.LG q-bio.GN

LaCoGSEA: Unsupervised deep learning for pathway analysis via latent correlation

Zhiwei Zheng, Kevin Bryson

Motivation: Pathway enrichment analysis is widely used to interpret gene expression data. Standard approaches, such as GSEA, rely on predefined phenotypic labels and pairwise comparisons, which limits their applicability in unsupervised settings. Existing unsupervised extensions, including single-sample methods, provide pathway-level summaries but primarily capture linear relationships and do not explicitly model gene-pathway associations. More recently, deep learning models have been explored to capture non-linear transcriptomic structure. However, their interpretation has typically relied on generic explainable AI (XAI) techniques designed for feature-level attribution. As these methods are not designed for pathway-level interpretation in unsupervised transcriptomic analyses, their effectiveness in this setting remains limited. Results: To bridge this gap, we introduce LaCoGSEA (Latent Correlation GSEA), an unsupervised framework that integrates deep representation learning with robust pathway statistics. LaCoGSEA employs an autoencoder to capture non-linear manifolds and proposes a global gene-latent correlation metric as a proxy for differential expression, generating dense gene rankings without prior labels. We demonstrate that LaCoGSEA offers three key advantages: (i) it achieves improved clustering performance in distinguishing cancer subtypes compared to existing unsupervised baselines; (ii) it recovers a broader range of biologically meaningful pathways at higher ranks compared with linear dimensionality reduction and gradient-based XAI methods; and (iii) it maintains high robustness and consistency across varying experimental protocols and dataset sizes. Overall, LaCoGSEA provides state-of-the-art performance in unsupervised pathway enrichment analysis. Availability and implementation: https://github.com/willyzzz/LaCoGSEA

2601.11018 2026-01-28 q-bio.NC cs.CV

KOCOBrain: Kuramoto-Guided Graph Network for Uncovering Structure-Function Coupling in Adolescent Prenatal Drug Exposure

Badhan Mazumder, Lei Wu, Sir-Lord Wiafe, Vince D. Calhoun, Dong Hye Ye

Comments Preprint version of the paper accepted to the IEEE International Symposium on Biomedical Imaging (ISBI 2026). This is the author's accepted manuscript. The final published version will appear in IEEE Xplore

Exposure to psychoactive substances during pregnancy, such as cannabis, can disrupt neurodevelopment and alter large-scale brain networks, yet identifying their neural signatures remains challenging. We introduced KOCOBrain (KuramotO COupled Brain Graph Network), a unified graph neural network framework that integrates structural and functional connectomes via Kuramoto-based phase dynamics and cognition-aware attention. The Kuramoto layer models neural synchronization over anatomical connections, generating phase-informed embeddings that capture structure-function coupling, while cognitive scores modulate information routing in a subject-specific manner, followed by a joint objective that enhances robustness under class imbalance. Applied to the ABCD cohort, KOCOBrain improved prenatal drug exposure prediction over relevant baselines and revealed interpretable structure-function patterns that reflect disrupted brain network coordination associated with early exposure.
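
The phase dynamics underlying a Kuramoto layer can be illustrated with the textbook model, $\dot{\theta}_i = \omega_i + K \sum_j A_{ij} \sin(\theta_j - \theta_i)$, integrated here with a plain Euler step over a small graph. This is only a sketch of the classical model: the adjacency matrix, coupling strength, step size, and initial phases are assumed values, and the learned layer in KOCOBrain may differ substantially.

```python
import math


def kuramoto_step(theta, omega, adjacency, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + K * sum_j A_ij * sin(theta_j - theta_i)."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        drive = sum(adjacency[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
        new_theta.append((theta[i] + dt * (omega[i] + coupling * drive)) % (2 * math.pi))
    return new_theta


def order_parameter(theta):
    """Magnitude of the mean phase vector: 0 means incoherent, 1 means fully synchronized."""
    re = sum(math.cos(t) for t in theta) / len(theta)
    im = sum(math.sin(t) for t in theta) / len(theta)
    return math.hypot(re, im)


# Hypothetical 3-region structural connectome (fully connected, unit weights).
adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
omega = [0.0, 0.0, 0.0]   # identical natural frequencies
theta = [0.0, 1.0, 2.0]   # initial phases in radians

r_start = order_parameter(theta)
for _ in range(500):
    theta = kuramoto_step(theta, omega, adjacency, coupling=1.0, dt=0.05)
r_end = order_parameter(theta)
print(r_start, r_end)
```

With identical frequencies and a connected graph, the phases lock and the order parameter rises toward 1; weaker or missing anatomical links slow or prevent that synchronization.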

2601.11013 2026-01-28 cond-mat.soft nlin.AO physics.bio-ph physics.chem-ph q-bio.BM

De novo emergence of metabolically active protocells

Nayan Chakraborty, Shashi Thutupalli

A continuous route from a disordered soup of simple chemical feedstocks to a functional protocell -- a compartment that metabolizes, grows, and propagates -- remains elusive. Here, we show that a homogeneous aqueous chemical mixture containing phosphorus, iron, molybdenum salts and formaldehyde spontaneously self-organizes into compartments that couple robust non-equilibrium chemical dynamics to their own growth. These structures mature to a sustained, dissipative steady state and support an organic synthetic engine, producing diverse molecular species including many core biomolecular classes. Internal spherules that are themselves growth-competent are produced within the protocells, establishing a rudimentary mode of self-perpetuation. The chemical dynamics we observe in controlled laboratory conditions also occur in reaction mixtures exposed to natural day-night cycles. Strikingly, the morphology and chemical composition of the protocells in our experiments closely resemble molybdenum-rich microspheres recently discovered in current oceanic environments. Our work establishes a robust, testable route to de novo protocell formation. The emergence of life-like spatiotemporal organization and chemical dynamics from minimal initial conditions is more facile than previously thought and could be a recurring natural phenomenon.

2509.15278 2026-01-28 q-bio.OT cs.CR cs.CY eess.IV

Assessing metadata privacy in neuroimaging

Emilie Kibsgaard, Anita Sue Jwa, Christopher J Markiewicz, David Rodriguez Gonzalez, Judith Sainz Pardo, Russell A. Poldrack, Cyril R. Pernet

Comments 19 pages, 7 tables, 1 figure, original analysis of 6 Open Datasets

The ethical and legal imperative to share research data without causing harm requires careful attention to privacy risks. While mounting evidence demonstrates that data sharing benefits science, legitimate concerns persist regarding the potential leakage of personal information that could lead to reidentification and subsequent harm. We reviewed metadata accompanying neuroimaging datasets from heterogeneous studies openly available on OpenNeuro, involving participants across the lifespan, from children to older adults, with and without clinical diagnoses, and including associated clinical score data. Using metaprivBIDS (https://github.com/CPernet/metaprivBIDS), a software application for BIDS-compliant tsv/json files that computes and reports different privacy metrics (k-anonymity, k-global, l-diversity, SUDA, PIF), we found that privacy is generally well maintained, with serious vulnerabilities being rare. Nonetheless, issues were identified in nearly all datasets and warrant mitigation. Notably, clinical score data (e.g., neuropsychological results) posed minimal reidentification risk, whereas demographic variables (age, sex assigned at birth, sexual orientation, race, income, and geolocation) represented the principal privacy vulnerabilities. We outline practical measures to address these risks, enabling safer data sharing practices.
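
To make the first of the listed metrics concrete, the sketch below computes k-anonymity for a small participants table: k is the size of the smallest group of rows sharing the same combination of quasi-identifiers, and any row in a group of size one is unique. This is a generic illustration and does not use or reproduce the metaprivBIDS API; the column names and rows are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return (k, unique_rows) for the given quasi-identifier columns.

    k is the size of the smallest equivalence class; unique_rows are the
    records whose quasi-identifier combination appears exactly once.
    """
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    unique_rows = [row for row, key in zip(rows, keys) if counts[key] == 1]
    return min(counts.values()), unique_rows


# Hypothetical participants.tsv content, already parsed into dictionaries.
rows = [
    {"participant_id": "sub-01", "age": 34, "sex": "F"},
    {"participant_id": "sub-02", "age": 34, "sex": "F"},
    {"participant_id": "sub-03", "age": 71, "sex": "M"},
]
k, unique_rows = k_anonymity(rows, ["age", "sex"])
print(k, [r["participant_id"] for r in unique_rows])
```

A result of k = 1 signals that at least one participant can be singled out from demographics alone, which is the kind of vulnerability the abstract attributes to demographic variables.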

2507.14056 2026-01-28 cs.LG cs.AI q-bio.NC

Noradrenergic-inspired gain modulation attenuates the stability gap in joint training

Alejandro Rodriguez-Garcia, Anindya Ghosh, Srikanth Ramaswamy

Comments 23 pages, 9 figures, 6 tables, 1 pseudo-code

Recent work in continual learning has highlighted the stability gap -- a temporary performance drop on previously learned tasks when new ones are introduced. This phenomenon reflects a mismatch between rapid adaptation and strong retention at task boundaries, underscoring the need for optimization mechanisms that balance plasticity and stability over abrupt distribution changes. While optimizers such as momentum-SGD and Adam introduce implicit multi-timescale behavior, they still exhibit pronounced stability gaps. Importantly, these gaps persist even under ideal joint training, making it crucial to study them in this setting to isolate their causes from other sources of forgetting. Motivated by how noradrenergic (neuromodulatory) bursts transiently increase neuronal gain under uncertainty, we introduce a dynamic gain scaling mechanism as a two-timescale optimization technique that balances adaptation and retention by modulating effective learning rates and flattening the local landscape through an effective reparameterization. Across domain- and class-incremental MNIST, CIFAR, and mini-ImageNet benchmarks under task-agnostic joint training, dynamic gain scaling effectively attenuates stability gaps while maintaining competitive accuracy, improving robustness at task transitions.

2506.11062 2026-01-28 q-bio.NC cs.AI cs.NE

Decoding Cortical Microcircuits: A Generative Model for Latent Space Exploration and Controlled Synthesis

Xingyu Liu, Yubin Li, Guozhang Chen

Journal ref AAAI 2026 NeuroAI Workshop

A central idea in understanding brains and building artificial intelligence is that structure determines function. Yet, how the brain's complex structure arises from a limited set of genetic instructions remains a key question. The ultra high-dimensional detail of neural connections vastly exceeds the information storage capacity of genes, suggesting a compact, low-dimensional blueprint must guide brain development. Our motivation is to uncover this blueprint. We introduce a generative model to learn this underlying representation from detailed connectivity maps of mouse cortical microcircuits. Our model successfully captures the essential structural information of these circuits in a compressed latent space. We found that specific, interpretable directions within this space directly relate to understandable network properties. Building on this, we demonstrate a novel method to controllably generate new, synthetic microcircuits with desired structural features by navigating this latent space. This work offers a new way to investigate the design principles of neural circuits and explore how structure gives rise to function, potentially informing the development of more advanced artificial neural networks.

2408.08080 2026-01-28 stat.AP q-bio.QM

Assessing the properties of the prediction interval in random-effects meta-analysis

Peter Matrai, Tamas Koi, Zoltan Sipos, Nelli Farkas

Comments To be published in Research Synthesis Methods

Journal ref Research Synthesis Methods (2026)

Random-effects meta-analysis is a widely applied methodology to synthesize research findings of studies on a specific scientific question. Besides estimating the mean effect, an important aim of the meta-analysis is to summarize the heterogeneity, i.e. the variation in the underlying effects caused by the differences in study circumstances. The prediction interval is frequently used for this purpose: a 95% prediction interval contains the true effect of a similar new study in 95% of the cases when it is constructed, or in other words, it covers 95% of the true effects distribution on average. In this article, after providing a clear mathematical background, we present an extensive simulation investigating the performance of all frequentist prediction interval methods published to date. The work focuses on the distribution of the coverage probabilities and how these distributions change depending on the amount of heterogeneity and the number of involved studies. Although the single requirement that a prediction interval has to fulfill is to keep a nominal coverage probability on average, we demonstrate why the distribution of coverages cannot be disregarded, and that for a small number of studies no reliable conclusion can be drawn from the prediction interval. We argue that assessing only the mean coverage can easily lead to misunderstanding and misinterpretation. The length of the intervals and the robustness of the methods concerning non-normality of the true effects are also investigated.
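
As a concrete reference point, one widely used frequentist construction (commonly attributed to Higgins, Thompson, and Spiegelhalter) is shown below. It is given as background only; the abstract does not say which methods perform best, and the notation ($k$ studies, pooled estimate $\hat{\mu}$, between-study variance estimate $\hat{\tau}^2$) is an assumption of this sketch.

```latex
\hat{\mu} \;\pm\; t_{k-2,\,0.975}\,
  \sqrt{\hat{\tau}^{2} + \widehat{\mathrm{SE}}(\hat{\mu})^{2}}
```

Because the quantile comes from a $t$ distribution with $k-2$ degrees of freedom and $\hat{\tau}^2$ is poorly estimated when $k$ is small, the interval can be very wide or misleadingly narrow for meta-analyses with few studies, consistent with the concern raised in the abstract.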

2406.06765 2026-01-28 q-bio.PE stat.AP

Classical JAK2V617F+ Myeloproliferative Neoplasms emergence and development based on real life incidence and mathematical modeling

Ana Fernández Baranda, Vincent Bansaye, Evelyne Lauret, Morgane Mounier, Valérie Ugo, Sylvie Méléard, Stéphane Giraudier

Mathematical modeling allows us to better understand the emergence and development of myeloproliferative neoplasms (MPN), a group of blood cancers. We test different mathematical models on an initial cohort to determine the emergence and evolution times before diagnosis of JAK2V617F+ classical MPN (Polycythemia Vera (PV) and Essential Thrombocythemia (ET)). We consider the time before diagnosis as the sum of two independent periods: the time (from embryonic development) for the JAK2V617F mutation to occur, not disappear and enter proliferation, and a second time corresponding to the expansion of the clonal population until diagnosis. We prove that the rate of active mutation occurrence increases exponentially with age following the Gompertz model rather than being constant. We find that the first tumorous cell takes an average time of $63.1 \pm 13$ years to appear and start proliferation. On the other hand, the expansion time is constant: $8.8$ years once the mutation has emerged. These results are validated in an external cohort. Using this model, we analyze JAK2V617F ET versus PV, and find that active mutation occurrence takes approximately $1.5$ years longer for PV than for ET, while the expansion time was similar. In conclusion, our age-dependent approach for the emergence and development of MPN demonstrates that the emergence of a JAK2V617F mutation should be linked to an aging mechanism, and indicates a period of $8$ to $9$ years to develop a full MPN.
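
For reference, the Gompertz model mentioned above describes an event rate that grows exponentially with age. In the generic parameterization below, $a$ (baseline rate) and $b$ (aging rate) are placeholder symbols; the paper's own parameterization and fitted values are not given in the abstract.

```latex
h(t) = a\,e^{b t},
\qquad
S(t) = \exp\!\left(-\int_{0}^{t} h(s)\,ds\right)
     = \exp\!\left(-\frac{a}{b}\bigl(e^{b t} - 1\bigr)\right)
```

Here $h(t)$ is the rate of active mutation occurrence at age $t$ and $S(t)$ is the probability that no such event has occurred by age $t$. A constant-rate model corresponds to the limit $b \to 0$, where $S(t) = e^{-a t}$.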

2403.18602 2026-01-28 stat.ME q-bio.MN

Multi-omics network reconstruction with collaborative graphical lasso

Alessio Albanese, Wouter Kohlen, Pariya Behrouzi

Motivation: In recent years, the availability of multi-omics data has increased substantially. Multi-omics data integration methods mainly aim to leverage different molecular layers to gain a complete molecular description of biological processes. An attractive integration approach is the reconstruction of multi-omics networks. However, the development of effective multi-omics network reconstruction strategies lags behind. Results: In this study, we introduce collaborative graphical lasso, a novel approach that extends graphical lasso by incorporating collaboration between omics layers, thereby improving multi-omics data integration and enhancing network inference. Our method leverages a collaborative penalty term, which harmonizes the contribution of the omics layers to the reconstruction of the network structure. This promotes a cohesive integration of information across modalities, and it is introduced alongside a dual regularization scheme that separately controls sparsity within and between layers. To address the challenge of model selection in this framework, we propose XStARS, a stability-based criterion for multi-dimensional hyperparameter tuning. We assess the performance of collaborative graphical lasso and the corresponding model selection procedure through simulations, and we apply them to publicly available multi-omics data. This application demonstrated that collaborative graphical lasso recovers established biological interactions while suggesting novel, biologically coherent connections. Availability and implementation: We implemented collaborative graphical lasso as an R package, available on CRAN as coglasso. The results of the manuscript can be reproduced by running the code available at https://github.com/DrQuestion/coglasso_reproducible_code
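
For orientation, the standard graphical lasso that the method extends estimates a sparse precision matrix $\Theta$ from the sample covariance $S$ by solving the penalized likelihood problem below. The exact form of the collaborative penalty and of the within-layer and between-layer sparsity terms is not given in the abstract, so only the base objective is shown.

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0}
  \Bigl\{ \log\det\Theta - \operatorname{tr}(S\,\Theta)
          - \lambda \lVert \Theta \rVert_{1} \Bigr\}
```

Nonzero off-diagonal entries of $\hat{\Theta}$ correspond to edges of the inferred network, and a larger $\lambda$ yields a sparser graph.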

2312.08290 2026-01-28 eess.IV cs.LG q-bio.QM

PhenDiff: Revealing Subtle Phenotypes with Diffusion Models in Real Images

Anis Bourou, Thomas Boyer, Kévin Daupin, Véronique Dubreuil, Aurélie De Thonel, Valérie Mezger, Auguste Genovesio

Journal ref MICCAI 2024

For the past few years, deep generative models have increasingly been used in biological research for a variety of tasks. Recently, they have proven to be valuable for uncovering subtle cell phenotypic differences that are not directly discernible to the human eye. However, current methods employed to achieve this goal mainly rely on Generative Adversarial Networks (GANs). While effective, GANs suffer from issues such as training instability and mode collapse, and they do not accurately map images back to the model's latent space, which is necessary to synthesize, manipulate, and thus interpret outputs based on real images. In this work, we introduce PhenDiff: a multi-class conditional method leveraging Diffusion Models (DMs) designed to identify shifts in cellular phenotypes by translating a real image from one condition to another. We qualitatively and quantitatively validate this method on cases where the phenotypic changes are visible or invisible, such as in low concentrations of drug treatments. Overall, PhenDiff represents a valuable tool for identifying cellular variations in real microscopy images. We anticipate that it could facilitate the understanding of diseases and advance drug discovery through the identification of novel biomarkers.

2303.02488 2026-01-28 q-bio.SC

Multi class intracellular protein targeting predictions in diatoms and other algae with complex plastids: ASAFind 2.0

Ansgar Gruber, Cedar McKay, Miroslav Oborník, Gabrielle Rocap

Comments 22 pages, 2 figures, 2 tables, Supplemental Information (6 items) available on request

Journal ref Plant J, 122: e70138 (2025)

Abstract

Cells of diatoms and related algae with complex plastids of red algal origin are highly compartmentalized. These plastids are surrounded by four envelope membranes, which also define the periplastidic compartment (PPC), the space between the second and third membranes. The PPC corresponds to the cytosol of the eukaryotic alga that was the ancestor of the complex plastid. Metabolic reactions as well as cell biological processes take place in this compartment; however, its exact function remains elusive. Automated predictions of protein locations proved useful for genome-wide explorations of metabolism in the case of plastid proteins, but until now, no automated method for the prediction of PPC proteins was available. Here, we present an updated version of the plastid protein predictor ASAFind, which includes optional prediction of PPC proteins. The new ASAFind version also accepts the output of the most recent versions of SignalP (5.0) and TargetP (2.0) as input data. Furthermore, we release a Python script to calculate custom scoring matrices for adjustment of the ASAFind method to other groups of algae, and include the option to run the predictions with custom scoring matrices in a simplified score cut-off mode.

2601.18819 2026-01-28 q-bio.QM

Descriptive and risk analysis of vehicle movements linked to porcine reproductive and respiratory syndrome and porcine epidemic diarrhea transmission in US commercial swine farms

Jason A. Galvis, Taylor B. Parker, Cesar A. Corzo, Juliana B. Ferreira, Kelly A. Meiklejohn, Gustavo Machado

Abstract

Vehicle movements, including those of vehicle cabs and trailers, play a role in disseminating disease in swine production. However, there are many information gaps about the vehicle movement patterns that increase the probability of disease transmission, knowledge that is crucial for developing better preventive strategies. In this study, we described the movement patterns of vehicle cabs and trailers and identified risk factors for farms' porcine reproductive and respiratory syndrome (PRRS) and porcine epidemic diarrhea (PED) infection status. We collected global positioning system (GPS) movement data from vehicle cabs and trailers for 18 months and basic information for 6621 farms in the U.S. From the vehicle movement data, we estimated 66 variables and evaluated their association with farms' PRRS and PED status. Our univariate analysis showed that 56 variables were significantly associated (p < 0.05) with PED and PRRS farm status. Among these variables, vehicle visit frequency and previous exposure to positive farms were the main risk factors for both diseases. Conversely, increased vehicle cab and trailer loyalty for farm shipments and vehicle cleaning and disinfection events were protective factors. In the multivariate model, each additional weekly visit by a vehicle cab that had been exposed to a positive farm one day before the shipment was associated with a 234% and 243% increase in the odds of a farm testing PRRS- and PED-positive, respectively. Our analysis revealed that vehicle contact history plays a crucial role in the transmission of PRRS and PED. These findings can provide insights for developing more targeted strategies aimed at reducing the transmission and outbreaks linked to vehicle movements in swine production.
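
To make the reported effect sizes concrete, the sketch below converts a percent increase in odds into an odds ratio and shows what that implies for a probability. It assumes a logistic-type model in which each additional weekly visit multiplies the odds by the same factor; the abstract does not name the model, and the 10% baseline probability is purely hypothetical.

```python
import math


def odds_ratio_from_percent_increase(percent_increase: float) -> float:
    """A p% increase in odds corresponds to an odds ratio of 1 + p/100."""
    return 1.0 + percent_increase / 100.0


def apply_odds_ratio(baseline_probability: float, odds_ratio: float,
                     extra_visits: int = 1) -> float:
    """Probability after multiplying the baseline odds by odds_ratio once per visit.

    Compounding over several visits is an assumption of this sketch.
    """
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio ** extra_visits
    return new_odds / (1.0 + new_odds)


prrs_or = odds_ratio_from_percent_increase(234)  # odds ratio for PRRS
ped_or = odds_ratio_from_percent_increase(243)   # odds ratio for PED

# Equivalent log-odds coefficients in a logistic model.
prrs_beta = math.log(prrs_or)
ped_beta = math.log(ped_or)

# Hypothetical baseline: 10% chance of a farm testing positive.
baseline = 0.10
print(f"PRRS: OR={prrs_or:.2f}, beta={prrs_beta:.3f}, "
      f"p after one extra visit={apply_odds_ratio(baseline, prrs_or):.3f}")
print(f"PED:  OR={ped_or:.2f}, beta={ped_beta:.3f}, "
      f"p after one extra visit={apply_odds_ratio(baseline, ped_or):.3f}")
```

Note that a 234% increase in odds is an odds ratio of 3.34, not 2.34, and that the change in probability depends on the baseline.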

2601.18818 2026-01-28 q-bio.QM cs.AI q-bio.PE

LabelKAN -- Kolmogorov-Arnold Networks for Inter-Label Learning: Avian Community Learning

Marc Grimson, Joshua Fan, Courtney L. Davis, Dylan van Bramer, Daniel Fink, Carla P. Gomes

Abstract

Global biodiversity loss is accelerating, prompting international efforts such as the Kunming-Montreal Global Biodiversity Framework (GBF) and the United Nations Sustainable Development Goals to direct resources toward halting species declines. A key challenge in achieving this goal is having access to robust methodologies to understand where species occur and how they relate to each other within broader ecological communities. Recent deep learning-based advances in joint species distribution modeling have shown improved predictive performance, but effectively incorporating community-level learning, taking into account species-species relationships in addition to species-environment relationships, remains an outstanding challenge. We introduce LabelKAN, a novel framework based on Kolmogorov-Arnold Networks (KANs) to learn inter-label connections from predictions of each label. When modeling avian species distributions, LabelKAN achieves substantial gains in predictive performance across the vast majority of species. In particular, our method demonstrates strong improvements for rare and difficult-to-predict species, which are often the most important when setting biodiversity targets under frameworks like GBF. These performance gains also translate to more confident predictions of species' spatial patterns as well as more confident predictions of community structure. We illustrate how LabelKAN leads to qualitative and quantitative improvements with a focused application on the Great Blue Heron, an emblematic species in freshwater ecosystems that has experienced significant population declines across the United States in recent years. Using the LabelKAN framework, we are able to identify communities and species in New York that will be most sensitive to further declines in Great Blue Heron populations.

2601.13349 2026-01-28 q-bio.PE econ.GN q-fin.EC

Conservation priority mapping to prevent zoonotic spillovers

Leonardo Viotti, Luis Diego Herrera, Garo Batmanian, Franck Berthe, Rachael Kramp

Abstract

Diseases originating from wildlife pose a significant threat to global health, causing human and economic losses each year. The transmission of disease from animals to humans occurs at the interface between humans, livestock, and wildlife reservoirs, influenced by abiotic factors and ecological mechanisms. Although evidence suggests that intact ecosystems can reduce transmission, disease prevention has largely been neglected in conservation efforts and remains underfunded compared to mitigation. A major constraint is the lack of reliable, spatially explicit information to guide efforts effectively. Given the increasing rate of new disease emergence, accelerated by climate change and biodiversity loss, identifying priority areas for mitigating the risk of disease transmission is more crucial than ever. We present new high-resolution (1 km) maps of priority areas for targeted ecological countermeasures aimed at reducing the likelihood of zoonotic spillover, along with a methodology adaptable to local contexts. Our study compiles data on well-documented risk factors, protection status, forest restoration potential, and opportunity cost of the land to map areas with high potential for cost-effective interventions. We identify low-cost priority areas across 50 countries, including 277,000 km² where environmental restoration could mitigate the risk of zoonotic spillover and 198,000 km² where preventing deforestation could do the same, 95% of which are not currently under protection. The resulting layers, covering tropical regions globally, are freely available alongside an interactive no-code platform that allows users to adjust parameters and identify priority areas at multiple scales. Ecological countermeasures can be a cost-effective strategy for reducing the emergence of new pathogens; however, our study highlights the extent to which current conservation efforts fall short of this goal.

2509.25554 2026-01-28 q-bio.PE

Continuum models describing probabilistic motion of tagged agents in exclusion processes

Michael J. Plank, Matthew J. Simpson

Journal ref Physical Review E (2026), 113: 014137

Abstract

Lattice-based random walk models are widely used to study populations of migrating cells with motility bias and proliferation. Crowding is typically represented by volume exclusion, where each lattice site can be occupied by at most one agent and conflicting moves are aborted. This framework enables simulations that yield both population-level spatiotemporal agent density profiles and individual agent trajectories, comparable to experimental cell-tracking data. Previous continuum models for tagged-agent trajectories captured trajectory information only, and overlooked any measure of variability. This is an important limitation since trajectory data is inherently variable. To address this limitation, here we derive partial differential equations for the probability density function of tagged-agent trajectories. This continuum description has a clear physical interpretation, agrees well with distributional data from stochastic simulations, reveals the role of stochasticity in different contexts, and generalises to multiple subpopulations of distinct agents.
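
The kind of lattice model described above can be sketched in a few lines. The following is a minimal one-dimensional illustration, not the authors' implementation: it assumes reflecting boundaries, random sequential updates, no proliferation, and a single `bias` parameter, and all function and parameter names are hypothetical. It shows why repeated stochastic runs give a distribution of tagged-agent positions rather than a single trajectory.

```python
import random
import statistics


def simulate_tagged_agent(length=50, n_agents=20, steps=200, bias=0.0, seed=0):
    """Simulate a 1D exclusion process and track one tagged agent.

    Each site holds at most one agent; a move onto an occupied site or
    off the lattice is aborted. Returns (tagged trajectory, final positions).
    """
    rng = random.Random(seed)
    positions = rng.sample(range(length), n_agents)  # distinct starting sites
    occupied = set(positions)
    p_right = (1.0 + bias) / 2.0  # bias in [-1, 1]; 0 means unbiased
    trajectory = [positions[0]]   # agent 0 is the tagged agent

    for _ in range(steps):
        # One time step = n_agents random sequential movement attempts.
        for _ in range(n_agents):
            i = rng.randrange(n_agents)
            step = 1 if rng.random() < p_right else -1
            target = positions[i] + step
            if 0 <= target < length and target not in occupied:
                occupied.remove(positions[i])
                occupied.add(target)
                positions[i] = target
            # Otherwise the move is aborted (volume exclusion or boundary).
        trajectory.append(positions[0])
    return trajectory, positions


# Variability across identically prepared stochastic runs: the tagged agent's
# displacement differs from seed to seed.
displacements = []
for seed in range(200):
    traj, _ = simulate_tagged_agent(seed=seed)
    displacements.append(traj[-1] - traj[0])

print("mean displacement:", statistics.mean(displacements))
print("variance of displacement:", statistics.variance(displacements))
```

A histogram of `displacements` is the kind of distributional data against which a continuum description of the tagged-agent probability density could be compared.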

2412.00008 2026-01-28 q-bio.NC

The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness

Eric Schwitzgebel, Jeremy Pober

Abstract

On broadly Copernican grounds, we are entitled to assume that apparently behaviorally sophisticated extraterrestrial entities ("aliens") would be conscious. Otherwise, we humans would be inexplicably, implausibly lucky to have consciousness, while similarly behaviorally sophisticated entities elsewhere would be mere shells, devoid of consciousness. However, this Copernican default assumption is canceled in the case of behaviorally sophisticated entities designed to mimic superficial features associated with consciousness ("consciousness mimics"), and in particular a broad class of current, near-future, and hypothetical robots. These considerations, which we formulate, respectively, as the Copernican and Mimicry Arguments, jointly defeat an otherwise potentially attractive parity principle, according to which we should apply the same types of behavioral or cognitive tests to aliens and robots, attributing or denying consciousness similarly to the extent they perform similarly. Our approach is unusual in the following respect: Instead of grounding speculations about alien and robot consciousness in a particular metaphysical or scientific theory about the physical or functional bases of consciousness, we appeal directly to the epistemic principles of Copernican mediocrity and inference to the best explanation.

2304.05411 2026-01-28 q-bio.OT

Precision Oncology: Targeting Genomic Alterations and Cancer Signaling with Integrative Multi-Omics, Deep Learning and Network Biology in Medical Oncology

Manish Kumar

Comments Pictures and other related data have been taken from sources freely available for reuse, or permission for the same can be obtained upon request. Picture no. 1 has been added to the text with permission from Elsevier (Order No. 5521991271884, dated 4th April 2023). 40 pages, 2 figures, and 2 tables

Abstract

Cancer is a complex genetic disease involving uncontrolled cell growth and proliferation, and necessitates effective targeting of dysregulated cellular pathways underlying cancer progression. Multiple genetic and epigenetic alterations characterize tumor progression and define hallmarks of cancer. Importantly, patients with the same cancer type respond differently to available cancer treatments, likely due to tumor-specific DNA, RNA, and proteins, indicating the need for patient-specific treatment options. Precision oncology has evolved as a form of cancer therapy that is focused on genetic and molecular profiling of tumors to identify specific molecular alterations involved in carcinogenesis for tailored individualized cancer treatment. Advances in high-throughput sequencing technologies have enabled gene expression profiling, providing multiomics data for detailed molecular characterization of various tumors. Integration and analysis of various multiomic sequencing data are crucial in this regard, as they can reveal critical molecular changes, such as cancer-driving mutations, post-translational modifications, gene fusions, amplifications, and alterations in signaling networks within tumors. Furthermore, the role of computational techniques such as artificial intelligence and deep learning, in analyzing complex data and identifying patterns of disease development for better outcomes is now well established in precision medicine. Additionally, AI-powered multi-omics and network biology have been harnessed to integrate and analyze biological data through networks, which may prove crucial in solving key problems facing precision oncology. This article aims to briefly explain the foundations and frontiers of precision oncology in the context of cutting-edge developments in tools and techniques associated with it, and try to assess its scope and importance in achieving the intended goals.