arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.18716 2026-01-27 cs.AI q-bio.BM

Conditioned Generative Modeling of Molecular Glues: A Realistic AI Approach for Synthesizable Drug-like Molecules

Naeyma N. Islam, Thomas R. Caulfield

Comments 30 pages, 8 figures

Journal ref Biomolecules 2025, 15, 849

Abstract

Alzheimer's disease (AD) is marked by the pathological accumulation of amyloid beta-42 (Abeta-42), contributing to synaptic dysfunction and neurodegeneration. While extracellular amyloid plaques are well-studied, increasing evidence highlights intracellular Abeta-42 as an early and toxic driver of disease progression. In this study, we present a novel, AI-assisted drug design approach to promote targeted degradation of Abeta-42 via the ubiquitin-proteasome system (UPS), using E3 ligase-directed molecular glues. We systematically evaluated the ternary complex formation potential of Abeta-42 with three E3 ligases: CRBN, VHL, and MDM2, through structure-based modeling, ADMET screening, and docking. We then developed a Ligase-Conditioned Junction Tree Variational Autoencoder (LC-JT-VAE) to generate ligase-specific small molecules, incorporating protein sequence embeddings and torsional angle-aware molecular graphs. Our results demonstrate that this generative model can produce chemically valid, novel, and target-specific molecular glues capable of facilitating Abeta-42 degradation. This integrated approach offers a promising framework for designing UPS-targeted therapies for neurodegenerative diseases.

2601.18713 2026-01-27 q-bio.QM cs.AI

Point transformer for protein structural heterogeneity analysis using CryoEM

Muyuan Chen, Muchen Li, Renjie Liao

Abstract

Structural dynamics of macromolecules is critical to their structure-function relationship. Cryogenic electron microscopy (CryoEM) provides snapshots of vitrified protein at different compositional and conformational states, and the structural heterogeneity of proteins can be characterized through computational analysis of the images. For protein systems with multiple degrees of freedom, it is still challenging to disentangle and interpret the different modes of dynamics. Here, by implementing Point Transformer, a self-attention network designed for point cloud analysis, we are able to improve the performance of heterogeneity analysis on CryoEM data, and characterize the dynamics of highly complex protein systems in a more human-interpretable way.

2601.18703 2026-01-27 q-bio.PE math.AP

Chemotaxis-inspired PDE models of airborne infectious disease transmission: epidemiologically-motivated mathematical and numerical analyses

Alex Viguerie, Malú Grave, Alvaro L. G. A. Coutinho, Alessandro Veneziani, Thomas J. R. Hughes

Abstract

Partial differential equation (PDE) models for infectious diseases, while less common than their ordinary differential equation (ODE) counterparts, have found successful applications for many years. Such models are typically of reaction-diffusion type, and model spatial propagation as a diffusive process. However, given the complex nature of human mobility, such models are limited in their ability to describe airborne infectious diseases in human populations. Recent work has advocated for the inclusion of an additional chemotaxis-type term as an alternative; spatial propagation of infection fronts is assumed additionally to flow from low-to-high concentrations of susceptible populations. The present work extends the study of such models by providing an epidemiologically interpretable analysis, directly connecting model behavior to information readily available to policymakers. In particular, we derive a spatially-aware basic reproduction number, which accounts for spatial heterogeneity in population density. Furthermore, we discuss several important aspects concerning the numerical solution of the model, including the introduction of a stabilization scheme. Finally, we perform a series of simulation studies in the Italian region of Lombardy (severely affected by the COVID-19 outbreak in 2020) and in the US state of Georgia, in which we demonstrate the model's potential to better capture important spatiotemporal dynamics observed in real-world data compared to pure reaction-diffusion models.

2601.18286 2026-01-27 q-bio.NC

Closed Eyes and Coil Size -- Effects on Motor Threshold and Intracortical Inhibition, measured with TMS

Meher Sabharwal, Narin Suleyman, Gabriel R. Palma, Roisin McMackin

Comments 21 pages

Abstract

Rationale: Transcranial magnetic stimulation (TMS)-based measures such as resting motor threshold (RMT) and short interval intracortical inhibition (SICI) are widely employed to study motor cortical and corticospinal tract function, and effects of diseases and drug therapies thereon. However, the effects of key experimental factors, such as eye state (open or closed) or stimulating coil size, remain unclear. As such, it is unknown whether these factors must be kept consistent across multi-center studies, and whether differences in such factors may underpin contradictory findings in existing literature. Materials and Methods: Threshold tracking TMS was employed to measure RMT and SICI (3 ms interstimulus interval, conditioning at 70% of RMT) in 21 alert and awake, healthy controls. Motor evoked potentials were recorded from abductor pollicis brevis. Both RMT and SICI were measured under 6 conditions, while eyes were open or closed, using 3 figure-of-eight coils of differing winding diameter. Mixed effects modelling was employed to investigate effects of eye state and coil size on each measure. Results: RMT was found to be significantly higher for the smallest (30BFT) coil compared to both larger (50BFT and 70BF) coils. No difference in SICI was identified across coil sizes. Eye state was not found to affect either RMT or SICI measurements. Conclusions: Measurements of RMT and SICI can be considered comparable if recorded with eyes open or closed, provided the individual is awake and alert. Measurements of SICI recorded with figure-of-eight coils of different size can be considered comparable.

2601.18214 2026-01-27 q-bio.PE

A model for a population of trees structured by phenological traits

Sirine Boucenna, Vasilis Dakos, Gaël Raoul

Abstract

In the context of global warming, tree populations rely on two primary mechanisms of adaptation: phenotypic plasticity, which enables individuals to adjust their behavior in response to environmental stress, and genetic evolution, driven by natural selection and genetic diversity within the population. Understanding the interplay between these mechanisms is crucial for assessing the impacts of climate change on forest ecosystems and for informing sustainable management strategies. In this manuscript, we focus on a specific phenological adaptation: the ability of trees to enter summer dormancy once a critical temperature threshold is exceeded. Individuals are characterized by this threshold temperature and by their seed production capacity. We first establish a detailed mathematical model describing the population dynamics under these traits, and progressively reduce it to a system of two coupled ordinary differential equations. This simpler macroscopic model is then analyzed numerically, to investigate how the population reacts to a shift in its environment: a temperature increase, a drop in precipitation levels, or a combination of the two. Our results highlight contrasting effects of water stress and temperature stress on population dynamics, as well as the ambivalent effect of plasticity.

2601.16593 2026-01-27 q-bio.PE nlin.AO physics.soc-ph

Adaptive dynamics of eco-evolutionary repeated games: Effect of reward and punishment

Prosanta Mandal, Suman Chakraborty, Vaibhav Madhok, Sagar Chakraborty

Abstract

Long-term evolutionary processes can strongly influence common-pool resource conservation by generating new traits or behaviours that modify the feedback between population strategies and the resource state. Here we develop an eco-evolutionary framework in which individuals repeatedly interact with the same opponent and follow direct reciprocity through reactive strategies. The strategic dynamics is coupled to a renewable common resource and analyzed using adaptive dynamics. After our exhaustive non-linear dynamical analysis of $2\times2$ strategic games, we focus on the comparative and combined usefulness of institutional incentives in the form of rewards and punishments in preventing the Tragedy of the Commons even when defection dominates in the replete resource state. We also report the possibility of robust stable oscillations -- emerging via Hopf bifurcation -- in resource state and population strategies.

2601.07024 2026-01-27 cond-mat.stat-mech nlin.AO physics.soc-ph q-bio.MN

Largest connected component in duplication-divergence growing graphs with symmetric coupled divergence

Dario Borrelli

Comments edit in an inline Eqn. ($ν$ rather than $ψ$) in Appendix C, caption of Fig.8 (Appendix A), and other minor edits

Abstract

The largest connected component in duplication-divergence growing graphs with symmetric coupled divergence is studied. Finite-size scaling reveals a phase transition occurring at a divergence rate $δ_c$. The $δ_c$ found stands near the locus of zero in Euler characteristic for finite-size graphs, known to be indicative of the largest connected component transition. The role of non-interacting vertices in shaping this transition with their presence ($d=0$) and absence ($d=1$) in duplication is also discussed, suggesting a particular transformation of the time variable considered, which yields a singularity locus in the natural logarithm of the absolute value of Euler characteristic in finite-size graphs near to that obtained with $d=1$ but from the model with $d=0$. The findings may suggest implications for bond percolation in these growing graph models.

2509.14788 2026-01-27 cs.LG cs.AI q-bio.BM

Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery

Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gwing Kei Yip, Gerald W. Y. Cheng, Yunlin Mao, Jing Cai, Liang-ting Lin, Jung Sun Yoo

Comments Accepted by 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)

Abstract

Accurate identification of drug-target interactions (DTI) remains a central challenge in computational pharmacology, where sequence-based methods offer scalability. This work introduces a sequence-based drug-target interaction framework that integrates structural priors into protein representations while maintaining high-throughput screening capability. Evaluated across multiple benchmarks, the model achieves state-of-the-art performance on Human and BioSNAP datasets and remains competitive on BindingDB. In virtual screening tasks, it surpasses prior methods on LIT-PCBA, yielding substantial gains in AUROC and BEDROC. Ablation studies confirm the critical role of learned aggregation, bilinear attention, and contrastive alignment in enhancing predictive robustness. Embedding visualizations reveal improved spatial correspondence with known binding pockets and highlight interpretable attention patterns over ligand-residue contacts. These results validate the framework's utility for scalable and structure-aware DTI prediction.

2505.22673 2026-01-27 q-bio.TO cs.AI cs.CV

Physiology-Informed Generative Multi-Task Network for Contrast-Free CT Perfusion

Wasif Khan, John Rees, Kyle B. See, Simon Kato, Ziqian Huang, Amy Lazarte, Kyle Douglas, Xiangyang Lou, Teng J. Peng, Dhanashree Rajderkar, Pina Sanelli, Amita Singh, Ibrahim Tuna, Christina A. Wilson, Ruogu Fang

Comments Under Review

Abstract

Perfusion imaging is extensively utilized to assess hemodynamic status and tissue perfusion in various organs. Computed tomography perfusion (CTP) imaging plays a key role in the early assessment and planning of stroke treatment. While CTP provides essential perfusion parameters to identify abnormal blood flow in the brain, the use of contrast agents in CTP can lead to allergic reactions and adverse side effects, along with costing USD 4.9 billion worldwide in 2022. To address these challenges, we propose a novel deep learning framework called Multitask Automated Generation of Intermodal CT perfusion maps (MAGIC). This framework combines generative artificial intelligence and physiological information to map non-contrast computed tomography (CT) imaging to multiple contrast-free CTP imaging maps. We demonstrate enhanced image fidelity by incorporating physiological characteristics into the loss terms. Our network was trained and validated using CT image data from patients referred for stroke at UF Health and demonstrated robustness to abnormalities in brain perfusion activity. A double-blinded study was conducted involving seven experienced neuroradiologists and vascular neurologists. This study validated MAGIC's visual quality and diagnostic accuracy showing favorable performance compared to clinical perfusion imaging with intravenous contrast injection. Overall, MAGIC holds great promise in revolutionizing healthcare by offering contrast-free, cost-effective, and rapid perfusion imaging.

2505.14125 2026-01-27 cs.LG cs.AI q-bio.NC

Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning

Viet Anh Khoa Tran, Emre Neftci, Willem A. M. Wybo

Comments Accepted to NeurIPS 2025. Camera-ready version. 33 pages, 5 figures. Updated acknowledgements. Code available at: https://github.com/tran-khoa/tmcl

Abstract

Biological brains learn continually from a stream of unlabeled data, while integrating specialized information from sparsely labeled examples without compromising their ability to generalize. Meanwhile, machine learning methods are susceptible to catastrophic forgetting in this natural learning setting, as supervised specialist fine-tuning degrades performance on the original task. We introduce task-modulated contrastive learning (TMCL), which takes inspiration from the biophysical machinery in the neocortex, using predictive coding principles to integrate top-down information continually and without supervision. We follow the idea that these principles build a view-invariant representation space, and that this can be implemented using a contrastive loss. Then, whenever labeled samples of a new class occur, new affine modulations are learned that improve separation of the new class from all others, without affecting feedforward weights. By co-opting the view-invariance learning mechanism, we then train feedforward weights to match the unmodulated representation of a data sample to its modulated counterparts. This introduces modulation invariance into the representation space, and, by also using past modulations, stabilizes it. Our experiments show improvements in both class-incremental and transfer learning over state-of-the-art unsupervised approaches, as well as over comparable supervised approaches, using as few as 1% of available labels. Taken together, our work suggests that top-down modulations play a crucial role in balancing stability and plasticity.

2411.11547 2026-01-27 cs.DC q-bio.GN

gpuPairHMM: High-speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs

Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, Christian Hundt

Journal ref IEEE Transactions on Computational Biology and Bioinformatics, 2026

Abstract

The continually increasing volume of DNA sequence data has resulted in a growing demand for fast implementations of core algorithms. Computation of pairwise alignments between candidate haplotypes and sequencing reads using Pair-HMMs is a key component in DNA variant calling tools such as the GATK HaplotypeCaller but can be highly time consuming due to its quadratic time complexity and the large number of pairs to be aligned. Unfortunately, previous approaches to accelerate this task using the massively parallel processing capabilities of modern GPUs are limited by inefficient memory access schemes. This established the need for significantly faster solutions. We address this need by presenting gpuPairHMM -- a novel GPU-based parallelization scheme for the dynamic-programming based Pair-HMM forward algorithm based on wavefronts and warp-shuffles. It gains efficiency by minimizing both memory accesses and instructions. We show that our approach achieves close-to-peak performance on several generations of modern CUDA-enabled GPUs (Volta, Ampere, Ada, Hopper). It also outperforms prior implementations on GPUs, CPUs, and FPGAs by a factor of at least 8.6, 10.4, and 14.5, respectively. gpuPairHMM is publicly available at https://github.com/asbschmidt/gpuPairHMM.

2409.02588 2026-01-27 cs.LG q-bio.BM

Multiview Random Vector Functional Link Network for Predicting DNA-Binding Proteins

A. Quadir, M. Sajid, M. Tanveer

Abstract

The identification of DNA-binding proteins (DBPs) is essential due to their significant impact on various biological activities. Understanding the mechanisms underlying protein-DNA interactions is essential for elucidating various life activities. In recent years, machine learning-based models have been prominently utilized for DBP prediction. In this paper, to predict DBPs, we propose a novel framework termed a multiview random vector functional link (MvRVFL) network, which fuses neural network architecture with multiview learning. The MvRVFL model integrates both late and early fusion advantages, enabling separate regularization parameters for each view, while utilizing a closed-form solution for efficiently determining unknown parameters. The primal objective function incorporates a coupling term aimed at minimizing a composite of errors stemming from all views. From each of the three protein views of the DBP datasets, we extract five features. These features are then fused together by incorporating a hidden feature during the model training process. The performance of the proposed MvRVFL model on the DBP dataset surpasses that of baseline models, demonstrating its superior effectiveness. We further validate the practicality of the proposed model across diverse benchmark datasets, and both theoretical analysis and empirical results consistently demonstrate its superior generalization performance over baseline models.

2103.15265 2026-01-27 math.CO cs.FL math.CT q-bio.NC

Polychrony as Chinampas

Eric Dolores-Cuenca, Jose Antonio Arciniega-Nevarez, Anh Nguyen, Yitong Zou, Luke Van Popering, Nathan Crock, Gordon Erlebacher, Jose L. Mendoza-Cortes

Comments 32 pages. We changed the exposition, removed unfinished work, and added bibliography. To appear on "Algorithms"

Journal ref Algorithms 2023, 16(4), 193

Abstract

In this paper, we study the flow of signals through linear paths with the nonlinear condition that a node emits a signal when it receives external stimuli or when two incoming signals from other nodes arrive coincidentally with a combined amplitude above a fixed threshold. Sets of such nodes form a polychrony group and can sometimes lead to cascades. In the context of this work, cascades are polychrony groups in which the number of nodes activated as a consequence of other nodes is greater than the number of externally activated nodes. The difference between these two numbers is the so-called profit. Given the initial conditions, we predict the conditions for a vertex to activate at a prescribed time and provide an algorithm to efficiently reconstruct a cascade. We develop a dictionary between polychrony groups and graph theory. We call the graph corresponding to a cascade a chinampa. This link leads to a topological classification of chinampas. We enumerate the chinampas of profits zero and one and describe a family of chinampas isomorphic to a family of partially ordered sets, which implies that the enumeration problem of this family is equivalent to computing the Stanley-order polynomials of those partially ordered sets.

2601.17796 2026-01-27 q-bio.NC

AI and World Models

Robert Worden

Comments 15 pages, 2 figures

Abstract

While large neural nets perform impressively on specific tasks, they are unreliable and unsafe, as is shown by the persistent hallucinations of large language models. This paper shows that large neural nets are intrinsically unreliable, because it is not possible to make or validate a tractable theory of how a neural net works. There is no reliable way to extrapolate its performance from a limited number of test cases to an unlimited set of use cases. To have confidence in the performance of a neural net, it is necessary to enclose it in a guardrail which is provably safe, so that whatever the neural net does, there cannot be harmful consequences. World models have been proposed as a way to do this. This paper discusses the scope and architecture required of world models. World models are often conceived as models of the physical and natural world, using established theories of natural science, or learned regularities, to predict the physical consequences of AI actions. However, unforeseen consequences of AI actions impact the human social world as much as the physical world. To predict and control the consequences of AI, a world model needs to include a model of the human social world. I explore the challenges that this entails. Human language is based on a Common Ground of mutual understanding of the world, shared by the people conversing. The common ground is an overlapping subset of each person's world model, including their models of the physical, social and mental worlds. LLMs have no stable representation of a common ground. To be reliable, AI systems will need to represent a common ground with their users, including physical, mental and social domains.

2601.17763 2026-01-27 q-bio.PE

Tracking dynamics of superspreading through contacts, exposures, and transmissions in edge-based network epidemics

Ari S. Freedman, Bjarke F. Nielsen, Maximillian M. Nguyen, Laurent Hébert-Dufresne, Simon A. Levin

Abstract

Infectious disease superspreading caused by heterogeneity in contact behavior has been observed to be an important determinant of epidemic dynamics and size in both empirical and theoretical settings. However, it has also been observed that the importance of this type of superspreading changes throughout an epidemic, generally in a decreasing manner as infections cascade from individuals with many contacts to those with fewer contacts. We provide an exact mathematical formulation of this phenomenon in strongly-immunizing (SIR) epidemics on static contact networks. Building on the edge-based modeling framework, we construct three metrics to track how superspreading changes through the course of an epidemic, respectively measuring infected nodes' contacts, exposures, and transmissions: (1) the mean degree of infected nodes, (2) the mean number of susceptible neighbors of infected nodes, and (3) the mean number of secondary cases that will be caused by newly infected nodes. We prove results about the behaviors of these metrics, highlighting the fact that their peak times all occur at less than half the time it takes for population-level infection prevalence to peak. This suggests that the importance of superspreading will be low when an epidemic is already near its peak, so contact-based control strategies are best employed as early in an outbreak as possible. We discuss implications for accurately measuring epidemiological parameters from incidence, mobility, contact tracing, and transmission data.
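The first two metrics named in the abstract above can be illustrated with a minimal sketch. It computes empirical snapshot values on a small static contact network, not the exact edge-based formulation the paper derives; the function names, the adjacency-dictionary representation, and the star-graph example are all assumptions made for illustration. The third metric (expected secondary cases) depends on the transmission model and is omitted.

```python
def mean_degree_of_infected(adjacency, infected):
    """Metric (1): mean number of contacts (degree) over currently infected nodes."""
    infected = list(infected)
    if not infected:
        return 0.0
    return sum(len(adjacency[n]) for n in infected) / len(infected)


def mean_susceptible_neighbors(adjacency, infected, susceptible):
    """Metric (2): mean number of still-susceptible neighbors of infected nodes."""
    infected = list(infected)
    susceptible = set(susceptible)
    if not infected:
        return 0.0
    return sum(len(adjacency[n] & susceptible) for n in infected) / len(infected)


# Hypothetical star network: hub 0 is connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

# Early snapshot: only the high-degree hub is infected.
early = (mean_degree_of_infected(star, {0}),
         mean_susceptible_neighbors(star, {0}, {1, 2, 3, 4}))

# Later snapshot: the hub has recovered and two leaves are infected.
late = (mean_degree_of_infected(star, {1, 2}),
        mean_susceptible_neighbors(star, {1, 2}, {3, 4}))
```

In this toy case both values fall between the early and late snapshots, which matches the qualitative picture in the abstract of infections cascading from individuals with many contacts to those with fewer.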

2601.17669 2026-01-27 q-bio.QM q-bio.TO

Quantitative cancer-immunity cycle modeling to optimize bevacizumab and atezolizumab combination therapy for advanced renal cell carcinoma

Lei Du, Chenghang Li, Jinzhi Lei

Comments 19 pages, 10 pages

Abstract

The incidence of advanced renal cell carcinoma (RCC) has been rising, presenting significant challenges due to the limited efficacy and severe side effects of traditional radiotherapy and chemotherapy. While combination immunotherapies show promise, optimizing treatment strategies remains difficult due to individual heterogeneity. To address this, we developed a Quantitative Cancer-Immunity Cycle (QCIC) model that integrates ordinary differential equations with stochastic modelling to quantitatively characterize and predict tumor evolution in patients with advanced RCC. By systematically integrating quantitative systems pharmacology principles with biological mechanistic knowledge, we constructed a virtual patient cohort and calibrated the model parameters using clinical immunohistochemistry data to ensure biological validity. To enhance predictive performance, we coupled the model with pharmacokinetic equations and defined the Tumor Response Index (TRI) as a quantitative metric of efficacy. Systematic analysis of the QCIC model allowed us to determine an optimal treatment regimen for the combination of bevacizumab and atezolizumab and identify tumor biomarkers with clinical predictive value. This study provides a theoretical framework and methodological support for precision medicine in the treatment of advanced RCC.

2601.17590 2026-01-27 q-bio.PE math.DS

Travelling Waves in Wolbachia Spread Dynamics

Zhuolin Qu, Tong Wu, Eddy Kwessi

Abstract

Wolbachia, a maternally transmitted endosymbiont, offers a powerful biological control strategy for mosquito-borne diseases such as dengue, Zika, and malaria. We develop an integro-difference equation (IDE) model that integrates Wolbachia's nonlinear growth with spatially explicit mosquito dispersal kernels to study invasion dynamics in heterogeneous landscapes. Analytical results establish the existence and uniqueness of monotone traveling waves and provide explicit estimates of invasion speeds as functions of dispersal and growth parameters. Four kernels: Gaussian, Laplace, exponential square-root, and Cauchy, represent a continuum from short- to long-range movement. Fat-tailed kernels generate faster, broader wavefronts, while compact ones limit spread. We also identify a critical bubble, the minimal localized profile required for sustained invasion. Numerical simulations in one- and two-dimensional domains confirm theoretical predictions and reveal parameter regimes governing invasion success. This framework quantifies how dispersal mechanisms shape Wolbachia's spread, thus informing targeted and efficient vector-control strategies.
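As a rough illustration of the integro-difference structure described above (local growth followed by dispersal through a kernel), the following sketch performs one update step on a one-dimensional grid. The bistable growth map, the Laplace kernel scale, the grid spacing, and the per-cell kernel renormalization at the boundaries are all assumptions chosen for simplicity; none of them is taken from the paper.

```python
import math


def laplace_kernel(distance, scale=1.0):
    """Laplace (double-exponential) dispersal kernel evaluated at a signed distance."""
    return math.exp(-abs(distance) / scale) / (2.0 * scale)


def growth(p):
    """Hypothetical bistable growth map on infection frequency p in [0, 1].

    Fixed points at 0, 0.5 (unstable threshold) and 1; not the paper's model.
    """
    return p * p / (p * p + (1.0 - p) * (1.0 - p))


def ide_step(p, dx, scale=1.0):
    """One IDE step: apply growth locally, then average through the kernel.

    Kernel weights are renormalized for each target cell so that a spatially
    uniform profile is not artificially depleted at the domain boundaries.
    """
    grown = [growth(v) for v in p]
    n = len(p)
    updated = []
    for i in range(n):
        weights = [laplace_kernel((i - j) * dx, scale) for j in range(n)]
        total = sum(weights)
        updated.append(sum(w * g for w, g in zip(weights, grown)) / total)
    return updated


# Localized release in the middle of an otherwise uninfected domain.
profile = [0.0] * 10 + [0.9] * 5 + [0.0] * 10
next_profile = ide_step(profile, dx=0.5)
```

Iterating `ide_step` from a localized initial profile is one way to explore, numerically, whether a release spreads or collapses; swapping `laplace_kernel` for a fatter-tailed kernel would be the natural experiment suggested by the abstract.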

2601.17528 2026-01-27 math.FA q-bio.NC

Sampling in the Euclidean Motion Group and a Problem from Brain's Primary Visual Cortex

Davide Barbieri

Abstract

We study a sampling problem for the abstract wavelet transform associated with the quasiregular representation of the $SE(2)$ group, for a modulated Gaussian mother wavelet. This problem is motivated by the behavior of the brain's primary visual cortex. We provide a characterization in terms of a dual Gramian matrix, and study numerically the relationships among the parameters defining the sampling and the mother wavelet.

2601.17523 2026-01-27 q-bio.NC physics.bio-ph

Unsupervised sleep-like intra- and inter-layer plasticity categorizes and improves energy efficiency in a multilayer spiking network

Leonardo Tonielli, Cosimo Lupo, Elena Pastorelli, Giulia De Bonis, Francesco Simula, Alessandro Lonardo, Pier Stanislao Paolucci

Comments 34 pages, 5 figures, plus supplementary material

Abstract

Sleep is thought to support memory consolidation and the recovery of an optimal energetic regime by reorganizing synaptic connectivity, yet how plasticity across hierarchical brain circuits contributes to abstraction and energy efficiency remains unclear. Here we study a spiking multi-layer network alternating wake-like and deep-sleep-like states, with state-dependent dendritic integration and synaptic plasticity in a biologically inspired thalamo-cortical framework. During wakefulness, the model learns from few perceived examples, while during deep sleep it undergoes spontaneous replay driven by slow oscillations. Plasticity enabled not only within intra-layer connections, but also in inter-layer pathways, is critical for memory consolidation and energetic downshift. Compared to restricted plasticity, full inter-layer plasticity yields higher post-sleep visual classification accuracy and promotes the emergence of sharper class-specific associations. Furthermore, we introduce a biophysically grounded estimator of metabolic power expressing network energy consumption in ATP units, partitioned into baseline, synaptic maintenance, action potential, and transmission costs. We find that inter-layer plasticity in sleep leads to a larger reduction in firing rates, synaptic strength and synaptic activity, corresponding to a substantially larger decrease in power consumption. This work suggests promising elements to be integrated in neuromorphic/energy-efficient AI learning systems, supported by brain state-specific apical mechanisms.

2601.17504 2026-01-27 cs.CV q-bio.QM

BMDS-Net: A Bayesian Multi-Modal Deep Supervision Network for Robust Brain Tumor Segmentation

Yan Zhou, Zhen Huang, Yingqiu Li, Yue Ouyang, Suncheng Xiang, Zehua Wang

Comments 16 pages, 5 figures. Manuscript prepared for submission to ACM TOMM

Abstract

Accurate brain tumor segmentation from multi-modal magnetic resonance imaging (MRI) is a prerequisite for precise radiotherapy planning and surgical navigation. While recent Transformer-based models such as Swin UNETR have achieved impressive benchmark performance, their clinical utility is often compromised by two critical issues: sensitivity to missing modalities (common in clinical practice) and a lack of confidence calibration. Merely chasing higher Dice scores on idealized data fails to meet the safety requirements of real-world medical deployment. In this work, we propose BMDS-Net, a unified framework that prioritizes clinical robustness and trustworthiness over simple metric maximization. Our contribution is three-fold. First, we construct a robust deterministic backbone by integrating a Zero-Init Multimodal Contextual Fusion (MMCF) module and a Residual-Gated Deep Decoder Supervision (DDS) mechanism, enabling stable feature learning and precise boundary delineation with significantly reduced Hausdorff Distance, even under modality corruption. Second, and most importantly, we introduce a memory-efficient Bayesian fine-tuning strategy that transforms the network into a probabilistic predictor, providing voxel-wise uncertainty maps to highlight potential errors for clinicians. Third, comprehensive experiments on the BraTS 2021 dataset demonstrate that BMDS-Net not only maintains competitive accuracy but, more importantly, exhibits superior stability in missing-modality scenarios where baseline models fail. The source code is publicly available at https://github.com/RyanZhou168/BMDS-Net.

2601.17466 2026-01-27 q-bio.PE

$β$-diversity and Graph Sheaf Laplacians

Peter Davidson, Michael Grinfeld

Comments 7 pages

Abstract

We suggest a new approach to $β$-diversity in ecological systems, based on the energy of the graph sheaf Laplacian associated with the sample data. This scalar quantity is easily computable using methods of linear algebra. We show using simple examples that the energy is much more informative than the generally accepted definitions of $β$-diversity.

2601.17228 2026-01-27 cs.CV q-bio.QM

Semi-Supervised Domain Adaptation with Latent Diffusion for Pathology Image Classification

Tengyue Zhang, Ruiwen Ding, Luoting Zhuang, Yuxiao Wu, Erika F. Rodriguez, William Hsu

Abstract

Deep learning models in computational pathology often fail to generalize across cohorts and institutions due to domain shift. Existing approaches either fail to leverage unlabeled data from the target domain or rely on image-to-image translation, which can distort tissue structures and compromise model accuracy. In this work, we propose a semi-supervised domain adaptation (SSDA) framework that utilizes a latent diffusion model trained on unlabeled data from both the source and target domains to generate morphology-preserving and target-aware synthetic images. By conditioning the diffusion model on foundation model features, cohort identity, and tissue preparation method, we preserve tissue structure in the source domain while introducing target-domain appearance characteristics. The target-aware synthetic images, combined with real, labeled images from the source cohort, are subsequently used to train a downstream classifier, which is then tested on the target cohort. The effectiveness of the proposed SSDA framework is demonstrated on the task of lung adenocarcinoma prognostication. The proposed augmentation yielded substantially better performance on the held-out test set from the target cohort, without degrading source-cohort performance. The approach improved the weighted F1 score on the target-cohort held-out test set from 0.611 to 0.706 and the macro F1 score from 0.641 to 0.716. Our results demonstrate that target-aware diffusion-based synthetic data augmentation provides a promising and effective approach for improving domain generalization in computational pathology.

2601.17184 2026-01-27 q-bio.GN cs.LG

FASTR: Reimagining FASTQ via Compact Image-inspired Representation

Adrian Tkachenko, Sepehr Salem, Ayotomiwa Ezekiel Adeniyi, Zulal Bingol, Mohammed Nayeem Uddin, Akshat Prasanna, Alexander Zelikovsky, Serghei Mangul, Can Alkan, Mohammed Alser

详情
英文摘要

Motivation: High-throughput sequencing (HTS) enables population-scale genomics but generates massive datasets, creating bottlenecks in storage, transfer, and analysis. FASTQ, the standard format for over two decades, stores one byte per base and one byte per quality score, leading to inefficient I/O, high storage costs, and redundancy. Existing compression tools can mitigate some issues, but often introduce costly decompression or complex dependency issues. Results: We introduce FASTR, a lossless, computation-native successor to FASTQ that encodes each nucleotide together with its base quality score into a single 8-bit value. FASTR reduces file size by at least 2x while remaining fully reversible and directly usable for downstream analyses. Applying general-purpose compression tools to FASTR consistently yields higher compression ratios, as well as 2.47x, 3.64x, and 4.8x faster compression and 2.34x, 1.96x, and 1.75x faster decompression, than on FASTQ across Illumina, HiFi, and ONT reads, respectively. FASTR is machine-learning-ready, allowing reads to be consumed directly as numerical vectors or image-like representations. We provide a highly parallel software ecosystem for FASTQ-FASTR conversion and show that FASTR integrates with existing tools, such as minimap2, with minimal interface changes and no performance overhead. By eliminating decompression costs and reducing data movement, FASTR lays the foundation for scalable genomics analyses and real-time sequencing workflows. Availability and Implementation: https://github.com/ALSER-Lab/FASTR
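
To make the idea of packing a nucleotide and its quality score into one byte concrete, here is a minimal sketch. The bit layout (2 bits for the base, 6 bits for the quality) is an assumption chosen for illustration and is not taken from the FASTR specification. In particular, this toy version does not handle `N` bases or quality scores above 63, and all function names are hypothetical.

```python
# Hypothetical bit layout (NOT taken from the FASTR specification):
#   bits 7-6: nucleotide code (A=0, C=1, G=2, T=3)
#   bits 5-0: Phred quality score, 0..63
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"
PHRED_OFFSET = 33  # ASCII offset used by standard FASTQ quality strings


def pack(base: str, quality: int) -> int:
    """Pack one base and its Phred score into a single 8-bit value."""
    if base not in BASE_TO_CODE:
        raise ValueError(f"unsupported base in this sketch: {base!r}")
    if not 0 <= quality <= 63:
        raise ValueError(f"quality out of range for 6 bits: {quality}")
    return (BASE_TO_CODE[base] << 6) | quality


def unpack(value: int) -> tuple:
    """Invert pack(): recover (base, quality) from one byte."""
    return CODE_TO_BASE[value >> 6], value & 0x3F


def encode_read(sequence: str, quality_string: str) -> bytes:
    """Encode a FASTQ sequence line and quality line as one byte per base."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string differ in length")
    return bytes(
        pack(base, ord(q) - PHRED_OFFSET)
        for base, q in zip(sequence, quality_string)
    )


def decode_read(data: bytes) -> tuple:
    """Recover the original sequence and quality string (lossless round trip)."""
    pairs = [unpack(value) for value in data]
    sequence = "".join(base for base, _ in pairs)
    quality_string = "".join(chr(q + PHRED_OFFSET) for _, q in pairs)
    return sequence, quality_string


encoded = encode_read("ACGT", "II5!")  # 4 bases -> 4 bytes instead of 8
```

Under this layout a read of n bases occupies n bytes rather than the 2n bytes of its FASTQ sequence and quality lines, which matches the "at least 2x" size reduction described above.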

2601.15530 2026-01-27 cs.LG q-bio.NC q-bio.QM

Machine learning-enhanced non-amnestic Alzheimer's disease diagnosis from MRI and clinical features

Megan A. Witherow, Michael L. Evans, Ahmed Temtam, Hamid R. Okhravi, Khan M. Iftekharuddin

Comments 10 pages, 4 figures, 4 tables

详情
英文摘要

Alzheimer's disease (AD), defined as an abnormal buildup of amyloid plaques and tau tangles in the brain, can be diagnosed with high accuracy based on protein biomarkers via PET or CSF analysis. However, due to the invasive nature of biomarker collection, most AD diagnoses are made in memory clinics using cognitive tests and evaluation of hippocampal atrophy based on MRI. While clinical assessment and hippocampal volume show high diagnostic accuracy for amnestic or typical AD (tAD), a substantial subgroup of AD patients with atypical presentation (atAD) is routinely misdiagnosed. To improve diagnosis of atAD patients, we propose a machine learning approach to distinguish between atAD and non-AD cognitive impairment using a clinical testing battery and MRI data collected as standard-of-care. We develop and evaluate our approach using 1410 subjects across four groups (273 tAD, 184 atAD, 235 non-AD, and 685 cognitively normal) collected from one private data set and two public data sets from the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). We perform multiple atAD vs. non-AD classification experiments using clinical features and hippocampal volume as well as a comprehensive set of MRI features from across the brain. The best performance is achieved by incorporating additional important MRI features, which outperforms using hippocampal volume alone. Furthermore, we use the Boruta statistical approach to identify and visualize significant brain regions distinguishing between diagnostic groups. Our ML approach improves the percentage of correctly diagnosed atAD cases (the recall) from 52% to 69% for NACC and from 34% to 77% for ADNI, while achieving high precision. The proposed approach has important implications for improving diagnostic accuracy for non-amnestic atAD in clinical settings using only a clinical testing battery and MRI.

2601.15333 2026-01-27 cs.LG cs.AI q-bio.QM

Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference

Xuanning Hu, Anchen Li, Qianli Xing, Jinglong Ji, Hao Tuo, Bo Yang

详情
英文摘要

Large Language Models (LLMs) possess strong representation and reasoning capabilities, but their application to structure-based drug design (SBDD) is limited by insufficient understanding of protein structures and unpredictable molecular generation. To address these challenges, we propose Exploration-Augmented Latent Inference for LLMs (ELILLM), a framework that reinterprets the LLM generation process as an encoding, latent space exploration, and decoding workflow. ELILLM explicitly explores portions of the design problem beyond the model's current knowledge while using a decoding module to handle familiar regions, generating chemically valid and synthetically reasonable molecules. In our implementation, Bayesian optimization guides the systematic exploration of latent embeddings, and a position-aware surrogate model efficiently predicts binding affinity distributions to inform the search. Knowledge-guided decoding further reduces randomness and effectively imposes chemical validity constraints. We demonstrate ELILLM on the CrossDocked2020 benchmark, showing strong controlled exploration and high binding affinity scores compared with seven baseline methods. These results demonstrate that ELILLM can effectively enhance LLM capabilities for SBDD.

2601.15219 2026-01-27 q-bio.PE math.CO

A height-based metaconcept for rooted tree balance and its implications for the $B_1$ index

Mareike Fischer, Tom Niklas Hamann, Kristina Wicke

详情
英文摘要

Tree balance has received considerable attention in recent years, both in phylogenetics and in other areas. Numerous (im)balance indices have been proposed to quantify the (im)balance of rooted trees. A recent comprehensive survey summarized this literature and showed that many existing indices are based on similar underlying principles. To unify these approaches, three general metaconcepts were introduced, providing a framework to classify, analyze, and extend imbalance indices. In this context, a metaconcept is a function $Φ_f$ that depends on another function $f$ capturing some aspect of tree shape. In this manuscript, we extend this line of research by introducing a new metaconcept based on the heights of the pending subtrees of all inner vertices. We provide a thorough analysis of this metaconcept and use it to answer open questions concerning the well-known $B_1$ balance index. In particular, we characterize the tree shapes that maximize the $B_1$ index in two cases: (i) arbitrary rooted trees and (ii) binary rooted trees. For both cases, we also determine the corresponding maximum values of the index. Finally, while the $B_1$ index is induced by a so-called third-order metaconcept, we explicitly introduce three new (im)balance indices derived from the first- and second-order height metaconcepts, respectively, thereby demonstrating that pending subtree heights give rise to a variety of novel (im)balance indices.
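
For readers unfamiliar with the $B_1$ index, here is a minimal sketch of its standard definition: the sum, over all inner vertices except the root, of the reciprocal of the height of the pending subtree. The nested-tuple tree encoding and the function names are assumptions made for this illustration and do not come from the paper.

```python
from fractions import Fraction

# Tree encoding (an assumption of this sketch): a vertex is a tuple of its
# children, and a leaf is the empty tuple ().


def height(tree: tuple) -> int:
    """Number of edges on the longest path from this vertex down to a leaf."""
    if not tree:
        return 0
    return 1 + max(height(child) for child in tree)


def b1_index(tree: tuple) -> Fraction:
    """Sum of 1 / height(T_v) over all inner vertices v other than the root."""
    total = Fraction(0)
    stack = list(tree)  # start from the root's children, so the root is skipped
    while stack:
        vertex = stack.pop()
        if vertex:  # inner vertex: has at least one child
            total += Fraction(1, height(vertex))
            stack.extend(vertex)
    return total


leaf = ()
caterpillar = (((leaf, leaf), leaf), leaf)      # maximally unbalanced, 4 leaves
fully_balanced = ((leaf, leaf), (leaf, leaf))   # balanced binary tree, 4 leaves

b1_caterpillar = b1_index(caterpillar)    # 1/2 + 1/1 = 3/2
b1_balanced = b1_index(fully_balanced)    # 1/1 + 1/1 = 2
```

On four leaves the balanced tree scores higher than the caterpillar, which is consistent with $B_1$ being a balance index rather than an imbalance index.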

2601.06272 2026-01-27 physics.soc-ph nlin.AO physics.bio-ph q-bio.MN q-bio.PE

Crossing the Functional Desert: Cascade-Driven Assembly and Feasibility Transitions in Early Life

Galen J. Wilkerson

Comments 11 pages, 2 figures

详情
英文摘要

The origin of life poses a problem of combinatorial feasibility: How can temporally supported functional organization arise in exponentially branching assembly spaces when unguided exploration behaves as a memoryless random walk? We show that nonlinear threshold-cascade dynamics in connected interaction networks provide a minimal, substrate-agnostic mechanism that can soften this obstruction. Below a critical connectivity threshold, cascades die out locally and structured input-output response mappings remain sparse and transient: a "functional desert" in which accumulation is dynamically unsupported. Near the critical percolation threshold, system-spanning cascades emerge, enabling discriminative functional responses. We illustrate this transition using a minimal toy model and generalize the argument to arbitrary networked systems. Also near criticality, cascades introduce finite-timescale structural and functional coherence, directional bias, and weak dynamical path-dependence into otherwise memoryless exploration, allowing biased accumulation. This connectivity-driven transition (functional percolation) requires only generic ingredients: interacting units, nonlinear thresholds, influence transmission, and non-zero coherence times. The mechanism does not explain specific biochemical pathways, but it identifies a necessary dynamical regime in which structured functional organization can emerge and be temporarily supported, providing a physical foundation for how combinatorial feasibility barriers can be crossed through network dynamics alone.
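
The dependence of cascade size on connectivity can be illustrated with a generic fractional-threshold rule, in which a unit activates once a given fraction of its neighbors is active. This is a minimal sketch under that assumption and is not the toy model from the paper. The graphs, the threshold value, and the function name are all invented for illustration.

```python
def cascade_size(adjacency: dict, seeds: set, threshold: float) -> int:
    """Run a threshold cascade to its fixed point and return the number of active units.

    adjacency maps each unit to a list of its neighbors. An inactive unit with
    at least one neighbor activates when the fraction of its neighbors that are
    active is >= threshold. Active units stay active.
    """
    active = set(seeds)
    while True:
        newly_active = {
            unit
            for unit, neighbors in adjacency.items()
            if unit not in active
            and neighbors
            and sum(n in active for n in neighbors) / len(neighbors) >= threshold
        }
        if not newly_active:
            return len(active)
        active |= newly_active


# Sparsely connected system: one edge, three isolated units.
sparse = {0: [1], 1: [0], 2: [], 3: [], 4: []}
# Connected system: a chain 0-1-2-3-4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

sparse_size = cascade_size(sparse, {0}, threshold=0.5)  # cascade stays local
chain_size = cascade_size(chain, {0}, threshold=0.5)    # cascade spans the system
```

With the same seed and threshold, the cascade dies out after one step in the sparse graph and reaches every unit in the connected chain. Raising the threshold above 0.5 stops the chain cascade at the seed.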

2511.11293 2026-01-27 cs.LG q-bio.QM

Toward Scalable Early Cancer Detection: Evaluating EHR-Based Predictive Models Against Traditional Screening Criteria

Jiheum Park, Chao Pang, Tristan Y. Lee, Jeong Yun Yang, Jacob Berkowitz, Alexander Z. Wei, Nicholas Tatonetti

详情
英文摘要

Current cancer screening guidelines cover only a few cancer types and rely on narrowly defined criteria, such as age or a single risk factor like smoking history, to identify high-risk individuals. Predictive models using electronic health records (EHRs), which capture large-scale longitudinal patient-level health information, may provide a more effective tool for identifying high-risk groups by detecting subtle prediagnostic signals of cancer. Recent advances in large language and foundation models have further expanded this potential, yet evidence remains limited on how useful EHR-based models are compared with traditional risk factors currently used in screening guidelines. We systematically evaluated the clinical utility of EHR-based predictive models against traditional risk factors, including gene mutations and family history of cancer, for identifying high-risk individuals across eight major cancers (breast, lung, colorectal, prostate, ovarian, liver, pancreatic, and stomach), using data from the All of Us Research Program, which integrates EHR, genomic, and survey data from over 865,000 participants. Even with a baseline modeling approach, EHR-based models achieved a 3- to 6-fold higher enrichment of true cancer cases among individuals identified as high risk compared with traditional risk factors alone, whether used as a standalone or complementary tool. The EHR foundation model, a state-of-the-art approach trained on comprehensive patient trajectories, further improved predictive performance across 26 cancer types, demonstrating the clinical potential of EHR-based predictive modeling to support more precise and scalable early detection strategies.

2510.22860 2026-01-27 cs.CL q-bio.NC

Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement

Linyang He, Tianjun Zhong, Richard Antonello, Gavin Mischler, Micah Goldblum, Nima Mesgarani

Comments Accepted at NeurIPS 2025

详情
英文摘要

Understanding how the human brain progresses from processing simple linguistic inputs to performing high-level reasoning is a fundamental challenge in neuroscience. While modern large language models (LLMs) are increasingly used to model neural responses to language, their internal representations are highly "entangled," mixing information about lexicon, syntax, meaning, and reasoning. This entanglement biases conventional brain encoding analyses toward linguistically shallow features (e.g., lexicon and syntax), making it difficult to isolate the neural substrates of cognitively deeper processes. Here, we introduce a residual disentanglement method that computationally isolates these components. By first probing an LM to identify feature-specific layers, our method iteratively regresses out lower-level representations to produce four nearly orthogonal embeddings for lexicon, syntax, meaning, and, critically, reasoning. We used these disentangled embeddings to model intracranial (ECoG) brain recordings from neurosurgical patients listening to natural speech. We show that: 1) This isolated reasoning embedding exhibits unique predictive power, accounting for variance in neural activity not explained by other linguistic features and even extending to the recruitment of visual regions beyond classical language areas. 2) The neural signature for reasoning is temporally distinct, peaking later (~350-400ms) than signals related to lexicon, syntax, and meaning, consistent with its position atop a processing hierarchy. 3) Standard, non-disentangled LLM embeddings can be misleading, as their predictive success is primarily attributable to linguistically shallow features, masking the more subtle contributions of deeper cognitive processing.

2509.13481 2026-01-27 q-bio.NC

Complex-valued Phase Synchrony Reveals Directional Coupling in FMRI and Tracks Medication Effects

Sir-Lord Wiafe, Najme Soleimani, Masoud Seraji, Bradley Baker, Robyn Miller, Ashkan Faghiri, Vince D. Calhoun

Comments 5 pages, 3 Figures, conference

详情
英文摘要

Understanding interactions in complex systems requires capturing the relative timing of coupling, not only its strength. Phase synchronization captures this timing, yet most methods either reduce the phase to its cosine or collapse it into scalar indices such as the phase-locking value, discarding relative timing. We propose a complex-valued phase synchrony (CVPS) framework that estimates phase with an adaptive Gabor wavelet and preserves both cosine and sine components. Simulations confirm that CVPS recovers true phase offsets and tracks non-stationary dynamics more faithfully than Hilbert-based methods. Because antipsychotics are known to modulate the timing of cortical interactions, they provide a rigorous context to evaluate whether CVPS can capture such pharmacological effects. CVPS further reveals cortical neuro-hemodynamic drivers, with occipital-to-parietal and prefrontal-to-striatal lead-lag flows consistent with known receptor targets, confirming its ability to capture pharmacological timing. CVPS, therefore, offers a robust, generalizable framework for detecting relative timing in complex systems such as the brain.
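
The point that cosine-only or scalar measures discard relative timing can be seen in a small numerical sketch. It assumes the instantaneous phases are already available (the adaptive Gabor wavelet estimation described above is not implemented here), and the synthetic phase series and function names are invented for illustration.

```python
import cmath
import math


def complex_synchrony(phase_a: float, phase_b: float) -> complex:
    """exp(i * (phase_a - phase_b)): keeps both the cosine and the sine of the offset."""
    return cmath.exp(1j * (phase_a - phase_b))


def phase_locking_value(phases_a: list, phases_b: list) -> float:
    """Magnitude of the mean complex synchrony: a scalar index in [0, 1]."""
    terms = [complex_synchrony(a, b) for a, b in zip(phases_a, phases_b)]
    return abs(sum(terms) / len(terms))


# Synthetic phases: a reference series and two copies shifted by +pi/4 and -pi/4.
offset = math.pi / 4
reference = [2.0 * 0.1 * k for k in range(50)]
shifted_plus = [p + offset for p in reference]
shifted_minus = [p - offset for p in reference]

plus = complex_synchrony(shifted_plus[0], reference[0])
minus = complex_synchrony(shifted_minus[0], reference[0])

# The cosine (real part) is identical for both offsets, so it cannot tell them
# apart; the sine (imaginary part) carries the sign of the offset.
same_cosine = math.isclose(plus.real, minus.real)
opposite_sine = plus.imag > 0 > minus.imag

# The phase-locking value is close to 1 in both cases: it reports that the
# coupling is strong but not which series is ahead.
plv_plus = phase_locking_value(shifted_plus, reference)
plv_minus = phase_locking_value(shifted_minus, reference)
```

Keeping the full complex value, rather than only its real part or its averaged magnitude, is what allows the sign of the phase offset to be recovered.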