arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.11054 2026-02-12 q-bio.NC nlin.CD

A Dynamical Microscope for Multivariate Oscillatory Signals: Validating Regime Recovery on Shared Manifolds

Łukasz Furman, Ludovico Minati, Włodzisław Duch

Comments 11 pages, 6 figures, submitted to the PP-RAI conference

Abstract

Multivariate oscillatory signals from complex systems often exhibit non-stationary dynamics and metastable regime structure, making dynamical interpretation challenging. We introduce a "dynamical microscope" framework that converts multichannel signals into circular phase-amplitude features, learns a data-driven latent trajectory representation with an autoencoder, and quantifies dynamical regimes through trajectory geometry and flow field metrics. Using a coupled Stuart-Landau oscillator network with topology-switching as ground-truth validation, we demonstrate that the framework recovers differences in dynamical laws even when regimes occupy overlapping regions of state space. Group differences can be expressed as changes in latent trajectory speed, path geometry, and flow organization on a shared manifold, rather than requiring discrete state separation. Speed and explored variance show strong regime discriminability ($η^2 > 0.5$), while some metrics (e.g., tortuosity) capture trajectory geometry orthogonal to topology contrasts. The framework provides a principled approach for analyzing regime structure in multivariate time series from neural, physiological, or physical systems.
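
To make the first step of the pipeline concrete, here is a minimal sketch of a single, uncoupled Stuart-Landau oscillator and one possible reading of "circular phase-amplitude features". The equation form dz/dt = (mu + i*omega) z - |z|^2 z is the standard normal form; the parameter values, the Euler integrator, and the (cos, sin, amplitude) encoding are assumptions for illustration, not the authors' settings, and the coupled, topology-switching network is not reproduced here.

```python
import cmath
import math


def simulate_stuart_landau(mu=1.0, omega=2.0, z0=0.1 + 0.0j, dt=1e-3, steps=20000):
    """Euler-integrate dz/dt = (mu + i*omega) z - |z|^2 z for one oscillator."""
    z = z0
    trajectory = []
    for _ in range(steps):
        dz = (mu + 1j * omega) * z - (abs(z) ** 2) * z
        z = z + dt * dz
        trajectory.append(z)
    return trajectory


def phase_amplitude_features(z):
    """Map a complex sample to (cos phase, sin phase, amplitude).

    Encoding the phase as a point on the unit circle avoids the
    artificial jump at +/- pi (an assumed encoding, for illustration).
    """
    phase = cmath.phase(z)
    return (math.cos(phase), math.sin(phase), abs(z))


trajectory = simulate_stuart_landau()
features = [phase_amplitude_features(z) for z in trajectory]
final_amplitude = features[-1][2]
```

For mu > 0 the amplitude settles near sqrt(mu), so the interesting variation ends up in the circular phase components, which is what a downstream autoencoder would see.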

2602.10856 2026-02-12 cond-mat.dis-nn q-bio.PE

Fragile $\mathit{vs}$ robust Multiple Equilibria phases in generalized Lotka-Volterra model with non-reciprocal interactions

Thomas Louis-Sarrola, Valentina Ros

Comments 31 pages, 8 figures

Abstract

We investigate the Multiple Equilibria phase of generalized Lotka-Volterra dynamics with random, non-reciprocal interactions. We compute the topological complexity of equilibria, which quantifies how rapidly the number of equilibria of the dynamical equations grows with the total number of species. We perform the calculation for arbitrary degree of non-reciprocity in the interactions, distinguishing between configurations that are dynamically stable to invasions by species absent from the equilibrium, and those that are not. We characterize the properties of typical (i.e., most numerous) equilibria at a given diversity, including their average abundance, mutual similarity, and internal stability. This analysis reveals the existence of two distinct ME phases, which differ in how internally stable equilibria behave under invasions by absent species. We discuss the implications of this finding for the system's dynamical behavior.
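
The distinction between equilibria that are stable or unstable to invasion can be illustrated with a small check. The sketch below assumes the common form dN_i/dt = N_i (1 - N_i - sum_{j != i} alpha_ij N_j); the abstract does not state the exact equations, and the function names and the two-species matrices are hypothetical.

```python
def invasion_growth_rates(alpha, abundances):
    """Per-capita growth rate of each absent species at the given state.

    Assumes dN_i/dt = N_i * (1 - N_i - sum_{j != i} alpha[i][j] * N_j),
    so an absent species i (N_i = 0) grows at rate 1 - sum_j alpha[i][j] * N_j.
    """
    rates = {}
    for i, n_i in enumerate(abundances):
        if n_i > 0:
            continue  # species already present, not an invader
        pressure = sum(alpha[i][j] * n_j
                       for j, n_j in enumerate(abundances) if j != i)
        rates[i] = 1.0 - pressure
    return rates


def is_uninvadable(alpha, abundances):
    """True if every absent species has a negative invasion growth rate."""
    return all(rate < 0 for rate in invasion_growth_rates(alpha, abundances).values())


# Non-reciprocal interactions: alpha[0][1] differs from alpha[1][0].
alpha_strong = [[0.0, 0.2], [1.5, 0.0]]
alpha_weak = [[0.0, 0.2], [0.5, 0.0]]
equilibrium = [1.0, 0.0]  # species 0 alone at its carrying capacity
```

With alpha_strong the resident suppresses the absent species and the equilibrium resists invasion; with alpha_weak the same abundances form an equilibrium that an invader can destabilize.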

2602.08640 2026-02-12 math.DS q-bio.NC

Universal Approximation Theorems for Dynamical Systems with Infinite-Time Horizon Guarantees

Abel Sagodi, Il Memming Park

Abstract

Universal approximation theorems establish the expressive capacity of neural network architectures. For dynamical systems, existing results are limited to finite time horizons or systems with a globally stable equilibrium, leaving multistability and limit cycles unaddressed. We prove that Neural ODEs achieve $\varepsilon$-$δ$ closeness (trajectories within error $\varepsilon$ except for initial conditions of measure $< δ$) over the infinite time horizon $[0,\infty)$ for three target classes: (1) Morse-Smale systems (a structurally stable class) with hyperbolic fixed points, (2) Morse-Smale systems with hyperbolic limit cycles via exact period matching, and (3) systems with normally hyperbolic continuous attractors via discretization. We further establish a temporal generalization bound: $\varepsilon$-$δ$ closeness implies $L^p$ error $\leq \varepsilon^p + δ\cdot D^p$ for all $t \geq 0$, bridging topological guarantees to training metrics. These results provide the first universal approximation framework for multistable infinite-horizon dynamics.
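
The shape of the temporal generalization bound can be motivated by a short splitting argument. The sketch below rests on assumptions the abstract does not spell out: initial conditions are drawn from a probability measure $\mu$; $E_t(x_0)$ denotes the trajectory error at time $t$; $G$ is the set of initial conditions whose error stays within $\varepsilon$, so $\mu(G^c) < δ$; $D$ bounds the error on $G^c$; and "$L^p$ error" is read as the $p$-th power of the $L^p$ norm. The paper's own statement and proof may differ.

```latex
\int E_t(x_0)^p \, d\mu(x_0)
  = \int_{G} E_t(x_0)^p \, d\mu(x_0) + \int_{G^c} E_t(x_0)^p \, d\mu(x_0)
  \le \varepsilon^p \, \mu(G) + D^p \, \mu(G^c)
  \le \varepsilon^p + \delta \cdot D^p
  \qquad \text{for all } t \ge 0 .
```

Under these assumptions the right-hand side does not depend on $t$, which is what allows a guarantee over the infinite horizon.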

2602.02128 2026-02-12 cs.LG cs.AI physics.bio-ph q-bio.BM q-bio.QM

Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics

Nima Shoghi, Yuxuan Liu, Yuning Shen, Rob Brekelmans, Pan Li, Quanquan Gu

Comments 49 pages, 28 figures. Accepted by ICLR 2026. Project page: https://bytedance-seed.github.io/ConfRover/starmd

Abstract

Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics, but their computational cost limits access to biologically relevant timescales. Recent generative models have shown promise in accelerating simulations, yet they struggle with long-horizon generation due to architectural constraints, error accumulation, and inadequate modeling of spatio-temporal dynamics. We present STAR-MD (Spatio-Temporal Autoregressive Rollout for Molecular Dynamics), a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales. Our key innovation is a causal diffusion transformer with joint spatio-temporal attention that efficiently captures complex space-time dependencies while avoiding the memory bottlenecks of existing methods. On the standard ATLAS benchmark, STAR-MD achieves state-of-the-art performance across all metrics, substantially improving conformational coverage, structural validity, and dynamic fidelity compared to previous methods. STAR-MD successfully extrapolates to generate stable microsecond-scale trajectories where baseline methods fail catastrophically, maintaining high structural quality throughout the extended rollout. Our comprehensive evaluation reveals severe limitations in current models for long-horizon generation, while demonstrating that STAR-MD's joint spatio-temporal modeling enables robust dynamics simulation at biologically relevant timescales, paving the way for accelerated exploration of protein function.
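
One way to picture "causal" attention applied jointly over space and time is a block-causal mask over (frame, residue) tokens. The sketch below is an illustration of that general idea only; the token layout, the function name, and the masking rule are assumptions and are not taken from the STAR-MD implementation.

```python
def causal_spatiotemporal_mask(num_frames, num_residues):
    """Return mask[q][k] = True if query token q may attend to key token k.

    Tokens are flattened frame-major: index = frame * num_residues + residue.
    Under this assumed rule, a token sees every residue in its own frame
    and in all earlier frames, but nothing from later frames.
    """
    size = num_frames * num_residues
    mask = [[False] * size for _ in range(size)]
    for q in range(size):
        q_frame = q // num_residues
        for k in range(size):
            k_frame = k // num_residues
            mask[q][k] = k_frame <= q_frame
    return mask


mask = causal_spatiotemporal_mask(num_frames=3, num_residues=2)
```

A single mask of this kind lets one attention layer mix spatial and temporal context, as opposed to alternating separate spatial and temporal layers.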

2512.21320 2026-02-12 q-bio.GN cs.DB cs.DS

An Allele-Centric Pan-Graph-Matrix Representation for Scalable Pangenome Analysis

Roberto Garrone

Comments 11 Pages, 2 Figures, 1 Table

Abstract

Population-scale pangenome analysis increasingly requires representations that unify single-nucleotide and structural variation while remaining scalable across large cohorts. Existing formats are typically sequence-centric, path-centric, or sample-centric, and often obscure population structure or fail to exploit carrier sparsity. We introduce the H1 pan-graph-matrix, an allele-centric representation that encodes exact haplotype membership using adaptive per-allele compression. By treating alleles as first-class objects and selecting optimal encodings based on carrier distribution, H1 achieves near-optimal storage across both common and rare variants. We further introduce H2, a path-centric dual representation derived from the same underlying allele-haplotype incidence information that restores explicit haplotype ordering while remaining exactly equivalent in information content. Using real human genome data, we show that this representation yields substantial compression gains, particularly for structural variants, while remaining equivalent in information content to pangenome graphs. H1 provides a unified, population-aware foundation for scalable pangenome analysis and downstream applications such as rare-variant interpretation and drug discovery.
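
The idea of choosing a per-allele encoding from the carrier distribution can be sketched as a cost comparison between a few exact encodings. The three encodings, their names, and the bit-cost model below are assumptions for illustration; the abstract does not describe H1's actual encodings or selection rule.

```python
def encode_allele(carriers, num_haplotypes, index_bits=32):
    """Pick the cheapest of three exact encodings for one allele's carrier set.

    Returns (encoding_name, payload, cost_in_bits). The cost model is rough:
    index lists cost index_bits per entry, the bitmap one bit per haplotype.
    """
    carrier_set = set(carriers)
    non_carriers = [h for h in range(num_haplotypes) if h not in carrier_set]
    options = [
        ("sparse", sorted(carrier_set), index_bits * len(carrier_set)),
        ("complement", non_carriers, index_bits * len(non_carriers)),
        ("bitmap", [h in carrier_set for h in range(num_haplotypes)], num_haplotypes),
    ]
    return min(options, key=lambda option: option[2])


def decode_allele(encoding, payload, num_haplotypes):
    """Recover the exact carrier set from any of the three encodings."""
    if encoding == "sparse":
        return set(payload)
    if encoding == "complement":
        return set(range(num_haplotypes)) - set(payload)
    return {h for h, bit in enumerate(payload) if bit}
```

Rare alleles fall to the sparse list, near-fixed alleles to the complement list, and intermediate-frequency alleles to the bitmap, while every choice decodes to the same haplotype membership.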

2508.07465 2026-02-12 cs.LG q-bio.GN stat.ML

MOTGNN: Interpretable Graph Neural Networks for Multi-Omics Disease Classification

Tiantian Yang, Zhiqian Chen

Comments 11 pages, 6 figures, 7 tables

Abstract

Integrating multi-omics data, such as DNA methylation, mRNA expression, and microRNA (miRNA) expression, offers a comprehensive view of the biological mechanisms underlying disease. However, the high dimensionality of multi-omics data, the heterogeneity across modalities, and the lack of reliable biological interaction networks make meaningful integration challenging. In addition, many existing models rely on handcrafted similarity graphs, are vulnerable to class imbalance, and often lack built-in interpretability, limiting their usefulness in biomedical applications. We propose Multi-Omics integration with Tree-generated Graph Neural Network (MOTGNN), a novel and interpretable framework for binary disease classification. MOTGNN employs eXtreme Gradient Boosting (XGBoost) for omics-specific supervised graph construction, followed by modality-specific Graph Neural Networks (GNNs) for hierarchical representation learning, and a deep feedforward network for cross-omics integration. Across three real-world disease datasets, MOTGNN outperforms state-of-the-art baselines by 5-10% in accuracy, ROC-AUC, and F1-score, and remains robust to severe class imbalance. The model maintains computational efficiency through the use of sparse graphs and provides built-in interpretability, revealing both top-ranked biomarkers and the relative contributions of each omics modality. These results highlight the potential of MOTGNN to improve both predictive accuracy and interpretability in multi-omics disease modeling.

2506.14508 2026-02-12 q-bio.OT

An ELIXIR scoping review on domain-specific evaluation metrics for synthetic data in life sciences

Styliani-Christina Fragkouli, Somya Iqbal, Lisa Crossman, Barbara Gravel, Nagat Masued, Mark Onders, Devesh Haseja, Alex Stikkelman, Alfonso Valencia, Tom Lenaerts, Fotis Psomopoulos, Pilib Ó Broin, Núria Queralt-Rosinach, Davide Cirillo

Abstract

Synthetic data has emerged as a powerful resource in life sciences, offering solutions for data scarcity, privacy protection and accessibility constraints. By creating artificial datasets that mirror the characteristics of real data, it allows researchers to develop and validate computational methods in controlled environments. Despite its promise, the adoption of synthetic data in life sciences hinges on rigorous evaluation metrics designed to assess its fidelity and reliability. To explore the current landscape of synthetic data evaluation metrics in several life sciences domains, the ELIXIR Machine Learning Focus Group performed a systematic review of the scientific literature following the PRISMA guidelines. Six critical domains were examined to identify current practices for assessing synthetic data. Findings reveal that, while generation methods are rapidly evolving, systematic evaluation is often overlooked, limiting researchers' ability to compare, validate, and trust synthetic datasets across different domains. This systematic review underscores the urgent need for robust, standardized evaluation approaches that not only bolster confidence in synthetic data but also guide its effective and responsible implementation. By laying the groundwork for establishing domain-specific yet interoperable standards, this scoping review paves the way for future initiatives aimed at enhancing the role of synthetic data in scientific discovery, clinical practice and beyond.

2407.06211 2026-02-12 q-bio.OT cs.CY cs.LG

Synthetic data: How could it be used for infectious disease research?

Styliani-Christina Fragkouli, Dhwani Solanki, Leyla J Castro, Fotis E Psomopoulos, Núria Queralt-Rosinach, Davide Cirillo, Lisa C Crossman

Abstract

Over the last three to five years, it has become possible to generate machine learning synthetic data for healthcare-related uses. However, concerns have been raised about potential negative factors associated with the possibilities of artificial dataset generation. These include the potential misuse of generative artificial intelligence (AI) in fields such as cybercrime, the use of deepfakes and fake news to deceive or manipulate, and displacement of human jobs across various market sectors. Here, we consider both current and future positive advances and possibilities with synthetic datasets. Synthetic data offers significant benefits, particularly in data privacy, research, balancing datasets and reducing bias in machine learning models. Generative AI (GenAI) is an artificial intelligence genre capable of creating text, images, video or other data using generative models. The recent explosion of interest in GenAI was heralded by the invention and rapid adoption of large language models (LLMs). These computational models are able to achieve general-purpose language generation and other natural language processing tasks and are based on transformer architectures, which made an evolutionary leap from previous neural network architectures. Fuelled by the advent of improved GenAI techniques and wide-scale usage, this is surely the time to consider how synthetic data can be used to advance infectious disease research. In this commentary, we aim to create an overview of the current and future position of synthetic data in infectious disease research.

2602.10375 2026-02-12 cond-mat.soft q-bio.TO

Morphological instability of an invasive active-passive interface

Sumit Sinha, Haiqian Yang, L Mahadevan

Abstract

Morphological instabilities of growing tissues that impinge on passive materials are typical of invasive cancers. To explain these instabilities in experiments on breast epithelial spheroids in an extracellular matrix, we develop a continuum phase-field model of a growing active liquid expanding into a passive viscoelastic matrix. Linear stability analysis of the sharp-interface limit of the governing equations predicts that the tissue interface can develop long-wavelength instabilities, but these instabilities are suppressed when the active carcinoid is embedded in an elastic matrix. We develop a theoretical morphological phase diagram, and complement it with two-dimensional finite element (FEM) phase-field simulations to track the nonlinear evolution of the interface, with results consistent with theoretical predictions and experimental observations. Our study provides a basis for the emergence of interfacial instabilities in active-passive systems with the potential to control them.

2602.10361 2026-02-12 q-bio.NC cs.AI cs.CV cs.HC

ENIGMA: EEG-to-Image in 15 Minutes Using Less Than 1% of the Parameters

Reese Kneeland, Wangshu Jiang, Ugo Bruzadin Nunes, Paul Steven Scotti, Arnaud Delorme, Jonathan Xu

Abstract

To be practical for real-life applications, models for brain-computer interfaces must be easily and quickly deployable on new subjects, effective on affordable scanning hardware, and small enough to run locally on accessible computing resources. To directly address these current limitations, we introduce ENIGMA, a multi-subject electroencephalography (EEG)-to-Image decoding model that reconstructs seen images from EEG recordings and achieves state-of-the-art (SOTA) performance on the research-grade THINGS-EEG2 and consumer-grade AllJoined-1.6M benchmarks, while fine-tuning effectively on new subjects with as little as 15 minutes of data. ENIGMA boasts a simpler architecture and requires less than 1% of the trainable parameters necessary for previous approaches. Our approach integrates a subject-unified spatio-temporal backbone along with a set of multi-subject latent alignment layers and an MLP projector to map raw EEG signals to a rich visual latent space. We evaluate our approach using a broad suite of image reconstruction metrics that have been standardized in the adjacent field of fMRI-to-Image research, and we describe the first EEG-to-Image study to conduct extensive behavioral evaluations of our reconstructions using human raters. Our simple and robust architecture provides a significant performance boost across both research-grade and consumer-grade EEG hardware, and a substantial improvement in fine-tuning efficiency and inference cost. Finally, we provide extensive ablations to determine the architectural choices most responsible for our performance gains in both single and multi-subject cases across multiple benchmark datasets. Collectively, our work provides a substantial step towards the development of practical brain-computer interface applications.

2602.10303 2026-02-12 cs.LG q-bio.QM stat.ML

ICODEN: Ordinary Differential Equation Neural Networks for Interval-Censored Data

Haoling Wang, Lang Zeng, Tao Sun, Youngjoo Cho, Ying Ding

Abstract

Predicting time-to-event outcomes when event times are interval censored is challenging because the exact event time is unobserved. Many existing survival analysis approaches for interval-censored data rely on strong model assumptions or cannot handle high-dimensional predictors. We develop ICODEN, an ordinary differential equation-based neural network for interval-censored data that models the hazard function through deep neural networks and obtains the cumulative hazard by solving an ordinary differential equation. ICODEN does not require the proportional hazards assumption or a prespecified parametric form for the hazard function, thereby permitting flexible survival modeling. Across simulation settings with proportional or non-proportional hazards and both linear and nonlinear covariate effects, ICODEN consistently achieves satisfactory predictive accuracy and remains stable as the number of predictors increases. Applications to data from multiple phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) and to two Age-Related Eye Disease Studies (AREDS and AREDS2) for age-related macular degeneration (AMD) demonstrate ICODEN's robust prediction performance. In both applications, predicting time-to-AD or time-to-late AMD, ICODEN effectively uses hundreds to more than 1,000 SNPs and supports data-driven subgroup identification with differential progression risk profiles. These results establish ICODEN as a practical assumption-lean tool for prediction with interval-censored survival data in high-dimensional biomedical settings.
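
The link between the hazard, the ODE, and the interval-censored likelihood can be made explicit. With cumulative hazard Lambda solving dLambda/dt = hazard(t), Lambda(0) = 0, and survival S(t) = exp(-Lambda(t)), an event known only to lie in (L, R] contributes S(L) - S(R) to the likelihood. The sketch below uses a constant hazard as a stand-in for a neural network and a midpoint rule as a stand-in for an ODE solver; both are assumptions for illustration, and covariates are omitted.

```python
import math


def cumulative_hazard(hazard, t_end, steps=1000):
    """Solve dLambda/dt = hazard(t), Lambda(0) = 0, with the midpoint rule."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        total += hazard((k + 0.5) * dt) * dt
    return total


def survival(hazard, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def interval_censored_loglik(hazard, left, right):
    """Log-probability that the event falls in (left, right].

    right = math.inf encodes a right-censored observation.
    """
    s_left = survival(hazard, left)
    s_right = 0.0 if math.isinf(right) else survival(hazard, right)
    return math.log(s_left - s_right)


def constant_hazard(t):
    """Hypothetical stand-in for a learned hazard function."""
    return 0.5


loglik = interval_censored_loglik(constant_hazard, left=1.0, right=2.0)
```

Because the likelihood only needs S at the interval endpoints, any non-negative hazard function can be plugged in, which is why no proportional-hazards or parametric form is required.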

2602.10242 2026-02-12 cond-mat.stat-mech physics.bio-ph q-bio.PE

Whodunnit? The case of midge swarms

L. L. Bonilla, R. González-Albaladejo

Comments 21 pages, 4 figures, revtex

Abstract

As collective states of animal groups go, swarms of midge insects pose a number of puzzling questions. Their ordering polarization parameter is quite small and the insects are weakly coupled among themselves but strongly coupled to the swarm. In laboratory studies (free of external perturbations), the correlation length is small, whereas in field studies midge swarms exhibit strong correlations, scale-free behavior and power laws for correlation length, susceptibility and correlation time. Data for the dynamic correlation function versus time collapse to a single curve only for small values of time scaled with the correlation time. Is there a theory that explains these disparate observations? Among the existing theories, whodunnit? Here we review and discuss several models proposed in the literature and extend our own, the harmonically confined Vicsek model, to anisotropic confinement. Numerical simulations of the latter produce elongated swarm shapes and values of the static critical exponents between those of the two-dimensional and isotropic three-dimensional models. The new values agree better with those measured in natural swarms.
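
The polarization order parameter mentioned above is commonly defined as the magnitude of the average unit velocity vector. A minimal sketch of that standard definition follows; the function name is hypothetical, velocities are assumed to be nonzero 3D tuples, and the authors' exact estimator may differ.

```python
import math


def polarization(velocities):
    """Return |mean of unit velocity vectors| for a list of nonzero 3D velocities."""
    sums = [0.0, 0.0, 0.0]
    for v in velocities:
        speed = math.sqrt(sum(c * c for c in v))
        for axis in range(3):
            sums[axis] += v[axis] / speed
    n = len(velocities)
    return math.sqrt(sum((s / n) ** 2 for s in sums))


aligned = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
opposed = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
```

A value near 1 means the group moves in a common direction (as in a flock), while a small value, as reported for midge swarms, means headings largely cancel.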

2602.10156 2026-02-12 q-bio.GN cs.LG q-bio.CB

STRAND: Sequence-Conditioned Transport for Single-Cell Perturbations

Boyang Fu, George Dasoulas, Sameer Gabbita, Xiang Lin, Shanghua Gao, Xiaorui Su, Soumya Ghosh, Marinka Zitnik

Comments 8 pages for main draft, 6 main figures

Abstract

Predicting how genetic perturbations change cellular state is a core problem for building controllable models of gene regulation. Perturbations targeting the same gene can produce different transcriptional responses depending on their genomic locus, including different transcription start sites and regulatory elements. Gene-level perturbation models collapse these distinct interventions into the same representation. We introduce STRAND, a generative model that predicts single-cell transcriptional responses by conditioning on regulatory DNA sequence. STRAND represents a perturbation by encoding the sequence at its genomic locus and uses this representation to parameterize a conditional transport process from control to perturbed cell states. Representing perturbations by sequence, rather than by a fixed set of gene identifiers, supports zero-shot inference at loci not seen during training and expands inference-time genomic coverage from ~1.5% for gene-level single-cell foundation models to ~95% of the genome. We evaluate STRAND on CRISPR perturbation datasets in K562, Jurkat, and RPE1 cells. STRAND improves discrimination scores by up to 33% in low-sample regimes, achieves the best average rank on unseen gene perturbation benchmarks, and improves transfer to novel cell lines by up to 0.14 in Pearson correlation. Ablations isolate the gains to sequence conditioning and transport, and case studies show that STRAND resolves functionally alternative transcription start sites missed by gene-level models.

2602.09793 2026-02-12 cs.LG q-bio.QM

Fully-automated sleep staging: multicenter validation of a generalizable deep neural network for Parkinson's disease and isolated REM sleep behavior disorder

Jesper Strøm, Casper Skjærbæk, Natasha Becker Bertelsen, Steffen Torpe Simonsen, Niels Okkels, David Bertram, Sinah Röttgen, Konstantin Kufer, Kaare B. Mikkelsen, Marit Otto, Poul Jørgen Jennum, Per Borghammer, Michael Sommerauer, Preben Kidmose

Comments 21 pages excluding supplementary, 9 figures

Abstract

Isolated REM sleep behavior disorder (iRBD) is a key prodromal marker of Parkinson's disease (PD), and video-polysomnography (vPSG) remains the diagnostic gold standard. However, manual sleep staging is particularly challenging in neurodegenerative diseases due to EEG abnormalities and fragmented sleep, making PSG assessments a bottleneck for deploying new RBD screening technologies at scale. We adapted U-Sleep, a deep neural network, for generalizable sleep staging in PD and iRBD. A pretrained U-Sleep model, based on a large, multisite non-neurodegenerative dataset (PUB; 19,236 PSGs across 12 sites), was fine-tuned on research datasets from two centers (Lundbeck Foundation Parkinson's Disease Research Center (PACE) and the Cologne-Bonn Cohort (CBC); 112 PD, 138 iRBD, 89 age-matched controls). The resulting model was evaluated on an independent dataset from the Danish Center for Sleep Medicine (DCSM; 81 PD, 36 iRBD, 87 sleep-clinic controls). A subset of PSGs with low agreement between the human rater and the model (Cohen's $κ$ < 0.6) was re-scored by a second blinded human rater to identify sources of disagreement. Finally, we applied confidence-based thresholds to optimize REM sleep staging. The pretrained model achieved mean $κ$ = 0.81 in PUB, but $κ$ = 0.66 when applied directly to PACE/CBC. By fine-tuning the model, we developed a generalized model with $κ$ = 0.74 on PACE/CBC (p < 0.001 vs. the pretrained model). In DCSM, mean and median $κ$ increased from 0.60 to 0.64 (p < 0.001) and 0.64 to 0.69 (p < 0.001), respectively. In the interrater study, PSGs with low agreement between the model and the initial scorer showed similarly low agreement between human scorers. Applying a confidence threshold increased the proportion of correctly identified REM sleep epochs from 85% to 95.5%, while preserving sufficient (> 5 min) REM sleep for 95% of subjects.
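
Since the results are reported as Cohen's kappa, a small worked computation may help in reading them. The sketch below implements the standard definition, kappa = (p_o - p_e) / (1 - p_e), and adds one possible form of confidence thresholding for REM epochs. The threshold rule, the function names, and the toy labels are assumptions; the abstract does not specify how the thresholds were applied.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two scorers."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


def confident_rem_epochs(predictions, threshold):
    """Indices of epochs predicted as REM with confidence >= threshold.

    predictions is a list of (label, confidence) pairs (assumed format).
    """
    return [i for i, (label, confidence) in enumerate(predictions)
            if label == "REM" and confidence >= threshold]


human = ["W", "W", "N2", "N2", "REM", "REM"]
model = ["W", "W", "N2", "REM", "REM", "REM"]
kappa = cohen_kappa(human, model)
```

In this toy example the scorers agree on 5 of 6 epochs (p_o = 5/6) with chance agreement p_e = 1/3, giving kappa = 0.75. Raising the confidence threshold trades fewer retained REM epochs for a higher share of correct ones, which is the trade-off the abstract quantifies.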

2602.09649 2026-02-12 q-bio.PE q-bio.GN

Population-scale Ancestral Recombination Graphs with tskit 1.0

Ben Jeffery, Yan Wong, Kevin Thornton, Georgia Tsambos, Gertjan Bisschop, Yun Deng, E. Castedo Ellerman, Thomas B. Forest, Halley Fritze, Daniel Goldstein, Gregor Gorjanc, Graham Gower, Simon Gravel, Jeremy Guez, Benjamin C. Haller, Andrew D. Kern, Lloyd Kirk, Ivan Krukov, Hanbin Lee, Brieuc Lehmann, Hossameldin Loay, Matthew M. Osmond, Duncan S. Palmer, Nathaniel S. Pope, Aaron P. Ragsdale, Duncan Robertson, Murillo F. Rodrigues, Hugo van Kemenade, Clemens L. Weiß, Anthony Wilder Wohns, Shing H. Zhan, Brian C. Zhang, Marianne Aspbury, Nikolas A. Baya, Saurabh Belsare, Arjun Biddanda, Francisco Campuzano Jiménez, Ariella Gladstein, Bing Guo, Savita Karthikeyan, Warren W. Kretzschmar, Inés Rebollo, Kumar Saunack, Ruhollah Shemirani, Alexis Simon, Chris Smith, Jeet Sukumaran, Jonathan Terhorst, Per Unneberg, Ao Zhang, Peter Ralph, Jerome Kelleher

Abstract

Ancestral recombination graphs (ARGs) are an increasingly important component of population and statistical genetics. The tskit library has become key infrastructure for the field, providing an expressive and general representation of ARGs together with a suite of efficient fundamental operations. In this note, we announce tskit version 1.0, describe its underlying rationale, and document its stability guarantees. These guarantees provide a foundation for durable computational artefacts and support long-term reproducibility of code and analyses.

2602.09116 2026-02-12 cs.LG physics.soc-ph q-bio.QM

Importance inversion transfer identifies shared principles for cross-domain learning

Daniele Caligiore

Comments Formatting of lists and placement of tables and figures refined for improved readability

Abstract

The capacity to transfer knowledge across scientific domains relies on shared organizational principles. However, existing transfer-learning methodologies often fail to bridge radically heterogeneous systems, particularly under severe data scarcity or stochastic noise. This study formalizes Explainable Cross-Domain Transfer Learning (X-CDTL), a framework unifying network science and explainable artificial intelligence to identify structural invariants that generalize across biological, linguistic, molecular, and social networks. By introducing the Importance Inversion Transfer (IIT) mechanism, the framework prioritizes domain-invariant structural anchors over idiosyncratic, highly discriminative features. In anomaly detection tasks, models guided by these principles achieve significant performance gains over traditional baselines, including a 56% relative improvement in decision stability under extreme noise. These results provide evidence for a shared organizational signature across heterogeneous domains, establishing a principled paradigm for cross-disciplinary knowledge propagation. By shifting from opaque latent representations to explicit structural laws, this work advances machine learning as a robust engine for scientific discovery.

2602.09067 2026-02-12 q-bio.GN cs.AI cs.CE cs.CL

AntigenLM: Structure-Aware DNA Language Modeling for Influenza

Yue Pei, Xuebin Chi, Yu Kang

Comments Accepted by ICLR 2026

Abstract

Language models have advanced sequence analysis, yet DNA foundation models often lag behind task-specific methods for unclear reasons. We present AntigenLM, a generative DNA language model pretrained on influenza genomes with intact, aligned functional units. This structure-aware pretraining enables AntigenLM to capture evolutionary constraints and generalize across tasks. Fine-tuned on time-series hemagglutinin (HA) and neuraminidase (NA) sequences, AntigenLM accurately forecasts future antigenic variants across regions and subtypes, including those unseen during training, outperforming phylogenetic and evolution-based models. It also achieves near-perfect subtype classification. Ablation studies show that disrupting genomic structure through fragmentation or shuffling severely degrades performance, revealing the importance of preserving functional-unit integrity in DNA language modeling. AntigenLM thus provides both a powerful framework for antigen evolution prediction and a general principle for building biologically grounded DNA foundation models.

2507.10136 2026-02-12 q-bio.QM cs.AI

A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma

Zhonglin Liu

Comments 7 pages, 7 figures. Accepted by the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2025. Code is available at https://github.com/Liu-Zhonglin/pbn-melanoma-project

Abstract

Innate resistance to anti-PD-1 immunotherapy remains a major clinical challenge in metastatic melanoma, with the underlying molecular networks being poorly understood. To address this, we constructed a dynamic Probabilistic Boolean Network model using transcriptomic data from patient tumor biopsies to elucidate the regulatory logic governing therapy response. We then employed a reinforcement learning agent to systematically discover optimal, multi-step therapeutic interventions and used explainable artificial intelligence to mechanistically interpret the agent's control policy. The analysis revealed that a precisely timed, 4-step temporary inhibition of the lysyl oxidase like 2 protein (LOXL2) was the most effective strategy. Our explainable analysis showed that this "hit-and-run" intervention is sufficient to erase the molecular signature driving resistance, allowing the network to self-correct without requiring sustained intervention. This study presents a novel, time-dependent therapeutic hypothesis for overcoming immunotherapy resistance and provides a powerful computational framework for identifying non-obvious intervention protocols in complex biological systems.
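
Why a temporary intervention can have a lasting effect is easiest to see in a toy Boolean network with a positive-feedback loop. The sketch below is a deterministic two-node simplification: a real Probabilistic Boolean Network selects among update rules at random, and the node names, rules, and step counts here are hypothetical rather than taken from the paper's melanoma model. The reinforcement learning search is not shown.

```python
def step(state, clamp=None):
    """One synchronous update of a toy two-node positive-feedback loop.

    state is (driver, resistance). Each node copies the other's previous
    value; clamp, if given, forces the driver to that value (inhibition).
    """
    driver, resistance = state
    next_driver, next_resistance = resistance, driver
    if clamp is not None:
        next_driver = clamp
    return (next_driver, next_resistance)


def run(state, inhibited_steps, free_steps):
    """Inhibit the driver for inhibited_steps updates, then release it."""
    for _ in range(inhibited_steps):
        state = step(state, clamp=0)
    for _ in range(free_steps):
        state = step(state)
    return state


resistant = (1, 1)
untreated = run(resistant, inhibited_steps=0, free_steps=10)
too_short = run(resistant, inhibited_steps=1, free_steps=10)
hit_and_run = run(resistant, inhibited_steps=2, free_steps=10)
```

In this toy loop, two inhibited steps move the network into the (0, 0) attractor, where it stays after release, whereas one step leaves it oscillating. This mirrors, in miniature, the claim that the duration and timing of the inhibition matter.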

2409.17038 2026-02-12 q-bio.OT

Omnibenchmark: transparent, reproducible, extensible and standardized orchestration of solo and collaborative benchmarks

Izaskun Mallona, Almut Luetge, Ben Carrillo, Daniel Incicau, Reto Gerber, Aidan Meara, Anthony Sonrel, Charlotte Soneson, Mark D. Robinson

Comments 20 pages, 2 figures

Abstract

Benchmarking involves designing, running and disseminating rigorous performance assessments of methods, most often for data analysis and software tools, but the process can also be applied to experimental systems. Ideally, a benchmarking system is used to facilitate the benchmarking process by providing a structured entry point to design, coordinate, execute, and store standardized benchmarks. We describe a novel benchmarking system, Omnibenchmark, that facilitates benchmark formalization and execution in both solo and community efforts. Omnibenchmark provides a flexible benchmark plan syntax (i.e., a configuration YAML file), dynamic workflow generation based on Snakemake, S3-compatible storage handling, and reproducible software environments using environment modules, Apptainer or Conda. Such a setup provides unprecedented flexibility: existing benchmark designs can be forked and extended, and run separately or collaboratively, giving versioned and standardized result outputs and therefore much-needed transparency to the analysis and interpretation of benchmark results. Tutorials and installation instructions are available from https://omnibenchmark.org.

2408.01253 2026-02-12 cs.AI cs.SY eess.SY q-bio.NC

Metareasoning in uncertain environments: a meta-BAMDP framework

Prakhar Godara, Tilman Diego Alemán

Abstract

Reasoning may be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome. However, executing $P$ itself bears costs (time, energy, limited capacity, etc.), and these costs need to be considered alongside the explicit utility obtained by making the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as metareasoning. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to Bernoulli bandit tasks. Owing to the meta problem's complexity, our solutions are necessarily approximate. However, we introduce two novel theorems that significantly enhance the tractability of the problem, enabling stronger approximations that are robust within a range of assumptions grounded in realistic human decision-making scenarios. These results offer a resource-rational perspective and a normative framework for understanding human exploration under cognitive constraints, as well as providing experimentally testable predictions about human behavior in Bernoulli bandit tasks.
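
As background for the Bernoulli bandit setting, the underlying Bayes-adaptive problem (without any cost of reasoning) can be solved exactly for short horizons by dynamic programming over Beta posteriors. The sketch below shows only that object-level problem; the meta-level, where computation itself is costly, and the paper's two theorems are not represented. The function name and the uniform Beta(1, 1) priors are assumptions for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_optimal_value(belief, horizon):
    """Expected total reward of the Bayes-optimal policy.

    belief is a tuple of (alpha, beta) Beta-posterior parameters, one per arm.
    Pulling an arm yields reward 1 with posterior-mean probability
    alpha / (alpha + beta) and updates that arm's posterior.
    """
    if horizon == 0:
        return 0.0
    best = 0.0
    for arm, (alpha, beta) in enumerate(belief):
        p_success = alpha / (alpha + beta)
        after_success = belief[:arm] + ((alpha + 1, beta),) + belief[arm + 1:]
        after_failure = belief[:arm] + ((alpha, beta + 1),) + belief[arm + 1:]
        value = (p_success * (1.0 + bayes_optimal_value(after_success, horizon - 1))
                 + (1.0 - p_success) * bayes_optimal_value(after_failure, horizon - 1))
        best = max(best, value)
    return best


uniform_prior = ((1, 1), (1, 1))
value_h2 = bayes_optimal_value(uniform_prior, 2)
```

For two arms with uniform priors and a horizon of 2, the value works out to 13/12: the first pull is worth 1/2, and the second is worth 2/3 after a success or 1/2 after a failure. The number of belief states grows quickly with the horizon, which hints at why approximations become necessary once the cost of this planning is itself part of the problem.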

2310.03111 2026-02-12 cs.LG q-bio.NC

Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data

Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley

Comments Updated version published in ICLR 2024

Journal ref In The Twelfth International Conference on Learning Representations. (2024)

Abstract

Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between modalities or independent for each modality. We parameterize the latents of our model in the Fourier domain, and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) not only identifies the shared and independent latent structure across modalities accurately, but also provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real-world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.

2307.11078 2026-02-12 q-bio.NC cs.LG cs.SD eess.AS

Brain2Music: Reconstructing Music from Human Brain Activity

Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto

Comments Preprint; 21 pages; supplementary material: https://google-research.github.io/seanet/brain2music

Journal ref Nat Commun 17, 91 (2026)

Abstract

The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music