arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2603.20115 2026-03-23 cs.LG q-bio.BM q-bio.QM

Conditioning Protein Generation via Hopfield Pattern Multiplicity

Jeffrey D. Varner

Protein sequence generation via stochastic attention produces plausible family members from small alignments without training, but treats all stored sequences equally and cannot direct generation toward a functional subset of interest. We show that a single scalar parameter, added as a bias to the sampler's attention logits, continuously shifts generation from the full family toward a user-specified subset, with no retraining and no change to the model architecture. A practitioner supplies a small set of sequences (for example, hits from a binding screen) and a multiplicity ratio that controls how strongly generation favors them. The method is agnostic to what the subset represents: binding, stability, specificity, or any other property. We find that the conditioning is exact at the level of the sampler's internal representation, but that the decoded sequence phenotype can fall short because the dimensionality reduction used to encode sequences does not always preserve the residue-level variation that defines the functional split. We term this discrepancy the calibration gap and show that it is predicted by a simple geometric measure of how well the encoding separates the functional subset from the rest of the family. Experiments on five Pfam families (Kunitz, SH3, WW, Homeobox, and Forkhead domains) confirm the monotonic relationship between separation and gap across a fourfold range of geometries. Applied to omega-conotoxin peptides targeting a calcium channel involved in pain signaling, curated seeding from 23 characterized binders produces over a thousand candidates that preserve the primary pharmacophore and all experimentally identified binding determinants. These results show that stochastic attention enables practitioners to expand a handful of experimentally characterized sequences into diverse candidate libraries without retraining a generative model.
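
The scalar bias described above can be illustrated with a minimal sketch. It assumes, since the abstract does not give the formula, that the bias is the logarithm of the multiplicity ratio added to the logits of the chosen subset, and that similarity is a dot product scaled by an inverse temperature `beta`. All names (`attention_weights`, `subset`, `ratio`) are hypothetical. Under these assumptions the bias is equivalent to storing each subset pattern `ratio` times.

```python
import math

def attention_weights(query, patterns, beta=1.0, subset=(), ratio=1.0):
    """Softmax attention over stored patterns with a log-multiplicity bias.

    Assumption (not from the abstract): patterns whose index is in `subset`
    receive an additive logit bias of log(ratio).
    """
    bias = math.log(ratio)
    subset = set(subset)
    logits = []
    for i, p in enumerate(patterns):
        score = beta * sum(q * x for q, x in zip(query, p))
        logits.append(score + (bias if i in subset else 0.0))
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stored patterns and query (illustrative values only).
patterns = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.5, 0.5]

uniform = attention_weights(query, patterns)
biased = attention_weights(query, patterns, subset=[2], ratio=3.0)

# Equivalent view: store pattern 2 three times and apply no bias.
duplicated = attention_weights(query, patterns + [patterns[2]] * 2)
mass_on_subset = duplicated[2] + duplicated[3] + duplicated[4]
```

In this toy setting `biased[2]` matches `mass_on_subset`, which is why a single scalar can stand in for pattern multiplicity without touching the stored patterns.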

2603.19881 2026-03-23 q-bio.NC

Problem difficulty and waiting time shape the level of detail and temporal organization of visual strategies in human planning

Mattia Eluchans, Giovanni Pezzulo

Planning entails identifying sequences of actions to reach a goal, yet we still have incomplete knowledge of how problem constraints, such as difficulty and available time, influence the visual strategies supporting plan construction, both in terms of coverage of the to-be-executed plans and its temporal organization. To fill this gap, we recorded participants' cursor and eye movements in a multi-target problem solving task on a grid. We manipulated two orthogonal dimensions: problem difficulty, by introducing the novel construct of misleadingness, which measures how nodes' distances on the grid diverged from their relative position along the solution, and waiting time, by allowing participants either to act immediately or wait before moving. We found that difficulty significantly affected both performance and gaze: harder problems reduced success rates, required more corrections and pauses, elicited longer pre-movement inspection that provided higher coverage of the to-be-executed plan, and more re-fixations. When participants could start immediately, they did so without fully consolidating their plan. This led to more pauses and backtracks, but also to more precise gaze-cursor alignment during execution, suggesting improved online control compensating for incomplete planning. With increased planning time, greater difficulty led participants to achieve a better temporal alignment between pre-movement visual inspection and cursor movement during execution. Overall, our results suggest that problem difficulty increases the visual coverage of the upcoming plan, whereas time availability shapes the extent of replanning during execution and determines whether gaze-path coherence emerges before movement or only during execution in difficult problems.

2603.19751 2026-03-23 math.OC q-bio.NC q-bio.QM

Branched Optimal Transport for Stimulus to Reaction Brain Mapping

Cristian Mendico

A central problem in systems neuroscience is to determine how an external stimulation is propagated through the brain so as to produce a reaction. Current deterministic and stochastic control models quantify transition costs between brain states on a prescribed network, but do not treat the transport network itself as an unknown. Here we propose a variational framework in which the inferred object is a graph/current connecting a stimulation source measure to a reaction target measure. The model is posed as an anisotropic branched optimal transport problem, where concavity of the flux cost promotes aggregation and branching. The support of an optimal current defines a stimulus-to-reaction routing architecture, interpreted as a brain reaction map. We prove existence of minimizers in discrete and continuous formulations and introduce a hybrid stochastic extension combining ramified transport with a path-space Kullback-Leibler control cost on the induced graph dynamics. This approach provides a mathematical mechanism for inferring propagation architectures rather than controlling trajectories on fixed substrates.

2603.19723 2026-03-23 cond-mat.soft q-bio.TO

Modelling the passive and active response of skeletal muscles within the adapted Voigt representation framework

Sara Galasso, Giulio G. Giusteri

Comments 25 pages, 7 figures

We present a constitutive model for the passive and active response of skeletal muscles. At variance with more classical approaches, the model is developed exploiting adapted Voigt representations of strain and stress tensors within the context of nonlinear Cauchy elasticity. This framework allows us to identify non-trivial stress-strain relations in a rather direct way from experimental data, enhancing the mechanical interpretability of the material functions that describe the tissue response and obtaining additional insight on the distinct role of the contractile fibres and of the surrounding extracellular matrix. We propose a two-material model, with an additive splitting of the stress contributions, in which only one component depends on an activation parameter. The constitutive model for the passive behaviour satisfactorily predicts the nonlinear stress response to elongation at different relative orientations with respect to the fibre direction and highlights the dominant role of the extracellular matrix. The activation model, essentially determined by the mechanics of the contractile fibres, captures well the isometric stress response through the prescription of an elasto-plastic evolution of the along-fibre active strain.

2603.19690 2026-03-23 q-bio.NC cs.NE

A Unified Phase-native Computational Principle Governs Hippocampal Spike Timing and Neural Coding

Reza Ahmadvand, Sara Safura Sharif, Yaser Mike Banad

Comments 27 Pages, 5 Figures, 2 Tables

Hippocampal neurons exhibit precise phase locking to network oscillations, but the computational principle governing this temporal precision is still unclear. Neural information is conveyed jointly by firing rates and spike timing, but existing models treat these dimensions separately, limiting mechanistic interpretation of spike-field coupling and its reported association with spectral features such as the aperiodic slope. Here we show that hippocampal phase locking emerges from a fundamental dynamical mechanism, referred to as forced phase integration, that separates neural information into orthogonal magnitude (what) and phase (when) coordinates. To formalize this principle, we develop the unified complex-valued neuron (UCN), a biologically grounded generative framework in which spike timing arises from phase accumulation while spike magnitude encodes instantaneous signal strength. This framework reproduces biological spike-theta synchronization and enables mechanistic re-evaluation of slope-locking associations, demonstrating that previously reported effects arise from oscillatory contamination rather than causal modulation. These findings establish a unified phase-native principle of neural timing and coding.

2601.18921 2026-03-23 cs.DB cs.CE cs.LG q-bio.QM

Accelerating Large-Scale Cheminformatics Using a Byte-Offset Indexing Architecture for Terabyte-Scale Data Integration

Malikussaid, Septian Caesar Floresko, Sutiyo

Comments 6 pages, 3 figures, 5 equations, 3 algorithms, 4 tables, to be published in ICoICT 2026, unabridged version exists as arXiv:2512.24643v1

The integration of large-scale chemical databases represents a critical bottleneck in modern cheminformatics research, particularly for machine learning applications requiring high-quality, multi-source validated datasets. This paper presents a case study of integrating three major public chemical repositories, PubChem (176 million compounds), ChEMBL, and eMolecules, to construct a curated dataset for molecular property prediction. We investigate whether byte-offset indexing can practically overcome brute-force scalability limits while preserving data integrity at hundred-million scale. Our results document the progression from an intractable brute-force search algorithm with a projected 100-day runtime to a byte-offset indexing architecture achieving 3.2-hour completion, a 740-fold performance improvement through algorithmic complexity reduction from $O(N \times M)$ to $O(N + M)$. Systematic validation of 176 million database entries revealed hash collisions in InChIKey molecular identifiers, necessitating pipeline reconstruction using collision-free full InChI strings. We present performance benchmarks, quantify trade-offs between storage overhead and scientific rigor, and compare our approach with alternative large-scale integration strategies. The resulting system successfully extracted 435,413 validated compounds and demonstrates generalizable principles for large-scale scientific data integration where uniqueness constraints exceed hash-based identifier capabilities.
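
The general idea behind byte-offset indexing can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the tab-separated layout, the helper names `build_offset_index` and `fetch`, and the three toy records are all hypothetical. One pass over the large file records where each key starts, and each later lookup seeks directly to that position instead of rescanning the file.

```python
import os
import tempfile

def build_offset_index(path):
    """One pass over the file: map each record key to its starting byte offset."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            key = line.split(b"\t", 1)[0]  # full InChI string as the key
            index[key] = offset
    return index

def fetch(path, index, key):
    """Seek straight to a record instead of rescanning the file."""
    with open(path, "rb") as f:
        f.seek(index[key])
        return f.readline().rstrip(b"\n").split(b"\t")

# Tiny stand-in for a large tab-separated dump (hypothetical records).
records = [(b"InChI=1S/H2O/h1H2", b"water"),
           (b"InChI=1S/CH4/h1H4", b"methane"),
           (b"InChI=1S/CO2/c2-1-3", b"carbon dioxide")]
with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as tmp:
    for key, name in records:
        tmp.write(key + b"\t" + name + b"\n")
    dump_path = tmp.name

index = build_offset_index(dump_path)                 # single pass over M records
wanted = [b"InChI=1S/CO2/c2-1-3", b"InChI=1S/H2O/h1H2"]
hits = [fetch(dump_path, index, k) for k in wanted]   # N direct lookups
os.remove(dump_path)
```

Keying the index on the full identifier string rather than a hashed key mirrors the collision concern raised in the abstract, at the cost of a larger in-memory index.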

2603.19577 2026-03-23 math.PR q-bio.QM stat.ME

Stochastic Averaging and Statistical Inference of Glycolytic Pathway

Arnab Ganguly, Hye-Won Kang

Comments 33 pages, 2 figures

Many biological processes exhibit oscillatory behavior. Among these, glycolytic oscillations have been extensively studied due to their well-characterized biochemical reaction networks. However, the complexity of these networks necessitates low-dimensional ordinary differential equation (ODE) models to identify core mechanisms and perform stability analysis. While previous studies proposed reduced ODE models, these were typically introduced from deterministic descriptions rather than the underlying stochastic dynamics, which more accurately represent discrete reaction events occurring at random times. In this paper, we develop a rigorous probabilistic framework for deriving a reduced Othmer-Aldridge model of the glycolytic pathway from its stochastic formulation. The full system is modeled as a multiscale continuous-time Markov chain with different time and abundance scales. Under an appropriate scaling regime and specific structural conditions, we prove that the dynamics of the slow components are approximated by a two-dimensional ODE. The proof is technically involved due to the network's complexity and strong coupling between its components. We further consider the problem of parameter estimation when observations are limited to the slow species: fructose-6-phosphate and ADP. The reduced system yields a tractable loss function depending solely on these variables. We prove that the resulting estimators are statistically consistent when the data originate from the full stochastic reaction network. Together, these results provide a mathematically rigorous framework linking stochastic biochemical reaction networks, reduced deterministic dynamics, and statistically reliable parameter estimation.

2603.19473 2026-03-23 q-bio.BM cs.LG

Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

Lucas Ferraz, Ana F. Rodrigues, Pedro Giesteira Cotovio, Mafalda Ventura, Gabriela Silva, Ana Sofia Coroadinha, Miguel Machuqueiro, Catia Pesquita

Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential. A central challenge in AAV bioengineering, as in protein design more broadly, is the vast sequence design space relative to the scale of feasible experimental screening. Machine-guided generative approaches provide a powerful means of navigating this landscape and proposing novel protein sequences that satisfy functional constraints. Here, we develop a generative design framework based on protein language models and reinforcement learning to generate highly novel yet functionally plausible AAV capsids. A pretrained model was fine-tuned on experimentally validated capsid sequences to learn patterns associated with viability. Reinforcement learning was then used to guide sequence generation, with a reward function that jointly promoted predicted viability and sequence novelty, thereby enabling exploration beyond regions represented in the training data. Comparative analyses showed that fine-tuning alone produces sequences with high predicted viability but remains biased toward the training distribution, whereas reinforcement learning-guided generation reaches more distant regions of sequence space while maintaining high predicted viability. Finally, we propose a candidate selection strategy that integrates predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation. This work establishes a framework for the generative exploration of protein sequence space and advances the application of generative protein language models to AAV bioengineering.

2603.19425 2026-03-23 q-bio.NC math.DG math.FA

Curvature Sensitive Cells in the Modular Structures of The Visual Cortex

Giovanna Citti, Vasiliki Liontou

We propose a model of the functional architecture of curvature-sensitive cells in the primary visual cortex. The model accounts for the modular and hierarchical organization of the cortex, the horizontal connectivity, and the shape of receptive profiles of these cells as Gabor-type filters. We construct a canonical affine subbundle of the cotangent bundle of the manifold of oriented contact elements of the retina as a geometric model for these cells, and show that this subbundle carries an Engel structure related to that of the Cartan prolongation. On an open submanifold of the Cartan prolongation, we identify generators of the Engel distribution whose iterated Lie brackets span the Lie algebra of SIM(2). The identification of sim(2) as the Lie algebra of these generators determines SIM(2) as the natural symmetry group for curvature-sensitive cells. Finally, we characterize the receptive profiles of curvature-sensitive cells as minima of a SIM(2)-adapted uncertainty principle applied to the generators of the Engel structure.

2603.19341 2026-03-23 q-bio.QM math.CO

Assessing 3D tree model quality and species classification using imbalance indices

Sophie J. Kersting, Mareike Fischer

We investigate the use of additional 3D and phylogenetic non-3D tree balance indices for analyzing and monitoring forests using an exemplary "virtual forest" dataset from Wytham Woods, Oxford, UK. This study assesses 3D model quality, species classification performance, and the relevance of these indices. Our study shows that indices stemming from the study of ancestry trees of species can be successfully applied to 3D models of organic trees and, together with recently introduced 3D imbalance indices, offer a complementary perspective on 3D tree models and improve the detection of deviations. Their computational efficiency, combined with the simple and reproducible workflow presented in this manuscript, makes them a computationally feasible quality control step in 3D model construction. Species classification models reached an estimated accuracy of up to 81.8% and allowed confident species predictions for a large portion of the unlabeled trees in the dataset. While conventional tree metrics can already provide strong predictive performance, the addition of filtered 3D and non-3D statistics improved results consistently, particularly for minority species classes. Alongside this manuscript, we provide an updated version of the R package treeDbalance that includes the necessary functionality, and we release the derived index datasets and species predictions.

2603.19326 2026-03-23 q-bio.QM cs.LG cs.NA math.AP math.NA

Mathematical Modeling of Cancer-Bacterial Therapy: Analysis and Numerical Simulation via Physics-Informed Neural Networks

Ayoub Farkane, David Lassounon

Bacterial cancer therapy exploits the ability of anaerobic bacteria to target hypoxic tumor regions, yet the interactions among tumor growth, bacterial colonization, oxygen levels, immunosuppressive cytokines, and bacterial communication remain poorly quantified. We present a mathematical model of five coupled nonlinear reaction-diffusion equations in a two-dimensional tissue domain. We prove the global well-posedness of the model and identify its steady states to analyze stability. Furthermore, a physics-informed neural network (PINN) solves the system without a mesh and without requiring extensive data. It provides convergence guarantees by combining residual stability and Sobolev approximation error bounds. This results in an overall error rate of $O(n^{-2} \ln^4(n) + N^{-1/2})$, which depends on the network width $n$ and the number of collocation points $N$. We conducted several numerical experiments, including predicting the tumor's response to therapy, and performed a sensitivity analysis of certain parameters. The results suggest that long-lasting tumor control may require either maintaining hypoxic regions in the tumor or using bacteria that tolerate oxygen better.
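
To get a feel for the stated error rate, the two terms can be evaluated separately. This is only a sketch: the hidden constants in the bound are unknown and are omitted here, so the numbers indicate relative scaling rather than actual errors, and the function name `pinn_error_terms` is hypothetical.

```python
import math

def pinn_error_terms(n, N):
    """Return the two terms of n^-2 * ln^4(n) + N^-1/2 (constants omitted)."""
    approximation = n ** -2 * math.log(n) ** 4  # depends on network width n
    sampling = N ** -0.5                        # depends on collocation points N
    return approximation, sampling

# Illustrative values: width 64 and 10,000 collocation points.
approx_term, sampling_term = pinn_error_terms(64, 10_000)
```

With these illustrative values the width-dependent term is the larger of the two, so under this bound widening the network would help more than adding collocation points.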

2603.19320 2026-03-23 q-bio.NC cond-mat.dis-nn cs.NE cs.SI

Analytically tractable model of synaptic crowding explains emergent small-world structure and network dynamics

Makoto Fukushima

Comments An earlier version appears on Research Square

Neural circuits must balance local connectivity constraints against the need for global integration. Here we introduce a minimal wiring rule motivated by synaptic crowding: as a neuron accumulates incoming connections, each additional synapse becomes progressively harder to form. This single-parameter model admits an exact finite-size solution for the induced in-degree distribution and yields simple scaling laws: mean connectivity grows only logarithmically with network size while variance remains bounded, consistent with homeostatic regulation of synaptic density. When candidates are encountered in order of spatial proximity, the crowding rule produces a broad, approximately power-law distribution of connection lengths without prescribing any explicit distance-dependent wiring law; combined with shortcut rewiring, this yields networks with small-world characteristics. We further show that the induced degree statistics largely determine attractor basin boundaries in threshold network dynamics, while local clustering primarily modulates the prevalence of long-lived non-absorbing outcomes near these boundaries. The model provides testable predictions linking local developmental constraints to macroscopic network organization and dynamics.
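
The abstract does not state the wiring rule itself, so the sketch below uses a hypothetical stand-in: a neuron with k incoming synapses accepts the next candidate with probability q**k. The rule, the parameter `q`, and the function name are assumptions chosen only to show how a crowding rule of this kind can give slow, logarithmic-looking growth of the mean in-degree. The distribution is computed exactly by dynamic programming rather than by sampling.

```python
import math

def in_degree_distribution(num_candidates, q=0.5):
    """Exact in-degree distribution after `num_candidates` attempts.

    Hypothetical crowding rule (an assumption, not the paper's stated rule):
    a neuron that already has k incoming synapses accepts the next
    candidate with probability q**k.
    """
    dist = [1.0]  # dist[k] = P(in-degree == k); start with no synapses
    for _ in range(num_candidates):
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            accept = q ** k
            nxt[k] += p * (1.0 - accept)   # candidate rejected
            nxt[k + 1] += p * accept       # candidate accepted
        dist = nxt
    return dist

dist = in_degree_distribution(500, q=0.5)
mean_degree = sum(k * p for k, p in enumerate(dist))
# For q = 1/2 this process is the classic approximate-counting chain,
# for which the expectation of 2**k after N attempts is exactly N + 1.
expected_pow = sum(p * 2.0 ** k for k, p in enumerate(dist))
```

For q = 1/2 the identity E[2^k] = N + 1 bounds the mean in-degree by log2(N + 1) through Jensen's inequality, which matches the logarithmic scaling the abstract reports for its own rule.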

2603.18475 2026-03-23 math.NA cs.NA math.AP q-bio.NC

Resolving the Blow-Up: A Time-Dilated Numerical Framework for Multiple Firing Events in Mean-Field Neuronal Networks

Xu'an Dou, Louis Tao, Zhe Xue, Zhennan Zhou

In large-scale excitatory neuronal networks, rapid synchronization manifests as multiple firing events (MFEs), mathematically characterized by a finite-time blow-up of the neuronal firing rate in the mean-field Fokker-Planck equation. Standard numerical methods struggle to resolve this singularity due to the divergent boundary flux and the instantaneous nature of the population voltage reset. In this work, we propose a robust multiscale numerical framework based on time dilation. By transforming the governing equation into a dilated timescale proportional to the firing activity, we desingularize the blow-up, effectively stretching the instantaneous synchronization event into a resolved mesoscopic process. This approach is shown to be physically consistent with the microscopic cascade mechanism underlying MFEs and the system's inherent fragility. To implement this numerically, we develop a hybrid scheme that utilizes a mesh-independent flux criterion to switch between timescales and a semi-analytical "moving Gaussian" method to accurately evolve the post-blowup Dirac mass. Numerical benchmarks demonstrate that our solver not only captures steady states with high accuracy but also efficiently reproduces periodic MFEs, matching Monte Carlo simulations without the severe time-step restrictions associated with particle cascades.

2601.09320 2026-03-23 q-bio.NC

Mapping Connectomic Structure to Function(s) in Cerebellar-like Networks using Kernel Regression

William Dorrell, Peter E. Latham

Comments 12 pages, 7 figures

Cerebellar-like networks, in which input activity patterns are separated by projection to a much higher-dimensional space before classification, are a recurring neurobiological motif, present in the cerebellum, dentate gyrus, insect olfactory system, and electrosensory system of the electric fish. Their relatively well-understood design presents a promising test-case for probing principles of biological learning. The circuits' expansive projections have long been modelled as random, enabling effective general purpose pattern separation. However, electron-microscopy studies have discovered interesting hints of structure in both the fly mushroom body and mouse cerebellum. Recent numerical work suggested that this non-random connectivity enables the circuit to prioritise learning of some, presumably natural, tasks over others. Here, rather than numerical results, we present a robust mathematical link between the observed connectivity patterns and the cerebellar circuit's learning ability. In particular, we extend a simplified kernel regression model of the system and use recent machine learning theory results to relate connectivity to learning. We find that the reported structure in the projection weights shapes the network's inductive bias in intuitive ways: functions are easier to learn if they depend on inputs that are oversampled, or on collections of neurons that tend to connect to the same hidden layer neurons. Our approach is analytically tractable and pleasingly simple, and we hope it continues to serve as a model for understanding the functional implications of other processing motifs in cerebellar-like networks.

2506.12177 2026-03-23 stat.ME q-bio.QM stat.AP

A proxy-based approach for unmeasured confounding in electronic health records research

Haley Colgate Kottler, Amy Cochran

Electronic health records (EHR) are widely used to study clinical decisions, yet unmeasured confounding remains a persistent challenge. Proxy variables offer a potential solution. In EHR data, clinicians already record many such measurements (e.g., vitals), each revealing something about a patient's underlying health. Despite this, proxy-based methods are rarely used in practice. We introduce a new way to use proxies to adjust for unmeasured confounding. Our approach uses a vector of proxies to construct covariates that capture aspects of the unmeasured confounder, which are then included in a regression model. As one implementation, we use factor analysis followed by regression. We compare this approach with existing methods, including proximal causal inference, across a range of realistic settings. In practice, assumptions rarely hold exactly, so we study what happens when models are misspecified and variables are used incorrectly: e.g., a confounder or instrument is treated as a proxy. Finally, we apply the method to EHR data to estimate the effect of hospital admission for older adults presenting to the emergency department with chest pain, a setting where unmeasured confounding is a substantial concern. This work provides a practical way to use proxies and may help bring proxy-based methods into broader use.

2504.09537 2026-03-23 q-bio.QM

Machine Learning - driven insights for predicting the impact of nanoparticles on the functionality of biomolecules, Illustrated by the case of DNA Damage-Inducible Transcript 3 (CHOP) inhibitors

Mariya L. Ivanova, Michael Nicholls, Nicola Russo, Gueorgui Mihaylov, Konstantin Nikolic

Comments 34 pages, 13 figures, 23 tables

This study introduces a pioneering machine learning (ML)-based approach for predicting the impact of nanoparticle (NP) carriers on the functionality of attached small biomolecules. It was hypothesised that NP interactions induce measurable perturbations in the atomic environment of the small biomolecules, which are reliably captured by chemical shifts in 13C and 1H NMR spectroscopy. Ten datasets were generated by combining 13C and 1H NMR spectroscopy data derived from SMILES notations with molecular features provided by PubChem. The resulting datasets were used to train predictive models with traditional ML algorithms (Scikit-learn) and a deep neural network (DNN; PyTorch). The methodology was demonstrated through a quantitative high-throughput screening (qHTS) focused on DNA Damage-Inducible Transcript 3 (CHOP) inhibitors. The optimal ML performance was achieved by the Random Forest Classifier, which was trained on 19,184 samples and tested on 4,000, resulting in 81.1% accuracy, 83.4% precision, 77.7% recall, 80.4% F1-score, 81.1% ROC, and a five-fold cross-validation score of 0.821. Complementing the main study, two computational approaches were developed to enhance CHOP inhibitor prediction. The first identifies the most desirable/undesirable functional groups for CHOP inhibition. The second, a CID_SID ML model, achieved 90.1% accuracy in predicting whether compounds designed for other purposes possess CHOP inhibition potential.
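
The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the figures quoted in the abstract; the function name `f1_score` is our own and is not taken from the paper's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the Random Forest Classifier.
reported_f1 = f1_score(0.834, 0.777)
```

The result agrees with the quoted 80.4% to within rounding of the inputs.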

2504.08637 2026-03-23 physics.bio-ph q-bio.NC

Direct dependencies between neurons explain activity

Christopher W. Lynn

Comments 43 pages, 13 figures

Our understanding of neural computation is founded on the assumption that neurons fire in response to a linear summation of inputs. Yet experiments demonstrate that some neurons are capable of complex functions that require interactions between inputs. Here we show, across multiple brain regions and species, that direct dependencies (without interactions between inputs) explain most of the variability in neuronal activity. Neurons are quantitatively described by models that capture the measured dependence on each input individually, but assume nothing about combinations of inputs. These minimal models, which are equivalent to logistic artificial neurons, predict complex higher-order dependencies and recover known features of synaptic connectivity. The inferred neural network is sparse, indicating a highly redundant neural code that is robust to perturbations. These results suggest that, despite intricate biophysical details, most neurons are described by simple artificial models.

2503.03773 2026-03-23 q-bio.GN cs.LG

A Phylogenetic Approach to Genomic Language Modeling

Carlos Albors, Jianan Canal Li, Gonzalo Benegas, Chengzhong Ye, Yun S. Song

Comments 15 pages, 7 figures

Genomic language models (gLMs) have shown mostly modest success in identifying evolutionarily constrained elements in mammalian genomes. To address this issue, we introduce a novel framework for training gLMs that explicitly models nucleotide evolution on phylogenetic trees using multispecies whole-genome alignments. Our approach integrates an alignment into the loss function during training but does not require it for making predictions, thereby enhancing the model's applicability. We applied this framework to train PhyloGPN, a model that excels at predicting functionally disruptive variants from a single sequence alone and demonstrates strong transfer learning capabilities.

2501.14044 2026-03-23 q-bio.OT

Machine learning model leveraging SMILES-derived NMR spectroscopy data to predict dopamine D1 receptor antagonists: a prospective framework for forecasting the impact of engineered nanoparticles on the functionalities of small biomolecules

Mariya L Ivanova, Michael Nichols, Nicola Russo, Gueorgui Mihaylov, Konstantin Nikolic

Comments 27 pages, 8 figures, 2 tables

The article proposes a conceptual approach for evaluating the impact of engineered nanoparticles (NPs) on the functionality of small biomolecules. The developed machine learning (ML) model is based on in-silico 13C NMR chemical shifts derived from the SMILES notations of small biomolecules. The rationale behind this approach is that 13C NMR provides information about the atomic environment of the carbon atoms. Thus, decomposing the small biomolecules into their fundamental 13C NMR spectral data, and performing classification based on the count and position of chemical peaks, establishes a baseline for evaluating the impact of NPs on the functionality of small biomolecules, even if the ML model is not based on nano data. The approach not only mitigates the scarcity of nano-bio data but also holds potential for building an NP portfolio by utilising data collected from various in vitro, in situ, in vivo, and organ-on-a-chip environments across multiple timeframes. Such a framework enables predictive modeling based on these multi-environmental datasets, facilitating a deeper understanding of NP behaviour. The methodology was demonstrated using data from a bioassay focused on human dopamine D1 receptor antagonists provided by PubChem. The model was trained on 26,766 samples and tested on 5,466 samples. The Support Vector classifier achieved an accuracy of 70.8%, precision of 74.3%, recall of 63.6%, F1-score of 68.5%, and ROC of 70.8%, with an Area Under the Curve (AUC) of 76% and a Matthews Correlation Coefficient (MCC) of 0.4204. A secondary, non-NP-related ML model was developed to complement the study case. It uses PubChem compound and substance identifiers (CIDs and SIDs) to predict whether pre-designed small biomolecules have the potential to be human dopamine D1 receptor antagonists.