arXivDaily: daily arXiv digest, updated Monday to Friday
2606.06434 2026-06-05 q-bio.GN cs.PF

rsx: A high-performance streaming toolkit for RAD-seq sex determination

Rohit Goswami, Ruhila Goswami

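AI summary: To address the memory and efficiency problems that large datasets pose for RAD-seq sex determination, the authors present rsx, a Rust toolkit that combines 2-bit DNA keys, parallel reading, memory mapping, external sorting, bitset group counting, and streamed Gram-matrix PCA, and adds conjugate Beta-Binomial Bayes factors. It achieves an 8.38-fold geometric-mean speedup while keeping results consistent.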

Comments
37 pages, 12 figures. Software: https://github.com/HaoZeke/rsx-rs . Reproducibility archive: https://doi.org/10.5281/zenodo.20531539
English abstract

Restriction site-associated DNA sequencing (RAD-seq) is widely used to discover sex-linked markers in non-model organisms, but large studies produce marker tables with millions of RAD tags. RADSex provides the reference workflow for building marker-by-individual depth tables and testing sex-biased marker distributions, but its depth, merge, and related table-building commands grow memory-hungry, and its standard output reports frequentist calls with no posterior evidence and no direct Python or C integration. We present rsx, a Rust implementation of the complete RADSex command set that preserves marker-table semantics and command-line compatibility. rsx combines 2-bit DNA keys, parallel ingestion, memory-mapped marker tables, external sorting, bitset group counts, and streamed Gram-matrix PCA so that memory stays bounded by the number of individuals or by explicit buffers. It adds conjugate Beta-Binomial Bayes factors and posterior probabilities under XY and ZW hypotheses, returning strict, posterior-supported, and Bayes-factor-only evidence grades. A portable, libm-independent minimax approximation of the error function keeps the chi-squared tail reproducible across platforms without changing the underlying Yates test. On four real RAD-seq datasets comprising 41.9 billion bases and 29 million markers, rsx reproduced published RADSex v1.2.0 calls, achieved an 8.38-fold geometric-mean speedup across 56 paired timings (2.77-fold for FASTQ processing), and recovered every Bonferroni-significant positive-control marker. In Danio albolineatus, treated as null in the source publication, the posterior layer surfaced 30 W-linked marker hypotheses; in Notothenia rossii it withheld 400 Bayes-factor-only rows compatible with a low-prevalence null. Python bindings, a C API, and a reproducibility archive provide the workflows used for all reported numbers. rsx is released under GPL-3.0-or-later.
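
Two of the low-level ingredients named above can be illustrated compactly: packing bases into 2-bit keys, and the Yates-corrected chi-squared test on a marker-presence-by-sex table. This is a minimal sketch, not rsx's code: it assumes an A/C/G/T to 0-3 code (rsx's actual bit layout is not given in the abstract), the function names are hypothetical, and it calls Python's `math.erfc` instead of the paper's libm-independent minimax approximation. For one degree of freedom the chi-squared tail is $P(X > x) = \operatorname{erfc}(\sqrt{x/2})$, which is why an error-function approximation is all the tail computation needs.

```python
import math

# Assumed 2-bit code; rsx's real layout may differ.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into an int, 2 bits per base, first base in the highest bits."""
    key = 0
    for base in seq.upper():
        key = (key << 2) | CODE[base]  # raises KeyError for N or other non-ACGT symbols
    return key


def decode_2bit(key: int, length: int) -> str:
    """Invert encode_2bit for a sequence of known length."""
    return "".join(
        BASES[(key >> (2 * (length - 1 - i))) & 0b11] for i in range(length)
    )


def yates_chi2_p(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected chi-squared p-value (1 df) for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = marker present / absent, columns = males / females.
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 1.0  # a zero margin makes the test undefined; report no evidence
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (r1 * r2 * c1 * c2)
    # Survival function of chi-squared with 1 df via the complementary error function.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    print(encode_2bit("ACGT"))
    print(yates_chi2_p(10, 0, 0, 10))  # marker in all 10 males and no females
```

`encode_2bit("ACGT")` returns 27 (binary 00 01 10 11). At 2 bits per base, a 32-base tag fits in a single 64-bit word, which is the usual motivation for such keys.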

2606.06424 2026-06-05 q-bio.NC

Intrinsic Computational Functionalism: From Observer-Relative Maps to Observer-Independent Structures

Shuqin Ma, Ryota Kanai

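AI summary: The paper proposes intrinsic computational functionalism. Using two criteria, system-intrinsic instantiation and causal-dynamical organisation, it argues that computational properties can exist independently of observers, thereby answering the observer-relativity challenge raised by anti-computational arguments.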

Comments
23 pages, no figures. Shuqin Ma and Ryota Kanai contributed equally (joint first authors)
English abstract

Anti-computational arguments show that externally imposed computational interpretations cannot ground consciousness, but they do not establish that all computational organisations are observer-relative. We develop intrinsic computational functionalism: the view that, if consciousness is computationally constituted, it depends on physically realised computational structures the system has in virtue of itself rather than on labels imposed by an external interpreter. Two criteria operationalise this view. (C1) System-intrinsic instantiation: the relevant property must be specifiable without an observer's labelling, and invariant under structure-preserving relabellings of the system's variables. (C2) Causal-dynamical organisation under intervention: the property must be grounded in a state-space structure whose variables mutually constrain one another, and whose organisation is exhibited in counterfactual response under intervention. Together these criteria specify what any candidate computational account must satisfy to remain observer-independent, without selecting which intrinsic structures bear on experience. The argumentative core is a three-tier decomposition of identification work: interpreter-relative label selection (tier i), theoretically constrained partition selection (tier ii), and dynamics-internal grain selection (tier iii). We argue that any computational property capable of avoiding the observer-relativity objection must be identified, if at all, through tier (iii) dynamics-internal grain selection, conditional on empirically disciplined tier (ii) choices. Syntax-is-not-semantics arguments, mapmaker arguments, and the observer-relativity component of biological-naturalist objections succeed against views that locate the consciousness-relevant property at tier (i); once the tiers are distinguished, intrinsic computational functionalism survives.

2606.06345 2026-06-05 cs.AI cs.LG q-bio.NC

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

Yohann Benchetrit, Marlène Careil, Simon Dahan, Hubert Banville, Stéphane d'Ascoli, Jean-Rémi King

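AI summary: To address the scarcity of labeled data in brain decoding, the authors use TRIBE v2, a pretrained model of fMRI responses, to generate synthetic data that augments small datasets. This yields up to a 68% improvement in Top-10 image-retrieval accuracy on two datasets, and decoders trained only on synthetic data can also perform above chance in a zero-shot setting.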

English abstract

Brain decoding is limited by the availability of labeled neural data, and remains challenging in low-data regimes. To address this issue, we investigate whether and when brain decoding can be boosted by augmenting small fMRI datasets with synthetic data generated by a pretrained model of fMRI responses to stimuli. We use TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language. For each dataset, we evaluate systematic grids that show how the performance of image decoders varies with the amount of synthetic data used for training. Our results, based on two datasets (the 7T fMRI Natural Scenes Dataset and 3T fMRI BOLD5000), show up to 68% improvement in Top-10 image-retrieval accuracy compared to decoders trained only on real data. Importantly, the proportion of augmented data required to reach a given image decoding performance needs to be adjusted depending on the data source. Surprisingly, image decoders trained exclusively on synthetic fMRI can perform above chance in some settings, suggesting that TRIBE v2 can support zero-shot brain-to-image decoding. Together, these results show how large-scale models of the fMRI responses to sight, sound and language may provide a foundation to improve the data efficiency for image decoding.
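
Top-10 image-retrieval accuracy is the headline metric here. A common way to compute it, assumed in this sketch because the abstract does not give the exact protocol, is to rank every candidate image embedding by cosine similarity to each decoded embedding and count how often the true image lands in the top k. With N candidates and random ranking, the expected accuracy is k/N, the natural chance level. The function names and toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_retrieval_accuracy(
    decoded: List[Sequence[float]], gallery: List[Sequence[float]], k: int = 10
) -> float:
    """Fraction of queries whose true image (same index in `gallery`) ranks in the top k."""
    hits = 0
    for i, query in enumerate(decoded):
        sims = [cosine(query, g) for g in gallery]
        # Rank candidate images by decreasing similarity to the decoded embedding.
        ranked = sorted(range(len(gallery)), key=lambda j: sims[j], reverse=True)
        hits += i in ranked[:k]
    return hits / len(decoded)
```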

2606.06290 2026-06-05 q-bio.NC cond-mat.stat-mech

Early psychosis shows deviations in scaling behaviour within a critical regime

Irem Topal, Paola Moreno Ancalmo, Guillermo Montana Valverde, Philipp Homan, Wolfram Hinzen

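AI summary: Combining a phenomenological renormalization group with spectral analysis, this study finds systematic shifts in the scaling exponents of resting-state fMRI in early psychosis, indicating that collective dynamics are reorganized while critical-like organization is preserved.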

Comments
26 pages, 10 figures
English abstract

Accumulating evidence suggests that large-scale brain activity exhibits scale-invariant dynamics consistent with operation in a near-critical regime. Such dynamics have been associated with long-range correlations, efficient information processing, and the emergence of collective organization. While altered criticality-related measures have been reported in psychiatric disorders, previous findings remain fragmented across observables and modalities, making it unclear whether different scaling measures capture a common alteration of large-scale brain dynamics. Here, we investigated scaling properties in resting-state fMRI data from individuals with early psychosis and healthy controls. We combined a phenomenological renormalization group (PRG) framework with power spectral density (PSD) and detrended fluctuation analysis (DFA) to characterize collective dynamics across scales. In healthy controls, resting-state activity exhibited non-trivial scaling behavior consistent with critical-like organization. Early psychosis participants showed the same overall phenomenology of scale-invariant organization, but with systematic shifts in scaling exponents across multiple observables. These findings indicate that early psychosis is not characterized by a simple loss of critical-like dynamics, but rather by a reorganization of collective dynamics within a preserved scaling regime. More broadly, our results suggest that combining coarse-graining approaches with temporal scaling analyses provides a principled framework for studying large-scale brain dynamics in psychiatric disorders.
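
Detrended fluctuation analysis (DFA), one of the three scaling probes named above, estimates an exponent α from how the root-mean-square fluctuation F(s) of the integrated, locally detrended signal grows with window size s. This is a minimal first-order DFA sketch in pure Python, not the study's pipeline: the non-overlapping windows and the choice of scales are illustrative assumptions. For uncorrelated noise α is about 0.5, and values approaching 1 indicate 1/f-like long-range correlations.

```python
import math
import random
from typing import List, Sequence


def _detrended_ms(segment: Sequence[float]) -> float:
    """Mean squared residual after a least-squares straight-line fit to `segment`."""
    n = len(segment)
    mx = (n - 1) / 2
    my = sum(segment) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in enumerate(segment)) / n


def dfa_alpha(signal: Sequence[float], scales: Sequence[int]) -> float:
    """First-order DFA exponent: slope of log F(s) against log s."""
    mean = sum(signal) / len(signal)
    profile: List[float] = []
    total = 0.0
    for v in signal:  # integrate the mean-removed signal into a "profile"
        total += v - mean
        profile.append(total)
    log_s, log_f = [], []
    for s in scales:
        n_win = len(profile) // s  # non-overlapping windows; the tail is dropped
        f2 = sum(_detrended_ms(profile[i * s:(i + 1) * s]) for i in range(n_win)) / n_win
        log_s.append(math.log(s))
        log_f.append(0.5 * math.log(f2))  # log F(s) = 0.5 * log F(s)^2
    # Least-squares slope of log F(s) on log s.
    ms, mf = sum(log_s) / len(log_s), sum(log_f) / len(log_f)
    num = sum((a - ms) * (b - mf) for a, b in zip(log_s, log_f))
    den = sum((a - ms) ** 2 for a in log_s)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    white = [random.gauss(0.0, 1.0) for _ in range(4096)]
    print(dfa_alpha(white, [16, 32, 64, 128, 256]))  # should be near 0.5 for white noise
```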

2606.06117 2026-06-05 q-bio.QM cs.LG math.AT q-bio.GN

$p$-adic Bi-Filtrations for Topological Machine Learning on Genomic Sequences

Tirtharaj Dash, Gunja Sachdeva

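AI summary: The authors propose pVR, which combines $p$-adic numbers with topological data analysis and extracts topological features of genomic sequences from a bi-filtered Vietoris-Rips complex. It outperforms established alignment-free baselines on three of six low-sample datasets.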

Comments
12 pages, 5 figures, 8 tables
English abstract

We introduce pVR, a topological machine learning framework for alignment-free genomic sequence classification that combines $p$-adic numbers with topological data analysis. Each DNA sequence is encoded along two complementary axes: a $p$-adic distance on $k$-mer prefixes, which captures hierarchical positional structure, and a compositional $L_1$ distance on $k$-mer frequencies, which captures local sequence content. The two distances jointly parameterise a bi-filtered Vietoris--Rips complex, and per-sequence topological summaries from this bi-filtration serve as features for standard machine learning classifiers. We establish theoretical guarantees for the construction: stability under metric perturbations and invariance to the choice of prime, alongside a result that explains why a single $p$-adic axis is topologically uninformative and why the bi-filtration recovers nontrivial homology. On twelve genomic benchmarks ($28$ to $500$ sequences, $3$ to $7$ classes), pVR outperforms four established alignment-free baselines on three of six low-sample datasets, with gains of up to $21$ percentage points; it underperforms only on a SARS-CoV-2 variant benchmark whose point-mutation divergence violates the hierarchical assumption, and all methods saturate in the large-sample regime. pVR also outperforms zero-shot frozen embeddings from the 500M-parameter Nucleotide Transformer v2 by $6.7$ to $11.4$ percentage points on three low-sample benchmarks. The pVR codebase is publicly available at https://github.com/MAHI-Group/pVR.
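
The two axes of the bi-filtration can be illustrated on a pair of sequences. This is a minimal sketch under stated assumptions, not pVR's encoding. Bases are read as digits 0-3, and the length-$k$ prefix becomes the integer $X = \sum_i a_i p^i$ with the first base as the lowest-order digit. For a prime $p > 3$ every base is a valid base-$p$ digit, so $|X - Y|_p = p^{-j}$, where $j$ is the first position at which the prefixes differ; sequences that share a longer prefix are therefore closer. The second function is a plain $L_1$ distance between normalised $k$-mer frequency vectors. All names are hypothetical.

```python
from collections import Counter
from typing import Dict

DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed base-to-digit map


def prefix_value(seq: str, k: int, p: int) -> int:
    """Read the first k bases as base-p digits, first base = lowest-order digit."""
    return sum(DIGIT[b] * p ** i for i, b in enumerate(seq[:k].upper()))


def p_adic_norm(n: int, p: int) -> float:
    """|n|_p = p**(-v_p(n)), where v_p(n) counts the factors of p in n; |0|_p = 0."""
    if n == 0:
        return 0.0
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return float(p) ** (-v)


def p_adic_prefix_distance(x: str, y: str, k: int, p: int = 5) -> float:
    """p-adic distance between the length-k prefixes of two sequences (p: a prime > 3)."""
    if p <= 3:
        raise ValueError("p must exceed 3 so that A, C, G, T are base-p digits")
    if min(len(x), len(y)) < k:
        raise ValueError("both sequences need at least k bases")
    return p_adic_norm(prefix_value(x, k, p) - prefix_value(y, k, p), p)


def kmer_l1_distance(x: str, y: str, k: int) -> float:
    """L1 distance between normalised k-mer frequency profiles."""
    def freqs(s: str) -> Dict[str, float]:
        counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        total = sum(counts.values())
        return {m: c / total for m, c in counts.items()}

    fx, fy = freqs(x.upper()), freqs(y.upper())
    return sum(abs(fx.get(m, 0.0) - fy.get(m, 0.0)) for m in fx.keys() | fy.keys())
```

For example, "ACGTAC" and "ACGTTT" first differ at position 4, so with $p = 5$ their prefix distance is $5^{-4}$, whereas sequences differing at the first base are at distance 1. This ultrametric, hierarchical behaviour is what the abstract's "hierarchical positional structure" refers to, and it also suggests why point mutations near the start of a sequence break the hierarchical assumption.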

2606.05980 2026-06-05 q-bio.QM

On the Promises and Limits of Multi-omics Integration for Deconvolution: The HADACA3 Benchmark

Hugo Barbot, Elise Amblard, Nicolas Homberg, Lucie Lamothe, Morgane Térézol, Hadaca Consortium, Mira Ayadi, Aurélia Baurès, Yasmina Kermezli, Carl Herrmann, Sebastien Dejean, Lionel Spinelli, David Causeur, Florent Chuffart, Anaïs Baudot, Yuna Blum, Magali Richard

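AI summary: Through the community-driven HADACA3 benchmark, this paper evaluates multi-omics integration (DNA methylation and RNA) for cell-type deconvolution. DNA methylation alone gives the highest median performance, while multi-omics integration can raise best-case performance in specific settings.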

English abstract

Understanding the cellular composition of complex tissues, such as tumors, is a key challenge in biology and medicine. A common approach, known as deconvolution, aims to estimate the cellular composition from bulk molecular measurements. With the growing availability of multiple types of molecular data, it is often assumed that combining data sources should improve deconvolution performance. Here, we present HADACA3, a community-driven benchmark designed to evaluate this assumption. We conducted a four-day collaborative competition followed by a large-scale computational benchmark, testing more than 250,000 analysis pipelines across nine datasets with matched DNA methylation (DNAm) and RNA profiles, representing a wide range of biological and experimental conditions. Our framework jointly evaluates the impact of preprocessing, feature selection, modeling, and integration strategies. We find that DNAm alone achieves the highest median performance across datasets, making it the most stable and reliable single-modality approach. However, multi-omics integration strategies can regularly achieve higher top performance in specific datasets and pipeline configurations. Among the tested strategies, late integration based on error-weighted averaging provides a strong and reliable baseline, while non-linear early integration methods, such as optimal transport, show promising results on real biological datasets. Overall, our results show that multi-omics integration does not systematically improve average performance over DNAm alone, but can improve best-case performance in specific settings. This highlights a trade-off between robustness and peak performance, and emphasizes the importance of aligning integration strategies with the statistical properties of the data. All data, code, and evaluation tools are publicly available to support reproducible research and future method development.
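
Late integration by error-weighted averaging can be sketched as combining the cell-type proportion vectors estimated separately from each modality, with weights inversely proportional to each modality's error. This is a minimal illustration under assumptions: the abstract specifies neither the weighting function nor the error measure, and the function name, modality keys, and numbers are hypothetical.

```python
from typing import Dict, List


def error_weighted_average(
    estimates: Dict[str, List[float]], errors: Dict[str, float]
) -> List[float]:
    """Combine per-modality cell-type proportion vectors with weights proportional to 1 / error.

    `errors` might be a per-modality validation error such as RMSE (an assumption).
    """
    if any(errors[m] <= 0 for m in estimates):
        raise ValueError("errors must be positive")
    weights = {m: 1.0 / errors[m] for m in estimates}
    total_w = sum(weights.values())
    n_types = len(next(iter(estimates.values())))
    combined = [
        sum(weights[m] * estimates[m][j] for m in estimates) / total_w
        for j in range(n_types)
    ]
    # Clip negatives and renormalise so the proportions lie on the simplex.
    combined = [max(0.0, c) for c in combined]
    s = sum(combined)
    return [c / s for c in combined]


if __name__ == "__main__":
    print(error_weighted_average(
        {"dnam": [0.6, 0.4], "rna": [0.2, 0.8]},  # hypothetical per-modality estimates
        {"dnam": 0.1, "rna": 0.3},                # hypothetical errors: DNAm weighted 3x more
    ))
```

Because the weights only rescale per-modality estimates, this kind of late integration cannot do worse than the worst single modality, which is one plausible reason it makes a robust baseline.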

2606.05918 2026-06-05 q-bio.QM

Federated SPARQL querying for genomic variant functional annotation

Alexandrina Bodrug-Schepers, Romain Bourcier, Richard Redon, Alban Gaignard

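AI summary: The authors use federated SPARQL queries for functional annotation of genomic variants, avoiding duplication of public data while keeping genomic data on site and aligned with the FAIR principles.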

Comments
European Semantic Web Conference 2026, European Semantic Web Conference 2026 Organising Committee, May 2026, Dubrovnik, Croatia
English abstract

Sensitive health data should preferentially be analysed on site. In typical bioinformatics workflows, public databases are duplicated and used by specialised tools to enrich the local datasets. In the case of genomic variation data, this process is called variant annotation. In this session we demonstrate variant annotation using federated SPARQL queries. We first outline how clinico-genomic data can be modelled as a knowledge graph (KG), leveraging state-of-the-art biomedical ontologies. We then perform variant annotation by querying UniProtKB, a massive curated KG for genes and proteins. Our approach avoids public data duplication while maintaining genomic data on site and aligning it with FAIR principles. Our use-case is based on the ICAN project, a research program aimed at studying the physiopathology of cerebral berry aneurysms.

2606.05870 2026-06-05 q-bio.NC cs.LG q-bio.QM

Cross-scale spatially-aware generative modeling of transcriptomic programs underlying neurodegenerative brain organization

Krishnakumar Vaithianathan

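AI summary: The author proposes a cross-scale, spatially-aware generative framework that pairs a variational generative architecture with graph-based spatial smoothness regularization to learn latent biological programs linking regional gene expression to cortical degeneration, achieving strong prediction (explained variance 0.8604; spatial correlation r = 0.9439).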

Comments
26 pages, 5 figures
English abstract

Neurodegenerative disorders such as Alzheimer's disease exhibit highly organized patterns of regional brain vulnerability, yet the biological mechanisms underlying this spatial selectivity remain incompletely understood. Existing imaging-transcriptomic studies have largely relied on correlation-based analyses between gene expression and neuroimaging phenotypes, limiting their ability to model how molecular organization gives rise to neurodegeneration. Here, we introduce a cross-scale spatially-aware generative framework for modeling transcriptomic programs underlying cortical neurodegeneration. Regional transcriptomic profiles were derived from the Allen Human Brain Atlas using 910 landmark genes across 68 cortical regions. Neurodegenerative vulnerability maps were constructed from ADNI FreeSurfer cortical thickness measurements by computing regional cortical thinning differences between cognitively normal controls (NC = 926) and Alzheimer's disease subjects (AD = 426). A variational generative architecture was used to learn latent biological programs linking regional gene-expression organization to cortical degeneration while incorporating graph-based spatial smoothness regularization to preserve cortical organization. The proposed framework achieved strong prediction of regional neurodegenerative vulnerability, yielding an explained variance of 0.8604 and a significant spatial correlation between predicted and observed cortical degeneration profiles (r = 0.9439, p < 0.001). The learned latent representations revealed structured transcriptomic organization associated with distributed disease susceptibility. These findings demonstrate that biologically constrained generative modeling can bridge microscale molecular organization with macroscale neurodegeneration, providing a foundation for spatially-aware generative neurobiology and computational neuroscience.
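
The graph-based spatial smoothness regularisation is described only in general terms. One standard form, assumed here, penalises differences between the latent codes of adjacent cortical regions, $\sum_{(i,j)\in E} w_{ij}\lVert z_i - z_j\rVert^2$, where each undirected edge is counted once; this equals $\operatorname{tr}(Z^\top L Z)$ for the graph Laplacian $L = D - W$. The sketch computes both forms so the identity can be checked. The adjacency, weights, and latent codes are hypothetical, and the paper's exact penalty and its weight in the loss are not given in the abstract.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (region i, region j, weight w_ij), each undirected edge once


def smoothness_penalty(z: Sequence[Sequence[float]], edges: List[Edge]) -> float:
    """Sum over edges of w_ij * ||z_i - z_j||^2."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(z[i], z[j])) for i, j, w in edges)


def laplacian_quadratic_form(z: Sequence[Sequence[float]], edges: List[Edge]) -> float:
    """The same quantity computed as trace(Z^T L Z) with L = D - W."""
    n = len(z)
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    total = 0.0
    for c in range(len(z[0])):  # one quadratic form per latent dimension
        col = [z[r][c] for r in range(n)]
        total += sum(col[r] * lap[r][s] * col[s] for r in range(n) for s in range(n))
    return total
```

In a variational model this term would typically be added to the reconstruction and KL terms with a tunable weight, pulling neighbouring regions toward similar latent programs.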

2606.05541 2026-06-05 physics.chem-ph cond-mat.soft q-bio.BM

Methods for Inferring Interaction Potentials from Cross-Linking Mass Spectrometry Data

Börries von Seggern, Mohsen Sadeghi

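AI summary: The authors propose a framework for parameterizing interaction potentials from cross-linking mass spectrometry data. By connecting the task to the inverse Henderson problem and adapting and extending its algorithms, they recover potential parameters efficiently and accurately in homogeneous fluids and in a three-phase system.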

Comments
19 pages, 10 figures, 5 tables
English abstract

Cross-linking mass spectrometry (XL-MS) has emerged as a powerful quantitative technique for probing intra-protein structural information as well as protein-protein interactions at an unprecedented scale. XL-MS data yield information on the pairwise spatial proximity of proteins through inter-molecular linkers. However, systematic methods for adapting such data for coarse-grained interacting particle models remain limited. Existing work focuses predominantly on directly fitting radial distribution functions (RDFs), whereas many observables, e.g. coordination numbers, which are functionals of the RDF, cannot be uniquely inverted. In this work, we develop a framework for parameterizing interaction potentials from such observables in potentially phase-separated mixtures, as encountered in XL-MS results. We establish a connection between this problem and the inverse Henderson problem and adapt algorithms such as Iterative Boltzmann Inversion and Iterative Monte Carlo to its numerical solution. We derive exact and low-density limit gradient approximations and propose two new algorithms based on an adaptation of the predictor-corrector framework. In total, we evaluate several optimization algorithms on biologically realistic ten-component test systems. We demonstrate that for homogeneous fluids, all methods achieve exceptional efficiency and accuracy. Critically, we further demonstrate successful parameterization in a challenging three-phase system. Here, three algorithms, namely Adam and gradient descent employing the low-density derivative as well as Newton's method with the exact gradient, reliably recover the correct parameters. These results establish a clear pathway from XL-MS experiments to coarse-grained protein models for systems where phase separation governs biological function, potentially enabling new investigations of biomolecular condensates and protein aggregation.
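
Iterative Boltzmann Inversion, one of the algorithms adapted here, updates a tabulated pair potential by comparing the current RDF with the target: $V_{n+1}(r) = V_n(r) + k_BT \ln\big(g_n(r)/g^{\text{target}}(r)\big)$, so that regions with too many neighbours become more repulsive. The sketch shows that textbook single-component update together with the coordination number $N(R) = 4\pi\rho\int_0^R r^2 g(r)\,dr$, the kind of RDF functional the abstract says cannot be uniquely inverted. It is not the authors' multicomponent or predictor-corrector scheme; the grid, density, and reduced units ($k_BT = 1$) are assumptions.

```python
import math
from typing import List


def ibi_update(
    v: List[float], g_current: List[float], g_target: List[float],
    kT: float = 1.0, eps: float = 1e-12,
) -> List[float]:
    """One Iterative Boltzmann Inversion step: V <- V + kT * ln(g_current / g_target)."""
    new_v = []
    for vi, gc, gt in zip(v, g_current, g_target):
        if gc > eps and gt > eps:
            new_v.append(vi + kT * math.log(gc / gt))
        else:
            new_v.append(vi)  # leave the potential unchanged where either RDF vanishes
    return new_v


def coordination_number(r: List[float], g: List[float], rho: float, r_cut: float) -> float:
    """N(R) = 4*pi*rho * integral_0^R r^2 g(r) dr, trapezoidal rule on the given grid."""
    total = 0.0
    for k in range(len(r) - 1):
        if r[k + 1] > r_cut:
            break
        f0, f1 = r[k] ** 2 * g[k], r[k + 1] ** 2 * g[k + 1]
        total += 0.5 * (f0 + f1) * (r[k + 1] - r[k])
    return 4 * math.pi * rho * total
```

The coordination number collapses a whole $g(r)$ curve into a single integral, so many different potentials reproduce the same $N(R)$. This is why fitting such observables needs gradient-based or Monte Carlo schemes rather than the direct pointwise inversion that IBI performs on a full RDF.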

2606.05474 2026-06-05 q-bio.BM cs.LG

AlloGen: Conformation-Selective Binder Generation with Differential State Scoring

Hanqun Cao, Zachary Quinn, Aastha Pal, Sumi Kimura, Jingjie Zhang, Pheng Ann Heng, Pranam Chatterjee

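AI summary: The authors propose AlloGen, which pairs backbone generation with a learned conformation-selectivity scorer $Q_θ$ to design binders that selectively recognize specific conformational states of a protein.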

English abstract

Protein binder design has largely optimized for affinity alone, leaving conformational selectivity unaddressed: for allosteric targets such as kinases, nuclear receptors, and GPCRs, a binder that engages both active and inactive states provides no functional specificity regardless of how tightly it binds. We introduce AlloGen, a modular framework that decouples backbone generation from a learned state-selectivity scorer $Q_θ$, an SE(3)-invariant interface graph transformer trained via a two-phase curriculum that first learns interface geometry before imposing conformational discrimination. Because $Q_θ$ is fully differentiable and generator-agnostic, it integrates with any backbone generator as a passive reranker or an active gradient-based guide without retraining. Across a diverse benchmark of proteins spanning multiple families and conformational mechanisms, AlloGen consistently identifies binders that preferentially recognize desired structural states while rejecting alternative conformations. Experimental validation on calmodulin further demonstrates that these computational selectivity signals translate to physical molecules, yielding de novo peptides that bind the desired holo conformation while exhibiting no detectable binding to the apo state. Together, these results establish conformational selectivity as a learnable property and provide a general framework for state-selective protein binder design.
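
The passive-reranker mode can be illustrated with a differential score that ranks candidates by how much better they score against the desired state than against the alternative one. In this minimal sketch, `score` is a hypothetical stand-in for a learned scorer such as $Q_θ$, and taking the margin (desired minus alternative) is an assumed way of combining the two states. AlloGen's actual scoring may combine them differently.

```python
from typing import Callable, List, Tuple


def rerank_by_selectivity(
    candidates: List[str],
    score: Callable[[str, str], float],
    desired: str,
    alternative: str,
) -> List[Tuple[str, float]]:
    """Sort candidates by score(c, desired) - score(c, alternative), highest margin first."""
    margins = [(c, score(c, desired) - score(c, alternative)) for c in candidates]
    return sorted(margins, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores: b1 binds holo best but also binds apo; b2 is selective.
    toy = {("b1", "holo"): 0.9, ("b1", "apo"): 0.8, ("b2", "holo"): 0.7, ("b2", "apo"): 0.1}
    print(rerank_by_selectivity(["b1", "b2"], lambda c, s: toy[(c, s)], "holo", "apo"))
```

In the toy data, b1 has the higher holo score but engages apo almost as well, so ranking by the margin puts b2 first. This mirrors the distinction the abstract draws between affinity and functional specificity.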

2606.05351 2026-06-05 nlin.CD physics.comp-ph physics.soc-ph q-bio.PE

Tricriticality and chaos in a generalized Allee-logistic map

Marcelo A. Pires, José S. Andrade, Hans J. Herrmann

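AI summary: The authors propose a generalized Allee-logistic map, study its tricritical behavior between continuous and discontinuous extinction transitions, and find that the Allee effect disfavors the onset of chaos.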

Comments
8 pages, 7 figures and 1 table
English abstract

We present a novel nonlinear dynamical model, the generalized Allee-logistic (GAL) map given by $x_{t+1} = r x_t (1 - x_t) G(x_t)$ where $G(x_t) = m (x_t - h) + 1 - m$ incorporates the Allee effect with magnitude $m$ and threshold $h$. The case $m = 0$ yields the logistic map with a continuous transition to extinction. Conversely, $m = 1$ recovers a previously studied model that undergoes only a discontinuous extinction-to-active transition. Between these extremes, the GAL map exhibits nontrivial phenomena, including tricriticality with a closed-form expression for the tricritical point and a universal crossover function. Under a small external input, we verify Widom-like relations. We also note that the Allee effect disfavors the onset of chaos. Our work establishes additional bridges between analytically tractable chaotic maps, nonequilibrium tricriticality, and Allee effects.
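
The map is simple enough to iterate directly. This is a minimal sketch of the stated recursion, with illustrative parameter values in the usage lines. With $m = 0$ it reduces to the logistic map, whose nontrivial fixed point $x^* = 1 - 1/r$ is stable for $1 < r < 3$. The sketch does not clip negative values, which can arise for $m > 0$ whenever $x_t < h - (1 - m)/m$ (so that $G(x_t) < 0$). The abstract does not say how such values are treated.

```python
from typing import List


def gal_step(x: float, r: float, m: float, h: float) -> float:
    """x_{t+1} = r x_t (1 - x_t) G(x_t) with G(x) = m (x - h) + 1 - m."""
    return r * x * (1 - x) * (m * (x - h) + 1 - m)


def iterate_gal(x0: float, r: float, m: float, h: float, n_steps: int) -> List[float]:
    """Return the trajectory x_0, ..., x_n (no clipping of negative values)."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(gal_step(traj[-1], r, m, h))
    return traj


if __name__ == "__main__":
    # m = 0 is the plain logistic map; for r = 2.5 it settles at x* = 1 - 1/r = 0.6.
    print(iterate_gal(0.2, 2.5, 0.0, 0.3, 200)[-1])
```

Sweeping `r` for several values of `m` and recording where the long-run value drops to zero is the most direct way to see the change from a continuous to a discontinuous extinction transition that the abstract describes.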

2606.05327 2026-06-05 cs.LG q-bio.QM stat.ML

Multimarginal flow matching with optimal transport potentials

Raghav Kansal, David Crair, Nghia Nguyen, Scott Pope, Bradley Parry

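AI summary: The authors use dynamic optimal-transport potentials to steer flow matching toward intermediate marginal distributions, giving an efficient, simulation-free multimarginal flow-matching method with state-of-the-art performance on single-cell RNA sequencing, oceanographic, and meteorological datasets.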

Comments
9 pages, 3 figures, 4 tables, and a 27 page appendix. Accepted to the Forty-Third International Conference on Machine Learning
English abstract

Flow matching (FM) has emerged as a powerful framework for learning dynamic transport maps between two empirical distributions. However, less explored is the setting with intermediate observed marginals that can help constrain the flows between the endpoints. This "multimarginal" regime is central to modeling temporal evolution in dynamical systems in many scientific domains that can sample sequential distributions. We tackle this problem with a novel approach that leverages the connection between FM and dynamic optimal transport (OT), softly steering the flow towards the intermediate marginals through potential terms in the dynamic OT action. By extending the conditional FM learning target to incorporate these potentials, we derive an efficient, simulation-free algorithm for multimarginal FM that offers considerable flexibility in the spatiotemporal dynamics of the learned flows. We demonstrate state-of-the-art performance and training efficiency of OT-potential FM (OTP-FM) on diverse single-cell RNA sequencing, oceanographic, and meteorological datasets. Our code is available at https://github.com/Bexorg-Inc/OTP-FM.
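
OTP-FM extends the conditional flow-matching (CFM) target. For reference, the standard two-marginal CFM objective with a straight-line conditional path samples $t \sim U(0,1)$, forms $x_t = (1-t)x_0 + t x_1$, and regresses the network $v_\theta(t, x_t)$ onto $u_t = x_1 - x_0$. This minimal pure-Python sketch covers only that baseline target. The OT-potential terms that steer the flow toward intermediate marginals are not reproduced here, and the index-by-index pairing of samples is a simplifying assumption.

```python
import random
from typing import Callable, List, Tuple

Vec = List[float]


def cfm_training_pair(x0: Vec, x1: Vec, t: float) -> Tuple[Vec, Vec]:
    """Point x_t on the straight path from x0 to x1 and its target velocity u_t = x1 - x0."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut


def cfm_batch(sources: List[Vec], targets: List[Vec], rng: random.Random):
    """Pair samples index by index (minibatch OT pairing is a common alternative), draw t ~ U(0, 1)."""
    batch = []
    for x0, x1 in zip(sources, targets):
        t = rng.random()
        xt, ut = cfm_training_pair(x0, x1, t)
        batch.append((t, xt, ut))
    return batch


def cfm_loss(v_theta: Callable[[float, Vec], Vec], batch) -> float:
    """Mean squared error between the model velocity v_theta(t, x_t) and u_t."""
    return sum(
        sum((p - q) ** 2 for p, q in zip(v_theta(t, xt), ut)) for t, xt, ut in batch
    ) / len(batch)
```

Because both $x_t$ and $u_t$ are available in closed form, training needs no ODE simulation. The abstract's "simulation-free" claim is that OTP-FM keeps this property after adding the potential terms.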

2606.05227 2026-06-05 q-bio.CB cs.LG math-ph math.MP q-bio.BM

Quantifying the biophysical properties of stomatocytes in health and disease

Zhaojie Chai, Jianlu Zheng, He Li, Ming Dao, George Em Karniadakis

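AI summary: Combining dissipative particle dynamics simulations with microfluidic imaging, the authors build three stomatocyte models and find geometry-dominated interendothelial-slit traversal, suppressed membrane tank-treading, and raised low-shear whole-blood viscosity, offering a unified explanation of the splenectomy paradox in hereditary stomatocytosis.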

Comments
26 pages, 9 figures
English abstract

Hereditary stomatocytosis (HS) comprises red blood cell (RBC) disorders characterized by cup-shaped erythrocytes that respond oppositely to splenectomy: curative in overhydrated HS (OHS) but potentially thrombogenic in dehydrated HS (DHS/xerocytosis). This paradox persists because RBC biomechanics is governed by partly independent parameters--shear modulus, bending rigidity, surface-to-volume ratio (S/V), and cytoplasmic viscosity--that existing assays capture only piecemeal. Here we combine dissipative particle dynamics (DPD) simulations with microfluidic imaging to construct a control discocyte and three stomatocyte models (ST-RBC1-3) at fixed membrane area and decreasing volume (109.7, 101.5, 89.8 fL), spanning the OHS-to-DHS range. Tracing this parameter set through five mechanically orthogonal assays, we find that interendothelial-slit (IES) traversal is geometry-dominated: overhydrated ST-RBC1 requires an order of magnitude higher critical pressure than healthy RBCs, whereas dehydrated ST-RBC3 passes freely. ST-RBC3 nonetheless suppresses membrane tank-treading and raises low-shear whole-blood viscosity by ~29% at physiological haematocrit, comparable to Gaucher-disease hyperviscosity. A funnel-obstacle chip amplifies these differences into a label-free centerline-offset signal predicted to separate all four RBC types (~4.5 standard deviations between extreme phenotypes). These results unite single-cell mechanics, splenic filtration, and hemorheology in one framework, resolve the splenectomy paradox, and point toward microfluidic pre-operative risk stratification in HS.
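
Holding membrane area fixed while lowering volume, as done for ST-RBC1-3, is often summarised by the reduced volume $v = V / \big(\tfrac{4\pi}{3}R_0^3\big)$ with $R_0 = \sqrt{A/4\pi}$, that is, the cell volume relative to a sphere of the same surface area ($v = 1$ for a sphere, smaller for cells with surplus membrane). The paper reports S/V. Reduced volume is a closely related dimensionless descriptor, assumed here for illustration. Because the abstract gives no membrane area, both functions take it as an input. Units must be consistent, and 1 fL = 1 μm³.

```python
import math


def surface_to_volume(area_um2: float, volume_fl: float) -> float:
    """S/V in 1/um, using 1 fL = 1 um^3."""
    return area_um2 / volume_fl


def reduced_volume(area_um2: float, volume_fl: float) -> float:
    """Cell volume divided by the volume of a sphere with the same surface area."""
    r0 = math.sqrt(area_um2 / (4 * math.pi))
    return volume_fl / (4 / 3 * math.pi * r0 ** 3)
```

At fixed area, stepping the volume down through 109.7, 101.5, and 89.8 fL raises S/V and lowers the reduced volume monotonically. This is the geometric axis along which the abstract reports that slit traversal changes from blocked (overhydrated) to free (dehydrated).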

2606.05225 2026-06-05 q-bio.QM cs.LG

The Language of Elution: Autoregressive Prediction of the Next Feature in Untargeted LC-HRMS Lipidomics

Dayanjan S. Wijesinghe

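AI summary: The author frames chromatographic elution as an autoregressive sequence-prediction task, training LSTM and Transformer models to predict the m/z bin of the next eluting feature from annotation-free features. The LSTM reaches 98.4% top-1 accuracy on clinical lipidomics data, and ablations show that sequential patterns, rather than molecular properties, drive the predictions.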

English abstract

Untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) detects thousands of molecular features per sample, yet only 2-20% receive confident structural annotations. A root cause of this "dark metabolome" is that tandem MS/MS acquisition is reactive: instruments select precursors only after ions appear, blind to what elutes next. We reframe chromatographic elution as an autoregressive sequence prediction task. Because reversed-phase elution order is governed by hydrophobicity, successive features form a physically constrained sequence, like tokens in language. We discretize the mass-to-charge (m/z) axis into 110 bins and train long short-term memory (LSTM) and Transformer models to predict the next eluting m/z bin from five annotation-free per-token features: m/z bin, mass defect, retention-time gap, polarity, and intensity rank. Trained on 15,242 features from four clinical lipidomics cohorts (342 plasma samples; SCIEX TripleTOF 6600+, Waters CSH C18), the LSTM reaches 98.4% top-1 accuracy (99.99% top-5; mean absolute error 3.6 Da) and the Transformer 98.0%. Ablation shows autoregressive context accounts for 55.5 percentage points while no single feature contributes more than 0.2 pp: the sequential pattern, not molecular properties, drives prediction. Models transfer across instruments sharing the method (r=0.999 on an independent Agilent 6530 dataset) but fail under a different column chemistry (5.1% top-1) or polarity mode (2.6%), confirming method- and mode-specificity. Fine-tuning on as few as two to five quality-control injections recovers held-out accuracy from 2.6% to nearly 50%, so cross-condition deployment needs minimal calibration. These results establish that elution sequences are highly predictable and lay the groundwork for predictive MS/MS acquisition to improve annotation coverage in untargeted metabolomics.
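
The abstract gives the bin count (110) but not the bin edges or how retention-time ties are ordered. The sketch below turns one sample's feature list into next-token training pairs. It assumes a uniform grid over a hypothetical m/z range of 100 to 1200 and uses only the m/z-bin token; the other four per-token features are omitted.

```python
import numpy as np

def mz_to_bin(mz, lo=100.0, hi=1200.0, n_bins=110):
    """Map m/z values to integer bins on a uniform grid over [lo, hi).

    lo and hi are illustrative assumptions, not values from the paper;
    values outside the range are clipped into the edge bins.
    """
    mz = np.asarray(mz, dtype=float)
    width = (hi - lo) / n_bins
    bins = np.floor((mz - lo) / width).astype(int)
    return np.clip(bins, 0, n_bins - 1)

def next_bin_pairs(rt, mz, **bin_kwargs):
    """Order features by retention time and emit (context, next bin) pairs."""
    order = np.argsort(rt, kind="stable")  # elution order
    tokens = mz_to_bin(np.asarray(mz, dtype=float)[order], **bin_kwargs)
    return [(tokens[:i].tolist(), int(tokens[i])) for i in range(1, len(tokens))]

# Three features listed out of elution order: retention times and m/z values.
pairs = next_bin_pairs([3.0, 1.0, 2.0], [760.5, 105.0, 215.0])
print(pairs)  # [([0], 11), ([0, 11], 66)]
```

Each pair is one autoregressive training example. A sequence model reads the context tokens and is scored on the held-out next bin.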

2606.05206 2026-06-05 q-bio.NC cs.AI stat.AP

Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature

Hamed Nejat, Alexander Maier, Jesse Spencer-Smith, André M. Bastos

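AI summary: Proposes a local multi-LLM pipeline that scores studies in the predictive-coding literature under ontology constraints, mapping a heterogeneous literature into a quantitative evidence space and revealing structured disagreement between hypotheses.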

Comments
33 pages, 5 tables and 9 figures
English abstract

Fragmentation is common in interdisciplinary fields with diverse methods and theoretical commitments. Predictive coding neuroscience is a clear example: its literature spans computational theory, electrophysiology, imaging, behavior, and modeling, creating a synthesis problem that conventional meta-analysis cannot easily resolve. Here, we describe a local multi-LLM pipeline for ontology-constrained literature synthesis. The pipeline reads papers, extracts evidence, incorporates figure descriptions, assembles constrained prompts, and validates outputs against an expert glossary. We manually defined a predictive-coding glossary of thirty-six concepts grouped into three hypotheses: predictive suppression, feedforward error propagation, and ubiquity. A council of ten local language models scored 31 studies according to their agreement or disagreement with each glossary factor across local and global oddball contexts. This enabled pairwise study-agreement analysis, cross-model comparison, and three-dimensional hypothesis-space mapping. Agreement was high for some hypotheses but weaker for others, revealing structured disagreement, particularly across local versus global oddball paradigms. We further define hypothesis-space temperature, a geometric dispersion metric measuring how compactly studies occupy the hypothesis space. Temperature was lower for local oddball contexts and higher for global oddball contexts, indicating greater dispersion in the latter. The scoring geometry also allowed us to estimate vectors of change between experimental contexts. These results demonstrate that local multi-LLM councils can produce auditable disagreement measurements that map heterogeneous literatures into quantitative evidence spaces. This framework may generalize to cross-study hypothesis mapping where conventional meta-analysis lacks a common comparison space.
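
The abstract calls hypothesis-space temperature a geometric dispersion metric but does not give its formula. This is a sketch only. It assumes each study is a point whose three coordinates are its aggregated scores on the three hypotheses. One natural dispersion measure is then the mean squared distance to the centroid, and a context-to-context change vector is the shift between centroids.

```python
import numpy as np

def dispersion(points):
    """Mean squared Euclidean distance of study points from their centroid.

    A stand-in for 'temperature'; the paper's exact definition may differ.
    points: array-like of shape (n_studies, 3), one column per hypothesis.
    """
    X = np.asarray(points, dtype=float)
    centroid = X.mean(axis=0)
    return float(np.mean(np.sum((X - centroid) ** 2, axis=1)))

def change_vector(context_a, context_b):
    """Displacement of the centroid when moving from context A to context B."""
    a = np.asarray(context_a, dtype=float)
    b = np.asarray(context_b, dtype=float)
    return b.mean(axis=0) - a.mean(axis=0)

# Hypothetical scores (suppression, feedforward error, ubiquity) per study.
local_oddball = [[0.8, 0.6, 0.5], [0.7, 0.6, 0.4], [0.8, 0.5, 0.5]]
global_oddball = [[0.9, 0.1, 0.2], [0.2, 0.7, 0.9], [0.5, 0.3, 0.6]]
print(dispersion(local_oddball) < dispersion(global_oddball))  # True
```

Under this definition a lower value means studies cluster more tightly. That matches the abstract's reading of lower temperature as less dispersion.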

2606.05198 2026-06-05 q-bio.BM cs.LG

An accurate nucleic acid-small molecule docking framework via geometric deep learning with large-scale pretraining

Shi Li, Xujun Zhang, Mingquan Liu, Hui Zhang, Shuoying Jia, Yu Kang, Tingjun Hou, Peichen Pan

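AI summary: Proposes NucleoDock for nucleic acid-small molecule docking. It combines physics-guided large-scale pretraining and fine-tuning with sequence, structure, and atom-level features, and ranks poses with a mixture-density-network geometric scoring head. It reaches a 56% top-1 success rate (RMSD < 2.0 Å) on a 125-complex benchmark, outperforming conventional methods.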

Comments
34 pages, 4 figures, 4 tables; Supplementary Materials include 8 tables
English abstract

Nucleic acids are increasingly recognized as therapeutic targets beyond conventional protein-centered drug discovery, yet accurate and efficient docking of small molecules to nucleic acid structures remains challenging. Physics-based docking methods often show limited accuracy and efficiency, whereas deep learning approaches are constrained by the scarcity of experimentally resolved nucleic acid-ligand complexes. Here, we present NucleoDock, a deep learning framework for nucleic acid-small molecule docking. To address data scarcity, NucleoDock combines physics-guided large-scale pretraining on millions of docking-generated synthetic complexes with fine-tuning on curated experimental co-crystal structures. It further integrates sequence- and structure-informed nucleotide representations with atomistic three-dimensional features to capture both biological context and binding-site geometry. A mixture density network-based geometric scoring head is used to model conditional interaction-distance distributions for pose ranking. On an external benchmark of 125 nucleic acid-ligand complexes, NucleoDock achieved a top-1 success rate of 56 percent at an RMSD cutoff of 2.0 Angstrom, outperforming rDock with 29 percent, while generating 100 poses in approximately 5 seconds per complex. Retrospective virtual screening on the ROBIN benchmark further showed improved early enrichment. NucleoDock represents a step toward bridging the methodological gap between protein- and nucleic acid-directed computational drug discovery.
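
The headline metric is easy to state in code. This is a minimal sketch with three assumptions:

- each predicted pose and the crystal ligand are matched (N, 3) heavy-atom coordinate arrays in the same receptor frame;
- no superposition is applied;
- there is no correction for symmetry-equivalent atoms, which real docking benchmarks usually apply.

```python
import numpy as np

def rmsd(pose, reference):
    """RMSD in Å between two matched (N, 3) coordinate arrays, without fitting."""
    diff = np.asarray(pose, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def top1_success_rate(ranked_poses, references, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose is within `cutoff` Å of the reference.

    ranked_poses[k] is the list of poses for complex k, best-scored first.
    """
    hits = [rmsd(poses[0], ref) <= cutoff for poses, ref in zip(ranked_poses, references)]
    return sum(hits) / len(hits)

ref = np.zeros((4, 3))
good = ref + [1.0, 0.0, 0.0]  # every atom shifted by 1 Å
bad = ref + [3.0, 0.0, 0.0]   # every atom shifted by 3 Å
print(top1_success_rate([[good, bad], [bad, good]], [ref, ref]))  # 0.5
```

Only the first pose of each list counts. Pose ranking, the role of the scoring head, therefore matters as much as pose generation.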

2606.05196 2026-06-05 q-bio.MN math.CO

Uniform sampling of canalizing Boolean functions reveals hidden biases in Boolean network analysis

Ahana Ghosh, Claus Kadelka

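AI summary: Addresses a bias in which uniform sampling of Boolean-function parameters yields a non-uniform distribution over functions. Proposes uniform sampling algorithms that combine dynamic programming with rejection sampling, and shows that the bias substantially affects conclusions about network robustness, stability, and related properties.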

Comments
14 pages, 4 figures
English abstract

Boolean networks are widely used to model gene regulatory systems, where ensembles of Boolean functions serve as null models for assessing structural and dynamical properties. A common approach generates canalizing and nested canalizing functions by sampling their defining parameters uniformly at random. However, because multiple parameterizations can represent the same Boolean function, this induces a non-uniform distribution over distinct functions and systematically biases random ensembles. Here, we develop efficient algorithms for uniform sampling of Boolean functions with prescribed exact or minimal canalizing depth that correct this bias. Our approach combines dynamic programming for sampling canalizing layer structures with rejection-based methods and is implemented in BoolForge. We show that the sampling scheme substantially affects commonly studied function-level metrics. Under traditional parameter-uniform sampling, the expected average sensitivity of nested canalizing functions equals one independent of the number of variables. In contrast, under function-uniform sampling, the expected sensitivity increases with system size and numerically approaches approximately 1.183. This discrepancy arises from an exponential suppression of high-sensitivity functions under parameter-based sampling. These differences propagate to Boolean network models, affecting conclusions about robustness, stability, attractor structure, and baseline dynamical expectations. Revisiting 122 published Boolean gene regulatory network models, we show that function-uniform null models reveal a substantially stronger enrichment of low-sensitivity canalizing architectures than previously inferred. Our results demonstrate that widely used null models systematically underestimate baseline sensitivity and can therefore distort assessments of the stabilizing role of canalization in biological networks.
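
Average sensitivity is the function-level metric at the centre of the comparison. It can be computed by brute force over the truth table. The sketch below only evaluates the metric for a given function; it implements neither sampling scheme and is not BoolForge's API.

```python
from itertools import product

def average_sensitivity(f, n):
    """Average sensitivity of a Boolean function f: {0,1}^n -> {0,1}.

    For each input x, count the coordinates i whose flip changes f(x),
    then average that count over all 2**n inputs.
    """
    total = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] ^= 1
            total += fx != f(tuple(y))
    return total / 2 ** n

# A nested canalizing function: x0 = 1 forces 1; otherwise x1 = 0 forces 0; otherwise x2.
ncf = lambda x: x[0] or (x[1] and x[2])
print(average_sensitivity(ncf, 3))                    # 1.25
print(average_sensitivity(lambda x: x[0] ^ x[1], 2))  # 2.0 (XOR, not canalizing)
```

Averaging this quantity over an ensemble of functions gives the expected sensitivity. The abstract's claim is that the choice of ensemble (parameter-uniform or function-uniform) changes that expectation.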

2606.05189 2026-06-05 q-bio.NC cs.NE

Bio-plausible Neuromorphic Disturbance Observer Based on Emulation Theory: Extended Version

Hongfu Xu, Xiaoyu Guo, Shengbo Wang, Shuo Gao

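AI summary: Inspired by biological nervous systems, proposes a neuromorphic disturbance observer and control framework based on spike-timing encoding. An adaptive-threshold triggering mechanism reduces spike events to 42.6% of the fixed-threshold count under noisy conditions.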

English abstract

Biological neural systems achieve remarkable robustness and adaptability in uncertain environments through sparse, event-driven spike-based information processing and adaptive regulation. Inspired by this paradigm, this paper develops a neuromorphic disturbance observer (NDO) and control framework that replaces conventional continuous-time signal representations with spike-timing encoding. Both disturbance estimates and control inputs are constructed via integrate-and-fire (IF) neuron dynamics from discrete spike events, yielding intrinsically event-driven updates. An adaptive-threshold triggering mechanism is inspired by spike-frequency adaptation (SFA), enabling history-dependent regulation of spike generation. Simulation results demonstrate that the proposed framework achieves neurally inspired robustness and adaptability, while the adaptive-threshold spiking scheme reduces spike events to 42.6% of the fixed-threshold case under noisy conditions.
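
The abstract does not give the observer's equations. The sketch below is a generic illustration of the two ingredients it names: integrate-and-fire (IF) encoding, and a threshold that adapts in the spirit of spike-frequency adaptation. It is not the paper's NDO, and the parameter names and values are hypothetical.

```python
def if_spikes(signal, theta0=1.0, delta=0.0, decay=0.9):
    """Integrate-and-fire encoding of a non-negative input sequence.

    The state integrates each sample; when it reaches the threshold, a spike is
    emitted and the threshold is subtracted (reset by subtraction). With delta > 0
    the threshold jumps by delta after each spike and relaxes geometrically back
    toward theta0 (a crude SFA-like rule); delta = 0 gives a fixed threshold.
    Returns the indices of the samples at which spikes occurred.
    """
    v, theta, spikes = 0.0, theta0, []
    for k, u in enumerate(signal):
        v += u
        if v >= theta:
            spikes.append(k)
            v -= theta
            theta += delta
        theta = theta0 + decay * (theta - theta0)  # relax toward baseline
    return spikes

constant = [0.25] * 40
fixed = if_spikes(constant)
adaptive = if_spikes(constant, delta=0.5)
print(len(fixed), len(adaptive) < len(fixed))  # 10 True
```

In this example the raised threshold spaces spikes further apart, so the same input produces fewer spikes. The sketch does not model estimation accuracy.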

2605.23111 2026-06-05 q-bio.NC

Contextual Role Modulates Object Representational Geometry in the Human Brain

Julien Dirani, Shankar Chawla, Leila Wehbe, Bradford Z. Mahon

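AI summary: Combining fMRI with naturalistic movie viewing, finds that objects serving as action targets engage a parietal action network, with representations organized by action affordance. Passive objects instead engage an occipito-temporal network organized along semantic dimensions. This indicates that the brain dynamically remaps object representational geometry according to contextual role.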

English abstract

The human brain represents objects in a way that is both invariant across instances and flexible enough to support different contexts and tasks. Yet it remains unknown how object representations are dynamically remapped as the same object shifts across contextual roles. Using fMRI during naturalistic movie viewing, we investigated how the same objects are represented when they are passive scene elements versus targets of goal-directed actions. Action targets engaged a parietal action network centered in the supramarginal and postcentral gyri, while passive objects recruited a distributed occipito-temporal network involved in visual object recognition. Within context-selective networks, representational geometry showed a double dissociation: target objects were organized by action affordance and hand posture affordance dimensions, while passive objects aligned with semantic dimensions. Visual representational structure was invariant to context. Outside these networks, representational content retained invariance, indicating that flexibility and invariance operate at different levels of the same representational system. These findings demonstrate neural remapping of object representations depending on moment-to-moment changes in contextual roles during a naturalistic scene.

2604.04285 2026-06-05 q-bio.MN cs.ET physics.bio-ph physics.chem-ph

Amplification at Equilibrium: Structural and Thermodynamic Limitations, and Implementation

Hamidreza Akef, Chia-Yu Sung, Aneesh Vanguri, David Soloveichik

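AI summary: Examines the structural and thermodynamic limits of signal amplification at thermodynamic equilibrium. Proves that dimerization networks cannot amplify at equilibrium and shows that allowing trimeric complexes overcomes this limit. Proposes an isometric trimer amplifier whose experimentally measured amplification factor is close to the expected 2-fold, and derives universal thermodynamic bounds that apply to any equilibrium network.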

Comments
To be published in DNA32 (32nd International Conference on DNA Computing and Molecular Programming)
English abstract

Amplifying weak molecular signals is essential in both natural and engineered biochemical systems. While most amplification schemes operate out of equilibrium, relying on kinetic barriers and fuel-driven cascades, it is also possible to amplify at thermodynamic equilibrium by shifting the energy landscape upon addition of an analyte. Equilibrium amplification is appealing because, in principle, it can remain indefinitely in the untriggered state. In this work, we establish fundamental structural and thermodynamic limits on equilibrium-based amplification. We first prove that dimerization networks (systems restricted to complexes of at most two monomers) are inherently incapable of equilibrium amplification. This no-go theorem explains the absence of amplification in prior undercomplementary "strand commutation" designs. We then show that allowing trimeric complexes breaks this barrier. We propose an isometric trimer-based amplifier whose output preserves the size of the input, enabling modular composition, and validate it experimentally, achieving an amplification factor close to the expected $2\times$. Finally, we derive universal thermodynamic bounds applicable to any equilibrium network regardless of complex size: the maximum amplification factor scales linearly with the free energy of interaction between the analyte and the amplifier components. For nucleic acid systems, this implies that the analyte length must grow linearly with the desired amplification factor, and that composing modular amplifiers yields diminishing returns for a fixed analyte. Together, these results delineate the structural and energetic boundaries of equilibrium amplification and rigorously justify the necessity of out-of-equilibrium approaches for achieving high gain.

2601.00618 2026-06-05 q-bio.QM q-bio.BM

Quantifying the uncertainty of molecular dynamics simulations : Good-Turing statistics revisited

Vasiliki Tsampazi, Nicholas M. Glykos

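AI summary: Proposes an improved Good-Turing algorithm that efficiently estimates the probability of observing completely new structures in molecular dynamics simulations and is suitable for very long simulations.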

English abstract

We have previously shown that Good-Turing statistics can be applied to molecular dynamics trajectories to estimate the probability of observing completely new (thus far unobserved) biomolecular structures, and that the method is stable and dependable, with verifiable predictions. The major problem with that initial algorithm was the requirement to calculate and store in memory the two-dimensional RMSD matrix of the currently available trajectory. This requirement precluded the application of the method to very long simulations. Here we describe a new variant of the Good-Turing algorithm whose memory requirements scale linearly with the number of structures in the trajectory, making it suitable even for extremely long simulations. We show that the new method gives results essentially identical to those of the older implementation, and present results obtained from trajectories containing up to 22 million structures. A computer program implementing the new algorithm is available from standard repositories.
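
The memory issue can be made concrete. The classic Good-Turing (Turing) estimate of the probability that the next observation is new is $P_0 = N_1/N$. Here $N_1$ is the number of items seen exactly once among $N$ observations.

The sketch below is a hypothetical adaptation to trajectories, not the authors' published algorithm. It assumes a frame counts as "seen once" when no other frame lies within an RMSD cutoff. It streams distances instead of storing the matrix:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def unseen_probability(frames: Sequence[T], dist: Callable[[T, T], float], cutoff: float) -> float:
    """Turing-style estimate P0 = N1 / N of encountering a not-yet-seen structure.

    N1 counts frames with no other frame within `cutoff`. Distances are computed
    on the fly and discarded, so memory is O(N) (one flag per frame) instead of
    the O(N^2) needed to store a full distance matrix; time is still O(N^2).
    """
    n = len(frames)
    has_neighbour = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            if dist(frames[i], frames[j]) <= cutoff:
                has_neighbour[i] = has_neighbour[j] = True
    return sum(not h for h in has_neighbour) / n

# Toy 1-D 'structures' with absolute difference standing in for RMSD.
print(unseen_probability([0.0, 0.1, 5.0, 10.0], lambda a, b: abs(a - b), 1.0))  # 0.5
```

In practice `dist` would be an RMSD after optimal superposition, and the estimate could be tracked as a function of the cutoff.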

2512.23661 2026-06-05 q-bio.NC

Dynamical incompatibilities in paced finger tapping experiments

Ariel D. Silva, Claudia R. González, Rodrigo Laje

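AI summary: Studies how different perturbation types affect response timing in paced finger-tapping tasks. Responses to different perturbation types are dynamically incompatible when recorded in separate experiments, but become compatible when the types are presented randomly within the same experiment. This challenges the conventional view that phase and period correction processes are activated separately.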

English abstract

Paced finger-tapping tasks are used to probe the error correction mechanism underlying sensorimotor synchronization. Despite their century-long history, fundamental contradictions persist in the literature. One such contradiction arises when comparing the two most common types of period perturbation: step change and phase shift. The stimulus sequence is exactly the same up to and including the (unexpected) perturbed stimulus. Why then would the timing of the next response be different between perturbation types, as observed? We show, both experimentally and theoretically, that responses to both types of perturbation are dynamically incompatible when recorded in separate experiments; that is, they cannot be described by a single underlying dynamical system due to the build-up of different temporal contexts. In contrast, when both types of perturbation are presented randomly within the same experiment, the responses become compatible and can be explained by a single mechanism. We conclude that a single underlying dynamical system can represent the response to all perturbation types, signs, and sizes, which is nevertheless calibrated by temporal context. Our results challenge the established idea of phase and period correction processes that are separately activated for different perturbation types.
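
The two perturbations are indistinguishable up to the perturbed stimulus because of how they are usually defined. These operational definitions are assumed here; the abstract does not spell them out:

- A step change lengthens every interval from the perturbed stimulus onward.
- A phase shift lengthens only the interval ending at the perturbed stimulus, which displaces all later onsets by the same amount.

```python
def stimulus_onsets(n, period, k, delta, kind):
    """Onset times of n metronome stimuli with a perturbation at stimulus index k.

    kind="step":  every interval ending at stimulus k or later becomes period + delta.
    kind="phase": only the interval ending at stimulus k becomes period + delta.
    """
    onsets = [0.0]
    for j in range(1, n):
        if kind == "step":
            interval = period + delta if j >= k else period
        elif kind == "phase":
            interval = period + delta if j == k else period
        else:
            raise ValueError(f"unknown perturbation kind: {kind!r}")
        onsets.append(onsets[-1] + interval)
    return onsets

step = stimulus_onsets(8, 500.0, 4, 50.0, "step")    # times in ms
phase = stimulus_onsets(8, 500.0, 4, 50.0, "phase")
print(step[:5] == phase[:5])  # True: identical through the perturbed stimulus
print(step[5], phase[5])      # 2600.0 2550.0: the sequences diverge only afterwards
```

The response that follows the perturbed stimulus is therefore planned from identical stimulus histories in both conditions. Any difference in its timing points beyond the current sequence, for example to temporal context, which is the abstract's argument.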

2506.11152 2026-06-05 q-bio.GN cs.LG q-bio.CB

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

Hiren Madhu, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, Rex Ying

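AI summary: Proposes HEIST, which represents spatial transcriptomics and proteomics data as graphs. A hierarchical graph transformer jointly models cell spatial positions and gene expression, improving understanding of cellular heterogeneity and of how cells respond to the microenvironment.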

English abstract

Single-cell transcriptomics and proteomics have become a rich source of data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context, as these data provide both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells, so they cannot infer how a cell's internal regulation adapts to microenvironmental cues. Furthermore, these models often use fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs: the higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network graph. HEIST performs both intra-level and cross-level message passing to exploit this hierarchy in its embeddings, and can thus generalize to novel data types, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.

2508.04435 2026-06-05 q-bio.NC

Cognitive Effort in the Two-Step Task: An Active Inference Drift-Diffusion Model Approach

Alvaro Garrido Perez, Viktor Lemoine, Amrapali Pednekar, Yara Khaluf, Pieter Simoens

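AI summary: Proposes a model that combines Active Inference (AIF) with the Drift-Diffusion Model (DDM). It tests whether this model can simultaneously account for cognitive effort arising from habit violation and from value discriminability, and proposes modifications of the two-step task to better measure and isolate cognitive effort.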

Journal ref
Active Inference, Communications in Computer and Information Science 2857 (2026) 24-44
Comments
Paper accepted in the International Workshop on Active Inference, 2025: https://iwaiworkshop.github.io/#
English abstract

High-level theories rooted in the Bayesian Brain Hypothesis often frame cognitive effort as the cost of resolving the conflict between habits and optimal policies. In parallel, evidence accumulator models (EAMs) provide a mechanistic account of how effort arises from competition between the subjective values of available options. Although EAMs have been combined with frameworks like Reinforcement Learning to bridge the gap between high-level theories and process-level mechanisms, relatively less attention has been paid to their implications for a unified notion of cognitive effort. Here, we combine Active Inference (AIF) with the Drift-Diffusion Model (DDM) to investigate whether the resulting AIF-DDM can simultaneously account for effort arising from both habit violation and value discriminability. To our knowledge, this is the first time AIF has been combined with an EAM. We tested the AIF-DDM on a behavioral dataset from the two-step task and compared its predictions to an information-theoretic definition of cognitive effort based on AIF. The model's predictions successfully accounted for second-stage reaction times but failed to capture the dynamics of the first stage. We argue the latter discrepancy likely stems from the experimental design rather than a fundamental flaw in the model's assumptions about cognitive effort. Accordingly, we propose several modifications of the two-step task to better measure and isolate cognitive effort. Finally, we found that integrating the DDM significantly improved parameter recovery, which could help future studies to obtain more reliable parameter estimates.
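
For readers unfamiliar with the DDM side of the model, a single trial can be simulated with the Euler-Maruyama scheme below. It is a plain DDM sketch. The abstract does not specify how the AIF-DDM derives the drift rate from active-inference quantities, so `drift` is simply an input. The bound, noise, and non-decision-time values are illustrative.

```python
import numpy as np

def simulate_ddm(drift, threshold=1.0, noise=1.0, ndt=0.3, dt=1e-3, max_t=5.0, rng=None):
    """Simulate one drift-diffusion trial with symmetric bounds at +/- threshold.

    Evidence starts at 0 and evolves as x += drift*dt + noise*sqrt(dt)*N(0, 1).
    Returns (choice, rt) with choice 1 for the upper bound and 0 for the lower,
    and rt = decision time + non-decision time ndt (seconds), or (None, None)
    if neither bound is reached within max_t.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = 0.0, 0.0
    while t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if abs(x) >= threshold:
            return int(x > 0), t + ndt
    return None, None

rng = np.random.default_rng(0)
trials = [simulate_ddm(2.0, rng=rng) for _ in range(200)]
upper = sum(c for c, _ in trials if c is not None)
# Continuous-time theory for these settings: P(upper) = 1 / (1 + exp(-4)) ≈ 0.982.
print(upper / len(trials))
```

A larger drift, meaning a clearer value difference, yields faster and more consistent choices. This is how a DDM turns value discriminability into reaction-time predictions.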

2505.18566 2026-06-05 q-bio.BM

Atomic Density Distributions in Proteins: Structural and Functional Implications

Sotirios Touliopoulos, Nicholas M. Glykos

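AI summary: Analyzes atomic density distributions from 21,255 non-redundant protein structures and finds statistically significant differences between them. These reveal systematic deviations in atomic packing across protein structures and associated functional features.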

English abstract

Atomic packing is an important metric for characterizing protein structures, as it significantly influences various features, including the stability, the rate of evolution and the functional roles of proteins. Packing in protein structures is a measure of the overall proximity between the proteins' atoms, and it can vary notably among different structures. However, even single-domain proteins do not exhibit uniform packing throughout their structure. Many different methods have been used to measure the quality of packing in proteins and to identify the factors that influence it and their possible implications. In this work, we examine atomic density distributions derived from 21,255 non-redundant protein structures and show that statistically significant differences between those distributions are present. The biomolecular assembly unit was chosen to represent each of these structures. Several protein structures deviate significantly and systematically from the average packing behavior. Hierarchical clustering indicated that there are groups of structures with similar atomic density distributions. A search for common features and patterns in these clusters showed that some of them include proteins with characteristic structures, such as coiled coils and cytochromes. Certain classification families, such as hydrolases and transferases, also tend to appear more frequently in dense and loosely packed clusters, respectively. Regarding factors influencing packing, our results support the established observation that larger structures have a smaller range of density values but tend to be more loosely packed than smaller proteins. We also used indicators such as the abundance of crystallographic water molecules and B-factors as estimates of structural stability, to examine how stability relates to packing.
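
The abstract does not define how per-structure density distributions are computed. The sketch below shows one simple possibility, with a probe radius that is purely illustrative (the default 8 Å and the 2 Å used in the demo are both assumptions). For every atom, count the other atoms inside a sphere around it and divide by the sphere's volume. The histogram of these values over all atoms is then the structure's density distribution.

```python
import numpy as np

def local_atom_density(coords, radius=8.0):
    """Per-atom local density (atoms per cubic Å) within a sphere of `radius` Å.

    coords: (N, 3) atomic coordinates. The atom itself is not counted. Uses an
    all-pairs (N, N) distance matrix, which is fine for a single protein chain
    but would need a neighbour grid or KD-tree for very large assemblies.
    """
    X = np.asarray(coords, dtype=float)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    counts = np.sum(d2 <= radius ** 2, axis=1) - 1
    return counts / (4.0 / 3.0 * np.pi * radius ** 3)

coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [20.0, 0.0, 0.0]]
dens = local_atom_density(coords, radius=2.0)
print(dens * (4.0 / 3.0 * np.pi * 2.0 ** 3))  # recovers the neighbour counts [1, 1, 0]
```

Distributions from different structures could then be compared with a two-sample test and clustered.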

2404.01405 2026-06-05 q-bio.BM

The curious case of A31P, a topology-switching mutant of the Repressor of Primer protein : A molecular dynamics study of its folding and misfolding

Olympia-Dialekti Vouzina, Alexandros Tafanidis, Nicholas M. Glykos

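AI summary: Investigates how the A31P mutation changes the structural topology of the Repressor of Primer protein. Molecular dynamics simulations reveal the structural instability caused by the mutation and shed light on its folding mechanism.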

English abstract

The effect of mutations on protein structures is usually rather localized and minor. Finding a mutation that can single-handedly change the fold and/or topology of a protein structure is a rare exception. The A31P mutant of the homodimeric Repressor of Primer (Rop) protein is one such exception: this single mutation, as demonstrated by two independent crystal structure determinations, can convert the canonical (left-handed/all-antiparallel) 4-alpha-helical bundle of Rop to a new form (right-handed/mixed parallel and antiparallel bundle) displaying a previously unobserved 'bisecting U' topology. The main problem in understanding the dramatic effect of this mutation on the folding of Rop is to understand its very existence: most computational methods appear to agree that the mutation should have had no appreciable effect, with the majority of energy minimization methods and protein structure prediction protocols indicating that this mutation is fully consistent with the native Rop structure, requiring only a local and minor change at the mutation site. Here we use two long (10 μs each) molecular dynamics simulations to compare the stability and dynamics of native Rop versus a hypothetical structure that is identical to native Rop but carries this single alanine-31 to proline mutation. Comparative analysis of the two trajectories convincingly shows that, in contrast to the indications from energy minimization but in agreement with the experimental data, this hypothetical native-like A31P structure is unstable, with its turn regions almost completely unfolding, even under the relatively mild 320 K NpT simulations used in this study. We discuss the implications of these findings for the folding of the A31P mutant, especially with respect to the proposed model of a double-funneled energy landscape.

2308.12224 2026-06-05 q-bio.QM cs.AI

Enhancing cardiovascular risk prediction through AI-enabled calcium-omics

Ammar Hoori, Sadeer Al-Kindi, Tao Hu, Yingnan Song, Hao Wu, Juhwan Lee, Nour Tashtish, Pingfu Fu, Robert Gilkeson, Sanjay Rajagopalan, David L. Wilson

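AI summary: Combines detailed calcification features (calcium-omics) with AI methods to improve the accuracy of major adverse cardiovascular event (MACE) prediction, demonstrating the value of calcium-omics for cardiovascular risk prediction.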

Comments
12 pages, 8 figures, 2 tables, 4 pages supplemental, journal paper format (under review)
English abstract

Background. Coronary artery calcium (CAC) is a powerful predictor of major adverse cardiovascular events (MACE). The traditional Agatston score simply sums the calcium, albeit in a non-linear way, leaving room for improved calcification assessments that more fully capture the extent of disease. Objective. To determine if AI methods using detailed calcification features (i.e., calcium-omics) can improve MACE prediction. Methods. We investigated additional features of calcification, including assessments of mass, volume, density, spatial distribution, territory, etc. We used a Cox model with elastic-net regularization on 2457 CT calcium score (CTCS) scans enriched for MACE events, obtained from a large no-cost CLARIFY program (ClinicalTrials.gov Identifier: NCT04075162). We employed sampling techniques to enhance model training. We also investigated Cox models with selected features to identify explainable high-risk characteristics. Results. Our proposed calcium-omics model with modified synthetic downsampling and upsampling gave a C-index of 80.5%/71.6% and a two-year AUC of 82.4%/74.8% (training/testing, with an 80:20 split; sampling was applied to the training set only). These results compared favorably to Agatston, which gave a C-index of 71.3%/70.3% and an AUC of 71.8%/68.8%. Among calcium-omics features, the number of calcifications, LAD mass, and diffusivity (a measure of spatial distribution) were important determinants of increased risk, with dense calcification (>1000 HU) associated with lower risk. The calcium-omics model reclassified 63% of MACE patients to the high-risk group in a held-out test. The categorical net reclassification index was NRI = 0.153. Conclusions. AI analysis of coronary calcification can lead to improved results compared with Agatston scoring. Our findings suggest the utility of calcium-omics for improved risk prediction.
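
The C-index figures measure how well predicted risk orders patients by time to MACE. The sketch below computes Harrell's C-index for right-censored data. The abstract does not say which variant or tie-handling rule was used, so this follows one common convention.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event and time[i] < time[j].
    It is concordant when risk[i] > risk[j]; tied risks count one half.
    Pairs with tied times are skipped in this simple version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Follow-up in years, MACE indicator, and a model's risk score per patient.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.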

2303.07683 2026-06-05 q-bio.NC

Recovering Arrhythmic EEG Transients from Their Stochastic Interference

Javier Díaz, Hiroyasu Ando, GoEun Han, Olga Malyshevskaya, Xifang Hayashi, Juan-Carlos Letelier, Masashi Yanagisawa, Kaspar E. Vogt

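AI summary: This paper proposes a new theory of EEG genesis based on an arrhythmic superposition of pulses, recovers the pulse signals with an improved mathematical method, reveals specific patterns in sleep and wake states, and identifies for the first time an EEG feature associated with ultra-fast transients.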

Details
Journal ref
Communications Biology (2026)
Comments
Original research manuscript in PDF format, 46 pages long, with 13 figures and one table

English abstract

Traditionally, the neuronal dynamics underlying electroencephalograms (EEG) have been understood as arising from rhythmic oscillators with varying degrees of synchronization. This dominant metaphor employs frequency-domain EEG analysis to identify the most prominent populations of neuronal current sources in terms of their frequency and spectral power. However, emerging perspectives on EEG highlight its arrhythmic nature, which is primarily inferred from broadband EEG properties such as the ubiquitous 1/f spectrum. In the present study, we use an arrhythmic superposition of pulses as a metaphor to explain the origin of EEG. This conceptualization has a fundamental problem: the interference produced by superpositions of pulses generates colored Gaussian noise, masking the temporal profile of the generating pulse. We solved this problem by developing a mathematical method involving the derivative of the autocovariance function that recovers excellent approximations of the underlying pulses, significantly extending the analysis of this type of stochastic process. When the method is applied to spontaneous mouse EEG sampled at 5 kHz during the sleep-wake cycle, it reveals specific patterns, called Ψ-patterns, that characterize NREM sleep, REM sleep, and wakefulness. Ψ-patterns can be understood theoretically as power density in the time domain and correspond to combinations of generating pulses at different time scales. Remarkably, we report the first wakefulness-specific EEG feature, which corresponds to an ultra-fast (~1 ms) transient component of the observed patterns. By shifting the paradigm of EEG genesis from oscillators to random pulse generators, our theoretical framework pushes the boundaries of traditional Fourier-based EEG analysis, paving the way for new insights into the arrhythmic components of neural dynamics.
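The method summarized above works on the autocovariance function. For context: if identical pulses h(t) arrive as a homogeneous Poisson process with rate λ (shot noise), Campbell's theorem gives C(τ) = λ ∫ h(t) h(t + τ) dt, so second-order statistics carry the pulse shape only through its autocorrelation, which is one way to see why the superposition masks the pulse profile. The sketch below shows only the generic ingredients, a biased sample autocovariance and a finite-difference derivative; it is not the authors' pulse-recovery procedure, and `autocovariance` and `derivative` are hypothetical names:

```python
from typing import List, Sequence


def autocovariance(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased sample autocovariance C[k] for lags k = 0..max_lag (divides by n)."""
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean = sum(x) / n
    d = [v - mean for v in x]  # demeaned signal
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def derivative(c: Sequence[float], dt: float) -> List[float]:
    """Finite-difference derivative of a sampled curve.

    Central differences in the interior, one-sided differences at both ends.
    """
    m = len(c)
    if m < 2:
        raise ValueError("need at least two samples")
    out = [(c[1] - c[0]) / dt]
    out += [(c[i + 1] - c[i - 1]) / (2 * dt) for i in range(1, m - 1)]
    out.append((c[-1] - c[-2]) / dt)
    return out


# For EEG sampled at 5 kHz, the lag step would be dt = 1 / 5000 s (0.2 ms).
c = autocovariance([1, -1, 1, -1], 2)
print(c)                    # [1.0, -0.75, 0.5]
print(derivative(c, 1.0))
```

At 5 kHz the lag resolution is 0.2 ms, which is fine enough to resolve the ~1 ms transient component mentioned in the abstract.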

0908.3170 2026-06-05 q-bio.NC cond-mat.dis-nn cond-mat.stat-mech cs.HC

The thermodynamics of human reaction times

Fermín Moscoso del Prado Martín

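AI summary: This paper proposes a new approach to interpreting reaction-time data from behavioral experiments. Taking a thermodynamic perspective, it uses the entropy of the reaction-time distribution as a model-free estimate of the amount of processing performed by the cognitive system, allowing a more accurate assessment of the cognitive impact of different sources of information.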

Details
Journal ref
Journal of Mathematical Psychology (2011), Volume 55, Issue 4, pp. 302-319
Comments
Submitted manuscript

English abstract

I present a new approach to the interpretation of reaction time (RT) data from behavioral experiments. From a physical perspective, the entropy of the RT distribution provides a model-free estimate of the amount of processing performed by the cognitive system. In this way, the focus shifts from the conventional interpretation of individual RTs as either long or short to their distribution being more or less complex in terms of entropy. The new approach enables the estimation of the cognitive processing load without reference to the informational content of the stimuli themselves, thus providing a more appropriate estimate of the cognitive impact of different sources of information carried by experimental stimuli or tasks. The paper introduces the formulation of the theory, followed by an empirical validation using a database of human RTs in lexical tasks (visual lexical decision and word naming). The results show that this new interpretation of RTs is more powerful than the traditional one. The method provides theoretical estimates of the processing loads elicited by individual stimuli. These loads sharply distinguish the responses from different tasks. In addition, it provides upper-bound estimates for the speed at which the system processes information. Finally, I argue that the theoretical proposal, and the associated empirical evidence, provide strong arguments for an adaptive system that systematically adjusts its operational processing speed to the particular demands of each stimulus. This finding contradicts Hick's law, which posits a relatively constant processing speed within an experimental context.
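The abstract treats the entropy of the RT distribution as a model-free measure of processing, but the estimator is not specified here. A minimal sketch, assuming a simple histogram plug-in estimate of differential entropy, H ≈ -Σᵢ pᵢ ln(pᵢ / w), with a user-chosen bin width w; `rt_entropy` and `bin_width` are hypothetical names, and the paper may use a different estimator:

```python
import math
from typing import Dict, Sequence


def rt_entropy(rts: Sequence[float], bin_width: float) -> float:
    """Histogram plug-in estimate of differential entropy (nats) of an RT sample.

    Uses H ~= -sum_i p_i * ln(p_i / w), where p_i is the fraction of RTs in
    bin i and w is the bin width (same units as the RTs).
    """
    if not rts:
        raise ValueError("need at least one reaction time")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo = min(rts)
    counts: Dict[int, int] = {}
    for rt in rts:
        k = int((rt - lo) // bin_width)  # bins start at the fastest RT
        counts[k] = counts.get(k, 0) + 1
    n = len(rts)
    return -sum((c / n) * math.log((c / n) / bin_width) for c in counts.values())


# Spread-out RTs have higher entropy than RTs clustered in one bin.
print(rt_entropy([0.5, 1.5, 2.5, 3.5], 1.0))  # ln 4, about 1.386
print(rt_entropy([0.5, 0.6, 0.7, 3.5], 1.0))
```

Differential entropy depends on the measurement unit: rescaling RTs (and the bin width) from seconds to milliseconds shifts the estimate by ln 1000, so comparisons should use one unit and one binning. Plug-in histogram estimates are also biased for small samples.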