arXivDaily: arXiv daily academic digest, updated Monday to Friday
Q-BIO Quantitative Biology 31
2606.12219 2026-06-11 q-bio.GN q-bio.MN New submission

m6A-FORM: A Foundation Model for Decoding N6-methyladenosine Biology

Tinghe Zhang, Sumin Jo, Shou-Jiang Gao, Yufei Huang

AI summary: Proposes m6A-FORM, a Transformer-based foundation model that uses MeRIP-seq peaks as priors. After pretraining and fine-tuning, it predicts m6A sites more accurately than existing methods and also supports regulator binding-site prediction and the analysis of tissue-conserved sites.

Abstract

N6-methyladenosine (m6A) is the most abundant internal modification in eukaryotic mRNA. However, most existing predictors use adenosine-centered formulations that are computationally inefficient and prone to false positives. Here we present m6A-FORM, a transformer-based foundation model for RNA methylation that uses MeRIP-seq peaks as methylation-enriched priors and is pretrained on approximately 22 million peak-derived sequences from 143 human MeRIP-seq studies. After fine-tuning with high-confidence single-nucleotide m6A annotations from m6A-Atlas v2.0 and GLORI, m6A-FORM-sites achieves state-of-the-art m6A site prediction performance, with a PR-AUC of 0.635 and ROC-AUC of 0.988, improving PR-AUC by at least 0.14 over existing methods while enabling substantially faster inference. Task-specific adaptation further supports prediction of binding sites for 19 m6A-associated regulators and identification of YTHDF2-bound m6A sites associated with mRNA degradation. Applying m6A-FORM across 67 datasets from 24 human tissues identifies 19,631 tissue-conserved sites with distinct localization, clustering, methylation, expression, RBP-interaction, and decay-associated signatures.

2606.12209 2026-06-11 q-bio.QM New submission

Interpretable enzyme function prediction via sparse autoencoder features of ESMC across the microbial protein universe

Yue Hu, Wanyu Cheng, Junqing Wang, Yingchao Liu

AI summary: Uses sparse autoencoder features of the ESMC-6B protein language model to predict enzyme function accurately without task-specific training, reaching 78.9% top-1 accuracy on a microbial enzyme benchmark and identifying about 169,000 dark-enzyme candidates.

Comments
17 pages, 5 figures, 3 tables

Abstract

Microbial genomes and metagenomes contain millions of proteins whose enzymatic functions remain unknown, the enzyme dark matter. While deep learning has improved protein function prediction, most methods are black boxes relying on sequence or structural similarity, limiting discovery of novel catalytic activities. The ESMC-6B protein language model and its sparse autoencoder, which has a 16,384-dimensional codebook of interpretable biological concepts each annotated by GPT-5, create a new opportunity: using these features directly as semantic signatures for enzyme function. Here, we show that ESMC-SAE features enable accurate and interpretable enzyme commission (EC) number prediction without task-specific training or GPU-intensive computation. On a balanced benchmark of 4,868 microbial SwissProt enzymes across 161 EC3 subclasses, ESMC-SAE binary features achieve 78.9% top-1 and 88.5% top-5 accuracy, 37.6% higher than 3-mer baselines (57.3%). In leave-one-EC3-class-out evaluation simulating discovery of novel enzyme classes, SAE features recover the EC1 superclass in 47.7% of cases (3.3x random, 14.3%), versus 26.6% for sequence methods. Discriminative features correspond to mechanistically interpretable concepts: catalytic triad geometry for hydrolases, NAD(P)H-binding Rossmann folds for oxidoreductases, and phosphate-binding P-loops for transferases. We also survey the ESM Atlas of 7.7 million clusters and identify 169,859 dark enzyme-like candidates across all major microbial phyla. Our results establish a paradigm for enzyme function discovery in microbial dark matter: interpretable by design, scalable without GPU clusters, and applicable to the billions of proteins in the ESM Atlas.
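
The accuracy figures quoted in this abstract mix absolute and relative comparisons, so a short arithmetic check may help when reading them. The sketch below uses only the numbers stated in the abstract; the variable and function names are illustrative. It suggests that the quoted "37.6% higher" is best read as a relative gain (the absolute gap is about 21.6 percentage points), and that the "3.3x random" figure follows from dividing 47.7% by 14.3%.

```python
# Figures quoted in the abstract (percent accuracy).
sae_top1 = 78.9     # ESMC-SAE binary features, top-1
kmer_top1 = 57.3    # 3-mer baseline
loo_sae = 47.7      # EC1 superclass recovery, leave-one-EC3-class-out
loo_random = 14.3   # random baseline for the same task


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


absolute_gap = sae_top1 - kmer_top1                 # percentage points
relative_gap = relative_gain(sae_top1, kmer_top1)   # percent of the baseline
fold_over_random = loo_sae / loo_random

print(f"absolute gap: {absolute_gap:.1f} points")
print(f"relative gain: {relative_gap:.1f}%")
print(f"fold over random: {fold_over_random:.1f}x")
```

The small difference between the computed relative gain and the quoted 37.6% is presumably due to rounding of the reported accuracies.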

2606.11893 2026-06-11 cs.LG cs.AI cs.CL q-bio.NC New submission

Beyond representational alignment with brain-guided language models for robust reasoning

Mingqing Xiao, Kai Du, Zhouchen Lin

Affiliations: State Key Lab of General AI, School of Intelligence Science and Technology, Peking University; Department of Psychological and Cognitive Sciences, Tsinghua University; Microsoft Research Asia

AI summary: Investigates whether fMRI signals can strengthen the reasoning of large language models and proposes a brain-guided framework that yields absolute accuracy gains of up to 13% across 10 models.

Abstract

The correspondence between large language models (LLMs) and the neural mechanisms underlying human higher-order cognition remains insufficiently characterized. Given that language and reasoning in the human brain appear dissociable, an open question is whether LLMs align with neural signals from reasoning-related regions and whether such signals can improve them. Here, focusing on deductive reasoning, we show that LLM internal representations are not only partially aligned with task-fMRI activity but can also be directly enhanced by these signals. Using a neural-predictivity metric, we find that LLMs explain a substantial fraction of the explainable variance in reasoning-related regions at the aggregate level, whereas predictivity within specific reasoning types is lower, indicating both alignment and divergence. Building on this, we propose a brain-guided framework: we steer model representations along directions induced by the joint structure of model and brain representations, applying intervention at inference and fine-tuning during training. We demonstrate that task-evoked brain signals can directly enhance LLM reasoning, yielding gains orthogonal to language-only supervision across 10 LLMs (1.5B-72B), with transfer across reasoning types and up to 13% absolute accuracy gain. Our results advance LLM-brain correspondences from correlation to guidance, establishing a brain-signal-driven pathway toward more robust and cognitively aligned AI.

2606.11876 2026-06-11 q-bio.QM cs.LG stat.ME New submission

Seeing Below the Limit of Detection: A Censored-Poisson Bayesian Latent-Growth Change-Point Detector (the Span Detector) for Serial ctDNA in HR+/HER2- Metastatic Breast Cancer

Aarchi Singh Thakur, Abhijoy Sarkar

AI summary: Proposes the Span detector, a censored-Poisson Bayesian latent-growth change-point model that treats ctDNA non-detects as left-censored observations and uses a sequential generalised-likelihood-ratio statistic to detect an upward shift in the per-variant detection rate. On a synthetic cohort, at a 10% false-alarm rate, it raises the share of progressions caught three months ahead from 11% to 25%.

Comments
9 pages, 4 figures, 2 tables. Code and synthetic data generator: this https URL

Abstract

Circulating-tumour DNA (ctDNA) carries evidence of drug resistance months before imaging shows it, but the earliest evidence lives below the assay's limit of detection (LoD): a nascent subclone is detected only intermittently, producing a flickering sequence of faint detects and non-detects. Commercial liquid biopsies treat each draw as an independent snapshot and a non-detect as nothing. We argue a non-detect is a left-censored observation, and the pattern of non-detects and faint detects over time carries actionable evidence of growth before any single value is trustworthy. We introduce Span, a censored-Poisson Bayesian latent-growth change-point detector that models the binary detection process, accumulates a sequential generalised-likelihood-ratio statistic for an upward change-point in the per-variant detection rate, and raises a competing-risks alarm with calibrated false-alarm control. Span has no learned weights, so there is nothing to overfit. On a synthetic cohort of HR+/HER2- metastatic breast cancer on first-line CDK4/6-inhibitor plus endocrine therapy, at a matched 10% false-alarm rate, Span roughly doubles the fraction of impending progressions caught three months ahead (indolent regime: 25% vs 11% for the snapshot), with a falsifiable dose-response: large for indolent emergence, vanishing for fast emergence. A value-trajectory baseline performs identically to the snapshot, isolating the gain to the censored detection model. The survival backbone matches a Cox baseline on real breast-cancer data (GBSG-2, n=686; C-index 0.67 vs 0.68), and on a real longitudinal cohort with clean biomarkers (PBC2, n=312) the same pipeline correctly declines to win, a falsifiable boundary test confirming the mechanism is regime-specific. All ctDNA trajectories are synthetic.
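
The idea of accumulating evidence from a flickering detect/non-detect sequence can be illustrated with a much simpler model than the one the abstract describes. The sketch below is not the Span detector: it drops the censored-Poisson and Bayesian latent-growth components and applies a plain one-sided generalised-likelihood-ratio (GLR) test to a Bernoulli detection series. The baseline detection rate `p0`, the alarm threshold, and all function names are assumptions made for illustration; in the paper the threshold is calibrated to a target false-alarm rate.

```python
import math


def bernoulli_loglik(detects, trials, p):
    """Log-likelihood of `detects` successes in `trials` draws at rate p."""
    misses = trials - detects
    ll = 0.0
    if detects:
        ll += detects * math.log(p)
    if misses:
        ll += misses * math.log(1.0 - p)
    return ll


def glr_upward(series, p0):
    """Max over change times k of the log-likelihood ratio that the
    detection rate rose above the assumed baseline p0 from draw k onward."""
    best = 0.0
    n = len(series)
    detects = 0
    # Walk backwards so the suffix series[k:] grows one draw at a time.
    for k in range(n - 1, -1, -1):
        detects += series[k]
        trials = n - k
        p_hat = max(detects / trials, p0)  # one-sided: only upward shifts count
        llr = (bernoulli_loglik(detects, trials, p_hat)
               - bernoulli_loglik(detects, trials, p0))
        best = max(best, llr)
    return best


def first_alarm(series, p0, threshold):
    """Index of the first draw at which the running statistic crosses
    the threshold, or None if it never does."""
    for t in range(1, len(series) + 1):
        if glr_upward(series[:t], p0) >= threshold:
            return t - 1
    return None


# 1 = variant detected at this blood draw, 0 = non-detect (hypothetical data).
draws = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(first_alarm(draws, p0=0.1, threshold=3.0))
```

A single isolated detect contributes at most log(1/p0) to the statistic, so with these settings one faint detect alone does not trigger an alarm, while a run of detects does.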

2606.11868 2026-06-11 cs.LG q-bio.QM New submission

MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

Dongxin Lyu, Jingbo Zhou, Hongxin Xiang, Yuqiang Li, Jun Xia

Affiliations: Westlake University; Hunan University; Shanghai Artificial Intelligence Laboratory; HKUST-GZ & HKUST

AI summary: Addresses the tendency of existing Transformer models for de novo peptide sequencing to over-rely on generated-sequence priors while neglecting spectral evidence. Proposes MemNovo, a training-free, plug-and-play mechanism that builds a persistent spectral memory bank and injects spectral features at the decoding stage through an ultra-conservative residual connection, improving amino acid and peptide precision.

Comments
Code: this https URL

Abstract

De novo peptide sequencing from tandem mass spectrometry is pivotal in proteomics, enabling identification of novel peptides without reference databases. While recent Transformer-based encoder-decoder models have achieved remarkable performance, we uncover a critical pathology in their inference dynamics. Through comprehensive feature scaling experiments, we demonstrate that existing auto-regressive peptide decoders tend to over-rely on generated-sequence priors while progressively under-utilizing fine-grained physical evidence from the input mass spectrum. This phenomenon leads to suboptimal results, where generated peptide sequences are biologically plausible yet not faithful to the input spectrum. To rectify this, we propose MemNovo, a training-free and plug-and-play mechanism that re-balances peptide and spectral contributions at inference time. MemNovo alleviates the information bottleneck by establishing a persistent spectral memory bank and injecting retrieved features directly into the final decoding stage via an ultra-conservative residual connection. Theoretical analysis confirms that this mechanism restores the mutual information between the decoder state and the raw spectrum. Extensive experiments on the Nine Species benchmark with two representative baselines, Casanovo and InstaNovo, demonstrate that MemNovo consistently improves both amino acid precision and peptide precision, achieving up to 39.1% relative improvement in peptide precision for Casanovo and up to 3.9% for InstaNovo, with negligible computational overhead.

2606.11833 2026-06-11 cs.LG q-bio.NC New submission

Flow Matching with In-Context Priors for Out-of-Distribution Brain Dynamics

Sam Gijsen, Michał Łukomski, Marc-André Schulz, Kerstin Ritter

Affiliations: Hertie Institute for AI in Brain Health, University of Tübingen; Tübingen AI Center, University of Tübingen; Charité – Universitätsmedizin Berlin, Department of Psychiatry and Psychotherapy; German Center for Mental Health (DZPG), partner site Tübingen

AI summary: Proposes a per-timestep conditioned diffusion Transformer that injects compositional language and optional spatial priors to generate fMRI brain dynamics zero-shot for unseen cognitive tasks, in support of counterfactual neuroscience.

Comments
Code and pretrained models available at this https URL

Abstract

Flow matching and diffusion models enable conditional generation across domains ranging from images to proteins, with recent extensions to out-of-distribution contexts. Yet generative models of neural time series have largely remained restricted to categorical conditioning, precluding compositional and zero-shot generalization. In this work, we propose a per-timestep conditioned diffusion transformer for generating realistic fMRI brain dynamics during unseen cognitive tasks by injecting both compositional language and optional spatial priors in-context. Such zero-shot generation could enable counterfactual neuroscience by supporting in-silico design and evaluation of novel cognitive experiments before empirical validation. Leveraging this model, we evaluate across hundreds of held-out task conditions and characterize predictive performance in relation to the training manifold. From language alone, the model recovers region-specific recruitment across tasks and held-out spatial activation patterns. Spatial priors, when available, complement the text pathway by anchoring generation in regions of task space where language alone degrades, while retaining the compositional structure needed for counterfactual task specification. To our knowledge this is the first generative model of whole-cortex fMRI dynamics for unseen cognitive tasks, advancing counterfactual neuroscience and data-driven experimental design.

2606.11775 2026-06-11 math.MG q-bio.QM stat.ML New submission

Magnitude-Based Features for Multispecies Spatial Data

Julia Sollberger, Joshua Bull, Sara Kališnik, Bernadette Stolz

AI summary: Proposes global and local magnitude-based feature vectors for analysing interactions in multispecies spatial data, and demonstrates that they identify spatial heterogeneity and support classification on synthetic tumour microenvironment data and on tissue microarray data from human colorectal cancer.

Comments
32 pages, 24 figures

Abstract

Multispecies spatial data arise in many applications where interactions between different entities are central to system behaviour, including biomedical imaging, geospatial analysis, and species ecology. Despite their importance, relatively few quantitative tools exist to capture such interactions. In this work, we propose magnitude-based features for the analysis of multispecies spatial data. Magnitude is a real-valued invariant of finite metric spaces that can be interpreted as an effective number of points, incorporating both spatial configuration and scale. We develop global and local magnitude feature vectors and demonstrate their utility on synthetic tumour microenvironment data and on tissue microarray data from human colorectal cancer samples. Locally, the method identifies distinct neighbourhood types and reveals spatial heterogeneity; in the model, this includes radial patterns associated with different qualitative outcomes of the simulations, while in the real-world data it reflects the importance of tertiary lymphoid structure-like interactions between B and T cell populations. Globally, the approach recovers known classifications of long-term simulation outcomes across parameter regimes in synthetic data, and suggests important roles for CD4+ T cells and CD163+ macrophages in distinguishing patients with favourable Crohn's-like reactions from those with unfavourable diffuse immune infiltration. Together, these results suggest that magnitude-based features provide a powerful and flexible tool for the analysis of multispecies spatial data.
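
The "effective number of points" reading of magnitude can be made concrete with the standard definition for a finite metric space: form the similarity matrix Z with entries exp(-t·d(x_i, x_j)) at scale t, solve Z w = 1 for the weighting w, and sum the weights. The sketch below implements only this base invariant for points in Euclidean space; it does not reproduce the paper's global or local multispecies feature vectors, which the abstract does not specify. The function names and the small linear solver are illustrative.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def magnitude(points, t=1.0):
    """Magnitude of a finite set of Euclidean points at scale t."""
    z = [[math.exp(-t * math.dist(p, q)) for q in points] for p in points]
    weights = solve(z, [1.0] * len(points))  # weighting w with Z w = 1
    return sum(weights)


cells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical cell positions
for scale in (0.01, 1.0, 100.0):
    print(scale, magnitude(cells, t=scale))
```

At small scales the points blur together and the magnitude approaches 1; at large scales they are fully resolved and it approaches the number of points. For two points at distance d, the closed form is 2 / (1 + exp(-t·d)).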

2606.11651 2026-06-11 cs.LG q-bio.QM stat.AP New submission

DeepRHP: A Hybrid Variational Autoencoder for Designing Random Heteropolymers as Protein Mimics

Shuni Li, Zhiyuan Ruan, Andy Shen, Ivan Jayapurna, Ting Xu, Haiyan Huang

AI summary: Proposes DeepRHP, a hybrid variational autoencoder that pairs a feature-based VAE with a classical VAE in a semi-supervised framework so that the latent space captures key chemical features as well as sequence patterns, guiding the design of random heteropolymers. Its suggested monomer compositions for stabilising membrane proteins are cross-validated against published results.

Comments
Oral presentation at AAAI 2023 Workshop on AI to Accelerate Science and Engineering

Abstract

Synthetic random heteropolymers (RHPs), consisting of a predefined set of monomers, offer an approach toward the design of protein-like materials. These RHPs, if designed appropriately, can mimic protein behavior and function. As such, there is a need for computational tools to efficiently guide RHP design. We bridge this gap by developing DeepRHP, a modified variational autoencoder (VAE) model under a semi-supervised framework. By equipping a classical VAE with an additional feature-based VAE, DeepRHP forces the latent space to capture structures of critical chemical features as well as individual RHP sequence patterns. In this sense, our method is versatile by allowing any relevant features to be incorporated in a hybrid manner. We demonstrate the effectiveness of DeepRHP by suggesting potential monomer compositions that stabilize membrane proteins (e.g. Aquaporin Z) in non-native environments and cross-validating our prediction with published results. The concordance between our model and true RHP function suggests strong potential in utilizing hybrid autoencoder architectures to guide RHP design for proteins and other biological compounds.

2606.11646 2026-06-11 cs.LG q-bio.QM stat.ML New submission

Tree-Structured Orthonormal Decomposition of the Aitchison Simplex

Daisuke Yamada, Qijun Zhang, Travis Pence, Barbara B. Bendlin, Federico Rey, Vikas Singh

AI summary: Proposes PolyILR, an orthonormal decomposition of compositional data aligned with a tree structure, which yields stable, interpretable features on microbiome and single-cell data and establishes a theoretical connection to softmax classifiers.

Comments
Accepted at ICML 2026. To appear in PMLR vol. 306

Abstract

Compositional data -- vectors encoding relative proportions -- arise across scientific domains, including ecology, geochemistry, and genomics. The features in these data often come with known hierarchical structure (e.g., taxonomies, phylogenies, ontologies), yet existing methods either ignore this structure, discard the intrinsic Aitchison geometry, are designed for binary trees, or yield incomplete coordinate systems. We describe PolyILR, a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology. Our construction defines a weighted local geometry at each internal node capturing full branching structure, then lifts these to a global orthonormal basis where every coordinate corresponds to a specific tree location. On microbiome and single-cell benchmarks, PolyILR yields stable, interpretable features and enables inference at multiscale tree resolution. We also establish a novel theoretical connection to softmax classifiers, suggesting possible applications to probabilistic modeling.
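
For readers unfamiliar with Aitchison geometry, the classical building blocks that this abstract generalises can be sketched briefly. The code below shows the centred log-ratio (CLR) transform and the standard isometric log-ratio (ILR) balance between two groups of parts, which is the binary-tree case the abstract says existing methods are limited to. It is not PolyILR: the weighted local geometry at multi-way internal nodes is not given in the abstract and is not reproduced here. The function names and the example composition are illustrative.

```python
import math


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def balance(x, left, right):
    """Classical ILR balance between two disjoint groups of parts.

    `left` and `right` are lists of indices into x; the result is the
    scaled log-ratio of the groups' geometric means.
    """
    r, s = len(left), len(right)
    log_gm_left = sum(math.log(x[i]) for i in left) / r
    log_gm_right = sum(math.log(x[i]) for i in right) / s
    return math.sqrt(r * s / (r + s)) * (log_gm_left - log_gm_right)


# Hypothetical relative abundances of three taxa, with tree ((0, 1), 2).
x = [0.2, 0.3, 0.5]
coords = [
    balance(x, [0], [1]),     # split inside the (0, 1) clade
    balance(x, [0, 1], [2]),  # split between the clade and taxon 2
]
print(coords)
```

Each coordinate is tied to one internal node of the tree, and the two coordinates together preserve the Aitchison norm of the composition, which is the property that makes such a basis orthonormal.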

2606.11598 2026-06-11 q-bio.NC New submission

Large language models selectively converge with human-shared neural semantic representations

Chen Hong, Ximing Shao, Gangyi Feng

AI summary: Combines MEG with interbrain encoding models to compare the dimensional structure of human- and LLM-derived shared neural semantic representations, finding that LLMs partially capture human shared semantics but diverge on socially and affectively grounded dimensions.

Abstract

Interpersonal communication requires building shared semantics that enable listeners to understand speakers' meanings from their unfolding language, but the dimensional structure of this shared neural representation remains unclear. LLMs increasingly approximate human language capability and neural responses, raising the question of whether they capture the same semantic structure shared between human brains. Here, we combined storytelling-listening pseudo-hyperscanning MEG with dimension-resolved interbrain encoding modeling to compare human- and LLM-derived accounts of shared neural semantic representations. Content words from the speaker's narratives were rated by humans and five recent LLMs along ten semantic dimensions (i.e., perception, motor, space, time, socialness, animacy, emotion, attention, causality, and drive). We tested whether these dimensions explained speaker-listener neural synchronization (NS) beyond acoustic and phonological features. Both human- and LLM-derived semantic spaces explained NS, but these shared semantics are better characterized as a multidimensional neural structure rather than a single global signal. These patterns also predicted individual differences in listeners' story comprehension, linking neural alignment to cognition. However, comparable overall prediction concealed systematic differences in representational geometry. Larger LLMs aligned more closely and showed greater overlap with humans in semantic structure and NS, but this was incomplete and dimension-dependent. The largest divergences emerged for dimensions closely tied to agency, affect, and social experience. These findings show that LLMs capture substantial components of human shared neural semantics, but their alignment is selective. Larger or more capable models improve the approximation, whereas socially and affectively grounded dimensions are captured only partially.

2606.11555 2026-06-11 q-bio.NC cs.AI cs.LG New submission

End-to-End Machine Learning for Depressive State Classification via EEG and fNIRS

Riki Sakurai, Simon Kojima, Mihoko Otake-Matsuura, Shin'ichiro Kanoh, Tomasz M. Rutkowski

AI summary: Presents an end-to-end machine learning framework that classifies depressive states from EEG and fNIRS signals in a pilot study of eleven healthy students, aiming to overcome the subjectivity of conventional diagnosis and to lay the groundwork for objective, automated clinical diagnostic tools.

Comments
4 pages, 4 figures, Accepted for publication in the Proc. 48th Annu. Int. Conf. IEEE EMBS (EMBC 2026), Toronto, Canada, July 20-24, 2026

Abstract

The escalating demand for mental healthcare, driven by rising societal stress, highlights the limitations of traditional psychiatric diagnostics. Conventional methods - relying primarily on clinical interviews and patient self-reports - are inherently vulnerable to subjective bias and the varying empirical judgment of practitioners. To address the need for quantitative evaluation, biological signal-based detection, including electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), has emerged as a promising objective alternative. Such technology is particularly vital for identifying latent depressive states that may be unrecognized by the subjects themselves. Furthermore, in aging populations, the high comorbidity between depression and dementia necessitates early differentiation to prevent mutual symptom exacerbation and maintain Quality of Life (QoL). This pilot study of eleven healthy students establishes a framework for biological signal-based depression detection, serving as a foundational step toward automated, objective diagnostic tools for clinical use.

2606.11510 2026-06-11 q-bio.QM q-bio.PE stat.ML New submission

Continuous biome representations from Earth observation embeddings

Maxwell B. Joseph, Flávia De Souza Mendes, Dieu My T. Nguyen, Camile Sothe, Christopher B. Anderson (Planet Labs PBC)

AI summary: To address the way discrete biome maps compress ecological continuity, learns a continuous probabilistic representation from satellite image embeddings. Evaluated on six Brazilian biomes and inventory data covering 4,672 plant species, it predicts species occurrence better than discrete labels.

Comments
8 pages, 4 figures

Abstract

Biotic communities vary continuously across space, yet biome maps impose categorical boundaries that compress this variation, particularly at ecotones where transitional communities are ecologically distinct. Could Earth observation (EO) foundation models, which encode spectral, spatial, and temporal information with dense embeddings, convert discrete biome maps into continuous representations that better capture ecological variation? Here, we fit a linear classifier on Clay v1.5 satellite image embeddings to predict biome labels from a categorical map. The softmax output yields a continuous probability vector whose dimensions correspond to named biome classes. We evaluate this approach using six Brazilian biomes, 1.3 million embeddings, and 10,015 withheld forest inventory plots spanning 4,672 plant species. The continuous biome representation outperforms discrete biome labels for predicting species occurrence (mean per-species AUC 0.618 vs. 0.570 across 10 spatial cross-validation folds). Decomposing this gain shows that continuity in the graded probability output, rather than label reassignment, accounts for the improvement; the pattern holds across all distances from biome boundaries. The raw 1024-dimensional embedding remains the strongest predictor we tested (mean AUC 0.646 vs. 0.618), but the continuous representation recovers most of the embedding's gain over discrete labels. This simple approach provides a probabilistic replacement for categorical map labels, preserving their meaning while encoding graded variation that discrete maps suppress.
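
The step from a linear classifier to a continuous biome representation is just a softmax over per-class scores. The sketch below illustrates that step on a toy embedding; the weights, biases, three-dimensional embedding, and two-class setup are made-up placeholders rather than anything from the paper, which uses 1024-dimensional Clay v1.5 embeddings and six biome classes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def biome_probabilities(embedding, weights, biases):
    """Linear classifier followed by softmax: one probability per biome."""
    logits = [
        sum(w * e for w, e in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical parameters for two biome classes on a 3-dimensional embedding.
weights = [[2.0, 0.0, -1.0],
           [0.5, 1.0, 0.0]]
biases = [0.0, 0.1]

interior = biome_probabilities([1.0, 0.0, 0.0], weights, biases)
ecotone = biome_probabilities([0.4, 0.5, 0.0], weights, biases)
print(interior, ecotone)
```

A discrete map would keep only the arg-max class; the probability vector retains how close a location is to the alternative class, which is the graded information the abstract credits for the improvement.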

2606.11508 2026-06-11 cs.LG q-bio.QM New submission

Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction

Yifan Xue, Srimukh Prasad Veccham, Saee Paliwal, Tyler Shimko, Micha Livne

Affiliations: NVIDIA

AI summary: Proposes a molecular graph-transformer pretraining framework that combines chemistry-specific self-supervision with contrastive mutual information, optimising reconstruction, contrastive, and chemistry tasks under a single probabilistic latent-variable objective. Fine-tuning uses task-specific MLP heads, and the method improves on the KERMT baseline by 7.6%, 9.9%, and 9.5% on three datasets.

Abstract

Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We propose a molecular graph-transformer pretraining framework that combines chemistry-specific self-supervision with contrastive mutual information machine learning (cMIM). Our method encodes molecular graphs into latent variables, reconstructs SMILES strings from the graph-derived latent codes, and augments the contrastive objective with domain-specific self-supervised chemistry tasks. Rather than treating these tasks as auxiliary regularizers with separately tuned loss weights, we formulate reconstruction, contrastive discrimination, and chemistry-specific supervision as unit-weighted log-probability factors in a single probabilistic latent-variable objective. For fine-tuning, we propose a multi-task GNN readout architecture with task-specific multilayer perceptron heads, preserving shared representation learning while mitigating negative transfer and improving the modeling of heterogeneous, nonlinear task relationships. Across Biogen, ExpansionRX, and ChEMBL-MT, the resulting Contrastive KERMT pretraining improves over the KERMT baseline by 7.6%, 9.9%, and 9.5% respectively (averaged over significantly-improved endpoints). Adding ADME-adjacent molecules to the pretraining corpus further improves transfer, and the contrastive component sharpens chemically meaningful latent neighborhoods.

2606.11500 2026-06-11 eess.IV cs.CE cs.IT cs.LG q-bio.NC New submission

FlexiBrain: Resolution-Agnostic Voxel-Level Encoding for Native fMRI

Mo Wang, Wenhao Ye, Junfeng Xia, Minghao Xu, Hongkai Wen, Quanying Liu

AI summary: Proposes FlexiBrain, a resolution-agnostic voxel-level encoding framework built on Mamba-JEPA that processes native fMRI data directly through dynamic patch resizing, avoiding destructive spatial standardisation. It achieves gains of up to 12 percentage points on five downstream tasks and substantially reduces preprocessing cost.

Abstract

The success of large-scale deep learning models in neuroscience is fundamentally constrained by severe data heterogeneity. Native fMRI data aggregated from diverse sources exhibit substantial variation in both spatial and temporal resolution. Consequently, most existing frameworks rely on lengthy, rigid preprocessing pipelines that enforce uniformity across datasets. This practice introduces two critical limitations: (1) potential degradation of subject-specific anatomical information; (2) significant computational overhead, often requiring hours of processing per subject. Here, we propose FlexiBrain, a resolution-agnostic voxel-level encoding framework for native fMRI based on Mamba-JEPA. FlexiBrain defines patch sizes in real-world physical units and employs dynamic patch resizing, thereby bypassing destructive spatial standardization while enabling direct ingestion of data in native space. We instantiate the framework using an efficient Mamba-JEPA backbone to model high-dimensional 4D fMRI signals. Across five diverse downstream neuroscience tasks, FlexiBrain consistently outperforms recent state-of-the-art methods, achieving gains of up to 12 percentage points without external data augmentation. Importantly, FlexiBrain functions as a seamless plug-in module, substantially reducing preprocessing costs and accelerating the development of robust voxel-level fMRI foundation models. Code is available at this https URL.
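
Defining patches in physical units rather than voxels means the same patch covers roughly the same anatomical extent regardless of scanner resolution. The sketch below shows one plausible way to convert a physical patch edge into a per-axis voxel count. It is an assumption made for illustration, not FlexiBrain's actual procedure: the function name, the rounding rule, and the example voxel sizes are hypothetical, and the subsequent resizing of each patch to a fixed token shape is not shown.

```python
def patch_shape_voxels(patch_mm, voxel_size_mm):
    """Number of voxels per axis needed to span `patch_mm` millimetres.

    `voxel_size_mm` is the native voxel spacing along each axis; every
    axis keeps at least one voxel even when voxels exceed the patch edge.
    """
    return tuple(max(1, round(patch_mm / size)) for size in voxel_size_mm)


# The same 12 mm patch expressed on two hypothetical acquisition grids.
print(patch_shape_voxels(12.0, (2.0, 2.0, 2.0)))
print(patch_shape_voxels(12.0, (3.0, 3.0, 4.0)))
```

Because the voxel count adapts to the native spacing, scans at different resolutions would not need to be resampled to a common grid before patching, which is the preprocessing step the abstract describes as destructive.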

2606.11486 2026-06-11 physics.chem-ph q-bio.MN 新提交

Elucidating the Size of Chemical Space with Assembly Theory

通过组装理论阐明化学空间的大小

Juan Carlos Morales Parra, Keith Y Patarroyo, Abhishek Sharma, David Obeh Alobo, Leroy Cronin

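AI summary: Uses Assembly Theory, which quantifies molecular complexity through the assembly index, to give a first-principles estimate of the size of chemical space; finds that it grows at least super-exponentially and at most double-exponentially with complexity, reaching about 10^117 molecules under drug-like constraints.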

详情
Comments
26 pages, 10 figures, 31 references
AI中文摘要

化学空间极其广阔,常见启发式估计表明,在分子质量低于500 Da时,可能存在约10^60个“类药”分子。然而,这些估计很大程度上忽略了所枚举分子的结构和合成复杂性。这里,我们利用组装理论从第一性原理估计化学空间的大小,该理论量化了形成分子所需的因果量,由组装指数捕获。这是一个可测量的分子复杂度度量,源于构建分子图所需的最小递归键合操作次数。组装理论将化学空间划分为由组装指数定义的层次,从而可以对其随分子复杂度增加的增长设定界限。我们表明,化学空间(累积的组装指数水平集)至少以超指数方式增长,至多以双指数方式增长相对于组装指数。使用GDB-13数据库作为增长率估计的参考,我们模拟了化学空间如何在复杂度增加下扩张以及在结构约束(包括原子和键类型、环数、环大小和化学基序)下收缩。在类似于标准类药估计的约束下,包括分子质量低于500 Da,我们的分析得出在组装指数25时化学空间约为10^117个分子。最后,我们通过生物相关基序约束化学空间,并识别出这些组装定义空间的可访问边界附近的结构相关分子。

英文摘要

Chemical space is unimaginably vast, with common heuristic estimates suggesting that there are ca. 10^60 'drug-like' molecules possible below a molecular mass of 500 Da. However, these estimates largely ignore the structural and synthetic complexity of the molecules enumerated. Here we present a first-principles estimate of the size of chemical space using Assembly Theory, which quantifies the amount of causation required to form a molecule, captured in the Assembly Index. This is a measurable molecular complexity measure derived from the minimum number of recursive bond-joining operations required to construct a molecular graph. Assembly Theory partitions chemical space into levels defined by Assembly Index, allowing bounds to be placed on its growth as molecular complexity increases. We show that chemical space (the accumulated Assembly Index level sets) grows at least super-exponentially, and at most double-exponentially, with respect to the Assembly Index. Using the GDB-13 database as a reference for growth-rate estimation, we model how chemical space expands under increasing complexity and contracts under structural constraints, including atom and bond types, number of rings, ring size, and chemical motifs. Under constraints comparable to standard drug-like estimates, including molecular mass below 500 Da, our analysis yields a chemical space of approximately 10^117 molecules at Assembly Index 25. Finally, we constrain chemical space by biologically relevant motifs and identify structurally relevant molecules near the accessible boundaries of these assembly-defined spaces.

2606.11426 2026-06-11 math.OC math.CA q-bio.QM 新提交

Sharpness characterizes Hill functions

Sharpness刻画Hill函数

Marc Stephan

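AI summary: Rigorously proves that, among rational functions, Hill functions are the only ones whose sharpness (the supremum of the derivative in semi-log scale) attains the maximum, and that sharpness never exceeds n/4, where n is the Hill coefficient.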

详情
Comments
10 pages, 2 figures
AI中文摘要

虽然长期以来被视为经验拟合,但Martinez-Corral、Nam、DePace和Gunawardena提出Hill函数是输入-输出响应sharpness的通用Hopfield屏障。Hopfield屏障是生物系统在不消耗能量的情况下处理信息的基本限制。他们的论证基于Hill系数$4$和$6$的数值结果。我们给出了精确表述和证明:通过半对数尺度下导数的上确界衡量sharpness,任何具有实系数$0\leq \alpha_i\leq \beta_i$的有理函数$r(x)=(\alpha_0+\alpha_1 x+ \cdots +\alpha_n x^n)/(\beta_0 + \beta_1 x+ \cdots + \beta_n x^n)$的sharpness至多为$n/4$,当且仅当$r$是Hill系数为$n$的Hill函数时取等。

英文摘要

While long treated as empirical fits, Hill functions have been postulated to be the universal Hopfield barrier for sharpness of input-output responses by Martinez-Corral, Nam, DePace, and Gunawardena. A Hopfield barrier is a fundamental limit on how well biological systems can process information without expending energy. Their case rested on numerical findings for Hill coefficients $4$ and $6$. We give a precise formulation and proof of this: measuring sharpness by the supremum of the derivative in semi-log scale, any rational function $r(x)=(\alpha_0+\alpha_1 x+ \cdots +\alpha_n x^n)/(\beta_0 + \beta_1 x+ \cdots + \beta_n x^n)$ with real coefficients $0\leq \alpha_i\leq \beta_i$ has sharpness at most $n/4$, with equality if and only if $r$ is a Hill function with Hill coefficient $n$.
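As a quick check of the $n/4$ value in the simplest case, consider the standard Hill function with half-saturation constant $K$. The symbols $h$, $K$, and $u$ are introduced here for illustration and do not appear in the abstract, and the semi-log scale is assumed to use the natural logarithm, $u=\ln x$:

$$
h(x)=\frac{x^n}{K^n+x^n},\qquad
\frac{dh}{du}=x\,h'(x)=\frac{n\,K^n x^n}{(K^n+x^n)^2}=n\,h\,(1-h)\le\frac{n}{4},
$$

with equality exactly when $h=1/2$, that is, at $x=K$. So a Hill function with coefficient $n$ has sharpness $n/4$ regardless of $K$. The stated theorem adds that every rational function in the given class has sharpness at most $n/4$ and that only Hill functions reach it.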

2606.11415 2026-06-11 q-bio.NC cs.LG physics.data-an q-bio.QM 新提交

Spatially Masked Regression Reveals Local and Distributed Predictability in Electrophysiological Recordings

空间掩蔽回归揭示电生理记录中的局部和分布式可预测性

Maryam Ostadsharif Memar, Nima Dehghani

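AI summary: Proposes the Spatially Masked Regression (SMR) framework, which quantifies the contributions of local and distributed information to electrode signals by progressively enlarging the masked region; applied to intracranial and scalp EEG data, it finds that neighboring electrodes contribute strongly but not exclusively, indicating that the signals contain both local redundancy and global structure.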

详情
AI中文摘要

神经记录通常被解释为局部测量,但任何单个传感器的信号也可能反映分布在整个网络中的结构化活动。这引出一个基本问题:电极信号在多大程度上反映底层系统中的局部信息与分布式信息?更具体地说,电极的活动有多少由其邻近区域携带,又有多少嵌入在阵列的更广泛分布中?我们通过空间掩蔽回归(SMR)框架解决这一问题,该框架从其余电极重建每个电极的时间序列,同时排除目标周围可配置的邻域。通过逐步增大掩蔽,空间局部性成为实验控制,用于量化在移除附近通道后有多少预测信息幸存。我们将SMR应用于具有异质电极覆盖的颅内脑电图(iEEG)和具有标准化导联组合的感觉运动皮层头皮脑电图(EEG)。使用原始信号与重建信号之间的距离相关性,我们发现两种模态中均存在强烈的受试者内重建,即使排除局部邻域后仍有显著的可预测性,且EEG中的跨受试者转移明显强于iEEG。掩蔽显示邻近电极对重建贡献显著,但并非全部,表明单个通道既反映局部冗余也反映更广泛的分布式结构。保留选定边际或谱特性但破坏相位结构或时间顺序的替代数据显著降低了性能,支持SMR依赖于结构化时间和跨通道组织而非仅边际统计的结论。这些结果将SMR定位为量化记录中局部与分布式信息平衡的可解释框架。

英文摘要

Neural recordings are often interpreted as local measurements, yet the signal at any one sensor can also reflect structured activity distributed across the broader network. This raises a basic question: to what extent does an electrode's signal reflect local versus distributed information in the underlying system? More specifically, how much of an electrode's activity is carried by its immediate neighborhood, and how much is embedded more broadly across the array? We address this with a Spatially Masked Regression (SMR) framework that reconstructs each electrode's timeseries from the remaining electrodes while excluding a configurable neighborhood around the target. By progressively increasing this mask, spatial locality becomes an experimental control for quantifying how much predictive information survives after nearby channels are withheld. We apply SMR to intracranial EEG with heterogeneous electrode coverage and to scalp EEG with standardized montages over sensorimotor cortex. Using distance correlation between original and reconstructed signals, we find strong within-subject reconstruction in both modalities, substantial residual predictability even when local neighbors are excluded, and markedly stronger cross-subject transfer in EEG than in iEEG. Masking shows that nearby electrodes contribute strongly to reconstruction but do not account for all of it, indicating that individual channels reflect both local redundancy and broader distributed structure. Surrogates that preserve selected marginal or spectral properties while disrupting phase structure or temporal ordering substantially reduce performance, supporting the conclusion that SMR depends on structured temporal and cross-channel organization rather than on marginal statistics alone. These results position SMR as an interpretable framework for quantifying the balance between local and distributed information in recordings.
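The masking idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a ridge-regularized linear regressor, a Euclidean mask radius, and an in-sample Pearson correlation as the score, whereas the abstract reports distance correlation. The function name `masked_reconstruction` and all parameters are hypothetical.

```python
import numpy as np


def masked_reconstruction(signals, positions, target, mask_radius, ridge=1e-3):
    """Reconstruct one channel from channels farther than `mask_radius` away.

    signals:   array (n_channels, n_samples)
    positions: array (n_channels, n_dims) of electrode coordinates
    Returns the reconstruction, its Pearson correlation with the target
    (a stand-in for distance correlation), and the predictor indices.
    """
    dist = np.linalg.norm(positions - positions[target], axis=1)
    # The target itself has distance 0, so it is always excluded.
    predictors = np.flatnonzero(dist > mask_radius)
    if predictors.size == 0:
        raise ValueError("mask excludes every channel")
    X = signals[predictors].T          # (n_samples, n_predictors)
    y = signals[target]
    # Closed-form ridge regression: (X^T X + ridge * I) w = X^T y
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, X.T @ y)
    y_hat = X @ weights
    score = float(np.corrcoef(y, y_hat)[0, 1])
    return y_hat, score, predictors
```

Sweeping `mask_radius` from zero upward and plotting the score against the radius would give a curve of how much predictability survives as nearby channels are withheld. A held-out evaluation would be needed for any real analysis.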

2606.11382 2026-06-11 cs.LG q-bio.BM 新提交

GLACIER: A Multimodal Student-Teacher Foundation Model for Molecular Property Prediction

GLACIER:用于分子性质预测的多模态师生基础模型

Emily Nguyen, Yongchan Hong, Harsh Toshniwal, Yan Liu, Andreas Luttens

Affiliations: Department of Computer Science, University of Southern California; Department of Quantitative and Computational Biology, University of Southern California; Amazon; Department of Medical Biochemistry and Biophysics, Science for Life Laboratory, Karolinska Institutet

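AI summary: Proposes GLACIER, a student-teacher framework that fuses three modalities (molecular graphs, SMILES strings, and physicochemical descriptors) and distills knowledge from large models for efficient and accurate molecular property prediction.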

详情
AI中文摘要

深度学习模型有助于在数十亿候选化合物中发现具有定制性质的分子。然而,开发和部署最先进模型的计算负担不断增加,限制了其可扩展性。大多数大规模模型本质上是单模态的,忽视了利用互补分子数据模态的潜力。为了解决这些缺点,本文介绍了用于化学推理和探索的图-语言对齐表示(GLACIER)模型,这是一个师生框架,集成了分子图、SMILES字符串和物理化学描述符,以学习丰富的分子嵌入。我们的框架包括三个阶段:(1)我们在100,000个药物样分子上预训练三个学生编码器:用于分子图的消息传递神经网络、用于SMILES字符串的基于Transformer的编码器以及用于物理化学描述符的多层感知器;(2)我们使用新颖的Finsler几何感知模块融合这些学生模态;(3)通过对比学习,将来自大型教师模型(包括MiniMol和MolFormer)的互补知识蒸馏到一个轻量级模型中。我们证明GLACIER是一个稳健的框架,在复杂的分子性质预测任务中提供高预测性能和计算效率。我们的代码在此https URL公开可用。

英文摘要

Deep learning models facilitate the discovery of molecules with tailored properties among billions of candidate compounds. However, the computational burden to develop and deploy state-of-the-art models continuously increases, limiting their scalability. Most large-scale models are unimodal in nature and overlook the potential to leverage complementary molecular data modalities. To address these shortcomings, this paper introduces the Graph-Language Alignment for Chemical Inference and Exploration using Representations (GLACIER) model, a student-teacher framework that integrates molecular graphs, SMILES strings, and physicochemical descriptors to learn rich molecular embeddings. Our framework consists of three stages: (1) we pretrain three student encoders on 100,000 drug-like molecules: a message-passing neural network for molecular graphs, a transformer-based encoder for SMILES strings, and a multilayer perceptron for physicochemical descriptors; (2) we fuse these student modalities using a novel Finsler geometry-aware module; and (3) we distill complementary knowledge from large teacher models, including MiniMol and MolFormer, into a single lightweight model via contrastive learning. We demonstrate that GLACIER is a robust framework that delivers high predictive performance and computational efficiency in complex molecular property prediction tasks. Our code is publicly available at this https URL.

2606.11276 2026-06-11 q-bio.GN 新提交

A mathematical framework for centromere-aware evaluation of human genome assemblies

面向人类基因组组装的着丝粒感知评估的数学框架

Luca Franco, Matteo Migliarini, Matteo Tommaso Ungaro, Egnald Çela, Luca Corda, Andreas Giannis, Ester Mondelli, Fabio Galasso, Simona Giunta

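AI summary: Proposes a KL-divergence metric based on the distribution of distances between functional centromeric motifs, enabling quantitative evaluation of genome assembly accuracy in repetitive regions, and applies it to T2T genomes.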

详情
AI中文摘要

在高度重复区域(如着丝粒)中准确评估基因组组装仍然是基因组学中的一个主要开放挑战。传统的基准测试依赖于序列比对,这在高度同质性和差异性的区域中会出现问题。在这里,我们将着丝粒组装评估框架化为一个紧凑的centeny表示中的比较分布问题,通过计算功能基序之间的基因组距离,而不是依赖于核苷酸序列。我们的基于分布的度量通过比较由KL散度呈现的着丝粒间基序距离来评估查询染色体和目标染色体之间的一致性。当全基因组应用于当前可用的人类端粒到端粒(T2T)基因组时,该方法为整个组装和每个单独染色体提供了准确性排名。总之,我们提出了一个基于基因组间基序距离数值呈现的快速且稳健的评分系统,为重复DNA区域中的组装完整性提供了定量标准,并建立了染色体水平基因组间比较的真正框架。

英文摘要

Accurate evaluation of genome assemblies within highly repetitive regions, such as centromeres, remains a major open challenge in genomics. Conventional benchmarking relies on sequence alignment, which becomes problematic in regions of high homogeneity and divergence. Here, we frame centromere assembly evaluation as a comparative distribution problem in a compact centeny representation by computing genomic distances between functional motifs, rather than relying on nucleotide sequence. Our distribution-based metric assesses agreement between a query and a target chromosome by comparing their centromeric inter-motif distances using KL divergence. When applied genome-wide to currently available human telomere-to-telomere (T2T) genomes, this approach yields an accuracy ranking for the entire assembly and for each individual chromosome. Altogether, we present a rapid and robust scoring system, based on a numerical rendering of inter-motif distances in genomes, that provides a quantitative standard of assembly integrity in repetitive DNA regions and establishes a bona fide framework for chromosome-level genome-to-genome comparison.
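A distribution-based comparison of this kind can be sketched as follows. The abstract gives no implementation details, so the shared histogram binning, the pseudocount, and the direction of the divergence (query relative to target) are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np


def inter_motif_distances(motif_positions):
    """Distances between consecutive motif positions along a chromosome."""
    return np.diff(np.sort(np.asarray(motif_positions, dtype=float)))


def kl_divergence_of_distances(query_positions, target_positions,
                               bins=20, pseudocount=1e-6):
    """KL(query || target) between binned inter-motif distance distributions.

    Binning and the pseudocount are illustrative choices, not the paper's.
    """
    dq = inter_motif_distances(query_positions)
    dt = inter_motif_distances(target_positions)
    # Shared bin edges so both histograms live on the same support.
    edges = np.histogram_bin_edges(np.concatenate([dq, dt]), bins=bins)
    p = np.histogram(dq, bins=edges)[0] + pseudocount
    q = np.histogram(dt, bins=edges)[0] + pseudocount
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

A score of zero means the two binned distance distributions coincide, and larger values indicate stronger disagreement between the query and target centromeres. Because the comparison never aligns nucleotides, it sidesteps the alignment ambiguity that repetitive arrays cause.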

2606.11264 2026-06-11 q-bio.QM cs.AI 新提交

OmniBioTwin: A System-of-Twinned-Systems Framework for Health Digital Twins

OmniBioTwin:用于健康数字孪生的孪生系统之系统框架

Zhaohui Wang, Yu Huang, Jiang Bian

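AI summary: Proposes the OmniBioTwin framework, which achieves system-level integration of cross-scale health digital twins through modular twins and interaction operators in a multi-layer network architecture, and demonstrates it on GLP-1 signaling pathways in Alzheimer's disease.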

详情
AI中文摘要

健康数字孪生(HDT)有望实现患者特异性建模和决策支持,但目前的方法在结构上仍然碎片化:针对单个器官或任务的单一模型缺乏跨尺度保真度,而系统级孪生缺乏通用的架构框架。我们提出OmniBioTwin,一种孪生系统之系统(SoTS)框架,将HDT组织为模块化计算实体,通过多层网络架构中的显式交互算子进行耦合。该框架包括七个协调层——涵盖数据集成、自主孪生建模、跨尺度耦合、时间同步和人机交互决策支持。我们通过实例化阿尔茨海默病中胰高血糖素样肽-1(GLP-1)信号通路的多尺度孪生来演示OmniBioTwin,说明如何在统一系统中组合和耦合分子、细胞和器官级别的孪生。

英文摘要

Health digital twins (HDTs) promise patient-specific modeling and decision support, but current approaches remain structurally fragmented: monolithic models that address a single organ or task lack cross-scale fidelity, while system-level twins lack generalizable architectural frameworks. We propose OmniBioTwin, a System-of-Twinned-Systems (SoTS) framework that organizes HDTs as modular computational entities coupled through explicit interaction operators within a multi-layer network architecture. The framework comprises seven coordinated layers, spanning data integration, autonomous twin modeling, cross-scale coupling, temporal synchronization, and human-in-the-loop decision support. We demonstrate OmniBioTwin by instantiating a multiscale twin for glucagon-like peptide-1 (GLP-1) signaling pathways in Alzheimer's disease, illustrating how molecular, cellular, and organ-level twins can be composed and coupled within a unified system.

2606.11259 2026-06-11 nlin.AO cond-mat.stat-mech cs.SI math.DS q-bio.PE 新提交

Stabilizing Role of Uninformed Participants in Collective Decision Making

无信息参与者在集体决策中的稳定作用

Leonardo Colombo, María Emma Eyrea Irazu, Laura P. Schaposnik, James Unwin

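AI summary: Using a dissipative Hamiltonian model, finds that uninformed participants delay the polarization transition through direction-free dissipation, thereby stabilizing collective decisions.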

详情
Comments
23 pages, 6 images
AI中文摘要

对于没有严格等级制度的群体,集体决策通常通过妥协产生。我们使用耗散哈密顿量公式开发了一个集体决策的二阶网络模型,其中知情代理引入偏好方向,而无信息参与者仅贡献方向无关的耗散。我们表明,在低冲突下,该模型允许一个局部唯一、指数稳定的妥协状态。使用结构化模块网络,我们进一步表明,随着冲突增加,局部妥协分支通过鞍节点折叠终止,而不是通过平滑的平均场对称破缺转变。模块化极化状态在局部与妥协分支分离的分支上持续存在。方向无关的耗散不会改变静态结构阈值,但会延迟从鞍节点幽灵的逃逸,并将极化的可观察起始点推向更大的冲突。我们的工作确定了一种耗散介导的机制,与基于连通性的解释互补,通过该机制,无信息参与者稳定了生物和工程群体中的集体行为。

英文摘要

For groups without strict hierarchy, collective decisions often emerge through compromise. We develop a second-order network model of collective decision-making using a dissipative Hamiltonian formulation, in which informed agents introduce preferred directions while uninformed participants contribute only direction-free dissipation. We show that under low conflict, the model admits a locally unique, exponentially stable compromise state. Using a structured modular network we further show that as conflict increases the local compromise branch terminates through a saddle-node fold rather than through a smooth mean-field symmetry-breaking transition. Modular polarized states persist on branches that are locally separated from the compromise branch. Direction-free dissipation does not shift the static structural threshold, but it delays escape from the saddle-node ghost and pushes the observable onset of polarization to larger conflicts. Our work identifies a dissipation-mediated mechanism, complementary to connectivity-based accounts, through which uninformed participants stabilize collective behavior in biological and engineered swarms.

2606.11245 2026-06-11 cs.AI cs.NE q-bio.NC 新提交

Position: Hippocampal Explicit Memory Is the Cornerstone for AGI

立场:海马体显式记忆是通用人工智能的基石

Sangjun Park

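AI summary: Argues that integrating explicit memory into large language models is key to progress toward AGI, because the learning mechanism of LLMs resembles human implicit memory whereas higher-order cognitive functions depend on hippocampal explicit memory.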

详情
Comments
Accepted to ICML 2026 (Position Paper Track)
AI中文摘要

大语言模型(LLM)在各种任务中展现了卓越的能力,提升了人们对通用人工智能(AGI)的期望。这篇立场论文认为,整合显式记忆是推动LLM迈向AGI的基石。关键原因在于,LLM的底层学习机制与人类内隐记忆高度相似。然而,AGI所需的高阶认知功能,如长期战略规划、元认知和符号推理,严重依赖海马体显式记忆,无法仅从内隐统计学习中产生。借鉴神经科学的发现,我提出这一观点,并辅以人工显式记忆系统的计算要求,希望促进进一步研究,为显式记忆整合奠定基础。

英文摘要

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs toward AGI. The key reason is that the underlying learning mechanism of LLMs is highly analogous to human implicit memory. However, higher-order cognitive functions necessary for AGI, such as long-term strategic planning, metacognition, and symbolic reasoning, heavily rely on hippocampal explicit memory and cannot arise solely from implicit statistical learning. Drawing on findings from neuroscience, I advance this perspective and complement it with computational requirements for artificial explicit memory systems, hoping to foster further research and lay the groundwork for explicit memory integration.

2606.08493 2026-06-11 q-bio.GN cs.LG stat.ML 版本更新

Querying Counterfactuals on Tissue Graphs with Supervised Disentanglement

在组织图上通过监督解缠查询反事实

Abdul Moeed, Stefan Schrod, Martin Rohbeck, Marc Jan Bonder, Pavlo Lutsik, Oliver Stegle, Daniel Dimitrov

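AI summary: Formalizes tissue graph counterfactuals as spatial interventions and proposes Cellina, a framework that uses supervised disentanglement to separate a cell's intrinsic state from its spatial context for counterfactual prediction; it outperforms existing methods on colorectal cancer and mouse brain data.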

详情
AI中文摘要

组织图反事实询问在改变的空间邻居上下文中细胞的表达将如何变化。这类查询对于预测组织中细胞行为至关重要,但缺乏统一定义,现有方法针对特定干预类型或将细胞视为独立同分布。在这项工作中,我们首先将组织图反事实形式化为一类空间干预,这些干预要么重新连接细胞之间的边(边扰动),要么修改其邻居的表达(节点扰动)。然后,我们介绍Cellina(https://cellina.readthedocs.io),一个使用监督解缠将细胞内在状态从其空间上下文中分解出来的框架,将后者作为反事实预测的条件输入。在跨越结直肠癌和小鼠大脑中超过250万个空间分辨细胞的基准测试中,Cellina在组织扰动、解缠和可扩展性方面优于空间感知和非空间的竞争对手。此外,我们展示了Cellina以无监督方式揭示生物学上不同的癌症子域,并实现靶向邻居扰动模拟。

英文摘要

Tissue graph counterfactuals ask how a cell's expression would change under altered spatial neighbor contexts. Such queries are central to predicting cell behavior in tissues, but lack a unified definition, with existing methods targeting specific intervention types or treating cells as i.i.d. In this work, we first formalize tissue graph counterfactuals as a class of spatial interventions that either rewire connections between cells (edge perturbation) or modify the expression of their neighbors (node perturbation). We then introduce Cellina (https://cellina.readthedocs.io), a framework that uses supervised disentanglement to decompose a cell's intrinsic state from its spatial context, using the latter as a conditioning input for counterfactual predictions. Across benchmarks spanning over 2.5 million spatially-resolved cells in colorectal cancer and mouse brain, Cellina outperforms spatially-informed and non-spatial competitors in in-silico graph perturbations, disentanglement, and scalability. Additionally, we show that Cellina reveals biologically distinct cancer subdomains in an unsupervised manner and enables targeted neighbor perturbation simulations.

2605.29588 2026-06-11 cs.CV cs.AI q-bio.NC 版本更新

Brain-IT-VQA: From Brain Signals to Answers

Brain-IT-VQA: 从脑信号到答案

Roman Beliy, Matias Cosarinsky, Oliver Heinimann, Navve Wasserman, Michal Irani

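AI summary: Proposes Brain-IT-VQA, a framework that decodes language tokens from fMRI brain signals and combines them with a language model for visual question answering; it substantially outperforms previous methods on the new NSD-VQA benchmark and is used to analyze the contributions of brain regions to visual information.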

详情
AI中文摘要

从观看图像时记录的 fMRI 信号解码视觉内容,特别是回答关于所看图像的问题,是一个长期挑战。尽管近年来在基于 fMRI 的视觉问答(VQA)方面取得了显著进展,但性能仍然有限。此外,尽管最近的模型能够做出越来越准确的预测,但它们很少被用作理解大脑中视觉表征结构的工具。我们提出了 Brain-IT-VQA,一个基于 fMRI 的视觉问答框架。基于脑交互变换器(Brain-IT),我们的方法从脑活动中解码语言令牌,并将其与语言模型集成以回答视觉问题。我们的模型显著优于先前的基于 fMRI 的标题生成和 VQA 方法。我们进一步引入了 NSD-VQA,一个新的基于 fMRI 的视觉问答数据集和基准。与现有的图像-fMRI VQA 数据集通常每张图像只提供少数宽泛且弱控制的问题不同,NSD-VQA 在 20 个受控问题类别中平均每张图像提供 20 个问答对,这些类别解耦了多个层次的视觉理解。这使得在有限的 fMRI 测试数据下能够进行更可靠和可解释的评估。Brain-IT-VQA 和 NSD-VQA 共同提供了一个强大的预测框架和研究脑表征的工具。利用这个基准,我们量化了哪些形式的视觉和语义信息可以从对自然图像的 fMRI 响应中可靠解码。我们进一步分析了不同脑区在不同问题类型上的贡献。

英文摘要

Decoding visual content from fMRI signals recorded while a person views images, and specifically answering questions about the seen images, is a long-standing challenge. While significant progress has been made in recent years in visual question answering (VQA) from fMRI, performance remains limited. Moreover, although recent models can make increasingly accurate predictions, they have rarely been used as tools for understanding the structure of visual representations in the brain. We present Brain-IT-VQA, a framework for visual question answering from fMRI. Building on the Brain Interaction Transformer (Brain-IT), our method decodes language tokens from brain activity and integrates them with a language model to answer visual questions. Our model substantially outperforms previous fMRI-based captioning and VQA approaches. We further introduce NSD-VQA, a new dataset and benchmark for visual question answering from fMRI. Unlike existing image-fMRI VQA datasets, which typically provide only a few broad and weakly controlled questions per image, NSD-VQA provides on average 20 question-answer pairs per image across 20 controlled question categories that disentangle multiple levels of visual understanding. This enables more reliable and interpretable evaluation despite limited fMRI test data. Together, Brain-IT-VQA and NSD-VQA provide both a strong predictive framework and a tool for studying brain representations. Using this benchmark, we quantify which forms of visual and semantic information can be reliably decoded from fMRI responses to natural images. We further analyze the contributions of different brain regions across question types.

2605.00545 2026-06-11 cs.LG cs.AI math-ph q-bio.GN q-bio.QM 版本更新

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

超越连续性:从单细胞快照无模拟重建离散分支动力学

Junda Ying, Yuxuan Wang, Bowen Yang, Peijie Zhou, Lei Zhang

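AI summary: To address stochasticity and non-conservative mass dynamics (such as cell proliferation and apoptosis) in single-cell snapshot data, proposes the simulation-free framework Unbalanced Schrödinger Bridge (USB), which models jump-like birth-death dynamics at single-cell resolution through a discrete branching Schrödinger bridge problem, enabling efficient trajectory reconstruction and discrete simulation.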

详情
AI中文摘要

从破坏性快照推断细胞轨迹因随机性和非保守质量动态(如细胞增殖和凋亡)的挑战而复杂化。现有的不平衡最优传输(OT)方法将质量视为连续流体,在群体水平进行推断。然而,这种宏观视角往往无法捕捉单细胞分辨率下生灭事件的离散跳跃性质,而这对于理解谱系分支和命运决定至关重要。我们提出无模拟框架Unbalanced Schrödinger Bridge (USB),用于学习底层动态,有效整合随机和非平衡效应,并在单细胞分辨率下建模离散、跳跃式的生灭动态。理论上,USB为分支薛定谔桥(BSB)问题提供了可处理的解,给出了严格的微观解释,其中单个细胞同时经历布朗运动和离散生灭跳跃。技术上,该方法通过引入无模拟训练目标实现高效求解器,有效扩展到高维组学数据。实验上,我们在模拟和真实数据集上证明,USB不仅达到优于或可比于确定性基线的轨迹重建性能,而且独特地实现了单细胞分辨率下生灭动态的真实离散模拟。

英文摘要

Inferring cellular trajectories from destructive snapshots is complicated by the challenges of stochasticity and non-conservative mass dynamics such as cell proliferation and apoptosis. Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level. However, this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution, which is essential for understanding lineage branching and fate decisions. We present Unbalanced Schrödinger Bridge (USB), a simulation-free framework for learning underlying dynamics that integrates both stochastic and unbalanced effects and models the discrete, jump-like birth-death dynamics at single-cell resolution. Theoretically, USB provides a tractable solution to the Branching Schrödinger Bridge (BSB) problem, offering a rigorous microscopic interpretation where individual cells undergo both Brownian motion and discrete birth-death jumps. Technically, the method implements an efficient solver by introducing a simulation-free training objective that effectively scales to high-dimensional omics data. Empirically, we demonstrate on both simulated and real-world datasets that USB not only achieves trajectory reconstruction performance better than or comparable to deterministic baselines but also uniquely enables realistic discrete simulation of birth-death dynamics at single-cell resolution.
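The microscopic picture described here, in which each cell diffuses and occasionally divides or dies, can be illustrated with a small forward simulation. This sketch is not the USB method, which learns dynamics from snapshots without simulation. It assumes constant birth and death rates, no drift, and a simple fixed-step discretization, and the function name and parameters are hypothetical.

```python
import numpy as np


def simulate_branching_brownian(x0, birth_rate, death_rate, sigma, dt, n_steps, rng):
    """Forward-simulate cells that diffuse and undergo discrete birth-death jumps.

    x0: array (n_cells, n_dims) of initial cell states.
    Per step, a cell dies with probability death_rate * dt and divides with
    probability birth_rate * dt (assumes (birth_rate + death_rate) * dt <= 1).
    Returns the final cell states and the population size after each step.
    """
    cells = np.array(x0, dtype=float)
    counts = [len(cells)]
    for _ in range(n_steps):
        # Brownian increment (Euler-Maruyama step with zero drift).
        cells = cells + sigma * np.sqrt(dt) * rng.standard_normal(cells.shape)
        u = rng.random(len(cells))
        dies = u < death_rate * dt
        divides = (~dies) & (u < (death_rate + birth_rate) * dt)
        # Survivors stay; each dividing cell contributes one extra daughter copy.
        cells = np.concatenate([cells[~dies], cells[divides]], axis=0)
        counts.append(len(cells))
    return cells, counts
```

Unlike a continuous-mass description, the population size here changes only in whole cells, which is the discreteness the abstract says population-level unbalanced OT methods miss.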

2511.10223 2026-06-11 math.PR q-bio.MN 版本更新

Stochastic Reaction Networks Within Interacting Compartments with Content-Dependent Fragmentation

具有内容依赖碎裂的相互作用隔室内的随机反应网络

David F. Anderson, Aidan S. Howells, Diego Rojas La Luz

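AI summary: Studies a stochastic reaction network model in which the compartment fragmentation rate depends on the abundance of designated species inside the compartment, shows that the earlier explosivity characterization fails under content-dependent fragmentation, and gives new sufficient conditions for non-explosivity and positive recurrence.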

详情
Comments
25 pages; corrected a typo (present in all previous versions) in Step 3 of the proof of Proposition 3.12
AI中文摘要

具有质量作用动力学的随机反应网络为理解均匀环境中的过程(包括生化过程)提供了有用的框架。然而,细胞反应通常是区室化的,无论是在细胞水平还是在细胞内,因此是非均匀的。我们研究了一个区室化模型,其中区室的碎裂速率取决于该区室内某些指定物种的丰度。该特定研究模型是(Duso 和 Zechner, PNAS, 2020)提出的具有动态区室的区室化化学通用框架的一部分。本文建立在(Anderson 和 Howells, Bull. Math. Biol., 2023)的基础上,该文从数学上研究了区室动力学不依赖于其内容的特殊情况。特别地,我们证明了(Anderson 和 Howells, Bull. Math. Biol., 2023)中的爆炸性刻画在此设置下失效,并在底层CRN承认线性Lyapunov函数的假设下,提供了非爆炸性和正递归的新充分条件。这些结果扩展了建模内容介导的区室动力学的理论基础,对细胞分裂和细胞内运输等系统具有意义。

英文摘要

Stochastic reaction networks with mass-action kinetics provide a useful framework for understanding processes -- biochemical and otherwise -- in homogeneous environments. However, cellular reactions are often compartmentalized, either at the cell level or within cells, and hence non-homogeneous. We investigate a model of compartmentalization in which the rate of fragmentation of a compartment depends on the abundance of some designated species inside that compartment. The particular model of study is part of a general framework for compartmentalized chemistry with dynamic compartments that was proposed in (Duso and Zechner, PNAS, 2020). This paper builds on (Anderson and Howells, Bull. Math. Biol., 2023), which studied mathematically the special case in which the compartment dynamics do not depend on their contents. In particular, we demonstrate that the explosivity characterization from (Anderson and Howells, Bull. Math. Biol., 2023) fails in this setting and provide new sufficient conditions for non-explosivity and positive recurrence, under the assumption that the underlying CRN admits a linear Lyapunov function. These results extend the theoretical foundation for modeling content-mediated compartment dynamics, with implications for systems such as cell division and intracellular transport.

2604.25701 2026-06-11 physics.bio-ph physics.data-an q-bio.BM q-bio.MN q-bio.PE 版本更新

Bayesian Rate Inference for Sequence Motif Dynamics in Systems of Reactive Nucleic Acids

反应性核酸系统中序列基序动力学的贝叶斯速率推断

Johannes Harth-Kitzerow, Ulrich Gerland, Torsten A. Enßlin

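AI summary: Proposes a Bayesian inference framework that infers the parameters of motif rate equations from ligation count data produced by strand reactor simulations, providing a way to match simplified models to more complex simulations and a step toward inferring reaction rate constants directly from experimental data.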

详情
Comments
18 pages, 8 figures, pre-submission
AI中文摘要

RNA世界假说提出了生命在早期地球上出现的一条途径。它假设生命始于基于RNA的系统,能够存储、传递和复制信息,设想单体和短RNA寡聚体相互作用形成更长的链,最终成为具有催化活性的核酶。RNA池中的关键反应是杂交、去杂交、模板化连接和切割。这些反应依赖于许多环境参数以及相互作用链之间广泛可能的构型。为了扫描如此高维的参数空间,需要高效的描述。基序速率方程将复杂的链反应器动力学投影到序列基序空间。这里我们提出了一个贝叶斯推断框架,从链反应器模拟产生的连接计数数据中推断其参数。这提供了一个将更简单的基序速率方程与更复杂的模拟相匹配的框架。此外,这是朝着直接从实验数据推断反应速率常数(包括严格的不确定性估计)迈出的一步。这可能是连接理论与实验、加深我们对生命出现所必需的基本特征理解的关键步骤。

英文摘要

The RNA world hypothesis suggests a pathway by which life emerged on early Earth. It assumes that life started with RNA-based systems capable of storing, transmitting, and replicating information, envisioning that monomers and short RNA oligomers interact to form longer strands, eventually becoming catalytically active ribozymes. Key reactions in RNA pools are hybridization, dehybridization, templated ligation, and cleavage. Those reactions depend on many environmental parameters and the wide range of possible configurations among interacting strands. In order to scan such high-dimensional parameter spaces, efficient descriptions are needed. Motif rate equations project complex strand reactor dynamics onto sequence motif space. Here we present a Bayesian inference framework to infer their parameters from ligation count data produced by strand reactor simulations. This provides a framework to match the simpler motif rate equations to more complex simulations. Additionally, it is a step towards inferring reaction rate constants directly from experimental data, including rigorous uncertainty estimation. This could be an essential procedure to connect theory and experiment, and deepen our understanding of the essential features necessary for life to emerge.

2602.20266 2026-06-11 math.PR math.ST q-bio.PE 版本更新

Multiple Poisson-Dirichlet diffusions on generalized Kingman simplices

广义Kingman单纯形上的多重Poisson-Dirichlet扩散

Cristina Costantini, Matteo Ruggiero

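AI summary: Constructs infinite-dimensional diffusions on generalized Kingman simplices with finitely many marks and, via a blockwise skew-product decomposition and a limiting procedure, obtains the multiple Poisson-Dirichlet distribution as a stationary distribution.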

详情
Comments
Revised version; dedicated to the memory of T.G. Kurtz
AI中文摘要

我们在带有有限个标记的广义Kingman单纯形上构造了一类新的无穷维扩散过程。该模型描述了由有限个$H$标记标记的无穷多种类型的相对频率的时间演化,但在每个标记内类型是无标记的。我们首先建立了有限类型Wright-Fisher扩散的分块斜积表示,扩展了Dirichlet律的聚合-重归一化自相似性质。该分解将控制演化中的随机标记质量的$H$维Wright-Fisher扩散与$H$个Wright-Fisher扩散(每个在其自己的随机时钟上运行)分开,后者描述了每个标记内相对频率的演化。在对标记内频率进行降序排序后,我们确定了当每个标记的类型数趋于无穷大时的分布极限,并在适当定义域上推导出其无穷小生成元的显式形式。极限扩散以多重Poisson-Dirichlet分布作为平稳分布;当所有类型共享相同标记时,它恢复为无穷多中性等位基因扩散,而当有两个标记时,它产生Thoma单纯形上的扩散。

英文摘要

We construct a new class of infinite-dimensional diffusions with values in a generalized Kingman simplex with finitely many marks. The model describes the temporal evolution of the relative frequencies of infinitely many types that are labeled by a finite number $H$ of marks, but unlabeled within each mark. We first establish a blockwise skew-product representation for a finite-type Wright-Fisher diffusion, extending the aggregation-renormalization self-similarity property of Dirichlet laws. The decomposition separates an $H$-dimensional Wright-Fisher diffusion governing the evolving random mark masses, from $H$ Wright-Fisher diffusions, each run on its own random clock, which describe the evolution of the relative frequencies within each mark. After ranking the within-mark frequencies in decreasing order, we identify the distributional limit as the number of types per mark tends to infinity and we derive an explicit form of its infinitesimal generator on a suitable domain. The limiting diffusion admits the multiple Poisson-Dirichlet distribution as a stationary distribution; it recovers the infinitely-many-neutral-alleles diffusion when all types share the same mark and yields a diffusion on the Thoma simplex when there are two marks.

2501.09172 2026-06-11 q-bio.PE 版本更新

Towards a less spherical cow: Species differences dilute the stabilizing effect of higher-order interactions

走向更少球形的牛:物种差异稀释了高阶相互作用的稳定效应

Marc Duran-Sala, Sandro Meloni, Violeta Calleja-Solanas

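AI summary: By analyzing a model of competitive communities with both pairwise and higher-order interactions, finds that higher-order interactions alone cannot guarantee coexistence and that their stabilizing effect weakens when species differ, challenging the view of higher-order interactions as a universal stabilizing mechanism.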

详情
AI中文摘要

生态模型传统上通过物种间的成对相互作用来解释稳定性和共存。然而,相互作用也可能涉及三个或更多物种的群体,即高阶相互作用,最近的理论表明这可以稳定群落。然而,在成对和高阶相互作用同时发生的群落中,高阶相互作用足以稳定共存的条件仍然未知。本研究通过分析包含一定比例成对和高阶相互作用的竞争群落模型来填补这一空白。利用经验数据、数值模拟和解析方法,我们表明高阶相互作用单独不能保证共存。我们发现,虽然一小部分高阶相互作用可以稳定相同物种群落的动态,但在更现实的条件下,如出生率和死亡率的变化或明确的相互作用结构,这种效应会减弱。我们的结果挑战了高阶相互作用作为通用稳定机制的普遍观点,提供了定量证据,表明成对和高阶相互作用以及网络结构和物种参数对于理解生态稳定性共同重要。

英文摘要

Ecological models traditionally explain stability and coexistence through pairwise interactions among species. However, interactions can also involve groups of three or more species, higher-order interactions, which recent theory suggests can stabilize communities. Yet, the conditions under which higher-order interactions are sufficient to stabilize coexistence in communities where pairwise and higher-order interactions occur simultaneously remain unknown. This work addresses this gap by analyzing a model of competitive communities that incorporates a proportion of pairwise and higher-order interactions. Using empirical data, numerical simulations, and analytical methods, we show that higher-order interactions alone cannot guarantee coexistence. We find that, while a small fraction of higher-order interactions can stabilize dynamics in communities of identical species, this effect weakens under more realistic conditions, such as variability in birth and mortality rates or explicit interaction structures. Our results challenge the prevailing view of higher-order interactions as a universal stabilizing mechanism, providing quantitative evidence of the joint importance of both pairwise and higher-order interactions, together with network structure and species parameters, for understanding ecological stability.

2605.29355 2026-06-11 cs.LG q-bio.NC

Neural-Behavioral Representation of Natural Whole-body Movement in Monkeys

猴子自然全身运动的神经-行为表征

Jieshi He, Puzhe Li, Yanan Sui, Mu-ming Poo

AI总结 通过大规模皮层信号与多视角运动捕捉,结合自回归编码器-解码器模型,实现了对自由运动猴子全身运动的准确解码。

详情
AI中文摘要

理解皮层活动如何表征灵长类动物的自然全身行为仍然具有挑战性。受限于运动的多样性和全身运动学大规模神经表征的不可及性,先前的运动解码研究集中于受限任务和有限的肢体运动。在这里,我们提出了一个用于自由运动猴子的神经-行为记录和建模框架,通过定制的数据采集平台,将来自分布式感觉和运动相关区域的大规模硬膜外皮层信号与同步的多视角运动捕捉相结合。我们重建了猴子的全身运动学,并使用自回归编码器-解码器模型学习了紧凑的行为先验。以神经信号为条件,该模型在没有明确物理约束的情况下解码出准确且逼真的全身运动。我们的结果为利用大规模颅内神经活动解码灵长类动物的自然全身运动提供了一种新颖的概念验证方法。

英文摘要

Understanding how cortical activity represents natural whole-body behaviors in primates remains challenging. Limited by the diversity of movements and inaccessibility of large-scale neural representation of whole-body kinematics, previous motor decoding studies focused on constrained tasks and limited limb movements. Here, we present a neural-behavioral recording and modeling framework for freely moving monkeys, combining large-scale epidural cortical signals from distributed sensory- and motor-related areas with synchronized multi-view motion capture through a custom-made data collection platform. We reconstructed whole-body monkey kinematics and learned a compact behavior prior using an autoregressive encoder-decoder model. Conditioned on neural signals, the model decoded accurate and realistic whole-body movement without explicit physical constraints. Our results provide a novel proof-of-concept approach for decoding natural whole-body movements in primates using large-scale intracranial neural activity.