arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.23877 2026-05-25 physics.flu-dyn q-bio.QM

Particle Image Velocimetry of 3D printed vascular fluidic phantom devices

Job van Essen, Ahmed Sharaf, Denzel Hopman, Selene Pirola, Paola Fanzio

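AI summary: This study presents an experimental framework based on transparent 3D printed vascular models and particle image velocimetry (PIV) for studying hemodynamics in microscale cerebral vessels. Microfluidic models with normal and pathological (aneurysmal, stenotic) geometries were built, and microPIV was used to measure local velocity fields and wall shear stress, confirming that the method reliably captures key flow features and velocity distributions. The results indicate that the approach is a stable and effective experimental tool for investigating microscale cerebrovascular hemodynamics.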

Abstract

Altered hemodynamics play a key role in cerebrovascular diseases such as aneurysms and stenosis. However, in vivo imaging lacks the spatial resolution required to resolve flow dynamics in small vessels. This study presents an experimental framework to investigate microscale hemodynamics using transparent 3D printed vascular models and particle image velocimetry (PIV). Optically transparent microfluidic models with straight and pathological (aneurysmal and stenotic) geometries were fabricated via additive manufacturing down to a minimum diameter of 500 microns and characterized using optical microscopy. Flow experiments were conducted under steady laminar conditions, and local velocity fields and wall shear stress (WSS) were measured using microPIV. Measured velocities were compared with analytical Hagen-Poiseuille predictions, yielding mean relative errors of 5 to 17 percent. The platform reliably captured key flow features and spatial variations in velocity. Overall, the results demonstrate that transparent 3D printed vascular models combined with microPIV provide a robust experimental approach for studying microscale cerebrovascular hemodynamics.
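
The comparison with Hagen-Poiseuille theory mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming a straight circular channel and a simple pointwise definition of mean relative error; the flow rate, viscosity, and the synthetic "measured" profile are made-up values, and the paper's actual error definition may differ.

```python
import math


def poiseuille_velocity(r, radius, flow_rate):
    """Axial velocity u(r) of steady laminar flow in a straight circular tube."""
    mean_velocity = flow_rate / (math.pi * radius**2)
    return 2.0 * mean_velocity * (1.0 - (r / radius) ** 2)


def wall_shear_stress(radius, flow_rate, viscosity):
    """Wall shear stress magnitude, tau_w = 4 * mu * Q / (pi * R^3)."""
    return 4.0 * viscosity * flow_rate / (math.pi * radius**3)


def mean_relative_error(measured, predicted):
    """Average of |measured - predicted| / |predicted| over nonzero predictions."""
    errors = [abs(m - p) / abs(p) for m, p in zip(measured, predicted) if p != 0]
    return sum(errors) / len(errors)


# Assumed values for illustration only (not taken from the paper).
radius = 250e-6      # m, i.e. a 500 micron diameter channel
flow_rate = 1e-9     # m^3/s
viscosity = 1e-3     # Pa*s, water-like fluid

radial_positions = [i * 0.1 * radius for i in range(10)]   # 0 to 0.9 R
predicted = [poiseuille_velocity(r, radius, flow_rate) for r in radial_positions]
measured = [1.10 * u for u in predicted]                    # synthetic 10% bias

print(f"mean relative error: {mean_relative_error(measured, predicted):.3f}")
print(f"wall shear stress: {wall_shear_stress(radius, flow_rate, viscosity):.4f} Pa")
```

Points at the wall are left out of the error average here because the predicted velocity vanishes there, which would make a relative error undefined.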

2605.23745 2026-05-25 q-bio.QM

On the Design of an Analog-Dyadic Converter CRN

Mathieu Hemery

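AI summary: This paper studies how to design an analog-dyadic converter chemical reaction network (CRN) that converts a molecular concentration into a binary representation. The CRN takes a concentration in [0, 1] as input and outputs a corresponding sequence of binary spikes, giving an approximate encoding of the concentration value. The paper analyzes in detail how varying the reaction rate constants affects the error, and sketches a reader module that outputs a binary encoding for a given input concentration and desired precision, providing a theoretical basis for high-precision reading of molecular concentrations.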

Journal ref
CMSB 2026 - 24th International Conference on Computational Methods in Systems Biology, Jul 2026, Lisboa, Portugal

Abstract

Chemical reaction networks (CRNs) interpreted through the differential semantics, even when restricted to elementary reactions with mass action law kinetics, form a Turing-complete language. This means that any computable real function can be programmed, and in fact compiled, into an abstract CRN that computes it with arbitrarily high precision. In this computational framework, the information carriers are the molecular concentrations, the required precision is given as input, and the output concentration is guaranteed to satisfy the required precision. On the other hand, one can be interested in estimating the derivative of an unknown input signal or in reading the concentration value of an input molecular species. By nature, such problems can only be approximated with finite precision. Hence, the computation framework proposed previously cannot be applied, and we need to design and analyze custom CRNs to perform these tasks. In this paper, we present an analog-dyadic converter CRN which takes as input one molecular concentration (in [0, 1] but not necessarily computable), and produces as output a sequence of "on" and "off" spikes corresponding to some extent to the sequence of bits in the dyadic representation of the input concentration. We provide a detailed analysis of the sources of error and their behavior when varying the reaction rate constants. We conclude by sketching a possible design for a reader module that takes as input an arbitrary concentration and a desired precision and outputs a dyadic encoding approximating the value of the concentration with the desired precision. We leave proving the correctness of our construction as an open question.
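
For readers unfamiliar with dyadic representations, the target of the conversion can be illustrated with the ordinary digital algorithm. This sketch is not the CRN from the paper; it only shows which bit sequence an ideal converter would aim for, using the standard doubling map. The function names are illustrative, and the input is restricted to [0, 1) for simplicity.

```python
def dyadic_bits(x, n_bits):
    """First n_bits of the binary expansion of x in [0, 1) via the doubling map."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    bits = []
    for _ in range(n_bits):
        x *= 2.0
        bit = int(x >= 1.0)   # the integer part after doubling is the next bit
        bits.append(bit)
        x -= bit              # keep only the fractional part
    return bits


def dyadic_value(bits):
    """Value encoded by a finite bit sequence: sum of b_i / 2^(i + 1)."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))


bits = dyadic_bits(0.625, 4)
print(bits, dyadic_value(bits))
```

Truncating after n bits leaves an error below 2^(-n), which is the sense in which a desired precision fixes the number of bits a reader module would need to output.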

2502.20349 2026-05-25 q-bio.NC cs.AI

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior

Wilka Carvalho, Andrew Lampinen

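AI summary: This paper discusses how recent advances in artificial intelligence can help build generalizable cognitive science theories that cover the full range of natural situations and behaviors. It argues that more naturalistic experimental paradigms and computational models help characterize natural intelligence more accurately and improve how well theories generalize. The paper reviews related work in cognitive science, neuroscience, and AI, and proposes that integrating progress across these fields makes it possible to better explain and model human cognition while retaining experimental control and theoretical depth.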

Abstract

How can cognitive science build generalizable theories that span the full scope of natural situations and behaviors? We argue that progress in Artificial Intelligence (AI) offers timely opportunities for cognitive science to embrace experiments with increasingly naturalistic stimuli, tasks, and behaviors; and computational models that can accommodate these changes. We first review a growing body of research spanning neuroscience, cognitive science, and AI that suggests that incorporating a broader range of naturalistic experimental paradigms, and models that accommodate them, may be necessary to resolve some aspects of natural intelligence and ensure that our theories generalize. We review cases from cognitive science and neuroscience where naturalistic paradigms elicit distinct behaviors or engage different processes. We then discuss recent progress in AI that shows that learning from naturalistic data yields qualitatively different patterns of behavior and generalization, and examine how these findings impact the conclusions we draw from cognitive modeling, and can help yield new hypotheses for the roots of cognitive and neural phenomena. We then suggest that integrating recent progress in AI and cognitive science will enable us to engage with more naturalistic phenomena without giving up experimental control or the pursuit of theoretically grounded understanding. We offer practical guidance on how methodological practices can contribute to cumulative progress in naturalistic computational cognitive science, and illustrate a path towards building computational models that solve the real problems of natural cognition, together with a reductive understanding of the processes and principles by which they do so.

2605.23669 2026-05-25 physics.bio-ph q-bio.NC

Geometric Origin of Exact Mean-Field Reductions: Möbius Symmetry and the Lorentzian Ansatz

Hugues Berry, Leonardo Trujillo

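AI summary: This paper reveals the geometric origin of the Lorentzian Ansatz used to describe large systems of coupled oscillators and spiking neurons. The Lorentzian distribution holds a privileged place under Riccati dynamics because it is the unique two-dimensional family of continuous probability densities that is invariant under projective transport. This conclusion gives a unified geometric foundation for exact reductions such as Ott-Antonsen and Montbrió-Pazó-Roxin, and explains why Gaussian closures fail.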

Abstract

Low-dimensional descriptions of large systems of coupled oscillators and spiking neurons rely heavily on the Lorentzian Ansatz. We show that its privileged role is geometric rather than heuristic: for the transport induced by Riccati dynamics, the Cauchy-Lorentz family indeed emerges as the unique connected two-dimensional family of continuous probability densities that is invariant under the induced projective transport. The key step of the demonstration is to reformulate the dynamics on the circle, where the problem reduces to the uniqueness of the rotation-invariant probability measure. Under stereographic projection, this yields the standard Cauchy law and, under the full projective action, the Lorentzian family. This result gives a unified geometric foundation for the Ott-Antonsen [Chaos 18, 037113 (2008)] and Montbrió-Pazó-Roxin [Phys. Rev. X 5, 021028 (2015)] reductions, explains the failure of Gaussian closures, and identifies the structural condition underlying exact two-parameter reductions.
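
The invariance at the heart of the abstract can be checked numerically on a small example. The sketch below is not the paper's proof; it relies on the standard fact that a Cauchy-Lorentz law with complex parameter w = x0 + i*gamma is mapped by a real Möbius transformation with ad - bc > 0 to the Cauchy-Lorentz law whose parameter is the image of w. The specific numbers are arbitrary.

```python
import math


def cauchy_pdf(x, w):
    """Cauchy-Lorentz density with complex parameter w = x0 + i*gamma, gamma > 0."""
    x0, gamma = w.real, w.imag
    return gamma / (math.pi * ((x - x0) ** 2 + gamma**2))


def mobius(z, a, b, c, d):
    """Möbius (projective) map z -> (a z + b) / (c z + d)."""
    return (a * z + b) / (c * z + d)


def pushforward_pdf(y, w, a, b, c, d):
    """Density of Y = (aX + b)/(cX + d) for X ~ Cauchy(w), by change of variables."""
    x = (d * y - b) / (a - c * y)                      # inverse Möbius map
    jacobian = abs(a * d - b * c) / (a - c * y) ** 2   # |dx/dy|
    return cauchy_pdf(x, w) * jacobian


w = complex(0.5, 1.5)               # centre 0.5, half-width 1.5 (arbitrary)
a, b, c, d = 2.0, 1.0, 1.0, 3.0     # real coefficients with ad - bc = 5 > 0
w_image = mobius(w, a, b, c, d)

for y in (-3.0, -1.0, 0.0, 0.7, 5.0):
    print(y, pushforward_pdf(y, w, a, b, c, d), cauchy_pdf(y, w_image))
```

The two printed densities should agree at every sample point, so the whole distribution is tracked by a single complex number moving under the same Möbius map. Repeating the experiment with a Gaussian density in place of `cauchy_pdf` would not produce a Gaussian pushforward.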

2605.23521 2026-05-25 q-bio.GN

Population-Specific Genetic and Non-Genetic Influences on Sleep Traits and Health Outcomes

Jiheum Park, Stephanie Y. Shue, Rocio Barragan, Jeong Yun Yang, Tian Gu, Chin Hur, Marie-Pierre St-Onge

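AI summary: This study examines how genetic and non-genetic factors influence sleep traits and health outcomes across population groups. Using electronic health records, genomic data, and wearable data from the All of Us Research Program, it analyzes how genetic variants associated with sleep duration and chronotype relate to health measures such as obesity, diabetes, and cardiovascular disease, and finds that measured sleep duration attenuates these genetic associations to some extent. The study highlights the importance of sleep-related health research tailored to specific populations.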

Abstract

Sleep traits are shaped by genetic and environmental factors and may influence many health conditions. The All of Us Research Program, which includes EHR, physical measurements, genomic data, and wearable data across ancestry groups, provides an opportunity to study genetic and non-genetic contributors to sleep-related health outcomes. We examined associations between genetic predispositions to chronotype, sleep duration, and short sleep and health outcomes across ancestries, as well as the role of measured sleep duration. We used All of Us genome-wide association study results, including ancestry-specific and meta-analyses for 3,414 phenotypes, to identify phenotypes associated with 455 sleep-related SNPs. Cross-sectional and longitudinal analyses (n = 212,529) evaluated associations between polygenic risk scores (PRS) and anthropometric and metabolic measures from EHR. A subgroup analysis (n = 7,655) assessed sleep duration using Fitbit data. Across six ancestry groups, SNP analysis identified 61 phenotypes linked to 29 sleep-trait-associated SNPs. The chronotype SNP rs1421085 in FTO showed the strongest associations with obesity, diabetes, and cardiovascular conditions, mainly in European, American, and African groups. PRS analysis showed that higher predisposition to shorter sleep duration was associated with increased risk of obesity and diabetes, with ancestry-specific variation. Measured sleep duration attenuated these associations, with relative contributions of 85.6%-99.9% in cross-sectional analyses and 7.1%-44.0% in longitudinal analyses compared with PRS. This study identified health conditions associated with genetic predispositions to sleep traits and suggests that actual sleep duration may play a prominent role in sleep-related health outcomes. Differences among meta-, pooled-, and ancestry-specific analyses highlight the importance of population-specific research.
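
As background for the polygenic risk score (PRS) analyses, a PRS in its simplest form is a weighted sum of effect-allele counts. The sketch below assumes that plain additive form; the effect sizes and genotypes are invented, and the abstract does not say how the study's scores were actually constructed or normalized.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per SNP)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per SNP")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Hypothetical per-SNP effect sizes and one individual's genotype.
effect_sizes = [0.10, -0.05, 0.20]
dosages = [2, 1, 0]

print(polygenic_risk_score(dosages, effect_sizes))
```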

2605.23164 2026-05-25 q-bio.PE

Tread lightly interpreting group differences in genetic risk

Nicole Kleman, Meng Lin, Christopher R. Gignoux, Arslan A. Zaidi

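AI summary: With the growth of large-scale genomic studies and polygenic risk prediction, differences in mean phenotype between human groups have drawn wide attention. The genetic basis of these differences, however, is far more complex than commonly assumed, and differences in allele frequency between groups do not necessarily imply differences in mean genetic value. The paper points out that existing methods are limited in separating true genetic differences from statistical biases (such as population structure, sample ascertainment bias, and poor cross-population portability), so conclusions about group differences in genetic risk should be treated with caution.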

Abstract

Observed differences in mean phenotypic values across human groups have attracted renewed interest with the rise of large-scale genomic studies and polygenic risk prediction. However, the genetic basis of these differences is far more difficult to establish than is often appreciated. Populations can diverge in allele frequencies without diverging in mean genetic value. Empirical approaches to infer whether populations differ in mean genetic value fall under two broad categories: top-down approaches, which quantify the proportion of phenotypic variance explained by ancestry, and bottom-up approaches, which compare polygenic scores across groups. However, both approaches have limitations that prevent them from reliably distinguishing true differences in genetic value from statistical artifacts like population structure, ascertainment bias, and poor cross-ancestry portability. Further, observed phenotypic shifts between populations may reflect bias in phenotype measurement and heterogeneity in study design rather than underlying genetic drivers. We argue that claims about group differences in genetic risk should be interpreted with considerable caution.
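
The point that allele frequencies can diverge while mean genetic value stays put is easy to see in a toy additive model. The sketch below assumes the mean genetic value is the sum over loci of 2 * p * beta (diploid, purely additive, no dominance or interaction); the two populations and effect sizes are invented for illustration.

```python
def mean_genetic_value(allele_freqs, effect_sizes):
    """Mean additive genetic value: sum over loci of 2 * p_i * beta_i."""
    return sum(2.0 * p * beta for p, beta in zip(allele_freqs, effect_sizes))


effect_sizes = [0.5, 0.5]      # hypothetical equal effects at two loci
population_a = [0.2, 0.8]      # effect-allele frequencies in population A
population_b = [0.8, 0.2]      # strongly diverged frequencies in population B

print(mean_genetic_value(population_a, effect_sizes))
print(mean_genetic_value(population_b, effect_sizes))
```

Both populations end up with the same mean even though every locus differs by 0.6 in frequency, because the shifts at the two loci cancel. A score built from only one of the two loci would instead suggest a large group difference, which is one way incomplete or biased sets of variants can mislead.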

2605.23161 2026-05-25 q-bio.QM

Abstract relational structures in models of biology

Léo Diaz, Sean T. Vittadello, Michael P. H. Stumpf

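AI summary: This paper proposes a mathematical formalism called the "systems hypergraph" for representing abstract relational structures in biological systems more precisely. It combines a hypergraph with a hierarchical attribute structure, making dependencies between system properties explicit and avoiding the ambiguity and redundancy of traditional models. By expressing chemical reaction networks and stochastic Petri nets, two formalisms widely used in systems biology, as systems hypergraphs, the study reveals a strict inclusion between them: stochastic Petri nets are more expressive than chemical reaction networks. This illustrates the value of abstraction in modeling complex biological systems.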

Abstract

The mathematical formalisms used to model biological systems induce both latent and ambiguous assumptions that can limit or distort their representational capabilities. Developing formalisms that can represent systems more precisely is fundamental to comprehending their intricacies and complexities. Here we introduce the systems hypergraph, a general and extendable formalism for representing abstract relational systems. A systems hypergraph combines a hypergraph, representing multidimensional relations among objects, with a hierarchical system of attributes representing system properties and their interdependencies. The attribute structure ensures that dependencies between system properties are patent and unambiguous, thereby clarifying assumptions and avoiding redundancy in data association. As an application we consider two formalisms widely used in systems biology - chemical reaction networks and stochastic Petri nets - and study their natural representation as systems hypergraphs. This allows us to relate the two formalisms rigorously, demonstrating in particular that stochastic Petri nets are strictly more general than chemical reaction networks in contrast to their commonly assumed equivalence. More broadly our work demonstrates the power of abstraction, and in particular its role in mediating between objects and relations in mathematical representations of biological complexity.

2605.23126 2026-05-25 q-bio.PE

Asymptotic Counting of Binary Phylogenetic Networks

Hao Yu, Louxin Zhang

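AI summary: This paper studies the asymptotic counting of binary phylogenetic networks with $k$ reticulations on $n$ taxa, where $k$ may grow with $n$. By analyzing the local structures that affect the number of network constructions under edge insertion, and combining upper bounds on the contribution of exceptional local structures with known asymptotic formulas for tree-child networks, the authors obtain an asymptotic expression for the number of such networks when $k=o(\sqrt{n})$. The result provides a mathematical foundation for understanding network structure under complex evolutionary processes.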

Comments
24 pages, 7 figures

Abstract

Phylogenetic networks provide a general framework for modeling reticulate evolutionary processes such as hybridization, recombination, and horizontal gene transfer. In this paper, we study the asymptotic counting of binary phylogenetic networks with $k$ reticulations on $n$ taxa, where $k$ is allowed to grow with $n$. Using edge insertion, we analyze the local structures that affect the number of possible constructions of such networks. By bounding the contribution of networks with exceptional local configurations and combining these bounds with known asymptotic formulas for tree-child networks, we show that, when $k=o(\sqrt n)$, the number of binary phylogenetic networks with $k$ reticulations on $n$ taxa is asymptotic to \[ \binom{n}{k}2^{n+k-1/2}n^{n+k-1}e^{-n}. \]
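
The asymptotic expression grows far too quickly to evaluate directly, so a quick way to explore it is in log space. The sketch below evaluates the formula from the abstract with `math.lgamma` and, as a sanity check that is not taken from the paper, compares the $k=0$ case with the classical count $(2n-3)!!$ of rooted binary phylogenetic trees on $n$ taxa, assuming that a network without reticulations is such a tree.

```python
import math


def log_asymptotic_count(n, k):
    """Natural log of C(n, k) * 2^(n + k - 1/2) * n^(n + k - 1) * e^(-n)."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_binom
            + (n + k - 0.5) * math.log(2.0)
            + (n + k - 1) * math.log(n)
            - n)


def log_rooted_binary_tree_count(n):
    """Natural log of (2n - 3)!! = (2n - 2)! / (2^(n - 1) * (n - 1)!)."""
    return math.lgamma(2 * n - 1) - (n - 1) * math.log(2.0) - math.lgamma(n)


n = 1000
ratio = math.exp(log_rooted_binary_tree_count(n) - log_asymptotic_count(n, 0))
print(f"exact tree count / asymptotic formula at k=0, n={n}: {ratio:.5f}")
print(f"log of the asymptotic count at k=5, n={n}: {log_asymptotic_count(n, 5):.2f}")
```

The ratio approaches 1 as $n$ grows, which is what "asymptotic to" means here; for $k>0$ the formula is only claimed in the regime $k=o(\sqrt n)$.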

2605.23035 2026-05-25 cs.CL cs.AI q-bio.NC

Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Dongxin Guo, Jikun Wu, Siu Ming Yiu

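AI summary: This study examines the correspondence between intermediate layers of large language models (LLMs) and human brain responses to language, and uses sparse autoencoders (SAEs) to explain it mechanistically. Combining SAEs with neural encoding models, the authors decompose GPT-2 XL and Llama-3.1-8B into 16K to 32K interpretable features per layer and confirm the dominant role of semantic features in predicting brain encoding performance. They further show that SAE-extracted semantic features reproduce the semantic topography of the cortex and generalize well across several languages.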

Comments
Accepted at CoNLL 2026. 20 pages (9 main + 1 limitations/acknowledgments + 3 references + 7 appendix), 5 figures, 20 tables

Abstract

Intermediate layers of large language models (LLMs) best predict human brain responses to language, one of the most robust findings in computational neurolinguistics, yet why remains mechanistically unexplained. We address this gap by bridging sparse autoencoders (SAEs) from mechanistic interpretability with neural encoding models, decomposing GPT-2 XL and Llama-3.1-8B into 16K-32K interpretable features per layer. A human-validated taxonomy ($κ\geq 0.74$) reveals that semantic features alone recover 94% of peak encoding performance ($r=0.285$), substantially exceeding variance-matched baselines ($p<0.001$, $d=1.31$). Beyond this aggregate dominance, we test a novel cortical topography prediction: five semantic subcategories derived a priori from three independent neuroscience programs should map onto distinct brain regions. A formal convergence test confirms this alignment (Spearman $ρ=0.72$, $p<0.001$; hypergeometric $p=0.007$), demonstrating that SAE-discovered features recapitulate known cortical semantic organization at a granularity inaccessible to prior methods. SAE features further predict human reading times beyond lexical controls ($Δ\mathrm{logLik}=38.4$, $p<0.001$), and an exploratory prediction-error analysis provides preliminary evidence that the brain additionally encodes unexpected semantic content. Results generalize across English, Chinese, and French.

2605.23032 2026-05-25 cs.CL cs.AI q-bio.NC

Brain-LLM Alignment Tracks Training Data, Not Typology

Dongxin Guo, Jikun Wu, Siu Ming Yiu

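AI summary: This study asks whether brain-LLM alignment patterns generalize across languages, and finds that they are determined mainly by the dominant training language of the model rather than by properties of English itself. Comparing fMRI data in several languages against LLMs with different dominant languages, it finds that a Chinese-dominant model aligns best with Chinese brains and worst with English brains. Typological distance, gradient differences in syntax-associated brain regions, and tokenization granularity also significantly affect alignment, indicating that the previously observed "English advantage" stems mainly from training data composition rather than from language structure.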

Comments
Accepted to CoNLL 2026. 9 pages main content + 4 pages references + 6 pages appendix; 4 figures, 13 tables

Abstract

Brain-LLM alignment is well established in English, yet the brain's language network is neuroanatomically universal across languages. Does alignment also generalize cross-linguistically, and what governs the variation? We test this using fMRI data from 112 participants across English, Chinese, and French (the Le Petit Prince corpus) and seven LLMs spanning English-dominant, Chinese-dominant, and multilingual architectures. Our central finding is that training-language dominance, not an inherent property of English, drives the alignment pattern: a Chinese-dominant model (Baichuan2-7B), architecture-matched to LLaMA-2-7B, reverses the gradient entirely, aligning best with Chinese brains and worst with English. Beyond training dominance, formal typological distance independently covaries with alignment degradation, syntax-associated brain regions (IFG) show $2.3\times$ steeper typological gradients than lexico-semantic regions (PTL), and tokenization fertility accounts for $\sim$60% of a cross-linguistic shift in optimal encoding layer. These results reveal that the apparent "English advantage" in brain-LLM alignment is an artifact of training data composition, while the remaining variation reflects genuine typological structure concentrated in syntactic processing.
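
Tokenization fertility is commonly defined as the average number of subword tokens produced per word. The sketch below assumes that definition and uses a toy fixed-length chunker as a stand-in tokenizer; `toy_tokenize` is hypothetical, and the abstract does not specify how the authors compute fertility or segment words in Chinese.

```python
def tokenization_fertility(words, tokenize):
    """Average number of tokens per word under the given tokenizer."""
    if not words:
        raise ValueError("at least one word is required")
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


def toy_tokenize(word, piece_len=3):
    """Hypothetical stand-in tokenizer: split a word into fixed-length chunks."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


words = ["the", "little", "prince"]
print(tokenization_fertility(words, toy_tokenize))
```

In practice the `tokenize` argument would be each model's own tokenizer, so the same text can have different fertility under an English-dominant and a Chinese-dominant vocabulary.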

2605.23012 2026-05-25 q-bio.NC

Integrating Cognitive Load and Embodied Cognition Theories Through Representations as Multi-Scale Attractors

David C. Gibson, Mary Elizabeth Azukas, Meryem Yilmaz Soylu

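AI summary: This paper proposes a formal unification of cognitive load theory and embodied cognition by reconceptualizing psychological representations as dynamic multiscale attractors. The two theories appear contradictory but describe complementary processes at different timescales: compressed representations at medium timescales and fast sensorimotor loops, respectively. Drawing on dynamical systems theory and hierarchical predictive processing, the paper argues that learning is attractor sculpting across temporal layers and advances five testable predictions, offering a new theoretical basis for instructional design and educational practice.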

Abstract

This article proposes a formal rapprochement between cognitive load theory and embodied cognition by reconceptualizing psychological representations as dynamic multiscale attractors within a temporal-hierarchical prediction architecture. The apparent conflict between the two theories dissolves when viewed through a complex systems lens. Cognitive load theory describes compressed representations operating at medium timescales, while embodied cognition describes fast sensorimotor loops. These two theories describe complementary, timescale-separated processes that operate simultaneously without contradiction. Drawing on dynamical systems theory, hierarchical predictive processing, and a six-node open-systems architecture, the article proposes that learning is best understood as attractor sculpting across coupled temporal layers, from millisecond sensorimotor loops through seconds-to-minutes working memory compression to the slow, years-long reshaping of knowledge structures. Three theoretical reconciliations are developed: time-scale separation, spatially extended hierarchies, and developmental trajectories from novice to expert configurations. From these understandings, five novel, testable predictions are advanced concerning cross-timescale interference, embodied load reduction, metacognition as timescale coupling, feedback topology, and the schema flexibility paradox. For each prediction, converging empirical evidence is reviewed, and formal empirical research designs are proposed. Implications for instructional design, assessment practice, and educational leadership are developed throughout, grounded in the principle that cognitive load and embodied engagement are not competing demands but complementary expressions of a unified temporal-hierarchical cognitive system.

2605.22988 2026-05-25 q-bio.NC cs.LG cs.RO cs.SY eess.SY

Active Sensing Subserves Task-Level Control

Andrew Lamperski, Debojyoti Biswas, Eric S. Fortune, John Guckenheimer, Kathleen Hoffman, Noah J. Cowan

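AI summary: This paper examines the role of active sensing in task-level control and proposes that active sensing is not driven by sensory goals but is a necessary part of task control. Combining empirical biological data with mathematical theory, it shows that active sensing behaviors typically occur in discrete epochs, with animals switching between "explore" and "exploit" modes and achieving feedback control through adaptive sensors and mode switching. This strategy is ubiquitous in biological systems but rarely used in engineered systems, suggesting that current robotic control architectures leave room for improvement.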

Abstract

Active sensing is traditionally defined as the expenditure of energy, typically in the form of movement, for obtaining information. Here, we propose that the combination of reliance on adaptive sensors, the linkage between movement and sensing, and task-level control inevitably gives rise to the emergence of active sensing movements. In this way, active sensing is not driven by sensory goals, such as minimizing uncertainty about the state, but rather is necessary for task-level control. This hypothesis, that active sensing subserves control, is supported by both empirical data from organisms and mathematical theory. Interestingly, active sensing behaviors often occur in discrete epochs, interspersed with goal-oriented behavior. This suggests that animals switch between two behavioral modes with distinct control policies, an "explore" mode in which animals produce dynamic movements to shape sensory feedback, and an "exploit" mode in which animals produce slower compensatory movements that are directly related to achieving task goals. This strategy for feedback control that relies on adaptive sensors, active sensing, and mode switching is not commonly used in engineered systems despite being ubiquitous in biology. Engineered systems comprising state-of-the-art sensors, actuators, and mechanical designs can outperform animals with respect to "cost functions" such as maximum force generation, precision, and speed. Nevertheless, animals routinely achieve robust, graceful behaviors that are currently unmatched by engineered systems, suggesting that current control systems are insufficient. These insights, expressed in the language of control theory, may be critical for improving robotic sensing and control.

2605.22968 2026-05-25 q-bio.QM cs.LG stat.ML

Uncertainty-aware classification and triage of structural heart disease using electrocardiography and echocardiography metrics

Mitchel J. Colebank

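AI summary: This study explores uncertainty-aware classification and triage of structural heart disease (SHD) using electrocardiogram (ECG) and echocardiography metrics. It compares frequentist and Bayesian neural network classifiers for SHD detection and finds the Bayesian approach advantageous in classification performance and uncertainty quantification. It also shows how uncertainty-aware classification can be applied to SHD screening, offering a feasible way to use machine learning to assist triage and optimize the allocation of medical resources.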

Comments
15 pages, 5 figures

Abstract

Machine learning methods provide a methodological innovation that can help screen for cardiovascular disease through noninvasive and readily available measurement modalities. Recent investments in using electrocardiogram (ECG) data to screen for structural heart disease (SHD) are one example, where ECGs provide a low-cost, available modality for screening. This has led to the EchoNext dataset, a paired ECG-echocardiogram data repository for testing new methods of SHD detection. However, relatively few studies have investigated how more probabilistic classification through Bayesian inference may improve uncertainty quantification in this setting. Moreover, few studies have considered how triage systems can be developed to alleviate healthcare bottlenecks, such as the review of data from underserved, rural clinics by expert sonographers for SHD assessment. In this study, we leverage existing ECG-echocardiogram data to compare frequentist and Bayesian neural network classifiers. We show that the Bayesian approach is comparable to or better than frequentist methods in SHD classification, and that it comes with more robust uncertainty quantification. We provide an example of how this uncertainty-aware classification scheme can be used for screening SHD, providing a proof of concept for how machine learning can help with triage by getting individuals expert sonographer input when SHD is highly likely or measurements are highly uncertain.
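
The triage idea in the last sentence, refer when SHD is highly likely or when the prediction is highly uncertain, can be sketched as a simple decision rule. This is an illustrative sketch only: it assumes the Bayesian classifier yields several sampled probabilities per patient, uses the entropy of their mean as the uncertainty measure, and picks arbitrary thresholds; none of these choices are taken from the paper.

```python
import math


def predictive_summary(prob_samples):
    """Mean positive-class probability and its binary entropy in bits."""
    p = sum(prob_samples) / len(prob_samples)
    if p <= 0.0 or p >= 1.0:
        entropy = 0.0
    else:
        entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return p, entropy


def triage(prob_samples, prob_threshold=0.8, entropy_threshold=0.9):
    """Refer when disease is likely or the prediction is too uncertain."""
    p, entropy = predictive_summary(prob_samples)
    if p >= prob_threshold:
        return "refer_likely"
    if entropy >= entropy_threshold:
        return "refer_uncertain"
    return "routine"


# Hypothetical posterior samples for three patients.
print(triage([0.90, 0.95, 0.85]))
print(triage([0.30, 0.70, 0.50]))
print(triage([0.05, 0.10, 0.02]))
```

Other uncertainty measures, such as the spread of the sampled probabilities, could replace the entropy term; the rule structure stays the same.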

2605.22962 2026-05-25 cs.CV cs.CE cs.HC cs.SE q-bio.NC

GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction

Iba Baig, Kevin Li, Yanbin Xu, Seiji Cattelain, Marie Hallo, Hayato Ono, Sho Tsuji, Ming Bo Cai

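AI summary: This study presents the GazeBehavior Annotation Toolkit (GBAT), an AI-based tool for automatically annotating egocentric eye-tracking and video data from child-caregiver interactions. Using deep learning, the tool performs post-hoc synchronization across multiple videos, semi-automatic annotation of gaze targets, and classification of participants' poses and hand actions, significantly improving the efficiency and scalability of data preprocessing and feature extraction. It supports large-scale, long-term studies of attentional dynamics and naturalistic behavior in early human development.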

Comments
submitted to IEEE International Conference on Development and Learning (ICDL), 2026

Abstract

Video recordings of child-caregiver interactions enable investigation of attentional dynamics during naturalistic behavior. Such multimodal recording also allows researchers to examine how attention interacts with action and language use in real time. However, manual annotation of such data is time-consuming. Here, we introduce GazeBehavior Annotation Toolkit, a deep-learning-based toolkit designed to facilitate three key processes in data preprocessing and feature extraction: post-hoc synchronization across multiple videos, semi-automatic annotation of gaze target categories, and categorization of participants' poses and hand actions. This toolkit improves the efficiency and scalability of feature extraction from human egocentric eye-tracking and video data. Such improvement is critical in supporting large-scale and longitudinal investigations of attentional dynamics and naturalistic behavior in human early development.

2605.04118 2026-05-25 q-bio.QM cs.AI

ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation

Cong Liu, Milong Ren, Jiaqi Guan, Chengyue Gong, Jinyuan Sun, Xinshi Chen, Wenzhi Xiao

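AI summary: This paper proposes ProtDBench, a unified benchmark framework for protein binder design and evaluation, intended to address the difficulty of comparing performance metrics across studies that use inconsistent evaluation standards. The framework defines standardized tasks, evaluation protocols, and success criteria, introduces metrics based on a fixed budget and on structural diversity, and reveals how different verifiers and filtering rules affect measured performance. ProtDBench offers a fair and reproducible evaluation system for protein binder design methods and supports systematic comparison under realistic conditions.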

English abstract

Recent advances in de novo protein binder design have enabled increasing experimental validation, yet reported in silico metrics remain difficult to interpret or compare across studies due to non-standardized evaluation protocols. We introduce ProtDBench, a standardized and throughput-aware evaluation framework for protein binder design. ProtDBench defines unified benchmark tasks, evaluation protocols, and success criteria, enabling systematic analysis of how evaluation design influences observed performance. Using a large wet-lab annotated dataset, we analyze commonly used structure prediction models as evaluation verifiers, revealing substantial verifier-dependent bias and limited agreement under identical filtering protocols. We then benchmark representative open-source generative binder design methods across ten diverse protein targets under a fixed evaluation protocol. Beyond per-sequence success rates, ProtDBench incorporates throughput-aware metrics based on a fixed 24-hour budget, as well as cluster-level success criteria to account for structural diversity. Together, these results expose systematic differences induced by filtering rules, success definitions, and throughput-aware evaluation between computational efficiency, success rate, and structural diversity. Overall, ProtDBench provides a fair and reproducible evaluation pipeline that supports systematic and controlled comparison of protein binder design methods under realistic evaluation settings.
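
As a rough illustration of what a throughput-aware metric and a cluster-level success criterion could look like, the sketch below estimates how many successful designs a method yields inside a fixed 24-hour budget and counts the structural clusters that contain at least one passing design. The class, the function names, the per-design runtimes, and the success rates are all hypothetical; they are not taken from ProtDBench, whose exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class MethodRun:
    """Hypothetical summary of one binder design method."""
    name: str
    seconds_per_design: float  # assumed wall-clock cost of one design
    success_rate: float        # assumed per-sequence in silico success rate


def expected_successes(run: MethodRun, budget_hours: float = 24.0) -> float:
    """Expected number of passing designs produced within the time budget."""
    n_designs = int(budget_hours * 3600 // run.seconds_per_design)
    return n_designs * run.success_rate


def cluster_level_successes(cluster_ids, passed) -> int:
    """Number of structural clusters with at least one passing design."""
    return len({c for c, ok in zip(cluster_ids, passed) if ok})


if __name__ == "__main__":
    slow = MethodRun("slow_accurate", seconds_per_design=600.0, success_rate=0.20)
    fast = MethodRun("fast_noisy", seconds_per_design=60.0, success_rate=0.05)
    for run in (slow, fast):
        print(run.name, expected_successes(run))
    # Five designs in three clusters; clusters 0 and 2 each have a passing design.
    print(cluster_level_successes([0, 0, 1, 2, 2], [True, True, False, False, True]))
```

With these made-up numbers the faster method wins on budgeted yield despite a lower per-sequence success rate, which is the kind of ranking change a throughput-aware view is meant to surface.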

2603.22297 2026-05-25 q-bio.NC physics.bio-ph

Black Hole-Inspired Horizon Model for Neural Signal Dynamics

黑洞启发式神经信号动力学视界模型

E. Canessa

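AI summary: This paper proposes a model inspired by black hole horizons to describe the dynamics of neural signals. It treats electroencephalographic (EEG) signals as projections of a wave-like representation constrained by an effective boundary analogous to an event horizon, with a renormalization-group scaling relation governing the amplitude and spectral entropy parameterizing the accessibility of observable modes. The approach establishes a physical link between entropy-based neural observables and wave-like signal representations, offering a new theoretical framework for the scale-dependent dynamics of neural oscillations.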

Details
Comments
4 pages, 2 figures
AI Chinese abstract

脑电图(EEG)信号提供了复杂神经动力学的宏观可观测。我们引入了一个受视界启发的框架,其中测量的EEG信号被建模为一个受有效边界(类似于事件视界)约束的复杂类波表示的投影。在该公式中,信号幅度服从重整化群标度关系,而EEG谱熵参数化可观测模式的可及性。所得解产生振荡结构,其几何和谱特征可通过信号分析和声化进行探索。这种基于熵的神经可观测与类波信号表示之间的映射提供了一个物理动机的框架,连接了熵度量、尺度依赖动力学和可观测的神经振荡,并提出了谱熵与EEG模式幅度标度之间的可检验联系。

English abstract

Electroencephalographic (EEG) signals provide macroscopic observables of complex neural dynamics. We introduce a horizon-inspired framework in which measured EEG signals are modeled as projections of a complex wave-like representation constrained by an effective boundary analogous to an event horizon. In this formulation the signal amplitude obeys a renormalization-group scaling relation while EEG spectral entropy parameterizes the accessibility of observable modes. The resulting solutions generate oscillatory structures whose geometry and spectral signatures can be explored through signal analysis and sonification. This mapping between entropy-based neural observables and wave-like signal representations provides a physically motivated framework linking entropy measures, scale-dependent dynamics, and observable neural oscillations, and suggests testable connections between spectral entropy and the amplitude scaling of EEG modes.
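
For readers unfamiliar with spectral entropy, the sketch below computes the usual Shannon entropy of a normalized power spectrum with NumPy. It is a generic estimator offered under the assumption that the abstract refers to this standard definition; the function name and the choice to drop the DC bin and normalize by the number of bins are illustrative choices, not the author's stated procedure.

```python
import numpy as np


def spectral_entropy(signal, normalize=True):
    """Shannon entropy of the normalized periodogram of a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the constant offset
    psd = np.abs(np.fft.rfft(x)) ** 2     # raw periodogram
    psd = psd[1:]                         # drop the DC bin
    total = psd.sum()
    if total == 0:
        return 0.0                        # a flat signal carries no spectral spread
    p = psd / total                       # treat the spectrum as a distribution
    p = p[p > 0]                          # avoid log(0)
    h = -np.sum(p * np.log2(p))
    if normalize:
        h /= np.log2(psd.size)            # scale to the range [0, 1]
    return float(h)


if __name__ == "__main__":
    n = np.arange(1024)
    tone = np.sin(2 * np.pi * 10 * n / 1024)            # one spectral line
    noise = np.random.default_rng(0).normal(size=1024)  # broadband signal
    print("tone :", spectral_entropy(tone))
    print("noise:", spectral_entropy(noise))
```

A narrowband oscillation gives a value near 0 and broadband noise a value near 1, which is why the quantity is a natural candidate for parameterizing how many modes are effectively observable.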

2602.13249 2026-05-25 q-bio.BM cs.AI cs.LG

A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

小分子学习的共折叠模型表示的系统评估

Hyosoon Jang, Hyunjin Seo, Honghui Kim, Seonghyun Park, Taewon Kim, Yunhui Jang, Sungsoo Ahn

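AI summary: This paper systematically evaluates how well representations from protein-ligand co-folding models serve small-molecule learning. Using Boltz2, a modern co-folding model, the authors transfer its atom-level ligand representations to standalone small-molecule tasks and find that they match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. The Boltz2 representations are also complementary to those learned from conventional standalone molecular supervision and can be applied in reinforcement learning for molecular discovery. These results suggest that protein-ligand co-folding is a promising pretraining paradigm for small-molecule representation learning.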

Details
AI Chinese abstract

小分子基础模型通常仅在独立分子数据上进行预训练,这与视觉和语言模型不同,后者通常受益于跨模态或关系监督。蛋白质-配体共折叠通过将模型暴露于原子级配体-蛋白质相互作用,提供了这种监督的分子类似物,引发了一个问题:共折叠模型能否产生强大的小分子表示。我们使用现代共折叠模型Boltz2研究这个问题,通过将其原子级配体表示转移到独立的小分子任务。通过系统探测和蒸馏,我们表明Boltz2表示在ADMET基准上匹配或超越现有模型,加速分子生成建模,并提高结构引导配体优化的样本效率。我们进一步发现Boltz2表示与从传统独立分子监督(包括3D构象、生物测定标签和量子化学性质)中学习到的表示互补。最后,我们将表示对齐扩展到强化学习,表明密集的表示级监督可以补充分子发现中的标量奖励。这些结果将蛋白质-配体共折叠确定为小分子表示学习的有前景的预训练范式,并将Boltz2定位为强大的现成分子基础模型。

English abstract

Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue of such supervision by exposing models to atom-level ligand-protein interactions, raising the question of whether co-folding models can yield strong small-molecule representations. We study this question using Boltz2, a modern co-folding model, by transferring its atom-level ligand representations to standalone small-molecule tasks. Through systematic probing and distillation, we show that Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. We further find that Boltz2 representations are complementary to those learned from conventional standalone molecular supervision, including 3D conformers, bioassay labels, and quantum-chemical properties. Finally, we extend representation alignment to reinforcement learning, showing that dense representation-level supervision can complement scalar rewards in molecular discovery. These results identify protein-ligand co-folding as a promising pretraining paradigm for small-molecule representation learning and position Boltz2 as a strong, off-the-shelf molecular foundation model.

2601.12805 2026-05-25 q-bio.GN cs.AI cs.CL

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

SciHorizon-GENE:从基因知识到功能理解的生命科学推理基准测试

Xiaohan Huang, Meng Xiao, Chuan Qin, Qingqing Long, Jinmiao Chen, Yuanchun Zhou, Hengshu Zhu

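AI summary: This study presents SciHorizon-GENE, a large-scale gene-centric benchmark for evaluating how well large language models reason in the life sciences, in particular from gene knowledge to functional understanding. The benchmark integrates authoritative biological knowledge for over 190K genes and contains more than 540K questions covering cell type annotation, functional interpretation, and mechanism-oriented analysis, and it evaluates models from four biologically critical perspectives. The authors systematically evaluate a range of state-of-the-art general-purpose and biomedical language models, reveal substantial differences in gene-level reasoning ability, and provide guidance for model selection and improvement.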

Details
Comments
Accepted by SIGKDD 2026. 12 pages
AI Chinese abstract

大型语言模型(LLMs)在生物医学研究中展现出日益增长的潜力,尤其是在知识驱动的解释任务中。然而,它们从基因知识到功能理解的可靠推理能力——这是知识增强型细胞图谱解释的核心要求——仍然在很大程度上未被探索。为了填补这一空白,我们引入了SciHorizon-GENE,这是一个基于权威生物数据库构建的大规模基因中心基准。该基准整合了超过19万个人类基因的 curated 知识,包含超过54万个问题,涵盖了与细胞类型注释、功能解释和机制导向分析相关的多种基因到功能推理场景。受初步检查中观察到的行为模式启发,SciHorizon-GENE从四个生物学关键角度评估LLMs:研究关注敏感性、幻觉倾向、答案完整性和文献影响力,明确针对限制LLMs在生物解释管道中安全采用的失败模式。我们系统评估了多种最先进的通用和生物医学LLMs,揭示了基因级推理能力的显著异质性,以及在生成忠实、完整且基于文献的功能解释方面的持续挑战。我们的基准为在基因尺度上分析LLM行为建立了系统基础,并为模型选择和发展提供了见解,与知识增强型生物解释直接相关。

English abstract

Large language models (LLMs) have shown growing promise in biomedical research, particularly for knowledge-driven interpretation tasks. However, their ability to reliably reason from gene-level knowledge to functional understanding, a core requirement for knowledge-enhanced cell atlas interpretation, remains largely underexplored. To address this gap, we introduce SciHorizon-GENE, a large-scale gene-centric benchmark constructed from authoritative biological databases. The benchmark integrates curated knowledge for over 190K human genes and comprises more than 540K questions covering diverse gene-to-function reasoning scenarios relevant to cell type annotation, functional interpretation, and mechanism-oriented analysis. Motivated by behavioral patterns observed in preliminary examinations, SciHorizon-GENE evaluates LLMs along four biologically critical perspectives: research attention sensitivity, hallucination tendency, answer completeness, and literature influence, explicitly targeting failure modes that limit the safe adoption of LLMs in biological interpretation pipelines. We systematically evaluate a wide range of state-of-the-art general-purpose and biomedical LLMs, revealing substantial heterogeneity in gene-level reasoning capabilities and persistent challenges in generating faithful, complete, and literature-grounded functional interpretations. Our benchmark establishes a systematic foundation for analyzing LLM behavior at the gene scale and offers insights for model selection and development, with direct relevance to knowledge-enhanced biological interpretation.

2511.18000 2026-05-25 cs.LG cs.AI q-bio.PE

Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning

空间流行病模拟中的奖励工程:个体行为学习的强化学习平台

Radman Rakhshandehroo, Daniel Coombs

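AI summary: This paper introduces ContagionRL, a reinforcement learning platform for spatial epidemic simulation, built to study systematically how reward function design affects individual behavioral learning. The platform incorporates a configurable SIRS+D epidemic model, supports evaluating how different reward designs shape agents' survival strategies under varying environmental conditions, and identifies directional guidance and explicit adherence incentives as key factors in policy learning. Agents trained with a potential field reward performed best in adherence to non-pharmaceutical interventions and in spatial avoidance. The platform's modular design offers tools for exploring the relationship between reward and behavior, which is of both theoretical and practical value.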

Details
Journal ref
Transactions on Machine Learning Research, 2026
Comments
38 pages, 15 figures and 18 tables; Accepted to TMLR. OpenReview: https://openreview.net/forum?id=yPEASsx3hk
AI Chinese abstract

我们提出了ContagionRL,一个与Gymnasium兼容的强化学习平台,专门用于空间流行病模拟中的系统奖励工程。与依赖固定行为规则的传统基于智能体的模型不同,我们的平台能够严格评估奖励函数设计如何影响在不同流行病场景中学到的生存策略。ContagionRL集成了空间SIRS+D流行病模型与可配置的环境参数,允许研究人员在包括有限可观测性、不同移动模式和异质人口动态等变化条件下对奖励函数进行压力测试。我们评估了五种不同的奖励设计,从稀疏生存奖励到一种新颖的势场方法,跨越多种RL算法(PPO、SAC、A2C)。通过系统的消融研究,我们发现方向性指导和明确的依从性激励是稳健策略学习的关键组成部分。我们在不同感染率、网格大小、可见性约束和移动模式下的全面评估表明,奖励函数的选择显著影响智能体行为和生存结果。使用我们的势场奖励训练的智能体始终获得优越性能,学习最大程度地遵守非药物干预,同时发展出复杂的空间规避策略。该平台的模块化设计使得能够系统地探索奖励-行为关系,弥补了这类模型中奖励工程关注有限的空白。ContagionRL是研究流行病背景下适应性行为反应的有效平台,并强调了奖励设计、信息结构和环境可预测性在学习中的重要性。我们的代码公开在https://github.com/redradman/ContagionRL。

English abstract

We present ContagionRL, a Gymnasium-compatible reinforcement learning platform specifically designed for systematic reward engineering in spatial epidemic simulations. Unlike traditional agent-based models that rely on fixed behavioral rules, our platform enables rigorous evaluation of how reward function design affects learned survival strategies across diverse epidemic scenarios. ContagionRL integrates a spatial SIRS+D epidemiological model with configurable environmental parameters, allowing researchers to stress-test reward functions under varying conditions including limited observability, different movement patterns, and heterogeneous population dynamics. We evaluate five distinct reward designs, ranging from sparse survival bonuses to a novel potential field approach, across multiple RL algorithms (PPO, SAC, A2C). Through systematic ablation studies, we identify that directional guidance and explicit adherence incentives are critical components for robust policy learning. Our comprehensive evaluation across varying infection rates, grid sizes, visibility constraints, and movement patterns reveals that reward function choice dramatically impacts agent behavior and survival outcomes. Agents trained with our potential field reward consistently achieve superior performance, learning maximal adherence to non-pharmaceutical interventions while developing sophisticated spatial avoidance strategies. The platform's modular design enables systematic exploration of reward-behavior relationships, addressing a knowledge gap in models of this type where reward engineering has received limited attention. ContagionRL is an effective platform for studying adaptive behavioral responses in epidemic contexts and highlights the importance of reward design, information structure, and environmental predictability in learning. Our code is publicly available at https://github.com/redradman/ContagionRL
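
To make the idea of a potential field reward concrete, here is a minimal sketch of potential-based reward shaping in which infected individuals act as repulsive sources. Everything in it is an assumption for illustration: the inverse-distance potential, the function names, the discount factor, and the survival bonus are hypothetical and are not ContagionRL's actual reward, which is defined in the authors' repository.

```python
import numpy as np


def potential(agent_xy, infected_xy, eps=1e-6):
    """Hypothetical repulsive potential: more negative when infected are close."""
    if len(infected_xy) == 0:
        return 0.0
    infected = np.asarray(infected_xy, dtype=float).reshape(-1, 2)
    dists = np.linalg.norm(infected - np.asarray(agent_xy, dtype=float), axis=1)
    return float(-np.sum(1.0 / (dists + eps)))


def shaped_reward(prev_xy, next_xy, infected_xy, alive=True,
                  gamma=0.99, survival_bonus=0.1):
    """Sparse survival bonus plus a potential-based shaping term.

    The shaping term gamma * phi(s') - phi(s) is positive when the agent
    moves to a position with fewer or more distant infected neighbors.
    """
    shaping = gamma * potential(next_xy, infected_xy) - potential(prev_xy, infected_xy)
    return (survival_bonus if alive else 0.0) + shaping


if __name__ == "__main__":
    infected = [(1.0, 0.0)]
    print("move away  :", shaped_reward((0.0, 0.0), (-1.0, 0.0), infected))
    print("move toward:", shaped_reward((0.0, 0.0), (0.5, 0.0), infected))
```

The shaping term gives a dense, directional signal at every step, in contrast to a sparse survival bonus that only says whether the agent is still alive.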

2411.03421 2026-05-25 astro-ph.EP nlin.AO physics.bio-ph physics.pop-ph q-bio.PE

Exo-Daisy World: Revisiting Gaia Theory through an Informational Architecture Perspective

Exo-Daisy World: 通过信息架构视角重新审视盖亚理论

Damian R Sowinski, Gourab Ghoshal, Adam Frank

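AI summary: This paper revisits the classic Daisy World model through the lens of semantic information theory, aiming to characterize the information flow between the biosphere and the planetary environment, that is, the information architecture of Daisy World systems. It introduces Exo-Daisy World, an extension suited to exoplanets around M-dwarf stars, which uses stochastic differential equations to describe the co-evolution of the biosphere and the planetary environment. The analysis shows that correlations between the two intensify as stellar luminosity rises and correspond to distinct phases of information exchange. The approach provides quantitative tools for building more detailed ExoGaia models and is relevant to biosignature identification and astrobiological observation.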

Details
Comments
14 pages, 4 figures, 1 appendix
AI Chinese abstract

Daisy World模型长期以来一直是理解行星生物圈自我调节的基础框架,为可能控制宜居系外行星的反馈机制提供了见解。在本研究中,我们通过语义信息理论(SIT)的视角扩展了经典Daisy World模型,旨在表征生物圈与行星环境之间的信息流——我们称之为Daisy World系统的\emph{信息架构}。我们的目标是开发分析耦合行星系统(包括生物圈和岩石圈)演化的新方法,对天体生物学观测和不可知生物特征的识别具有重要意义。为了在此背景下实现SIT,我们引入了一个针对M矮星系外行星潜在条件定制的Daisy World模型版本,构建了一个描述雏菊及其行星环境共同演化的随机微分方程组。对该Exo-Daisy World模型的分析揭示了生物圈与环境之间的相关性如何随着恒星亮度增强而增强,以及这些相关性如何对应于耦合系统之间信息交换的不同阶段。这种\emph{rein控制}提供了生物圈与其宿主行星之间信息反馈的定量描述。最后,我们讨论了我们的方法对于开发详细的宜居系外行星系统ExoGaia模型的更广泛意义,提出了解释天体生物学数据和探索生物特征候选的新途径。

English abstract

The Daisy World model has long served as a foundational framework for understanding the self-regulation of planetary biospheres, providing insights into the feedback mechanisms that may govern inhabited exoplanets. In this study, we extend the classic Daisy World model through the lens of Semantic Information Theory (SIT), aiming to characterize the information flow between the biosphere and planetary environment -- what we term the "information architecture" of Daisy World systems. Our objective is to develop novel methodologies for analyzing the evolution of coupled planetary systems, including biospheres and geospheres, with implications for astrobiological observations and the identification of agnostic biosignatures. To operationalize SIT in this context, we introduce a version of the Daisy World model tailored to reflect potential conditions on M-dwarf exoplanets, formulating a system of stochastic differential equations that describe the co-evolution of the daisies and their planetary environment. Analysis of this Exo-Daisy World model reveals how correlations between the biosphere and environment intensify with rising stellar luminosity, and how these correlations correspond to distinct phases of information exchange between the coupled systems. This "rein control" provides a quantitative description of the informational feedback between the biosphere and its host planet. Finally, we discuss the broader implications of our approach for developing detailed ExoGaia models of inhabited exoplanetary systems, proposing new avenues for interpreting astrobiological data and exploring biosignature candidates.
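
As background for the feedback loop being extended, the sketch below integrates the classic deterministic two-daisy Daisy World model with a simple Euler scheme. It is not the stochastic M-dwarf variant described in the abstract. The parameter values are those commonly quoted for the original Watson and Lovelock formulation and should be treated as assumptions to check against that source; the small population floor and the function names are illustrative additions.

```python
SIGMA = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
S_FLUX = 917.0    # stellar flux scale commonly quoted for the classic model, W m^-2
Q = 2.06e9        # heat redistribution parameter, K^4
GAMMA = 0.3       # daisy death rate
ALBEDO = {"white": 0.75, "black": 0.25, "ground": 0.5}


def growth_rate(temp_kelvin):
    """Parabolic growth curve, optimal at 22.5 C and clipped at zero."""
    t_c = temp_kelvin - 273.15
    return max(0.0, 1.0 - 0.003265 * (22.5 - t_c) ** 2)


def step(a_w, a_b, luminosity, dt=0.05):
    """One Euler step; returns new cover fractions and planetary temperature (K)."""
    bare = 1.0 - a_w - a_b
    planet_albedo = (a_w * ALBEDO["white"] + a_b * ALBEDO["black"]
                     + bare * ALBEDO["ground"])
    te4 = S_FLUX * luminosity * (1.0 - planet_albedo) / SIGMA
    # Local temperatures: darker patches run warmer than the planetary mean.
    t_w = (Q * (planet_albedo - ALBEDO["white"]) + te4) ** 0.25
    t_b = (Q * (planet_albedo - ALBEDO["black"]) + te4) ** 0.25
    a_w += dt * a_w * (bare * growth_rate(t_w) - GAMMA)
    a_b += dt * a_b * (bare * growth_rate(t_b) - GAMMA)
    # A small floor keeps a seed population so daisies can regrow (assumption).
    return max(a_w, 0.001), max(a_b, 0.001), te4 ** 0.25


def simulate(luminosity, steps=4000):
    """Run from a sparse initial cover and return the final state."""
    a_w, a_b, temp = 0.01, 0.01, 0.0
    for _ in range(steps):
        a_w, a_b, temp = step(a_w, a_b, luminosity)
    return a_w, a_b, temp


if __name__ == "__main__":
    for lum in (0.8, 1.0, 1.2):
        a_w, a_b, temp = simulate(lum)
        print(f"L={lum:.1f} white={a_w:.3f} black={a_b:.3f} T={temp - 273.15:.1f} C")
```

Sweeping the luminosity and recording how daisy cover and temperature co-vary gives the kind of biosphere-environment correlation that an information-theoretic analysis would then quantify.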

2405.09748 2026-05-25 q-bio.CB math.CO math.GR q-bio.QM

A Mathematical Reconstruction of Endothelial Cell Networks

内皮细胞网络的数学重构

Okezue Bell, Anthony Bell

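AI summary: This paper proposes a mathematical formalism called $π$-graphs for precisely describing the multi-type junction connectivity of endothelial cell networks. By introducing the notion of $π$-isomorphism, it gives a mathematical characterization of the connectivity structure of endothelial networks and proves how that characterization relates to traditional graph-theoretic representations. The formalism is also extended with a temporal dimension to analyze the topological evolution of $π$-graphs embedded in geometric spaces, providing a new tool for quantitative study of the relation between endothelial network structure and function.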

Details
Comments
Authors believe clinical validation is necessary before publication and discovered errors in the initial model; specifically, that the central definition does not encode the network connectivity
AI Chinese abstract

内皮细胞构成血管和淋巴系统的关键,形成对血管生成、控制血管通透性和维持组织稳态至关重要的复杂网络。尽管其关键作用,目前尚无严格的数学框架来表示内皮网络的连接结构。在此,我们开发了一种开创性的数学形式化方法,称为π-图,用于建模内皮网络的多类型连接结构。我们将π-图定义为由内皮细胞及其连接集组成的抽象对象,并引入了π-同构的关键概念,该概念捕捉了两个π-图具有相同连接结构的情况。我们证明了几个关于π-图表示与传统图论表示关系的命题,表明π-同构意味着相应的非嵌套内皮图同构,但反之不成立。我们还为π-图形式化引入了时间维度,并探讨了π-图空间嵌入中拓扑不变量的演化。最后,我们概述了一个表示π-图嵌入几何空间的拓扑框架。π-图形式化为内皮网络连接性及其与功能关系的定量分析提供了一种新工具,有望为血管生理学和病理生理学带来新见解。

English abstract

Endothelial cells form the linchpin of vascular and lymphatic systems, creating intricate networks that are pivotal for angiogenesis, controlling vessel permeability, and maintaining tissue homeostasis. Despite their critical roles, there is no rigorous mathematical framework to represent the connectivity structure of endothelial networks. Here, we develop a pioneering mathematical formalism called $π$-graphs to model the multi-type junction connectivity of endothelial networks. We define $π$-graphs as abstract objects consisting of endothelial cells and their junction sets, and introduce the key notion of $π$-isomorphism that captures when two $π$-graphs have the same connectivity structure. We prove several propositions relating the $π$-graph representation to traditional graph-theoretic representations, showing that $π$-isomorphism implies isomorphism of the corresponding unnested endothelial graphs, but not vice versa. We also introduce a temporal dimension to the $π$-graph formalism and explore the evolution of topological invariants in spatial embeddings of $π$-graphs. Finally, we outline a topological framework to represent the spatial embedding of $π$-graphs into geometric spaces. The $π$-graph formalism provides a novel tool for quantitative analysis of endothelial network connectivity and its relation to function, with the potential to yield new insights into vascular physiology and pathophysiology.

2312.10916 2026-05-25 q-bio.NC

MEG Evidence That Modality-Independent Conceptual Representations Encode Visual but Not Lexical Representations

MEG证据表明模态无关的概念表征编码视觉表征而非词汇表征

Julien Dirani, Liina Pylkkänen

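AI summary: Using MEG data, this study combines word/picture cross-condition decoding with neural network classifiers to examine the nature of modality-independent conceptual representations in the brain. It finds that these representations are not fully abstract: they contain visual information but show no involvement of lexical properties. The results indicate that perceptual processes play a key role in encoding modality-independent concepts, whereas lexical representations do not appear to contribute to modality-independent semantic knowledge.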

Details
AI Chinese abstract

我们大脑中存储的语义知识可以通过不同的刺激模态访问。例如,一张猫的图片和单词“cat”都能激活相似的概念表征。虽然现有研究已经发现了模态无关表征的证据,但其内容仍然未知。模态无关表征可能是抽象的,也可能本质上是感知的甚至是词汇的。我们采用了一种新颖的方法,将单词/图片跨条件解码与从MEG数据中学习潜在模态无关表征的神经网络分类器相结合。然后,我们将这些表征与代表语义、感觉和词汇特征的模型进行比较。结果表明,模态无关表征并非严格意义上的非模态;相反,它们也包含视觉表征。没有证据表明词汇属性有助于模态无关概念的表征。这些发现支持了感知过程在编码模态无关概念表征中发挥基础作用的观点。相反,词汇表征似乎不参与模态无关的语义知识。

English abstract

The semantic knowledge stored in our brains can be accessed from different stimulus modalities. For example, a picture of a cat and the word "cat" both engage similar conceptual representations. While existing research has found evidence for modality-independent representations, their content remains unknown. Modality-independent representations could be abstract, or they might be perceptual or even lexical in nature. We used a novel approach combining word/picture cross-condition decoding with neural network classifiers that learned latent modality-independent representations from MEG data. We then compared these representations to models representing semantic, sensory, and lexical features. Results show that modality-independent representations are not strictly amodal; rather, they also contain visual representations. There was no evidence that lexical properties contributed to the representation of modality-independent concepts. These findings support the notion that perceptual processes play a fundamental role in encoding modality-independent conceptual representations. Conversely, lexical representations did not appear to partake in modality-independent semantic knowledge.

2605.22954 2026-05-25 cs.LG q-bio.QM

FederatedRSF: Federated Random Survival Forests for Partially Overlapping Medical Data

FederatedRSF:面向部分重叠医学数据的联邦随机生存森林

Maryam Moradpour, Jonas Harriehausen, Amirreza Aleyasin, Lion Philipp Wolf, Youngjun Park, Anne-Christin Hauschild

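AI summary: This paper proposes FederatedRSF, a federated learning method for survival analysis on multi-center medical data, particularly when feature sets only partially overlap. Each institution trains a random survival forest locally, and only feature-compatible trees are shared, so models can be aggregated and used for inference without exposing raw data. Experiments on a breast cancer dataset show performance comparable to centrally trained models, addressing the challenges posed by data privacy and feature heterogeneity.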

Details
Comments
4 pages, 2 figures. Maryam Moradpour, Jonas Harriehausen, and Amirreza Aleyasin contributed equally to this work. Includes supplementary material
AI Chinese abstract

多中心生存预测可以提高鲁棒性和泛化性,但隐私法规和机构治理通常阻止跨机构汇集患者水平的临床和基因组数据。在实践中,部署因特征空间异质性而进一步复杂化,其中不同站点收集不同的协变量或使用不同的测序面板,导致特征集仅部分重叠。我们提出了FederatedRSF,一个实现联邦随机生存森林的Python包,它聚合本地训练的生存树,并仅将特征兼容的树重新分发到每个站点,从而在无需共享原始数据的情况下实现部分重叠的推理。我们在scikit-survival包中分发的GBSG2乳腺癌队列上评估了FederatedRSF,通过保留特征子集模拟客户端之间的特征异质性,并使用Harrell一致性指数(C-Index)在重复交叉验证和站点分割下评估区分能力。结果表明,联邦模型可以达到与集中式训练设置相当的性能。

English abstract

Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice, deployment is further complicated by feature-space heterogeneity, in which sites collect different covariates or use different sequencing panels, resulting in only partially overlapping feature sets. We present FederatedRSF, a Python package that implements federated random survival forests, aggregating locally trained survival trees and redistributing only feature-compatible trees to each site, enabling inference with partial overlap without sharing raw data. We evaluate FederatedRSF on the GBSG2 breast cancer cohort distributed with the scikit-survival package, simulating feature heterogeneity across clients by withholding subsets of features, and assessing discrimination using Harrell's concordance index (C-Index) under repeated cross-validation and site-splits. The results demonstrated that the federated model can achieve performance comparable to that of the centralized training setting.
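
Since discrimination is reported with Harrell's concordance index, a plain-Python sketch of that statistic may help. It follows the usual definition: among comparable pairs (the earlier time must be an observed event), count how often the subject who failed earlier was assigned the higher risk, with risk ties counted as one half. The function name is illustrative, pairs with tied times are simply skipped as a simplification, and this is not the FederatedRSF or scikit-survival implementation.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C for right-censored data.

    times  : observed follow-up times
    events : True/1 if the event occurred, False/0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 8, 12, 20]
    events = [1, 0, 1, 1]
    risks = [0.9, 0.4, 0.6, 0.1]
    print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is what makes the index convenient for comparing a federated model against a centrally trained one.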

2605.22899 2026-05-25 q-bio.TO eess.IV

ROI Extraction in Thermographic Breast Images Using Genetic Algorithms

使用遗传算法在热成像乳腺图像中提取感兴趣区域

LC Mendes, EO Rodrigues, Sandro C Izidoro, Aura Conci, Panos Liatsis

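AI summary: This paper proposes a method based on genetic algorithms (GA) for extracting the breast region from thermographic breast images. The method combines color information with a cardioid-based fitness function to segment the region of interest (ROI) automatically. It successfully segmented the breast region in 52 of 58 images without manual selection of seed points, which supports more accurate breast cancer detection and the standardization of acquisition protocols.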

Details
Journal ref
IWSSIP 2020
AI Chinese abstract

本工作提出使用遗传算法(GA)从热成像乳腺图像的背景中识别乳房区域。所提出的方法利用颜色信息、基于心形线的适应度函数和GA。这是文献中首次提出基于GA和心形线的感兴趣区域(ROI)提取。ROI提取可以提高癌症检测的准确性,并有助于采集协议的标准化。该方法能够成功地在58张图像中分离出52张的乳房区域,同时完全自动化,无需手动选择种子点。

English abstract

This work proposes the use of Genetic Algorithms (GA) to separate the breast region from the background in thermographic breast images. The proposed method uses color information, a fitness function based on cardioids, and GA. This is the first work in the literature to propose a Region of Interest (ROI) extraction based on GA and cardioids. ROI extraction can improve the accuracy of cancer detection and assist with the standardization of acquisition protocols. The method is able to successfully separate the breast region in 52 out of 58 images, while being fully automatic, and not requiring manual selection of seed points.

2605.22853 2026-05-25 eess.SP cs.LG q-bio.QM

Topological Signal Processing: An Application-Oriented Tutorial

拓扑信号处理:面向应用的教程

Flavia Petruso, Maria Giulia Preti, Dimitri Van De Ville

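AI summary: This tutorial introduces the foundational concepts of topological signal processing (TSP) and how they are applied in practice, with the aim of helping researchers understand and use this emerging field. TSP extends traditional graph signal processing (GSP) to signals defined on nodes, edges, triangles, and other higher-order network structures, and tools such as the combinatorial Hodge Laplacian enable the analysis of higher-order interactions in complex systems. Drawing on case studies such as brain imaging, the article shows the potential of TSP for revealing nontrivial interactions between regions and encourages its use in both theoretical and applied research.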

Details
AI Chinese abstract

许多现代数据集规模庞大且具有复杂的结构关系。传统上,基于图的方法用于表示网络数据,将个体元素建模为节点,将成对交互建模为边。此外,图信号处理(GSP)已被开发用于分析图节点上的信号,例如全国不同地区的温度测量值(节点信号)表示为图。拓扑信号处理(TSP)是一个新兴领域,它推广了GSP,使得不仅可以分析节点上的信号,还可以分析边、三角形以及更高维网络元素上的信号,这些元素被建模为单纯复形及相关拓扑结构。这使得TSP通过将滤波和傅里叶变换等经典信号处理概念扩展到拓扑层面,自然适用于研究复杂系统中的高阶交互。尽管TSP具有多功能性,但对许多实践者来说仍然具有挑战性。因此,我们提供了一个易于理解的TSP基础概述,同时与面向应用的场景建立联系。我们重点介绍基于组合Hodge Laplacian的处理技术,该技术将图Laplacian推广到单纯复形。特别地,我们回顾了关键的TSP概念,将其与现实世界的例子联系起来,并讨论了如何从数据集中导出高阶结构和信号。例如,我们引入了一种捕捉节点信号之间滞后交互的边级信号,并在基于TSP的脑成像数据分析案例研究中展示了其应用,揭示了脑区域集合之间的非平凡交互。总体而言,我们旨在通过弥合方法发展与应用程序之间的差距,促进TSP的更广泛采用,推动其在理论和应用研究人员社区中的使用。

English abstract

Many modern datasets are large and carry complex structural relationships. Graph-based methods have traditionally been used to represent networked data, modeling individual elements as nodes and pairwise interactions as edges. Furthermore, Graph Signal Processing (GSP) has been developed to analyze signals on graph nodes, such as temperature measurements (node signals) across different regions of a country represented as a graph. Topological Signal Processing (TSP) is an emerging field that generalizes GSP, enabling the analysis of signals defined not only on nodes but also on edges, triangles, and higher-dimensional network elements, modeled as simplicial complexes and related topological structures. This makes TSP naturally well-suited for studying higher-order interactions in complex systems by extending classical signal processing concepts, such as filtering and Fourier transforms, to the topological level. Despite its versatility, TSP remains challenging for many practitioners. Therefore, we present an accessible overview of TSP foundations while drawing connections with application-oriented settings. We focus on processing techniques based on the combinatorial Hodge Laplacian, which generalizes the graph Laplacian to simplicial complexes. In particular, we review key TSP concepts, relate them to real-world examples, and discuss how higher-order structures and signals can be derived from datasets. For instance, we introduce an edge-level signal capturing lagged interactions between nodal signals, and demonstrate its use in a case study on TSP-based analysis of brain imaging data, revealing nontrivial interactions between sets of brain regions. Overall, we aim to promote a broader adoption of TSP by bridging methodological developments with applications, fostering its use among a wide community of theoretical and applied researchers.
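
The combinatorial Hodge Laplacian mentioned above can be built from boundary matrices: with B1 mapping edges to nodes and B2 mapping triangles to edges, the node-level operator is L0 = B1 B1^T (the graph Laplacian) and the edge-level operator is L1 = B1^T B1 + B2 B2^T. The sketch below constructs both for a tiny hand-made complex; the function names and the example complex are illustrative, and edges and triangles are assumed to be given as sorted index tuples.

```python
import numpy as np


def boundary_matrices(n_nodes, edges, triangles):
    """Build B1 (nodes x edges) and B2 (edges x triangles).

    Edges (i, j) with i < j are oriented from i to j; triangles (i, j, k)
    with i < j < k have boundary (j, k) - (i, k) + (i, j).
    """
    edge_index = {e: col for col, e in enumerate(edges)}
    b1 = np.zeros((n_nodes, len(edges)))
    for col, (i, j) in enumerate(edges):
        b1[i, col], b1[j, col] = -1.0, 1.0
    b2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        b2[edge_index[(j, k)], col] = 1.0
        b2[edge_index[(i, k)], col] = -1.0
        b2[edge_index[(i, j)], col] = 1.0
    return b1, b2


def hodge_laplacians(b1, b2):
    """Return the node-level L0 and the edge-level L1."""
    l0 = b1 @ b1.T
    l1 = b1.T @ b1 + b2 @ b2.T
    return l0, l1


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    b1, b2 = boundary_matrices(4, edges, [(0, 1, 2)])
    l0, l1 = hodge_laplacians(b1, b2)
    print("boundary of boundary is zero:", np.allclose(b1 @ b2, 0.0))
    # The number of zero eigenvalues of L1 counts independent unfilled cycles.
    eigvals = np.linalg.eigvalsh(l1)
    print("zero modes of L1:", int(np.sum(np.abs(eigvals) < 1e-9)))
```

Removing the triangle from the example leaves the cycle 0-1-2 unfilled, and L1 then gains one zero eigenvalue; the eigenvectors of L1 play the role of the Fourier basis for edge signals.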

2605.22848 2026-05-25 cs.CE cs.LG q-bio.OT

From Simulation to Discovery: AI Enabled Probabilistic Emulation of Mechanistic Crop Systems

从模拟到发现:AI驱动的机理作物系统概率仿真

Mojdeh Saadati, Juan Panelo, Gustavo Visentini, Soumik Sarkar, Carlos Messina, Baskar Ganapathysubramanian

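AI summary: This study proposes an AI-based probabilistic neural emulator for efficient simulation of crop growth, addressing the prohibitive computational cost of traditional crop models. Trained on a large set of simulations spanning diverse conditions and combined with a physically consistent weather generator, the approach keeps high predictive accuracy while greatly speeding up simulation, allowing rapid exploration of crop responses across genotypes, environments, and management conditions. The study identifies maize trait combinations that maintain high yield across many conditions and shows that radiation use efficiency and temperature-driven root dynamics are key drivers of yield resilience, illustrating the approach's potential for research on agricultural adaptation to climate change.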

Details
AI Chinese abstract

全球粮食安全依赖于预测作物对气候变异的响应,但基于过程的作物模型对于大规模探索基因型和环境相互作用而言计算成本过高。本文开发了APSIM的概率神经仿真器,该仿真器在13个输出上以高保真度(R²=0.93)再现了关键玉米生长过程,同时将模拟时间降低了数个数量级。该框架在涵盖多样化遗传、土壤和管理条件的200万次模拟上训练,并辅以卷积合成天气生成器以产生物理一致的气候序列,从而能够在现实且多样化的环境输入下进行可扩展的作物响应探索,同时提供校准的预测不确定性,无需昂贵的贝叶斯推断。将该框架应用于10万个性状配置、爱荷华州和伊利诺伊州的六种土壤环境以及两种排放情景下直至2100年的气候预测,我们识别出181种在所有测试条件下均能持续保持高产的玉米性状组合——这一分析仅靠机理模型是无法实现的。我们进一步表明,辐射利用效率和温度驱动的根系动态是产量韧性的主要驱动因素。值得注意的是,预测的产量分布在不同地点间差异显著,一些低生产力地点在未来气候情景下产量增加,表明气候变化可能以非直观的方式重塑区域产量潜力。这些结果证明了不确定性感知仿真如何将机理作物模拟从计算瓶颈转变为按需发现引擎,其能够以任何基于过程的模型无法比拟的规模探索完整的基因型、环境和管理空间。

English abstract

Global food security depends on predicting crop responses to climate variability, yet process based crop models remain too computationally expensive for large scale exploration of genotype and environment interactions. Here we develop a probabilistic neural emulator of APSIM that reproduces key maize growth processes across 13 outputs with high fidelity (with R^2 of 0.93) while reducing simulation time by several orders of magnitude. Trained on two million simulations spanning diverse genetic, soil, and management conditions, and augmented with a convolutional synthetic weather generator that produces physically consistent climate sequences, the framework enables scalable exploration of crop responses under realistic and diverse environmental inputs while providing calibrated predictive uncertainty without costly Bayesian inference. Applying this framework across 100,000 trait configurations, six soil environments in Iowa and Illinois, and climate projections through the year 2100 under two emissions scenarios, we identify 181 maize trait combinations that consistently maintain high yield across all tested conditions, an analysis infeasible with the mechanistic model alone. We further show that radiation use efficiency and temperature driven root dynamics are dominant drivers of yield resilience. Notably, projected yield distributions vary substantially across locations, with some lower productivity sites exhibiting yield increases under future climate scenarios, indicating that climate change may reshape regional yield potential in nonintuitive ways. These results demonstrate how uncertainty aware emulation transforms mechanistic crop simulation from a computational bottleneck into an on demand discovery engine, one capable of interrogating the full genotype, environment and management space at a scale no process-based model can match.

2605.22838 2026-05-25 q-bio.GN math.OC stat.AP

Detecting and Correcting Sample-by-Sample Scale Distortion in RNA Sequencing Data

检测和校正RNA测序数据中逐样本尺度失真

Christopher Thron, Farhad Jafari

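AI summary: This paper studies sample-to-sample, expression-level-dependent scale biases in RNA sequencing data and proposes two statistically motivated nonlinear transforms to detect and correct them. Conventional normalization does not remove these biases, whereas the proposed transforms reduce sample-to-sample variance, improve the characteristics of gene-gene correlation distributions, and increase the sensitivity and specificity of tests for differences between groups. The results support a more accurate understanding of gene-gene relationships and may improve the use of information from clinical tests.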

Details
Journal ref
BMC Bioinformatics 26.1 (2025): 32
Comments
25 pages, 17 figures
AI Chinese abstract

RNA测序(RNA-seq)是用于捕获生物样本中所有可检测基因表达水平的常规基因组规模方法。现在,这被常规用于基于人群的研究,以确定各种疾病的遗传决定因素。自然,这些测试的准确性应得到验证并尽可能改进。在本研究中,我们旨在检测和校正随样本变化的表达水平依赖误差,这些误差无法通过常规标准化技术校正。我们检查了来自癌症基因组图谱(TCGA)、Stand Up 2 Cancer(SU2C)和GTEx数据库的多个RNA-seq数据集,这些数据集经过不同类型的预处理。通过应用局部平均,我们在所有研究的数据集中发现了逐样本表达水平依赖的偏差。使用模拟,我们表明这些偏差会破坏亚群之间的基因-基因相关性估计和$t$检验。为了减轻这些偏差,我们基于统计考虑引入了两种不同的非线性变换,以校正观察到的偏差。我们证明这些变换有效地消除了观察到的逐样本偏差,减少了样本间方差,并改善了基因-基因相关性分布的特征。使用一种新颖的模拟方法在亚群之间创建受控差异,我们表明这些变换减少了变异性并增加了两个群体检验的敏感性。在大多数情况下,数据校正偏差后,敏感性和特异性的改进幅度约为3-5%。总之,这些结果提高了我们理解基因-基因关系的能力,并可能带来利用临床测试信息的新方法。

English abstract

RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. This is now regularly used for population-based studies designed to identify genetic determinants of various diseases. Naturally, the accuracy of these tests should be verified and improved if possible. In this study, we aimed to detect and correct for expression level-dependent errors which vary from sample to sample, and are not corrected by conventional normalization techniques. We examined several RNA-seq datasets from the Cancer Genome Atlas (TCGA), Stand Up 2 Cancer (SU2C), and GTEx databases with various types of preprocessing. By applying local averaging, we found sample-by-sample expression-level-dependent biases in all datasets studied. Using simulations, we show that these biases corrupt gene-gene correlation estimations and $t$ tests between subpopulations. To mitigate these biases, we introduce two different nonlinear transforms based on statistical considerations that correct these observed biases. We demonstrate that these transforms effectively remove the observed per-sample biases, reduce sample-to-sample variance, and improve the characteristics of gene-gene correlation distributions. Using a novel simulation methodology that creates controlled differences between subpopulations, we show that these transforms reduce variability and increase sensitivity of two population tests. The improvements in sensitivity and specificity were of the order of 3-5% in most instances after the data was corrected for bias. Altogether, these results improve our capacity to understand gene-gene relationships, and may lead to novel ways to utilize the information derived from clinical tests.
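
One way to picture the local-averaging diagnostic is sketched below: genes are ordered by a reference expression level, each sample's deviation from that reference is computed, and the deviations are smoothed with a moving average so that any trend with expression level becomes visible. This is an illustrative reading only. The per-gene median reference, the window size, and the function name are assumptions, and the authors' actual procedure and correcting transforms are not reproduced here.

```python
import numpy as np


def local_bias_profile(log_expr, window=51):
    """Smoothed per-sample deviation from a reference, ordered by expression.

    log_expr : array of shape (genes, samples) holding log-scale expression
    Returns (levels, profile): the smoothed reference level for each window
    and an array of shape (genes - window + 1, samples) of smoothed residuals.
    """
    x = np.asarray(log_expr, dtype=float)
    reference = np.median(x, axis=1)           # assumed per-gene reference level
    order = np.argsort(reference)              # sort genes from low to high
    residuals = x[order] - reference[order, None]
    kernel = np.ones(window) / window          # moving-average kernel
    profile = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, residuals)
    levels = np.convolve(reference[order], kernel, mode="valid")
    return levels, profile


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_level = np.linspace(0.0, 10.0, 2000)
    data = true_level[:, None] + rng.normal(0.0, 0.1, size=(2000, 5))
    # Inject a synthetic scale distortion into sample 0 only.
    data[:, 0] = 1.2 * true_level - 1.0 + rng.normal(0.0, 0.1, size=2000)
    levels, profile = local_bias_profile(data)
    print("sample 0, low vs high end:", profile[0, 0], profile[-1, 0])
    print("sample 1, max |bias|     :", np.abs(profile[:, 1]).max())
```

A flat profile near zero suggests no level-dependent distortion for that sample, while a sloped or curved profile is the signature that a constant per-sample scaling factor cannot remove.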