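arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.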

2605.13750 2026-05-14 physics.soc-ph cs.GT q-bio.PE

The Co-evolution of Costly Signaling and Cooperation in Social Dilemmas

Mahdi Abolhasani, Saman Moghimi-Araghi, Mohammad Salahshour

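AI summary: This study examines how costly cooperation and costly signaling co-evolve in social dilemmas. Using a model in which signals and behavior influence each other, it finds that the evolution of a signal depends more on the cooperative response it elicits than on the signal's cost alone. Across different games and population structures, signaling helps sustain cooperation. Cooperation is enhanced more readily in spatially structured populations, whereas in the Prisoner's Dilemma dynamic correlations must be introduced to explain how signaling evolves.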

Comments 25 pages, 9 figures

2605.13675 2026-05-14 cs.CV cs.LG q-bio.NC

Characterizing Universal Object Representations Across Vision Models

Florian P. Mahner, Johannes Roth, Ka Chun Lam, Michael F. Bonner, Francisco Pereira, Martin N. Hebart

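AI summary: This study examines how visual representations converge across deep neural networks trained with different architectures, objectives, and datasets, asking which visual properties models actually converge on and what drives that convergence. It decomposes the object similarity structure of 162 diverse vision models into a small number of non-negative dimensions and analyzes how often those dimensions recur across models. Some dimensions turn out to be universal across models, and these are more interpretable and more strongly driven by semantic image properties. Universal dimensions also better predict activity in primate visual cortex and human similarity judgments, suggesting that universality may reflect representational properties relevant to biological vision.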

2605.13535 2026-05-14 q-bio.PE

Shared quasispecies architecture in experimental and natural RNA virus populations

Samuel Martínez-Alcalá, Iker Atienza-Diez, Pilar Somovilla, Brenda Martínez-González, Celia Perales, Luis F. Seoane, Ester Lázaro, Susanna Manrubia

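AI summary: This study examines how RNA virus populations are organized in genome sequence space, comparing the genotype network architecture of the Escherichia coli bacteriophage Qβ, evolved under experimental conditions, with that of SARS-CoV-2 evolving in its natural host. Reconstructing networks of mutationally coupled variants from deep-sequencing data, it finds that, despite large differences in genome size, mutation rate, and ecological setting, both populations show a similar hierarchical structure: a highly abundant central genotype surrounded by variants of varying abundance. This shared architecture suggests that RNA viruses may follow a universal organizational pattern set by basic properties of sequence space and the replication-mutation mechanism, offering a new perspective on the predictability of viral evolution.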

Comments 19 pages (main ms, 5 figures, 1 table) + 4 pages (supplementary information, 3 figures)

2605.13504 2026-05-14 stat.ME math.AP math.DS math.PR q-bio.QM

Structural identifiability of partially-observed stochastic processes: from single-particle trajectories to total particle density data

Arianna Ceccarelli, Alexander P. Browning, Ruth E. Baker

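AI summary: This paper studies structural identifiability for partially observed stochastic processes, asking whether parameters can be uniquely determined from single-particle trajectory data and from total particle density data. The authors propose an approach for spatio-temporal stochastic processes: trajectory data are described with an individual-level model, while density data are described with a partial differential equation model analyzed by differential-algebra methods. They also introduce an analysis of initial conditions based on characteristic equations, show that initial conditions strongly affect identifiability, and illustrate through examples how the approach identifies parameter combinations.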

Comments Main: 26 pages, 4 figures Supplementary Information: 20 pages, 5 figures

2605.13364 2026-05-14 q-bio.BM

Predicting Endocrine Disruptors: A Deep Learning QSAR Model for Estrogen Receptor Activity

Belaguppa Manjunath Ashwin Desai, Shreyas Murthy, Bhoomika Sridhar, Anirudh Belaguppa Manjunath, Vivien Humtsoe, Pronama Biswas

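AI summary: Motivated by the potential threat that endocrine-disrupting chemicals (EDCs) pose to health and ecosystems, this study develops a deep learning QSAR model to predict the binding activity of compounds at the estrogen receptor. Using a dataset of 224 compounds with 2,944 molecular descriptors and fingerprints, the authors train a deep neural network with dropout and batch normalization, which achieves high test-set accuracy and good classification performance. Molecular docking indicates that some of the predicted compounds bind the receptor similarly to estradiol. The model offers a tool for rapidly screening potential EDCs, which may help make chemical risk assessment more efficient and support biodiversity protection.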

Comments Copyright IEEE 2026. Permission from IEEE must be obtained for all other uses, including reprinting/republishing for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work. DOI: 10.1109/B-HTC67770.2026.11502197

2605.09597 2026-05-14 cs.SI q-bio.QM

Interactively visualizing biological multilayer networks using MiRA

Shir Miryam Nehoray, Yuval Bloch, Shai Pilosof

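AI summary: This paper introduces MiRA, a browser-based interactive tool for visualizing biological multilayer networks. It offers seven complementary visualization modes and interactive features that help researchers and educators explore the structure and properties of complex multilayer networks more intuitively. MiRA fills a gap in visualization tools for biological multilayer networks and supports research in this area.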

Comments 7 pages, 2 figures. For the application page, see https://mira.ecomplab.com/

2605.07433 2026-05-14 q-bio.MN cs.LG cs.LO

Inference of Qualitative Models from Steady-State Data via Weighted MaxSMT

Ondřej Huvar, Nikola Beneš, Martin Jonáš, David Šafránek, Samuel Pastva

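AI summary: This study proposes a robust method, based on weighted MaxSMT, for inferring qualitative biological models from steady-state data. Uncertain biological observations are encoded as weighted soft constraints, so that when constraints conflict the method can still identify the models that best agree with the observations. It supports Boolean and multi-valued variable domains, incorporates discretization and differential-expression constraints, and is applied to infer a neural cell differentiation model from a prior knowledge network.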
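
The soft-constraint idea in the summary above can be illustrated with a minimal sketch. This is not the authors' implementation: it brute-forces a Boolean weighted MaxSAT instance instead of calling an SMT solver, and the two-gene network, the weights, and the function name `weighted_max_sat` are all hypothetical.

```python
from itertools import product


def weighted_max_sat(variables, hard, soft):
    """Brute-force search over Boolean assignments.

    hard: list of predicates that must all hold.
    soft: list of (weight, predicate) pairs; satisfied weights are summed.
    Returns the assignment with the largest satisfied soft weight.
    """
    best, best_weight = None, -1.0
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if not all(constraint(model) for constraint in hard):
            continue  # violates the model structure, discard
        weight = sum(w for w, constraint in soft if constraint(model))
        if weight > best_weight:
            best, best_weight = model, weight
    return best, best_weight


# Hypothetical toy network: gene A activates gene B.
variables = ["A", "B"]
hard = [lambda m: m["B"] == m["A"]]  # steady state: B follows its regulator A
soft = [
    (3.0, lambda m: m["A"]),       # confident observation: A is active
    (1.0, lambda m: not m["B"]),   # weak, conflicting observation: B is inactive
]

best_model, best_weight = weighted_max_sat(variables, hard, soft)
print(best_model, best_weight)
```

The two observations cannot both hold under the hard constraint, so the search keeps the heavier one and drops the lighter one. A real MaxSMT encoding would hand the same hard and soft constraints to a solver rather than enumerate assignments.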

2605.05778 2026-05-14 q-bio.QM cs.CG cs.NA math.NA

Planar morphometry via functional shape data analysis and quasi-conformal mappings

Hangyu Li, Gary P. T. Choi

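AI summary: This paper proposes FDA-QC, a new planar morphometry method that combines functional shape data analysis (FDA) with quasi-conformal (QC) mappings. It accounts for both the boundary and the interior features of planar shapes, overcoming the limitation of conventional methods that consider only one kind of feature or cannot quantify shape change. The method represents and registers closed curves with the square-root velocity function, then uses a quasi-conformal mapping to extend the boundary correspondence to the whole planar region, giving a single framework for shape deformation and for quantifying shape variation. Experiments on datasets such as leaves and insect wings indicate that it captures morphological variation more effectively, opening a new route to understanding the growth and evolution of planar biological shapes.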
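
The square-root velocity function (SRVF) mentioned above is commonly defined as q(t) = f'(t) / sqrt(|f'(t)|) for a curve f(t). The sketch below computes it for a discretized closed curve under stated assumptions: uniform sampling of t in [0, 1), periodic central differences for the derivative, and a hypothetical function name `srvf`. It illustrates the representation only, not the registration or quasi-conformal steps of FDA-QC.

```python
import numpy as np


def srvf(curve):
    """SRVF of a closed planar curve sampled uniformly at N points, t in [0, 1).

    curve: array of shape (N, 2). Returns an array of shape (N, 2).
    """
    n = len(curve)
    dt = 1.0 / n
    # Periodic central difference, valid because the curve is closed.
    deriv = (np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)) / (2.0 * dt)
    speed = np.linalg.norm(deriv, axis=1)
    # Guard against division by zero at stationary points.
    return deriv / np.sqrt(np.maximum(speed, 1e-12))[:, None]


# Unit circle as a test shape.
t = np.linspace(0.0, 1.0, 200, endpoint=False)
circle = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
q = srvf(circle)

# Since |q(t)|^2 = |f'(t)|, integrating |q|^2 over t recovers the curve length.
length = np.sum(np.sum(q ** 2, axis=1)) / len(t)
print(length)
```

For the unit circle the recovered length should be close to 2π, which is a quick sanity check on the discretization.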

2604.26070 2026-05-14 cs.LG math.OC math.ST q-bio.QM stat.TH

Observable Neural ODEs for Identifiable Causal Forecasting in Continuous Time

Jennifer Wendland, Nicolas Freitag, Maik Kschischo

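AI summary: This paper studies identifiability in continuous-time causal inference. For dynamic decision-making settings with hidden confounders, it proposes the Observable Neural ODE (ObsNODE). By linking the control-theoretic notion of observability to causal identifiability, it derives a continuous-time adjustment formula and designs a neural ODE that reconstructs latent states from observational data, enabling outcome prediction under different intervention trajectories. Experiments on synthetic cancer data, semi-synthetic data based on MIMIC-IV, and real sepsis data show superior performance.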

Comments 20 pages, 5 figures

2603.03337 2026-05-14 q-bio.NC

Does the motor cortex draw on a wire plane?

Patrick Iglesias-Zemmour

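AI summary: This note examines the geometric nature of the two-thirds power law in human motor control, pointing out that the law is equivalent to constant equi-affine speed. In classical differential geometry the equi-affine metric is not a tensor, and tensoriality can be recovered only by restricting the symmetry group or introducing additional structure. The note proposes a new geometric framework, the Wire Plane of diffeology, in which the equi-affine metric becomes a genuine covariant 3-tensor under the full diffeomorphism group, with no restriction of symmetry and no added structure. The approach rests on the observation that the motor cortex traces curves rather than two-dimensional regions, and it offers a new perspective on the geometric formalization of motor primitives.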

Comments 7.33 pages. This note applies the framework of Diffeology (specifically the Wire Plane) to resolve the non-tensorial nature of the equi-affine metric in motor control
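
The equivalence stated in the summary, between the two-thirds power law and constant equi-affine speed, follows from a short standard computation. The notation below (speed v, signed curvature κ, angular velocity ω, constant c) is chosen here for illustration and is not taken from the note itself.

```latex
% Planar trajectory \gamma(t) = (x(t), y(t)) with speed v and signed curvature \kappa.
\[
  \dot{x}\,\ddot{y} - \dot{y}\,\ddot{x} = v^{3}\kappa ,
  \qquad
  \dot{\sigma} = \bigl|\dot{x}\,\ddot{y} - \dot{y}\,\ddot{x}\bigr|^{1/3} = v\,|\kappa|^{1/3}.
\]
% Requiring constant equi-affine speed \dot{\sigma} = c gives the power law:
\[
  v = c\,|\kappa|^{-1/3}
  \quad\Longleftrightarrow\quad
  \omega = v\,|\kappa| = c\,|\kappa|^{2/3}.
\]
```

The exponent 2/3 relating angular velocity to curvature is what gives the law its name; the same statement in terms of tangential speed carries the exponent -1/3.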

2602.00586 2026-05-14 q-bio.MN cs.AI cs.LG

RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine

Hasi Hays, William J. Richardson

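AI summary: This study proposes RAG-GNN, an end-to-end trainable framework that combines graph neural networks (GNNs) with dynamically retrieved biomedical literature knowledge to improve functional clustering in precision medicine. With a jointly optimized retrieval projection, gated fusion, and contrastive alignment, RAG-GNN markedly improves functional clustering in a cancer signaling pathway case study, and the study verifies that retrieved information improves clustering consistency and within-cluster compactness. On functional clustering, the method outperforms conventional approaches that rely on graph structure alone, suggesting a new route to knowledge integration in precision medicine.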

2510.04698 2026-05-14 q-bio.NC cs.AI econ.TH

The Bayesian Origin of the Probability Weighting Function in Human Representation of Probabilities

Xin Tong, Thi Thu Uyen Hoang, Xue-Xin Wei, Michael Hahn

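AI summary: Human perception of probability is systematically distorted, following the classic inverse S-shaped weighting pattern, but the cause has long been unclear. This paper proposes an explanation based on Bayesian encoding and decoding: probabilities are encoded as noisy internal signals and decoded by minimizing Bayes risk. The resulting distortion decomposes into boundary regression, likelihood repulsion, and prior attraction, which predicts that the inverse S-shaped pattern arises from a U-shaped profile of encoding precision, with greater sensitivity when probabilities are near 0 and 1. Experiments show that the framework recovers the U-shaped encoding from data and outperforms conventional deterministic weighting functions and other models across several tasks.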

2501.07738 2026-05-14 math.PR math-ph math.MP math.ST q-bio.PE stat.TH

Mixing time for an epidemic model on graphs with external sources of infection

Wasiur R. KhudaBukhsh, Yangrui Xiang

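AI summary: This paper studies the mixing time of a susceptible-infected-susceptible (SIS) epidemic model on graphs with external sources of infection. Under suitable assumptions on the parameters, the authors prove that the mixing time is of order $\Theta(n \log n)$ in the number of vertices $n$. They further analyze the model on families of random graphs (Erdős–Rényi graphs, random regular multigraphs, and Galton–Watson trees) and show that, with high probability, the mixing time remains of order $\Theta(n \log n)$.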

Comments improved results, minor typos fixed, 19 pages, no figures

2412.20570 2026-05-14 cond-mat.soft math.AP q-bio.SC

Voltage laws in nanodomains revealed by asymptotics and simulations of electro-diffusion equations

Frédéric Paquin-Lefebvre, Alejandro Barea Moreno, David Holcman

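AI summary: This study aims to characterize the local voltage distribution driven by ionic currents within nanophysiological domains, which matters for understanding cellular activity. The authors develop a method for solving the Poisson-Nernst-Planck equations with ionic currents entering and exiting the domain and, using asymptotic expansions and Green's function representations, derive the ionic profiles and voltage drops in different charge regimes. They also analyze how surface curvature and window channel size affect voltage dynamics, and they validate the theory with numerical simulations, providing a new theoretical basis for characterizing the physiological behavior of nanodomains.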

Comments 30 pages, 7 figures, 3 tables

Details

DOI: 10.1137/25M1725553
Journal ref: Multiscale Modeling and Simulation, 24(1), 365-397 (2026)

Abstract

Characterizing the local voltage distribution within nanophysiological domains, driven by ionic currents through membrane channels, is crucial for studying cellular activity in modern biophysics, yet it presents significant experimental and theoretical challenges. Theoretically, the complexity arises from the difficulty of solving electro-diffusion equations in three-dimensional domains. Currently, there are no methods available for obtaining asymptotic computations or approximated solutions of nonlinear equations, and numerically, it is challenging to explore solutions across both small and large spatial scales. In this work, we develop a method to solve the Poisson-Nernst-Planck equations with ionic currents entering and exiting through two narrow, circular window channels located on the boundary. The inflow through the first window is composed of a single cation, while the outflow maintains a constant ionic density satisfying local electro-neutrality conditions. Employing regular expansions and Green's function representations, we derive the ionic profiles and voltage drops in both small and large charge regimes. We explore how local surface curvature and window channels size influence voltage dynamics and validate our theoretical predictions through numerical simulations, assessing the accuracy of our asymptotic computations. These novel relationships between current, voltage, concentrations and geometry can enhance the characterization of physiological behaviors of nanodomains.


2605.13315 2026-05-14 cs.ET cs.LG cs.NE cs.SY eess.SY q-bio.NC

Embodied Neurocomputation: A Framework for Interfacing Biological Neural Cultures with Scaled Task-Driven Validation

Johnson Zhou, Daniel Tanneberg, Forough Habibollahi, Alon Loeffler, Kiaran Lawson, Valentina Baccetti, Kwaku Dad Abu-Bonsrah, Candice Desouza, Finn Doensen, Bradley Watmuff, Daria Kornienko, Azin Azadi, Justin Leigh Bourke, Bernhard Sendhoff, Brett J. Kagan

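AI summary: This study proposes an "embodied neurocomputation" framework aimed at the problem of optimal encoding and decoding at the interface between biological neural networks and conventional silicon computing. Optimizing the parameters of biological neural network agents on a closed-loop navigation task in a simulated environment, the authors find 12 configurations that learn stably and that outperform silicon-based deep Q-network agents given the same interaction budget. The work lays a foundation for goal-directed learning with biological neural networks and advances task-driven neurocomputation and cross-domain benchmarks.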

2605.13262 2026-05-14 cs.LG q-bio.QM

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction

Deepak Warrier, Raja Sekhar Pappala

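AI summary: This paper proposes Chem-GMNet, a sphere-native geometric Transformer for molecular property prediction. The model replaces each module of a conventional Transformer with a counterpart built on spherical geometry, exploiting geometric priors present in chemical structure. Experiments show that Chem-GMNet outperforms existing methods such as ChemBERTa with fewer parameters and performs well even without pretraining.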

2605.13234 2026-05-14 physics.bio-ph cond-mat.stat-mech q-bio.TO

A hyperbolic cell cycle law for early embryonic developmental timing

Adrián Aguirre-Tamaral, Johanna Royer, Magdalena Schindler-Johnson, Jun-Ru Lee, Daniel R. Amor, Nicoletta I. Petridou, Bernat Corominas-Murtra

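AI summary: This study reports a universal pattern in how cell cycle length changes over time during early embryonic development: although developmental speed and molecular mechanisms differ across species, cell replication slows in a consistent way. The authors propose that this arises from the coupling between depletion of finite maternal resources and biochemical reaction kinetics, which makes cell cycle length grow hyperbolically toward a mathematical singularity that corresponds to developmental arrest. Validated against multi-species data, the model quantitatively describes how cell number changes over time and how cell size affects cycle length, and it accurately predicts the timing of gastrulation, indicating that the rate of resource depletion is a key regulator of early developmental timing.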

Comments 10 pages, 3 figures

2605.13070 2026-05-14 q-bio.CB physics.bio-ph q-bio.TO

3D mechano-geometric multicellular model of apical stem cell-driven plant morphogenesis

Naoya Kamamoto, Koichi Fujimoto

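AI summary: This paper presents a three-dimensional mechano-geometric multicellular model for studying plant morphogenesis driven by apical stem cells. The model combines cell mechanics, irreversible cell-wall growth, and deformable tissue structure, and can simulate how the orientation of cell division shapes three-dimensional plant form. The paper details the model's construction, including the triangulated thin-shell representation of cells, the treatment of turgor pressure, cell-wall elasticity and strain-driven growth, a cell division algorithm that supports multiple division rules, and remeshing operations that preserve mesh quality. The aim is to give experimental plant biologists an accessible and customizable modeling tool.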

Comments 14 pages, 2 figures

2605.13009 2026-05-14 q-bio.PE physics.bio-ph

Conditioning as a route to stereotyped behavior in growing populations

Riccardo Ravasio, Kabir Husain, Constantine G. Evans, Rob Phillips, Marco Ribezzi-Crivellari, Jack W. Szostak, Arvind Murugan

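AI summary: This paper studies how conditioning can produce reproducible behavior in complex multi-step processes in biological systems. The authors propose a strategy distinct from conventional molecular error correction: eliminating individuals that fail to complete a task within a time limit, so that the surviving population behaves in a more ordered way. They show that conditioning can give rise to a hierarchy of timescales without microscopic control and, in some cases, automatically favors the growth of the most ordered populations, providing a simple and efficient route to reliable behavior.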

2605.12999 2026-05-14 q-bio.NC cs.LG

Implicit Behavioral Decoding from Next-Step Spike Forecasts at Population Scale

John R. Minnick, Jesus Gonzalez-Ferrer, Kamran Hussain, Jinghui Geng, Ash Robbins, Mohammed A. Mostajo-Radji, David Haussler, Jason Eshraghian, Mircea Teodorescu

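AI summary: This study proposes a closed-loop brain-computer interface approach based on a single Mamba model that forecasts neural population activity and decodes the animal's behavioral state in one forward pass. The model is trained on next-step spike counts at Neuropixels scale, and a lightweight linear classifier reads out its predicted firing rates. On behavioral decoding this outperforms a linear classifier applied directly to the raw spike data. In a visual discrimination task the approach accurately decodes the mouse's choice and the stimulus side, with better computational efficiency and decoding performance than conventional methods.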

Comments 21 pages, 6 figures, 5 tables; submitted to NeurIPS 2026 Neuroscience & Cognitive Science Track

2605.12992 2026-05-14 q-bio.NC cs.LG

SpikeProphecy: A Large-Scale Benchmark for Autoregressive Neural Population Forecasting

John R. Minnick, Jinghui Geng, Kamran Hussain, Jesus Gonzalez-Ferrer, Ash Robbins, Mohammed A. Mostajo-Radji, David Haussler, Jason K. Eshraghian, Mircea Teodorescu

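AI summary: This paper presents SpikeProphecy, the first large-scale benchmark for causal autoregressive neural population forecasting on real electrophysiological recordings. It decomposes population forecasting performance into separate measures of temporal fidelity, spatial pattern accuracy, and magnitude-invariant alignment, exposing information that a single conventional correlation coefficient hides. Using 105 Neuropixels datasets, the experiments compare seven forecasting models with different architectures, find significant differences in predictability across brain regions, and report a negative result for ANN-to-SNN transfer in the Poisson count domain.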

Comments 26 pages, 4 figures, 12 tables; submitted to NeurIPS 2026 Datasets and Benchmarks Track; processed dataset at https://huggingface.co/datasets/mysteriousauthor/spikeprophecy-steinmetz (CC-BY-4.0); code at https://github.com/JohnMinnick/SpikeProphecy-A-Large-Scale-Benchmark-for-Autoregressive-Neural-Population-Forecasting

2605.12852 2026-05-14 cs.LG q-bio.QM

Multitask Multimodal Fusion with Tabular Foundation Models for Peak and Durability Prediction of Pertussis Booster Response

Divya Sitani

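AI summary: This study aims to jointly predict the peak and the durability of the immune response after pertussis booster vaccination, two phases driven by different biological mechanisms. It proposes a multitask multimodal fusion model that combines frozen TabPFN-v2 encoders, a dual-label contrastive loss, missingness-calibrated modality dropout, and attention-based fusion to cope with heterogeneous modalities, missing values, and weak association between the tasks. In experiments the model outperforms conventional baselines on both prediction tasks, and the results agree with the underlying immunology, revealing task-specific contributions of the different modalities to peak and durability prediction.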

Comments 22 pages, 8 figures, 4 tables. Code available at https://github.com/Divya1205/cmi-pb-multitask

Details

Abstract

Pertussis booster vaccination produces immune responses that vary widely across individuals in both peak magnitude and long-term durability. These two phases are governed by partly distinct biological compartments: peak reflects acute B-cell activation and antibody secretion, while durability reflects the establishment of long-term humoral memory. Yet most computational models target only one, missing the full boost-and-wane trajectory. Jointly predicting both is non-trivial because the two endpoints are biologically dissociated rather than redundant; samples are small, modalities are heterogeneous with structured missingness, and the two tasks rely on different measurement windows. We propose a multi-task contrastive multimodal fusion architecture combining frozen TabPFN-v2 per-modality encoders, a dual-label supervised contrastive loss that treats two subjects as a positive pair if they agree on the Task 1 label or the Task 2 label, modality dropout calibrated to empirical missingness, and missingness-masked attention fusion. Applied to a curated subset of the CMI-PB pertussis booster dataset (n = 158 subjects, four modalities, 44.9% with at least one modality missing; Spearman r = -0.58 between peak and durability, n = 96), the model achieves test AUROC 0.797 (95% CI [0.621, 0.948]) for peak response and 0.755 (95% CI [0.519, 0.945]) for durability, with both significant under joint label permutation (N = 1000; p = 0.002 and p = 0.045). Across logistic regression, XGBoost, and MLP baselines on raw features and on TabPFN embeddings, the proposed model is the only one whose 95% CIs lie above chance on both tasks simultaneously. Per-modality contribution analyses recover task-specific modality contributions consistent with the underlying immunology: peak prediction is carried by cytokine signatures, while durability is carried by baseline antibody features.
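
The positive-pair rule described in the abstract (two subjects form a positive pair if they agree on the Task 1 label or the Task 2 label) can be sketched as a mask over subject pairs. This is an illustrative sketch only: the function name `positive_pair_mask` and the toy labels are hypothetical, self-pairs are excluded by assumption, and every subject is assumed to have both labels, which the abstract does not state.

```python
def positive_pair_mask(task1_labels, task2_labels):
    """Return an n x n Boolean matrix; entry [i][j] is True when subjects
    i and j (i != j) share the Task 1 label or the Task 2 label."""
    n = len(task1_labels)
    return [
        [
            i != j
            and (
                task1_labels[i] == task1_labels[j]
                or task2_labels[i] == task2_labels[j]
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical binary labels for four subjects.
peak = [1, 1, 0, 0]        # Task 1: peak response
durability = [0, 1, 1, 0]  # Task 2: durability

mask = positive_pair_mask(peak, durability)
for row in mask:
    print(row)
```

In a supervised contrastive loss, a mask like this selects which embedding pairs are pulled together. Using "or" rather than "and" yields more positives per anchor, which may matter with a cohort this small.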


2605.12823 2026-05-14 cs.LG physics.chem-ph physics.comp-ph q-bio.BM

Hessian Matching for Machine-Learned Coarse-Grained Molecular Dynamics

Sanya Murdeshwar, Sanjit Shashi, Kevin Bachelor, William Noid, Ashwin Lokapally, Razvan Marinescu

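AI summary: This study proposes a machine-learned coarse-grained molecular dynamics method based on matching Hessian-vector products, with the goal of improving how well the coarse-grained potential models the curvature of the free energy. By introducing random probe vectors, the method incorporates second-order curvature information into the coarse-grained potential without explicitly constructing the Hessian matrix, which improves simulation accuracy. Experiments on several protein systems show that it significantly outperforms conventional gradient (force) matching, especially on slow-mode dynamical metrics.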

Comments 15 pages, 4 figures, 1 table
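
The idea of using random probe vectors to compare curvature without building a Hessian can be sketched as follows. This is an assumption-laden illustration, not the paper's method: it approximates Hessian-vector products by central finite differences of the gradient (the paper may well use automatic differentiation), it uses toy quadratic potentials, and `grad_ref` merely stands in for whatever reference curvature signal the authors match against. The names `hvp` and `hvp_matching_loss` are hypothetical.

```python
import numpy as np


def hvp(grad_fn, x, v, eps=1e-4):
    """Approximate the Hessian-vector product H(x) v by a central finite
    difference of the gradient; the Hessian itself is never formed."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)


def hvp_matching_loss(grad_model, grad_ref, x, n_probes=8, seed=0):
    """Mean squared mismatch between model and reference Hessian-vector
    products over random Gaussian probe vectors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.standard_normal(x.shape)
        diff = hvp(grad_model, x, v) - hvp(grad_ref, x, v)
        total += float(diff @ diff)
    return total / n_probes


# Toy potentials U(x) = 0.5 * x^T A x, so grad U = A x and the Hessian is A.
A_ref = np.array([[2.0, 0.5], [0.5, 1.0]])
A_model = np.array([[2.0, 0.0], [0.0, 1.0]])  # misses the coupling term


def grad_ref(x):
    return A_ref @ x


def grad_model(x):
    return A_model @ x


x0 = np.array([0.3, -0.7])
v0 = np.array([1.0, 2.0])

hv = hvp(grad_ref, x0, v0)  # should approximate A_ref @ v0
loss_mismatch = hvp_matching_loss(grad_model, grad_ref, x0)
loss_match = hvp_matching_loss(grad_ref, grad_ref, x0)
print(hv, loss_mismatch, loss_match)
```

Each probe costs two gradient evaluations per potential, so the cost scales with the number of probes rather than with the square of the number of coordinates, which is the practical reason for avoiding the explicit Hessian.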

2605.12773 2026-05-14 q-bio.PE physics.soc-ph

The interplay of network structure and correlated infectious traits in epidemic models

Abhay Gupta, Nicholas W. Landry

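AI summary: This study examines how network structure interacts with correlations between individual susceptibility and infectiousness in epidemic models. The authors build a mathematical model with distinct subpopulations, each with its own joint distribution of susceptibility and infectiousness, to analyze how community structure and degree heterogeneity affect epidemic spread. They derive an analytical expression for the basic reproduction number, validate the model with numerical simulations, and discuss its implications for social intervention strategies.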

Comments 14 pages, 6 figures

2605.12763 2026-05-14 cs.LG math.DS math.OC q-bio.NC

State-Space NTK Collapse Near Bifurcations

James Hazelden, Eric Shea-Brown

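AI summary: This paper studies feature learning in temporally unrolled tasks as a model passes through bifurcations, and proposes a local theory of gradient descent based on the empirical state-space neural tangent kernel (sNTK). It finds that bifurcations not only dominate learning dynamics but also simplify them: the sNTK is well approximated by a rank-one operator, giving an analytical description of the local learning geometry of high-dimensional recurrent systems. By decomposing the sNTK into a bifurcation-related channel and a residual channel, the paper shows that the bifurcation channel is strongly amplified near common bifurcations, and it argues that low-rank natural-gradient methods can effectively address learning instability near bifurcations.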

2605.12732 2026-05-14 q-bio.NC

Predictive Coding Light+: learning to predict visual sequences with spike timing-dependent plasticity and synaptic delays

Antony W. N'dri, Thomas Barbier, Céline Teulière, Jochen Triesch

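AI summary: This paper proposes Predictive Coding Light+ (PCL+), a spiking neural network architecture for unsupervised processing of visual sequences. By learning recurrent excitatory connections with delays, the network gains a short-term memory of recent information and can therefore predict future input. PCL+ reproduces classic findings on sequence learning in visual cortex and can fill in missing input in a complex gesture recognition task, demonstrating the potential of spiking neural networks to learn and predict the future.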

Comments 13 pages, 7 figures, 2 tables, preprint

2605.12706 2026-05-14 cs.LG q-bio.GN

A Resampling-Based Framework for Network Structure Learning in High-Dimensional Data

Ziwei Huang, Zeyuan Song, Paola Sebastiani, Stefano Monti

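AI summary: RSNet is an open-source R package that provides a resampling-based framework for robust and interpretable network structure learning in high-dimensional data, aimed at the challenges posed by small sample sizes. The framework supports estimation of conditional Gaussian Bayesian networks for mixed continuous and discrete data as well as partial correlation networks, and it offers several resampling strategies for independent or correlated observations. RSNet adds graphlet-based topological analysis to make network structure more interpretable and, for the first time, efficiently constructs signed graphlet degree vector matrices in sparse networks, supporting scalable analysis of higher-order network structure.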

Comments 7 pages, 1 figure

2605.12662 2026-05-14 cs.LG q-bio.GN

scShapeBench: Discovering geometry from high dimensional scRNAseq data

Andrew J Steindl, João Felipe Rocha, Brian Tshilengi Di Bassinga, Zachary Warren, Matthew Scicluna, César Miguel Valdez Córdova, Shabarni Gupta, Leire Torices, Daniel Neumann, Timothy J. Mann, Ihuan Gunawan, Dhananjay Bhaskar, John G Lock, Christine L Chaffer, Guy Wolf, Smita Krishnaswamy

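AI summary: scShapeBench is a benchmark dataset for shape detection in single-cell transcriptomic data. Its goal is to automatically identify geometric structure in the data, such as clusters, trajectories, and archetypal patterns, and thereby help select a suitable downstream analysis pipeline. The study also introduces scReebTower, a method that uses diffusion geometry to extract Reeb graphs and connect visualization with pipeline selection, together with topology-aware evaluation metrics. Experiments indicate that scReebTower outperforms existing baselines on both synthetic and real data, providing a useful tool for automated single-cell analysis.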

Details

Abstract

High-dimensional point cloud data arise across many scientific domains, especially single-cell biology. The shapes or topologies of these datasets determine the types of information that can be extracted. For example, clustered data supports cell-type identification, trajectory structures support transition analysis, and archetypal structures capture continua of cellular behaviors. Existing analysis pipelines often assume a specific shape. The standard Seurat pipeline combines UMAP visualization with Louvain clustering and therefore assumes clustered data, while tools such as Monocle and SPADE assume tree-like structures, and flow-based models such as MIOFlow and Conditional Flow Matching target trajectories. Choosing which pipeline to apply is therefore often left to bioinformaticians who visually inspect datasets before selecting an analysis strategy. With the rise of agentic AI scientists, automating shape detection is increasingly important for selecting downstream analysis pipelines. To address this problem, we introduce scShapeBench, a benchmark dataset for shape detection containing both synthetic and expert-annotated single-cell datasets. Synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance. Real single-cell datasets are curated from diverse sources and annotated by experts into four categories: clusters, single trajectory, multi-branching, and archetypal. We additionally introduce scReebTower, a baseline method that uses diffusion geometry to extract Reeb graphs and connect visualization with pipeline selection. We provide topology-aware evaluation metrics and compare scReebTower against PAGA and Mapper on synthetic and real data. Our results indicate that scReebTower outperforms existing baselines. Overall, our contributions span benchmarks, evaluation metrics, and a baseline for automated shape detection in single-cell data.


2602.12026 2026-05-14 cs.LG q-bio.QM

Protein Circuit Tracing via Cross-layer Transcoders

Darin Tsui, Kunal Talreja, Daniel Saeedi, Amirali Aghazadeh

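AI summary: This study proposes ProtoMech, a framework for uncovering computational circuits in protein language models (pLMs). It uses cross-layer transcoders to learn sparse latent representations across layers, capturing the model's overall computational flow. Applied to ESM2, the method recovers 82-89% of the original performance on protein family classification and function prediction, and it identifies compressed circuits that use less than 1% of the latent space while retaining up to 79% of model accuracy, revealing correspondences with structural and functional motifs. The results offer efficient and precise guidance for protein function design and significantly outperform existing methods.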

Comments Accepted into ICML 2026. 32 pages, 17 figures

2602.11618 2026-05-14 cs.LG q-bio.QM

How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?

Tatsuya Sagawa, Ryosuke Kojima

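AI summary: This paper studies how well large-scale chemical language models (CLMs) transfer to downstream molecular property prediction tasks. Scaling training resources (model size, dataset size, and compute), the authors systematically evaluate the relationship between pretraining loss and downstream performance and find that, although pretraining loss keeps falling, downstream performance improves only modestly. The study also exposes a gap between pretraining metrics and actual task performance, analyzes task-dependent failure modes that affect transfer, and stresses that model selection and evaluation need to take the characteristics of the downstream task into account.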