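arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.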

2605.12404 2026-05-13 q-bio.NC

Empirical scaling laws in balanced networks with conductance-based synapses

Vicky Zhu, Gabriel Ocker, Robert Rosenbaum

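AI summary: This paper studies how conductance-based synapse models affect membrane potential fluctuations in balanced networks. Using computer simulations, the authors find that conductance-based synapses on their own produce fluctuations that are too small, and current-based synapses with spike-timing correlations produce fluctuations that are too large, but combining the two yields moderate, more realistic fluctuations. The study shows that the interplay of multiple realistic assumptions is essential when building more realistic neural network models.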

2605.12286 2026-05-13 q-bio.GN cs.AI

Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction

Younhun Kim, Georg K. Gerber, Travis E. Gibson

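AI summary: This study asks whether community-level abundance profiles can be predicted from the raw DNA sequences of microbial community members alone. It proposes Set-Aggregated Genome Embeddings (SAGE), which builds on the few-shot learning ability of genome language models (GLMs) to predict the abundance distribution of microbial communities. Experiments show that the method generalizes to novel genomes better than traditional bioinformatics methods and confirm that a community-level latent representation is key to the performance gain.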

2605.10818 2026-05-13 cs.LG q-bio.NC

On periodic distributed representations using Fourier embeddings

Jakeb Chouinard

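AI summary: This paper studies how to build periodic distributed representations from Fourier embeddings so as to better handle periodic signals such as angles. The author proposes high-dimensional real-valued periodic embeddings to overcome the difficulty that scalar angle representations have with nearby angles, and controls the shape of different kernel functions through dot-product similarity. The work focuses on Spatial Semantic Pointers, a neurally interpretable representation, to formally define the Dirichlet kernel and the periodic Gaussian kernel, offering a new approach to modeling periodic signals.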
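As a rough illustration of how dot-product similarity between periodic embeddings can shape a kernel, the sketch below embeds an angle as weighted cosine and sine harmonics. It is not the paper's Spatial Semantic Pointer construction: the function names, the real-valued layout, and the harmonic weights are assumptions chosen for illustration. With unit weights the normalised dot product equals a Dirichlet kernel of the angle difference; with weights exp(-σ²k²/2) it approximates a periodic (wrapped) Gaussian.

```python
import math

def fourier_embedding(theta, weights):
    """Real-valued periodic embedding of an angle in radians.

    weights[k-1] is the non-negative weight given to harmonic k
    (a hypothetical parameterisation, not taken from the paper).
    """
    vec = [1.0]  # constant (zero-frequency) component
    for k, w in enumerate(weights, start=1):
        scale = math.sqrt(2.0 * w)
        vec.append(scale * math.cos(k * theta))
        vec.append(scale * math.sin(k * theta))
    return vec

def similarity(a, b, weights):
    """Dot product of two embeddings, normalised to 1 when a == b.

    Equals (1 + 2 * sum_k w_k * cos(k * (a - b))) / (1 + 2 * sum_k w_k).
    """
    ea = fourier_embedding(a, weights)
    eb = fourier_embedding(b, weights)
    norm = 1.0 + 2.0 * sum(weights)
    return sum(x * y for x, y in zip(ea, eb)) / norm

def dirichlet_weights(n_harmonics):
    """Unit weights: the similarity becomes a normalised Dirichlet kernel."""
    return [1.0] * n_harmonics

def periodic_gaussian_weights(n_harmonics, sigma):
    """Gaussian-decaying weights: the similarity approximates a wrapped Gaussian."""
    return [math.exp(-0.5 * (sigma * k) ** 2) for k in range(1, n_harmonics + 1)]
```

Because every component is 2π-periodic, angles just below 2π and just above 0 receive nearly identical embeddings, which is the property a scalar angle representation lacks.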

2604.16642 2026-05-13 q-bio.QM q-bio.CB q-bio.GN stat.AP

Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress

Prashant C. Raju

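AI summary: This study proposes Shesha, a new geometric stability metric that evaluates the directional coherence of single-cell CRISPR perturbation responses, revealing gene regulatory architecture and predicting cellular stress. Analyzing multiple CRISPR datasets, it finds that stability correlates strongly with perturbation effect magnitude but that the two decouple in some cases, exposing the biological characteristics of different regulators. The method offers a new perspective for target prioritization in screens, phenotypic quality control in cell manufacturing, and evaluation of computational perturbation predictions.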

Abstract

Genome engineering has achieved remarkable sequence-level precision, yet predicting the transcriptomic state that a cell will occupy after perturbation remains an open problem. Single-cell CRISPR screens measure how far cells move from their unperturbed state, but this effect magnitude ignores a fundamental question: do the cells move together? Two perturbations with identical magnitude can produce qualitatively different outcomes if one drives cells coherently along a shared trajectory while the other scatters them across expression space. We introduce a geometric stability metric, Shesha, that quantifies the directional coherence of single-cell perturbation responses as the mean cosine similarity between individual cell shift vectors and the mean perturbation direction. Across five CRISPR datasets (2,200+ perturbations spanning CRISPRa, CRISPRi, and pooled screens), stability correlates strongly with effect magnitude (Spearman $ρ=0.75-0.97$), with a calibrated cross-dataset correlation of 0.97. Crucially, discordant cases where the two metrics decouple expose regulatory architecture: pleiotropic master regulators such as CEBPA and GATA1 pay a "geometric tax," producing large but incoherent shifts, while lineage-specific factors such as KLF1 produce tightly coordinated responses. After controlling for magnitude, geometric instability is independently associated with elevated chaperone activation (HSPA5/BiP; $ρ_{partial}=-0.34$ and $-0.21$ across datasets), and the high-stability/high-stress quadrant is systematically depleted. The magnitude-stability relationship persists in scGPT foundation model embeddings, confirming it is a property of biological state space rather than linear projection. Perturbation stability provides a complementary axis for hit prioritization in screens, phenotypic quality control in cell manufacturing, and evaluation of in silico perturbation predictions.
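The abstract defines stability as the mean cosine similarity between individual cell shift vectors and the mean perturbation direction. A minimal sketch of that definition follows; it is not the authors' implementation. It assumes each shift vector is a perturbed cell's expression vector minus the centroid of control cells, and that magnitude is the Euclidean norm of the mean shift. The function names and these preprocessing choices are hypothetical.

```python
import math

def _mean(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def _cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def _shifts(perturbed, control):
    """Per-cell shift vectors relative to the control centroid (assumed reference)."""
    centroid = _mean(control)
    return [[x - c for x, c in zip(cell, centroid)] for cell in perturbed]

def directional_coherence(perturbed, control):
    """Mean cosine similarity between each cell's shift and the mean shift."""
    shifts = _shifts(perturbed, control)
    mean_shift = _mean(shifts)
    return sum(_cosine(s, mean_shift) for s in shifts) / len(shifts)

def effect_magnitude(perturbed, control):
    """Euclidean norm of the mean shift vector (assumed magnitude definition)."""
    mean_shift = _mean(_shifts(perturbed, control))
    return math.sqrt(sum(x * x for x in mean_shift))
```

Under these assumptions, cells that all move along one axis score 1.0, while cells that scatter symmetrically around the control centroid score 0.0, which is the coherent-versus-scattered contrast the abstract draws.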

2603.21919 2026-05-13 cond-mat.soft q-bio.SC

Mechanical stress induced by the polymerisation of an active gel near a surface

Kristiana Mihali, Dennis Wörthmüller, Pierre Sens

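AI summary: This study examines how the mechanical stress generated by polymerisation of an active gel near the cell membrane affects membrane deformation. Using a hydrodynamic model of a compressible active gel, it investigates how actin flow, density relaxation, and friction with the membrane induce normal and tangential stresses on the membrane in the linear deformation regime. Combining analytical solutions with finite-element methods, it shows how compressibility, interfacial friction, and actin turnover rate affect membrane stability, and identifies the conditions that lead to linear instability of the membrane.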

2510.00733 2026-05-13 cs.LG cs.AI q-bio.QM

Neural Diffusion Processes for Physically Interpretable Survival Prediction

Alessio Cristofoletto, Cesare Rollo, Giovanni Birolo, Piero Fariselli

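AI summary: This paper proposes DeepFHT, a survival analysis framework that couples deep neural networks with first hitting time (FHT) distributions from stochastic process theory, modeling the time to event as the first time a latent diffusion process reaches an absorbing boundary. A neural network maps input variables to physically meaningful parameters such as the initial condition, drift, and diffusion coefficient, yielding closed-form survival and hazard functions without assuming proportional hazards. Experiments show that DeepFHT matches state-of-the-art methods in predictive performance while retaining a physically interpretable parameterization that helps reveal the relationship between input features and risk.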

2507.16179 2026-05-13 cond-mat.soft q-bio.BM

Cooperation and competition of basepairing and electrostatic interactions in mixtures of DNA nanostars and polylysine

Gabrielle R. Abraham, Tianhao Li, Anna Nguyen, William M. Jacobs, Omar A. Saleh

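AI summary: This study examines cooperation and competition between base pairing and electrostatic interactions in mixtures of DNA nanostars and polylysine. Combining experiment and theory, it maps how temperature, ionic strength, and composition affect phase separation, finding that the two interactions cooperate at high salt and high temperature to stabilize the coacervate phase and produce multiphase coexistence. It also reveals kinetic pathways to phase separation and nonequilibrium aggregation at different salt concentrations, showing that multiple interaction modes markedly increase the complexity of phase behavior in biomolecular systems.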

DOI: 10.1021/jacs.5c12436
Journal ref: J. Am. Chem. Soc. 2025, 147, 46, 42452-42461
Comments: Include supplementary information

Abstract

Phase separation in biomolecular mixtures can result from multiple physical interactions, which may act either complementarily or antagonistically. In the case of protein-nucleic acid mixtures, charge plays a key role but can have contrasting effects on phase behavior. Attractive electrostatic interactions between oppositely charged macromolecules are screened by added salt, reducing the driving force for coacervation. By contrast, base pairing interactions between nucleic acids are diminished by charge repulsion and thus enhanced by added salt, promoting associative phase separation. To explore this interplay, we combine experiment and theory to map the complex phase behavior of a model solution of poly-L-lysine (PLL) and self-complementary DNA nanostars (NS) as a function of temperature, ionic strength, and macromolecular composition. Despite having opposite salt dependences, we find that electrostatics and base pairing cooperate to stabilize NS-PLL coacervation at high ionic strengths and temperatures, leading to two- or three-phase coexistence under various conditions. We further observe a variety of kinetic pathways to phase separation at different salt concentrations, resulting in the formation of nonequilibrium aggregates or droplets whose compositions evolve on long timescales. Finally, we show that the cooperativity between electrostatics and base pairing can be used to create immiscible coacervates that partition various NS species at intermediate salt concentrations. Our results illustrate how the interplay between distinct interaction modes can greatly increase the complexity of the phase behavior relative to systems with a single type of interaction.

2412.04172 2026-05-13 q-bio.NC math.DS

Activity-dependent neuromodulation and calcium homeostasis cooperate to produce robust and modulable neuronal function

Arthur Fyon, Guillaume Drion

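AI summary: This study examines how activity-dependent neuromodulation and calcium homeostasis cooperate to keep neuronal function robust and modulable. Using conductance-based computational models, it finds that a biologically inspired neuromodulation controller can work with calcium homeostasis to preserve neuronal firing patterns while maintaining intracellular calcium levels. It also shows that this cooperation depends on an intersection in conductance space, that increasing neuronal degeneracy makes modulation more reliable, and that the mechanism extends to the network level.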

DOI: 10.1371/journal.pcbi.1014177
Journal ref: PLOS Computational Biology 22(4): e1014177 (2026)

Abstract

Neurons rely on two interdependent mechanisms, homeostasis and neuromodulation, to maintain robust and adaptable functionality. Calcium homeostasis stabilizes neuronal activity by adjusting ionic conductances, whereas neuromodulation dynamically modifies ionic properties in response to external signals carried by neuromodulators. Combining these mechanisms in conductance-based models often produces unreliable outcomes, particularly when sharp neuromodulation interferes with calcium-homeostatic tuning. This study explores how a biologically inspired neuromodulation controller can harmonize with calcium homeostasis to ensure reliable neuronal function. Using computational models of stomatogastric ganglion and dopaminergic neurons, we demonstrate that controlled neuromodulation preserves neuronal firing patterns while calcium homeostasis simultaneously maintains target intracellular calcium levels. Unlike sharp neuromodulation, the neuromodulation controller integrates activity-dependent feedback through mechanisms mimicking G-protein-coupled receptor cascades. The interaction between these controllers critically depends on the existence of an intersection in conductance space, representing a balance between target calcium levels and neuromodulated firing patterns. Maximizing neuronal degeneracy enhances the likelihood of such intersections, enabling robust modulation and compensation for channel blockades. We further show that this controller pairing extends to network-level activity, reliably modulating the rhythmic activity of central pattern generators. This study highlights the complementary roles of calcium homeostasis and neuromodulation, proposing a unified control framework for maintaining robust and adaptive neural activity under physiological and pathological conditions.

2410.00532 2026-05-13 q-bio.QM

smICA: Open-Source Software for Quantitative, Lifetime-Resolved Mapping of Absolute Fluorophore Concentrations in Living Cells

Tomasz Kalwarczyk, Grzegorz Bubak, Jarosław Michalski, Antoni Lis, Karina Kwapiszewska, Marta Pilz, Adam Mamot, Olga Perzanowska, Joanna Kowalska, Jacek Jemielity, Robert Hołyst

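AI summary: This study presents smICA, an open-source software tool for quantitatively resolving the absolute concentration and lifetime information of fluorescent molecules in living cells. The method maps concentrations with high sensitivity from single-molecule imaging data, needs only a small number of photons for cell segmentation and signal filtering, and markedly improves measurement efficiency. In vitro and in vivo experiments validate its reliability, and the authors demonstrate its use in monitoring the concentration dynamics of fluorescently labeled mRNA in living cells, providing a strong tool for quantitative biology at the single-cell level.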

2405.02038 2026-05-13 q-bio.NC math-ph math.MP q-bio.CB

Dimensionality reduction of neuronal degeneracy reveals two interfering physiological mechanisms

Arthur Fyon, Alessio Franci, Pierre Sacré, Guillaume Drion

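AI summary: This study addresses how neurons maintain stable function despite highly variable ion channel composition. Dimensionality reduction identifies two main dimensions in channel conductance space, revealing two interfering physiological mechanisms that can be explained by feedback regulation. The work provides quantitative insight into the relationship between ion channel composition and neuronal electrophysiological activity and proposes a reliable, model-independent neuromodulation rule.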

2605.11764 2026-05-13 cs.LG q-bio.BM

Decomposing the Generalization Gap in PROTAC Activity Prediction: Variance Attribution and the Inter-Laboratory Ceiling

Thor Klamt, Wolfgang Nejdl, Ming Tang

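AI summary: This study examines the generalization gap in machine-learning prediction of the biological activity of PROTACs (proteolysis-targeting chimeras) and identifies inter-laboratory measurement variance as the main contributor to the gap. By analyzing multiple models under different evaluation protocols, it shows the substantial effect of cross-laboratory data differences on predictive performance and proposes a framework for decomposing the gap. It also releases the PROTAC-Bench dataset and associated evaluation tools as resources for follow-up research.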

Comments: 32 pages, 11 figures, 11 tables. Dataset: https://huggingface.co/datasets/ThorKl/protac-bench (CC-BY-4.0). Code: https://github.com/ThorKlm/PROTAC-Bench (MIT)

Abstract

Machine-learning predictors of biochemical activity often exhibit large random-split-to-leave-one-target-out generalisation gaps that have been documented but not decomposed. We frame this as an evaluation-science question and use targeted protein degradation as the empirical test bed. PROTACs (proteolysis-targeting chimeras) are heterobifunctional small molecules that induce targeted protein degradation, with more than forty candidates currently in clinical trials; published predictors report AUROC of 0.85 to 0.91 under random-split cross-validation, while the leave-one-target-out (LOTO) protocol of Ribes et al. reduces performance to approximately 0.67. Random splits reward within-target interpolation, whereas LOTO measures the novel-target prediction that de-novo design depends on. We decompose this gap and identify inter-laboratory measurement variance as the dominant component, anchored by a within-target cross-laboratory cascade bounding the inter-laboratory contribution at 0.124 AUROC, well above the 0.05 contribution from binarisation-threshold choice. Across eight published architectures and ESM-2 protein language models up to 3B parameters, LOTO AUROC plateaus near 0.67, with a comparable plateau under SMILES-level deduplication; a 21-dimensional 2000-trial hyperparameter optimisation cannot break this ceiling, and the rank-1 single-seed configuration regresses by 0.161 AUROC under multi-seed validation, matching a closed-form selection-bias prediction (Bailey and Lopez de Prado, 2014). Few-shot k=5 stratified per-target retraining combined with ADMET features lifts 65-target LOTO AUROC from 0.668 to 0.7050, and post-hoc Platt scaling recovers raw output to within the 0.05 well-calibrated threshold. We release PROTAC-Bench (10,748 measurements, 173 targets, 65 LOTO folds), the variance-decomposition framework, the per-target calibration protocol, and the evaluation code.

2605.11718 2026-05-13 q-bio.NC cs.AI cs.NE

Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization

Zhaotian Gu, Molan Li, Jie Su, Chang Liu, Tianyi Qian, Dahui Wang

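AI summary: This study investigates the computational origin of direction-selectivity maps in the dorsal stream of primate visual cortex, such as area MT. It introduces a spatiotemporal topographic deep neural network (TDANN) that combines self-supervised contrastive learning with a biologically inspired spatial loss; trained on natural videos, the model spontaneously develops brain-like motion direction maps and topological pinwheel structures. The study shows that MT direction selectivity arises from an optimization trade-off between task-driven discriminative pressure and spatial regularization, that the representations quantitatively match physiological baselines from macaque MT, and that the result offers new insight toward unifying the computational mechanisms of the dorsal and ventral visual streams.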

2605.11675 2026-05-13 q-bio.QM q-bio.NC

Accounting for Missed Events in the Bayesian Modeling of IP3R Multimodal Gating

Schayma Ben Marzougui, Audrey Denizot, Hugues Berry

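AI summary: This study addresses the modeling of multimodal gating in IP3R channels and proposes an improved Bayesian model that corrects the bias introduced when whole-cell patch-clamp recordings miss brief events because of limited temporal resolution. By introducing a hierarchical Markov chain model and incorporating the missed-event correction directly into the likelihood, the method markedly improves the accuracy of parameter estimation and model evaluation. With missed events accounted for, the Park and Drive modes of the IP3R channel are both described by the same three-state Markov model with different kinetic parameters, and intermediate calcium concentrations strongly inhibit the Drive-to-Park transition, revealing how gating differs across calcium concentrations.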

2605.11648 2026-05-13 q-bio.QM

NORI: Fast probabilistic inference for ambiguous observation-entity mappings

Simon Van de Vyver, Tibo Vande Moortele, Ben-Björn Binke, Pieter Verschaffelt, Peter Dawyndt, Bart Mesuere

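AI summary: NORI is a fast probabilistic inference method for resolving ambiguous mappings between experimental observations and biological entities, running orders of magnitude faster than existing methods. It supports large-scale data analysis and extensive hyperparameter optimization, and applies to bioinformatics tasks such as protein inference and taxonomic and functional analysis in omics, markedly improving the efficiency and applicability of such studies.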

2605.11598 2026-05-13 cs.LG cs.AI cs.DB q-bio.QM

EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting

Madhurima Panja, Danny D'Agostino, Huitao Li, Tanujit Chakraborty, Nan Liu

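AI summary: As data-driven methods become widespread in public health decision-making, epidemic forecasting has become an important research area. To address the lack of high-quality multivariate forecasting benchmarks, this paper introduces EpiCastBench, a large-scale benchmark framework of 40 curated multivariate epidemic datasets covering multiple infectious diseases and geographic regions, with varying temporal granularity, series length, and sparsity. The study systematically compares 15 multivariate forecasting models under a unified evaluation setup; all data and code are publicly available to support the development and validation of epidemic forecasting methods.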

2605.11450 2026-05-13 q-bio.MN

Scalable vertex guided filtrations identify structurally relevant genes in cancer networks

Edmara Viana, Rodrigo Henrique Ramos, Flávia Raquel Gonçalves Carneiro, Cynthia de Oliveira Lage Ferreira

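AI summary: This study proposes a vertex-function-based filtration method (VFB) for analyzing topological structure in cancer-related protein networks and identifying structurally relevant genes. Compared with the traditional Vietoris-Rips (VR) filtration, VFB is more computationally efficient and captures second- and third-order topological structure (Betti-2 and Betti-3), allowing it to find new driver genes and validate their biological significance. The method offers a scalable, biologically interpretable tool for large-scale network analysis.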

2605.11389 2026-05-13 math.DS q-bio.MN

Bistability, Absolute Concentration Robustness, and Hysteresis in Dual-Site Futile Cycles with Bifunctional Enzymes

Badal Joshi, Tung D. Nguyen, Matthew D. Johnston

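AI summary: This paper studies dual-site futile cycles catalyzed by bifunctional enzymes, examining their dynamics in terms of the number of steady states, stability, and bifurcation structure. Mathematical analysis reveals how four classes of networks differ in boundary steady states, bistability, and absolute concentration robustness, and finds that one class exhibits both bistability and absolute concentration robustness: the system can reach two stable states that share the same final product concentration at different intermediate concentrations.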

2605.11368 2026-05-13 cs.LG cs.AI q-bio.GN

LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows

Jeongchan Kim, Yunkyung Ko, Jong Chul Ye

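AI summary: This paper studies how to achieve inference-time reward control with Edit Flows during DNA sequence generation. It proposes LPDP, a training-free local re-solving operator that attends to intermediate states and actions and performs efficient edit operations when generating variable-length DNA sequences. At each inference step, LPDP evaluates one-step root edits, retains the best set of root edits, and solves a discrete optimization problem locally, improving the quality and biological plausibility of generated sequences; it applies to tasks such as enhancer optimization and repair of gene splicing boundaries.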

2605.11258 2026-05-13 cs.AI cs.CL q-bio.QM

Unlocking LLM Creativity in Science through Analogical Reasoning

Andrew Shen, Shaul Druckmann, James Zou

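AI summary: This paper studies how analogical reasoning (AR) can boost the creativity of large language models (LLMs) on scientific problems, particularly in complex domains such as biomedicine. The authors find that existing LLMs are prone to mode collapse in open-ended problem solving, producing solutions with little diversity, and propose an AR method that generates novel solutions from the analogical structure of cross-domain problems. Experiments show that AR markedly increases the diversity and novelty of generated solutions and outperforms existing methods on multiple biomedical tasks, supporting its effectiveness in practice.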

2605.11221 2026-05-13 q-bio.QM cs.LG

Beyond Manual Curation: Augmenting Targeted Protein Degradation Databases via Agentic Literature Extraction Workflows

Yaochen Rao, Farzaneh Jalalypour, N. M. Anoop Krishnan, Rocío Mercado

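AI summary: This study addresses the lack of structured experimental data in targeted protein degradation (TPD) by proposing a large language model (LLM) workflow with expert feedback that automates extraction of key experimental information from the scientific literature. The method optimizes its prompt instructions from a small number of expert-annotated examples and achieves high-precision data extraction and expansion for databases of two TPD compound classes, molecular glues and PROTACs, markedly increasing database size and the completeness of experimental information. The results provide reusable tools and data resources for TPD research and for scientific literature curation more broadly.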

2605.11189 2026-05-13 cs.LG q-bio.BM

Deep Learning for Protein Complex Prediction and Design

Ziwei Xie

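AI summary: This work studies how to use deep learning to accurately model and design protein complex structures, a core problem in computational structural biology that matters for understanding cellular function and developing drugs. It proposes deep learning architectures tailored to the hierarchical nature of protein structure and designs efficient search algorithms for finding interacting homologous proteins in a vast sequence space, improving the accuracy of complex structure prediction and protein sequence design.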

2605.11028 2026-05-13 q-bio.OT

Morpho-Physiological and Genetic Diversity of Crataegus Taxa (Rosaceae) in Selected Locations of Iraqi Kurdistan-Region

Karzan Ezzalddin Mohammed

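AI summary: This work studies the morphological, physiological and biochemical, and genetic diversity of sixty-one hawthorn (Crataegus spp.) accessions from the Iraqi Kurdistan region. Morphological and molecular marker analyses identify seven hawthorn taxa, comprising five species and two hybrids, and find significant differences among ecotypes in plant type, reproductive stage, and fruit morphology. The results reveal high variability in fruit morphology and physicochemical traits among accessions, providing an important basis for the conservation and use of hawthorn resources.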

Comments: 96 pages

Abstract

One of the great phytogeographic zones of the world's semi-arid lands is the Kurdistan region of Iraq, which hosts many important fruit species owing to its geographical location and ecology. Mountain hawthorn (Crataegus spp.) is a vital wild edible deciduous fruit tree of the region, highly valued for ornamental, economic, industrial and medicinal uses. In the present study, morphological, phytochemical and molecular marker systems were applied to sixty-one hawthorn accessions from different locations in the Iraqi Kurdistan region from April 2022 to September 2023. Phenotypic markers have proven extremely useful in studies of genetic diversity in hawthorn genotypes. The morphological results showed seven taxa (five species and two hybrids): Crataegus azarolus, Crataegus meyeri, Crataegus monogyna, Crataegus orientalis, Crataegus pentagyna, Crataegus azarolus x Crataegus meyeri and Crataegus azarolus x Crataegus pentagyna. There was significant variation among ecotypes in plant type, reproductive stage, fruit morphology and production uses. Fruit physio-morphological data revealed a high level of significant variability (P 0.01) among accessions based on the analysis of variance. The most important characteristics for explaining fruit morphological variability were 11 variables: fruit weight (FW), fruit length (FL), fruit width (FW), seed length (SL), seed width (SW), number of seeds per fruit (NSF), volume solution (VS), fruit fresh weight (WOF), seed weight (WS), potential of hydrogen (pH) and moisture content (MC). All of these traits differed significantly among the studied accessions.

2605.11022 2026-05-13 q-bio.GN cs.AI cs.ET cs.LG

SCOPE: Siamese Contrastive Operon Pair Embeddings for Functional Sequence Representation and Classification

Akarsh Gupta, Kenneth Rodrigues, Sagnik Chatterjee

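AI summary: This study proposes SCOPE, a Siamese contrastive operon-pair embedding method for functional sequence representation and classification. By classifying over a fused embedding space, the method performs well on operonic pair identification, reaching a ROC-AUC of 0.71, competitive with current state-of-the-art models. The study finds that protein language model embeddings already capture functional relationships effectively, offering a viable, scalable route to operon identification across large numbers of microbial genomes.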

Abstract

Identifying operons is a fundamental step in understanding prokaryotic gene regulation, as classifying genes into operons supports the reconstruction of regulatory networks, functional annotation of unannotated genes, and drug candidate development. Experimental approaches such as RT-PCR and RNA-seq provide precise evidence of operon structure, but are laborious and largely limited to well-studied model organisms, making scalable computational methods essential for genome-wide operon identification. Prior computational approaches have employed traditional classifiers such as logistic regression and decision trees, motivating our use of these as physicochemical baselines. The DGEB benchmark evaluates operonic pair classification by embedding each sequence independently with a pre-trained protein language model and computing pairwise cosine similarity. In contrast, our Siamese MLP learns a classifier over the fused embedding space, which is theoretically better motivated for binary classification, as cosine similarity can yield meaningless scores depending on the regularization of the embedding model. While protein language model embeddings substantially outperform physicochemical features in ROC-AUC, a learned Siamese MLP head does not significantly improve over unsupervised cosine similarity in Average Precision, suggesting that the geometry of the embedding space already captures the functional relationships needed for this task. Nonetheless, our Siamese MLP achieves a ROC-AUC of 0.71, competitive with state-of-the-art models on the DGEB leaderboard. These findings indicate that protein language model embeddings are a viable, scalable foundation for operonic pair classification across diverse microbial genomes, with implications for automated genome annotation, regulatory network reconstruction, and characterization of organisms lacking experimental operon annotations.

2605.10994 2026-05-13 q-bio.NC q-bio.OT

Internally triggered retrospective learning in neural networks

Arturo Tozzi

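AI summary: This paper proposes an internally triggered retrospective learning method for neural networks. Unlike conventional continuous parameter updates driven by external input, parameter updates are triggered by events the network generates itself. During operation, synaptic interactions are accumulated as latent traces encoding recent coactivation patterns, while an internal predictive mechanism continuously computes the discrepancy between predicted and actual states; when the discrepancy exceeds an adaptive threshold, a learning event is triggered, selectively integrating past activity. The method can reduce unnecessary parameter drift and suits applications that need selective adaptation to rare or important inputs.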

Comments: 13 pages, 2 figures

Abstract

Learning in artificial neural networks usually relies on continuous, externally driven weight updates, in which parameters are modified at every step in response to incoming data, error signals or reward feedback. In this setting, routine and informative inputs contribute similarly to parameter adjustment. We introduce a learning approach in which parameter updates are governed by internally generated events arising from the network's own representational dynamics. During ongoing activity, synaptic interactions are accumulated as latent traces encoding recent coactivation patterns, without immediately modifying the underlying parameters. In parallel, an internal predictive process estimates the evolving latent state, while a scalar measure of discrepancy between predicted and observed states is continuously computed. When discrepancy exceeds an adaptive threshold derived from recent error statistics, a learning event is triggered, inducing a retrospective update selectively integrating past activity into the current configuration. We performed simulations using a minimal neural network exposed to structured sequential inputs with transient perturbations. We found that learning occurs through sparse, temporally localized events associated with increases in prediction error, leading to stepwise changes in synaptic efficacy and discrete transitions in latent state organization. By selectively reorganizing parameters in response to internally detected discrepancies, our episodic updating may reduce unnecessary parameter drift while preserving informative patterns. Potential applications include systems requiring selective adaptation to rare or informative inputs such as physiological, industrial or environmental monitoring, edge computing under limited energy budgets, autonomous systems operating in dynamic conditions and sequential computational data processing.
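The mechanism described in the abstract (traces accumulate without changing weights, a discrepancy is monitored, and an update fires only when it exceeds an adaptive threshold) can be sketched with a single linear unit. This is an illustrative toy, not the author's model: the class name, the Hebbian trace rule, the mean-plus-k-standard-deviations threshold, and all default values are assumptions.

```python
import statistics
from collections import deque

class EventTriggeredLearner:
    """Single linear unit whose weights change only at internally triggered events."""

    def __init__(self, n_inputs, lr=0.5, decay=0.9, k=3.0,
                 window=50, warmup=5, min_margin=1e-6):
        self.weights = [0.0] * n_inputs
        self.trace = [0.0] * n_inputs        # latent coactivation trace
        self.errors = deque(maxlen=window)   # recent discrepancy history
        self.lr, self.decay, self.k = lr, decay, k
        self.warmup, self.min_margin = warmup, min_margin

    def step(self, x, observed):
        """Process one input; return True if a learning event was triggered."""
        predicted = sum(w * xi for w, xi in zip(self.weights, x))
        error = abs(observed - predicted)
        # Accumulate pre/post coactivation without touching the weights.
        self.trace = [self.decay * t + observed * xi
                      for t, xi in zip(self.trace, x)]
        # Adaptive threshold from recent error statistics (assumed form).
        triggered = False
        if len(self.errors) >= self.warmup:
            threshold = (statistics.fmean(self.errors)
                         + self.k * statistics.pstdev(self.errors)
                         + self.min_margin)
            triggered = error > threshold
        if triggered:
            # Retrospective update: fold the accumulated trace into the weights.
            self.weights = [w + self.lr * t
                            for w, t in zip(self.weights, self.trace)]
            self.trace = [0.0] * len(self.trace)
        self.errors.append(error)
        return triggered
```

With a steady input stream the error statistics stay flat and no update fires; a transient perturbation pushes the error above the threshold and produces a single stepwise weight change, mirroring the sparse, temporally localized events the abstract reports.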

2605.10985 2026-05-13 cs.LG cs.AI q-bio.BM

Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

Siddhant Dutta, Edward Tan Beng Wai, Soumick Sarker, Pasan Gunawardane, Jagath C. Rajapakse

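AI summary: This study proposes an interpretable approach to protein language model representations: differentiable graph partitioning maps ESM-2 representations onto protein contact graphs, and a SoftBlobGIN network learns functional substructures, improving both performance and interpretability on prediction tasks. Without retraining the language model and with only a small number of added parameters, the method performs strongly on tasks such as enzyme classification and function prediction and automatically identifies biologically meaningful functional regions such as active-site residues and catalytic contact patterns. Experiments show that the framework markedly improves the accuracy and auditability of structural explanations, giving protein language models structure-level transparency.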

Comments: 19 Pages, 8 figures, 11 Tables, Submitted to NeurIPS 2026

Abstract

Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret as structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. Across enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues, spatially localized functional clusters, and catalytic contact patterns. On binding-site detection, SoftBlobGIN improves residue AUROC from $0.885$ using an ESM-2 linear probe to $0.983$, indicating that these structural explanations are not recoverable from language-model features alone. Learned blob partitions provide an additional layer of interpretability by automatically grouping residues into functional substructures, with blobs containing annotated active-site residues showing $1.85\times$ higher importance than other blobs ($ρ=0.339$, $p=0.009$), without any active-site supervision. Our framework requires no retraining of the language model, adds only ~1.1M parameters, and generalises across ProteinShake tasks, achieving $F_{\max}$ of $0.733$ on Gene Ontology prediction and AUROC of $0.969$ on binding-site detection. We position this as an interpretable structural companion to protein language models that makes their predictions more transparent and auditable.

2605.10979 2026-05-13 q-bio.OT

Statin Recommendations among US Adults with the 2026 Dyslipidemia Guidelines

James A. Diao, Thomas A. Buckley, Andrew Z. Zhou, Smaraki Dash, Rishi K. Wadhera, Arjun K. Manrai

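AI summary: This study analyzes how the 2026 US dyslipidemia guideline affects statin recommendations among US adults. Compared with the 2018 guideline, the new guideline reduces statin recommendations by about 3.0 million people at the class 1 threshold, but increases them by about 20.8 million at the class 2 threshold, which adds 30-year risk assessment. The effect differs markedly across groups, with a large increase in recommendations for young and middle-aged adults, highlighting the key role of 30-year risk assessment in expanding eligibility.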

Abstract

Importance: The 2026 multisociety dyslipidemia guideline recommended the PREVENT equations in place of the PCE equations, introduced 30-year risk assessment as a new treatment pathway, and lowered risk-based treatment thresholds. The net population impact of these concurrent changes on statin recommendations is unknown. Objective: To estimate changes in statin recommendations under 2026 PREVENT-based dyslipidemia guidelines compared with 2018 PCE-based guidelines. Design and Participants: Cross-sectional analysis of pooled data from NHANES, spanning 2011-2023 and comprising 24,199 participants aged 30-79 years. Main Outcomes and Measures: Number and proportion of US adults receiving or recommended for statin therapy. Results: At the class 1 threshold, the number of US adults receiving or recommended for statin therapy decreased by an estimated 3.0 million (95% CI, 2.3 million to 3.6 million), with larger reductions among Black adults (-4.2 percentage points [pp]), men (-4.0pp), and adults aged 50-69 years (-5.6pp). At the class 2 threshold--which additionally recommends statins for adults aged 30-59 years based on 30-year risk--the number of adults recommended increased by an estimated 20.8 million (95% CI, 19.6 million to 22.0 million), or +11.6pp. The increase was largest among adults aged 50-59 years (+19.7pp) and 40-49 years (+14.8pp). Conclusions: The net population impact of the 2026 dyslipidemia guidelines depends critically on which recommendation class is applied. At the class 1 threshold, statin recommendations decreased modestly; at the class 2 threshold, inclusion of 30-year risk assessment substantially expanded recommendations, particularly among younger adults. These divergent effects underscore the importance of the 30-year risk criterion as a major driver of new eligibility and the need for outcomes and equity monitoring during guideline implementation.

2605.10840 2026-05-13 cs.LG cs.AI q-bio.QM

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Yixuan Yang, Mehak Arora, Ryan Zhang, Baraa Abed, Junseob Kim, Tilendra Choudhary, Md Hassanuzzaman, Kevin Zhu, Ayman Ali, Chengkun Yang, Alasdair Edward Gent, Victor Moas, Rishikesan Kamaleswaran

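AI summary: This paper presents Clin-JEPA, a multi-phase co-training framework for electronic health record (EHR) patient trajectories that uses joint-embedding predictive (JEPA) pretraining to unify patient trajectory forecasting and diverse downstream risk-prediction tasks in one model. A five-phase pretraining curriculum stably co-trains a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor, resolving the failure of conventional JEPA methods to co-train predictor and encoder effectively. Experiments show that Clin-JEPA clearly outperforms existing methods on the MIMIC-IV dataset across multiple risk-prediction tasks.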

Comments: 17 pages, 4 figures, 8 tables. Code: https://github.com/YeungYathin/Clin-JEPA

Abstract

We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data -- to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning -- remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum -- predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization -- addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).

2605.09964 2026-05-13 cs.AI q-bio.QM

Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach

Ziqi Gao, Chenyi Zi, Zijing Liu, Ziqiao Meng, Yu Li, Jia Li

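AI summary: Protein-protein interactions (PPIs) play a key role in cellular function and disease mechanisms. Current learning-based PPI prediction methods focus on learning protein representations but neglect the design of a dedicated classification head, usually relying on generic aggregation without biological grounding. This paper proposes L3-PPI, a model-agnostic PPI classifier based on the biological "L3 rule": through graph prompt learning with L3-path regularization, it recasts classification of protein embedding pairs as graph-level classification, improving predictive performance.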
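For readers unfamiliar with the L3 rule, the sketch below computes a degree-normalised count of length-3 paths between two proteins, a common way the rule is expressed in network-based PPI prediction. It does not reproduce L3-PPI's graph prompt learning or its regularizer; the function names and the toy edge list are hypothetical.

```python
import math
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets from a list of (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def l3_score(adj, x, y):
    """Sum over paths x-u-v-y of 1 / sqrt(deg(u) * deg(v)).

    Intended for candidate pairs (x, y) that are not already linked;
    the degree normalisation damps paths that run through hub proteins.
    """
    score = 0.0
    for u in adj.get(x, ()):
        for v in adj.get(y, ()):
            if v in adj.get(u, ()):
                score += 1.0 / math.sqrt(len(adj[u]) * len(adj[v]))
    return score

# Toy network: X and Y are joined by one path of length 3 (X-U-V-Y).
toy = build_graph([("X", "U"), ("U", "V"), ("V", "Y")])
print(l3_score(toy, "X", "Y"))
```

A pair that shares only a common neighbour (a length-2 path) scores zero here, which is the distinction between the L3 rule and common-neighbour heuristics.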

2602.17739 2026-05-13 q-bio.GN cs.AI cs.LG

GeneZip: Region-Aware Compression for Long Context DNA Modeling

Jianan Zhao, Xixian Liu, Zhihao Zhan, Xinyu Yuan, Hongyu Guo, Jian Tang

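AI summary: GeneZip is a region-aware compression framework for long-context DNA modeling that addresses shortcomings of existing methods in compression budget allocation and computational cost. It combines dynamic routing with a Region-Aware Ratio (RAR) objective and uses gene-structure annotations to guide compression training, so that at inference time it compresses raw DNA sequences efficiently without annotations. GeneZip performs well in compression quality, redundancy awareness, and training efficiency, markedly improving the performance and scalability of long-sequence DNA models.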

Comments: Preprint, work in progress

Abstract

Long-context DNA models are limited by token-mixing cost and by how compression allocates representational budget across the genome. Existing approaches operate close to base-pair resolution, apply fixed downsampling, or learn content-dependent chunks without an explicit genomic budget, making long-context pretraining expensive and difficult to control. We introduce GeneZip, a region-aware DNA compression framework that combines H-Net-style dynamic routing with a Region-Aware Ratio (RAR) objective and bounded routing. GeneZip uses static gene-structure annotations during compression training to specify region-wise base-pairs-per-token (BPT) targets; at inference time, it compresses raw unseen DNA without annotations. GeneZip provides three main benefits. First, it is effective: GeneZip variants achieve the best validation PPL among encoder-based compressors, with GeneZip-70M operating at 137.6 BPT, and across four reproducible DNALongBench tasks--contact map prediction, eQTL prediction, enhancer-target gene prediction, and transcription-initiation signal prediction--GeneZip obtains the best average rank among compared sequence models. Second, it is redundancy-aware: a post-hoc RepeatMasker/TRF analysis shows that, without repeat supervision, GeneZip assigns higher local BPT to TE-derived interspersed repeats and tandem repeats, two major classes of repetitive DNA sequence redundancy. Third, it is efficient: by reducing the effective token-mixing length, GeneZip enables longer-context and larger-capacity pretraining, including 128K-context and 636M-parameter variants on a single A100 80GB GPU, and fine-tunes the eQTL task 50.4x faster than JanusDNA (50 vs. 2520 minutes). These results establish GeneZip as an effective, redundancy-aware, and efficient compression interface for long-context DNA modeling.

2602.15451 2026-05-13 q-bio.QM cs.AI cs.LG quant-ph

Molecular Design beyond Training Data with Novel Extended Objective Functionals of Generative AI Models Driven by Quantum Annealing Computer

Hayato Kunugi, Mohsen Rahmani, Yosuke Iyama, Yutaro Hirono, Akira Suma, Matthew Woolway, Vladimir Vargas-Calderón, William Kim, Kevin Chern, Mohammad Amin, Masaru Tateno

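AI summary: This study proposes an optimization framework for deep generative models driven by a quantum annealing computer for small-molecule drug design, addressing the low frequency with which conventional generative models produce drug-like compounds. It introduces a neural hash function (NHF) that serves as both a regularization and a binarization scheme, used for signal conversion between classical and quantum neural networks and for constructing the error function. Experiments show that the quantum-annealing-based generative model outperforms conventional models in molecular validity and drug-likeness and, without additional constraints, exceeds the training data, indicating a potential advantage of quantum computing in drug design.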