arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.19396 2026-06-19 q-bio.QM New submission

BioHarness: Substrate-Aware Evidence Assembly for Biomedical Question Answering across Literature, Knowledge Bases, and Biological Atlases

Meng Xiao, Chuan Qin, Jinmiao Chen, Yihang Cheng, Yuanchun Zhou, Hengshu Zhu

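AI summary: Proposes BioHarness, which uses cascade control to selectively assemble evidence across literature retrieval, knowledge bases, and biological atlases for biomedical question answering; on 19,302 QA items it raises the pooled score from 65.9 to 71.0.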

Comments 14 Pages, 11 Figures, Keywords: biomedical question answering; retrieval-augmented generation; large language models; evidence assembly; biomedical knowledge bases; biological atlases

Abstract

Motivation: Biomedical question answering often requires evidence beyond topically retrieved literature, including gene alias resolution, database identifier normalization, and atlas-derived biological measurements. However, existing retrieval-augmented generation (RAG) systems typically follow a fixed workflow and lack an explicit mechanism for deciding when retrieved text is sufficient, when curated biomedical knowledge is required, or when executable evidence assembly over structured measurements should be invoked. This motivates a substrate-aware large language model (LLM) harness that selectively assembles sufficient evidence across literature, knowledge bases, and biological atlases. Results: We introduce BioHarness, an LLM harness for staged biomedical evidence assembly across literature retrieval, curated biomedical knowledge resources, and atlas-derived structured measurements. BioHarness first attempts to answer from reranked literature evidence and escalates through grounded cascade control to REPL-style evidence assembly only when the current evidence is uncertain, weakly grounded, or substrate-mismatched. Across 19,302 biomedical QA items spanning seven answer formats, BioHarness improves the pooled score from 65.9 to 71.0 over the strongest non-oracle baseline. Ablations, case studies, and backbone-scaling analyses show that these gains arise from repairing evidence-substrate mismatches through reranking, entity grounding, and structured measurement access, rather than from indiscriminately invoking more reasoning steps, retrieving additional literature, or relying on a particular answer-model scale.
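
Illustrative sketch (not from the paper): the abstract describes answering first from reranked literature and escalating only when the evidence is uncertain, weakly grounded, or substrate-mismatched. The Python below shows one way such a staged cascade could be wired. The `Attempt` fields, the 0.7 threshold, and both toy stages are hypothetical names and values chosen for illustration; none of them come from BioHarness itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    answer: str
    confidence: float   # hypothetical self-reported score in [0, 1]
    grounded: bool      # hypothetical: were entities and identifiers resolved?
    substrate_ok: bool  # hypothetical: does the evidence type fit the question?


Stage = Callable[[str], Attempt]


def needs_escalation(attempt: Attempt, min_confidence: float = 0.7) -> bool:
    """Escalate when evidence is uncertain, weakly grounded, or mismatched."""
    return (
        attempt.confidence < min_confidence
        or not attempt.grounded
        or not attempt.substrate_ok
    )


def cascade(
    question: str, stages: List[Stage], min_confidence: float = 0.7
) -> Tuple[Attempt, int]:
    """Run stages in order; stop at the first acceptable attempt.

    Returns the attempt and the index of the stage that produced it.
    If no stage is acceptable, the last stage's attempt is returned.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for index, stage in enumerate(stages):
        attempt = stage(question)
        if not needs_escalation(attempt, min_confidence):
            break
    return attempt, index


# Toy stages standing in for real retrieval and tool calls (assumptions).
def literature_stage(question: str) -> Attempt:
    # Pretend that questions about expression levels need measurements,
    # which retrieved text alone cannot supply.
    needs_measurement = "expression" in question.lower()
    return Attempt("answer from reranked abstracts", 0.9, True, not needs_measurement)


def assembly_stage(question: str) -> Attempt:
    return Attempt("answer from structured measurements", 0.8, True, True)
```

With these toy stages, a question that mentions expression levels falls through the literature stage and is answered by the second stage, while other questions stop at the first stage.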

2606.19405 2026-06-19 q-bio.QM math.DS q-bio.PE New submission

Multi-type branching inference on contact trees with application to COVID-19

Augustine Okolie, Johannes Müller, Eno Akarawakc, Isaac Ajiboye

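AI summary: Proposes a likelihood framework that operates directly on transmission trees over contact trees, uses a multi-type branching process to account for contact-degree heterogeneity, infers epidemiological parameters from partially resolved transmission trees, and is applied to COVID-19 contact-tracing data.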

Comments 26 pages, 8 Figures

Abstract

Inferring epidemiological parameters from transmission trees is essential for understanding infectious disease dynamics. Existing tree-based likelihood methods, including the multi-type birth-death models originally applied in phylodynamic settings, provide powerful tools, but most assume homogeneous mixing and rarely capture how transmission potential changes as an individual infects more of their contacts. In this work, we develop a likelihood framework that operates directly on transmission trees, in which nodes are individuals and edges are reported transmission events, with no sequence data involved. We derive a likelihood for a stochastic SIR process on a rooted contact tree in which each infected individual is characterised by the total number of effective contacts and the number of already infected downstream contacts. We obtain closed-form ordinary differential equations for the probability that a clade goes entirely unobserved and for the probability density that it produces an observed (sampled) tip in a given state. The resulting likelihood can be evaluated for a rooted contact tree with known tip states, and we extend it to partially resolved trees by treating internal branching times as latent variables. Validation on simulated outbreaks confirms accurate parameter recovery and well-calibrated uncertainty. Application to empirical COVID-19 contact-tracing data from Karnataka, India, demonstrates the framework's utility for real epidemiological settings. By incorporating contact-degree heterogeneity in a multi-type branching likelihood, our work provides a principled baseline for inferring both transmission dynamics and contact structure from fully or partially resolved transmission trees, complementing rather than relying on sequence-based phylodynamic inference.

2503.04507 2026-06-19 q-bio.QM cs.CG cs.LG New submission

The Morse Transform for Discrete Shape Analysis

Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda

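AI summary: Proposes a topological transform based on directional piecewise-linear Morse theory that quantifies the geometry of embedded objects by recording critical points across multiple height functions; the resulting feature vectors achieve the highest mean AUROC in ligand-based virtual screening.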

Comments 37 pages, 3 main figures, 2 main tables, 12 appendix figures and 4 appendix tables

Abstract

The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.
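
Illustrative sketch (not from the paper): the idea of cataloguing critical points across several height functions can be shown on the simplest possible case, a closed polygon in the plane. For a direction vector, each vertex gets a height (its dot product with the direction), and a vertex is a peak or a trough when it lies strictly above or below both of its neighbours. This is a simplifying assumption: a closed curve has no saddles, which only appear for higher-dimensional shapes such as surfaces, and the function names here are hypothetical. Directions are assumed generic, meaning no two adjacent vertices share a height.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def critical_points(
    polygon: Sequence[Point], direction: Point
) -> List[Tuple[float, str]]:
    """Return (height, type) for each peak or trough of a closed polygon.

    The height of a vertex is its dot product with `direction`.
    Vertices are taken in cyclic order, so the last one neighbours the first.
    """
    dx, dy = direction
    heights = [x * dx + y * dy for x, y in polygon]
    n = len(polygon)
    found = []
    for i, h in enumerate(heights):
        prev_h = heights[i - 1]          # index -1 wraps around to the last vertex
        next_h = heights[(i + 1) % n]
        if h > prev_h and h > next_h:
            found.append((h, "peak"))
        elif h < prev_h and h < next_h:
            found.append((h, "trough"))
    return sorted(found)


def morse_signature(
    polygon: Sequence[Point], directions: Sequence[Point]
) -> List[List[Tuple[float, str]]]:
    """Collect the critical points for every direction in turn."""
    return [critical_points(polygon, d) for d in directions]


# An M-shaped polygon: two upper corners separated by a notch.
m_shape = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (2.0, 1.0), (0.0, 3.0)]
signature = morse_signature(m_shape, [(0.1, 1.0), (1.0, 0.1)])
```

Looking almost straight up, the M shape shows two peaks and two troughs; looking almost sideways, the notch is no longer critical and only one peak and one trough remain. That direction dependence is what a multi-direction transform records.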

2606.20223 2026-06-19 cs.CV q-bio.QM Cross-list

DeepForestVisionV2: Ecology-Driven Taxonomy Expansion for Camera-Trap Monitoring in African Tropical Forests

Hugo Magaldi, Theau d'Audiffret, Etienne Francois Akomo-Okoue, Bala Amarasekaran, Naomi Anderson, Claire Auger, Noemie Cappelle, Daniel Cornelis, Raphael Cornette, Tobias Deschner, Gabriel Dubus, Davy Fonteyn, Rosa M. Garriga, Jennifer Hatlauf, Innocent Kasekendi, Raymond Katumba, Aram Kazandjian, Alfred Ngomanda, Stephan Ntie, Simone Pika, Xavier Rufray, Harold Rugonge, John Justice Tibesigwa, Peter van Lunteren, Hadrien Vanthomme, Joeri A. Zwerts, Sabrina Krief

Affiliations: UMR7206 Eco-Anthropologie, MNHN; One Forest Vision initiative; Sebitoli Chimpanzee Project; Centre National de la Recherche Scientifique et Technologique; Institut de Recherche en Ecologie Tropicale; Tacugama Chimpanzee Sanctuary; Biotope; CIRAD; Max Planck Institute for Evolutionary Anthropology; BOKU University; Agence Nationale des Parcs Nationaux du Gabon; Uganda Wildlife Authority; Addax Data Science; Utrecht University

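AI summary: Ecological gradients in African tropical-forest camera-trap monitoring (vertical stratification, scene openness, anthropogenic interfaces) make the original 35-class taxonomy too coarse; DeepForestVisionV2 expands it to 64 classes, improving field utility while keeping the same offline workflow.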

Comments Accepted at ICPR 2026 - Computer Vision for Biodiversity Monitoring and Conservation Workshop

Abstract

Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline workflow for both photographs and videos, and previous work showed that it outperformed other available baselines on a comparable benchmark. However, it was designed for closed-canopy, ground-level forest interiors and uses a 35-class prediction space that becomes too coarse when deployments encounter arboreal primates, birds, semi-aquatic taxa, or human-associated confounders such as livestock. We present DeepForestVisionV2, an ecology-driven expansion from 35 to 64 prediction classes (61 animal classes plus human, vehicle, and blank) designed to address three recurrent deployment gradients: vertical stratification, scene openness, and anthropogenic interfaces. DeepForestVisionV2 retains the same offline workflow and is trained on 1,535,010 photographs and 243,354 videos from multi-country African tropical-forest projects. Evaluation combines a cross-country cropped-photo validation set, used to assess robustness across sites and camera-trap settings, with three held-out Uganda video benchmarks spanning the targeted gradients. On the validation set, DeepForestVisionV2 reaches 0.86 accuracy, 0.82 macro-F1, and 0.81 balanced accuracy. On the deployment benchmarks, it preserves or improves baseline accuracy despite its harder classification task, while increasing the number of identified taxa from 22 to 29 in forest-interior videos and from 4 to 9 at riverbanks. In the park-edge use case, it raises accuracy from 0.62 to 0.86 and reduces false alarms from 11 to 0. These results show that DeepForestVisionV2 materially improves field utility while preserving robustness across sites, habitats, and camera-trap settings.
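
Illustrative sketch (not from the paper): the abstract reports accuracy, macro-F1, and balanced accuracy, which can diverge sharply when classes are imbalanced, as is typical for camera-trap data. The Python below computes all three from label lists using their standard definitions. The species labels and counts are made up for the example, and averaging over the classes present in the ground truth is an assumption; the paper's exact evaluation protocol is not given in the abstract.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, float]:
    """Accuracy, macro-F1, and balanced accuracy for single-label predictions.

    Macro averages run over the classes that occur in `y_true`.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true))
    recalls, f1_scores = [], []
    for label in labels:
        # Every label here has at least one true example, so this is safe.
        recalls.append(tp[label] / (tp[label] + fn[label]))
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_f1": sum(f1_scores) / len(labels),
        "balanced_accuracy": sum(recalls) / len(labels),
    }


# Hypothetical imbalanced example: 8 duiker clips, 2 chimpanzee clips.
truth = ["duiker"] * 8 + ["chimpanzee"] * 2
preds = ["duiker"] * 8 + ["duiker", "chimpanzee"]
scores = classification_metrics(truth, preds)
```

In this toy case accuracy is 0.9 while balanced accuracy is 0.75, because the single missed chimpanzee clip halves the recall of the rare class.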

2606.20164 2026-06-19 cs.CL cs.AI cs.LG q-bio.QM Cross-list

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

Aueaphum Aueawatthanaphisut

Affiliations: School of Information, Computer Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand

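AI summary: Proposes MedRLM, a recursive multimodal health intelligence framework that recursively inspects, decomposes, retrieves, verifies, and synthesizes patient information, coordinates multiple specialized agents, and introduces a Clinical Evidence Graph Memory to support long-context clinical reasoning and sensor-guided screening.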

Comments 9 pages, 3 figures, 3 tables, 1 Algorithm, 29 equations

Abstract

Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.

2512.02908 2026-06-19 q-bio.MN q-bio.QM q-bio.SC Updated version

Imperfect molecular detection can renormalize apparent kinetic rates in stochastic gene regulatory networks

Iryna Zabaikina, Ramon Grima

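AI summary: Studies how imperfect molecular detection affects the stochastic dynamics of gene regulatory networks and finds that, under certain conditions, capture effects renormalize the kinetic rates, providing a systematic basis for interpreting noisy single-cell measurements.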

Comments 28 pages, 6 figures. Changes include Table I, demonstrating accurate renormalization even for mean protein copy numbers of only a few tens of molecules, and Fig. 6, summarizing all models, reaction schemes, assumptions, rate rescalings, and validity regimes. The conclusion was expanded to discuss practical applications

Abstract

Imperfect molecular detection in single-cell experiments introduces technical noise that obscures the true stochastic dynamics of gene regulatory networks. While binomial models of molecular capture provide a principled description of imperfect detection, they have so far been analyzed only for simple gene-expression models that do not explicitly account for regulation. Here, we extend binomial models of capture to general gene regulatory networks to understand how imperfect capture reshapes the observed time-dependent statistics of molecular counts. Our results reveal when capture effects correspond to a renormalization of a subset of the kinetic rates and when they cannot be absorbed into effective rates, providing a systematic basis for interpreting noisy single-cell measurements. In particular, we show that rate renormalization depends on the level of regulatory detail in the model. For implicit regulatory models based on promoter state transitions, it arises whenever gene product synthesis does not trigger a promoter state change, as in the absence of promoter-proximal pausing or when pausing is short-lived. For models with explicit transcription factor binding, the same condition holds, together with sufficiently high transcription factor abundance, which in practice requires only a few tens of molecules per cell. In these cases, technical noise reduces the apparent mean burst size of synthesized gene products and accelerates the apparent rates of transcription factor binding reactions. This acceleration becomes stronger as the number of protein species and/or molecules involved in promoter switching increases. These effects hold for gene regulatory networks of arbitrary connectivity and remain valid under time-dependent kinetic rates.
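
Illustrative sketch (not the paper's derivation): under a binomial capture model, each molecule is detected independently with probability p. A standard identity for this kind of thinning is that the observed mean is p times the true mean and the observed Fano factor is 1 + p (F - 1), where F is the true Fano factor. For a simple bursty model whose Fano factor is 1 + b, that is the same as replacing the mean burst size b with p b, in line with the reduced apparent burst size mentioned in the abstract. The Python below states the identity and checks it by simulation; the function names and the uniform toy distribution of true counts are assumptions made for the example.

```python
import random
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def observed_moments(
    true_mean: float, true_fano: float, capture_prob: float
) -> Tuple[float, float]:
    """Mean and Fano factor of counts after binomial capture."""
    if not 0.0 <= capture_prob <= 1.0:
        raise ValueError("capture_prob must lie in [0, 1]")
    return capture_prob * true_mean, 1.0 + capture_prob * (true_fano - 1.0)


def simulate_capture(
    true_counts: Sequence[int], capture_prob: float, rng: random.Random
) -> List[int]:
    """Detect each molecule independently with probability capture_prob."""
    return [
        sum(rng.random() < capture_prob for _ in range(n)) for n in true_counts
    ]


def empirical_moments(counts: Sequence[int]) -> Tuple[float, float]:
    """Sample mean and Fano factor (variance divided by mean)."""
    mean = fmean(counts)
    return mean, pvariance(counts) / mean


def demo(seed: int = 0, cells: int = 20000, capture_prob: float = 0.3):
    """Compare the identity with a simulation on toy data."""
    rng = random.Random(seed)
    # Toy "true" counts: uniform on 0..40, so mean 20 and Fano factor 7.
    true_counts = [rng.randint(0, 40) for _ in range(cells)]
    observed = simulate_capture(true_counts, capture_prob, rng)
    return observed_moments(20.0, 7.0, capture_prob), empirical_moments(observed)
```

Calling `demo()` returns the predicted pair (mean 6, Fano factor 2.8) next to the simulated estimates, which should agree up to sampling error.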

1812.03321 2026-06-19 q-bio.QM Updated version

Isolating phyllotactic patterns embedded in the secondary growth of sweet cherry (Prunus avium L.) using magnetic resonance imaging

Mitchell Eithun, Daniel H. Chitwood, James Larson, Gregory Lang, Elizabeth Munch

Comments Code: https://github.com/eithun/cherry-phyllotaxy

Abstract

Epicormic branches arise from dormant buds patterned during the growth of previous years. Dormant epicormic buds remain on the surface of trees, pushed outward from the pith during secondary growth, but maintaining vascular connections. Epicormic buds can be reactivated, either through natural processes or intentionally, to rejuvenate orchards and control tree architecture. Because epicormic structures are embedded within secondary growth, tomographic approaches are a useful method to study them and understand their development. We apply techniques from image processing to determine the locations of epicormic vascular traces embedded within secondary growth of sweet cherry (Prunus avium L.), revealing the juvenile phyllotactic pattern in the trunk of an adult tree. Techniques include breadth-first search to find the pith of the tree, edge detection to approximate the radius, and a conversion to polar coordinates to threshold and segment phyllotactic features. Intensity values from Magnetic Resonance Imaging (MRI) of the trunk are projected onto the surface of a perfect cylinder to find the locations of traces in the "boundary image". Mathematical phyllotaxy provides a means to capture the patterns in the boundary image by modeling phyllotactic parameters. Our cherry tree specimen has the conspicuous parastichy pair $(2,3)$, phyllotactic fraction 2/5, and divergence angle of approximately 143 degrees. The methods described not only provide a framework to study phyllotaxy, but for image processing of volumetric image data in plants. Our results have practical implications for orchard rejuvenation and directed approaches to influence tree architecture. The study of epicormic structures, which are hidden within secondary growth, using tomographic methods also opens the possibility of studying the genetic and environmental basis of such structures.
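
Illustrative sketch (not from the paper's code): a phyllotactic fraction of 2/5 means successive buds are separated by two fifths of a full turn, that is 2/5 × 360 = 144 degrees, which is close to the measured divergence angle of about 143 degrees. The short Python example below does this arithmetic exactly and lists the angular positions of successive buds; the function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def divergence_angle_deg(phyllotactic_fraction: Fraction) -> Fraction:
    """Angle between successive buds, as a fraction of a 360-degree turn."""
    return phyllotactic_fraction * 360


def bud_angles_deg(phyllotactic_fraction: Fraction, count: int) -> List[Fraction]:
    """Angular position of the first `count` buds, reduced to [0, 360)."""
    step = divergence_angle_deg(phyllotactic_fraction)
    return [(k * step) % 360 for k in range(count)]


angle = divergence_angle_deg(Fraction(2, 5))   # 144 degrees
positions = bud_angles_deg(Fraction(2, 5), 6)  # 0, 144, 288, 72, 216, 0
```

The sixth bud lands back at 0 degrees after two full turns, which is what the 2/5 fraction encodes: five buds per two revolutions around the stem.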

1802.04677 2026-06-19 math.AT math.DS q-bio.QM Updated version

Evolutionary homology on coupled dynamical systems

Zixuan Cang, Elizabeth Munch, Guo-Wei Wei

Abstract

Time dependence is a universal phenomenon in nature, and a variety of mathematical models in terms of dynamical systems have been developed to understand the time-dependent behavior of real-world problems. Originally constructed to analyze the topological persistence over spatial scales, persistent homology has rarely been devised for time evolution. We propose the use of a new filtration function for persistent homology which takes as input the adjacent oscillator trajectories of a dynamical system. We also regulate the dynamical system by a weighted graph Laplacian matrix derived from the network of interest, which embeds the topological connectivity of the network into the dynamical system. The resulting topological signatures, which we call evolutionary homology (EH) barcodes, reveal the topology-function relationship of the network and thus give rise to the quantitative analysis of nodal properties. The proposed EH is applied to protein residue networks for protein thermal fluctuation analysis, rendering the most accurate B-factor prediction of a set of 364 proteins. This work extends the utility of dynamical systems to the quantitative modeling and analysis of realistic physical systems.
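
Illustrative sketch (not the paper's model): the abstract says the dynamical system is regulated by a weighted graph Laplacian derived from the network. The Python below builds the Laplacian L = D - W from a symmetric weight matrix and takes one explicit Euler step of the linear diffusive coupling dx/dt = -ε L x. Scalar node states, linear coupling, and the names `graph_laplacian` and `euler_step` are all simplifying assumptions; the abstract does not specify the oscillator equations.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def graph_laplacian(weights: Sequence[Sequence[float]]) -> Matrix:
    """Weighted graph Laplacian L = D - W (self-loops are ignored)."""
    n = len(weights)
    lap = [[-float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry is the weighted degree of node i.
        lap[i][i] = float(sum(weights[i][j] for j in range(n) if j != i))
    return lap


def euler_step(
    state: Sequence[float], lap: Matrix, coupling: float, dt: float
) -> List[float]:
    """One explicit Euler step of dx/dt = -coupling * L x."""
    n = len(state)
    return [
        state[i] - dt * coupling * sum(lap[i][j] * state[j] for j in range(n))
        for i in range(n)
    ]


# Hypothetical three-node path network with edge weights 1 and 2.
weights = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
lap = graph_laplacian(weights)
next_state = euler_step([1.0, 0.0, 0.0], lap, coupling=1.0, dt=0.1)
```

Each row of the Laplacian sums to zero, so for symmetric weights the coupling moves state between connected nodes without changing the total. Here the first node passes 0.1 of its state to its only neighbour.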