The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure
Vasudha Sharma, Chakresh Kumar Singh, Jayesh Choudhari, Dharmit Nakrani
Comments: Ongoing work
AI is transforming life sciences research at unprecedented speed, accelerating discovery in protein structure prediction, genome modeling, and drug development (Jumper et al., 2021; Mak et al., 2024). Yet this rapid advancement, coupled with the open science movement, raises significant dual-use research concerns that have received limited empirical scrutiny. Here we present the first systematic analysis of dual-use research of concern (DURC) content on open preprint servers. We screened ~52,000 bioRxiv preprints (2024-2025) using a hybrid pipeline of lexical filtering and large language model (LLM) evaluation, scoring their metadata across nine DURC categories, three categories of pathogens with enhanced pandemic potential (PEPP), and five governance categories aligned with U.S. and Australia Group oversight frameworks. Our analysis reveals that dual-use-adjacent knowledge is routinely present in openly accessible titles and abstracts, often exceeding established risk thresholds even in studies with legitimate public health objectives. While this mapping captures surface-level information diffusion, it does not measure operational capability, downstream misuse potential, or the substantial technical and biosafety barriers that constrain harmful application. We argue that institutional review processes, funding requirements, and preprint platform policies must evolve to incorporate proactive, metadata-level monitoring without compromising scientific transparency. Ultimately, harmonizing controlled-access mechanisms for high-risk methodologies with open summaries of scientific contributions offers a pragmatic framework for governing AI-accelerated biology at scale.
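The screening design described above pairs a cheap lexical pass over titles and abstracts with a costlier LLM judgment across categories. The following is a minimal sketch of that two-stage structure only, under stated assumptions: the seed terms, the category labels (only their counts, 9 DURC, 3 PEPP and 5 governance, come from the abstract), the 0.5 threshold, and `toy_scorer` (a stand-in for an actual LLM call) are all hypothetical and are not taken from the authors' pipeline.

```python
import re
from collections.abc import Callable, Iterable
from dataclasses import dataclass

# Hypothetical category labels; only the counts (9 DURC, 3 PEPP, 5 governance)
# follow the abstract.
CATEGORIES = (
    [f"DURC-{i}" for i in range(1, 10)]
    + [f"PEPP-{i}" for i in range(1, 4)]
    + [f"GOV-{i}" for i in range(1, 6)]
)

# Hypothetical seed terms for the lexical stage; not the authors' lexicon.
SEED_TERMS = ("gain-of-function", "transmissibility", "immune evasion", "host range")


@dataclass(frozen=True)
class Preprint:
    doi: str
    title: str
    abstract: str


# A scorer maps (preprint, categories) -> {category: score in [0, 1]}.
Scorer = Callable[[Preprint, list[str]], dict[str, float]]


def lexical_prefilter(preprints: Iterable[Preprint], terms=SEED_TERMS) -> list[Preprint]:
    """Keep only preprints whose title or abstract mentions any seed term."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [p for p in preprints if pattern.search(f"{p.title} {p.abstract}")]


def screen(
    preprints: Iterable[Preprint],
    scorer: Scorer,
    threshold: float = 0.5,  # hypothetical cut-off, not a value from the paper
) -> dict[str, list[str]]:
    """Lexical filter first, then per-category scoring of the remaining metadata.

    Returns {doi: [categories scored at or above threshold]} for flagged preprints.
    """
    flagged = {}
    for p in lexical_prefilter(preprints):
        scores = scorer(p, CATEGORIES)
        hits = [c for c in CATEGORIES if scores.get(c, 0.0) >= threshold]
        if hits:
            flagged[p.doi] = hits
    return flagged


def toy_scorer(p: Preprint, categories: list[str]) -> dict[str, float]:
    # Hypothetical stand-in for an LLM call: scores only "DURC-1" above zero.
    return {c: (0.8 if c == "DURC-1" else 0.0) for c in categories}


# Illustrative toy corpus (not real preprints).
corpus = [
    Preprint("doi-a", "Host range of a plant virus", "Example abstract text."),
    Preprint("doi-b", "Protein folding benchmark", "Example abstract text."),
]
print(screen(corpus, toy_scorer))  # {'doi-a': ['DURC-1']}
```

Running the lexical stage first means the expensive scorer is only invoked on candidates that mention at least one seed term; the trade-off is that abstracts describing concerning methods without those exact words are never scored.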