arXivDaily arXiv每日学术速递 周一至周五更新

AI 大模型

代码大模型 / AI 编程

代码生成、软件工程智能体、程序修复、测试生成和开发者工具。

今日/当前日期收录 1 信号源:cs.SE, cs.CL, cs.AI, cs.LG, cs.PL
2606.18619 2026-06-18 cs.CR cs.AI cs.SE 新提交 85%

Code-Augur: Agentic Vulnerability Detection via Specification Inference

Code-Augur:通过规约推断的智能体漏洞检测

Zhengxiong Luo, Mehtab Zafar, Dylan Wolff, Abhik Roychoudhury

发表机构 * National University of Singapore(新加坡国立大学)

专题命中 程序修复 :智能体漏洞检测,通过规约推断发现漏洞

AI总结 提出安全规约优先范式,通过显式化智能体假设并运行时反证,结合引导式模糊测试提升漏洞检测能力,在真实项目中比现有智能体检测更多漏洞。

详情
AI中文摘要

智能体漏洞检测的出现已成为软件安全的分水岭。完全由自主LLM智能体进行的审计正在发现数字社会基础软件中的关键漏洞。许多漏洞多年来一直隐藏,直到现在才被AI智能体发现。然而,这些发现背后的推理仍然令人担忧地不透明且未经验证。当智能体认为某个函数安全时,它对函数输入做了哪些假设?推理失败和错误假设可能导致遗漏漏洞,并降低对智能体分析的信任。我们提出了一种安全规约优先范式,该范式(1)将智能体的隐性假设明确暴露为安全规约,并(2)通过运行时反证持续细化这些规约。我们在Code-Augur中实现了我们的方法,这是一种用于智能体漏洞检测的新型框架。给定一个代码库,Code-Augur分析系统的每个组件以查找漏洞代码。当它认为某个组件安全时,它会将该判断背后的局部不变量作为源代码中的断言提交。同时,Code-Augur利用引导式模糊测试器尝试反证这些假设。当模糊测试器触发断言时,要么揭示一个真实漏洞,要么揭示一个需要细化的有缺陷规约。在这两种情况下,这一过程都夯实了智能体的理解,使其对代码意图的看法与代码实际行为保持一致。在真实世界的主题上,Code-Augur有效利用安全规约检测到比其他最先进智能体更多的漏洞。此外,Code-Augur在关键开源项目中发现了22个新漏洞。与精心策划的专用模型(如Claude Mythos)相比,Code-Augur提供了基于广泛可用的LLM(如Sonnet和DeepSeek)构建的有效智能体漏洞检测。

英文摘要

The advent of agentic vulnerability detection is already becoming a watershed moment for software security. Audits conducted entirely by autonomous LLM agents are uncovering critical vulnerabilities in fundamental software underpinning digital society. Many of these vulnerabilities remained masked for years, surfacing only now with AI agents. Yet the reasoning behind these discoveries remains alarmingly opaque and unvalidated. What assumptions did the agent make about a function's inputs when it deemed that function to be secure? Failures in reasoning and incorrect assumptions can lead to missed vulnerabilities and reduce trust in agentic analysis. We propose a security-specification-first paradigm that (1) exposes the agent's tacit assumptions explicitly as security specifications and (2) continuously refines those specifications via runtime falsification. We realize our approach in Code-Augur, a novel harness for agentic vulnerability detection. Given a codebase, Code-Augur analyzes each component of the system for vulnerable code. When it deems a component to be secure, it commits the local invariants behind that judgment as in-source assertions. In parallel, Code-Augur leverages a guided fuzzer to attempt to falsify those assumptions. When the fuzzer triggers an assertion, this either reveals a genuine vulnerability or a flawed specification to refine. In both cases, this process grounds the agent's understanding, aligning its view of code intent with how the code actually behaves. On real-world subjects, Code-Augur effectively leverages security specifications to detect more vulnerabilities than other state-of-the-art agents. Additionally, Code-Augur found 22 new vulnerabilities in key open-source projects. Compared to curated specialized models like Claude Mythos, Code-Augur offers effective agentic vulnerability detection built on widely available LLMs like Sonnet and DeepSeek.