Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs
Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic
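AI summary: Using token-level mutation experiments, this paper finds that minimal prompt perturbations (such as a single-character change) can flip LLM-generated code from secure to vulnerable, and uses hidden-state analysis to show that input-handling vulnerabilities are more predictable than secure-defaults vulnerabilities.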
Details
LLM-based coding assistants are seeing rapid adoption, offering substantial gains in developer productivity. As organizations increasingly ship code these agents produce, the security of that code becomes critical. Prior work has shown that minor prompt perturbations degrade the functional correctness of LLM-generated code, but whether they also compromise code security has remained unstudied. We apply token-level mutations to prompts across three models and five programming languages, and show that mutations as small as a single-character change can flip generated code from secure to vulnerable. Probing the models' hidden states reveals that this fragility is partially encoded in prompt representations, but unevenly so. Input-handling vulnerabilities, where the model omits validation or sanitization, are more predictable (mean AUC 0.753) than secure-defaults vulnerabilities, where insecure code stems from one local choice such as a weak algorithm or unsafe parameter (mean AUC 0.674). These results show that the threat model for LLM-assisted coding extends beyond prompt injection to ordinary prompt variation, and indicate that input-handling flaws can be caught before generation while secure-defaults flaws require intervention during decoding.
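To make "a single-character change" concrete, here is a minimal sketch of how such prompt variants could be generated. It is an illustration only: the abstract does not describe the paper's mutation operators, so the function name `single_char_mutations`, the three edit operations (substitute, delete, insert), and the character alphabet are all assumptions, and the edits here are made at the character level rather than on model tokens.

```python
import random
import string


def single_char_mutations(prompt: str, n: int = 5, seed: int = 0) -> list[str]:
    """Return up to n distinct variants of prompt, each one character edit away.

    Hypothetical helper: the edit operations and alphabet are assumptions,
    not the mutation operators used in the paper.
    """
    if not prompt:
        return []
    rng = random.Random(seed)  # seeded so the variants are reproducible
    alphabet = string.ascii_letters + string.digits + " "
    variants: set[str] = set()
    attempts = 0
    # Cap the number of attempts so very short prompts cannot loop forever.
    while len(variants) < n and attempts < 100 * n:
        attempts += 1
        op = rng.choice(("substitute", "delete", "insert"))
        i = rng.randrange(len(prompt))
        if op == "substitute":
            mutated = prompt[:i] + rng.choice(alphabet) + prompt[i + 1:]
        elif op == "delete":
            mutated = prompt[:i] + prompt[i + 1:]
        else:  # insert a character before position i
            mutated = prompt[:i] + rng.choice(alphabet) + prompt[i:]
        # A substitution can pick the same character; skip such no-op edits.
        if mutated != prompt:
            variants.add(mutated)
    return sorted(variants)


if __name__ == "__main__":
    base = "Write a function that hashes a password."
    for variant in single_char_mutations(base, n=3):
        print(repr(variant))
```

In an experiment of the kind the abstract describes, each variant would be sent to the model and the generated code checked for vulnerabilities, then compared with the output for the unmodified prompt. Those steps depend on the models and analysis tooling and are not shown here.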