Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code
Affiliations Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology; Hajee Mohammad Danesh Science and Technology University
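AI Summary This paper investigates the feasibility of using small language models (SLMs) for accurate, privacy-preserving detection of CWE vulnerabilities in Python code. The authors build a 500-sample dataset with a semi-supervised approach and apply instruction-following fine-tuning to a 350-million-parameter pre-trained code model, reaching nearly 99% accuracy and recall on the test set. The experiments indicate that a fine-tuned SLM can detect CWE vulnerabilities efficiently and precisely in a local environment, offering a privacy-friendly option for security analysis.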
Comments 11 pages, 2 figures, 3 tables. Dataset available at https://huggingface.co/datasets/floxihunter/synthetic_python_cwe. Model available at https://huggingface.co/floxihunter/codegen-mono-CWEdetect. Keywords: Small Language Models (SLMs), Vulnerability Detection, CWE, Fine-tuning, Python Security, Privacy-Preserving Code Analysis