Dynamic Malicious Skills in Agentic AI
Tianhao Chen, Zhengyuan Jiang, Yuepeng Hu, Yebei Gou, Neil Zhenqiang Gong
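AI summary: Studies an attack that creates dynamic malicious skills in agentic AI by injecting malicious instructions into natural-language documentation, and proposes a system-level defense based on operating system kernel-enforced read-only mounts.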
Skills are a key enabling component of agentic AI. While they enhance agents' capabilities, they also introduce new attack surfaces. In this work, we investigate one such attack surface by demonstrating dynamic malicious skills. By embedding malicious instructions in natural-language documentation (e.g., SKILL.md), an attacker can induce an agent to dynamically inject malicious logic into an otherwise benign skill during execution. We evaluate this attack across agentic frameworks such as OpenHands and Claude Code, showing that dynamic malicious skills can introduce a range of malicious behaviors at runtime with non-trivial success rates. To mitigate this vulnerability, we propose a system-level defense that prevents dynamic modification of skills using operating system kernel-enforced read-only mounts. Our evaluation demonstrates that this defense effectively blocks dynamic malicious skills while preserving the functionality of benign skills.
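The abstract does not say how the read-only mounts are set up, so the following is only a minimal sketch of one plausible arrangement on Linux, not the authors' implementation. It assumes the skills live under a hypothetical directory `/opt/agent/skills`, uses the standard `mount --bind` followed by a read-only bind remount, and checks the result by parsing text in the `/proc/mounts` format. The helper names `readonly_bind_commands` and `is_readonly` are illustrative.

```python
from pathlib import PurePosixPath


def readonly_bind_commands(skill_dir: str) -> list[list[str]]:
    """Return mount(8) invocations that bind a directory onto itself and then
    remount that bind read-only. Actually running them requires root."""
    return [
        ["mount", "--bind", skill_dir, skill_dir],
        ["mount", "-o", "remount,bind,ro", skill_dir],
    ]


def is_readonly(path: str, mounts_text: str) -> bool:
    """Given /proc/mounts-style text, report whether the mount covering
    `path` carries the `ro` option.

    Simplification (assumption): mount points with escaped characters
    such as \\040 for spaces are not decoded.
    """
    target = PurePosixPath(path)
    best_depth, best_ro = -1, False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        mount_point = PurePosixPath(fields[1])
        if target == mount_point or mount_point in target.parents:
            depth = len(mount_point.parts)
            # Deepest mount point wins; on ties, the later entry wins,
            # since later mounts shadow earlier ones.
            if depth >= best_depth:
                best_depth = depth
                best_ro = "ro" in fields[3].split(",")
    return best_ro


if __name__ == "__main__":
    for cmd in readonly_bind_commands("/opt/agent/skills"):
        print(" ".join(cmd))
    with open("/proc/mounts") as fh:  # Linux only
        print(is_readonly("/opt/agent/skills", fh.read()))
```

Because the kernel enforces the read-only flag, a write to any file under the mount fails regardless of what the agent was instructed to do, while reads of SKILL.md and execution of existing skill scripts are unaffected. That matches the property the abstract claims, under the assumptions stated here.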