PCDM: A Diffusion-Based Data Poisoning Attack Against Federated Learning Systems
Wei Sun, Yijun Chen, Bo Gao, Ke Xiong, Yuwei Wang, Pingyi Fan, Khaled Ben Letaief
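AI summary: This paper proposes a diffusion-based data poisoning framework that uses a Poisoning-Oriented Conditional Diffusion Model to achieve fine-grained control over local data poisoning in federated learning systems, while ensuring both attack effectiveness and stealthiness.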
Federated learning (FL) is vulnerable to data poisoning attacks due to its distributed nature. Although recent GAN-based data poisoning methods have shown the potential of using generative AI to produce seemingly legitimate poisoned data, the inherent consistency of GAN outputs can still reveal signs of data poisoning. In this paper, we propose a diffusion-based data poisoning framework against FL systems, which leverages a Poisoning-Oriented Conditional Diffusion Model (PCDM) to enable fine-grained control over the local generation of poisoned data while ensuring both attack effectiveness and stealthiness. PCDM incorporates an adjustable poisoning vector within the global context to precisely control the generation of poisoned data, with theoretical guarantees on attack performance. Furthermore, it employs a novel jumping diffusion strategy for lightweight and efficient poisoned data generation. We conduct the most systematic and comprehensive experimental evaluation of FL poisoning attacks against various defenses, including advanced Byzantine-robust aggregation mechanisms, on four open datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) and a real-world wireless-specific dataset, VRAI. Our results demonstrate that PCDM is less likely than state-of-the-art methods to exhibit statistical anomalies while degrading global FL performance more effectively, which poses a significant risk to data security in FL.
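The abstract does not specify PCDM's architecture, so the following is only a minimal sketch, under stated assumptions, of the two mechanisms it names. It assumes (hypothetically) that the adjustable poisoning vector is added to a global context vector scaled by a scalar `strength`. In place of the paper's jumping diffusion strategy, which may work differently, it uses DDIM-style deterministic sampling over a strided subset of timesteps. The noise predictor `toy_denoiser` is a hypothetical oracle that treats the conditioned context as the clean sample. All function names, the linear noise schedule, and the parameters `T` and `jump` are illustrative choices, not taken from the paper.

```python
import math
import random

# Hypothetical sketch: NOT PCDM's actual architecture or schedule.


def alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule.

    Index 0 is set to 1.0 (no noise), indices 1..T follow the schedule.
    """
    abar = [1.0]
    prod = 1.0
    for t in range(1, T + 1):
        beta = beta_start + (beta_end - beta_start) * (t - 1) / (T - 1)
        prod *= 1.0 - beta
        abar.append(prod)
    return abar


def conditioned_context(global_ctx, poison_vec, strength):
    """Assumption: shift the global context along a poisoning direction."""
    return [g + strength * p for g, p in zip(global_ctx, poison_vec)]


def toy_denoiser(x_t, t, ctx, abar):
    """Hypothetical oracle noise predictor.

    Pretends the clean sample equals ctx and returns the noise implied by
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    """
    a = abar[t]
    return [(x - math.sqrt(a) * c) / math.sqrt(1.0 - a) for x, c in zip(x_t, ctx)]


def jumped_timesteps(T, jump):
    """Strided timesteps T, T - jump, ..., stopping above 0 (DDIM-style skipping)."""
    return list(range(T, 0, -jump))


def sample(global_ctx, poison_vec, strength, T=1000, jump=50, seed=0):
    """Run deterministic DDIM sampling (eta = 0) over the jumped timesteps.

    Returns the final sample and the number of denoising steps taken.
    """
    rng = random.Random(seed)
    abar = alpha_bars(T)
    ctx = conditioned_context(global_ctx, poison_vec, strength)
    x = [rng.gauss(0.0, 1.0) for _ in ctx]  # start from pure Gaussian noise
    steps = jumped_timesteps(T, jump)
    for i, t in enumerate(steps):
        t_prev = steps[i + 1] if i + 1 < len(steps) else 0
        eps = toy_denoiser(x, t, ctx, abar)
        a_t, a_prev = abar[t], abar[t_prev]
        # Predict the clean sample, then re-noise it to the previous jumped timestep.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_hat, eps)]
    return x, len(steps)
```

Because the toy predictor always recovers the context as the clean sample, `sample([0.5, -1.0, 2.0], [1.0, 1.0, -1.0], strength=0.3)` returns values close to `[0.8, -0.7, 1.7]` after 20 jumped steps instead of 1000. This makes the roles of the two knobs easy to see: `strength` moves the generated sample along the poisoning direction, and `jump` trades sampling steps for speed. With a trained network as the predictor, the output would not match the context exactly.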