Adversarial Trust Poisoning in Vehicular Collaborative Perception
Yutong Liu, Chenyi Wang, Ming F. Li, Qingzhao Zhang
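AI summary: This study proposes the TrustFlip attack, which exploits consistency-based defense mechanisms to poison the trust scores of benign vehicles, degrading the system's perception capability and potentially causing safety failures. It also proposes TrustReflect as a mitigation.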
Details
Collaborative perception (CP) enables connected and autonomous vehicles to share sensor data and jointly reason about their environment. To defend against adversaries that fabricate or manipulate shared data, existing systems employ cross-vehicle inconsistency detection and trust estimation, penalizing vehicles whose observations conflict with the majority. In this work, we show that these defenses themselves introduce a new attack surface. We present TrustFlip, a novel attack that weaponizes consistency-based defenses to poison the trust assigned to benign vehicles. Instead of injecting false data into the collaboration pipeline, it deploys physical adversarial objects that are genuine but induce inconsistent observations among benign vehicles. The resulting inconsistencies are misattributed by the defense to the targeted vehicle, causing its trust score to degrade and eventually leading to its downweighting or exclusion from collaboration. Consequently, the system loses reliable sensing contributors, degrading perception capability and potentially inducing safety-critical failures. We evaluate TrustFlip across multiple collaborative perception architectures and defense mechanisms. Our results show that state-of-the-art defenses can be significantly affected: the attack removes the targeted benign vehicle from collaboration in up to 87.7% of scenarios and drops Average Precision (AP) by up to 13%. As an initial mitigation, we introduce TrustReflect, a lightweight self-reflection mechanism that marks disputed regions as uncertain and excludes them from trust evaluation, reducing the attack success rate by 35-100%.
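The trust dynamics described above can be illustrated with a toy model. The sketch below is not the authors' implementation of TrustFlip or TrustReflect. It assumes a simplified setting in which each vehicle reports a binary occupancy value per region, trust is lowered by a fixed penalty whenever a vehicle disagrees with the majority, and a vehicle is excluded once its trust falls below a threshold. The names `update_trust`, `simulate`, `PENALTY`, and `EXCLUDE_BELOW`, and all numeric values, are hypothetical. The `skip_disputed` flag loosely mimics the idea of marking disputed regions as uncertain and leaving them out of trust evaluation.

```python
from collections import Counter

PENALTY = 0.25        # assumed trust loss per disagreement with the majority
EXCLUDE_BELOW = 0.5   # assumed threshold under which a vehicle is excluded


def update_trust(trust, reports, skip_disputed=False):
    """Return new trust scores after one round of majority-vote checking.

    reports maps vehicle id -> {region id: bool}. A region missing from a
    vehicle's dict means that vehicle did not observe the region.
    """
    new_trust = dict(trust)
    regions = sorted({r for obs in reports.values() for r in obs})
    for region in regions:
        votes = {v: obs[region] for v, obs in reports.items() if region in obs}
        if len(votes) < 2:
            continue  # a single observer has nothing to be compared against
        ranked = Counter(votes.values()).most_common()
        disputed = len(ranked) > 1
        # Skip tied votes always; skip any disputed region if requested.
        if disputed and (skip_disputed or ranked[0][1] == ranked[1][1]):
            continue
        majority = ranked[0][0]
        for vehicle, value in votes.items():
            if value != majority:
                new_trust[vehicle] = max(0.0, new_trust[vehicle] - PENALTY)
    return new_trust


def simulate(rounds, skip_disputed):
    """Replay the same scene for several rounds and return final trust."""
    trust = {"A": 1.0, "B": 1.0, "C": 1.0}
    # All vehicles are benign. A physical object in region r1 is detected
    # only from C's viewpoint, so C honestly disagrees with A and B.
    reports = {
        "A": {"r0": True, "r1": False},
        "B": {"r0": True, "r1": False},
        "C": {"r0": True, "r1": True},
    }
    for _ in range(rounds):
        trust = update_trust(trust, reports, skip_disputed=skip_disputed)
    return trust


baseline = simulate(rounds=3, skip_disputed=False)
mitigated = simulate(rounds=3, skip_disputed=True)
print("baseline:", baseline, "C excluded:", baseline["C"] < EXCLUDE_BELOW)
print("mitigated:", mitigated, "C excluded:", mitigated["C"] < EXCLUDE_BELOW)
```

In this toy setting, the benign vehicle C loses trust every round under plain majority voting even though none of the shared data is falsified, while skipping disputed regions leaves its trust unchanged. The sketch ignores the cost of that choice: a region that is skipped also cannot expose a vehicle that is genuinely misreporting there.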