arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.20553 2026-06-19 cs.CR New submission

From Efficiency to Leakage -- Privacy Backdoor in Federated Language Model Fine-Tuning

Shanghao Shi, Chaoyu Zhang, Heng Jin, Yang Xiao, Yevgeniy Vorobeychik, William Yeoh, Ning Zhang, Y. Thomas Hou, Wenjing Lou

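AI summary: Proposes the NeuroImprint attack, in which a malicious parameter server plants a privacy backdoor during parameter-efficient fine-tuning; by assigning each sample its own neuron and limiting each neuron to a single update, the attack reconstructs training text with high fidelity.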

English abstract

Federated learning (FL) enables multiple parties to collaboratively fine-tune language models for domain-specific tasks without sharing raw data. Since full model fine-tuning is often prohibitively expensive for FL clients, parameter-efficient fine-tuning (PEFT) has become the de facto approach in practice, freezing the base model and training only a small set of adapters. In this paper, we show that a malicious parameter server can stealthily corrupt a PEFT adapter into a privacy backdoor that implicitly memorizes the client's training samples as isolated per-sample parameter updates stored in separate neurons, without degrading model utility. Concretely, our attack, NeuroImprint, assigns a dedicated memorization neuron to each training sample and constrains each neuron to be updated at most once along the local fine-tuning trajectory. This design mitigates both cross-sample collisions and cross-step mixing introduced by large local batches and stateful optimizers (e.g., Adam/AdamW) in language-model fine-tuning. After fine-tuning, the resulting isolated per-sample updates can be analytically inverted in closed form to recover text embeddings, which are then deterministically mapped back to token sequences. To understand the generality of our method, we implemented NeuroImprint on multiple language models (BERT, GPT-2, Qwen2, and Llama3.2) and evaluated it across four fine-tuning datasets spanning diverse domains. The results demonstrate that our attack can reconstruct 59% to 79% of all fine-tuning samples with high semantic fidelity.
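
Editorial sketch: the "closed-form inversion" of an isolated per-sample update can be illustrated on a single linear neuron. This is the textbook bias-ratio identity, not the NeuroImprint implementation. For a neuron z = w·x + b touched by exactly one sample, the weight gradient is (dL/dz)·x and the bias gradient is dL/dz, so dividing one by the other returns x. The function names and toy values are hypothetical. Plain gradients are assumed, so the effect of Adam/AdamW that the paper addresses is not modeled here.

```python
def neuron_gradients(x, upstream):
    """Gradients for one linear neuron z = w.x + b and one sample.

    `upstream` is dL/dz. The weight gradient is upstream * x and the
    bias gradient is upstream itself.
    """
    grad_w = [upstream * xi for xi in x]
    grad_b = upstream
    return grad_w, grad_b


def invert_single_sample(grad_w, grad_b, eps=1e-12):
    """Recover the input embedding from an isolated per-sample update."""
    if abs(grad_b) < eps:
        raise ValueError("bias gradient is ~0; nothing to invert")
    return [gw / grad_b for gw in grad_w]


# Toy embedding (assumed values, for illustration only).
embedding = [0.25, -1.5, 3.0, 0.0]
grad_w, grad_b = neuron_gradients(embedding, upstream=-0.4)
recovered = invert_single_sample(grad_w, grad_b)
```

The identity only holds when a single sample contributed to the neuron, which is why the abstract stresses one neuron per sample and at most one update per neuron.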

2606.20520 2026-06-19 cs.CR cs.AI cs.DC cs.LG New submission

Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes

Jun He, Deying Yu

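AI summary: To address the lack of mandatory authority verification when autonomous agents execute changes in production, proposes the Sovereign Execution Broker (SEB), which enforces authority at runtime through certificate verification, state checks, and scoped identities, and evaluates its security and performance on AWS and Kubernetes.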

Comments 19 pages, 6 figures, 10 tables

English abstract

Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existing access-control mechanisms authorize identities, while assurance layers certify proposed actions; neither alone provides a mandatory enforcement point for certified authority at the moment of mutation. This paper introduces the Sovereign Execution Broker (SEB), a runtime enforcement boundary for certificate-bound agentic infrastructure. SEB consumes certificates issued by the Sovereign Assurance Boundary (SAB), verifies that the requested mutation matches the certified execution contract, checks validity windows, policy epochs, revocation epochs, and live-state drift, mints a scoped execution identity, invokes infrastructure APIs, and records signed decision and outcome records. By separating proposal, admission, and execution, SEB turns certified authority into a short-lived, revocable, auditable runtime capability, provided that production mutation APIs reject non-broker identities. We present the SEB execution model, certificate and replay-verification predicates, scoped identity semantics, bypass-prevention deployment patterns, failure behavior, and a concrete prototype implementation. We evaluate the prototype on AWS and Kubernetes clusters, measuring latency overheads, revocation propagation, drift detection, and security under fault injection.
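
Editorial sketch: the admission checks listed in the abstract (contract match, validity window, policy epoch, revocation epoch, live-state drift) can be pictured as one predicate evaluated before any infrastructure call. Everything below is a hypothetical illustration. The `Certificate` fields, the SHA-256 encoding of contract and state, and the epoch comparison rules are assumptions, not the SEB certificate format, and signature verification is omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    contract_hash: str     # digest of the certified mutation (assumed encoding)
    not_before: float      # validity window start, seconds
    not_after: float       # validity window end, seconds
    policy_epoch: int      # policy version the certificate was issued under
    revocation_epoch: int  # revocation counter at issue time
    state_digest: str      # digest of the state the certificate was issued against


def digest(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def admit(cert, mutation, live_state, now, policy_epoch, revocation_epoch):
    """Return (admitted, reason); the first failing check decides."""
    checks = [
        ("contract mismatch", digest(mutation) == cert.contract_hash),
        ("outside validity window", cert.not_before <= now <= cert.not_after),
        ("stale policy epoch", cert.policy_epoch == policy_epoch),
        ("revoked", cert.revocation_epoch >= revocation_epoch),
        ("live-state drift", digest(live_state) == cert.state_digest),
    ]
    for reason, ok in checks:
        if not ok:
            return False, reason
    return True, "admitted"


cert = Certificate(digest(b"scale web to 3"), 100.0, 200.0, 7, 2, digest(b"replicas=2"))
decision = admit(cert, b"scale web to 3", b"replicas=2", 150.0, 7, 2)
```

Failing closed on the first unmet check mirrors the abstract's separation of proposal, admission, and execution: nothing is executed unless every predicate holds.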

2606.20510 2026-06-19 cs.CR cs.AI New submission

Efficient and Sound Probabilistic Verification for AI Agents

Alaia Solko-Breslin, Pramod Kaushik Mudrakarta, Mihai Christodorescu, Somesh Jha, Krishnamurthy Dj Dvijotham

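AI summary: Proposes a framework based on distributionally robust optimization that gives sound upper bounds on the probability of policy violation for AI agents in complex digital environments without independence assumptions, and outperforms existing methods on terminal and tool-calling agent benchmarks.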

English abstract

Securing AI agents that operate in complex digital environments has become a critical need, and runtime monitoring approaches that formulate and enforce policies expressed in a formal language like Datalog offer a promising solution. However, existing approaches are restricted to deterministic policies. In many practical applications of AI agents, there is a need to enforce security policies in the face of ambiguity, leading to probabilistic predicates or state transitions (for example, a declassifier or Personally Identifiable Information (PII) detector that has some failure probability on each invocation). Furthermore, in many such applications, one cannot easily make the independence assumptions necessary to invoke prior work on probabilistic inference in Datalog. We address this by introducing a sound and efficient framework for such verification based on distributionally robust optimization, computing sound upper bounds on the probability of policy violation regardless of possible correlations between predicates. On standard benchmarks for terminal and tool calling agents, we demonstrate that our approach outperforms prior art and improves the security-utility trade-off while ensuring rigorous bounds on the probability of policy violation.
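
Editorial sketch: why independence assumptions matter for soundness can be seen with the classical Fréchet inequalities, used here as a much simpler stand-in for the paper's distributionally robust optimization. The function names and the two-detector scenario are hypothetical. Suppose a violation needs two PII detectors to miss the same secret, and each misses with probability 0.1.

```python
def union_upper_bound(probs):
    """P(A1 or ... or An) <= min(1, sum of p_i) under any dependence."""
    return min(1.0, sum(probs))


def intersection_upper_bound(probs):
    """P(A1 and ... and An) <= min of p_i under any dependence."""
    return min(probs)


def intersection_if_independent(probs):
    """Product rule; only valid when the events really are independent."""
    result = 1.0
    for p in probs:
        result *= p
    return result


detector_miss = [0.1, 0.1]
sound = intersection_upper_bound(detector_miss)     # holds for any correlation
naive = intersection_if_independent(detector_miss)  # assumes independence
```

If both detectors share the same blind spot, the true violation probability equals the sound bound, and the product figure understates the risk by a factor of ten.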

2606.20502 2026-06-19 cs.CR cs.AI cs.SE New submission

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

Arastoo Zibaeirad, Marco Vieira

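AI summary: Proposes the CWE-Trace framework, which evaluates LLM vulnerability detection using 834 Linux kernel samples and two diagnostic metrics (DFI and HDD); finds that data contamination gives no substantive advantage, that fine-tuning only shifts the output threshold rather than the decision policy, and that the models lack genuine security reasoning.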

English abstract

Whether LLMs scoring well on vulnerability benchmarks genuinely reason about security or merely pattern-match on contaminated data remains unresolved. We present CWE-Trace, a framework for LLM vulnerability detection built from 834 manually curated Linux kernel samples spanning 74 CWEs. The framework enforces a strict temporal split (pre-2025 historical set / post-cutoff leakage-free set), preserves context-aware vulnerable--patched pairs, and introduces two diagnostic metrics: the Directional Failure Index (DFI) and Hierarchical Distance and Direction (HDD). We evaluate eight vanilla LLMs and 15 LoRA fine-tuned variants across non-targeted detection, targeted detection, and CWE classification. Our analysis yields two key results. First, data contamination provides no measurable advantage. Function-level analysis shows that 84% of nominally contaminated samples carry no usable memorization signal: vulnerable functions are absent or cross-mapped across datasets, and ~31% of contaminated samples carry CWE misclassification. Second, backbone directional priors dominate fine-tuning. Models exhibit stable, systematic failure modes (DFI ranging from -85.5 to +94.8 pp) that persist from historical to post-cutoff data and resist correction. Fine-tuning shifts the output threshold without changing the decision policy. This is calibration without comprehension: output distributions adapt to training data while the underlying security reasoning remains absent. The weakest backbone at binary detection (DeepSeek-R1) gains the most in coarse CWE classification, revealing that detection and understanding are decoupled capabilities. The best detection score reaches only 52.1% (+2.1 pp above chance); exact CWE ranking remains below 1.3% Top-1 accuracy, confirming that current LLMs lack reliable security reasoning for systems software, regardless of fine-tuning strategy.

2606.20492 2026-06-19 cs.CR cs.LO New submission

A-COMPASS: Formal Foundations for Anonymity Analysis in Microdata

Tamara Tagliavia, Silvia Ghilezan

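AI summary: Modifies the COMPASS language into A-COMPASS so that it operates on microdata tables, supports both checking anonymity conditions and executing anonymization actions, proves determinism and compositionality of its semantics, and can be used to verify properties such as k-anonymity and l-diversity.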

English abstract

In the information age, one of the leading problems is how to ensure individuals' privacy. Depending on the context in which privacy is considered, various data privacy models have emerged. However, the formal verification of these models is still insufficiently explored, even for the most basic models. One attempt to verify privacy requirements is the Compliance Assertion Language (COMPASS). In COMPASS, one can specify an anonymity condition that a table needs to satisfy and an action that will modify the table if the condition is not satisfied. It is designed to operate on preprocessed tables in which one record corresponds to one group of people. In this paper, we modify the COMPASS language to operate on microdata tables in their usual form, in which one record corresponds to one person. The modified language is called A-COMPASS. Along with checking previously applied anonymity conditions, A-COMPASS enables the execution of anonymization actions as a new feature. We further provide the syntax and semantics of the A-COMPASS language. We also prove the most important properties of the introduced semantics, such as determinism and compositionality. Finally, we provide a mechanism to verify anonymity properties such as k-anonymity and l-diversity.
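
Editorial sketch: the two properties named at the end of the abstract can be checked directly on a microdata table with one record per person. This is plain Python, not A-COMPASS syntax. The column names and the toy rows are hypothetical, and l-diversity is taken in its simplest "distinct values" form.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """Every equivalence class contains at least k records."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Every equivalence class has at least l distinct sensitive values."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(len({r[sensitive] for r in group}) >= l for group in classes.values())


table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
qi = ["age", "zip"]
```

On this toy table, 2-anonymity holds but 2-diversity fails, because everyone in the second class has the same diagnosis.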

2606.20470 2026-06-19 cs.CR cs.AI New submission

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

Reza Soosahabi, Vivek Namsani

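AI summary: Analyzes the attack-defense setting of agentic AI systems with a probabilistic model and proposes a detect-and-misdirect strategy (realized as CMPE) in place of conventional detect-and-block; misleading responses lower the attacker's success rate, and on benchmarks the estimated ASR upper bound drops by up to two orders of magnitude.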

English abstract

Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with other agents. These capabilities make prompt-injection and jailbreak attacks more consequential, especially as attackers adopt model-guided automation to scale probing, prompt refinement, and response evaluation. This work analyzes the resulting attack-defense setting through a probabilistic model of a target system, its defense mechanism, and the attacker's automated judge. Our analysis shows that conventional detect-and-block defenses can allow attacker success rate (ASR) to approach one as the query budget grows, since predictable refusals provide useful feedback to automated search. We then examine detect-and-misdirect, where detected malicious interactions receive controlled, non-operational responses designed to induce false-positive errors in the attacker's judge. This strategy reduces the positive predictive value of attacker-selected candidates and yields a bounded asymptotic ASR. We evaluate a proof-of-concept realization of this strategy through Contextual Misdirection via Progressive Engagement (CMPE), a lightweight conversational misdirection method designed to replace predictable refusal text with safe but strategically misleading responses in automated jailbreak settings. On jailbreak benchmarks, CMPE reduces estimated ASR upper bounds by up to two orders of magnitude and nearly eliminates verified attack success in end-to-end PAIR and GPTFuzz attack runs.
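
Editorial sketch: the contrast between an ASR that approaches one and an ASR that stays bounded can be reproduced with a toy model. The model is an assumption made for this illustration and is not the paper's. Each query independently yields a real success with probability `p_success`, or a decoy that fools the attacker's judge with probability `p_decoy`, and the attacker keeps the first response the judge accepts.

```python
def asr_detect_and_block(p_success, budget):
    """Refusals are never mistaken for success, so only real successes stop the search."""
    return 1.0 - (1.0 - p_success) ** budget


def asr_detect_and_misdirect(p_success, p_decoy, budget):
    """The first judge-accepted response is real only with probability PPV."""
    p_positive = p_success + p_decoy  # assumes p_success + p_decoy <= 1
    if p_positive == 0.0:
        return 0.0
    ppv = p_success / p_positive      # positive predictive value of the judge
    return ppv * (1.0 - (1.0 - p_positive) ** budget)


blocked = asr_detect_and_block(0.01, budget=1000)
misdirected = asr_detect_and_misdirect(0.01, 0.5, budget=1000)
```

In this toy model the blocked ASR tends to 1 as the budget grows, while the misdirected ASR tends to `p_success / (p_success + p_decoy)`, which stays small when decoys are much more common than real successes.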

2606.20444 2026-06-19 cs.CR cs.SE New submission

Image Encryption Algorithm Based on Convolutional Neural Networks and Dynamic S-Box Generation

Ans Ibrahim, Fadhil Abbas Fadhil, Mahameed Reza Feizi Derakhshi, Maryam Mahdi Alhusseini, Nikolai Safiullin

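AI summary: Proposes a dynamic image encryption method that combines CNNs with classical cryptography; features learned by a CNN are used to generate adaptive S-boxes, increasing nonlinearity, uniqueness, and input dependence, and improving resistance to attacks.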

English abstract

The paper proposes a dynamic approach to image encryption that combines Convolutional Neural Networks (CNNs) with classical cryptography to improve the security and flexibility of image encryption. The main idea is to create adaptive substitution boxes (S-boxes) from features learned by a trained CNN. The CNN-based S-boxes offer more non-linearity, uniqueness, and input-image dependence than conventional fixed S-boxes, which are susceptible to linear and differential attacks. This dynamic behaviour enhances the confusion property and makes the scheme more resistant to statistical and structural attacks. The encryption algorithm consists of CNN-based feature extraction and the creation of a personalised S-box used to substitute the pixels. The CNN-generated S-boxes are assessed with entropy, histogram analysis, correlation, NPCR, and UACI; the results indicate that the scheme is more resilient and flexible than traditional ones.
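
Editorial sketch: NPCR and UACI, two of the metrics named in the abstract, have standard definitions for 8-bit images. NPCR is the percentage of pixel positions that differ between two cipher images, and UACI is the mean absolute intensity difference normalized by 255. The sketch below works on flat lists of pixel values; the toy pixel data is hypothetical and unrelated to the paper's experiments.

```python
def _check(c1, c2):
    if len(c1) != len(c2) or not c1:
        raise ValueError("cipher images must be non-empty and equally sized")


def npcr(c1, c2):
    """Number of Pixels Change Rate, in percent."""
    _check(c1, c2)
    changed = sum(a != b for a, b in zip(c1, c2))
    return 100.0 * changed / len(c1)


def uaci(c1, c2):
    """Unified Average Changing Intensity, in percent, for 8-bit pixels."""
    _check(c1, c2)
    total = sum(abs(a - b) for a, b in zip(c1, c2))
    return 100.0 * total / (255 * len(c1))


# Two toy cipher images flattened to pixel lists.
cipher_a = [0, 0, 255, 10]
cipher_b = [0, 255, 0, 10]
```

In practice the two cipher images come from encrypting plain images that differ in a single pixel, so high values indicate sensitivity to small input changes.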

2606.20436 2026-06-19 cs.CR cs.AI New submission

Multi-View Decompilation for LLM-Based Malware Classification

Bercan Turkmen, Vyas Raina

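AI summary: Proposes using multiple decompiler views to improve LLM-based malware classification; complementary pseudo-C from Ghidra and RetDec raises recall and F1.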

English abstract

Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. Recent work suggests that large language models (LLMs) can assist this process by classifying decompiled code as benign or malicious, but existing pipelines typically rely on a single decompiler view. We argue that this assumption is fragile: decompilers are lossy heuristic tools, and different decompilers can expose different artefacts of the same binary. We curate a benchmark of benign utilities and malicious programs spanning a range of threat behaviors. Each sample is compiled and decompiled with both Ghidra and RetDec, yielding matched pseudo-C views. Across a range of LLMs from major model families, we find that providing both decompiler views improves malicious-class F1, mainly by increasing recall on malicious samples. Agreement analyses further show that Ghidra and RetDec make partially different errors, supporting the view that decompiler outputs provide complementary evidence. Our results suggest that multi-decompiler prompting is a simple, training-free way to improve LLM-based malware triage in practical settings.

2606.20408 2026-06-19 cs.CR cs.AI New submission

LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

Hanwool Lee, Dasol Choi, Bokyeong Kim, Seung Geun Kim, Haon Park

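AI summary: Proposes the NRT-Bench benchmark, which evaluates the adversarial robustness of LLM agents in safety-critical systems through multi-turn red-teaming in a simulated nuclear power plant control room; finds that vulnerabilities barely overlap across models and that the effect of defenses is strongly model-dependent.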

English abstract

Large language model (LLM) agents are increasingly proposed as supervisory components for safety-critical systems, yet their robustness under sustained, adaptive adversarial pressure remains poorly characterized. We present NRT-Bench, a benchmark for multi-turn red-teaming of LLM agents acting as operators of a safety-critical system, instantiated in a simulated nuclear power plant control room. A five-role operator team, each backed by a configurable LLM, runs a plant governed by six critical safety functions (CSFs), while adversaries inject messages over four channels in bounded multi-turn sessions with per-turn feedback. Harm is an objective signal rather than LLM-judged text: a run terminates the moment any CSF is lost, attributed to the causing message. Evaluating four frontier operator models under a fixed-attack paired-replay protocol, we find that adaptive multi-turn attacks reliably push the operator team past a safety limit: across the four models, between 8.7% and 12.1% of attack sessions end with the plant losing a critical safety function. Although the four models look almost equally robust by this aggregate rate, their failures barely overlap: of 149 sessions, none defeat all four models while a third defeat at least one, so vulnerabilities are nearly disjoint across models rather than nested. The effect of added defences is strongly model-dependent: the same guardrail stack or safety-advisor agent that lowers attack success for one model can raise it for another. We release the simulation venue, attack dataset, and replay tooling for reproducible safety evaluation of LLM agents.
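
Editorial sketch: the "disjoint rather than nested" observation is a statement about set overlap. Under a paired-replay protocol each model sees the same attack sessions, so the sessions that defeat each model can be compared as sets. The model names and session IDs below are made-up toy data, not results from NRT-Bench.

```python
def overlap_summary(defeats, n_sessions):
    """Summarize how the defeating sessions overlap across models.

    `defeats` maps a model name to the set of session IDs that ended
    with a lost safety function for that model.
    """
    if not defeats:
        raise ValueError("need at least one model")
    defeated_sets = list(defeats.values())
    defeat_all = set.intersection(*defeated_sets)
    defeat_any = set.union(*defeated_sets)
    return {
        "defeat_all": len(defeat_all),
        "defeat_any": len(defeat_any),
        "any_rate": len(defeat_any) / n_sessions,
    }


toy = {
    "model_a": {1, 2},
    "model_b": {3},
    "model_c": {2, 4},
    "model_d": {5},
}
summary = overlap_summary(toy, n_sessions=15)
```

Similar per-model rates with an empty intersection and a much larger union is the pattern the abstract describes: each model fails, but on different sessions.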

2606.20254 2026-06-19 cs.CR New submission

Quantization as a Malicious Task: Removing Quantization-Conditioned Backdoors via Task Arithmetic

Kaihsun Yang, Min-Yan Tsai, Chia-Mu Yu

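AI summary: Proposes QVec, which treats the quantization-induced weight change as a malicious task vector and corrects parameters before deployment, defending against quantization-conditioned backdoors without retraining or trigger samples.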

English abstract

Model quantization is widely adopted to reduce memory usage and inference cost when deploying deep neural networks on resource-constrained devices. However, recent studies have revealed a new security threat known as Quantization-Conditioned Backdoors (QCBs), where a model behaves normally in full precision but activates malicious behavior only after quantization. Existing defenses typically modify quantization procedures or correct activation statistics, often introducing additional computational overhead or relying on specific quantization settings. Here, we present QVec, a parameter-space perspective for defending against QCBs. We observe that the weight difference between a full-precision model and its quantized counterpart encodes a structured behavioral shift, which can be interpreted as a malicious task vector rather than random quantization noise. Based on this insight, QVec counteracts this malicious direction through controlled parameter correction prior to deployment. QVec requires no retraining, no trigger samples, and only a single quantization pass to estimate the parameter shift, together with a lightweight hyperparameter search. Extensive experiments across image classification benchmarks and multiple Large Language Model (LLM) attack scenarios demonstrate that QVec consistently suppresses backdoor activation while preserving clean performance.
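
Editorial sketch: the task-arithmetic idea can be shown on a toy weight vector. The abstract does not give QVec's exact correction rule, so the snippet below is only one plausible reading and should be treated as an assumption: the task vector is the quantized weights minus the full-precision weights, and the correction subtracts a scaled copy of it (task-vector negation). The uniform rounding quantizer, the function names, and the scale `lam` are all hypothetical.

```python
def quantize(weights, step):
    """Toy uniform quantizer: round each weight to the nearest multiple of `step`."""
    return [round(w / step) * step for w in weights]


def task_vector(full_precision, quantized):
    """Shift induced by one quantization pass (quantized minus full precision)."""
    return [q - f for f, q in zip(full_precision, quantized)]


def negate(base, tau, lam):
    """Task-vector negation: move `base` against direction `tau`, scaled by `lam`."""
    return [b - lam * t for b, t in zip(base, tau)]


full = [0.12, -0.47, 0.90]
quant = quantize(full, step=0.25)
tau = task_vector(full, quant)
corrected = negate(quant, tau, lam=1.0)  # lam would come from a small search
```

With `lam=0` nothing changes, and with `lam=1` this toy correction undoes the shift entirely; the abstract's "lightweight hyperparameter search" presumably selects a value that suppresses the backdoor while keeping clean accuracy.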

2606.20251 2026-06-19 cs.CR New submission

TrustMix: How to Mix Messages in a Mobile Ad-hoc Network

Yu Shen, Aiswarya Walter, Stefanie Roos

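AI summary: Proposes the TrustMix protocol, which provides anonymous communication without a central trusted party through group-based forwarding and shuffling, enforces rate limits with linkable ring signatures, is proven secure in the random oracle model, and is evaluated for anonymity and throughput via simulation and an Android implementation.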

Comments Accepted at ICDCS 2026, 11 pages

English abstract

Mix networks are a highly effective way to achieve anonymity, defending against a wide range of traffic-analysis attacks. However, mix networks are usually designed for infrastructure networks and cannot be directly applied in the context of mobile ad hoc networks (MANETs). The few existing solutions for MANETs require advance knowledge of the topology or a trusted central party. In this paper, we present TrustMix, a mix protocol for MANETs that operates without any central trusted party. In TrustMix, parties join groups and then messages are forwarded via multiple groups to provide anonymity. With TrustMix, users only need to find a party nearby that they consider trusted. They then forward the message to this party's group, and the party shuffles messages before forwarding to other groups, meaning that the original message and the forwarded message cannot be linked. Furthermore, even if the chosen party is adversarial, they can only break the anonymity if all parties in their group are adversarial, as all of them contribute to the shuffling. In addition to anonymity, TrustMix also enforces rate limits on the number of messages through the use of linkable ring signatures, which allows detecting that parties send more messages than allowed without revealing identities. We prove the security of our protocol in the random oracle model. We evaluate its anonymity using an existing mix-network simulator and show that TrustMix significantly improves message anonymity. Finally, we present a proof-of-concept Android implementation and show that TrustMix achieves acceptable throughput with 5 mobile devices.
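
Editorial sketch: the claim that anonymity breaks only if every group member is adversarial follows from composing secret permutations. If each member shuffles the batch in turn, the final order depends on all of them, and one honest member's secret shuffle is enough to hide the overall mapping. The snippet shows only this composition step. It is not the TrustMix protocol: real mixing also needs re-encryption so that message contents cannot be matched, and the seeded generators here are used purely for reproducibility.

```python
import random


def group_shuffle(batch, member_rngs):
    """Let each group member permute the batch in turn.

    The result is the composition of all members' permutations, so it is
    unknown to anyone who lacks at least one member's secret shuffle.
    """
    mixed = list(batch)
    for rng in member_rngs:
        rng.shuffle(mixed)
    return mixed


# Hypothetical three-member group; a deployment would use a
# cryptographically secure source such as random.SystemRandom().
members = [random.Random(seed) for seed in (11, 22, 33)]
incoming = ["m1", "m2", "m3", "m4"]
outgoing = group_shuffle(incoming, members)
```

The output is always a permutation of the input, so no message is dropped or duplicated by the shuffle itself.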

2606.20215 2026-06-19 cs.CR New submission

GNSS Spoofing Threat for V2X communications

Adolfo P. Jimenez, Juan Arquero-Gallego, Mario P. Luna, Jose E. Naranjo, Felipe Jimenez Alonso

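AI summary: Presents a method for GNSS spoofing attacks on V2X communications using inexpensive software-defined radio (SDR), validates the attack on real devices, and shows that V2X communications are susceptible to spoofing that goes undetected.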

Comments 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

English abstract

Global Navigation Satellite Systems (GNSS) constitute a core technology for delivering crucial positioning, navigation, and timing (PNT) services in the Vehicle-to-Everything (V2X) domain, where they are indispensable for generating Cooperative Awareness Messages (CAM) that uphold network reliability and vehicular safety. Yet, GNSS signals are acutely exposed to spoofing, an advanced attack in which an adversary transmits crafted signals that replicate legitimate satellite characteristics, misleading the receiver into computing a false position. This work presents a methodology for conducting physical spoofing with inexpensive Software Defined Radio (SDR), describing a coordinate generation pipeline that employs Haversine-based distance calculations, temporal discretization to emulate constant velocity, and linear interpolation to produce high-fidelity GPS baseband signals. The proposed attack is experimentally validated on real Commsignia OnBoard Unit (OBU) and RoadSide Unit (RSU) devices using a HackRF One across three scenarios that emulate synthetic trajectories at steady speeds of 90 km/h, 145 km/h, and 200 km/h. The most significant contribution of this paper is the demonstration that V2X communications are not secured, as they are susceptible to GNSS spoofing attacks, which cause service degradation without being detected.
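
Editorial sketch: the coordinate-generation steps named in the abstract (Haversine distance, temporal discretization for constant speed, linear interpolation) can be combined in a few lines. This only produces a list of latitude/longitude points and involves no signal generation. The function names, the 10 Hz sample rate, and the example coordinates are assumptions made for illustration, not details from the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def constant_speed_track(start, end, speed_kmh, rate_hz=10.0):
    """Linearly interpolate points so consecutive samples are about one time step apart."""
    distance = haversine_m(start[0], start[1], end[0], end[1])
    step_m = (speed_kmh / 3.6) / rate_hz           # metres travelled per sample
    n_steps = max(1, round(distance / step_m))     # temporal discretization
    return [
        (start[0] + (end[0] - start[0]) * i / n_steps,
         start[1] + (end[1] - start[1]) * i / n_steps)
        for i in range(n_steps + 1)
    ]


track = constant_speed_track((40.0, -3.0), (40.0, -2.99), speed_kmh=90.0)
```

Linear interpolation in latitude and longitude is a reasonable approximation over short segments like this one; over long distances the spacing between samples would drift from the intended step.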

2606.20214 2026-06-19 cs.CR New submission

Accelerating Trust Convergence in IIoT: A ML Approach for Dynamic Network Conditions

Aymen Bouferroum, Valeria Loscri, Abderrahim Benslimane

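AI summary: To address slow trust convergence caused by fluctuating network quality in the Industrial IoT, proposes a machine-learning-based trust convergence acceleration method that predicts convergence time and dynamically adjusts transition probabilities, cutting convergence time by up to 28.6% under challenging conditions and improving evaluation accuracy in scenarios with malicious nodes.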

Comments Symposium: Communication & Information Systems Security (CISS)

Journal ref IEEE Global Communications Conference (GLOBECOM) 2025, Dec 2025, Taipei, Taiwan. pp.4427-4432

English abstract

In Industrial Internet of Things (IIoT) environments, trust management plays a vital role in securing systems, especially when dealing with resource-constrained devices. Traditional trust models often overlook the impact of fluctuating network quality, leading to slower trust convergence and inaccurate assessments. In this paper, we propose a dynamic trust management solution, known as the Trust Convergence Acceleration (TCA) approach, which integrates Machine Learning (ML) to accelerate trust convergence under poor network conditions. Our model predicts the number of time units needed for trust convergence based on key network metrics and dynamically adapts transition probabilities in the trust model to enhance convergence speed. Using a simulation framework that incorporates realistic Wi-Fi channel conditions based on the IEEE 802.11 standard, we demonstrate the effectiveness of the TCA-based approach, achieving up to a 28.6% reduction in trust convergence time under challenging conditions. Furthermore, the proposed solution exhibits resilience in scenarios involving malicious nodes, improving trust evaluation accuracy. This work provides a scalable and adaptive trust framework for IIoT systems in dynamic industrial environments, ensuring robust performance under varying network conditions.

2606.19983 2026-06-19 cs.CR New submission

A Measurement Study of Cryptographic Misuse in Embodied AI Mobile Applications

Junchao Li, Xuelei Wang, Yuhang Huang, Qi Wang, Boyang Ma, Xuelong Dai, Minghui Xu, Yue Zhang

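AI summary: Presents the first large-scale measurement of cryptographic misuse in embodied AI mobile applications; an automated semantic-aware analysis pipeline yields 12,975 misuse findings and reveals structural security trade-offs driven by latency-sensitive control paths and offline provisioning.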

English abstract

Embodied AI (EAI) mobile applications are evolving from auxiliary user interfaces into active control-path components, directly linking mobile-side cryptographic security to cyber-physical trust. Despite this shift, existing security research predominantly focuses on embodied AI devices and cloud infrastructures, leaving the mobile control layer largely unexplored as a critical attack surface. To bridge this gap, we present the first large-scale measurement study of cryptographic misuse within the EAI mobile ecosystem. We construct EAIAppZoo, a benchmark of 507 real-world applications across six EAI domains, and employ an automated semantic-aware analysis pipeline to measure the prevalence and characteristics of five major cryptographic failure modes. Our measurement yields 12,975 misuse findings (with an evaluated precision of 80.74%), revealing that these cryptographic failures are driven by EAI-specific engineering constraints rather than random developer errors. We uncover structural security trade-offs: latency-sensitive control paths systematically weaken transport protection, while the heavy reliance on offline device provisioning and legacy IoT SDKs exacerbates the local hardcoding of authentication credentials. Through real-world case studies, we demonstrate how these mobile-side cryptographic flaws bypass nominal network protections, enabling adversaries to intercept command channels and hijack the physical control of EAI entities. Ultimately, our findings highlight that mobile applications have become a fragile, yet overlooked, cryptographic trust boundary in cyber-physical systems.

2606.19937 2026-06-19 cs.CR New submission

AutoTam: Specifying Secure Protocol Implementations with Tamarin Model Generation

Johannes Wilson, Mikael Asplund, Niklas Johansson

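AI summary: Proposes a language-first method in which protocols are implemented in a domain-specific language and Tamarin models are generated automatically; verified trace properties are proven to carry over to the implementation, symbolic execution is integrated to analyze memory safety, and the approach is validated on a signed Diffie-Hellman protocol and WireGuard for security and interoperability.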

Comments 19 pages, 5 figures

English abstract

Formal verification is a challenging but important task for ensuring the security of cryptographic protocols. While modern protocol verification tools significantly reduce verification effort, modelling remains challenging for practitioners without a background in formal verification. In addition, transferring verification results to a concrete protocol implementation requires expert knowledge. In this paper, we present a novel language-first method for verification of trace properties using a domain-specific language for protocol implementations. We target the Tamarin prover for verification, and we prove that verified universal trace properties translate back to the implementation. We additionally integrate symbolic execution in order to analyse the memory safety of protocol implementations. We use our tool to implement and generate accurate models for a signed Diffie-Hellman protocol and for the WireGuard VPN protocol. Our WireGuard implementation is interoperable with existing implementations when using our interpreter, and achieves acceptable performance. We formally prove our implementations secure using a combination of symbolic execution and verification of the generated Tamarin models.

2606.19887 2026-06-19 cs.CR cs.AI 新提交

FinRED: An Expert-Guided Benchmark Generation and Evaluation Framework for Financial LLM Red-Teaming

FinRED:面向金融大语言模型红队测试的专家引导基准生成与评估框架

Chaeyun Kim, Daeyoung Park, Junghwan Kim, Jinyoung Jeong, Eunji Song, Yongtaek Lim, Minwoo Kim

AI总结 提出FinRED框架,通过专家引导的两级分类法将全球金融标准映射为威胁,并利用真实金融文档生成上下文丰富的红队行为提示,结合专家验证的评估标准,有效降低关键假阴性。

详情
AI中文摘要

现有的安全基准主要针对通用对抗场景,但忽略了金融领域的特定风险。金融大语言模型面临监管合规违规、欺诈助长和系统性信任侵蚀等问题,需要有针对性的评估。我们引入了FinRED,一个与金融专家共同开发的、用于金融大语言模型安全评估的专家引导红队测试框架。FinRED采用新颖的两级分类法,将全球标准(如FATF和EU DORA)映射到从监管规避到复杂欺诈的威胁,并结合可扩展的流水线,通过专家定义的架构将真实金融文档转换为上下文丰富的红队行为提示(种子)。严格的专家验证确认了种子的合理性和真实性,以实现有意义的LLM安全评估。我们还提供了一个经过专家验证的、金融专用的评估标准,该标准超越了免责声明检查,比静态的一刀切标准更贴近人类专家,并将关键假阴性从28个减少到12个。FinRED与国际采纳的风险管理和信息安全标准(如ISO/IEC 27001)保持一致,已在韩国金融安全研究院(FSI)的监管沙盒中部署,用于真实金融服务中的生成式AI安全评估。为减轻双重用途风险,数据集、生成流水线、提示模板和评估框架对合格研究人员开放,访问地址为:https://github.com/selectstar-ai/FinRED-paper 和 https://huggingface.co/datasets/datumo/FinRED。

英文摘要

Existing safety benchmarks target general adversarial scenarios but miss finance-specific risks. Financial LLMs face regulatory compliance violations, fraud facilitation, and systemic trust erosion that require targeted evaluation. We introduce FinRED, an expert-guided red-teaming framework for financial LLM safety evaluation developed with financial experts. FinRED uses a novel two-level taxonomy mapping global standards (e.g., FATF and EU DORA) to threats ranging from regulatory evasion to complex fraud, integrated with a scalable pipeline that converts real financial documents into context-rich red-teaming Behavioral Prompts (seeds) through an expert-defined schema. Rigorous expert validation confirms seed plausibility and realism for meaningful LLM safety evaluation. We also provide an expert-validated, finance-specific rubric that goes beyond disclaimer checks, aligns more closely with human experts than static one-size-fits-all rubrics, and reduces critical false negatives from 28 to 12. Aligned with internationally adopted risk-management and information-security standards (e.g., ISO/IEC 27001), FinRED is deployed in South Korea's Financial Security Institute (FSI) regulatory sandbox for generative AI security evaluation in real financial services. To mitigate dual-use risks, the dataset, generation pipeline, prompt template, and evaluation framework are gated for qualified researchers at https://github.com/selectstar-ai/FinRED-paper and https://huggingface.co/datasets/datumo/FinRED.

2606.19866 2026-06-19 cs.CR 新提交

Low-Cost Multi-Precision Systolic Arrays for Accelerating FHE NTTs on AI ASICs

低成本多精度脉动阵列用于在AI ASIC上加速FHE NTT

George Alexakis, Dimitrios Schoinianakis, Giorgos Dimitrakopoulos

AI总结 针对FHE在AI硬件上因精度不匹配导致的性能瓶颈,提出一种最小修改的多精度脉动阵列,在统一数据流下原生执行全精度输出重建,实现1.33倍加速。

详情
AI中文摘要

全同态加密(FHE)确保了强大的数据隐私,但面临难以承受的计算开销。在AI硬件(如张量处理单元TPU)上加速FHE很有前景,但受到精度不匹配的根本限制:TPU针对8位算术优化,而FHE及其关键部分(如数论变换NTT)需要高精度。当前方法通过矩阵分解在低精度矩阵引擎上执行NTT计算来弥合这一差距。然而,重建全精度结果需要移位加累加,这与矩阵乘法的数据流不匹配。这迫使将全精度重建从矩阵引擎卸载到向量处理器,破坏了矩阵乘法数据流,造成显著的性能瓶颈。为解决这一限制,我们提出一种最小修改的多精度脉动阵列,在统一数据流下,与低精度矩阵乘法同步,在阵列内部原生执行全精度输出重建。使用OpenRoad在7nm工艺下综合,我们的设计硬件开销可忽略不计。使用SCALE-Sim的周期精确模拟表明,在128x128矩阵引擎上,对于2^12到2^16的变换大小,在所提出的架构上原生执行NTT可实现至少1.33倍的加速,成功使标准AI硬件支持高精度FHE加速。

英文摘要

Fully Homomorphic Encryption (FHE) ensures robust data privacy but suffers from prohibitive computational overhead. Accelerating FHE on AI hardware like Tensor Processing Units (TPUs) is promising, yet fundamentally limited by a precision mismatch: TPUs are optimized for 8-bit arithmetic, whereas FHE and its critical parts, such as the Number Theoretic Transform (NTT), demand high precision. Current approaches bridge this gap using matrix decomposition to execute NTT computations on low-precision matrix engines. However, reconstructing the full-precision results requires shift-and-add accumulation that does not match the dataflow of matrix multiplication. This forces full-precision reconstruction to be offloaded from matrix engines to vector processors, which disrupts the matrix multiplication dataflow and creates a significant performance bottleneck. To resolve this limitation, we propose a minimally modified multi-precision systolic array that performs full-precision output reconstruction natively within the array, in sync with low-precision matrix multiplication under a uniform dataflow. Synthesized at 7nm with OpenRoad, our design incurs negligible hardware overhead. Cycle-accurate simulations using SCALE-Sim demonstrate that natively executing NTTs on the proposed architecture achieves at least 1.33x speedup for transform sizes 2^12 to 2^16 on 128x128 matrix engines, successfully enabling standard AI hardware to support high-precision FHE acceleration.
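The decomposition idea in this abstract can be illustrated with a small sketch. It is not the authors' hardware design: it only shows, in software, how a full-precision modular matrix-vector product (the shape of an NTT written as a matrix multiplication) can be assembled from 8-bit limb products followed by shift-and-add reconstruction. The function names and the choice of four limbs (32-bit operands) are assumptions made for illustration.

```python
LIMB_BITS = 8
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, num_limbs):
    """Split a non-negative integer into little-endian 8-bit limbs."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(num_limbs)]


def limb_matvec(matrix, vector, modulus, num_limbs=4):
    """Compute (matrix @ vector) mod modulus from 8-bit limb products only."""
    vec_limbs = [to_limbs(x, num_limbs) for x in vector]
    result = []
    for row in matrix:
        row_limbs = [to_limbs(a, num_limbs) for a in row]
        total = 0
        for i in range(num_limbs):
            for j in range(num_limbs):
                # Low-precision dot product between limb plane i and limb plane j.
                partial = sum(a[i] * x[j] for a, x in zip(row_limbs, vec_limbs))
                # Shift-and-add: weight the partial sum by 2^(8 * (i + j)).
                total += partial << (LIMB_BITS * (i + j))
        result.append(total % modulus)
    return result


def direct_matvec(matrix, vector, modulus):
    """Reference result using full-precision integer arithmetic."""
    return [sum(a * x for a, x in zip(row, vector)) % modulus for row in matrix]
```

The inner `partial` sums are the part a low-precision matrix engine can execute; the shifted accumulation into `total` is the reconstruction step that the abstract says conventional designs push to a vector processor.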

2606.19826 2026-06-19 cs.CR cs.MA 新提交

Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

对抗性同伴下的异构LLM辩论:诚实增益、替代成本与韧性

Prashanti Nilayam, Kiran Kumar Ramanna, Prashil Tumbade, Sankalp Nayak

AI总结 研究异构LLM辩论中诚实与对抗性同伴对修正行为的影响,发现诚实同伴降低有害修正率,对抗性同伴则逆转,且异构性在已有对手时也能作为防御。

详情
AI中文摘要

异构LLM辩论的动机在于,多样化的同伴可以相互纠正,但同样的交流既携带纠正也携带对抗性影响。我们通过跟踪异构同伴如何改变诚实智能体的修正行为来衡量哪种影响占主导:他们改变答案的频率,以及这种改变是纠正性的还是有害的。我们比较了匹配面板(同质基线、诚实混合和对抗混合)以及受污染面板(其中已存在一个恶意的同族同伴),涵盖四个模型家族和三个推理基准。一个诚实的异构同伴显著降低了有害修正,而对抗性同伴则逆转了这一效果。对于Llama-3.1-70B防御者在MATH-hard上,诚实插槽的有害修正率从同质面板的89%下降到有诚实同伴时的35%,而对抗性同伴使其回到90%。条件率对弱防御者隐藏了这种损害,但辩论结束时的翻转率暴露了它。该模式在家族和基准上保持符号一致,而其幅度随防御者-基准机制变化。我们还测量了当已存在一个对抗性同族同伴时的效果:一个诚实的异构同伴降低了有害修正率以及最初正确答案丢失的比率。在相同的Llama-3.1-70B设置下,添加的诚实同伴将最初正确项上的翻转率从同族对手下的31%降至6%。因此,异构性不仅是一个攻击面,而且当对手已经存在时,也是一种防御。

英文摘要

Heterogeneous LLM debate is motivated by the promise that diverse peers correct one another, but the same exchange that carries correction also carries adversarial influence. We measure which dominates by tracking how a heterogeneous peer changes the honest agents' revision behavior: how often they change their answer, and whether the change is corrective or harmful. We compare matched panels (homogeneous baseline, honest-mixed, and adversarial-mixed) and contaminated panels in which a malicious same-family peer is already present, spanning four model families and three reasoning benchmarks. An honest heterogeneous peer sharply lowers harmful revision, and an adversarial one reverses it. For Llama-3.1-70B defenders on MATH-hard, the honest-slot harmful-revision rate falls from 89% in the homogeneous panel to 35% with an honest peer, and an adversarial peer returns it to 90%. The conditional rate hides this damage on weak defenders, but the end-of-debate flip rate exposes it. The pattern keeps its sign across families and benchmarks while its magnitude varies with the defender-benchmark regime. We also measure the effects when an adversarial same-family peer is already present: an honest heterogeneous peer lowers both harmful revision and the rate at which initially-correct answers are lost. On the same Llama-3.1-70B setting, the added honest peer cuts the flip rate on initially-correct items from 31% under a same-family adversary to 6%. Heterogeneity is therefore not only an attack surface but, when an adversary is already present, also a defense.

2606.19807 2026-06-19 cs.CR 新提交

DISARM: Target Electronic Device Informed Mitigation of Software Runtime Side-Channel Vulnerabilities

DISARM:目标电子设备知情的软件运行时侧信道漏洞缓解

Tasneem Suha, Tanzim Mahfuz, Rima Asmar Awad, Prabuddha Chakraborty

AI总结 提出DISARM方法,利用真实嵌入式设备时序值生成针对性软件修复,以缓解运行时侧信道漏洞,在五个不同设备上优于现有方案。

详情
AI中文摘要

程序运行时或时序攻击利用程序执行时间的变化来提取敏感信息(如加密密钥、敏感变量数据、知识产权)。针对运行时侧信道攻击的最新解决方案试图平衡不同控制流路径下敏感代码的执行时间,以消除时序泄漏。然而,在缓解过程中,大多数技术未考虑目标程序运行的底层硬件或设备。这可能导致过度修复(不必要的额外操作)、修复不足(未正确解决不平衡)甚至失败。我们提出DISARM,一种联合硬件-软件方法(不同于任何现有解决方案),用于缓解运行时侧信道漏洞,该方法利用真实嵌入式设备的时序值生成针对性的软件修复。我们实现了DISARM以支持C、C++和Java源代码,并在22个标准基准测试上进行验证。在五个不同的嵌入式或边缘设备上,DISARM在执行时间开销、代码大小开销和正确性方面均优于现有解决方案如PENDULUM和DifFuzzAR。

英文摘要

Program runtime or timing attacks exploit variations in a program's execution times to extract sensitive information from the program (e.g., encryption keys, sensitive variable data, intellectual property). State-of-the-art solutions to runtime side-channel attacks attempt to balance the execution time of the sensitive code for different control flow paths to eliminate the timing leakage. However, during the mitigation process, most techniques do not consider the underlying hardware or device on which the target program is supposed to run. This can lead to over-fixing (unnecessary extra operations), under-fixing (not solving the imbalance properly), and even failures. We propose DISARM, a joint hardware-software methodology (unlike any existing solution) for mitigating runtime side-channel vulnerabilities that utilizes timing values from real embedded devices to generate targeted software fixes. We implement DISARM to support C, C++, and Java source code and validate it across 22 standard benchmarks. DISARM outperforms state-of-the-art solutions such as PENDULUM and DifFuzzAR in terms of execution time overhead, code size overhead, and correctness on five different embedded or edge devices.
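As a generic illustration of the path-balancing idea mentioned above (not DISARM's device-informed repair, which the abstract does not detail), the sketch below contrasts a comparison whose control flow exits early on the first mismatch with one that visits every byte. Both function names are hypothetical.

```python
def early_exit_equal(secret: bytes, guess: bytes) -> bool:
    """Leaky comparison: the loop length depends on the first mismatching byte."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False  # early return creates a shorter control-flow path
    return True


def balanced_equal(secret: bytes, guess: bytes) -> bool:
    """Balanced comparison: every byte pair is visited regardless of mismatches."""
    diff = len(secret) ^ len(guess)
    for s, g in zip(secret, guess):
        diff |= s ^ g  # accumulate differences without branching on them
    return diff == 0
```

In interpreted Python the balanced version only equalizes the control flow, not the exact timing; production code would normally rely on `hmac.compare_digest` from the standard library. How much padding a given device actually needs is the question the paper addresses with measured timing values.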

2606.19755 2026-06-19 cs.CR cs.AI 新提交

SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling

SafeSpec: 通过动态反射采样实现快速且安全的LLM

Haotian Xu, Zeyang Zhang, Linbao Li, Huadi Zheng, Yu Li, Cheng Zhuo

AI总结 提出SafeSpec框架,将轻量安全头集成到推测解码的验证过程中,通过风险估计和反射采样恢复安全生成,在保持加速的同时显著降低攻击成功率。

详情
AI中文摘要

推测推理加速了大语言模型(LLM)的解码过程,但本身不提供任何安全保障。现有的安全防御措施与推测推理大多不兼容:它们要么引入额外的计算,要么破坏草稿-验证机制,抵消加速优势。这揭示了当前安全方法与推测解码之间的根本性不兼容。我们提出SafeSpec,一个安全感知的推测推理框架,将风险估计直接集成到验证过程中。SafeSpec在目标模型上附加一个轻量级的潜在安全头,以在单次前向传递中联合评估语义有效性和安全性。当检测到不安全生成时,SafeSpec应用回滚和安全引导的反射多次采样来恢复安全延续,而不是终止生成。我们将越狱攻击建模为生成轨迹上的分布偏移,其中对抗性提示增加了有害延续的概率,但并未消除安全延续。在此模型下,SafeSpec在推测解码过程中执行风险感知的轨迹恢复。在多个模型和对抗基准测试中,SafeSpec实现了显著改进的安全-效率权衡。在Qwen3-32B上,SafeSpec将攻击成功率降低了15%,同时在良性工作负载上保持了2.06倍的推理加速,表明推测加速和推理时安全性可以联合优化。

英文摘要

Speculative inference accelerates large language model (LLM) decoding but provides no inherent safety guarantees. Existing safety defenses are largely incompatible with speculative inference: they either introduce additional computation or disrupt the draft-verify mechanism, negating acceleration benefits. This reveals a fundamental incompatibility between current safety methods and speculative decoding. We propose SafeSpec, a safety-aware speculative inference framework that integrates risk estimation directly into the verification process. SafeSpec attaches a lightweight latent safety head to the target model to jointly evaluate semantic validity and safety in a single forward pass. When unsafe generations are detected, SafeSpec applies rollback and safety-guided reflective multi-sampling to recover safe continuations rather than terminating generation. We model jailbreak attacks as distributional shifts over generative trajectories, where adversarial prompts increase the probability of harmful continuations without eliminating safe ones. Under this model, SafeSpec performs risk-aware trajectory recovery within the speculative decoding process. Across multiple models and adversarial benchmarks, SafeSpec achieves a substantially improved safety-efficiency trade-off. On Qwen3-32B, SafeSpec reduces attack success rates by 15% while preserving a 2.06x inference speedup on benign workloads, demonstrating that speculative acceleration and inference-time safety can be jointly optimized.

2606.19692 2026-06-19 cs.CR cs.DB cs.IR 新提交

When Global Gating Is Enough: Admission-Time Hubness Control in Anisotropic Vector Retrieval Systems

当全局门控足够:各向异性向量检索系统中的准入时间枢纽性控制

Prashant Kumar Pathak, Tarun Kumar Sharma

AI总结 针对检索增强生成中向量枢纽性引发的投毒风险,提出准入时间控制方法,通过哨兵查询评分隔离枢纽文档,全局门控在多个数据集上达到高召回率和低误报率。

详情
AI中文摘要

向量枢纽性(少数点成为许多查询的最近邻)在检索增强生成(RAG)中造成投毒风险:一个注入的文档可能影响不相关的请求。现有防御使用周期性反向k近邻扫描,存在暴露窗口和重复的全语料库工作。我们研究准入时间控制,根据哨兵查询对每个候选文档评分,并在插入前隔离类似枢纽的文档。在两个10万文档语料库、五个编码器以及不相交的攻击者和防御者查询集上,全局门控在决定性嵌入空间点达到召回率1.0(有效范围内>=0.92),在HotFlip攻击上达到0.91 +/- 0.07,对一般文档的误报率为1%。每主题门控没有提供可靠的好处,这与各向异性耦合局部和全局可见性一致。阈值是增量维护的,插入成本与语料库大小无关,删除成本摊销。在HNSW上,准入增加约3.1%的摄入延迟,评分在10^6向量上保持平坦,近似索引下1.2%的决策翻转,不涉及攻击。来源信息补充了门控对自然或紧密领域枢纽的处理。

英文摘要

Vector hubness, where a few points become nearest neighbors of many queries, creates a poisoning risk in retrieval-augmented generation (RAG): one injected document can influence unrelated requests. Existing defenses use periodic reverse-kNN scans, leaving an exposure window and repeated corpus-wide work. We study admission-time control, scoring each candidate against sentinel queries and quarantining hub-like documents before insertion. Across two 100,000-document corpora, five encoders, and disjoint attacker and defender query sets, a global gate achieves recall 1.0 at the decisive embedding-space point (>=0.92 across the effective range) and 0.91 +/- 0.07 on HotFlip attacks, with 1% false positives on general documents. A per-topic gate provides no reliable benefit, consistent with anisotropy coupling local and global visibility. Thresholds are maintained incrementally, with corpus-size-independent insertion cost and amortized deletion cost. On HNSW, admission adds about 3.1% to ingestion latency, scoring remains flat to 10^6 vectors, and 1.2% of decisions flip under approximate indexing, none involving attacks. Provenance complements the gate for natural or tight-domain hubs.
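A minimal sketch of an admission-time gate, assuming a simple scoring rule that the abstract does not spell out: a candidate's hub score is the fraction of sentinel queries for which it would enter the top-k results against the current corpus. The names `hub_score` and `admit` and the 0.5 threshold are illustrative assumptions, not the paper's parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hub_score(candidate, sentinels, corpus, k=1):
    """Fraction of sentinel queries for which the candidate would rank in the top-k."""
    hits = 0
    for query in sentinels:
        cand_sim = cosine(query, candidate)
        # Count existing documents that are at least as similar to the query.
        at_least_as_close = sum(1 for doc in corpus if cosine(query, doc) >= cand_sim)
        if at_least_as_close < k:
            hits += 1
    return hits / len(sentinels)


def admit(candidate, sentinels, corpus, threshold=0.5, k=1):
    """Return True to insert the candidate, False to quarantine it as hub-like."""
    return hub_score(candidate, sentinels, corpus, k) <= threshold
```

Because the check runs once per inserted document, it avoids the exposure window of periodic scans; the brute-force similarity loop here would be replaced by an index lookup in any realistic deployment.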

2606.19660 2026-06-19 cs.CR cs.CL 新提交

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

基于RAG的聊天机器人中针对提示注入的分层安全框架

Gulshan Saleem, Nisar Ahmed, Muhammad Imran Zaman, Ali Hassan

AI总结 提出三层防御框架,通过输入过滤、上下文指令层级和输出审计,将提示注入攻击成功率从71.4%降至11.3%,误报率4.8%,延迟开销61.2毫秒。

Comments Submitted to ICCK Transactions on Information Security and Cryptography

详情
AI中文摘要

提示注入被OWASP Top 10 for LLM Applications列为大语言模型(LLM)部署中最关键的漏洞,然而现有防御措施仅在孤立的流水线阶段运行且不完整。输入过滤器无法检查检索到的文档,而输出监控器无法阻止恶意载荷到达模型。因此,检索增强生成(RAG)聊天机器人仍然容易受到间接注入攻击,其中被污染的知识库文档会损害每个检索到它的用户。我们提出了一个三层框架,在推理流水线中拦截直接和间接的提示注入。第一层使用基于规则的模式库和微调后的语义异常分类器筛选用户输入。第二层在上下文组装期间强制执行基于来源的指令层级,防止检索到的内容覆盖操作员策略。第三层在交付前使用策略规则引擎和语义漂移检测器审计模型输出。一个持续审计循环聚合结构化日志,并支持重新训练以适应新兴攻击模式。该框架与模型无关,作为中间件部署,无需修改底层LLM。在GPT-4o、Llama 3和Mistral 7B上对5,080个样本的评估显示,该框架将攻击成功率(ASR)从71.4%降至11.3%,比最佳单层基线高出27.3个百分点,比已发布的护栏系统高出23.8个百分点,同时保持4.8%的误报率和61.2毫秒的中位延迟开销。消融研究证实,所有三层提供互补保护,且其组合效果超过单个贡献的总和。

英文摘要

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet existing defenses operate at isolated pipeline stages and remain incomplete. Input filters cannot inspect retrieved documents, while output monitors cannot prevent malicious payloads from reaching the model. Consequently, retrieval-augmented generation (RAG) chatbots remain vulnerable to indirect injection, where a poisoned knowledge-base document compromises every user whose query retrieves it. We present a three-layer framework that intercepts both direct and indirect prompt injection throughout the inference pipeline. Layer 1 screens user input using a rule-based pattern library and a fine-tuned semantic anomaly classifier. Layer 2 enforces a provenance-based instruction hierarchy during context assembly, preventing retrieved content from overriding operator policy. Layer 3 audits model output using a policy rule engine and semantic drift detector before delivery. A continuous audit loop aggregates structured logs and supports retraining to adapt the classifier to emerging attack patterns. The framework is model-agnostic and deploys as middleware without modifying the underlying LLM. Evaluation on 5,080 samples across GPT-4o, Llama 3, and Mistral 7B shows that the framework reduces Attack Success Rate (ASR) from 71.4% to 11.3%, outperforming the best single-layer baseline by 27.3 percentage points and a published guardrail system by 23.8 percentage points, while maintaining a 4.8% false positive rate and a median latency overhead of 61.2 ms. Ablation studies confirm that all three layers provide complementary protection and that their combined effect exceeds the sum of individual contributions.
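The three layers can be pictured as middleware around an arbitrary model call. The sketch below is an assumption-laden outline, not the authors' implementation: it includes only the rule-based half of Layer 1 and the policy-rule half of Layer 3 (the semantic classifier and drift detector are omitted), and the patterns, provenance tags, and function names are all hypothetical.

```python
import re

# Hypothetical pattern library; a real deployment would maintain a far larger one.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]


def screen_input(user_text):
    """Layer 1 (rule-based half only): True when no known pattern matches."""
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


def assemble_context(operator_policy, retrieved_docs, user_query):
    """Layer 2: label every segment with its provenance and authority level."""
    segments = [f"[source=operator authority=instructions]\n{operator_policy}"]
    for doc in retrieved_docs:
        # Retrieved text is wrapped as data, never as instructions.
        segments.append(f"[source=retrieval authority=data-only]\n{doc}")
    segments.append(f"[source=user authority=request]\n{user_query}")
    return "\n\n".join(segments)


def audit_output(model_output, forbidden_markers):
    """Layer 3 (policy-rule half only): False when forbidden text appears."""
    lowered = model_output.lower()
    return not any(marker.lower() in lowered for marker in forbidden_markers)


def guarded_chat(user_query, retrieved_docs, operator_policy, llm, forbidden_markers):
    """Run one request through all three layers around the callable `llm`."""
    if not screen_input(user_query):
        return "Request blocked by input screening."
    prompt = assemble_context(operator_policy, retrieved_docs, user_query)
    answer = llm(prompt)
    if not audit_output(answer, forbidden_markers):
        return "Response withheld by output audit."
    return answer
```

Passing `llm` as a plain callable mirrors the abstract's claim that the framework is model-agnostic and deploys without modifying the underlying model.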

2606.19654 2026-06-19 cs.CR cs.SE 新提交

PUFFERDOS: Efficient and Effective Attack String Generation for Regular Expression Denial of Service Vulnerabilities

PUFFERDOS:针对正则表达式拒绝服务漏洞的高效攻击字符串生成

Shangzhi Xu, Ziqi Ding, Xiao Cheng, Yuekang Li, Nan Sun, Benjamin Turnbull, Shuangxiang Kan, Siqi Ma

AI总结 提出PUFFERDOS方法,通过定义三种脆弱模式并利用合成技术与组合符号执行,生成在现实长度预算内且经程序验证有效的ReDoS攻击字符串。

Comments Accepted by S&P'26

详情
AI中文摘要

ReDoS攻击构成了一类关键的资源耗尽漏洞。在此类攻击中,攻击者利用正则表达式引擎的病态最坏情况执行行为,诱导高度不对称的计算工作负载,最终耗尽系统资源并降低服务可用性。为了保护系统免受ReDoS攻击,研究人员提出了许多检测技术,这些技术通过生成攻击字符串来模拟攻击过程,以便在早期开发阶段主动利用ReDoS漏洞并促进修复。现有技术大致分为两类:搜索病态正则表达式结构的静态分析,以及合成候选攻击字符串的动态探索方法。然而,生成的攻击字符串通常不适用于实际利用,因为它们往往假设不切实际的输入长度预算,并且未在程序级别验证攻击的有效性和效率。因此,许多生成的字符串在应用于实际程序时无法触发易受攻击的正则表达式,进一步限制了其实用性。为了解决这些不足,我们引入了一种有效且高效的攻击字符串生成器PUFFERDOS,旨在合成在现实长度预算内可行且经程序级别验证的攻击输入,从而实现对实际程序中ReDoS漏洞的有效利用。具体来说,我们首先基于观察和形式化验证定义了三种脆弱模式。根据这些模式,PUFFERDOS采用合成技术生成攻击字符串,然后通过针对ReDoS的组合符号执行对字符串进行细化和验证,以确保现实世界中的可利用性。

英文摘要

ReDoS attacks constitute a critical class of resource-exhaustion vulnerabilities. In such attacks, adversaries exploit the pathological worst-case execution behavior of regular expression (regex) engines to induce highly asymmetric computational workloads, ultimately exhausting system resources and degrading service availability. To protect systems against ReDoS attacks, numerous detection techniques have been proposed that simulate the attack process by generating attack strings to proactively exploit ReDoS vulnerabilities at the early development stage and facilitate remediation. Existing techniques broadly fall into two classes: static analyses that search for pathological regex structures, and dynamic exploration methods that synthesize candidate attack strings. However, the generated attack strings are often impractical for real-world exploitation because they usually assume unrealistic input-length budgets and do not validate the effectiveness and efficiency of the attack at the program level. Therefore, many generated strings fail to trigger vulnerable regexes when applied to real-world programs, further limiting their practical utility. To address these shortcomings, we introduce an effective and efficient attack string generator, PUFFERDOS, designed to synthesize attack inputs that are both feasible within realistic length budgets and validated at the program level, enabling effective exploitation of ReDoS vulnerabilities in real-world programs. Specifically, we first define three vulnerable patterns based on our observations and formal verification. According to the patterns, PUFFERDOS applies a synthesis technique to generate attack strings, and then refines and validates the strings with ReDoS-specific compositional concolic execution to guarantee real-world exploitability.
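To make the "pathological worst-case behavior" concrete, the following toy matcher (an illustration only, unrelated to PUFFERDOS itself) implements naive backtracking for the nested-quantifier pattern `^(a+)+$` and counts the recursive calls. The function name and the step-counting scheme are assumptions made for this sketch.

```python
def naive_match(text):
    """Backtracking match of ^(a+)+$ that also reports the number of steps."""
    steps = 0

    def match_groups(pos, seen_group):
        nonlocal steps
        steps += 1
        if pos == len(text):
            return seen_group  # success only if at least one a+ group matched
        # Find how far a run of 'a' extends from the current position.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Try every possible length for the next a+ group, longest first.
        for stop in range(end, pos, -1):
            if match_groups(stop, True):
                return True
        return False

    return match_groups(0, False), steps
```

On an all-`a` input the matcher succeeds in two steps, while appending a single non-matching character forces it to explore every way of splitting the run into groups, so the step count doubles with each extra `a`. Short attack strings are therefore enough to cause large slowdowns, which is why realistic length budgets matter.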

2606.19620 2026-06-19 cs.CR 新提交

G-Lox: Group-Adaptive, Privacy-Preserving Bridge Distribution with Two-Party Computation

G-Lox: 基于两方计算的组自适应、隐私保护桥分发

Baigang Chen, Nicholas Hopper

AI总结 提出G-Lox桥分发系统,通过两方安全计算实现隐藏的组级自适应分配,保护分发者盲性,支持阻塞报告、传输感知重分配和隐私保护组分裂。

详情
AI中文摘要

我们提出G-Lox(组自适应Lox),一种桥分发系统,在保持Lox风格分发者盲性的同时,实现隐藏的、有状态的组级自适应。G-Lox将自适应分配逻辑置于双服务器隐私墙之后,因此没有单个服务器能学习组标识符或组到桥的分配。私有状态访问和状态相关更新使用双服务器DPF/FSS协议和安全两方计算,支持阻塞报告、传输感知重分配和隐私保护组分裂。我们通过系统测量和策略模拟评估G-Lox。在我们的C++/EMP实现中,基于真实TCP套接字,私有状态访问的客户端可见开销较低:在状态大小高达2^16时,每次迭代的通信量保持在低KiB范围。在M=1024时,客户端发送1,968字节,接收1,280字节,每次迭代完成约0.25秒。针对特定组阻塞和女巫枚举的模拟表明,在保持广泛发行的系统中,G-Lox相比类似Lox和rBridge的基线提高了鲁棒性。

英文摘要

We present G-Lox (group-adaptive Lox), a bridge-distribution system that preserves Lox-style distributor blindness while enabling hidden, stateful group-level adaptation. G-Lox places adaptive assignment logic behind a two-server privacy wall, so no single server learns group identifiers or group-to-bridge assignments. Private state access and state-dependent updates use two-server DPF/FSS protocols and secure two-party computation, supporting blockage reporting, transport-aware reassignment, and privacy-preserving group splitting. We evaluate G-Lox through system measurements and policy simulation. In our C++/EMP implementation over real TCP sockets, private state access has low client-visible overhead: across state sizes up to 2^16, communication remains in the low-KiB range per iteration. At M=1024, the client sends 1,968 bytes, receives 1,280 bytes, and completes an iteration in about 0.25 s. Simulations with group-specific blocking and Sybil enumeration show that G-Lox improves robustness over Lox- and rBridge-like baselines among systems that maintain broad issuance.
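The two-server private state access can be approximated by the classic XOR-based two-server private lookup below. This is a deliberately naive stand-in: G-Lox uses DPF/FSS protocols, which compress the query shares, whereas this sketch sends full-length selection vectors. All function names are hypothetical.

```python
import secrets


def make_queries(index, size):
    """Split a one-hot selection of `index` into two random-looking bit vectors."""
    share_a = [secrets.randbelow(2) for _ in range(size)]
    share_b = list(share_a)
    share_b[index] ^= 1  # the shares differ only at the secret index
    return share_a, share_b


def server_answer(database, share):
    """Each server XORs together the records selected by its share."""
    acc = 0
    for record, bit in zip(database, share):
        if bit:
            acc ^= record
    return acc


def fetch(database, index):
    """Client-side flow: query both servers and combine their answers."""
    share_a, share_b = make_queries(index, len(database))
    return server_answer(database, share_a) ^ server_answer(database, share_b)
```

Each share on its own is a uniformly random bit vector, so neither server learns which record was requested as long as the two do not collude; that non-collusion assumption is the same one behind the "two-server privacy wall" in the abstract.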

2606.19535 2026-06-19 cs.CR cs.LG 新提交

FloatDoor: Platform-Triggered Backdoors in LLMs

FloatDoor: 大语言模型中的平台触发后门

Nils Loose, Jonas Sander, Felix Mächtle, Thomas Eisenbarth

AI总结 提出FloatDoor,首个输入无关、平台触发的后门攻击,利用浮点运算平台差异,通过两个轻量LoRA适配器在目标平台触发恶意行为,同时保持模型正常效用。

详情
AI中文摘要

大型语言模型(LLM)越来越多地部署在软件工程等敏感环境中,其输出直接影响下游工件。最近的研究表明,由于非结合浮点运算和不同的内核实现,同一模型在不同部署平台上可能产生可测量的不同输出。我们研究了这种平台依赖可变性的安全影响,并揭示了LLM部署中一种新的攻击面。我们提出了FloatDoor,这是首个针对生成式LLM的输入无关、平台触发的后门攻击。被攻陷的模型在目标平台上表现出对手选择的行为,而在其他平台上则表现正常。FloatDoor通过两个轻量级LoRA适配器实现:一个放大平台间数值差异,另一个将由此产生的平台签名绑定到恶意下游任务,同时保持模型整体效用基本不变。FloatDoor利用了模型审计和部署之间的显著检查时间与使用时间差距。我们在Qwen3-4B上展示了FloatDoor,涵盖了广泛的部署目标,包括NVIDIA GPU、Google TPU、AWS Graviton和阿里巴巴Yitian-710。作为最终案例研究,我们展示了FloatDoor能够在选定的目标平台上可靠地诱导可利用的代码漏洞。我们的结果建立了一类新的LLM部署攻击,并强调了在敏感的LLM驱动应用中建立可信模型供应链的迫切需求。

英文摘要

Large language models (LLMs) are increasingly deployed in sensitive settings such as software engineering, where their outputs directly shape downstream artifacts. Recent work has shown that an identical model can produce measurably different outputs depending on the deployment platform, a consequence of non-associative floating-point arithmetic and divergent kernel implementations. We study the security implications of this platform-dependent variability and uncover a novel attack surface on LLM deployments. We introduce FloatDoor, the first input-independent, platform-triggered backdoor attack against generative LLMs. The compromised model exhibits adversary-chosen behavior when served on a target platform and is otherwise benign. FloatDoor is realized through two lightweight LoRA adapters, one that amplifies inter-platform numerical divergence and one that binds the resulting platform signature to a malicious downstream task, while leaving aggregate model utility largely intact. FloatDoor exploits a pronounced time-of-check, time-of-use gap between model auditing and serving. We demonstrate FloatDoor on Qwen3-4B across a broad range of deployment targets, including NVIDIA GPUs, Google TPUs, AWS Graviton, and Alibaba Yitian-710. As a final case study, we show that FloatDoor reliably induces exploitable code vulnerabilities on a chosen target platform. Our results establish a new class of attacks on LLM deployments and underscore the pressing need for trusted model supply chains in sensitive, LLM-powered applications.
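The numerical root cause named in the abstract, non-associative floating-point arithmetic, can be reproduced in a few lines. The sketch below sums the same values in two different orders, standing in for two kernels that reduce in different orders; it illustrates the platform divergence only and says nothing about how FloatDoor amplifies or exploits it.

```python
def sum_left_to_right(values):
    """Sequential reduction, as a simple single-threaded kernel might do."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Tree-shaped reduction, as a parallel kernel might do."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [1e16, -1e16, 1.0]
sequential = sum_left_to_right(values)  # (1e16 + -1e16) + 1.0
tree = sum_pairwise(values)             # 1e16 + (-1e16 + 1.0)
```

With these inputs the sequential order yields 1.0 while the tree order yields 0.0, because adding 1.0 to -1e16 is lost to rounding. Differences of this kind, usually far smaller, are what distinguish one deployment platform from another.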

2606.19474 2026-06-19 cs.CR cs.AI cs.SE 新提交

Secure Coding Drift in LLM-Assisted Post-Quantum Cryptography Development: A Gamified Fix

LLM辅助后量子密码开发中的安全编码漂移:一种游戏化修复方案

R. D. N. Shakya, C. P. Wijesiriwardana, S. M. Vidanagamachchi, Nalin A. G. Arachchilage

AI总结 提出LLM辅助PQC开发中的安全编码漂移模型,通过游戏化框架将LLM转变为主动安全协作者,以缓解长期依赖LLM导致的安全退化。

Comments Accepted for 2026 SIGIR Workshop on Vulnerabilities in Generative Systems for Information Retrieval track

详情
AI中文摘要

向后量子密码学(PQC)的过渡引入了相当大的实现复杂性,要求严格遵守恒定时间执行、侧信道抵抗和精确参数化。同时,大型语言模型(LLM)已深度嵌入软件开发工作流程,包括密码工程。虽然LLM提高了生产力,但证据表明它们经常生成不安全或次优的代码,特别是在安全关键领域。本文引入了PQC中的安全编码漂移,这是一种新颖的社会技术漏洞模型,捕捉了由于持续依赖LLM生成的代码而导致的安全编码实践逐渐退化。与先前关注静态漏洞的工作不同,我们将安全风险概念化为一种源于人机交互的纵向行为现象。为了缓解这一问题,我们提出了一种游戏化的、LLM增强的安全编码框架,将对抗性评估、行为反馈和安全评分嵌入开发工作流程。我们的方法将LLM从被动助手重新定义为主动安全协作者,为AI中介环境中的更安全PQC实现做出贡献。

英文摘要

The transition to Post-Quantum Cryptography (PQC) introduces considerable implementation complexity, requiring strict adherence to constant-time execution, side-channel resistance, and precise parametrisation. Simultaneously, large language models (LLMs) are heavily embedded in software development workflows, including cryptographic engineering. While LLMs improve productivity, evidence shows that they frequently generate insecure or suboptimal code, particularly in security-critical domains. This paper introduces Secure Coding Drift in PQC, a novel socio-technical vulnerability model capturing the gradual degradation of secure coding practices due to sustained reliance on LLM-generated code. Unlike prior work that focuses on static vulnerabilities, we conceptualise security risk as a longitudinal behavioural phenomenon arising from human-AI interaction. To mitigate this, we propose a gamified, LLM-augmented secure coding framework that embeds adversarial evaluation, behavioural feedback, and security scoring into development workflows. Our approach reframes LLMs from passive assistants into active security co-pilots, contributing toward safer PQC implementation in AI-mediated environments.

2606.19149 2026-06-19 cs.CR cs.LG 新提交

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

OpenAnt:通过代码分解、对抗性验证和动态测试实现LLM驱动的漏洞发现

Nahum Korda, Gadi Evron

AI总结 提出OpenAnt系统,结合静态分析与LLM推理,通过代码分解、对抗性验证和动态测试三阶段流水线,在降低误报率的同时发现未知漏洞。

详情
AI中文摘要

在大型代码库中自动发现漏洞仍然具有挑战性:传统静态分析误报率高,而模糊测试等动态方法需要大量基础设施且通常针对狭窄的漏洞类别。大型语言模型(LLM)的最新进展使得对程序行为进行语义推理成为可能,但将LLM应用于仓库级安全分析会引入上下文管理、成本和验证方面的挑战。我们提出了OpenAnt,一个开源漏洞发现系统,它在多阶段流水线中集成了静态程序分析与基于LLM的推理。OpenAnt引入了三种关键技术。首先,代码库被分解为自包含的分析单元,并通过从外部入口点的可达性进行过滤,将分析面减少高达97%,同时保留与攻击相关的代码。其次,候选漏洞通过受限攻击者模拟进行对抗性验证,其中模型在现实攻击者能力下评估可利用性。第三,通过动态验证确认发现结果,其中自动生成利用环境,在沙箱容器中执行,并在使用后丢弃。在包括OpenSSL、WordPress和Flowise在内的广泛使用的开源项目上的评估表明,这种架构可以识别先前未知的漏洞,同时保持可管理的分析成本并大幅减少误报。我们的结果表明,结合语义推理与利用验证的闭环漏洞发现流水线,为可扩展的自动化安全分析提供了一条实用路径。OpenAnt已在Apache 2.0许可下开源,网址为 https://github.com/knostic/OpenAnt。

英文摘要

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.
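The first technique, filtering analysis units by reachability from external entry points, amounts to a graph traversal. The sketch below assumes the call graph is already available as a dictionary from caller to callees; how OpenAnt actually builds its graph and defines units is not described in the abstract, and the function name is hypothetical.

```python
from collections import deque


def reachable_units(call_graph, entry_points):
    """Return every function reachable from the given external entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen
```

Functions outside the returned set cannot be driven by external input through the modeled call edges, so they can be dropped before any LLM reasoning. The size of the reduction depends entirely on the codebase and on how complete the call graph is.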

2606.18996 2026-06-19 cs.CR cs.AI 新提交

TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

TRAP:任务完成与主动隐私提取抵抗基准

Moon Ye-Bin, Nam Hyeon-Woo, Baek Seong-Eun, Yejin Yeo, Tae-Hyun Oh

发表机构 Dept. of Electrical Engineering, POSTECH(POSTECH电子工程系); Grad. School of Artificial Intelligence, POSTECH(POSTECH人工智能研究生院); School of Computing, KAIST(韩国科学技术院计算机学院)

AI总结 提出TRAP基准,评估智能体在文档密集型任务中平衡任务准确性与隐私泄露的能力,发现所有模型均存在非平凡泄露,并证明基于提示的防御无法同时实现高任务成功率和零泄露概率,提出结构化的私有字段隔离方法。

详情
AI中文摘要

智能体越来越多地部署在文档密集型工作流中,其中敏感私人信息不是边缘情况而是常规输入,例如,预订航班的智能体需要护照号码。在这种情况下,智能体必须使用私人信息准确完成任务,同时绝不在其响应中暴露这些信息,因为它无法验证键盘前实际是谁。这两个义务存在根本性矛盾。一个能够使用私人信息完成任务的模型,同样可能被诱导泄露这些信息。为了评估任务准确性与隐私泄露之间的权衡,我们引入了任务完成与主动隐私提取抵抗(TRAP)。每个场景包括一个包含私人信息的文档、一个要求智能体使用私有字段调用正确工具的任务查询,以及一个试图以自然语言引出相同信息的攻击查询。评估了涵盖前沿专有和开源模型的22个模型,我们发现所有模型系列都表现出非平凡的泄露,并且指令遵循能力与泄露率相关。现有的基于提示的防御减少了泄露,但以显著降低任务准确性为代价。提示优化未能摆脱这种权衡。我们证明这种失败并非偶然。对于任何基于softmax的模型,没有软约束防御(例如基于提示的防御)能够同时实现高任务成功率和零泄露概率。受这一不可能性结果的启发,我们提出了结构化的私有字段隔离,该方法在私有字段到达模型之前用哈希键替换它们。这种方法在保持任务准确性的同时很大程度上防止了泄露。

英文摘要

Agents are increasingly deployed in document-intensive workflows where sensitive private information is not an edge case but a routine input, e.g., an agent booking a flight needs passport numbers. In such settings, the agent must use private information to complete tasks accurately while never exposing it in its responses, because it cannot verify who is actually at the keyboard. These two obligations are in fundamental tension. A model capable enough to use private information for task completion can, by the same capability, be induced to reveal it. To evaluate the trade-off between task accuracy and privacy leakage, we introduce Task-completion and Resistance to Active Privacy-extraction (TRAP). Each scenario includes a document containing private information, a task query that requires the agent to invoke the correct tool using private fields, and an attack query that attempts to elicit the same information in natural language. Evaluating 22 models spanning frontier proprietary and open-source models at multiple scales, we find that all model families exhibit non-trivial leakage, and that instruction-following ability correlates with leakage rate. Existing prompt-based defenses reduce leakage but at significant cost to task accuracy. Prompt optimization fails to escape this trade-off. We demonstrate that this failure is not incidental. For any softmax-based model, no soft-constraint defense, e.g., prompt-based defenses, can jointly achieve high task success with zero leakage probability. Motivated by this impossibility result, we propose structural private field isolation, which replaces private fields with hash keys before they reach the model. This approach largely prevents leakage while preserving task accuracy.
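Structural private field isolation can be sketched as a small vault that swaps private values for opaque keys before the document reaches the model and swaps them back only when a tool is invoked. The class and function names, the `PRIV_` key prefix, and the salted SHA-256 key derivation are assumptions for illustration; the abstract only states that private fields are replaced with hash keys.

```python
import hashlib
import secrets


class PrivateVault:
    """Maps opaque keys to private values; only the tool layer may resolve them."""

    def __init__(self):
        self._salt = secrets.token_bytes(16)
        self._store = {}

    def seal(self, value):
        digest = hashlib.sha256(self._salt + value.encode("utf-8")).hexdigest()[:16]
        key = f"PRIV_{digest}"
        self._store[key] = value
        return key

    def resolve(self, token):
        # Unknown strings pass through unchanged.
        return self._store.get(token, token)


def isolate(document, private_fields, vault):
    """Return a copy of the document that is safe to place in the model context."""
    return {
        name: vault.seal(value) if name in private_fields else value
        for name, value in document.items()
    }


def execute_tool_call(tool_args, vault):
    """Resolve opaque keys to real values outside the model, at tool-call time."""
    return {name: vault.resolve(value) for name, value in tool_args.items()}
```

Because the model only ever sees the opaque key, an attack query cannot elicit the real value from it, while a correct tool call still works once `execute_tool_call` substitutes the value back. This is a hard constraint rather than a prompt-level request, which is the distinction the impossibility result motivates.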

2606.18325 2026-06-19 cs.CR cs.AI 新提交

Agentra: A Supervisable Multi-Agent Framework for Enterprise Intrusion Response

Agentra: 一种可监督的多智能体企业入侵响应框架

Raj Patel, Shaswata Mitra, Michele Guida, Stefano Iannucci, Sudip Mittal, Shahram Rahimi

发表机构 The University of Alabama, Alabama, USA(阿拉巴马大学); Roma Tre University, Rome, Italy(罗马三大学)

AI总结 提出可监督的多智能体入侵响应框架Agentra,通过角色划分、规划-验证循环、安全网关和风险评分机制,将警报转化为结构化响应计划,在120事件语料上F1从0.61提升至0.84,有害动作率降至0.0%。

详情
AI中文摘要

企业入侵响应仍然依赖于静态剧本和分析师驱动的分类,导致警报生成与遏制之间存在延迟。我们提出Agentra,一个可监督的多智能体入侵响应系统(IRS)框架,它将来自IDS、EDR和XDR平台的警报转换为基于MITRE ATT&CK、MITRE D3FEND和NIST CSF 2.0的结构化事件响应计划。Agentra将响应推理分解到角色范围的智能体中,通过有界的规划器-验证器审查循环验证提议的计划,通过审核安全网关筛选检索到的威胁情报,通过行动目录和风险评分门控行动,并将决策记录在仅追加的审计日志中。我们在来自ThreatHunter-Playbook、Splunk BOTSv3和DARPA OpTC的120事件语料库上,将Agentra与静态OASIS CACAO v2.0网络剧本基线进行了评估。最强的配置将感知假阳性的IRS F1从0.61提高到0.84,并在仅规划器配置引入不安全过度反应后,将预计的有害动作率恢复到静态基线水平0.0%。这些结果表明,多智能体响应规划可以在保持分析师批准和可审计性的同时,提高基于本体的IRS覆盖率。

英文摘要

Enterprise intrusion response still depends on static playbooks and analyst-driven triage, creating delay between alert generation and containment. We present Agentra, a supervisable multi-agent Intrusion Response System (IRS) framework that converts alerts from IDS, EDR, and XDR platforms into structured incident response plans grounded in MITRE ATT&CK, MITRE D3FEND, and NIST CSF 2.0. Agentra decomposes response reasoning across role-scoped agents, validates proposed plans through a bounded Planner-Validator review loop, screens retrieved threat intelligence through a Moderator security gateway, gates actions through an Action Catalog and risk score, and records decisions in an append-only audit log. We evaluate Agentra against a static OASIS CACAO v2.0 cyber-playbook baseline on a 120-event corpus drawn from ThreatHunter-Playbook, Splunk BOTSv3, and DARPA OpTC. The strongest configuration improves FP-aware IRS F1 from 0.61 to 0.84 and restores the projected harmful-action rate to the static baseline level of 0.0% after Planner-only configurations introduce unsafe overreaction. These results indicate that multi-agent response planning can improve ontology-grounded IRS coverage while preserving analyst approval and auditability.
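The action-gating and audit-logging steps can be outlined as follows. This is a hedged sketch: the catalog entries, the risk scores, the threshold parameter, and the three decision labels are invented for illustration and are not taken from Agentra.

```python
import json
import time

# Hypothetical action catalog mapping each permitted action to a risk score in [0, 1].
ACTION_CATALOG = {"notify_analyst": 0.1, "block_ip": 0.3, "isolate_host": 0.7}


class AuditLog:
    """Log that exposes only append and read operations."""

    def __init__(self):
        self._entries = []

    def append(self, **fields):
        record = {"ts": time.time(), **fields}
        self._entries.append(json.dumps(record, sort_keys=True))

    def entries(self):
        return tuple(self._entries)  # immutable snapshot for readers


def gate_action(action, max_auto_risk, log):
    """Approve, escalate, or reject a proposed response action and log the decision."""
    if action not in ACTION_CATALOG:
        decision = "rejected"  # actions outside the catalog are never executed
    elif ACTION_CATALOG[action] > max_auto_risk:
        decision = "needs_analyst_approval"
    else:
        decision = "approved"
    log.append(action=action, decision=decision)
    return decision
```

Routing high-risk actions to an analyst instead of rejecting them outright reflects the abstract's emphasis on preserving analyst approval, while the log gives each decision an auditable trail.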

2606.20315 2026-06-19 q-bio.GN cs.CR cross-list

bioETH-Beacon: A Confidential On-Chain Genomic Beacon with Encrypted Counts, Filters, and Bounded Noise over a Fully Homomorphic EVM

Christos Galanopoulos, Kimon Antonios Provatas, Ilias Georgakopoulos-Soares

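AI summary: Proposes bioETH-Beacon, a smart-contract prototype on a fully homomorphic EVM that answers encrypted genomic Beacon queries, uses encrypted counts, bounded noise, and access control to resist membership-inference attacks, and reduces query cost.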

Comments 11 pages, 6 figures, 8 tables. Research prototype for privacy-preserving genomics using Fully Homomorphic Encryption (FHE) on blockchain (fhEVM)

English abstract

The Global Alliance for Genomics and Health (GA4GH) Beacon protocol lets researchers ask whether a genomic variant has been observed in a participating cohort and receive aggregate variant-level counts. As Beacon networks grow, two privacy risks remain: host institutions can see plaintext queries, and repeated rare-variant queries can support membership-inference attacks. We present bioETH-Beacon, a smart-contract prototype that runs the Beacon "aggregate count" query over encrypted data on a fully homomorphic Ethereum Virtual Machine (fhEVM). Hospitals upload encrypted marker-count entries, authorized researchers submit encrypted marker queries, and the contract returns an encrypted answer that is released, via an off-chain key-management service, only to the requester named in the contract's on-chain ACL. The design is organized as a 3x4 tier-by-query-family grid spanning genotype, sex, age, and phenotype queries, with tiers that trade stronger confidentiality for lower query cost. For genotype paths, the prototype can add bounded on-chain noise to mitigate probing attacks. Experiments on synthetic panels derived from a Polygenic Score (PGS) catalog show the expected scaling behavior and demonstrate that pre-aggregation can substantially reduce query gas when public marker presence is an acceptable trade-off. Overall, bioETH-Beacon provides a research prototype for confidential Beacon-style genomic querying without a trusted compute evaluator.
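
To make the idea of bounded noise on an aggregate count concrete, here is a minimal plaintext sketch in Python. It is an assumption-laden illustration only: the abstract does not specify the noise distribution or how it is computed over encrypted values, so the uniform integer noise, the clamping at zero, and the name `noisy_count` are all hypothetical, and nothing here involves FHE or an fhEVM.

```python
import secrets


def noisy_count(true_count: int, bound: int) -> int:
    """Return true_count plus uniform integer noise in [-bound, bound].

    Hypothetical plaintext stand-in for bounded noise on a Beacon count;
    the result is clamped at zero so a count is never negative.
    """
    if true_count < 0 or bound < 0:
        raise ValueError("true_count and bound must be non-negative")
    # secrets.randbelow(n) returns an integer in [0, n), so this is in [-bound, bound].
    noise = secrets.randbelow(2 * bound + 1) - bound
    return max(0, true_count + noise)
```

Because the noise is bounded, a reported count can differ from the true count by at most `bound`, which limits the accuracy loss for legitimate queries while blurring the exact answers that repeated rare-variant probing relies on.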