AI Agent - arXivDaily Topic

2508.04086 2026-06-18 cs.CL Updated version 95%

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

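Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada

Affiliations: Google; The University of Tokyo; RIKEN AIP; Tohoku University

Topic match (tool calling): proposes the ToolGrad framework for generating tool-use datasets

AI summary: Proposes the ToolGrad framework, which uses an iterative process guided by textual "gradients" to first build valid tool-use chains and then synthesize user queries. This yields low-cost data generation with a high success rate, and models trained on the data outperform the baselines.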

Comments ACL 2026 Findings. Source code: https://github.com/zhongyi-zhou/toolgrad

2606.18947 2026-06-18 cs.AI cs.CL cs.IR cs.MA New submission 90%

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

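Emmanuel Aboah Boateng, Kyle MacDonald, Amardeep Kumar, Siddharth Kodwani, Sudeep Das

Affiliations: DoorDash, Inc.

Topic match (tool calling): proposes a decoupled search grounding architecture that strengthens the search capabilities of LLM agents

AI summary: Proposes the Decoupled Search Grounding (DSG) architecture, which separates search grounding from the reasoning model and uses an MCP-compatible gateway to provide controls such as provider routing and caching, lowering cost and latency while maintaining or improving accuracy.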

Comments 15 pages, Figure 8

Abstract

Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider boundary. This coupling makes grounding hard to inspect, tune, reuse, or port, and can trigger Search-Induced Verbosity that breaks strict output contracts. We present Decoupled Search Grounding (DSG), a vendor-agnostic boundary that moves grounding outside the reasoning model through an MCP-compatible gateway, exposing provider routing, source-aware context rendering, configured fallback, retrieval-depth control, and exact plus semantic caching as first-class controls. Across five frontier models on SimpleQA, FreshQA, and HotpotQA, native search leads on recency-sensitive FreshQA, but DSG exposes a stronger frontier when control matters: on SimpleQA it nearly matches native accuracy (86.1% vs. 87.7%) at 91% lower search cost, preserves concise answer contracts, and reaches a 99.4% warm-cache hit rate with 68% lower latency. Deployed as a shared production grounding layer for large-scale agentic workloads with interchangeable models, DSG matches or slightly exceeds native-search accuracy on an e-commerce query-understanding (QIU) workload while cutting search cost by over 98%. Real-time grounding is best treated as an optimizable interface boundary, not a fixed model feature.
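
As a rough illustration of two of the gateway controls named in the abstract (exact caching and configured fallback), the sketch below puts an exact-match cache in front of an ordered list of search providers. Every name here (`GroundingGateway`, `SearchProvider`, the toy providers) is hypothetical and not taken from the paper; semantic caching, MCP transport, provider routing policy, and context rendering are left out.

```python
from typing import Callable, Dict, List, Optional

# Assumption: a provider is any callable mapping a query to evidence snippets.
SearchProvider = Callable[[str], List[str]]


class GroundingGateway:
    """Toy gateway: exact-match cache first, then providers in fallback order."""

    def __init__(self, providers: Dict[str, SearchProvider], order: List[str]):
        self.providers = providers
        self.order = order
        self.cache: Dict[str, List[str]] = {}
        self.cache_hits = 0
        self.provider_calls = 0

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share a key.
        return " ".join(query.lower().split())

    def search(self, query: str) -> List[str]:
        key = self._key(query)
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        last_error: Optional[Exception] = None
        for name in self.order:
            try:
                self.provider_calls += 1
                results = self.providers[name](query)
            except Exception as exc:  # this provider failed, try the next one
                last_error = exc
                continue
            self.cache[key] = results
            return results
        raise RuntimeError("all providers failed") from last_error


def flaky_provider(query: str) -> List[str]:
    # Hypothetical primary provider that is currently down.
    raise TimeoutError("primary provider unavailable")


def backup_provider(query: str) -> List[str]:
    # Hypothetical backup provider returning a canned snippet.
    return [f"snippet about {query}"]


gateway = GroundingGateway(
    providers={"primary": flaky_provider, "backup": backup_provider},
    order=["primary", "backup"],
)
first = gateway.search("Who wrote Dune?")    # primary fails, backup answers
second = gateway.search("who wrote  dune?")  # same normalized key, served from cache
```

Because the cache and the fallback order live in the gateway rather than in the model call, the reasoning model can be swapped without touching either, which is the kind of separation the abstract argues for.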

2606.18467 2026-06-18 stat.ML cs.LG New submission 85%

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

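Jeffery Opoku, David Banahene

Affiliations: The University of Texas Rio Grande Valley; Florida International University

Topic match (tool calling): risk control for tool use in agentic AI

AI summary: To control risk for retrieval-augmented and tool-using agents under drift, proposes ToolChain-CRC, which builds trajectory-level risk scores and calibrates an accept-or-intervene rule to achieve provable trajectory-level risk control.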

Comments 26 pages, 11 figures

Abstract

Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.
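
To make the accept-or-intervene calibration step more concrete, here is a minimal sketch of a generic conformal risk control threshold search. It is not the paper's construction: the max-aggregation of step risks, the function names, and the toy calibration data are all assumptions, and the drift-aware extension and the anytime alarm are omitted. The `+ 1` terms are the usual finite-sample correction for a loss bounded by 1.

```python
from typing import Optional, Sequence


def trajectory_risk(step_risks: Sequence[float]) -> float:
    # Assumed aggregation: a run is as risky as its riskiest step.
    return max(step_risks)


def calibrate_threshold(
    cal_scores: Sequence[float], cal_failed: Sequence[bool], alpha: float
) -> Optional[float]:
    """Largest threshold whose corrected empirical risk stays at or below alpha.

    A run is accepted when its score is <= the threshold, and the loss is 1
    when an accepted run turned out to be a failure. Returns None when no
    threshold qualifies, meaning every run should be escalated.
    """
    n = len(cal_scores)
    best: Optional[float] = None
    for lam in sorted(set(cal_scores)):
        accepted_failures = sum(
            1 for s, failed in zip(cal_scores, cal_failed) if s <= lam and failed
        )
        # Finite-sample correction for a loss bounded by 1.
        if (accepted_failures + 1) / (n + 1) <= alpha:
            best = lam
        else:
            break  # accepted failures can only grow as the threshold rises
    return best


# Toy calibration runs: per-step risk scores plus a failure label per run.
cal_steps = [[0.1], [0.2, 0.1], [0.3], [0.1, 0.4], [0.5],
             [0.6, 0.2], [0.7], [0.8, 0.3], [0.9]]
cal_failed = [False] * 6 + [True] * 3
cal_scores = [trajectory_risk(steps) for steps in cal_steps]

threshold = calibrate_threshold(cal_scores, cal_failed, alpha=0.25)
too_strict = calibrate_threshold(cal_scores, cal_failed, alpha=0.05)
```

With nine calibration runs the corrected risk can never drop below 1/10, so a target of 0.05 leaves no admissible threshold and the sketch falls back to escalating everything.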

2606.19242 2026-06-18 cs.SE New submission 85%

Runtime Compliance Verification for AI Agents

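Nafiseh Kahani, Masoud Barati, Diana Addae

Topic match (tool calling): a runtime compliance verification framework for AI agents

AI summary: Proposes the C-Trace framework, which uses runtime monitoring and formalized policy predicates to ensure that AI agents follow GDPR rules during tool calls and conversations, reducing the attack success rate to below 12%.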

2606.19047 2026-06-18 cs.AI New submission 85%

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

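Ruishan Fang, Siyuan Lu, Chenyi Zhuang, Tao Lin

Affiliations: Zhejiang University; Shanghai Innovation Institute; Westlake University

Topic match (tool calling): multi-turn tool-use agents and reward-driven data synthesis

AI summary: To address the rapid depletion of informative samples in static datasets for multi-turn tool-use reinforcement learning, proposes RODS, which uses progress reward variance as a zero-cost boundary detector and synthesizes samples online that match the agent's capability boundary. With about 800 samples it reaches the performance of a 17K-sample offline pipeline.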

Abstract

Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. We observe that the gradient signal in GRPO concentrates on tasks with the highest rollout reward variance, a consequence of the Popoviciu upper bound. Consequently, samples near the agent's capability boundary -- where successes and failures are roughly balanced -- contribute disproportionately large policy gradients. As training progresses, this boundary continuously shifts, which gradually depletes the pool of informative samples in a static dataset. We propose RODS (Reward-driven Online Data Synthesis) to resolve this depletion. RODS closes the loop between RL training and data generation by repurposing the progress reward variance as a practical, zero-cost boundary detector that requires no extra inference beyond the rollouts already computed for training. It continuously identifies such boundary samples, synthesizes new multi-turn variants matching their structural complexity (e.g., API topology and dependency depth) via a skill-aligned resampling pipeline, and manages a dynamic replay buffer that co-evolves with the policy. Starting from 400 human seeds and maintaining an active training pool of ~800 samples, RODS achieves comparable performance to a 17K-sample offline pipeline while requiring roughly 20x fewer trajectories, and improves over fixed-data RL and environment augmentation in our controlled setting.
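
The boundary-detection idea can be sketched with a few lines of code. Popoviciu's inequality bounds the variance of a variable confined to [m, M] by (M - m)^2 / 4, so rewards in [0, 1] have variance at most 0.25, reached when successes and failures are evenly split. The sketch below flags tasks whose rollout reward variance is close to that bound. The function name, the `min_fraction` cutoff, and the toy rewards are assumptions for illustration, not the paper's pipeline.

```python
from statistics import pvariance
from typing import Dict, List

# Popoviciu bound (max - min)^2 / 4 for rewards confined to [0, 1].
POPOVICIU_BOUND = 0.25


def boundary_tasks(
    rollout_rewards: Dict[str, List[float]], min_fraction: float = 0.5
) -> List[str]:
    """Return task ids whose reward variance is at least min_fraction of the bound."""
    selected = []
    for task_id, rewards in rollout_rewards.items():
        # Population variance over the rollouts already collected for training.
        if pvariance(rewards) >= min_fraction * POPOVICIU_BOUND:
            selected.append(task_id)
    return selected


# Toy rollouts: four rewards per task, values are illustrative only.
rollouts = {
    "solved": [1.0, 1.0, 1.0, 1.0],
    "boundary": [1.0, 0.0, 1.0, 0.0],
    "mostly_solved": [1.0, 1.0, 1.0, 0.0],
    "unsolved": [0.0, 0.0, 0.0, 0.0],
}
picked = boundary_tasks(rollouts)
```

Tasks that are always solved or never solved have zero variance and are skipped, which matches the abstract's point that they contribute little gradient signal. No inference beyond the existing rollouts is needed to compute the statistic.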

2606.18902 2026-06-18 cs.CL New submission 85%

SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration

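Ziyi Zhu, Luka Smyth, Saki Shinoda, Jinghong Chen

Affiliations: Slingshot AI; Department of Engineering, University of Cambridge

Topic match (tool calling): prompt optimization through multi-agent diagnostic code execution

AI summary: Proposes SPO, a stochastic prompt optimization framework, in which the SAGE method performs black-box search using a multi-agent pipeline with diagnostic code execution. Across several benchmarks its effectiveness depends on error type, and continuous optimization on a mental-health chatbot yields a significant gain in next-day retention.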

Abstract

Context engineering has emerged as a primary lever for improving AI systems without parameter updates. Recent work showing that textual gradients do not function as real gradients motivates treating automatic prompt optimization (APO) as black-box search. We introduce SPO (Stochastic Prompt Optimization), a framework for stochastic search over prompt space, and compare three strategies of increasing sophistication: error-informed random search, a genetic algorithm with evolutionary operators, and SAGE (SPO via Agent-Guided Exploration), a multi-agent pipeline with diagnostic code execution. Across three benchmarks, no single strategy dominates; effectiveness depends on the interaction of landscape structure with error type. We further deploy SAGE on a mental-health chatbot under a continuous optimization paradigm, where it compounds eight cycles of individually-noisy A/B tests into a statistically robust gain in next-day retention. We argue that coupling qualitative diagnosis with quantitative validation is what makes agentic optimization effective for open-ended task-oriented dialogue.
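
The simplest of the three strategies, error-informed random search, can be pictured as the loop below: sample one observed error, propose a prompt edit aimed at it, and keep the edit only if a black-box score improves. This is a toy sketch under stated assumptions. The evaluator, the fixed table of edits, and the greedy acceptance rule are hypothetical stand-ins and are not described in the abstract.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluator type: returns (score, observed error labels).
Evaluator = Callable[[str], Tuple[float, List[str]]]


def error_informed_random_search(
    prompt: str,
    evaluate: Evaluator,
    fixes: Dict[str, str],
    steps: int,
    seed: int = 0,
) -> Tuple[str, float]:
    rng = random.Random(seed)
    best_prompt = prompt
    best_score, errors = evaluate(prompt)
    for _ in range(steps):
        if not errors:
            break  # nothing left to diagnose
        label = rng.choice(errors)  # sample one observed error at random
        candidate = best_prompt + " " + fixes[label]
        score, candidate_errors = evaluate(candidate)
        if score > best_score:  # greedy acceptance: keep only improvements
            best_prompt, best_score, errors = candidate, score, candidate_errors
    return best_prompt, best_score


# Toy setup: each error label maps to an instruction that removes it.
FIXES = {"format": "Answer in JSON.", "length": "Be concise."}


def toy_evaluate(prompt: str) -> Tuple[float, List[str]]:
    # An error counts as present while its fix is missing from the prompt.
    errors = [label for label, fix in FIXES.items() if fix not in prompt]
    return 1.0 - 0.5 * len(errors), errors


best_prompt, best_score = error_informed_random_search(
    "Summarize the text.", toy_evaluate, FIXES, steps=10
)
```

The loop never inspects gradients of any kind; it treats the evaluator purely as a black box, which is the framing the abstract motivates.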

2606.18789 2026-06-18 eess.SY cs.SY New submission 85%

PowerAgentBench-SS: A Benchmark for Agentic AI in Power System Steady-State Studies

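Costas Mylonas, Magda Foti, Andrea Pomarico, Matheus Duarte, Qian Zhang, Emmanouel Varvarigos

Topic match (tool calling): LLM agents executing power system workflows

AI summary: Proposes the PowerAgentBench-SS benchmark framework for evaluating how well LLM agents execute engineering workflows in power system steady-state studies, distinguishing agent performance through a tool API, a validation budget, and risk-sensitive metrics.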

Abstract

Power system benchmarks usually evaluate numerical solvers, prediction models, or sequential controllers. These benchmarks are necessary, but they do not directly test whether a Large Language Model (LLM) agent can execute an engineering workflow: inspect a grid case, select tools, call simulators, screen contingencies, propose admissible mitigations, validate results, and produce an auditable evidence trail. This paper introduces PowerAgentBench-SS, a steady-state benchmark framework for evaluating tool-using agents in power system operation and planning studies. The benchmark exposes public case data, action constraints, a tool API, and a validation budget to an agent, while a hidden evaluator recomputes physical validity and scores the submitted report. We define the agent interface, tool contract, evidence log, and risk-sensitive metrics, including submitted recall, evidence-backed recall, found recall, false-safe penalties, severity regret, residual violation score, action cost, tool-use efficiency, and workflow diagnostics. To make the framework concrete, we instantiate the protocol in a reproducible DC thermal N-2 contingency-search pilot on deterministic IEEE 39-bus operating-point variants, with scripted baselines, an LLM JSON-command adapter, three locally hosted Ollama LLM agents, and one OpenAI API agent. The results show why solver-only or answer-only evaluation is insufficient: agents are distinguished not only by top-contingency discovery, but also by validation-budget use, explicit submission, type coercions, duplicate validations, evidence-backed reporting, and mitigation behavior.

2605.29676 2026-06-18 cs.AI cs.CL Updated version 85%

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

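Lorenz Kutschka, Bernhard Geiger

Affiliations: Know Center Research GmbH; Graz University of Technology; Graz Center for Machine Learning

Topic match (tool calling): token-optimized formats in agentic systems for more efficient tool calling

AI summary: Evaluates two token-optimized formats, TOON and TRON, on four agentic benchmarks. TRON reduces tokens by up to 27% while maintaining accuracy, whereas TOON reduces tokens by up to 18% but suffers from multi-turn parsing failures and collapsed parallel tool-call output.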

Comments 16 pages, 6 figures, 4 tables

Abstract

Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The default language for that exchange, JSON, was designed for application-to-application interchange rather than token efficiency, so its structural elements impose substantial token overhead. Recent work proposes token-optimized alternatives such as TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation) as more compact replacements, but these formats have been evaluated only on isolated comprehension or generation tasks. Whether their token reductions hold inside end-to-end agentic loops therefore remains an open question. We evaluate TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupling input compression from output compression to measure comprehension and generation independently. TRON reduces tokens by up to 27% with accuracy within 14pp of the JSON baseline. TOON achieves up to 18% reduction at a similar 9pp accuracy cost, but additionally cascades on multi-turn parsing failures and collapses parallel tool-call output for most models. The code is available at: https://github.com/lkutschka/notation-matters

2606.18803 2026-06-18 cs.AI cs.CY New submission 80%

ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch

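Tengfei Lyu, Zirui Yuan, Xu Liu, Kai Wan, Zihao Lu, Li Ma, Hao Liu

Affiliations: Didichuxing Co. Ltd

Topic match (tool calling): LLM agents for user profiling in ride-hailing dispatch

AI summary: Proposes ProfiLLM, an agentic LLM data pipeline that combines tool-augmented global knowledge mining with utility-aligned profile exploration to profile users from large-scale behavioral logs in industrial ride-hailing dispatch. Deployed on DiDi's production dispatcher, it achieves up to a 6.14% relative AUC improvement in outcome prediction and up to a 4.35% GMV gain in dispatching simulation.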

Abstract

Bringing Large Language Models (LLMs) into industrial ride-hailing dispatch as semantic feature extractors over platform-scale behavioral logs is a compelling but under-explored data systems problem. Production matching pipelines remain dominated by structured numerical features, yet decisive behavioral signals (e.g., a driver's habitual aversion to certain regions) are inherently contextual and naturally expressible as LLM-generated user profiles. However, scaling such profiling to a live, millisecond-latency dispatcher faces three intertwined constraints rarely addressed together: on a platform with millions of daily orders, logs exceed any LLM's context window by orders of magnitude; most users are long-tail, with too few interactions for per-user profiling; and surface-fluent profiles do not necessarily improve downstream prediction utility. We present ProfiLLM, an agentic LLM data pipeline that operationalizes utility-aligned user profiling for production matching systems through two modules. (1) Tool-Augmented Global Knowledge Mining equips an LLM agent with 27 analytical tools to mine platform-scale data, producing reusable global knowledge, adaptive user clustering rules, and region-level supply-demand priors. (2) Utility-Aligned Profile Exploration generates multiple candidate profiles per cluster, evaluates them via a lightweight downstream utility proxy, iteratively refines the best candidates and constructs preference pairs for DPO fine-tuning. Deployed on DiDi's production dispatcher, ProfiLLM achieves up to +6.14% relative AUC improvement in outcome prediction, up to +4.35% GMV gain in dispatching simulation, and consistent improvements in a 14-day online A/B test including +0.47% GMV, +0.33% Completion Rate, and -0.82% Cancel-Before-Accept rate.
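
The second module's last step, turning scored candidate profiles into preference pairs for DPO, might look roughly like the sketch below. The function name, the `min_gap` filter, and the example scores are assumptions made for illustration. The abstract does not say how pairs are selected, only that they are built from candidates evaluated by a downstream utility proxy.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_preference_pairs(
    candidates: Dict[str, float], min_gap: float = 0.0
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs whose utility scores differ by more than min_gap."""
    pairs = []
    for (a, score_a), (b, score_b) in combinations(candidates.items(), 2):
        if abs(score_a - score_b) <= min_gap:
            continue  # ties and near-ties carry no clear preference
        pairs.append((a, b) if score_a > score_b else (b, a))
    return pairs


# Toy candidates for one user cluster: profile text -> utility proxy score.
cluster_candidates = {
    "profile A": 0.71,
    "profile B": 0.64,
    "profile C": 0.70,
}
pairs = build_preference_pairs(cluster_candidates, min_gap=0.02)
```

Filtering out near-ties is one plausible design choice: it keeps only pairs where the proxy expresses a clear preference, at the cost of producing fewer training pairs per cluster.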

2606.18550 2026-06-18 cs.CR New submission 70%

The Gate Is Only as Honest as Its Contracts: ContractGuard for the Contract Layer of Risk-Aware Causal Gating

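Laxmipriya Ganesh Iyer, Rahul Suresh Babu

Topic match (tool calling): protecting tool-augmented LLM agents

AI summary: To defend tool-augmented LLM agents against indirect prompt injection, proposes ContractGuard, which verifies the integrity of tool contracts rather than relying on risk labels, and achieves a zero injection success rate on a controlled benchmark.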

Abstract

Risk-Aware Causal Gating (RACG) defends tool-augmented LLM agents against indirect prompt injection by removing dangerous tools from the agent's visible action space, so that even a fully injection-compliant agent cannot call a tool it cannot see. We make three points. First, this structural guarantee does not eliminate the trust assumption behind safe tool use; it relocates it into the integrity of the tool contracts -- declared preconditions, effects, risk, and authorization -- that the gate reads, so an attacker who corrupts a contract can make the gate mis-decide without ever persuading the agent. Second, forging a tool's effects is strictly more dangerous than tampering with its risk label, because RACG applies a causal gate before its admissibility gate: an off-path tool is never exposed, so risk-relabeling alone fails, whereas effect forgery routes the dangerous tool onto the causal path and succeeds. Effect integrity, not the risk label, is the load-bearing assumption. Third, we introduce ContractGuard, a verifier between the registry and the gate that layers signed provenance, typed contract attestation, and runtime effect verification; on a controlled benchmark it restores injection success to zero against every modeled attack -- including an exhaustive white-box adaptive attacker -- without over-rejecting honest contracts, and the structural prediction is confirmed on six current-generation hosted models (Claude Opus 4.8, Sonnet 4.6, Haiku 4.5; Amazon Nova Premier and Nova 2 Lite; GPT-OSS-120B).
