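arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.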

2605.30930 2026-06-01 cs.HC cs.AI cs.CL cs.CY

TUX: Measuring Human-AI Tacit Understanding

Yueshen Li, Hanyi Min, Vedant Das Swain, Koustuv Saha

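AI summary: Quantifies tacit understanding between humans and LLMs using a spectrum-placement task and the TUX index, and finds that person-level traits affect the degree of alignment.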

Abstract

As large language models (LLMs) increasingly act as collaborative partners, human-AI alignment is often evaluated through explicit task success, accuracy, or reward optimization. Yet many collaborative settings depend on tacit understanding: whether an agent can align with a human's evaluative stance or representational priors without clear objectives, communication, or feedback. To study this capacity, we develop a spectrum-placement task inspired by the social party game Wavelength, in which humans and agents independently place concepts along subjective spectra. We operationalize the Tacit Understanding Index (TUX) as a pairwise measure of similarity between human and agent judgments, and evaluate it with 241 human participants and 200 profile-conditioned LLM agents across four models. We find that nearest human-agent pairs in trait space achieve significantly higher TUX, suggesting that tacit alignment is structured by person-level characteristics rather than random similarity. Regression analyses show that TUX becomes more explainable as predictor sets become richer, with individual traits, decision-making styles, and confidence improving over aggregate trait-distance baselines. These findings suggest that tacit understanding between humans and LLMs is measurable, while revealing the limits of profile-based conditioning for capturing deeper representational alignment.
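The abstract describes TUX only as a pairwise similarity between human and agent judgments and does not give a formula. As a minimal sketch of what such a measure could look like, assume each placement is normalized to [0, 1] and take one minus the mean absolute gap; the function name `tux_similarity` and this particular formula are illustrative assumptions, not the paper's definition.

```python
from statistics import mean


def tux_similarity(human, agent):
    """Hypothetical pairwise similarity: 1 - mean absolute placement gap.

    Both inputs are equal-length lists of placements in [0, 1], one per
    concept. This is an illustrative stand-in, not the paper's TUX formula.
    """
    if len(human) != len(agent) or not human:
        raise ValueError("placements must be non-empty and of equal length")
    return 1.0 - mean(abs(h - a) for h, a in zip(human, agent))


# Toy placements of three concepts on one subjective spectrum.
human = [0.2, 0.8, 0.5]
close_agent = [0.25, 0.75, 0.5]
far_agent = [0.9, 0.1, 0.0]

print(tux_similarity(human, close_agent))
print(tux_similarity(human, far_agent))
```

Under this assumed measure, an agent whose placements track the human's scores closer to 1, which matches the direction of the "higher TUX for nearest pairs" finding without claiming its magnitude.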

2605.30917 2026-06-01 cs.IR cs.CV

Inference-Free Multimodal Learned Sparse Retrieval for Production-Scale Visual Document Search

Gyu-Hwung Cho, Youngjune Lee, Kiyoon Jeong, Siyoung Lee, Sanggyu Han, Hervé Dejean, Stéphane Clinchant, Seung-won Hwang

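AI summary: Proposes V-SPLADE, an inference-free sparse retriever that uses caption-gated token supervision to address the lexical grounding problem in visual sparse representations, reaching dense-level effectiveness in visual-document retrieval.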

Comments 12 pages, 5 figures, 12 tables, preprint

Abstract

As large-scale visual-document corpora such as arXiv papers and enterprise PDFs continue to grow, visual-document retrieval has gained increasing attention; yet it still lacks a deployable system that lexically indexes visual documents to serve queries without neural encoding at scale. Existing methods either achieve strong retrieval quality with VLM-based dense or multi-vector models but require neural query encoding at serving time, or avoid query encoding with OCR- or caption-based BM25 at the cost of time-consuming text extraction or generation. To fill this missing serving regime, we present V-SPLADE, an inference-free sparse retriever for visual-document retrieval. However, such inference-free multimodal learned sparse retrieval systems remain underexplored and have not yet shown dense-level effectiveness under high sparsity. We attribute this limitation to a lexical grounding problem: visual sparse representations often fail to capture the lexical content embedded in document images. To address this problem, we introduce caption-gated token supervision, a training-only signal that uses VLM-generated captions as lexical cues to activate retrieval-relevant vocabulary dimensions. With this supervision, V-SPLADE improves average NDCG@5 across six visual-document retrieval benchmarks by +13.8pp over the same-scale dense baseline and by up to +6.3pp over OCR- or caption-based BM25 baselines. On an 18.7M-document corpus, it more than doubles R@5 over the same-scale dense baseline and further improves competing retrievers through score fusion by up to +2.4pp R@5. Code will be released soon at https://github.com/naver/v-splade.
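To make the "inference-free" serving regime concrete, here is a minimal sketch of how a learned sparse retriever can answer queries without any neural encoding at query time: document-side term weights are produced offline, and the query is only tokenized and looked up in an inverted index. The weights, document IDs, and function names below are hypothetical, and whitespace tokenization stands in for whatever tokenizer the real system uses.

```python
from collections import defaultdict

# Hypothetical offline output: sparse vocabulary weights per document page.
doc_weights = {
    "page_a": {"invoice": 1.8, "total": 1.1, "tax": 0.7},
    "page_b": {"retrieval": 2.0, "sparse": 1.5, "index": 0.9},
}


def build_inverted_index(weights_by_doc):
    """Map each term to the (doc_id, weight) postings that contain it."""
    index = defaultdict(list)
    for doc_id, weights in weights_by_doc.items():
        for term, weight in weights.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query, k=5):
    """Score by summing document-side weights of the query's tokens.

    Only plain tokenization happens here; no model runs at query time.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


index = build_inverted_index(doc_weights)
results = search(index, "sparse retrieval index")
print(results)
```

The design point is that all learned capacity sits on the document side, so serving cost looks like BM25-style posting-list lookups.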

2605.30907 2026-06-01 cs.SE cs.AI cs.CL cs.LG

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta, Case Winter, George Fang, John Ling, Emma Strubell, Zach Kirshner

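AI summary: Proposes the BlueFin benchmark, which evaluates LLM agents on synthesis, manipulation, and comprehension across 131 real-world financial spreadsheet tasks, and validates the agreement between the LM judge and human experts.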

Comments 26 pages

Abstract

We present BlueFin, a benchmark that tasks large language model (LLM) agents with synthesis, manipulation, and comprehension tasks over spreadsheet workbooks in the professional finance domain. Though estimates of the global population of paying users of spreadsheet software range in the hundreds of millions (an order of magnitude more than the estimated global population of professional developers), comparatively fewer resources have been devoted to exploring and expanding LLM capabilities in the spreadsheet domain, with fewer still dedicated to mirroring real occupational tasks encountered by those in professional finance roles. In response, we curate a set of 131 challenging, complex tasks with real-world relevance in the domain, containing 3,225 granular rubric criteria; notably, our rubric criteria and LM judge evaluations are validated by a team of expert human annotators, resulting in high-quality, granular evaluations of complex tasks that are difficult to verify programmatically but can be reliably evaluated by an LM judge agent. Our judge achieves parity with expert consensus ($\alpha = 0.826$) with a macro-F1 score of 0.839. Frontier LLMs demonstrate poor performance on the challenging benchmark, with the strongest LLMs achieving less than 50% average scores across tasks; models exhibit particular weaknesses in dynamic correctness. Our contributions include a dataset of examples across three categories of spreadsheet tasks, an open source harness and agentic evaluation framework, and a characterization of existing frontier models' performance on our benchmark.
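For readers unfamiliar with the macro-F1 figure reported for the judge, the following sketch computes macro-F1 between judge and expert labels on rubric criteria. It assumes each criterion is labeled 1 (met) or 0 (not met); the toy labels are invented for illustration and are not data from the benchmark.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy rubric labels: 1 = criterion met, 0 = not met (illustrative only).
expert = [1, 1, 0, 0, 1, 0]
judge = [1, 0, 0, 0, 1, 1]
print(macro_f1(expert, judge))
```

Macro averaging matters when "met" and "not met" are imbalanced, because a judge that always predicts the majority label would score poorly on the minority class.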

2605.30905 2026-06-01 math.OC cs.LG

A Unifying View of Anchoring via Operator-Side Tikhonov Regularization

Zihao Chen

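AI summary: Shows that anchored fixed-point and monotone-equation methods can be constructed in a unified way by adding a vanishing Tikhonov regularization term to the operator queried by the base method, and analyzes residual convergence rates for four variants.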

Abstract

Anchored fixed point and monotone equation methods, including Halpern iteration, extra anchored gradient, and their relatives, add a vanishing pull toward a reference point to obtain last-iterate guarantees. Existing anchored variants often achieve sharp last-iterate guarantees, but from the update-level perspective the placement of the anchor can be algorithm-specific and conceptually opaque. We show that anchoring admits a single operator-side construction: regularize the operator queried by the base method with a vanishing Tikhonov term, then run the unmodified base method. Applied to the Picard iteration, this recipe reproduces the Halpern iteration; applied to the forward step, extragradient (EG), and past extragradient (PEG, also known as Popov's method), it yields three variants whose anchor placements inherit the base method's query pattern. The forward-step instantiation gives a new residual convergence guarantee, while the EG and PEG instantiations give new regularized variants. The four analyses share a residual recurrence, recovering the $O(1/k)$ Halpern residual-norm convergence rate, giving $O(1/\sqrt{k})$ for the regularized forward step, and giving $O(1/k)$ for the regularized EG and PEG variants in the unconstrained monotone Lipschitz setting.
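The contrast between Picard and Halpern iteration can be seen on a small example. The sketch below uses the Halpern iteration in its commonly stated form, $x_{k+1} = \beta_k x_0 + (1-\beta_k) T(x_k)$ with $\beta_k = 1/(k+2)$, on a 90-degree rotation of the plane (a nonexpansive map whose only fixed point is the origin). It illustrates the anchoring idea only; it does not reproduce the paper's operator-side derivation, and the choice of map and step count is an assumption made for the demonstration.

```python
def T(z):
    """Rotation by 90 degrees in the plane: nonexpansive, fixed point 0."""
    return 1j * z


def picard(z0, steps):
    """Plain fixed-point iteration z <- T(z)."""
    z = z0
    for _ in range(steps):
        z = T(z)
    return z


def halpern(z0, steps):
    """Halpern iteration anchored at z0 with weights beta_k = 1 / (k + 2)."""
    z = z0
    for k in range(steps):
        beta = 1.0 / (k + 2)
        z = beta * z0 + (1.0 - beta) * T(z)
    return z


def residual(z):
    """Fixed-point residual |z - T(z)|."""
    return abs(z - T(z))


z0 = 1 + 0j
steps = 1000
picard_res = residual(picard(z0, steps))
halpern_res = residual(halpern(z0, steps))
print(picard_res, halpern_res)
```

On this map Picard just circles the origin, so its residual never shrinks, while the anchored iterate's residual decays, consistent with the $O(1/k)$ Halpern rate mentioned in the abstract.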

2605.30899 2026-06-01 eess.AS cs.AI cs.SD

A Unified and Reproducible Experimentation Framework for Speech Understanding

Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li, Yi Yang, Yixuan Wang, Xiaoyu Gu, Guanyu Chen, Yucheng Wang, Jiang Li, Zhangjie Zhao, Haoran Wang, Wenming Tu, Haoyu Li, Duo Ma, Lirong Qian, Yu Xi, Wen Wen, Jiaqi Guo, Hui Zhang, Shuai Fan, Wenbin Jiang, Shuai Wang, Kai Yu

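AI summary: Proposes the SURE framework, which improves the comparability and reproducibility of speech understanding models in deployment scenarios through standardized prediction formats, normalization, and scoring, together with an agent-assisted training conversion pipeline.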

Comments This paper is submitted to INTERSPEECH 2026

2605.30889 2026-06-01 physics.chem-ph cs.LG

MLIPilot: LLM-Driven Auto-Research for Machine-Learned Interatomic Potentials

Etinosa Osaro, Santosh Adhikari, Stamatia Zavitsanou, Kelsey Parker, Dario Rocca

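AI summary: Proposes the MLIPilot framework, in which large language models automatically propose hypotheses, edit training code, and optimize machine-learned interatomic potentials against a physically constrained scorecard, validated on the QM7 and Cu EMT datasets.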

Abstract

Constructing production-quality machine-learned interatomic potentials (MLIPs) requires balancing accuracy, dynamical stability, and computational throughput under constraints that are not captured by a single training loss. We introduce MLIPilot, an auto-research framework in which tool-calling large language models propose hypotheses, edit MLIP training code, launch HPC jobs, and accept or revert changes using a fixed, physically constrained scorecard. We evaluate MLIPilot on MACE potential optimization using both commercial and open-weight LLM agents, including GPT-5.5, GPT-4.1, Mistral-24B, and Qwen3-32B. The benchmarks span molecular and periodic settings: a QM7-derived dataset for which we generated B3LYP/6-31G(d) energies and forces, and a Cu EMT dataset with periodic copper supercells labeled by ASE's Effective Medium Theory calculator. Across these benchmarks, the strongest agents move initially constraint-violating baselines to accepted models by discovering useful training strategies, including output normalization, loss-function changes, progressive training schedules, and model-capacity adjustments. These results suggest that LLM agents can serve as autonomous operators for scientific machine-learning workflows when their search is constrained by domain-specific validation criteria, shifting part of MLIP development from manual trial-and-error toward auditable, automated experimentation.
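The accept-or-revert step against a fixed scorecard can be sketched as a small decision function. Everything below is hypothetical: the metric names, the threshold values, and the tie-breaking rule are assumptions for illustration, since the abstract does not list the scorecard's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    energy_mae: float   # lower is better
    force_mae: float    # lower is better
    md_stable: bool     # True if a short dynamics run stays stable
    throughput: float   # higher is better


# Hypothetical fixed limits; units and values are placeholders.
MAX_ENERGY_MAE = 0.05
MAX_FORCE_MAE = 0.10
MIN_THROUGHPUT = 100.0


def passes_scorecard(m: Metrics) -> bool:
    """All constraints must hold at once; no single loss is optimized."""
    return (
        m.energy_mae <= MAX_ENERGY_MAE
        and m.force_mae <= MAX_FORCE_MAE
        and m.md_stable
        and m.throughput >= MIN_THROUGHPUT
    )


def decide(current: Metrics, candidate: Metrics) -> str:
    """Accept a code change only if it satisfies the fixed scorecard."""
    if not passes_scorecard(candidate):
        return "revert"
    if not passes_scorecard(current):
        return "accept"
    # Both pass: keep the change only if force error improves (assumed rule).
    return "accept" if candidate.force_mae < current.force_mae else "revert"


baseline = Metrics(energy_mae=0.20, force_mae=0.30, md_stable=False, throughput=150.0)
edited = Metrics(energy_mae=0.04, force_mae=0.08, md_stable=True, throughput=120.0)
print(decide(baseline, edited))
```

Keeping the thresholds fixed outside the agent's control is what makes each accepted change auditable: the agent can edit training code but cannot move the goalposts.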

2605.30866 2026-06-01 quant-ph cs.LG

Generative Quantum Data Embeddings for Supervised Learning

Jaewoong Heo, Daniel K. Park

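AI summary: Proposes an energy-based generative learning framework that optimizes embedding structure and parameters with a fidelity-based surrogate objective to improve classification performance, and uses the Wasserstein distance to explain performance saturation.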

Comments 14 pages, 7 figures

Abstract

Many practically relevant applications of quantum machine learning involve classical data, for which performance depends critically on how inputs are embedded into quantum states. Yet the use of a fixed embedding circuit ansatz remains standard practice. We propose an energy-based generative learning framework that synthesizes gate sequences to optimize embedding structures and refine data-tailored parameters, using a fidelity-based surrogate objective to guide the search toward improved class distinguishability. Empirically, the method improves classification performance across diverse settings, while also revealing datasets where architecture search within the present embedding family yields only limited additional gains. We explain this saturation by deriving bounds on the achievable empirical risk in terms of the Wasserstein distance in the input space, showing that classical data geometry provides an a priori diagnostic for regimes in which substantial gains from embedding optimization are unlikely. The results establish a practically useful and theoretically motivated framework for searching effective quantum data embeddings through generative optimization, with the attainable gains diagnosed through the geometry of the underlying classical data.

2605.30862 2026-06-01 cs.DB cs.AI

Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation

Madhav Jivrajani, Ramnatthan Alagappan, Aishwarya Ganesan

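AI summary: To address over-exploration by LLM-driven Text2SQL agents when exploring data systems, proposes Sophrosyne, an environment that augments API responses with directives to guide exploration, reducing over-exploration and improving SQL generation accuracy.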

Abstract

Text2SQL agents powered by LLMs translate natural language intent into SQL by exploring the data system through tool calls before formulating the query. However, to ensure secure and scoped access, data systems construct environments with explicit API surfaces. We study and categorize these APIs exposed today as either coarse-grained or fine-grained and posit that choosing between them presents a fundamental tradeoff between cost-efficient exploration and accurate SQL generation. Most data systems expose fine-grained APIs, but this inadvertently disadvantages agents: they over-explore, incorporate irrelevant schema elements into their query formulation, and produce inaccurate results. We argue that curbing over-exploration is key to the effective use of these API surfaces, and propose Sophrosyne, a data system environment that augments API responses with directives that guide the agent's exploration process. Initial results show that directives reduce over-exploration by 4.6x and boost accuracy by up to 12.4% (approx. 4 percentage points).
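The two accuracy figures in the last sentence are related by simple arithmetic. As a rough check, assuming both describe the same configuration (the abstract does not state the baseline accuracy, so $a_{\text{base}}$ below is inferred, not reported):

```latex
\[
\frac{a_{\text{new}} - a_{\text{base}}}{a_{\text{base}}} \approx 0.124,
\qquad
a_{\text{new}} - a_{\text{base}} \approx 4\ \text{pp}
\quad\Longrightarrow\quad
a_{\text{base}} \approx \frac{4}{0.124} \approx 32\%.
\]
```

Under that assumption the improvement would be from roughly 32% to roughly 36% accuracy, which is why the relative figure looks much larger than the absolute one.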

2605.30860 2026-06-01 math.ST cs.LG math.PR stat.TH

Bayesian Inference with Shaped Deep Non-linear MLPs

Boris Hanin, Tianze Jiang

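AI summary: Uses the Neural Covariance SDE to analyze Bayesian inference in deep non-linear MLPs when the number of training samples, input dimension, hidden-layer width, and depth are all large; finds a first-order criterion in $LP/N$ that determines when depth benefits the model evidence, and derives that the Bayesian predictive posterior is equivalent to a data-dependent kernel method.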

Comments 35 Pages

Abstract

A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute, it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training samples ($P$), the input dimension ($N_0$), the hidden layer width ($N$), and the number of hidden layers ($L$) can all be large. We build on the Neural Covariance SDE (Li et al., 2022) to analyze predictive posteriors in the regime where $LP/N \in \Theta(1)$, playing the role of an effective network depth. Our framework covers both smooth and ReLU activation functions and applies to arbitrary temperature. We find to first order in $LP/N$ a simple criterion for which data generating processes benefit from depth in the sense that larger $LP/N$ increases the Bayesian model evidence. We also give a novel derivation of a prior result from the physics literature that at least to first order in $LP/N$, the Bayesian predictive posterior is remarkably simple and is simply equivalent to that of a data-dependent kernel method.

2605.30854 2026-06-01 cs.MA cs.AI

Safe Equilibrium Policy Optimization for Strategic Agent Policies

Karthika Arumugam, Kiran Kumar Manku, Amit Dhanda

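AI summary: Proposes Safe Equilibrium Policy Optimization (SEPO), which improves the strategic safety of language models in multi-agent games by penalizing exploitability, collusion risk, and externality cost.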

Comments Submitted to EMNLP 2026

Abstract

Language models fine-tuned with reinforcement learning typically optimize for task reward, ignoring multi-agent strategic structure. Because these agents condition on natural language game-state descriptions and emit actions through free-form generation, strategic failure modes (exploiting weaker opponents, coordinating on harmful equilibria, and externalizing costs) are inseparable from the language interface itself. We propose Safe Equilibrium Policy Optimization (SEPO), a training objective that augments expected payoff with explicit penalties for exploitability, collusion risk, and externality cost. We implement SEPO as a reward signal for Group Relative Policy Optimization (GRPO), applied to Gemma 4 E4B-it and Qwen 3.5-4B after supervised fine-tuning (SFT). We evaluate across five strategic domains: Iterated Prisoner's Dilemma, repeated auctions, two negotiation variants, and Kuhn Poker. SEPO achieves zero exploit-pool advantage in Kuhn Poker for both models, outperforms the base model on safety in four domains, and corrects the over-cooperative behavior introduced by SFT. In negotiation, SEPO achieves a positive-safety outcome and the only positive normalized relative advantage of any negotiation configuration. Ablation experiments confirm that per-rollout exploit computation is necessary: a shared constant penalty cancels in GRPO advantage normalization (constant control-variate property), producing zero gradient. To support further research in strategic safety for agents, we release our code (https://anonymous.4open.science/r/sepo-2668/README.md) and SFT datasets.
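The ablation claim, that a shared constant penalty cancels under GRPO advantage normalization, follows directly from how group-relative advantages are formed. The sketch below assumes the common form "subtract the group mean, divide by the group standard deviation"; the use of the population standard deviation, the `eps` value, and the toy rewards are assumptions for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: center by group mean, scale by group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy task rewards for four rollouts of the same prompt.
task_rewards = [1.0, 0.0, 0.5, 2.0]

# A shared constant penalty shifts every reward equally, so it cancels.
shared = [r - 0.7 for r in task_rewards]

# A per-rollout penalty differs across rollouts, so advantages change.
per_rollout_penalty = [0.0, 0.9, 0.1, 1.5]
per_rollout = [r - p for r, p in zip(task_rewards, per_rollout_penalty)]

base_adv = grpo_advantages(task_rewards)
shared_adv = grpo_advantages(shared)
per_rollout_adv = grpo_advantages(per_rollout)
print(base_adv)
print(shared_adv)
print(per_rollout_adv)
```

Because the mean absorbs any constant shift and the standard deviation is shift-invariant, only penalties that vary across rollouts within a group can influence the policy gradient.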

2605.30831 2026-06-01 q-bio.QM cs.LG physics.chem-ph

The Geometry of Activity Cliffs: Representation Dependence and Multi-Scale Characterization of Activity Landscapes

Pawel Dabrowski-Tumanski, Bartosz Topolski, Dariusz Plewczynski, Tomasz Jetka

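AI summary: Uses a six-step analysis pipeline to systematically examine how different molecular representations (such as fingerprints and embeddings) affect the definition of activity cliffs; finds that no single representation is best on all criteria and argues that activity cliffs are a representation-induced geometric phenomenon rather than an intrinsic property of molecule pairs.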

Abstract

Activity cliffs, structurally similar compounds with large potency differences, are widely treated as intrinsic features of chemical datasets. We argue that apart from target biology, much of our cliff understanding is a consequence of the geometry induced by the chosen molecular representation, not a property of a molecule pair itself. We designed a six-step pipeline to systematically test this hypothesis. The pipeline consists of: assessing pairwise distance geometry, cliff enrichment, activity gradient distribution, persistent homology of the cliff subspace, predictive benchmarking for a chosen pair of an embedding and a metric, and finally, analysis of the matched molecular pairs and stereoisomers. We applied the pipeline to fifteen configurations of embeddings and metrics to build a benchmark across three distinctive datasets known for their activity-cliff challenges. No representation excels on all criteria: Morgan Tanimoto provides the strongest cliff enrichment and cross-scaffold generalization; MolFormer cosine provides the only meaningful stereochemical sensitivity; MACCS and RDKit Dice fingerprints are most sensitive to matched-molecular-pair transformations; ChemBERTa fails uniformly due to embedding collapse. These findings are not a ranking. They reflect the fact that different representations encode different aspects of molecular recognition, and that choosing one implicitly defines what an activity cliff actually is.
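The claim that the representation defines the cliff can be illustrated with a toy check. The sketch below uses Tanimoto similarity on sets of on-bits and a threshold-based cliff rule; the fingerprints, the 0.7 similarity cutoff, and the 2-log-unit potency gap are illustrative assumptions, not the paper's settings.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def is_cliff(fp_a, fp_b, pact_a, pact_b, sim_min=0.7, delta_min=2.0):
    """Flag a pair as a cliff: similar fingerprints, large potency gap."""
    return tanimoto(fp_a, fp_b) >= sim_min and abs(pact_a - pact_b) >= delta_min


# The same hypothetical molecule pair under two toy representations.
rep1_a, rep1_b = {1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 6, 7, 9}
rep2_a, rep2_b = {1, 2, 3, 4}, {1, 2, 5, 6}

# Potencies (e.g. pIC50) are a property of the molecules, not the fingerprint.
pact_a, pact_b = 8.5, 6.0

print(is_cliff(rep1_a, rep1_b, pact_a, pact_b))
print(is_cliff(rep2_a, rep2_b, pact_a, pact_b))
```

The potency gap is identical in both calls; only the similarity changes with the representation, so the same pair is a cliff under one encoding and not under the other.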

2605.30818 2026-06-01 cs.ET cs.AI cs.SD

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

Zhiwei Chen, Yijie Li, Yimo Zhang, Shiyun Shao, Yichao Chen, Dian Ding, Liang Wang, Haiwei Wu, Liwei Guo, Jie Yang, Xiaosong Zhang, Yongzhao Zhang

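AI summary: Proposes GaMi, a system that uses a cross-modal subtractive disentanglement framework over mmWave and acoustic sensing to achieve high-accuracy material identification under unconstrained geometric conditions.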

Comments 17 pages, 18 figures

Abstract

Non-contact material identification enables adaptive interaction for embodied intelligence yet faces challenges from geometry-induced variations (e.g., orientation, shape, distance) and single-modality ambiguities. In this paper, we present GaMi, a multimodal material identification system integrating mmWave and acoustic sensing to robustly operate under unconstrained geometric conditions. By leveraging the insight of shared geometric consistency between co-located bimodal sensors, GaMi employs an intra-sample cross-modal subtractive disentanglement framework. By semantically aligning modalities and subtracting the shared geometric context, it isolates intrinsic material features. Furthermore, GaMi incorporates inter-sample contrastive learning to correct the residual interference caused by cross-modal misalignment. Additionally, a pairing-based adaptation strategy between two modalities enables few-shot generalization across devices. Extensive evaluations on 20 materials show that GaMi achieves 95.2% accuracy, outperforming single-modality baselines across unseen geometric conditions.

2605.30808 2026-06-01 cs.CR cs.AI cs.LG

Differentially Private Preference Data Synthesis for Large Language Model Alignment

Fengyu Gao, Jing Yang

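AI summary: Proposes the DPPrefSyn algorithm, which generates differentially private synthetic preference data based on the Bradley-Terry preference model and DP-PCA, enabling privacy-preserving preference alignment.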

Comments Accepted to ICML 2026

Abstract

Preference alignment is a crucial post-training step for large language models (LLMs) to ensure their outputs align with human values. However, post-training on real human preference data raises privacy concerns, as these datasets often contain sensitive user prompts and human judgments. To address this, we propose DPPrefSyn, a novel algorithm for generating differentially private (DP) synthetic preference data to enable privacy-preserving preference alignment. DPPrefSyn is a principled framework grounded in the Bradley-Terry preference model and the intrinsic geometric structure of pairwise human preference data. It first learns an underlying preference model from private data with formal differential privacy guarantees, and then leverages the learned model together with public prompts to synthesize high-quality preference data. It exploits the shared linear structure of per-cluster reward models to effectively capture heterogeneous human preferences in private datasets, and leverages DP Principal Component Analysis (DP-PCA) to improve learning accuracy. Extensive experimental results demonstrate that DPPrefSyn achieves competitive alignment performance under strong DP guarantees. These findings highlight the potential of synthetic preference data as a practical alternative for privacy-preserving preference alignment across a broad range of applications. To the best of our knowledge, this is the first work to generate DP synthetic preference data for LLM alignment. Our code is available at https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis.
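The Bradley-Terry model that the framework builds on assigns a preference probability from a reward difference. The sketch below shows that model and how a learned reward could be used to label a synthetic response pair. It deliberately omits the differential privacy mechanism and the per-cluster structure; the reward values and the helper `label_pair` are hypothetical.

```python
import math
import random


def bt_preference_prob(reward_a, reward_b):
    """Bradley-Terry: P(a preferred over b) = sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def label_pair(reward_a, reward_b, rng):
    """Sample a synthetic preference label from the Bradley-Terry model."""
    return "a" if rng.random() < bt_preference_prob(reward_a, reward_b) else "b"


# Hypothetical rewards from a learned preference model for two responses
# to the same public prompt.
reward_a, reward_b = 2.0, 0.0
p_a = bt_preference_prob(reward_a, reward_b)
print(p_a)

rng = random.Random(0)
labels = [label_pair(reward_a, reward_b, rng) for _ in range(1000)]
print(labels.count("a") / len(labels))
```

Sampling labels rather than always picking the higher-reward response keeps the synthetic data consistent with the probabilistic preference model instead of collapsing it to a deterministic ranking.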

2605.30802 2026-06-01 cs.MA cs.AI

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota

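AI summary: Designs and evaluates multi-agent LLM architectures as oracles for prediction market resolution, comparing independent aggregation and deliberative consensus against single-model baselines on KalshiBench; independent aggregation with confidence-weighted voting reaches 83.43% accuracy, deliberative consensus degrades performance through error propagation, and routing criteria for hybrid AI-human oracles are proposed.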

Comments 34 pages, 11 figures

Abstract

Prediction markets aggregate collective intelligence to forecast uncertain events, but their utility depends on reliable outcome resolution. Existing oracle systems trade off fast but brittle automation against accurate but costly human arbitration. Single-LLM oracles achieve meaningful accuracy but inherit all failure modes of their underlying model with no self-correction mechanism. We evaluate whether multi-agent LLM architectures can improve oracle resolution accuracy over single-model baselines. We compare independent aggregation and deliberative consensus against single-LLM baselines (GPT-5 Nano, DeepSeek V3, and Llama-3.3-70B) on 1,189 resolved prediction market questions from KalshiBench. All agents share a common evidence layer through Exa, with retrieval filtered by publication date to isolate reasoning from retrieval quality. Independent aggregation with confidence-weighted voting achieves the highest accuracy at 83.43 percent, outperforming the best individual model by 1.01 percentage points. Deliberative consensus degrades accuracy to approximately 76 percent, below every single-model baseline, attributed to error propagation during debate where confidently wrong models flip correct ones. Error correlations across models (0.529-0.689) explain why aggregation gains fall short of the theoretical Condorcet ceiling, placing a fundamental limit on ensemble approaches. Many questions resist correction by any multi-agent architecture, motivating escalation to human arbitration. We propose routing criteria for hybrid AI-human oracle systems: auto-resolving only unanimous, high-confidence questions yields 97.87 percent accuracy on 47 percent of the dataset, with inter-agent disagreement flagging the remainder for human review.
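Confidence-weighted voting and the unanimous, high-confidence routing rule can be sketched in a few lines. The 0.9 confidence cutoff, the answer labels, and the function names are assumptions for illustration; the abstract does not give the actual threshold.

```python
from collections import defaultdict


def confidence_weighted_vote(votes):
    """votes: list of (answer, confidence in [0, 1]) pairs, one per agent."""
    totals = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)


def route(votes, min_conf=0.9):
    """Auto-resolve only unanimous, high-confidence questions."""
    answers = {answer for answer, _ in votes}
    if len(answers) == 1 and all(conf >= min_conf for _, conf in votes):
        return ("auto", answers.pop())
    # Disagreement or low confidence: keep the vote, flag for review.
    return ("human_review", confidence_weighted_vote(votes))


split_votes = [("YES", 0.6), ("NO", 0.9), ("YES", 0.5)]
unanimous_votes = [("YES", 0.95), ("YES", 0.92), ("YES", 0.97)]

print(route(split_votes))
print(route(unanimous_votes))
```

Note that each agent answers independently here; nothing in the aggregation lets a confident but wrong agent change another agent's answer, which is the failure mode the abstract attributes to deliberative consensus.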

2605.30792 2026-06-01 eess.AS cs.AI

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Yanjie An, Yuxiang Zhao, Yichi Zhang, Qixi Zheng, Yujie Tu, Keqi Deng, Kai Yu, Xie Chen

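AI summary: Proposes OpenSTBench, a unified multidimensional evaluation framework that jointly evaluates translation quality, speech quality, temporal consistency, and other dimensions of speech translation systems, revealing cross-dimensional differences between systems.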

Comments Submitted to EMNLP 2026

Abstract

Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (S2ST), offline translation, and streaming generation, producing outputs that differ in modality, speech realization, and timing behavior. Existing evaluation practices assess important aspects such as translation quality, speech quality, and temporal quality, but these aspects are often evaluated under separate protocols, making it difficult to compare heterogeneous systems comprehensively. To address this gap, we present OpenSTBench, a unified multidimensional evaluation framework that organizes heterogeneous speech translation outputs into a shared evaluation format. OpenSTBench supports both S2TT and S2ST systems in offline and streaming settings, and jointly evaluates translation quality, speech quality, speaker preservation, emotion and paralinguistic fidelity, temporal consistency, and latency. Through experiments on representative speech translation systems, we show that systems with strong translation quality can still differ substantially in speech quality, as well as in temporal quality. OpenSTBench provides a reproducible protocol for analyzing these cross-dimensional differences and supporting application-oriented comparison of speech translation systems. The code and datasets are available at https://github.com/sjtuayj/OpenSTBench.

2605.30790 2026-06-01 cs.IR cs.AI cs.CL

On the impact of retrieved content representations in RAG Pipelines

Jonathan J Ross, Bevan Koopman, Anton van der Vegt, Guido Zuccon

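AI summary: Uses controlled experiments to study how different representations of retrieved documents (selection, summarisation, reformulation, and others) affect RAG generation accuracy, and finds that answer retention is the primary determinant.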

Comments 23 pages, 15 figures, submitted to ACL May 2026 ARR

英文摘要

Retrieval-Augmented Generation (RAG) supplements a language model's input with retrieved documents, yet most RAG pipelines inherit retrieval components designed for human readers. How retrieved content should be represented when the consumer is a large language model (LLM) rather than a human is less well understood. Recent work has proposed transformations of retrieved content and identified properties that affect generation, but each examines a single transformation or property in isolation, leaving open which features of a document's representation matter most. We address this with a controlled comparison: holding retrieval fixed, we vary only the representation of retrieved documents, comparing an original baseline against thirteen transformations spanning selection, summarisation, and reformulation, in query-dependent and query-independent variants. Across these fourteen representations we measure question-answering accuracy for four generators, and for each representation we also measure answer retention: whether a known answer-bearing document still supports its answer after transformation. We find that answer retention is the primary determinant of generator accuracy; notably, when retention is high, a representation's wording, structure, length, and query-dependence have limited effect. This suggests that accuracy gains attributed to specific mechanisms in prior work may be partly explained by how well those mechanisms preserve answer-bearing content, an attribution that cannot be settled without controlling for retention.
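The answer-retention measurement described above can be sketched roughly as follows. This is a minimal illustration that assumes retention is approximated by normalized string matching of gold answers; the abstract only says that a document should "still support its answer", so the matching rule and the names `normalize`, `retains_answer`, and `retention_rate` are hypothetical rather than the authors' procedure.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def retains_answer(transformed_doc: str, answers: list[str]) -> bool:
    """True if any gold answer string still appears in the transformed document.

    Assumption: substring matching stands in for "supports the answer".
    """
    doc = normalize(transformed_doc)
    return any(normalize(a) in doc for a in answers if normalize(a))


def retention_rate(pairs) -> float:
    """Fraction of (transformed_doc, answers) pairs whose answer is retained."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(retains_answer(doc, answers) for doc, answers in pairs) / len(pairs)


examples = [
    ("Paris is the capital of France.", ["Paris"]),  # answer kept
    ("The capital is a large city.", ["Paris"]),     # answer lost in a summary
]
rate = retention_rate(examples)
```

Substring matching can over-count (a short answer may appear inside an unrelated word), so a stricter token-level or model-based check may be preferable in practice.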

2605.30741 2026-06-01 stat.ML cs.LG

Is the Last Layer Sufficient for Uncertainty Quantification?

Joseph Wilson, Chris van der Heide, Liam Hodgkinson, Fred Roosta

AI summary: Through theoretical analysis and empirical evaluation, compares full-network and last-layer linearization for epistemic uncertainty quantification in deep neural networks, finding that the last-layer approximation keeps comparable UQ performance while substantially improving computational efficiency.

Comments: 40 pages, 14 figures, 7 tables

Abstract

Epistemic uncertainty quantification (UQ) for deep neural networks (DNNs) is a requirement for safe adoption of AI in mission-critical settings. Several leading methods for UQ linearize DNNs to form Bayesian Generalized Linear Models (GLMs), where epistemic uncertainty is modeled via the predictive posterior distribution. Linearizing around the parameters of the final connected layer of a DNN is a commonly used approximation for reducing the computational burden of such GLMs, though it is often believed to come at the cost of degraded performance. In this work, we compare GLMs arising from full-network and last-layer linearization using both theoretical and empirical approaches. We first employ tools from random matrix theory to conduct a theoretical comparison; this analysis reveals no meaningful improvement in the UQ capabilities of full linearization. Coupled with a large-scale empirical evaluation across a range of modern machine learning tasks, we arrive at the following conclusion: a last-layer approximation yields comparable UQ performance while offering substantially improved computational efficiency.
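To make the last-layer approximation concrete, here is a minimal sketch for the simplest case: scalar regression with a Gaussian likelihood and an isotropic Gaussian prior on the last-layer weights. In that setting the Jacobian with respect to the last-layer weights is just the penultimate feature vector, so the GLM reduces to Bayesian linear regression on those features. The function names, the prior precision, and the noise variance are assumptions for illustration, not the paper's setup.

```python
import numpy as np


def last_layer_posterior_cov(features, prior_precision=1.0, noise_var=1.0):
    """Posterior covariance of last-layer weights.

    features: (n_train, d) array of penultimate-layer activations.
    Returns (prior_precision * I + Phi^T Phi / noise_var)^{-1}.
    """
    features = np.asarray(features, dtype=float)
    d = features.shape[1]
    precision = prior_precision * np.eye(d) + features.T @ features / noise_var
    return np.linalg.inv(precision)


def epistemic_variance(test_features, cov):
    """Per-point epistemic variance phi(x)^T Sigma phi(x)."""
    test_features = np.asarray(test_features, dtype=float)
    return np.einsum("nd,de,ne->n", test_features, cov, test_features)


train_phi = np.eye(2)                      # two training points, 2-d features
cov = last_layer_posterior_cov(train_phi)  # equals 0.5 * I in this toy case
variances = epistemic_variance(np.array([[1.0, 0.0], [2.0, 0.0]]), cov)
```

The cost here scales with the feature dimension of the last layer only, whereas full-network linearization works with a Jacobian over all parameters, which is the efficiency gap the abstract refers to.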

2605.30693 2026-06-01 cs.CR cs.CL

Triaging Threats to Specialized Guardrails

Wenjie Jacky Mo, Xiaofei Wen, Rui Cai, Boyu Zhu, Sicong Jiang, Zihan Wang, Minglai Yang, Zhe Zhao, Muhao Chen

AI summary: Proposes the GuardZoo unified benchmark and the RouteGuard router-expert framework, which triages conversations to specialized guardrails to address the task interference that monolithic guardrails suffer across threat domains, improving fine-grained detection and generalization.

Abstract

Building robust safety guardrails is essential for deploying Large Language Models across diverse real-world applications. However, this goal remains challenging because safety risks span heterogeneous threat domains, while existing datasets cover only fragmented risk subsets and rely on inconsistent taxonomies. Consequently, it remains unclear whether current guardrails can generalize beyond narrow evaluation settings. To better understand the robustness of guardrail models, we first introduce GuardZoo, a unified human-annotated benchmark with 32,460 samples covering 15 distinct unsafe categories. Evaluation on GuardZoo reveals that monolithic guardrails suffer from task interference: different threat domains require distinct decision boundaries that are difficult to compress into a single model. We therefore propose RouteGuard, a router-expert framework that triages each conversation to specialized expert guardrails for threat-specific detection. Experiments show that RouteGuard improves fine-grained threat detection over strong guardrail baselines, generalizes better under out-of-domain evaluation, and supports flexible modular expansion to emerging threats.

2605.30686 2026-06-01 cs.CR cs.AI cs.LG

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

Mohammadreza Rashidi

AI summary: Through four controlled experiments (460 trials in total), studies how injection depth, payload framing, and turn budget affect the success rate of indirect prompt injection against tool-calling ReAct agents, finding that injection depth is the dominant variable and that sanitising only the first tool observation captures 67% of injection successes.

Comments: 17 pages, 16 figures

Abstract

ReAct agents that interleave chain-of-thought reasoning with tool calls are increasingly deployed for real tasks such as scheduling, file retrieval, and data access. Their tool observation loop creates a direct attack surface: an adversary who controls any tool's return value can embed instructions that redirect the agent away from the user's goal, a threat known as indirect prompt injection. Existing benchmarks evaluate attack success rate (ASR) at a fixed injection position under fixed conditions, leaving three risk dimensions unexplored: where in the tool sequence the payload appears (injection depth), what rhetorical register it uses (framing), and how many turns the agent is permitted (turn cap). We conduct four controlled studies on 20 scenarios spanning five attack categories, totalling 460 trials against GPT-4o-mini and Claude Haiku at a combined API cost under 0.36 USD. Study 1 shows that ASR against GPT-4o-mini decays from 60% at depth 1 to 0% at depths 4 and 5 (Cramer's V = 0.58, p < 0.001; restricted to within-sequence depths 1-3: V = 0.47, p = 0.0013), driven by model resistance at depth 1 and task completion before payload encounter at deeper positions. Study 2 replicates the depth experiment on Claude Haiku, which achieves 0% ASR at every depth through a combination of conservative tool invocation and genuine instruction resistance. Study 3 shows that framing modulates ASR between 25% (neutral) and 75% (persona) at depth 1, a 50-percentage-point range that does not reach statistical significance at N = 20 per condition. Study 4 confirms that ASR is stable across turn caps of 3, 5, and 7, indicating the turn budget is not a risk factor in this setting. Our results establish injection depth as the dominant variable and show that sanitising only the first tool observation captures 67% of measured injection successes.
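The effect sizes above are reported as Cramér's V. As a reminder of what that statistic measures, the sketch below computes it from a contingency table (for example, injection depth by attack outcome) using the standard definition V = sqrt(chi2 / (n * (min(rows, cols) - 1))). It applies no continuity correction and assumes every row and column has a nonzero total. The counts in the usage lines are made up for illustration and are not the paper's data.

```python
import numpy as np


def cramers_v(table) -> float:
    """Cramér's V for an r x c contingency table of counts.

    Assumes every row and column total is nonzero (otherwise the expected
    counts contain zeros and the chi-squared statistic is undefined).
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence of rows and columns.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Illustrative counts only: rows are conditions, columns are (success, failure).
perfect_association = cramers_v([[10, 0], [0, 10]])
no_association = cramers_v([[5, 5], [5, 5]])
```

V ranges from 0 (no association) to 1 (outcome fully determined by the condition), which is why a value near 0.5 is usually read as a strong effect.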

2605.30685 2026-06-01 cs.CY cs.AI cs.CL cs.HC

How Early Adopters Used Generative AI Worldwide: Variation by Country Income and Language

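Madeleine I. G. Daepp, Isaac Slaughter

AI summary: Based on large-scale anonymized AI chatbot interaction data, empirically analyzes how early adopters' use of generative AI differs across countries, finding that educational use is more common in lower-income countries, leisure use correlates positively with income, and English-language interactions are overrepresented in countries where English is not the dominant language, suggesting that improvements in language performance could affect the digital divide or enable leapfrogging.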

2605.30677 2026-06-01 cs.CR cs.AI cs.SE

Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

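Brian Crawford, Patrick McClure

AI summary: Addresses prompt injection attacks against software reverse engineering AI agents, proposing defenses that detect prompt injection strings in decompiler output and exploring attack obfuscation and corresponding defenses.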

2605.30667 2026-06-01 cs.CR cs.AI

Automatically Attacking Software Reverse Engineering AI Agents

Brian Crawford, Justin Phillips, Patrick McClure

AI summary: Proposes a genetic-algorithm-based adversarial prompt generation technique (an AutoDAN variant) that injects extraneous string variables to deceive LLM-powered disassembly and decompilation systems into misanalyzing binary executables.

Abstract

Software tools for reverse engineering executable binary files, such as Ghidra, enable malware analysts to safely conduct robust static analysis without having access to original source code. Coupled with the analytic power of large language models (LLMs), agentic systems enabled with tools, such as GhidraMCP, can allow analysts to automate a previously human-driven process. Although this automation can increase the productivity of a single malware analyst, it also introduces a new area of vulnerability for malware obfuscation. This paper presents an adversarial technique using genetic algorithm-based prompt generation, a modification of an adversarial attack known as AutoDAN, to demonstrate the ability to deceive LLM-powered disassembly and decompilation systems into misinterpreting binary executables, effectively corrupting their analytical output. This proof-of-concept methodology exploits inherent vulnerabilities in how LLMs process and interpret decompiled machine code via prompt injection by using extraneous string variable assignments to pass surreptitious instructions to the LLM while not impacting the functionality of the executable file. We demonstrate this capability through several concise examples. This approach could enable attackers to bypass automated detection systems that rely on LLM-driven analysis pipelines. By studying and understanding this attack, insights can be gained regarding the security implications of integrating LLMs into cybersecurity toolchains and building more robust agentic code analysis systems.

2605.30632 2026-06-01 cs.HC cs.AI cs.LG

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

Aritra Dasgupta, Naga Datha Saikiran Battula, Avina Nakarmi, Sohom Sen, Subhodeep Ghosh, Xun Song

AI summary: Proposes the Rationalize role-pair framework, which uses complementary role pairs (such as Explorer-Guide) in a shared reasoning space to semantically align humans and AI in data-driven sensemaking, and outlines element-level and role-specific approaches to alignment assessment.

Comments: Accepted by ACM CHI 2026 BiAlign Workshop

Abstract

We introduce Rationalize, a role-pair framework for shared semantic reasoning between humans and AI models in data-driven sensemaking. Building on ideas in human-machine teaming and critical thinking, we conceptualize human-AI interaction as a series of complementary role pairs (Explorer-Guide, Investigator-Informant, Teacher-Student, Judge-Advocate) operating in a shared reasoning space. In this space, human analysts and AI models (such as LLMs) make purposes, questions, assumptions, evidence, inferences, and implications explicit, facilitating alignment not only at the output level but at the level of rationalization of intent and action by each side. We relate these role pairs to the bidirectional human-AI alignment framework, illustrating how "aligning AI to humans" and "aligning humans to AI" differ by role, and sketch a collaborative research agenda for alignment design and assessment using element-level and role-specific approaches.

2605.30622 2026-06-01 cs.NI cs.LG

Jamming-Resilient PRB Reservation for Latency-Critical O-RAN Network Slicing

Elahe Delavari, Junaid Farooq

AI summary: To counter adversarial jamming in the O-RAN downlink, which reduces PRB capacity and causes latency violations, proposes a reserve-based resilience framework in which a near-RT RIC xApp controls a reserved PRB pool, combining proactive backlog clearing with reactive allocation of reserve capacity, and uses a masked Deep Q-Network to learn effective control policies under non-stationary jamming, substantially reducing URLLC latency violations and improving reserve efficiency.

Comments: Accepted at ML-Spec Workshop in IEEE DySPAN 2026

Abstract

Open radio access network (O-RAN) architectures enable near-real-time, software-driven control of network slicing through programmable xApps deployed on the near-real-time RAN Intelligent Controller (near-RT RIC). In industrial 5G downlink systems, adversarial jamming can abruptly reduce the effective physical resource block (PRB) capacity, triggering queue buildup and persistent latency violations, particularly in the presence of cell-edge user equipment with low spectral efficiency. This paper proposes a reserve-based resilience framework for PRB allocation in sliced O-RAN deployments. A finite pool of reserved PRBs is controlled by a near-RT RIC xApp that provides hybrid mitigation by proactively clearing backlog to build latency margin and reactively allocating reserve capacity during jammer-active intervals. We formulate reserve activation as a constrained sequential decision problem and design a masked Deep Q-Network to learn effective control policies under non-stationary jamming. Simulation results show substantial reductions in URLLC latency violations and improved reserve efficiency compared to reactive baselines.
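The "masked" part of a masked Deep Q-Network usually means that infeasible actions are excluded before the greedy choice. A minimal sketch of that step follows, assuming each action activates a fixed number of reserved PRBs and that an action is valid only if the reserve pool can cover it. The action set, the feasibility rule, and both function names are hypothetical; the abstract does not specify them.

```python
import numpy as np


def reserve_action_mask(reserve_left: int, action_prbs) -> np.ndarray:
    """Valid actions activate no more PRBs than remain in the reserve pool."""
    return np.asarray(action_prbs) <= reserve_left


def masked_greedy_action(q_values, valid_mask) -> int:
    """Greedy action over valid actions only; invalid Q-values become -inf."""
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(valid_mask, dtype=bool)
    if not mask.any():
        raise ValueError("no valid action available")
    return int(np.argmax(np.where(mask, q, -np.inf)))


action_prbs = [0, 4, 2]      # hypothetical: PRBs activated by each action
q_values = [1.0, 5.0, 3.0]   # hypothetical network outputs for one state
mask = reserve_action_mask(reserve_left=2, action_prbs=action_prbs)
action = masked_greedy_action(q_values, mask)  # action 1 is infeasible here
```

Masking at selection time keeps the finite-reserve constraint hard, so the agent never has to learn from penalties that spending more than the pool holds is impossible.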

2605.30619 2026-06-01 stat.ML cs.AI cs.LG

Reward Learning from Best-of-$N$ Preference Data: Targets, Tradeoffs, and Design Principles

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

AI summary: Analyzes the targets of Bradley-Terry reward learning from pairwise preference data built with Best-of-$N$ sampling, shows how $N$ and the base distribution affect reward estimation, and proposes design principles based on the margin-connectivity trade-off that governs sample efficiency.

Abstract

Best-of-$N$ sampling is widely used to construct pairwise preference data: $N$ candidates are drawn from a base distribution, and the best is paired with a rejected response. Despite its widespread use, what Bradley--Terry (BT) reward learning extracts from such data, and how to choose $N$ and the base distribution, remain unclear. We specialize a recent analysis of preference data via its induced conditional distribution to Best-of-$N$. For independent-reference variants, we derive closed-form reward targets as explicit functions of $N$ and the base distribution, and show that they preserve the latent reward ranking. For the practical Best-vs-Random and Best-vs-Worst variants, chosen and rejected responses are coupled through the same candidate set, so exact BT representability generally fails; nevertheless, bounded-class minimizers approach the reference targets as $N$ grows. Although margin and connectivity are known to govern sample efficiency in pairwise preference learning, Best-of-$N$ couples them through $N$ in opposing directions: larger $N$ widens pairwise margins but reduces connectivity. This trade-off yields two design principles: use larger $N$ when preference labels are the bottleneck, smaller $N$ when generation is the bottleneck; and shape the base distribution to place mass between the responses whose comparison matters most at test time. Experiments on synthetic and real preference data support the predicted dependence on sample size and base-distribution shape.
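The two practical pair-construction variants named above can be sketched as follows. This is an illustration under assumptions: candidates are scored by a given `reward` callable, and in Best-vs-Random the rejected response is drawn uniformly from the non-best candidates. The abstract does not state whether the best candidate is excluded from the random draw, so that detail and the function name `best_of_n_pair` are hypothetical.

```python
import random


def best_of_n_pair(candidates, reward, variant="best_vs_worst", rng=random):
    """Build one (chosen, rejected) pair from N candidates.

    Both responses come from the same candidate set, which is the coupling
    the abstract refers to.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    ranked = sorted(candidates, key=reward, reverse=True)
    chosen = ranked[0]
    if variant == "best_vs_worst":
        rejected = ranked[-1]
    elif variant == "best_vs_random":
        # Assumption: sample uniformly among the non-best candidates.
        rejected = rng.choice(ranked[1:])
    else:
        raise ValueError(f"unknown variant: {variant}")
    return chosen, rejected


scores = [3, 1, 4, 1, 5]  # toy candidates whose reward is their own value
bvw = best_of_n_pair(scores, reward=lambda x: x)
bvr = best_of_n_pair(scores, reward=lambda x: x, variant="best_vs_random",
                     rng=random.Random(0))
```

With larger N, the chosen response sits further out in the reward tail, which widens the margin between chosen and rejected; the abstract's point is that this comes at the cost of reduced connectivity.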

2605.30614 2026-06-01 cs.CR cs.SD

Audio Pirates: Black-box Audio Watermark Removal via Diffusion Priors

Lingfeng Yao, Xincong Zhong, Chenpei Huang, Xuandong Zhao, Hanqing Guo, Aohan Li, Jiang Liu, Tomoaki Ohtsuki, Miao Pan

AI summary: Proposes DiffErase, a black-box watermark removal attack that perturbs watermarked audio to an intermediate diffusion noise level and regenerates it with a pretrained denoising model, removing watermarks while preserving perceptual quality.

Abstract

With the rise of AI-generated audio, watermarking has become widely used for detecting misuse and protecting intellectual property. However, adversaries may try to remove these watermarks, making it critical to evaluate how well watermarking schemes withstand removal attacks. Existing attacks are often impractical: they either noticeably degrade perceptual quality or require access to the watermarking scheme. We propose DiffErase, a black-box watermark removal attack that assumes no knowledge of the target watermarking scheme while maintaining perceptual quality. DiffErase perturbs watermarked audio to an intermediate diffusion noise level and regenerates it using a pretrained denoising model, effectively suppressing watermark signals. Theoretical analysis and extensive experiments demonstrate that inaudible audio watermarks are highly vulnerable: across multiple audio domains, DiffErase consistently removes watermarks while preserving perceptual quality. These findings highlight the need for future audio watermarking designs to consider diffusion-based threats. Code and demos are available at https://differase.github.io/DiffErase/.

2605.30613 2026-06-01 cs.CR cs.LG

CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs

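Ryan Fahey

AI summary: Uses the CacheProbe method to audit the OpenRouter API gateway architecture, finding that its routing through shared organization credentials may bypass provider-level prompt cache isolation, resulting in a global cache-sharing vulnerability.

Comments: 11 pages, 8 figures, 2 tables. Accepted at SAGAI '26 (Workshop on Secure Agents for Generative AI), co-located with IEEE Symposium on Security and Privacy 2026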

2605.30604 2026-06-01 cs.CR cs.AI cs.CL cs.IR

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos, Dimosthenis Kyriazis

AI summary: Proposes an organization-scoped LLM agent runtime architecture that uses a typed Security Context, a Runtime Core, specialist subagents, a governed Tool Adapter Layer, and tiered human-in-the-loop gates to enforce organization-level scope across retrieval, tool calls, memory, findings, reports, and audit, while remaining model-agnostic and locally deployable.

Comments: 8 pages, 3 figures

Abstract

Regulated cybersecurity workflows lack a runtime substrate that enforces organization-level scope across retrieval, tool calls, memory, findings, reports, and audit while remaining model-agnostic and locally deployable. Recent large language model (LLM) agent systems report strong results on isolated cybersecurity tasks, yet they do not by themselves define an auditable platform architecture for regulated security operations centre (SOC) and compliance workflows, where a single analyst may trigger actions that bind the organization, and where the runtime must integrate with existing SIEM/XDR stacks as a primary source of context and alert-driven triggers rather than operate as a standalone analytical layer. This paper proposes an organization-scoped LLM agent runtime architecture for financial cybersecurity. The contribution is a typed Security Context that is created at every entry point, including SIEM/XDR notifications ingested as first-class triggers, and enforced at every component boundary, combined with a shared Runtime Core, logical specialist subagents, a governed Tool Adapter Layer exposing SIEM/XDR query, enrichment, and response primitives under uniform policy and audit, structured findings with evidence references, tiered human-in-the-loop (HITL) gates, and append-only audit. Model Context Protocol (MCP), extended telemetry, digital twins for pentesting, graph retrieval, and federated knowledge sharing are treated as optional extension paths rather than mandatory runtime assumptions. We describe an implementable slice as the architecture's testability surface, and we propose a falsifiable evaluation plan with metric-level pass criteria for architecture readiness, security-policy enforcement, evidence traceability, output quality, and operational observability.
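One way to picture a typed Security Context that is created at an entry point and enforced at a component boundary is the sketch below. Every name in it (`SecurityContext`, `ScopeViolation`, `enforce_scope`, `run_tool`, and the fields) is hypothetical and chosen for illustration; the abstract does not describe the paper's actual types or interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityContext:
    """Immutable context created once per entry point (hypothetical fields)."""
    org_id: str
    principal: str
    trigger: str  # e.g. "analyst" or "siem_alert"


class ScopeViolation(Exception):
    """Raised when a call reaches outside the caller's organization."""


def enforce_scope(ctx: SecurityContext, resource_org_id: str) -> None:
    """Boundary check: the resource must belong to the context's organization."""
    if ctx.org_id != resource_org_id:
        raise ScopeViolation(f"{ctx.principal} may not access {resource_org_id}")


def run_tool(ctx, tool_name, resource_org_id, audit_log):
    """Tool-adapter boundary: log the attempt, then check scope, then act."""
    # Entries are only ever appended here, including for denied attempts.
    audit_log.append((ctx.org_id, ctx.principal, tool_name, resource_org_id))
    enforce_scope(ctx, resource_org_id)
    return f"{tool_name}:ok"


audit = []
ctx = SecurityContext(org_id="acme", principal="analyst-1", trigger="siem_alert")
result = run_tool(ctx, "siem_query", "acme", audit)
```

Making the context frozen and passing it explicitly to every boundary means a component cannot quietly widen its own scope, and logging before the check keeps denied attempts visible in the audit trail.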

2605.30603 2026-06-01 physics.ao-ph cs.LG

Learning effective Sargassum transport dynamics from limited drifter observations

F. J. Beron-Vera, M. J. Olascoaga, J. Morell, E. Cruz

AI summary: To account for unresolved processes in floating-material transport, proposes a data-driven transport-learning framework based on physical diagnostics and finite-memory representations, using MLP ensembles and SINDy to learn effective transport corrections from limited Lagrangian observations; applications in the Puerto Rico region and the Gulf Stream confirm that the diagnostics are informative and show the limits of delayed sparse symbolic corrections.

Abstract

Floating-material transport is influenced by unresolved processes that are often absent from available circulation products. We develop a data-driven transport-learning framework for learning effective transport corrections from limited Lagrangian observations using physically motivated ocean--atmosphere diagnostics and finite-memory representations motivated in part by inertial-particle memory effects. The diagnostic representation is analyzed through predictive and sparse symbolic-discovery approaches under leave-one-trajectory-out validation. Applications to Sargassum-following drifters in the Puerto Rico region and the Gulf Stream show that the diagnostics contain transport-relevant information beyond the baseline circulation products. Multilayer perceptron (MLP) ensembles provide flexible predictive trajectory corrections, while Sparse Identification of Nonlinear Dynamics (SINDy) tests whether instantaneous or delayed sparse symbolic transport structure can be extracted from the diagnostics. The results differ across flow regimes: (i) in Puerto Rico, delayed sparse symbolic corrections provide modest but systematic improvement; (ii) in the Gulf Stream application, dynamically useful sparse symbolic corrections remain primarily instantaneous even though delayed predictive information persists. These results support finite-memory transport effects in coarse-grained floating-material transport while also illustrating the difficulty of obtaining stable delayed sparse symbolic closures.

2605.30578 2026-06-01 cs.CR cs.CV

AdvScene: Rethinking Adversarial Patch Evaluation Through Scene Robustness

Xiaoyong Yuan, Lan Zhang

AI summary: Proposes the AdvScene framework, which reconstructs real environments and introduces Adversarial Patch-to-Scene Embedding (APSE) to evaluate the scene robustness of adversarial patches under changing viewpoints, distances, and scene conditions, revealing scene-dependent variation that existing evaluations miss.

Abstract

Adversarial patches are physical patterns attached to real objects to mislead AI vision systems. Their real-world risk is not determined by a single successful prediction, but by whether they remain effective after deployment under changing viewpoints, distances, and scene conditions. We refer to this property as scene robustness, the effectiveness of a deployed patch across conditions in a real environment. Yet existing evaluations do not measure scene robustness well: real image benchmarks are realistic but fixed, while simulators are controllable but not grounded in a specific real scene. We present AdvScene, a scene-grounded framework for measuring the scene robustness of adversarial patches in reconstructed real environments. AdvScene reframes evaluation as operational measurement: given a fixed deployed patch, it characterizes the patch's operational envelope - where and when the attack succeeds - as a function of viewpoint, distance, and scene context. A key challenge is that the attack is typically defined only in a single anchor view, while evaluation requires a representation that remains faithful under viewpoint changes. We formalize this as a constrained lifting problem and introduce Adversarial Patch-to-Scene Embedding (APSE), which resolves cross-view ambiguity while preserving attack-critical appearance and enforcing locality, target-surface attachment, and cross-view consistency. We validate AdvScene using real-world physical data and conduct a comprehensive evaluation of existing adversarial patches. Our results show that AdvScene reveals substantial scene-dependent variation in attack effectiveness that is not captured by existing image-centric or simulator-based evaluations.
