2606.04000 2026-06-04 cond-mat.mtrl-sci cs.LG

SPLIT-PINN: Separable Probability Learning Technique via Physics-Informed Neural Networks for High-Dimensional Probabilistic Modeling

Pouria Behnoudfar, Deekshith Naidu Ponnana, Noah J. Schmelzer, Janith Wanni, George T. Gray, Dan J. Thoma, Curt A. Bronkhorst, Nan Chen, Wenxiao Pan

Affiliations: Department of Mechanical Engineering, University of Wisconsin-Madison; Department of Mathematics, University of Wisconsin-Madison; Department of Civil Engineering, Johns Hopkins University; Materials Physics and Applications Division, Los Alamos National Laboratory; Department of Materials Science and Engineering, University of Wisconsin-Madison

AI summary: Proposes SPLIT-PINN, a separable probability learning technique based on physics-informed neural networks. By decomposing the drift field into marginal-correction terms and imposing orthogonality constraints, it infers from data the evolution of high-dimensional, transport-dominated joint probability density functions and yields accurate probabilistic predictions of how polycrystalline microstructural states evolve.

Abstract

We present a probabilistic modeling framework for incorporating small-scale spatial heterogeneity into macroscopic descriptions of material behavior for polycrystalline metallic materials. Spatially heterogeneous material state fields are represented using probability density functions (PDFs), providing a principled statistical description of microstructural variability and state evolution across different computational polycrystalline realizations. The framework is built on the inverse identification of a probabilistic transport model, formulated as a Liouville equation with an unknown drift term. To enable accurate, stable, and interpretable inference of this drift field in high-dimensional, transport-dominated settings, we develop a Separable Probability Learning Technique via Physics-Informed Neural Networks (SPLIT-PINN). This method incorporates a marginal-correction drift decomposition, orthogonality constraints, and residual-based adaptive training to enhance well-posedness, numerical stability, and physical consistency without imposing restrictive parametric assumptions. Using SPLIT-PINN, the drift field governing the temporal evolution of joint state PDFs is inferred directly from data. After benchmark validation, the framework is applied to physical computational datasets describing the evolution of polycrystalline microstructural states, including von Mises stress, dislocation density, and equivalent plastic strain rate. The learned Liouville model, trained on a single dataset, is subsequently used in forward predictions of the temporal evolution of joint and marginal PDFs for multiple unseen polycrystal realizations. Quantitative comparisons with reference PDFs demonstrate that the proposed framework yields accurate and robust probabilistic predictions and generalizes effectively across datasets.
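
For orientation, a Liouville equation with an unknown drift term is commonly written in the conservative form below. The symbols are notation assumed here, not taken from the paper: $p(\mathbf{x},t)$ is the joint PDF over the state vector $\mathbf{x}$ (for example von Mises stress, dislocation density, and equivalent plastic strain rate), and $\mathbf{b}_\theta$ is the drift field that a network with parameters $\theta$ would be trained to infer.

```latex
\frac{\partial p(\mathbf{x},t)}{\partial t}
  + \nabla_{\mathbf{x}} \cdot \bigl( \mathbf{b}_\theta(\mathbf{x},t)\, p(\mathbf{x},t) \bigr) = 0,
\qquad
\int p(\mathbf{x},t)\, \mathrm{d}\mathbf{x} = 1 .
```

Because the equation has no diffusion term, probability is only transported along the drift, which is what makes the setting transport-dominated.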

2606.03283 2026-06-04 eess.AS cs.SD

SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpus for In-the-Wild Speaker Verification

Junyi Peng, Oldřich Plchot, Xiao Song, Dading Chong, Lichun Fan, Hang Su, Themos Stafylakis, Junjie Li, Kong Aik Lee, Shuai Wang, Jian Luan, Jan Černocký

Affiliations: Brno University of Technology, Czechia; Peking University, China; Xiaomi, China; Athens University of Economics and Business, Greece; The Hong Kong Polytechnic University, Hong Kong; Nanjing University, China

AI summary: Proposes SpeakerCard-1M, a bilingual speaker resource that generates structured Speaker Cards using acoustic probes and a constrained LLM, and defines cross-modal protocols to support evidence-grounded speaker verification.

Comments Corpus and protocols at https://junyipeng00.github.io/SpeakerCard-1M-page

Abstract

Modern speaker verification (SV) systems rely on speaker embeddings that are effective but difficult to interpret or query in natural language. Most existing speech-text corpora target controllable synthesis or utterance-level captioning, and provide limited speaker-level supervision for in-the-wild speaker recognition. This paper introduces SpeakerCard-1M, a bilingual speaker-centric resource for evidence-grounded SV, derived from VoxCeleb1/2 and CN-Celeb1/2, where the "-1M" suffix refers to the 1.78M utterance-level captions contained in the release. We adopt a tool-first, LLM-last approach: ten acoustic probes produce field-level evidence, the evidence is aggregated into speaker profiles under a schema that separates relatively stable traits from utterance-level states, and bilingual Speaker Cards are rendered by a constrained LLM that sees only the structured fields. The release includes 56.7K Speaker Card records over 10.2K speakers, 1.78M utterance-level captions, and speaker-ID-disjoint hard-negative triplets. We further define two SV-oriented cross-modal protocols, bidirectional Speaker-Text Retrieval (T2S-R / S2T-R) and Attribute-Conditioned Verification (AC-Verify), and compare a dual-encoder baseline against recent audio language models under a zero-shot forced-choice setting. Joint audio-text training improves VoxCeleb1-O EER by 0.31% absolute over the audio-only baseline. Under a style-symmetric LLM-generated counterfactual protocol, eight recent audio language models (7B-30B+ parameters, both open- and closed-source) score 49-77% on pitch-level AC-Verify under two-way forced choice, compared with 88.66% reached by our dual encoder.

2606.03606 2026-06-04 cs.CR cs.AI

Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks

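Malia Barker, Bishal Lakha, Edoardo Serra, Francesco Gullo

Affiliations: Department of Computer Science, Boise State University; University of L’Aquila

AI summary: Proposes an automatic numeric-remapping attack algorithm that tests the robustness of LLM arithmetic reasoning using small numeric changes that preserve the reasoning program. Accuracy on GSM8K drops by 12 to 26 percentage points, while MAWPS and MultiArith are more stable.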

Abstract

Large language models achieve strong performance on arithmetic reasoning benchmarks, and one common response to arithmetic brittleness is to delegate computation to code. Yet models are still often used in settings where they must reason directly from natural language, and trustworthy models should solve small-number arithmetic word problems without external tools. Prior work shows that LLMs are sensitive to numerical variation: a model may solve an original problem but fail on structurally similar variants requiring the same reasoning procedure with different numbers. We ask whether this fragility persists under a stricter setting involving small, schema-preserving numeric changes that retain the original reasoning program and avoid large-number stress tests. We introduce an automatic algorithm for generating numeric-remapping attacks on arithmetic word problems. Unlike template-based perturbation methods requiring manual schemas or constraints, our approach derives problem-specific symbolic representations, generates constrained numeric remappings, recomputes gold answers, and realizes transformed questions through deterministic edits guided by LLM-generated edit plans. Stage-wise validation and a high-confidence audit retain reliable attacks, making the pipeline scalable with limited human intervention. We evaluate DeepSeek-R1 (70B), Gemma4 (31B), and GPT-OSS (120B) on GSM8K, MAWPS, and MultiArith. On GSM8K, completed runs show conditional accuracy drops of 12.16 to 25.82 percentage points. MAWPS and MultiArith are far more stable, with most attacked accuracies near or above 98%. These results show that numeric-remapping robustness depends strongly on dataset structure: GSM8K remains sensitive even when reasoning programs are preserved and answers are recomputed, while shorter, more regular datasets are more robust.
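
The remap-and-recompute idea in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the word problem, the `solve` program, the `remap` helper, and the constraint set (small deltas, positive quantities, non-negative answer) are illustrative assumptions, and a fixed template stands in for the paper's LLM-guided deterministic edits.

```python
import random

# Hypothetical problem template; the slots are the numeric quantities to remap.
TEMPLATE = ("Ann has {a} apples, buys {b} bags of {c} apples each, "
            "and gives away {d}. How many apples does she have?")


def solve(values):
    """Hypothetical symbolic program for the template: a + b * c - d."""
    return values["a"] + values["b"] * values["c"] - values["d"]


def remap(values, rng, max_delta=3, tries=1000):
    """Sample a small perturbation that keeps every quantity positive
    and the recomputed answer non-negative (assumed constraints)."""
    for _ in range(tries):
        new = {k: v + rng.randint(-max_delta, max_delta) for k, v in values.items()}
        if new != values and all(v > 0 for v in new.values()) and solve(new) >= 0:
            return new
    raise ValueError("no valid remapping found")


original = {"a": 5, "b": 3, "c": 4, "d": 6}
attacked = remap(original, random.Random(0))
question = TEMPLATE.format(**attacked)  # transformed question text
gold = solve(attacked)                  # gold answer recomputed from the same program
```

The reasoning program is unchanged between the original and the attacked problem; only the numbers and the recomputed gold answer differ.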

2606.03323 2026-06-04 cs.CR cs.AI

Implement Kubernetes Pod-Level Remote Attestation for Confidential Workloads on dstack

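Yang Yang, Kevin Wang, Yuanhai Luo, Hang Yin, Jie Cai, Shunfan Zhou, Wenfeng Wang

Affiliations: OPPO; Phala

AI summary: Proposes dstack-capsule, a platform that uses a two-layer attestation architecture and a privilege fuse mechanism to provide Pod-level remote attestation on Intel TDX. Multiple Pods share one Confidential VM while each retains an independent, hardware-backed identity, avoiding the resource overhead of one VM per Pod.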

Abstract

The rise of LLM-as-a-Service and other confidential cloud workloads demands cryptographic proof that user data is processed in a trusted, untampered environment. Existing solutions, notably Confidential Containers (CoCo), enforce a strict "one Pod per VM" model that attests only the Guest OS stack, leaving container-level identity unverified and incurring prohibitive per-VM resource overhead. We present dstack-capsule, a Kubernetes platform that enables Pod-level remote attestation on Intel TDX by allowing multiple Pods to share a single Confidential VM while each retains independent, hardware-backed proof of identity. Our key insight is a two-layer attestation architecture: static platform measurements are frozen in RTMR[3] via an irreversible privilege fuse, while dynamic Pod identities (pod_uid, pod_spec_hash, workload_id) are embedded in the TDX Quote's report_data field and signed by hardware on every request. dstack-capsule introduces (1) a Pod-level attestation protocol binding Pod spec digests to hardware-signed Quotes; (2) a privilege fuse mechanism that atomically transitions a node from setup mode to secure mode; (3) a multi-layer sandbox spanning storage, runtime, admission, API, and network isolation layers; and (4) a complete open-source implementation based on Kubernetes 1.32, Intel TDX, and Sysbox. We evaluate the security properties, attestation correctness, and performance characteristics of dstack-capsule, demonstrating that it achieves Pod-granularity verification without the resource overhead of per-VM isolation.
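
As a rough illustration of how dynamic Pod identity could be committed to a Quote, the sketch below hashes the three fields named in the abstract into a 64-byte value, which is the size of the TDX report_data field. The encoding (canonical JSON hashed with SHA-512) and the helper names `pod_report_data` and `matches_pod` are assumptions made for illustration; the paper's actual layout is not given in the abstract.

```python
import hashlib
import hmac
import json

REPORT_DATA_LEN = 64  # size in bytes of the TDX report_data field


def pod_report_data(pod_uid: str, pod_spec_hash: str, workload_id: str) -> bytes:
    """Derive a 64-byte commitment to the Pod identity fields.

    Assumption: canonical JSON + SHA-512; the real encoding may differ.
    """
    identity = {
        "pod_uid": pod_uid,
        "pod_spec_hash": pod_spec_hash,
        "workload_id": workload_id,
    }
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha512(canonical).digest()


def matches_pod(report_data: bytes, pod_uid: str, pod_spec_hash: str, workload_id: str) -> bool:
    """Check that report_data commits to the expected Pod identity."""
    expected = pod_report_data(pod_uid, pod_spec_hash, workload_id)
    return hmac.compare_digest(report_data, expected)
```

A real verifier would also have to validate the Quote signature and the RTMR values before trusting report_data; that step is outside this sketch.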

2606.03307 2026-06-04 cs.IR cs.AI

Generalizing Graph Foundation Models via Hyperbolic Retrieval-Augmented Generation

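Yifan Jin, Qirui Ji, Bin Qin, Jiangmeng Li, Lixiang Liu, Fuchun Sun, Changwen Zheng

Affiliations: Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Tsinghua University

AI summary: Proposes a hyperbolic retrieval-augmented generation framework that indexes a tree-structured external knowledge base in hyperbolic space and retrieves at multiple granularities, addressing the generalization of graph foundation models under distribution shift.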

Comments Accepted by KDD2026

DOI: 10.1145/3770855.3817750

Abstract

Graph foundation models (GFMs) have emerged as a dominant paradigm in graph representation learning by leveraging large-scale pre-training for cross-domain inference. However, the parameterized knowledge encoded within these models is insufficient to cope with distribution shifts, limiting their generalization ability. To mitigate this issue, retrieval-augmented generation (RAG) has been introduced to incorporate external knowledge at inference time. Nevertheless, existing RAG frameworks operating in Euclidean space suffer from a fundamental geometric limitation: the polynomial volume growth of Euclidean space is inherently mismatched with tree-structured external knowledge bases. This mismatch leads to the loss of semantic granularity in retrieval and gives rise to the hubness phenomenon. To address this limitation, we propose a Hyperbolic Retrieval-Augmented Generation (HyRAG) framework designed to enhance the generalization capabilities of GFMs. Specifically, the introduced Hyperbolic Knowledge Indexing module retains the tree-like hierarchies of the external knowledge base by modeling them within hyperbolic space. The Multi-granularity Retrieval module then provides GFMs with global semantic anchors and local semantic nuances through coarse-grained and fine-grained knowledge retrieval, respectively. Finally, the Dual-path Fusion module achieves effective knowledge integration for graph tasks at both the feature and structural levels. Experiments on multiple graph benchmarks demonstrate significant improvements in the zero-shot setting, highlighting the generalization of our method for robust GFM inference.
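
To make the geometric argument concrete, the sketch below computes the standard distance in the Poincaré ball model of hyperbolic space. Using the Poincaré ball is an assumption for illustration (the abstract does not say which hyperbolic model HyRAG uses), and `poincare_distance` is a hypothetical helper name.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball:
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    sq_u = sum(c * c for c in u)
    sq_v = sum(c * c for c in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two points near the boundary are far apart hyperbolically even though
# their Euclidean distance stays below 2.
near_origin = poincare_distance((0.0, 0.0), (0.5, 0.0))
near_boundary = poincare_distance((0.9, 0.0), (0.0, 0.9))
```

Distances grow rapidly toward the boundary of the ball, so the volume available at a given radius grows exponentially, which is the property that suits tree-like hierarchies.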

2606.03554 2026-06-04 cond-mat.stat-mech cs.AI nlin.AO physics.comp-ph

Constraint-Enhanced Physical Search through Correlation Matching

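Song-Ju Kim

Affiliations: SOBIN Institute LLC

AI summary: Proposes a principle of constraint-enhanced physical search that matches the temporal correlations of exploration to constraint-induced spatial correlations. Using a minimal tug-of-war bandit model (TOW), it shows that a conservation law converts local observations into differential evidence across alternatives, while a temporally correlated drive controls the order of exploration, improving search efficiency.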

Comments 13 pages, 4 figures

Abstract

Physical systems do not merely add noise to search processes; they impose constraints that generate structured correlations. We propose a principle of constraint-enhanced physical search in which temporal correlations in exploration are matched to constraint-induced spatial correlations in the update dynamics. Using a minimal tug-of-war bandit model (TOW), we show that a conservation law converts local observations into differential evidence across alternatives, while a temporally correlated drive controls the order of exploration. Search efficiency is improved not by stronger randomness or by maximal anti-correlation, but by matching the temporal correlation to the physical update scale that converts feedback into evidence. A scaling estimate identifies the update-noise-to-contrast ratio as the leading parameter that limits how strongly temporal anti-correlation can be used. The results suggest a general organizing principle for physical search: constraints and fluctuations can generate structured spatiotemporal correlations, and efficient exploration emerges when these correlations are matched to the update dynamics.

2606.01804 2026-06-04 eess.AS cs.SD

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Hanlin Zhang, Daxin Tan, Dehua Tao, Xiao Chen, Haochen Tan, Linqi Song

Affiliations: Department of Computer Science, City University of Hong Kong; AI Lab, Leibniz Research Center, Huawei

AI summary: Proposes SpeechEditBench, a bilingual multi-attribute benchmark whose anchor-based evaluation protocol measures both successful modification of target attributes and preservation of untargeted attributes in speech editing. Existing models perform poorly on compositional editing tasks.

Abstract

Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual multi-attribute benchmark for instruction-guided speech editing. SpeechEditBench encompasses seven atomic editing tasks, as well as compositional editing tasks that integrate multiple operations within a single instruction. We propose an anchor-based evaluation protocol that separately assesses the edit success of target attributes and the preservation of untargeted attributes, leading to three metrics: target success, preservation success, and joint success. Using this benchmark, we evaluate mainstream Speech LLMs and specialized speech editing systems. The results reveal three key findings: (1) no single model performs well across all editing dimensions; (2) closed-source Speech LLMs generally outperform open-source models; (3) compositional editing remains highly challenging, with even the most advanced models struggling to achieve high joint success. SpeechEditBench provides a rigorous diagnostic framework to identify bottlenecks in Speech LLMs, thereby facilitating the development of next-generation Speech LLMs with more robust and precise instruction-guided editing capabilities. Data and code are available at https://github.com/daxintan-cuhk/SpeechEditBench .

2606.01138 2026-06-04 cs.CR cs.AI cs.DC

memorywire: A Vendor-Neutral Wire Format for Agent Memory Operations

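Thamilvendhan Munirathinam

Affiliations: Independent Researcher

AI summary: Proposes memorywire, a vendor-neutral wire format based on JSON-Schema 2020-12 that supports five memory operations and four memory types, with performance and compatibility validated through a reference implementation and benchmarks.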

Comments v2: title corrected from pre-launch name "AMP" to "memorywire"; abstract clarifies recall@5 = 1.000 is on the 42 gold-id queries (50 total; 8 no-match probes excluded). 17 pages, 1 figure, 6 tables. Code: github.com/mthamil107/memorywire. Companion to arXiv:2604.18248 (Prompt Injection Detection)

Abstract

Agent-memory frameworks -- mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor -- each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every migration rebuilds memory from scratch, and no framework ships a governance surface that lets a human review writes before they enter long-term storage. We present memorywire, a JSON-Schema 2020-12 wire format for five memory operations (remember, recall, forget, merge, expire) over four memory types (semantic, episodic, procedural, emotional), with a MemoryStore interface, a fan-out router, and an optional HITL governance channel. We describe an open-source reference implementation with five backend adapters (sqlite-vec, mem0, Letta, Cognee, pgvector); a microbenchmark on a 100-fact / 50-query labelled corpus (42 with non-empty gold ids + 8 no-match probes) achieving recall@5 = 1.000 on the 42 gold-id queries with ingest p50 = 37.8 ms and recall p50 = 40.6 ms; an adversarial-fusion experiment showing Reciprocal Rank Fusion holds recall@5 = 1.000 across a 1-of-N rank-0 injection sweep (K in {0, 5, ..., 50}) where max fusion collapses to 0.500 with 80% leak at K >= 5; and a 16-scenario cross-adapter conformance suite passing 68 of 80 cells with zero failures. The contribution is not a new algorithm; it is a packaging of established components (RRF, FSMs, STM/LTM consolidation, diff-and-approve workflows) into a vendor-neutral protocol with an empirically validated reference, positioned to compose with the Model Context Protocol rather than compete with it.
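
The contrast between Reciprocal Rank Fusion and max fusion under a rank-0 injection can be sketched on toy data. The adapter result lists, the document ids, and the constant k = 60 (a conventional RRF default) are assumptions for illustration, not values from the paper, and this is not the memorywire router itself.

```python
def to_ranking(scored):
    """Order document ids by descending score (ties broken by id)."""
    return sorted(scored, key=lambda d: (-scored[d], d))


def rrf(scored_lists, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every result list."""
    fused = {}
    for scored in scored_lists:
        for rank, doc in enumerate(to_ranking(scored), start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def max_fusion(scored_lists):
    """Max fusion: keep the highest raw score any single list assigns."""
    best = {}
    for scored in scored_lists:
        for doc, score in scored.items():
            best[doc] = max(best.get(doc, score), score)
    return sorted(best, key=lambda d: (-best[d], d))


# Two honest adapters plus one that injects "spam" at the top of its list.
adapter_a = {"gold": 0.90, "x": 0.50, "y": 0.40}
adapter_b = {"gold": 0.80, "y": 0.60, "x": 0.30}
adapter_c = {"spam": 0.99, "gold": 0.70, "x": 0.20}
results = [adapter_a, adapter_b, adapter_c]

rrf_order = rrf(results)         # "gold" stays first; one list cannot outvote the others
max_order = max_fusion(results)  # "spam" wins on a single inflated score
```

RRF depends only on ranks and rewards agreement across lists, so a single adapter placing an item first contributes at most 1 / (k + 1), whereas max fusion trusts whichever adapter reports the largest raw score.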

2605.30995 2026-06-04 cs.CY cs.CL

Traceable by Design: An LLM Pipeline and Dashboard for EU Regulatory Consultation Analysis

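Thales Bertaglia, Haoyang Gui, Catalina Goanta, Gerasimos Spanakis

Affiliations: Utrecht University; Maastricht University

AI summary: Proposes an end-to-end LLM-based pipeline and interactive dashboard that extracts structured topics from regulatory consultation submissions with verbatim quotes, full traceability, and transparency, validated on the EU Digital Fairness Act as a case study.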

Comments This research has been supported by funding from the ERC Starting Grant HUMANads (ERC-2021-StG No 101041824)

Abstract

Public consultations generate large volumes of data in the form of stakeholder submissions that are practically unfeasible to analyse manually. We present an end-to-end LLM-based pipeline and interactive dashboard for structured topic extraction from regulatory consultation submissions, demonstrated on the European Commission's Digital Fairness Act (DFA) public call for evidence as a case study. The system processes raw PDF attachments and web-form responses, extracts topic annotations, and grounds every extraction in a verbatim quote from the source text. Applied to 4,322 DFA submissions, the pipeline produced 15,368 topic annotations supported by 20,951 verbatim evidence quotes. Three principles govern the proposed design: verbatim grounding, full traceability, and transparency by design. The dashboard exposes the full extraction dataset through five analytical views, from dataset-level topic overviews to individual paragraph drill-downs, with every result traceable to its source. Beyond the predefined DFA topic categories, the pipeline generated certain stakeholder concerns, such as Age Verification, Payment Processor Censorship, and Digital Ownership, that a fixed-taxonomy approach would have missed. The pipeline is domain-generic; adapting it to a new consultation requires only a prompt update and a new dataset. A live demo is available at https://dfa-dashboard.thalesbertaglia.com/. The code and processed data are publicly available at https://github.com/thalesbertaglia/dfa-dashboard.

2605.30457 2026-06-04 eess.AS cs.CL

Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

Pedro H. L. Leite, Pedro Benevenuto Valadares, Luiz W. P. Biscainho

Affiliations: PEE/COPPE, UFRJ; Faculdade de Engenharia Elétrica e Computação (FEEC), UNICAMP; DEL/Poli & PEE/COPPE, UFRJ

AI summary: To address the lack of labels in Brazilian Portuguese accent classification, proposes a new workflow that uses only acoustic labels, extracting features by isolating regional accent landmarks and using a phoneme-based forced aligner, and outperforms general-purpose architectures on accent-related tasks.

Comments This work was submitted to the XLIV Brazilian Symposium on Telecommunications and Signal Processing (SBrT 2026)

2605.27488 2026-06-04 cs.CR cs.AI

Grimlock: Guarding High-Agency Systems with eBPF and Attested Channels

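Qiancheng Wu, Wenhui Zhang, Gan Fang, Sheng Mao, Biao Gao, David Levitsky, Shawna Murphy Butterworth, Rob Cameron

Affiliations: Roblox

AI summary: To address the security challenges posed by user-authored orchestration code in agentic systems, proposes Grimlock, an Agent Guard that uses eBPF-enforced traffic interception and attestation bound to TLS 1.3 channel bindings to provide transparent, auditable, scope-bound agent-to-agent communication.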

Comments Vision paper presented at the 1st Workshop on Operating Systems Design for AI Agents (AgenticOS '26), co-located with ASPLOS 2026

Abstract

Agentic systems increasingly run user-authored orchestration code that invokes tools, spawns subtasks, and delegates work across machines and clouds. Although this high agency is productive, it creates a security problem: identity, authorization, provenance, and delegation are often pushed into application code, where they become difficult to enforce consistently and difficult to audit. We present Grimlock, an Agent Guard that restores separation of concerns by moving trust enforcement into the sandbox substrate while leaving agent code unchanged. Grimlock uses eBPF-enforced traffic interception to ensure that sandbox communication passes through a guard, and combines it with post-handshake attestation bound to standard TLS 1.3 channel bindings. After a channel is established, the guard authorizes communication and mints short-lived, channel-bound scope tokens that capture least-privilege delegation. At the receiving side, the destination guard re-validates identity, scope, and channel binding, terminates TLS, and releases plaintext to the destination sandbox only after policy checks succeed. kTLS provides an efficient dataplane for protected communication. As a result, Grimlock offers a path toward transparent, auditable, and scope-bound agent-to-agent communication across heterogeneous multi-cloud environments, using commodity Linux primitives and without requiring changes to user-layer orchestration code.

2605.26814 2026-06-04 cond-mat.str-el cs.LG physics.comp-ph

Neural Autoregressive Control Variates for the Quantum Monte Carlo Sign Problem

Bei Qiao, Lei Wang

Affiliations: Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

AI summary: Trains a pair of autoregressive models to construct zero-mean control variates that mitigate the sign problem in quantum Monte Carlo simulations. On the triangular-lattice Heisenberg antiferromagnet, the standard error of the average sign falls by up to an order of magnitude and that of the energy estimate by a factor of three to five.

Comments 19 pages, 9 figures

Abstract

We train a pair of autoregressive models to construct zero-mean control variates to mitigate the sign problem in quantum Monte Carlo simulations. The two autoregressive networks are confined to the positive- and negative-sign sectors with strictly disjoint support, and each is exactly normalized over its sector. Their difference is therefore structurally zero-mean, providing an unbiased auxiliary observable whose correlation with the sign estimator controls the variance reduction. We implement the method within the stochastic series expansion framework, which we extend to frustrated lattices by developing an incremental loop-topology update. Sign-ergodic sampling is achieved through a twist channel, which is the unique sign-changing mechanism on non-bipartite lattices. We implement the control variates as autoregressive transformers with an end-of-sequence parity mask that enforces exact sign-sector resolution, while the incremental loop-count change and cumulative frustration parity are incorporated as topological features. On the triangular-lattice Heisenberg antiferromagnet, we benchmark the method in the small-$N$ limit. The control variate reduces the standard error of the average sign by up to an order of magnitude and that of the energy estimator by a factor of three to five, remaining effective even when the average sign drops below $10^{-3}$. This work lays out the framework and provides a proof-of-principle demonstration that autoregressive control variates can effectively mitigate the sign problem. Scaling to larger systems with physics-informed architectures is the subject of future work.
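
The variance-reduction mechanism behind a zero-mean control variate can be shown on a toy problem that has nothing to do with quantum Monte Carlo. In this sketch the target observable f = exp(x) and the control variate g = x (zero mean for standard normal x) are illustrative assumptions; in the paper the zero-mean quantity is instead the difference of two normalized autoregressive networks.

```python
import math
import random
import statistics

rng = random.Random(0)
n = 20_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]

f = [math.exp(x) for x in xs]  # target observable, E[f] = exp(1/2)
g = xs                         # control variate with known mean zero

# Coefficient c = Cov(f, g) / Var(g) minimizes the variance of f - c * g.
mean_f = statistics.fmean(f)
mean_g = statistics.fmean(g)
cov_fg = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / (n - 1)
c = cov_fg / statistics.variance(g)

controlled = [a - c * b for a, b in zip(f, g)]

plain_se = statistics.stdev(f) / math.sqrt(n)
controlled_se = statistics.stdev(controlled) / math.sqrt(n)
estimate = statistics.fmean(controlled)
```

The stronger the correlation between f and g, the larger the reduction in standard error. Estimating c from the same samples introduces a small bias that vanishes as n grows, so a careful implementation might fit c on separate data.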

2605.30120 2026-06-04 cs.IR cs.AI cs.LG

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

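Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You

Affiliations: University of California, Berkeley; Stanford University

AI summary: To address the indexing latency and semantic loss caused by K-means clustering in multi-vector retrieval, proposes Single-stage Sparse Retrieval (SSR), which uses a sparse autoencoder to project token embeddings into high-dimensional sparse representations and combines them with inverted indexing for efficient retrieval. On the BEIR benchmark it cuts indexing time by 15x, halves retrieval latency, and improves retrieval performance.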

Comments Accepted by ICML2026

Abstract

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immense memory footprint and computational overhead of billion-scale token vectors, state-of-the-art systems are forced to rely on aggressive dimension reduction and complex clustering (e.g., K-means). This compromise introduces two critical limitations: excessive indexing latency of clustering large-scale corpora and semantic information loss inherent to compression. In this paper, we propose Single-stage Sparse Retrieval (SSR), a paradigm shift that replaces expensive clustering with efficient sparse coding. Instead of compressing features into low-dimensional dense vectors, we use a Sparse Autoencoder (SAE) to project token embeddings into a high-dimensional but highly sparse representation. This transformation enables us to bypass vector clustering entirely and leverage inverted indexing for precise, high-throughput retrieval. Extensive experiments on the BEIR benchmark demonstrate that SSR achieves a "trifecta" of improvements: it reduces indexing time by 15x compared to ColBERTv2, halves retrieval latency, and simultaneously improves retrieval performance over leading baselines.
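
A minimal sketch of why sparse codes pair naturally with an inverted index follows. It is not the SSR implementation: the input vectors stand in for SAE activations, top-k truncation is an assumed sparsification rule, the scoring is a plain sum of weight products over shared active dimensions, and the names `top_k_sparse`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict


def top_k_sparse(vec, k):
    """Keep the k largest strictly positive activations as {dimension: weight}."""
    order = sorted(range(len(vec)), key=lambda i: -vec[i])[:k]
    return {i: vec[i] for i in order if vec[i] > 0}


def build_index(docs, k):
    """Inverted index: dimension -> list of (doc_id, weight) postings.
    `docs` maps a doc id to a list of per-token activation vectors."""
    index = defaultdict(list)
    for doc_id, token_vecs in docs.items():
        for vec in token_vecs:
            for dim, weight in top_k_sparse(vec, k).items():
                index[dim].append((doc_id, weight))
    return index


def search(index, query_vecs, k):
    """Score documents by accumulating weight products on shared dimensions."""
    scores = defaultdict(float)
    for vec in query_vecs:
        for dim, q_weight in top_k_sparse(vec, k).items():
            for doc_id, d_weight in index.get(dim, []):
                scores[doc_id] += q_weight * d_weight
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {"d1": [[0.0, 2.0, 0.0, 1.0]], "d2": [[3.0, 0.0, 0.0, 0.0]]}
index = build_index(docs, k=2)
hits = search(index, [[0.0, 1.0, 0.0, 0.0]], k=2)
```

Only postings for dimensions active in the query are touched, so no clustering step or dense nearest-neighbor search is needed at index or query time.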

2605.29928 2026-06-04 cs.HC cs.AI

Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs

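Mahjabin Nahar, Nafis Irtiza Tripto, Aiping Xiong, Ting-Hao 'Kenneth' Huang, Dongwon Lee

Affiliations: The Pennsylvania State University

AI summary: Through an online experiment and a comparison with LLMs, the study finds that human evaluations of logical fallacies are significantly swayed by source labels on the content (human, AI, and so on), whereas LLM evaluations stay comparatively stable, indicating that source-label bias is primarily a human vulnerability.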

English abstract

As AI-generated and AI-assisted content floods online spaces, source labels attached to such content can distort human reasoning judgments, with downstream consequences for moderation, evaluation, and decision-making. Whether LLMs share this vulnerability, or offer more source-agnostic evaluation, remains an open question with direct implications for human-AI collaboration. We examine this issue using logical fallacies as a controlled setting to isolate source-label effects on reasoning quality, independent of domain knowledge. We conduct an online study (N=505) where participants are assigned to a source condition (human, AI, human with AI assistance, AI with human assistance, or no disclosure) and evaluate comments containing logical fallacies, comparing their judgments with those of LLMs (GPT-5.2, Gemini 2.5 Flash, Claude Sonnet 4.5), who were evaluated across the same source conditions. Human evaluators were significantly more susceptible to fallacies labeled as written by human or human with AI assistance and assigned higher trust and evaluation ratings in these conditions. LLM evaluations remained comparatively stable across source labels, though performance varied across models. Confidence levels were similarly high across conditions for both humans and LLMs, regardless of fallacy presence. Our findings indicate that source-label bias in reasoning evaluation is primarily a human vulnerability and highlight the potential of human-LLM collaboration in increasingly AI-mediated environments.

2605.18931 2026-06-04 stat.ML cs.AI cs.LG

Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

Abdelhakim Ziani, Andras Horvath, Paolo Ballarini

Affiliations: Université Paris Saclay, Lab. MICS, CentraleSupélec, Gif-sur-Yvette, France; Università di Torino, Torino, Italy

AI summary: To address the inability of Lipschitz generative models to produce heavy-tailed distributions, the paper replaces the Gaussian decoder with a Markov-chain-based Phase-Type distribution, substantially reducing tail error and extreme-quantile error.

Journal ref: 22nd European Performance Engineering Workshop (EPEW 2026), Jun 2025, Grimstad, Norway

English abstract

Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling. This behavior poses a fundamental challenge for modern deep generative models. Standard Variational Autoencoders (VAEs) employ Gaussian decoder likelihoods and Lipschitz-constrained neural networks, a combination that is structurally incapable of producing heavy-tailed outputs: the Gaussian tail decays exponentially, and Lipschitz continuity prevents the decoder from amplifying rare events from the latent space input to sufficiently overcome this decay. We provide both a theoretical characterization of this limitation and a controlled empirical demonstration using synthetic Pareto data across a grid of tail indices $\alpha \in \{2, 3, 5, 30\}$ and dimensions $d \in \{1, 5, 10\}$. As a solution, we replace the Gaussian decoder with a Phase-Type (PH) distribution based on Markov chains, while keeping the encoder, latent space, and training procedure identical. PH distributions allow for arbitrarily precise approximations of any positive-valued distribution, including heavy-tailed families. Experiments show that the PH-based model reduces tail Kolmogorov-Smirnov distance by up to 6x and extreme quantile error by up to 10x compared to the Gaussian baseline for heavy-tailed data. These results demonstrate that integrating Markov chain-based distributions into the decoder of a generative model constitutes a principled and practically effective solution to the heavy-tail generation problem.
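The structural limitation can be sketched with the standard Gaussian concentration inequality for Lipschitz functions. This is a textbook argument under simplifying assumptions (scalar output, standard normal latent, fixed decoder noise level $\sigma$, Lipschitz constant $L$), not necessarily the paper's own theoretical characterization; the symbols $g$, $L$, $\sigma$, and $x_m$ are introduced here for illustration only.

```latex
% Assumptions (not stated in the abstract): Z ~ N(0, I_k), decoder mean g is L-Lipschitz,
% and the Gaussian observation noise has a fixed standard deviation \sigma.
Let $X = g(Z) + \sigma\varepsilon$ with $Z \sim \mathcal{N}(0, I_k)$ and
$\varepsilon \sim \mathcal{N}(0, 1)$ independent. The map
$(z, e) \mapsto g(z) + \sigma e$ is $\sqrt{L^2 + \sigma^2}$-Lipschitz, so
\[
  \Pr\bigl(|X - \mathbb{E}X| \ge t\bigr)
  \le 2\exp\!\left(-\frac{t^2}{2\,(L^2 + \sigma^2)}\right), \qquad t > 0 .
\]
A Pareto variable $Y$ with tail index $\alpha$ and scale $x_m > 0$ instead satisfies
\[
  \Pr(Y > y) = \left(\frac{x_m}{y}\right)^{\alpha}, \qquad y \ge x_m ,
\]
so for any fixed $L$ and $\sigma$ the ratio
$\Pr(X > y) / \Pr(Y > y) \to 0$ as $y \to \infty$.
```

Under these assumptions, the generated tail eventually falls below any polynomial tail regardless of how the bounded-Lipschitz decoder is trained, which is consistent with the motivation for changing the decoder family rather than the network.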

2605.16331 2026-06-04 q-bio.BM cs.AI

Retrieval and competition: how a protein foundation model starts a protein

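Piotr Jedryszek, Oliver M. Crook

Affiliations: Department of Biology, University of Oxford, Oxford, UK; Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK; Department of Chemistry, University of Oxford, Oxford, UK

AI summary: By tracing the computational pathway through which ESM2-8M predicts the initiating methionine of a protein, the study finds that the model relies on retrieval of a positional prior rather than direct recognition, revealing a disconnect between model confidence and biological evidence.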

Comments updated figure 4

English abstract

Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine this distinction for a near-universal biological rule, that proteins begin with methionine, by tracing the computational pathway through which ESM2-8M produces this prediction. The model does not detect methionine at the masked position. Instead, it retrieves a methionine-favouring signal from a reference representation at the beginning-of-sequence token via a position-specific query assembled across layers, with the final output emerging through competition with context-dependent circuits. To understand how positional information reaches the readout, we introduce a norm-direction decomposition of attention scores within rotary frequency bands. Positional encoding operates through coupled changes in query norm and angular alignment distributed across these bands. On sequences whose true N-terminus is not methionine, where the biological question matters, the model predicts methionine anyway. This is not a correct prediction produced by an unexpected mechanism, but the output of a positional-prior retrieval circuit that matches the statistical average and fails where biology diverges from it. Distinguishing the two requires resolution at the level of individual circuits, frequency bands, and query composition, suggesting that mechanistic verification will be necessary, and challenging, for predictions where the biological stakes are higher. Even for the simplest biological rule, the model's prediction is mediated by a distributed computational circuit rather than direct recognition, suggesting that increasing task complexity will further obscure the relationship between model confidence and underlying biological evidence.

2605.16301 2026-06-04 cs.CY cs.AI cs.LG

Do LLMs Hold Their Values? MANTA: A Multi-Turn Adversarial Benchmark for Animal Welfare Reasoning

Isabella Luong, Joyee Chen, Arturs Kanepajs, Jasmine Brazilek, Sankalpa Ghose, David Williams-King, Linh Le, Allen Lu

Affiliations: SPAR; Compassion Aligned Machine Learning; NUS; Mila; ERA Cambridge

AI summary: Proposes the MANTA benchmark, which uses multi-turn adversarial conversations to evaluate the value stability and moral sensitivity of large language models in animal welfare reasoning, and finds rank changes and species-by-pressure interaction effects that single-turn benchmarks cannot capture.

English abstract

Evaluating animal welfare reasoning in LLMs remains an open challenge despite rapid deployment in consumer and professional contexts where welfare considerations appear implicitly in everyday queries. Existing benchmarks such as AnimalHarmBench evaluate this through single-turn, explicitly framed questions, measuring whether models avoid harmful content when directly asked. This approach overlooks two failure modes: alignment degradation under sustained adversarial pressure, and moral sensitivity (whether a model spontaneously surfaces welfare stakes in everyday queries). To fill this gap, we construct MANTA, a benchmark of 1,088 five-turn conversations progressing from an implicit Turn-1 scenario through an explicit welfare prompt to three adversarial pressure rounds drawn from a five-type taxonomy: Social, Cultural, Economic, Pragmatic, and Epistemic. We score conversations on two dimensions: Animal Welfare Value Stability (AWVS, primary) and Animal Welfare Moral Sensitivity (AWMS, diagnostic). We evaluate seven frontier models: Claude Opus 4.7, GPT-5.5, DeepSeek V4, Llama 3.3 70B, Mistral Small, Grok 4.3, and Gemini 3.1 Flash Lite. Multi-turn evaluation captures behavior single-turn benchmarks miss: 4 of 7 models change rank relative to Turn 1 scores, including Gemini Flash Lite, which drops from fifth on AWMS to last on AWVS. AWMS and AWVS are positively but imperfectly correlated, suggesting moral-recognition tests capture a stable but incomplete component of model behavior under pressure. MANTA also enables a species-by-pressure interaction matrix unavailable to prior benchmarks, showing welfare robustness depends jointly on the animal and pressure applied; companion animals score above wild animals, which score above farmed animals and invertebrates. We release the dataset, scripted pressure plans, judge prompts, and analysis code.

2605.15118 2026-06-04 cs.CR cs.CL

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

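Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

Affiliations: Palo Alto Networks

AI summary: Proposes a STRIDE-grounded 4×6 Target × Technique matrix for auditing the collective coverage of LLM attack benchmarks, and finds that existing benchmarks cover at most 25% of the threat surface, with naming fragmentation and evaluation gaps.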

English abstract

We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a $4 \times 6$ Target $\times$ Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy (401 data-populated and 106 threat-model-derived leaves) of inference-time attacks extracted from 932 arXiv security studies (2023-2026). The matrix enables benchmark-external validation, that is, auditing collective coverage rather than individual benchmark consistency. Applying it to six public benchmarks reveals that the three primary frameworks (HarmBench, InjecAgent, AgentDojo) occupy non-overlapping cells covering at most 25% of the matrix, while entire STRIDE threat categories (Service Disruption, Model Internals) lack any standardized evaluation, despite published attacks in these categories achieving $46\times$ token amplification and 96% attack success rates through mechanisms that no benchmark tests. The corpus of 2,521 unique attack groups further reveals pervasive naming fragmentation (up to 29 surface forms for a single attack) and heavy concentration in Safety & Alignment Bypass, structural properties invisible at smaller scale. The taxonomy, attack records, and coverage mappings are released as extensible artifacts; as new benchmarks emerge, they can be mapped onto the same matrix, enabling the community to track whether evaluation gaps are closing.

2605.03353 2026-06-04 cs.CR cs.AI

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

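Yipeng Ouyang, Yi Xiao, Yuhao Gu, Xianwei Zhang

Affiliations: Sun Yat-sen University

AI summary: To address the lack of portability and security of LLM agent skills across frameworks, the paper proposes the SkCC compiler, which uses a strongly typed intermediate representation, SkIR, to decouple semantics from format for cross-framework deployment, and a built-in static optimizer to enforce security constraints, substantially improving performance and reducing adaptation complexity.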

Comments Accepted by the Agent Skills Workshop at ACM CAIS 2026. 20 pages, 6 figures. Project Homepage: https://skcc.nexa-lang.com/ Code Repo: https://github.com/Nexa-Language/Skill-Compiler/

English abstract

LLM agents increasingly rely on reusable skills (e.g., SKILL markdown files) to execute complex tasks, yet these artifacts lack portability: agent frameworks are highly sensitive to prompt formatting, leading to a large performance variation for the same skill. Nevertheless, most skills are authored once as format-agnostic Markdown, necessitating costly per-framework rewrites and also leaving security largely unaddressed, with widespread vulnerabilities in practice. To address this, we present SkCC, a compiler for LLM agents that introduces classical compilation design into agent skill development. SkCC centers on SkIR, a strongly-typed intermediate representation that decouples skill semantics from framework-specific formatting, thus enabling portable deployment across agent frameworks. On top of this IR, a static Optimizer enforces security constraints, blocking vulnerabilities before deployment. Implemented as a four-phase pipeline, SkCC effectively reduces adaptation complexity from $O(m \times n)$ to $O(m + n)$ across $m$ skills and $n$ frameworks. Experiments on SkillsBench demonstrate that SkCC delivers consistent and substantial gains over original counterparts, with pass rate increases from 21.1% to 33.3% on Claude Code and from 35.1% to 48.7% on Kimi CLI. Further, the design achieves sub-10ms compilation latency, 94.8% proactive security trigger rate, and 10-46% runtime token savings across frameworks.
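The reduction from $O(m \times n)$ to $O(m + n)$ is the usual benefit of a shared intermediate representation, and a toy sketch may make it concrete. Nothing here reflects the real SkIR schema or SkCC's pipeline: `SkillIR`, `parse_skill`, the Markdown layout, and the targets `framework_a` and `framework_b` are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillIR:
    """Hypothetical, minimal stand-in for a typed intermediate representation."""
    name: str
    steps: tuple

def parse_skill(markdown: str) -> SkillIR:
    """Front end: parse a '# Title' line followed by '- step' bullets into the IR."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    name = lines[0].lstrip("# ").strip()
    steps = tuple(ln[2:].strip() for ln in lines[1:] if ln.startswith("- "))
    return SkillIR(name, steps)

# Back ends: one emitter per (hypothetical) target framework.
EMITTERS = {
    "framework_a": lambda ir: f"<skill name='{ir.name}'>"
    + "".join(f"<step>{s}</step>" for s in ir.steps)
    + "</skill>",
    "framework_b": lambda ir: f"{ir.name}: " + "; ".join(ir.steps),
}

def compile_skill(markdown: str, target: str) -> str:
    """Each skill is lowered to the IR once; any emitter can then render it."""
    return EMITTERS[target](parse_skill(markdown))

source = "# Deploy\n- build\n- push\n"
out_a = compile_skill(source, "framework_a")
out_b = compile_skill(source, "framework_b")
```

With $m$ skills and $n$ targets, this arrangement needs $m$ lowerings to the IR plus $n$ emitters, instead of $m \times n$ hand-written rewrites; adding a new framework means writing one more emitter rather than touching every skill.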

2401.07386 2026-06-04 cs.CY cs.AI cs.LG

How do machines learn? Evaluating the AIcon2abs method

Rubens Lacerda Queiroz, Cabral Lima, Fabio Ferrentini Sampaio, Priscila Machado Vieira Lima

Affiliations: PPGI, Federal University of Rio de Janeiro; Computer Science Institute, Federal University of Rio de Janeiro; Polytechnic University of Setúbal, Portugal; PESC/COPPE, Federal University of Rio de Janeiro; Tercio Pacitti Institute (NCE), Federal University of Rio de Janeiro

AI summary: Through a remote-course experiment, the study evaluates how effectively AIcon2abs, a method built on the WiSARD weightless neural network that needs no Internet connection, improves understanding of machine learning among members of the public of different ages; participants reported high satisfaction.

Comments textual review (spelling and grammar); reorganization of the elements of some figures; New references included

English abstract

This study expands on previous work that introduced the AIcon2abs method (AI from Concrete to Abstract: Demystifying Artificial Intelligence to the general public), an innovative approach designed to increase public understanding of machine learning (ML) across diverse age groups, including K-12 students, and aims to evaluate its effectiveness. AIcon2abs employs the WiSARD algorithm, a weightless neural network known for its simplicity and user accessibility. WiSARD does not require an Internet connection, making it well suited to non-technical users and resource-limited environments. The method lets participants intuitively visualize and interact with the internal processes of training and classification through engaging, hands-on activities, as if they were the algorithms themselves. WiSARD can also learn effectively from a minimal dataset, even from a single example. This feature enables users to observe how the machine improves its accuracy incrementally as it receives more data. Moreover, WiSARD generates mental images representing what it has learned, highlighting essential features of the classified data. AIcon2abs was tested through a six-hour remote course with 34 Brazilian participants, including 5 children, 5 adolescents, and 24 adults. Data analysis was conducted from two perspectives: a mixed-method pre-experiment (including hypothesis testing), and a qualitative phenomenological analysis. Nearly all participants rated AIcon2abs positively, with the results demonstrating a high degree of satisfaction in achieving the intended outcomes. This research was approved by the CEP-HUCFF-UFRJ Research Ethics Committee.

2602.14757 2026-06-04 math.NA cs.LG cs.NA

Solving Inverse Parametrized Problems via Finite Elements and Extreme Learning Networks

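Erik Burman, Mats G. Larson, Karl Larsson, Jonatan Vallin

Affiliations: KTH Royal Institute of Technology; Uppsala University

AI summary: Proposes an interpolation-based modeling framework that combines finite element discretization with extreme learning machine surrogates for parameter-dependent partial differential equations arising in control, inverse problems, and uncertainty quantification, and applies it to quantitative photoacoustic tomography, achieving computational savings while preserving accuracy.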

DOI: 10.1016/j.cma.2026.119077
Journal ref: Comput. Methods Appl. Mech. Engrg. 460 (2026), Paper No. 119077

English abstract

We develop an interpolation-based modeling framework for parameter-dependent partial differential equations arising in control, inverse problems, and uncertainty quantification. The solution is discretized in the physical domain using finite element methods, while the dependence on a finite-dimensional parameter is approximated separately. We establish existence, uniqueness, and regularity of the parametric solution and derive rigorous error estimates that explicitly quantify the interplay between spatial discretization and parameter approximation. In low-dimensional parameter spaces, classical interpolation schemes yield algebraic convergence rates based on Sobolev regularity in the parameter variable. In higher-dimensional parameter spaces, we replace classical interpolation by extreme learning machine (ELM) surrogates and obtain error bounds under explicit approximation and stability assumptions. The proposed framework is applied to inverse problems in quantitative photoacoustic tomography, where we derive potential and parameter reconstruction error estimates and demonstrate substantial computational savings compared to standard approaches, without sacrificing accuracy.

2604.02121 2026-06-04 physics.comp-ph cond-mat.stat-mech cs.LG physics.bio-ph physics.chem-ph

Gradient estimators for parameter inference in discrete stochastic kinetic models

Ludwig Burger, Annalena Kofler, Lukas Heinrich, Ulrich Gerland

Affiliations: Physics of Complex Biosystems, School of Natural Sciences, James-Franck-Straße 1, 85748 Garching, Germany; Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tübingen, Germany; Max Planck Institute for Gravitational Physics (Albert Einstein Institute), Am Mühlenberg 1, 14476 Potsdam, Germany; Data Science in Physics, School of Natural Sciences, James-Franck-Straße 1, 85748 Garching, Germany; Munich Center for Machine Learning (MCML), Munich, Germany

AI summary: Because the Gillespie stochastic simulation algorithm is not differentiable, the paper adapts three machine-learning gradient estimators (Gumbel-Softmax Straight-Through, Score Function, and Alternative Path) to compute parameter gradients, and validates them on systems with relaxation and oscillatory dynamics.

Comments 19 pages, 9 figures

English abstract

Stochastic kinetic models are ubiquitous in physics, yet inferring their parameters from experimental data remains challenging. For deterministic models, parameter inference often relies on gradients, which can be obtained efficiently through automatic differentiation (AD). However, AD cannot be applied directly to the Gillespie stochastic simulation algorithm (SSA), since sampling from a discrete set of reactions introduces non-differentiable operations. In this work, we adopt three gradient estimators from machine learning for the Gillespie SSA: the Gumbel-Softmax Straight-Through (GS-ST) estimator, the Score Function estimator, and the Alternative Path estimator. We use the estimators to evaluate gradients of steady-state and time-dependent observables, and compare their performance in representative biophysical systems with relaxation dynamics (bimolecular association) and oscillatory dynamics (repressilator). We find that the GS-ST estimator generally yields well-behaved gradient estimates, but exhibits diverging variance in challenging parameter regimes, which can cause parameter inference to fail. In these cases, other estimators provide more robust, lower variance gradients. Our results demonstrate that gradient-based parameter inference can be effectively combined with the Gillespie SSA, with different estimators offering complementary advantages.
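To see how a score function estimator sidesteps the non-differentiable sampling step, consider a deliberately tiny toy model that is not taken from the paper: a single reaction 0 -> X with constant propensity `k`, simulated on `[0, t_end]`. A trajectory with `n` firings has log-likelihood `n*log(k) - k*t_end`, so its score is `n/k - t_end`, and the gradient of the expected final count is estimated by averaging `n * score` over simulated trajectories. The function names and the choice of observable are assumptions for illustration.

```python
import random

def simulate_birth(k, t_end, rng):
    """Gillespie SSA for the single reaction 0 -> X with constant propensity k.
    Returns the number of firings in [0, t_end]."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(k)  # exponential waiting time with rate k
        if t > t_end:
            return n
        n += 1

def score_function_gradient(k, t_end, n_samples, seed=0):
    """Monte Carlo estimate of d/dk E[N(t_end)] using the score function identity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        n = simulate_birth(k, t_end, rng)
        score = n / k - t_end  # d/dk of the trajectory log-likelihood
        total += n * score     # observable times score
    return total / n_samples

estimate = score_function_gradient(k=2.0, t_end=1.0, n_samples=20000)
```

For this toy model the exact answer is known, since `E[N(t_end)] = k * t_end` gives a gradient of `t_end`, so the estimate can be checked against `1.0`. The same toy also hints at the variance issue the abstract mentions: the estimator is unbiased, but individual terms `n * score` fluctuate widely, and no variance reduction (such as a baseline) is applied here.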

2510.21459 2026-06-04 cs.CR cs.CL cs.LG

SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots

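Adetayo Adebimpe, Helmut Neukirchen, Thomas Welsh

Affiliations: University of California, Berkeley

AI summary: Proposes the SBASH framework, which builds honeypots from lightweight local LLMs and RAG, and uses several metrics to evaluate how RAG and prompt tuning affect the realism and response latency of LLM honeypots.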

Comments to be published in: The 3rd International Conference on Foundation and Large Language Models (FLLM2025), IEEE, 2025

DOI: 10.1109/FLLM67465.2025.11391242
Journal ref: 2025 3rd International Conference on Foundation and Large Language Models (FLLM), IEEE, 2025

English abstract

Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. Maximising attacker engagement is essential to their utility. However, research has highlighted that context-awareness, such as the ability to respond to new attack types, systems and attacker agents, is necessary to increase engagement. Large Language Models (LLMs) have been shown to be one approach to increasing context-awareness, but they suffer from several challenges, including the accuracy and timeliness of responses, high operational costs, and data-protection issues due to cloud deployment. We propose the System-Based Attention Shell Honeypot (SBASH) framework, which manages data-protection issues through the use of lightweight local LLMs. We investigate the use of Retrieval Augmented Generation (RAG) supported LLMs and non-RAG LLMs for Linux shell commands and evaluate them using several different metrics such as response time differences, realism from human testers, and similarity to a real system calculated with Levenshtein distance, SBert, and BertScore. We show that RAG improves accuracy for untuned models, while models tuned via a system prompt that tells the LLM to respond like a Linux system achieve, without RAG, an accuracy similar to that of untuned models with RAG, at slightly lower latency.
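Of the similarity metrics listed, Levenshtein distance is simple enough to sketch directly. The snippet below is a generic unit-cost implementation; the `similarity` normalization (one minus distance over the longer length) and the sample shell outputs are assumptions for illustration, since the abstract does not say how the paper normalizes or which outputs it compares.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

# Hypothetical outputs of `whoami` from a real system and from a honeypot.
real_output = "root"
honeypot_output = "root\n"
score = similarity(real_output, honeypot_output)
```

A character-level metric like this rewards byte-for-byte fidelity, so it penalizes harmless differences such as trailing whitespace, which may be one reason to pair it with embedding-based measures such as the SBert and BertScore metrics the abstract also names.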

2603.13384 2026-06-04 cs.SE cs.AI

VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

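Renwei Meng, Haoyi Wu, Jingming Wang

Affiliations: stu.ahu.edu.cn (Anhui University)

AI summary: Proposes VulnAgent-R2, a budget-aware multi-agent auditing framework that uses counterfactual evidence reweighting, build-aware verification-plan synthesis, and a cost-risk Pareto scheduler to improve F1 and AUROC in repository-level vulnerability detection while reducing online token consumption.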

Comments 13 pages, 4 figures

English abstract

Software vulnerabilities often depend on cross-file data flow, build options, framework conventions, and runtime guards, so isolated function classifiers produce fragile and poorly calibrated warnings. Repository-level LLM agents can gather richer evidence, but prior variants under-specify reproducibility, verifier behavior, baseline fairness, and statistical uncertainty. We present VulnAgent-R2, a budget-aware agentic auditing framework with three additional reusable modules: counterfactual evidence reweighting, build-aware verification-plan synthesis, and a cost-risk Pareto scheduler. The system combines graph triage, bounded context optimization, role-specialized agents, sceptic counter-evidence, selective dynamic verification, and calibrated fusion. On Devign, Big-Vul, DiverseVul, and PrimeVul, VulnAgent-R2 obtains 0.798/0.895, 0.739/0.871, 0.700/0.842, and 0.385/0.781 F1/AUROC, respectively. On JITVul it reaches 0.606 F1, 0.529 Top-1, and 0.742 Top-3 localization, while reducing online tokens by 38.3% over always-full multi-agent execution. Online time includes retrieval, LLM calls, CER scoring, verifier planning, compilation, and test execution, but excludes one-time shared indexing. Bootstrap tests show the PrimeVul gain over VulnAgent-X is +0.038 F1, 95% CI [0.020, 0.055], Holm-adjusted $p=0.009$. Treating vulnerability detection as calibrated evidence accumulation improves detection, localization, auditability, and cost control under the evaluated protocol, while remaining a prioritization aid rather than a replacement for manual review. Code is available at https://github.com/renweimeng/Vlun-Agent-X.

2602.23312 2026-06-04 cs.HC cs.AI cs.LG cs.RO cs.SY eess.SY

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr

Affiliations: University of Sao Paulo; Federal University of Lavras; Faculdade Israelita de Ensino e Pesquisa Albert Einstein

AI summary: Fine-tunes a small language model (Qwen2.5-0.5B) for role classification in leader-follower interaction; zero-shot fine-tuning reaches 86.66% accuracy with latency as low as 22.2 ms per sample, but performance degrades in one-shot mode as context length grows.

English abstract

Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.

2603.10289 2026-06-04 quant-ph cs.AI cs.LG

Quantum entanglement provides a competitive advantage in adversarial games

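Peiyong Wang, Kieran Hymas, James Quach

Affiliations: CSIRO

AI summary: Through experiments with a quantum-classical hybrid agent on Pong, an adversarial Markov game, the study finds that entangled quantum circuits outperform separable circuits in feature extraction for competitive reinforcement learning, suggesting that quantum entanglement can serve as a functional resource for representation learning.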

Comments 22 pages, 5 figures

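AI-translated abstract

Whether quantum resources can provide an advantage in fully classical competitive environments remains an open question. Competitive zero-sum reinforcement learning is especially challenging because success requires modelling the dynamic interactions between opposing agents rather than static state-action mappings. Here, we carry out a controlled study that isolates the role of quantum entanglement in a quantum-classical hybrid agent trained on Pong, a competitive Markov game. An 8-qubit parameterised quantum circuit acts as the feature extractor within a proximal policy optimisation framework, allowing direct comparison between separable circuits and architectures containing fixed (CZ) or trainable (IsingZZ) entangling gates. With comparable parameter counts, entangled circuits consistently outperform separable ones and, in the low-capacity regime, match or exceed classical multilayer perceptron baselines. Representation similarity analysis further shows that entangled circuits learn structurally different features, consistent with improved modelling of interacting state variables. These findings establish entanglement as a functional resource for representation learning in competitive reinforcement learning.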

English abstract

Whether uniquely quantum resources confer advantages in fully classical, competitive environments remains an open question. Competitive zero-sum reinforcement learning is particularly challenging, as success requires modelling dynamic interactions between opposing agents rather than static state-action mappings. Here, we conduct a controlled study isolating the role of quantum entanglement in a quantum-classical hybrid agent trained on Pong, a competitive Markov game. An 8-qubit parameterised quantum circuit serves as a feature extractor within a proximal policy optimisation framework, allowing direct comparison between separable circuits and architectures incorporating fixed (CZ) or trainable (IsingZZ) entangling gates. Entangled circuits consistently outperform separable counterparts with comparable parameter counts and, in low-capacity regimes, match or exceed classical multilayer perceptron baselines. Representation similarity analysis further shows that entangled circuits learn structurally distinct features, consistent with improved modelling of interacting state variables. These findings establish entanglement as a functional resource for representation learning in competitive reinforcement learning.
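
The difference between a separable circuit and one with a fixed CZ entangling gate can be illustrated on two qubits. The sketch below is a toy example only, not the paper's 8-qubit circuit or its training setup; the helper names are hypothetical. It uses the standard fact that a pure two-qubit state with amplitudes (a00, a01, a10, a11) has concurrence 2·|a00·a11 − a01·a10|, which is zero exactly for product states.

```python
def hadamard_on_both():
    """Amplitudes (a00, a01, a10, a11) after applying H to each qubit of |00>."""
    return [0.5, 0.5, 0.5, 0.5]


def apply_cz(state):
    """Controlled-Z: flips the sign of the |11> amplitude and nothing else."""
    a00, a01, a10, a11 = state
    return [a00, a01, a10, -a11]


def concurrence(state):
    """Concurrence of a pure two-qubit state.

    0 means the state is a product (separable) state,
    1 means it is maximally entangled.
    """
    a00, a01, a10, a11 = state
    return 2 * abs(a00 * a11 - a01 * a10)


separable = hadamard_on_both()      # single-qubit gates only: no entanglement
entangled = apply_cz(separable)     # adding one CZ gate entangles the pair

print(concurrence(separable))  # 0.0
print(concurrence(entangled))  # 1.0
```

Single-qubit gates alone can never raise the concurrence above zero, which is why a circuit without entangling gates is called separable.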

2603.10044 2026-06-04 cs.SE cs.AI cs.CL cs.LG

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety

David Gringras

Affiliations: Harvard University; MIT

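AI summary: Using 62,808 blinded, pre-registered evaluations, this study tests the safety of six frontier models under four deployment configurations. It finds that scaffold architecture has little effect on safety, whereas format conversion (for example, multiple-choice versus open-ended questions) can shift measurements by 5-20 percentage points, and that model-by-scaffold heterogeneity is substantial, calling into question the usefulness of a single composite safety score.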

Comments 74 pages including appendices. 6 frontier models, 62,808 primary observations (~89k total). Pre-registered: OSF DOI 10.17605/OSF.IO/CJW92. Code and data: https://github.com/davidgringras/safety-under-scaffolding

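AI-translated abstract

A safety score obtained on a benchmark does not necessarily predict how the same model behaves inside an untested agentic scaffold. We ran six frontier models through four deployment configurations (direct API, ReAct, multi-agent critic, map-reduce delegation): N = 62,808 blinded, pre-registered, equivalence-tested evaluations across four safety benchmarks (BBQ, TruthfulQA, XSTest/OR-Bench, sycophancy), plus three supporting analyses. The ReAct and multi-agent scaffolds stay within the pre-registered equivalence margin of ±2 percentage points; map-reduce delegation lowers measured safety (NNH = 14), although this loss is largely a measurement artifact: on identical items, multiple-choice versus open-ended phrasing changes the measured safety rate by 5-20 percentage points, and the decomposition step silently removes the multiple-choice options. About 40-89% of each model's map-reduce loss is attributable to this format conversion rather than to disrupted reasoning, and an option-preserving variant recovers most of the loss. Pooled effects also hide marked heterogeneity between models and scaffolds: under map-reduce, on identical items, Opus loses 16.8 percentage points while Llama 4 gains 18.8 percentage points. Structurally, scaffold architecture explains only 0.4% of outcome variance (benchmark choice explains 45 times more), and the generalizability coefficient is G = 0.000 (bootstrap 95% CI [0.000, 0.752]). An interval this wide is by itself enough to undermine the usefulness of any single composite safety score as a deployment criterion. These are the "easy cases"; consequential properties such as scheming and CBRN uplift have no obvious reason to be any less sensitive to format or scaffold. Code, data, and prompts are released as ScaffoldSafety.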

English abstract

A safety score earned on a benchmark need not predict how the same model behaves once it is wrapped in an agentic scaffold the benchmark never tested. We ran six frontier models through four deployment configurations (direct API, ReAct, multi-agent critic, map-reduce delegation): N = 62,808 blinded, pre-registered, equivalence-tested evaluations across four safety benchmarks (BBQ, TruthfulQA, XSTest/OR-Bench, sycophancy), plus three supporting analyses. ReAct and multi-agent scaffolds stay within a pre-registered +/-2 pp equivalence margin; map-reduce delegation degrades measured safety (NNH = 14), though that loss is largely a measurement artifact: on identical items, multiple-choice versus open-ended phrasing shifts the measured safety rate by 5-20 pp, and decomposition silently strips the multiple-choice options. Roughly 40-89% of the per-model map-reduce loss is this format conversion rather than reasoning disruption, and an option-preserving variant recovers most of it. Pooled effects also mask sharp model-by-scaffold heterogeneity: under map-reduce, on identical items, Opus loses 16.8 pp while Llama 4 gains 18.8 pp. Structurally, scaffold architecture explains only 0.4% of outcome variance (benchmark choice explains 45x more), and the generalizability coefficient is G = 0.000 (bootstrap 95% CI [0.000, 0.752]). An interval that wide is enough on its own to undermine the utility of any single composite safety number as a deployment criterion. These are the "easy cases"; consequential properties like scheming and CBRN uplift have no obvious reason to be less format- or scaffold-sensitive. Code, data, and prompts are released as ScaffoldSafety.
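
For readers unfamiliar with NNH (number needed to harm), a short worked calculation may help. It assumes the abstract uses the standard epidemiological definition, NNH = 1 / absolute risk increase; the abstract itself does not spell this out, and the reported value may be rounded, so the result is only approximate. The function names are illustrative.

```python
def absolute_risk_increase(nnh):
    """Invert the standard definition NNH = 1 / ARI (ARI as a fraction)."""
    return 1.0 / nnh


def number_needed_to_harm(ari):
    """Standard definition: how many exposures produce one extra bad outcome."""
    return 1.0 / ari


# Assumption: NNH = 14 is taken from the abstract at face value.
ari = absolute_risk_increase(14)
print(round(100 * ari, 1))  # 7.1
```

Under that assumption, NNH = 14 corresponds to roughly one additional unsafe response per 14 items routed through map-reduce, or about 7 percentage points.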

2603.04444 2026-06-04 cs.NI cs.AI

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

Xunzhuo Liu, Huamin Chen, Samzong Lu, Yossi Ovadia, Guohong Wen, Hao Wu, Zhengda Tan, Jintao Zhang, Senan Zedan, Yehudit Kerido, Liav Weiss, Haichen Zhang, Bishen Yu, Asaad Balum, Noa Limoy, Abdallah Samara, Baofa Fan, Brent Salisbury, Ryan Cook, Zhijie Wang, Qiping Pan, Rehan Khan, Avishek Goswami, Houston H. Zhang, Shuyi Wang, Ziang Tang, Fang Han, Zohaib Hassan, Jianqiao Zheng, Avinash Changrani, Xue, Liu, Bowei He

Affiliations: MBZUAI (Mohamed bin Zayed University of Artificial Intelligence)

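AI summary: Proposes the vLLM Semantic Router framework, which uses composable signal orchestration (13 heterogeneous signal types and Boolean decision rules) to route requests intelligently in Mixture-of-Modality model deployments, supporting differentiated policy configurations for different scenarios.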

Comments Technical Report

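AI-translated abstract

As large language models diversify in modality, capability, and cost profile, intelligent request routing (choosing the appropriate model for each query at inference time) has become a key systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The architecture follows two complementary Shannon-inspired views. In the information-theoretic regime, signal extraction lowers the entropy of "which model?" by distilling routing-relevant information from the raw query. In the Boolean-algebraic regime, the decision engine composes signal conditions into functionally complete routing policies. The core innovation is composable signal orchestration: 13 heterogeneous signal types (ranging from sub-millisecond heuristics to neural classifiers for semantics, safety, and modality) are combined through configurable Boolean decision rules into deployment-specific routing policies, so that fundamentally different scenarios (multi-cloud enterprise, privacy-regulated, cost-optimized) are expressed as different configurations of the same architecture. Matched decisions drive semantic model routing through 13 selection algorithms, while per-decision plugin chains enforce safety constraints, including a three-stage HaluGate hallucination detection pipeline and a lightweight episodic memory system with ReflectionGate for personalized multi-turn context. A typed neural-symbolic DSL specifies these routing policies and compiles them to multiple deployment targets, enabling configuration-first adaptation without code changes. Together, these components show that composable signal orchestration lets a single framework serve many deployment scenarios with differentiated cost, privacy, and safety policies.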

English abstract

As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing (selecting the right model for each query at inference time) has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The architecture follows two complementary Shannon-inspired views. In the information-theoretic regime, signal extraction reduces the entropy of "which model?" by distilling routing-relevant information from raw queries. In the Boolean-algebraic regime, the decision engine composes functionally complete routing policies from signal conditions. The central innovation is composable signal orchestration: thirteen heterogeneous signal types, spanning sub-millisecond heuristics and neural classifiers for semantics, safety, and modality, are composed through configurable Boolean decision rules into deployment-specific routing policies, so that fundamentally different scenarios (multi-cloud enterprise, privacy-regulated, cost-optimized) are expressed as different configurations over the same architecture. Matched decisions drive semantic model routing via thirteen selection algorithms, while per-decision plugin chains enforce safety constraints including a three-stage HaluGate hallucination detection pipeline and a lightweight episodic memory system with ReflectionGate for personalized multi-turn context. A typed neural-symbolic DSL specifies these routing policies and compiles them to multiple deployment targets, enabling configuration-first adaptation without code changes. Together, these components show that composable signal orchestration enables a single framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.
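
The idea of composing Boolean signal conditions into a routing policy can be sketched in a few lines. This is an illustrative sketch only and not the vLLM Semantic Router API: the signal names, model names, `Rule` class, and first-match priority order are all assumptions made for the example. AND, OR, and NOT together are functionally complete, which is the property the abstract refers to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signals = Dict[str, bool]               # extracted signals, e.g. {"is_code": True}
Condition = Callable[[Signals], bool]   # a Boolean predicate over signals


def sig(name: str) -> Condition:
    """Atomic condition: true when the named signal fired (missing = False)."""
    return lambda s: s.get(name, False)


def AND(*conds: Condition) -> Condition:
    return lambda s: all(c(s) for c in conds)


def OR(*conds: Condition) -> Condition:
    return lambda s: any(c(s) for c in conds)


def NOT(cond: Condition) -> Condition:
    return lambda s: not cond(s)


@dataclass
class Rule:
    condition: Condition
    model: str


def route(signals: Signals, rules: List[Rule], default: str) -> str:
    """Return the model of the first matching rule (assumed priority order)."""
    for rule in rules:
        if rule.condition(signals):
            return rule.model
    return default


# Hypothetical policy: privacy first, then capability, then cost.
rules = [
    Rule(sig("contains_pii"), "on-prem-model"),
    Rule(AND(sig("is_code"), NOT(sig("is_short"))), "large-code-model"),
    Rule(OR(sig("is_short"), sig("is_chitchat")), "small-cheap-model"),
]

print(route({"is_code": True}, rules, default="general-model"))  # large-code-model
```

Swapping the `rules` list for a different one changes the deployment policy without touching `route`, which mirrors the configuration-first idea described above.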

2509.02655 2026-06-04 cs.CY cs.AI

BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

Roland Pihlakas, Sruthi Susan Kuriakose

Affiliations: Independent researcher; Three Laws research collaboration; Rakvere, Estonia

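AI summary: By testing LLMs in long-horizon control environments, this study finds that although LLMs understand the objectives, in multi-objective scenarios they systematically drift toward single-objective, unbounded optimisation, exhibiting runaway-optimiser-like failure modes.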

Comments 27 pages, 7 figures, 7 tables

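AI-translated abstract

Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utility maximisers that over-optimise a proxy objective at the expense of everything else (for example, the "paperclip maximiser" and specification gaming). LLM-based systems are often considered safer because they operate as next-token predictors rather than persistent optimisers. We test this assumption empirically by placing LLMs in simple, long-horizon control-style environments that require maintaining state or balancing objectives over time: single- and multi-objective homeostasis, balancing unbounded objectives with diminishing returns, and sustainability of a renewable resource. We find that although LLMs often behave appropriately for many steps and clearly understand the stated objectives, they frequently lose context in structured ways and drift into runaway behaviour: ignoring homeostatic targets and collapsing from multi-objective trade-offs into single-objective maximisation, thereby failing to respect concave utility structures. These failures emerge reliably after an initial period of competent behaviour and show characteristic patterns (including self-imitative oscillations, unbounded maximisation, and reversion to single-objective optimisation), even though the context window is far from full at that point. The problem is not simply that LLMs lose context and become incoherent. Although LLMs appear multi-objective and bounded on the surface, under sustained interaction involving multiple objectives their behaviour is systematically biased towards that of single-objective, unbounded, poorly aligned optimisers. We hypothesise a token-level pattern reinforcement attractor: LLMs may increasingly derive their actions from the token patterns of their recent action history rather than from the original instructions. Why this occurs only in multi-objective settings remains an open question.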

English abstract

Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utility maximisers that over-optimise a proxy objective (e.g., "paperclip maximiser", specification gaming) at the expense of everything else. LLM-based systems are often assumed to be safer because they function as next-token predictors rather than persistent optimisers. We empirically test this assumption by placing LLMs in simple, long-horizon control-style environments that require maintaining state or balancing objectives over time: single- and multi-objective homeostasis, balancing unbounded objectives with diminishing returns, and sustainability of a renewable resource. We find that, although LLMs frequently behave appropriately for many steps and clearly understand the stated objectives, they often lose context in structured ways and drift into runaway behaviours: ignoring homeostatic targets and collapsing from multi-objective trade-offs into single-objective maximisation, thus failing to respect concave utility structures. These failures emerge reliably after initial periods of competent behaviour and exhibit characteristic patterns (including self-imitative oscillations, unbounded maximisation, and reverting to single-objective optimisation), even though the context window is far from full at that point. The problem is not that the LLMs just lose context and become incoherent. Although LLMs appear multi-objective and bounded on the surface, their behaviour under sustained interaction involving multiple objectives is systematically biased towards acting like single-objective, unbounded, poorly aligned optimisers. We hypothesise a token-level pattern reinforcement attractor: LLMs may increasingly derive actions from the token patterns of their recent action history rather than from the original instructions. Why this happens only in multi-objective settings remains an open question.

2602.19799 2026-06-04 stat.ML cs.LG math.OC

Path-conditioned training: a principled way to rescale ReLU neural networks

Arthur Lebeurrier, Titouan Vayer, Rémi Gribonval

Affiliations: Université de Lyon; CNRS

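AI summary: This paper proposes a geometric criterion, based on the path-lifting framework, for rescaling ReLU network parameters; minimizing the criterion aligns a kernel and thereby accelerates training.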

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea, PMLR 306 (2026)

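AI-translated abstract

Despite recent algorithmic progress, we still lack principled ways to exploit the well-documented rescaling symmetries in the parameters of ReLU neural networks. Although two properly rescaled sets of weights implement the same function, their training dynamics can differ dramatically. To offer a new perspective on this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion for rescaling neural network parameters, whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly affect the output of the proposed method. Numerical experiments illustrate its potential to speed up training.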

English abstract

Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled weights implement the same function, the training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion to rescale neural network parameters, whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate its potential to speed up training.
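
The rescaling symmetry mentioned in the abstract can be checked numerically. The sketch below illustrates only the symmetry itself, not the paper's path-conditioning criterion or algorithm; the network shape, weights, and function names are made up for the example. It relies on the positive homogeneity of ReLU: relu(λz) = λ·relu(z) for λ > 0, so scaling a hidden neuron's incoming weights and bias by λ and its outgoing weight by 1/λ leaves the output unchanged.

```python
import math


def relu(z):
    return max(0.0, z)


def forward(x, w_in, b, w_out):
    """One-hidden-layer ReLU network with a scalar output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + bias)
              for row, bias in zip(w_in, b)]
    return sum(w * h for w, h in zip(w_out, hidden))


def rescale(w_in, b, w_out, lambdas):
    """Scale neuron j's incoming weights and bias by lambdas[j] > 0
    and its outgoing weight by 1 / lambdas[j]."""
    new_w_in = [[lam * w for w in row] for row, lam in zip(w_in, lambdas)]
    new_b = [lam * bias for bias, lam in zip(b, lambdas)]
    new_w_out = [w / lam for w, lam in zip(w_out, lambdas)]
    return new_w_in, new_b, new_w_out


# Arbitrary illustrative values.
x = [1.0, -2.0]
w_in = [[0.5, -1.0], [2.0, 1.0]]
b = [0.1, -0.3]
w_out = [1.5, -0.7]
lambdas = [4.0, 0.25]

original = forward(x, w_in, b, w_out)
rescaled = forward(x, *rescale(w_in, b, w_out, lambdas))
print(math.isclose(original, rescaled))  # True
```

The two parameter sets compute the same function but have different weight magnitudes, so gradient-based updates on them generally differ; choosing among such equivalent parameterizations is the question the paper addresses.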
