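arXivDaily daily academic digest: synchronizes the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.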

2605.25058 2026-05-26 cs.HC cs.AI

Intent Signal Theory: A Computational Framework for Intent-State Control in Human-AI Interaction

Gang Peng

Affiliations: Huizhou Lateni AI Technology Co., Ltd.; Huizhou University

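AI summary: Proposes Intent Signal Theory (IST), which distinguishes four objects (latent source intent, observable intent proxy, encoded carrier, and model output), formalizes an intent-loss theorem, validates predictions such as the structure-fidelity split with experiments across six large language models, three languages, and three task domains, and redefines prompt engineering as intent-protocol design.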

Comments 10 pages, 2 figures. Theoretical framework paper grounded in four companion empirical studies. Data and code repository: https://github.com/PGlarry/prompt-protocol-specification

English abstract

Current AI interaction models treat the prompt as the primary object of exchange, omitting a critical layer: the user's latent source intent, the goal state preceding and motivating the prompt. Here we introduce Intent Signal Theory (IST), a computational framework that formalises this missing intent layer. IST distinguishes four objects routinely conflated: latent source intent (I*), observable intent proxy (I-hat), encoded carrier (P), and model output (O). It formalises dimensional weights, encoding masks, structural and fidelity recovery scores, and public-private intent decomposition. The Theorem of Irreversible Intent Loss establishes that private intent absent from the carrier cannot be recovered beyond generic substitution. Evidence from four companion studies spanning six LLMs, three languages and three task domains shows structural-fidelity splits, human-validated metric dissociation, and weight-tolerance plateaus consistent with IST's predictions. IST reframes prompt engineering as intent-protocol design and identifies a computational layer that current AI systems lack.
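
The structure-fidelity split and the intent-loss theorem can be made concrete with a toy model. The sketch below is hypothetical: it assumes intent is a dictionary of named, weighted dimensions, the encoding mask is a 0/1 flag per dimension, and the model fills any dimension absent from the carrier with a generic value. None of these names or scoring rules come from the paper.

```python
# Hypothetical toy model of IST's objects. The dict layout, the 0/1 mask and
# both scoring rules are illustrative assumptions, not the paper's definitions.
GENERIC = "<generic>"


def encode(source_intent, mask):
    """Carrier P: keep only the intent dimensions whose mask bit is 1."""
    return {d: v for d, v in source_intent.items() if mask.get(d, 0)}


def respond(carrier, dimensions):
    """Output O: address every dimension, substituting a generic value
    for anything the carrier did not specify."""
    return {d: carrier.get(d, GENERIC) for d in dimensions}


def recovery_scores(source_intent, output, weights):
    total = sum(weights.values())
    # Structural recovery: weighted share of dimensions the output addresses at all.
    structural = sum(w for d, w in weights.items() if d in output) / total
    # Fidelity recovery: weighted share whose value matches the latent source intent I*.
    fidelity = sum(w for d, w in weights.items()
                   if output.get(d) == source_intent[d]) / total
    return structural, fidelity


intent = {"topic": "grant report", "tone": "candid", "audience": "my board"}
weights = {"topic": 1.0, "tone": 1.0, "audience": 2.0}
mask = {"topic": 1, "tone": 1, "audience": 0}  # the private audience is never written down
output = respond(encode(intent, mask), intent.keys())
print(recovery_scores(intent, output, weights))  # (1.0, 0.5)
```

In this toy, the output addresses every dimension (structural score 1.0), yet the private audience dimension, absent from the carrier, can only be filled generically, so fidelity falls to 0.5.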

2605.25057 2026-05-26 math.NA cs.LG cs.NA

Random Neural Network Expressivity for Non-Linear Partial Differential Equations

Muhammed Ali Mehmood, Lukas Gonon

Affiliations: Department of Mathematics; Imperial College London; School of Computer Science; University of St. Gallen

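AI summary: Studies how well neural networks with randomly generated hidden weights (RaNNs) approximate solutions of nonlinear partial differential equations, derives error bounds with a dimension-free approximation rate of 1/2, and applies them to the porous medium equation and the compressible Navier-Stokes equations.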

English abstract

Neural networks with randomly generated hidden weights (RaNNs) have been extensively studied, both as a standalone learning method and as an initialization for fully trainable deep learning methods. In this work, we study RaNN expressivity for learning solutions to non-linear partial differential equations (PDEs). Despite their widespread use in practical applications, a rigorous theoretical understanding of the approximation properties of RaNNs in this context remains limited. Here, we derive error bounds for RaNN approximations to time-dependent Sobolev functions and obtain a dimension-free approximation rate $\frac{1}{2}$ for sufficiently regular functions. We apply our results to two important classes of non-linear PDEs: Porous Medium Equations and Compressible Navier-Stokes Equations, showing that RaNNs are capable of efficiently approximating solutions to these complex, non-linear PDEs. Our theoretical analysis is supported by numerical experiments, showing that the obtained convergence rates extend beyond the considered setting.
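
A RaNN in its simplest form samples the hidden layer once, freezes it, and trains only the linear read-out. The sketch below shows that construction for a one-dimensional target in pure Python. The Gaussian/uniform sampling distributions, the ridge penalty, the target $\sin(\pi x)$ and the feature count are arbitrary illustrative choices, and the example does not reproduce the paper's Sobolev error bounds or its PDE applications.

```python
import math
import random


def random_relu_features(n_features, rng):
    # Hidden weights and biases are sampled once and then frozen (never trained).
    return [(rng.gauss(0.0, 3.0), rng.uniform(-3.0, 3.0)) for _ in range(n_features)]


def feature_row(x, hidden):
    # ReLU hidden layer plus a constant column for the output bias.
    return [max(0.0, a * x + b) for a, b in hidden] + [1.0]


def solve(A, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_readout(xs, ys, hidden, ridge=1e-3):
    """Train only the linear output layer: (Phi^T Phi + ridge*I) c = Phi^T y."""
    Phi = [feature_row(x, hidden) for x in xs]
    m = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, rhs)


def predict(x, hidden, coef):
    return sum(c * f for c, f in zip(coef, feature_row(x, hidden)))


rng = random.Random(0)
xs = [-1.0 + 2.0 * i / 199 for i in range(200)]
ys = [math.sin(math.pi * x) for x in xs]
hidden = random_relu_features(60, rng)
coef = fit_readout(xs, ys, hidden)
mse = sum((predict(x, hidden, coef) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"train MSE with 60 frozen random features: {mse:.2e}")
```

Because the hidden layer is fixed, training is a convex least-squares problem; this is what makes RaNNs cheap and also what makes their approximation power a question worth bounding.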

2605.25050 2026-05-26 stat.AP cs.LG q-bio.QM stat.ML

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely, Pierre Milpied, Julien Mazieres, Maurice Perol, Eric Vivier, Laurent Greillier, Fabrice Barlesi, Sebastien Benzekry

Affiliations: Inria – Inserm team COMPO, COMPutational pharmacology and clinical Oncology, Centre Inria Sophia Antipolis - Méditerranée, Centre de Recherches en Cancérologie de Marseille, Inserm U1068, CNRS UMR7258, Institut Paoli-Calmettes, Pharmacy faculty, Aix-Marseille University; Veracyte SAS, Marseille, France; Assistance Publique-Hôpitaux de Marseille (APHM), Marseille, France; Toulouse University Hospital, Toulouse, France; Centre Leon Berard, Lyon, France; Innate Pharma, Marseille, France; Université Paris Saclay, Gustave Roussy, Inserm, Prédicteurs Moléculaires et nouvelles cibles en oncologie (U981), F-94805, Villejuif, France

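AI summary: Proposes MSB, a multimodal stacking framework that models each modality's features independently and aggregates their predictions with a cross-validated stacking meta-learner to handle high dimensionality and blockwise missingness; in the PIONeeR study it predicts progression-free survival under immunotherapy in non-small cell lung cancer and outperforms baseline algorithms.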

English abstract

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion. We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that independently models modality-specific features before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms. Improvements varied by baseline strength: linear models showed a 15.9% increase (p<0.001 for the Wilcoxon signed-rank test), random survival forests gained 5.4% (p=0.002), and gradient boosting methods improved by 2.1% (p=0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference in 5-fold cross-validation repeated 3 times: 0.055 vs 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as primary predictive drivers. Missing block indicators showed negligible importance, suggesting the model learned from biomarker values rather than data availability patterns. MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation. Implementation is available at https://github.com/MohamedBoussena/MSB under Inria license.
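
The C-index used to score MSB is Harrell's concordance index, sketched below alongside a deliberately simple fusion rule. The fusion function is an assumption for illustration: it averages per-modality risk scores over the blocks a patient actually has, standing in for MSB's cross-validated stacking meta-learner, whose exact form the abstract does not give. The patient data are invented.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index: higher predicted risk should mean an earlier event.
    A pair (i, j) is comparable when subject i has an observed event before time j."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable


def late_fusion(modality_risks, weights):
    """Hypothetical stand-in for a stacking meta-learner: a weighted mean over the
    modalities each patient actually has (None marks a missing block)."""
    fused = []
    for row in modality_risks:
        available = [(w, r) for w, r in zip(weights, row) if r is not None]
        fused.append(sum(w * r for w, r in available) / sum(w for w, _ in available))
    return fused


times = [2, 5, 7, 9]
events = [1, 1, 0, 1]  # patient 2 is censored
per_modality = [[0.9, None], [0.6, 0.7], [None, 0.4], [0.1, 0.2]]
risk = late_fusion(per_modality, weights=[1.0, 1.0])
print(harrell_c_index(times, events, risk))  # 1.0
```

Skipping a missing block rather than imputing it is the key property a late-fusion design buys: no patient has to be excluded because one modality is absent.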

2605.24999 2026-05-26 q-bio.NC cs.AI cs.MA

Interpretation, Learning, and Empathy as One Constraint: A Residual-Adequacy Architecture with Accountable Abstention

Chainarong Amornbunchornvej

Affiliations: National Electronics and Computer Technology Center (NECTEC)

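AI summary: Proposes a cognitive architecture that handles interpretation, learning, and empathy through a single residual quantity, producing typed, witnessed abstention when a situation exceeds its representational capacity.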

Comments First draft for journal submission. The code is at https://github.com/DarkEyes/RC-Arch

English abstract

An agent must act on the situation before it, learn what it cannot yet represent, and model other agents well enough to coordinate. These faculties are usually realized by separate mechanisms, yet they share a failure mode: the situation can exceed what the agent can currently represent, and the honest response is then a principled refusal that says what was missing. We develop a small cognitive architecture in which these limits arise from a single quantity. An Interpretation-Decision Unit (IDU) interprets a content vector through a family of regimes - local representational frames with private bases - and decides which actions it licenses; a scalar residual of the content against the active regimes' representational scope drives the unit. Low residual with a clean licensing emits an action; otherwise the unit re-interprets, attempts a description-length-justified expansion, or halts with a typed, witnessed terminal. We prove the unit is total and deterministic: for any content and fixed configuration it halts in finitely many bounded-cost steps with a unique terminal witness, so abstention carries its cause by construction. By binding the architecture's open parameters without changing its mechanics, the same residual-against-scope constraint recovers three documented phenomena at three scopes: the typology of not-knowing (typed abstention); a forced misunderstanding between agents, localized to one shared concept and invisible to the agent committing it (bounded empathy); and prerequisite dependence in learning derived from a bounded focus window rather than posited (developmental prerequisites). Each instantiation is worked out for a natural and an artificial agent and states a falsifiable prediction, so one constraint can model limits in both human and machine cognition. The account contributes a unification and a notion of accountable abstention, typed and witnessed by construction.
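
The act-or-abstain behavior driven by a residual can be sketched with orthonormal bases standing in for regimes. This is a minimal, hypothetical reading of the abstract: the residual is taken as the Euclidean norm of the content left over after projecting onto a regime's basis, the threshold is arbitrary, and the re-interpretation and description-length-justified expansion steps are omitted, so this unit either acts or halts.

```python
# Hypothetical sketch of an IDU-like decision; re-interpretation and expansion are omitted.
import math
from dataclasses import dataclass


def residual(content, basis):
    """Norm of the part of `content` outside span(basis); `basis` is orthonormal."""
    r = list(content)
    for b in basis:
        coeff = sum(c * bi for c, bi in zip(content, b))
        r = [ri - coeff * bi for ri, bi in zip(r, b)]
    return math.sqrt(sum(ri * ri for ri in r))


@dataclass(frozen=True)
class Act:
    regime: str
    residual: float


@dataclass(frozen=True)
class Abstain:
    kind: str       # the type of the terminal
    witness: float  # smallest residual seen: how far the content exceeded every scope


def decide(content, regimes, threshold=0.1):
    """Act through the best-fitting regime, or halt with a typed, witnessed abstention."""
    scores = {name: residual(content, basis) for name, basis in regimes.items()}
    best = min(sorted(scores), key=scores.get)  # sorted() makes tie-breaking deterministic
    if scores[best] <= threshold:
        return Act(best, scores[best])
    return Abstain("out_of_scope", scores[best])


regimes = {"plane": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], "axis": [(0.0, 0.0, 1.0)]}
print(decide((0.6, 0.8, 0.0), regimes))  # acts through "plane"
print(decide((0.6, 0.0, 0.8), regimes))  # abstains: the best residual is about 0.6
```

Because the decision is a pure function of the content and the fixed regimes, it is deterministic, and the abstention value itself records why the unit stopped.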

2605.24992 2026-05-26 cs.NI cs.AI cs.LG cs.MA

Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

Changling Li, Ying Li

Affiliations: Department of Computer Science, ETH Zurich; Department of Computer Science, Colby College

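AI summary: Proposes an energy-aware multi-agent reinforcement learning model with individual reward functions, using Deep Q-Networks to plan drone trajectories under dynamic environments and limited battery capacity; experiments show a success rate close to 100% at high task density and better scalability than a shared-reward model.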

Comments IEEE Internet of Things Journal

DOI: 10.1109/JIOT.2024.3511253
Journal ref: volume=12, number=8, year=2025, pages=10640-10654

English abstract

Multi-agent reinforcement learning (MARL) has shown wide applicability in collaborative systems such as autonomous driving and smart cities because of its ability to learn through interaction. With the recent development of drone networks, researchers have also applied MARL to trajectory planning problems. However, the dynamic environment and limited battery capacity still make it challenging to use MARL for efficient collaborative task execution. In this paper, we propose an energy-aware MARL model to tackle these challenges, leveraging Deep Q-Networks (DQN) with individual reward functions driven by the task execution progress and the remaining battery of the drones. We conduct a set of simulation studies of the proposed model and compare it with shared-reward MARL [Li2022MARL] to explore the impact of credit assignment in MARL. The results indicate that our proposed model achieves a success rate of at least 80% regardless of task locations and lengths. As with the shared-reward mode, the individual-reward mode achieves a better success rate when task density is high, reaching nearly 100% when task density approaches 40%. The true advantage of our individual-reward model is revealed when scaling up the environment: the comparison with shared-reward MARL shows that our model is more robust to changes in environment size and number of agents. Owing to the clarity of each agent's goal, it achieves a higher success rate in fewer steps, which further improves energy efficiency.
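
The contrast between individual and shared rewards, and the bootstrap target each drone's DQN is trained toward, can be sketched in a few lines. The reward shape, its weights and the depletion penalty are hypothetical (the abstract says only that rewards are driven by task progress and remaining battery); the target is the standard DQN target.

```python
def individual_reward(progress_before, progress_after, battery_before, battery_after,
                      w_progress=1.0, w_battery=0.5, depleted_penalty=10.0):
    """Hypothetical per-drone reward: credit the drone's own task progress and
    charge it for the battery it consumed; all weights are illustrative."""
    reward = w_progress * (progress_after - progress_before)
    reward -= w_battery * (battery_before - battery_after)
    if battery_after <= 0.0:
        reward -= depleted_penalty  # an empty drone can no longer contribute
    return reward


def shared_reward(individual_rewards):
    """Shared-reward baseline: every agent receives the same team total."""
    total = sum(individual_rewards)
    return [total] * len(individual_rewards)


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN bootstrap target: r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


r = individual_reward(0.2, 0.5, 0.8, 0.7)
print(round(r, 6))                                      # 0.25
print(round(dqn_target(r, [1.0, 2.0], False, 0.9), 6))  # 2.05
```

Under the shared scheme every drone sees the same total, so a drone cannot tell whether its own action helped; the individual scheme ties each signal to that drone's progress and energy use, which is the credit-assignment difference the paper studies.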

2605.24986 2026-05-26 cs.IR cs.LG

Self-Balancing Gradient Allocation for Heterogeneity-Aware Feature Generation in Click-Through Rate Prediction

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

Affiliations: Alibaba Group

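AI summary: Addressing the underfitting of hard fields in generative CTR methods, whose reconstruction objective ignores heterogeneity across feature fields, proposes HeteGenCTR, which trains learnable per-field difficulty parameters jointly with the denoising network to drive a self-balancing loss and a difficulty-guided attention mechanism, achieving significant gains on five benchmarks and in an online A/B test.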

Comments 12 pages, 5 figures, 4 tables

English abstract

Generative pre-training via discrete diffusion provides dense reconstruction supervision across all feature fields simultaneously, mitigating representation collapse from data sparsity in CTR prediction. However, all existing generative CTR methods share a fundamental limitation: the reconstruction objective assigns equal training weight to every feature field, ignoring the profound heterogeneity of reconstruction difficulty across high-cardinality ID fields, sparse categorical attributes, numerical values, and behavioral sequences. This causes easy fields to dominate training gradients while the hardest but most informative fields remain chronically underfit, a problem we term the generative difficulty imbalance. We propose HeteGenCTR, which resolves this imbalance through per-field learnable difficulty parameters jointly trained with the denoising network. This unified signal drives two coordinated components without additional hyperparameters: a self-balancing loss that automatically reallocates gradient budget toward harder fields with a provably stable equilibrium, and a difficulty-guided attention mechanism that suppresses the influence of already-converged easy fields while amplifying cross-field information flow toward hard fields. Both components share the same learned signal and remain mutually consistent throughout training. Experiments on five CTR benchmarks and a seven-day online A/B test demonstrate consistent, statistically significant improvements over state-of-the-art baselines, with disproportionate gains for cold-start and long-tail users.
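
The general idea of shifting gradient budget toward hard fields can be illustrated with a simple loss-proportional weighting. This is a swapped-in heuristic, not HeteGenCTR's method: the paper learns per-field difficulty parameters jointly with the denoising network and proves a stable equilibrium, neither of which this sketch reproduces. Field names and loss values are hypothetical.

```python
def difficulty_weights(field_losses, eps=1e-8):
    """Illustrative stand-in, not HeteGenCTR's learned difficulty parameters:
    weight each field in proportion to its current reconstruction loss,
    normalized so the weights average to 1."""
    total = sum(field_losses.values()) + eps
    n = len(field_losses)
    return {field: n * loss / total for field, loss in field_losses.items()}


def balanced_loss(field_losses):
    # In an autodiff framework the weights would be detached (treated as constants),
    # so harder fields receive a larger share of the gradient.
    weights = difficulty_weights(field_losses)
    return sum(weights[f] * loss for f, loss in field_losses.items())


losses = {"item_id": 3.0, "category": 0.6, "price_bucket": 0.4}
print({f: round(w, 3) for f, w in difficulty_weights(losses).items()})
# {'item_id': 2.25, 'category': 0.45, 'price_bucket': 0.3}
```

With equal weights each field would contribute its raw loss; here the high-cardinality ID field receives five times the weight of the easiest field.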

2605.24949 2026-05-26 cs.CR cs.AI

APT-Agent: Automated Penetration Testing using Large Language Models

William Guanting Li, Alsharif Abuadbba, Kristen Moore, Dan Dongseong Kim

Affiliations: University of Queensland

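AI summary: Proposes the APT-Agent framework, which uses a hybrid rectification module and a command-specific memory architecture to address hallucination and long-term memory problems of large language models in penetration testing, achieving an 84.29% end-to-end exploitation success rate on Metasploitable 2.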

Comments 11 pages, 8 figures

English abstract

Penetration testing is essential to securing modern web infrastructures, yet traditional manual methods struggle to keep pace with their scale and complexity. Large Language Models (LLMs) offer new opportunities for automating these tasks, but existing approaches face two persistent challenges: hallucination of technical entities and insufficient long-term contextual memory. To address these issues, we present APT-Agent, a fully automated LLM-driven penetration testing framework that systematically orchestrates reconnaissance, exploitation, and exfiltration. APT-Agent introduces a hybrid rectification module to recover hallucinated commands and a command-specific memory architecture to preserve operational context across multi-step attack sequences. We evaluate our APT-Agent on Metasploitable 2 against seven vulnerable services spanning web, database, and network protocols. APT-Agent achieves an 84.29% end-to-end exploitation success rate, compared to 48.57% (Script Kiddie) and 18.57% (PentestGPT) under matched conditions. By reducing cognitive burden and minimizing reliance on human intervention, APT-Agent represents a step toward scalable, reliable, and cognitively efficient automation for penetration testing.
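
One ingredient of a rectification step, mapping a hallucinated or misspelled tool name back to a known one, can be sketched with Python's standard `difflib`. The allow-list, the cutoff and the escalate-on-failure behavior are assumptions; the abstract describes APT-Agent's module only as hybrid, and this sketch covers just a lexical fuzzy-matching component. The target address is from the documentation-only range 192.0.2.0/24.

```python
import difflib
import shlex

# Hypothetical allow-list of tools the agent is permitted to invoke.
KNOWN_TOOLS = ["nmap", "nikto", "sqlmap", "hydra", "msfconsole", "curl"]


def rectify_command(command, known_tools=KNOWN_TOOLS, cutoff=0.75):
    """Replace a misspelled or hallucinated tool name with its closest known tool.
    Returns (fixed_command, changed); raises ValueError if nothing is close enough."""
    parts = shlex.split(command)
    if not parts:
        raise ValueError("empty command")
    tool = parts[0]
    if tool in known_tools:
        return command, False
    match = difflib.get_close_matches(tool, known_tools, n=1, cutoff=cutoff)
    if not match:
        raise ValueError(f"unknown tool {tool!r}; escalate to the planner")
    return shlex.join([match[0]] + parts[1:]), True


print(rectify_command("nmapp -sV 192.0.2.10"))  # ('nmap -sV 192.0.2.10', True)
```

A high cutoff trades recall for safety: an unfamiliar name is escalated rather than silently rewritten into a different tool.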

2605.24941 2026-05-26 cs.CR cs.LG

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

Affiliations: Virginia Tech

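AI summary: Studies how personality biases stored in an LLM agent's long-term memory (e.g., cost-consciousness, impatience) silently influence tool calls in contexts where they do not apply; introduces the MEMDRIFT benchmark and finds that biased memories cause tool parameters to deviate from baselines and that existing defenses do not eliminate the effect.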

English abstract

Modern LLM agents combine long-term memory for personalization with tool-calling interfaces for taking actions in the world, a combination underpinning contemporary production systems. We study a previously unexamined failure of this combination: when personality-driven biases stored in memory (cost-consciousness, impatience, risk tolerance, etc.) silently affect tool calls in contexts where they are not applicable. We call this memory-induced tool-drift and operationalize it through MEMDRIFT, a benchmark of 105 scenarios spanning five bias dimensions and seven professional domains, generated through an automated adversarial pipeline. Across seven frontier models, including those with extended reasoning, biased memories raise deflection scores (a judge-scored measure of parameter deviation from unbiased baselines) by up to +3.6 points on a 1-5 scale. Tool-drift persists when memory management is handled by three production memory architectures. The phenomenon affects real-world tools: scanning 6,062 tools across 288 verified MCP servers, we flag 608 with susceptible parameters and confirm tool-drift on a validated subset. Mechanistically, biased memories act as implicit steering vectors, pushing activations along the same latent directions as explicit behavioral instructions. They also redistribute attention from task-relevant context toward memory entries with surface-level keyword overlap to the target parameter. Standard defenses, namely prompt-based relevance instructions and memory filters, reduce drift but do not eliminate it. As agents take increasingly consequential actions on a user's behalf, memory-induced tool-drift represents a systematic vulnerability that current safeguards do not address, motivating dedicated defenses at the intersection of memory management and tool-call generation.
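
Scanning tool schemas for parameters that a stored preference could plausibly bend can be approximated by keyword overlap, echoing the abstract's observation that attention shifts toward memory entries sharing surface keywords with the target parameter. The keyword lists and the flagging rule below are hypothetical and are not MEMDRIFT's actual scanner; the tool dictionary is shaped like an MCP tool definition with an `inputSchema`.

```python
import re

# Hypothetical keyword lists per bias dimension; MEMDRIFT's real criteria may differ.
BIAS_KEYWORDS = {
    "cost": {"price", "budget", "cost", "economy", "tier"},
    "impatience": {"timeout", "deadline", "priority", "retries"},
    "risk": {"risk", "leverage", "force", "confirm"},
}


def tokens(text):
    """Lower-case word tokens, splitting camelCase identifiers such as maxPrice."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z]+", spaced.lower()))


def susceptible_parameters(tool):
    """Flag (parameter, bias dimension) pairs whose name or description shares a
    keyword with a bias dimension. `tool` is an MCP-style tool with an inputSchema."""
    flagged = []
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param, spec in properties.items():
        words = tokens(param) | tokens(spec.get("description", ""))
        for dimension, keywords in sorted(BIAS_KEYWORDS.items()):
            if words & keywords:
                flagged.append((param, dimension))
    return flagged


book_flight = {
    "name": "book_flight",
    "inputSchema": {
        "type": "object",
        "properties": {
            "destination": {"type": "string", "description": "IATA airport code"},
            "maxPrice": {"type": "number", "description": "Upper limit in USD"},
            "cabinClass": {"type": "string", "description": "economy, business or first"},
        },
    },
}
print(susceptible_parameters(book_flight))  # [('maxPrice', 'cost'), ('cabinClass', 'cost')]
```

A lexical pass like this only nominates candidates; confirming drift still requires running the agent with and without the biased memory, as the benchmark does.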

2605.24938 2026-05-26 cs.IR cs.AI cs.CV

Your Embedding Model is SMARTer Than You Think

Jianrui Zhang, Hyun Jung Lee, Sukanta Ganguly, Tae-Eui Kam, Donghyun Kim, Yong Jae Lee

Affiliations: UW-Madison; Korea University; NetApp, Inc.

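AI summary: Proposes the SMART framework, which exploits the implicit multi-vector capability of standard single-vector models by applying late interaction at inference time, improving multimodal retrieval without additional training.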

English abstract

Multimodal retrieval relies heavily on single-vector retrievers, which compress rich token sequences into a single global representation. While efficient, they discard fine-grained, local evidence critical for dense retrieval tasks. Multi-vector approaches were introduced as a solution, but they strictly require training, and many ignore the need for a globally summarizing representation. To address this, we introduce SMART, a framework that unlocks the latent multi-vector capabilities of standard single-vector models. We first demonstrate that standard contrastive training on the pooled embedding implicitly shapes the retrieval geometry of preceding hidden states via gradient flow. By applying direct late interaction over these frozen hidden states during inference, SMART acts as a plug-and-play upgrade that consistently improves performance across diverse modalities, further improving even state-of-the-art models on MMEB-V2. We also show that simple, lightweight post-training not only saves time and compute but also brings further improvements on Visual Document retrieval, allowing a single-vector model to outperform SoTA multi-vector counterparts. Ultimately, SMART offers both a highly efficient inference enhancement and a powerful finetuning technique for multimodal retrieval. We open source our code and weights at https://github.com/HanSolo9682/SMART.
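
Late interaction over frozen hidden states can be illustrated with the familiar ColBERT-style MaxSim score. How SMART selects hidden states and combines them with the pooled embedding is not specified in the abstract; the blend weight `alpha` and the averaging over query vectors are assumptions, and the two-dimensional vectors are toy stand-ins for per-token hidden states.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def maxsim(query_states, doc_states):
    """ColBERT-style late interaction: each query vector keeps its best-matching
    document vector; averaged here (rather than summed) to stay in [-1, 1]."""
    return sum(max(cosine(q, d) for d in doc_states) for q in query_states) / len(query_states)


def hybrid_score(q_global, d_global, q_states, d_states, alpha=0.5):
    """Hypothetical blend of the pooled single-vector score and late interaction."""
    return alpha * cosine(q_global, d_global) + (1 - alpha) * maxsim(q_states, d_states)


query_states = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0)]  # covers both query aspects
doc_b = [(1.0, 0.0), (1.0, 0.0)]  # covers only the first
print(maxsim(query_states, doc_a), maxsim(query_states, doc_b))  # 1.0 0.5
```

A pooled vector can average away the difference between the two documents, whereas MaxSim rewards doc_a for matching every query aspect; keeping a global term preserves the summarizing representation the abstract argues multi-vector methods often drop.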

2605.24929 2026-05-26 stat.ML cs.IT cs.LG math.IT

Estimating Mixture Distributions via Stochastic Mirror Descent

Mohammadreza Ahmadypour, Tara Javidi, Farinaz Koushanfar

Affiliations: Department of Electrical and Computer Engineering; University of California San Diego

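AI summary: For estimating an unknown distribution from samples, proposes a family of mixture-model estimators based on stochastic mirror descent (SMD), in which the choice of Bregman divergence yields flexible estimators that stay efficient with many candidate components and achieve near-optimal convergence rates in KL divergence and the ℓ2 norm.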

English abstract

We revisit the classical problem of estimating an unknown distribution from its samples by fitting a mixture model that minimizes cross-entropy loss. Framing the task as a stochastic convex optimization problem over the space of $M$-component mixture distributions, we propose a family of estimators derived from the stochastic mirror descent (SMD) algorithm. This optimization-based approach provides a principled and flexible framework that generalizes traditional estimators and yields a variety of novel estimators through the choice of Bregman divergence. A key advantage of our method is that it scales efficiently with the number of candidate components $f_i$; that is, one can employ a large set of basis distributions in the mixture model without incurring significant computational overhead. This enables richer approximations and improved estimation accuracy. Moreover, for categorical distributions (discrete outcomes), our estimators do not require a strict lower bound; in other words, our framework does not require precise knowledge of the support of the distribution. We demonstrate that, under mild conditions, the proposed $φ$-SMD estimators achieve near-optimal convergence rates in both Kullback-Leibler (KL) divergence and $\ell_2$-norm and offer practical benefits when computation is expensive. Our numerical analysis highlights improved performance guarantees over classical estimators, particularly in terms of sample efficiency and scalability.
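
With the negative-entropy mirror map, SMD on the mixture weights reduces to exponentiated-gradient updates, sketched below for known categorical components. This is only one member of the family the abstract describes (other Bregman divergences give other estimators); the component pmfs, step schedule, sample size and iterate averaging are illustrative choices, not the paper's φ-SMD settings.

```python
import math
import random


def smd_mixture_weights(samples, components, step0=0.5):
    """Stochastic mirror descent on the simplex with the negative-entropy mirror map
    (exponentiated gradient), minimizing the cross-entropy -log sum_i w_i f_i(x).
    Returns the running average of the iterates."""
    m = len(components)
    w = [1.0 / m] * m
    avg = [0.0] * m
    for t, x in enumerate(samples, start=1):
        p = sum(wi * f(x) for wi, f in zip(w, components))
        grads = [-f(x) / p for f in components]  # gradient of -log p_w(x) w.r.t. w
        eta = step0 / math.sqrt(t)  # decaying step size
        w = [wi * math.exp(-eta * g) for wi, g in zip(w, grads)]
        z = sum(w)
        w = [wi / z for wi in w]  # KL (Bregman) projection onto the simplex = normalization
        avg = [a + (wi - a) / t for a, wi in zip(avg, w)]
    return avg


pmfs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
components = [lambda x, pmf=pmf: pmf[x] for pmf in pmfs]
true_w = [0.6, 0.3, 0.1]
true_p = [sum(w * pmf[x] for w, pmf in zip(true_w, pmfs)) for x in range(3)]
rng = random.Random(1)
samples = rng.choices(range(3), weights=true_p, k=5000)
print([round(w, 2) for w in smd_mixture_weights(samples, components)])
```

Each update costs O(M) per sample, which is why adding more candidate components is cheap in this formulation.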

2605.24915 2026-05-26 cs.GR cs.CV

CyberMaskQA: A Benchmark for Evaluating the Privacy Awareness of Large Language Models in Cybersecurity Question Answering

Matilda Gaddi, Jin Noh, Onat Gungor, Tajana Rosing

Affiliations: Department of Computer Science and Engineering; University of California, San Diego (UCSD)

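AI summary: Addressing the lack of privacy-preservation evaluation in existing benchmarks, proposes the CyberMaskQA benchmark, which combines human-curated scenarios with LLM-driven semantic expansion to generate a privacy-labeled dataset for evaluating models' reasoning and privacy protection in cybersecurity question answering.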

English abstract

Large language models (LLMs) are increasingly applied to cybersecurity question answering (QA) for critical tasks such as incident response and vulnerability analysis. However, real-world operational contexts, including system logs and network configurations, inherently contain sensitive identifiers, e.g., IP addresses, host names, and user accounts. Processing this data with cloud-based models is often unsafe or infeasible in regulated environments. Furthermore, progress in privacy-preserving QA is hindered by the lack of annotated, context-rich datasets capable of jointly evaluating operational reasoning and privacy preservation. To address this gap, we introduce CYBERMASKQA, a privacy-aware QA benchmark covering key security domains. Unlike existing benchmarks that primarily test factual knowledge, CYBERMASKQA grounds questions in realistic organizational contexts with explicit causal dependencies among assets and privileges. Generated through a systematic pipeline, the dataset combines human-curated base scenarios with LLM-driven semantic expansion, annotating each instance with precise private entity labels to enable controlled information disclosure. Evaluations of QA accuracy and masking performance demonstrate the benchmark's utility for developing deployable, context-aware cybersecurity models and facilitating nuanced studies of privacy-utility trade-offs. Upon acceptance, we will release the dataset and the generation framework.
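
Because each instance carries precise private-entity labels, masking and restoring can be done deterministically before and after a cloud model sees the text. The sketch below assumes annotations arrive as (string, label) pairs and uses an invented placeholder syntax; the benchmark's actual label schema and masking protocol are not described in the abstract. The IP address and host name are documentation-only values.

```python
def mask_entities(text, entities):
    """Replace annotated private entities with stable placeholders such as <IP_1>.
    `entities` holds (surface_string, label) pairs; returns the masked text and the
    placeholder-to-value mapping needed to restore answers locally."""
    mapping, counters = {}, {}
    # Longest strings first, so a longer entity is not broken up by a shorter one it contains.
    for surface, label in sorted(entities, key=lambda e: -len(e[0])):
        if surface in mapping.values():
            continue  # already masked under an earlier label
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"<{label}_{counters[label]}>"
        mapping[placeholder] = surface
        text = text.replace(surface, placeholder)
    return text, mapping


def unmask(text, mapping):
    for placeholder, surface in mapping.items():
        text = text.replace(placeholder, surface)
    return text


log = "Failed SSH login for alice from 192.0.2.7 to db01.corp.example; alice retried from 192.0.2.7."
entities = [("alice", "USER"), ("192.0.2.7", "IP"), ("db01.corp.example", "HOST")]
masked, mapping = mask_entities(log, entities)
print(masked)
# Failed SSH login for <USER_1> from <IP_1> to <HOST_1>; <USER_1> retried from <IP_1>.
```

Using the same placeholder for every occurrence keeps the causal structure of the log (the same user retried from the same address) intact for reasoning, which is the utility side of the privacy-utility trade-off the benchmark targets.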

2605.24764 2026-05-26 cs.IR cs.AI cs.CL

Affinity Graph Connectivity in Convex Clustering

Sam Rosen, Jason Xu

Affiliations: Department of Statistical Science, Duke University; Department of Biostatistics, University of California Los Angeles

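AI summary: Derives finite-sample bounds for convex clustering when the affinity weights correspond to a general connected graph, uses random-walk theory to relate clustering performance to the connectivity of the graph structure, and argues that hyperparameter tuning should include tuning the affinity weights.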

Comments 28 pages, 6 figures
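
Convex clustering minimizes $\frac{1}{2}\sum_i \|x_i - u_i\|^2 + \lambda \sum_{i<j} w_{ij} \|u_i - u_j\|$ over centroids $u_i$, so the positive affinity weights $w_{ij}$ define a graph. No penalty term links points in different connected components of that graph, so the problem decouples and each component can at most fuse to its own mean, however large $\lambda$ becomes. The helper below is an illustrative check, not code from the paper: it lists the components of a weight matrix.

```python
from collections import deque


def affinity_components(weights):
    """Connected components of the affinity graph: edge i-j iff weights[i][j] > 0."""
    n = len(weights)
    seen, components = [False] * n, []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if weights[i][j] > 0 and not seen[j]:
                    seen[j] = True
                    queue.append(j)
        components.append(sorted(component))
    return components


two_pairs = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(affinity_components(two_pairs))  # [[0, 1], [2, 3]]
```

A sparse weighting scheme such as k-nearest neighbors can silently disconnect the graph, which is one reason the weights themselves are worth treating as a tuning choice.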

2605.24663 2026-05-26 cs.CR cs.AI

CyBOKClaw: Human-in-the-Loop CyBOK Mapping for Cybersecurity Curriculum

Yan Lin Aung, Kevin Togbe

Affiliations: University of Derby, Derby, UK

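AI summary: Proposes CyBOKClaw, an interpretable human-in-the-loop retrieval framework that maps cybersecurity keywords/phrases to CyBOK through query normalization, term expansion, concept boosts, topic-description enrichment, and domain-sensitive ranking rules; evaluated with ECA-5, an expert-guided top-5 usefulness metric, it reaches 91.88% ECA-5 on the development set and 98.00% on the validation set.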

English abstract

This paper presents CyBOKClaw, an interpretable human-in-the-loop retrieval framework for mapping cybersecurity keywords or phrases (KWoPs) to the Cyber Security Body of Knowledge (CyBOK). Rather than treating the task as strict exact classification, the framework is designed as a top-k candidate generator for expert review. It combines query normalization, curated term expansion, concept-level boosts, topic-description enrichment, and domain-sensitive ranking rules. Because educational KWoPs are often broad, ambiguous, and only approximately aligned with CyBOK terminology, strict exact matching provides only a partial account of practical utility. We therefore evaluate the framework using both structural retrieval metrics and an expert-guided top-5 usefulness metric, ECA-5 (Exact or Closest Acceptable Match at top-5), which records whether the returned candidates contain at least one mapping that an expert would judge exact or accept as the nearest practical CyBOK placement. On the development dataset, CyBOKClaw achieves 64.73% EXA-5 (Exact Match at top-5), 84.18% structural semantic alignment, and 91.88% ECA-5; on the validation dataset, it achieves 81.19% EXA-5, 93.32% structural semantic alignment, and 98.00% ECA-5. These results show that expert-guided top-k usefulness provides a more faithful account of practical CyBOK mapping utility than exact structural matching alone, and that CyBOKClaw is effective as a CyBOK-specific expert-support retrieval system.
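
The two top-5 metrics reduce to simple hit rates over ranked candidate lists, as sketched below. Topic identifiers such as "T1" and the expert judgments are placeholders; in the paper, acceptability is decided by an expert rather than computed.

```python
def top_k_hit_rates(rankings, exact, acceptable, k=5):
    """EXA-k: share of queries whose exact target appears in the top k.
    ECA-k: share whose top k holds the exact target or an expert-accepted closest match."""
    exa = eca = 0
    for query, ranked in rankings.items():
        top = set(ranked[:k])
        exact_hit = exact[query] in top
        exa += exact_hit
        eca += exact_hit or bool(acceptable.get(query, set()) & top)
    n = len(rankings)
    return exa / n, eca / n


rankings = {
    "q1": ["T3", "T1", "T7", "T2", "T9", "T4"],  # exact T1 at rank 2
    "q2": ["T5", "T6", "T8", "T2", "T4", "T1"],  # exact T1 only at rank 6; T8 accepted
    "q3": ["T2", "T4", "T6", "T7", "T9", "T5"],  # accepted T5 falls outside the top 5
    "q4": ["T9", "T3", "T1", "T4", "T2"],        # exact T3 at rank 2
}
exact = {"q1": "T1", "q2": "T1", "q3": "T1", "q4": "T3"}
acceptable = {"q2": {"T8"}, "q3": {"T5"}}
print(top_k_hit_rates(rankings, exact, acceptable))  # (0.5, 0.75)
```

By construction ECA-k is never below EXA-k, so the gap between them measures how often a reviewer would still find a usable placement when the exact one is missing.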

2605.24651 2026-05-26 math.NA cs.LG cs.NA

WINO: A Weak-Form Physics Informed Neural Operator for Hyperelasticity on Variable Domains

Bokai Zhu, Qinghui Zhang, Timon Rabczuk

Affiliations: School of Science, Harbin Institute of Technology, Shenzhen, P. R. China; School of Science, Harbin Institute of Technology, Shenzhen, Guangdong; Institute of Structural Mechanics, Bauhaus-Universität Weimar

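AI summary: Proposes WINO, a data-free framework that combines the efficiency of neural operators with the geometric flexibility of the φ-finite element method; trained by minimizing weak-form residuals and penalty terms, it achieves high accuracy while reducing computation time by 50-80%.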

English abstract

We propose a Weak-form Physics-Informed Neural Operator (WINO), a data-free framework that combines the efficiency of neural operators with the geometric flexibility of the $φ$-finite element method ($φ$-FEM). $φ$-FEM is an unfitted method that accommodates geometric variations without body-fitted meshes, where the domain geometry is represented by the level-set function $φ$. To impose the boundary conditions, Dirichlet problems adopt the $φ$-FEM lifting so only the homogeneous displacement contribution is learned, whereas traction-driven Neumann problems additionally predict the auxiliary fields necessary for the unfitted weak formulation. Parameters are trained by minimizing squared weak-form residuals aligned with $φ$-FEM together with squared penalties on the cut-cell auxiliary equations, which removes the need for large paired datasets of converged reference solutions. After training, WINO outputs can seed the nonlinear $φ$-FEM solvers as neural operator warm starts (NOWS), which reduce iteration counts relative to traditional cold-started solvers. Numerical benchmarks show that WINO achieves high accuracy below 0.04 across all benchmarks, while reducing total computational time by 50-80% compared with purely data-driven methods.
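
The Dirichlet lifting can be sketched with the standard φ-FEM ansatz u = φ·w + g: on the boundary φ = 0, so u equals the prescribed data g whatever the learned field w outputs, and only the homogeneous part φ·w has to be learned. The disk-shaped level set, the stand-in for the network output and the boundary data below are hypothetical, and the sketch omits the weak-form residual and the Neumann auxiliary fields.

```python
def phi_disk(x, y, radius=1.0):
    """Level-set function: negative inside the disk, zero on its boundary."""
    return x * x + y * y - radius * radius


def lifted_displacement(x, y, w, g, phi=phi_disk):
    """Ansatz u = phi * w + g for a 2-D displacement: wherever phi == 0,
    u reduces to the boundary data g regardless of w."""
    p = phi(x, y)
    wx, wy = w(x, y)
    gx, gy = g(x, y)
    return (p * wx + gx, p * wy + gy)


def learned(x, y):
    # Hypothetical stand-in for a neural operator's output field.
    return (3.0 * x, -2.0 * y)


def boundary(x, y):
    # Hypothetical prescribed boundary displacement.
    return (0.1 * x, 0.0)


print(lifted_displacement(1.0, 0.0, learned, boundary))  # (0.1, 0.0)
```

Because the boundary condition holds exactly by construction, the training loss needs no boundary penalty for Dirichlet data; only the interior weak-form residual remains.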

2605.24632 2026-05-26 cs.CR cs.AI cs.LG

Demystifying the Mythos or Disrupting Bugonomics? From Zero-Day Asymmetry to Defender Remediation Throughput

Alfredo Pesoli, Herman Errico, Lorenzo Cavallaro

Affiliations: University College London; Bynario

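AI summary: Analyzes LLM-driven vulnerability discovery through a bugonomics lens, arguing that its main effect is not more zero-days but higher defender remediation throughput, and supports this shift with public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations.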

English abstract

Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the narrative that AI will reshape offensive and defensive security. Headlines emphasize capability; they rarely interrogate costs and incentives. This paper examines LLM-driven vulnerability discovery through a bugonomics lens: the operational economics of producing, proving, prioritizing, and fixing security-relevant defects. Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. They make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation. The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, we argue that the near-term shift is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. The effect is acute in open source, where LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale.
