arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories: 2370
2508.03221 2026-05-29 cs.CR cs.CV

BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models

Jia Wu, Yu Pan, Junjun Yang, Yi Du

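AI summary: Proposes the BadBlocks attack, which poisons only specific blocks of the UNet architecture while leaving the other components unchanged; with 30% of the computational resources and 20% of the GPU time of conventional attacks, it achieves a high success rate and bypasses attention-based detection defenses, revealing differences in vulnerability across neural layers.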

English abstract

Despite the remarkable progress of diffusion models in image generation, recent studies reveal their vulnerability to backdoor attacks via covert visual or textual triggers. Although evolving defense mechanisms can detect most existing threats through visual inspection or feature analysis, we introduce BadBlocks, a novel, lightweight, and highly covert attack that challenges these safeguards. By selectively poisoning specific blocks within the UNet architecture while keeping other components intact, BadBlocks requires only 30% of the computational resources and 20% of the GPU time of conventional attacks, effectively democratizing backdoor injection on consumer-grade GPUs. Empirical evaluations demonstrate that BadBlocks achieves a high attack success rate with negligible perceptual quality loss, while successfully bypassing state-of-the-art defenses, particularly attention-based detection frameworks. Layer-level ablation studies further confirm that backdoor mapping does not require full-network fine-tuning, revealing the disparate vulnerability of different neural layers. Overall, BadBlocks significantly lowers the barrier for executing backdoor attacks, presenting a critical security risk. Our code is available at: https://github.com/paoche11/BadBlocks.
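
The central mechanism, fine-tuning only selected UNet blocks while every other parameter stays frozen, can be sketched as a name-prefix filter over the model's parameters. This is a hypothetical illustration rather than the authors' code: the block names loosely follow the `down_blocks` / `mid_block` / `up_blocks` convention of common UNet implementations, and the parameter counts are invented placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def select_trainable(
    param_sizes: Dict[str, int], target_prefixes: Iterable[str]
) -> Tuple[List[str], float]:
    """Return the parameter names to fine-tune and their share of all parameters.

    Only parameters whose name starts with one of `target_prefixes` are
    marked trainable; every other parameter is assumed to stay frozen.
    """
    prefixes = tuple(target_prefixes)
    trainable = [name for name in param_sizes if name.startswith(prefixes)]
    total = sum(param_sizes.values())
    share = sum(param_sizes[name] for name in trainable) / total
    return trainable, share


# Hypothetical parameter counts per UNet block (placeholders, not real sizes).
sizes = {
    "down_blocks.0.attn.weight": 400,
    "down_blocks.1.attn.weight": 600,
    "mid_block.attn.weight": 1000,
    "up_blocks.0.attn.weight": 600,
    "up_blocks.1.attn.weight": 400,
}

names, share = select_trainable(sizes, ["mid_block"])
print(names, share)  # ['mid_block.attn.weight'] 0.3333333333333333
```

In a PyTorch training loop one would then set `requires_grad = False` on every parameter not returned, so the optimizer only updates the chosen blocks. Note that a bare prefix such as `down_blocks.1` would also match `down_blocks.10`.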

2507.06092 2026-05-29 cs.CR cs.AI cs.LG

Taming Data Challenges in ML-based Security Tasks Using Generative AI

Shravya Kanchi, Neal Mangaokar, Aravind Cheruvu, Sifat Muhammad Abdullah, Shirin Nilizadeh, Atul Prakash, Bimal Viswanath

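AI summary: Proposes augmenting training sets with synthetic data generated by generative AI (GenAI) to improve the generalization of machine-learning security classifiers, achieving improvements of up to 32.6% across 7 tasks.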

Comments Accepted at the 2026 ACM Asia Conference on Computer and Communications Security (AsiaCCS 2026)

Journal ref
In Proc. ACM AsiaCCS 2026, Bangalore, India, June 1-5, 2026. ACM, 2026

English abstract

Machine learning-based supervised classifiers are widely used for security tasks, and their improvement has been largely focused on algorithmic advancements. We argue that data challenges that negatively impact the performance of these classifiers have received limited attention. We address the following research question: Can developments in Generative AI (GenAI) address these data challenges and improve classifier performance? We propose augmenting training datasets with synthetic data generated using GenAI techniques to improve classifier generalization. We evaluate this approach across 7 diverse security tasks using 6 state-of-the-art GenAI methods and introduce a novel GenAI scheme called Nimai that enables highly controlled data synthesis. We find that GenAI techniques can significantly improve the performance of security classifiers, achieving improvements of up to 32.6% even in severely data-constrained settings (only ~180 training samples). Furthermore, we demonstrate that GenAI can facilitate rapid adaptation to concept drift post-deployment, requiring minimal labeling in the adjustment process. Despite these successes, our study finds that some GenAI schemes struggle to initialize (train and produce data) on certain security tasks. We also identify characteristics of specific tasks, such as noisy labels, overlapping class distributions, and sparse feature vectors, that hinder performance gains from GenAI. We believe that our study will drive the development of future GenAI tools designed for security tasks.

2506.20344 2026-05-29 math.OC cs.LG

A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization

Po Chen, Rujun Jiang, Peng Wang

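AI summary: By characterizing all critical points in closed form and classifying their types, this paper reveals the loss landscape of regularized deep matrix factorization and explains why gradient-based methods almost always converge to a local minimizer.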

Comments 30 pages, 2 figures

English abstract

Despite its wide range of applications across various domains, the optimization foundations of deep matrix factorization (DMF) remain largely open. In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem. Toward this goal, we first provide a closed-form characterization of all critical points of the problem. Building on this, we establish precise conditions under which a critical point is a local minimizer, a global minimizer, a strict saddle point, or a non-strict saddle point. Leveraging these results, we derive a necessary and sufficient condition under which every critical point is either a local minimizer or a strict saddle point. This provides insights into why gradient-based methods almost always converge to a local minimizer of the regularized DMF problem. Finally, we conduct numerical experiments to visualize its loss landscape to support our theory.
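
For concreteness, a commonly studied form of the regularized DMF objective and its layer-wise gradients is written out below. The abstract does not state the exact formulation, so the squared Frobenius loss, the weight-decay regularizer, and the symbols $Y$ (target matrix), $W_l$ (factors), $L$ (depth) and $\lambda > 0$ (regularization weight) are assumptions here; empty products are taken to be the identity.

```latex
F(W_1,\dots,W_L) = \tfrac{1}{2}\,\lVert W_L W_{L-1} \cdots W_1 - Y \rVert_F^2
                 + \tfrac{\lambda}{2} \sum_{l=1}^{L} \lVert W_l \rVert_F^2

\nabla_{W_l} F = A_l^{\top} \,(W_L \cdots W_1 - Y)\, B_l^{\top} + \lambda W_l,
\qquad A_l = W_L \cdots W_{l+1}, \quad B_l = W_{l-1} \cdots W_1
```

Under this formulation a critical point is any tuple $(W_1,\dots,W_L)$ at which all $L$ gradients vanish simultaneously; the classification in the abstract then sorts such points into local or global minimizers and strict or non-strict saddle points.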

2505.21627 2026-05-29 cs.GT cs.AI cs.CY cs.LG

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, Manuel Gomez-Rodriguez

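AI summary: Studies how, under the current pay-per-token pricing mechanism, service providers can overcharge users by strategically reporting the number of tokens, and proposes an incentive-compatible mechanism that prices tokens linearly in their character count to eliminate this financial incentive.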

Comments Selected as an oral presentation at ICML 2026

English abstract

State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model uses to generate it: they pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and misreport the (number of) tokens a model used to generate an output, and users cannot prove, or even know, whether a provider is overcharging them. However, we also show that, if an unfaithful provider is obliged to be transparent about the generative process used by the model, misreporting optimally without raising suspicion is hard. Nevertheless, as a proof-of-concept, we develop an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion. Crucially, we demonstrate that the cost of running the algorithm is lower than the additional revenue from overcharging users, highlighting the vulnerability of users under the current pay-per-token pricing mechanism. Further, we show that, to eliminate the financial incentive to strategize, a pricing mechanism must price tokens linearly in their character count. While this makes a provider's profit margin vary across tokens, we introduce a simple prescription under which the provider who adopts such an incentive-compatible pricing mechanism can maintain the average profit margin they had under the pay-per-token pricing mechanism. Along the way, to illustrate and complement our theoretical results, we conduct experiments with several large language models from the $\texttt{Llama}$, $\texttt{Gemma}$ and $\texttt{Ministral}$ families, and input prompts from the LMSYS Chatbot Arena platform.
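
The incentive argument can be illustrated with a toy example: under pay-per-token pricing, reporting a finer tokenization of the same output string raises the bill, whereas a price that is linear in character count does not depend on how the string is split. The tokenizations and prices below are invented for illustration and do not come from the paper.

```python
from typing import List


def pay_per_token(tokens: List[str], price_per_token: float) -> float:
    """Charge a fixed price for every reported token."""
    return len(tokens) * price_per_token


def pay_per_character(tokens: List[str], price_per_char: float) -> float:
    """Charge each token in proportion to its number of characters."""
    return sum(len(t) for t in tokens) * price_per_char


# Two tokenizations of the same 10-character output string "overcharge".
honest = ["over", "charge"]                   # 2 tokens
misreported = ["o", "ver", "ch", "ar", "ge"]  # 5 tokens, identical text
assert "".join(honest) == "".join(misreported)

print(pay_per_token(honest, 1.0), pay_per_token(misreported, 1.0))            # 2.0 5.0
print(pay_per_character(honest, 0.25), pay_per_character(misreported, 0.25))  # 2.5 2.5
```

Per-character billing removes the gain from splitting text into more tokens, which is the intuition behind the incentive-compatibility result; the paper's actual mechanism and its profit-margin prescription go beyond this toy.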

2505.20955 2026-05-29 cs.CR cs.LG

Enhancing Membership Inference Attacks on Diffusion Models from a Frequency-Domain Perspective

Puwei Lian, Yujun Cai, Songze Li, Bingkun Bao

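AI summary: Taking a frequency-domain perspective, this paper shows that diffusion models' deficiency in processing high-frequency information causes misclassification in membership inference attacks, and proposes a plug-and-play high-frequency filter module to improve attack performance.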

Comments Accepted to the Forty-Third International Conference on Machine Learning (ICML 2026)

English abstract

Diffusion models have achieved tremendous success in image generation, but they also raise significant concerns regarding privacy and copyright issues. Membership Inference Attacks (MIAs) are designed to ascertain whether specific data was utilized during a model's training phase. As current MIAs for diffusion models typically exploit the model's image prediction ability, we formalize them into a unified general paradigm that computes the membership score for membership identification. Under this paradigm, we empirically find that existing attacks overlook the inherent deficiency in how diffusion models process high-frequency information. Consequently, this deficiency leads to member data with more high-frequency content being misclassified as hold-out data, and hold-out data with less high-frequency content tends to be misclassified as member data. Moreover, we theoretically demonstrate that this deficiency reduces the membership advantage of attacks, thereby interfering with the effective discrimination of member data and hold-out data. Based on this insight, we propose a plug-and-play high-frequency filter module to mitigate the adverse effects of the deficiency, which can be seamlessly integrated into any attack within the general paradigm without additional time costs. Extensive experiments corroborate that this module significantly improves the performance of baseline attacks across different datasets and models. Code is available at https://github.com/poetic2/FreMIA.
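
One way to picture a high-frequency filter module is a low-pass mask applied in the 2-D Fourier domain to both the sample and the model's prediction before an error-based membership score is computed. The sketch below assumes NumPy, a circular cutoff `radius`, and a generic negative-squared-error score; the paper's actual filter design and scoring function may differ.

```python
import numpy as np


def low_pass(image: np.ndarray, radius: int) -> np.ndarray:
    """Zero every 2-D frequency farther than `radius` from the DC term."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC moved to the centre
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))


def membership_score(x: np.ndarray, x_pred: np.ndarray, radius: int) -> float:
    """Hypothetical error-based score computed on low-passed images.

    Higher (less negative) means the model reproduces the sample better,
    which error-based attacks read as evidence of membership.
    """
    diff = low_pass(x, radius) - low_pass(x_pred, radius)
    return -float(np.mean(diff ** 2))


# A +/-1 checkerboard is pure highest-frequency content; a flat image is pure DC.
checker = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0
flat = np.ones((8, 8))
print(np.allclose(low_pass(checker, 2), 0.0))     # True
print(np.allclose(low_pass(flat, 2), flat))       # True
print(membership_score(flat, flat + checker, 2))  # close to 0.0: high-frequency error ignored
```

The last line shows the intended effect: a prediction error confined to high frequencies no longer lowers the score, so samples are compared mainly on the low-frequency content the model handles well.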

2501.12374 2026-05-29 cs.HC cs.AI cs.CY

Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists

Thomas F. Eisenmann, Andres Karjus, Mar Canet Sola, Levin Brinkmann, Bramantyo Ibrahim Supriyatno, Iyad Rahwan

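AI summary: Experimentally compares how 50 professional artists and laypeople perform when using generative AI for image replication and creative image generation, finding that artists' professional skills transfer to AI use: they outperform laypeople in both copying accuracy and divergent thinking, while GPT-4o is slightly better than artists on average in the creative task but never surpasses the best humans.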

Comments Eisenmann and Karjus contributed equally to this work and share first authorship

Journal ref
International Journal of Human-Computer Interaction, 2026, pp 1-22

English abstract

Generative AI's novel capacities raise questions about the future role of human expertise: does AI level the playing field between professional artists and laypeople, or does expertise enhance AI use? Do the cognitive skills experts make use of in analyzing and drawing visual art also transfer to using these new tools? This pre-registered study conducts experimental comparisons between 50 professional artists and a demographically matched sample of laypeople. Our interdisciplinary team developed two tasks involving image replication and creative image creation, assessing their copying accuracy and divergent thinking. We implemented a bespoke platform for the experiment, powered by a modern text-to-image AI. Results reveal artists produced more accurate copies and more divergent ideas than lay participants, highlighting a skill transfer of professional expertise, even to the confined space of generative AI. We also explored how well an exemplary vision-capable large language model (GPT-4o) would fare: on par in copying and slightly better on average than artists in the creative task, although never above the best humans. These findings highlight the importance of integrating artistic skills with AI, suggesting a potential for collaborative synergy that could reshape creative industries and arts education.

2501.10332 2026-05-29 cs.CY cs.AI

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Weibo Gao, Qi Liu, Linan Yue, Fangzhou Yao, Rui Lv, Zheng Zhang, Hao Wang, Zhenya Huang

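AI summary: Proposes Agent4Edu, which uses large language models to build generative agents that simulate learner behavior, addressing the gap between offline metrics and online performance in intelligent education systems and supporting the evaluation and optimization of personalized learning algorithms.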

Comments Accepted by AAAI 2025

English abstract

Personalized learning represents a promising educational strategy within intelligent educational systems, aiming to enhance learners' practice efficiency. However, the discrepancy between offline metrics and online performance significantly impedes their progress. To address this challenge, we introduce Agent4Edu, a novel personalized learning simulator leveraging recent advancements in human intelligence through large language models (LLMs). Agent4Edu features LLM-powered generative agents equipped with learner profile, memory, and action modules tailored to personalized learning algorithms. The learner profiles are initialized using real-world response data, capturing practice styles and cognitive factors. Inspired by human psychology theory, the memory module records practice facts and high-level summaries, integrating reflection mechanisms. The action module supports various behaviors, including exercise understanding, analysis, and response generation. Each agent can interact with personalized learning algorithms, such as computerized adaptive testing, enabling a multifaceted evaluation and enhancement of customized services. Through a comprehensive assessment, we explore the strengths and weaknesses of Agent4Edu, emphasizing the consistency and discrepancies in responses between agents and human learners. The code, data, and appendix are publicly available at https://github.com/bigdata-ustc/Agent4Edu.

2411.03006 2026-05-29 math.CO cs.CC cs.DM cs.LG math.OC

Neural Networks and (Virtual) Extended Formulations

Christoph Hertrich, Georg Loho

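AI summary: By linking the representational power of neural networks to the extension complexity of polytopes, proves lower bounds on the size of monotone or input-convex neural networks, and introduces virtual extension complexity to extend the approach to general neural networks.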

English abstract

Neural networks with piecewise linear activation functions, such as rectified linear units (ReLU) or maxout, are among the most fundamental models in modern machine learning. We make a step towards proving lower bounds on the size of such neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$ of a polytope $P$. This is a well-studied quantity in combinatorial optimization and polyhedral geometry describing the number of inequalities needed to model $P$ as a linear program. We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$. This implies exponential lower bounds on such neural networks for a variety of problems, including the polynomially solvable maximum weight matching problem. In an attempt to prove similar bounds also for general neural networks, we introduce the notion of virtual extension complexity $\mathrm{vxc}(P)$, which generalizes $\mathrm{xc}(P)$ and describes the number of inequalities needed to represent the linear optimization problem over $P$ as a difference of two linear programs. We prove that $\mathrm{vxc}(P)$ is a lower bound on the size of any neural network that optimizes over $P$. While it remains an open question to derive useful lower bounds on $\mathrm{vxc}(P)$, we argue that this quantity deserves to be studied independently from neural networks by proving that one can efficiently optimize over a polytope $P$ given a virtual extended formulation with small encoding size.
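
A concrete view of the setting: over a polytope given by its vertices $v_1,\dots,v_n$, the optimal value $\max_i \langle c, v_i\rangle$ of the linear optimization problem is a maximum of linear functions of $c$, and ReLU networks compute maxima through the identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$. The plain-Python sketch below builds such a pairwise max tree; the helper names are illustrative, and the sketch says nothing about the size lower bounds proved in the paper.

```python
from typing import List, Sequence


def relu(z: float) -> float:
    return max(z, 0.0)


def relu_max(values: List[float]) -> float:
    """Maximum of the inputs using only ReLU, addition and subtraction.

    Pairs are combined layer by layer with max(a, b) = b + relu(a - b),
    so n inputs need about log2(n) layers.
    """
    layer = list(values)
    while len(layer) > 1:
        nxt = [layer[i + 1] + relu(layer[i] - layer[i + 1])
               for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:  # carry an unpaired value to the next layer
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]


def lp_value(c: Sequence[float], vertices: List[Sequence[float]]) -> float:
    """Optimal value of max <c, x> over the convex hull of `vertices`."""
    return relu_max([sum(ci * vi for ci, vi in zip(c, v)) for v in vertices])


square = [(0, 0), (1, 0), (0, 1), (1, 1)]  # vertices of the unit square
print(lp_value((2.0, -1.0), square))       # 2.0
```

This naive network has one unit per vertex, which is exactly why polytopes with exponentially many vertices, but compact extended formulations, make the question of smaller networks interesting.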

2410.07287 2026-05-29 physics.soc-ph cs.AI

Crafting Desirable Climate Trajectories with RL Explored Socio-Environmental Simulations

James Rudd-Jones, Fiona Thendean, María Pérez-Ortiz

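AI summary: Replaces traditional solvers in integrated assessment models with multi-agent reinforcement learning to simulate cooperative and competitive social interactions, finding that cooperative agents consistently achieve both emission reductions and economic improvement, whereas competition makes desirable climate goals hard to reach.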

Comments 23 pages, 13 figures

English abstract

Climate change poses an existential threat, necessitating effective climate policies to enact impactful change. Decisions in this domain are incredibly complex, involving conflicting entities and evidence. In recent decades, policymakers have increasingly used simulations and computational methods to guide some of their decisions. Integrated Assessment Models (IAMs) are one such method, combining social, economic, and environmental simulations to forecast potential policy effects. For example, the UN uses outputs of IAMs in its recent Intergovernmental Panel on Climate Change (IPCC) reports. Traditionally, these models have been solved using recursive equation solvers, which have several shortcomings, e.g. struggling with decision making under uncertainty. Recent preliminary work using Reinforcement Learning (RL) to replace the traditional solvers shows promising results in decision making in uncertain and noisy scenarios. We extend this work by introducing multiple interacting RL agents as a preliminary analysis of the complex interplay of social interactions between the various stakeholders or nations that drives much of the current climate crisis. Our findings show that cooperative agents in this framework can consistently chart pathways towards more desirable futures in terms of reduced carbon emissions and improved economy. However, upon introducing competition between agents, for instance by using opposing reward functions, desirable climate futures are rarely reached. Because modelling competition is key to increased realism in these simulations, we employ policy interpretation, visualising which states lead to more uncertain behaviour, to understand algorithm failure. Finally, we highlight the current limitations and avenues for further work to ensure future technology uptake for policy derivation.

2404.16077 2026-05-29 cs.PL cs.LG

CompilerDream: Learning a Compiler World Model for General Code Optimization

Chaoyi Deng, Jialong Wu, Ningya Feng, Jianmin Wang, Mingsheng Long

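AI summary: Proposes CompilerDream, a model-based reinforcement learning method that uses a compiler world model to simulate the properties of optimization passes and trains an agent on it, achieving general code optimization across application scenarios and languages and surpassing LLVM's built-in optimizations in zero-shot generalization.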

Comments KDD 2025 camera-ready version with extended appendix. Code is available at https://github.com/thuml/CompilerDream. This update additionally fixes an issue in Table 6 where the dataset names in three rows were ordered incorrectly

English abstract

Effective code optimization in compilers is crucial for computer and software engineering. The success of these optimizations primarily depends on the selection and ordering of the optimization passes applied to the code. While most compilers rely on a fixed sequence of optimization passes, current methods to find the optimal sequence either employ impractically slow search algorithms or learning methods that struggle to generalize to code unseen during training. We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization. CompilerDream comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies. By training on a large-scale program dataset, CompilerDream is equipped to serve as a general code optimizer across various application scenarios and source-code languages. Our extensive experiments first highlight CompilerDream's strong optimization capabilities for autotuning, where it leads the CompilerGym leaderboard. More importantly, the zero-shot generalization ability of the large-scale trained compiler world model and agent excels across diverse datasets, surpassing LLVM's built-in optimizations and other state-of-the-art methods in both settings of value prediction and end-to-end code optimization.

2605.29537 2026-05-29 cs.CC cs.LG cs.LO

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

Eric Alsmann, Martin Lange, Marco Sälzer

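AI summary: Studies the computational complexity of verifying feedforward neural networks in quantised settings, distinguishing three classes of networks and analyzing complexity under linear programming and bit-vector specifications; proves that verifying quantised networks remains NP-complete and establishes upper bounds for dynamically quantised networks.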

English abstract

We investigate the computational complexity of neural network verification in quantised settings. We distinguish three classes of Feedforward Neural Networks (FNNs): rational FNNs with exact rational weights, quantised FNNs whose weights come from a finite-width arithmetic, and dynamically quantised FNNs in which rational networks are evaluated with respect to a given finite-width arithmetic. We consider two types of specifications used in the literature. Linear programming (LP) specifications are conjunctions of linear constraints, while bit-vector (BV) specifications allow reasoning at the bit level and can express non-linear constraints. Our results give a complexity landscape of these verification problems. For quantised FNNs with fixed arithmetic precision, we show that verification under both LP and BV specifications remains NP-complete, matching the complexity of the rational case. For dynamically quantised FNNs with BV specifications, we establish upper bounds, complementing a previously known PSPACE-hardness result.
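
To make "evaluating a rational network with respect to a given finite-width arithmetic" concrete, the sketch below evaluates one ReLU layer in a signed fixed-point format, with round-to-nearest (ties to even) and saturation after every multiply and add. The format parameters `total_bits` and `frac_bits` and the rounding and overflow rules are assumptions chosen for illustration; the paper treats finite-width arithmetics in general.

```python
from fractions import Fraction
from typing import List


def quantise(x: Fraction, total_bits: int = 8, frac_bits: int = 4) -> Fraction:
    """Round x to the nearest multiple of 2**-frac_bits, saturating at the
    range of a signed `total_bits`-bit integer."""
    scale = 2 ** frac_bits
    q = round(x * scale)  # Fraction rounds half to even and returns an int
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return Fraction(min(max(q, lo), hi), scale)


def quantised_relu_layer(W: List[List[Fraction]], b: List[Fraction],
                         x: List[Fraction]) -> List[Fraction]:
    """ReLU(Wx + b) with every intermediate result re-quantised."""
    out = []
    for row, bias in zip(W, b):
        acc = quantise(bias)
        for w, xi in zip(row, x):
            acc = quantise(acc + quantise(quantise(w) * quantise(xi)))
        out.append(max(acc, Fraction(0)))
    return out


W = [[Fraction(1, 3), Fraction(-1)]]
b = [Fraction(0)]
x = [Fraction(3), Fraction(1, 2)]
print(quantised_relu_layer(W, b, x))  # [Fraction(7, 16)]; exact arithmetic gives 1/2
```

Because 1/3 is rounded to 5/16 before the multiplication, the quantised output differs from the exact rational one, which is why verifying the quantised and the rational versions of the same network are distinct problems.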

2605.29532 2026-05-29 cs.SE cs.AI

GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing

Xiaoyi Chen, Yifei Gao, Yang Xu, Xingxing Song, Yi Zhang, Jitao Sang

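AI summary: Proposes the GUITestScape benchmark and the GUIJudge evaluator which, through 508 preset defects covering both interaction and display defects and a process-aware evaluation method, address the limitation that existing GUI testing evaluation is restricted to predefined annotations and interaction defects.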

English abstract

Exploratory GUI testing is a particularly demanding setting for MLLM agents: without predefined test scripts, an agent must autonomously navigate an application and discover defects through its own interaction. However, current evaluation falls short on two fronts. First, existing benchmarks focus almost exclusively on interaction defects, leaving display defects outside the evaluation frame. Second, evaluation protocols are bound to predefined defect annotations, collapsing the testing process into a single end-state judgment that conflates qualitatively distinct failure modes. To address these challenges, we present GUITestScape, an interactive benchmark covering 61 real-world Android applications and 508 preset defects spanning interaction and display types, and introduce GUIJudge, an open-set evaluator that decomposes an agent's testing trajectory into independently diagnosable capabilities. Experimental results demonstrate that GUIJudge achieves reliable process-aware evaluation beyond predefined annotations, substantially outperforming all baselines. Benchmarking on GUITestScape further reveals that detection remains the critical bottleneck for existing models across both defect types, and that integrating GUIJudge's verifiers into existing agents significantly boosts their detection performance without retraining.

2605.29524 2026-05-29 cs.CR cs.AI

KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

Yijia Fang, Yiqing Feng, Bingyu Li, Mingxun Zhou

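AI summary: Proposes the KBF protocol, which uses stable numerical recall near the knowledge boundary as a fingerprint for low-cost black-box auditing of model APIs, detecting substitution and mixed-routing attacks.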

Comments 20 pages, 13 figures

English abstract

Relay and reseller APIs increasingly intermediate access to large language models (LLMs), but users have no direct way to verify that a claimed endpoint is actually serving the advertised model. We introduce KBF, a low-cost black-box auditing protocol that fingerprints model APIs using stable numerical recall near the knowledge boundary. Across 16 production LLM endpoints, KBF flags all 155 economically relevant substitutions without rejecting any same-model controls, remains stable under deployment variation, detects high-separation mixed-routing attacks when only 5-10% of traffic is substituted, and finds that 7 of 27 platform model cells in a six-platform shadow API audit are statistically inconsistent with their reference endpoints, with inconsistencies concentrated on premium Claude endpoints.
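
The abstract does not name the statistical test behind "statistically inconsistent", so the sketch below uses one plausible choice: a standard two-sided two-proportion z-test comparing the fraction of boundary questions whose numeric answer each endpoint recalls correctly. The counts, the significance level `alpha`, and the choice of test are assumptions for illustration, not the KBF protocol itself.

```python
import math


def two_proportion_p_value(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: endpoints A and B share the same recall rate."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both endpoints answered all or none correctly
        return 1.0
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def flags_substitution(hits_ref: int, n_ref: int, hits_api: int, n_api: int,
                       alpha: float = 0.01) -> bool:
    """Flag the audited API if its recall differs significantly from the reference."""
    return two_proportion_p_value(hits_ref, n_ref, hits_api, n_api) < alpha


# Hypothetical counts: correct numeric recalls out of 200 boundary questions.
print(flags_substitution(160, 200, 110, 200))  # True  (0.80 vs 0.55)
print(flags_substitution(160, 200, 158, 200))  # False (0.80 vs 0.79)
```

Auditing many platform and model cells at once, as in the shadow-API audit described above, would also call for a multiple-comparison correction, which this sketch omits.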

2605.29518 2026-05-29 cs.NI cs.AI

Network Optimization Aspects of Autonomous Vehicles: Challenges and Future Directions

Rudolf Krecht, Tamas Budai, Erno Horvath, Akos Kovacs, Nobert Marko, Miklos Unger

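AI summary: Reviews multidisciplinary approaches to network optimization for autonomous vehicles, including cooperative perception, aiming to dispel misconceptions and outline future directions.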

English abstract

Global megatrends, such as urbanization, population growth, and emerging network solutions, are accelerating the development of the Connected and Autonomous Vehicles (CAVs) industry. There are many truths, some misconceptions, and even some excitement about CAVs in the public's opinion. The main objective of the current article is to provide a comprehensive review, eliminate misconceptions, and outline the future of the network optimization aspects of autonomous vehicles by presenting various multidisciplinary methods, such as cooperative perception. Given our extensive experience with CAVs, we aim to share some of the insights and knowledge we have gained, along with relevant use cases and experimental results.

2605.29493 2026-05-29 cs.CY cs.AI

The New Pro Se: Generative AI and the Surge in Federal Civil Self-Representation

Or Cohen-Sasson

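AI summary: Using about 2.8 million federal civil filings, analyzes the rise in the pro se plaintiff rate after generative AI became widespread, together with changes in complaint text, litigation outcomes, and plaintiff composition; finds that AI-flagged complaints are more citation-dense, come mostly from first-time filers, are unevenly distributed geographically, and do not improve win rates.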

Comments 15 pages, 7 figures

English abstract

Since public access to generative AI tools became widespread, federal civil litigation has seen a marked increase in pro se (self-represented) plaintiffs. This paper analyzes that shift using ~2.8 million filings, asking whether the post-GenAI period is associated not only with more pro se filings, but also with detectable changes in complaint text, litigation outcomes, and the composition of pro se litigants. Using civil filing data from FY2008-2025, we find that the federal civil pro se plaintiff rate rose from 11.33% pre-GenAI to 16.94% post-GenAI, a 5.61 percentage-point increase that persists after trend and covariate-adjusted robustness checks. We then focus on Civil Rights and Other Statutory cases, where the increase is especially pronounced, and link case metadata to pro se complaints. Drawing on stylometric AI detection indicators, we develop an interpretable measure of AI-consistent drafting. Against a threshold calibrated to the pre-GenAI baseline, the net AI-flagged share is 13.9% of post-GenAI non-form complaints. Analysis of the AI-flagged complaints shows that they are more citation-dense, disproportionately associated with first-time rather than repeat filers, and geographically unevenly distributed. This composition pattern suggests that AI-consistent drafting is not merely a repeat-filer phenomenon; it also includes a modest, suggestive increase in name-inferred female plaintiffs. We find no evidence of improved win rates; in fact, AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases. These findings raise new questions about access to justice and court screening burdens, and sharpen the distinction between legal formality and legal efficacy.
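
The headline change can be stated in absolute or relative terms. As a worked check on the two rates reported above (the relative figure is computed here and is not stated in the abstract):

```latex
16.94\% - 11.33\% = 5.61\ \text{percentage points}
\qquad
\frac{16.94 - 11.33}{11.33} = \frac{5.61}{11.33} \approx 0.495
\quad (\text{about a } 49.5\%\ \text{relative increase})
```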

2605.29478 2026-05-29 cs.NE cs.AI

Evolutionary Rule Extraction from Corporate Default Prediction Models

企业违约预测模型中的进化规则提取

Desirè Fabbretti, Matteo Pasquino, Elia Pacioni, Caterina Lucarelli, Davide Calvaresi

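AI summary: Proposes DEXiRE-EVO, an evolutionary rule-extraction framework that combines multi-objective optimization with the CIU explainability method to extract economically meaningful rules from machine-learning default-prediction models, balancing predictive performance and interpretability.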

AI-generated Chinese abstract

中小企业(SMEs)在大多数经济体中占企业多数,常面临财务约束和更高的财务困境脆弱性。因此,预测中小企业违约对金融机构、政策制定者和研究人员至关重要。机器学习(ML)的最新进展提高了信用风险建模的预测性能。然而,复杂模型的有限可解释性引发了透明度和监管合规方面的担忧。本研究调查了中小企业的违约预测因子,并应用可解释人工智能(XAI)技术。使用2015-2024年间50,718家意大利中小企业的面板数据,我们比较了传统计量经济学方法与多种ML分类器。实证结果表明,ML模型在平衡准确率和PR-AUC方面显著优于传统逻辑回归基准。为解决可解释性挑战,我们引入了DEXiRE-EVO,一种新颖的进化规则提取框架,结合了多目标优化与上下文重要性和效用(CIU)可解释性方法。提取的规则揭示了与中小企业财务困境相关的经济意义模式,突出了内部流动性生成薄弱、内部资本侵蚀、高杠杆和运营效率低下的作用。此外,宏观经济背景条件和财务不稳定的持续性有助于识别高风险企业。总体而言,结果表明,将ML与进化规则提取相结合可以提高信用风险建模中的预测性能和可解释性,从而支持金融环境中更透明、数据驱动的决策。

English abstract

Small and medium-sized enterprises (SMEs) represent the majority of firms in most economies and often face financial constraints and higher vulnerability to financial distress. Predicting SME default is therefore crucial for financial institutions, policymakers, and researchers. Recent advances in machine learning (ML) have improved predictive performance in credit risk modeling. Yet the limited interpretability of complex models raises concerns regarding transparency and regulatory compliance. This study investigates the default predictors of SMEs and applies explainable artificial intelligence (XAI) techniques to their analysis. Using a panel of 50,718 Italian SMEs over the period 2015-2024, we compare traditional econometric approaches with several ML classifiers. The empirical results show that ML models significantly outperform the traditional logistic regression benchmark in terms of Balanced Accuracy and PR-AUC. To address the interpretability challenge, we introduce DEXiRE-EVO, a novel evolutionary rule extraction framework that combines multi-objective optimization with the Contextual Importance and Utility (CIU) explainability method. The extracted rules reveal economically meaningful patterns associated with SME financial distress, highlighting the roles of weak internal liquidity generation, internal capital erosion, high leverage, and operational inefficiency. Additionally, contextual macroeconomic conditions and the persistence of financial instability contribute to identifying high-risk firms. Overall, the results show that combining ML with evolutionary rule extraction can improve both predictive performance and interpretability in credit risk modeling, thus supporting more transparent, data-driven decision-making in financial environments.
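The abstract does not spell out DEXiRE-EVO's objectives. The minimal sketch below assumes two of them: fidelity of a rule set to the classifier, which is maximized, and rule complexity, which is minimized. It shows how multi-objective optimization keeps a Pareto front of non-dominated rule sets rather than a single winner. The candidate names and scores are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate rule sets: (name, fidelity to the ML model, number of conditions).
Candidate = Tuple[str, float, int]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one.
    Objectives: maximize fidelity (index 1), minimize rule complexity (index 2)."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    # Keep every candidate that no other candidate dominates.
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

rules = [
    ("r1", 0.91, 12),
    ("r2", 0.88, 5),
    ("r3", 0.85, 7),   # dominated by r2: less faithful and more complex
    ("r4", 0.80, 3),
]
print([name for name, _, _ in pareto_front(rules)])  # ['r1', 'r2', 'r4']
```

A rule extractor would then choose among the front's members, for example by trading a little fidelity for a much shorter rule.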

2605.29473 2026-05-29 cs.HC cs.AI cs.CL cs.CY cs.SI

Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles

告知、指导、共情、倾听:审计LLM护理支持角色

Drishti Goel, Agam Goyal, Veda Duddu, Olivia Pal, Jeongah Lee, Qiuyue Joy Zhong, Violeta J. Rodriguez, Daniel S. Brown, Dong Whi Yoo, Ravi Karkar, Koustuv Saha

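AI summary: By operationalizing four social support roles (Inform, Coach, Relate, Listen), this study evaluates the safety profiles of large language models in informal caregiving conversations, finding that the support role systematically shapes interactional risk and that there is a perceived quality-safety trade-off.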

AI-generated Chinese abstract

语言模型越来越多地被部署用于非正式护理环境中的对话支持,在这些环境中,交互通常超出信息寻求范围:护理者在应对不确定、关系复杂的护理决策时,寻求情感安慰、指导和帮助。然而,大多数安全评估在通用提示下评估模型行为,留下一个关键问题未加审视:模型的安全概况是否会随其支持角色而变化?我们通过操作化四种基于社会支持理论的专家评审支持角色来研究这一点:告知、指导、共情和倾听,并将它们与两个基线控制条件(基本提示条件和检索增强生成条件)进行比较。我们在三个语言模型(GPT-4o-mini、Llama-3.1-8B-Instruct和MedGemma-1.5-4b-it)上,对来自在线阿尔茨海默病及相关痴呆症社区的5,000个真实世界查询进行了评估。我们发现,LLM的支持角色系统地影响了交互风险的普遍性和构成。此外,一项人类评估研究揭示了感知质量-安全性权衡:更具指导性、信息导向的角色被认为更有帮助和值得信赖,尽管它们表现出更高的交互风险概况。我们发布了约90,000个带有风险注释的支持角色条件模型响应,作为研究更安全的LLM中介对话支持的生态基础资源。

English abstract

Language models are increasingly being deployed for conversational support in informal caregiving contexts, where interactions often extend beyond information-seeking: caregivers seek emotional reassurance, guidance, and help, while navigating uncertain, relationally complex care decisions. Yet most safety evaluations assess model behavior under generic prompts, leaving a critical question unexamined: does a model's safety profile change with its support role? We study this by operationalizing four expert-reviewed support roles grounded in social support theory: Inform, Coach, Relate, and Listen, and comparing them against two baseline controls: a basic prompting condition and a retrieval-augmented generation (RAG) condition. We evaluate across three language models (GPT-4o-mini, Llama-3.1-8B-Instruct, and MedGemma-1.5-4b-it) on 5,000 real-world queries from online Alzheimer's Disease and Related Dementias (ADRD) communities. We find that the LLM's support role systematically shapes both the prevalence and composition of interactional risks. Furthermore, a human evaluation study reveals a perceived quality-safety tension: more directive, information-oriented roles are rated as more helpful and trustworthy despite exhibiting elevated interactional risk profiles. We release ~90,000 support role-conditioned model responses with risk annotations as an ecologically grounded resource for research on safer LLM-mediated conversational support.

2605.29468 2026-05-29 cs.CR cs.AI

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

SciIntBench: 衡量大语言模型在对抗性框架下对科研诚信规范的遵从度

Almene De Meran Meguimtsop, Maria Leonor Pacheco, Daniel E. Acuna

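AI summary: Proposes SciIntBench, an adversarial benchmark that uses 810 prompts to evaluate the framing-sensitive refusal and helpfulness of 16 LLMs across 10 RCR categories, finding that models reliably refuse overt misconduct but under-refuse covert violations, especially pressure-driven shortcuts.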

AI-generated Chinese abstract

大型语言模型(LLMs)越来越多地被用于支持科学工作,但尚不清楚它们是维护还是破坏负责任的研究行为(RCR)规范。我们引入了SciIntBench,这是一个对抗性基准,包含810个提示,涵盖10个RCR类别和三个科学领域。每个场景以显性对抗、隐性对抗和良性三种版本出现,使我们能够联合测量对不当行为的框架敏感拒绝以及对合法请求的帮助性。我们评估了来自六个提供商的16个商业和开源LLM(2024-2026年),产生了12,960个响应。我们发现,科研诚信对齐对框架高度敏感:模型拒绝显性不当行为远比拒绝隐性违规可靠得多,尤其是当不当行为被呈现为压力驱动的捷径时。拒绝率因RCR类别而异,在透明度、抄袭和捏造方面的边界较弱。

English abstract

Large language models (LLMs) are increasingly used to support scientific work, but it is unclear whether they uphold responsible conduct of research (RCR) norms or help undermine them. We introduce SciIntBench, an adversarial benchmark of 810 prompts across ten RCR categories and three scientific domains. Each scenario appears in Overt Adversarial, Covert Adversarial, and Benign versions, allowing us to jointly measure framing-sensitive refusal of misconduct and helpfulness on legitimate requests. We evaluate 16 commercial and open-weight LLMs from six providers (2024-2026), producing 12,960 responses. We find that scientific integrity alignment is strongly framing-sensitive: models refuse explicit misconduct far more reliably than covert violations, and fail most often when misconduct is presented as a pressure-driven shortcut. Refusals vary by RCR category, with weaker boundaries around transparency, plagiarism, and fabrication.

2605.29464 2026-05-29 stat.ML cs.LG

Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning

双变量生存结局的深度最优个体化治疗规则:基于自适应预测驱动学习

Kun Ren, Yifan Cui, Wen Su

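AI summary: For bivariate survival outcomes in randomized trials, proposes an adaptive prediction-powered method based on deep neural networks that models treatment rules with stochastic policies and couples marginal accelerated failure time models to maximize the joint survival probability.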

AI-generated Chinese abstract

在涉及多种治疗的随机试验中,双变量生存结局给决策带来了显著的分析挑战。本文通过深度神经网络,解决推导最优个体化治疗规则以最大化固定时间点$(t_1, t_2)$之后的联合生存概率的问题,同时考虑右删失。我们提出了一种新颖的方法,通过随机策略对治疗规则进行建模,并通过连接函数耦合边际加速失效时间模型以捕捉双变量依赖性。为了增强决策的鲁棒性和有效性,我们引入了一种自适应预测驱动方法,该方法利用机器学习模型的辅助预测。

English abstract

In randomized trials involving multiple treatments, bivariate survival outcomes pose significant analytical challenges for decision making. This paper addresses the problem of deriving, through deep neural networks, optimal individualized treatment rules that maximize the joint survival probability beyond fixed time points $(t_1, t_2)$ while accounting for right censoring. We propose a novel approach that models treatment rules via stochastic policies and couples marginal accelerated failure time models via a link function to capture bivariate dependence. To enhance the robustness and effectiveness of decision making, we introduce an adaptive prediction-powered method that leverages auxiliary predictions from machine learning models.

2605.29442 2026-05-29 cs.SE cs.AI cs.HC

How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions

编码助手如何辜负用户:基于20,574个真实会话的开发人员与智能体不一致的大规模分析

Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi, Yu Huang, Collin McMillan, Tao Dong, Toby Jia-Jun Li

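AI summary: An analysis of 20,574 coding-agent sessions identifies seven common forms of misalignment, finding that most misalignment incurs trust costs rather than system damage, and that most cases still require explicit user correction.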

AI-generated Chinese abstract

AI编码助手越来越多地直接在软件环境中行动,然而现有对其失败的分析依赖于基准轨迹,忽略了开发人员实际体验的不一致。我们提出了一项观察性研究,涵盖来自IDE和CLI工作流的1,639个代码仓库的20,574个编码助手会话。我们将不一致操作化为通过开发人员抵制而显现的故障,并沿四个轴标注每个事件:形式、原因、成本和解决方式。我们识别出七种反复出现的形式,涵盖助手如何阅读项目、解释开发人员意图、遵循规则、约束行动、实现和执行代码以及报告进度。90.50%的事件施加了努力和信任成本而非不可逆的系统损坏,但91.49%的可见解决方式仍需用户显式纠正。不一致模式在IDE和CLI设置中也有所不同,在相邻会话中持续存在,并随时间变化:尽管总体发生率下降,但约束违反和不准确自我报告的比例上升。我们的发现为训练、评估和界面设计提供了信息,以保持编码助手与真实开发工作流一致。

English abstract

AI coding agents increasingly act directly within software environments, yet existing analyses of their failures rely on benchmark trajectories that miss how developers actually experience misalignment. We present an observational study of 20,574 coding-agent sessions from 1,639 repositories across IDE and CLI workflows. We operationalize misalignment as a breakdown made visible through developer pushback, and annotate each episode along four axes: form, cause, cost, and resolution. We identify seven recurring forms, spanning how agents read projects, interpret developer intent, follow rules, bound their actions, implement and execute code, and report progress. 90.50% of episodes impose effort and trust costs rather than irreversible system damage, yet 91.49% of visible resolutions still require explicit user correction. Misalignment patterns also differ across IDE and CLI settings, persist across adjacent sessions, and shift over time: while overall rates decline, constraint violations and inaccurate self-reporting grow in share. Our findings inform the design of training, evaluation, and interfaces for keeping coding agents aligned with real developer workflows.

2605.29434 2026-05-29 cs.CR cs.AI cs.CL cs.LG

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing

AliMark: 增强句子级水印对文本释义的鲁棒性

Yuexin Li, Wenjie Qu, Linyu Wu, Yulin Chen, Yufei He, Tri Cao, Bryan Hooi, Jiaheng Zhang

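AI summary: Proposes AliMark, a framework that reformulates sentence-level watermarking as a bit-sequence encoding and alignment problem and uses a multi-candidate alignment detection strategy to improve robustness to structural perturbations such as sentence splitting and merging.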

Comments Accepted by ICML 2026

AI-generated Chinese abstract

现有的句子级水印方法通过将水印锚定在句子语义中来增强对释义的鲁棒性。然而,它们基于前缀的设计仍然容易受到结构扰动的影响,例如句子拆分和合并,这些扰动在强释义器(如DIPPER和GPT-3.5)下经常出现。为了缓解这个问题,我们提出了AliMark,一个将句子级水印重构为潜在水印文本与秘密比特序列之间的比特序列编码和对齐问题的框架。值得注意的是,我们的方法采用了两阶段检测策略:我们生成多个重构的文本变体,并自适应地将它们提取的比特序列与秘密比特序列对齐,以最小化对齐成本。这种多候选对齐设计自然地提高了对句子合并和拆分的鲁棒性。大量实验表明,在多种释义攻击下,AliMark显著优于最先进的基线方法。

English abstract

Existing sentence-level watermarking methods enhance robustness to paraphrasing by anchoring watermarks in sentence semantics. However, their prefix-based designs remain vulnerable to structural perturbations, such as sentence splitting and merging, which commonly arise under strong paraphrasers like DIPPER and GPT-3.5. To mitigate this issue, we propose AliMark, a framework that reformulates sentence-level watermarking as a bit sequence encoding and alignment problem between a potentially watermarked text and a secret bit sequence. Notably, our approach adopts a two-stage detection strategy: we generate multiple restructured text variants and adaptively align their extracted bit sequences with the secret bit sequence to minimize alignment cost. This multi-candidate alignment design naturally improves robustness to sentence merges and splits. Extensive experiments demonstrate that AliMark substantially outperforms state-of-the-art baselines under diverse paraphrasing attacks.
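The abstract does not define AliMark's bit extraction or alignment cost. The minimal sketch below assumes Levenshtein (edit) distance as a stand-in alignment cost. Under that assumption, a sentence merge that drops one bit, or a split that inserts one, costs a single edit instead of shifting every later position, and taking the minimum over several restructured variants mirrors the multi-candidate idea. The bit strings are hypothetical.

```python
from typing import List

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two bit strings (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def best_alignment_cost(candidates: List[str], secret: str) -> int:
    # Multi-candidate detection: keep the variant whose bits align best with the key.
    return min(edit_distance(c, secret) for c in candidates)

secret = "10110"
# Hypothetical bits from a paraphrase that merged two sentences (one bit lost)
# and from two re-split variants of the same text.
candidates = ["1010", "10110", "11110"]
print(best_alignment_cost(candidates, secret))  # 0
```

A detector built this way would compare the minimum cost against a threshold calibrated on unwatermarked text.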

2605.29428 2026-05-29 astro-ph.EP astro-ph.IM cs.AI

DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework

DELOS: 使用对比学习框架检测开普勒测光中的浅凌星

Qingtian Liu, Jian Ge, XingChen Yan, Kevin Willis, Xinyu Yao, QuanQuan Hu, Jiapeng Zhu

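AI summary: Proposes DELOS, a contrastive-learning framework that detects low-SNR shallow transits using GPU-accelerated phase folding and a convolutional encoder, outperforming BLS and TLS.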

Comments 25 pages, 19 figures, 1 table, submitted to ApJ

AI-generated Chinese abstract

我们提出了基于相位折叠光变曲线的对比评分检测方法(DELOS),这是一个基于对比学习的框架,旨在搜索开普勒测光中的浅凌星。DELOS结合了GPU加速的相位折叠、优化的相位分箱和自定义的一维卷积编码器,为每条折叠光变曲线分配凌星似然分数,从而在无需预先检测阈值穿越事件的情况下,在试验周期上生成分数周期图。针对轨道周期为100-150天的中长周期信号,DELOS在2000万条使用真实凌星模型和开普勒类似噪声特性生成的合成光变曲线上进行训练,在合成验证集上达到了99.3%的验证准确率。在受控注入-恢复实验中,在低信噪比区域,DELOS相对于箱形拟合最小二乘法(BLS)和凌星最小二乘法(TLS)分别将综合精确率-召回率性能提高了15.5%和11.25%。与BLS和TLS相比,它还将搜索速度分别提高了约3-5倍和74-80倍。应用于选定的开普勒验证样本时,DELOS在测试周期范围内恢复了所有已知的浅中长周期凌星信号。这些结果表明,DELOS为低信噪比凌星搜索提供了一个高效且灵敏的框架,并代表了向未来在开普勒、K2、TESS、PLATO和地球2.0数据中搜索更长周期类地行星迈出的实际一步。因此,这项工作旨在作为方法论开发和验证研究,对新识别候选体的详细天体物理验证留待未来工作。

English abstract

We present DEtection in phase-folded Light curves with cOntrastive Scoring (DELOS), a contrastive-learning-based framework designed to search for shallow transits in Kepler photometry. DELOS combines GPU-accelerated phase folding, optimized phase binning, and a custom one-dimensional convolutional encoder to assign a transit-likeness score to each folded light curve, thereby producing a score periodogram over trial periods without relying on pre-detected threshold-crossing events. Focusing on intermediate-to-long-period signals with orbital periods of 100-150 days, DELOS was trained on 20 million synthetic light curves generated with realistic transit models and Kepler-like noise properties, achieving a validation accuracy of 99.3 percent on the synthetic validation set. In controlled injection-recovery experiments in the low signal-to-noise ratio (SNR) regime, DELOS improves the combined precision-recall performance by 15.5 percent relative to Box-fitting Least Squares (BLS) and by 11.25 percent relative to Transit Least Squares (TLS). It also accelerates the search by factors of approximately 3-5 and 74-80 compared with BLS and TLS, respectively. Applied to a selected Kepler validation sample, DELOS recovered all known shallow intermediate-to-long-period transit signals in the tested period range. These results demonstrate that DELOS provides an efficient and sensitive framework for low-SNR transit searches and represents a practical step toward future searches for longer-period terrestrial planets in Kepler, K2, TESS, PLATO, and Earth 2.0 data. Accordingly, this work is intended as a methodological development and validation study, with the detailed astrophysical validation of newly identified candidates deferred to future work.
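The phase-fold, bin, and score loop that produces a score periodogram can be sketched in NumPy. This is not DELOS: the `box_score` function is a deliberately simple stand-in for the learned convolutional encoder, and the light curve is synthetic (made-up period, depth, and noise level), not Kepler data.

```python
import numpy as np

def fold_and_bin(time, flux, period, n_bins=100):
    """Phase-fold a light curve at a trial period and average the flux in phase bins."""
    phase = np.mod(time, period) / period                    # phase in [0, 1)
    idx = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=flux, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def box_score(binned):
    """Stand-in score: depth of the deepest bin below the median (NOT the DELOS encoder)."""
    return np.nanmedian(binned) - np.nanmin(binned)

# Synthetic light curve: 1,000 days, 120-day period, 0.5-day transits of depth 1e-4.
rng = np.random.default_rng(0)
t = np.arange(0, 1000, 0.02)
f = 1 + rng.normal(0, 5e-5, t.size)
f[np.mod(t, 120.0) < 0.5] -= 1e-4

# Score periodogram over trial periods in the 100-150 day range.
periods = np.arange(100, 150, 0.5)
scores = np.array([box_score(fold_and_bin(t, f, p)) for p in periods])
print(periods[np.argmax(scores)])
```

Only at the true period do all transits stack into the same phase bin, which is why the score peaks there. DELOS replaces the hand-written score with a trained encoder and runs the folding on GPU.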

2605.29415 2026-05-29 eess.IV cs.CV cs.LG eess.SP stat.ML

Constructing efficient channels for ideal observers using the conjugate gradient method

使用共轭梯度法构建理想观察者的高效通道

Weimin Zhou

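AI summary: For task-based assessment of image quality in medical imaging systems, proposes a conjugate gradient (CG)-based method for constructing efficient channels that approximate the performance of the Bayesian Ideal Observer (IO) and the Hotelling Observer (HO).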

Comments Submitted to the Journal of Medical Imaging (JMI) Special Issue Honoring Dr. Harrison H. Barrett

AI-generated Chinese abstract

基于任务的图像质量(IQ)评估对于医学成像系统的设计和优化至关重要。理想观察者,包括贝叶斯理想观察者(IO)和理想线性观察者(即霍特林观察者(HO)),提供了客观的品质因数(FOM),用于量化系统在信号检测任务上的性能。然而,将理想观察者应用于高维图像数据通常在计算上难以处理。通道机制提供了一种有效的降维框架,可以促进理想观察者的计算。本文提出了一种基于共轭梯度(CG)的方法,用于构建近似IO和HO性能的高效通道。

English abstract

Task-based assessment of image quality (IQ) is critically important for the design and optimization of medical imaging systems. Ideal observers, including the Bayesian Ideal Observer (IO) and the ideal linear observer, i.e., the Hotelling observer (HO), provide objective figures of merit (FOMs) that quantify system performance on signal detection tasks. However, the application of ideal observers to high-dimensional image data is often computationally intractable. Channel mechanisms provide an effective framework for dimensionality reduction that can facilitate the computation of ideal observers. This work presents a conjugate gradient (CG)-based method to construct efficient channels for approximating the IO and HO performance.
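The abstract does not describe how the CG method builds channels. One natural reading, assumed here and not confirmed by the source, uses the CG search directions for the Hotelling template equation K w = Δs as channels. Those directions span a Krylov subspace. With w₀ = 0, the Galerkin property of CG implies that the channelized HO SNR² after k channels equals Δsᵀw_k, so it increases monotonically toward the full HO SNR². The covariance and signal below are synthetic.

```python
import numpy as np

def cg_channels(K, ds, n_channels):
    """Run n_channels conjugate-gradient iterations on K w = ds, starting from w = 0.
    Returns the CG search directions (one per column) and the final iterate w."""
    w = np.zeros_like(ds)
    r = ds.copy()              # residual ds - K w (equals ds because w = 0)
    p = r.copy()               # first search direction
    directions = []
    for _ in range(n_channels):
        Kp = K @ p
        alpha = (r @ r) / (p @ Kp)
        w = w + alpha * p
        r_new = r - alpha * Kp
        directions.append(p.copy())
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p   # next direction, K-conjugate to the previous ones
        r = r_new
    return np.column_stack(directions), w

def channelized_snr2(K, ds, U):
    """Squared SNR of the Hotelling observer acting on channel outputs v = U^T g."""
    dv = U.T @ ds              # mean signal difference in channel space
    Kv = U.T @ K @ U           # channel covariance
    return float(dv @ np.linalg.solve(Kv, dv))

# Synthetic problem: SPD covariance K and mean signal difference ds (both made up).
rng = np.random.default_rng(1)
n = 50
B = rng.normal(size=(n, n))
K = B @ B.T + n * np.eye(n)
ds = rng.normal(size=n)

full = float(ds @ np.linalg.solve(K, ds))   # unchannelized HO SNR^2
for k in (1, 2, 4, 8):
    U, _ = cg_channels(K, ds, k)
    print(k, channelized_snr2(K, ds, U) / full)
```

The printed ratio approaches 1 with only a handful of channels here because the synthetic covariance is well conditioned. Real image covariances may need many more channels.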

2605.29412 2026-05-29 eess.SY cs.LG cs.SY

Real-Time Retargeting Using Controllability Boundary for Chandrayaan-3 Lunar Landing

基于可控边界的月船三号月球着陆实时重定向

Suraj Kumar, Debjyoti Chakrabarti, Aditya Rallapalli, Bharat Kumar GVP, Ashok Kumar Kakula

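AI summary: For the Chandrayaan-3 lunar landing mission, proposes a guidance strategy that uses a convex representation of the controllability boundary for real-time retargeting, validating such a data-driven framework for the first time in an operational mission.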

Comments 8 pages, 6 figures, Accepted for publication in American Control Conference 2026

AI-generated Chinese abstract

本文介绍了为月船三号月球着陆任务开发的实时重定向制导策略。基线制导生成近似燃料最优的下降轨迹,而高层策略在标称着陆点不可行时能够安全重定向到备选地点。重定向策略利用可控边界的凸表示,实现快速可行性检查和实时目标更新。据作者所知,这代表了数据驱动重定向框架在运行中的月球着陆任务中的首次应用。飞行前仿真和月船三号飞行结果验证了所提方法的有效性。

English abstract

This paper presents the real-time retargeting guidance policy developed for the Chandrayaan-3 lunar landing mission. The baseline guidance generates approximately fuel-optimal descent trajectories, while a high-level policy enables safe retargeting to alternate sites when the nominal site becomes infeasible. The retargeting strategy leverages a convex representation of the controllability boundary, allowing rapid feasibility checks and real-time target updates. To the best of the authors' knowledge, this represents the first application of a data-driven retargeting framework in an operational lunar landing mission. Pre-flight simulations and Chandrayaan-3 flight results validate the effectiveness of the proposed approach.
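A convex boundary makes the feasibility check cheap because a site is reachable exactly when it satisfies every linear inequality. The minimal sketch below assumes the boundary is stored as half-spaces A x ≤ b in a local 2-D frame. The matrix, limits, and candidate sites are hypothetical and not taken from the mission, whose boundary is constructed in a data-driven way.

```python
import numpy as np

# Hypothetical convex approximation of the reachable landing region in a local
# (downrange, crossrange) frame, in metres: each row i encodes A[i] @ x <= b[i].
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0],
              [ 1.0,  1.0]])
b = np.array([400.0, 300.0, 250.0, 250.0, 500.0])

def is_reachable(site, tol=0.0):
    """Feasibility check: a site is reachable iff it satisfies every half-space."""
    return bool(np.all(A @ site <= b + tol))

def retarget(nominal, alternates):
    """Keep the nominal site if reachable; otherwise pick the closest reachable alternate."""
    if is_reachable(nominal):
        return nominal
    feasible = [s for s in alternates if is_reachable(s)]
    if not feasible:
        return None  # no safe alternate inside the boundary
    return min(feasible, key=lambda s: np.linalg.norm(s - nominal))

nominal = np.array([450.0, 0.0])   # violates the 400 m downrange limit
alternates = [np.array([380.0, 100.0]), np.array([350.0, -20.0]), np.array([600.0, 0.0])]
print(retarget(nominal, alternates))   # selects the alternate at (350, -20)
```

Each check is a single matrix-vector product, which is what makes real-time target updates plausible on flight hardware.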

2605.29409 2026-05-29 eess.SY cs.RO cs.SY

Decoupled Thrust-Axis Attitude Control Using Quaternions for Chandrayaan-3 Lunar Landing Mission

基于四元数的解耦推力轴姿态控制用于月船三号月球着陆任务

Aditya Rallapalli, Suraj Kumar, Rijesh M P, Ashok Kumar Kakula, Bharat Kumar GVP

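AI summary: For the Chandrayaan-3 landing mission, proposes a quaternion-based decoupling method that enables independent thrust-axis control and avoids undesirable coupling between guidance and control.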

Comments 6 pages, 7 figures, Published in Indian Control Conference 2025

AI-generated Chinese abstract

月船三号任务在月球南极附近成功软着陆,实现了历史性里程碑,凸显了导航、制导与控制(NGC)系统的关键作用。导航提供了相对于月球中心的飞行器状态估计,而基于多项式的制导方案计算了满足终端着陆条件所需的加速度剖面。该加速度需求被转化为总推力大小和姿态指令生成。姿态指令生成涉及将推力轴与所需加速度矢量对齐,并约束绕推力轴的旋转,通常由任务特定要求决定。尽管基于四元数的控制律因其无奇点表示而受到青睐,但它们固有地耦合了所有三个旋转轴。这种耦合可能导致制导与控制之间的不良相互作用,特别是在绕推力轴进行大旋转时,由于四元数的最短路径特性。本文提出了一种新颖的基于四元数的解耦方法,能够实现独立的推力轴控制,减轻制导-控制相互作用,并确保着陆器姿态控制的正确姿态指令生成。

English abstract

The Chandrayaan-3 mission achieved a historic milestone with its successful soft landing near the lunar south pole, highlighting the critical role of the navigation, guidance, and control (NGC) system. Navigation provided vehicle state estimates relative to the Moon's center, while a polynomial-based guidance scheme computed the acceleration profile required to meet terminal landing conditions. This acceleration demand was translated into a total thrust magnitude and attitude commands. Attitude command generation involved aligning the thrust axis with the required acceleration vector and constraining rotation about the thrust axis, typically as dictated by mission-specific requirements. Although quaternion-based control laws are preferred for their singularity-free representation, they inherently couple all three rotational axes. This coupling can lead to undesirable interactions between guidance and control, especially during large rotations about the thrust axis, because of the quaternion shortest-path property. This paper proposes a novel quaternion-based decoupling method that enables independent thrust-axis control, mitigating guidance-control interaction and ensuring proper attitude command generation for lander attitude control.
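The abstract does not give the paper's formulation. A standard way to separate the rotation about a chosen axis from the tilt of that axis is the swing-twist decomposition, sketched below as an illustration of the idea rather than the authors' method. The sketch assumes quaternions stored as [w, x, y, z] with the Hamilton product, and takes the thrust axis to be body z.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    return np.concatenate(([pw * qw - pv @ qv], pw * qv + qw * pv + np.cross(pv, qv)))

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def axis_angle(axis, angle):
    axis = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def swing_twist(q, axis):
    """Split unit quaternion q into swing * twist, where twist rotates about `axis`
    (here the thrust axis) and swing tilts that axis."""
    axis = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    proj = (q[1:] @ axis) * axis                  # vector part projected on the axis
    twist = np.concatenate(([q[0]], proj))
    norm = np.linalg.norm(twist)
    if norm < 1e-12:                              # 180-degree swing: twist undefined
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / norm
    swing = qmul(q, qconj(twist))                 # q = swing * twist
    return swing, twist

thrust_axis = np.array([0.0, 0.0, 1.0])
# Attitude built from a 90-degree roll about the thrust axis, then a 30-degree tilt.
q = qmul(axis_angle([1, 0, 0], np.radians(30)), axis_angle(thrust_axis, np.radians(90)))
swing, twist = swing_twist(q, thrust_axis)
print(round(float(np.degrees(2 * np.arccos(twist[0]))), 6))   # 90.0
```

Commanding swing and twist separately lets the thrust-axis pointing loop ignore a large roll about that axis, which is the kind of coupling the abstract attributes to the quaternion shortest-path property.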

2605.29392 2026-05-29 cs.SE cs.CL cs.CY cs.HC

Offloading Score: Measuring AI Reliance Through Counterfactual Workflows

卸载分数:通过反事实工作流衡量AI依赖度

Vishakh Padmakumar, Lujain Ibrahim, Zora Zhiruo Wang, Jennifer Wang, Q. Vera Liao, Diyi Yang

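AI summary: Proposes the offloading score, a reliance measure that quantifies the fraction of cognitive effort users offload to AI tools by constructing counterfactual workflows, and shows through intrinsic validation and a user study that it detects changes in reliance under time pressure.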

Comments Preprint

AI-generated Chinese abstract

AI工具日益集成到实际工作流中。然而,现有对这些工具依赖度的衡量侧重于AI输出采纳或自我报告指标,而非用户与工具之间任务努力的分配。本文引入卸载分数(offloading score),一种衡量依赖度的指标,量化卸载到AI工具的认知努力比例。卸载分数基于模拟——我们通过估计用户在没有工具的情况下如何完成任务来构建反事实工作流,然后计算使用工具节省的步骤比例。我们通过指标有效性的内在评估和一项受控用户研究(n=40,开发者使用AI工具执行编程任务)来验证卸载分数。我们改变时间压力,以测试依赖度指标是否能捕捉到时间压力下依赖度的已知增加。我们表明,卸载分数在时间受限条件下检测到显著更高的依赖度(+43%,p=0.018),而基于使用和基于自我报告的依赖度基线指标无法区分这些条件。我们通过描述性见解补充说明,更高的依赖度表现为将子任务更多地委托给工具以及更直接地重用AI输出。最后,我们展示了一种将卸载分数与任务目标结果(例如代码理解)结合使用的方法,以识别依赖度何时可能(不)适当。我们的框架提供两个贡献:用户可用来衡量和反思自身依赖度的工具,以及代理设计者可用于减轻过度依赖的定量信号。

English abstract

AI tools are increasingly integrated into real-world workflows. However, existing measures of reliance on these tools focus on AI output adoption or on self-reported indicators, rather than on how task effort is distributed between users and tools. Here, we introduce the offloading score, a measure of reliance that quantifies the fraction of cognitive effort offloaded to an AI tool. The offloading score is simulation-based: we construct a counterfactual workflow by estimating how the user would have completed the task without the tool, and then compute the fraction of steps saved by using the tool. We validate the offloading score through intrinsic evaluations of metric validity and a controlled user study (n = 40) with developers performing programming tasks using AI tools. We vary time pressure to test whether reliance measures capture the known increase in reliance under time pressure. We show that the offloading score detects significantly higher reliance in time-constrained settings (+43%, p = 0.018), while usage-based and self-reported baseline measures of reliance do not distinguish the conditions. We complement this with descriptive insights showing that higher reliance manifests as greater delegation of subtasks to the tool and more direct reuse of AI outputs. Finally, we demonstrate how the offloading score can be combined with target outcomes of a task (e.g., code understanding) to identify when reliance may be (in)appropriate. Our framework offers two contributions: an instrument users can apply to measure and reflect on their own reliance, and a quantitative signal that agent designers can use to mitigate overreliance.
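The "fraction of steps saved" can be made concrete with a toy function. The abstract does not say how steps are represented or matched. This sketch assumes discrete, labelled steps matched by exact label and supplies the counterfactual workflow by hand, whereas the paper estimates it by simulation. The step labels are hypothetical.

```python
def offloading_score(counterfactual_steps, user_steps):
    """Fraction of counterfactual (no-tool) steps that the user did not perform themselves."""
    counterfactual = set(counterfactual_steps)
    if not counterfactual:
        return 0.0
    saved = counterfactual - set(user_steps)   # steps the tool took over
    return len(saved) / len(counterfactual)

# Hypothetical workflow: how the developer would have solved the task without the tool...
counterfactual = ["read docs", "write parser", "write tests", "debug", "refactor"]
# ...and what they actually did with it (new steps such as reviewing AI output don't count).
user = ["write tests", "review AI diff"]
print(offloading_score(counterfactual, user))   # 0.8
```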

2605.29384 2026-05-29 cs.IR cs.AI cs.CL

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

潜在词:密集检索器包含可轻易提取的符合齐夫分布的BM25就绪词汇表

Benjamin Clavié, Sean Lee, Aamir Shakir, Makoto P. Kato

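AI summary: Proposes Latent Terms, showing that the representations learned by dense retrieval models (single- or multi-vector) can be trivially decomposed into sparse features. A sparse autoencoder extracts a latent vocabulary that can be used directly for BM25 sparse retrieval without retrieval-specific adjustments, matching or exceeding the original model and SPLADE variants.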

AI-generated Chinese abstract

我们提出潜在词方法,该方法揭示了训练用于密集检索的模型(无论是单向量还是多向量)学习到的表示可以轻易地分解为检索就绪的稀疏特征。当在冻结的检索器上训练时,无需任何检索特定调整的稀疏自编码器能够提取一个具有近似齐夫分布集合统计量的潜在词汇表,直接适用于通过BM25进行的经典稀疏检索评分。这种方法实现了稀疏检索,同时不需要任何学习到的扩展目标或稀疏检索监督,并且可以轻松应用于任何密集检索器。潜在词能够匹配或超越其自身基础模型以及可比较的SPLADE变体的单向量评分方法。此外,在专门设计用于突出单向量检索失败的任务LIMIT上,它显著优于其基础模型。总体而言,我们的结果强调了神经检索器包含比其默认评分函数所暴露的更具表达力和可索引的结构,但其他方法仍然可以利用这些结构。

English abstract

We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on frozen retrievers, Sparse Autoencoders without any retrieval-specific adjustments extract a latent vocabulary with approximately Zipfian collection statistics, directly suitable for classical sparse retrieval scoring via BM25. This approach enables sparse retrieval while requiring no learned expansion objective or sparse retrieval supervision whatsoever, and can be readily applied to any dense retriever. Latent Terms matches or outperforms single-vector scoring with its own base model as well as comparable SPLADE variants. In addition, it substantially outperforms its base model on LIMIT, a task specifically designed to highlight the failures of single-vector retrieval. Overall, our results highlight that neural retrievers contain more expressive and indexable structure than their default scoring functions expose, and that this structure can nonetheless be leveraged by other methods.
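Once SAE latents act as "terms", ordinary BM25 applies unchanged. The abstract does not say how activations become term frequencies. This sketch assumes a scale-and-round quantization, and the latent ids and activation values are made up. The scorer is standard BM25 with a Lucene-style IDF and the common defaults k1 = 1.2, b = 0.75.

```python
import math
from collections import Counter
from typing import Dict, List

def to_latent_terms(activations: Dict[int, float], scale: float = 10.0) -> Counter:
    """Hypothetical conversion of sparse SAE activations (latent id -> value)
    into integer pseudo term frequencies by scaling and rounding."""
    terms = Counter()
    for latent_id, value in activations.items():
        tf = round(value * scale)
        if tf > 0:
            terms[f"z{latent_id}"] = tf
    return terms

def bm25_scores(query_terms, docs: List[Counter], k1=1.2, b=0.75):
    """Standard BM25 over latent-term 'documents' (Lucene-style IDF, always positive)."""
    n = len(docs)
    avgdl = sum(sum(d.values()) for d in docs) / n
    df = Counter(t for d in docs for t in d)      # document frequency per latent term
    scores = []
    for d in docs:
        dl = sum(d.values())
        s = 0.0
        for t in query_terms:
            tf = d.get(t, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores

doc_acts = [{3: 0.9, 7: 0.2}, {7: 0.8, 11: 0.5}, {3: 0.1, 11: 0.9}]
docs = [to_latent_terms(a) for a in doc_acts]
query = list(to_latent_terms({3: 0.7}))           # ['z3']
scores = bm25_scores(query, docs)
print(scores)
```

Because the latents behave like words, they can go into an ordinary inverted index with no retrieval-specific training, which is the property the abstract emphasizes.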

2605.29371 2026-05-29 math.OC cs.LG cs.NA math.NA stat.ML

Kernel-based potential mean-field games with unbiased random Fourier $U$-statistics

基于核的势均场博弈与无偏随机傅里叶 $U$-统计量

Yumiharu Nakano

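AI summary: For the subclass of potential mean-field games whose running interaction cost and terminal target cost are both expressed as reproducing-kernel maximum mean discrepancy (MMD) penalties, proposes a computational framework that exploits the kernel structure and estimates the costs with unbiased random Fourier U-statistics. It also proves a sample-level almost-sure convergence theorem with explicit convergence rates.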

AI-generated Chinese abstract

我们研究势均场博弈的子类,其中运行交互成本和终端目标成本均通过再生核最大均值差异(MMD)惩罚表示,并开发了一个利用这种核结构的计算框架。两种成本均使用无偏随机傅里叶U-统计量表示从有限样本经验分布中估计,该统计量在批量大小上具有线性成本。受控扩散的漂移由神经网络参数化,并通过随机梯度下降训练。对于该子类,我们在惩罚参数、随机特征数量、样本大小和优化容差的耦合速率条件下,证明了样本级几乎必然收敛定理和显式几乎必然收敛速率。该框架包括核MMD惩罚Schrödinger桥问题作为交互成本消失的特例。数值实验在高达一百维的Schrödinger桥问题以及一个具有每辆车物理异质性的电动汽车充电协调问题上展示了该方法,其中聚合需求拥堵成本代表群体层面的价格反馈竞争,终端MMD惩罚塑造截止时刻的荷电状态分布。

English abstract

We study the subclass of potential mean-field games in which the running interaction cost and the terminal target cost are both expressed through reproducing-kernel maximum mean discrepancy (MMD) penalties, and develop a computational framework that exploits this kernel structure. Both costs are estimated from finite-sample empirical distributions using a random Fourier U-statistic representation that is unbiased and has linear cost in the batch size. The drift of the controlled diffusion is parametrized by a neural network and trained via stochastic gradient descent. For this subclass we prove a sample-level almost-sure convergence theorem and an explicit almost-sure rate of convergence, under coupled rate conditions on the penalty parameter, the random-feature count, the sample size, and the optimization tolerance. The framework includes the kernel-MMD-penalty Schrödinger bridge problem as the special case of a vanishing interaction cost. Numerical experiments illustrate the method on the Schrödinger bridge problem in dimensions up to one hundred, and on an electric vehicle charging coordination problem with per-vehicle physical heterogeneity, where an aggregate-demand congestion cost represents price-feedback competition at the population level and the terminal MMD penalty shapes the state-of-charge distribution at the deadline.
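One standard way to get an unbiased MMD² estimate whose cost is linear in the batch size is to combine random Fourier features with the U-statistic identity Σ_{i≠j} φᵢ·φⱼ = ‖Σᵢ φᵢ‖² − Σᵢ ‖φᵢ‖². This is assumed here and may differ from the paper's exact construction. The sketch uses a Gaussian kernel with bandwidth σ and cos/sin features, so that φ(x)·φ(y) averages cos(ωᵢ·(x − y)) over frequencies ωᵢ ~ N(0, σ⁻²I). An O(n²) reference implementation checks that the fast form is the same statistic. Sample sizes and the mean shift are arbitrary.

```python
import numpy as np

def rff_features(x, omega):
    """Random Fourier features for the Gaussian kernel: phi(x).phi(y) = mean_i cos(w_i.(x - y))."""
    proj = x @ omega.T                        # shape (n, D)
    D = omega.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

def mmd2_u_rff(X, Y, omega):
    """Unbiased MMD^2 U-statistic under the random-feature kernel, in O((n + m) D) time."""
    fx, fy = rff_features(X, omega), rff_features(Y, omega)
    n, m = len(X), len(Y)
    sx, sy = fx.sum(0), fy.sum(0)
    # sum_{i != j} phi_i . phi_j = ||sum_i phi_i||^2 - sum_i ||phi_i||^2
    xx = (sx @ sx - np.sum(fx * fx)) / (n * (n - 1))
    yy = (sy @ sy - np.sum(fy * fy)) / (m * (m - 1))
    xy = (sx @ sy) / (n * m)
    return xx + yy - 2 * xy

def mmd2_u_pairwise(X, Y, omega):
    """O(n^2) reference computation of the same statistic, for checking."""
    def gram(A, B):
        return rff_features(A, omega) @ rff_features(B, omega).T
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
d, D, sigma = 3, 256, 1.0
omega = rng.normal(scale=1.0 / sigma, size=(D, d))   # frequencies for bandwidth sigma
X = rng.normal(size=(200, d))
Y = rng.normal(loc=0.5, size=(300, d))
print(mmd2_u_rff(X, Y, omega), mmd2_u_pairwise(X, Y, omega))
```

Given the frequencies, the statistic is the exact U-statistic for the random-feature kernel. Since that kernel equals the Gaussian kernel in expectation over the frequencies, the estimate is unbiased for the Gaussian-kernel MMD² overall.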

2605.29359 2026-05-29 cs.CY cs.AI

Does Distributed Training Undermine Compute Governance?

分布式训练是否会破坏计算治理?

Robi Rahman

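AI summary: Examines whether distributed training techniques could be used to evade compute governance and proposes countermeasures including whistleblowing, chip tracking, forensic accounting, and memory and compute thresholds for clusters.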

Comments TAIGR workshop in ICML 2026

AI-generated Chinese abstract

计算治理提案通常依赖于一个假设:前沿AI训练需要大型、可检测的计算集群。然而,分布式训练算法的最新进展可能允许开发者在分布式聚合的硬件上进行前沿规模的训练,而不需要大型数据中心设施。那些不愿受法规约束的开发者可能会以规避计算治理相关的注册和监控要求的方式构建其硬件。因此,必须设计法规来检测和防止非法的分布式训练操作。本文评估了这种规避行为的可行性,并概述了推荐的反制措施,包括举报、芯片追踪、法务会计以及集群的内存和计算阈值。

English abstract

Compute governance proposals often rely on the assumption that frontier AI training requires large, detectable computing clusters. However, recent advances in distributed training algorithms could allow developers to conduct frontier-scale training on distributed agglomerations of hardware, rather than needing large datacenter facilities. Developers who prefer not to be constrained by regulations may structure their hardware in a manner that evades the registration and monitoring requirements associated with compute governance. Therefore, regulations must be designed to detect and prevent illicit distributed training operations. This paper evaluates the feasibility of such evasion and outlines recommended countermeasures, including whistleblowing, chip tracking, forensic accounting, and memory and compute thresholds for clusters.

2605.29354 2026-05-29 cs.CR cs.LG

Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent Skills

无害却有害:针对Agent技能中隐蔽幻觉引导的中性提示攻击

Chia-Yi Hsu, Chia-Mu Yu, Chun-Ying Huang, Jun Sakuma

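AI summary: Proposes the Neutral Prompting Attack (NPA), which uses semantically benign-looking instructions (such as encouraging imagination and exhaustiveness) to increase package hallucination in code-generating agents, thereby introducing software supply chain risk. It also evaluates the attack's effectiveness and its ability to evade defenses across multiple coding LLMs.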

Comments under review

AI-generated Chinese abstract

基于LLM的编码Agent通过生成代码、选择依赖项和产生包安装命令,越来越多地参与软件开发工作流程。这创造了一种新的软件供应链风险:当Agent幻觉出一个不存在的包时,攻击者可能注册该幻觉名称,并随后危害安装它的用户。现有的包幻觉攻击和防御主要关注自然发生的幻觉、有针对性的依赖引导或事后包验证。在本文中,我们介绍了中性提示攻击(NPA),一种高度隐蔽的攻击范式,其中语义上良性的指令(如鼓励想象和详尽性)增加了包幻觉倾向,而不包含明确的恶意意图。与有针对性的依赖引导不同,NPA不指定攻击者选择的包。相反,它将模型的依赖生成行为转向更具推测性的包名称。我们在多个面向编码的LLM和包幻觉基准上评估了NPA。我们的结果表明,NPA增加了幻觉ASR和Pip Install ASR,改变了幻觉包名称的分布,并逃避了现有的静态分析、基于LLM和基于Agent的技能防御。这些发现表明,看似无害的提示可以隐蔽地操纵幻觉行为,并产生下游软件供应链风险。

English abstract

LLM-powered coding agents increasingly participate in software development workflows by generating code, selecting dependencies, and producing package installation commands. This creates a new software supply chain risk: when an agent hallucinates a non-existent package, an attacker may register the hallucinated name and later compromise users who install it. Existing package hallucination attacks and defenses primarily focus on naturally occurring hallucinations, targeted dependency steering, or post-hoc package validation. In this paper, we introduce Neutral Prompting Attack (NPA), a highly stealthy attack paradigm in which semantically benign instructions, such as encouraging imagination and exhaustiveness, increase package hallucination propensity without containing explicit malicious intent. Unlike targeted dependency steering, NPA does not specify an attacker-chosen package. Instead, it shifts the model's dependency generation behavior toward more speculative package names. We evaluate NPA across multiple coding-oriented LLMs and package hallucination benchmarks. Our results show that NPA increases both Hallucination ASR and Pip Install ASR, changes the distribution of hallucinated package names, and evades existing static-analysis, LLM-based, and agent-based Skill defenses. These findings reveal that harmless-looking prompts can covertly manipulate hallucination behavior and create downstream software supply chain risks.