arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.12864 2026-06-12 cs.SE cs.AI New submission

Beyond Problem Solving: UOJ-Bench for Evaluating Code Generation, Hacking, and Repair in Competitive Programming

Tingqiang Xu, Hangrui Zhou, Tianle Cai, Alex Gu, Kaifeng Lyu

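AI summary: Introduces the UOJ-Bench benchmark, which evaluates LLMs' problem solving and their ability to identify errors in human-written code in competitive programming through three tasks: code generation, hacking, and repair. Under one-shot evaluation, even the strongest models fail to identify errors in more than 50% of submissions known to be incorrect; test-time scaling raises success rates above 90% and uncovers errors in over 5% of full-score submissions.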

Abstract

Despite strong performance in competitive programming, the role of Large Language Models (LLMs) in supporting human learning in the same setting remains largely unexplored. In this work, we introduce UOJ-Bench, a benchmark designed to evaluate not only the problem-solving ability of LLMs, but also their ability to identify errors in human-written code -- a crucial educational activity traditionally supported by running test cases over online judge systems. UOJ-Bench consists of three distinct tasks: code generation, code hacking, and code repair, all constructed from real-world code submissions on the Universal Online Judge (UOJ) and evaluated through UOJ's native judging infrastructure. Our results show that under one-shot evaluation, even the strongest models fail to identify errors in more than 50% of a set of submissions that have been found to be incorrect by UOJ users. While test-time scaling improves success rates to above 90%, the substantial computational costs incurred from model inference limit its practicality for large-scale deployment. Despite these limitations, we find that the best-performing models under test-time scaling can uncover errors in over 5% of full-score submissions across roughly 30 problems, suggesting that frontier LLMs can already provide complementary signals beyond standard judging systems.

2606.12849 2026-06-12 cs.DC cs.CV cs.RO New submission

SemanticXR: Low Power and Real-time Queryable Semantic Mapping with an Object-Level Device-Cloud Architecture

Rahul Singh, Devdeep Ray, Connor Smith, Sarita Adve

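AI summary: Presents SemanticXR, the first device-cloud system to achieve real-time, open-vocabulary semantic mapping and querying under XR power, bandwidth, and memory constraints through object-level communication, execution, and memory management. It improves server-side mapping latency by 2.2x while adding only 2% device power.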

Abstract

Semantic mapping is a core service that enables grounded interactions in emerging Extended Reality (XR) applications such as AI assistants and spatial object search. Deploying this capability on mobile XR devices requires a system that is open-vocabulary, real-time, and low-power. Existing approaches are compute-intensive and assume server-class resources. Cloud offloading offers a practical path, but no existing system splits semantic mapping across the device-cloud boundary or manages its communication, execution, and memory footprint. We present SemanticXR, the first device-cloud system for real-time, open-vocabulary semantic mapping and querying under XR power, bandwidth, and memory constraints. Our key insight is to elevate semantically identifiable objects to first-class units of communication, execution, and memory across the device and server. On the server, object-level parallelism and geometry downsampling improve mapping latency, while object-level depth-mapping co-design reduces upstream bandwidth. On the device, an object-level sparse local map with incremental updates and update prioritization enables network-robust querying with bounded memory and downstream bandwidth. Object-level configurable resource usage vs. quality trade-offs let applications and the system adapt mapping to application requirements and operating conditions, respectively. Against a device-cloud baseline with the same perception models, object-level organization improves server-side mapping latency by 2.2X at equal semantic quality. Depth-mapping co-design maintains upstream bandwidth under 2.5 Mbps. On the device, SemanticXR sustains sub-100 ms query latency for up to 10,000 objects even under network drops, supports tens of thousands of objects within 500 MB, and scales downstream bandwidth with map changes, not total scene size. The system adds only 2% device power during normal operation.

2606.12845 2026-06-12 cs.CR cs.LG New submission

A Privacy-Preserving Framework Using Remote Data Science for Inter-Institutional Student Retention Prediction

John Fields, K M Sajjadul Islam, Ruchitha Thota, Victor Chen, Praveen Madiraju

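AI summary: Proposes a remote data science framework built on PySyft with a semi-air-gapped architecture, letting three universities collaboratively predict student retention without direct access to sensitive data, and demonstrating the feasibility of privacy-preserving machine learning in educational settings.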

Comments
7 pages, 2 figures. Accepted at the 2026 IEEE International Conference on Information Reuse and Integration (IEEE IRI 2026)

Abstract

This study explores privacy-preserving machine learning (PPML) techniques using the PySyft platform to enable collaborative prediction of student retention between institutions. We developed a remote data science (RDS) framework with a semi-air-gapped architecture consisting of high-side and low-side servers, allowing researchers from three universities to build predictive models on sensitive student data without direct data access. Using historical data from a small private university (N=720), we evaluated three synthetic data generation approaches and validated the framework through inter-institutional collaboration. The results demonstrate consistent classification performance across institutions (Macro F1: 0.690--0.695) while maintaining strict Family Educational Rights and Privacy Act (FERPA) compliance. We also propose Data-Type-Aware Templates, a novel synthetic data method that prioritizes privacy over distributional fidelity. Our findings confirm that RDS-based PPML is technically feasible for educational settings and offers a practical alternative to federated learning for small-scale inter-institutional collaborations. The code is available at https://github.com/jtfields/NAIRR240195-Privacy-Preserving-Machine-Learning.

2606.12835 2026-06-12 cs.MA cs.AI cs.CY cs.NI New submission

The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale

Quanyan Zhu

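AI summary: Sets out the vision of the Internet of Agentic AI (IoAI), an open ecosystem in which heterogeneous agents discover, negotiate, communicate, and collaborate across cloud, edge, and device environments, and examines its architectures, mechanisms, and key research challenges.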

Abstract

The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents discover one another, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments. We synthesize foundations from single-agent agentic AI, multi-agent systems, distributed computing, communication networks, game theory, and security engineering to characterize the architectures and mechanisms required for scalable agent ecosystems. The paper examines agent deployment models, workflow lifecycles, communication protocols, interoperability layers, resource-management challenges, and trust architectures, with case studies in adaptive manufacturing and distributed operational coordination. The resulting framework highlights the central research challenges of controlled emergence, semantic interoperability, secure identity, incentive-compatible coordination, resource-aware orchestration, and governance for large-scale networks of autonomous agents.

2606.12812 2026-06-12 cs.CY cs.SD New submission

Vocal Identity Under Siege by AI Voice Cloning Technologies

Jyh-An Lee, Xuan Sun

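AI summary: Through a comparative analysis of three legal frameworks (the right of publicity, personality rights, and the personal data protection right), the article examines how generative AI voice cloning threatens the unique value of vocal identity and how the law can respond.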

Journal ref
[2026] Singapore Journal of Legal Studies 46

Abstract

The advent of sophisticated AI-driven voice cloning has brought to the fore critical legal and ethical challenges regarding the protection of vocal identity. Prompted by recent controversies - including the striking resemblance between OpenAI's ChatGPT-4o voice and that of Scarlett Johansson - this article examines how generative AI technologies undermine the unique value of the human voice and further complicate the legal questions surrounding personality rights. Through a comparative analysis, the paper evaluates three principal legal frameworks: the right of publicity, personality rights, and the personal data protection right. Each framework - rooted in different legal traditions - offers distinct strengths and limitations in addressing the threats posed by AI-generated voice cloning. By analysing these doctrines' scope, remedies, and posthumous protections, the study offers a foundation for understanding how existing legal approaches may be applied to the evolving challenges of vocal identity in the era of generative AI.

2606.12805 2026-06-12 cs.HC cs.AI New submission

Exploring How Agent Voice Accents Shape Human-AI Collaboration in K-12 Group Learning

Prerna Ravi, Carúmey Stevens, Ben Hurt, Brandon Hanks, Grace Lin, Emma Anderson

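AI summary: In a study with 33 teachers, the accent of a GenAI voice agent (British, Indian, or African American) affected whether it was perceived as a tool or a peer, which in turn influenced trust, engagement, and reliance.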

Abstract

Collaboration is widely recognized as a cornerstone of 21st-century education, yet teachers still encounter persistent challenges in fostering productive peer interaction. LLM conversational peer agents introduce new possibilities for mediating in-person group work, raising questions about how persona design, particularly their voice characteristics, shapes learners' perceptions, trust, and interactional dynamics. While prior work has examined agent accent effects in one-to-one settings, little is known about how these effects manifest in groups. We conducted a between-subjects mixed-methods study with 33 teachers examining how a GenAI voice agent with different accents (British, Indian, and African American) influenced collaboration and agent perception. Across surveys, group interaction analyses, and artifacts, we find that accent shaped participants' mental models and the roles the agent assumed in group interaction. The British-accented agent was largely treated as a tool and engaged in detached, utility-based ways, whereas Indian- and African American-accented agents were more readily anthropomorphized and integrated as peers. These role expectations influenced trust, engagement, and reliance over time. This work advances understanding of how GenAI's sociolinguistic design features shape group dynamics in CSCL, with implications for designing culturally inclusive AI partners in group learning.

2606.12774 2026-06-12 eess.SY cs.AI cs.CL cs.SY New submission

Agentic MPC for Semantic Control System Resynthesis

Yuya Miyaoka, Masaki Inoue

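AI summary: Proposes an agentic MPC framework that integrates large language model agents to achieve context-aware, semantically adaptive control synthesis, and demonstrates in an autonomous driving scenario that it can adapt control to personal preferences or social situations such as yielding to emergency vehicles.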

Comments
7 pages, 5 figures

Abstract

While MPC effectively handles structured, diverse, and low-level specifications, it lacks the capability to dynamically incorporate high-level contextual information such as social norms, user intent, or natural language instructions. To address this limitation, this manuscript introduces an agentic MPC framework that enables context-aware, semantically adaptive control synthesis by integrating with large language model-based agents. The agent interprets heterogeneous inputs, including natural language messages, environmental observations, and external knowledge, to resynthesize the control specifications. The effectiveness of the framework is demonstrated in an autonomous driving scenario, where the system aligns with personal preferences or responds to social situations such as emergency vehicle yielding.

2606.12737 2026-06-12 cs.CR cs.AI New submission

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

Pengfei He, Lesly Miculicich, Vishesh Sharma, Ash Fox, George Lee, Jiliang Tang, Tomas Pfister, Long T. Le

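AI summary: Proposes PI-Hunter, an automated auditing framework that constructs source-aware test cases and iteratively evolves them to proactively expose latent prompt injection vulnerabilities in LLM agents, substantially improving vulnerability exposure and attack-surface coverage.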

Abstract

Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at inference time, and current red-teaming methods primarily optimize attack success. As a result, developers have limited visibility into how latent prompt injections emerge and propagate through agents. We propose PI-Hunter, an automated agentic auditing framework for proactive vulnerability exposure in LLM agents. PI-Hunter constructs realistic source-aware test cases and iteratively evolves them through feedback-driven exploration to induce agents to retrieve and reveal latent malicious instructions embedded within external environments. Extensive experiments across multiple benchmarks, agent architectures, attacks, and defenses demonstrate that PI-Hunter substantially improves vulnerability exposure and attack-surface coverage over strong automated red-teaming baselines, while remaining effective under existing prompt injection defenses.

2606.12709 2026-06-12 cs.MA cs.CR cs.LG New submission

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

Timothy McAllister, Sina Abdidizaji, Ivan Garibay, Ozlem Ozmen Garibay

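AI summary: Studies how model scale affects the security of linear multi-agent workflows. Larger models are more likely to execute malicious instructions, but a lightweight Fixer stage restores performance, indicating that linear structures can be robust when appropriate correction is in place.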

Comments
16 pages (4 are main text), 2 figures, 6 tables. Accepted to the AIWILD Workshop at ICML 2026

Abstract

As LLM-based multi-agent systems (MAS) are deployed in the wild, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-weight model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are far more likely to faithfully execute malicious instructions, with the control-to-malicious performance drop reaching 53.7pp at 27B in uncorrected pipelines. However, appending a lightweight terminal Fixer stage collapses this to 0.6pp and restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viable and resilient to adversaries at this scale, and suggesting that the brittleness previously attributed to linear topology may stem from a lack of correction.

2606.12703 2026-06-12 cs.CR cs.AI cs.LG New submission

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

Tarun Sharma

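AI summary: Proposes the SMSR defence framework, which combines HMAC signing at write time with randomised memory ablation and verdict-based majority voting at query time, providing the first certified robustness guarantee against multi-session memory poisoning.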

Abstract

Retrieval-augmented generation (RAG) agents increasingly run with persistent memory that accumulates across user sessions. This creates a new attack surface: an adversary interacting only through normal channels can inject crafted memories that, once retrieved, steer the agent's responses for future users, without touching model weights or code. We call this Multi-Session Memory Poisoning (MSMP) and show that no existing defence certifies against it; static-corpus defences (RobustRAG, ReliabilityRAG) assume a fixed knowledge base, and heuristic filters are bypassed by fluent enterprise-style text. We present Signed Memory with Smoothed Retrieval (SMSR), the first defence with a certified robustness bound for this setting. Component 1 adds HMAC-SHA256 provenance at write time, blocking unsigned injection. Component 2 applies randomised memory ablation with verdict-based majority voting at query time, bounding the influence of authenticated adversaries. We prove that no provenance-free retrieval-time filter can certify against adaptive injection, derive a hypergeometric certificate for Component 2, and formalise the Consistent Minority Effect, whereby a consistent adversarial answer wins string-based voting as a numerical minority while verdict-based voting removes it. Across 15 enterprise scenarios (3,150 repeated trials), Component 1 cuts attack success from 93-100% to 0% for all unsigned variants. For an authenticated adversary with a single injection, Component 2 holds success to 8.0% (95% CI [5.8, 10.9], n=450), below the certified worst case. In an end-to-end query-only attack where the agent itself writes the poison rather than it being pre-seeded, SMSR reduces success from 65.3% to 5.3% (n=150, non-overlapping CIs) on a live agent stack. Clean-query utility is 90% (Component 1) and 85% (combined).
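
The two SMSR components described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the key handling, the record layout, the function names (`sign_memory`, `verify_memory`, `ablated_majority_vote`), and the `answer_fn` callback are all hypothetical. Only the use of HMAC-SHA256 at write time and of randomised ablation with majority voting at query time comes from the abstract.

```python
import hashlib
import hmac
import random
from collections import Counter

# Hypothetical demo key; a real deployment would load a managed secret instead.
SECRET_KEY = b"demo-key-not-for-production"


def sign_memory(text, key=SECRET_KEY):
    """Attach an HMAC-SHA256 tag to a memory entry at write time."""
    tag = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"text": text, "tag": tag}


def verify_memory(record, key=SECRET_KEY):
    """Accept a memory only if its tag matches (constant-time comparison)."""
    expected = hmac.new(key, record["text"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))


def ablated_majority_vote(memories, answer_fn, keep=3, rounds=25, seed=0):
    """Answer from several random subsets of the memories and return the majority verdict.

    `answer_fn` maps a list of memories to a verdict label; it stands in for
    the agent plus whatever verdict extraction the real system uses.
    """
    rng = random.Random(seed)
    verdicts = Counter()
    for _ in range(rounds):
        subset = rng.sample(memories, min(keep, len(memories)))
        verdicts[answer_fn(subset)] += 1
    return verdicts.most_common(1)[0][0]
```

Because each round sees only a small random subset, a single poisoned entry reaches only a fraction of the votes, which is the intuition behind bounding an authenticated adversary's influence; the paper's hypergeometric certificate is not reproduced here.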

2606.12667 2026-06-12 cs.NI cs.AI cs.SY eess.SY New submission

Free-Placement Optimization of Ground Station Locations for Low-Earth Orbit Satellites

Grace Ra Kim, Duncan Eddy, Vedant Srinivas, Mykel J. Kochenderfer

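AI summary: Proposes SCORE, a two-stage free-placement method for optimizing ground station locations. It needs up to 5x fewer function evaluations than differential evolution while improving downlink throughput by up to 13%, and achieves up to 15% greater total downlink than fixed-site methods.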

Comments
34 pages, 13 figures, 11 tables, Journal of Aerospace Information Systems (JAIS)

Abstract

Rapidly expanding low Earth orbit satellite constellations are placing increasing demands on terrestrial ground networks, motivating the development of more efficient ground station network designs. Current approaches select sites from predefined locations, limiting optimization to existing infrastructure and constraining performance. In contrast, free-placement optimization operates over a continuous spatial domain on Earth, broadening the search space and allowing higher-throughput configurations at the cost of potentially requiring new infrastructure deployment. In this work, we introduce SCORE (Sequential Cyclic Optimization via Refinement & Evaluation), a two-stage free-placement method for ground station design. SCORE combines sequential coordinate selection with cyclic refinement to manage high-dimensionality, non-convexity, and local minima that challenge global optimizers. We benchmark SCORE against one-shot methods such as differential evolution (DE) and integer programming approaches using locations from Kongsberg Satellite Services and the World Teleport Association. Tests across two commercial Earth observation constellations (Capella Space and ICEYE) and one synthetic Walker-Star constellation show that SCORE requires up to 5x fewer function evaluations to converge relative to DE while improving downlink throughput by up to 13%. Compared to fixed-site methods, unconstrained SCORE achieves up to 15% greater total downlink, establishing a strong empirical performance benchmark for flexible placement; infrastructure-constrained SCORE retains over 92% of this gain while restricting placement to within proximity of existing fiber and power infrastructure. We also explore trade-offs between expanding existing stations and deploying new sites, informing future ground network design for operational constellations.
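
The general pattern of sequential selection followed by cyclic refinement can be illustrated with a toy problem. The abstract does not give SCORE's details, so everything below is an assumption made for illustration: the planar geometry, the coverage-count objective (standing in for downlink throughput), the discrete candidate grid (standing in for a continuous search over Earth's surface), and the names `coverage` and `sequential_then_cyclic`.

```python
import itertools
import math


def coverage(stations, targets, radius=1.5):
    """Toy objective: number of targets within `radius` of at least one station."""
    return sum(any(math.dist(s, t) <= radius for s in stations) for t in targets)


def sequential_then_cyclic(targets, candidates, k, max_cycles=10):
    # Stage 1: add stations one at a time, each time keeping the best candidate.
    stations = []
    for _ in range(k):
        best = max(candidates, key=lambda c: coverage(stations + [c], targets))
        stations.append(best)
    # Stage 2: revisit each station in turn and move it if that strictly improves
    # the objective; stop once a full cycle makes no change.
    for _ in range(max_cycles):
        improved = False
        for i in range(k):
            others = stations[:i] + stations[i + 1:]
            best = max(candidates, key=lambda c: coverage(others + [c], targets))
            if coverage(others + [best], targets) > coverage(stations, targets):
                stations[i] = best
                improved = True
        if not improved:
            break
    return stations


# Two well-separated clusters of targets and an integer grid of candidate sites.
targets = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
candidates = list(itertools.product(range(7), repeat=2))
stations = sequential_then_cyclic(targets, candidates, k=2)
```

Optimizing one station at a time keeps each subproblem low-dimensional, which is one plausible reading of how such a scheme sidesteps the difficulties the abstract attributes to one-shot global optimizers.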

2606.12666 2026-06-12 cs.CR cs.AI New submission

CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents

Siyu Shen, Fenghao Xu, Wenrui Diao, Kehuan Zhang

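AI summary: To address incidental visual privacy exposure caused by screenshot uploads from mobile GUI agents, proposes CAPED, a context-aware pre-upload exposure control layer that extracts task requirements, uses screen context as a privacy prior, and parses UI elements to expose only the content a task needs, substantially reducing privacy leakage while preserving high task utility.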

Abstract

Screenshot-based mobile GUI agents can operate ordinary smartphone apps through the same visual interface as a human user, but this capability also turns every screen observation into a privacy boundary. During normal task execution, screenshots may expose contacts, messages, photos, files, recommendations, health cues, and other sensitive context that is unrelated to the user's request. We call this problem incidental visual privacy exposure. It is difficult to address with existing defenses: text anonymization misses many visual and inferential cues, while generic privacy masking can remove the evidence and controls that a GUI agent needs to complete the task. This paper presents CAPED, a context-aware pre-upload exposure control layer for mobile GUI agents. CAPED is designed as a phone-side protection layer: before screenshots are released to a remote multimodal agent, it extracts task requirements, uses screen context as a privacy prior, parses visible UI elements, and selectively exposes only content needed for the current task while masking incidental private content. We evaluate CAPED on AndroidWorld for broad task utility and with a controlled 28-task seeded privacy evaluation used as a measurement instrument for trajectory-level incidental leakage. In this seeded evaluation, Full CAPED reduces success-conditioned weighted seeded leakage from 0.766 under raw screenshots to 0.268 while preserving high task utility. A broader AndroidWorld run shows a remaining prototype-level utility cost, but the results support the central claim that screenshot upload should be treated as an explicit device--cloud boundary decision, governed by task-driven selective exposure rather than all-or-nothing screen sharing.

2606.12655 2026-06-12 cs.CR cs.CV New submission

Amnesia: A Stealthy Replay Attack on Continual Learning Dreams

Ahmed Sharshar, Naveen Kumar Kummari, Mohsen Guizani

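AI summary: Proposes Amnesia, an attack that controls only replay index selection and maximizes the degradation of continual learning models under audit constraints, exposing index-level replay control as a threat.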

Abstract

Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here, auditability means a monitor can verify compliance from sampler-visible telemetry - e.g., logged replay index/label statistics - by checking that the realized replay class histogram stays close to a nominal baseline and that replay rate is unchanged per batch and/or over a rolling window. We study a limited-privilege insider who controls only replay index selection, not pixels, labels, or model parameters, while staying within auditable limits such as queue priorities. We introduce Amnesia, a replay composition attack that maximizes degradation under two budgets: a visibility budget delta bounding the TV/KL divergence from a nominal class histogram p0, and a mass budget f fixing the replay rate. Amnesia has two steps: (i) compute lightweight class utilities, such as EMA loss or confidence, to tilt p0 toward harmful classes; and (ii) project the tilt back into the delta-ball using efficient KL (exponential tilt) or TV (balanced mass redistribution) optimizers. A windowed scheduler enforces rolling audits. Across challenging CL benchmarks and strong replay baselines, Amnesia consistently lowers final accuracy (ACC) and worsens backward transfer (-BWT). The KL variant delivers high impact while remaining largely undetected under multiple audit schemes, including per-batch and rolling-window checks. The TV variant is more damaging but easier to detect, especially under tight per-class constraints. These results expose index-only replay control as a practical, auditable threat surface in CL systems and establish a principled impact-visibility trade-off.
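
The KL variant's "tilt, then project back into the delta-ball" step can be sketched with a standard exponential tilt. This is a generic construction assumed here for illustration, not necessarily the authors' optimizer: the tilted histogram is q(c) proportional to p0(c) * exp(lam * u(c)), and because KL(q || p0) grows monotonically with lam >= 0, bisection finds the strongest tilt that still satisfies the budget delta. The function names and the example utilities are hypothetical, and p0 is assumed to have no zero entries.

```python
import math


def tilt(p0, utilities, lam):
    """Exponentially tilt p0 toward high-utility classes."""
    m = max(utilities)  # subtract the max for numerical stability
    w = [p * math.exp(lam * (u - m)) for p, u in zip(p0, utilities)]
    z = sum(w)
    return [x / z for x in w]


def kl(q, p):
    """KL(q || p) in nats, skipping zero-mass terms of q."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def project_kl(p0, utilities, delta, lam_max=50.0, iters=60):
    """Bisect on lam to find the strongest tilt with KL(q || p0) <= delta."""
    if kl(tilt(p0, utilities, lam_max), p0) <= delta:
        return tilt(p0, utilities, lam_max)
    lo, hi = 0.0, lam_max  # invariant: the tilt at `lo` satisfies the budget
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(tilt(p0, utilities, mid), p0) <= delta:
            lo = mid
        else:
            hi = mid
    return tilt(p0, utilities, lo)


p0 = [0.25, 0.25, 0.25, 0.25]      # nominal replay class histogram
utilities = [0.1, 0.9, 0.3, 0.5]   # hypothetical per-class "harm" scores
q = project_kl(p0, utilities, delta=0.05)
```

The resulting histogram shifts replay mass toward the highest-utility class while staying within the stated divergence budget, which is the trade-off between impact and visibility the abstract describes.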

2606.12647 2026-06-12 cs.CC cs.AI cs.LG New submission

Token Complexity Theory for AI-Augmented Computing

Jie Wang

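AI summary: Proposes token complexity as a formal measure of query and response cost in AI-augmented computing, develops it within the AI-Oracle Turing machine framework, and proves basic theorems on monotonicity, convexity, price sensitivity, and price-relativity of task ordering.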

Comments
25 pages, 1 figure

Abstract

AI-augmented computing delegates natural language queries, code generation requests, and other open-ended tasks to a cluster of AI models that processes queries and generates responses. This paradigm introduces a resource dimension that neither classical time nor space complexity captures: the cost of sending queries to and receiving responses from such a cluster. We introduce token complexity, a formal resource measure defined as the minimum expected token cost to achieve a specified level of output quality on a task, and develop a taxonomy classifying AI systems by the strength of their probabilistic properties. We develop token complexity within the framework of AI-Oracle Turing machines, in which a probabilistic Turing machine interacts with a stochastic oracle via dedicated query and response tapes. We prove basic theorems establishing that token complexity behaves as expected: monotonicity (higher quality costs more tokens), convexity (quality improvements become progressively more expensive), price sensitivity (small price changes produce bounded cost changes), and price-relativity of task ordering (the token complexity ordering of tasks can reverse depending on the query-to-response cost ratio). We prove that the complexity frontier, defined as the set of all feasible resource bounds in tokens, time, and space, is non-empty, upward-closed, and convex.
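
A small numeric example can make price-relativity of task ordering concrete. The sketch below assumes a simple linear pricing model (cost = query price x query tokens + response price x response tokens) and takes the minimum over a handful of candidate strategies per task; the model, the token counts, and the names `token_cost` and `min_cost` are illustrative assumptions, not the paper's formal definitions.

```python
def token_cost(query_tokens, response_tokens, price_query, price_response):
    """Priced token cost of one strategy under a linear pricing model."""
    return price_query * query_tokens + price_response * response_tokens


def min_cost(strategies, price_query, price_response):
    """Cheapest strategy for a task; each strategy is (query_tokens, response_tokens)."""
    return min(token_cost(q, r, price_query, price_response) for q, r in strategies)


# Hypothetical tasks: A is query-heavy, B is response-heavy.
task_a = [(1000, 100), (1200, 80)]
task_b = [(200, 500), (300, 450)]

for price_query, price_response in [(1, 1), (1, 5)]:
    a = min_cost(task_a, price_query, price_response)
    b = min_cost(task_b, price_query, price_response)
    harder = "A" if a > b else "B"
    print(f"prices ({price_query}, {price_response}): A={a}, B={b}, costlier task: {harder}")
```

With equal prices task A costs more (1100 vs. 700), but when response tokens cost five times as much, task B becomes the costlier one (2550 vs. 1500), so the ordering of the two tasks reverses with the price ratio.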

2606.12620 2026-06-12 cs.SE cs.AI New submission

HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection

Luke Patterson, Li Wang, Adam Faulkner

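AI summary: Because existing benchmarks do not reflect how AI code assistants are actually used, introduces the HybridCodeAuthorship dataset of interleaved human- and AI-authored lines of code and evaluates two detection algorithms on it.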

Journal ref
LREC 2026 proceedings (pp. 1520-1532)
Comments
Accepted to LREC 2026

Abstract

Thanks to the rapid adoption of AI code assistants powered by large language models (LLMs), industry codebases are, increasingly, a hybrid of AI- and human-authored code. For risk management and productivity analysis purposes, it is crucial to enable fine-grained location detection of AI-generated code. To develop algorithms for this task, quality benchmarks are needed to assess performance. However, existing benchmarks tend to comprise academic, LeetCode-style problems and presume a code snippet is either completely human-authored or completely AI-authored, which is not reflective of the diverse intents and styles of industry codebases utilizing AI code assistants. To fill these gaps, we introduce HybridCodeAuthorship, a novel benchmark of Python code files with interleaved human- and AI-authored lines of code to simulate authentic utilization of AI code assistants. In this paper, we first present our dataset construction pipeline, which leverages CodeSearchNet, a massive collection of links to open-source repositories on GitHub. We then benchmark the performance of two state-of-the-art AI-generated code detection algorithms at both the line- and chunk-level. Experimental results demonstrate that HybridCodeAuthorship is a challenging benchmark, with the top-scoring algorithm, AIGCode Detector, obtaining best F1 scores of 0.48 and 0.56 on chunk-level and line-level code detection tasks, respectively.
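
For readers unfamiliar with line-level scoring, here is a minimal sketch of how an F1 score over per-line authorship labels could be computed. The abstract does not specify the exact metric definition (for example, which class counts as positive or how scores are averaged across files), so the label names, the choice of "ai" as the positive class, and the function `line_level_f1` are assumptions.

```python
def line_level_f1(gold, pred, positive="ai"):
    """Precision, recall, and F1 for one file, treating `positive` lines as the target class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same number of lines")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for a six-line file with interleaved authorship.
gold = ["human", "ai", "ai", "human", "ai", "human"]
pred = ["human", "ai", "human", "ai", "ai", "human"]
precision, recall, f1 = line_level_f1(gold, pred)
```

In this example the detector finds two of the three AI-authored lines and raises one false alarm, so precision, recall, and F1 all equal 2/3.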

2606.12581 2026-06-12 cs.SI cs.AI 新提交

Graph Reduction in Multirelational Networks: A Spreading-Oriented Reduction Benchmark

多关系网络中的图缩减:面向传播的缩减基准

Mateusz Stolarski, Michał Czuba, Piotr Bielak, Piotr Bródka

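AI summary: Proposes the SORB benchmark framework to systematically evaluate how graph reduction affects influence maximization, and finds that the effect of reduction depends on network type and evaluation metric.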

详情
AI中文摘要

现实世界网络天生不完整、有噪声且动态演化,难以捕获所有参与者及其关系。其规模常使直接分析计算量大。虽然影响力最大化(IM)已被广泛研究,但图缩减作为预处理步骤及其对IM准确性的影响仍未被充分探索。本文引入面向传播的缩减基准(SORB),一个开源、标准化的框架,用于系统评估不同任务设置下的IM模型。SORB提供可扩展的流水线,操作于代表性真实世界网络集合(包括单层和多层结构),并将图缩减直接纳入评估过程。此设计将焦点从孤立分析IM算法转向量化图缩减如何改变预测性能。利用SORB,我们研究了多种IM场景下稀疏化和粗化的效果。结果表明,缩减的影响强烈依赖于网络类型(单层 vs. 多关系)和下游任务($Gain@k$ vs. $\mathrm{AUC}_{\mathrm{cutoff}}$):稀疏化在单层网络上保持种子集质量,而扁平化多层网络无论缩减策略如何均表现出系统性排名退化。这些发现强调了在研究复杂网络传播过程时,进行缩减感知的多任务评估的重要性。

英文摘要

Real-world networks are inherently incomplete, noisy, and dynamically evolving, making it difficult to capture all actors and their relationships. Their scale often renders direct analysis computationally demanding. While influence maximisation (IM) has been widely studied, the role of graph reduction as a preprocessing step, and its impact on IM accuracy, remains underexplored. In this work, we introduce the Spreading-Oriented Reduction Benchmark (SORB), an open-source, standardised framework for systematically evaluating IM models across diverse task settings. SORB provides an extensible pipeline operating on a representative collection of real-world networks, including single- and multilayer structures, and incorporates graph reduction directly into the evaluation process. This design shifts the focus from analysing IM algorithms in isolation to quantifying how graph reduction alters predictive performance. Using SORB, we study the effects of sparsification and coarsening across multiple IM scenarios. Our results show that the impact of reduction is strongly dependent on both the network type (single-layer vs. multirelational) and the downstream task ($Gain@k$ vs. $\mathrm{AUC}_{\mathrm{cutoff}}$): sparsification preserves seed set quality on single-layer networks, whereas flattened multilayer networks exhibit systematic ranking degradation regardless of reduction strategy. These findings highlight the importance of reduction-aware, multi-task evaluation when studying spreading processes in complex networks.

2606.12498 2026-06-12 cs.CR cs.LG 新提交

From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

从参数到特征空间:模型合并中后门缓解的任务算术

Zhenqian Zhu, Yamin Hu, Yiya Diao, Weixiang Li, Haodong Li, Wenjian Luo

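AI summary: Proposes the Linear Feature Path Minimization (LFPM) framework, which uses cross-task linearity to optimize an anti-backdoor task vector in feature space, suppressing backdoors in model merging while preserving clean-task performance.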

详情
AI中文摘要

模型合并(MM)作为一种将多个任务特定模型整合为统一模型的成本效益方法,已获得显著关注。然而,近期工作揭示MM极易受到后门攻击。现有基于任务算术的防御通常因依赖直接参数空间编辑,在未显著降低干净任务性能的情况下难以消除后门。为解决这一差距,我们提出线性特征路径最小化(LFPM),一种用于模型合并的后门缓解框架,该框架将反后门任务向量引入被后门污染的合并模型。与先前方法不同,LFPM在跨任务线性性(CTL)框架下从统一的特征空间视角制定合并模型的后门鲁棒性,该框架利用跨任务特征的近似线性性。这一视角指导反后门任务的优化,以在抑制后门的同时保持干净任务性能。此外,我们引入一种基于梯度累积和损失路径积分的有效优化机制,确保沿插值路径的鲁棒后门抑制。大量实验表明,LFPM在完全微调和参数高效微调(PEFT)设置中均对后门攻击表现出强鲁棒性。

英文摘要

Model merging (MM) has gained significant attention as a cost-effective approach to integrate multiple task-specific models into a unified model. However, recent work reveals that MM is highly susceptible to backdoor attacks. Existing defenses based on task arithmetic often fail to eliminate backdoors without substantially degrading clean-task performance, owing to their reliance on direct parameter-space editing. To address this gap, we propose Linear Feature Path Minimization (LFPM), a backdoor mitigation framework for model merging, which introduces an anti-backdoor task vector into the backdoored merged model. Unlike prior approaches, LFPM formulates the backdoor robustness of the merged model from a unified feature-space perspective under the Cross-Task Linearity (CTL) framework, which leverages the approximate linearity of features across tasks. This perspective guides the optimization of the anti-backdoor task vector to suppress backdoors while preserving clean-task performance. Furthermore, we introduce an effective optimization mechanism based on gradient accumulation and a loss path integral, ensuring robust backdoor suppression along the interpolation path. Extensive experiments demonstrate that LFPM consistently exhibits strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.
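For readers unfamiliar with task arithmetic, the following sketch shows the general parameter-space operations the abstract builds on: a task vector is the difference between fine-tuned and pretrained weights, a merged model adds scaled task vectors to the pretrained weights, and a defense adds one more (anti-backdoor) vector. How LFPM actually optimizes that vector in feature space is not shown; the weights, the scaling coefficient, and the hand-picked anti-backdoor vector below are all hypothetical placeholders.

```python
def task_vector(finetuned, pretrained):
    """Task vector: fine-tuned weights minus pretrained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def add_vectors(base, vectors, scale):
    """Return base + scale * sum(vectors) without modifying the inputs."""
    merged = {name: list(values) for name, values in base.items()}
    for vec in vectors:
        for name in merged:
            merged[name] = [m + scale * v for m, v in zip(merged[name], vec[name])]
    return merged


# Toy weights (hypothetical values).
pretrained = {"w": [1.0, 2.0]}
finetuned_a = {"w": [2.0, 2.0]}
finetuned_b = {"w": [1.0, 4.0]}

vectors = [task_vector(finetuned_a, pretrained), task_vector(finetuned_b, pretrained)]
merged = add_vectors(pretrained, vectors, scale=0.5)

# ASSUMPTION: a hand-picked stand-in for an optimized anti-backdoor task vector.
anti_backdoor = {"w": [-0.5, 0.0]}
defended = add_vectors(merged, [anti_backdoor], scale=1.0)
print(merged, defended)
```

Because the defense is just another additive vector, it composes with the merge without retraining the merged model from scratch.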

2606.12474 2026-06-12 cs.MA cs.AI cs.CR 新提交

SAIGuard: Communication-State Simulation for Proactive Defense of LLM Multi-Agent Systems

SAIGuard: 面向LLM多智能体系统主动防御的通信状态模拟

Ruxue Shi, Yili Wang, Mengnan Du, Qinggang Zhang, Rui Miao, Yixin Liu, Xin Wang

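AI summary: Proposes SAIGuard, a proactive defense framework that uses communication-state simulation to detect and sanitize risky messages, reducing attack success rates while preserving system utility.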

详情
AI中文摘要

基于LLM的多智能体系统(MAS)通过智能体间协作解决复杂任务,但其通信驱动的特性也使安全风险能够在智能体间传播并引发系统级故障。现有的MAS防御主要遵循执行后的反应式范式,通过检测和隔离有害智能体,但这可能导致不可逆的损害并降低协作效用。为解决此问题,我们提出一种面向MAS安全的主动防御框架,即模拟感知拦截守卫(SAIGuard)。SAIGuard在MAS交互图上执行通信状态模拟,估计传入消息对局部智能体状态和全局MAS状态的影响,并通过与良性通信模式的重建偏差检测风险消息。SAIGuard不隔离智能体,而是在可疑消息传播到系统之前对其进行净化或重新生成。跨多种拓扑和攻击场景的实验表明,SAIGuard在保持MAS效用的同时降低了攻击成功率,优于反应式防御。

英文摘要

LLM-based multi-agent systems (MAS) solve complex tasks through inter-agent collaboration, but their communication-driven nature also allows security risks to spread across agents and trigger system-wide failures. Existing MAS defenses mainly follow a reactive, post-execution paradigm of detecting and isolating harmful agents, which may cause irreversible damage and degrade collaborative utility. To address this, we propose a proactive defense framework for MAS security, the Simulation-aware Interception Guard (SAIGuard). SAIGuard performs communication-state simulation over the MAS interaction graph, estimates the impact of incoming messages on local agent states and the global MAS state, and detects risky messages via reconstruction deviations from benign communication patterns. Instead of isolating agents, SAIGuard sanitizes or regenerates suspicious messages before they propagate into the system. Experiments across diverse topologies and attack scenarios show that SAIGuard reduces attack success rates while maintaining MAS utility, outperforming reactive defenses.

2606.12441 2026-06-12 cs.CY cs.AI cs.HC 新提交

Generativism: Toward a Learning Theory for the Age of Generative Artificial Intelligence

生成主义:面向生成式人工智能时代的学习理论

Shan Li, Juan Zheng

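AI summary: Critically examines the limitations of four major learning theories (behaviorism, cognitivism, constructivism, and connectivism) in the age of generative AI, and proposes Generativism, a new learning theory that emphasizes human-AI co-construction of knowledge.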

详情
AI中文摘要

行为主义、认知主义、建构主义和连接主义这四种主流学习理论,随着生成式人工智能在教育环境中的普及,显示出显著的概念局限性。这些框架是在能够生成、综合和推理知识的AI系统出现之前形成的。本文批判性地审视每种学习理论,并识别出生成式AI的赋能所挑战的假设。基于分布式认知、延展心智、人机协作、AI素养、认知卸载和元认知等研究,本文提出生成主义作为生成式AI时代的学习理论。生成主义认为,学习日益通过人类学习者与AI系统之间的迭代知识共建而发生。该框架围绕四个原则组织:认知伙伴关系、分布式能动性、生成素养和适应性元认知。该框架为在生成式AI在认知中发挥核心作用的情境下重新思考教学设计、学习、评估和专业知识发展提供了基础。

英文摘要

The four dominant learning theories of behaviorism, cognitivism, constructivism, and connectivism show significant conceptual limitations as generative artificial intelligence (AI) proliferates in educational settings. These frameworks were formulated before the emergence of AI systems capable of generating, synthesizing, and reasoning about knowledge. This article critically examines each learning theory and identifies assumptions challenged by generative AI's affordances. Drawing on research in distributed cognition, extended mind, human-AI collaboration, AI literacy, cognitive offloading, and metacognition, the article proposes Generativism as a learning theory for the generative AI age. Generativism posits that learning increasingly occurs through the iterative co-construction of knowledge between human learners and AI systems. The proposed framework is organized around four principles: epistemic partnership, distributed agency, generative literacy, and adaptive metacognition. The framework offers a foundation for rethinking instructional design, learning, assessment, and expertise development in contexts where generative AI plays an integral role in cognition.

2606.12437 2026-06-12 cs.CY cs.AI 新提交

Algorithmic Constitutionalism

算法宪政主义

Oren Perez, Nurit Wimer

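AI summary: In response to the risks of AI's growing encroachment on social life, proposes an "algorithmic constitutionalism" framework built on a layered architecture, algorithmic meta-reasoning, and correction through deliberation; applies it to Facebook's content moderation and analyzes its tension with societal constitutionalism and its implications for the European Digital Services Act.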

详情
Journal ref
Ind. J. Global Legal Stud. 30 (2023): 81
AI中文摘要

人工智能对社会生活的日益侵入给社会带来了重大风险,特别是在由谷歌、Facebook、苹果和亚马逊等公司创建和控制的资讯圈内。本文通过对Facebook内容审核制度的深入分析来审视这些风险,该制度已部分由算法管理。我们认为,文献中常作为AI治理挑战解决方案提出的伦理工程概念,因若干原因并不充分。为此,我们开发了一个替代框架,称为“算法宪政主义”。我们的方法基于三个支柱:(a)由两层代码组成的分层架构:(i)操作层或对象层,以及(ii)旨在保护系统核心原则免受算法引发变更的元层;(b)算法元推理,使系统能够同时在两个层面运行,从而实时监控、验证并可能纠正对象层偏离元代码层保护原则的操作;(c)通过协商进行纠正。本文阐述了算法宪政主义的概念,并展示了如何将其应用于Facebook的内容审核制度。作为分析的一部分,我们考察了社会宪政主义与算法宪政主义之间的张力。矛盾的是,试图将AI系统置于外部协商控制之下,也可能使AI代理干预该过程,从而可能破坏其目的。文章最后考虑了这一论点对2022年10月生效的欧盟数字服务法案的影响。

英文摘要

The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

2606.12429 2026-06-12 cs.CY cs.AI 新提交

Muse Spark Safety & Preparedness Report

Muse Spark 安全与准备报告

Cristina Menghini, Peter Ney, Hamza Kwisaba, Zifan Wang, Miles Turpin, Felix Binder, Jean-Christophe Testud, Aidan Boyd, Nathaniel Li, Ivan Evtimov, Klaudia Krawiecka, Arman Zharmagambetov, Jeremy Kritz, Alexander R. Fabbri, Daniel Song, Jinpeng Miao, Joonas Hjelt, Meghna Ramani, Leona Lan, Reza Aghajani, Joanna Bitton, Mahesh Pasupuleti, Devin Norder, Khalid El-Arini, Paridhi Singh, Vítor Albiero, Sahana CB, Rashnil Chaturvedi, Elahe Dabir, Edoardo Debenedetti, Jim Gust, Ziwen Han, Kat He, Sean Hendryx, Lifeng Jin, Polina Kirichenko, Sandra Lefdal, Kenneth Li, Asad Liaqat, Inna Lin, Despoina Magka, Neal Mangaokar, Ishita Mediratta, Zach Miller, Smitha Milli, Niloofar Mireshghallah, Saba Nazir, Hung Nguyen, Maximilian Nickel, Kelvin Niu, Kerem Oktar, Bhargavi Paranjape, Parth Pathak, Maya Pavlova, Emmanuel Ramirez, David Renardy, Candace Ross, Yasha Sheynin, Claudia Shi, Shivam Singhal, Evangelia Spiliopoulou, Rakshith Sharma Srinivasa, Jamelle Watson-Daniels, Spencer Whitman, Adina Williams, Chen Xing, Andy Zou, Tommy Ma, Siqi Deng, James Beldock, Prashant Ratanchandani, Kate Plawiak, Taesung Lee, Ryan Victory, Lindsay Hundley, Rachad Alao, Himaghna Bhattacharjee, Jianfeng Chi, Gary Frost, Pegah Ghahremani, Niki Howe, Yuheng Huang, Saeed Jahed, Hannah Korevaar, Trang Le, Zhe Liu, Jinghong Luo, Qin Lyu, Nina Mehrabi, Abraham Montilla, Chirag Nagpal, Cyrus Nikolaidis, Rajvardhan Oak, Manoj Ravi, Vidya Sarma, Aman Shankar, Alana Shine, Eric Michael Smith, Mariana Tandon, Michael Tontchev, Caoyu Wang, Zihan Wang, Corinne Wong, Zheng Wu, Hongyuan Zhan, Justin Zhao, Zexuan Zhong, Chengxu Zhuang, Tristan Goodman, Ayaz Minhas, Harrison Rudolph, Victoria Jeffries, Ingrid Dickinson, Alex Vaughan, Lauren Deason, Kamalika Chaudhuri, Julian Michael, Shengjia Zhao, Summer Yue

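AI summary: Meta presents the Muse Spark large language model, evaluates its safety in catastrophic risk domains (chemical and biological, cybersecurity, and loss of control), reduces the risks to acceptable levels through multi-layered mitigations, and releases it as the underlying model of Meta AI.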

详情
Comments
159 pages, 57 figures
AI中文摘要

Muse Spark 是 Meta 开发的最新大型语言模型。在本报告中,我们首先根据 Meta 的高级 AI 扩展框架对灾难性风险领域进行评估,并提供了支持我们发布决策的证据。然后,我们讨论了其他考虑因素,例如 Muse Spark 更广泛的内容安全性和行为特征,这些因素与整体安全相关,但不在框架管辖的灾难性风险领域之内。我们的准备结果涵盖了化学与生物、网络安全以及失控风险,评估了 Muse Spark 在 Meta AI 中的部署,认为其在我们高级 AI 扩展框架下呈现了可接受的残余风险水平。我们针对这些灾难性风险领域中的双重用途和高风险能力进行了一系列广泛的评估。这些评估在缓解措施实施前识别出了升高的风险,其中化学与生物能力在应用安全措施前被评估为可能达到高级 AI 扩展框架下的“高风险”类别。我们实施了一套多层缓解措施来解决已识别的风险,并且 Muse Spark 在与化学和生物学危险工作流程相关的多个基准测试中展示了最先进的拒绝能力。因此,我们发布 Muse Spark 作为 Meta AI 的基础模型。

英文摘要

Muse Spark is the latest large language model developed by Meta. In this report, we first present evaluations for catastrophic risk domains under Meta's Advanced AI Scaling Framework, along with the evidence that informed our launch decision. We then discuss additional considerations, such as Muse Spark's broader content safety and behavioral profile, that are relevant to overall safety but fall outside the catastrophic risk domains governed by the Framework. Our preparedness results covering Chemical and Biological, Cybersecurity, and Loss of Control risks assess Muse Spark's deployment within Meta AI as presenting acceptable levels of residual risks under our Advanced AI Scaling Framework. We conducted a broad set of evaluations targeting dual-use and high-risk capabilities across these catastrophic risk domains. Those evaluations identified elevated risks prior to mitigations, with Chemical and Biological capabilities assessed as likely reaching the "high risk" category under the Advanced AI Scaling Framework before safeguards were applied. We have implemented a multi-layered set of mitigations that address the identified risks, and Muse Spark demonstrates state-of-the-art refusal across a range of benchmarks related to hazardous workflows in chemistry and biology. We therefore release Muse Spark as the underlying model of Meta AI.

2606.12424 2026-06-12 cs.CY cs.AI cs.HC 新提交

AI-Automation Tooling in Computer Engineering Education: Mixed-Methods TAM/UTAUT Evidence for a General Acceptance Attitude

计算机工程教育中的AI自动化工具:基于TAM/UTAUT混合方法的一般接受态度证据

Aung Pyae

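AI summary: A mixed-methods study of undergraduates' acceptance of AI automation tooling (the n8n platform) finds that six TAM/UTAUT constructs collapse into a single general acceptance factor, with Performance Expectancy the strongest and Hedonic Motivation the weakest, providing a theoretical basis for curricular integration.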

详情
AI中文摘要

随着生成式AI和低代码工作流平台成为软件实践中的常规工具,一个关键的教育问题是下一代计算机工程师是否会将这些工具视为有用、可用且值得持续参与。本文报告了一项混合方法、横截面研究,涉及泰国三个相同脚本工作坊中本科生对AI自动化工具(通过开源平台n8n实例化)的接受度(n=103)。一个12项、五点李克特量表映射到六个TAM/UTAUT构念——绩效期望(PE)、努力期望(EE)、行为意向(BI)、自我效能(SE)、享乐动机(HM)和输出质量(OQ),并通过开放式反馈的归纳主题分析进行补充。分析结合了序数可靠性估计、自助置信区间、非参数检验、多重比较控制的相关性、多维度诊断、共同方法偏差检验以及跨会话比较。所有六个构念的接受度均良好,效应量大,其中PE最强,HM最弱。维度诊断进一步揭示,在这种简短的工作坊后情境中,经典的TAM/UTAUT子维度合并为一个单一的一般接受因子,这一发现具有重要的方法论和理论意义。定性主题在有用性和热情方面与定量概况一致,但在输出质量上存在分歧,揭示了一个虽小但表达清晰的可靠性怀疑少数群体。研究结果支持在本科计算教育中课程采用AI自动化工具,并确定了三个基于理论的教学杠杆:教学顺序支架、自我效能支持和信任校准干预。

英文摘要

As generative AI and low-code workflow platforms become routine in software practice, a key educational question is whether the next generation of computer engineers will accept these tools as useful, usable, and worthy of sustained engagement. This paper reports a mixed-methods, cross-sectional study of undergraduate computer engineering students' acceptance of AI automation tooling, instantiated through the open-source platform n8n across three identically scripted workshops in Thailand (n = 103). A 12-item, five-point Likert instrument mapped to six TAM/UTAUT constructs - Performance Expectancy (PE), Effort Expectancy (EE), Behavioral Intention (BI), Self-Efficacy (SE), Hedonic Motivation (HM), and Output Quality (OQ) - was complemented by inductive thematic analysis of open-ended feedback. Analyses combined ordinal reliability estimation, bootstrap confidence intervals, non-parametric tests, multiple-comparison-controlled correlations, polychoric dimensionality diagnostics, a common-method-bias check, and between-session comparisons. Acceptance was favorable across all six constructs with large effect sizes, with PE emerging as the strongest construct and HM as the weakest. Dimensionality diagnostics further revealed that canonical TAM/UTAUT sub-facets collapsed into a single general acceptance factor in this short-form post-workshop context, a finding with important methodological and theoretical implications. Qualitative themes converged with the quantitative profile regarding usefulness and enthusiasm but diverged on output quality, revealing a small yet articulate reliability-skeptical minority. The findings support the curricular adoption of AI automation tooling in undergraduate computing education and identify three theory-grounded instructional levers: instruction-sequencing scaffolds, self-efficacy supports, and trust-calibration interventions.
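One of the listed analyses, the bootstrap confidence interval, can be sketched in a few lines. This is a generic percentile bootstrap for the mean of a Likert item, not the paper's actual procedure; the responses, the number of resamples, and the seed are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical five-point Likert responses for a single item.
responses = [5, 4, 4, 5, 3, 4, 5, 5, 4, 2, 4, 5, 3, 4, 4, 5, 5, 4, 3, 4]
low, high = bootstrap_ci(responses)
print(f"mean = {mean(responses):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

A resampling interval of this kind makes no normality assumption, which suits bounded, ordinal-looking Likert data.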

2606.12423 2026-06-12 cs.CY cs.AI 新提交

The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review

关键领域中平衡AI合规与技术创新的挑战:系统文献综述

Ayush Enkhtaivan, Chinazunwa Uwaoma

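AI summary: Through a systematic literature review, identifies three challenges (fragmented regulations, excessive compliance burdens on SMEs, and misaligned governance models) and proposes strategies including risk-tiered regulation, compliance by design, and explainable AI.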

详情
Comments
11 pages, 7 figures, Hawaii International Conference on System Sciences
AI中文摘要

人工智能在医疗、金融、能源和国防等关键基础设施中的快速整合带来了变革性益处,但也与不断演变的监管和治理框架产生冲突。本文通过系统文献综述(SLR)研究在关键基础设施领域中平衡AI合规与技术创新的挑战。综述遵循既定的SLR指南,提取并综合了2020-2025年间发表的同行评审文章、报告和机构来源的见解。研究识别出三个相互关联的挑战:碎片化法规、中小企业过度合规负担以及治理模型错配。为应对这些挑战,研究强调了实用的治理策略,包括风险分级监管、设计合规和可解释AI,以支持在关键领域中可扩展且可信的AI部署。主要贡献包括核心AI治理挑战的简明映射及说明其重叠的概念图,以及为政策制定者和从业者提供协调监管与创新的可行策略。

英文摘要

The rapid integration of artificial intelligence (AI) into critical infrastructure, including healthcare, finance, energy, and defense, offers transformative benefits but also conflicts with evolving regulatory and governance frameworks. This paper presents a systematic literature review (SLR) to examine the challenges of balancing AI compliance and technological innovation across critical infrastructure sectors. The review follows established SLR guidelines to extract and synthesize insights from peer-reviewed articles, reports, and institutional sources published between 2020 and 2025. The study identifies three interrelated challenges: fragmented regulations, excessive compliance burdens for small and medium-sized enterprises (SMEs), and misaligned governance models. To address these challenges, the study highlights practical governance strategies, including risk-tiered regulation, compliance by design, and explainable AI, to support scalable and trustworthy AI deployment in critical sectors. Key contributions include a concise mapping of core AI-governance challenges and a conceptual diagram illustrating their overlap, as well as actionable strategies for policymakers and practitioners to harmonize oversight with innovation.

2606.12418 2026-06-12 cs.CY cs.AI 新提交

Divination by Prompt: LLM-Mediated Xuanxue on Chinese Social Media

通过提示占卜:中文社交媒体上LLM中介的玄学

Chuang Li, Lixuan Wang, Yuqi Chen, Ze Hong

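AI summary: Studies the use of LLMs for divination on Chinese social media, using mixed methods to analyze user motivations, collaborative prompt refinement, and perceived efficacy, and highlights similarities to and differences from traditional divination.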

详情
AI中文摘要

大型语言模型(LLM)的快速普及催生了一种引人注目的文化实践:使用对话式AI进行占卜。本文首次系统研究了LLM中介的占卜在玄学(Xuanxue)背景下的实践,玄学是中文社交媒体上神秘和精神实践的互联网原生总称。采用混合方法设计,我们分析了小红书上的23000多条帖子和评论,并对用户和专业占卜师进行了32次半结构化访谈。用户主要就实际问题——恋爱关系、职业、考试和游戏抽卡——咨询LLM,通过两种交叉路径:由病毒式传播和零成本访问驱动的趋势性好奇心,以及不确定性条件下由事件驱动的焦虑。一个显著特征是协作提示优化,将用户转变为主动的提示工程师。在表达明确立场的评论者中,感知效果偏向积极,“准确性”通常通过个人经历契合和回顾性确认来证明,这与巴纳姆效应和确认偏见一致。用户还发展出验证实践,如重复试验和跨模型比较。相比之下,专业占卜师认为LLM缺乏真正占卜所需的“灵力”,这反映了本体论承诺和经济边界工作。我们还展示了参与者在解释AI生成解读时如何在科学和形而上框架之间进行协商。将这些发现置于人类学和认知进化占卜理论中,我们认为LLM占卜保留了传统实践的核心功能,同时引入了可扩展性、可重复性和提示驱动的共同生产,重塑了占卜权威的构建和评估方式。

英文摘要

The rapid proliferation of large language models (LLMs) has produced a striking cultural practice: using conversational AI for divination. This paper offers one of the first systematic studies of LLM-mediated divination in the context of Xuanxue, an internet-native umbrella term for mystical and spiritual practices on Chinese social media. Using a mixed-methods design, we analyze more than 23,000 posts and comments from Xiaohongshu and conduct 32 semi-structured interviews with users and professional diviners. Users primarily consult LLMs about pragmatic concerns - romantic relationships, careers, exams, and in-game gacha draws - via two intersecting pathways: trend-driven curiosity enabled by viral visibility and zero-cost access, and event-driven anxiety under conditions of uncertainty. A defining feature is collaborative prompt refinement, which turns users into active prompt engineers. Among commenters expressing a clear stance, perceived efficacy skews positive, with "accuracy" often justified through biographical fit and retrospective confirmation, consistent with the Barnum effect and confirmation bias. Users also develop verification practices such as repeated trials and cross-model comparison. Professional diviners, by contrast, portray LLMs as lacking the "spiritual power" required for genuine divination, reflecting both ontological commitments and economic boundary-work. We also show how participants navigate tensions between scientific and metaphysical frames when interpreting AI-generated readings. Situating these findings in anthropological and cognitive-evolutionary theories of divination, we argue that LLM divination preserves core functions of traditional practice while introducing scalability, repeatability, and prompt-driven co-production that reshape how divinatory authority is constructed and evaluated.

2606.12413 2026-06-12 cs.CY cs.AI cs.CE cs.CL cs.SE 新提交

AI SciBrief as a Gateway to Research: A Framework for Onboarding Students into New Research Areas

AI SciBrief 作为研究入门:一种引导学生进入新研究领域的框架

Andrei Lazarev, Dmitrii Sedov

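AI summary: Proposes a framework that uses AI SciBrief, an LLM-powered platform that automatically generates digests of scientific trends, to help students overcome information overload and move more quickly from information searching to knowledge creation.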

详情
Journal ref
2025 5th International Conference on Technology Enhanced Learning in Higher Education (TELE), Lipetsk, Russian Federation, 2025, pp. 365-369
Comments
This is the version of the article accepted for publication in TELE 2025 after peer review. The final, published version is available at IEEE Xplore: https://doi.org/10.1109/TELE66816.2025.11211989
AI中文摘要

各层次高等教育学生面临信息过载的重大障碍,这常常使研究过程的初始阶段陷入瘫痪并抑制动机。为此,本文介绍了一种教学框架,利用 AI SciBrief——一个由大语言模型驱动的平台,旨在自动生成科学趋势摘要。我们描述了这一多学科工具——初始覆盖金融、医学和教育领域——如何融入课程以克服这一“入门障碍”。该框架提供了具体方法,利用这些摘要促进学期论文的选题、加速学位论文的文献综述,并使研究生能够持续监测新兴趋势。我们得出结论,AI SciBrief 作为“研究入门”有效降低了学生的认知负荷,使他们能够更快地从信息搜索过渡到知识创造。

英文摘要

Students at all levels of higher education face a significant barrier in the form of information overload, which often paralyzes the initial stages of the research process and suppresses motivation. In response, this article introduces a pedagogical framework that leverages AI SciBrief, a platform powered by a Large Language Model (LLM) designed to automatically generate digests of scientific trends. We describe how this multidisciplinary tool - with initial coverage in finance, medicine, and education - can be integrated into the curriculum to overcome this "entry barrier." The framework provides concrete methodologies for utilizing these digests to facilitate topic selection for term papers, accelerate literature reviews for dissertations, and enable postgraduate students to continuously monitor emerging trends. We conclude that AI SciBrief functions as a "gateway to research," effectively reducing students' cognitive load and empowering them to transition more rapidly from information searching to knowledge creation.

2606.13380 2026-06-12 quant-ph cs.AI 新提交

An LLM System for Autonomous Variational Quantum Circuit Design

用于自主变分量子电路设计的大语言模型系统

Kenya Sakka, Wataru Mizukami, Kosuke Mitarai

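AI summary: Proposes an LLM-based autonomous agentic framework that iteratively designs quantum circuits and matches or outperforms existing methods on quantum feature map and variational quantum eigensolver tasks.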

详情
Comments
63 pages, 19 figures, 3 tables
AI中文摘要

高性能量子电路的设计在很大程度上仍然依赖于人类专家。我们引入了一个自主代理框架,该框架利用大语言模型在明确的设计约束下进行迭代量子电路设计。我们的系统集成了七个组件:探索、生成、讨论、验证、存储、评估和审查。这些组件形成了一个闭环工作流,结合了基于网络的知识获取、基于文献的批评、可执行代码生成和实验反馈。我们在两个任务上评估了该框架:用于量子机器学习的量子特征映射构建和用于量子化学中变分量子本征求解器应用的拟设生成。在图像分类基准测试中,生成的最佳特征映射优于代表性的量子特征映射,并且当扩展到更大的量子比特数时,超过了经典的径向基函数核。在七个分子的基态能量估计中,生成的拟设达到了与广泛使用的化学启发式和硬件高效构造相竞争的精度,同时满足施加的缩放约束。这些结果确立了由大语言模型驱动的代理系统作为自动化量子电路设计的可行范式,并展示了人工智能系统如何跨科学领域参与迭代科学优化工作流。

英文摘要

The design of high-performing quantum circuits remains largely dependent on human expertise. We introduce an autonomous agentic framework that employs large language models (LLMs) to conduct iterative quantum circuit design under explicit design constraints. Our system integrates seven components: Exploration, Generation, Discussion, Validation, Storage, Evaluation, and Review. These components form a closed-loop workflow that combines web-based knowledge acquisition, literature-grounded critique, executable code generation, and experimental feedback. We evaluate the framework on two tasks: quantum feature map construction for quantum machine learning and ansatz generation for variational quantum eigensolver applications in quantum chemistry. In image classification benchmarks, the best generated feature map outperforms representative quantum feature maps and, when scaled to larger qubit counts, surpasses the classical radial basis function kernel. In molecular ground state estimation across seven molecules, the generated ansatz attains accuracy competitive with widely used chemically inspired and hardware-efficient constructions while satisfying the imposed scaling constraints. These results establish LLM-driven agentic systems as a viable paradigm for automated quantum circuit design and illustrate how AI systems can participate in iterative scientific optimization workflows across scientific domains.

2606.11240 2026-06-12 physics.comp-ph cond-mat.str-el cs.LG quant-ph 新提交

Physically Constrained Ensemble Gaussian Process Modelling for Expensive Quantum Systems with Heteroskedastic Noise

物理约束集成高斯过程建模用于具有异方差噪声的昂贵量子系统

Arpan Biswas, Sutirtha Paul, Joseph Agada, Matthias Thamm, Adrian Del Maestro

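AI summary: Proposes a physically constrained ensemble Gaussian process framework that combines weighted penalties with a quadrature-based ensemble of GP surrogates to efficiently model quantum systems with heteroskedastic noise, yielding more accurate and physically plausible predictions for the Bose-Hubbard model and for simulations of a quantum liquid in a nanoporous silicate.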

详情
Comments
14 pages, 6 figures in main text, 2 figures in Supp materials
AI中文摘要

精确建模量子多体系统通常需要计算昂贵的模拟,如密度矩阵重正化群(DMRG)或量子蒙特卡洛(QMC)计算。这些方法虽然精确,但会带来显著的时间和资源限制,限制了它们在详尽参数探索中的应用。此外,这些昂贵模拟在大的未知参数空间内可能包含可变误差,需要量化和传播。因此,需要预测建模来准确估计稀疏采样数据(具有异方差噪声)的函数空间,同时保持估计的物理相关性。为此,我们提出了物理约束集成高斯过程(pc-EGP)框架,旨在物理一致性约束下高效建模复杂且含噪声的量子系统。该方法首先将物理约束作为用户控制的加权惩罚项,施加到高斯过程(GP)代理的数据驱动损失函数中。然后,通过数值求积方法训练一组这样的GP模型,其中多个不同节点上的GP通过求积加权平均进行集成。我们首先在合成生成数据上演示该框架,然后应用于量子系统。在第一个案例研究中,我们利用Bose-Hubbard模型的DMRG模拟来预测控制超流-莫特绝缘体转变的临界相互作用参数Uc。在第二个案例研究中,我们展示了该方法在QMC模拟上的应用,模拟限制在纳米孔硅酸盐内的量子液体,目标是优化化学环境以实现一维超流。与传统GP相比,pc-EGP在准确性和物理有意义的预测之间实现了更好的平衡。

英文摘要

Accurate modeling of quantum many-body systems often requires computationally expensive simulations such as Density Matrix Renormalization Group (DMRG) or Quantum Monte Carlo (QMC) calculations. These methods, while precise, impose significant time and resource constraints, limiting their use in exhaustive parameter exploration. Moreover, these expensive simulations can contain variable errors over the large unknown parameter space, which need to be quantified and propagated. Thus, predictive modelling is required to estimate the functional space accurately over sparsely sampled data with heteroskedastic noise, while preserving the physical relevance of the estimation. To this end, we present a Physically Constrained Ensemble Gaussian Process (pc-EGP) framework designed to efficiently model complex and noisy quantum systems under physical consistency constraints. The proposed method first enforces physical constraints as a user-controlled weighted penalty added to the data-driven loss function of the Gaussian Process (GP) surrogates. Then an ensemble of such GP models is trained on simulations with variable noise via a numerical quadrature method, in which the GPs at different nodes are combined as a quadrature-weighted average. We first demonstrate the framework on synthetically generated data before applying it to quantum systems. In the first case study, we leverage DMRG simulations of the Bose-Hubbard Model to predict the critical interaction parameter Uc governing the superfluid-to-Mott-insulator transition. In the second case study, we demonstrate our method on QMC simulations of a quantum liquid confined inside a nanoporous silicate, with the goal of optimizing a chemical environment to realize a one-dimensional superfluid. Compared to a conventional GP, pc-EGP achieves a better balance of accuracy and physically meaningful predictions.

2605.29151 2026-06-12 math.AG cs.AI cs.NE 版本更新

Real-rootedness of the Poincaré polynomials of $\overline{\mathcal M}_{0,n}$: an AI-assisted proof

$\overline{\mathcal M}_{0,n}$的Poincaré多项式的实根性:一个AI辅助的证明

Gergely Bérczi, Young-Hoon Kiem

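AI summary: Proves real-rootedness of the Poincaré polynomials of the moduli space of stable rational curves by introducing a bivariate deformation that reveals a hidden interlacing structure, and extends the result to Fulton-MacPherson spaces.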

详情
Comments
16 pages
AI中文摘要

我们证明了Deligne-Mumford模空间$\overline{\mathcal M}_{0,n}$(稳定$n$点有理曲线)的Poincaré多项式\[ P_n(t)=\sum_{i=0}^{n-3} \dim H^{2i}(\overline{\mathcal M}_{0,n};\mathbb{Q})t^i \]的实根性,证实了Aluffi-Chen-Marcolli的猜想。证明从Keel-Manin-Getzler递推开始,但其主要新思想是Poincaré多项式的双变量变形$F_m(y,t)$。这种变形揭示了单变量递推中不可见的隐藏交错结构。对于固定的$t<0$,$F_m$在$y$方向上的零点集由区间$0<y<1-t$上的Sturm-Rolle论证控制。原始多项式在切片$y=1$上恢复,移动根通过该切片的有序交叉同时给出了实根性和严格交错。因此,$\overline{\mathcal M}_{0,n}$的Betti数构成一个超对数凹序列。 我们进一步证明了Fulton-MacPherson空间$\mathbb{P}^1[n]$(复射影线退化中$n$个有序点)的Poincaré多项式的实根性和超对数凹性。 $\overline{\mathcal M}_{0,n}$的证明是通过与Co-Mathematician(Google DeepMind开发的智能体前沿模型系统)的迭代AI辅助工作流程获得的。人类的角色是提出问题、评估连续尝试、请求修复漏洞、将逐步发展的论证与文献进行比较,并组装最终可人工验证的证明。我们额外的人类贡献是观察到类似的残差变形策略适用于Fulton-MacPherson空间$\mathbb P^1[n]$,从而得到相应的实根性定理。

英文摘要

We prove real-rootedness for the Poincaré polynomial \[ P_n(t)=\sum_{i=0}^{n-3} \dim H^{2i}(\overline{\mathcal M}_{0,n};\mathbb{Q})t^i \] of the Deligne--Mumford moduli space $\overline{\mathcal M}_{0,n}$ of stable $n$-pointed rational curves, proving a conjecture of Aluffi--Chen--Marcolli. The proof starts from the Keel--Manin--Getzler recurrence, but its main new idea is a bivariate deformation $F_m(y,t)$ of the Poincaré polynomial. This deformation reveals a hidden interlacing structure not visible in the one-variable recurrence. For fixed $t<0$, the zero set of $F_m$ in the $y$-direction is controlled by a Sturm--Rolle argument on the interval $0<y<1-t$. The original polynomial is recovered on the slice $y=1$, and the ordered crossings of the moving roots through this slice give both real-rootedness and strict interlacing. Consequently, the Betti numbers of $\overline{\mathcal M}_{0,n}$ form an ultra-log-concave sequence. We further prove real-rootedness and ultra-log-concavity for the Poincaré polynomial of the Fulton--MacPherson space $\mathbb{P}^1[n]$ of $n$ ordered points in degenerations of the complex projective line. The proof for $\overline{\mathcal M}_{0,n}$ was obtained through an iterative AI-assisted workflow with Co-Mathematician, an agentic frontier-model system developed by Google DeepMind. Our role was to formulate the problem, evaluate the proposed proof attempts, identify gaps and request corrections, compare the developing argument with the literature, and refine the presentation of the final proof. Our additional human contribution was to observe that a similar residual deformation strategy applies to the Fulton--MacPherson spaces $\mathbb P^1[n]$, yielding the corresponding real-rootedness theorem.
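The two properties in the abstract can be checked by hand for small $n$. The sketch below counts distinct real roots exactly with a Sturm chain over rational numbers and tests ultra-log-concavity, meaning that $a_k/\binom{d}{k}$ is log-concave for a degree-$d$ polynomial. The even Betti numbers used for $n=5$ and $n=6$ are standard values recalled from the literature rather than taken from the abstract, and the check is only a sanity test on two cases, not a substitute for the proof.

```python
from fractions import Fraction
from math import comb


def poly_rem(a, b):
    """Remainder of a / b; coefficients are listed from highest degree to lowest."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient is now exactly zero
    while a and a[0] == 0:
        a.pop(0)
    return a


def sign_changes(values):
    nonzero = [v for v in values if v != 0]
    return sum(1 for x, y in zip(nonzero, nonzero[1:]) if (x > 0) != (y > 0))


def count_distinct_real_roots(coeffs):
    """Sturm's theorem; coeffs[i] is the coefficient of t**i, degree must be >= 1."""
    p = [Fraction(c) for c in reversed(coeffs)]
    degree = len(p) - 1
    chain = [p, [c * (degree - i) for i, c in enumerate(p[:-1])]]  # p and p'
    while True:
        r = poly_rem(chain[-2], chain[-1])
        if not r:
            break
        chain.append([-c for c in r])
    at_plus_inf = [q[0] for q in chain]
    at_minus_inf = [q[0] * (-1) ** (len(q) - 1) for q in chain]
    return sign_changes(at_minus_inf) - sign_changes(at_plus_inf)


def is_ultra_log_concave(coeffs):
    d = len(coeffs) - 1
    b = [Fraction(c, comb(d, k)) for k, c in enumerate(coeffs)]
    return all(b[k] ** 2 >= b[k - 1] * b[k + 1] for k in range(1, d))


# ASSUMPTION: even Betti numbers of the moduli space for n = 5 and n = 6.
betti = {5: [1, 5, 1], 6: [1, 16, 16, 1]}
for n, coeffs in betti.items():
    # Real-rooted with simple roots iff the distinct real root count equals the degree.
    real_rooted = count_distinct_real_roots(coeffs) == len(coeffs) - 1
    print(n, real_rooted, is_ultra_log_concave(coeffs))
```

Exact rational arithmetic avoids the floating-point ambiguity of deciding whether a numerically computed root has a "small enough" imaginary part. A polynomial with repeated real roots would fail the equality test above even though it is real-rooted, so the check is deliberately conservative.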

2605.17062 2026-06-12 cs.CR cs.LG cs.SE 版本更新

The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort

范围缩小,威胁依旧:重新评估2026前沿模型队列上的LLM包幻觉

Aleksandr Churilov

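AI summary: Re-evaluates package hallucination in LLMs on the 2026 frontier-model cohort and finds that the threat persists despite lower hallucination rates: a set of 127 package names (109 on PyPI, 18 on npm) is generated identically by all evaluated models, forming a cross-model supply-chain attack surface. Also reports a Python-versus-JavaScript hallucination asymmetry and high similarity between DeepSeek V3.2 and GPT-5.4-mini.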

详情
Comments
13 pages, 3 figures, 4 tables. v2: incorporates coordinated-disclosure feedback from PyPI Security and Socket.dev; registrable attack surface refined to 53 names (41 PyPI, 12 npm). Headline rates unchanged. Replication of Spracklen et al. (USENIX Security 2025). Data and code: https://github.com/churik5/slopsquatting-replication-2026 and https://doi.org/10.5281/zenodo.19859120
Abstract

Spracklen et al. (USENIX Security '25) showed that code-generating large language models hallucinate package names that do not exist on PyPI or npm at rates ranging from 5.2% on commercial models to 21.7% on open-source models, creating an attack surface for slopsquatting -- the registration of malicious packages under hallucinated names. We replicate their methodology on five frontier code-capable LLMs released between October 2025 and March 2026: Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.4-mini, Gemini 2.5 Pro, and DeepSeek V3.2. Across 199,845 paired Python and JavaScript prompts validated against PyPI and npm master lists, we measure overall hallucination rates between 4.62% (Claude Haiku 4.5) and 6.10% (GPT-5.4-mini) -- an order-of-magnitude compression of the inter-model spread observed by Spracklen, but not a retirement of the threat. Beyond replication, we identify a set of 127 package names (109 on PyPI, 18 on npm) that all five evaluated models invent identically; following coordinated disclosure with PyPI Security and Socket.dev, 53 of these (41 on PyPI, 12 on npm) remain registrable by an attacker after each registry's existing defenses, constituting a model-agnostic supply-chain attack surface that no single-model study can reveal. We further document a Python-over-JavaScript hallucination asymmetry that inverts Spracklen's 2024 finding, identify a Haiku-below-Sonnet inversion within the Anthropic family, and observe a Jaccard-similarity peak between DeepSeek V3.2 and GPT-5.4-mini (J = 0.343) suggestive of shared training-data origins.
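The two cross-model quantities mentioned in the abstract, the pairwise Jaccard similarity and the set of names invented by every model, can be illustrated with a minimal sketch. The model labels and package names below are made-up placeholders, not the paper's data, and the helper `jaccard` is a hypothetical name introduced here rather than taken from the authors' replication code.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder sets of hallucinated package names per model (illustrative only).
hallucinated = {
    "model_a": {"pkg-alpha", "pkg-beta", "pkg-gamma"},
    "model_b": {"pkg-alpha", "pkg-beta", "pkg-delta"},
    "model_c": {"pkg-alpha", "pkg-epsilon"},
}

# Pairwise similarity between every two models.
pairwise = {
    (m1, m2): jaccard(hallucinated[m1], hallucinated[m2])
    for m1, m2 in combinations(sorted(hallucinated), 2)
}

# Names that every model invents identically (the model-agnostic surface).
shared_by_all = set.intersection(*hallucinated.values())

for pair, score in pairwise.items():
    print(pair, round(score, 3))
print(sorted(shared_by_all))
```

In this toy setup a high pairwise score means two models tend to invent the same names, while `shared_by_all` corresponds to the kind of intersection a single-model study cannot observe.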

2603.02274 2026-06-12 q-bio.QM cs.AI Updated version

Contextual Invertible World Models: A Neuro-Symbolic Agentic Framework for Colorectal Cancer Drug Response

Christopher Baker, Tianyu Ren, Karen Rafferty, Hui Wang

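AI summary: Proposes the Contextual Invertible World Model (CIWM), which combines a machine learning emulator with a large language model reasoning layer and uses inverse reasoning to run CRISPR perturbations, revealing the dominant role of mutant KRAS in 5-fluorouracil resistance and an unexpected effect of repairing PIK3CA.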

Abstract

Precision oncology is currently limited by the small-N, large-P paradox, where high-dimensional genomic data is abundant but pharmacological response samples are sparse. While deep learning achieves predictive accuracy, it frequently fails to provide the mechanistic clarity required for clinical adoption. We present the Contextual Invertible World Model (CIWM), a Neuro-Symbolic Agentic Framework that bridges this gap by integrating a quantitative machine learning emulator with a Large Language Model reasoning layer. Utilising a stringently curated, high-fidelity data engineering pipeline on the Sanger GDSC dataset (\( N=83 \)), we isolate true biological signals from in vitro artifacts to establish a rigorous baseline predictive correlation for complex transcriptomics (\( r=0.268 \)). Through Inverse Reasoning, we perform in silico CRISPR perturbations across the colorectal landscape. The framework autonomously overturns classical mechanistic assumptions, identifying a hierarchical dominance of mutant KRAS over the APC/Wnt-axis in driving 5-fluorouracil resistance (\( \Delta=-0.0469 \)) via a "KRAS Shield" mapped to MAPK/PI3K networks. Furthermore, the agentic layer identified a "PIK3CA Paradox", revealing that repairing PIK3CA inadvertently increases chemoresistance (\( \Delta=+0.0085 \)) by triggering a compensatory feedback loop that hyperactivates the dominant MAPK survival pathway.