arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.09189 2026-06-09 cs.CR cs.AI New submission

Pretrained, Frozen, Still Leaking: Auditing Cross-Encoder Attribute Transfer in EEG Foundation Models

Jianwei Tai

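Affiliations: Jianwei Tai

AI summary: Proposes a cross-encoder bridge attack, shows that single-endpoint audits fail to detect attribute leakage, and introduces the audit-endpoint disagreement score (AEDS) as a joint release decision rule.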

Abstract

EEG foundation-model releases are usually audited one endpoint at a time: raw-reconstruction, membership inference, identity linkage, or DP-SGD on the downstream head. We audit the same released embeddings under all four endpoints jointly, on BIOT, LaBraM, and EEGPT, and show that each single-endpoint audit clears releases that still leak spectral attributes. The decisive evidence is a cross-encoder transfer audit: a single ridge attribute decoder learned from one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder, with subject-disjoint matched-control 95% CI lower bound at least 0.081 across all six BIOT/LaBraM/EEGPT directions. We prove a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and back-solve beta in [0.008, 0.198]. To turn the joint audit into a deployment-readable decision rule we introduce an audit-endpoint disagreement score (AEDS), prove sufficient conditions for its positivity, and bootstrap-calibrate it per cell; AEDS is positive in all eight matched-CI cells (BIOT/LaBraM/EEGPT on EEGMMI; LaBraM on Sleep-EDF, 54-channel LIMO, CHB-MIT pediatric scalp EEG) with p<0.001, while a head-level Carlini LiRA membership audit reaches AUC only 0.50-0.70. Standard defenses fail under audit: a Wiener-style noise-aware adaptive attacker, the LiRA audit, and DP-SGD at every utility-preserving epsilon in {4,8} leave the attribute channel essentially unchanged. The contribution is an audit framework that turns scattered single-endpoint defenses into a joint release decision, supported by a cross-encoder bridge theorem and adaptive-attacker, LiRA, and DP-SGD baselines; the audit licenses release-blocking, not raw-waveform exfiltration or held-out-subject identity recovery.
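
For reference, the bound stated in the abstract can be typeset as follows. The symbols beta, tau, eps_br, and rho_0 follow the abstract, which does not define them further; the label G for the attacker's centered gain is a placeholder introduced here.

```latex
G \;\ge\; \sqrt{\frac{\beta}{1+\tau^{2}}} \;-\; \varepsilon_{\mathrm{br}} \;-\; \rho_{0}
```

As a purely illustrative check, in the idealized case tau = 0 and eps_br = rho_0 = 0 the right-hand side reduces to sqrt(beta), roughly 0.089 at beta = 0.008 and 0.445 at beta = 0.198, the two ends of the back-solved range.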

2606.09135 2026-06-09 cs.CR cs.AI New submission

Steganography Without Modification: Hidden Communication via LLM Seeds

Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth

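Affiliations: Institute for IT Security, University of Lübeck; Technische Hochschule Lübeck

AI summary: Exploits the PRNG seed dependence of deterministic decoding in LLM inference stacks to build a steganographic channel that requires no modification to model weights or sampling code; the secret message is encoded in the seed, and the receiver recovers it by exhaustive search.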

Comments To appear in the Proceedings of the International Conference on Availability, Reliability and Security (ARES 2026)

Abstract

We demonstrate that widely deployed Large Language Model (LLM) inference stacks harbor a steganographic channel that requires no modification to model weights, sampling code, or output distributions. The channel exploits a structural property of deterministic decoding: pseudo-random number generators (PRNGs) used in inverse-transform sampling produce a seed-dependent sequence of token-level probability intervals that can be reconstructed from the generated text alone. A sender encodes a secret message in the PRNG seed before generation; a receiver reconstructs the intervals and recovers the seed, and thus the hidden payload, by exhaustive search over the seed space. We formalize two operational modes. In the known-prompt setting, sender and receiver share the prompt, enabling exact interval reconstruction and perfect seed recovery via forced alignment. In the unknown-prompt setting, only the generated text is available; approximate interval reconstruction combined with a maximum-hit-count scoring strategy still permits reliable recovery from sufficiently long outputs. Extensive experiments across six model families and five heterogeneous text domains show that, in the known-prompt setting, full 32-bit seed recovery from the complete 2^32 candidate space achieves up to 100% accuracy, depending on model and text domain, within 300 tokens and under 35 seconds on a single GPU. In the unknown-prompt setting, recovery reaches near-perfect accuracy at 600-800 tokens in about 12 seconds. We further analyze the influence of prompting strategies, tokenization ambiguities, and sampling hyperparameters on channel reliability. Moreover, we discuss several applications of our results: First, it allows for the steganographic transmission of 32 bits, but also shows that ignorance of the prompt is not a valid security assumption.
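
The known-prompt mode described above can be pictured with a toy sketch. Everything here is an assumption for illustration rather than the paper's implementation: a four-token "model" (`next_token_probs`) stands in for an LLM, Python's `random.Random` stands in for the inference stack's PRNG, and the seed space is shrunk to a few bits so the exhaustive search runs instantly.

```python
import random
from itertools import accumulate

def next_token_probs(prev):
    # Hypothetical toy model: rotate a fixed distribution by the previous token id.
    base = [0.4, 0.3, 0.2, 0.1]
    k = prev % len(base)
    return base[k:] + base[:k]

def intervals(probs):
    # Inverse-transform sampling: token i owns the interval [edges[i], edges[i+1]).
    edges = [0.0] + list(accumulate(probs))
    edges[-1] = 1.0  # guard against floating-point shortfall in the last edge
    return list(zip(edges[:-1], edges[1:]))

def sample(seed, n_tokens, start=0):
    # Sender: the seed carries the hidden payload; generation itself is unmodified.
    rng = random.Random(seed)
    prev, out = start, []
    for _ in range(n_tokens):
        u = rng.random()
        for tok, (lo, hi) in enumerate(intervals(next_token_probs(prev))):
            if lo <= u < hi:
                out.append(tok)
                prev = tok
                break
    return out

def recover_seed(tokens, seed_bits, start=0):
    # Receiver: rebuild the interval each emitted token must have hit.
    ivs, prev = [], start
    for tok in tokens:
        ivs.append(intervals(next_token_probs(prev))[tok])
        prev = tok
    # Exhaustive search: keep the candidate seed whose draws hit the most intervals.
    best_seed, best_hits = None, -1
    for cand in range(2 ** seed_bits):
        rng = random.Random(cand)
        hits = sum(lo <= rng.random() < hi for lo, hi in ivs)
        if hits > best_hits:
            best_seed, best_hits = cand, hits
    return best_seed
```

Calling `recover_seed(sample(1234, 64), seed_bits=12)` searches all 4096 candidate seeds. The hit-count scoring is borrowed from the abstract's unknown-prompt description; with exact intervals, as here, the true seed simply hits every interval.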

2606.09125 2026-06-09 cs.CR cs.AI New submission

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

Tiejin Chen, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei

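Affiliations: Arizona State University; University of North Carolina at Chapel Hill; North Carolina State University

AI summary: Reveals privacy leakage risks in multi-modal large language models that process images and text, builds the MM-Privacy dataset to evaluate disclosure and retention risks across tasks, and highlights the effect of task inconsistency on privacy risk.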

Abstract

Privacy risks in text-only Large Language Models (LLMs) are well studied, particularly their tendency to memorize and leak sensitive information. However, Multi-modal Large Language Models (MLLMs), which process both text and images, introduce unique privacy challenges that remain underexplored. Compared to text-only models, MLLMs can extract and expose sensitive information embedded in images, posing new privacy risks. We reveal that some MLLMs are susceptible to privacy breaches, leaking sensitive data embedded in images or stored in memory. Specifically, in this paper, we (1) introduce MM-Privacy, a comprehensive dataset designed to assess privacy risks across various multi-modal tasks and scenarios, where we define Disclosure Risks and Retention Risks, (2) systematically evaluate different MLLMs using MM-Privacy and demonstrate how models leak sensitive data across various tasks, and (3) provide additional insights into the role of task inconsistency in privacy risks, emphasizing the urgent need for mitigation strategies. Our findings highlight privacy concerns in MLLMs, underscoring the necessity of safeguards to prevent data exposure. Our dataset and code can be found here.

2606.09122 2026-06-09 cs.SE cs.AI cs.ET cs.MA cs.NI New submission

Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations

Arun Malik

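Affiliations: Arun Malik

AI summary: Proposes a multi-agent orchestration framework that uses hierarchical decomposition, skill invocation, knowledge encoding, and progressive autonomy to autonomously resolve more than 90% of common incidents in hyperscale cloud networks while maintaining safety guarantees.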

Comments 7 pages, 6 figures

Abstract

Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the volume, velocity, and complexity of failures. This paper presents an agentic AI architecture for autonomous incident resolution in large-scale network operations. Our system employs a multi-agent orchestration framework where specialized AI agents collaborate to detect, diagnose, and remediate network incidents without human intervention. We describe the architectural principles, including hierarchical agent decomposition, skills-based tool invocation via standardized protocols, structured knowledge encoding from operational runbooks, progressive autonomy with safety boundaries, and closed-loop verification. The architecture has been deployed in production at a major cloud provider, demonstrating that agentic AI systems can achieve autonomous resolution rates exceeding 90% for common incident categories while maintaining safety guarantees through layered authorization and rollback mechanisms. We discuss design tradeoffs, failure modes, and lessons learned from operating autonomous AI agents at scale.

2606.09100 2026-06-09 cs.SI cs.LG New submission

Alcmean's: Unsupervised community detection using local Laplacian, automatic detection of the number of centers

Shahin Momenzadeh, Rojiar Pir Mohammadiani

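Affiliations: Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran

AI summary: Proposes the ALCMeans algorithm, which combines Laplacian-energy-based automatic center identification with DeepWalk embeddings, needs no preset number of communities, and scores 10-20% higher NMI and ARI than Louvain and other methods on benchmark datasets.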

Abstract

Community detection is a fundamental problem in the analysis of complex networks. It has applications across social, biological, and financial domains. Traditional algorithms such as Louvain, LPA, and modularity optimization often require manual parameter tuning. They also suffer from inaccurate cluster center selection and struggle with scalability. To address these challenges, we propose Automatic Laplacian Centrality Means (ALCMeans), a novel community detection algorithm. ALCMeans combines Laplacian energy-based automatic center identification with DeepWalk embeddings for robust node representation. Unlike existing Laplacian-based and clustering methods, ALCMeans eliminates the need to predefine the number of communities, enhances cluster center selection using structural importance, and leverages representation learning for more accurate and stable assignments. Experimental results on benchmark datasets demonstrate 10 to 20 percent higher NMI and ARI scores compared to Louvain, Newman-Girvan, LPA, Fast-Greedy, and a recent GNN-based competitor (MAGI, KDD 2024). Additional evaluations with modularity and F1-scores confirm the superiority of ALCMeans. Ablation studies highlight the critical contributions of each component. Despite its reliance on DeepWalk parameters and increased runtime relative to lightweight heuristics, ALCMeans consistently outperforms state-of-the-art methods. This makes it a promising tool for real-world network analysis.

2606.09090 2026-06-09 cs.SE cs.AI New submission

Context Rot in AI-Assisted Software Development: Repurposing Documentation Consistency for AI Configuration Artifacts

Christoph Treude, Sebastian Baltes

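Affiliations: Singapore Management University; Heidelberg University

AI summary: Addresses configuration files for AI coding assistants (such as CLAUDE.md) going stale as software evolves, proposes using existing documentation consistency tools to detect context rot, and finds stale code references in 23.0% of 356 repositories.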

Abstract

Developers increasingly provide AI coding assistants with persistent context through configuration files such as CLAUDE.md, AGENTS.md, and .cursorrules. These files describe code elements, architecture, and development conventions, forming the context that guides AI tool behavior across sessions. As software evolves, this context can become stale, a phenomenon we call context rot. While AI configuration artifacts are new, the underlying consistency problem connects to decades of software documentation research. Researchers have built tools to check consistency between documentation and code, spanning README files, code comments, API documentation, architecture descriptions, and installation instructions. We argue that this existing toolbox is an immediate starting point for detecting context rot, and we present a research roadmap mapping documentation consistency approaches to corresponding problems in this new setting. As preliminary evidence, applying an existing README/wiki consistency checker to a statistically representative sample of 356 repositories identifies stale code element references in 23.0% of repositories, showing that traditional documentation consistency tools can already surface context rot.
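
The kind of check the abstract describes can be pictured with a toy sketch. The function name `stale_references`, the backtick-based extraction rule, and the path heuristic are assumptions made for illustration; this is not the README/wiki consistency checker used in the paper.

```python
import re

def stale_references(config_text, existing_paths):
    """Return path-like references in config_text that match no existing path.

    Hypothetical heuristic: a reference is any backtick-quoted span that
    contains a '/' or a '.', e.g. `src/app.py`.
    """
    refs = re.findall(r"`([^`\n]+)`", config_text)
    path_like = [r for r in refs if "/" in r or "." in r]
    known = set(existing_paths)
    seen, stale = set(), []
    for ref in path_like:
        # Report each missing reference once, in order of first appearance.
        if ref not in known and ref not in seen:
            stale.append(ref)
        seen.add(ref)
    return stale
```

For a file such as CLAUDE.md, `existing_paths` would be the repository's current file list; a non-empty result flags references that may have gone stale since the file was written.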

2606.09084 2026-06-09 cs.CR cs.AI New submission

Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps

Xiaofeng Lin, Yukai Yang, Daniel Guo, Sahil Arun Nale, Charles Fleming, Guang Cheng

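Affiliations: University of California, Berkeley

AI summary: Proposes Context-Fractured Decomposition (CFD) attacks on tool-using LLM agents, which exploit cross-context artifact provenance gaps to carry out multi-step jailbreaks and raise success rates by up to 28.3 percentage points.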

Abstract

Tool-using LLM agents interact with the world through actions that persist state in artifacts (e.g., workspace files or logs). Consequently, jailbreak defenses must reason about cross-step composition rather than isolated text. Yet most existing attacks and defenses, including "multi-turn" jailbreaks such as Crescendo and Tree of Attacks, still assume a single contiguous conversation visible to the defender. This assumption breaks down in real agent pipelines, where enforcement is fragmented across tools, modules, and time, and where artifact provenance is often not tracked. We operationalize a deployment failure mode for tool-using LLM agents, the provenance gap, and study reproducible triggers for it: Context-Fractured Decomposition (CFD), a family of cross-context multi-step jailbreaks that preserve benign-looking intermediate artifacts from an early interaction and elicit harmful behavior much later, potentially in a different agent instance or workflow stage, via individually innocuous tool actions whose risk emerges only under delayed artifact-mediated composition. We instrument the failure mode with trace-level diagnostics and outline a verifiable mitigation direction (provenance lineage tagging). Across agent-system jailbreak benchmarks, CFD improves success rates by up to 28.3 percentage points over state-of-the-art baselines, even against strong single-turn judges. Disclaimer: This paper contains examples of harmful or offensive language.

2606.09050 2026-06-09 eess.AS cs.SD New submission

MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion

Guobin Ma, Yuxuan Xia, Yuepeng Jiang, Dake Guo, Hanke Xie, Jingbin Hu, Yanbo Wang, Lei Xie, Pengcheng Zhu

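Affiliations: The University of New South Wales, Australia; WeNet Open Source Community, China

AI summary: Proposes MeanVC 2, which uses future-receptive chunking (FRC) and a universal timbre token encoder to achieve stable conversion at a 40 ms chunk size, cuts latency from 211 ms to 110 ms, and significantly improves zero-shot speaker similarity.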

Comments Accepted by Interspeech 2026

Abstract

Streaming zero-shot voice conversion (VC) has become increasingly popular due to its potential for real-time applications. The recently proposed MeanVC achieves lightweight streaming zero-shot VC, but it has several limitations: its chunk-wise autoregressive denoising doubles the effective training sequence length, conversion quality degrades under small-chunk settings, and its timbre encoder directly relies on reference mel-spectrograms, making it sensitive to reference audio quality. To address these limitations we propose MeanVC 2. We introduce future-receptive chunking (FRC), which explicitly schedules past and future receptive fields across diffusion transformer decoder layers and removes clean-chunk teacher forcing. By incorporating bounded future context, FRC enables stable conversion with a 40 ms chunk size. We further introduce a universal timbre token encoder, which constructs a timbre representation from a global speaker embedding and retrieves fine-grained timbre cues via cross-attention, improving robustness to low-quality references and enhancing zero-shot speaker similarity. Experimental results show that MeanVC 2 significantly outperforms MeanVC, while reducing latency from 211 ms to 110 ms. Audio samples are publicly available. The source code will be publicly released.

2606.09048 2026-06-09 eess.AS cs.AI cs.SD New submission

BareWave: Waveform-Native Flow-Matching Text-to-Speech

Wei Fan, Chao-Hong Tan, Qian Chen, Wen Wang, Xiangang Li, Kejiang Chen, Weiming Zhang, Nenghai Yu

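Affiliations: Anhui Province Key Laboratory of Digital Security; Tongyi Fun Team, Alibaba Group

AI summary: Proposes BareWave, a fully waveform-native flow-matching TTS framework that addresses the difficulty of direct waveform training with training-time representation alignment, staged noise scheduling, and velocity-aware perceptual alignment, achieving high-quality synthesis in zero-shot voice cloning.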

Comments Under Review

Abstract

Removing intermediate representations and separately trained decoding stages has become an important direction in generative modeling. In text-to-speech, however, high-quality systems are still commonly built through an intermediate acoustic representation before waveform synthesis. In this work, we present BareWave, a fully waveform-native framework for direct text-to-wave generation in flow-matching TTS. We consider this setting to raise three training challenges: raw-waveform modeling lacks a strong pretrained representational scaffold, different stages of training benefit from different noise schedules, and data-space perceptual objectives do not automatically share the temporal structure of the velocity-space flow objective. As a result, direct waveform training is hard to optimize efficiently, hard to push toward a strong final operating point with a fixed recipe, and hard to integrate effective perceptual refinement. Guided by this view, we develop a direct text-to-wave training framework that combines training-time representation alignment, staged noise scheduling, and velocity-aware perceptual alignment (VAPA), while preserving a single waveform-native inference path without pretrained components at test time. Experiments on zero-shot voice cloning show that strong intelligibility, speaker similarity, and naturalness can be achieved under a fully waveform-native inference path, supporting waveform-native flow-matching TTS as a practical direction. Project page with audio demos is available at https://barewave.github.io/.

2606.09047 2026-06-09 eess.SY cs.LG cs.SY math.OC New submission

Families of Control-Cost-Parametrized Inverse-Optimal Universal Stabilizers

Miroslav Krstic, Luke Bhan

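Affiliations: Department of Mechanical and Aerospace Engineering, University of California San Diego; Department of Electrical and Computer Engineering, University of California San Diego

AI summary: Proposes a cost-parametrized family of stabilizing feedback laws in which the user chooses the running cost on control and a formula yields an expander of a universal controller; proves that the cost-to-expander operator is Lipschitz, which supports neural operator approximation, and establishes semiglobal practical asymptotic stability and second-order suboptimality bounds.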

Comments 13 Pages

Abstract

A classical universal stabilization formula offers the practitioner no design freedom: it is a single, parameter-free object. We introduce a cost-parametrized family of stabilizing feedback laws, where (1) the user chooses a function that serves as the running cost on control in an inverse-optimal cost functional, and (2) obtains, through a formula, a nonlinear "expander" of a pre-existing universal controller, which solves an infinite-horizon optimal control problem with a meaningful cost on the state. The cost-to-expander formula is a three-step construction, involving, inter alia, cost differentiation and function inversion; overall, it is a nonlinear infinite-dimensional operator. The cost-to-expander operator is proven Lipschitz, which enables uniform neural operator approximation of the entire family and supports both offline performance exploration and online adaptation. Semiglobal practical asymptotic stability and second-order suboptimality bounds are established under the approximation. The operator learning and its use in semiglobal stabilization are illustrated numerically. We call the result "half-direct-optimal" because the paper's design is less than a general "direct optimal" (HJB-inducing) control, but more than the fully inverse optimal, since the user performs minimization for an arbitrary given cost on control. The dual to the half-direct problem we solve is the problem in which the cost on the state is arbitrary and given. This dual problem is easier and outside of the scope of the paper.

2606.09024 2026-06-09 cs.IR cs.CL cs.HC cs.SI New submission

Personal Salience: Highlighting Is Social, but Individuality Lives in Selection

Kazuki Nakayashiki, Keisuke Watanabe

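Affiliations: Glasp Inc.

AI summary: Using a co-readership identity control, finds that highlighting is driven mainly by the crowd (a crowd model predicts far better than a personal model), but that personal history strongly predicts which already-salient passages a reader selects (gap +0.14), showing that individuality appears more in selection than in salience.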

Comments 12 pages, 5 figures, 2 tables

Abstract

Social highlighters let people mark passages that matter to them. We ask how much of an individual is recoverable from these naturalistic traces, using a co-readership identity control (the same document highlighted by many users) that holds document and topic fixed and asks whether a person's own history predicts their marks better than another reader's does. We separate generic salience (structure), crowd salience (what others marked), and personal salience (the individual residual). First, highlighting is social: which sentences you mark is predicted far better by the crowd than by structure or by a personal model, and even a well-estimated crowd, an information-privileged baseline that sees others' marks on the same document, beats a frontier LLM twin built from your other-document history; the within-document personal signal is at most a whisper (own-vs-other gap +0.017 by an embedding scorer, small but significant). Second, in sharp contrast, individuality lives in selection: asked which of the already-salient passages are yours, your own history is a strong, leakage-free predictor (gap +0.14). A topic decomposition shows this is largely stable thematic preference: it shrinks ~6-8x against a topically-matched peer, and a thin residual cannot be separated from finer topic. The non-obvious part is an asymmetry: under the same scorer the individual signal is ~6-8x weaker in salience than in selection. Methodologically, naive history-conditioning evaluations leak (the target's own marks enter the profile in ~42% of pairs, inflating personal scores by up to +0.15 AP) and small crowds overstate personalization; our results are leakage-free, use a dense crowd, and a model-matched control. Highlights carry a genuine individual signature, but a thin layer over a strong shared one, surfacing far more in which salient things a person selects than in what is salient.

2606.09006 2026-06-09 cs.SI cs.AI cs.CY cs.ET New submission

Sustainability and Artificial Intelligence: Necessary, Challenging, and Promising Intersections

Han-Teng Liao, Zijia Wang

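Affiliations: Higher Education Impact Assessment Center; Sun Yat-Sen University; Nanfang College

AI summary: Based on 541 bibliographic records, maps the intersections of artificial intelligence and sustainability research, reveals the central role of green and sustainable science and technology in bridging disciplines, and discusses the necessity, challenges, and promise of these intersections.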

Comments This is an author preprint version. For the final authenticated version of record, please use the official publication via the IEEE Xplore database. DOI: 10.1109/MSIEID52046.2020.00076

Journal ref
2020 Management Science Informatization and Economic Innovation Development Conference (MSIEID), Guangzhou, China, 2020, pp. 360-363
Abstract

Both digital economy and digital technology researchers increasingly recognize the need to better address the role that artificial intelligence (AI) plays in shaping the evolution of the environmental, social and governance aspects of development. It appears that sustainability and AI research converge on the features of wicked problems that are complex, interconnected and dynamic. Building on this convergence, this article aims to map out the necessary, challenging, and promising intersections by providing an overview of state-of-the-art research. Based on 541 bibliographic records collected from the Web of Science (WoS) database, the findings reveal the increasingly central body of work on green and sustainable science and technology in bridging various disciplines, main journals and key topics and concepts. The findings reveal how such interactions can be necessary, challenging, and promising. The article concludes with a few general arguments regarding how to diversify and expand the community of practice regarding AI for sustainable development, especially with respect to expected AI application areas and institutions.

2606.09005 2026-06-09 cs.CR cs.CL New submission

Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries

Jianguo Zhu

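Affiliations: Chengdu University of Information Technology

AI summary: Studies a security vulnerability in retrieval-augmented generation systems in which document text impersonates control signals, proposes DACSI, a non-imperative indirect injection attack, and validates its effectiveness on multiple models.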

Comments Preprint. Independent-author version

Abstract

Retrieval-augmented generation (RAG) systems often serialize user queries, retrieved documents, metadata, system labels, and task instructions into one natural-language prompt. We study a source-authority boundary failure in this design: attacker-authored retrieved text can impersonate metadata, provenance, authority, or disclosure-policy signals that appear control-relevant to the model. We call this pattern Document-Authored Control-Signal Impersonation (DACSI). DACSI is a non-imperative, metadata-like payload subclass within indirect prompt injection. Its central lesson is simple: document-authored labels are data, not policy. Command-style injection asks the model to ignore, override, or violate policy; DACSI asks whether untrusted document text can be misattributed as an authorized control signal when RAG prompt rendering collapses trusted and untrusted text into the same natural-language channel. We evaluate DACSI across six model settings, prompt-pressure levels, injection baselines, signal taxonomies, RAG-mediated pipelines, system-control probes, a source-authority attribution probe, and synthetic canary formats. We interpret the evidence by model regime rather than as six equal replications: DeepSeek V4 Pro and Qwen3.5-397B provide the cleanest positive lift, DeepSeek V4 Flash is a high-susceptibility setting, GPT-5.5 and Gemini 3.1 Pro Low are strong-boundary probes with selected residual risks, and GLM-4.7 is a saturated leakage boundary case. Across these regimes, DACSI warrants separate evaluation because it uses a command-free metadata/provenance/policy surface, follows a RAG-specific source-authority path, and responds to source/channel separation. The source-authority probe is behavioral attribution evidence, not proof of an internal mechanism.

2606.09002 2026-06-09 stat.ML cs.LG math.ST stat.TH New submission

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

Deqi Zheng, Xiaoyang Xu, Yuhong Yang

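Affiliations: Qiuzhen College, Tsinghua University; Yau Mathematical Sciences Center, Tsinghua University

AI summary: For stochastic multi-armed bandits in which the set of available arms expands over time, proposes the elimination-based UCB-AA algorithm, which pre-screens newly arrived arms and accounts for arrival information discrepancy and a drifting benchmark, achieving sublinear dynamic regret bounds.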

Comments 24 pages, 4 figures

详情
AI中文摘要

我们研究了一个随机多臂老虎机问题,其中可用臂的集合随时间扩展。这一设置出现在当新动作或治疗在正在进行的研究中变得可用时的顺序实验中,使得对事后单一最佳臂的遗憾不恰当。我们转而评估相对于当前可用最佳臂的性能,从而为到达臂环境引入了一个动态遗憾准则。为了解决到达信息差异(AID)和漂移基准(DB)带来的挑战,我们提出了用于到达臂的UCB(UCB-AA),这是一个基于消除的过程,并包含一个辅助的初步筛选步骤,用于新到达的臂在与现有臂完全竞争之前。我们证明UCB-AA获得的遗憾界明确依赖于到达过程,在间隙演化的正则条件下实现了次线性动态遗憾,并允许对未知时间范围进行在线扩展。仿真结果表明,UCB-AA减少了浪费的拉取次数,保持了较小的活动臂集,同时保持了有竞争力的遗憾性能。

英文摘要

We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments. To address the resulting challenges of arrival information discrepancy (AID) and a drifting benchmark (DB), we propose UCB for Arriving Arms (UCB-AA), an elimination-based procedure with an auxiliary preliminary screening step that newly arrived arms pass through before full competition with incumbent arms. We show that UCB-AA attains regret bounds that depend explicitly on the arrival process, achieves sublinear dynamic regret under regularity conditions on gap evolution, and admits an online extension for unknown horizons. Simulation results show that UCB-AA reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.
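The dynamic-regret criterion described above can be written out as follows. This is a sketch in assumed notation, not the paper's own: $\mathcal{A}_t$ denotes the set of arms available at round $t$, $\mu_a$ the mean reward of arm $a$, and $a_t$ the arm pulled at round $t$. The paper may define the quantity in expectation or with additional conditions.

```latex
% Assumed notation: A_t grows over time because arms only arrive.
\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \dots \subseteq \mathcal{A}_T,
\qquad
R_T^{\mathrm{dyn}}
  \;=\; \sum_{t=1}^{T} \Bigl( \max_{a \in \mathcal{A}_t} \mu_a \;-\; \mu_{a_t} \Bigr).
% Classical (static) regret instead compares against one fixed arm,
% \max_{a \in \mathcal{A}_T} \mu_a, in every round, including rounds
% in which that arm had not yet arrived.
```

Under this reading, the benchmark $\max_{a \in \mathcal{A}_t} \mu_a$ is non-decreasing in $t$, which is one way to interpret the "drifting benchmark" mentioned in the abstract.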

2606.08973 2026-06-09 q-bio.QM cs.LG 新提交

A systematic investigation of molecular encoding methods for drug property predictions across neural network and Transformer encoder-based model

基于神经网络和Transformer编码器模型的药物性质预测分子编码方法的系统研究

Sheng-Ya Chen, Shan-Ju Yeh

发表机构 * School of Medicine, National Tsing Hua University(国立清华大学医学院) Institute of Bioinformatics and Structural Biology, National Tsing Hua University(国立清华大学生物信息学与结构生物学研究所) Department of Life Science, National Tsing Hua University(国立清华大学生命科学系) Interdisciplinary Program of Life Sciences and Medicine, National Tsing Hua University(国立清华大学生命科学与医学跨学科计划)

AI总结 系统研究不同分子编码方法对药物性质预测的影响,使用MLP和MLP+TL模型,发现MACCS和PubChem指纹结合注意力权重可识别关键化学基团,预测准确率平均AUC>0.9。

详情
AI中文摘要

关于不同分子编码方法如何影响分子性质预测的基础研究仍然相对有限。在本研究中,我们使用两种流行的结构设计:经典神经网络模型(MLP)和基于Transformer编码器的模型(MLP+TL),广泛考察了分子性质预测的最优分子编码方法。对于分子编码方法,我们研究了几种类型的指纹,包括传统拓扑指纹、基于子结构的指纹和基于字符串的表示。这两个模型在七个著名的分子数据集上进行了训练,以基于评估指标评估不同的输入分子编码方法。在几个生物学相关的分类任务中,包括毒性、致突变性和副作用预测,我们的模型一致地实现了平均AUC值超过0.9。我们没有依赖外部事后解释方法,如局部可解释模型无关解释(LIME)或深度SHAP(SHAP),而是利用模型内在的注意力权重作为内部可解释性信号来识别潜在重要特征。使用MACCS和PubChem作为输入的MLP+TL模型能够捕获决定主要血脑屏障(BBB)通透性和鼠伤寒沙门氏菌致突变性的化学可解释基团。特别是,吗啡和海洛因之间的比较突出了羟基相关子结构在BBB通透性预测中的作用,这一点在注意力权重中一致反映。总体而言,我们的发现为选择有效的分子编码方法提供了实用指导,并有助于开发用于药物发现的可解释分子信息学方法。

英文摘要

Fundamental investigations into how different molecular encoding methods affect molecular property prediction remain relatively limited. In this study, we extensively examined the optimal molecular encoding methods for molecular property prediction using two prevalent structure designs: a classical neural network model (MLP) and a Transformer encoder-based model (MLP+TL). For molecular encoding methods, we investigated several types of fingerprints, including traditional topological fingerprints, substructure-based fingerprints, and string-based representations. These two models were trained on seven well-known molecular datasets to evaluate different input molecular encoding methods based on evaluation metrics. On several biologically relevant classification tasks, including toxicity, mutagenicity, and side-effect prediction, our models consistently achieved average AUC values above 0.9. Rather than relying on external post-hoc explanation methods such as local interpretable model-agnostic explanations (LIME) or Deep SHapley Additive exPlanations (SHAP), we leveraged the model's intrinsic attention weights as an internal interpretability signal for identifying potentially important features. The MLP+TL model using MACCS and PubChem fingerprints as input can capture chemically interpretable groups that are major determinants of blood-brain barrier (BBB) permeability and of mutagenicity in Salmonella typhimurium. In particular, a comparison between morphine and heroin highlighted the role of hydroxyl-related substructures in BBB permeability prediction, which was consistently reflected in the attention weights. Overall, our findings provide practical guidance for selecting effective molecular encoding methods and contribute to the development of interpretable molecular informatics approaches for drug discovery.

2606.08960 2026-06-09 cs.CR cs.AI cs.LG cs.MA 新提交

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

通过对抗性黑客-修复者循环强化智能体基准测试

Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena, Kexun Zhang, Aditi Raghunathan

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Fewshot Corp(Fewshot公司) Independent Researcher(独立研究员)

AI总结 提出黑客-修复者循环方法,通过LLM代理交替攻击和修补验证器,自动生成抗利用的验证器,将KernelBench攻击成功率从62%降至0%。

详情
AI中文摘要

智能体基准测试通常使用手工编写且脆弱的验证器来评分提交结果,这容易导致奖励黑客攻击。我们审计了五个终端智能体基准测试中的1,968个任务,发现其中323个(16%)可以被前沿模型仅通过任务描述成功攻击。这既破坏了排行榜排名,也破坏了强化学习训练信号,但标准的应对措施是手动且被动的。我们引入了黑客-修复者循环,一种无需逐任务手动修补即可构建抗利用验证器的方法。该循环交替使用三个LLM代理:黑客尝试在不解决任务的情况下通过验证器,修复者修补验证器以拒绝每个发现的漏洞,求解者确认修补后的验证器仍接受合法解决方案。循环迭代:每次修补都会重塑验证器的奖励机制,从而暴露下一个漏洞。我们进一步增加了验证器访问权限,并允许修补跨任务迁移,以扩大循环发现的漏洞范围。在KernelBench上,该循环将公开报告的漏洞留出语料库上的攻击成功率从62%降至0%。我们还发现,循环中的较弱代理可以防御更强的黑客:Gemini 3 Flash的循环将更强的Gemini 3.1 Pro和Claude Opus 4.7在KernelBench上的攻击成功率从76%和61%降至0%,而Gemini 3.1 Pro在Terminal Bench上的攻击成功率从39%降至17%(覆盖77个任务)。我们发布了Terminal Wrench(323个可攻击环境,3,632条攻击轨迹)作为当前攻击面的快照,以及我们修补后的验证器、循环发现的漏洞和我们的实现,作为未来工作的基础。

英文摘要

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive. We introduce the hacker-fixer loop, a method for building exploit-resistant verifiers without per-task manual patching. The loop alternates three LLM agents: a hacker tries to pass the verifier without solving the task, a fixer patches the verifier to reject each discovered exploit, and a solver confirms the patched verifier still admits legitimate solutions. The loop iterates: each patch reshapes what the verifier rewards, surfacing the next exploit. We further add verifier access, and let patches transfer across tasks, to broaden the exploits the loop discovers. On KernelBench, the loop drives the attack success rate from 62% to 0% on a held-out corpus of publicly reported exploits. We also find that weaker agents in the loop can defend against much stronger hackers: Gemini 3 Flash's loop drives the stronger Gemini 3.1 Pro and Claude Opus 4.7's attack success rate from 76% and 61% to 0% on KernelBench, and Gemini 3.1 Pro's from 39% to 17% on Terminal Bench across 77 tasks. We release Terminal Wrench (323 hackable environments, 3,632 hack trajectories) as a snapshot of the current attack surface, our patched verifiers, the exploits the loop discovered, and our implementation as a basis for future work.
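The control flow of the hacker-fixer loop described above can be sketched with stub agents. This is a toy illustration only: in the paper the three roles are LLM agents, whereas here `hacker`, `fixer`, and `solver` are hypothetical hand-written functions, the "task" is sorting a three-element list, and the fixer simply blacklists each discovered exploit. None of these names come from the paper's implementation.

```python
from typing import Callable, List, Optional

Submission = List[int]
Verifier = Callable[[Submission], bool]

LEGIT_SOLUTION = [1, 2, 3]            # correct answer to the toy task "sort [3, 1, 2]"
CHEATS = [[3, 1, 2], [0, 0, 0]]       # hypothetical exploit candidates the hacker tries


def weak_verifier(submission: Submission) -> bool:
    # Brittle hand-written check: only looks at the output length.
    return len(submission) == 3


def hacker(verifier: Verifier) -> Optional[Submission]:
    # Return the first cheat that passes the verifier without solving the task.
    for cheat in CHEATS:
        if verifier(cheat):
            return cheat
    return None


def fixer(verifier: Verifier, exploit: Submission) -> Verifier:
    # Toy patch: keep the old check and additionally reject the discovered exploit.
    return lambda s, v=verifier, e=exploit: v(s) and s != e


def solver(verifier: Verifier) -> bool:
    # Confirm the patched verifier still admits the legitimate solution.
    return verifier(LEGIT_SOLUTION)


def hacker_fixer_loop(verifier: Verifier, max_rounds: int = 10) -> Verifier:
    for _ in range(max_rounds):
        exploit = hacker(verifier)
        if exploit is None:           # no exploit found: stop iterating
            break
        patched = fixer(verifier, exploit)
        if not solver(patched):       # patch broke a legitimate solution: discard it
            break
        verifier = patched            # accept the patch and look for the next exploit
    return verifier


hardened = hacker_fixer_loop(weak_verifier)
```

Each accepted patch changes what the verifier rewards, so the hacker's next call may surface a different exploit, which mirrors the iteration described in the abstract.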

2606.08941 2026-06-09 stat.ML cs.LG 新提交

Estimate Collapsibility of Causal Effects in Completed Partial DAGs via Strong d-Convex Hulls

通过强d-凸包估计完全部分有向无环图中因果效应的可压缩性

Yuxin Deng, Yi Sun, Zhiming Li, Huaxiong Liu

发表机构 * College of Mathematics and System Science, Xinjiang University(新疆大学数学与系统科学学院) Institute of Statistics and Data Science, Xinjiang University of Finance and Economics(新疆财经大学统计与数据科学研究院)

AI总结 提出一种在完全部分有向无环图中保持因果效应估计一致性的可压缩方法,通过强d-凸包刻画最小可压缩集,并设计高效算法结合IDA框架。

详情
AI中文摘要

本文提出一种可压缩的因果效应估计方法,该方法在完全部分有向无环图(CPDAG)中对某些变量边缘化前后保持估计量的一致性。我们首先引入了CPDAG的估计可压缩性,并将最小可压缩集刻画为强d-凸包。设计了一种高效算法来获取DAG中的此类集合,并将其推广到CPDAG。然后,我们将图约简过程与IDA框架相结合。最后,实验和实证分析显示了CPDAG中因果估计可压缩性的有效性。代码可在 https://github.com/Jamyang-D/strongly-convex 获取。

英文摘要

This paper proposes a collapsible method for estimating causal effects that maintains the estimator's consistency before and after marginalization over some variables in completed partially directed acyclic graphs (CPDAGs). We first introduce the estimate collapsibility for CPDAGs and characterize the minimal collapsible sets as strong d-convex hulls. An efficient algorithm is devised to obtain such sets in DAGs and is generalized to CPDAGs. Then, we combine the graph reduction procedure with the IDA framework. Finally, experiments and empirical analysis show the effectiveness of the collapsibility for causal estimations in CPDAGs. Code is available at https://github.com/Jamyang-D/strongly-convex.

2606.08936 2026-06-09 cs.IR cs.AI cs.HC 新提交

Report on CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&AS)

CHIIR 2026 生成式AI与学术搜索研讨会报告

Yifan Liu, Jaime Arguello, Orland Hoeber, Chang Liu, Soo Young Rieh, Luanne Sinnamon, Dean Alvarez, Susan Archambault, Rob Capra, Henson Chen, Charles Costa, Anita Crescenzi, Zhitong Guan, Jacek Gwizdka, Pao-Pei Huang, Gavindya Jayawardena, Ghazal Kalhor, Dagmar Kern, Oliver Koop, Alice Li, Afra Mashhadi, Gaohui Meng, Marta Micheli, Anil B. Murthy, Kevin Schott, Sebastian Schultheiß, Jiwoo Seo, Phaneendra Sivangula, Frans van der Sluis, Xiaoxuan Song, Silang Wang, Dan Zhang

发表机构 * CHIIR 2026 Workshop(CHIIR 2026 工作坊)

AI总结 报告总结CHIIR 2026关于生成式AI重塑学术搜索系统的研讨会,聚焦设计评估挑战,涵盖基础、应用及搜索即学习三大主题,强调透明性、可信度与研究诚信。

详情
AI中文摘要

本报告总结了CHIIR 2026生成式AI与学术搜索研讨会(GAI&AS),该研讨会探讨了GenAI如何重塑学术搜索系统及研究实践。研讨会汇集了人类信息交互和信息检索领域的研究人员,探讨了在设计和评估未来集成GenAI的学术搜索系统中的关键挑战与机遇,超越了传统的文档检索,支持摘要、推荐、综合和对话交互。参与者的兴趣和讨论集中在三个主题集群:基础与原则、应用与机遇、以及搜索即学习。在这些主题中,研讨会强调了学术搜索系统在支持透明度、可信度、研究诚信和长期学术需求,以及促进高阶认知过程中的重要性。与会者讨论了指导理论、设计原则、方法论、合作伙伴关系以及旨在推进以人为中心的GenAI增强学术搜索系统的社区建设努力。总体而言,研讨会展示了社区对GenAI与学术搜索交叉领域的强烈兴趣以及多样化的正在进行和新兴的研究计划。

英文摘要

This report summarizes the CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&AS), which examined how GenAI is reshaping academic search systems and research practices. The workshop brought together researchers in human information interaction and information retrieval to explore key challenges and opportunities in designing and evaluating future academic search systems that integrate GenAI, moving beyond traditional document retrieval to support summarization, recommendation, synthesis, and conversational interaction. Participants' interests and discussions focused on three thematic clusters: foundations and principles, applications and opportunities, and search-as-learning. Across these themes, the workshop highlighted the importance of academic search systems in supporting transparency, credibility, research integrity, and long-term scholarly needs, as well as in fostering higher-order cognitive processes. Participants discussed guiding theories, design principles, methodological approaches, partnerships, and community-building efforts aimed at advancing human-centered GenAI-enhanced academic search systems. Overall, the workshop demonstrated strong community interest and a diverse range of ongoing and emerging research initiatives at the intersection of GenAI and academic search.

2606.08871 2026-06-09 math.NA cs.LG cs.NA 新提交

Fourier Neural Operators with rank-1 lattice points and hyperbolic cross

基于秩1格点和双曲交叉的傅里叶神经算子

Jakob Dilen, Alexander Keller, Frances Y. Kuo, Dirk Nuyens

发表机构 * University of New South Wales(新南威尔士大学) Max Planck Institute for Mathematics in the Sciences(马克斯·普朗克科学研究院)

AI总结 通过用秩1格点替代空间张量积网格,并在参数空间精心构造第二个格点作为训练点,降低了傅里叶神经算子的泛化误差,实现了更少参数、更少空间点和训练样本下的高效逼近。

详情
AI中文摘要

傅里叶神经算子(FNO)是一种学习函数空间之间映射的神经网络架构。其高效实现基于多维傅里叶变换。通过推导FNO关于空间和参数变量的一般正则性界,我们证明,用专门构建的秩1格点替代空间张量积网格,并使用第二个精心构造的格点作为参数空间中的训练点,可以改进FNO的泛化误差。我们用更少的网络参数、更少的空间点和更少的训练样本实现了更精确、更高效的逼近。此外,架构得到简化,因为秩1格点上的高维傅里叶变换仅需一维快速傅里叶变换,并且我们可以使用带有格点的双曲交叉频率指标集。我们通过环面上的椭圆偏微分方程展示了基于格点的双曲交叉FNO的优势。

英文摘要

The Fourier neural operator (FNO) is a neural network architecture that learns mappings between function spaces. Its efficient implementation is based on the multi-dimensional Fourier transform. By deriving general regularity bounds for the FNO with respect to both the spatial and parametric variables, we prove that the generalization error of the FNO can be improved by replacing spatial tensor product grids with purpose-built rank-1 lattice points, and by using a second lattice carefully constructed as training points in the parametric space. We achieve more accurate and efficient approximations from fewer network parameters, fewer spatial points, and fewer training samples. In addition, the architecture is simplified, because the high-dimensional Fourier transform on rank-1 lattices requires only a one-dimensional fast Fourier transform, and we can use a hyperbolic cross frequency index set with lattice points. We demonstrate the benefits of our lattice-based hyperbolic-cross FNOs for an elliptic PDE on the torus.
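The claim that a multi-dimensional Fourier transform on a rank-1 lattice needs only a one-dimensional FFT can be illustrated with a short NumPy sketch. It uses the standard rank-1 lattice construction $x_i = (i z \bmod n)/n$, under which a frequency $h$ lands in FFT bin $h \cdot z \bmod n$. The generating vector `z = (1, 5)`, the size `n = 25`, the radius-2 hyperbolic cross, and all function names are assumptions chosen for this example and are not taken from the paper.

```python
import itertools
import numpy as np


def hyperbolic_cross(d, radius):
    # Frequencies h in Z^d with prod_j max(1, |h_j|) <= radius.
    box = range(-radius, radius + 1)
    return np.array([h for h in itertools.product(box, repeat=d)
                     if np.prod([max(1, abs(hj)) for hj in h]) <= radius])


def lattice_points(n, z):
    # Rank-1 lattice: x_i = (i * z mod n) / n for i = 0, ..., n-1.
    i = np.arange(n)[:, None]
    return ((i * np.asarray(z)[None, :]) % n) / n


def lattice_coefficients(values, n, z, freqs):
    # One 1-D FFT over the lattice index i; frequency h is read from bin (h . z) mod n.
    spectrum = np.fft.fft(values) / n
    index = (freqs @ np.asarray(z)) % n
    if len(np.unique(index)) != len(index):
        raise ValueError("frequencies alias on this lattice; pick another n or z")
    return spectrum[index]


def demo():
    n, z = 25, (1, 5)                      # assumed lattice, chosen so no aliasing occurs
    freqs = hyperbolic_cross(2, 2)         # 21 frequencies in two dimensions
    rng = np.random.default_rng(0)
    coeffs = rng.normal(size=len(freqs)) + 1j * rng.normal(size=len(freqs))
    x = lattice_points(n, z)
    # Trigonometric polynomial f(x) = sum_h c_h exp(2 pi i h.x), sampled on the lattice.
    values = np.exp(2j * np.pi * (x @ freqs.T)) @ coeffs
    return coeffs, lattice_coefficients(values, n, z, freqs)
```

The recovery is exact here because $h_1 + 5 h_2$ takes distinct values modulo 25 for all $h \in [-2, 2]^2$; for larger index sets the generating vector has to be constructed so that this no-aliasing property holds.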

2606.08813 2026-06-09 cs.DC cs.DB cs.IR cs.LG 新提交

Aperon Technical Report: Hierarchical No-Pointer Tangent-Local Search for High-Dimensional Approximate Nearest Neighbors

Aperon技术报告:用于高维近似最近邻搜索的层次化无指针切向局部搜索

Yong Fu

发表机构 * Substratum Labs(Substratum实验室)

AI总结 提出HNTL框架,通过无指针块SoA布局和局部切空间划分,实现高维向量索引与候选生成,在768维数据上以C=20候选池达到Rerank Recall@10=1.0000,相比指针追踪图遍历加速3.61倍。

详情
AI中文摘要

我们提出了HNTL(层次化无指针切向局部搜索),这是Aperon向量内存系统的核心向量索引和候选生成框架。近邻图(例如HNSW)在内存开销上承受了沉重的指针税,并导致不规则的内存访问,从而阻塞CPU流水线。HNTL通过将高维空间划分为局部、连贯的颗粒,将向量表示为局部切空间上的低维坐标,并使用无指针的Block-SoA(结构体数组)布局顺序扫描它们来解决这一问题。在非各向同性流形数据(d=768,N=10,000)上,局部PCA捕获了96.3%的方差,使得HNTL能够仅使用C=20个向量的候选池达到最终Rerank Recall@10为1.0000。通过Apple kperf CPU性能监控单元(PMU)计数器进行的硬件性能分析表明,我们使用NEON自动向量化的C++ Block-SoA扫描引擎相比标准的指针追踪图遍历实现了3.61倍加速(4.137纳秒/向量对比14.951纳秒/向量),这得益于3.59倍的IPC(每周期指令数)和接近零的L1/L2数据缓存未命中。

英文摘要

We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximity graphs (e.g., HNSW) incur a heavy pointer tax in memory overhead and induce irregular memory accesses that stall CPU pipelines. HNTL resolves this by partitioning the high-dimensional space into local, coherent grains, representing vectors as low-dimensional coordinates on local tangent spaces, and scanning them sequentially using a pointerless Block-SoA (Structure-of-Arrays) layout. On anisotropic manifold data (d=768, N=10,000), local PCA captures 96.3% of the variance, allowing HNTL to achieve a final Rerank Recall@10 of 1.0000 with a candidate pool size of only C=20 vectors. Hardware profiling via Apple kperf CPU Performance Monitoring Unit (PMU) counters demonstrates a 3.61x speedup (4.137 ns/vector vs. 14.951 ns/vector) for our NEON auto-vectorized C++ Block-SoA scan engine over standard pointer-chasing graph traversals, driven by a 3.59x IPC (Instructions Per Cycle) and near-zero L1/L2 data cache misses.

2606.08806 2026-06-09 cs.SE cs.AI 新提交

Governance Controls for AI-Generated Test Artifacts in Autonomous Software Testing

自主软件测试中AI生成测试工件的治理控制

Dimple Bajaj, Deepak Khetan

发表机构 * GitHub

AI总结 提出治理感知自主测试框架(GATF),通过治理验证、可解释性分析、风险评估、合规监控和审计治理,将AI生成测试工件的治理风险降低89.6%,准确率达94.3%。

Comments 21 pages, 9 figures

详情
AI中文摘要

人工智能(AI)和大语言模型(LLMs)越来越多地用于自主软件测试;然而,AI生成的测试工件常常存在幻觉、合规违规、安全风险和有限的可解释性。为了提高AI生成测试工件的可靠性、透明度和可信度,本研究引入了治理感知自主测试框架(GATF)的概念。该框架通过治理验证、可解释性分析、概率风险评估、合规监控以及审计治理来扩展自主测试生命周期。使用Defects4J和PROMISE软件工程数据集进行了实验。所提出的框架成功地将治理相关风险降低了89.6%,并在治理方面表现出94.3%的准确率、96.5%的工件可靠性、94.2%的合规准确率和90.8%的可解释性性能。结果表明,与传统的基于AI的测试系统相比,具有治理意识的自主测试系统可以显著提高自主测试系统的可靠性、透明度和操作安全性。所提出的架构具有可扩展性和可靠性,为软件测试提供了安全的环境。

英文摘要

Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly used in autonomous software testing; however, AI-generated test artifacts often suffer from hallucinations, compliance violations, security risks, and limited explainability. To enhance the reliability, transparency, and trustworthiness of AI-generated testing artifacts, this research introduces the concept of Governance-Aware Autonomous Testing Framework (GATF). The framework extends the autonomous testing lifecycle with governance validation, explainability analysis, probabilistic risk assessment, compliance monitoring, as well as audit governance. Experiments were performed with Defects4J and PROMISE software engineering datasets. The proposed framework successfully reduced the governance-related risks by 89.6% and demonstrated 94.3% accuracy in governance, 96.5% artifact reliability, 94.2% compliance accuracy, and 90.8% explainability performance. The results show that autonomous testing systems that are governance-aware can significantly enhance the reliability, transparency, and operational security of autonomous testing systems in comparison to conventional AI-based testing systems. The proposed architecture is scalable and reliable and provides a safe environment for software testing.

2606.08793 2026-06-09 cs.SE cs.AI 新提交

AI-Augmented Closed-Loop Quality Engineering: A Reference Architecture for Continuous Software Quality Intelligence

AI增强的闭环质量工程:面向持续软件质量智能的参考架构

Dimple Bajaj

发表机构 * Dimple Bajaj

AI总结 提出一种AI增强的闭环参考架构,通过需求特征挖掘、风险测试优先级、缺陷预测和生产事件分析,结合有限反馈学习模型,在六个发布周期中减少缺陷泄漏、提高检测效率并缩短测试执行时间。

Comments 15 pages, 4 figures

详情
AI中文摘要

由于需求、测试和生产之间的流程脱节,软件工程的质量仍面临挑战,这阻碍了在连续发布中实施质量策略的机会。现有方法往往是固定模型或单优化方法,缺乏生产反馈学习机制。本文提出了一种AI增强的持续软件质量智能闭环参考架构。该模型综合了需求特征挖掘、基于风险的测试优先级排序、缺陷预测和生产事件分析,作为基于反馈的流水线的一个元素。引入了一种有限反馈学习模型,用于根据缺陷严重性和事件影响将生产信号传播到下一个发布,以确保稳定性和时间。该方法使用一个半合成测试数据集进行评估,该数据集包含6个发布周期中的4500个需求、27049个测试用例、13089个缺陷和7841个事件。实验结果表明,与非自适应基线相比,所提出的系统将缺陷泄漏从0.19降低到0.13,将检测系统的有效性从0.72提高到0.84,并将测试执行时间缩短了高达35%。这些变化在发布之间是稳定的。研究结果表明,通过在闭环架构中集成基于反馈的学习,可以持续改进质量过程,为自适应软件质量工程提供了实用基础。

英文摘要

Software quality engineering remains challenging because the processes spanning requirements, testing, and production are disjointed, which hinders carrying quality strategies across consecutive releases. Existing approaches tend to be fixed-model or single-optimization approaches and lack mechanisms for learning from production feedback. This paper proposes an AI-augmented closed-loop reference architecture for continuous software quality intelligence. The architecture integrates requirement feature mining, risk-based test prioritization, defect prediction, and production incident analysis into a feedback-based pipeline. A limited feedback learning model is introduced that propagates production signals, based on defect severity and incident impact, to the following release to ensure stability over time. The method is evaluated on a semi-synthetic test dataset of 4,500 requirements, 27,049 test cases, 13,089 defects, and 7,841 incidents across six release cycles. The experimental results show that the proposed system reduces defect leakage from 0.19 to 0.13, increases detection effectiveness from 0.72 to 0.84, and shortens test execution time by up to 35 percent compared with non-adaptive baselines. These improvements are stable from release to release. The findings indicate that integrating feedback-based learning into a closed-loop architecture can continuously improve the quality process, offering a practical foundation for adaptive software quality engineering.

2606.08791 2026-06-09 econ.EM cs.AI q-fin.PM q-fin.RM q-fin.ST 新提交

Evaluating AI Investment Strategies

评估AI投资策略

Irene Aldridge

发表机构 * ablemarkets.com(ablemarkets公司)

AI总结 研究通过可观测输入输出审计黑箱算法决策者,提出动态策略累积遗憾的精确分解,扩展至多期随机动态规划,并给出偏差修正与轨迹估计器。

Comments 33 pages

详情
AI中文摘要

我们研究仅从可观测输入和输出审计黑箱算法决策者的问题。主要结果是一个精确分解:在精确刻画条件下,动态策略的累积遗憾等于成本向量与策略决策之间每期协方差之和。这扩展了Aldridge (2026)的单期恒等式到随机动态规划的完整多期设置。我们证明了该恒等式在独立同分布成本和均值无偏马尔可夫策略下精确成立,推导了非平稳和时变情况下的闭式偏差修正,并建立了折现期模拟。协方差遗憾泛函的贝尔曼递归将该结果与标准强化学习算法联系起来;对于滚动窗口策略,估计误差偏差为$O(d/w)$。该分解对战略环境中的算法审计有直接影响:在平台机制设计中,它提供了基于福利的审计指标,无需访问代理的私人类型;在重复博弈中,协方差减少是策略改进的充分条件;在采购和广告拍卖中,偏差修正量化了战略误报导致的福利损失。相关的轨迹估计器是一致的、渐近正态的(具有HAC方差),并且可在$O(T \cdot nd)$时间内计算。这使得所提出的方法成为平台机制、算法投资策略以及任何受外部绩效审查的序列决策系统的可处理、无模型审计工具。

英文摘要

We study the problem of auditing a black-box algorithmic decision-maker from observable inputs and outputs alone. Our main result is an exact decomposition: under precisely characterized conditions, the cumulative regret of a dynamic policy equals the sum of per-period covariances between the cost vector and the policy's decision. This extends the single-period identity of Aldridge (2026) to the full multi-period setting of stochastic dynamic programming. We prove the identity holds exactly under i.i.d. costs and mean-unbiased Markov policies, derive closed-form bias corrections for non-stationary and time-varying cases, and establish the discounted-horizon analog. A Bellman recursion for the covariance regret functional connects the result to standard reinforcement learning algorithms; for rolling-window policies, the estimation-error bias is $O(d/w)$. The decomposition has direct implications for algorithmic auditing in strategic environments: in platform mechanism design, it provides a welfare-based audit metric without access to the agent's private type; in repeated games, covariance reduction is a sufficient condition for policy improvement; in procurement and ad auctions, the bias correction quantifies welfare loss from strategic misreporting. The associated trajectory estimator is consistent, asymptotically normal with HAC variance, and computable in $O(T \cdot nd)$ time. This makes the proposed approach a tractable, model-free audit tool for platform mechanisms, algorithmic portfolio strategies, and any sequential decision system subject to external performance review.

2606.08783 2026-06-09 math.OC cs.LG cs.NA math.NA 新提交

OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality

OptMuon:用于随机优化的闭环正交动量方法及其零噪声最优性

Ganzhao Yuan

发表机构 * Faculty of Computer Science and Artificial Intelligence(计算机科学与人工智能学院) Shenzhen University of Advanced Technology (SUAT)(深圳理工大学)

AI总结 提出OptMuon,将Muon风格极因子方向与轨迹依赖的AdaGrad-Norm型系数调度结合,实现自适应动量正交化,在无噪声时达到近乎最优的一阶速率,且无需手动调整超参数。

详情
AI中文摘要

正交化动量更新,如Muon风格优化器中所使用的,最近在大规模深度学习中显示出强大的经验稳定性。然而,现有的正交化方法通常与常数或开环幅度规则配对,因此不会根据观察到的优化轨迹明确校准其更新幅度。受Lipschitz-free和噪声自适应方法背后的闭环视角启发,我们提出了OptMuon,一种用于随机非凸优化的自适应动量正交化方法家族。OptMuon将Muon风格的极因子方向与轨迹依赖的AdaGrad-Norm型系数调度相结合,使得更新幅度由观察到的梯度和动量历史决定,而不是由预设的Lipschitz依赖规则决定。该调度在参数选择中不使用光滑常数、方差水平或有界梯度常数,其运行最大值校正防止了孤立的梯度尖峰导致过度的系数崩溃。在随机梯度有界方差、光滑性以及几乎必然有界随机梯度条件下,我们证明了两个互补的保证。OptMuon-A在平均光滑性下达到噪声自适应速率\(\tilde{\mathcal O}(T^{-1/2}+σ^{1/2}T^{-1/4})\),而OptMuon-I在个体光滑性下达到\(\tilde{\mathcal O}(T^{-1/2}+σ^{1/3}T^{-1/3})\)。在零噪声机制下,两个界限自动简化为近乎最优的确定性一阶速率\(\tilde{\mathcal O}(T^{-1/2})\),无需手动重新调整超参数。这些结果表明,闭环标量自适应可以与Muon风格的动量正交化相结合,同时保持噪声自适应性和零噪声最优性(至多对数因子)。

英文摘要

Orthogonalized momentum updates, as used in Muon-style optimizers, have recently shown strong empirical stability in large-scale deep learning. However, existing orthogonalized methods are typically paired with constant or open-loop magnitude rules, and therefore do not explicitly calibrate their update magnitudes from the observed optimization trajectory. Motivated by the closed-loop perspective behind Lipschitz-free and noise-adaptive methods, we propose OptMuon, a family of adaptive momentum orthogonalization methods for stochastic nonconvex optimization. OptMuon combines Muon-style polar-factor directions with a trajectory-dependent AdaGrad-Norm-type coefficient schedule, so that the update magnitude is determined by the observed gradient and momentum history rather than by a prescribed Lipschitz-dependent rule. The schedule does not use the smoothness constant, the variance level, or the bounded-gradient constant in parameter selection, and its running-maximum correction prevents isolated gradient spikes from causing excessive coefficient collapse. Under lower-boundedness, unbiased stochastic gradients with bounded variance, smoothness, and an almost-sure bounded stochastic-gradient condition, we prove two complementary guarantees. OptMuon-A achieves the noise-adaptive rate \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/2}T^{-1/4})\) under average smoothness, while OptMuon-I achieves \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/3}T^{-1/3})\) under individual smoothness. In the zero-noise regime, both bounds automatically reduce to a nearly optimal deterministic first-order rate \(\tilde{\mathcal O}(T^{-1/2})\) without manual hyperparameter retuning. These results show that closed-loop scalar adaptation can be combined with Muon-style momentum orthogonalization while retaining noise adaptivity and zero-noise optimality up to logarithmic factors.
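The two ingredients named in the abstract, a polar-factor update direction and an AdaGrad-Norm-type step size, can be combined in a few lines of NumPy. This is a minimal sketch under stated assumptions and is not the OptMuon-A or OptMuon-I schedule: it uses the textbook AdaGrad-Norm rule $\eta / \sqrt{b_0^2 + \sum_s \|g_s\|_F^2}$, computes the polar factor by a full SVD, and omits the paper's running-maximum correction. The function names and the hyperparameters `eta`, `beta`, and `b0` are hypothetical.

```python
import numpy as np


def polar_factor(m):
    # Orthogonal polar factor U V^T of m = U S V^T (thin SVD).
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def run_sketch(grad_fn, x0, steps=50, eta=1.0, beta=0.9, b0=1e-8):
    # Assumed update: momentum averaging, then a step along the polar factor of the
    # momentum, scaled by a textbook AdaGrad-Norm coefficient (not the paper's rule).
    x = x0.copy()
    momentum = np.zeros_like(x)
    accum = b0 ** 2
    step_sizes = []
    for _ in range(steps):
        g = grad_fn(x)
        momentum = beta * momentum + (1.0 - beta) * g
        accum += np.sum(g * g)            # running sum of squared gradient norms
        step = eta / np.sqrt(accum)       # closed-loop: depends only on observed gradients
        x = x - step * polar_factor(momentum)
        step_sizes.append(step)
    return x, step_sizes


def demo():
    rng = np.random.default_rng(0)
    target = rng.normal(size=(4, 3))
    # Gradient of the quadratic 0.5 * ||X - target||_F^2.
    return run_sketch(lambda x: x - target, np.zeros((4, 3)))
```

Because the accumulated squared gradient norm never decreases, the step size in this sketch is non-increasing, and no smoothness or variance constant enters its computation, which is the closed-loop property the abstract emphasizes.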

2606.08761 2026-06-09 cs.DC cs.AI 新提交

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

APEX4: 通过SM内计算重平衡实现高效纯W4A4 LLM推理

Hong Guo, Nianhui Guo, Weixing Wang, Jona Otholt, Christoph Meinel, Haojin Yang

发表机构 * Hasso Plattner Institute(哈索·普拉特纳研究所) GreenBit.AI German University of Digital Science(德国数字科学大学)

AI总结 针对W4A4量化中CUDA核心反量化瓶颈,提出基于SM内计算平衡的ρ感知粒度自适应方法,设计纯INT4 GEMM内核,在多种GPU上实现最高2.09倍加速。

详情
AI中文摘要

W4A4量化承诺充分利用INT4张量核心,但CUDA核心上的组反量化开销导致现有系统采用混合精度回退。我们首次系统研究了SM内计算平衡如何主导这一瓶颈。通过在Ampere和Ada架构的四款GPU上进行受控基准测试,我们识别出张量核心与CUDA核心的吞吐量比($ρ$)作为主要硬件指标:在计算受限场景下,W4A4-g128内核在RTX 3090($ρ=16$)上获得$2.0$--$2.5\times$加速,但在A100($ρ=64$)上退化为$0.43$--$0.47\times$,表明W4A4的可行性是平台相关的,而非普遍不可行。基于这一发现,我们构建了APEX4,它协同设计纯INT4 GEMM内核与$ρ$感知的粒度自适应,以缓解CUDA核心反量化瓶颈。APEX4在LLaMA-2-70B上实现了与FP16相差0.63的困惑度,并在零样本准确率上优于W4Ax Atom-g128达4.0%--4.4%。作为未修改vLLM中的即插即用替代品,它在L40S($ρ=8$)上提供高达$1.66\times$的端到端加速,在RTX 3090($ρ=16$)上为$1.78\times$,在A40($ρ=16$)上为$2.09\times$,并通过混合粒度模式将A100($ρ=64$)恢复至$1.20$--$1.40\times$。

英文摘要

W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision fallbacks. We present the first systematic study of how intra-SM compute balance governs this bottleneck. Through controlled benchmarks across four GPUs from the Ampere and Ada architectures, we identify the Tensor Core to CUDA Core throughput ratio ($ρ$) as the primary hardware indicator: the W4A4-g128 kernel yields $2.0$--$2.5\times$ speedup on RTX 3090 ($ρ=16$) yet degrades to $0.43$--$0.47\times$ on A100 ($ρ=64$) in compute-bound scenarios, establishing W4A4 viability as platform-dependent rather than universally infeasible. Guided by this finding, we build APEX4, which co-designs pure INT4 GEMM kernels with $ρ$-aware granularity adaptation to mitigate the CUDA Core dequantization bottleneck. APEX4 achieves perplexity within 0.63 of FP16 on LLaMA-2-70B and outperforms W4Ax Atom-g128 by 4.0%--4.4% in zero-shot accuracy. Deployed as a drop-in replacement in unmodified vLLM, it delivers end-to-end speedups of up to $1.66\times$ on L40S ($ρ=8$), $1.78\times$ on RTX 3090 ($ρ=16$), and $2.09\times$ on A40 ($ρ=16$), while recovering A100 ($ρ=64$) to $1.20$--$1.40\times$ via the mixed-granularity mode.

2606.08738 2026-06-09 cs.NI cs.RO 新提交

Systems-Level Planning and Coordination of Truck-Drone Collaborative Delivery Networks

卡车-无人机协同配送网络的系统级规划与协调

Didem Cicek, Burak Kantarci

发表机构 * School of Electrical Engineering and Computer Science at the University of Ottawa(渥太华大学电气工程与计算机科学学院)

AI总结 针对城市最后一英里配送,提出分层规划与协调框架,通过任务编排与智能体同步,实现卡车-无人机协同配送,相比纯卡车模式,总配送时间减少42.4%,能耗降低44.2%。

Comments 6 pages, 4 figures, Accepted to 2026 IEEE HPSR on Network Architectures and Intelligence for Smart Mobility and Autonomous Systems (TRAVERSAL)

详情
AI中文摘要

城市最后一英里包裹配送日益依赖异构车队,其性能取决于及时协调、可靠通信和可扩展控制。卡车-无人机协作已成为一种网络化信息物理配送范式,结合了卡车的载重能力和续航效率与无人机在拥挤或受限城市环境中的灵活性。本文从系统与控制角度提出了一种分层规划与协调框架,用于构建卡车-无人机协同配送(TDCD)。该框架由五个相互关联的层组成:空间需求对齐、协同配送配置、资源与工作流编排、性能评估和可扩展性分析,为网络化配送操作中的协调、控制和系统级性能提供了统一视角。使用源自2021年亚马逊最后一英里路线研究挑战数据集的实际城市最后一英里配送场景评估了所提框架。案例研究表明,通过结构化任务编排和智能体间同步实现的协调卡车-无人机操作,在操作约束下提高了端到端系统效率。结果显示,与传统的纯卡车配送模型相比,总配送时间减少了42.4%,能耗降低了44.2%。可扩展性分析进一步强调了协调收益如何随系统规模增大而持续,并展示了高效控制和通信在异构配送网络中的重要性。

英文摘要

Urban last-mile parcel delivery increasingly relies on heterogeneous fleets whose performance depends on timely coordination, reliable communication, and scalable control. Truck-drone collaboration has emerged as a networked cyber-physical delivery paradigm that combines the payload capacity and range efficiency of trucks with the agility of drones in congested or access-limited urban environments. This paper proposes a layered planning and coordination framework that structures truck-drone collaborative delivery (TDCD) from a systems and control perspective. The framework consists of five interrelated layers: spatial-demand alignment, collaborative delivery configuration, resource and workflow orchestration, performance evaluation, and scalability analysis, providing a unified view of coordination, control, and system-level performance in networked delivery operations. The proposed framework is evaluated using a realistic urban last-mile delivery scenario derived from the 2021 Amazon Last Mile Routing Research Challenge dataset. The case study demonstrates how coordinated truck-drone operation, enabled by structured task orchestration and inter-agent synchronization, improves end-to-end system efficiency under operational constraints. Results show a 42.4% reduction in total delivery time and a 44.2% reduction in energy consumption compared to a conventional truck-only delivery model. The scalability analysis further highlights how coordination gains persist as system size increases, and shows the importance of efficient control and communication in heterogeneous delivery networks.

2606.08727 2026-06-09 math.NA cs.LG cs.NA 新提交

Compositional Approximation Can Strictly Outperform Superpositional Approximation

组合逼近可以严格优于叠加逼近

Dennis Elbrächter, Philipp Petersen

发表机构 * University of Freiburg(弗赖堡大学)

AI总结 本文通过构造显式例子,证明存在函数类使得叠加逼近的速率严格低于组合逼近,且差距可任意大。

详情
AI中文摘要

许多经典研究的函数类已知可以通过叠加方法最优逼近,即通过某些字典中元素的线性组合构造逼近。这里的最优性意味着,以参数数量为函数的均匀逼近误差具有任何参数化方法所能达到的最高阶多项式衰减,其中参数可以编码为长度与参数数量成正比(对数因子内)的比特串。尽管像神经网络这样的组合方法在结构上不同,但通过施加确保这种比例比特串编码的约束,它们的逼近速率可以变得可比。在这项工作中,我们研究了具有结构性质的函数类,这些性质限制了叠加逼近速率严格低于组合逼近速率。特别地,我们构造了显式例子,使得两者之间存在任意大的差距。

英文摘要

Many classically studied function classes are known to be approximated optimally by superpositional methods, i.e. with approximants constructed as the linear combination of elements in some dictionary. Here optimality means that the uniform approximation error viewed as a function of the number of parameters used has polynomial decay of the highest order achievable by any parametrized method whose parameters can be encoded as a bit string of length proportional, up to logarithmic factors, to the number of parameters. While compositional methods like neural networks are structurally different, their approximation rates can be made comparable by imposing constraints that ensure such a proportional bit string encoding. In this work we study function classes exhibiting structural properties that limit superpositional approximation rates to be strictly lower than compositional approximation rates. In particular, we construct explicit examples for which there is an arbitrarily large gap.

2606.08714 2026-06-09 eess.SY cs.AI cs.LG cs.RO cs.SY 新提交

Hybrid Neural Network and Conventional Controller Approach for Robust Control of Highly Unstable Systems: Application to Tilt-Rotor Control

混合神经网络与传统控制器方法用于高度不稳定系统的鲁棒控制:应用于倾转旋翼控制

Ali Kafili Gavgani, Amin Talaeizadeh, Aria Alasty, Hossein Nejat Pishkenari

发表机构 * Advanced Research Lab for Control and Agricultural Robotics (Sharif AgRoLab)(控制与农业机器人高级研究实验室(Sharif AgRoLab)) Department of Mechanical Engineering, Sharif University of Technology, Tehran, Iran(谢里夫理工大学机械工程系,伊朗德黑兰)

AI总结 提出一种神经网络增强的滑模控制器,将系统动力学分解为输入无关和输入相关部分,前者用轻量网络从少量数据学习,实现对全驱动倾转旋翼系统的鲁棒控制,LSTM优于MLP。

Comments Proceedings of the 13th RSI International Conference on Robotics and Mechatronics (ICRoM 2025)

详情
AI中文摘要

多旋翼飞行器广泛应用于从监视到精准农业等领域,但传统设计仍受限于其欠驱动特性。倾转旋翼配置通过实现全驱动克服了这一限制。本文研究基于神经网络的控制策略,用于一个具有四个推力矢量输入的全驱动倾转旋翼系统。我们的工作分为两部分。首先,我们有意呈现一个负面结果,通过评估直接输入-输出控制方法。在该方法中,多层感知器(MLP)、长短期记忆(LSTM)网络和Transformer模型被训练为直接将系统状态及其期望值映射到控制信号。我们表明该策略无法稳定系统,凸显了将直接输入-输出学习应用于高度不稳定对象的固有困难。其次,作为主要贡献,我们提出一种神经网络增强的滑模控制器(SMC)。该方法将系统动力学分解为输入无关和输入相关两部分,前者使用轻量网络从少量数据集学习,从而降低实时计算需求。此外,所提方法可以使用从低性能控制器收集的飞行日志进行训练,并且从真实数据学习到的动力学模型可用于仿真。我们进一步比较了基于MLP和LSTM的实现,在模型不确定性和外部干扰下,展示了所提方法的鲁棒性和有效性;特别是,带有LSTM被控对象动力学预测器的控制器相比基于MLP的对应物实现了更优性能,同时运行时间也更低。

English abstract

Multirotors are widely used in applications ranging from surveillance to precision agriculture, yet conventional designs remain limited by their under-actuation. Tilt-rotor configurations overcome this limitation by enabling full actuation. This paper investigates neural-network-based control strategies for a fully actuated tilt-rotor system with four thrust-vectoring inputs. Our work is structured in two parts. First, we deliberately present a negative result by evaluating a direct input-output control approach. In this method, multilayer perceptrons (MLPs), long short-term memory (LSTM) networks, and transformer models are trained to map system states and their desired values directly to control signals. We show that this strategy fails to stabilize the system, highlighting the inherent difficulty of applying direct input-output learning to highly unstable plants. Second, as the main contribution, we propose a neural-network-enhanced sliding mode controller (SMC). The method decomposes the system dynamics into input-independent and input-dependent components, with the former learned from a small dataset using lightweight networks, thereby reducing real-time computational demands. Moreover, the proposed method can be trained using flight logs collected from low-performance controllers, and the resulting dynamic model learned from real-world data can be used in simulation. We further compare MLP- and LSTM-based implementations under model uncertainties and external disturbances, demonstrating the robustness and effectiveness of the proposed approach; in particular, the controller with the LSTM plant dynamics predictor achieves superior performance to its MLP-based counterpart while also exhibiting lower runtime.
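
The decomposition described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' controller: it assumes a scalar second-order plant of the form `x'' = f(x, x') + G*u`, where `f_true` is a made-up unstable input-independent term, `f_hat` is a stand-in for the learned network (deliberately 20% wrong), and the gains `lam`, `k`, and `phi` are hypothetical. The sliding-mode switching term absorbs the model error left by `f_hat`.

```python
# Toy sliding mode controller (SMC) with an approximate model of the
# input-independent dynamics. All names and values here are illustrative
# assumptions, not taken from the paper.

G = 2.0  # assumed known, constant input gain (the input-dependent part)


def f_true(x, v):
    """Input-independent dynamics of a hypothetical unstable plant."""
    return 4.0 * x - 0.5 * v


def f_hat(x, v):
    """Stand-in for a learned model of f_true, with a 20% modelling error."""
    return 0.8 * f_true(x, v)


def smc_control(x, v, x_ref=0.0, lam=3.0, k=8.0, phi=0.05):
    """Control input that drives s = e' + lam * e to zero (constant x_ref)."""
    e, e_dot = x - x_ref, v
    s = e_dot + lam * e
    # Saturation instead of sign(s) gives a boundary layer that limits chattering.
    sat = max(-1.0, min(1.0, s / phi))
    return (-f_hat(x, v) - lam * e_dot - k * sat) / G


def simulate(x0=0.5, v0=0.0, dt=1e-3, steps=5000):
    """Integrate the closed loop with semi-implicit Euler; return final state."""
    x, v = x0, v0
    for _ in range(steps):
        u = smc_control(x, v)
        a = f_true(x, v) + G * u  # the plant uses the true dynamics
        v += a * dt
        x += v * dt
    return x, v


if __name__ == "__main__":
    x_final, v_final = simulate()
    print(f"final position {x_final:.2e}, final velocity {v_final:.2e}")
```

In this sketch the switching gain `k` only has to dominate the residual `f_true - f_hat`, not the full dynamics, which is the usual motivation for pairing a learned model with a robust term.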

2606.08710 2026-06-09 cs.SE cs.AI New submission

Structuring agentic AI for HPC code modernization

Anthony Marinov, Igor Sfiligoi

Affiliations * San Diego Supercomputer Center; University of California, San Diego; La Jolla, California, United States

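AI summary Proposes a structured agentic AI approach that relies on manually created examples, continuous buildability, and limited session scope, and uses it to convert a 60,000-line Fortran MPI code into OpenMP-parallel C++ code within a few months.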

Comments 10 pages

English abstract

Modernization of legacy scientific codes is often necessary to keep up with the ever-evolving changes in the compute resource ecosystem. Parallelization and migration from poorly supported software ecosystems are two of the most time-consuming activities in the research software engineering field. This paper presents our experience in the successful, two-phase AI-assisted modernization of NMAP-RKPM, a roughly 60,000-line, 3D explicit solid mechanics physics engine based on the Reproducing Kernel Particle Method (RKPM). We converted this single-threaded, Fortran-based MPI application into an OpenMP-parallel, C++-based MPI tool in the span of a few months. While Large Language Model (LLM)-based tools on their own proved inadequate, we developed a highly structured "hand-holding" agentic AI methodology (providing manually created examples, ensuring continuous buildability, and limiting session scope) that was instead highly effective. The paper provides both the AI-assisted steps that were successful and the problems that we had to overcome, alongside the reasoning behind the chosen path.

2606.08694 2026-06-09 cond-mat.soft cond-mat.stat-mech cs.LG New submission

Discovering and decoding latent mean-field structure with variational autoencoders

Marco Biroli, Max Welling, Vincenzo Vitelli

Affiliations * Department of Physics and the James Franck Institute, University of Chicago; CuspAI AMLab, University of Amsterdam; Leinweber Institute for Theoretical Physics

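AI summary Proposes a criterion that quantifies the ability of a variational autoencoder (VAE) to reconstruct the joint probability distribution of a many-body system, shows that the conditionally independent decoder of a successful VAE is equivalent to a finite-size mean-field factorization so that the microscopic parameters of the mean-field theory can be read off the decoder, and validates this on models with scalar, vector, and tensor order parameters and on retinal recording data.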

Comments 10 pages, 5 figures

English abstract

Generative models are increasingly used to capture correlations in many-body systems, but the representations they learn remain largely opaque to physical interpretation. Here, we establish an intuitive criterion that quantifies the capacity of a variational autoencoder (VAE) to faithfully reconstruct the joint probability distribution of a many-body system. In a nutshell, a bound on the VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, we show that the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence, a successful reconstruction is direct evidence for a latent mean-field theory, and the microscopic parameters of that theory can be read off the trained decoder. We validate these conclusions on a hierarchy of solvable models with scalar (Curie-Weiss), vector (Hopfield), and tensor (Maier-Saupe) order parameters, recovering the full Hopfield pattern matrix from equilibrium samples alone. We find that, when applied to salamander retinal recordings, a two-latent VAE reproduces the population statistics with only two effective collective variables, allowing us to recover the "stored patterns" of the neural population and write a generalized Hopfield model that correctly models the experimental data.
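
The link between a conditionally independent decoder and a mean-field factorization can be made concrete with a standard textbook identity. The block below is only an illustration under assumed notation (spins $s_i = \pm 1$, $N$ units, coupling $J$, inverse temperature $\beta$, latent $z$, decoder parameters $\theta$); it is not the paper's derivation or its bound. The first line is the generic form of a VAE whose decoder factorizes over units. The second is the Hubbard-Stratonovich rewriting of the Curie-Weiss Boltzmann weight, which has the same structure: a mixture over one scalar variable $m$ of product distributions.

```latex
\begin{align}
  % Conditionally independent decoder: a mixture of product distributions
  p_\theta(s) &= \int \mathrm{d}z \; p(z) \prod_{i=1}^{N} p_\theta(s_i \mid z), \\
  % Curie-Weiss model with H(s) = -\frac{J}{2N} \Big( \sum_i s_i \Big)^2
  e^{-\beta H(s)} &= \sqrt{\frac{N \beta J}{2\pi}} \int \mathrm{d}m \;
      e^{-N \beta J m^{2} / 2} \prod_{i=1}^{N} e^{\beta J m s_i}.
\end{align}
```

Read this way, the auxiliary field $m$ plays the role of a single latent variable and each factor $e^{\beta J m s_i}$ plays the role of a per-unit decoder, consistent with the scalar order parameter case mentioned in the abstract.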