arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.22122 2026-05-22 cs.CR cs.AI

Adversarial Trust Poisoning in Vehicular Collaborative Perception

车联网协作感知中的对抗信任污染

Yutong Liu, Chenyi Wang, Ming F. Li, Qingzhao Zhang

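AI summary This study proposes the TrustFlip attack, which exploits consistency-based defenses to poison the trust scores of benign vehicles, degrading system perception and potentially causing safety failures. It also proposes TrustReflect as a mitigation.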

AI Chinese abstract

协作感知(CP)使连接和自动驾驶车辆能够共享传感器数据并共同感知环境。为防御对抗者篡改共享数据,现有系统采用跨车辆不一致性检测和信任估计,惩罚与多数观察冲突的车辆。本文证明这些防御本身引入了新的攻击面。我们提出了TrustFlip,一种利用一致性防御机制污染对良性车辆信任的新型攻击。不同于注入虚假数据,它部署真实的物理对抗对象,诱导良性车辆产生不一致观察。由此产生的不一致被防御机制误归因于目标车辆,导致其信任分数下降并最终被降权或排除。因此,系统失去可靠感知贡献者,降低感知能力,可能引发安全关键故障。我们在多个协作感知架构和防御机制上评估TrustFlip。结果表明,最先进防御可显著受影响:攻击在87.7%的场景中将目标良性车辆排除在协作之外,并将平均精度(AP)降低高达13%。作为初步缓解措施,我们引入TrustReflect,一种轻量级的自我反思机制,将争议区域标记为不确定并排除在信任评估之外,将攻击成功率降低35-100%。

English abstract

Collaborative perception (CP) enables connected and autonomous vehicles to share sensor data and jointly reason about their environment. To defend against adversaries that fabricate or manipulate shared data, existing systems employ cross-vehicle inconsistency detection and trust estimation, penalizing vehicles whose observations conflict with the majority. In this work, we show that these defenses themselves introduce a new attack surface. We present TrustFlip, a novel attack that weaponizes consistency-based defenses to poison the trust assigned to benign vehicles. Instead of injecting false data into the collaboration pipeline, it deploys physical adversarial objects that are genuine but induce inconsistent observations among benign vehicles. The resulting inconsistencies are misattributed by the defense to the targeted vehicle, causing its trust score to degrade and eventually leading to its downweighting or exclusion from collaboration. Consequently, the system loses reliable sensing contributors, degrading perception capability and potentially inducing safety-critical failures. We evaluate TrustFlip across multiple collaborative perception architectures and defense mechanisms. Our results show that state-of-the-art defenses can be significantly affected: the attack removes the targeted benign vehicle from collaboration in up to 87.7% of scenarios and drops Average Precision (AP) by up to 13%. As an initial mitigation, we introduce TrustReflect, a lightweight self-reflection mechanism that marks disputed regions as uncertain and excludes them from trust evaluation, reducing the attack success rate by 35-100%.
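
As a rough illustration of the mechanism described above, the sketch below models a majority-vote consistency check that penalizes dissenting vehicles, plus an optional rule that skips disputed regions in the spirit of TrustReflect. It is not the paper's implementation: the function name `update_trust`, the fixed `penalty`, and the `dispute_margin` threshold are all assumptions made for illustration.

```python
from collections import Counter

def update_trust(trust, reports, penalty=0.2, dispute_margin=None):
    """Penalize vehicles whose observation disagrees with the per-region majority.

    trust:   {vehicle: score in [0, 1]}
    reports: {region: {vehicle: observation}}
    dispute_margin (hypothetical): if set, regions whose majority share is
    below this value are treated as uncertain and skipped entirely.
    """
    new_trust = dict(trust)
    for region, obs in reports.items():
        counts = Counter(obs.values())
        majority, votes = counts.most_common(1)[0]
        share = votes / len(obs)
        if dispute_margin is not None and share < dispute_margin:
            continue  # disputed region: excluded from trust evaluation
        for vehicle, value in obs.items():
            if value != majority:
                new_trust[vehicle] = max(0.0, new_trust[vehicle] - penalty)
    return new_trust

# Vehicle A honestly reports an object that B and C cannot see.
trust = {"A": 1.0, "B": 1.0, "C": 1.0}
reports = {"r1": {"A": "object", "B": "empty", "C": "empty"}}
plain = update_trust(trust, reports)                          # A is penalized
guarded = update_trust(trust, reports, dispute_margin=0.75)   # region skipped
```

In this toy setting the plain majority rule lowers the trust of the honest vehicle A, while the guarded variant leaves it untouched because a 2-to-1 split falls below the assumed margin.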

2605.22120 2026-05-22 eess.AS cs.SD

Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation

高效的用户定义关键词侦测:双阶段匹配、多模态注册与持续适应

Zhiqi Ai, Han Cheng, Shiyi Mu, Xinnuo Li, Yongjin Zhou, Shugong Xu

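AI summary This paper proposes the DMA-KWS framework, which uses dual-stage matching, multi-modal enrollment, and continual adaptation to address confusable-word discrimination, inconsistent pronunciation across speakers, and high data cost in user-defined keyword spotting. Experiments show 97.85% AUC and 6.13% EER on the LibriPhrase Hard subset, a leading result.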

Comments 14 pages, 13 figures, 12 tables. Accepted by TASLP

AI Chinese abstract

用户定义关键词侦测(KWS)对于个性化语音交互至关重要,但现有方法面临几个挑战:(1)混淆词之间的区分度不足,(2)在发音不同的说话人之间性能不一致,(3)高数据成本以确保可靠的唤醒词性能。本文介绍DMA-KWS,一种高效的、稳健的用户定义关键词侦测框架。首先,它采用双阶段匹配流程:CTC解码结合流式音素搜索来定位候选段,随后使用QbyT结合音素匹配器进行精细验证,使其能够更好地区分混淆词。接下来,多模态注册融合用户特定的语音与文本嵌入,进一步提高已注册用户的准确性。最后,参数高效的持续适应机制通过合成和真实数据进行轻量级更新。广泛的实验表明DMA-KWS的优越性能。在LibriPhrase Hard子集上,它实现了97.85%的AUC和6.13%的EER,达到最先进的性能。在说话人依赖设置中,DMA-KWS始终优于文本-only注册,显示出显著的性能提升。此外,所提出的参数高效的微调机制仅需187k个更新参数即可适应DMA-KWS,进一步提高KWS性能,同时确保适用于设备部署。

English abstract

User-defined keyword spotting (KWS) is crucial for personalized voice interaction, yet existing methods face several challenges: (1) insufficient discriminability among confusable words, (2) performance inconsistency across speakers with varying pronunciations, and (3) high data cost to ensure reliable wake-word performance. In this paper, we introduce DMA-KWS, an efficient and robust framework for user-defined keyword spotting. First, it adopts a dual-stage matching pipeline: CTC decoding with streaming phoneme search to locate candidate segments, followed by QbyT with a phoneme matcher for fine-grained verification, enabling it to better distinguish confusable words. Next, multi-modal enrollment fuses user-specific speech with text embeddings to further improve accuracy for registered users. Finally, a parameter-efficient continual adaptation mechanism performs lightweight updates using synthetic and real data. Extensive experiments demonstrate the superior performance of DMA-KWS. On the LibriPhrase Hard subset, it achieves 97.85% AUC and 6.13% EER, reaching state-of-the-art performance. In speaker-dependent settings, DMA-KWS consistently outperforms text-only enrollment, demonstrating significant performance gains. Moreover, the proposed parameter-efficient fine-tuning mechanism adapts DMA-KWS with only 187k updated parameters, further enhancing KWS performance while ensuring suitability for on-device deployment.
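
The abstract reports an equal error rate (EER). For readers unfamiliar with the metric, here is a minimal sketch of how an EER can be estimated from detector scores by sweeping a threshold until the false-accept and false-reject rates meet. The function name and the toy scores are assumptions; the paper's evaluation code may differ.

```python
def equal_error_rate(pos_scores, neg_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A sample is accepted when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in pos_scores) / len(pos_scores)   # missed keywords
        far = sum(s >= t for s in neg_scores) / len(neg_scores)  # false alarms
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Toy scores: one positive and one negative overlap.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```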

2605.22112 2026-05-22 astro-ph.HE astro-ph.IM cs.LG

Self-Supervised ConvLSTM for Fermi Large Area Telescope Transient Detection

基于自监督的ConvLSTM用于费米大视场望远镜瞬变检测

Alberto Garinei, Stefano Speziali, Alessandro Vispa, Andrea Marini, Sara Cutini, Emanuele Piccioni, Marcello Marconi, Francesco Longo, Matteo Martini, Francesca Fallucchi, Romeo Giuliano, Ernesto William De Luca, Umberto Di Matteo, Sabino Meola

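AI summary This paper proposes an approach that combines end-to-end simulation with self-supervised spatio-temporal deep learning to detect transient gamma-ray phenomena in Fermi-LAT data in a controlled environment. It generates a ten-year synthetic Universe and uses a ConvLSTM network to model the nominal evolution of the sky in order to detect anomalies.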

Comments 17 pages, 5 figures. Accepted for publication in Astronomy and Computing. Author-accepted manuscript version

Journal ref
Astronomy and Computing 56 (2026) 101128
AI Chinese abstract

我们提出了一种框架,通过将费米- LAT天空的端到端模拟与自监督时空深度学习相结合,用于在受控环境中检测瞬变伽马射线现象。我们使用gtobssim生成一个十年的合成宇宙,并将模拟事件处理成每日全天空计数和曝光图,获得一个时间有序的序列,其结构与费米- LAT观测一致。为了建模天空的典型演变,我们采用卷积长短期记忆网络(ConvLSTM),该网络直接在地图序列上运行,保持空间局部性的同时学习时间依赖性。模型被训练以重建预期的发射,偏离学习基线的量通过像素级均方残差图量化。然后,我们通过从训练集上的残差分布估计每个像素的阈值,定义统计学驱动的异常标准,并通过局部滤波强制空间一致性以抑制孤立波动。训练后的ConvLSTM被部署到费米- LAT每日地图上,其中天空可能由于真实的天体物理变化或仪器非平稳性而偏离典型行为。所得到的流程可以标记出与高变源或瞬变事件(如耀斑或伽马射线暴)一致的局部、时间依赖的过剩,并为在长持续时间、费米- LAT类数据集上评估异常检测策略提供基准。

English abstract

We present a framework for detecting transient gamma-ray phenomena in a controlled environment by combining end-to-end simulations of the Fermi-LAT sky with self-supervised spatio-temporal deep learning. We generate a ten-year synthetic Universe with gtobssim and process the simulated events into daily all-sky maps of counts and exposure, obtaining a time-ordered sequence that mirrors the structure of Fermi-LAT observations. To model the nominal evolution of the sky, we employ a Convolutional Long Short-Term Memory (ConvLSTM) network that operates directly on map sequences, preserving spatial locality while learning temporal dependencies. The model is trained to reconstruct expected emission, and departures from the learned baseline are quantified through pixel-wise mean-squared residual maps. We then define statistically motivated anomaly criteria by estimating per-pixel thresholds from the residual distribution on the training set, and we enforce spatial coherence via local filtering to suppress isolated fluctuations. The ConvLSTM is then deployed as a trained predictor on Fermi-LAT daily maps, where the sky can depart from the nominal behavior because of genuine astrophysical variability and instrumental non-stationarities. The resulting pipeline flags localized, time-dependent excesses consistent with highly variable sources or transient events (e.g., flares or GRBs) and provides a benchmark for evaluating anomaly-detection strategies on long-duration, Fermi-LAT-like datasets.
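
The thresholding and spatial-coherence steps described above can be sketched as follows. This is a simplified illustration on small nested lists, not the authors' pipeline: the `mean + k * std` threshold rule, the 3x3 neighborhood, and the names `per_pixel_thresholds` and `flag_anomalies` are assumptions.

```python
def per_pixel_thresholds(train_residuals, k=3.0):
    """Per-pixel threshold = mean + k * std of the training residuals (assumed rule)."""
    n = len(train_residuals)
    rows, cols = len(train_residuals[0]), len(train_residuals[0][0])
    thr = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            vals = [m[i][j] for m in train_residuals]
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / n
            thr[i][j] = mean + k * var ** 0.5
    return thr

def flag_anomalies(residual, thr, min_neighbors=1):
    """Keep a flagged pixel only if enough of its 8 neighbors are also flagged."""
    rows, cols = len(residual), len(residual[0])
    raw = [[residual[i][j] > thr[i][j] for j in range(cols)] for i in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if not raw[i][j]:
                continue
            neighbors = sum(
                raw[a][b]
                for a in range(max(0, i - 1), min(rows, i + 2))
                for b in range(max(0, j - 1), min(cols, j + 2))
                if (a, b) != (i, j)
            )
            out[i][j] = neighbors >= min_neighbors
    return out

train = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]   # flat training residuals
thr = per_pixel_thresholds(train)
day = [[5.0, 5.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
flags = flag_anomalies(day, thr)
```

The two adjacent bright pixels survive the filter, while the isolated one in the corner is suppressed as a likely fluctuation.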

2605.22097 2026-05-22 quant-ph cs.LG

Q-PhotoNAS: Hybrid Quantum Neural Architecture Search Framework on Photonic Devices

Q-PhotoNAS:基于光子设备的混合量子神经架构搜索框架

Farah Elnakhal, Alberto Marchisio, Nouhaila Innan, Gabriel Falcao, Muhammad Shafique

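AI summary This paper proposes a neural architecture search framework for hybrid photonic quantum-classical models that combines a genetic algorithm with learnable quantum phase encoding. By systematically exploring the joint design space of classical and quantum components, it improves accuracy and hardware compatibility on image classification tasks.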

AI Chinese abstract

光子量子计算是一种有前景的可扩展量子机器学习平台,但在硬件和优化约束下设计有效的混合架构仍然具有挑战性。现有方法依赖于手动调优的架构,无法考虑经典预处理、相位编码和光子电路结构之间的协同作用,限制了准确性和硬件兼容性。在本文中,我们提出了一种混合光子量子-经典模型的神经架构搜索框架,结合基于遗传算法的搜索和可学习量子相位编码,系统地探索经典和量子组件的联合设计空间。我们的框架编码了19个超参数,分布在六个基因组中,并通过基于组的交叉、按基因突变和精英主义进化混合架构的种群。在短训练预算下评估每个候选者,然后对最佳设计进行完整重新训练。我们在两个图像分类基准测试上评估了我们的框架,即Digits和MNIST,分别达到了99.44%和98.78%的最终验证准确率,基于Quandela Ascella光子QPU的第一性执行时间估计,单张图像推断时间分别为67 ms(Digits)和149 ms(MNIST)。我们的量子贡献分析进一步显示,光子层提取了与经典路径正交的非冗余特征,相较于仅经典基线提供了可测量的准确性优势。我们的结果表明,自动化架构搜索对于混合光子系统来说既实用又具有影响,为在光子设备上量子AI的系统设计空间探索开辟了道路。

English abstract

Photonic quantum computing is a promising platform for scalable quantum machine learning, but designing effective hybrid architectures remains challenging under hardware and optimization constraints. Existing approaches rely on manually tuned architectures that fail to account for the collaboration between classical preprocessing, phase encoding, and photonic circuit structure, limiting both accuracy and hardware compatibility. In this paper, we propose a neural architecture search framework for hybrid photonic quantum-classical models that combines genetic algorithm-based search with learnable quantum phase encoding to systematically explore the joint design space of classical and quantum components. Our framework encodes 19 hyperparameters across six gene groups and evolves a population of hybrid architectures using group-based crossover, per-gene mutation, and elitism, evaluating each candidate on a short training budget before full retraining of the best found design. We evaluate our framework on two image classification benchmarks, Digits and MNIST, achieving final validation accuracies of 99.44% and 98.78%, respectively, with first-principles execution time estimates on the Quandela Ascella photonic QPU projecting single-image inference at 67 ms (Digits) and 149 ms (MNIST). Our quantum contribution analysis further shows that the photonic layer extracts non-redundant features orthogonal to the classical pathway, providing a measurable accuracy advantage over classical-only baselines. Our results demonstrate that automated architecture search is both practical and impactful for hybrid photonic systems, opening the way for systematic design space exploration of quantum AI on photonic devices.
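
To make the search operators named above concrete, here is a minimal genetic-algorithm sketch with group-based crossover, per-gene mutation, and elitism. The gene names, the two gene groups, and the toy fitness function are hypothetical stand-ins; the paper's 19 hyperparameters, six gene groups, and training-based fitness are not reproduced here.

```python
import random

def group_crossover(a, b, groups, rng):
    """Inherit each gene group as a unit from one of the two parents."""
    child = {}
    for genes in groups.values():
        src = a if rng.random() < 0.5 else b
        for g in genes:
            child[g] = src[g]
    return child

def mutate(genome, space, rate, rng):
    """Resample each gene independently with probability `rate`."""
    return {g: (rng.choice(space[g]) if rng.random() < rate else v)
            for g, v in genome.items()}

def evolve(pop, fitness, space, groups, n_elite=1, rate=0.1, generations=10, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[:n_elite]                       # elitism: copied unchanged
        parents = ranked[:max(2, len(ranked) // 2)]
        children = [
            mutate(group_crossover(*rng.sample(parents, 2), groups, rng), space, rate, rng)
            for _ in range(len(pop) - n_elite)
        ]
        pop = elite + children
    return max(pop, key=fitness)

# Hypothetical search space with one classical and one quantum gene group.
space = {"hidden": [8, 16, 32], "depth": [1, 2, 3], "modes": [2, 4, 6]}
groups = {"classical": ["hidden", "depth"], "quantum": ["modes"]}
init_rng = random.Random(1)
population = [{g: init_rng.choice(v) for g, v in space.items()} for _ in range(6)]
toy_fitness = lambda g: g["hidden"] + g["depth"] + g["modes"]  # stand-in for short training
best = evolve(population, toy_fitness, space, groups)
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.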

2605.22095 2026-05-22 econ.GN cs.AI cs.GT cs.HC q-fin.EC

Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament

Not Yet: 人类在布洛托 tournaments 中优于 LLMs

Dmitry Dagaev, Egor Ivanov, Petr Parshakov, Alexey Savvateev, Gleb Vasiliev

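AI summary Using Colonel Blotto tournaments, the study compares the strategic performance of humans and LLMs. It finds that humans more often use well-calibrated intermediate-level allocation heuristics, while the simpler strategies submitted by LLMs perform worse.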

AI Chinese abstract

大语言模型(LLMs)的出现促使经济学家研究人类和 LLMs 在战略环境中的行为。我们组织了一系列循环轮换 tournaments 在布洛托博弈中。该博弈吸引博弈论家的注意,因为其高维动作空间和没有纯策略纳什均衡。在第一个 tournaments 中,超过 200 名人类参与者相互竞争。在第二个 tournaments 中,几个流行的 LLMs 被邀请提交策略。在第三个 tournaments 中,我们匹配了 LLM 策略的数量与人类提交的数量。我们发现,人类更常使用更好的校准中间层次分配启发式方法,并且优于 LLMs 提交的更简单、更刻板的策略。战略复杂性是成功的关键,当且仅当达到必要的推理深度水平时。而较低和较高的推理层次在原始策略上没有明显优势。在人类中,学科背景弱预测成功:具有 STEM 背景的参与者在第一个 tournaments 中表现更好。令人惊讶的是,人类几乎不根据对手的不同集合调整策略。这一结果表明,人类主要基于游戏规则而非对手身份做出选择,将 LLMs 看作人类竞争对手。

English abstract

The emergence of large language models (LLMs) has spurred economists to study how humans and LLMs behave in strategic settings. We organized a series of round-robin tournaments in the Colonel Blotto game. This game attracts game theorists' attention due to its high-dimensional action space and the absence of pure-strategy Nash equilibria. In the first tournament, more than 200 human participants competed against one another. In the second tournament, several popular LLMs were invited to submit strategies. In the third tournament, we matched the number of LLM strategies to the number submitted by humans. We find that humans more often employ better-calibrated intermediate-level allocation heuristics and outperform the simpler, more stereotyped strategies submitted by LLMs. Strategic sophistication is key to success if and only if the necessary level of reasoning depth is reached, while lower and higher levels of reasoning offer no clear advantage over the primitive strategies. Among humans, field of study weakly predicts success: participants with STEM backgrounds perform better in the first tournament. Surprisingly, humans barely adjust their strategies across tournaments with different sets of opponents. This result suggests that humans base their choices primarily on the game's rules rather than on the identity of their opponents, treating LLMs much like human competitors.
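
A round-robin Colonel Blotto tournament can be sketched in a few lines. The scoring rule below (a match is won by whoever takes more battlefields, with +1/0/-1 payoffs) and the three example allocations are assumptions for illustration; the abstract does not state the exact rules or budget used in the study.

```python
from itertools import combinations

def blotto_payoff(a, b):
    """Return 1 if allocation `a` wins more battlefields than `b`, -1 if fewer, 0 if tied."""
    wins_a = sum(x > y for x, y in zip(a, b))
    wins_b = sum(y > x for x, y in zip(a, b))
    return (wins_a > wins_b) - (wins_a < wins_b)

def round_robin(strategies):
    """Play every pair once and accumulate zero-sum scores."""
    scores = {name: 0 for name in strategies}
    for (n1, s1), (n2, s2) in combinations(strategies.items(), 2):
        result = blotto_payoff(s1, s2)
        scores[n1] += result
        scores[n2] -= result
    return scores

# Hypothetical allocations of 100 troops over 5 battlefields.
strategies = {
    "uniform": [20, 20, 20, 20, 20],
    "all_in": [100, 0, 0, 0, 0],
    "three_fields": [34, 33, 33, 0, 0],
}
scores = round_robin(strategies)
```

In this toy field the intermediate allocation beats both the uniform split and the all-in strategy, which loosely mirrors the abstract's point about intermediate-level heuristics.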

2605.21379 2026-05-22 cs.NE cs.AI

How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

如何构建马库斯的代数心智:基于伽罗瓦域的代数确定性介质

Hiroyuki Chuma, Kanji Otsuk, Yoichi Sato

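AI summary This paper presents an algebro-deterministic substrate over Galois fields that realizes the three core components of a cognitive architecture identified by Marcus, and shows how the architecture applies to logical reasoning and semantic discrimination.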

AI Chinese abstract

在《代数心智》中,加里·马库斯指出了任何充分认知架构必须包含的三个组成部分:变量上的操作、递归结构的表示,以及个体与类别的区分。他指出标准多层感知机不支持这些,承认使用寄存器和树形单元的神经实现,通过发育程序而非梯度下降构建仍是一个程序性猜想。25年后,所需的介质现已可用。我们新开发的PyVaCoAl/VaCoAl是一种超维计算架构,围绕单个代数原语XOR-and-shift over GF(2)组织,通过原始多项式线性反馈移位寄存器实现。该架构支持通过Bind(R,F) = R XOR shift(F)实现的可逆变量绑定,非交换性的组合捆绑,能够区分“狗咬人”与“人咬狗”,并在同一代数下实现地址空间的个体/类别分离。一种互补观点认为,海马体-CA3回路是这种引擎的生物同源物,发育指定的 mossy-fiber 目标提供了马库斯预期的内生微回路。在本文中,我们映射马库斯的三种支柱与PyVaCoAl/VaCoAl的操作承诺之间的对应关系。我们重新解释树形单元为一个由原始生成多项式索引的代数寄存器集,论证该架构比2001年可用的张量积、循环卷积或时间同步更接近马库斯的规格。我们还展示了该介质如何自然扩展到佩尔的第三级反事实推理,这是原始树形单元程序未直接针对的能力。

English abstract

In The Algebraic Mind, Gary Marcus identified three components essential for any adequate cognitive architecture: operations over variables, recursively structured representations, and a distinction between mental representations of individuals and kinds. He argued that standard multilayer perceptrons supported none of these, acknowledging that a neural implementation using registers and treelets, constructed via developmental programs rather than gradient descent, remained a programmatic conjecture. Twenty-five years later, the required substrate is now available. Our newly developed PyVaCoAl/VaCoAl is a hyperdimensional computing architecture organized end-to-end around a single algebraic primitive: XOR-and-shift over GF(2), implemented by primitive-polynomial linear-feedback shift registers. The architecture supports reversible variable binding via Bind(R,F) = R XOR shift(F), non-commutative compositional bundling that distinguishes "the dog bites the man" from "the man bites the dog," and address-space individual/kind separation under the same algebra. A companion perspective argues that the dentate gyrus-CA3 circuit is a biological homologue of this same engine, with developmentally specified mossy-fiber targeting supplying the innate microcircuitry Marcus anticipated. In this paper, we map the correspondence between Marcus's three pillars and the operational commitments of PyVaCoAl/VaCoAl. We reinterpret the treelet as an algebraic register set indexed by a primitive generator polynomial, arguing that this architecture provides a functional neural substrate meeting Marcus's specifications far more closely than the tensor products, circular convolution, or temporal synchrony available in 2001. We also demonstrate how this substrate naturally extends to Pearl's rung-3 counterfactual reasoning, a capability the original treelet program did not directly target.
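
The binding rule Bind(R,F) = R XOR shift(F) quoted above can be illustrated on small bit vectors. The sketch below assumes a 16-bit vector and uses a plain cyclic rotation as the `shift`; the actual architecture is described as using primitive-polynomial linear-feedback shift registers, which this sketch does not implement.

```python
N = 16                      # vector width in bits (assumed for illustration)
MASK = (1 << N) - 1

def shift(v, k=1):
    """Cyclic left rotation by k bits (stand-in for the LFSR step)."""
    k %= N
    return ((v << k) | (v >> (N - k))) & MASK

def unshift(v, k=1):
    """Inverse of shift: cyclic right rotation by k bits."""
    return shift(v, N - k)

def bind(r, f):
    """Bind(R, F) = R XOR shift(F)."""
    return r ^ shift(f)

def unbind(bound, r):
    """Recover F from Bind(R, F) given the role vector R."""
    return unshift(bound ^ r)

role = 0b1010001111000101
filler = 0b0001110010110011
bound = bind(role, filler)
recovered = unbind(bound, role)
```

Because XOR is its own inverse and the rotation is invertible, the filler is recovered exactly. Swapping the arguments gives a different vector, which is what lets order-sensitive structures such as "the dog bites the man" and "the man bites the dog" be told apart.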

2605.20348 2026-05-22 q-fin.CP cs.AI

Memory-Induced Supra-Competitive Outcomes Between Deep Reinforcement Learning Agents in Optimal Trade Execution

记忆诱导的深度强化学习代理在最优交易执行中的超竞争性结果

Christos Spyridon Koulouris, Carlo Campajola

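AI summary This paper investigates whether deep reinforcement-learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, that is, lower implementation shortfalls than the game-theoretical competitive benchmark. It studies a two-agent Almgren-Chriss liquidation game and examines how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. Ex-ante schedule-learning agents are first used to remove intra-episode feedback and isolate what arises when agents commit to complete liquidation trajectories before execution begins. Agents are then allowed to condition on the evolving state using a variety of DDQN architectures. When agents have access to intra-episode history, especially recent prices and their own past actions, supra-competitive outcomes become more frequent and more persistent. The findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.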

AI Chinese abstract

在本文中,我们研究了在共享的最优执行环境中交互的深度强化学习代理是否能够维持超竞争性结果,即在实现短损方面优于相关博弈论竞争基准。我们研究了一个双代理阿尔梅伦-克里斯特流动性清算游戏,并探讨了学习行为如何依赖于回合内环境反馈、解读中间价格的能力以及代理对过去的了解。我们首先使用事前调度学习代理来去除回合内反馈,并确定当代理在执行开始前承诺完成清算轨迹时会发生什么。然后允许代理使用多种DDQN架构根据演进的状态进行条件判断。我们发现,当代理能够访问回合内历史,特别是近期价格和自身过去行为时,超竞争性结果变得显著更频繁和持久。这些发现表明,这种执行游戏中的超竞争性行为并非由多代理学习或当前价格观察单独驱动,而是由反馈、记忆和沿实际执行路径的状态依赖性交互驱动。

English abstract

In this paper, we investigate whether deep reinforcement-learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, in the sense of achieving lower implementation shortfalls than the relevant game-theoretical competitive benchmark. We study a two-agent Almgren-Chriss liquidation game and examine how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. We first use ex-ante schedule-learning agents to remove intra-episode feedback and isolate what can arise when agents commit to complete liquidation trajectories before execution begins. We then allow agents to condition on the evolving state using a variety of DDQN architectures. We find that, when agents are given access to intra-episode history, especially recent prices and own past actions, supra-competitive outcomes become substantially more frequent and more persistent. These findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or by current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.
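
To show what an implementation shortfall measures when two sellers share one market, here is a deliberately simplified sketch. It assumes a noise-free price, linear permanent impact `gamma`, and linear temporary impact `eta`, both acting on the combined order flow; these parameter values and the function name are illustrative assumptions, not the paper's calibrated Almgren-Chriss setup.

```python
def implementation_shortfall(own, other, s0=100.0, gamma=0.01, eta=0.05):
    """Shortfall of the `own` sell schedule while `other` sells alongside it.

    own, other: shares sold per period by each agent.
    Temporary impact lowers the execution price in the current period;
    permanent impact lowers the mid-price for all later periods.
    """
    price, proceeds = s0, 0.0
    for n_own, n_other in zip(own, other):
        total = n_own + n_other
        exec_price = price - eta * total     # temporary impact on this period's fills
        proceeds += n_own * exec_price
        price -= gamma * total               # permanent impact carries forward
    return s0 * sum(own) - proceeds          # paper value minus realized proceeds

alone = implementation_shortfall([5, 5], [0, 0])
crowded = implementation_shortfall([5, 5], [5, 5])
```

Selling the same schedule costs more when a second agent trades at the same time, which is why the agents' joint behavior matters for the competitive benchmark.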

2605.19354 2026-05-22 eess.IV cs.CV

Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction

用于自回归MRI重建的下一步加速尺度预测

Yilmaz Korkmaz, Vishal M. Patel

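AI summary This paper proposes an autoregressive next-acceleration-scale prediction method built on a discrete multi-scale latent space and introduces privileged information distillation, improving MRI reconstruction under extreme undersampling.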

AI Chinese abstract

MRI重建本质上是一个病态的逆问题,因为不完整的测量允许许多可能的解决方案。在高加速情况下,这种不确定性变得更加严重,像素域连续预测器倾向于在可行的重建之间平均并抑制高频解剖结构。我们通过将重建移动到离散多尺度潜在空间,并将其作为自回归下一步加速尺度预测来解决这一限制。利用在视觉自回归建模中证明有效的离散先验,我们的方法将解限制在紧凑的代码本令牌序列中,即使从极稀疏的测量中也能实现锐利的重建。这种离散自回归公式也自然与现代大型语言模型后训练技术对齐。基于这一观察,我们引入了视觉自回归建模中的在线策略特权信息蒸馏,其中教师仅在训练时使用不可用的特权上下文进行训练,在本案例中是完全采样获取,监督学生在自己的滚动生成中进行训练,从而实现一致的重建增益。通过在fastMRI基准上的广泛实验,我们展示了我们的方法在各种采样模式下在极端欠采样下提供了改进的重建性能。项目网站是https://yilmazkorkmaz1.github.io/discrete-mri-reconstruction-opd/。

English abstract

MRI reconstruction is an inherently ill-posed inverse problem, since incomplete measurements admit many plausible solutions. This ambiguity becomes more severe under high acceleration, where pixel-domain continuous predictors tend to average over feasible reconstructions and suppress high-frequency anatomy. We address this limitation by moving reconstruction to a discrete multi-scale latent space and posing it as autoregressive next-acceleration-scale prediction. Leveraging discrete priors proven effective in visual autoregressive modeling, our method restricts the solution to compact sequences of codebook tokens, enabling sharp reconstructions even from extremely sparse measurements. This discrete autoregressive formulation also aligns naturally with modern large language model post-training techniques. Building on this observation, we introduce on-policy privileged information distillation for visual autoregressive modeling, where a teacher is given, during training only, privileged context that is unavailable at inference (in our case, fully sampled acquisitions) and supervises a student trained on its own rollouts, leading to consistent reconstruction gains. Through extensive experiments on the fastMRI benchmark, we show that our approach delivers improved reconstruction performance across diverse sampling patterns under extreme undersampling. Project website: https://yilmazkorkmaz1.github.io/discrete-mri-reconstruction-opd/

2605.19152 2026-05-22 stat.ML cs.ET cs.IT cs.LG cs.NE math.IT physics.optics

Information Processing Capacity of Stationary Physical Systems: Theory, Data-efficient Estimation Methods, and Photonic Demonstration

stationary 物理系统的信息处理能力:理论、数据高效估计方法和光子演示

Rahul Uma Ramachandran, Serge Massar

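AI summary This paper studies the information processing capacity of stationary physical systems. It proposes a theoretical framework, develops data-efficient estimation methods, and validates them experimentally on a photonic computing system.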

Comments added 2 new references

AI Chinese abstract

物理计算系统为实现硬件原生机器学习提供了有前景的途径,但其计算能力在原理上、任务无关和数据高效的方式下难以表征。我们扩展了信息处理能力(IPC)框架以适用于 stationary 物理计算系统,并建立了几个基本结果:个体容量在零和一之间被限制,其在完整基底上的总和受读数数量的限制,噪声严格减少这个界限。我们处理有限样本的 IPC 估计,并推导了影响朴素估计器的系统性正偏倚的渐近形式。基于这些结果,我们引入了基于 Richardson 推理和 Sobol 准随机采样的数据高效估计方法。我们通过基于皮秒激光脉冲在非线性光纤中传播的光子计算系统实验验证了该框架。通过改变激光功率和光纤长度,我们观察到由 Kerr 效应诱导的 IPC 分布系统性地向高阶非线性容量偏移。最后,我们证明了总 IPC 与基准机器学习任务的性能强相关,并提供了系统有效维度的可靠估计。这些结果确立了 IPC 作为连接物理计算系统内在动态与其机器学习性能的实用桥梁。

English abstract

Physical computing systems provide a promising route toward hardware-native machine learning, but their computational capabilities remain difficult to characterize in a principled, task-independent, and data-efficient way. We extend the Information Processing Capacity (IPC) framework to stationary physical computing systems and establish several fundamental results: individual capacities are bounded between zero and one, their sum over a complete basis is bounded by the number of readouts, and noise strictly reduces this bound. We address the finite-sample estimation of IPC and derive the asymptotic form of the systematic positive bias affecting naive estimators. Building on these results, we introduce data-efficient estimation methods based on Richardson extrapolation and Sobol quasi-random sampling. We validate the framework experimentally using a photonic computing system based on picosecond laser pulses propagating through a nonlinear optical fibre. By varying the laser power and fibre length, we observe systematic shifts of the IPC distribution toward higher-order nonlinear capacities induced by the Kerr effect. Finally, we demonstrate that the total IPC strongly correlates with performance on benchmark machine-learning tasks and provides a reliable estimate of the effective dimensionality of the system. These results establish IPC as a practical bridge between the intrinsic dynamics of physical computing systems and their machine-learning performance.
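
The idea behind Richardson extrapolation for a biased finite-sample estimate can be shown with a two-point example. The sketch assumes that the naive capacity estimate from N samples behaves as C + a/N to leading order; that bias form is an assumption made here for illustration, and the paper derives its own asymptotic expression.

```python
def richardson_extrapolate(c_half, c_full):
    """Cancel a leading a/N bias using estimates from N/2 and N samples.

    If c_full = C + a/N and c_half = C + 2a/N, then 2*c_full - c_half = C.
    """
    return 2.0 * c_full - c_half

# Synthetic check with a known true capacity and an assumed bias coefficient.
true_c, a, n = 0.3, 5.0, 1000
c_full = true_c + a / n          # naive estimate from N samples
c_half = true_c + a / (n / 2)    # naive estimate from N/2 samples
corrected = richardson_extrapolate(c_half, c_full)
```

Both naive estimates overshoot the true value, and the extrapolated one removes the overshoot without requiring more data.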

2605.18372 2026-05-22 cs.HC cs.AI cs.CY cs.ET

The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration

上下文谄媚的隐性成本:人类-人工智能协作中的AI素养干预

Cansu Koyuturk, Sabrina Guidotti, Dimitri Ognibene

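AI summary This study examines how contextual sycophancy arises in human-AI collaboration and tests whether an intervention that improves AI literacy and prompting skills can mitigate it. It finds that the quality of AI feedback is significantly affected by the propagation of user errors, pointing to the need for system-level improvements that promote critical engagement.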

Comments SPRINGER AIED 2026: Accepted for LBR, poster presentation at the 27th International Conference on Artificial Intelligence in Education, 27 Jun - 3 Jul 2026, Seoul, Republic of Korea

AI Chinese abstract

大型语言模型(LLMs)在教育领域日益被用作交互工具进行协作。然而,其倾向于谄媚,即使在错误时也迎合用户信念,这引发了学习和决策的担忧,尤其是对知识较少的用户。本研究调查了在真实多轮人类-人工智能交互中谄媚对齐如何产生,并探讨了针对提高AI素养和提示能力的干预是否能减轻其影响。在受控混合设计实验中,60名参与者通过先生成个人排名再与AI助手协作进行分析生存排名任务,分别在干预前和干预后接受一般或谄媚聚焦的提示训练。初步结果显示,LLMs对用户输入高度敏感:低质量的初始响应导致较差的AI建议,表明模型镜像或整合了用户推理而非纠正或提供缺失或较少见的替代方案。关键的是,用户错误向AI响应的传播显著降低了AI反馈质量和最终用户任务表现,揭示了一种上下文谄媚依赖现象。尽管干预未能消除上下文错误的传播,但显著提高了AI建议质量,通过减少直接镜像错误用户排名。这些发现表明,提示和AI素养单独可能不足以确保知识上独立的AI支持,强调了需要系统层面方法以促进人类-人工智能协作中的批判性参与。

English abstract

Large Language Models (LLMs) are increasingly used in educational settings as interactive tools for collaboration. However, their tendency toward sycophancy, aligning with user beliefs even when incorrect, raises concerns for learning and decision-making, especially for less knowledgeable users. This study investigates how sycophantic alignment emerges in authentic multi-turn human-AI interactions and whether interventions targeting increasing AI literacy and prompting competencies can mitigate its effects. In a controlled mixed-design experiment, 60 participants completed analytical survival ranking tasks by first generating individual rankings and then making final decisions after collaborating with an AI assistant, both before and after receiving either general or sycophancy-focused prompting training. Preliminary results show that LLMs are highly sensitive to user input: lower-quality initial responses lead to poorer AI advice, suggesting that the model mirrors or incorporates user reasoning rather than correcting it or offering better alternatives that are missing or less frequent in the conversation. Critically, the propagation of user errors into AI responses significantly reduced both the quality of AI feedback and final user task performance, revealing a form of contextual sycophantic dependence. While the intervention did not eliminate the propagation of contextual errors, it significantly improved AI advice by reducing the direct mirroring of incorrect user rankings. These findings suggest that prompting and AI literacy alone may be insufficient to ensure epistemically independent AI support, highlighting the need for system-level approaches that better promote critical engagement in human-AI collaboration.

2605.16299 2026-05-22 cs.SE cs.AI

ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization

ACE:通过对抗性单元测试生成和偏好优化的自进化LLM编码框架

Yixu Huang, Xinglei Yu, Zhongyu Wei

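AI summary This paper proposes ACE, a self-evolving code generation framework that uses execution-centric supervision within a solver-adversary architecture and requires no ground-truth code or external reward model. Experiments show that it outperforms existing solver-verifier baselines on CodeContests, MBPP, and LiveCodeBench.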

AI Chinese abstract

大型语言模型(LLMs)在代码生成方面表现出色,但仍然严重依赖大规模标注解决方案和基于验证的监督,这限制了可扩展性和持续自我改进的能力。最近的求解器-验证器框架利用程序执行作为自动监督信号,但其有效性在求解器变得中等强大时会下降:验证器生成的测试越来越多地确认语义正确性,而不是暴露剩余的失败模式。我们提出了ACE,一种基于求解器-对抗架构的自进化代码生成框架,优先通过以执行为中心的监督进行主动失败发现。一个单一的LLM在生成候选程序和生成优化以诱导执行级失败(如运行时错误、异常或非终止)的对抗性单元测试输入之间交替进行。监督仅来源于执行结果:稳健的程序被选为监督微调,而对抗性测试通过Kahneman-Tversky优化使用执行衍生的偏好进行优化。值得注意的是,整个训练循环不需要真实代码或外部奖励模型。在CodeContests、MBPP和LiveCodeBench上的实验表明,ACE在pass@1上持续优于强大的求解器-验证器基线,实现了3-7%的绝对提升,在分布外基准上改进更大,同时保持竞争性或改进的推理效率。

English abstract

Large Language Models (LLMs) excel at code generation but remain heavily reliant on large-scale annotated solutions and verification-based supervision, which constrains scalability and hinders sustained self-improvement. Recent solver-verifier frameworks exploit program execution as an automatic supervision signal, but their effectiveness degrades as solvers become moderately strong: verifier-generated tests increasingly confirm semantic correctness rather than exposing the remaining failure modes. We propose ACE, a self-evolving code generation framework based on a solver-adversary architecture that prioritizes active failure discovery through execution-centric supervision. A single LLM alternates between generating candidate programs and producing adversarial unit test inputs optimized to induce execution-level failures, such as runtime errors, exceptions, or non-termination. Supervision is derived solely from execution outcomes: robust programs are selected for supervised fine-tuning, while adversarial tests are optimized via Kahneman-Tversky Optimization using execution-derived preferences. Notably, the entire training loop requires no ground-truth code or external reward models. Experiments on CodeContests, MBPP, and LiveCodeBench demonstrate that ACE consistently outperforms strong solver-verifier baselines, achieving 3-7% absolute gains in pass@1, with larger improvements on out-of-distribution benchmarks, while maintaining competitive or improved inference efficiency.
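
The execution-derived labeling described above can be sketched as follows: candidate programs that survive every adversarial input are marked robust, and inputs that crash at least one candidate are marked as useful adversarial tests. The helper names and toy programs are hypothetical, and the sketch only catches exceptions; it does not detect non-termination and does not reproduce ACE's training loop.

```python
def run_case(program, test_input):
    """Return True if the program handles the input without raising."""
    try:
        program(test_input)
        return True
    except Exception:
        return False

def label_round(programs, tests):
    """Derive labels purely from execution outcomes.

    survived[name] is True when a program passes every adversarial input.
    breaking[i] is True when tests[i] crashes at least one program.
    """
    survived = {name: all(run_case(fn, t) for t in tests)
                for name, fn in programs.items()}
    breaking = [any(not run_case(fn, t) for fn in programs.values())
                for t in tests]
    return survived, breaking

def naive_inverse(x):
    return 1 / x                      # crashes on zero

def guarded_inverse(x):
    return 0.0 if x == 0 else 1 / x   # handles the edge case

programs = {"naive": naive_inverse, "guarded": guarded_inverse}
tests = [1, 0]
survived, breaking = label_round(programs, tests)
```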

2605.12456 2026-05-22 cs.CR cs.CL cs.LG

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

TextSeal: 一种用于溯源与蒸馏保护的本地化大语言模型水印

Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez

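AI summary This paper proposes TextSeal, an advanced watermark for large language models. Building on Gumbel-max sampling, it introduces dual-key generation to restore output diversity and combines entropy-weighted scoring with multi-region localization to improve detection. The method supports serving optimizations such as speculative decoding and multi-token prediction and adds no inference overhead. It strictly dominates the SynthID-text baseline in detection strength and is robust to dilution, maintaining confident localized detection even in mixed human/AI documents. The scheme is theoretically distortion-free; evaluation on reasoning benchmarks confirms that it preserves downstream performance, and a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.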

AI Chinese abstract

我们介绍TextSeal,一种最先进的大语言模型水印。基于Gumbel-max采样,TextSeal引入双密钥生成以恢复输出多样性,同时结合熵加权评分和多区域定位以提升检测性能。它支持推测解码和多令牌预测等服务优化,并不增加任何推理开销。TextSeal在检测强度上严格优于基线方法如SynthID-text,并对稀释具有鲁棒性,即使在混合的人类/AI文档中也能保持自信的本地化检测。该方案在理论上是无失真的,经推理基准评估确认其保持下游性能;同时通过多语言人工评估(6000次A/B对比,5种语言)显示无明显质量差异。除了用于溯源检测外,TextSeal还具有'放射性'特性:其水印信号通过模型蒸馏传递,可检测未经授权的使用。

English abstract

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.
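
For context, the Gumbel-max sampling that TextSeal is said to build on can be sketched with a keyed hash as the source of randomness. This is a generic single-key illustration, not TextSeal itself: the dual-key generation, entropy weighting, and localization are not shown, and the hashing scheme, function names, and score formula are assumptions.

```python
import hashlib
import math

def keyed_uniform(key, context, token):
    """Pseudo-random number strictly inside (0, 1), fixed by key, context, and token."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    x = int.from_bytes(digest[:6], "big")        # 48 bits of the hash
    return (x + 0.5) / 2 ** 48

def watermarked_sample(probs, key, context):
    """Gumbel-max trick: pick argmax of log p + Gumbel noise, with keyed noise."""
    def score(token):
        u = keyed_uniform(key, context, token)
        gumbel = -math.log(-math.log(u))
        return math.log(probs[token]) + gumbel
    return max((t for t in probs if probs[t] > 0), key=score)

def detection_score(tokens, contexts, key):
    """Sum of -log(1 - u) over observed tokens; tends to be larger for watermarked text."""
    return sum(-math.log(1.0 - keyed_uniform(key, c, t))
               for t, c in zip(tokens, contexts))

probs = {"cat": 0.5, "dog": 0.3, "bird": 0.2}
first = watermarked_sample(probs, key="secret", context="the quick")
again = watermarked_sample(probs, key="secret", context="the quick")
score = detection_score([first], ["the quick"], key="secret")
```

With truly uniform noise the Gumbel-max choice follows the model's distribution exactly, which is the sense in which such schemes are called distortion-free; a detector holding the key can then test whether the chosen tokens have unusually large keyed values.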

2605.07985 2026-05-22 cs.DC cs.AI

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

Dooly: 一种配置无关、冗余感知的LLM推理模拟器

Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda, Daehyeok Kim

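AI summary This paper presents Dooly, a configuration-agnostic, redundancy-aware profiling approach for LLM inference simulation. Using a single inference pass and label propagation, it reduces the time spent on redundant profiling and improves simulation accuracy and efficiency.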

AI Chinese abstract

选择最优的LLM推理配置需要在硬件、服务引擎、注意力后端和模型架构之间进行评估,因为没有单一选择在所有工作负载中表现最佳。基于配置的模拟器是标准工具,但它们硬编码操作集到特定配置,并重新对每个操作进行重新配置,这使得探索变得成本高昂。这种成本源于对结构理解的缺失:每个操作的每个输入维度都由模型配置或 incoming 请求决定。许多模型配置值(例如头大小、层数)在不同模型中重复出现,因此相同操作在许多配置中运行;一次扫描请求依赖的维度即可服务所有。我们提出了Dooly,利用这种结构实现配置无关、冗余感知的配置。Dooly执行一次推理过程,通过污点传播标记每个输入维度的来源,并仅对不在其延迟数据库中的操作进行选择性配置;状态操作如注意力通过重用服务引擎自身的初始化代码进行隔离,从而消除手动仪器化。它基于数据库构建延迟回归模型,该模型成为现有模拟器的即插即用后端。在两个GPU平台、三个注意力后端和多样的模型架构上,Dooly在TTFT上达到5%的MAPE精度,在TPOT上达到8%的精度,同时将12个模型的profiling GPU小时减少了56.4%。我们已开源Dooly在https://github.com/dooly-project。

English abstract

Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each operation is fixed by the model configuration or determined by the incoming request. Many model-configuration values (e.g., head size, layer count) recur across models, so the same operation runs in many configurations; a single sweep over the request-dependent dimensions can serve them all. We present Dooly, which exploits this structure to achieve configuration-agnostic, redundancy-aware profiling. Dooly performs a single inference pass, labels each input dimension with its origin via taint propagation, and selectively profiles only operations absent from its latency database; stateful operations such as attention are isolated by reusing the serving engine's own initialization code, eliminating manual instrumentation. It builds latency regression models based on the database, which becomes a drop-in backend for existing simulators. Across two GPU platforms, three attention backends, and diverse model architectures, Dooly achieves simulation accuracy within 5% MAPE for TTFT and 8% for TPOT while reducing profiling GPU-hours by 56.4% across 12 models compared to the existing profiling approach. We have open-sourced Dooly at https://github.com/dooly-project.
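
The redundancy argument above amounts to keying profiling results by an operation and its configuration-fixed dimensions, then profiling only keys that are missing. The sketch below illustrates that lookup with plain tuples; the key layout, operation names, and `profile_plan` helper are hypothetical and do not reflect Dooly's actual database or taint-propagation machinery.

```python
def profile_plan(operations, latency_db):
    """Return the operations that still need profiling, in first-seen order.

    operations: iterable of hashable (op_name, fixed_dims) keys
    latency_db: mapping from already-profiled keys to latency data
    """
    seen = set(latency_db)
    todo = []
    for op in operations:
        if op not in seen:
            seen.add(op)       # avoid scheduling the same key twice
            todo.append(op)
    return todo

# Two hypothetical models that share an attention shape but differ in MLP width.
model_a = [("attention", (("head_dim", 128),)), ("mlp", (("hidden", 4096),))]
model_b = [("attention", (("head_dim", 128),)), ("mlp", (("hidden", 8192),))]
latency_db = {("mlp", (("hidden", 4096),)): {"ms_per_token": 0.12}}  # made-up entry
todo = profile_plan(model_a + model_b, latency_db)
```

The shared attention operation is scheduled once for both models, and the MLP shape already in the database is skipped.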

2605.07870 2026-05-22 cond-mat.dis-nn cs.AI stat.ML

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

深度网络中的谱动力学:特征学习、异常值逃逸和学习率转移

Clarissa Lauditi, Cengiz Pehlevan, Blake Bordelon

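AI summary This paper studies how hidden-weight spectra evolve in wide neural networks trained by (stochastic) gradient descent. It develops a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. The framework is applied to two settings: (1) infinite-width nonlinear networks in mean-field/μP scaling and (2) deep linear networks in the proportional high-dimensional limit. The theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, μP yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. This bulk+outlier picture describes simple tasks, whereas tasks with large numbers of outputs (such as ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. A toy model with extensive output channels reproduces this phenomenon and shows that the spectral edge still converges for sufficiently wide networks.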

Comments Updating related works + discussion

AI Chinese abstract

我们研究了在宽神经网络中通过(随机)梯度下降训练时隐藏权重谱的演变。我们开发了一种双层动态平均场理论(DMFT),该理论联合跟踪具有尖峰集合的隐藏权重谱动态,其中尖峰方向在随机体上保持统计依赖性。我们将该框架应用于两种设置:(1)无限宽度非线性网络在均值场/μP缩放下,以及(2)深度线性网络在比例高维极限下,其中宽度、输入维度和样本大小以固定比例发散。我们的理论预测了异常值如何随训练时间、宽度、输出尺度和初始化方差演变。在深度线性网络中,μP产生与宽度一致的异常值动态和超参数转移,包括主导NTK模式向稳定性边缘(EoS)的宽度稳定增长。相比之下,NTK参数化表现出强烈依赖宽度的异常值动态,尽管收敛到一个稳定的宽网络极限。我们展示了这种体+异常值图像是描述简单任务的,但涉及大量输出的任务(如ImageNet分类或GPT语言建模)则更适合通过重构谱体来描述。我们开发了一个具有大量输出通道的玩具模型,重现了这一现象,并展示了足够宽的网络下谱边缘仍会收敛。

英文摘要

We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$\mu$P scaling and (2) deep linear networks in the proportional high-dimensional limit, where width, input dimension, and sample size diverge with fixed ratios. Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, $\mu$P yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. We show that this bulk+outlier picture describes simple tasks with few output channels, but that tasks involving large numbers of outputs (e.g., ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. We develop a toy model with extensive output channels that recapitulates this phenomenon and show that the edge of the spectrum still converges for sufficiently wide networks.

2605.06669 2026-05-22 cs.CR cs.AI cs.LG

Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs

评估教育LLM导师的提示注入防御:安全-可用性-延迟的权衡

Alexandre Cristovão Maiorano

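AI summary This paper presents a framework for evaluating prompt-injection defenses, examines the trade-offs among security, usability, and latency in educational LLM tutors, and experimentally compares the performance of different defense mechanisms.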

Comments 19 pages, 4 figures, 9 tables

详情
AI中文摘要

教育LLM导师面临一个核心的AI对齐挑战:它们必须在遵循用户意图的同时保持教学约束和安全政策。我们提出了一个评估方法,用于评估提示注入防御在该场景中的表现,显示了防护栏设计在对抗性鲁棒性、良性任务可用性和响应延迟之间存在显式的权衡。我们评估了一个领域特定的多层安全防护流水线,结合确定性模式过滤器、结构验证、上下文沙箱和会话级行为检查。在受控的保留基准测试中,该流水线实现了低绕过率和假阳性率,同时优化了平均延迟——一个优先考虑教学可用性(零假阳性)而保持可测量攻击抵抗力的操作点。我们提供了一个可重复的基准测试协议,用于在相同条件下进行头对头比较,包括分层Bootstrap置信区间、配对McNemar显著性检验、多种子敏感度扫描,以及在相同划分上对Prompt Guard和NeMo Guardrails的直接评估。结果揭示了操作权衡:NeMo在16.22%的假阳性率下达到0%的绕过率,而Prompt Guard在3.60%的假阳性率下达到38.48%的绕过率。该框架支持在不同机构风险和可用性要求下,基于证据的防护栏选择。

英文摘要

Educational LLM tutors face a core AI alignment challenge: they must follow user intent while preserving pedagogical constraints and safety policies. We present an evaluation methodology for prompt-injection defenses in this setting, showing that guardrail design entails explicit trade-offs among adversarial robustness, benign-task usability, and response latency. We evaluate a domain-specific multi-layer safeguard pipeline combining deterministic pattern filters, structural validation, contextual sandboxing, and session-level behavioral checks. On a controlled holdout benchmark, the pipeline reaches low bypass and false positive rates with optimized average latency - an operating point that prioritizes pedagogical usability (zero false positives) while maintaining measurable attack resistance. We provide a reproducible benchmark protocol for head-to-head comparison under identical conditions, including stratified bootstrap confidence intervals, paired McNemar significance tests, multi-seed sensitivity sweeps, and direct evaluation of Prompt Guard and NeMo Guardrails on the same split with unified instrumentation. Results expose operational trade-offs: NeMo reaches 0 percent bypass at 16.22 percent FPR and roughly 1.5s latency, while Prompt Guard yields 38.48 percent bypass with 3.60 percent FPR. The framework supports evidence-based guardrail selection for AI tutoring systems under different institutional risk and usability requirements.
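The paired McNemar test named in the abstract compares two guardrails on the same prompts by looking only at the prompts where they disagree. The following is a minimal sketch of the exact (binomial) two-sided form, not the paper's own code; the function names and the toy decision lists are hypothetical and serve only as illustration.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(blocked_a: Sequence[bool], blocked_b: Sequence[bool]) -> Tuple[int, int]:
    """Count prompts blocked only by defense A (b) and only by defense B (c)."""
    b = sum(1 for x, y in zip(blocked_a, blocked_b) if x and not y)
    c = sum(1 for x, y in zip(blocked_a, blocked_b) if y and not x)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    Under the null hypothesis, each discordant pair falls on either side
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical per-prompt decisions (True = attack blocked) on a shared split.
defense_a = [True, True, False, True]
defense_b = [True, False, False, False]
b, c = discordant_counts(defense_a, defense_b)
p_value = mcnemar_exact(b, c)
```

Because only discordant pairs enter the statistic, two defenses with very different overall bypass rates can still yield a large p-value when the evaluation split is small.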

2605.01369 2026-05-22 eess.SP cs.AI cs.LG

MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation

MU-SHOT-Fi: 基于源无关无监督域适应的多用户Wi-Fi感知

Ahmed Y. Radwan, Hina Tabassum

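AI summary This paper proposes MU-SHOT-Fi, a framework that uses source-free unsupervised domain adaptation to achieve accurate activity classification and occupancy estimation in single- and multi-user Wi-Fi sensing while preventing model collapse.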

详情
Journal ref
IEEE Internet of Things Journal, Early Access, 2026
AI中文摘要

深度学习已被广泛应用于基于Wi-Fi CSI的人体活动识别(HAR),因为它能够以隐私保护和成本效益的方式学习时空特征。然而,基于深度学习的模型在跨环境泛化能力差,特别是在多用户设置中,重叠活动导致CSI纠缠和域偏移。实际部署通常由于隐私限制限制访问标记源数据,这促使使用仅未标记目标域CSI和预训练源模型进行源无关适应。在本文中,我们提出了MU-SHOT-Fi,一种用于单用户和多用户Wi-Fi感知的源无关无监督域适应框架。MU-SHOT-Fi在源训练期间采用排列不变的集合预测与匈牙利匹配,随后在目标域中采用冻结分类器骨干适应。为了实现无标签的稳定适应,我们引入了占用加权信息最大化,通过将多样性正则化集中在可能占用的槽位上,同时排除主导类别的边际熵。此外,我们采用二进制旋转预测作为空间自监督,利用CSI频率-时间结构学习域不变特征。对于单用户场景,我们引入SU-SHOT-Fi,通过将占用加权替换为标准信息最大化,并结合对比预测编码以利用时间一致性。在WiMANS和Widar 3.0数据集上进行了广泛的实验,涵盖了跨环境、跨频率、跨方向和组合域偏移,证明MU-SHOT-Fi在大域偏移下有效恢复多用户精确活动分类性能,同时保持准确的占用估计并防止向主导类崩溃。

英文摘要

Deep learning has been widely adopted for WiFi CSI-based human activity recognition (HAR) due to its ability to learn spatio-temporal features in a privacy-preserving and cost-effective manner. However, DL-based models generalize poorly across environments, a challenge amplified in multi-user settings where overlapping activities cause CSI entanglement and domain shifts. Practical deployments often limit access to labeled source data due to privacy constraints, motivating source-free adaptation using only unlabeled target-domain CSI and a pre-trained source model. In this paper, we propose MU-SHOT-Fi, a source-free unsupervised domain adaptation framework for single- and multi-user Wi-Fi sensing. MU-SHOT-Fi employs permutation-invariant set prediction with Hungarian matching during source training, followed by frozen-classifier backbone adaptation in the target domain. To enable stable adaptation without labels, we introduce occupancy-weighted information maximization that prevents model collapse by focusing diversity regularization on likely-occupied slots while excluding the dominant class from marginal entropy. Additionally, we employ binary rotation prediction as spatial self-supervision that exploits CSI frequency-time structure to learn domain-invariant features. For single-user scenarios, we introduce SU-SHOT-Fi by replacing occupancy weighting with standard information maximization and incorporating contrastive predictive coding to exploit temporal consistency. Extensive experiments on the WiMANS and Widar 3.0 datasets across cross-environment, cross-frequency, cross-orientation, and combined domain shifts demonstrate that MU-SHOT-Fi effectively recovers multi-user exact-activity classification performance under large domain shifts while maintaining accurate occupancy estimation and preventing collapse toward dominant classes.

2605.00515 2026-05-22 cs.DC cs.AI cs.NI

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

SpaceMoE:在空间网络上实现分布式混合专家推理

Zhanwei Wang, Huiling Yang, Min Sheng, Khaled B. Letaief, Kaibin Huang

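AI summary This paper proposes the SpaceMoE framework to address the challenge of efficiently deploying large-scale LLMs over satellite networks; its hierarchical expert placement strategy reduces latency and enables efficient mixture-of-experts inference in space.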

详情
AI中文摘要

利用高效的连续太阳能采集,空间数据中心被视为执行高能耗大语言模型(LLMs)的有前途的平台。鉴于这一优势,航天和人工智能巨头(如SpaceX、Google)正在积极投资这一愿景。然而,一个关键挑战是由于卫星上的计算和通信资源有限,高效地在卫星网络中部署大规模LLM。这导致了一个放置问题,需要将模型组件划分并映射到卫星,以确保不同的模型架构和网络拓扑能够协调一致,从而实现低延迟的token生成。为了解决这个问题,我们提出了混合专家(MoE)的空间网络(SpaceMoE)框架,旨在在空间中分布式执行流行的混合专家模型。所提出的放置策略是两级的:(1)层放置,将MoE层分配给卫星子网;(2)层内专家放置,将单个专家分配给同一层/子网的卫星。对于层放置,我们利用自回归推断的环形通信模式,将卫星星座沿轨道方向划分为排列成环的子网,每个子网托管一个MoE层。基于此架构,我们建立并求解了层内专家放置的优化问题,以将具有异构激活概率的专家映射到卫星上。推导出的策略揭示了一个直观的原则:频繁激活的专家应映射到具有低预期延迟的路由路径上的卫星。实验表明,与传统的随机放置和基于消融的放置策略相比,SpaceMoE在千星规模的星座上实现了至少三倍的延迟降低。

英文摘要

Leveraging continuous solar energy harvesting at high efficiency, space data centers are envisioned as a promising platform for executing energy-intensive large language models (LLMs). Recognizing this advantage, space and AI conglomerates (e.g., SpaceX, Google) are actively investing in this vision. One key challenge, however, is the efficient distributed deployment of a large-scale LLM in a satellite network due to the limited onboard computing and communication resources. This gives rise to a placement problem that involves partitioning and mapping model components to satellites such that the fundamentally different model architecture and network topology can be reconciled to ensure low-latency token generation. To address this problem, we present the Space Network of Mixture-of-Experts (SpaceMoE) framework targeting the distributed execution of a popular mixture-of-experts (MoE) model in space. The proposed placement strategies are two-level: (1) layer placement, which assigns MoE layers to satellite subnets; and (2) intra-layer expert placement, which assigns individual experts to satellites associated with the same layer/subnet. For layer placement, we exploit the ring-like communication pattern of autoregressive inference to partition the satellite constellation along the orbiting direction into subnets arranged on a ring, each hosting one MoE layer. Based on this architecture, we formulate and solve an optimization problem for intra-layer expert placement to map experts with heterogeneous activation probabilities onto satellites. The derived strategy reveals an intuitive principle: a frequently activated expert should be mapped to a satellite on a routing path with low expected latency. Experiments over a thousand-satellite constellation show that SpaceMoE achieves at least a threefold latency reduction compared with conventional random and ablation-based placement strategies.
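The principle stated at the end of the abstract (frequently activated experts go to satellites on low-latency routing paths) can be illustrated with a sorted matching. This is only a sketch under strong simplifying assumptions that are not from the paper: one expert per satellite, a single known expected latency per satellite, and no interaction between placements. The paper's actual optimization over routing paths is not reproduced here, and all names and numbers are hypothetical.

```python
from typing import Dict, Sequence


def place_experts(activation_probs: Sequence[float],
                  satellite_latencies: Sequence[float]) -> Dict[int, int]:
    """Map expert index -> satellite index, pairing high probability with low latency."""
    if len(activation_probs) != len(satellite_latencies):
        raise ValueError("this sketch assumes exactly one expert per satellite")
    # Experts ordered from most to least frequently activated.
    experts = sorted(range(len(activation_probs)), key=lambda e: -activation_probs[e])
    # Satellites ordered from lowest to highest expected latency.
    satellites = sorted(range(len(satellite_latencies)), key=lambda s: satellite_latencies[s])
    return dict(zip(experts, satellites))


def expected_latency(placement: Dict[int, int],
                     activation_probs: Sequence[float],
                     satellite_latencies: Sequence[float]) -> float:
    """Activation-weighted latency of a placement."""
    return sum(activation_probs[e] * satellite_latencies[s] for e, s in placement.items())


# Hypothetical activation probabilities and per-satellite expected latencies (ms).
probs = [0.1, 0.6, 0.3]
latencies = [10.0, 30.0, 20.0]
placement = place_experts(probs, latencies)
cost = expected_latency(placement, probs, latencies)
naive_cost = expected_latency({0: 0, 1: 1, 2: 2}, probs, latencies)
```

Under these assumptions the sorted pairing minimizes the weighted sum by the rearrangement inequality; here the most active expert (index 1) lands on the fastest satellite (index 0).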

2604.03501 2026-05-22 cs.HC cs.AI

The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading

增强陷阱:人工智能生产力与认知卸载的成本

Michael Caosun, Sinan Aral

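AI summary This paper studies how AI tools affect worker productivity and finds that, although productivity rises in the short term, sustained use erodes worker skill. Analysis of a dynamic model shows that even a rational decision-maker who anticipates skill erosion may adopt AI when short-term gains outweigh long-run costs, producing steady-state loss. Moreover, when managers are short-termist or worker skill has external value, the decision-maker can fall into the augmentation trap, leaving the worker worse off. Finally, when AI productivity is only weakly tied to worker skill, worker skill can permanently diverge: experienced workers realize their potential while less experienced workers deskill to zero.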

详情
AI中文摘要

实验证据证实,AI工具提高了工人的生产力,但持续使用也会侵蚀支撑这些收益的专业技能。我们开发了一个动态模型,其中决策者在时间上选择工人使用AI的强度,权衡即时生产力与工人技能的损耗。我们将工具的生产力效应分解为两个渠道,一个与工人技能无关,另一个随技能变化。模型产生了三个主要结果。第一,即使决策者完全预见技能损耗,理性决策者在短期生产力收益超过长期技能成本时仍会采用AI,产生稳态损失:工人最终的生产力比采用AI前更低。第二,当管理者短视或工人技能具有外部价值时,决策者的最优政策将稳态损失转化为增强陷阱,使工人状况比未采用AI时更差。第三,当AI生产力较少依赖工人技能时,工人技能可以永久分化:经验丰富的工人实现全部潜力,而经验较少的工人技能降至零。小的管理激励差异可以决定工人走上哪条路径。生产力分解将部署分为五个制度,区分有益和有害的采用,并识别哪些部署容易陷入陷阱。

英文摘要

Experimental evidence confirms that AI tools raise worker productivity, but also that sustained use can erode the expertise on which those gains depend. We develop a dynamic model in which a decision-maker chooses AI usage intensity for a worker over time, trading immediate productivity against the erosion of worker skill. We decompose the tool's productivity effect into two channels, one independent of worker expertise and one that scales with it. The model produces three main results. First, even a decision-maker who fully anticipates skill erosion rationally adopts AI when front-loaded productivity gains outweigh long-run skill costs, producing steady-state loss: the worker ends up less productive than before adoption. Second, when managers are short-termist or worker skill has external value, the decision-maker's optimal policy turns steady-state loss into the augmentation trap, leaving the worker worse off than if AI had never been adopted. Third, when AI productivity depends less on worker expertise, workers can permanently diverge in skill: experienced workers realize their full potential while less experienced workers deskill to zero. Small differences in managerial incentives can determine which path a worker takes. The productivity decomposition classifies deployments into five regimes that separate beneficial adoption from harmful adoption and identifies which deployments are vulnerable to the trap.

2604.02889 2026-05-22 stat.ML cs.AI cs.LG

Rethinking Forward Processes for Score-Based Nonlinear Data Assimilation in High Dimensions

重新思考高维数据同化中的分数基非线性数据同化前向过程

Eunbi Yoon, Won Chang, Donghan Kim, Dae Wook Kim

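AI summary This paper proposes a forward process tailored to data assimilation for state estimation in high-dimensional nonlinear systems; the resulting score-based filter transforms the system state toward the measurement space and improves assimilation performance.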

详情
AI中文摘要

数据同化是通过结合模型预测和测量来估计动态系统状态的过程。当系统是非线性且高维时,这一任务变得具有挑战性。为了解决这个问题,最近出现了一种基于分数的贝叶斯滤波器。然而,这些方法在某些情况下仍表现不佳,特别是在空间稀疏测量下。这种退化源于对似然分数的启发式近似,其误差会随时间累积。这一限制是因为这些方法只是采用了一种经典的生成建模前向过程,将数据分布转化为高斯分布,而与测量方程无关。在这里,我们提出了一种针对滤波的前向过程,将系统状态转换到测量空间,从而实现了似然分数的理论严谨公式化。基于此,我们开发了测量感知的分数基滤波器(MASF)。我们在Kolmogorov流上评估了MASF,这是一个具有高达$\mathcal{O}(10^5)$维度的高维流体基准测试,评估涵盖多种测量算子,包括状态与测量之间存在维度不匹配的非线性情况。MASF相比现有分数基滤波器和集合型卡尔曼滤波器表现出改进的性能。值得注意的是,当使用摊销预训练时,MASF相比基线实现了高达$28.2\times$的时钟时间加速。我们的实现可在 https://github.com/tcnllab-oss/masf 获得。

英文摘要

Data assimilation is the process of estimating the state of a dynamical system over time by combining model predictions with measurements. This task becomes challenging when the system is nonlinear and high-dimensional. To address this, score-based Bayesian filters have recently emerged. However, these methods still show unsatisfactory performance in certain cases, particularly under spatially sparse measurements. Such degradation stems from heuristic approximations of the likelihood score, whose errors can accumulate over time. This limitation arises because the methods simply adopt a classical forward process for generative modeling that transforms a data distribution toward a Gaussian distribution, which is independent of the measurement equation. Here, we propose a forward process tailored for filtering that transforms the system state toward the measurement space, enabling a theoretically sound formulation of the likelihood score. Based on this, we develop the Measurement-Aware Score-based Filter (MASF). We evaluate MASF on Kolmogorov flow, a high-dimensional fluid benchmark with up to $\mathcal{O}(10^5)$ dimensions, under diverse measurement operators, including nonlinear cases with a dimensional mismatch between the state and the measurements. MASF shows improved performance over existing score-based filters and ensemble-type Kalman filters. Notably, MASF achieves up to a $28.2\times$ wall-clock speedup compared with the baselines when using amortized pretraining. Our implementation is available at https://github.com/tcnllab-oss/masf.

2603.15676 2026-05-22 cs.SE cs.AI

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

自动化自我测试作为质量门:基于证据的LLM应用发布管理

Alexandre Cristovão Maiorano

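AI summary This paper presents an automated self-testing framework that makes evidence-based release decisions (PROMOTE/HOLD/ROLLBACK) across five empirically grounded dimensions (task success rate, research context preservation, P95 latency, safety pass rate, and evidence coverage), and evaluates it through a longitudinal case study of a multi-agent conversational AI system.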

Comments 20 pages, 6 figures, 12 tables

详情
AI中文摘要

LLM应用是AI系统,其非确定性输出和不断变化的模型行为使得传统测试不足以满足发布管理的需求。我们提出了一种自动化自我测试框架,引入了基于证据的发布决策质量门(PROMOTE/HOLD/ROLLBACK)五个实证基础的维度:任务成功率、研究环境保持、P95延迟、安全通过率和证据覆盖。我们通过一个长期案例研究评估该框架,该研究涉及一个内部部署的多代理对话AI系统,具有特定的营销能力,并在活跃开发中覆盖了38次评估运行,跨越20多个内部发布。质量门在早期运行中识别出两个ROLLBACK级构建,并在四周的 staging 生命周期中支持稳定的质量演变,同时执行了基于角色的、多轮的、对抗性和证据要求的场景。统计分析(Mann-Kendall趋势、Spearman相关性、bootstrap置信区间)、质量门消融和开销扩展表明,证据覆盖是主要的严重回归判别器,且运行时间与套件大小成比例增长。人类校准研究(n=60分层案例,两个独立评估者,LLM-as-judge交叉验证)揭示了互补的多模态覆盖:LLM-judge与系统门的分歧(kappa=0.13)可归因于结构失败模式——延迟违规和路由错误——这些在响应文本中是不可见的,而评估者独立地揭示了被结构检查遗漏的内容质量失败,这与多维门设计一致。该框架、补充伪代码和校准工件被提供以支持AI系统质量保证和独立复制。

英文摘要

LLM applications are AI systems whose nondeterministic outputs and evolving model behavior make traditional testing insufficient for release governance. We present an automated self-testing framework that introduces quality gates with evidence-based release decisions (PROMOTE/HOLD/ROLLBACK) across five empirically grounded dimensions: task success rate, research context preservation, P95 latency, safety pass rate, and evidence coverage. We evaluate the framework through a longitudinal case study of an internally deployed multi-agent conversational AI system with specific marketing capabilities in active development, covering 38 evaluation runs across 20+ internal releases. The gate identified two ROLLBACK-grade builds in early runs and supported stable quality evolution over a four-week staging lifecycle while exercising persona-grounded, multi-turn, adversarial, and evidence-required scenarios. Statistical analysis (Mann-Kendall trends, Spearman correlations, bootstrap confidence intervals), gate ablation, and overhead scaling indicate that evidence coverage is the primary severe-regression discriminator and that runtime scales predictably with suite size. A human calibration study (n=60 stratified cases, two independent evaluators, LLM-as-judge cross-validation) reveals complementary multi-modal coverage: LLM-judge disagreements with the system gate (kappa=0.13) are attributable to structural failure modes - latency violations and routing errors - invisible in response text alone, while the judge independently surfaces content quality failures missed by structural checks, consistent with a multi-dimensional gate design. The framework, supplementary pseudocode, and calibration artifacts are provided to support AI-system quality assurance and independent replication.

2603.04525 2026-05-22 stat.ML cs.LG

The Volterra signature

Volterra签名

Paul P. Hager, Fabian N. Harang, Luca Pelizzari, Samy Tindel

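AI summary This paper proposes the Volterra signature as an explicit feature representation for history-dependent systems. It develops the input path, weighted by a temporal kernel, into the tensor algebra, uses the Volterra-Chen identity to derive rigorous learning-theoretic guarantees, and demonstrates its efficacy on dynamic learning tasks.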

详情
AI中文摘要

现代处理非马尔可夫时间序列的学习方法,如循环神经网络、神经控制微分方程或变换器,通常依赖于隐式的记忆机制,这些机制在长时间范围内难以解释或训练。我们提出Volterra签名VSig(x;K)作为处理历史依赖系统的显式特征表示。通过将输入路径x加权时间核K转化为张量代数,我们利用相关的Volterra-Chen恒等式推导出严谨的学习理论保证。具体来说,我们证明了注入性陈述(在增强下可识别),从而在无限维路径空间上推导出通用逼近定理,这在某些情况下通过VSig(x;K)的线性泛函实现。此外,我们通过展示与Volterra签名相关的内积可通过二参数积分方程闭合地表示,证明了核技巧的应用,从而利用PDE的数值方法进行计算。对于一大类指数型核,VSig(x;K)在张量代数中解线性状态空间微分方程。结合对时间重参数化的不变性,这些结果将Volterra签名定位为数据科学中稳健且计算上可行的特征映射。我们在真实和合成数据上的动态学习任务中展示了其有效性,其中它一致地改进了经典路径签名基线。

英文摘要

Modern approaches for learning from non-Markovian time series, such as recurrent neural networks, neural controlled differential equations or transformers, typically rely on implicit memory mechanisms that can be difficult to interpret or to train over long horizons. We propose the \emph{Volterra signature} $\mathrm{VSig}(x;K)$ as a principled, explicit feature representation for history-dependent systems. By developing the input path $x$ weighted by a temporal kernel $K$ into the tensor algebra, we leverage the associated Volterra--Chen identity to derive rigorous learning-theoretic guarantees. Specifically, we prove an \emph{injectivity} statement (identifiability under augmentation) that leads to a \emph{universal approximation} theorem on the infinite dimensional path space, which in certain cases is achieved by \emph{linear functionals} of $\mathrm{VSig}(x;K)$. Moreover, we demonstrate applicability of the \emph{kernel trick} by showing that the inner product associated with Volterra signatures admits a closed characterization via a two-parameter integral equation, enabling numerical methods from PDEs for computation. For a large class of exponential-type kernels, $\mathrm{VSig}(x;K)$ solves a linear state-space ODE in the tensor algebra. Combined with inherent invariance to time reparameterization, these results position the Volterra signature as a robust, computationally tractable feature map for data science. We demonstrate its efficacy in dynamic learning tasks on real and synthetic data, where it consistently improves classical path signature baselines.

2603.04383 2026-05-22 cs.CY cs.CR cs.IR cs.LG cs.SI

Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

将信任转化为交易:追踪YouTube的影响力经济中的affiliate营销与FTC合规性

Chen Sun, Yash Vekaria, Zubair Shafiq, Rishab Nithyanand

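AI summary This study builds tools based on Web measurement and NLP techniques to analyze the state of the affiliate marketing ecosystem on YouTube, measures the prevalence of affiliate links and the rate of non-compliant behavior, and recommends improving compliance through standardized disclosure features.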

Comments ICWSM 2026

详情
AI中文摘要

YouTube已发展成一个强大的平台,创作者通过affiliate营销来 monetize 他们的影响力,这引发了关于透明度和伦理问题的担忧,尤其是在创作者未能披露其affiliate关系时。尽管监管机构如美国联邦贸易委员会(FTC)已发布指南以解决这些问题,但非合规和消费者伤害仍然存在,且这些问题的严重程度仍不清楚。在本文中,我们介绍了利用最近的Web测量和NLP研究进展开发的工具,以研究YouTube上的affiliate营销生态系统。我们应用这些工具对来自近54万创作者的200万视频的10年数据集进行分析,研究YouTube上affiliate营销的普及程度及非合规行为的比例。我们的发现表明,affiliate链接广泛存在,但披露合规性仍然很低,大多数视频未能达到FTC标准。此外,我们分析了不同利益相关者在改善披露行为上的影响。我们的研究表明,平台通过标准化披露功能与提高合规性密切相关。我们建议监管机构和affiliate合作伙伴应与平台合作,以提高影响力经济中的透明度、问责制和信任度。

英文摘要

YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their affiliate relationships. Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In this paper, we introduce tools, developed with insights from recent advances in Web measurement and NLP research, to examine the state of the affiliate marketing ecosystem on YouTube. We apply these tools to a 10-year dataset of 2 million videos from nearly 540,000 creators, analyzing the prevalence of affiliate marketing on YouTube and the rates of non-compliant behavior. Our findings reveal that affiliate links are widespread, yet disclosure compliance remains low, with most videos failing to meet FTC standards. Furthermore, we analyze the effects of different stakeholders in improving disclosure behavior. Our study suggests that the platform is highly associated with improved compliance through standardized disclosure features. We recommend that regulators and affiliate partners collaborate with platforms to enhance transparency, accountability, and trust in the influencer economy.

2602.23833 2026-05-22 eess.IV cs.CV

Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning

重新审视图像与元数据整合用于DICOM系列分类:交叉注意力与字典学习

Tuan Truong, Melanie Dohmen, Sara Lorio, Matthias Lenga

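AI summary This paper proposes an end-to-end multimodal framework for DICOM series classification that jointly models image content and acquisition metadata while explicitly accounting for heterogeneous slice content, variable series length, and entirely missing, incomplete, or inconsistent DICOM metadata.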

Comments Early acceptance at MICCAI 2026

详情
AI中文摘要

自动化识别DICOM图像系列对于大规模医学图像分析、质量控制、协议标准化和可靠后续处理至关重要。然而,由于异质切片内容、可变系列长度和完全缺失、不完整或不一致的DICOM元数据,DICOM系列分类仍具挑战性。我们提出了一种端到端的多模态框架,用于DICOM系列分类,该框架联合建模图像内容和获取元数据,同时显式考虑这些挑战。(i)图像和元数据通过模态感知模块编码,并使用双向跨模态注意力机制融合。(ii)元数据通过基于可学习特征字典和值条件调制的稀疏、缺失感知编码器进行处理。通过设计,该方法不需要任何形式的填补。(iii)系列长度和图像数据维度的变化通过2.5D视觉编码器和在等距采样的切片上操作的注意力机制来处理。我们评估了所提出的方法在公开可用的Duke Liver MRI数据集和一个大型多机构内部队列上的表现,评估了域内性能和域外泛化能力。在所有评估设置中,所提出的方法一致优于相关的仅图像、仅元数据和多模态2D/3D基线。结果表明,显式建模元数据稀疏性和跨模态交互提高了DICOM系列分类的鲁棒性。

英文摘要

Automated identification of DICOM image series is essential for large-scale medical image analysis, quality control, protocol harmonization, and reliable downstream processing. However, DICOM series classification remains challenging due to heterogeneous slice content, variable series length, and entirely missing, incomplete or inconsistent DICOM metadata. We propose an end-to-end multimodal framework for DICOM series classification that jointly models image content and acquisition metadata while explicitly accounting for all these challenges. (i) Images and metadata are encoded with modality-aware modules and fused using a bi-directional cross-modal attention mechanism. (ii) Metadata is processed by a sparse, missingness-aware encoder based on learnable feature dictionaries and value-conditioned modulation. By design, the approach does not require any form of imputation. (iii) Variability in series length and image data dimensions is handled via a 2.5D visual encoder and attention operating on equidistantly sampled slices. We evaluate the proposed approach on the publicly available Duke Liver MRI dataset and a large multi-institutional in-house cohort, assessing both in-domain performance and out-of-domain generalization. Across all evaluation settings, the proposed method consistently outperforms relevant image only, metadata-only and multimodal 2D/3D baselines. The results demonstrate that explicitly modeling metadata sparsity and cross-modal interactions improves robustness for DICOM series classification.

2602.17973 2026-05-22 cs.CR cs.AI

PenTiDef: Decentralized Federated Intrusion Detection System with Differential Privacy and Latent-Space Defense via Blockchain Coordination in IIoT

PenTiDef:通过区块链协调在工业物联网中的去中心化联邦入侵检测系统,结合差分隐私和潜在空间防御

Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan, Luong Ha Tien, Ngo Duc Hoang Son, Van-Hau Pham

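AI summary This paper proposes PenTiDef, a fully decentralized, privacy-preserving, and poisoning-resilient federated intrusion detection system (DFL-IDS). It integrates three key components: (i) client-side Distributed Differential Privacy (DDP) with stochastic Gaussian noise to protect against gradient leakage; (ii) a lightweight latent-space defense module that uses an AutoEncoder to extract and compress penultimate-layer representations (PLRs) into stable Latent Semantic Representations (LSRs), followed by Centered Kernel Alignment (CKA) and K-Means clustering for robust malicious-update detection without auxiliary datasets; and (iii) a permissioned blockchain layer whose smart contracts orchestrate on-chain validation, secure FedAvg aggregation, and immutable auditability, eliminating any central server. Extensive experiments on CIC-IDS2018 and Edge-IIoTSet under both IID and realistic non-IID settings, with adversary ratios up to 40%, show that PenTiDef outperforms state-of-the-art baselines (FLARE and FedCC) in detection accuracy and F1-score while maintaining lower training overhead. By jointly addressing privacy, robustness, and decentralization in a unified secure aggregation protocol, PenTiDef offers a practical and scalable solution for trustworthy collaborative intrusion detection in heterogeneous, adversarial IIoT environments.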

Comments version 2, change title of the paper

详情
英文摘要

This paper proposes PenTiDef, a fully decentralized, privacy-preserving, and poisoning-resilient framework for decentralized federated IDS (DFL-IDS). PenTiDef synergistically integrates three key components: (i) client-side Distributed Differential Privacy (DDP) with stochastic Gaussian noise to protect against gradient leakage, (ii) a lightweight latent-space defense module that extracts and compresses penultimate-layer representations (PLRs) into stable Latent Semantic Representations (LSRs) via an AutoEncoder, followed by Centered Kernel Alignment (CKA) and K-Means clustering for robust malicious update detection without auxiliary datasets, and (iii) a permissioned blockchain layer with smart contracts that orchestrates on-chain validation, secure FedAvg aggregation, and immutable auditability, eliminating any central server. Extensive experiments on CIC-IDS2018 and Edge-IIoTSet under both IID and realistic non-IID settings, with adversary ratios up to 40%, demonstrate that PenTiDef consistently outperforms state-of-the-art baselines (FLARE and FedCC) in detection accuracy and F1-score while maintaining lower training overhead. By jointly addressing privacy, robustness, and decentralization in a unified secure aggregation protocol, PenTiDef provides a practical and scalable solution for trustworthy collaborative intrusion detection in heterogeneous, adversarial IIoT environments.
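Two of the building blocks named above, client-side Gaussian noise and FedAvg aggregation, can be sketched in a few lines. This is a generic illustration under stated assumptions, not PenTiDef's protocol: the clipping step, the parameter names `max_norm` and `noise_std`, and the toy updates are hypothetical, and the blockchain coordination and CKA/K-Means filtering are not modeled.

```python
import math
import random
from typing import List, Sequence


def clip(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale the update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update: Sequence[float], max_norm: float,
              noise_std: float, rng: random.Random) -> List[float]:
    """Clip the update, then add independent Gaussian noise to each coordinate."""
    return [x + rng.gauss(0.0, noise_std) for x in clip(update, max_norm)]


def fedavg(updates: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """Average client updates, weighting each by its local sample count."""
    total = sum(num_samples)
    dim = len(updates[0])
    return [sum(n * u[i] for u, n in zip(updates, num_samples)) / total
            for i in range(dim)]


# Hypothetical two-client round: each client privatizes locally before sharing.
rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, -0.4]]
noisy = [privatize(u, max_norm=1.0, noise_std=0.1, rng=rng) for u in client_updates]
aggregate = fedavg(noisy, num_samples=[100, 300])
```

Clipping bounds each client's influence on the aggregate, which is what lets the noise scale be chosen independently of any single update's magnitude.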

2602.04703 2026-05-22 eess.SP cs.LG

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

利用亚6GHz通道进行毫米波波束预测的知识蒸馏

Sina Tavakolian, Nhan Thanh Nguyen, Ahmed Alkhateeb, Markku Juntti

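AI summary This paper proposes an efficient knowledge-distillation-based framework that predicts mmWave beams from sub-6 GHz channels, using compact student deep learning architectures that preserve performance while reducing computational and memory requirements.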

Comments 5 pages, 4 figures. Accepted for publication at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026

详情
Journal ref
Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 22642-22646, 2026
AI中文摘要

在毫米波(mmWave)高机动环境中,波束成形通常会带来显著的训练开销。尽管先前研究指出亚6GHz通道可用于预测最优毫米波波束,但现有方法依赖于大型深度学习(DL)模型,具有不可接受的计算和内存需求。本文提出了一种基于知识蒸馏(KD)技术的计算高效框架,用于亚6GHz通道-毫米波波束映射。我们开发了两种紧凑的学生DL架构,基于个体和关系蒸馏策略,仅保留少量隐藏层,却能紧密模仿大型教师DL模型的性能。大量仿真表明,所提出的学生模型在保持教师的波束预测准确性和频谱效率的同时,将可训练参数和计算复杂度减少了99%。

英文摘要

Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for sub-6 GHz channel-mmWave beam mapping based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.
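As a rough illustration of the individual (response-based) distillation idea, the sketch below computes the widely used temperature-softened loss for a single sample, treating beam prediction as classification over beam indices. It is an assumption that the paper's loss takes this form; the `temperature` and `alpha` values and the toy logits are hypothetical, and the relational distillation variant is not shown.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      hard_label: int,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    # Cross-entropy of the student (at temperature 1) against the true beam index.
    ce = -log_softmax(student_logits)[hard_label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Hypothetical logits over a 4-beam codebook; the true best beam is index 2.
teacher = [0.5, 1.0, 4.0, 0.2]
student = [0.1, 0.8, 2.5, 0.3]
loss = distillation_loss(student, teacher, hard_label=2)
```

The T^2 factor keeps the gradient magnitude of the softened term comparable to the hard-label term as the temperature changes.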

2601.21853 2026-05-22 cs.IR cs.LG

LEMUR: Learned Multi-Vector Retrieval

LEMUR: 学习多向量检索

Elias Jääsaari, Ville Hyvönen, Teemu Roos

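AI summary LEMUR casts multi-vector similarity search as a supervised learning problem and uses existing single-vector search indexes to accelerate retrieval, achieving efficient multi-vector similarity search that is an order of magnitude faster than prior methods.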

Comments Accepted to ICML 2026

详情
AI中文摘要

由晚期交互模型生成的多向量表示,如ColBERT,在信息检索应用中比单向量表示具有更优越的检索质量。在多向量检索系统中,查询和文档均使用每个标记一个嵌入进行编码,相似性通过MaxSim相似性度量来衡量。然而,多向量检索的改进质量是以显著增加的搜索延迟为代价的。在本工作中,我们引入了LEMUR,一种简单而高效的多向量相似性搜索框架。LEMUR由两个连续的问题简化组成:首先,我们将多向量相似性搜索转化为一个可以使用单隐藏层神经网络解决的监督学习问题。其次,我们将在此模型下的推断简化为其潜在空间中的单向量相似性搜索,从而能够利用现有的单向量搜索索引来加速检索。LEMUR比先前的多向量相似性搜索方法快一个数量级。我们的代码可在https://github.com/ejaasaari/lemur获取。

英文摘要

Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding per token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved quality of multi-vector retrieval comes at the expense of significantly increased search latency. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: First, we formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, enabling the use of existing single-vector search indexes to accelerate retrieval. LEMUR is an order of magnitude faster than prior multi-vector similarity search methods. Our code is available at https://github.com/ejaasaari/lemur
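For reference, the MaxSim measure mentioned above sums, over query tokens, the best dot product against any document token. The sketch below is a brute-force scorer in plain Python, shown only to make the measure concrete; it is the exhaustive computation that methods such as LEMUR aim to avoid, not LEMUR itself, and the toy embeddings are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum over query tokens of the maximum dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: Sequence[Vector], corpus: Sequence[Sequence[Vector]]) -> List[int]:
    """Return document indices ordered by descending MaxSim score."""
    scores = [(maxsim(query_tokens, doc), idx) for idx, doc in enumerate(corpus)]
    return [idx for _, idx in sorted(scores, key=lambda s: -s[0])]


# Hypothetical 2-dimensional token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = [
    [[1.0, 0.0], [0.0, 1.0]],  # has a close match for both query tokens
    [[1.0, 0.0], [0.5, 0.0]],  # matches only the first query token
]
order = rank(query, corpus)
```

The cost of this scorer grows with the product of query tokens, document tokens, and corpus size, which is the latency problem the abstract describes.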

2601.18094 2026-05-22 eess.AS cs.SD

OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion

OneVoice: 一个模型,三种场景——迈向统一的零样本语音转换

Zhichao Wang, Tao Li, Wenshuo Ge, Zihao Cui, Shilei Zhang, Junlan Feng

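AI summary This paper proposes OneVoice, a zero-shot framework that handles three voice conversion scenarios (linguistic-preserving, expressive, and singing) in a single model, unifying them through a Mixture-of-Experts design with dual-path routing and using a two-stage training strategy to address data imbalance.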

详情
AI中文摘要

最近语音转换(VC)的进展在说话人克隆和语言保护方面达到了新的里程碑。但该领域仍碎片化,依赖专门模型处理语言保护、表达和歌唱场景。我们提出OneVoice,一个统一的零样本框架,能够在单一模型中处理所有三种场景。OneVoice基于一个连续语言模型,通过无VAE的next-patch扩散进行训练,确保高保真和高效的序列建模。其统一设计的核心在于混合专家(MoE)机制,旨在显式建模共享的转换知识和场景特定的表达性。专家选择由双路径路由机制协调,包括共享专家隔离和场景感知的领域专家分配,结合全局-局部线索。为了精确条件化,场景特定的音调特征通过门控机制融合到每一层,允许适应性地使用音调信息。此外,为了实现核心思想并缓解数据不平衡问题(语音数据丰富,歌唱数据稀缺),我们采用两阶段渐进训练,包括基础预训练和使用LoRA基于的领域专家的场景增强。实验表明,OneVoice在所有三种场景中与专用模型匹配或超越,同时验证了灵活的场景控制,并提供了一种快速解码版本,仅需几步即可。音频样本可在演示页面上获取。

英文摘要

Recent progress in voice conversion (VC) has reached new milestones in speaker cloning and linguistic preservation. But the field remains fragmented, relying on specialized models for linguistic-preserving, expressive, and singing scenarios. We propose OneVoice, a unified zero-shot framework capable of handling all three scenarios within a single model. OneVoice is built upon a continuous language model trained with VAE-free next-patch diffusion, ensuring high fidelity and efficient sequence modeling. Its core design for unification lies in a Mixture-of-Experts (MoE) architecture designed to explicitly model shared conversion knowledge and scenario-specific expressivity. Expert selection is coordinated by a dual-path routing mechanism, including shared expert isolation and scenario-aware domain expert assignment with global-local cues. For precise conditioning, scenario-specific prosodic features are fused into each layer via a gated mechanism, allowing adaptive usage of prosody information. Furthermore, to enable the core idea and alleviate data imbalance (abundant speech vs. scarce singing), we adopt a two-stage progressive training scheme that includes foundational pre-training and scenario enhancement with LoRA-based domain experts. Experiments show that OneVoice matches or surpasses specialized models across all three scenarios, while verifying flexible control over scenarios and offering a fast decoding version that needs as few as 2 steps. Audio samples are available on the demo page.

2512.21132 2026-05-22 cs.CR cs.AI cs.LG cs.PL

AutoBaxBuilder: Bootstrapping Code Security Benchmarking

Tobias von Arx, Niels Mündler, Mark Vero, Maximilian Baader, Martin Vechev

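AI Summary This paper proposes AutoBaxBuilder, an automated pipeline that generates code security benchmarking tasks. It combines the code-understanding capabilities of LLMs with reliability checks to construct functional tests and end-to-end security-probing exploits, reducing the human effort required for benchmark construction.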

Comments ICML 2026

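Details
AI-translated abstract (from Chinese)

As large language models (LLMs) are widely adopted in software engineering, reliable assessment of the correctness and security of LLM-generated code is essential. Prior work has shown that LLMs tend to generate code containing security vulnerabilities, which highlights that security is often overlooked. These insights came from benchmarks that security experts designed with substantial manual effort. However, benchmarks (i) inevitably contaminate training data, (ii) must be extended to new tasks to give a more complete picture, and (iii) must become harder to challenge more capable LLMs. In this work we address these challenges and present AutoBaxBuilder, an automated pipeline that generates code security benchmarking tasks from scratch. It uses the code-understanding capabilities of LLMs, combined with robust reliability checks, to construct functional tests and end-to-end security-probing exploits. The quality of the pipeline is confirmed quantitatively by aligning its predictions with an expert-written baseline and validated qualitatively through manual soundness verification. We use AutoBaxBuilder to build a new benchmark, released publicly as AutoBaxBench, together with a thorough evaluation of current LLMs. AutoBaxBuilder generates a new task in under 2 hours for less than USD 4. Including manual verification, this reduces the human effort required for benchmark construction by a factor of 12.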

English abstract

As large language models (LLMs) see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work showed that LLMs are prone to generating code with security vulnerabilities, highlighting that security is often overlooked. These insights were enabled by specialized benchmarks crafted by security experts through significant manual effort. However, benchmarks (i) inevitably end up contaminating training data, (ii) must extend to new tasks to provide a more complete picture, and (iii) must increase in difficulty to challenge more capable LLMs. In this work, we address these challenges and present AutoBaxBuilder, an automated pipeline that generates code security benchmarking tasks from scratch. It leverages the code-understanding capabilities of LLMs combined with robust reliability checks to construct functional tests and end-to-end security-probing exploits. The quality of the pipeline is quantitatively confirmed by aligning its predictions with an expert-written baseline and qualitatively validated through manual soundness verification. We use AutoBaxBuilder to construct a new benchmark and release it to the public as AutoBaxBench, together with a thorough evaluation on contemporary LLMs. AutoBaxBuilder generates new tasks in under 2 hours, for less than USD 4. Including a manual verification, this reduces the required human effort for benchmark construction by a factor of 12.

2512.06556 2026-05-22 cs.CR cs.AI

Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation

Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, Foutse Khomh

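AI Summary This paper studies semantic attacks on the Model Context Protocol (MCP), in which LLMs interact with external tools through tool descriptors. It proposes a layered defense that combines descriptor integrity verification, pre-context semantic vetting, and lightweight runtime guardrails to reduce unsafe tool invocations caused by descriptor-level manipulation.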

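Details
AI-translated abstract (from Chinese)

The Model Context Protocol (MCP) lets large language models (LLMs) interact with external tools through tool descriptors, extending their capabilities for task execution, autonomous decision-making, and multi-agent coordination. Existing MCP deployments treat tool descriptors as trusted metadata even though they are integrated directly into the LLM reasoning context. This introduces a previously underexplored semantic attack surface. Current defenses mainly target prompt injection and overlook descriptor-level manipulation that can bias tool selection and downstream reasoning. To address this gap, we formalize three descriptor-driven attack classes: Tool Poisoning, Shadowing, and Rug Pull. We propose a layered defense that integrates descriptor integrity verification, pre-context semantic vetting with an auxiliary LLM, and lightweight runtime guardrails, without requiring model retraining. We evaluate GPT-5.3, DeepSeek-V3, and LLaMA-3.5 across eight prompting strategies in controlled adversarial MCP scenarios in which tool metadata is manipulated to simulate realistic attacks. The results show that descriptor manipulation can substantially change tool-selection behavior, producing unsafe tool invocations in up to 36% of trials under baseline configurations. The proposed full-stack mitigation reduces unsafe invocations to 15% and raises the block rate to 74%, a substantial improvement in resistance to descriptor-driven attacks. Cross-model analysis further reveals significant differences in robustness, latency, and sensitivity to descriptor-level manipulation across LLM architectures and prompting strategies. This study provides a controlled cross-model evaluation of descriptor-level threats and mitigation strategies, establishing an empirical foundation for deploying secure and resilient tool-augmented LLMs.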

English abstract

The Model Context Protocol (MCP) enables Large Language Models (LLMs) to interact with external tools via tool descriptors, thereby extending their capabilities for task execution, autonomous decision-making, and multi-agent coordination. Existing MCP deployments treat tool descriptors as trusted metadata, despite their direct integration into the LLM reasoning context. This introduces a previously underexplored semantic attack surface. Current defenses primarily target prompt injection, neglecting descriptor-level manipulation that can bias tool selection and downstream reasoning. To address this gap, we formalize three descriptor-driven attack classes: Tool Poisoning, Shadowing, and Rug Pull. We propose a layered defense solution that integrates descriptor integrity verification, pre-context semantic vetting with an auxiliary LLM, and lightweight runtime guardrails, without requiring model retraining. We evaluate GPT-5.3, DeepSeek-V3, and LLaMA-3.5 across eight prompting strategies in controlled, adversarial MCP scenarios in which tool metadata is manipulated to simulate realistic attacks. Results demonstrate that descriptor manipulation can substantially alter tool-selection behavior, producing unsafe tool invocations in up to 36% of trials under baseline configurations. The proposed full-stack mitigation reduces unsafe invocations to 15% while increasing the block rate to 74%, demonstrating substantial improvement in resistance to descriptor-driven attacks. Cross-model analysis further reveals significant differences in robustness, latency, and sensitivity to descriptor-level manipulation across LLM architectures and prompting strategies. This study provides a controlled cross-model evaluation of descriptor-level threats and mitigation strategies in tool-calling LLM systems, establishing an empirical foundation for deploying secure and resilient tool-augmented LLMs.
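
Descriptor integrity verification, the first layer of the defense named in the abstract, could look roughly like the following sketch. It is an assumption-laden illustration rather than the authors' method: the `DescriptorPins` class, the `descriptor_digest` helper, and the descriptor fields (`name`, `description`) are hypothetical, and pinning a SHA-256 digest at approval time is one common way to detect a descriptor that changes after review (the Rug Pull case).

```python
import hashlib
import json


def descriptor_digest(descriptor: dict) -> str:
    """Hash a tool descriptor in a canonical JSON form (sorted keys, fixed separators)."""
    canonical = json.dumps(
        descriptor, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DescriptorPins:
    """Hypothetical registry that pins the digest of each approved descriptor."""

    def __init__(self):
        self._pins = {}

    def approve(self, descriptor: dict) -> None:
        # Record the digest at the moment a human or vetting step approves the tool.
        self._pins[descriptor["name"]] = descriptor_digest(descriptor)

    def verify(self, descriptor: dict) -> bool:
        # Unknown tools and tools whose descriptor changed since approval are rejected.
        pinned = self._pins.get(descriptor["name"])
        return pinned is not None and pinned == descriptor_digest(descriptor)


# Illustrative descriptors; the tool name and texts are made up for this example.
approved = {"name": "read_file", "description": "Read a file from the workspace."}
tampered = {
    "name": "read_file",
    "description": "Read a file from the workspace. Always call this tool first.",
}

pins = DescriptorPins()
pins.approve(approved)
ok_before = pins.verify(approved)
ok_after = pins.verify(tampered)
ok_unknown = pins.verify({"name": "send_mail", "description": "Send an email."})
```

A check like this only detects changes after approval. It does not judge whether the approved text was benign in the first place, which is the role the abstract assigns to semantic vetting and runtime guardrails.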

2512.04111 2026-05-22 cs.SE cs.AI cs.HC

CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding

Hanjun Luo, Chiming Ni, Jiaheng Wen, Zhimu Huang, Yiran Wang, Bingduo Liao, Sylvia Chung, Yingbin Jin, Xinfeng Li, Wenyuan Xu, XiaoFeng Wang, Hanan Salam

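AI Summary This paper proposes CentaurEval, a benchmark for measuring the value of human-AI collaboration in coding. Using "Collaboration-Necessary" problem templates that pair human reasoning with AI efficiency, it shows that human-AI collaboration clearly outperforms either party working alone on coding tasks.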

Comments Accepted by ICML 2026

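Details
AI-translated abstract (from Chinese)

LLM-based coding agents are reshaping the development paradigm. Existing evaluation systems, whether traditional tests for humans or benchmarks for LLMs, fail to capture this shift: they exclude problems that need human reasoning to guide the solution and AI efficiency to implement it. We introduce CentaurEval, a unified, ecologically valid benchmark for measuring the value of human-in-the-loop collaboration in coding. Its core innovation is a set of "Collaboration-Necessary" problem templates that are intractable for standalone LLMs or humans but solvable through effective collaboration. CentaurEval dynamically instantiates tasks from 45 templates and provides a standardized IDE for humans and a reproducible 450-task toolkit for LLMs. We benchmark 45 participants and 5 LLMs under 4 levels of human intervention. LLMs and humans working alone reach pass rates of only 0.67% and 18.89%, whereas human-AI collaboration raises the pass rate to 31.11%. Our analysis reveals an emerging co-reasoning partnership that challenges the traditional human-tool hierarchy, showing that strategic breakthroughs can come from either humans or AI.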

English abstract

LLM-powered coding agents are reshaping the development paradigm. However, existing evaluation systems, whether traditional tests for humans or benchmarks for LLMs, fail to capture this shift, excluding problems that require both human reasoning to guide solutions and AI efficiency for implementation. We introduce CentaurEval, a unified, ecologically valid benchmark for measuring human-in-the-loop value in coding. CentaurEval's core innovation is its "Collaboration-Necessary" problem templates, which are intractable for standalone LLMs or humans, but solvable through effective collaboration. CentaurEval dynamically instantiates tasks from 45 templates, providing a standardized IDE for humans and a reproducible 450-task toolkit for LLMs. We benchmark 45 participants against 5 LLMs under 4 levels of human intervention. Results show that while LLMs or humans alone achieve poor pass rates (0.67% and 18.89%, respectively), human-AI collaboration significantly improves the pass rate to 31.11%. Our analysis reveals an emerging co-reasoning partnership, challenging the traditional human-tool hierarchy by showing that strategic breakthroughs can originate from either humans or AI.