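arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.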

2605.22306 2026-05-22 cs.MA cs.AI

ACCoRD: Actor-Critic Conflict Resolution with Deep learning for O-RAN xApps

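Cezary Adamczyk, Adrian Kliks

Affiliations: Institute of Radiocommunications; Poznan University of Technology

AI summary: This paper proposes ACCoRD, a deep-learning-based actor-critic conflict resolution method that resolves control conflicts between O-RAN xApps in near-real time. An artificial neural network trained with the PPO-Clip reinforcement learning algorithm improves on the efficiency of rule-based methods in medium- and high-traffic scenarios.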


Abstract

Conflict Mitigation (ConMit) is a crucial part of intelligent network control in Open Radio Access Networks (O-RAN). In this paper, we propose a method named ACCoRD to resolve detected control conflicts in the Near-Real-Time RAN Intelligent Controller using a Conflict Resolution (CR) Agent with an Artificial Neural Network (ANN) trained with the PPO-Clip reinforcement learning algorithm. The implemented ANN analyzes data about the network and conflicting control decisions to infer optimal CR actions. The CR Agent gathers feedback from the network after each resolved conflict to assess its efficiency and adjust the ANN's weights during batch training. The evaluation of the proposed approach is based on simulation data. A new methodology for evaluating CR solutions is proposed. Results show that the proposed ANN-based method improves on the efficiency of rule-based approaches by significantly reducing negative network events caused by conflicting control decisions in medium and high traffic scenarios.
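
As a rough illustration of the clipped surrogate objective that PPO-Clip maximizes during batch training, the sketch below computes it for a small batch. It is not the ACCoRD implementation: the function name, the `clip_eps` default of 0.2, and the plain-list inputs are assumptions made for this example, and the advantages would normally come from the critic.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy; advantages: critic-based estimates.
    All names here are hypothetical.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes the incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that raises the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage the gain is capped once the probability ratio exceeds `1 + clip_eps`; with a negative advantage the unclipped (more pessimistic) term is kept.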


2605.22268 2026-05-22 cs.NI cs.AI cs.CV

Impact of Atmospheric Turbulence and Pointing Error on Earth Observation

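Celia Sánchez-de-Miguel, Antonio M. Mercado-Martínez, Beatriz Soret, Antonio Jurado-Navas, Miguel Castillo-Vázquez

Affiliations: TELMA, University of Malaga

AI summary: This paper studies the impact of atmospheric turbulence and pointing error on Earth observation imagery. It presents an enhanced image simulator that generates physically realistic distorted images and, in a case study, evaluates YOLOv8 and RetinaNet under different turbulence and pointing-error conditions.

Comments: Conference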


Abstract

Earth Observation (EO) imagery is often degraded by atmospheric turbulence and pointing jitter; yet, these effects are rarely considered in datasets used to train AI-based detection models. Based on prior work, this paper presents an enhanced image simulator that enables the incorporation of vertical-path atmospheric turbulence and satellite pointing jitter, arising from platform and sensor vibrations, to generate physically realistic distorted images. As a case study, vessel detection is evaluated using YOLOv8 and RetinaNet on images generated by the proposed simulator under different levels of turbulence and pointing errors. Results show that YOLOv8 recall decreases from 91% under ideal conditions to 60% in the presence of weak turbulence, and falls below 40% under strong turbulence or jitter. In contrast, RetinaNet demonstrates greater robustness, maintaining approximately 75% recall across degraded conditions. These results highlight the importance of incorporating realistic physical degradations into EO training datasets to ensure reliable performance of AI-based models in operational environments, as demonstrated in maritime surveillance applications.


2605.22207 2026-05-22 eess.SY cs.LG cs.SY

Kernel-Based Safe Exploration in Deep Reinforcement Learning

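Rupak Majumdar, Nikhil Singh, Sadegh Soudjani

Affiliations: Max Planck Institute for Software Systems

AI summary: This paper proposes a kernel-based method for safe exploration in deep reinforcement learning. It learns a barrier function that keeps the policy out of unsafe regions, learning the optimal policy and the barrier simultaneously during exploration and providing more reliable probabilistic safety guarantees.

Comments: Accepted at L4DC Conference (22 Jan 2026)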


Abstract

Safety has been a major concern when deploying deep reinforcement learning algorithms in the real world. A promising direction that ensures that the learned policy does not visit unsafe regions is to learn a barrier function along with the policy. A barrier is a function from states to reals that assigns low values to the initial states, high values to the unsafe states, and decreases in expectation on each transition; such a function can be used to bound the probability of reaching unsafe states. Previous attempts learned a barrier function directly from exploration data, but this required either large amounts of data or restrictions on the system dynamics. In this paper, we show how kernel embeddings can be used to learn barrier functions during deep reinforcement learning for stochastic systems with unknown dynamics. Our algorithm, kernel-based safe exploration (KBSE), learns an optimal policy and a barrier simultaneously during exploration. The barriers are computed iteratively, represented as conditional mean embeddings, and provide better probabilistic safety guarantees with more exploration. The exploration algorithm uses the learned barrier functions to identify safety violations. In the case of violation, it intervenes to modify the unsafe action to a safe action, thereby ensuring that the exploration is restricted to actions that bound the probability of reaching unsafe states. We evaluate KBSE on several complex continuous control benchmarks. Experimental results establish our new algorithm to be suitable for synthesizing control policies that are probabilistically safe without degradation in reward accumulation.
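
The way a barrier bounds the probability of reaching unsafe states can be written out with the standard supermartingale argument. The conditions below are the textbook form, assumed here for illustration; the symbols $X_0$, $X_u$, $\eta$, and $\gamma$ are not taken from the paper, whose exact conditions may differ. For a non-negative $B$ that does not increase in expectation along transitions, Ville's inequality gives the final line.

```latex
% Assumed standard supermartingale barrier conditions (illustrative, not the paper's):
% B : X -> R_{>= 0},  X_0 = initial set,  X_u = unsafe set,  0 <= \eta < \gamma.
\begin{align*}
  B(x) &\le \eta && \forall x \in X_0, \\
  B(x) &\ge \gamma && \forall x \in X_u, \\
  \mathbb{E}\!\left[ B(x_{t+1}) \mid x_t = x \right] &\le B(x) && \forall x \in X, \\[4pt]
  \Pr\!\left[ \exists\, t \ge 0 : x_t \in X_u \;\middle|\; x_0 \in X_0 \right] &\le \frac{\eta}{\gamma}.
\end{align*}
```

A smaller ratio $\eta/\gamma$, that is, a wider gap between the values on initial and unsafe states, yields a tighter safety guarantee.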


2605.22206 2026-05-22 cs.NE cs.AI cs.RO

Temporal Coding as a Substrate for Sensorimotor Object Inference: A Spiking Reinterpretation of Thousand Brains Architecture

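Joy Bose

Affiliations: Independent Researcher

AI summary: This study proposes replacing dense vectors with spike-based coding to encode the order of sensor contacts more effectively, improving the accuracy and robustness of object recognition. The core method is an STDP-based learning rule with a learnable parameter lambda; the main contribution is validating the advantage of temporal coding across different spatial arrangements and noise levels.

Comments: 18 pages, 5 figures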


Abstract

The Thousand Brains Theory (TBT) and its open-source Monty framework model object recognition through sensorimotor inference -- identifying objects by actively moving a sensor across their surface and building evidence contact by contact. The current implementation encodes each contact as a dense floating-point vector. While Monty tracks inter-step displacement and accumulates evidence across contacts, it treats the feature activation pattern at each contact as an unordered set - the directional sequence in which features are encountered carries no representational weight. In TBT, the sequence of contacts carries spatial meaning: knowing that feature A was felt before feature B during a left-to-right sweep tells you something about where A and B sit on the object. Dense vectors discard this ordering. We propose replacing dense vectors with rank-order spike packets: each contact produces a brief burst of neural events where the most strongly activated neuron fires first. The time gap between successive bursts implicitly encodes sensor displacement without explicit coordinate calculations. A biologically motivated learning rule (STDP) encodes traversal direction into synaptic weights. A learnable parameter lambda adjusts reliance on earlier versus recent contacts, adapting to each object's geometry. We derive three testable predictions and specify an implementation of four components in approximately 450 lines of NumPy. Three synthetic experiments confirm the core claims: temporal coding achieves perfect discrimination accuracy on objects with identical features in different spatial arrangements, where dense accumulation performs at chance; temporal coding maintains a 30-50 percentage point advantage across all tested noise levels; the adaptive lambda converges to distinct values, reflecting object geometric complexity. End-to-end evaluation on Monty's YCB benchmark is left for future work.
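
The contrast between order-insensitive dense accumulation and rank-order packets can be sketched in a few lines. This is only a toy illustration under assumed simplifications (plain Python lists, a hypothetical packet size `k`, no spike timing or STDP); it is not the roughly 450-line NumPy implementation the abstract refers to.

```python
def rank_order_packet(activations, k=2):
    """Indices of the k most active features, strongest first.

    Stands in for a spike packet in which the most strongly activated
    neuron fires first. `k` is a hypothetical packet size.
    """
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    return order[:k]

def dense_accumulate(contacts):
    """Order-insensitive baseline: element-wise sum of all contact vectors."""
    return [sum(column) for column in zip(*contacts)]

# Two toy "objects" with identical contacts visited in opposite order.
object_a = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.1]]
object_b = list(reversed(object_a))

sequence_a = [rank_order_packet(c) for c in object_a]
sequence_b = [rank_order_packet(c) for c in object_b]
```

Here `dense_accumulate` returns the same vector for both objects, while the two packet sequences differ, which is the kind of arrangement information a summed dense representation cannot carry.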


2605.22175 2026-05-22 cs.SE cs.AI

SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

Yuxuan Sun, Yuze Zhao, Yufeng Wang, Yao Du, Zhiyuan Ma, Jinbo Wang, Mengdi Zhang, Kai Zhang, Zhenya Huang

Affiliations: State Key Laboratory of Cognitive Intelligence; University of Science and Technology of China; Beihang University; School of Mathematical Sciences, Peking University; NeoShell AI; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

AI summary: This paper introduces SWE-Mutation, a benchmark for evaluating the quality of LLM-generated test suites. It probes test-suite reliability with systematically mutated solutions and finds that current LLMs fall short at generating reliable, discriminative test suites.

Comments: 24 pages, 8 figures

Journal ref: ACL 2026 Findings


Abstract

Evaluating software engineering capabilities has become a core component of modern large language models (LLMs); however, the key bottleneck hindering further scaling lies not in the scarcity of high-quality solutions, but in the lack of high-quality test suites. Test suites are indispensable both for synthesizing program repair trajectories and for providing precise feedback signals in reinforcement learning. Unfortunately, due to the high cost and difficulty of annotation, high-quality test suites have long been hard to obtain, while those automatically generated by LLMs tend to be superficial and lack sufficient discriminative power. As a first step toward constructing high-quality test suites, we introduce SWE-Mutation, a benchmark for evaluating LLM-generated test suites. The benchmark characterizes test suites by introducing systematically mutated solutions that attempt to "fool" the test suites and pass validation. We further propose an agentic, language-agnostic framework for automatically generating complex mutants. Our benchmark consists of 2,636 mutated variants derived from 800 original instances and includes a multilingual subset spanning nine programming languages. Experiments on seven LLMs reveal that even DeepSeek-V3.1 achieves only 10.20% verification and 36.15% detection rates, highlighting the inadequacy of current LLMs. Additionally, our agentic mutation strategy enhances realism, reducing average detection rates from 71.04% to 39.81% compared to conventional methods. These findings expose persistent deficiencies in the ability of current LLMs to generate reliable and discriminative test suites.
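
The idea of scoring a test suite by how many mutated solutions it rejects can be sketched as follows. The definitions are assumptions for illustration: a mutant counts as detected when at least one test fails on it, and the helper names and toy mutants are hypothetical. The benchmark's exact verification and detection metrics may be defined differently.

```python
def suite_passes(tests, solution):
    """True when every test in the suite accepts the given solution."""
    return all(test(solution) for test in tests)

def detection_rate(tests, mutants):
    """Fraction of mutants rejected by the suite (assumed definition)."""
    detected = sum(1 for mutant in mutants if not suite_passes(tests, mutant))
    return detected / len(mutants)

reference = abs  # the correct solution in this toy setting

# Two hypothetical mutants that still behave correctly on positive inputs.
mutants = [
    lambda x: x,                  # forgets to negate negative inputs
    lambda x: abs(x) + (x == 0),  # wrong only at zero
]

weak_suite = [lambda f: f(3) == 3]
strong_suite = weak_suite + [lambda f: f(-3) == 3, lambda f: f(0) == 0]
```

Both suites accept the reference solution, but only the stronger suite rejects the mutants; a superficial suite of the kind the abstract describes lets them pass.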


2605.22122 2026-05-22 cs.CR cs.AI

Adversarial Trust Poisoning in Vehicular Collaborative Perception

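Yutong Liu, Chenyi Wang, Ming F. Li, Qingzhao Zhang

Affiliations: Connected and Autonomous Vehicles; Collaborative Perception

AI summary: This study presents TrustFlip, an attack that exploits consistency-based defenses to poison the trust scores of benign vehicles, degrading the system's perception capability and potentially causing safety failures. It also proposes TrustReflect as a mitigation.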


Abstract

Collaborative perception (CP) enables connected and autonomous vehicles to share sensor data and jointly reason about their environment. To defend against adversaries that fabricate or manipulate shared data, existing systems employ cross-vehicle inconsistency detection and trust estimation, penalizing vehicles whose observations conflict with the majority. In this work, we show that these defenses themselves introduce a new attack surface. We present TrustFlip, a novel attack that weaponizes consistency-based defenses to poison the trust assigned to benign vehicles. Instead of injecting false data into the collaboration pipeline, it deploys physical adversarial objects that are genuine but induce inconsistent observations among benign vehicles. The resulting inconsistencies are misattributed by the defense to the targeted vehicle, causing its trust score to degrade and eventually leading to its downweighting or exclusion from collaboration. Consequently, the system loses reliable sensing contributors, degrading perception capability and potentially inducing safety-critical failures. We evaluate TrustFlip across multiple collaborative perception architectures and defense mechanisms. Our results show that state-of-the-art defenses can be significantly affected: the attack removes the targeted benign vehicle from collaboration in up to 87.7% of scenarios and drops Average Precision (AP) by up to 13%. As an initial mitigation, we introduce TrustReflect, a lightweight self-reflection mechanism that marks disputed regions as uncertain and excludes them from trust evaluation, reducing the attack success rate by 35-100%.


2605.22120 2026-05-22 eess.AS cs.SD

Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation

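Zhiqi Ai, Han Cheng, Shiyi Mu, Xinnuo Li, Yongjin Zhou, Shugong Xu

Affiliations: New York University; Xi’an Jiaotong-Liverpool University

AI summary: This paper proposes the DMA-KWS framework, which uses dual-stage matching, multi-modal enrollment, and continual adaptation to address three problems in user-defined keyword spotting: distinguishing confusable words, inconsistent performance across speakers' pronunciations, and high data cost. On the LibriPhrase Hard subset it achieves 97.85% AUC and 6.13% EER, a leading result.

Comments: 14 pages, 13 figures, 12 tables. Accepted by TASLP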


Abstract

User-defined keyword spotting (KWS) is crucial for personalized voice interaction, yet existing methods face several challenges: (1) insufficient discriminability among confusable words, (2) performance inconsistency across speakers with varying pronunciations, and (3) high data cost to ensure reliable wake-word performance. In this paper, we introduce DMA-KWS, an efficient and robust framework for user-defined keyword spotting. First, it adopts a dual-stage matching pipeline: CTC decoding with streaming phoneme search to locate candidate segments, followed by QbyT with a phoneme matcher for fine-grained verification, enabling it to better distinguish confusable words. Next, multi-modal enrollment fuses user-specific speech with text embeddings to further improve accuracy for registered users. Finally, a parameter-efficient continual adaptation mechanism performs lightweight updates using synthetic and real data. Extensive experiments demonstrate the superior performance of DMA-KWS. On the LibriPhrase Hard subset, it achieves 97.85% AUC and 6.13% EER, reaching state-of-the-art performance. In speaker-dependent settings, DMA-KWS consistently outperforms text-only enrollment, demonstrating significant performance gains. Moreover, the proposed parameter-efficient fine-tuning mechanism adapts DMA-KWS with only 187k updated parameters, further enhancing KWS performance while ensuring suitability for on-device deployment.
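
For readers less familiar with the reported metric, the equal error rate (EER) is the operating point at which the false-acceptance and false-rejection rates coincide. The sketch below is one simple way to approximate it from raw scores; the function name and the threshold sweep over observed scores are assumptions for this example, not the evaluation code used in the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The returned
    value is the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

Lower is better: perfectly separated keyword and non-keyword scores give an EER of zero.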


2605.22112 2026-05-22 astro-ph.HE astro-ph.IM cs.LG

Self-Supervised ConvLSTM for Fermi Large Area Telescope Transient Detection

Alberto Garinei, Stefano Speziali, Alessandro Vispa, Andrea Marini, Sara Cutini, Emanuele Piccioni, Marcello Marconi, Francesco Longo, Matteo Martini, Francesca Fallucchi, Romeo Giuliano, Ernesto William De Luca, Umberto Di Matteo, Sabino Meola

Affiliations: Idea-RE

AI summary: This paper presents an approach that combines end-to-end simulation with self-supervised spatio-temporal deep learning to detect transient gamma-ray phenomena in Fermi-LAT data in a controlled environment. It generates a ten-year synthetic Universe and uses a ConvLSTM network to model the nominal evolution of the sky so that anomalies can be detected.

Comments: 17 pages, 5 figures. Accepted for publication in Astronomy and Computing. Author-accepted manuscript version

Journal ref: Astronomy and Computing 56 (2026) 101128


DOI: 10.1016/j.ascom.2026.101128


Abstract

We present a framework for detecting transient gamma-ray phenomena in a controlled environment by combining end-to-end simulations of the Fermi-LAT sky with self-supervised spatio-temporal deep learning. We generate a ten-year synthetic Universe with gtobssim and process the simulated events into daily all-sky maps of counts and exposure, obtaining a time-ordered sequence that mirrors the structure of Fermi-LAT observations. To model the nominal evolution of the sky, we employ a Convolutional Long Short-Term Memory (ConvLSTM) network that operates directly on map sequences, preserving spatial locality while learning temporal dependencies. The model is trained to reconstruct expected emission, and departures from the learned baseline are quantified through pixel-wise mean-squared residual maps. We then define statistically motivated anomaly criteria by estimating per-pixel thresholds from the residual distribution on the training set, and we enforce spatial coherence via local filtering to suppress isolated fluctuations. The ConvLSTM is then deployed as a trained predictor on Fermi-LAT daily maps, where the sky can depart from the nominal behavior because of genuine astrophysical variability and instrumental non-stationarities. The resulting pipeline flags localized, time-dependent excesses consistent with highly variable sources or transient events (e.g., flares or GRBs) and provides a benchmark for evaluating anomaly-detection strategies on long-duration, Fermi-LAT-like datasets.
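
The two post-processing steps described above, per-pixel thresholds from training residuals and a local filter that suppresses isolated fluctuations, might look like the sketch below. The specific rules are assumptions chosen for illustration (mean plus `k` standard deviations, and a 4-neighbour coherence check); the abstract does not state which statistics or filter the authors use.

```python
from statistics import mean, pstdev

def pixel_thresholds(train_residuals, k=3.0):
    """Per-pixel threshold = mean + k * std of training residuals (assumed rule).

    train_residuals: list of 2D maps (lists of rows) with identical shape.
    """
    rows, cols = len(train_residuals[0]), len(train_residuals[0][0])
    thr = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            series = [m[r][c] for m in train_residuals]
            thr[r][c] = mean(series) + k * pstdev(series)
    return thr

def flag_anomalies(residual, thr, min_neighbors=1):
    """Keep only above-threshold pixels with enough above-threshold 4-neighbours."""
    rows, cols = len(residual), len(residual[0])
    raw = [[residual[r][c] > thr[r][c] for c in range(cols)] for r in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not raw[r][c]:
                continue
            neighbors = sum(
                raw[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            )
            out[r][c] = neighbors >= min_neighbors
    return out
```

An isolated hot pixel is discarded while two adjacent hot pixels survive, which is the intended effect of enforcing spatial coherence.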


2605.22097 2026-05-22 quant-ph cs.LG

Q-PhotoNAS: Hybrid Quantum Neural Architecture Search Framework on Photonic Devices

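Farah Elnakhal, Alberto Marchisio, Nouhaila Innan, Gabriel Falcao, Muhammad Shafique

Affiliations: Quandela Ascella photonic QPU

AI summary: This paper proposes a neural architecture search framework for hybrid photonic quantum-classical models that combines a genetic algorithm with learnable quantum phase encoding. By systematically exploring the joint design space of classical and quantum components, it improves accuracy and hardware compatibility on image classification tasks.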


Abstract

Photonic quantum computing is a promising platform for scalable quantum machine learning, but designing effective hybrid architectures remains challenging under hardware and optimization constraints. Existing approaches rely on manually tuned architectures that fail to account for the collaboration between classical preprocessing, phase encoding, and photonic circuit structure, limiting both accuracy and hardware compatibility. In this paper, we propose a neural architecture search framework for hybrid photonic quantum-classical models that combines genetic algorithm-based search with learnable quantum phase encoding to systematically explore the joint design space of classical and quantum components. Our framework encodes 19 hyperparameters across six gene groups and evolves a population of hybrid architectures using group-based crossover, per-gene mutation, and elitism, evaluating each candidate on a short training budget before full retraining of the best found design. We evaluate our framework on two image classification benchmarks, Digits and MNIST, achieving final validation accuracies of 99.44% and 98.78%, respectively, with first-principles execution time estimates on the Quandela Ascella photonic QPU projecting single-image inference at 67 ms (Digits) and 149 ms (MNIST). Our quantum contribution analysis further shows that the photonic layer extracts non-redundant features orthogonal to the classical pathway, providing a measurable accuracy advantage over classical-only baselines. Our results demonstrate that automated architecture search is both practical and impactful for hybrid photonic systems, opening the way for systematic design space exploration of quantum AI on photonic devices.
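
The evolutionary loop named in the abstract (group-based crossover, per-gene mutation, elitism) can be sketched on a toy search space. Everything concrete below is an assumption: the three gene groups, the gene names and values, the mutation rate, and `toy_fitness`, which stands in for the short-budget training score. The actual framework encodes 19 hyperparameters in six gene groups.

```python
import random

# Hypothetical search space: gene group -> gene -> allowed values.
SPACE = {
    "classical": {"hidden": [16, 32, 64], "dropout": [0.0, 0.1, 0.3]},
    "encoding": {"scale": [0.5, 1.0, 2.0]},
    "circuit": {"modes": [4, 6, 8], "photons": [1, 2, 3]},
}

def random_genome(rng):
    return {g: {k: rng.choice(v) for k, v in genes.items()} for g, genes in SPACE.items()}

def crossover(a, b, rng):
    """Group-based crossover: each gene group is inherited whole from one parent."""
    return {g: dict(rng.choice((a, b))[g]) for g in SPACE}

def mutate(genome, rng, rate=0.1):
    """Per-gene mutation: each gene is independently resampled with probability `rate`."""
    return {
        g: {k: (rng.choice(SPACE[g][k]) if rng.random() < rate else v) for k, v in genes.items()}
        for g, genes in genome.items()
    }

def evolve(fitness, pop_size=12, generations=10, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elitism: the best designs survive unchanged
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history

def toy_fitness(g):
    """Stand-in for a short-budget validation accuracy (purely illustrative)."""
    c, e, q = g["classical"], g["encoding"], g["circuit"]
    return c["hidden"] - 100 * c["dropout"] + q["modes"] * e["scale"]

best, history = evolve(toy_fitness)
```

Because the elite genomes are copied forward unchanged and the fitness is deterministic, the best score in `history` never decreases from one generation to the next.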


2605.22095 2026-05-22 econ.GN cs.AI cs.GT cs.HC q-fin.EC

Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament

Dmitry Dagaev, Egor Ivanov, Petr Parshakov, Alexey Savvateev, Gleb Vasiliev

Affiliations: HSE University; New Economic School; Central Economic Mathematical Institute, Russian Academy of Sciences; Adyghe State University; Moscow Institute of Physics and Technology; Innopolis University

AI summary: Using Colonel Blotto tournaments, the study compares the strategic performance of humans and LLMs. Humans more often use well-calibrated intermediate-level allocation heuristics, while the simpler strategies submitted by LLMs perform worse.


Abstract

The emergence of large language models (LLMs) has spurred economists to study how humans and LLMs behave in strategic settings. We organized a series of round-robin tournaments in the Colonel Blotto game. This game attracts game theorists' attention due to its high-dimensional action space and the absence of pure strategy Nash equilibria. In the first tournament, more than 200 human participants competed against one another. In the second tournament, several popular LLMs were invited to submit strategies. In the third tournament, we matched the number of LLM strategies to the number submitted by humans. We find that humans more often employ better-calibrated intermediate-level allocation heuristics and outperform the simpler, more stereotyped strategies submitted by LLMs. Strategic sophistication is key to success if and only if the necessary level of reasoning depth is reached, while lower and higher levels of reasoning offer no clear advantage over the primitive strategies. Among humans, field of study weakly predicts success: participants with STEM backgrounds perform better in the first tournament. Surprisingly, humans hardly adjust their strategies across tournaments with different sets of opponents. This result suggests that humans base their choices primarily on the game's rules rather than on the identity of their opponents, treating LLMs much like human competitors.
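
A round-robin Colonel Blotto tournament can be sketched as below. The scoring rule (one point per battlefield won, half a point per tie), the budget of 100 troops over five battlefields, and the three named strategies are assumptions for illustration; the abstract does not specify the tournament's exact rules.

```python
from itertools import combinations

def blotto_score(a, b):
    """Points for allocation a against b: 1 per battlefield won, 0.5 per tie."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x, y in zip(a, b))

def round_robin(strategies):
    """Total score of each named strategy after playing every other one once."""
    totals = {name: 0.0 for name in strategies}
    for p, q in combinations(strategies, 2):
        totals[p] += blotto_score(strategies[p], strategies[q])
        totals[q] += blotto_score(strategies[q], strategies[p])
    return totals

# Hypothetical allocations of 100 troops across 5 battlefields.
strategies = {
    "uniform": [20, 20, 20, 20, 20],
    "focused": [34, 33, 33, 0, 0],
    "skewed": [25, 25, 25, 25, 0],
}
totals = round_robin(strategies)
```

In this toy field the uniform split, arguably the most stereotyped choice, finishes last, because both alternatives concede some battlefields in order to win a majority of the rest.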


2605.21379 2026-05-22 cs.NE cs.AI

How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

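Hiroyuki Chuma, Kanji Otsuk, Yoichi Sato

Affiliations: Institute of Innovation Research, Hitotsubashi University; Meisei University; Shuhari System

AI summary: This paper proposes an algebro-deterministic substrate over Galois fields that realizes the three core components of a cognitive architecture identified by Marcus, and shows how the architecture applies to logical reasoning and semantic discrimination.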


Abstract

In The Algebraic Mind, Gary Marcus identified three components essential for any adequate cognitive architecture: operations over variables, recursively structured representations, and a distinction between mental representations of individuals and kinds. He argued that standard multilayer perceptrons supported none of these, acknowledging that a neural implementation using registers and treelets, constructed via developmental programs rather than gradient descent, remained a programmatic conjecture. Twenty-five years later, the required substrate is now available. Our newly developed PyVaCoAl/VaCoAl is a hyperdimensional computing architecture organized end-to-end around a single algebraic primitive: XOR-and-shift over GF(2), implemented by primitive-polynomial linear-feedback shift registers. The architecture supports reversible variable binding via Bind(R,F) = R XOR shift(F), non-commutative compositional bundling that distinguishes "the dog bites the man" from "the man bites the dog," and address-space individual/kind separation under the same algebra. A companion perspective argues that the dentate gyrus-CA3 circuit is a biological homologue of this same engine, with developmentally specified mossy-fiber targeting supplying the innate microcircuitry Marcus anticipated. In this paper, we map the correspondence between Marcus's three pillars and the operational commitments of PyVaCoAl/VaCoAl. We reinterpret the treelet as an algebraic register set indexed by a primitive generator polynomial, arguing that this architecture provides a functional neural substrate meeting Marcus's specifications far more closely than the tensor products, circular convolution, or temporal synchrony available in 2001. We also demonstrate how this substrate naturally extends to Pearl's rung-3 counterfactual reasoning, a capability the original treelet program did not directly target.
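
The binding rule Bind(R,F) = R XOR shift(F) can be tried out on small bit vectors. The sketch below is not PyVaCoAl/VaCoAl: it assumes 16-bit integers and uses a one-bit cyclic rotation as the shift, whereas the abstract describes shifts implemented by primitive-polynomial linear-feedback shift registers. All function names are hypothetical.

```python
N_BITS = 16
MASK = (1 << N_BITS) - 1

def rotl(x, k=1):
    """Cyclic left rotation of an N_BITS-wide word (assumed stand-in for the shift)."""
    k %= N_BITS
    return ((x << k) | (x >> (N_BITS - k))) & MASK

def rotr(x, k=1):
    """Cyclic right rotation, the inverse of rotl."""
    return rotl(x, N_BITS - (k % N_BITS))

def bind(role, filler):
    """Bind(R, F) = R XOR shift(F), computed bitwise over GF(2)."""
    return role ^ rotl(filler)

def unbind(bound, role):
    """Recover the filler: XOR the role back out, then undo the shift."""
    return rotr(bound ^ role)

role, filler = 0xBEEF, 0x1234
bound = bind(role, filler)
```

Two properties from the abstract show up directly: `unbind(bound, role)` returns the original filler, so binding is reversible, and `bind(role, filler)` differs from `bind(filler, role)`, so the operation is order-sensitive in the way needed to tell "the dog bites the man" from "the man bites the dog".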


2605.20348 2026-05-22 q-fin.CP cs.AI

Memory-Induced Supra-Competitive Outcomes Between Deep Reinforcement Learning Agents in Optimal Trade Execution

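Christos Spyridon Koulouris, Carlo Campajola

Affiliations: Institute of Finance and Technology, University College London; UZH Blockchain Center

AI summary: This paper investigates whether deep reinforcement learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, that is, achieve lower implementation shortfalls than the game-theoretic competitive benchmark. It studies a two-agent Almgren-Chriss liquidation game and examines how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. Ex-ante schedule-learning agents are first used to remove intra-episode feedback and isolate what happens when agents commit to complete liquidation trajectories before execution begins; agents are then allowed to condition on the evolving state using several DDQN architectures. When agents can access intra-episode history, especially recent prices and their own past actions, supra-competitive outcomes become more frequent and more persistent. The findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.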


Abstract

In this paper, we investigate whether deep reinforcement-learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, in the sense of achieving lower implementation shortfalls than the relevant game-theoretical competitive benchmark. We study a two-agent Almgren-Chriss liquidation game and examine how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. We first use ex-ante schedule-learning agents to remove intra-episode feedback and isolate what can arise when agents commit to complete liquidation trajectories before execution begins. We then allow agents to condition on the evolving state using a variety of DDQN architectures. We find that, when agents are given access to intra-episode history, especially recent prices and own past actions, supra-competitive outcomes become substantially more frequent and more persistent. These findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or by current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.


2605.19354 2026-05-22 eess.IV cs.CV

Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction

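Yilmaz Korkmaz, Vishal M. Patel

Affiliations: Johns Hopkins University

AI summary: This paper proposes autoregressive next-acceleration-scale prediction in a discrete multi-scale latent space and introduces privileged information distillation to improve MRI reconstruction under extreme undersampling.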


Abstract

MRI reconstruction is an inherently ill-posed inverse problem, since incomplete measurements admit many plausible solutions. This ambiguity becomes more severe under high acceleration, where pixel-domain continuous predictors tend to average over feasible reconstructions and suppress high-frequency anatomy. We address this limitation by moving reconstruction to a discrete multi-scale latent space and posing it as autoregressive next-acceleration-scale prediction. Leveraging discrete priors proven effective in visual autoregressive modeling, our method restricts the solution to compact sequences of codebook tokens, enabling sharp reconstructions even from extremely sparse measurements. This discrete autoregressive formulation also aligns naturally with modern large language model post-training techniques. Building on this observation, we introduce on-policy privileged information distillation for visual autoregressive modeling, where a teacher is provided with training-only privileged context that is unavailable at inference, in our case fully sampled acquisitions, and supervises a student trained on its own rollouts, leading to consistent reconstruction gains. Through extensive experiments on the fastMRI benchmark, we show that our approach delivers improved reconstruction performance across diverse sampling patterns under extreme undersampling. The project website is https://yilmazkorkmaz1.github.io/discrete-mri-reconstruction-opd/.


2605.19152 2026-05-22 stat.ML cs.ET cs.IT cs.LG cs.NE math.IT physics.optics

Information Processing Capacity of Stationary Physical Systems: Theory, Data-efficient Estimation Methods, and Photonic Demonstration

平稳物理系统的信息处理能力：理论、数据高效估计方法与光子学演示

Rahul Uma Ramachandran, Serge Massar

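Affiliations: Laboratoire d’Information Quantique CP224, Université libre de Bruxelles

AI summary: This paper studies the information processing capacity (IPC) of stationary physical systems, presents a theoretical framework, develops data-efficient estimation methods, and validates them experimentally on a photonic computing system.

Comments: added 2 new references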

AI中文摘要

物理计算系统为实现硬件原生机器学习提供了有前景的途径，但其计算能力仍难以用有原则、与任务无关且数据高效的方式加以刻画。我们将信息处理能力（IPC）框架扩展到平稳物理计算系统，并建立了若干基本结果：单个容量介于零和一之间，其在完备基上的总和以读出数量为上界，且噪声会严格降低这一上界。我们研究了IPC的有限样本估计，并推导出影响朴素估计器的系统性正偏差的渐近形式。基于这些结果，我们提出了基于Richardson外推和Sobol准随机采样的数据高效估计方法。我们利用一个基于皮秒激光脉冲在非线性光纤中传播的光子计算系统，对该框架进行了实验验证。通过改变激光功率和光纤长度，我们观察到由Kerr效应引起的IPC分布向高阶非线性容量的系统性偏移。最后，我们证明总IPC与基准机器学习任务上的性能强相关，并能可靠估计系统的有效维度。这些结果确立了IPC作为连接物理计算系统内在动力学与其机器学习性能的实用桥梁。

英文摘要

Physical computing systems provide a promising route toward hardware-native machine learning, but their computational capabilities remain difficult to characterize in a principled, task-independent, and data-efficient way. We extend the Information Processing Capacity (IPC) framework to stationary physical computing systems and establish several fundamental results: individual capacities are bounded between zero and one, their sum over a complete basis is bounded by the number of readouts, and noise strictly reduces this bound. We address the finite-sample estimation of IPC and derive the asymptotic form of the systematic positive bias affecting naive estimators. Building on these results, we introduce data-efficient estimation methods based on Richardson extrapolation and Sobol quasi-random sampling. We validate the framework experimentally using a photonic computing system based on picosecond laser pulses propagating through a nonlinear optical fibre. By varying the laser power and fibre length, we observe systematic shifts of the IPC distribution toward higher-order nonlinear capacities induced by the Kerr effect. Finally, we demonstrate that the total IPC strongly correlates with performance on benchmark machine-learning tasks and provides a reliable estimate of the effective dimensionality of the system. These results establish IPC as a practical bridge between the intrinsic dynamics of physical computing systems and their machine-learning performance.
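
The abstract above mentions Richardson extrapolation as a way to remove the positive finite-sample bias of naive IPC estimators. The sketch below illustrates the general idea only. It assumes, purely for illustration, that the leading bias term scales as 1/N in the number of samples N; the exact asymptotic form derived in the paper is not given in the abstract, and the function names are hypothetical.

```python
def biased_estimate(true_capacity: float, bias_coeff: float, n_samples: int) -> float:
    """Toy model of a naive estimator: true value plus a positive bias ~ bias_coeff / N.

    This stands in for a real capacity measurement; it is an assumption, not the paper's model.
    """
    return true_capacity + bias_coeff / n_samples


def richardson_capacity(estimate_full: float, estimate_half: float) -> float:
    """Combine estimates from N and N/2 samples so that a bias term proportional to 1/N cancels.

    If est(N) = C + b/N, then est(N/2) = C + 2b/N and 2*est(N) - est(N/2) = C.
    """
    return 2.0 * estimate_full - estimate_half


if __name__ == "__main__":
    full = biased_estimate(0.3, 5.0, 1000)   # estimate from all samples
    half = biased_estimate(0.3, 5.0, 500)    # estimate from half of the samples
    print(full, half, richardson_capacity(full, half))
```

In practice the two inputs would be measured capacities computed on the full data set and on a half-size subset, so the corrected value still carries statistical noise that this toy model ignores.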


2605.18372 2026-05-22 cs.HC cs.AI cs.CY cs.ET

The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration

上下文谄媚的隐性成本：人类-人工智能协作中的AI素养干预

Cansu Koyuturk, Sabrina Guidotti, Dimitri Ognibene

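Affiliations: Università degli Studi di Milano-Bicocca

AI summary: This study examines how contextual sycophancy arises in human-AI collaboration and tests whether an intervention that raises AI literacy and prompting skills can mitigate it. AI feedback quality is significantly affected by the propagation of user errors, which points to the need for system-level improvements that promote critical engagement.

Comments: SPRINGER AIED 2026: Accepted for LBR, poster presentation at the 27th International Conference on Artificial Intelligence in Education, 27 Jun - 3 Jul 2026, Seoul, Republic of Korea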

AI中文摘要

大型语言模型（LLMs）在教育领域日益被用作交互工具进行协作。然而，其倾向于谄媚，即使在错误时也迎合用户信念，这引发了学习和决策的担忧，尤其是对知识较少的用户。本研究调查了在真实多轮人类-人工智能交互中谄媚对齐如何产生，并探讨了针对提高AI素养和提示能力的干预是否能减轻其影响。在受控混合设计实验中，60名参与者通过先生成个人排名再与AI助手协作进行分析生存排名任务，分别在干预前和干预后接受一般或谄媚聚焦的提示训练。初步结果显示，LLMs对用户输入高度敏感：低质量的初始响应导致较差的AI建议，表明模型镜像或整合了用户推理而非纠正或提供缺失或较少见的替代方案。关键的是，用户错误向AI响应的传播显著降低了AI反馈质量和最终用户任务表现，揭示了一种上下文谄媚依赖现象。尽管干预未能消除上下文错误的传播，但显著提高了AI建议质量，通过减少直接镜像错误用户排名。这些发现表明，提示和AI素养单独可能不足以确保知识上独立的AI支持，强调了需要系统层面方法以促进人类-人工智能协作中的批判性参与。

英文摘要

Large Language Models (LLMs) are increasingly used in educational settings as interactive tools for collaboration. However, their tendency toward sycophancy, aligning with user beliefs even when incorrect, raises concerns for learning and decision-making, especially for less knowledgeable users. This study investigates how sycophantic alignment emerges in authentic multi-turn human-AI interactions and whether interventions targeting increasing AI literacy and prompting competencies can mitigate its effects. In a controlled mixed-design experiment, 60 participants completed analytical survival ranking tasks by first generating individual rankings and then making final decisions after collaborating with an AI assistant, both before and after receiving either general or sycophancy-focused prompting training. Preliminary results show that LLMs are highly sensitive to user input: lower-quality initial responses lead to poorer AI advice, suggesting that the model mirrors or incorporates user reasoning rather than correcting it or offering better alternatives that are missing or less frequent in the conversation. Critically, the propagation of user errors into AI responses significantly reduced both the quality of AI feedback and final user task performance, revealing a form of contextual sycophantic dependence. While the intervention did not eliminate the propagation of contextual errors, it significantly improved AI advice by reducing the direct mirroring of incorrect user rankings. These findings suggest that prompting and AI literacy alone may be insufficient to ensure epistemically independent AI support, highlighting the need for system-level approaches that better promote critical engagement in human-AI collaboration.


2605.12456 2026-05-22 cs.CR cs.CL cs.LG

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

TextSeal: 一种用于溯源与蒸馏保护的本地化大语言模型水印

Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez

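Affiliations: FAIR, Meta Superintelligence Labs

AI summary: This paper presents TextSeal, a watermark for large language models. Building on Gumbel-max sampling, it introduces dual-key generation to restore output diversity, together with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction and adds no inference overhead. It strictly dominates baselines such as SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in mixed human/AI documents. The scheme is theoretically distortion-free; evaluation on reasoning benchmarks confirms that it preserves downstream performance, and a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.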

AI中文摘要

我们介绍TextSeal，一种最先进的大语言模型水印。基于Gumbel-max采样，TextSeal引入双密钥生成以恢复输出多样性，同时结合熵加权评分和多区域定位以提升检测性能。它支持推测解码和多令牌预测等服务优化，并不增加任何推理开销。TextSeal在检测强度上严格优于基线方法如SynthID-text，并对稀释具有鲁棒性，即使在混合的人类/AI文档中也能保持自信的本地化检测。该方案在理论上是无失真的，经推理基准评估确认其保持下游性能；同时通过多语言人工评估（6000次A/B对比，5种语言）显示无明显质量差异。除了用于溯源检测外，TextSeal还具有'放射性'特性：其水印信号通过模型蒸馏传递，可检测未经授权的使用。

英文摘要

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.
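
For readers unfamiliar with the Gumbel-max sampling that TextSeal builds on, here is a minimal sketch of a generic keyed Gumbel-max watermark. It is not TextSeal itself: dual-key generation, entropy-weighted scoring, and multi-region localization are not shown. The hashing scheme, the context window, the uniform toy distribution, and all function names are assumptions made for illustration.

```python
import hashlib
import math


def keyed_uniform(key: str, context: tuple, token: int) -> float:
    """Pseudo-random value in (0, 1) derived from the secret key, the context, and a token id."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so the result stays below 1
    return (value + 0.5) / 2**52


def sample_token(probs: list, key: str, context: tuple) -> int:
    """Gumbel-max trick in exponential form: pick argmax_i log(r_i) / p_i.

    Over random r_i this selects token i with probability p_i, so sampling is unbiased.
    """
    best, best_score = -1, -math.inf
    for token, p in enumerate(probs):
        if p <= 0.0:
            continue
        score = math.log(keyed_uniform(key, context, token)) / p
        if score > best_score:
            best, best_score = token, score
    return best


def generate_watermarked(length: int, vocab_size: int, key: str, window: int = 4) -> list:
    """Generate tokens from a toy uniform distribution (a stand-in for a real language model)."""
    tokens = list(range(window))  # fixed toy prompt
    probs = [1.0 / vocab_size] * vocab_size
    while len(tokens) < length:
        tokens.append(sample_token(probs, key, tuple(tokens[-window:])))
    return tokens


def detection_score(tokens: list, key: str, window: int = 4) -> float:
    """Mean of -log(1 - r) over generated tokens; about 1 without the watermark, larger with it."""
    scores = [
        -math.log(1.0 - keyed_uniform(key, tuple(tokens[i - window:i]), tokens[i]))
        for i in range(window, len(tokens))
    ]
    return sum(scores) / max(1, len(scores))


if __name__ == "__main__":
    text = generate_watermarked(200, 256, "secret")
    print(detection_score(text, "secret"), detection_score(text, "other-key"))
```

Scoring with the correct key should give a clearly higher value than scoring with any other key, which is what makes detection possible without access to the model.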


2605.07985 2026-05-22 cs.DC cs.AI

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

Dooly：面向LLM推理模拟的配置无关、冗余感知性能剖析（profiling）

Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda, Daehyeok Kim

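Affiliations: University of Texas at Austin

AI summary: This paper presents Dooly, a configuration-agnostic, redundancy-aware profiler for LLM inference simulation. A single inference pass with taint-based labeling of input dimensions avoids redundant profiling, reducing profiling cost while keeping simulation accuracy high.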

AI中文摘要

选择最优的LLM推理配置需要在硬件、服务引擎、注意力后端和模型架构之间进行评估，因为没有任何单一选择能在所有工作负载下表现最佳。基于性能剖析（profiling）的模拟器是标准工具，但它们把操作集硬编码到特定配置，并且每次都从头重新剖析每个操作，使得配置探索的成本高得难以承受。这种成本源于缺乏对结构的理解：每个操作的每个输入维度要么由模型配置固定，要么由到来的请求决定。许多模型配置值（例如头大小、层数）在不同模型中重复出现，因此同一操作会在许多配置中运行；对依赖请求的维度做一次扫描即可服务于所有这些配置。我们提出了Dooly，利用这一结构实现配置无关、冗余感知的性能剖析。Dooly执行一次推理，通过污点传播为每个输入维度标注其来源，并且只剖析延迟数据库中尚不存在的操作；注意力等有状态操作通过复用服务引擎自身的初始化代码来隔离，从而无需手动插桩。它基于该数据库构建延迟回归模型，这些模型可作为现有模拟器的即插即用后端。在两个GPU平台、三个注意力后端和多种模型架构上，Dooly的模拟精度达到TTFT的MAPE在5%以内、TPOT在8%以内，同时与现有剖析方法相比，将12个模型的剖析GPU小时数减少了56.4%。我们已在 https://github.com/dooly-project 开源Dooly。

英文摘要

Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each operation is fixed by the model configuration or determined by the incoming request. Many model-configuration values (e.g., head size, layer count) recur across models, so the same operation runs in many configurations; a single sweep over the request-dependent dimensions can serve them all. We present Dooly, which exploits this structure to achieve configuration-agnostic, redundancy-aware profiling. Dooly performs a single inference pass, labels each input dimension with its origin via taint propagation, and selectively profiles only operations absent from its latency database; stateful operations such as attention are isolated by reusing the serving engine's own initialization code, eliminating manual instrumentation. It builds latency regression models based on the database, which becomes a drop-in backend for existing simulators. Across two GPU platforms, three attention backends, and diverse model architectures, Dooly achieves simulation accuracy within 5% MAPE for TTFT and 8% for TPOT while reducing profiling GPU-hours by 56.4% across 12 models compared to the existing profiling approach. We have open-sourced Dooly at https://github.com/dooly-project.
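
The redundancy-aware idea in the abstract above (profile an operation only when it is absent from the latency database) can be sketched as a keyed cache. This is an illustration under stated assumptions, not Dooly's implementation: the class, the key layout, and the `fake_profile` stand-in are hypothetical, and taint propagation and the regression models are not shown.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical key: operation name plus the dimensions fixed by the model configuration.
OpKey = Tuple[str, Tuple[int, ...]]


class LatencyDatabase:
    """Caches measured latencies so identical operations are profiled once across models."""

    def __init__(self) -> None:
        self._latency_ms: Dict[OpKey, float] = {}
        self.profile_runs = 0  # number of times real profiling was triggered

    def latency(
        self,
        op: str,
        config_dims: Sequence[int],
        profile_fn: Callable[[str, Sequence[int]], float],
    ) -> float:
        key = (op, tuple(config_dims))
        if key not in self._latency_ms:
            # Only operations missing from the database are profiled.
            self._latency_ms[key] = profile_fn(op, config_dims)
            self.profile_runs += 1
        return self._latency_ms[key]


def fake_profile(op: str, config_dims: Sequence[int]) -> float:
    """Stand-in for a GPU measurement; returns a made-up latency in milliseconds."""
    return 0.01 * sum(config_dims)


if __name__ == "__main__":
    db = LatencyDatabase()
    # Two hypothetical models sharing head size 128 and 32 heads reuse one measurement.
    db.latency("attention", (128, 32), fake_profile)
    db.latency("attention", (128, 32), fake_profile)
    db.latency("attention", (64, 32), fake_profile)
    print(db.profile_runs)
```

Keying on configuration-fixed dimensions is what lets a measurement taken for one model be reused for another that shares the same values.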


2605.07870 2026-05-22 cond-mat.dis-nn cs.AI stat.ML

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

深度网络中的谱动力学：特征学习、异常值逃逸和学习率转移

Clarissa Lauditi, Cengiz Pehlevan, Blake Bordelon

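Affiliations: John A. Paulson School of Engineering and Applied Sciences, Harvard University; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University; Center for Mathematical Sciences and Applications, Harvard University; Oden Institute for Computational Engineering and Sciences & Dept. of Neuroscience, UT Austin

AI summary: This paper studies how hidden-weight spectra evolve in wide neural networks trained by (stochastic) gradient descent. It develops a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. The framework is applied to two settings: (1) infinite-width nonlinear networks in mean-field/μP scaling and (2) deep linear networks in the proportional high-dimensional limit. The theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, μP yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. The bulk+outlier picture describes simple tasks, whereas tasks with many outputs (such as ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. A toy model with extensive output channels reproduces this phenomenon, and the edge of the spectrum still converges for sufficiently wide networks.

Comments: Updating related works + discussion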

AI中文摘要

我们研究了在宽神经网络中通过（随机）梯度下降训练时隐藏权重谱的演变。我们开发了一种双层动态平均场理论（DMFT），该理论联合跟踪具有尖峰集合的隐藏权重谱动态，其中尖峰方向在随机体上保持统计依赖性。我们将该框架应用于两种设置：（1）无限宽度非线性网络在均值场/μP缩放下，以及（2）深度线性网络在比例高维极限下，其中宽度、输入维度和样本大小以固定比例发散。我们的理论预测了异常值如何随训练时间、宽度、输出尺度和初始化方差演变。在深度线性网络中，μP产生与宽度一致的异常值动态和超参数转移，包括主导NTK模式向稳定性边缘（EoS）的宽度稳定增长。相比之下，NTK参数化表现出强烈依赖宽度的异常值动态，尽管收敛到一个稳定的宽网络极限。我们展示了这种体+异常值图像是描述简单任务的，但涉及大量输出的任务（如ImageNet分类或GPT语言建模）则更适合通过重构谱体来描述。我们开发了一个具有大量输出通道的玩具模型，重现了这一现象，并展示了足够宽的网络下谱边缘仍会收敛。

英文摘要

We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/μP scaling and (2) deep linear networks in the proportional high-dimensional limit, where width, input dimension, and sample size diverge with fixed ratios. Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, μP yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. We show that this bulk+outlier picture is descriptive of simple tasks with small output channels, but that tasks involving large numbers of outputs (ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. We develop a toy model with extensive output channels that recapitulates this phenomenon and show that the edge of the spectrum still converges for sufficiently wide networks.


2605.06669 2026-05-22 cs.CR cs.AI cs.LG

Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs

评估教育LLM导师的提示注入防御：安全-可用性-延迟的权衡

Alexandre Cristovão Maiorano

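Affiliations: Lumytics

AI summary: This paper presents a framework for evaluating prompt-injection defenses, examines the trade-offs among security, usability, and latency in educational LLM tutors, and compares several defense mechanisms experimentally.

Comments: 19 pages, 4 figures, 9 tables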

AI中文摘要

教育LLM导师面临一个核心的AI对齐挑战：它们必须在遵循用户意图的同时保持教学约束和安全政策。我们提出了一个评估方法，用于评估提示注入防御在该场景中的表现，显示了防护栏设计在对抗性鲁棒性、良性任务可用性和响应延迟之间存在显式的权衡。我们评估了一个领域特定的多层安全防护流水线，结合确定性模式过滤器、结构验证、上下文沙箱和会话级行为检查。在受控的保留基准测试中，该流水线实现了低绕过率和假阳性率，同时优化了平均延迟——一个优先考虑教学可用性（零假阳性）而保持可测量攻击抵抗力的操作点。我们提供了一个可重复的基准测试协议，用于在相同条件下进行头对头比较，包括分层Bootstrap置信区间、配对McNemar显著性检验、多种子敏感度扫描，以及在相同划分上对Prompt Guard和NeMo Guardrails的直接评估。结果揭示了操作权衡：NeMo在16.22%的假阳性率下达到0%的绕过率，而Prompt Guard在3.60%的假阳性率下达到38.48%的绕过率。该框架支持在不同机构风险和可用性要求下，基于证据的防护栏选择。

英文摘要

Educational LLM tutors face a core AI alignment challenge: they must follow user intent while preserving pedagogical constraints and safety policies. We present an evaluation methodology for prompt-injection defenses in this setting, showing that guardrail design entails explicit trade-offs among adversarial robustness, benign-task usability, and response latency. We evaluate a domain-specific multi-layer safeguard pipeline combining deterministic pattern filters, structural validation, contextual sandboxing, and session-level behavioral checks. On a controlled holdout benchmark, the pipeline reaches low bypass and false positive rates with optimized average latency - an operating point that prioritizes pedagogical usability (zero false positives) while maintaining measurable attack resistance. We provide a reproducible benchmark protocol for head-to-head comparison under identical conditions, including stratified bootstrap confidence intervals, paired McNemar significance tests, multi-seed sensitivity sweeps, and direct evaluation of Prompt Guard and NeMo Guardrails on the same split with unified instrumentation. Results expose operational trade-offs: NeMo reaches 0 percent bypass at 16.22 percent FPR and roughly 1.5s latency, while Prompt Guard yields 38.48 percent bypass with 3.60 percent FPR. The framework supports evidence-based guardrail selection for AI tutoring systems under different institutional risk and usability requirements.
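
The two headline metrics in the abstract above (bypass rate and false positive rate) and the paired McNemar test can be sketched in a few lines. This is a generic illustration with made-up inputs, not the paper's benchmark code; the function names are hypothetical, and the exact binomial form of McNemar's test is assumed here because the abstract does not say which variant was used.

```python
import math
from typing import Sequence, Tuple


def bypass_and_fpr(attack_blocked: Sequence[bool], benign_blocked: Sequence[bool]) -> Tuple[float, float]:
    """Bypass rate = share of attacks NOT blocked; FPR = share of benign prompts blocked."""
    bypass = 1.0 - sum(attack_blocked) / len(attack_blocked)
    fpr = sum(benign_blocked) / len(benign_blocked)
    return bypass, fpr


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b = cases only defense A got right, c = cases only defense B got right.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(bypass_and_fpr([True, True, False, True], [False, False, True, False, False]))
    print(mcnemar_exact(0, 5))
```

Only the discordant pairs enter the test, which is why it suits head-to-head comparison of two defenses on the same split.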


2605.01369 2026-05-22 eess.SP cs.AI cs.LG

MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation

MU-SHOT-Fi: 基于源无关无监督域适应的多用户Wi-Fi感知

Ahmed Y. Radwan, Hina Tabassum

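Affiliations: Department of Electrical Engineering and Computer Science

AI summary: This paper proposes MU-SHOT-Fi, a source-free unsupervised domain adaptation framework that achieves accurate activity classification and occupancy estimation in single- and multi-user Wi-Fi sensing while preventing model collapse.

Journal ref: IEEE Internet of Things Journal, Early Access, 2026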

DOI: 10.1109/JIOT.2026.3686090

AI中文摘要

深度学习已被广泛应用于基于Wi-Fi CSI的人体活动识别（HAR），因为它能够以隐私保护和成本效益的方式学习时空特征。然而，基于深度学习的模型在跨环境泛化能力差，特别是在多用户设置中，重叠活动导致CSI纠缠和域偏移。实际部署通常由于隐私限制限制访问标记源数据，这促使使用仅未标记目标域CSI和预训练源模型进行源无关适应。在本文中，我们提出了MU-SHOT-Fi，一种用于单用户和多用户Wi-Fi感知的源无关无监督域适应框架。MU-SHOT-Fi在源训练期间采用排列不变的集合预测与匈牙利匹配，随后在目标域中采用冻结分类器骨干适应。为了实现无标签的稳定适应，我们引入了占用加权信息最大化，通过将多样性正则化集中在可能占用的槽位上，同时排除主导类别的边际熵。此外，我们采用二进制旋转预测作为空间自监督，利用CSI频率-时间结构学习域不变特征。对于单用户场景，我们引入SU-SHOT-Fi，通过将占用加权替换为标准信息最大化，并结合对比预测编码以利用时间一致性。在WiMANS和Widar 3.0数据集上进行了广泛的实验，涵盖了跨环境、跨频率、跨方向和组合域偏移，证明MU-SHOT-Fi在大域偏移下有效恢复多用户精确活动分类性能，同时保持准确的占用估计并防止向主导类崩溃。

英文摘要

Deep learning has been widely adopted for WiFi CSI-based human activity recognition (HAR) due to its ability to learn spatio-temporal features in a privacy-preserving and cost-effective manner. However, DL-based models generalize poorly across environments, a challenge amplified in multi-user settings where overlapping activities cause CSI entanglement and domain shifts. Practical deployments often limit access to labeled source data due to privacy constraints, motivating source-free adaptation using only unlabeled target-domain CSI and a pre-trained source model. In this paper, we propose MU-SHOT-Fi, a source-free unsupervised domain adaptation framework for single- and multi-user Wi-Fi sensing. MU-SHOT-Fi employs permutation-invariant set prediction with Hungarian matching during source training, followed by frozen-classifier backbone adaptation in the target domain. To enable stable adaptation without labels, we introduce occupancy-weighted information maximization that prevents model collapse by focusing diversity regularization on likely-occupied slots while excluding the dominant class from marginal entropy. Additionally, we employ binary rotation prediction as spatial self-supervision that exploits CSI frequency-time structure to learn domain-invariant features. For single-user scenarios, we introduce SU-SHOT-Fi by replacing occupancy weighting with standard information maximization and incorporating contrastive predictive coding to exploit temporal consistency. Extensive experiments on the WiMANS and Widar 3.0 datasets across cross-environment, cross-frequency, cross-orientation, and combined domain shifts demonstrate that MU-SHOT-Fi effectively recovers multi-user exact-activity classification performance under large domain shifts while maintaining accurate occupancy estimation and preventing collapse toward dominant classes.


2605.00515 2026-05-22 cs.DC cs.AI cs.NI

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

SpaceMoE：在空间网络上实现分布式混合专家推理

Zhanwei Wang, Huiling Yang, Min Sheng, Khaled B. Letaief, Kaibin Huang

Affiliations: Department of Electrical and Computer Engineering, The University of Hong Kong (HKU); State Key Laboratory of Integrated Service Networks, Institute of Information Science, Xidian University; Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology (HKUST)

AI summary: This paper proposes the SpaceMoE framework to address the challenge of efficiently deploying large-scale LLMs over satellite networks. A two-level placement strategy (layer placement and intra-layer expert placement) reduces latency for mixture-of-experts inference in space.

AI中文摘要

利用高效的连续太阳能采集，空间数据中心被视为执行高能耗大语言模型（LLMs）的有前途的平台。鉴于这一优势，航天和人工智能 conglomerates（如SpaceX、Google）正在积极投资这一愿景。然而，一个关键挑战是由于卫星上的计算和通信资源有限，高效地在卫星网络中部署大规模LLM。这导致了一个放置问题，需要将模型组件划分为卫星，以确保不同的模型架构和网络拓扑能够协调一致，从而实现低延迟的token生成。为了解决这个问题，我们提出了混合专家（MoE）的空间网络（SpaceMoE）框架，旨在在空间中分布式执行流行的混合专家模型。所提出的放置策略是两级的：（1）层放置，将MoE层分配给卫星子网；（2）层内专家放置，将单个专家分配给同一层/子网的卫星。对于层放置，我们利用自回归推断的环形通信模式，将卫星星座沿轨道方向划分为子网，每个子网托管一个MoE层。基于此架构，我们制定了并解决了层内专家放置的优化问题，以将具有异构激活概率的专家映射到卫星上。推导出的策略揭示了一个直观的原则：频繁激活的专家应映射到具有低预期延迟的路由路径上的卫星。实验表明，SpaceMoE在千卫星星座上实现了至少三倍于传统随机和消融放置策略的延迟降低。

英文摘要

Leveraging continuous solar energy harvesting at high efficiency, space data centers are envisioned as a promising platform for executing energy-intensive large language models (LLMs). Recognizing this advantage, space and AI conglomerates (e.g., SpaceX, Google) are actively investing in this vision. One key challenge, however, is the efficient distributed deployment of a large-scale LLM in a satellite network due to the limited onboard computing and communication resources. This gives rise to a placement problem that involves partitioning and mapping model components to satellites such that the fundamentally different model architecture and network topology can be reconciled to ensure low-latency token generation. To address this problem, we present the Space Network of Mixture-of-Experts (SpaceMoE) framework targeting the distributed execution of a popular mixture-of-experts (MoE) model in space. The proposed placement strategies are two-level: (1) layer placement, which assigns MoE layers to satellite subnets; and (2) intra-layer expert placement, which assigns individual experts to satellites associated with the same layer/subnet. For layer placement, we exploit the ring-like communication pattern of autoregressive inference to partition the satellite constellation along the orbiting direction into subnets arranged on a ring, each hosting one MoE layer. Based on this architecture, we formulate and solve an optimization problem for intra-layer expert placement to map experts with heterogeneous activation probabilities onto satellites. The derived strategy reveals an intuitive principle: a frequently activated expert should be mapped to a satellite on a routing path with low expected latency. Experiments over a thousand-satellite constellation show that SpaceMoE achieves at least a threefold latency reduction compared with conventional random and ablation-based placement strategies.
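
The placement principle stated in the abstract above (a frequently activated expert should sit on a satellite whose routing path has low expected latency) can be illustrated with a simple sort-and-match rule. This is only a sketch of that principle, assuming one expert per satellite and known per-satellite path latencies; the paper formulates and solves an optimization problem whose details are not in the abstract, and all names and numbers here are hypothetical.

```python
from typing import Dict


def place_experts(activation_prob: Dict[str, float], path_latency_ms: Dict[str, float]) -> Dict[str, str]:
    """Pair the most frequently activated experts with the lowest-latency satellites."""
    if len(activation_prob) > len(path_latency_ms):
        raise ValueError("need at least one satellite per expert")
    experts = sorted(activation_prob, key=activation_prob.get, reverse=True)
    satellites = sorted(path_latency_ms, key=path_latency_ms.get)
    return dict(zip(experts, satellites))


def expected_latency(
    placement: Dict[str, str],
    activation_prob: Dict[str, float],
    path_latency_ms: Dict[str, float],
) -> float:
    """Activation-weighted latency of a placement, in milliseconds."""
    return sum(activation_prob[e] * path_latency_ms[s] for e, s in placement.items())


if __name__ == "__main__":
    probs = {"expert_a": 0.6, "expert_b": 0.3, "expert_c": 0.1}
    latency = {"sat_1": 10.0, "sat_2": 20.0, "sat_3": 40.0}
    placement = place_experts(probs, latency)
    print(placement, expected_latency(placement, probs, latency))
```

Under the one-to-one assumption, pairing the largest probabilities with the smallest latencies minimizes the weighted sum (the rearrangement inequality), which matches the intuition stated in the abstract.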


2604.02889 2026-05-22 stat.ML cs.AI cs.LG

Rethinking Forward Processes for Score-Based Nonlinear Data Assimilation in High Dimensions

重新思考高维场景下基于分数的非线性数据同化的前向过程

Eunbi Yoon, Won Chang, Donghan Kim, Dae Wook Kim

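Affiliations: KAIST

AI summary: This paper proposes a forward process tailored to data assimilation for state estimation in high-dimensional nonlinear systems. The resulting score-based filter transforms the system state toward the measurement space and improves assimilation performance.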

AI中文摘要

数据同化是通过结合模型预测与测量来估计动态系统随时间变化的状态的过程。当系统非线性且高维时，这一任务变得具有挑战性。为此，近来出现了基于分数的贝叶斯滤波器。然而，这些方法在某些情况下表现仍不理想，尤其是在测量空间稀疏时。这种性能退化源于对似然分数的启发式近似，其误差会随时间累积。出现这一局限，是因为这些方法直接沿用了生成建模中的经典前向过程，即把数据分布变换为高斯分布，而该过程与测量方程无关。本文提出一种为滤波量身设计的前向过程，将系统状态向测量空间变换，从而得到理论上合理的似然分数表述。在此基础上，我们开发了测量感知的基于分数的滤波器（MASF）。我们在Kolmogorov流上评估MASF，这是一个维度高达$\mathcal{O}(10^5)$的高维流体基准，测试涵盖多种测量算子，包括状态与测量维度不匹配的非线性情形。MASF的性能优于现有的基于分数的滤波器和集合类卡尔曼滤波器。值得注意的是，在使用摊销预训练（amortized pretraining）时，MASF相比基线实现了最高$28.2\times$的墙钟时间加速。我们的实现见 https://github.com/tcnllab-oss/masf

英文摘要

Data assimilation is the process of estimating the state of a dynamical system over time by combining model predictions with measurements. This task becomes challenging when the system is nonlinear and high-dimensional. To address this, score-based Bayesian filters have recently emerged. However, these methods still show unsatisfactory performance in certain cases, particularly under spatially sparse measurements. Such degradation stems from heuristic approximations of the likelihood score, whose errors can accumulate over time. This limitation arises because the methods simply adopt a classical forward process for generative modeling that transforms a data distribution toward a Gaussian distribution, which is independent of the measurement equation. Here, we propose a forward process tailored for filtering that transforms the system state toward the measurement space, enabling a theoretically sound formulation of the likelihood score. Based on this, we develop the Measurement-Aware Score-based Filter (MASF). We evaluate MASF on Kolmogorov flow, a high-dimensional fluid benchmark with up to $\mathcal{O}(10^5)$ dimensions, under diverse measurement operators, including nonlinear cases with a dimensional mismatch between the state and the measurements. MASF shows improved performance over existing score-based filters and ensemble-type Kalman filters. Notably, MASF achieves up to a $28.2\times$ wall-clock speedup compared with the baselines when using amortized pretraining. Our implementation is available at https://github.com/tcnllab-oss/masf


2603.15676 2026-05-22 cs.SE cs.AI

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

自动化自我测试作为质量门：基于证据的LLM应用发布管理

Alexandre Cristovão Maiorano

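Affiliations: Lumytics

AI summary: This paper presents an automated self-testing framework that makes evidence-based release decisions (PROMOTE/HOLD/ROLLBACK) across five empirically grounded dimensions (task success rate, research context preservation, P95 latency, safety pass rate, and evidence coverage) and evaluates it in a longitudinal case study of a multi-agent conversational AI system.

Comments: 20 pages, 6 figures, 12 tables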

AI中文摘要

LLM应用是AI系统，其非确定性输出和不断变化的模型行为使得传统测试不足以满足发布管理的需求。我们提出了一种自动化自我测试框架，引入了基于证据的发布决策质量门（PROMOTE/HOLD/ROLLBACK）五个实证基础的维度：任务成功率、研究环境保持、P95延迟、安全通过率和证据覆盖。我们通过一个长期案例研究评估该框架，该研究涉及一个内部部署的多代理对话AI系统，具有特定的营销能力，并在活跃开发中覆盖了38次评估运行，跨越20多个内部发布。质量门在早期运行中识别出两个ROLLBACK级构建，并在四周的 staging 生命周期中支持稳定的质量演变，同时执行了基于角色的、多轮的、对抗性和证据要求的场景。统计分析（Mann-Kendall趋势、Spearman相关性、bootstrap置信区间）、质量门消融和开销扩展表明，证据覆盖是主要的严重回归判别器，且运行时间与套件大小成比例增长。人类校准研究（n=60分层案例，两个独立评估者，LLM-as-judge交叉验证）揭示了互补的多模态覆盖：LLM-judge与系统门的分歧（kappa=0.13）可归因于结构失败模式——延迟违规和路由错误——这些在响应文本中是不可见的，而评估者独立地揭示了被结构检查遗漏的内容质量失败，这与多维门设计一致。该框架、补充伪代码和校准工件被提供以支持AI系统质量保证和独立复制。

英文摘要

LLM applications are AI systems whose nondeterministic outputs and evolving model behavior make traditional testing insufficient for release governance. We present an automated self-testing framework that introduces quality gates with evidence-based release decisions (PROMOTE/HOLD/ROLLBACK) across five empirically grounded dimensions: task success rate, research context preservation, P95 latency, safety pass rate, and evidence coverage. We evaluate the framework through a longitudinal case study of an internally deployed multi-agent conversational AI system with specific marketing capabilities in active development, covering 38 evaluation runs across 20+ internal releases. The gate identified two ROLLBACK-grade builds in early runs and supported stable quality evolution over a four-week staging lifecycle while exercising persona-grounded, multi-turn, adversarial, and evidence-required scenarios. Statistical analysis (Mann-Kendall trends, Spearman correlations, bootstrap confidence intervals), gate ablation, and overhead scaling indicate that evidence coverage is the primary severe-regression discriminator and that runtime scales predictably with suite size. A human calibration study (n=60 stratified cases, two independent evaluators, LLM-as-judge cross-validation) reveals complementary multi-modal coverage: LLM-judge disagreements with the system gate (kappa=0.13) are attributable to structural failure modes - latency violations and routing errors - invisible in response text alone, while the judge independently surfaces content quality failures missed by structural checks, consistent with a multi-dimensional gate design. The framework, supplementary pseudocode, and calibration artifacts are provided to support AI-system quality assurance and independent replication.
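
A quality gate of the kind described in the abstract above can be sketched as a function from the five metrics to one of PROMOTE, HOLD, or ROLLBACK. The decision rule and every threshold below are hypothetical placeholders chosen for illustration; the abstract does not state the paper's actual thresholds or aggregation logic.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunMetrics:
    task_success_rate: float
    context_preservation: float
    p95_latency_s: float
    safety_pass_rate: float
    evidence_coverage: float


# Hypothetical thresholds (not taken from the paper).
PROMOTE_MIN = {"task_success_rate": 0.90, "context_preservation": 0.90,
               "safety_pass_rate": 0.99, "evidence_coverage": 0.85}
ROLLBACK_MIN = {"task_success_rate": 0.70, "context_preservation": 0.70,
                "safety_pass_rate": 0.95, "evidence_coverage": 0.50}
PROMOTE_MAX_P95_S = 5.0
ROLLBACK_MAX_P95_S = 15.0


def gate_decision(metrics: RunMetrics) -> str:
    """Return ROLLBACK on any severe violation, PROMOTE if every target is met, else HOLD."""
    values = asdict(metrics)
    severe = metrics.p95_latency_s > ROLLBACK_MAX_P95_S or any(
        values[name] < floor for name, floor in ROLLBACK_MIN.items()
    )
    if severe:
        return "ROLLBACK"
    healthy = metrics.p95_latency_s <= PROMOTE_MAX_P95_S and all(
        values[name] >= target for name, target in PROMOTE_MIN.items()
    )
    return "PROMOTE" if healthy else "HOLD"


if __name__ == "__main__":
    print(gate_decision(RunMetrics(0.95, 0.93, 3.2, 1.0, 0.90)))
    print(gate_decision(RunMetrics(0.95, 0.93, 3.2, 1.0, 0.40)))
```

Keeping separate floors for ROLLBACK and targets for PROMOTE leaves a HOLD band in between, so a build that is merely below target is not treated as a severe regression.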


2603.04525 2026-05-22 stat.ML cs.LG

The Volterra signature

Volterra签名

Paul P. Hager, Fabian N. Harang, Luca Pelizzari, Samy Tindel

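Affiliations: Department of Statistics and Operations Research, University of Vienna; Department of Economics, BI Norwegian Business School; Department of Mathematics, Purdue University

AI summary: This paper proposes the Volterra signature as an explicit feature representation for history-dependent systems. It develops the input path, weighted by a temporal kernel, into the tensor algebra, uses the Volterra-Chen identity to derive rigorous learning-theoretic guarantees, and demonstrates effectiveness on dynamic learning tasks.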

AI中文摘要

现代处理非马尔可夫时间序列的学习方法，如循环神经网络、神经控制微分方程或变换器，通常依赖于隐式的记忆机制，这些机制在长时间范围内难以解释或训练。我们提出Volterra签名VSig(x;K)作为处理历史依赖系统的显式特征表示。通过将输入路径x加权时间核K转化为张量代数，我们利用相关的Volterra-Chen恒等式推导出严谨的学习理论保证。具体来说，我们证明了注入性陈述（在增强下可识别），从而在无限维路径空间上推导出通用逼近定理，这在某些情况下通过VSig(x;K)的线性泛函实现。此外，我们通过展示与Volterra签名相关的内积可通过二参数积分方程闭合地表示，证明了核技巧的应用，从而利用PDE的数值方法进行计算。对于一大类指数型核，VSig(x;K)在张量代数中解线性状态空间微分方程。结合对时间重参数化的不变性，这些结果将Volterra签名定位为数据科学中稳健且计算上可行的特征映射。我们在真实和合成数据上的动态学习任务中展示了其有效性，其中它一致地改进了经典路径签名基线。

英文摘要

Modern approaches for learning from non-Markovian time series, such as recurrent neural networks, neural controlled differential equations or transformers, typically rely on implicit memory mechanisms that can be difficult to interpret or to train over long horizons. We propose the Volterra signature $\mathrm{VSig}(x;K)$ as a principled, explicit feature representation for history-dependent systems. By developing the input path $x$ weighted by a temporal kernel $K$ into the tensor algebra, we leverage the associated Volterra-Chen identity to derive rigorous learning-theoretic guarantees. Specifically, we prove an injectivity statement (identifiability under augmentation) that leads to a universal approximation theorem on the infinite-dimensional path space, which in certain cases is achieved by linear functionals of $\mathrm{VSig}(x;K)$. Moreover, we demonstrate applicability of the kernel trick by showing that the inner product associated with Volterra signatures admits a closed characterization via a two-parameter integral equation, enabling numerical methods from PDEs for computation. For a large class of exponential-type kernels, $\mathrm{VSig}(x;K)$ solves a linear state-space ODE in the tensor algebra. Combined with inherent invariance to time reparameterization, these results position the Volterra signature as a robust, computationally tractable feature map for data science. We demonstrate its efficacy in dynamic learning tasks on real and synthetic data, where it consistently improves on classical path signature baselines.


2603.04383 2026-05-22 cs.CY cs.CR cs.IR cs.LG cs.SI

Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

将信任转化为交易：追踪YouTube网红经济中的联盟营销与FTC合规性

Chen Sun, Yash Vekaria, Zubair Shafiq, Rishab Nithyanand

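Affiliations: University of Iowa; UC Davis

AI summary: Using tools built on Web measurement and NLP techniques, this study analyzes the affiliate marketing ecosystem on YouTube, measures how widespread affiliate links are and how often disclosures are non-compliant, and recommends improving compliance through standardized disclosure features.

Comments: ICWSM 2026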

AI中文摘要

YouTube已发展成一个强大的平台，创作者通过affiliate营销来 monetize 他们的影响力，这引发了关于透明度和伦理问题的担忧，尤其是在创作者未能披露其affiliate关系时。尽管监管机构如美国联邦贸易委员会（FTC）已发布指南以解决这些问题，但非合规和消费者伤害仍然存在，且这些问题的严重程度仍不清楚。在本文中，我们介绍了利用最近的Web测量和NLP研究进展开发的工具，以研究YouTube上的affiliate营销生态系统。我们应用这些工具对来自近54万创作者的200万视频的10年数据集进行分析，研究YouTube上affiliate营销的普及程度及非合规行为的比例。我们的发现表明，affiliate链接广泛存在，但披露合规性仍然很低，大多数视频未能达到FTC标准。此外，我们分析了不同利益相关者在改善披露行为上的影响。我们的研究表明，平台通过标准化披露功能与提高合规性密切相关。我们建议监管机构和affiliate合作伙伴应与平台合作，以提高影响力经济中的透明度、问责制和信任度。

英文摘要

YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their affiliate relationships. Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In this paper, we introduce tools, developed with insights from recent advances in Web measurement and NLP research, to examine the state of the affiliate marketing ecosystem on YouTube. We apply these tools to a 10-year dataset of 2 million videos from nearly 540,000 creators, analyzing the prevalence of affiliate marketing on YouTube and the rates of non-compliant behavior. Our findings reveal that affiliate links are widespread, yet disclosure compliance remains low, with most videos failing to meet FTC standards. Furthermore, we analyze the effects of different stakeholders in improving disclosure behavior. Our study suggests that the platform is highly associated with improved compliance through standardized disclosure features. We recommend that regulators and affiliate partners collaborate with platforms to enhance transparency, accountability, and trust in the influencer economy.


2602.23833 2026-05-22 eess.IV cs.CV

Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning

重新审视图像与元数据整合用于DICOM系列分类：交叉注意力与字典学习

Tuan Truong, Melanie Dohmen, Sara Lorio, Matthias Lenga

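Affiliations: Bayer AG

AI summary: This paper proposes an end-to-end multimodal framework for DICOM series classification that jointly models image content and acquisition metadata while explicitly accounting for heterogeneous slice content, variable series length, and entirely missing, incomplete, or inconsistent DICOM metadata.

Comments: Early acceptance at MICCAI 2026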

AI中文摘要

自动化识别DICOM图像系列对于大规模医学图像分析、质量控制、协议标准化和可靠后续处理至关重要。然而，由于异质切片内容、可变系列长度和完全缺失、不完整或不一致的DICOM元数据，DICOM系列分类仍具挑战性。我们提出了一种端到端的多模态框架，用于DICOM系列分类，该框架联合建模图像内容和获取元数据，同时显式考虑这些挑战。（i）图像和元数据通过模态感知模块编码，并使用双向跨模态注意力机制融合。（ii）元数据通过基于可学习特征字典和值条件调制的稀疏、缺失感知编码器进行处理。通过设计，该方法不需要任何形式的填补。（iii）系列长度和图像数据维度的变化通过2.5D视觉编码器和在等距采样的切片上操作的注意力机制来处理。我们评估了所提出的方法在公开可用的Duke Liver MRI数据集和一个大型多机构内部队列上的表现，评估了域内性能和域外泛化能力。在所有评估设置中，所提出的方法一致优于相关的仅图像、仅元数据和多模态2D/3D基线。结果表明，显式建模元数据稀疏性和跨模态交互提高了DICOM系列分类的鲁棒性。

Abstract

Automated identification of DICOM image series is essential for large-scale medical image analysis, quality control, protocol harmonization, and reliable downstream processing. However, DICOM series classification remains challenging due to heterogeneous slice content, variable series length, and entirely missing, incomplete, or inconsistent DICOM metadata. We propose an end-to-end multimodal framework for DICOM series classification that jointly models image content and acquisition metadata while explicitly accounting for all these challenges. (i) Images and metadata are encoded with modality-aware modules and fused using a bi-directional cross-modal attention mechanism. (ii) Metadata is processed by a sparse, missingness-aware encoder based on learnable feature dictionaries and value-conditioned modulation. By design, the approach does not require any form of imputation. (iii) Variability in series length and image data dimensions is handled via a 2.5D visual encoder and attention operating on equidistantly sampled slices. We evaluate the proposed approach on the publicly available Duke Liver MRI dataset and a large multi-institutional in-house cohort, assessing both in-domain performance and out-of-domain generalization. Across all evaluation settings, the proposed method consistently outperforms relevant image-only, metadata-only, and multimodal 2D/3D baselines. The results demonstrate that explicitly modeling metadata sparsity and cross-modal interactions improves robustness for DICOM series classification.
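
The abstract mentions attention over equidistantly sampled slices as the way variable series length is handled. The snippet below is only a minimal sketch of what such sampling could look like; the function name `equidistant_indices`, the rounding rule, and the handling of short series are assumptions for illustration and are not taken from the paper.

```python
import math


def equidistant_indices(num_slices: int, num_samples: int) -> list[int]:
    """Pick `num_samples` slice indices spread evenly over a series.

    Hypothetical helper: the paper does not specify its sampling rule.
    """
    if num_slices <= 0 or num_samples <= 0:
        raise ValueError("num_slices and num_samples must be positive")
    # Short series: keep every slice instead of repeating indices.
    if num_samples >= num_slices:
        return list(range(num_slices))
    # A single sample is taken from the middle of the series.
    if num_samples == 1:
        return [num_slices // 2]
    # Spread samples from the first slice (0) to the last (num_slices - 1).
    step = (num_slices - 1) / (num_samples - 1)
    return [math.floor(i * step + 0.5) for i in range(num_samples)]


if __name__ == "__main__":
    print(equidistant_indices(9, 3))
    print(equidistant_indices(5, 8))
```

Under these assumptions, every series maps to at most `num_samples` slices, so the downstream encoder sees a bounded number of inputs regardless of the original series length.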

2602.17973 2026-05-22 cs.CR cs.AI

PenTiDef: Decentralized Federated Intrusion Detection System with Differential Privacy and Latent-Space Defense via Blockchain Coordination in IIoT

Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan, Luong Ha Tien, Ngo Duc Hoang Son, Van-Hau Pham

Affiliations: Information Security Lab; University of Information Technology; Vietnam National University; VNU-HCM Information Security Center

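AI summary: This paper proposes PenTiDef, a fully decentralized, privacy-preserving, and poisoning-resilient federated intrusion detection system (DFL-IDS). The system integrates three key components: (i) client-side Distributed Differential Privacy (DDP), which uses stochastic Gaussian noise to guard against gradient leakage; (ii) a lightweight latent-space defense module that uses an autoencoder to extract and compress penultimate-layer representations (PLRs) into stable Latent Semantic Representations (LSRs), then applies Centered Kernel Alignment (CKA) and K-Means clustering to detect malicious updates robustly without auxiliary datasets; and (iii) a permissioned blockchain layer whose smart contracts coordinate on-chain validation, secure FedAvg aggregation, and immutable auditability, eliminating any central server. Extensive experiments on CIC-IDS2018 and Edge-IIoTSet show that, under both IID and realistic non-IID settings and with adversary ratios up to 40%, PenTiDef outperforms state-of-the-art baselines (FLARE and FedCC) in detection accuracy and F1-score while maintaining lower training overhead. By jointly addressing privacy, robustness, and decentralization in a unified secure aggregation protocol, PenTiDef offers a practical and scalable solution for trustworthy collaborative intrusion detection in heterogeneous, adversarial IIoT environments.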

Comments Version 2; the paper's title was changed

Abstract

This paper proposes PenTiDef, a fully decentralized, privacy-preserving, and poisoning-resilient framework for decentralized federated IDS (DFL-IDS). PenTiDef synergistically integrates three key components: (i) client-side Distributed Differential Privacy (DDP) with stochastic Gaussian noise to protect against gradient leakage, (ii) a lightweight latent-space defense module that extracts and compresses penultimate-layer representations (PLRs) into stable Latent Semantic Representations (LSRs) via an autoencoder, followed by Centered Kernel Alignment (CKA) and K-Means clustering for robust malicious update detection without auxiliary datasets, and (iii) a permissioned blockchain layer with smart contracts that orchestrates on-chain validation, secure FedAvg aggregation, and immutable auditability, eliminating any central server. Extensive experiments on CIC-IDS2018 and Edge-IIoTSet under both IID and realistic non-IID settings, with adversary ratios up to 40%, demonstrate that PenTiDef consistently outperforms state-of-the-art baselines (FLARE and FedCC) in detection accuracy and F1-score while maintaining lower training overhead. By jointly addressing privacy, robustness, and decentralization in a unified secure aggregation protocol, PenTiDef provides a practical and scalable solution for trustworthy collaborative intrusion detection in heterogeneous, adversarial IIoT environments.
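
The abstract names Centered Kernel Alignment (CKA) as the similarity measure applied to client representations before clustering. As a rough illustration, the sketch below computes the commonly used linear variant of CKA in plain Python. Whether PenTiDef uses a linear or another kernel is not stated here, so the linear kernel, the helper names, and the toy inputs are all assumptions.

```python
import math

Matrix = list[list[float]]  # rows are samples, columns are features


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_sq_frobenius(a: Matrix, b: Matrix) -> float:
    """Return ||A^T B||_F^2 for two matrices with the same number of rows."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_sq_frobenius(xc, yc)
    denominator = math.sqrt(cross_sq_frobenius(xc, xc) * cross_sq_frobenius(yc, yc))
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical latent representations from two clients (4 samples each).
    client_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
    client_b = [[0.9, 2.1], [3.2, 0.8], [0.1, 3.9], [2.0, 2.2]]
    print(linear_cka(client_a, client_b))
```

Linear CKA lies between 0 and 1 and is unchanged by isotropic scaling of either input, which is one reason it is a convenient feature for comparing updates from heterogeneous clients; a pairwise similarity matrix built this way could then be passed to a clustering step such as K-Means.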

2602.04703 2026-05-22 eess.SP cs.LG

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

Sina Tavakolian, Nhan Thanh Nguyen, Ahmed Alkhateeb, Markku Juntti

Affiliations: Centre for Wireless Communications, University of Oulu, P.O.Box 4500, FI-90014, Finland; School of Electrical, Computer, and Energy Engineering, Arizona State University, AZ, USA

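AI summary: This paper proposes an efficient knowledge-distillation-based framework that uses sub-6 GHz channels to predict mmWave beams; compact student deep learning architectures reduce computational and memory requirements while preserving performance.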

Comments 5 pages, 4 figures. Accepted for publication at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026

Journal ref Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 22642-22646, 2026

DOI: 10.1109/ICASSP55912.2026.11461506

Abstract

Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for sub-6 GHz channel-mmWave beam mapping based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.
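
To make the idea of a student mimicking a teacher concrete, here is a minimal sketch of the classic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions over candidate beams). This is the textbook response-based formulation, not necessarily the individual or relational losses used in the paper; the function names, the temperature value, and the toy logits are assumptions.

```python
import math


def log_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Hypothetical logits over four candidate mmWave beams.
    teacher = [4.0, 1.0, 0.5, -2.0]
    student = [3.0, 1.5, 0.0, -1.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge; in practice it is usually combined with a standard cross-entropy term on the ground-truth beam index.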

2601.21853 2026-05-22 cs.IR cs.LG

LEMUR: Learned Multi-Vector Retrieval

Elias Jääsaari, Ville Hyvönen, Teemu Roos

Affiliations: Department of Computer Science, University of Helsinki, Helsinki, Finland

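AI summary: LEMUR casts multi-vector similarity search as a supervised learning problem and uses existing single-vector search indexes to accelerate retrieval, achieving efficient multi-vector similarity search that is an order of magnitude faster than prior methods.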

Comments Accepted to ICML 2026

Abstract

Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding per token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved quality of multi-vector retrieval comes at the expense of significantly increased search latency. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: First, we formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, enabling the use of existing single-vector search indexes to accelerate retrieval. LEMUR is an order of magnitude faster than prior multi-vector similarity search methods. Our code is available at https://github.com/ejaasaari/lemur
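
For readers unfamiliar with MaxSim, the sketch below shows the standard late-interaction score: for each query token embedding, take the maximum dot product over the document's token embeddings, then sum over query tokens. The brute-force `rank_documents` helper illustrates the costly exhaustive search that a method like LEMUR aims to avoid; it is not LEMUR itself, and all names and toy vectors are illustrative assumptions.

```python
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: list[Vector], doc_vecs: list[Vector]) -> float:
    """MaxSim: sum over query tokens of the best-matching document token score."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(
    query_vecs: list[Vector], corpus: dict[str, list[Vector]]
) -> list[str]:
    """Brute-force ranking: score every document, highest MaxSim first."""
    return sorted(
        corpus, key=lambda doc_id: maxsim(query_vecs, corpus[doc_id]), reverse=True
    )


if __name__ == "__main__":
    # Hypothetical 2-dimensional token embeddings.
    query = [[1.0, 0.0], [0.0, 1.0]]
    corpus = {
        "doc_a": [[1.0, 0.0], [0.5, 0.5]],
        "doc_b": [[0.0, 0.2]],
    }
    print(rank_documents(query, corpus))
```

Each query costs one dot product per (query token, document token) pair across the whole corpus, which is why exhaustive MaxSim scoring becomes slow at scale and motivates reductions to single-vector search.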

2601.18094 2026-05-22 eess.AS cs.SD

OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion

Zhichao Wang, Tao Li, Wenshuo Ge, Zihao Cui, Shilei Zhang, Junlan Feng

Affiliations: JIUTIAN Research; China Mobile; Beijing, China

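AI summary: This paper proposes OneVoice, a zero-shot framework that handles three voice conversion scenarios (linguistic-preserving, expressive, and singing) in a single model. It unifies them through a Mixture-of-Experts design with a dual-path routing mechanism and uses a two-stage training strategy to address data imbalance.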

Abstract

Recent progress in voice conversion (VC) has reached a new milestone in speaker cloning and linguistic preservation. But the field remains fragmented, relying on specialized models for linguistic-preserving, expressive, and singing scenarios. We propose OneVoice, a unified zero-shot framework capable of handling all three scenarios within a single model. OneVoice is built upon a continuous language model trained with VAE-free next-patch diffusion, ensuring high fidelity and efficient sequence modeling. Its core design for unification lies in a Mixture-of-Experts (MoE) designed to explicitly model shared conversion knowledge and scenario-specific expressivity. Expert selection is coordinated by a dual-path routing mechanism, including shared expert isolation and scenario-aware domain expert assignment with global-local cues. For precise conditioning, scenario-specific prosodic features are fused into each layer via a gated mechanism, allowing adaptive usage of prosody information. Furthermore, to enable the core idea and alleviate the data imbalance (abundant speech vs. scarce singing), we adopt a two-stage progressive training that includes foundational pre-training and scenario enhancement with LoRA-based domain experts. Experiments show that OneVoice matches or surpasses specialized models across all three scenarios, while verifying flexible control over scenarios and offering a fast decoding version that needs as few as 2 steps. Audio samples are available on the demo page.
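
The routing idea in the abstract, a shared expert that always runs plus scenario-specific experts chosen by a router, can be sketched generically as follows. This is a toy illustration of shared-expert Mixture-of-Experts routing in general, not OneVoice's architecture: the experts are simple functions, the router scores are supplied by hand, and every name here is a hypothetical placeholder.

```python
import math
from typing import Callable

Vector = list[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    shared_expert: Expert,
    routed_experts: list[Expert],
    router_scores: list[float],
    top_k: int = 1,
) -> Vector:
    """Shared expert output plus a gated sum over the top-k routed experts."""
    # Pick the indices of the k highest-scoring routed experts.
    order = sorted(
        range(len(routed_experts)), key=lambda i: router_scores[i], reverse=True
    )
    chosen = order[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([router_scores[i] for i in chosen])
    # The shared expert is applied to every input, bypassing the router.
    output = list(shared_expert(x))
    for gate, idx in zip(gates, chosen):
        expert_out = routed_experts[idx](x)
        output = [o + gate * v for o, v in zip(output, expert_out)]
    return output


if __name__ == "__main__":
    identity: Expert = lambda v: list(v)
    double: Expert = lambda v: [2.0 * a for a in v]
    negate: Expert = lambda v: [-a for a in v]
    # Hypothetical router scores favoring the first routed expert.
    print(moe_forward([1.0, 2.0], identity, [double, negate], [2.0, -1.0], top_k=1))
```

In a real model the router scores would come from a learned projection of the input (and, per the abstract, from scenario-aware global-local cues), and the experts would be neural sub-networks rather than fixed functions.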
