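arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.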

2502.09487 2026-05-22 cs.CL cs.AI cs.LG

Internal narratives parameterise affective states


Jakub Onysk, Quentin J. M. Huys

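Affiliations: Applied Computational Psychiatry Lab; Max Planck UCL Centre for Computational Psychiatry and Ageing Research; Queen Square Institute of Neurology and Mental Health; Neuroscience Department; Division of Psychiatry

AI summary: Using large-language-model representations and their subspaces to quantify participants' internal narratives, this paper studies the relationship between narrative and affective state. It finds that verbal descriptions of symptom-specific thoughts predict standardised depression scores, and stresses that preserving the covariance between symptoms is essential for construct validity.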

Abstract

Characterising how we verbalise our feelings is central to psychological assessment and intervention, yet the mapping between narrative and affective state remains poorly understood. Across two large studies (n=1257), we parameterised the structure and dynamics of depressive states by quantifying participants' internal narratives through large-language-model representations and their subspaces. In Study 1, we found verbal descriptions of symptom-specific thoughts captured granular information predictive of standardised, self-reported depression scores. Critically, we show preserving the specific covariance between symptoms is essential for construct validity, suggesting high-dimensional text representations mirror the latent geometry of the disorder. Study 2 probed the temporal dynamics of this relationship as participants engaged with emotional narratives. We found quantified changes in internal narratives led to changes in self-report, while the baseline narrative severity predicted the magnitude of subsequent affective change. By framing affect as a computational state, our results highlight its core, therapeutically pertinent functions: constraining the structure of internal narratives and integrating context to shape self-report.


2501.00677 2026-05-22 cs.LG cs.CV cs.IT cs.NA math.IT math.NA stat.ML

Deeply Learned Robust Matrix Completion for Large-scale Low-rank Data Recovery


HanQin Cai, Chandra Kundu, Jialin Liu, Wotao Yin

Affiliations: School of Data, Mathematical, and Statistical Sciences and the Department of Computer Science, University of Central Florida; School of Data, Mathematical, and Statistical Sciences, University of Central Florida; Damo Academy, Alibaba US

AI summary: This paper proposes Learned Robust Matrix Completion (LRMC), a scalable and learnable non-convex method for large-scale robust matrix completion. LRMC has low computational complexity and linear convergence, its free parameters are learned effectively through deep unfolding, and its empirical performance is validated on synthetic datasets and real applications.

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(6): 6541-6556, 2026

DOI: 10.1109/TPAMI.2026.3659041

Abstract

Robust matrix completion (RMC) is a widely used machine learning tool that simultaneously tackles two critical issues in low-rank data analysis: missing data entries and extreme outliers. This paper proposes a novel scalable and learnable non-convex approach, coined Learned Robust Matrix Completion (LRMC), for large-scale RMC problems. LRMC enjoys low computational complexity with linear convergence. Motivated by the proposed theorem, the free parameters of LRMC can be effectively learned via deep unfolding to achieve optimal performance. Furthermore, this paper proposes a flexible feedforward-recurrent-mixed neural network framework that extends deep unfolding from a fixed number of iterations to infinite iterations. The superior empirical performance of LRMC is verified with extensive experiments against state-of-the-art methods on synthetic datasets and real applications, including video background subtraction, ultrasound imaging, face modeling, and cloud removal from satellite imagery.


2411.02776 2026-05-22 cs.LG stat.AP

Deep learning-based modularized loading protocol for parameter estimation of Bouc-Wen class models


Sebin Oh, Junho Song, Taeyong Kim

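Affiliations: Department of Civil and Environmental Engineering, University of California, Berkeley, CA, United States; Department of Civil Systems Engineering, Ajou University, Suwon, Republic of Korea

AI summary: This paper proposes a modularized deep learning-based loading protocol for optimal parameter estimation of Bouc-Wen class models. The protocol has two key components: optimal loading history construction and CNN-based rapid parameter estimation. Each component is decomposed into independent sub-modules for distinct hysteretic behaviors (basic hysteresis, structural degradation, and pinching effect), so the protocol adapts to diverse hysteresis models. Three independent CNN architectures are developed to capture the path dependence of these behaviors. Training them on diverse loading histories identifies minimal loading sequences, termed loading history modules, which are combined into an optimal loading history, and the three trained CNN models serve as rapid parameter estimators. Numerical evaluation, including nonlinear time history analysis of a 3-story steel moment frame and fragility curve construction for a 3-story reinforced concrete frame, shows that the protocol significantly reduces total analysis time while maintaining or improving estimation accuracy. The protocol can be extended to other hysteresis models, suggesting a systematic approach to identifying general hysteresis models.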

Journal ref Engineering Structures 339, 120458 (2025)

DOI: 10.1016/j.engstruct.2025.120458

Abstract

This study proposes a modularized deep learning-based loading protocol for optimal parameter estimation of Bouc-Wen (BW) class models. The protocol consists of two key components: optimal loading history construction and CNN-based rapid parameter estimation. Each component is decomposed into independent sub-modules tailored to distinct hysteretic behaviors (basic hysteresis, structural degradation, and pinching effect), making the protocol adaptable to diverse hysteresis models. Three independent CNN architectures are developed to capture the path-dependent nature of these hysteretic behaviors. By training these CNN architectures on diverse loading histories, minimal loading sequences, termed loading history modules, are identified and then combined to construct an optimal loading history. The three CNN models, trained on the respective loading history modules, serve as rapid parameter estimators. Numerical evaluation of the protocol, including nonlinear time history analysis of a 3-story steel moment frame and fragility curve construction for a 3-story reinforced concrete frame, demonstrates that the proposed protocol significantly reduces total analysis time while maintaining or improving estimation accuracy. The proposed protocol can be extended to other hysteresis models, suggesting a systematic approach for identifying general hysteresis models.
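For orientation, the basic Bouc-Wen model (without degradation or pinching) evolves a hysteretic variable z under an imposed displacement history x(t) according to dz/dt = A*dx/dt - beta*|dx/dt|*|z|^(n-1)*z - gamma*dx/dt*|z|^n. The sketch below integrates this textbook form with forward Euler along a sinusoidal loading history. The function name, the parameter values, and the loading history are illustrative assumptions and are not taken from the paper; the paper's CNN-based estimator and loading history modules are not reproduced here.

```python
import math


def simulate_bouc_wen(x, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Integrate the basic Bouc-Wen hysteretic variable z along a displacement history x.

    Discretises dz = A*dx - beta*|dx|*|z|**(n-1)*z - gamma*dx*|z|**n with forward Euler.
    Assumes n >= 1 so that the |z|**(n-1) term is defined at z = 0.
    """
    z = 0.0
    history = [z]
    for x_prev, x_next in zip(x, x[1:]):
        dx = x_next - x_prev
        dz = (A * dx
              - beta * abs(dx) * abs(z) ** (n - 1.0) * z
              - gamma * dx * abs(z) ** n)
        z += dz
        history.append(z)
    return history


# Hypothetical loading history: two sinusoidal displacement cycles of amplitude 3.
steps = 4000
x = [3.0 * math.sin(2.0 * math.pi * 2.0 * i / steps) for i in range(steps + 1)]
z = simulate_bouc_wen(x)

# For these parameters the hysteretic variable saturates at (A / (beta + gamma))**(1/n).
z_bound = (1.0 / (0.5 + 0.5)) ** (1.0 / 1.0)
```

Plotting z against x for this history would trace the familiar hysteresis loop; estimating A, beta, gamma, and n from such loops is the inverse problem the protocol addresses.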


2411.01332 2026-05-22 cs.LG cs.AI

A Mechanistic Explanatory Strategy for XAI


Marcin Rabiza

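Affiliations: Institute of Philosophy and Sociology, Polish Academy of Sciences; Institute for Philosophy, Leiden University

AI summary: This paper proposes a mechanistic explanatory strategy that uses decomposition, localization, and recomposition to uncover the mechanisms behind the functional organization of deep learning systems, with the aim of improving both the theoretical foundations and the practice of explainable AI.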

DOI: 10.1007/978-3-032-10073-3_23

Abstract

Despite significant advancements in XAI, scholars note a persistent lack of solid conceptual foundations and integration with broader scientific discourse on explanation. In response, emerging research draws on explanatory strategies from various sciences and the philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in explainable AI within a broader philosophical context. According to the mechanistic approach, the explanation of opaque AI systems involves identifying mechanisms that drive decision making. For deep neural networks, this means discerning functionally relevant components, such as neurons, layers, circuits, or activation patterns, and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle case studies from image recognition and language modeling align these theoretical approaches with mechanistic interpretability research from OpenAI and Anthropic. The findings suggest that pursuing mechanistic explanations can uncover elements that traditional explainability techniques may overlook, ultimately contributing to more thoroughly explainable AI.


2410.04753 2026-05-22 cs.AI cs.CL cs.LG cs.LO

ImProver: Agent-Based Automated Proof Optimization


Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck

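Affiliations: Carnegie Mellon University

AI summary: This paper studies automated proof optimization and proposes ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary criteria such as length or readability. Experiments show that it makes proofs substantially shorter, more modular, and more readable.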

Comments Published as a conference paper at ICLR 2025

Abstract

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not be optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.


2403.03920 2026-05-22 cs.AI cs.CL cs.HC

Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts


Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu

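Affiliations: University of Washington; University of Maryland

AI summary: This paper explores the transformative potential of computer-assisted textual analysis for enhancing instructional quality through in-depth analysis of educational artifacts. Drawing on Richard Elmore's Instructional Core Framework, it examines how AI and machine learning methods, particularly natural language processing (NLP), can analyze educational content, teacher discourse, and student responses to foster instructional improvement, and identifies key advantages of AI/ML in teacher coaching, student support, and content development.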

DOI: 10.4337/9781035321544.00019

Abstract

This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts. We integrate Richard Elmore's Instructional Core Framework to examine how artificial intelligence (AI) and machine learning (ML) methods, particularly natural language processing (NLP), can analyze educational content, teacher discourse, and student responses to foster instructional improvement. Through a comprehensive review and case studies within the Instructional Core Framework, we identify key areas where AI/ML integration offers significant advantages, including teacher coaching, student support, and content development. We unveil patterns that indicate AI/ML not only streamlines administrative tasks but also introduces novel pathways for personalized learning, providing actionable feedback for educators and contributing to a richer understanding of instructional dynamics. This paper emphasizes the importance of aligning AI/ML technologies with pedagogical goals to realize their full potential in educational settings, advocating for a balanced approach that considers ethical considerations, data quality, and the integration of human expertise.


2401.00139 2026-05-22 cs.AI cs.CL cs.LG stat.ME

Enhancing Causal Reasoning in Large Language Models: A Causal Attribution Model for Precision Fine-Tuning


Hengrui Cai, Shengjie Liu, Rui Song

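Affiliations: University of California, Irvine; North Carolina State University; Amazon

AI summary: This paper proposes a causal attribution model that improves the interpretability and causal reasoning of large language models through precise fine-tuning, and demonstrates its effectiveness on causal discovery tasks across different domains.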

Comments A Python implementation of our proposed method is available at https://github.com/ncsulsj/Causal_LLM

Abstract

This paper introduces a causal attribution model to enhance the interpretability of large language models (LLMs) and improve their causal reasoning abilities via precise fine-tuning. Despite LLMs' proficiency in diverse tasks, their reasoning processes often remain a black box, which restricts targeted enhancement. We propose a novel causal attribution model that utilizes "do-operators" for constructing interventional scenarios, allowing us to quantify the contribution of different components in LLMs' causal reasoning process systematically. By assessing the proposed attribution scores through causal discovery tasks across various domains, we demonstrate that LLMs' effectiveness in causal discovery heavily relies on provided context and domain-specific knowledge but can also utilize numerical data with limited calculations in correlation, not causation. This motivates the proposed fine-tuned LLM for pairwise causal discovery, effectively and correctly leveraging both knowledge and numerical information.


2311.04938 2026-05-22 cs.CV cs.AI cs.LG

Improved DDIM Sampling with Moment Matching Gaussian Mixtures


Prasad Gabbur

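Affiliations: Independent Researcher; Apple

AI summary: This paper proposes using a Gaussian mixture model (GMM) as the reverse transition operator in the DDIM framework, constraining the GMM parameters to match the moments of the DDPM forward marginals. This improves sample quality when few sampling steps are used, and experiments show the GMM kernel outperforms the standard Gaussian kernel on FID and IS.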

Comments 34 pages, 12 figures; Accepted to TMLR; Code open sourced

Journal ref Transactions on Machine Learning Research, 05/2026

Abstract

We propose using a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, which is one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically, we match the first and second order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We see that moment matching is sufficient to obtain samples with equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ, class-conditional models trained on ImageNet, and text-to-image generation using Stable Diffusion v2.1 on the COYO700M dataset. Our results suggest that using the GMM kernel leads to significant improvements in the quality of the generated samples when the number of sampling steps is small, as measured by FID and IS metrics. For example, on ImageNet 256x256, using 10 sampling steps, we achieve a FID of 6.94 and IS of 207.85 with a GMM kernel compared to 10.15 and 196.73 respectively with a Gaussian kernel. Further, we derive novel SDE samplers for rectified flow matching models and experiment with the proposed approach. We see improvements using both 1-rectified flow and 2-rectified flow models. Code: https://github.com/pgabbur/ddim-gmm.
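The moment-matching constraint can be illustrated in one dimension. A mixture with weights w_k, means mu_k, and a shared component variance s^2 has mean sum_k w_k*mu_k and variance s^2 + sum_k w_k*(mu_k - mean)^2, so any zero-mean set of offsets whose weighted spread does not exceed the target variance can be absorbed by shrinking s^2. The sketch below is one simple way to satisfy the constraint, under that assumption; the function names and the shared-variance parameterisation are hypothetical and are not claimed to be the parameterisation used in the paper.

```python
def match_gmm_moments(target_mean, target_var, weights, offsets):
    """Build a 1-D GMM (shared component variance) whose mean and variance equal the targets.

    weights: unnormalised non-negative mixture weights.
    offsets: desired displacements of the component means; they are re-centred
             so that their weighted average is zero.
    Returns (normalised_weights, component_means, shared_component_variance).
    """
    total = sum(weights)
    w = [wk / total for wk in weights]
    shift = sum(wk * dk for wk, dk in zip(w, offsets))
    centred = [dk - shift for dk in offsets]
    spread = sum(wk * dk * dk for wk, dk in zip(w, centred))
    if spread > target_var:
        raise ValueError("offsets are too spread out for the target variance")
    means = [target_mean + dk for dk in centred]
    return w, means, target_var - spread


def gmm_moments(w, means, shared_var):
    """Return the (mean, variance) of a 1-D GMM with a shared component variance."""
    mean = sum(wk * mk for wk, mk in zip(w, means))
    second = sum(wk * (shared_var + mk * mk) for wk, mk in zip(w, means))
    return mean, second - mean * mean


# Hypothetical target: a Gaussian kernel with mean 0.3 and variance 2.0.
w, means, shared_var = match_gmm_moments(0.3, 2.0, [1.0, 2.0, 1.0], [-1.0, 0.2, 0.9])
mixture_mean, mixture_var = gmm_moments(w, means, shared_var)
```

The resulting mixture shares its first two moments with the Gaussian target while remaining free to be multimodal, which is the degree of freedom the abstract attributes to the GMM kernel.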


2306.05905 2026-05-22 cs.LG math.OC

TreeDQN: Sample-Efficient Off-Policy Reinforcement Learning for Combinatorial Optimization


D. Sorokin, A. Kostin, L. Savchenko, G. Gusev, A. V. Savchenko

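Affiliations: Sber AI Lab; Laboratory for Theoretical Foundations of AI Models, HSE University

AI summary: TreeDQN improves the sample efficiency of off-policy reinforcement learning for combinatorial optimization by optimizing the geometric mean of expected return, and performs strongly on synthetic tasks and in the ML4CO competition.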

Comments Accepted in Knowledge-Based Systems

Abstract

A convenient approach to optimally solving combinatorial optimization tasks is the Branch-and-Bound method. Its branching heuristic can be learned to solve a large set of similar tasks. Promising results have been achieved by a recently introduced on-policy reinforcement learning method based on the tree Markov Decision Process. To overcome its main disadvantages, namely very long training time and unstable training, we propose TreeDQN (Tree Deep Q-Network), a sample-efficient off-policy RL method trained by optimizing the geometric mean of expected return. To theoretically support the training procedure for our method, we prove the contraction property of the Bellman operator for the tree MDP. As a result, our method requires up to 10 times less training data and performs faster than known on-policy methods on synthetic tasks. Moreover, TreeDQN significantly outperforms the state-of-the-art techniques on a challenging practical task from the ML4CO competition.


2605.22779 2026-05-22 cs.SE cs.LG

FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection


Huanchi Wang, Zihang Huang, Yifang Tian, Kristina Dzeparoska, Hans-Arno Jacobsen, Alberto Leon-Garcia

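Affiliations: Department of Electrical and Computer Engineering

AI summary: This paper proposes FAME, a failure-aware mixture-of-experts model for message-level log anomaly detection. It trains a lightweight router and domain experts from a small amount of labeled data for efficient anomaly detection, and achieves high precision and recall on the BGL and Thunderbird datasets.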

Comments 12 pages, 5 figures

Abstract

Production systems generate millions of log lines daily, yet most anomaly detectors operate at the session or window level, flagging groups of lines rather than identifying the specific message responsible. This coarse granularity forces operators to inspect many routine lines per alert. Message-level detection offers finer granularity, but remains challenging. A single event template may correspond to both normal and anomalous messages, failures arise from heterogeneous subsystems, and line-level labeling at scale is impractical. Although large language models (LLMs) can reason over log semantics, applying them to every line is too costly for continuous monitoring. We present FAME (Failure-Aware Mixture-of-Experts), a label-efficient message-level mixture-of-experts framework that uses an LLM only once offline. We annotate at most K labeled lines per template to derive binary normal/anomaly indicators and representative examples. The LLM proposes a partition of templates into failure domains, and a certification step validates the proposal before training. FAME trains a lightweight router and domain experts that run on-premise and output anomaly predictions and failure-domain labels. On BGL, FAME achieves F1 = 98.16 at K = 100, reducing annotation effort by 76x, and detects 86.3% of anomalies from unseen EventIDs. On Thunderbird, FAME reaches F1 = 99.95 with perfect recall.
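The labeling budget described above (at most K annotated lines per template) can be pictured with a small sketch. The abstract does not say how the K lines are chosen, so the first-K-encountered rule, the function name, and the toy template IDs below are all assumptions made for illustration only.

```python
from collections import defaultdict


def cap_lines_per_template(lines, k):
    """Keep at most k messages per template, in arrival order.

    lines: iterable of (template_id, message) pairs.
    Returns a dict mapping template_id to the list of retained messages.
    The first-k selection rule is an assumption, not the paper's stated strategy.
    """
    kept = defaultdict(list)
    for template_id, message in lines:
        if len(kept[template_id]) < k:
            kept[template_id].append(message)
    return dict(kept)


# Hypothetical parsed log lines: (template ID, raw message).
toy_lines = [
    ("E1", "link up on port 3"),
    ("E1", "link up on port 7"),
    ("E2", "ECC error corrected"),
    ("E1", "link up on port 9"),
]
budgeted = cap_lines_per_template(toy_lines, k=2)
```

With a cap like this, annotation cost grows with the number of distinct templates rather than with the number of log lines, which is the source of the label efficiency the abstract reports.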


2605.22736 2026-05-22 math.OC cs.LG cs.NA math.DG math.NA

Optimization over the intersection of manifolds


Yan Yang, Bin Gao, Ya-xiang Yuan

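Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, China

AI summary: This paper proposes a geometric method for optimization over the intersection of two manifolds that uses a retraction on only one manifold and updates the iterate along two orthogonal directions. It proves that clean intersection and intrinsic transversality are equivalent, and demonstrates the method's effectiveness on sparse and low-rank optimization problems.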

Comments 26 pages, 5 figures, 3 tables

Abstract

Optimization over the intersection of two manifolds arises in a broad range of applications, but is hindered by the coupled geometry of the feasible region. In this paper, we prove that the regularities -- clean intersection and intrinsic transversality -- are equivalent, which yields a tractable projection onto the tangent space of the intersection. Therefore, we propose a geometric method that employs a retraction on only one manifold and updates the iterate along two orthogonal directions. Specifically, the iterates stay on one manifold, and the two directions are responsible for asymptotically approaching the other manifold and decreasing the objective function, respectively. Under intrinsic transversality, we derive the convergence rate for both the feasibility and optimality measures, and show that every accumulation point is first-order stationary. Numerical experiments on problems stemming from sparse and low-rank optimization, including fitting spherical data, approximating hyperbolic embeddings on real data, and computing compressed modes, demonstrate the effectiveness of the proposed method.


2605.22709 2026-05-22 cs.CR cs.ET cs.RO cs.SY eess.SY

TriSweep: A Four-Drone Swarm Framework for Electromagnetic Side-Channel Analysis


Eric Yocam, Varghese Vaidyan

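Affiliations: Department of Computer Science, College of Engineering, California Polytechnic State University, San Luis Obispo, CA 93407, USA

AI summary: This paper proposes TriSweep, a simulation framework in which a four-drone swarm performs autonomous standoff electromagnetic side-channel analysis of embedded microcontrollers at 0.25-1.5 m. Spatially specialized collector drones work with a stationary accumulator drone to provide signal enhancement and mask cancellation, and the design is evaluated in simulation only.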

Comments Simulation framework + systems design for a four-drone swarm performing standoff electromagnetic side-channel analysis. No hardware fabricated yet

Abstract

Electromagnetic (EM) side-channel analysis traditionally assumes a stationary, close-proximity probe - a threat model that underestimates aerial adversaries. TriSweep is a simulation framework that designs and evaluates a four-drone swarm architecture for autonomous standoff EM-SCA of embedded microcontrollers at 0.25-1.5 m. Three spatially specialized collector drones - Anchor (full-spectrum), Mask Probe (mask-register loading leakage), and Cipher Probe (masked SubBytes output leakage) - feed a stationary Accumulator drone that performs coherent combining (+4.8 dB SNR gain) and second-order mask cancellation via a centered product of the two spatially separated leakage streams. Evaluated against three real ANSSI ASCAD datasets (ATmega8515 masked AES-128 and 50/100-sample desynchronized variants), the framework achieves a simulated key rank of 18 +/- 1.7 (five-seed) at 0.25 m on the primary masked dataset. Profiling-trace cross-correlation alignment reduces single-drone rank from 89 to 21 on the 100-sample-jitter variant, demonstrating compensation for drone hover vibration. A two-channel CNN in the Accumulator converges to a loss of 0.454 (vs. random baseline 5.545) and improves rank on desynchronized datasets. No physical hardware has been fabricated; prototype construction is the planned next step.
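Two of the figures quoted above are consistent with standard back-of-envelope values, although the abstract does not state how they were derived. Ideal coherent combining of three equal-SNR channels with independent noise gives 10*log10(3), roughly 4.77 dB, and the cross-entropy of a uniform guess over the 256 values of a key byte is ln(256), roughly 5.545. The sketch below checks both numbers and shows centered-product preprocessing in its textbook second-order form. The function names and the toy leakage values are hypothetical, and the mapping of these formulas onto the paper's pipeline is an assumption.

```python
import math


def coherent_gain_db(num_channels):
    """SNR gain in dB from ideally combining equal-SNR channels with independent noise."""
    return 10.0 * math.log10(num_channels)


def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes classes."""
    return math.log(num_classes)


def centered_product(stream_a, stream_b):
    """Combine two leakage streams, one value per trace, for second-order analysis.

    Each stream is mean-centred across traces and then multiplied point by point,
    so that the product depends jointly on both shares of a masked value.
    """
    mean_a = sum(stream_a) / len(stream_a)
    mean_b = sum(stream_b) / len(stream_b)
    return [(a - mean_a) * (b - mean_b) for a, b in zip(stream_a, stream_b)]


gain = coherent_gain_db(3)             # three collector channels (assumed)
baseline = uniform_cross_entropy(256)  # one AES key byte has 256 candidates
combined = centered_product([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # toy values
```

A trained model whose loss falls well below the uniform baseline is extracting key-dependent information, which is how the reported 0.454 versus 5.545 comparison can be read.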


2605.22666 2026-05-22 math.CO cs.LG math.PR

Holographic functions and neural networks


Balazs Szegedy

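Affiliations: Rényi Institute of Mathematics

AI summary: This paper studies the complexity of holographic functions through three different approaches (sampling properties, structural properties, and computational properties), explores the resulting complexity bounds, and proves that the three properties are parametrically equivalent.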

2605.22653 2026-05-22 cs.DS cs.LG

The Secretary Problem with a Stochastic Precursor


Franziska Eberle, Alexander Lindermayr

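Affiliations: Institut für Mathematik, Technische Universität Berlin, Germany

AI summary: This paper studies the secretary problem with a stochastic precursor and shows that a prediction can be valuable solely because of its arrival time. In the random-order model, a single uniformly timed precursor yields success probability at least 1/2, beating the classic 1/e benchmark. In the adversarial-order model, sufficiently concentrated precursors recover constant success guarantees.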

Abstract

In learning-augmented online algorithms, predictions are usually valued for what they say: a value estimate, a solution, or an algorithmic recommendation. This paper shows that predictions can also be valuable solely due to their arrival time. We study the fundamental secretary problem augmented with a stochastic precursor: a content-free signal that is guaranteed to arrive no later than the best item, but is otherwise stochastically timed. The signal does not carry any additional information; nevertheless, its timing alone changes the structure of optimal stopping. We characterize optimal policies in the random-order and adversarial-order models. In random order, a single uniformly timed precursor already gives success probability at least $\frac12$, improving on the classic $\frac1e$ benchmark. With increasingly late precursors, the success probability approaches $1$. In adversarial order, for which traditional models do not admit strong guarantees, sufficiently concentrated precursors recover constant success guarantees. Our results show that such novel forms of asynchronous temporal information are a distinct and powerful form of advice in online decision making and may also be effective for other problems.
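For context, the classic $\frac1e$ benchmark comes from the well-known threshold rule: skip the first r of n items, then accept the first item that beats everything seen so far. Its exact success probability is (r/n) * sum_{i=r+1}^{n} 1/(i-1), which approaches 1/e when r is about n/e. The sketch below evaluates that formula and cross-checks it by simulation. It covers only the classic baseline; the precursor-based policies are not specified in the abstract and are not modelled here, and the function names and trial counts are arbitrary choices.

```python
import math
import random


def classic_success_probability(n, r):
    """Exact probability that the skip-r threshold rule selects the best of n items."""
    if r == 0:
        return 1.0 / n  # the rule simply accepts the first item
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))


def simulate_classic_rule(n, r, trials, rng):
    """Monte Carlo estimate of the same probability under uniformly random arrival order."""
    wins = 0
    for _ in range(trials):
        values = list(range(n))  # larger value means better item; n - 1 is the best
        rng.shuffle(values)
        best_seen = max(values[:r], default=-1)
        chosen = None
        for value in values[r:]:
            if value > best_seen:
                chosen = value
                break
        if chosen == n - 1:
            wins += 1
    return wins / trials


n = 100
r = int(n / math.e)  # skip roughly n/e items
exact = classic_success_probability(n, r)
estimate = simulate_classic_rule(n, r, trials=20000, rng=random.Random(0))
```

Both values land close to 1/e, the figure that the abstract's bound of at least $\frac12$ improves on.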


2605.22621 2026-05-22 cs.CR cs.LG cs.NI

UNAD+: An Explainable Hybrid Framework for Unknown Network Attack Detection


Saif Alzubi, Frederic Stahl

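Affiliations: Department of Computer Science, University of Exeter; German Research Center for Artificial Intelligence GmbH (DFKI); Marine Perception

AI summary: This paper proposes UNAD+, a framework that combines an unsupervised ensemble with a supervised refinement stage and an integrated explainability layer to improve the performance and transparency of unknown network attack detection.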

Abstract

The detection of previously unseen network attacks remains a major challenge for intrusion detection systems. Although supervised learning methods often perform well on known attack classes, they are limited when new attack types are not represented in the training data. Unsupervised methods are more suitable for detecting zero-day attacks, as they do not require labelled attack samples, but they often suffer from high false positive rates, which limits their real-world usefulness. This paper presents UNAD+, an enhanced framework for unknown network attack detection derived from the previously proposed Unknown Network Attack Detector (UNAD). UNAD+ combines a benign-only unsupervised ensemble with Weighted Majority Voting (WMV), a supervised refinement stage trained on pseudo-labelled detections, and a post hoc explainability layer that provides both local and global explanations. The framework was evaluated on the CICIDS2017 and NSL-KDD benchmark datasets. The results show that UNAD+ improves on the original UNAD framework, achieving F1-scores above 98% across the benchmark datasets while significantly reducing false positives and enhancing transparency and deployment suitability through integrated explainability.
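Weighted Majority Voting, in its generic form, flags a sample when the detectors voting "anomalous" hold more than a chosen share of the total weight. The sketch below shows that generic rule. The abstract does not describe how UNAD+ assigns weights or sets its decision threshold, so the weights, the 0.5 threshold, and the function name are assumptions for illustration.

```python
def weighted_majority_vote(votes, weights, threshold=0.5):
    """Return True when the weighted share of positive votes exceeds the threshold.

    votes:   one 0/1 (or False/True) decision per detector, 1 meaning "anomalous".
    weights: one non-negative weight per detector, e.g. reflecting its reliability.
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    positive = sum(w for v, w in zip(votes, weights) if v)
    return positive / total > threshold


# Hypothetical ensemble of three detectors with weights 3, 1, and 1.
flag_strong_detector_only = weighted_majority_vote([1, 0, 0], [3, 1, 1])
flag_two_weak_detectors = weighted_majority_vote([0, 1, 1], [3, 1, 1])
```

Here the single heavily weighted detector outvotes the two lighter ones, which is the behavior that distinguishes weighted voting from a simple head count.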


2605.22612 2026-05-22 cs.CY cs.AI cs.LG

Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions


Naveen Raman, Santiago Cortes-Gomez, Mateo Dulce Rubio, Fei Fang, Bryan Wilder

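Affiliations: Carnegie Mellon University; New York University

AI summary: This paper argues that the evaluation-deployment gap for healthcare LLM benchmarks stems from implicit assumptions rather than from poor benchmark design, and proposes BenchmarkCards and staged evaluation to address it.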

Comments 13 pages, 1 figure

Abstract

Benchmarks are necessary for healthcare evaluation, but are not sufficient for predicting deployment performance. Our position is that the evaluation--deployment gap arises not because of poorly designed benchmarks, but from implicit assumptions about how users interact with models that cannot be surfaced from benchmarks alone. To make this precise, we propose a classification of assumptions into two categories: task, which can be tested from conversation data alone, and outcome, which requires outcome data and behavioral studies for testing. Critically, outcome assumptions depend on human behavior, something that even well-designed benchmarks cannot directly observe. To demonstrate the operationality of this framework, we retrospectively analyze a healthcare RCT as a case study and find that the gap naturally separates into task and outcome gaps of roughly equal size. To address this, we make two contributions: first, we propose BenchmarkCards, an artifact that documents assumptions, and second, we propose staged evaluation, a procedure that systematically tests assumptions and evaluates performance.

2605.22604 2026-05-22 cs.CR cs.AI cs.LG cs.SE

Innovations in Cardless Artificial Intelligence Banking: A Comprehensive Framework for Cyber Secure and Fraud Mitigation using Machine Learning Algorithms

Md Israfeel

Affiliations: Computer Engineering, University of Central Florida, Orlando, Florida, USA

AI summary: This paper proposes a comprehensive framework that uses machine learning algorithms to strengthen cybersecurity and fraud mitigation in cardless AI banking systems, generating virtual cards through AI-driven data cryptography to reduce the risk of information exposure.

Abstract

The advent of cardless artificial intelligence (AI) banking heralds a paradigm shift in the financial landscape, offering users unprecedented security and convenience. This paper outlines a comprehensive framework designed to enhance cybersecurity, introduce auto-generated virtual cards, and mitigate fraud risks within cardless AI banking systems. The framework envisions a future banking architecture that employs AI-powered data cryptography to create secure virtual cards for seamless transactions. By emphasizing secure communication channels, it ensures the integrity of financial activities among banking systems, cardholders, and third-party vendors. AI-based authorization methodologies play a pivotal role in authenticating each transaction while proactively identifying potential fraud, demonstrating the framework's efficacy in fortifying cardless AI banking security. The initial approach, featuring an AI-driven, feature-based banking system, ensures the generation of virtual cards with encrypted data, minimizing information exposure and reducing fraud risks. Integrating a machine learning algorithm adds an additional layer of protection against potential fraudulent activities. In conclusion, the proposed framework establishes a holistic cybersecurity and fraud-mitigation paradigm for cardless AI banking systems. Its implementation empowers financial institutions to address security concerns associated with traditional banking, paving the way for a future banking landscape that is not only fraud-resistant but also secure and convenient for users.

2605.22568 2026-05-22 cs.CR cs.AI

Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard

Sahar Abdelnabi, Chris Hicks, Konrad Rieck, Ahmad-Reza Sadeghi

Affiliations: ELLIS Institute Tübingen & MPI-IS & Tübingen AI Center; The Alan Turing Institute; BIFOLD & Technische Universität Berlin; Technische Universität Darmstadt

AI summary: This paper examines the core challenges of benchmarking AI agents in security-critical roles, including benchmark vulnerabilities, inaccuracy caused by time lag, and runtime uncertainty, and outlines directions for building more reliable and trustworthy evaluation frameworks.

2605.22549 2026-05-22 stat.ML cs.LG

A Martingale Kernel Independence Test

Felix Laumann, Zhaolu Liu, Mauricio Barahona

Affiliations: Imperial College London

AI summary: This paper proposes two studentised statistics that, through self-normalisation and a half-sample split, give independence tests requiring no permutation calibration, cutting computational cost while matching the type-I error and power of permutation-calibrated baselines.

Abstract

The Hilbert-Schmidt Independence Criterion (HSIC) and its joint-independence extension $d\mathrm{HSIC}$ are degenerate $V$-statistics whose data-dependent weighted-$\chi^2$ null limits force a permutation calibration that multiplies the per-test cost by the number of permutations, in practice two orders of magnitude. Adapting the recent martingale MMD construction for two-sample testing to the (joint) independence problem, we introduce two studentised statistics whose null distributions are standard normal regardless of the data law, so that a single normal-quantile lookup replaces the permutation step entirely. The first, $m\mathrm{HSIC}$, is a self-normalised lower-triangular sum of the Hadamard product of two empirically centred Gram matrices. Under independence and bounded-fourth-moment kernels it converges to a standard normal. It is consistent against every fixed alternative, and runs at quadratic cost in the sample size without any sample split, matching the biased HSIC $V$-statistic. Our second statistic, $md\mathrm{HSIC}$, achieves finite-sample consistency with a single half-sample split: the centring is estimated on one half and the lower-triangular self-normalised martingale is run on the other, shrinking the conditional-mean residual to a quantity that is exponentially small in $d$, so the statistic is asymptotically standard normal at every fixed number of jointly tested variables, with a per-test cost that grows only linearly in $d$. On synthetic data with per-variable input dimension from $1$ to $500$ and between $2$ and $10$ jointly tested variables, both statistics match the empirical type-I error rate and test power of permutation-calibrated baselines while running $25$ to $60\times$ faster.
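
For orientation, the baseline that the abstract compares against can be sketched as follows: the biased HSIC V-statistic, computed as the sum of the Hadamard product of two centred Gram matrices, with a permutation-calibrated p-value. This is not the paper's mHSIC or mdHSIC. The Gaussian kernel, the fixed bandwidth, the scalar inputs, and all function names are assumptions made for illustration.

```python
import math
import random


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples (kernel choice is an assumption)."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in xs] for a in xs]


def centre(gram):
    """Double-centre a symmetric Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_v(xs, ys):
    """Biased HSIC V-statistic: sum of the Hadamard product of centred Gram matrices over n^2."""
    n = len(xs)
    kc = centre(gaussian_gram(xs))
    lc = centre(gaussian_gram(ys))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


def permutation_p_value(xs, ys, num_permutations=100, seed=0):
    """Calibrate by permuting ys; each permutation costs one more quadratic-time evaluation."""
    rng = random.Random(seed)
    observed = hsic_v(xs, ys)
    exceed = 0
    for _ in range(num_permutations):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        if hsic_v(xs, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (num_permutations + 1)


# Strongly dependent toy data: ys is xs plus small noise.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
ys = [x + 0.1 * data_rng.gauss(0.0, 1.0) for x in xs]
p_value = permutation_p_value(xs, ys)
```

The loop in `permutation_p_value` is the cost multiplier the abstract refers to: the statistic is re-evaluated once per permutation, whereas a statistic with a standard normal null needs a single evaluation and a quantile lookup.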

2605.22540 2026-05-22 cs.CE cs.AI

Dynamic Hypergraph Representation Learning for Multivariate Time Series without Prior Knowledge

Marco Gregnanin, Johannes De Smedt, Giorgio Gnecco, Maurizio Parton

Affiliations: IMT School for Advanced Studies Lucca; KU Leuven; University of Chieti-Pescara

AI summary: This paper proposes a dynamic hypergraph representation learning method for multivariate time series that requires no prior knowledge: hypergraphs are built via community detection and an attention mechanism, then used for forecasting by a dynamic hypergraph attention convolution network.

Abstract

Hypergraphs have the capacity to capture higher-dimensional relationships among entities across various domains, making them a subject of growing interest within the research community for understanding the structure and dynamics of complex systems. However, a key challenge is the derivation of hypergraph representations from time series data in situations where the structure of the hypergraph is limited or absent. In this study, we propose a model that constructs a dynamic hypergraph representation for multivariate time series without relying on prior knowledge of the data. This is achieved by applying community detection to the time series and transforming the resulting communities, obtained through an attention mechanism, into a hypergraph using a clique-based technique. Hypergraph representations are derived from different time series datasets, and the resulting hypergraphs are then used by a Dynamic Hypergraph Attention Convolution Network (DHACN) for multivariate time series predictions. This research advances the field of hypergraph representation by introducing a novel approach that is better suited to uncover high-order relationships without prior knowledge.

2605.22506 2026-05-22 cs.CR cs.LG

EnCAgg: Enhanced Clustering Aggregation for Robust Federated Learning against Dynamic Model Poisoning

Tianyun Zhang, Zhen Yang, Haozhao Wang, Ru Zhang, Yongfeng Huang

Affiliations: School of Cyberspace Security, Beijing University of Posts and Telecommunications; School of Computer Science and Technology, Huazhong University of Science and Technology; Department of Electronic Engineering, Tsinghua University

AI summary: This paper proposes a new robust aggregation method that uses a small number of known benign clients as references to accurately identify and filter malicious gradients while retaining as many benign gradients as possible, even when the number of malicious clients is unknown and variable. The method comprises density-based low-dimensional gradient clustering, an enhancing clustering low-dimensional gradient generator model, and low-dimensional gradient re-clustering.

Abstract

Federated learning faces increasing threats from model poisoning attacks, which harms its application to improve privacy. Existing defense methods typically rely on fixed thresholds or perform clustering with a fixed number of clusters to distinguish malicious gradients from benign ones. However, these methods are difficult to adapt to dynamic poisoning strategies of malicious clients, and often result in the loss of benign gradients due to the heterogeneity of clients' local datasets. To address these problems, we propose a novel robust aggregation method that leverages a small number of known benign clients as references, enabling accurate identification and filtering of malicious gradients while retaining as many benign gradients as possible, even when the number of malicious clients is unknown and variable. First, we introduce a density-based low-dimensional gradient clustering method, which projects gradients onto the two most divergent dimensions and applies density-based clustering to identify malicious gradients while retaining clustered benign gradients and potentially benign outliers. Second, we design an enhancing clustering low-dimensional gradient generator model, which learns to generate pseudo-gradients aligned with the boundary of the benign cluster. These pseudo-gradients act as bridges to connect sparse benign gradient outliers. Third, we introduce low-dimensional gradient re-clustering that clusters the generated pseudo-gradients together with real gradients to recover benign gradients misclassified as noise points, enabling more benign gradients to participate in aggregation. Extensive experiments on the MNIST, CIFAR-10, and MIND datasets demonstrate that our method exhibits superior fidelity and robustness under dynamic poisoning scenarios.

2605.22463 2026-05-22 quant-ph cs.LG

Reinforcement learning for ion shuttling on trapped-ion quantum computers

Maximilian Schier, Lea Richtmann, Christian Staufenbiel, Tobias Schmale, Daniel Borcherding, Michèle Heurs, Bodo Rosenhahn

Affiliations: Institute for Information Processing (tnt), L3S, Leibniz University Hannover, Germany; Institute for Gravitational Physics, Leibniz University Hannover, Germany; QUDORA Technologies GmbH; Institute for Theoretical Physics, Leibniz University Hannover, Germany

AI summary: This paper uses reinforcement learning to optimise ion shuttling on trapped-ion quantum computers: a strategy learned through direct interaction reduces shuttling operations by up to 36.3%, and the method is shown to apply across different chip architectures.

Comments: 15 pages + 9 pages supplementary material, 6 figures

Abstract

Scalable trapped-ion quantum computing is commonly realized with modular chips that feature distinct zones with specific functionalities, such as storage, state preparation, and gate execution. To execute a quantum circuit, the ions must be transported between these zones. This process is called ion shuttling. To achieve reliable computation results, the shuttling process must be optimized. However, as the number of ions increases, this becomes a high-dimensional optimization problem where optimal solutions cannot be computed efficiently. We demonstrate, to the best of our knowledge, the first use of reinforcement learning (RL) for the optimization of ion shuttling. RL is well-suited for such scenarios, as it enables learning a strategy through direct interaction with the problem. We show that our RL approach outperforms current state-of-the-art heuristic techniques, yielding a reduction in shuttling operations of up to 36.3 %. Furthermore, we show that our method is easily applicable to various chip architectures. Our approach offers a versatile method to study shuttling efficiency during chip design and, therefore, a highly relevant tool for future, more complex architectures.

2605.22441 2026-05-22 cs.CR cs.AI

A Constant-Time Implementation Methodology for Activation Functions on Microcontrollers

Andrii Tyvodar, Andreas Rechberger, Dirmanto Jap, Shivam Bhasin, Bernhard Jungk, Jakub Breier, Xiaolu Hou

Affiliations: Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava; State Key Laboratory of Blockchain and Data Security, Zhejiang University; Temasek Laboratories, Nanyang Technological University; Faculty of Computer Science, Albstadt-Sigmaringen University; TTControl GmbH

AI summary: This paper proposes a constant-time implementation methodology for activation functions on embedded microcontrollers, combining branchless selection, fixed-cost Padé approximation, dummy arithmetic where needed, and cycle alignment to obtain timing-regular implementations, validated on ReLU, sigmoid, tanh, GELU, and Swish.

Abstract

Embedded neural-network inference can leak information through timing side channels, including leakage caused by the evaluation of activation functions. This work proposes a constant-time implementation methodology for activation functions on embedded microcontrollers and validates it on ReLU, sigmoid, tanh, GELU, and Swish on an ARM Cortex-M4 platform. The proposed methodology combines branchless selection, fixed-cost Padé-based approximation, dummy arithmetic where needed, and cycle alignment to obtain timing-regular activation-function implementations. As motivation, we also evaluate a desynchronization-based countermeasure and show that it remains vulnerable to a template-based timing attack. Experimental results show that the resulting protected implementations achieve identical cycle counts for all tested inputs, including 88 cycles in the three-function setting and 108 cycles in the five-function setting. At the same time, the numerical-error analysis indicates that the approximated nonlinear functions retain high accuracy. These results suggest that the proposed methodology provides a practical basis for constructing side-channel-resistant activation functions in embedded inference.
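
Two of the ingredients named in the abstract, branchless selection and a fixed-cost Padé approximation, can be illustrated with a short sketch. Python offers no timing guarantees, so this shows only the arithmetic, not a constant-time implementation. The abstract does not state which Padé order or input ranges the authors use; the [3/2] approximant of tanh and both function names are assumptions.

```python
import math


def relu_branchless_i32(x: int) -> int:
    """ReLU on a signed 32-bit integer value using a mask instead of a comparison branch."""
    mask = ~(x >> 31)  # all ones when x >= 0, zero when x < 0 (arithmetic shift)
    return x & mask


def tanh_pade_3_2(x: float) -> float:
    """[3/2] Pade approximant of tanh around 0; every input takes the same operations."""
    x2 = x * x
    return x * (15.0 + x2) / (15.0 + 6.0 * x2)


# Approximation error at a small input, where the approximant is accurate.
error_at_half = abs(tanh_pade_3_2(0.5) - math.tanh(0.5))
```

The approximant loses accuracy as |x| grows and exceeds 1 for large inputs, so a practical version would need some form of range handling; how the paper does this is not described in the abstract.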

2605.22438 2026-05-22 stat.ML cs.GT cs.LG

Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions

Luigi Foscari, Matilde Tullii, Vianney Perchet

Affiliations: Università degli Studi di Milano; Crest-Ensae; IP Paris; CRITEO AI Team

AI summary: This paper studies learning to bid in feedback-manipulated auctions, proposing an algorithm that combines a robust interval-elimination branch with an optimistic branch to cope with manipulated feedback, and provides a matching lower bound in the single-active-region case.

Abstract

Shilling is the use of artificial bids to make competition appear stronger and push prices upward. We study repeated first-price auctions in which shilling affects feedback but not allocation: the learner wins or loses against the real competing bid, but after a loss observes the maximum of the real bid and an independent shill bid. Thus the manipulation changes what the learner observes and hence how it learns to bid, without changing the outcome of the current auction. We analyze regret with respect to the best bid benchmark, assuming that the shill-bid distribution is known. Even then, shilling can mask the real bid, while useful side information appears only through intermittent low-shill events. Our algorithm combines a robust interval-elimination branch, which ignores the shilled report and achieves the dynamic-pricing rate $\tilde{\mathcal{O}}(T^{2/3})$, with an optimistic branch that debiases losing-side reports and exploits the resulting suffix information when it is reliable and achieves the first-price auctions rate $\tilde{\mathcal{O}}(\sqrt{T})$. A validation and racing procedure lets the algorithm use these optimistic updates without knowing the right scale or feedback geometry in advance. We complement the upper bounds with a matching lower bound, up to logarithmic factors, in the single-active-region case. Overall, the results show that even feedback-only shilling can sharply alter the statistical difficulty of repeated bidding.
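
The feedback model described in the abstract can be written out as a single auction round. This is a minimal sketch: the tie-breaking rule, the utility of `value - bid` on a win, and the choice to return no observation after a win are assumptions, since the abstract specifies only what is observed after a loss.

```python
from typing import NamedTuple, Optional


class RoundFeedback(NamedTuple):
    won: bool
    utility: float
    observed_bid: Optional[float]  # what the learner is shown after a loss


def play_round(value: float, bid: float, real_bid: float, shill_bid: float) -> RoundFeedback:
    """One first-price round where shilling changes the feedback but not the allocation."""
    # Allocation depends only on the real competing bid (ties go to the learner: assumption).
    won = bid >= real_bid
    if won:
        return RoundFeedback(True, value - bid, None)
    # After a loss the learner sees the maximum of the real bid and the shill bid.
    return RoundFeedback(False, 0.0, max(real_bid, shill_bid))


masked = play_round(value=1.0, bid=0.4, real_bid=0.6, shill_bid=0.9)
revealed = play_round(value=1.0, bid=0.4, real_bid=0.6, shill_bid=0.2)
```

In `masked` the shill bid hides the real bid, while in `revealed` the shill bid is low and the real bid comes through, which corresponds to the intermittent low-shill events mentioned in the abstract.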

2605.22437 2026-05-22 cs.CR cs.AI cs.LG

Characterizing the Fault Response of the Intel Neural Compute Stick 2 Under Single-Pulse Electromagnetic Fault Injection

Štefan Kučerák, Jakub Breier, Xiaolu Hou

Affiliations: Faculty of Informatics and Information Technologies, Slovak University of Technology; State Key Laboratory of Blockchain and Data Security; TTControl GmbH

AI summary: This paper studies the fault response of the Intel Neural Compute Stick 2 under single-pulse electromagnetic fault injection, identifying four reproducible outcome classes through a systematic campaign and discussing mitigation strategies for each class.

Abstract

Vision processing units and other commercial neural-network inference accelerators are increasingly deployed in safety-relevant edge applications, but their fault response under transient hardware disturbances remains poorly characterized in the open literature. For the Intel Movidius Myriad X, packaged as the Intel Neural Compute Stick 2 (NCS2), only a single feasibility study has been published. We report a systematic single-pulse electromagnetic fault injection (EMFI) campaign on the NCS2 running three ImageNet-trained convolutional neural networks (ResNet-18, ResNet-50, VGG-11) on the OpenVINO runtime. Across 1,536 spot-test trials at characterized hotspots and approximately 16,000 parameter-search trials, single pulses produce four reproducible outcome classes: no measured accuracy change, minor silent data corruption, major persistent degradation that survives across subsequent inferences until model reload, and device hangs requiring USB power-cycling; these outcomes are respectively interpreted as no-effect, SDC with possible SET-like or small persistent-state mechanisms, SEU-like persistent corruption, and SEFI-like loss of functionality. Two findings are central. First, the major-degradation class can be induced at 18-31% of trials at characterized hotspots, with post-collapse top-1 accuracy below five percent and persistence across all subsequent inferences until explicit model reload - a regime that no inference-API-level mechanism detects. Second, this regime is also inducible by pulses delivered to an idle device with the model already loaded, demonstrating that load-time integrity checks alone are insufficient. We discuss mitigation strategies graded by class, focusing on mechanisms implementable at the application level without modification to the device firmware or the OpenVINO runtime.

2605.22425 2026-05-22 eess.IV cs.CV

Time-varying rPPG signal separation via block-sparse signal model

Kosuke Kurihara, Yoshihiro Maeda, Daisuke Sugimura, Takayuki Hamamoto

Affiliations: Tokyo University of Science; Shibaura Institute of Technology; Tokyo Metropolitan University

AI summary: This paper proposes a signal-extraction method that exploits the quasi-periodic nature of rPPG signals; it builds a time-varying signal-separation framework that adapts to illumination changes, and experiments confirm its effectiveness.

Comments: Accepted by IEEE International Conference on Image Processing (ICIP 2026)

2605.22379 2026-05-22 cs.HC cs.AI cs.LG

Cross-Subject EEG Emotion Recognition Based on Temporal Asynchronous Alignment Contrastive Learning

Ying Xie, Yi Zheng, Zehui Xiao, Wenkai Lu, Mengting Liu

Affiliations: School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University; School of Computer Science and Technology, Tianjin University

AI summary: This paper proposes a Temporal Asynchronous Alignment-based Contrastive Learning (TA2CL) framework for cross-subject EEG emotion recognition, addressing inconsistent response timing across subjects; by improving the similarity-calculation strategy, it makes the model more robust to inter-subject differences and temporal delays.

Comments: 16 pages, 7 figures

Abstract

With the advancement of science and technology, the importance of emotion research has become increasingly evident. Electroencephalography (EEG)-based emotion recognition has emerged as an active research area in recent years, owing to its objectivity and high temporal resolution. However, most existing methods focus on optimizing encoder structures to enhance feature extraction capabilities, while paying relatively little attention to similarity calculation strategies, particularly overlooking the potential temporal misalignment of responses among different subjects. To address these shortcomings, this paper draws inspiration from the late interaction mechanism of ColBERT in natural language processing (NLP) and proposes a Temporal Asynchronous Alignment-based Contrastive Learning (TA2CL) framework. This method transforms the traditional global "hard alignment" similarity calculation approach into a fine-grained local matching mechanism, enabling the model to adaptively search for and align "locally highly correlated" segments between two EEG signals, thereby effectively mitigating the effects of inter-subject differences and temporal delays. Experimental results demonstrate that the proposed method achieves strong performance across multiple public datasets. Specifically, on the FACED dataset, it achieves an accuracy of 64.5% for the nine-class classification task and 79.5% for the binary classification task, while on the SEED and SEED-V datasets, it achieves accuracies of 86.4% and 70.1%, respectively, validating the method's effectiveness and generalization capability.
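
The contrast between global hard alignment and ColBERT-style late interaction can be sketched on toy segment embeddings. This is an illustration of the general MaxSim idea only; the abstract does not give TA2CL's exact similarity function, so the cosine similarity, the summation, and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hard_alignment_score(segments_a, segments_b):
    """Compare segments position by position, so any time shift lowers the score."""
    return sum(cosine(a, b) for a, b in zip(segments_a, segments_b))


def late_interaction_score(segments_a, segments_b):
    """MaxSim: match each segment of A to its most similar segment of B, then sum."""
    return sum(max(cosine(a, b) for b in segments_b) for a in segments_a)


# Two hypothetical recordings with the same two responses occurring in swapped order.
subject_a = [[1.0, 0.0], [0.0, 1.0]]
subject_b = [[0.0, 1.0], [1.0, 0.0]]
hard = hard_alignment_score(subject_a, subject_b)   # 0.0
late = late_interaction_score(subject_a, subject_b)  # 2.0
```

Position-wise comparison sees no similarity between the two toy recordings, whereas the local matching recovers it, which is the kind of temporal misalignment the abstract says the method targets.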

2605.22363 2026-05-22 math.OC cs.AI cs.GT

Incentive-Aligned Vehicle-to-Vehicle Energy Trading via Nash-Integrated Multi-Agent Reinforcement Learning

Yujin Lin, Yue Yang, Hao Wang

Affiliations: Department of Data Science and AI, Faculty of IT, Monash University, Australia; Monash Energy Institute, Monash University, Australia

AI summary: This paper proposes Nash-MADDPG, which integrates the Nash Bargaining Solution into Multi-Agent Deep Deterministic Policy Gradient for incentive-aligned vehicle-to-vehicle energy trading, improving social welfare, trading volume, and fairness.

Comments: The 24th IEEE International Conference on Industrial Informatics, 2026

Abstract

Vehicle-to-vehicle (V2V) energy trading enables decentralized peer-to-peer energy exchange among electric vehicles (EVs), reducing grid dependency while monetizing surplus capacity. However, coordinating self-interested EV agents with diverse charging needs and uncertain arrival-departure schedules remains challenging. Existing approaches either require centralized optimization with computational limitations or lack fairness guarantees. This paper integrates Nash Bargaining Solution into Multi-Agent Deep Deterministic Policy Gradient, namely Nash-MADDPG, for incentive-aligned V2V energy trading. Nash bargaining determines efficient bilateral pricing, while Nash-guided price proximity rewards align agent learning toward bargaining-optimal strategies. Evaluation over 30-day continuous operation demonstrates an improvement of 61.6% in social welfare and 62.9% improvement in trading volume over Double Auction, while achieving superior fairness, such as 40.1% improvement in Jain's index. Testing across 6-100 agents over a 30-day horizon with continuous vehicle turnover confirms scalability across population size and empirically stable pricing near the Nash Bargaining benchmark.
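
Two quantities mentioned in the abstract have simple closed forms that a short sketch can make concrete: Jain's fairness index, and the Nash bargaining price for one buyer and one seller. The bargaining part assumes linear utilities and zero disagreement payoffs, which the abstract does not confirm; the function names and example prices are hypothetical.

```python
from typing import Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1."""
    n = len(allocations)
    sum_squares = sum(x * x for x in allocations)
    if n == 0 or sum_squares == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return total * total / (n * sum_squares)


def nash_bargaining_price(buyer_value: float, seller_cost: float) -> float:
    """Price maximising (buyer_value - p) * (p - seller_cost), which is the midpoint.

    Assumes linear utilities and zero disagreement payoffs for both parties.
    """
    if buyer_value < seller_cost:
        raise ValueError("no gains from trade")
    return (buyer_value + seller_cost) / 2


equal_split = jain_index([1.0, 1.0, 1.0, 1.0])
winner_takes_all = jain_index([1.0, 0.0, 0.0, 0.0])
price = nash_bargaining_price(buyer_value=0.30, seller_cost=0.10)
```

Under these assumptions the bargaining price splits the trade surplus evenly, and Jain's index moves from 1 for an equal split towards 1/n as benefits concentrate on one participant.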

2605.22343 2026-05-22 cs.MA cs.AI cs.SE

Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators

Chengcheng Wang, Qinhua Xie, Wei He, Jianyuan Guo, Shiqi Wang, Chang Xu

Affiliations: University of Sydney; East China Normal University; TokenRhythm AI; City University of Hong Kong

AI summary: This paper proposes Sibyl-AutoResearch, a self-evolving framework for autonomous research systems that addresses how existing systems fail to accumulate trial experience; two auditable conversion units, trial-to-behavior and trial-to-harness-behavior, are intended to make such systems more reliable.

Abstract

Autonomous research systems increasingly make the scientific workflow executable: agents can propose ideas, run code, inspect results, and draft papers. But executable workflows do not by themselves produce research judgment. We analyze where current systems lose trial experience: weak evidence becomes prose, pilot signals become broad claims, memory remains textual, and recurring process failures do not change later behavior. We introduce Sibyl-AutoResearch, a self-evolving AutoResearch framework built around Scientific Trial-and-Error Harnesses. A harness lets agents run bounded trials, preserve positive and negative outcomes, and route lessons into later planning, validation, claim scope, scheduling, critique, writing, and harness repair. We formalize this through two auditable conversion units: trial-to-behavior conversion, which links trial signals to later research actions, and trial-to-harness-behavior conversion, which links recurring process failures to system updates. We implement the framework in SIBYL, a file-backed autonomous research system that exposes the state, roles, memory, gates, and artifact traces needed to inspect these conversion paths. A retrospective audit identifies eight high-confidence conversion events, with a median latency of one iteration and a maximum latency of three iterations. A recovered-failure registry further shows how five naturally occurring failure classes, including duplicate results, stale numbers, and unsupported statistics, were blocked, downgraded, or routed into later repair. These traces do not establish a comparative performance claim; they show that the proposed conversion units are recoverable from realistic autonomous-research workspaces. The SIBYL framework and system are available at https://github.com/Sibyl-Research-Team/AutoResearch-SibylSystem.

2605.22321 2026-05-22 cs.CR cs.AI cs.SE

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

对时间、空间和语义规避的自主代理进行基准测试

Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian, Jialuo Chen, Jingyi Wang, Xiaofang Yang, Shiwen Cui, Changhua Meng, Xinhao Deng, Zhen Wang

发表机构 * Tsinghua University（清华大学）

AI总结本文提出了一种针对基于大语言模型（LLM）的代理系统的多维规避框架，通过引入时间、空间和语义三种隐蔽攻击向量，系统地量化了这些威胁，并展示了其在实际威胁场景中的效果，揭示了现有自主代理系统在架构层面的系统性漏洞。

Comments 21 pages, 9 figures, 7 tables. Code and data available at https://github.com/antgroup/Agent3Sigma-Stage

详情

AI中文摘要

随着自主代理（例如OpenClaw）越来越多地利用深度系统级权限执行复杂任务，它们引入了严重的、未缓解的安全风险。当前的漏洞分析大多集中在单轮、无状态行为上，忽略了状态ful、多轮交互和动态工具调用中扩大的攻击面。在本文中，我们提出了一种新的、多维的规避框架，针对基于LLM的代理系统。我们引入了三种隐蔽的攻击向量：（1）时间规避，将恶意负载碎片化地分布在连续的交互轮次中；（2）空间规避，将负载隐藏在复杂的外部 artifacts 中，以逃避标准LLM解析机制；（3）语义规避，将恶意意图隐藏在良性上下文噪声之下。为了系统地量化这些威胁，我们构建了A3S-Bench，一个包含2,254个真实世界代理执行轨迹的综合基准。评估一个标准代理框架，分别与10个主流LLM后端整合，针对20个实际威胁场景，我们展示了我们的规避框架将平均风险触发率从28.3%的基准提升到52.6%。这些发现揭示了当前自主代理系统在架构层面的系统性漏洞，现有防御措施无法解决这些问题，突显了需要针对这些独特威胁设计的防御机制的紧迫性。

英文摘要

As autonomous agents (e.g., OpenClaw) increasingly operate with deep system-level privileges to execute complex tasks, they introduce severe, unmitigated security risks. Current vulnerability analyses overwhelmingly focus on single-turn, stateless behaviors, overlooking the expanded attack surface inherent in stateful, multi-turn interactions and dynamic tool invocations. In this paper, we propose a novel, multi-dimensional evasion framework targeting LLM-based agent systems. We introduce three stealthy attack vectors: (1) Temporal evasion, which fragments malicious payloads across sequential interaction turns; (2) Spatial evasion, which conceals payloads within complex external artifacts that evade standard LLM parsing mechanisms; and (3) Semantic evasion, which obscures malicious intents beneath benign contextual noise. To systematically quantify these threats, we construct A3S-Bench, a comprehensive benchmark comprising 2,254 real-world agent execution trajectories. Evaluating a standard agent framework separately integrated with 10 mainstream LLM backbones against 20 practical threat scenarios, we demonstrate that our evasion framework elevates the average risk trigger rate from a 28.3% baseline to 52.6%. These findings reveal systemic, architecture-level vulnerabilities in current autonomous agent systems that existing defenses fail to address, highlighting an urgent need for defense mechanisms tailored to the unique threats.
