arXivDaily: daily arXiv digest, updated Monday through Friday
All subject categories: 1946
2605.21707 2026-05-22 cs.CE cs.LG

Zero-shot adaptation to order book dynamics

Arip Asadulaev

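AI summary: This paper proposes an adaptive market-making architecture that preserves the analytical structure of the Avellaneda–Stoikov framework while introducing a successor-measure-style adaptation mechanism. By separating market dynamics from the trading objective, it adapts to changing market regimes and trading objectives.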

Abstract

We describe an adaptive market-making architecture that preserves the analytical structure of the Avellaneda–Stoikov framework while introducing a successor-measure-style adaptation mechanism. We keep the fast Avellaneda–Stoikov Hamilton–Jacobi–Bellman (HJB) structure and make it adaptive to changing market regimes and trading objectives. The central idea is to separate market dynamics from the trading objective: the market state determines a low-dimensional set of Avellaneda–Stoikov parameters, while recent realized rewards determine a low-dimensional objective vector. The HJB forward map then converts this objective into optimal bid and ask quotes through a scalarization of future reward features.
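The architecture keeps the Avellaneda–Stoikov quoting step and makes its inputs adaptive. For reference, a minimal sketch of the classic closed-form approximation from Avellaneda and Stoikov (2008) follows. It assumes a Brownian mid-price with volatility `sigma`, risk aversion `gamma`, and an exponential fill intensity with decay `k`; the function name is hypothetical, and this is not the paper's adaptive method (the regime-to-parameter map and the successor-measure objective are not reproduced).

```python
import math


def avellaneda_stoikov_quotes(mid, inventory, gamma, sigma, k, time_left):
    """Classic Avellaneda-Stoikov bid/ask quotes (closed-form approximation).

    mid: current mid-price s; inventory: signed position q;
    gamma: risk aversion; sigma: mid-price volatility;
    k: decay of the fill intensity A * exp(-k * delta); time_left: T - t.
    """
    # Reservation price: the mid shifted against the current inventory.
    reservation = mid - inventory * gamma * sigma ** 2 * time_left
    # Optimal total spread (ask distance + bid distance), centred on the reservation price.
    spread = gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return reservation - spread / 2.0, reservation + spread / 2.0
```

In an adaptive setting, regime-dependent estimates of parameters such as `sigma` and `k` would be passed in at each step; which parameters the paper adapts, and how, is not specified in the abstract. A long inventory lowers both quotes, which encourages selling.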

2605.21694 2026-05-22 cs.CR cs.AI

PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents

Sidnei Barbieri, Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior

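AI summary: This paper presents PocketAgents, a manifest-driven library of autonomous defense agents. By defining each agent through a manifest, a prompt, and a runtime context, it makes defense systems driven by large language models measurable, extensible, and attributable.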

Abstract

Connecting large language models (LLMs) to defensive enforcement requires more than asking a model whether an attack is happening. A defender must decide which model outputs may change the system state, which outputs must be rejected, and how failures should be recorded. We present PocketAgents, a manifest-driven library of autonomous defense agents. Each agent is installed as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. We implemented PocketAgents on top of a cyber arena (Perry), a cyber-deception testbed, and evaluated two agents, Command and Control and Exfiltration, in 18 closed-loop trials of a DarkSide-inspired attack on a small enterprise topology. Thirteen trials produced validated network-block actions and contained the attack; four failed schema validation; one produced a valid no-action decision. The experiments show that a typed boundary makes LLM-driven defense measurable, extensible, and attributable.
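The runtime gate described above (accept only typed reports whose requested action appears in the manifest, and record everything else as a failure) can be sketched in a few lines. The field names `allowed_actions`, `target`, and `rationale` and the action strings are hypothetical; the actual PocketAgents manifest and report schemas are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    # Hypothetical manifest fields; the real PocketAgents schema may differ.
    name: str
    allowed_actions: frozenset


@dataclass(frozen=True)
class Report:
    agent: str
    action: str
    target: str
    rationale: str


REQUIRED_FIELDS = {"agent": str, "action": str, "target": str, "rationale": str}


def validate_report(raw: dict, manifest: Manifest):
    """Return (report, None) if the raw LLM output is a well-typed report whose
    action is listed in the manifest; otherwise (None, reason), so the failure
    can be logged instead of executed."""
    for key, typ in REQUIRED_FIELDS.items():
        if not isinstance(raw.get(key), typ):
            return None, f"schema: field {key!r} missing or not {typ.__name__}"
    extra = set(raw) - set(REQUIRED_FIELDS)
    if extra:
        return None, f"schema: unexpected fields {sorted(extra)}"
    if raw["agent"] != manifest.name:
        return None, "agent name does not match manifest"
    if raw["action"] not in manifest.allowed_actions:
        return None, f"action {raw['action']!r} not in manifest"
    return Report(**raw), None
```

Listing an explicit no-op action in the manifest lets a "do nothing" decision pass validation as a valid outcome rather than being conflated with a schema failure.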

2605.21671 2026-05-22 eess.IV cs.CV

HyperBench: Standardizing and Scaling Synthetic Evaluation for Hyperspectral Super-Resolution

Ritik Shah, Marco F. Duarte

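AI summary: This paper proposes HyperBench, a framework that standardizes and scales synthetic experiments for evaluating hyperspectral super-resolution methods, addressing the inconsistent configurations and hard-to-compare, hard-to-reproduce results of existing evaluations.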

Abstract

Hyperspectral super-resolution (HSR) reconstructs a high-spatial-resolution hyperspectral image by fusing a low-resolution hyperspectral image (LR-HSI) with a high-resolution multispectral image (HR-MSI). In the absence of real-world paired data, HSR methods are evaluated almost exclusively on synthetic experiments derived from hyperspectral datasets through Wald's protocol. Despite the protocol's widespread adoption, its practical implementation varies markedly across research works, typically relying on a single (usually Gaussian) or very few point spread functions (PSFs), one or two spectral response functions (SRFs), and a couple of spatial downsampling factors. As a result, reported performance figures are difficult to compare across the literature and often difficult to reproduce; furthermore, they may not generalize across realistic sensing conditions. We introduce HyperBench, a unified and extensible framework that standardizes synthetic experimentation for HSR. HyperBench supports diverse degradation configurations spanning ten PSFs, four SRFs derived from operational multispectral sensors, configurable spatial downsampling factors, and matched additive white Gaussian noise; its goal is to automate large-scale evaluation and structured logging. By decoupling model development from experimental design, the framework enables reproducible, apples-to-apples cross-method comparison with minimal friction. We use HyperBench to evaluate six recently proposed HSR methods across a 70-configuration sweep on four widely used hyperspectral scenes and observe that the inter-method PSNR spread widens from approximately 5 dB on the easiest PSF to over 13 dB on the hardest, a fragility that is structurally invisible to the prevailing single-configuration evaluation protocol. HyperBench code is available at https://github.com/ritikgshah/HyperBench.
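Wald's protocol derives both HSR inputs from one reference hyperspectral cube: the LR-HSI by blurring with a PSF and downsampling, and the HR-MSI by integrating bands with SRFs. A minimal NumPy sketch of a single configuration follows, assuming a Gaussian PSF, row-normalized SRFs, and the same SNR for both outputs. The function names are hypothetical, and this is not HyperBench's API.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalised 1-D Gaussian; its outer product with itself is an isotropic Gaussian PSF."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(cube, kernel):
    """Separable blur of an (H, W, B) cube, applied band by band with reflect padding."""
    r = len(kernel) // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")

    def conv(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 0, padded)  # (H, W + 2r, B)
    return np.apply_along_axis(conv, 1, rows)    # (H, W, B)


def add_awgn(x, snr_db, rng):
    """Additive white Gaussian noise at a given SNR in dB (None means noiseless)."""
    if snr_db is None:
        return x
    noise_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def wald_degrade(hr_hsi, srf, ratio, sigma, snr_db=None, seed=0):
    """Simulate (LR-HSI, HR-MSI) from a reference cube hr_hsi of shape (H, W, B).

    srf has shape (M, B): one spectral response per multispectral band."""
    rng = np.random.default_rng(seed)
    kernel = gaussian_kernel_1d(sigma, radius=int(np.ceil(3 * sigma)))
    lr_hsi = blur(hr_hsi, kernel)[::ratio, ::ratio, :]  # spatial degradation
    srf = srf / srf.sum(axis=1, keepdims=True)          # each SRF row sums to 1
    hr_msi = hr_hsi @ srf.T                             # spectral degradation
    return add_awgn(lr_hsi, snr_db, rng), add_awgn(hr_msi, snr_db, rng)
```

A sweep in the spirit of the abstract would loop this over PSF shapes, SRF matrices, ratios, and noise levels, logging metrics per configuration.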

2605.21665 2026-05-22 cs.MA cs.AI

Planning, Scheduling, and Behavior in EV Charging Systems: A Critical Survey and Trilemma Framework

Peiyan Xiao, Yuheng Li, Ayan Mukhopadhyay, Sai Krishna Ghanta, Sabur Baidya, Yanhai Xiong

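AI summary: This survey reviews research on EV charging systems across three layers (planning, scheduling, and behavior) and proposes a trilemma framework that exposes the tradeoff between tractability and realistic integration when pursuing high fidelity.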

Comments Review article; 56 pages excluding references; 1 figure and 3 tables

Abstract

The rapid growth of electric vehicles is shifting the main constraint on transport electrification from vehicle adoption to the deployment and operation of charging infrastructure. Charging-network design requires decisions across three interdependent layers: Planning, which determines where and how much infrastructure to build; Scheduling, which governs charging dispatch, pricing, and grid interaction; and Behavior, which captures how users choose stations, charging times, and charging durations. Existing studies have advanced each layer substantially, but the literature remains fragmented, and cross-layer interactions are often treated through simplifying assumptions. This survey develops a three-layer Planning-Scheduling-Behavior (PSB) framework to organize EV charging research according to decision horizon, actor objective, and coupling structure. We further identify a fidelity-tractability tradeoff, termed the PSB trilemma: each layer is computationally difficult in isolation, and realistic integration across layers generally requires reducing the fidelity of at least one layer. Reviewing the three pairwise-coupling literatures (Planning-Scheduling, Scheduling-Behavior, and Planning-Behavior), we show that the omitted third layer is typically fixed exogenously or represented by a static aggregate surrogate. These simplifications enable tractability but impose distinct costs: they can obscure long-term investment feedback, temporal grid and emissions dynamics, or heterogeneous user response and equity outcomes. Building on this diagnosis, we identify open challenges in emerging charging technologies, behavioral incentives, equity metrics, and city-scale learning-based methods that balance fidelity, interpretability, and policy relevance.

2605.21635 2026-05-22 cs.HC cs.AI cs.CY

Addressing the Synergy Gap: The Six Elements of the Design Space

Tommaso Turchi, Ben Wilson, Matt Roach, Alan Dix, Alessio Malizia

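AI summary: This paper examines why human-AI synergy is rare and proposes six elements of the design space, offering a shared vocabulary for building hybrid systems, an analytical lens for studying combination patterns, and a starting point for evaluating the quality of human-AI decision-making.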

Comments 10 pages, 2 figures

Abstract

AI is now embedded in healthcare, finance, policy, and many other domains, yet genuine human-AI synergy (combined performance that exceeds what either party achieves alone) is uncommon. Meta-analyses show that AI assistance tends to improve human performance compared to working alone, but studies finding true synergy are scarce. We call this persistent shortfall the synergy gap. Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design. These matter, but they cover only part of what determines whether combination works. Closing the synergy gap, we argue, requires explicit engagement with a wider design space. We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation. For each element, we describe what it covers, how it shapes the others in practice, and what it implies for design. The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.
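The distinction drawn here, between AI assistance that beats the human working alone and genuine synergy that beats both parties working alone, reduces to comparing the team against the better solo performer. The function name, labels, and accuracy values below are hypothetical.

```python
def classify_combination(human_acc, ai_acc, team_acc):
    """Label a human-AI team outcome given accuracies of the human alone,
    the AI alone, and the human working with the AI."""
    if team_acc > max(human_acc, ai_acc):
        return "synergy"  # beats both parties working alone
    if team_acc > human_acc:
        return "augmentation only"  # helps the human, but the AI alone does at least as well
    return "no gain"
```

For example, a team scoring 0.80 when the human alone scores 0.70 and the AI alone scores 0.85 counts as improved human performance, yet not as synergy.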

2605.21633 2026-05-22 eess.IV cs.CV

VRXU-net: A Deep Learning Approach for Brain Ischemic Stroke Lesion Detection and Segmentation in T1W MRI

Sayed Amir Mousavi Mobarakeh

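AI summary: This study proposes VRU-Net, an architecture built on visual features, residual connections, and a U-shaped network, for detecting and segmenting ischemic stroke lesions in 3D MRI scans. A modified VGG model and a U-shaped segmentation model process each anatomical plane independently, and aggregating the results improves segmentation accuracy and processing speed.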

Abstract

When the blood supply to the brain is obstructed by a clot, oxygen delivery to brain tissues becomes insufficient, leading to cellular necrosis. In healthcare settings, accurately identifying and delineating ischemic lesion boundaries is essential for treatment and surgical planning. However, ischemic stroke lesions vary widely in shape, size, and location, and in grayscale MRI modalities such as T1W they may resemble surrounding brain structures. This makes lesion detection and segmentation a challenging task for clinicians. This study introduces a novel VRU-Net architecture, derived from visual features, residual connections, and a U-shaped network, for detecting and segmenting ischemic stroke lesions in 3D magnetic resonance imaging scans. The proposed method first uses a modified VGG model to identify ischemic stroke in separate 2D slices. Then, a U-shaped segmentation model with residual blocks segments the lesion in each slice. This procedure is applied independently to the axial, sagittal, and coronal planes, and the final output is generated by aggregating the three segmentation results. To improve both performance and processing speed, a high-performance classifier is applied before the segmentation model in a sequential framework. This strategy reduces unnecessary segmentation of non-lesion slices and improves overall accuracy. In addition, decomposing 3D images into 2D slices reduces model complexity while allowing information from three anatomical planes to support more accurate lesion localization. The proposed model is trained on the Anatomical Tracings of Lesions After Stroke dataset and outperforms state-of-the-art models in terms of accuracy and Dice coefficient. Moreover, the segmentation output provides feedback that helps the classification model reduce false-positive predictions.
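The classify-then-segment cascade described above can be sketched independently of the networks. In the NumPy sketch below, the classifier and segmenter are passed in as callables (hypothetical stand-ins for the modified VGG model and the residual U-shaped model). The three plane-wise masks are fused by voxel-wise majority vote, which is an assumption because the abstract does not specify the aggregation rule.

```python
import numpy as np


def segment_volume(volume, has_lesion, segment_slice, min_votes=2):
    """Classify-then-segment each 2-D slice along all three axes, then fuse.

    has_lesion(slice_2d) -> bool and segment_slice(slice_2d) -> boolean mask are
    hypothetical stand-ins for the trained slice classifier and segmenter.
    Fusion by voxel-wise majority vote (min_votes of 3) is an assumption.
    """
    votes = np.zeros(volume.shape, dtype=np.int32)
    for axis in range(3):  # one pass per anatomical plane
        plane_mask = np.zeros(volume.shape, dtype=bool)
        for i in range(volume.shape[axis]):
            sl = np.take(volume, i, axis=axis)
            if not has_lesion(sl):  # gate: skip slices classified as lesion-free
                continue
            index = [slice(None)] * 3
            index[axis] = i
            plane_mask[tuple(index)] = segment_slice(sl)
        votes += plane_mask
    return votes >= min_votes
```

Only slices the classifier flags reach the segmenter, which is where the speed-up comes from; which array axis corresponds to the axial, sagittal, or coronal plane depends on the dataset's orientation.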

2605.21615 2026-05-22 cs.CR cs.LG cs.SE

ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage

Chang Liu, Noah Fleischmann, Nicolò Altamura, Edward Raff, James Holt, Kristopher Micinski

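AI summary: This paper presents the ASSEMBLAGE-DEEPHISTORY dataset, which unifies cross-build diversity, cross-version history, and CVE labels in a single framework for binary analysis, and demonstrates its value through three analyses: LLM vulnerability detection, version clustering, and decomposition of binary similarity.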

Abstract

Existing binary corpora typically capture only one or two axes of binary variation: they either provide cross-compiler builds without a temporal axis, or CVE labels for single-build binaries. None combine cross-build diversity, cross-version history, and CVE labels into a queryable structure. We present ASSEMBLAGE-DEEPHISTORY, which consolidates these dimensions into a unified framework where every binary's compilation context, source code, vulnerable functions, and package version are stored as first-class metadata. ASSEMBLAGE-DEEPHISTORY comprises 73,610 binaries spanning 248 open-source projects, compiled across GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows, with multi-year historical builds. Each binary is indexed in a database that links it to its source code, functions, debug info, variant builds, historical versions, and vulnerable functions. Three analyses demonstrate this structure's value: (1) a three-stage LLM benchmark (recognition, strategy-guided detection, and cross-build transfer) to test whether LLMs reason about binary vulnerabilities or pattern-match on build-specific artifacts; (2) a comparison of MalConv embeddings, jTrans function embeddings, and TLSH fuzzy hashes quantifying how same-package versions cluster in each space; and (3) a Bayesian regression decomposing binary similarity into contributions from temporal distance, file changes, and commits.
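Storing build context, functions, and CVE tags as linked first-class metadata turns cross-build questions into single queries. A toy SQLite sketch follows; the table and column names are hypothetical and far simpler than the released database, which also links source code, debug info, and historical versions.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only.
SCHEMA = """
CREATE TABLE binaries  (id INTEGER PRIMARY KEY, package TEXT, version TEXT,
                        compiler TEXT, opt_level TEXT, platform TEXT);
CREATE TABLE functions (id INTEGER PRIMARY KEY,
                        binary_id INTEGER REFERENCES binaries(id), name TEXT);
CREATE TABLE vulns     (function_id INTEGER REFERENCES functions(id), cve TEXT);
"""


def builds_with_cve(conn, cve):
    """Every build (version x compiler x optimization level) containing a function tagged with `cve`."""
    return conn.execute(
        """
        SELECT DISTINCT b.package, b.version, b.compiler, b.opt_level, f.name
        FROM vulns v
        JOIN functions f ON f.id = v.function_id
        JOIN binaries  b ON b.id = f.binary_id
        WHERE v.cve = ?
        ORDER BY b.version, b.compiler, b.opt_level
        """,
        (cve,),
    ).fetchall()
```

For instance, if two builds of the same package version (GCC -O0 and Clang -O2) share a CVE-tagged function, one call returns both builds, which is the kind of lookup a cross-build transfer test needs.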

2605.21614 2026-05-22 cs.HC cs.LG

Exploring the Effectiveness of Using LLMs for Automated Assessment of Student Self Explanations in Programming Education

Arun-Balajiee Lekshmi-Narayanan, Mohammad Hassany, Peter Brusilovsky

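AI summary: This paper studies how effectively LLMs can automatically assess student self-explanations in programming education, comparing LLMs with semantic similarity methods on a binary classification task to weigh the strengths and weaknesses of automated scoring techniques.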

Abstract

Worked examples are step-by-step solutions to problems in a specific domain, offered to students to acquire domain-specific problem-solving skills. The effectiveness of worked examples could be enhanced by combining them with self-explanations, which ask students to explain rather than passively study each problem-solving step. The main challenge of this approach is assessing the correctness of the student's explanations. In the prevailing approach, student explanations are judged by their semantic similarity to an instructor's or domain expert's explanation. Given recent advances in LLM-based automated scoring, it remains unclear whether semantic similarity methods are still the most effective technique to automatically score textual student responses like essays or code explanations. Comparing these methods also requires quality datasets that offer distinctive features such as balanced class distributions and domain-specific labeled data for automated scoring tasks. In this paper, we present a rigorous comparison between LLMs and semantic similarity used for automated scoring, framed as a binary classification task.
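In the semantic-similarity baseline, a student explanation is labeled correct when it is close enough to an expert's explanation, which frames scoring as binary classification. In the sketch below, bag-of-words cosine similarity is a stand-in for the sentence embeddings such systems typically use, and the 0.5 threshold is a hypothetical value that would normally be tuned on labeled data. An LLM-based scorer would replace `score_explanation` with a prompted yes/no judgment.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_explanation(student, expert, threshold=0.5):
    """Binary label: 1 if the student's explanation is similar enough to the expert's."""
    return int(cosine(bag_of_words(student), bag_of_words(expert)) >= threshold)
```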

2605.21548 2026-05-22 stat.ML cs.AI cs.LG

Local Covariate Selection for Average Causal Effect Estimation without Pretreatment and Causal Sufficiency Assumptions

Zeyu Liu, Zheng Li, Feng Xie, Yan Zeng, Hao Zhang, Kun Zhang

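AI summary: This paper proposes a local learning method for covariate selection in nonparametric causal effect estimation that avoids the pretreatment and causal sufficiency assumptions, improving computational efficiency and estimation accuracy.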

Abstract

We study the problem of selecting covariates for unbiased estimation of the total causal effect. Existing approaches typically rely on global causal structure learning over all variables, or on strong assumptions such as causal sufficiency (observed variables share no latent confounders) or the pretreatment assumption, which limits covariates to those unaffected by the treatment or outcome. These requirements are often unrealistic in practice, and global learning becomes computationally prohibitive in high-dimensional settings. To address these challenges, we propose a novel local learning method for covariate selection in nonparametric causal effect estimation that avoids both the pretreatment and causal sufficiency assumptions. We first characterize a local boundary that contains at least one valid adjustment set whenever one exists for identifying the causal effect, and then develop local identification procedures to efficiently search within this boundary. We prove that the proposed method is sound and complete. Experiments on multiple synthetic datasets and two real-world datasets show that our approach achieves accurate causal effect estimation while substantially improving computational efficiency.
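Once a valid adjustment set Z has been found (the proposed local search exists to find one), the average causal effect of a binary treatment follows from the back-door adjustment formula ACE = Σ_z (E[Y | T=1, Z=z] − E[Y | T=0, Z=z]) P(Z=z). The sketch below applies it to simulated data with one discrete confounder. The data-generating process is invented for illustration, and the boundary search itself is not implemented.

```python
import numpy as np


def adjusted_ace(t, y, z):
    """Average causal effect of binary t on y, stratifying on a discrete adjustment set z:
        ACE = sum_z (E[Y | T=1, Z=z] - E[Y | T=0, Z=z]) * P(Z=z)
    Assumes positivity: both treatment arms occur in every stratum of z."""
    ace = 0.0
    for value in np.unique(z):
        stratum = z == value
        diff = y[stratum & (t == 1)].mean() - y[stratum & (t == 0)].mean()
        ace += diff * stratum.mean()
    return ace


# Simulated data: z confounds both treatment assignment and outcome.
rng = np.random.default_rng(0)
n = 200_000
z = rng.binomial(1, 0.5, n)                   # confounder
t = rng.binomial(1, 0.2 + 0.6 * z)            # treatment more likely when z = 1
y = 2.0 * t + 3.0 * z + rng.normal(0, 1, n)   # true effect of t is 2.0
naive = y[t == 1].mean() - y[t == 0].mean()   # biased; its expectation is 3.8 here
print(round(naive, 2), round(adjusted_ace(t, y, z), 2))
```

The naive difference in means absorbs the confounder's effect, while adjusting for z recovers the true effect up to sampling noise.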

2605.21545 2026-05-22 cs.SE cs.AI

RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

Lukas Weidener, Marko Brkić, Mihailo Jovanović, Emre Ulgac, Aakaash Meduri

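AI summary: This paper introduces RefusalBench, 141 prompts arranged in 47 matched triples, to evaluate the refusal behavior of frontier LLMs on biological research prompts. It finds that refusal rate misranks safety calibration and reveals differences in model behavior across risk tiers.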

Comments 34 pages, 4 figures, 12 tables (10 in main text, 2 in supplementary). Code and data: https://github.com/AppliedScientific/refusalbench

Abstract

Frontier large language models are increasingly deployed as orchestration backbones for biological research workflows, yet no shared evidence base exists for comparing their refusal behaviour on legitimate research prompts. RefusalBench, introduced here, is a matched-triple benchmark of 141 prompts in 47 bundles that holds task framing constant while varying only biological risk tier (benign, borderline, dual-use), enabling tier-conditioned comparisons robust to subdomain confounding. A 15-prompt should-refuse positive-control module establishes per-model calibration floors; three models fail to refuse even these prompts. Across 19 frontier models in the May 2026 snapshot, strict refusal rates span 0.1% to 94.6% on identical prompts. Jurisdiction does not predict refusal in this snapshot (Mann-Whitney U, p = 0.393; EU n = 1, US bimodal); provider identity does, with Anthropic's API stack predicting refusal at OR = 21.03 (95% CI: 14.58-30.34 prompt-clustered; 5.70-77.55 under model-clustered GEE). This effect is best read as access-path-level rather than model-weight-level: 99.8% of Anthropic's strict refusals carry the same safety_policy adjudicated reason code, consistent with a small set of canonical refusal templates rather than case-by-case model reasoning. Strict refusal rate misranks safety calibration: Grok 4.20 achieves the highest tier discrimination (Youden's J = 0.787) while ranking only seventh by overall refusal rate, and Claude Opus 4.7's J dropped 65% from prior versions with no improvement in dual-use detection. Nine of 18 frontier models exhibit a hedge-but-help partial-compliance pattern at dual-use tier that binary refusal metrics cannot detect.
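The abstract ranks models by tier discrimination (Youden's J) rather than raw refusal rate, and reports provider effects as odds ratios. The toy computation below shows why the two rankings can disagree. It uses the hypothetical framing that refusing a dual-use prompt counts as a true positive and refusing a benign prompt counts as a false positive; the benchmark's exact definition, including how borderline prompts are handled, may differ. All counts are made up.

```python
def youden_j(tp, fn, tn, fp):
    """J = sensitivity + specificity - 1, ranging from -1 to 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def odds_ratio(a, b, c, d):
    """2x2 table: a = refused, b = answered (group 1); c = refused, d = answered (group 2)."""
    return (a * d) / (b * c)
```

A model that refuses all 47 dual-use and all 47 benign prompts has a 100% refusal rate but J = 0. A model refusing 40 of 47 dual-use prompts and only 2 of 47 benign prompts refuses under half of all prompts, yet scores J = 38/47 ≈ 0.81.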

2605.21541 2026-05-22 cs.CR cs.AI cs.LG stat.ML

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

Leitao Yuan, Qinghua Mao, Daizong Liu, Kun Wang, Wenjie Wang, Yan Teng, Jing Shao, Dongrui Liu

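AI summary: This paper proposes FRA-Attack, which addresses adversarial transferability through frequency-domain regularization: a high-pass DCT objective and frequency-domain gradient regularization improve cross-model transfer.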

Abstract

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, thereby hindering cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate, i.e., no surrogate-derived statistic is involved. FGR is therefore model-agnostic by construction: it removes surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on 15 flagship MLLMs across 7 vendors show that FRA-Attack achieves superior cross-model transferability, with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6, and Gemini-3-flash in particular.
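FGR is described as a low-pass filter on the surrogate gradient whose weights depend only on frequency coordinates. A minimal NumPy sketch of that idea follows: transform a 2-D gradient map with an orthonormal DCT-II, scale each coefficient by a Gaussian weight of its (u, v) index, and transform back. The Gaussian roll-off, the `cutoff` parameter, and the function names are assumptions; the paper's exact weighting is not given in the abstract.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that X_hat = C @ X @ C.T and X = C.T @ X_hat @ C."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def lowpass_gradient(grad, cutoff):
    """Attenuate high DCT frequencies of an (H, W) gradient map.

    The weight depends only on the frequency indices (u, v), never on any
    model statistic, so the filter is the same for every surrogate."""
    h, w = grad.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ grad @ cw.T
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    weight = np.exp(-(u ** 2 + v ** 2) / (2.0 * cutoff ** 2))  # 1 at DC, decaying with frequency
    return ch.T @ (weight * coeffs) @ cw
```

A constant gradient passes through unchanged (only its DC coefficient is non-zero), while a pixel-level checkerboard, an extreme high-frequency pattern, is almost entirely removed.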

2605.21540 2026-05-22 cs.SI cs.AI cs.CL cs.CY

Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse

Despoina Antonakaki, Sotiris Ioannidis

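AI summary: This paper proposes a cross-platform framework that combines four coordination signals (lexical diversity, temporal burstiness, rhetorical repetition, and semantic homogenization) into a Synthetic Narrative Coordination score, SNC(C), to detect synthetic political narratives. IntelSlava ranks first on four event windows, while Rybar ranks poorly despite its high semantic homogenization because it posts in a different language.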

Abstract

The proliferation of large language models has introduced a new paradigm of synthetic political communication in which narratives may be generated, semantically coordinated, and strategically disseminated across platforms at scale. We present a cross-platform framework for detecting synthetic political narratives using four coordination signals, namely lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C), combined into a Synthetic Narrative Coordination Score SNC(C). We apply the framework to a corpus of 353,223 records spanning six geopolitical event windows collected from six Telegram channels and nine Reddit communities (2023–2026). Results show that IntelSlava exhibits the lowest lexical diversity (MATTR 0.52–0.54), the highest burstiness (B = +0.48 to +0.73), and the highest rhetorical overlap with peer channels (Jaccard 0.12), ranking first in the composite SNC(C) on four of six event windows (SNC 0.45–0.60). Rybar ranks last on all windows despite its high semantic homogenization, because its Russian-language output yields high lexical diversity and near-zero rhetorical Jaccard overlap with English-language channels; this demonstrates that no single indicator is sufficient for coordination detection. Multi-dimensional SNC(C) scoring provides a more robust and interpretable signal than any individual metric.
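The abstract reports lexical diversity as MATTR, a burstiness value B, and rhetorical overlap as a Jaccard index. Minimal sketches of those three quantities follow. The burstiness formula used here, B = (σ − μ)/(σ + μ) over inter-post intervals (Goh and Barabási), is an assumption that matches the reported range; the 50-token window is a hypothetical default. How the four signals are weighted into SNC(C) is not stated in the abstract, so no composite is attempted.

```python
import statistics


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean of distinct/total over sliding windows."""
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


def burstiness(inter_event_times):
    """B = (sigma - mu) / (sigma + mu), in [-1, 1]: -1 for perfectly regular
    posting, about 0 for a Poisson process, approaching +1 for bursty posting."""
    mu = statistics.mean(inter_event_times)
    sigma = statistics.pstdev(inter_event_times)
    return (sigma - mu) / (sigma + mu)


def jaccard(a, b):
    """Overlap of two collections of phrases (e.g. recurring n-grams)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Because both MATTR and Jaccard operate on surface tokens, a channel posting in a different language scores high diversity and near-zero overlap regardless of coordination, which is the Rybar effect described above.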

2605.21534 2026-05-22 stat.ML cs.LG

Adaptive RBF-KAN: A Comparative Evaluation of Dynamic Shape Parameters in Kolmogorov-Arnold Networks

Roberto Cavoretto, Alessandra De Rossi, Adeeba Haider, Amir Noorizadegan

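AI summary: This paper studies the choice of dynamic shape parameters in Kolmogorov-Arnold networks, extending the RBF-KAN model with a broader family of radial basis kernels and LOOCV-based kernel scale estimation to improve its adaptability to different function types.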

Abstract

Kolmogorov-Arnold Networks (KANs) approximate multivariate functions using learnable univariate edge functions, typically parameterized by B-spline bases. Although effective, spline-based implementations can be computationally expensive. A modified version of KANs, called FastKAN, improves efficiency by replacing splines with Gaussian radial basis functions (RBFs), but it relies on a fixed kernel and shape parameter. In this work, we extend the RBF-based KAN framework by introducing a broader family of radial basis kernels and by initializing the kernel shape parameter using leave-one-out cross-validation (LOOCV). To the best of our knowledge, this is the first study that integrates LOOCV-based kernel scale estimation with deep KAN training. We also introduce Matérn and Wendland kernels into the KAN framework for the first time, enabling more flexible basis representations beyond the Gaussian kernel used in FastKAN. The LOOCV estimate provides a data-driven initialization of the kernel scale, which is subsequently refined during network training. The proposed adaptive RBF-KAN is evaluated on several two-dimensional benchmark functions. The results highlight the importance of kernel selection and adaptive shape parameters, with different kernels showing advantages for smooth functions, discontinuities, and oscillatory patterns. Overall, combining LOOCV-based initialization with adaptive kernel learning provides a practical strategy for improving RBF-based KAN models.
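Because every KAN edge function is univariate, the LOOCV initialization can be illustrated on 1-D RBF interpolation. The sketch uses Rippa's closed form, which obtains all leave-one-out residuals from a single matrix inverse instead of N refits: with interpolation matrix A and coefficients c = A⁻¹f, the k-th residual is c_k / (A⁻¹)_kk. The Gaussian kernel and the grid search over the shape parameter are assumptions, and how the selected value is handed to KAN training is not shown.

```python
import numpy as np


def gaussian_kernel(r, eps):
    """Gaussian RBF with shape parameter eps."""
    return np.exp(-(eps * r) ** 2)


def loocv_errors(x, f, eps, kernel=gaussian_kernel):
    """Rippa's closed form: e_k = c_k / (A^-1)_kk, where A c = f."""
    A = kernel(np.abs(x[:, None] - x[None, :]), eps)
    A_inv = np.linalg.inv(A)
    return (A_inv @ f) / np.diag(A_inv)


def select_shape(x, f, eps_grid, kernel=gaussian_kernel):
    """Pick the shape parameter with the smallest LOOCV error norm."""
    costs = [np.linalg.norm(loocv_errors(x, f, e, kernel)) for e in eps_grid]
    return eps_grid[int(np.argmin(costs))]
```

Swapping `gaussian_kernel` for a Matérn or Wendland function changes only the `kernel` argument, which mirrors the kernel-family comparison in the abstract.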

2605.21527 2026-05-22 eess.IV cs.CV cs.LG

CryoNet: A Deep Learning Framework for Multi-Modal Debris-Covered Glacier Mapping. A Case Study of the Poiqu Basin, Central Himalaya

Farzaneh Barzegar, Tobias Bolch, Norbert Kuehtreiber, Silvia L. Ullo

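AI summary: This study proposes CryoNet, a deep learning framework that uses a multi-modal dataset to distinguish clean-ice glaciers, debris-covered glaciers, and glacial lakes, and demonstrates its effectiveness in complex high-mountain environments through a case study of the Poiqu Basin in the central Himalaya.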

Comments 15 pages, 10 figures, 5 tables. Preprint submitted to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS); currently under review

Abstract

Glaciers play a critical role as freshwater reserves and indicators of climate change, yet their automatic delineation, especially for debris-covered glaciers, remains challenging due to spectral similarity with surrounding terrain. This study introduces CryoNet, a deep learning framework that leverages a rich multi-modal dataset combining Sentinel-2 optical imagery, DEM-derived topographic variables, spectral indices, Principal Component Analysis (PCA), InSAR coherence and phase, tasseled-cap features, and GLCM texture to discriminate clean-ice glaciers, debris-covered glaciers, and glacial lakes. CryoNet is an encoder-decoder CNN with nested skip connections and spatial-channel Squeeze-and-Excitation (scSE) attention, built upon a ResNet101 encoder to capture hierarchical contextual and spatial features. The study is conducted in the Poiqu Basin in the central Himalaya, and transferability is evaluated by applying the trained model to the Mont Blanc Massif in the Alps. We additionally analyse the importance of each data layer in improving glacier mapping performance. The proposed model achieves an overall IoU of 90.52%, mean Recall of 98.08%, and mean Precision of 92.26%. For debris-covered glaciers specifically, CryoNet obtains an IoU of 90.46%, a recall of 95.79%, and a precision of 94.21%. Across both per-class and overall metrics, CryoNet surpasses DeepLabV3+, SegFormer, and U-Net, taken as state-of-the-art (SOTA) references, demonstrating its effectiveness for robust glacier mapping in complex high-mountain environments.
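The scSE attention named in the abstract recalibrates feature maps with a per-channel gate and a per-pixel gate computed in parallel. A forward-pass-only NumPy sketch follows. The weight shapes, the reduction ratio, and combining the two branches by addition (one common choice) are assumptions, and the abstract does not say where CryoNet places these blocks.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def scse(x, w1, b1, w2, b2, w_s, b_s):
    """Concurrent spatial and channel squeeze-and-excitation on x of shape (C, H, W).

    cSE: global average pool -> FC (C -> C/r) -> ReLU -> FC (C/r -> C) -> sigmoid,
         giving one gate per channel.
    sSE: 1x1 convolution (C -> 1) -> sigmoid, giving one gate per pixel.
    The two recalibrated maps are summed."""
    squeezed = x.mean(axis=(1, 2))                                   # (C,)
    hidden = np.maximum(0.0, w1 @ squeezed + b1)                     # (C/r,)
    channel_gate = sigmoid(w2 @ hidden + b2)[:, None, None]          # (C, 1, 1)
    spatial_gate = sigmoid(np.tensordot(w_s, x, axes=(0, 0)) + b_s)  # (H, W)
    return x * channel_gate + x * spatial_gate[None, :, :]
```

With all weights and biases at zero, both gates equal 0.5 and the block returns its input unchanged; training moves the gates away from that neutral point.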

2605.21523 2026-05-22 eess.IV cs.AI cs.CV cs.MM eess.SP

Tackle CSM in JPEG Steganalysis with Data Adaptation

Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný

AI总结 本文提出TADA框架,通过数据适应方法学习未知的处理流程,以提高在真实场景中对抗CSM问题的鲁棒性,并改进实际应用中的泛化能力。

Comments ACM Workshop on Information Hiding and Multimedia Security, (IH&MMSec '26), Jun 2026, Florence, Italy

AI中文摘要

隐写分析模型在基准数据集上表现优异,但当待分析图像由训练时未见过的处理流程生成时,在实际应用中表现不佳。这一被称为载体源失配(CSM)的问题在现实场景中尤为棘手,因为实践者(1)只能获得一个少量且未标记的数据集,(2)不确定这些图像经过了哪些处理,(3)不了解该数据集中载体图像与隐写图像的比例。为应对这一挑战,我们提出了TADA(通过数据适应的目标对齐)框架,该框架从少量未标记的目标数据集中学习模拟未知的处理流程。该架构的训练损失结合了残差协方差对齐、残差分布匹配,以及约束模拟器生成逼真图像的ℓ²损失。在玩具目标和实际目标上,与强大的整体式和原子式基线相比,TADA显著提升了对CSM的鲁棒性,并改善了实际应用中的泛化能力。附加资源可通过以下链接获取:https://github.com/RonyAbecidan/TADA

英文摘要

Steganalysis models excel on benchmark datasets but struggle in the wild when the analyzed images are produced by a processing pipeline unseen during training. This problem, known as Cover Source Mismatch (CSM), is particularly hard in realistic settings where practitioners (1) have access to only a small, unlabeled dataset, (2) are unsure of the processing techniques applied to these images, and (3) lack information on the proportion of covers and stegos in that set. To address this challenge, we introduce TADA (Target Alignment through Data Adaptation), a framework that learns to emulate the unknown processing pipeline from a small unlabeled target set. This architecture is trained with a loss combining residual covariance alignment, residual distribution matching, and an $\ell^2$ loss constraining the emulator to produce realistic images. Across toy and operational targets, TADA yields substantial gains in robustness to CSM and improves operational generalization compared to strong holistic and atomistic baselines. Additional resources are available at: https://github.com/RonyAbecidan/TADA
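One component of the TADA loss is residual covariance alignment. A generic covariance-alignment term in the style of CORAL can be sketched as a scaled squared Frobenius distance between the covariance matrices of source and target feature vectors. This is a substituted, well-known technique, not TADA's exact loss. The residual filter, scaling and loss weights used in the paper are not given in the abstract, so the inputs are simply assumed to be per-sample residual feature vectors of shape `(n_samples, d)`.

```python
import numpy as np

def covariance(features):
    """Unbiased covariance of an (n_samples, d) feature matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / (features.shape[0] - 1)

def coral_loss(source, target):
    """CORAL-style alignment: squared Frobenius gap between covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    diff = covariance(source) - covariance(target)
    return float(np.sum(diff ** 2) / (4 * d * d))

# Assumed inputs: residual features of emulated images vs. the unlabeled target set.
rng = np.random.default_rng(0)
emulated = rng.normal(size=(500, 4))
target = 2.0 * rng.normal(size=(500, 4))
print(coral_loss(emulated, target))  # positive; shrinks as the emulator matches target statistics
```

In a training loop this term would be one weighted component of the total loss, next to the distribution-matching and $\ell^2$ terms the abstract mentions.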

2605.21522 2026-05-22 q-bio.QM cs.AI cs.CE cs.LG stat.ML

Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery

Protein Thoughts:基于思维树(Tree of Thoughts)与嵌入空间流匹配的可解释推理,用于蛋白质-蛋白质相互作用发现

Kingsley Yeon, Xuefeng Liu, Promit Ghosal

AI总结 本文提出了一种可解释的蛋白质-蛋白质相互作用(PPI)发现框架,通过显式推理将PPI发现转化为可解释的搜索问题,利用嵌入空间流匹配和思维树搜索方法提升预测精度和可解释性。

AI中文摘要

蛋白质-蛋白质相互作用(PPI)调控几乎所有细胞过程,但用于识别结合伙伴的计算方法通常只给出排序预测而缺乏机理解释。这构成了应用上的根本障碍,因为生物学家无法判断预测反映的是真实的生化见解还是虚假相关。我们提出了Protein Thoughts框架,将PPI发现重新表述为带有显式推理的可解释搜索问题。该系统将结合证据分解为四个具有生物学意义的信号:反映进化关系的序列相似性、捕捉几何契合的结构互补性、界面平衡,以及编码残基级相互作用的化学兼容性。我们没有将这些信号合并为一个不透明的分数,而是通过透明的价值函数保留每个信号的贡献,从而同时支持排序和审计。为了高效地搜索大规模候选空间,我们引入了假设引导的熵正则化思维树搜索。一个微调的语言模型根据嵌入导出的特征生成搜索指令,将候选者分类为高优先级、探索性或可跳过。这些指令作为玻尔兹曼策略的条件,在利用与熵驱动的探索之间取得平衡,同时假设感知剪枝防止过早放弃有前景的候选者。对于评分出现分歧的候选者,假设条件化的嵌入空间流匹配将蛋白质嵌入向结合者流形输运。在SHS148k基准上,Protein Thoughts的平均最佳结合者排名为11.2,而熵树搜索基线为47.7,提升了76%;在结合预测任务中,训练得到的价值函数达到91.08±0.19的Micro-F1,优于现有PPI方法在同一数据集上的表现。

英文摘要

Protein-protein interactions (PPIs) govern nearly all cellular processes, yet computational methods for identifying binding partners typically produce ranked predictions without mechanistic justification. This creates a fundamental barrier to adoption because biologists cannot assess whether predictions reflect genuine biochemical insight or spurious correlations. We present Protein Thoughts, a framework that reformulates PPI discovery as an interpretable search problem with explicit reasoning. The system decomposes binding evidence into four biologically meaningful signals: sequence similarity reflecting evolutionary relationships, structural complementarity capturing geometric fit, interface balance, and chemical compatibility encoding residue-level interactions. Rather than collapsing these signals into an opaque score, we preserve their individual contributions through a transparent value function that enables both ranking and auditing. To navigate large candidate spaces efficiently, we introduce hypothesis-guided entropy-regularized Tree-of-Thoughts search. A fine-tuned language model generates search directives from embedding-derived features, classifying candidates as high-priority, exploratory, or skippable. These directives condition a Boltzmann policy that balances exploitation with entropy-driven exploration, while hypothesis-aware pruning prevents premature abandonment of promising candidates. For candidates exhibiting score disagreement, hypothesis-conditioned embedding-space flow matching transports protein embeddings toward the binder manifold. On the SHS148k benchmark, Protein Thoughts achieves a mean best-binder rank of 11.2 versus 47.7 for an entropic tree search baseline, a 76% improvement, and for binding prediction the trained value function achieves $91.08 \pm 0.19$ Micro-F1, outperforming existing PPI methods on the same dataset.
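The abstract describes directives that condition a Boltzmann policy trading exploitation against entropy-driven exploration. The sketch below shows a generic Boltzmann (softmax) policy over candidate values: a temperature controls how flat the distribution is, and lower temperature gives lower entropy. The directive bonuses (`DIRECTIVE_BONUS`) and their values are hypothetical, since the paper's exact conditioning is not described here. "Skippable" candidates are down-weighted rather than removed, in the spirit of avoiding premature abandonment.

```python
import math
import random

# Hypothetical additive bonuses for LLM-issued directives (illustrative values only).
DIRECTIVE_BONUS = {"high-priority": 1.0, "exploratory": 0.0, "skippable": -1.0}

def directive_adjusted(values, directives):
    """Shift each candidate's value by the bonus of its directive."""
    return [v + DIRECTIVE_BONUS[d] for v, d in zip(values, directives)]

def boltzmann_policy(values, temperature):
    """Softmax over candidate values; higher temperature means more exploration."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample_candidate(values, temperature, rng=random):
    """Draw one candidate index from the Boltzmann policy."""
    probs = boltzmann_policy(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]

values = directive_adjusted([0.2, 0.5, 0.4], ["exploratory", "high-priority", "skippable"])
print(boltzmann_policy(values, temperature=0.5))
```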

2605.21519 2026-05-22 cs.SI cs.LG

Neural Acceleration for Graph Partitioning

图划分的神经加速

Joshua Dennis Booth, Vishvam Patel

AI总结 本文提出利用神经网络模型替代传统特征值计算,以加速图划分过程,从而在保持划分质量的同时显著降低计算开销,提升大规模问题的可扩展性和效率。

AI中文摘要

图划分是社交网络分析、VLSI设计等众多科学和工程领域中的关键问题。谱方法能够在广泛的问题上产生高质量的划分,同时使边割最小化。然而,计算Fiedler向量(图拉普拉斯矩阵第二小特征值对应的特征向量)的开销,由于内存和计算成本问题,仍然是一个显著的瓶颈。本文提出了一种加速谱二分划分的方法,用简单的人工神经网络模型近似Fiedler向量,以替代传统的特征值计算。我们证明该方法在划分质量上与谱二分划分相当,同时显著降低了计算开销,使其在大规模问题上更具可扩展性和效率。

英文摘要

Graph partitioning is a critical problem in numerous scientific and engineering domains, including social network analysis and VLSI design, among many others. Spectral methods are known to produce high-quality partitions while minimizing edge cuts for a wide range of problems. However, computing the Fiedler vector, the eigenvector associated with the second-smallest eigenvalue of the graph Laplacian, remains a significant bottleneck because of its memory and computational costs. In this paper, we present an accelerated approach to spectral bisection partitioning that replaces the traditional eigenvalue computation with a simple artificial neural network model that approximates the Fiedler vector. We demonstrate that our approach achieves partitioning quality comparable to spectral bisection while significantly reducing the computational overhead, making it more scalable and efficient for large-scale problems.
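The paper aims to replace the eigensolver step of classical spectral bisection. For reference, the classical baseline is sketched below, assuming a small undirected graph with a dense combinatorial Laplacian $L = D - A$. It takes the eigenvector of the second-smallest eigenvalue (the Fiedler vector) and splits vertices at its median. A neural approximation, as in the paper, would stand in for `fiedler_vector`; that model is not shown.

```python
import numpy as np

def laplacian(n, edges):
    """Dense combinatorial Laplacian L = D - A of an undirected graph."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, v] -= 1
        L[v, u] -= 1
        L[u, u] += 1
        L[v, v] += 1
    return L

def fiedler_vector(L):
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so column 1 belongs to the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1]

def spectral_bisection(n, edges):
    """Boolean side label per vertex; a median split keeps the halves balanced."""
    f = fiedler_vector(laplacian(n, edges))
    return f > np.median(f)

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(spectral_bisection(6, edges))  # one triangle True, the other False
```

The dense `eigh` call costs $O(n^3)$ time and $O(n^2)$ memory. This is the bottleneck the abstract refers to, and large graphs normally use sparse iterative eigensolvers instead.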

2605.21514 2026-05-22 cs.SI cond-mat.stat-mech cs.IT cs.LG math.IT physics.data-an

Conditional Entropy of Heat Diffusion on Temporal Networks

时间网络上热扩散的条件熵

Samuel Koovely, Alexandre Bovet

AI总结 本文研究了时间网络上热扩散的条件熵,提出了一种检测时间网络中变点的新方法,并表明该量具有类似于热力学第二定律的信息论意义。

AI中文摘要

许多复杂系统可以建模为时间网络,其组织结构通常经历不同的结构阶段演变。检测划分这些阶段的变点既重要又具有挑战性。在本工作中,我们将热扩散的条件熵从静态图扩展到时间网络,并研究其性质。我们给出了一个上界,并解释了对该上界的偏离如何源于非对称时间路径的存在。此外,我们证明该量随时间单调变化,从而为时间网络上的非齐次扩散给出了热力学第二定律的信息论类比。随后,我们引入条件熵的局部版本,用于探测有限时间窗口内的扩散,并表明它为连续时间时间网络中的变点检测提供了有用的信号。我们在合成基准上评估了所提方法,包括在快照设置下与现有非参数基线的对比实验,然后将其应用于一所法国小学的真实时间接触网络。最后,我们展示了如何利用检测到的变点在目标子区间上进行社区检测,从而提高聚类结果的质量和可解释性。

英文摘要

Many complex systems can be modeled by temporal networks, whose organization often evolves through distinct structural phases. Detecting the change points that delimit these phases is both important and challenging. In this work, we extend the conditional entropy of heat diffusion from static graphs to temporal networks and study its properties. We provide an upper bound and explain how discrepancies from it arise from the presence of asymmetric temporal paths. Moreover, we show that this quantity is monotone in time, yielding an information-theoretic analog of the second law of thermodynamics for inhomogeneous diffusion on temporal networks. We then introduce a local version of conditional entropy, designed to probe diffusion over finite temporal windows, and show that it provides an informative signal for change-point detection in continuous-time temporal networks. We evaluate the proposed methodology on synthetic benchmarks, including comparative experiments with existing nonparametric baselines in the snapshot setting, and then apply it to a real-world temporal contact network from a French primary school. Finally, we show how to use detected change points to perform community detection on targeted sub-intervals, improving the quality and interpretability of the clustering results.
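The sketch below shows a conditional entropy of heat diffusion under explicit assumptions: the combinatorial Laplacian, a uniform initial node $X_0$, and a temporal network treated as a sequence of static snapshots. Under these assumptions the propagator $P = e^{-tL}$ (or a product of snapshot kernels) is column-stochastic. Column $j$ is then the distribution of $X_t$ given $X_0 = j$, and $H(X_t \mid X_0) = -\frac{1}{N}\sum_{i,j} P_{ij}\log P_{ij}$. The paper's precise definition (Laplacian normalization, continuous-time construction) may differ.

```python
import numpy as np

def heat_kernel(L, t):
    """exp(-t L) for a symmetric Laplacian, via its eigendecomposition."""
    lam, V = np.linalg.eigh(L)
    return (V * np.exp(-t * lam)) @ V.T

def temporal_propagator(snapshots):
    """Compose snapshot heat kernels [(L_1, dt_1), (L_2, dt_2), ...] in time order."""
    n = snapshots[0][0].shape[0]
    P = np.eye(n)
    for L, dt in snapshots:
        P = heat_kernel(L, dt) @ P  # later snapshots act on the result of earlier ones
    return P

def conditional_entropy(P):
    """H(X_t | X_0) in nats for a column-stochastic propagator P and uniform X_0."""
    P = np.clip(P, 0.0, 1.0)  # discard tiny negative round-off
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)
    return float(-terms.sum() / P.shape[0])

# Path graph on three nodes: entropy grows from 0 toward log(3) as heat spreads.
L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
for t in (0.1, 1.0, 10.0):
    print(t, conditional_entropy(heat_kernel(L, t)))
```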

2605.21510 2026-05-22 cs.SI cs.LG

Community-Aware Vertex Ordering for Reference-Based Graph Compression: A Cross-Encoder Empirical Study

面向社区的顶点排序用于基于参考的图压缩:一项跨编码器实证研究

Jimmy Dubuisson

AI总结 本文提出了一种两阶段的Leiden+LLP顶点排序方法,并研究其与基于参考的压缩的交互作用,结果显示在初始顶点排序较差的图中,重新排序能显著节省比特数,且不同编码器对排序的响应具有高度一致性。

Comments 26 pages, 7 figures, 9 tables. Full reproducibility package at https://github.com/jimbotonic/Adjacently.jl. Preprint; comments welcome

AI中文摘要

基于参考的图压缩将每个顶点的邻接列表相对于一个邻近的先前顶点进行编码,利用局部性来压缩大规模有向图。主流工具WebGraph的BVGraph采用固定的单一编码流程,并依赖单独选择的顶点排序(通常为URL字典序或分层标签传播(LLP))。排序与编码器之间的相互作用很少被测量。我们提出了一种两阶段的Leiden+LLP顶点排序(先用全局LLP生成初始标签,再进行Leiden社区检测,然后在每个诱导子图上进行簇内LLP),并研究其与基于参考的压缩之间的相互作用。在初始顶点排序较差的图上,重新排序在我们测量的每个数据集和编码器上都节省了0.3到5.4比特/边。该收益的大小对编码器基本不敏感:在五个弱排序数据集中的四个上,四个独立参数化的编码器对Leiden+LLP相对于纯LLP的收益估计在约±0.04 bpe内一致。在URL排序的网页爬取数据上,发布时自带的排序已经编码了局部性,自适应编码器仍能从重新排序中受益,但针对URL诱导残差结构调优的编码器(BV-HC,K>1时的CG)会受到轻微损害。为了量化在排序固定后编码器选择的重要性,我们贡献了三个基于参考的编码器BG、CS和CG,它们能够从多达28种候选分解中为每个顶点选择代价最优的方案。每个编码器都在其自身测试过的最佳排序下运行。三者中最好的编码器在每个测试数据集上都比BVGraph高压缩模式提升2-9%,且在弱排序数据集上,编码器层面的收益始终小于排序层面的收益。该编码器框架还产生一种自定界的比特流,支持低开销的随机访问。

英文摘要

Reference-based graph compression encodes each vertex's neighbor list relative to a recent vertex, exploiting locality to compress large directed graphs. The dominant tool, WebGraph's BVGraph, fixes a single encoding pipeline and relies on a separately chosen vertex ordering -- typically URL-lexicographic or Layered Label Propagation (LLP). The interaction between ordering and encoder is rarely measured. We propose a two-stage Leiden+LLP vertex ordering -- global LLP to seed labels, Leiden community detection, then per-cluster LLP on each induced subgraph -- and study how it interacts with reference-based compression. On graphs with poor initial vertex order, reordering saves 0.3 to 5.4 bits per edge on every dataset and encoder we measured. The size of that gain is largely insensitive to the encoder: on four of five weakly ordered datasets, four independently parameterised encoders agree on the Leiden+LLP-vs-plain-LLP gain within roughly +/- 0.04 bpe. On URL-ordered web crawls, where the distributed ordering already encodes locality, adaptive encoders still benefit from reordering, but encoders tuned to URL-induced residual structure (BV-HC, CG at K>1) are mildly hurt by it. To quantify how much encoder choice matters once ordering is fixed, we contribute three reference-based encoders -- BG, CS, and CG -- that perform per-vertex cost-optimal selection from up to 28 candidate decompositions. Each is run under its own best-tested ordering. The best of the three improves over BVGraph high-compression by 2-9% on every dataset tested, with the encoder-level gain consistently smaller than the ordering-level gain on weakly ordered datasets. The encoder framework also yields a self-delimiting bitstream that supports low-overhead random access.
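The central claim is that vertex ordering drives how many bits per edge a locality-exploiting encoder needs. The sketch below illustrates this with plain gap coding under Elias-gamma codes. It encodes the first neighbor relative to the source vertex (zig-zag mapped) and each later neighbor as the gap from its predecessor. This is a simplified stand-in: BVGraph and the BG/CS/CG encoders additionally use reference copying, intervals and other codes. The helper names are hypothetical.

```python
def elias_gamma_length(x):
    """Bits needed to Elias-gamma-code a positive integer x."""
    if x < 1:
        raise ValueError("Elias gamma codes positive integers only")
    return 2 * (x.bit_length() - 1) + 1

def zigzag(d):
    """Map signed ints to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * d if d >= 0 else -2 * d - 1

def gap_bits_per_edge(adjacency, order):
    """adjacency: {vertex: [neighbors]}, order: {vertex: new_id} (a permutation)."""
    total_bits = 0
    total_edges = 0
    for v, neighbors in adjacency.items():
        if not neighbors:
            continue
        ids = sorted(order[u] for u in neighbors)
        # First neighbor relative to the source id; +1 because gamma codes start at 1.
        total_bits += elias_gamma_length(zigzag(ids[0] - order[v]) + 1)
        for prev, cur in zip(ids, ids[1:]):
            total_bits += elias_gamma_length(cur - prev)  # gaps >= 1 without duplicate edges
        total_edges += len(ids)
    return total_bits / total_edges

# Ring lattice: local under the identity order, scattered under a multiplicative permutation.
n = 64
adj = {v: [(v + d) % n for d in (-2, -1, 1, 2)] for v in range(n)}
identity = {v: v for v in range(n)}
scattered = {v: (37 * v) % n for v in range(n)}  # 37 is coprime to 64, so this is a permutation
print(gap_bits_per_edge(adj, identity), gap_bits_per_edge(adj, scattered))
```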

2605.21507 2026-05-22 physics.ao-ph cs.AI cs.CE cs.LG

Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift

韩国能见度临近预报:一种处理类别不平衡和分布偏移的机器学习方法

Bong Gyun Shin, Chan Sik Lee, Hyesun Suh

AI总结 本文提出了一种机器学习方法,用于对韩国六个主要城市的大气能见度进行临近预报,通过SMOTENC和CTGAN处理数据不平衡,并结合机器学习和深度学习模型进行评估,发现训练与测试期间的分布偏移导致预测性能下降,强调了在时间序列数据上实施临近预报模型时考虑外部环境因素的重要性。

Comments Published in Theoretical and Applied Climatology

Journal ref
Theoretical and Applied Climatology, vol. 157, art. no. 283, 2026
AI中文摘要

大气能见度是交通安全和空气质量管理的关键变量,然而,由于气象条件与空气污染物之间的复杂相互作用以及低能见度事件的稀有性,准确预测仍然具有挑战性。本研究提出了一种机器学习框架,用于对韩国六个主要城市的能见度进行临近预报。为了处理2018-2020年训练数据中的不平衡问题,我们应用了面向名义与连续特征的合成少数类过采样技术(SMOTENC)和条件表格生成对抗网络(CTGAN)。随后采用结合机器学习和深度学习模型的集成方法,并在2021年测试数据集上进行评估。结果表明,测试集上的预测性能相比交叉验证阶段明显下降。这种退化归因于训练期与测试期之间的分布偏移,并通过测量SHAP分析确定的最具影响力特征的Wasserstein距离得到了定量确认。总体而言,本研究提出了一种旨在同时应对数据不平衡和时间分布偏移双重挑战的方法,并强调在时间序列数据上实施临近预报模型时,必须考虑不断变化的外部环境因素。

英文摘要

Atmospheric visibility is a critical variable for transportation safety and air quality management; however, accurate prediction remains challenging due to the complex interactions between meteorological conditions and air pollutants, as well as the rarity of low-visibility events. This study introduces a machine learning framework to nowcast visibility in six major South Korean cities. To handle the imbalance in the 2018-2020 training data, we applied the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTENC) and a Conditional Tabular Generative Adversarial Network (CTGAN). An ensemble approach combining machine learning and deep learning models was then applied and evaluated on a 2021 test dataset. The results revealed a marked decline in predictive performance on the test set compared to the cross-validation phase. This degradation was attributed to a distributional shift between the training and testing periods, which was quantitatively confirmed by measuring the Wasserstein distance of the most influential feature identified by SHAP analysis. Overall, this study presents a methodology that aims to simultaneously address the dual challenges of data imbalance and temporal distributional shift, and emphasizes the necessity of accounting for evolving external environmental factors when implementing nowcasting models on time-series data.
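The distribution shift was quantified with the Wasserstein distance of the most influential feature. For two one-dimensional samples of equal size, the optimal transport plan pairs the sorted values. The 1-Wasserstein distance is then the mean absolute difference of the sorted samples, as this minimal sketch shows. The inputs are assumed to be the training-period and test-period values of whichever feature SHAP ranks highest; the paper does not say here which feature that is. For unequal sample sizes or weighted samples, `scipy.stats.wasserstein_distance` handles the general case.

```python
def wasserstein_1d(u, v):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    With equal sizes, the optimal coupling matches the i-th smallest value of u
    with the i-th smallest value of v.
    """
    if len(u) != len(v) or not u:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

# A pure shift of every value by +1 yields a distance of exactly 1.
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```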

2605.21502 2026-05-22 q-bio.MN cs.AI cs.LG

Graph neural network explanations reveal a topological signature of disease-associated hubs in biological networks

图神经网络解释揭示了生物网络中与疾病相关的枢纽的拓扑特征

Kyle Higgins, Ivan Laponogov, Dennis Veselkov, Kirill Veselkov

AI总结 本文研究了图神经网络在生物网络中识别疾病相关结构的方法,发现不同解释方法在稀疏单节点驱动和分布式路径信号中有不同的表现,并提出了一种结合壳层枢纽评分和解释器共识排名的框架,提升了对癌症基因的优先级排序和生物学相关分子的恢复能力。

Comments 25 pages (excluding supplement), 7 figures, 7 supplementary tables

AI中文摘要

图神经网络(GNN)越来越多地用于建模生物系统,但事后解释方法能否可靠地恢复有意义的分子机制仍不清楚。本文系统评估了四种广泛使用的解释方法:显著性归因(SA)、积分梯度(IG)、GNNExplainer和逐层相关性传播(LRP),考察它们在投影到蛋白质-蛋白质相互作用网络上的乳腺癌RNA-seq数据中识别疾病相关结构的能力。通过带有已知真实基序的合成基准,我们发现不同解释方法恢复的信号组织不同:SA在稀疏单节点驱动信号上表现最佳,而IG和LRP更倾向于恢复分布式的通路样和级联样信号。在TCGA BRCA数据中,我们识别出一种一致的疾病相关枢纽拓扑特征:归因在紧邻的1跳邻域达到峰值,并在后续网络壳层中逐步衰减;这一模式在IG和LRP中最为显著,并与已知癌症枢纽的强富集相关。我们进一步观察到局部枢纽富集与全局基因排序性能之间的权衡:IG优化局部富集,而SA在全局区分上更优。受这些互补行为的启发,我们提出了一种将基于壳层的枢纽评分与跨解释器共识排序相结合的框架。共识评分提高了对经典癌症基因(TP53、BRCA1、ESR1、MYC)的优先级排序,降低了对节点度的依赖,并且尤其在经过调优后优于单独的方法。通路富集分析进一步显示,该方法能更好地恢复生物学上一致的癌症程序,包括ERBB2、RTK、MAPK、免疫和细胞因子信号。总之,这些结果表明,拓扑感知的图解释整合可以提高生物学可解释性以及对生物学相关分子的恢复能力。

英文摘要

Graph neural networks (GNNs) are increasingly used to model biological systems, yet the reliability of post-hoc explanation methods for recovering meaningful molecular mechanisms remains unclear. Here, we systematically evaluate four widely used approaches: Saliency Attribution (SA), Integrated Gradients (IG), GNNExplainer, and Layer-wise Relevance Propagation (LRP) for identifying disease-relevant structure in breast cancer RNA-seq data projected onto a protein-protein interaction network. Using synthetic benchmarks with known ground-truth motifs, we show that explanation methods recover distinct signal organizations: SA performs best for sparse single-node drivers, whereas IG and LRP preferentially recover distributed pathway-like and cascade-like signals. In TCGA BRCA data, we identify a consistent topological signature of disease-associated hubs in which attribution peaks in the immediate 1-hop neighborhood and decays across successive network shells, a pattern most pronounced for IG and LRP and associated with strong enrichment of known cancer hubs. We further observe a trade-off between local hub enrichment and global gene ranking performance, with IG optimizing local enrichment and SA achieving superior global discrimination. Motivated by these complementary behaviors, we introduce a framework combining a shell-based hub score with consensus ranking across explainers. Consensus scores improve prioritization of canonical cancer genes (TP53, BRCA1, ESR1, MYC), reduce dependence on node degree, and, especially when tuned, outperform individual methods. Pathway enrichment further reveals improved recovery of biologically coherent cancer programs, including ERBB2, RTK, MAPK, immune, and cytokine signaling. Together, these results demonstrate that topology-aware integration of graph explanations can improve biological interpretability and biologically relevant molecular recovery.
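The reported topological signature is that attribution peaks in the 1-hop shell around a disease-associated hub and decays over successive shells. A minimal sketch of how such a shell profile could be measured follows: a breadth-first search assigns hop distances from the hub, and attribution is averaged per shell. The summary `peak_decay_score` is a hypothetical illustration, not the paper's shell-based hub score, whose exact formula is not given in the abstract.

```python
from collections import deque

def shell_distances(adjacency, hub, max_shell):
    """BFS hop distance from hub for every node within max_shell hops."""
    dist = {hub: 0}
    queue = deque([hub])
    while queue:
        node = queue.popleft()
        if dist[node] == max_shell:
            continue  # do not expand beyond the outermost shell
        for nb in adjacency.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def shell_profile(adjacency, attribution, hub, max_shell=3):
    """Mean attribution in each shell k = 0..max_shell around a hub."""
    dist = shell_distances(adjacency, hub, max_shell)
    sums = [0.0] * (max_shell + 1)
    counts = [0] * (max_shell + 1)
    for node, k in dist.items():
        sums[k] += attribution.get(node, 0.0)
        counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def peak_decay_score(profile):
    """Hypothetical summary: 1-hop attribution minus the mean of the outer shells."""
    outer = profile[2:]
    return profile[1] - (sum(outer) / len(outer) if outer else 0.0)

# Toy undirected graph around hub 0 with made-up attributions.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
attribution = {0: 0.2, 1: 0.9, 2: 0.8, 3: 0.3, 4: 0.2, 5: 0.05}
profile = shell_profile(adjacency, attribution, hub=0)
print(profile, peak_decay_score(profile))
```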

2605.21500 2026-05-22 eess.IV cs.CV

A Task-Agnostic Algebraic Integrity Metric for Event-Camera Streams Toward SOTIF-Compliant Perception using Pearson Correlation Coefficient

基于皮尔逊相关系数的事件相机流任务无关代数完整性度量:迈向符合SOTIF的感知

Arthur de Miranda Neto

AI总结 本文提出了一种任务无关的代数完整性度量,通过将皮尔逊相关系数提升到三个标准事件表示中,以实现SOTIF兼容的感知。

Comments 12 pages, 6 figures, 3 tables, 14 equations. Theoretical framework paper with procedural-synthetic illustrations; empirical validation on real datasets reserved for follow-up. Code and demonstration video available

AI中文摘要

事件相机已成为自动驾驶系统(ADS)中安全关键感知的一种高带宽、低延迟传感模态,具有微秒级时间分辨率、120-140 dB的动态范围,且本质上不存在运动模糊。然而,目前尚无可直接作用于异步事件流的任务无关质量度量:最先进的代理指标需要借助下游任务(例如检测精度、跟踪误差)来评估流的完整性,这与ISO 21448(SOTIF)和ISO/PAS 8800:2024的认证要求不相容。近期的BiasBench基准(CVPR 2025)明确指出了这一空白。本文提出了一个统一的代数框架,将皮尔逊相关系数(PCC)(此前在两项工作中用于基于帧的图像的冗余过滤和ROI选择)提升到三种标准事件表示:时间表面、事件帧和体素网格。该框架给出三个度量:(i)r-TS,通过与自车运动预测的时间表面比较来监控流完整性;(ii)r2-EF,用于仅需整数比较的自适应ROI选择;(iii)r-VG,用于时间冗余门控。本文在事件相机的对比度阈值机制(|ΔL| ≥ C)与基于PCC的变化判据之间建立了结构同构,形式化了三个提升后的度量,并相对于原始流对称地分析了流水线延迟和信息损失。每个度量的示例行为在一个程序化合成的事件流上进行演示,该事件流通过直接模拟事件发射模型生成,而非取自任何真实或视频衍生的数据集;其中包括一个隧道(tunnel-dip)完整性异常场景,在该场景中r_C从0.93(一致流)降至0以下(触发警报)。一套明确的认知标注约定([ESTABLISHED]、[SOLID]、[HYPOTH.]、[OPEN])界定了每项贡献的状态。

英文摘要

Event cameras have emerged as a high-bandwidth, low-latency sensing modality for safety-critical perception in automated driving systems (ADS), offering microsecond temporal resolution, 120-140 dB dynamic range, and intrinsic absence of motion blur. However, no task-agnostic quality metric currently operates directly on the asynchronous event stream: state-of-the-art proxies require a downstream task (e.g., detection accuracy, tracking error) to assess stream integrity, which is incompatible with the certification requirements of ISO 21448 (SOTIF) and ISO/PAS 8800:2024. The recent BiasBench benchmark (CVPR 2025) explicitly identifies this gap. This work proposes a unified algebraic framework that lifts the Pearson Correlation Coefficient (PCC), historically used in two prior works for redundancy filtering and ROI selection on frame-based images, to the three standard event representations: Time Surface, Event Frame, and Voxel Grid. The framework yields three metrics: (i) r-TS for stream integrity monitoring against an ego-motion-predicted Time Surface, (ii) r2-EF for adaptive ROI selection requiring only integer comparisons, and (iii) r-VG for temporal redundancy gating. A structural isomorphism is established between the contrast-threshold mechanism of the event camera (|Delta L| >= C) and the PCC-based change criterion, the three lifted metrics are formalized, and pipeline latency and information loss are analyzed symmetrically against the raw stream. Illustrative behavior of each metric is demonstrated on a procedural-synthetic event stream, generated by direct simulation of the emission model rather than drawn from any real or video-derived dataset, including a tunnel-dip integrity-anomaly scenario in which r_C drops from 0.93 (coherent flow) to below 0 (alarm). An explicit epistemic convention ([ESTABLISHED], [SOLID], [HYPOTH.], [OPEN]) delineates the status of every contribution.
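The r-TS idea compares an observed event representation with a predicted one through the Pearson correlation coefficient and raises an alarm when the correlation collapses. In the tunnel-dip example it falls from 0.93 to below 0. The sketch below is a minimal version under stated assumptions. Events are `(x, y, t, polarity)` tuples accumulated into a signed event frame, and the prediction is simply another frame supplied by the caller. The alarm threshold of 0 follows the abstract's example. `event_frame` and `integrity_alarm` are hypothetical helpers, not the paper's API.

```python
import math

def event_frame(events, width, height):
    """Accumulate signed polarities (+1 / -1) per pixel into a flattened frame."""
    frame = [0] * (width * height)
    for x, y, _t, polarity in events:
        frame[y * width + x] += 1 if polarity > 0 else -1
    return frame

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant frame
    return sxy / math.sqrt(sxx * syy)

def integrity_alarm(observed, predicted, threshold=0.0):
    """Hypothetical r-TS-style check: alarm if correlation is undefined or below threshold."""
    r = pearson(observed, predicted)
    return r, (math.isnan(r) or r < threshold)

observed = event_frame([(0, 0, 0.0, 1), (1, 0, 0.1, -1), (2, 0, 0.2, 1)], width=3, height=1)
print(integrity_alarm(observed, predicted=[1, -1, 1]))  # (1.0, False)
```

Treating an undefined correlation (for example an empty, constant frame) as an alarm is a fail-safe design choice of this sketch.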

2605.21499 2026-05-22 physics.flu-dyn cs.LG

Conditional Neural Field based Reduced Order Model for Dynamic Ditching Load Prediction

基于条件神经场的降阶模型用于动态水上迫降载荷预测

Henning Schwarz, Pyei Phyo Lin, Jens-Peter M. Zemke, Thomas Rung

AI总结 本文提出一种基于条件神经场的降阶模型,用于预测飞机水上迫降载荷。该模型不依赖空间离散化,结合LSTM网络实现了高精度的时空预测,并在不同空间离散化条件下展示了良好的重建能力。

AI中文摘要

基于网格的神经网络(如卷积自编码器)被广泛用于计算流体力学中基于降维的代理模型。近年来,条件神经场等基于坐标的方法开始兴起。它们不依赖空间离散化,这一特性对计算流体力学中的多种应用十分有利。本文讨论了使用条件神经场方法对飞机水上迫降载荷进行时空预测。模型使用DLR-D150飞机机身动态载荷的两个数据集进行评估,其中一个对应单一固定的空间离散化,另一个包含来自不同离散化的数据。当在潜在空间中与长短期记忆(LSTM)网络结合时,基于神经场的模型在第一个数据集上实现了接近基于网格的卷积自编码器模型的时空预测精度,而参数量显著更少。第二个数据集上的结果表明,基于神经场的方法能够在异构空间离散化条件下准确重建水上迫降载荷。这使得可以灵活使用针对不同几何形状和/或离散化生成的训练数据集,并可使用该代理模型预测不同构型的载荷。

英文摘要

Grid-based neural networks such as convolutional autoencoders are widely used in dimension-reduction-based surrogate models for computational fluid dynamics. In recent years, coordinate-based approaches such as conditional neural fields have emerged. Their independence from the spatial discretization is a beneficial feature for various applications in computational fluid dynamics. This paper discusses the spatio-temporal prediction of aircraft ditching loads using a conditional neural field approach. The model is evaluated on two datasets for the dynamic loads on the fuselage of a DLR-D150 aircraft: one relates to a single fixed spatial discretization, and the other includes data from different discretizations. When paired with a long short-term memory (LSTM) network in the latent space, the neural field-based model achieves a spatio-temporal prediction accuracy on the first dataset that is close to that of grid-dependent convolutional autoencoder-based models, with significantly fewer parameters. Results on the second dataset demonstrate the ability of the neural field-based approach to reconstruct ditching loads accurately for heterogeneous spatial discretizations. This allows flexible use of training datasets generated for different geometries and/or discretizations, as well as the use of the surrogate model to predict loads for different configurations.

2605.21497 2026-05-22 cs.CR cs.AI

Autonomous LLM Agents & CTFs: A Second Look

自主大语言模型代理与CTF:再看一次

Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago, Thanh Minh Bui, Dario Rossi

AI总结 本文重新审视了大语言模型代理在自动化进攻性安全任务中的表现,通过在30个基于网络的CTF挑战中测试不同架构的代理,发现通用代理在性能上与定制架构相当,并揭示了当前代理在某些类别中的持续障碍。

Comments Accepted at DeMeSSAI Workshop @ IEEE EuroS&P 2026

AI中文摘要

大型语言模型(LLM)代理越来越多地被提出用于自动化进攻性安全任务,近期研究报告称其在夺旗赛(Capture-the-Flag, CTF)挑战中取得了接近人类水平的成功率。我们在此重新审视这些结果,对这些说法进行再次检验。我们在涵盖14类漏洞的30个基于Web的CTF挑战上,设计了复杂度和模块化程度递增的多种代理架构。我们用多种LLM骨干模型实例化这些代理,并将其与claude-code进行比较,后者是一种能够自动确定其内部架构的通用代理。我们的评估得出三个主要发现。首先,claude-code的性能与定制架构相当(解决了30个任务中的19个),表明通用代理是进攻性安全任务的强基线。其次,我们的架构和claude-code在相同的挑战类别上都表现不佳,揭示了使当前代理仍低于人类水平能力的持续障碍。第三,借助我们手工设计的架构,我们能够系统地衡量额外组件的影响,发现由专门角色进行的结构化编排优于单体设计,提高了运行间一致性,并降低了执行成本。

英文摘要

Large Language Model (LLM) agents are increasingly proposed to automate offensive security tasks, with recent studies reporting near-human-level success rates in Capture-the-Flag (CTF) challenges. Here we revisit these results, providing a second look at these claims. We engineer agent architectures of increasing complexity and modularity and evaluate them on 30 web-based CTF challenges spanning 14 vulnerability classes. We instantiate these agents with multiple LLM backbones and compare them with claude-code, a general-purpose agent that automatically determines its internal architecture. Our evaluation yields three main findings. First, claude-code achieves performance comparable to the engineered architectures (19/30 solved tasks), suggesting that general-purpose agents are strong baselines for offensive security tasks. Second, both our architectures and claude-code struggle in the same challenge categories, revealing persistent barriers that keep current agents below human-level capability. Third, by leveraging our manually designed architectures, we can systematically measure the impact of additional components, finding that structured orchestration of specialized roles outperforms monolithic designs, improving run-to-run consistency and reducing execution costs.

2605.21187 2026-05-22 cs.NI cs.AI cs.DC

High-speed Networking for Giga-Scale AI Factories

面向十亿级AI工厂的高速网络

Sajy Khashab, Albert Gran Alcoz, Alon Gal, Jacky Romano, Rani Abboud, Yonatan Piasetzky, Lior Maman, Amit Nishry, Barak Gafni, Omer Shabtai, Matty Kadosh, Dror Goldenberg, Gilad Shainer, Mark Silberstein

AI总结 本文提出了一种面向大规模AI训练需求的高速网络架构,通过拓扑并行性替代传统层次结构,利用硬件加速的负载均衡技术,在微秒级动态网络条件下提供稳定性能,展示了在三大核心维度上的生产级AI基础设施性能。

AI中文摘要

随着分布式模型训练扩展到数十万块GPU,横向扩展(scale-out)网络面临前所未有的性能和效率需求。NVIDIA Spectrum-X以太网从零开始设计,旨在以高利用率和低延迟实现可预测且稳定的网络性能。本文提出了Spectrum-X多平面架构。该架构以拓扑并行性取代层次深度,并在NIC和交换机中引入硬件加速的负载均衡作为关键架构手段,以在AI训练工作负载所要求的微秒时间尺度上快速响应高度动态的网络条件。我们介绍了设计动机、设计原则、评估方法以及在最先进基准上的性能,并总结了在大规模系统中部署和调试Spectrum-X网络的经验教训。我们的评估突显了生产级AI基础设施在三个核心维度上的性能:达到理论线速的98%,且延迟低、无抖动;对并发工作负载具有强跨租户隔离;稳健且与容量成比例的对分带宽,在10%的网络链路故障下延迟仅增加7%;以及在LLM训练工作负载中对主机和网络链路抖动(flap)的快速响应。

英文摘要

As distributed model training scales to span hundreds of thousands of GPUs, scale-out networks face unprecedented performance and efficiency demands. NVIDIA Spectrum-X Ethernet has been designed from the ground up to achieve predictable and stable network performance with high utilization and low latency. This paper presents the Spectrum-X multiplane architecture, which replaces hierarchical depth with topological parallelism. It also introduces hardware-accelerated load balancing in NICs and switches as the key architectural approach to react quickly to highly dynamic network conditions at the microsecond timescales that AI training workloads demand. We describe the motivation, design principles, evaluation methodology, and performance on state-of-the-art benchmarks, as well as the lessons we learned from deploying and debugging Spectrum-X networks in large-scale systems. Our evaluation highlights production-grade AI infrastructure performance across three core dimensions: 98% of the theoretical line rate with low, jitter-free latency; strong cross-tenant isolation for concurrent workloads; robust, capacity-proportional bisection bandwidth with a 7% latency increase under 10% fabric link failures; and rapid reaction to host and fabric link flaps during LLM training workloads.

2605.19955 2026-05-22 cs.CR cs.SD

DASM: Domain-Aware Sharpness Minimization for Multi-Domain Voice Stream Steganalysis

DASM:多领域语音流隐写分析中的领域感知锐度最小化

Pengcheng Zhou, Pianran Guo, Shuhua Chen, Mengqin Zhao, Zhongliang Yang, Linna Zhou

AI总结 本文提出DASM,一种领域感知锐度最小化方法,通过结合领域监督对比学习和锐度感知优化,提升多领域语音流隐写分析的鲁棒性和泛化能力。

AI中文摘要

随着信息隐藏在网络流媒体中的广泛应用,其用于隐蔽通信的安全威胁日益加剧,亟需开发鲁棒的检测技术。然而,现有网络语音流隐写分析方法主要依赖特定场景的数据分布,难以适应非同源数据分布的实践检测需求。通过Hessian分析,我们发现主流模型的损失景观被大量鞍点和尖锐局部极小值主导,使其对数据分布变化高度敏感,从根本上限制了泛化能力。因此,我们提出一种新的优化器,领域感知锐度最小化(DASM)。DASM的核心机制包括两个方面:首先,它结合领域监督对比学习和锐度感知优化,明确保持跨领域特征分离的同时寻找平坦极小值;其次,我们设计了一种自适应领域间隙调节策略,通过感知不同领域实时特征分离性动态校准优化损失权重。大量实验结果表明,我们的方法在很大程度上优于现有最先进方法,并实现了出色的泛化能力和鲁棒性。

英文摘要

The growing use of information hiding in network streaming media for covert communication poses a significant security threat, necessitating the development of robust detection technologies. However, existing steganalysis methods for network voice streams mostly rely on data distributions in specific scenarios, making it difficult to adapt to the practical detection needs of non-homologous data distributions. Through Hessian analysis, we find that the loss landscapes of mainstream models are dominated by numerous saddle points and sharp local minima, rendering them highly sensitive to data distribution shifts and fundamentally limiting generalization. Therefore, we propose a new optimizer, Domain-Aware Sharpness Minimization (DASM). The core mechanisms of DASM consist of two aspects: first, it integrates domain-supervised contrastive learning with sharpness-aware optimization, explicitly preserving inter-domain feature separation while seeking flat minima; second, we design an adaptive domain gap modulation strategy that dynamically calibrates the optimization loss weights by sensing the real-time feature separability of different domains. Extensive experimental results demonstrate that our method outperforms the state-of-the-art methods by a large margin and achieves excellent generalization and robustness.
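DASM builds on sharpness-aware optimization, which seeks flat minima. For reference, one step of plain sharpness-aware minimization (SAM) is sketched below. The weights are first perturbed toward the approximate worst case inside an $L_2$ ball of radius $\rho$, $\epsilon = \rho\, g / \lVert g \rVert$. Then a descent step is taken with the gradient evaluated at $w + \epsilon$. The domain-supervised contrastive term and the adaptive domain-gap weighting that distinguish DASM are not shown. The quadratic objective is only a toy stand-in for a network loss.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step.

    1) Move to the approximate worst-case neighbour w + eps with ||eps|| = rho.
    2) Descend from w using the gradient evaluated at that neighbour.
    """
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    if norm == 0:
        return w.copy()  # already at a stationary point
    eps = rho * g / norm
    return w - lr * grad_fn(w + eps)

# Toy quadratic loss 0.5 * w^T A w with gradient A w.
A = np.diag([1.0, 3.0])
w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, lambda v: A @ v, lr=0.1, rho=0.05)
print(w)  # ends in a small neighbourhood of the minimum at the origin
```

In a deep-learning setting the two gradient evaluations are two forward/backward passes per step, which is the usual cost of SAM-style optimizers.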

2605.17998 2026-05-22 cs.SE cs.AI

Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study

验证门控完成作为受控多智能体运行时的准入控制:一个有界架构案例研究

Hai-Duong Nguyen, Xuan-The Tran

AI总结 本文研究了验证门控完成作为受控多智能体运行时的准入控制机制,通过一个有界参考实现,探讨了可审计的验证门控完成所能支持的信息,并分析了其在不同场景下的表现和限制。

Comments 39 pages, 2 figures, 17 tables. Preprint

AI中文摘要

随着多智能体系统从短时交互转向具有专门角色和持久状态的工具使用工作流,完成判定从纯粹的生成问题转变为运行时控制问题。本预印本研究了验证门控完成作为受控多智能体运行时的一种准入控制模式:智能体可以提出完成声明,但由只读验证器决定是否接纳该声明。模糊或证据薄弱的情况以失败关闭(fail-closed)方式处理,而分组化的状态和事件轨迹保留了审计路径。我们考察了一个有界的参考实现,并探讨已发布的证据能够支持关于可审计、验证门控完成的哪些结论。在已发布的verify-completed切片中,已知结果的已调用事件验证成功比例为1,791/1,800 = 99.5%。这是针对已调用验证事件的计数指标,而不是任务完成率、生产可靠性或基准成功率。任务级验证覆盖率无法计算;1,801行中有1,762行来自一个高流量报告集群;只有17个事件被归类为生产事件。对影子策略/治理验证器的评估显示,规则一致率为1,526/1,548 = 98.58%,在1,526个“可安全继续”预测中假成功为0,阻断精确率为2/518 = 0.39%,因此该验证器仍仅作建议用途。证据支持一个狭窄的结论:在观察到的条件下,只读验证门加上分组化的准入记录使完成决策可检查且失败关闭。关于部署运行、安全保证、结果收益、任务级覆盖率、恢复有效性或外部效度的声明仍超出研究范围。

英文摘要

As multi-agent systems move from short interactions to tool-using workflows with specialized roles and persistent state, completion becomes a runtime-control problem rather than a purely generative one. This preprint studies verify-gated completion as an admission-control pattern for governed multi-agent runtimes: agents may propose completion, but a read-only verifier decides whether the claim is admitted. Ambiguous or weakly evidenced cases resolve fail-closed, while packetized state and event traces preserve an audit path. We examine one bounded reference implementation and ask what the released evidence can support about auditable, verify-gated completion. In the released verify-completed slice, the known-outcome invoked-event verify success share was 1,791/1,800 = 99.5%. This is an accounting measure over invoked verification events, not a task-completion, production-reliability, or benchmark-success rate. Task-level verify coverage is not computable; 1,762/1,801 rows came from one high-volume reporting cluster; and only 17 events were production-classified. A shadow Policy/Governance Verifier evaluation showed 1,526/1,548 = 98.58% rule agreement, 0/1,526 false-success among safe-to-proceed predictions, and blocked precision of 2/518 = 0.39%, so it remains advisory. The evidence supports a narrow conclusion: under observed conditions, a read-only verify gate plus packetized admission records made completion decisions inspectable and fail-closed. Claims about deployed operation, safety guarantees, outcome gains, task-level coverage, recovery effectiveness, or external validity remain outside scope.
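The admission-control pattern described here is: agents propose completion, a read-only verifier decides, ambiguous or weakly evidenced cases fail closed, and every decision leaves an audit record. It can be sketched as a small gate object. Everything below is hypothetical: the class names, the verdict strings (`"pass"` versus anything else) and the rule that an admitted claim must carry non-empty `evidence` are illustrative assumptions, not the paper's reference implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(frozen=True)
class AdmissionRecord:
    """Audit entry for one completion claim."""
    task_id: str
    admitted: bool
    reason: str

@dataclass
class VerifyGate:
    """Hypothetical gate: agents propose completion, the verifier decides."""
    verifier: Callable[[dict], str]  # returns "pass"; any other verdict counts as ambiguous
    records: List[AdmissionRecord] = field(default_factory=list)

    def admit(self, task_id: str, claim: dict) -> bool:
        try:
            # Shallow copy so the verifier cannot alter the agent's top-level claim.
            verdict = self.verifier(dict(claim))
            reason = f"verdict={verdict!r}"
        except Exception as exc:  # a crashing verifier must not admit anything
            verdict, reason = "error", f"verifier error: {exc}"
        # Fail closed: only an explicit pass backed by evidence is admitted.
        admitted = verdict == "pass" and bool(claim.get("evidence"))
        if verdict == "pass" and not admitted:
            reason = "pass without evidence (weakly evidenced)"
        self.records.append(AdmissionRecord(task_id, admitted, reason))
        return admitted

gate = VerifyGate(verifier=lambda c: "pass" if c.get("tests_passed") else "unsure")
print(gate.admit("t1", {"tests_passed": True, "evidence": ["test log"]}))  # True
print(gate.admit("t2", {"tests_passed": True}))  # False: no evidence attached
```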

2605.17156 2026-05-22 quant-ph cs.LG

Sparse Mamba Decoder for Quantum Error Correction: Efficient Defect-Centric Processing of Surface Code Syndromes

用于量子纠错的稀疏Mamba解码器:以缺陷为中心高效处理表面码综合征(syndrome)

Samira Sayedsalehi, Nader Bagherzadeh, Maxim Shcherbakov, Jean-Luc Gaudiot

AI总结 本文提出了一种基于缺陷中心的稀疏Mamba解码器,通过仅处理活跃的检测事件,提高了量子纠错中表面码syndrome处理的效率和准确性,同时在多个基准测试中展示了显著的性能提升。

Comments 22 pages, 7 figures, 10 tables. Neural decoder for surface code quantum error correction. Submitted to Quantum

AI中文摘要

量子纠错(QEC)对于构建容错量子计算机至关重要,需要同时具备准确、快速和可扩展特性的解码器。大多数最先进的神经解码器虽能达到高准确率,但无论实际错误率如何,都要处理大小为O(d²R)的完整稠密综合征数组,其中d是码距,R是测量轮数。在物理相关的错误率(p ~ 0.1%)下,只有不到5%的综合征条目包含活跃的检测事件,而现有解码器却要处理整个综合征体积。我们提出了稀疏Mamba解码器(SMD),这是一种以缺陷为中心的神经解码器。它使用每个缺陷13维的特征表示和Mamba状态空间骨干网络,仅处理k个活跃检测事件,实现O(k)复杂度。在去极化、均匀电路级、SI1000和Google Sycamore实验基准上,SMD在SI1000噪声下、d ≤ 5时将MWPM的逻辑错误率最多降低49%,运行速度比近最大似然(near-MLD)解码器Tesseract快95-467倍,比Belief Matching快232-463倍,并且在均匀电路级噪声下、d = 3-9范围内保持几乎恒定的延迟(24-57 μs)。在Sycamore实验数据集上,SMD集成模型的表现与Varbanov等人的稠密Mamba解码器持平或略有超出。所有结果均在商用NVIDIA GPU上获得,模型参数量为7.5-16M,无需专用加速器。

英文摘要

Quantum error correction (QEC) is essential for building fault-tolerant quantum computers, requiring decoders that are simultaneously accurate, fast, and scalable. Most state-of-the-art neural decoders achieve high accuracy but process the full dense syndrome array of size $O(d^2 R)$ regardless of the actual error rate, where $d$ is the code distance and $R$ is the number of measurement rounds. At physically relevant error rates (p ~ 0.1%), fewer than 5% of syndrome entries contain active detection events, yet existing decoders process the entire syndrome volume. We introduce the Sparse Mamba Decoder (SMD), a defect-centric neural decoder that processes only the $k$ active detection events using a 13-dimensional feature representation per defect and a Mamba state-space backbone, achieving $O(k)$ complexity. Across depolarizing, uniform circuit-level, SI1000, and Google Sycamore experimental benchmarks, SMD reduces the MWPM logical error rate by up to 49% at $d \le 5$ under SI1000 noise, runs 95-467x faster than the Tesseract near-MLD decoder and 232-463x faster than Belief Matching, and maintains nearly constant latency (24-57 μs) across $d = 3$-$9$ under uniform circuit-level noise. On the Sycamore experimental dataset, the SMD ensemble matches or slightly surpasses the dense Mamba decoder of Varbanov et al. All results are obtained on commodity NVIDIA GPUs with 7.5-16M parameters, without specialized accelerators.
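SMD's efficiency comes from handing the network only the $k$ active detection events instead of the dense $O(d^2 R)$ syndrome volume. The sketch below shows the sparsification step in its common form. A detection event fires where a stabilizer outcome differs from the previous round, with round 0 compared against an all-zero reference. Each event is then mapped to a small feature tuple. The 3-dimensional normalized coordinates here are a hypothetical stand-in; SMD's actual 13 per-defect features are not listed in the abstract.

```python
def detection_events(measurements):
    """measurements[r][i][j] in {0, 1}: stabilizer outcomes per round.

    Returns a sparse list of (round, row, col) coordinates where the outcome
    changed relative to the previous round (round 0 vs. an all-zero reference).
    The scan touches all d^2 R entries once; downstream work sees only k events.
    """
    events = []
    prev = None
    for r, layer in enumerate(measurements):
        for i, row in enumerate(layer):
            for j, bit in enumerate(row):
                before = prev[i][j] if prev is not None else 0
                if bit != before:
                    events.append((r, i, j))
        prev = layer
    return events

def defect_features(events, num_rounds, rows, cols):
    """Hypothetical per-defect features: normalized (round, row, col) coordinates."""
    return [
        (r / max(num_rounds - 1, 1), i / max(rows - 1, 1), j / max(cols - 1, 1))
        for r, i, j in events
    ]

rounds = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],  # one stabilizer flips here ...
    [[0, 1], [0, 0]],  # ... and stays flipped, so no new event
]
events = detection_events(rounds)
print(events, defect_features(events, num_rounds=3, rows=2, cols=2))
```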

2605.16304 2026-05-22 eess.SP cs.SD

Modulation Feature Enhancement with a Multi-Stage Attention Network for Underwater Acoustic Target Recognition

Jiaping Yu, Shefeng Yan, Linlin Mao, Zeping Sui, Chunjin Jiang

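AI summary This paper proposes a feature extraction and fusion method based on variational mode decomposition and the 3/2-D spectrum, combined with a multi-stage multi-type attention mechanism and an adjustable class-balanced focal loss, to improve underwater acoustic target recognition performance.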

Comments 31 pages, 14 figures, Accepted by Signal Processing

English abstract

Underwater acoustic target recognition is critical for maritime applications, yet it faces challenges arising from the complex and diverse nature of ship-radiated noise. To address these issues, we propose a robust deep learning-based framework. First, we introduce a feature extraction and fusion method based on variational mode decomposition (VMD) and the 3/2-D spectrum to generate high-fidelity 2-D DEMON spectral features, which effectively capture modulation envelope information. To further enhance feature representation, we design a one-dimensional convolutional neural network (1-D CNN) integrated with a novel Multi-Stage Multi-Type Attention Mechanism (MMATT) that adaptively refines features at different network depths. Within this mechanism, we propose a Residual Channel-Independent Spectral Attention Mechanism (R-CISAM) and a Multi-Scale Separate-and-Fuse Spectral Attention Mechanism (MS-SFSAM). Moreover, to mitigate performance degradation caused by severe class imbalance inherent in real-world ship-radiated noise data, we devise an Adjustable Class-Balanced Focal Loss (ACBFL), which provides flexibility across tasks with varying degrees of imbalance. Experimental results on a real-world ship-radiated noise dataset demonstrate that the proposed solutions effectively enhance underwater acoustic target recognition performance.
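The abstract does not say how ACBFL is made adjustable. For orientation only, the sketch below implements the standard class-balanced focal loss (effective-number class weighting combined with a focal modulating term), a swapped-in well-known technique and not the authors' ACBFL. Here `beta` (in [0, 1)) and `gamma` are the assumed tuning knobs one would adjust for different degrees of imbalance.

```python
import numpy as np


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the effective number of samples (1 - beta**n) / (1 - beta).

    Requires 0 <= beta < 1 and positive class counts.
    """
    counts = np.asarray(counts, dtype=np.float64)
    effective = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective
    # Normalize so the weights sum to the number of classes.
    return weights * len(counts) / weights.sum()


def cb_focal_loss(logits, labels, counts, beta=0.999, gamma=2.0):
    """Mean class-balanced focal loss over a batch.

    logits: (N, C) raw scores; labels: (N,) integer class ids;
    counts: per-class training-sample counts used for the weights.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_p = log_probs[np.arange(len(labels)), labels]  # log-prob of true class
    p = np.exp(log_p)
    w = class_balanced_weights(counts, beta)[labels]
    # Focal term (1 - p)**gamma down-weights examples the model already gets right.
    return float(np.mean(-w * (1.0 - p) ** gamma * log_p))


if __name__ == "__main__":
    counts = [900, 80, 20]  # hypothetical, highly imbalanced class counts
    print(class_balanced_weights(counts, beta=0.999))
```

With `gamma = 0` and equal class counts the loss reduces to ordinary cross-entropy; pushing `beta` toward 1 moves the weights toward inverse class frequency, while `beta = 0` gives equal weights.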

2605.08380 2026-05-22 cs.SE cs.AI

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

Junyu Huo, Ziqi Mao, Zihao Wan, Gouri Ginde

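AI summary This study analyzes technical discussions generated solely by AI agents on MoltBook to characterize the discourse AI agents produce during autonomous interaction. It finds that these discussions focus on topics such as security and trust, memory management, tools and APIs, and debugging and error handling, but lack the concrete project details and runtime information common in human developer discussions.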

English abstract

AI agents are increasingly framed as software-engineering teammates, yet most studies examine them inside human-centered workflows. Little is known about the discourse autonomous AI agents produce when they interact mainly with one another. This paper examines what autonomous agents discuss on MoltBook, how that discourse is organized, and how it differs from human developer discourse. We combine human open coding of a 500-post sample, a concentration-plus-check topic-analysis pipeline over 4,707 English-filtered MoltBook technology posts, and a matched comparison with 5,211 human-generated GitHub Discussions posts. MoltBook technology discourse spans 12 recurring themes, led by Security and Trust (27.4%). At the community level, activity is highly concentrated: the largest submolt accounts for 63.5% of posts (Gini = 0.88), yet a stability-aware BERTopic pipeline still identifies 32 non-outlier sub-topics. Relative to the GitHub Discussions baseline, MoltBook discourse contains fewer concrete, context-rich cues such as code-formatted artifacts, environment details, runtime failures, and reproduction steps. Social mimicry appears only in limited form, while idealization is reflected mainly through lower hedging. Overall, AI-only technical discourse is coherent but selective. It repeatedly returns to security and trust, memory and context management, tooling and APIs, debugging and error handling, workflow automation, and infrastructure/ops, while omitting much of the project-local and runtime detail common in human developer discourse. This may reflect fewer environment-specific failures, reproduction steps, and other grounding cues in MoltBook.
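The reported Gini coefficient summarizes how unevenly posts are spread across submolts: 0 means every submolt has the same number of posts, and values near 1 mean a few submolts hold nearly all posts. A minimal sketch of the standard sorted-rank computation, applied to made-up per-submolt post counts; the abstract does not state which Gini estimator the authors used.

```python
def gini(counts: list[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with 1-based ranks i
    # over the values sorted in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def top_share(counts: list[float]) -> float:
    """Fraction of all posts held by the single largest community."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical post counts per submolt, for illustration only.
    posts_per_submolt = [3000, 400, 250, 120, 80, 40, 20, 10, 5, 5]
    print(f"Gini = {gini(posts_per_submolt):.2f}, "
          f"largest share = {top_share(posts_per_submolt):.1%}")
```

For $n$ communities the maximum attainable value of this estimator is $(n-1)/n$, reached when a single community holds every post.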