arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2605.00457 2026-05-28 cs.NI cs.LG cs.SY eess.SY

Policy-Driven DRL-Based TXOP Adaptation in NR-U and Wi-Fi Coexistence

Po-Heng Chou, Yi-Fang Yu, Shou-Yu Chen, Chiapin Wang

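AI summary: To address the imbalance in spectrum utilization when NR-U and Wi-Fi coexist in unlicensed spectrum, the paper proposes a policy-driven deep reinforcement learning framework whose reward design allows flexible control of the tradeoff among fairness, throughput, and utility.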

Comments 13 pages, 13 figures, 2 tables, submitted to IEEE Transactions on Cognitive Communications and Networking

Abstract

The coexistence of NR-U and Wi-Fi in unlicensed spectrum introduces a challenging coexistence management problem, where heterogeneous channel access mechanisms lead to a significant imbalance in spectrum utilization and degraded Wi-Fi performance. To address this challenge, we propose a policy-driven deep reinforcement learning (DRL) framework for adaptive transmission opportunity (TXOP) control, in which the coexistence process is formulated as a Markov decision process (MDP) and a deep Q-network (DQN) learns control policies through online interaction. A key contribution is the introduction of a policy layer via reward design, enabling explicit control of coexistence tradeoffs among fairness, throughput, and utility. Three policies, namely absolute fairness, moderate fairness, and utility-based fairness, are developed to achieve different operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared to absolute fairness, moderate fairness improves aggregate throughput by 68.22%, while the utility-based policy further enhances utility by 177.6%. These results demonstrate that policy-driven control provides a flexible and effective solution for managing tradeoffs in heterogeneous coexistence networks.
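As a reading aid for the fairness figure quoted in this abstract: Jain's fairness index for n per-node throughputs x_i is (sum of x_i)^2 / (n * sum of x_i^2). It equals 1 when all nodes receive equal throughput and 1/n when a single node receives everything. The sketch below is not taken from the paper; the function name `jain_index` and the sample throughput values are illustrative assumptions.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Assumes non-negative throughputs; the result lies in [1/n, 1].
    """
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero throughput")
    total = sum(throughputs)
    return (total * total) / (n * sum_sq)


# Hypothetical per-network throughputs (arbitrary units), for illustration only.
equal_share = jain_index([10.0, 10.0])   # both networks get the same share
one_starved = jain_index([1.0, 0.0])     # one network gets nothing
```

With these inputs, `equal_share` is 1.0 and `one_starved` is 0.5, so an index above 0.9 indicates throughputs that are close to evenly shared.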

2605.00180 2026-05-28 cs.NI cs.CL

RouteProfile: Graph-Based Profiling for Cold-Start LLM Routing

Jingjun Xu, Hongji Pu, Tao Feng, Haozhen Zhang, Jiaxuan You, Ge Liu

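AI summary: To address the lack of interaction data for new models in cold-start LLM routing, the paper proposes RouteProfile, a graph-based framework that builds model profiles from public signals in technical reports. Experiments show that structured profiles outperform flat baselines and that model family metadata is more reliable than benchmark domain information.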

Abstract

LLM routing is increasingly important for selecting suitable models under diverse user needs and deployment constraints, but its practical effectiveness depends on continual adaptation to emerging queries and newly released models. New-LLM integration is particularly challenging, as newly released models lack the query-response-reward interactions required for router training and cannot be profiled as directly as new queries via semantic embeddings. Existing profiles are limited: LLM-generated descriptions are often coarse, while interaction-based embeddings are costly to construct. To address this problem, we propose RouteProfile, a graph-based profiling framework that constructs LLM profiles from public signals in technical reports or model cards, including model family, model description, reported benchmark scores, and benchmark domains. RouteProfile organizes these heterogeneous signals into a graph and studies profile construction along four dimensions: organizational form, representation type, aggregation depth, and learning configuration. We evaluate RouteProfile in training-free cold-start routing and new-LLM integration settings. Experiments show that: (1) structured profiles outperform flat baselines in training-free cold-start routing; (2) model family metadata is more reliable than benchmark domain information; and (3) effective new-LLM integration requires profile-router co-design. Overall, our findings highlight the importance of profile design for enabling routing systems to adapt to the evolving model ecosystem.

2605.00025 2026-05-28 q-bio.NC cs.CL cs.HC cs.LG eess.AS

MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis

Yuanhao Chen, Peter Chin

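AI summary: Proposes MoDAl, a framework that discovers complementary neural modalities across multiple brain regions through the interplay of a contrastive alignment loss and a decorrelation loss, reducing word error rate from 26.3% to 21.6% on the Brain-to-Text Benchmark '24.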

Abstract

Speech neuroprosthesis systems decode intended speech from neural activity in the absence of audible output, offering a path to restoring communication for individuals with speech-impairing conditions. Current approaches decode predominantly from motor cortical areas, discarding others -- such as area 44, part of Broca's area -- that may encode complementary linguistic information. We introduce MoDAl (Modality Decorrelation and Alignment), a framework that discovers complementary neural modalities through the interplay of two objectives in a shared projection space. A contrastive loss aligns each of several parallel brain encoders with the text embeddings of a pretrained large language model (LLM), while a decorrelation loss prevents the encoders from coalescing to duplicative representations. We prove that these objectives are in productive tension: Contrastive alignment induces transitive modality coalescence, which decorrelation must counteract for the framework to discover diverse neurolinguistic modalities. On the Brain-to-Text Benchmark '24, MoDAl reduces word error rate (WER) from 26.3% to 21.6% compared to the previous best end-to-end method, with the gain from incorporating previously discarded area 44 signals arising entirely from the decorrelation mechanism. Analysis of the discovered modalities reveals functional specialization: Encoders receiving area 44 input capture structural and syntactic properties (sentence length, grammatical voice, wh-words), consistent with the neurolinguistic understanding of Broca's area.

2604.23184 2026-05-28 cs.IT cs.LG math.IT

A Unified Fractional Regularization Framework for Sparse Recovery

Yinhao Zhao, Haoyu He, Chuanqi Ma, Hao Wang

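AI summary: Proposes a unified fractional regularization framework based on the $\ell_1/\ell_p^q$ model that generalizes several sparsity regularizers through the parameters $p$ and $q$. The paper characterizes the equivalence of first-order stationary points with the $\ell_1-α\ell_p$ model, establishes a RIP-based sufficient recovery condition, develops an MM algorithm with a convergence proof, and reports better results than existing methods in sparse recovery and MRI reconstruction.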

Abstract

We propose a unified fractional regularization framework for sparse signal recovery based on the $\ell_1/\ell_p^q$ model. This model generalizes several widely used sparsity-promoting regularizers and provides additional flexibility through the parameters $p$ and $q$. Our main theoretical contribution is the characterization of the equivalence between the first-order stationary points of the $\ell_1/\ell_p^q$ formulation and the subtractive $\ell_1-α\ell_p$ model, thereby offering a unified perspective on these nonconvex regularizers. In addition, we establish a new sufficient recovery condition under the Restricted Isometry Property (RIP), which shows that the proposed framework can provide relaxed recovery guarantees and improved robustness. To solve the resulting nonconvex problem, we develop a majorization--minimization (MM) algorithm and prove its convergence by using the Kurdyka--Łojasiewicz (KL) property. Numerical experiments on sparse recovery problems with different sensing matrices and MRI reconstruction demonstrate that the proposed approach outperforms existing methods in recovery accuracy.
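To make the ratio in this abstract concrete, the sketch below evaluates a penalty of the form ||x||_1 / ||x||_p^q. Reading the notation $\ell_1/\ell_p^q$ this way is an assumption based on the abstract alone, and the names `lp_norm` and `fractional_penalty` are hypothetical. With p = 2 and q = 1 it reduces to the familiar $\ell_1/\ell_2$ ratio, which is smaller for sparser vectors.

```python
def lp_norm(x, p):
    """The l_p (quasi-)norm: (sum |x_i|^p)^(1/p), for p > 0."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def fractional_penalty(x, p=2.0, q=1.0):
    """Assumed form of the l1 / l_p^q ratio: ||x||_1 / ||x||_p^q."""
    denom = lp_norm(x, p) ** q
    if denom == 0:
        raise ValueError("penalty is undefined for the zero vector")
    return sum(abs(v) for v in x) / denom


# Illustrative vectors only: a 1-sparse vector and a fully dense one.
sparse_value = fractional_penalty([3.0, 0.0, 0.0, 0.0])
dense_value = fractional_penalty([1.0, 1.0, 1.0, 1.0])
```

Under the default p = 2, q = 1, the 1-sparse vector scores lower than the dense one, which is the behaviour a sparsity-promoting regularizer needs.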

2604.20857 2026-05-28 cs.IR cs.AI

DiagramBank: A Quality-Audited Dataset of Scientific Schematic Diagrams with Multi-Level Document Context

Ling Yue, Tingwen Zhang, Jiaying Wang, Zhen Xu, Shaowu Pan

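AI summary: Introduces DiagramBank, a dataset of 57,100 schematic diagrams curated from AI/ML venues on OpenReview. A cascade-filtering pipeline and a manual blind audit ensure quality, and document context is preserved to support scientific-document understanding, diagram retrieval, and benchmark construction.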

Abstract

Scientific papers use schematic diagrams to communicate methods, workflows, and system structure, yet existing scientific-figure corpora often mix them with plots, screenshots, and photographs and rarely preserve document context. We introduce DiagramBank, a quality-audited dataset of 57,100 schematic diagrams curated from OpenReview-hosted AI/ML venues. Each record links a diagram image to its paper title, abstract, figure caption, in-text figure-reference spans, venue/year metadata, provenance fields, and filtering labels. DiagramBank is a reusable resource for scientific-document understanding, diagram retrieval, corpus analysis, and future benchmark construction. We describe its extraction and cascade-filtering pipeline, release schema, confidence-controlled views, dataset card, and indexing utilities. A manual blind audit of the released cascade-filtered records estimates 93.67% precision, and a separate CLIP threshold analysis characterizes the precision--coverage trade-off for simpler filtering views. We further provide lightweight metadata-indexing and authoring examples to illustrate downstream protocols without treating these utilities as standalone methods. The code is public at: https://github.com/csml-rpi/DiagramBank.

2512.15791 2026-05-28 cs.CY cs.AI cs.CL

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study

Jhessica Silva, Diego A. B. Moreira, Gabriel O. dos Santos, Alef Ferreira, Helena Maia, Sandra Avila, Helio Pedrini

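AI summary: Through literature screening and developer interviews, the paper evaluates four AI ethics tools applied to Portuguese language models and finds that they can guide general ethical considerations but do not cover model-specific aspects.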

Comments 7 figures, 11 tables. Accepted for publication in AI and Ethics

Abstract

In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, developing and deploying these language models must be done responsibly, with attention to their negative impacts and possible harms. In this scenario, the number of AI Ethics Tools (AIETs) publications has recently increased. These AIETs are designed to help developers, companies, governments, and other stakeholders establish trust, transparency, and responsibility with their technologies by bringing accepted values to guide AI's design, development, and use stages. However, many AIETs lack good documentation, examples of use, and proof of their effectiveness in practice. This paper presents a methodology for evaluating AIETs in language models. Our approach involved an extensive literature survey on 213 AIETs, and after applying inclusion and exclusion criteria, we selected four AIETs: Model Cards, ALTAI, FactSheets, and Harms Modeling. For evaluation, we applied AIETs to language models developed for the Portuguese language, conducting 35 hours of interviews with their developers. The evaluation considered the developers' perspective on the AIETs' use and quality in helping to identify ethical considerations about their model. The results suggest that the applied AIETs serve as a guide for formulating general ethical considerations about language models. However, we note that they do not address unique aspects of these models, such as idiomatic expressions. Additionally, these AIETs did not help to identify potential negative impacts of models for the Portuguese language.

2507.07067 2026-05-28 eess.SP cs.LG

How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir M. Al-Hashimi

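AI summary: Reviews two complementary approaches to bridging the gap between synthetic and real data: digital twin calibration and sim-to-real gap-aware training strategies.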

Comments This work has been accepted for publication in IEEE Communications Magazine

Abstract

Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances on two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between digital twin-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference.

2603.24631 2026-05-28 cs.SE cs.AI

Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code

Myeongsoo Kim, Dingmin Wang, Siwei Cui, Farima Farmahinifarahani, Terry Yue Zhuo, Shweta Garg, Baishakhi Ray, Rajdeep Mukherjee, Varun Kumar

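AI summary: Using trajectory decomposition, the paper finds that code agents still fail after correct localization because of edit-quality defects, especially Coherence Collapse, and proposes a reference-free, consensus-driven improvement.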

Abstract

Code agents resolve 65-70% of SWE-bench Verified issues, but Pass@1 cannot tell us why the rest fail, and, as we show, capable-model failures are systematically misdiagnosed without trajectory data. We introduce TRAJEVAL, a training-free decomposition of agent trajectories into reference-patch-aligned search, read, and edit stages, and apply it across 16,758 trajectories spanning three architectures and seven models. The dominant failure of capable models is not localization: 60-69% of failures on SWE-Agent and OpenHands reach and edit the correct functions yet still produce incorrect patches, and the pattern persists for most models on the bash-only LiveSWEAgent. Within this Edit-Quality residual, we identify Coherence Collapse, where the agent reaches correct code and then overwrites or thrashes it, as the largest theme, replicating across SWE-bench Verified and the multilingual PolyBench Verified. In 5 cases, the agent produces a patch bit-identical to the gold reference mid-trajectory and destroys it later; an edit-commit checkpoint recovers all 5 against the SWE-bench Docker harness. A reference-free consensus-driven variant yields a directional +3.0 pp Pass@1 measurement on GPT-5 (p=0.08).

2603.22335 2026-05-28 cs.IR cs.AI

Causal Direct Preference Optimization for Distributionally Robust Generative Recommendation

Chu Zhao, Enneng Yang, Jianzhe Zhao, Guibing Guo

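AI summary: To address Direct Preference Optimization (DPO) amplifying spurious correlations caused by environmental confounders in generative recommendation, the paper proposes CausalDPO, which improves out-of-distribution generalization through causal invariance learning, backdoor adjustment, and soft-clustering environment modeling.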

Comments 22 pages, 3 figures

Abstract

Direct Preference Optimization (DPO) guides large language models (LLMs) to generate recommendations aligned with user historical behavior distributions by minimizing preference alignment loss. However, our systematic empirical research and theoretical analysis reveal that DPO tends to amplify spurious correlations caused by environmental confounders during the alignment process, significantly undermining the generalization capability of LLM-based generative recommendation methods in out-of-distribution (OOD) scenarios. To mitigate this issue, we propose CausalDPO, an extension of DPO that incorporates a causal invariance learning mechanism. This method introduces a backdoor adjustment strategy during the preference alignment phase to eliminate interference from environmental confounders, explicitly models the latent environmental distribution using a soft clustering approach, and enhances robust consistency across diverse environments through invariance constraints. Theoretical analysis demonstrates that CausalDPO can effectively capture users' stable preference structures across multiple environments, thereby improving the OOD generalization performance of LLM-based recommendation models. We conduct extensive experiments under four representative distribution shift settings to validate the effectiveness of CausalDPO, achieving an average performance improvement of 17.17% across four evaluation metrics.

2504.08923 2026-05-28 cs.LO cs.AI math.LO

A convergence law for continuous logic and continuous structures with finite domains

Vera Koponen

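AI summary: Studies continuous relational structures with finite domains and the many-valued logic $CLA$, proving that every $CLA$ formula is asymptotically equivalent to a formula without aggregation functions and using this to establish a convergence law for $CLA$.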

Journal ref Information and Computation, Volume 310, May 2026, 105441

Abstract

We consider continuous relational structures with finite domain $[n] := \{1, \ldots, n\}$ and a many valued logic, $CLA$, with values in the unit interval and which uses continuous connectives and continuous aggregation functions. $CLA$ subsumes first-order logic on ``conventional'' finite structures. To each relation symbol $R$ and identity constraint $ic$ on a tuple the length of which matches the arity of $R$ we associate a continuous probability density function $μ_R^{ic} : [0, 1] \to [0, \infty)$. We also consider a probability distribution on the set $\mathbf{W}_n$ of continuous structures with domain $[n]$ which is such that for every relation symbol $R$, identity constraint $ic$, and tuple $\bar{a}$ satisfying $ic$, the distribution of the value of $R(\bar{a})$ is given by $μ_R^{ic}$, independently of the values for other relation symbols or other tuples. In this setting we prove that every formula in $CLA$ is asymptotically equivalent to a formula without any aggregation function. This is used to prove a convergence law for $CLA$ which reads as follows for formulas without free variables: If $φ\in CLA$ has no free variable and $I \subseteq [0, 1]$ is an interval, then there is $α\in [0, 1]$ such that, as $n$ tends to infinity, the probability that the value of $φ$ is in $I$ tends to $α$.

2603.13283 2026-05-28 cs.NE cs.LG

Bullet Trains: Parallelizing Training of Temporally Precise Spiking Neural Networks

Todd Morrill, Christian Pehle, Anthony Zador

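AI summary: Proposes parallel associative scans and differentiable spike time solvers to train spiking neural networks efficiently under exact hard-reset dynamics, achieving up to 44x speedups over sequential simulation.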

Comments Published as a conference paper at ICML 2026

Abstract

Continuous-time, event-native spiking neural networks (SNNs) operate strictly on spike events, treating spike timing and ordering as the representation rather than an artifact of time discretization. This viewpoint aligns with biological computation and with the native resolution of event sensors and neuromorphic processors, while enabling compute and memory that scale with the number of events. However, two challenges hinder practical, end-to-end trainable event-based SNN systems: 1) exact charge--fire--reset dynamics impose inherently sequential processing of input spikes, and 2) precise spike times must be solved without time bins. We address both. First, we use parallel associative scans to consume multiple input spikes at once, yielding up to 44x speedups over sequential simulation while retaining exact hard-reset dynamics. Second, we implement differentiable spike time solvers that compute spike times to machine-precision without discrete-time approximations or restrictive analytic assumptions. We demonstrate the viability of training SNNs using our solutions on four event-based datasets on GPUs.
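The parallel associative scan mentioned in this abstract can be illustrated on a simpler problem than the paper's. The sketch below handles only a linear leaky update h_t = a_t * h_{t-1} + b_t, with no threshold and no hard reset, so it is not the authors' method. It shows why such updates admit a scan: each step is an affine map, and composing affine maps is associative. All names (`combine`, `associative_scan`, `sequential`, `scanned`) and the sample values are hypothetical.

```python
def combine(first, second):
    """Compose two affine maps h -> a*h + b, applying `first` then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def associative_scan(op, elems):
    """Inclusive scan by recursive doubling (Hillis-Steele).

    Updates within one round are independent of each other, so a parallel
    backend could run them at once; here they run in a plain Python list.
    """
    out = list(elems)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else op(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def sequential(decays, inputs, h0=0.0):
    """Reference: process one input at a time."""
    h, trace = h0, []
    for a, b in zip(decays, inputs):
        h = a * h + b
        trace.append(h)
    return trace


def scanned(decays, inputs, h0=0.0):
    """Same trace via prefix composition of the affine maps."""
    prefixes = associative_scan(combine, list(zip(decays, inputs)))
    return [a * h0 + b for a, b in prefixes]


# Illustrative decay factors and input weights only.
decays = [0.5, 0.5, 0.5, 0.5]
inputs = [1.0, 2.0, 0.0, 4.0]
trace_seq = sequential(decays, inputs)
trace_scan = scanned(decays, inputs)
```

Both traces agree, and the scan needs a number of rounds that grows logarithmically with the number of inputs instead of one step per input. Handling exact hard resets inside such a scan is the part the paper contributes, and it is not reproduced here.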

2603.12824 2026-05-28 cs.IR cs.CV cs.LG

NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval

Zhuchenyang Liu, Yao Zhang, Yu Xiao

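AI summary: Exploiting query-document asymmetry, the paper distills a 2B-parameter vision-language teacher into a 70M-parameter text-only student encoder using a pointwise cosine alignment objective, enabling efficient inference for visual document retrieval.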

Abstract

Vision-Language Model (VLM) based retrievers have advanced visual document retrieval (VDR) to impressive quality. They require the same multi-billion parameter encoder for both document indexing and query encoding, incurring high latency and GPU dependence even for plain-text queries. We observe that this design is unnecessarily symmetric: documents are visually complex and demand strong visual understanding, whereas queries are just short text strings. NanoVDR exploits this query--document asymmetry by decoupling the two encoding paths: a frozen 2B VLM teacher indexes documents offline, while a distilled text-only student as small as 69M parameters encodes queries at inference. The key design choice is the distillation objective. Through systematic comparison of six objectives across three backbones and 22 ViDoRe benchmark datasets, we find that pointwise cosine alignment on query text consistently outperforms ranking-based and contrastive alternatives, while requiring only pre-cached teacher query embeddings and no document processing during training. Furthermore, we identify cross-lingual transfer as the primary performance bottleneck, and resolve it cheaply by augmenting training data with machine-translated queries. The resulting NanoVDR-S-Multi (DistilBERT, 69M) retains 95.1\% of teacher quality and outperforms DSE-Qwen2 (2B) on v2 and v3 with 32$\times$ fewer parameters and 50$\times$ lower CPU query latency, at a total training cost under 13 GPU-hours.
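A pointwise cosine alignment objective of the kind named in this abstract can be sketched as follows. The exact loss used by the authors is not given in the abstract; the form below (mean of 1 minus cosine similarity between each student query embedding and the cached teacher embedding of the same query) is an assumption, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pointwise_cosine_loss(student_embs, teacher_embs):
    """Mean of (1 - cosine) over query pairs; no documents or negatives needed."""
    if len(student_embs) != len(teacher_embs) or not student_embs:
        raise ValueError("need matching, non-empty embedding lists")
    losses = [1.0 - cosine(s, t) for s, t in zip(student_embs, teacher_embs)]
    return sum(losses) / len(losses)


# Toy 2-D embeddings, for illustration only.
aligned = pointwise_cosine_loss([[3.0, 4.0]], [[6.0, 8.0]])      # same direction
orthogonal = pointwise_cosine_loss([[1.0, 0.0]], [[0.0, 1.0]])   # unrelated
```

Because each term depends only on one query and its pre-computed teacher embedding, training under this assumed form needs neither document processing nor negative sampling, which matches the cost argument made in the abstract.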

2603.08761 2026-05-28 stat.ML cs.LG

No Certificate for Alignment: Two Independent Impossibilities and the Pareto Frontier of Achievable Safety Guarantees

Ayushi Agarwal

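AI summary: Using two independent impossibility theorems, the paper argues that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and characterizes the Pareto frontier of achievable safety guarantees.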

Abstract

We argue that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and characterise what remains achievable. Two structurally independent impossibility theorems support this position. The semantic barrier (Theorem 1): deciding whether a system satisfies any non-trivial alignment property over the full input domain is NP-hard for feedforward networks and undecidable for Turing-complete architectures -- a direct consequence of neural-network verification complexity and Rice's Theorem. The statistical barrier (Theorem 2): any verification procedure that is both sound and tractable cannot satisfy Completeness over the full input domain -- a direct consequence of the impossibility of certifying infinite-domain properties from finite observations. These two theorems jointly entail a trilemma: no procedure can simultaneously satisfy soundness (no misaligned system is certified), completeness (no aligned system is rejected), and tractability (polynomial runtime). Each pair is simultaneously achievable; all three are not. We combine these results as a joint framework of two structurally independent barriers, prove their independence, and characterise the achievable Pareto frontier quantitatively via a constructive coverage-gap lower bound.

2512.00252 2026-05-28 stat.ML cs.LG physics.ao-ph

DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants

Martin Andrae, Erik Wikingsson, So Takao, Tomas Landelius, Fredrik Lindsten

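AI summary: Proposes DAISI, an algorithm that uses flow-based generative models for flexible probabilistic inference, combining forecast information with observations through inverse sampling to overcome the limitations of traditional Gaussian approximations in complex nonlinear systems.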

Comments Accepted at the International Conference on Machine Learning 2026, 44 pages, 28 figures

Abstract

Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations that are violated for complex dynamics or observation operators. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior that first incorporates forecast information through a novel inverse-sampling step, before assimilating observations via guidance-based conditional sampling. This allows us to leverage any forecasting model as part of the DA pipeline without having to retrain or fine-tune the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle. The code for DAISI is available at https://github.com/Erik-Wikingsson/DAISI.

2602.04898 2026-05-28 cs.CR cs.AI

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Tianxin Chen, Wenbo Jiang, Hongqiao Chen, Zhirun Zheng, Cheng Huang

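AI summary: Proposes the Semantic-level Backdoor Attack (SemBD), which replaces discrete textual patterns with representation-level triggers based on continuous semantic regions, implants the backdoor by distillation-based editing of the key and value projection matrices in cross-attention layers, and adds semantic regularization and multi-entity backdoor targets for stealthiness, achieving a 100% attack success rate while resisting input-level defenses.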

Abstract

Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet remain vulnerable to backdoor attacks. Existing attacks typically rely on fixed textual triggers and single-entity backdoor targets, making them highly susceptible to enumeration-based input defenses and attention-consistency detection. In this work, we propose Semantic-level Backdoor Attack (SemBD), which introduces representation-level triggers based on continuous semantic regions rather than discrete textual patterns. SemBD implants such semantic backdoors by distillation-based editing of the key and value projection matrices in cross-attention layers, enabling semantically equivalent but textually diverse prompts to activate the backdoor. To further enhance stealthiness, SemBD incorporates a semantic regularization to prevent unintended activation under incomplete semantics, as well as multi-entity backdoor targets that avoid highly consistent cross-attention patterns. Extensive experiments demonstrate that SemBD achieves a 100% attack success rate while maintaining strong robustness against state-of-the-art input-level defenses. Our code is available at https://github.com/DPAS-Lab/SemBD/.

2602.23754 2026-05-28 cs.GR cs.CV

Neural Image Space Tessellation effect

Youyang Du, Junqiu Zhu, Zheng Zeng, Lu Wang, Lingqi Yan

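AI summary: Proposes NIST, a lightweight screen-space post-processing method that implicitly deforms image-space contours and reassigns appearance to reduce faceted silhouettes in low-poly renderings, approaching tessellation-based smoothing at a nearly constant per-frame cost.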

英文摘要

We present Neural Image Space Tessellation effect (NIST), a lightweight screen-space post-processing approach for reducing the faceted silhouettes of low-poly renderings. Instead of tessellating primitives, creating new geometry, or modifying the underlying mesh, NIST uses the low-poly rendering result together with simple auxiliary G-buffer attributes to learn geometry-guided smoothing of object contours in image space. At its core, NIST first deforms image-space contours implicitly and then learns to reassign appearance in the whole image-space, including the deformed regions, preserving texture continuity and avoiding seam artifacts. Experiments show that NIST reduces visually apparent geometric faceting and produces smooth, coherent silhouettes close to tessellation-based smoothing references, with a nearly constant per-frame cost in our tested settings. To the best of our knowledge, NIST is the first work to move the solution of low-poly silhouette faceting from the pre-rendering geometry stage to a post-rendering screen-space stage.

2602.23602 2026-05-28 stat.ML cs.LG

Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data

矩重要:从异方差观测数据中发现均值和方差因果图

Yoichi Chikahara

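AI summary: Proposes a Bayesian, moment-driven causal discovery framework that infers separate mean and variance causal graphs from heteroscedastic observational data and quantifies uncertainty over structural features.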

Comments Accepted at KDD 2026. This is the full version of the accepted paper. 17 pages, 6 figures

详情
AI中文摘要

异方差性(即一个变量的方差随其他变量变化)在真实数据中普遍存在,从统计矩的角度阐明其产生原因对于科学知识发现和决策至关重要。然而,标准因果发现无法揭示哪些原因作用于均值还是方差,因为它返回一个单一的不考虑矩的图,限制了可解释性和下游干预设计。我们提出了一个贝叶斯、矩驱动的因果发现框架,从观测异方差数据中推断独立的均值和方差因果图。我们首先通过建立充分条件推导出这两个图可分别识别的识别结果。基于此理论,我们开发了一种变分推理方法,学习两个图的后验分布,从而实现对结构特征(如边、路径和子图)的原则性不确定性量化。为了解决具有两个图结构的异方差模型中参数优化的挑战,我们采用曲率感知优化方法,并开发了一种先验引入技术,利用节点顺序的领域知识,提高样本效率。在合成、半合成和真实数据上的实验表明,我们的方法能够准确恢复均值和方差结构,并优于最先进的基线方法。

英文摘要

Heteroscedasticity, where the variance of a variable changes with other variables, is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and decision-making. However, standard causal discovery does not reveal which causes act on the mean versus the variance, as it returns a single moment-agnostic graph, limiting interpretability and downstream intervention design. We propose a Bayesian, moment-driven causal discovery framework that infers separate mean and variance causal graphs from observational heteroscedastic data. We first derive the identification results by establishing sufficient conditions under which these two graphs are separately identifiable. Building on this theory, we develop a variational inference method that learns a posterior distribution over both graphs, enabling principled uncertainty quantification of structural features (e.g., edges, paths, and subgraphs). To address the challenges of parameter optimization in heteroscedastic models with two graph structures, we take a curvature-aware optimization approach and develop a prior incorporation technique that leverages domain knowledge on node orderings, improving sample efficiency. Experiments on synthetic, semi-synthetic, and real data show that our approach accurately recovers mean and variance structures and outperforms state-of-the-art baselines.
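
The distinction between a mean parent and a variance parent can be illustrated with a small simulation. This is only a toy sketch, not the paper's method: the variable names `x1` and `x2`, the coefficient 2.0, and the two noise levels are assumptions chosen for illustration. Here `x1` shifts the mean of `y` and `x2` changes only its spread, which is the kind of structure a single moment-agnostic graph would not separate.

```python
import random


def simulate(n, seed=0):
    """Draw n samples of (x1, x2, y); x1 drives the mean of y, x2 drives its variance."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        sigma = 3.0 if x2 > 0 else 1.0        # hypothetical: x2 is a variance parent only
        y = 2.0 * x1 + rng.gauss(0.0, sigma)  # hypothetical: x1 is a mean parent only
        rows.append((x1, x2, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate(4000)
# Squared residuals around the known mean function 2 * x1, split by the sign of x2.
high = mean((y - 2.0 * x1) ** 2 for x1, x2, y in rows if x2 > 0)
low = mean((y - 2.0 * x1) ** 2 for x1, x2, y in rows if x2 <= 0)
print(f"residual variance: x2 > 0 -> {high:.2f}, x2 <= 0 -> {low:.2f}")
```

The residual variance differs sharply between the two groups even though `x2` never enters the mean of `y`.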

2602.22873 2026-05-28 math.AT cs.AI cs.CG

Learning Tangent Bundles and Characteristic Classes with Autoencoder Atlases

使用自编码器图谱学习切丛和示性类

Eduardo Paluzo-Hidalgo, Yuichi Ike

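AI summary: Presents a theoretical framework linking multi-chart autoencoders in manifold learning to the classical theory of vector bundles and characteristic classes; autoencoder atlases define transition maps from which the first Stiefel-Whitney class is computed to detect the orientability of data.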

详情
AI中文摘要

我们引入了一个理论框架,将流形学习中的多图自编码器与向量丛和示性类的经典理论联系起来。我们不将自编码器视为产生单个全局欧几里得嵌入,而是将一组局部训练的编码器-解码器对视为流形上的学习图谱。我们证明,任何重建一致的自编码器图谱都能典范地定义满足上循环条件的转移映射,并且将这些转移映射线性化会得到一个向量丛,当潜在维度与流形的内在维度匹配时,该向量丛与切丛一致。这种构造提供了对数据的微分拓扑不变量的直接访问。特别地,我们证明第一Stiefel-Whitney类可以从学习到的转移映射的雅可比行列式的符号计算出来,从而得到检测可定向性的算法准则。我们还证明,非平凡的示性类对单图表示构成障碍,并且自编码器图的最小数量由流形的良好覆盖结构决定。最后,我们将我们的方法应用于低维可定向和不可定向流形,以及一个不可定向的高维图像数据集。

英文摘要

We introduce a theoretical framework that connects multi-chart autoencoders in manifold learning with the classical theory of vector bundles and characteristic classes. Rather than viewing autoencoders as producing a single global Euclidean embedding, we treat a collection of locally trained encoder-decoder pairs as a learned atlas on a manifold. We show that any reconstruction-consistent autoencoder atlas canonically defines transition maps satisfying the cocycle condition, and that linearising these transition maps yields a vector bundle coinciding with the tangent bundle when the latent dimension matches the intrinsic dimension of the manifold. This construction provides direct access to differential-topological invariants of the data. In particular, we show that the first Stiefel-Whitney class can be computed from the signs of the Jacobians of learned transition maps, yielding an algorithmic criterion for detecting orientability. We also show that non-trivial characteristic classes provide obstructions to single-chart representations, and that the minimum number of autoencoder charts is determined by the good cover structure of the manifold. Finally, we apply our methodology to low-dimensional orientable and non-orientable manifolds, as well as to a non-orientable high-dimensional image dataset.
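
The orientability criterion can be sketched as a sign-consistency check on the chart overlap graph. This is a minimal illustration of the standard cocycle argument, not the authors' implementation. It assumes that each overlap is connected, so that a single Jacobian sign per pair of charts is meaningful, and that those signs are already given as input. The function name `is_orientable` and the input layout are hypothetical.

```python
from collections import deque


def is_orientable(num_charts, overlap_signs):
    """Check whether chart orientations can be flipped so every transition has positive Jacobian sign.

    overlap_signs maps a pair (i, j) of overlapping charts to +1 or -1, the sign of the
    Jacobian determinant of the transition map on that overlap.
    """
    neighbours = {i: [] for i in range(num_charts)}
    for (i, j), sign in overlap_signs.items():
        neighbours[i].append((j, sign))
        neighbours[j].append((i, sign))

    orientation = {}
    for start in range(num_charts):
        if start in orientation:
            continue
        orientation[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in neighbours[u]:
                wanted = orientation[u] * sign
                if v not in orientation:
                    orientation[v] = wanted
                    queue.append(v)
                elif orientation[v] != wanted:
                    return False  # a loop of charts reverses orientation
    return True


# Three charts arranged in a ring: a cylinder-like atlas and a Moebius-like atlas.
cylinder = {(0, 1): 1, (1, 2): 1, (2, 0): 1}
moebius = {(0, 1): 1, (1, 2): 1, (2, 0): -1}
print(is_orientable(3, cylinder), is_orientable(3, moebius))
```

A consistent assignment exists exactly when the sign cocycle is trivial, which corresponds to a vanishing first Stiefel-Whitney class under the assumptions above.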

2602.18481 2026-05-28 q-fin.TR cs.AI

AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models

AlphaForgeBench:用大型语言模型对端到端交易策略设计进行基准测试

Wentao Zhang, Mingxuan Zhao, Jincheng Gao, Jieshun You, Huaiyu Jia, Yilei Zhao, Bo An, Shuo Sun

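AI summary: Proposes AlphaForgeBench, a framework that recasts LLMs as quantitative researchers rather than stochastic trading agents; by having models generate executable alpha factors and factor-based trading strategies, it eliminates execution-induced instability and enables reproducible evaluation of financial reasoning.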

详情
AI中文摘要

大型语言模型(LLMs)的快速发展催生了大量金融基准测试,从静态知识评估演变为交互式交易模拟。然而,现有的实时交易评估框架在很大程度上忽略了一个关键的失败模式:LLMs在金融不确定性下的序贯决策中表现出严重的行为不稳定性。通过大量实验,我们表明,当作为交易代理部署时,LLMs表现出极端的运行间方差,即使在确定性解码下也会产生不一致的动作序列,并且经常在相邻时间步产生不合理的动作翻转。我们将这些行为归因于LLMs的无状态自回归特性,它们缺乏对先前动作的持久记忆,以及它们对投资组合分配任务中连续到离散动作映射的敏感性。这些缺陷从根本上破坏了现有许多在线和离线交易基准的可靠性和可复现性。为了解决这些局限性,我们提出了AlphaForgeBench,一个原则性的评估框架,将LLMs重新定义为量化研究员而非随机交易代理。AlphaForgeBench不要求模型产生离散的交易动作,而是要求模型生成可执行的alpha因子,并基于金融知识构建基于因子的交易策略。这种范式将推理与执行机制解耦,实现了确定性和可复现的评估,同时与真实的量化研究工作流程保持一致。在多个最先进的LLM上进行的大量实验表明,AlphaForgeBench消除了执行引起的不稳定性,并为评估金融推理、策略制定和alpha发现提供了严格的基准。网页链接:https://finbrain-lab-hkustgz.github.io/AlphaForgeBench

英文摘要

The rapid advancement of Large Language Models (LLMs) has led to a surge of financial benchmarks, evolving from static knowledge evaluation toward interactive trading simulations. However, existing frameworks for evaluating real-time trading largely overlook a critical failure mode: the severe behavioral instability of LLMs in sequential decision-making under financial uncertainty. Through extensive experiments, we show that when deployed as trading agents, LLMs exhibit extreme run-to-run variance, generate inconsistent action sequences even under deterministic decoding, and frequently produce irrational action flipping across adjacent time steps. We attribute these behaviors to the stateless autoregressive nature of LLMs, which lack persistent memory of prior actions, together with their sensitivity to continuous-to-discrete action mappings in portfolio allocation tasks. These deficiencies fundamentally undermine the reliability and reproducibility of many existing online and offline trading benchmarks. To address these limitations, we propose AlphaForgeBench, a principled evaluation framework that redefines LLMs as quantitative researchers rather than stochastic trading agents. Instead of producing discrete trading actions, AlphaForgeBench requires models to generate executable alpha factors and compose factor-based trading strategies grounded in financial knowledge. This paradigm decouples reasoning from execution mechanics, enabling deterministic and reproducible evaluation while remaining aligned with real-world quantitative research workflows. Extensive experiments across multiple state-of-the-art LLMs demonstrate that AlphaForgeBench eliminates execution-induced instability and provides a rigorous benchmark for evaluating financial reasoning, strategy formulation, and alpha discovery. Webpage at https://finbrain-lab-hkustgz.github.io/AlphaForgeBench

2602.15198 2026-05-28 cs.MA cs.AI cs.CL

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

Colosseum: 审计合作多智能体系统中的合谋行为

Mason Nakamura, Abhinav Kumar, Saswat Das, Sahar Abdelnabi, Saaduddin Mahmud, Ferdinando Fioretto, Shlomo Zilberstein, Eugene Bagdasarian

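AI summary: Proposes Colosseum, a framework that audits collusion among LLM agents in cooperative multi-agent systems using a formal decision-making framework and a regret-based metric; it finds that most models show a propensity for emergent collusion and observes a "collusion on paper" phenomenon.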

详情
AI中文摘要

多智能体系统中,通过自由形式语言通信的LLM智能体能够实现复杂的协调以解决复杂的合作任务。当一组智能体形成联盟并合谋追求次要目标、降低联合目标时,这会产生独特的安全问题。在本文中,我们提出Colosseum,一个用于审计多智能体设置中LLM智能体合谋行为的框架。我们通过形式化的多智能体决策框架来理解智能体如何合作,并通过相对于合作最优的遗憾来度量基于行动的合谋行为,并将其与基于通信的合谋行为进行比较。Colosseum能够在良性设置、不同联盟目标、说服策略和网络拓扑下审计LLM智能体的合谋行为。然后,我们通过创建智能体之间的秘密通信渠道引入一种新的行为探针,表明大多数开箱即用的模型在此探针下表现出合谋倾向,我们称之为新兴合谋。此外,我们发现了“纸上合谋”现象,即智能体在文本中计划合谋但往往选择非合谋行动。Colosseum提供了一种审计合作多智能体系统中合谋的新方法,同时呈现了关于合谋如何出现、什么影响合谋效率以及哪些策略可能缓解合谋的观察。

英文摘要

Multi-agent systems, where LLM agents communicate through free-form language, enable sophisticated coordination for solving complex cooperative tasks. This surfaces a unique safety problem when a group of agents forms a coalition and colludes to pursue secondary goals and degrade the joint objective. In this paper, we present Colosseum, a framework for auditing LLM agents' collusive behavior in multi-agent settings. We ground how agents cooperate through a formal multi-agent decision-making framework, measure action-based collusive behavior via regret relative to the cooperative optimum, and compare it with communication-based collusive behavior. Colosseum enables audits of LLM agents for collusion under benign settings, different coalition objectives, persuasion tactics, and network topologies. We then introduce a new behavioral probe by creating secret communication channels between agents, showing that most out-of-the-box models exhibit a propensity to collude under this probe, which we term emergent collusion. Furthermore, we discover "collusion on paper", where agents plan to collude in text but often pick non-collusive actions. Colosseum provides a new way to audit collusion in cooperative multi-agent systems while presenting observations about how collusion emerges, what affects collusion efficacy, and which strategies may mitigate it.
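
Regret relative to the cooperative optimum can be sketched on a toy task. The abstract does not give the exact definition, so this assumes the simplest reading: the gap between the best achievable joint objective and the joint objective of the actions actually taken. The coverage task and all names are hypothetical.

```python
from itertools import product


def cooperative_optimum(actions_per_agent, joint_utility):
    """Best joint objective value over all joint actions (brute force, small problems only)."""
    return max(joint_utility(joint) for joint in product(*actions_per_agent))


def regret(chosen_joint_action, actions_per_agent, joint_utility):
    """Gap between the cooperative optimum and the joint objective actually achieved."""
    best = cooperative_optimum(actions_per_agent, joint_utility)
    return best - joint_utility(chosen_joint_action)


# Hypothetical toy task: three agents each pick one of three slots, and the
# joint objective is the number of distinct slots covered.
actions = [(0, 1, 2)] * 3


def coverage(joint):
    return len(set(joint))


print(regret((0, 1, 2), actions, coverage))  # fully cooperative choice
print(regret((0, 0, 2), actions, coverage))  # two agents crowd the same slot
```

A coalition that steers its members toward a secondary goal shows up as positive regret on the joint objective, whatever the agents said in their messages.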

2602.14862 2026-05-28 stat.ML cs.AI cs.IT cs.LG math.IT stat.ME

The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

温度缩放分类器:温度缩放的一些基本性质

Pierre-Alexandre Mattei, Bruno Loureiro

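AI summary: Rigorously analyzes temperature scaling for classifier calibration and LLM diversity through new characterizations, including information projection and linear-scaler submodels; shows that raising the temperature increases uncertainty in a general sense while challenging the claim that it increases diversity.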

详情
AI中文摘要

温度缩放是一种简单的方法,可以控制概率模型的不确定性。它主要用于两个场景:改进分类器的校准和调节大型语言模型(LLM)的随机性。在这两种情况下,温度缩放都是最流行的方法。尽管其流行,但温度缩放性质的严格理论分析仍然难以捉摸。我们在此研究其中一些性质。对于分类,我们表明提高温度在非常普遍的意义上增加了模型的不确定性(特别是增加了其熵)。然而,对于LLM,我们质疑了提高温度会增加多样性的常见说法。此外,我们引入了温度缩放的两种新表征。第一种是几何的:温度缩放模型被证明是原始模型在具有给定熵的模型集合上的信息投影。第二种表征阐明了温度缩放作为更一般线性缩放器(如矩阵缩放和狄利克雷校准)的子模型的作用:我们表明温度缩放是唯一不改变模型硬预测的线性缩放器。

英文摘要

Temperature scaling is a simple method for controlling the uncertainty of probabilistic models. It is mostly used in two contexts: improving the calibration of classifiers and tuning the stochasticity of large language models (LLMs). In both cases, temperature scaling is the most popular method for the job. Despite its popularity, a rigorous theoretical analysis of the properties of temperature scaling has remained elusive. We investigate here some of these properties. For classification, we show that increasing the temperature increases the uncertainty in the model in a very general sense (and in particular increases its entropy). However, for LLMs, we challenge the common claim that increasing temperature increases diversity. Furthermore, we introduce two new characterisations of temperature scaling. The first one is geometric: the tempered model is shown to be the information projection of the original model onto the set of models with a given entropy. The second characterisation clarifies the role of temperature scaling as a submodel of more general linear scalers such as matrix scaling and Dirichlet calibration: we show that temperature scaling is the only linear scaler that does not change the hard predictions of the model.
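
Two of the properties mentioned above can be checked numerically: entropy grows with temperature, and the hard prediction (the argmax) does not change. The logits below are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math


def temper(logits, temperature):
    """Softmax of logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


logits = [2.0, 1.0, 0.1]  # hypothetical classifier output
for t in (0.5, 1.0, 2.0, 5.0):
    probs = temper(logits, t)
    print(f"T={t}: entropy={entropy(probs):.4f}, argmax={probs.index(max(probs))}")
```

This only demonstrates the behaviour on one example; the general statements are the ones proved in the paper.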

2510.03534 2026-05-28 cs.MA cs.LG cs.SY eess.SY stat.ML

Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

基于多智能体强化学习的杜罗河羽流长期映射

Nicolò Dal Fabbro, Milad Mesbahi, Renato Mendes, João Borges de Sousa, George J. Pappas

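AI summary: Proposes an energy- and communication-efficient multi-agent reinforcement learning approach that combines spatiotemporal Gaussian process regression with a multi-head Q-network controller for long-term (multi-day) mapping of the Douro River plume by multiple autonomous underwater vehicles; it outperforms benchmarks in Delft3D simulations, and adding agents improves both accuracy and endurance.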

Comments Accepted at the 2026 IEEE International Conference on Robotics and Automation

详情
AI中文摘要

我们研究了使用多艘自主水下航行器(AUV)对河流羽流进行长期(多天)映射的问题,重点关注杜罗河代表性用例。我们提出了一种能量和通信高效的多智能体强化学习方法,其中中央协调器间歇性地与AUV通信,收集测量数据并发出指令。我们的方法将时空高斯过程回归(GPR)与多头Q网络控制器相结合,该控制器调节每个AUV的方向和速度。使用Delft3D海洋模型的模拟表明,我们的方法始终优于单智能体和多智能体基准,并且增加智能体数量既能改善均方误差(MSE)又能提高操作续航。在某些情况下,我们的算法表明,将AUV数量加倍可以使续航增加一倍以上,同时保持或提高精度,这凸显了多智能体协调的优势。我们学习到的策略能够泛化到不同月份和年份的未见季节性情景,为未来开发数据驱动的动态羽流环境长期监测展示了前景。

英文摘要

We study the problem of long-term (multi-day) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.

2602.03855 2026-05-28 eess.SP cs.LG

Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging

用于反问题的Majorization-Minimization网络:在脑电图成像中的应用

Le Minh Triet Tran, Sarah Reynaud, Ronan Fablet, Adrien Merlini, François Rousseau, Mai Quyen Pham

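AI summary: Proposes a learned Majorization-Minimization framework in a bilevel optimization setting that parameterizes a curvature majorant constrained to satisfy the MM conditions, improving accuracy and stability for inverse problems while preserving convergence guarantees; EEG source imaging experiments show gains over deep-unrolling and meta-learning approaches.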

详情
AI中文摘要

反问题通常是不适定的,需要具有强稳定性和收敛保证的优化方案。虽然基于学习的方法(如深度展开和元学习)取得了强大的实证性能,但它们通常缺乏对下降和曲率的显式控制,限制了鲁棒性。我们提出了一种在双层优化设置中用于反问题的学习型Majorization-Minimization(MM)框架。我们不学习完整的优化器,而是学习一个结构化的曲率主导量,该主导量控制每个MM步骤,同时保留经典的MM下降保证。该主导量由一个轻量级循环神经网络参数化,并显式约束以满足有效的MM条件。对于余弦相似度损失,我们推导出显式的曲率界限,从而得到对角主导量。当解析界限不可用时,我们依赖基于高效Hessian-向量积的谱估计来自动上界局部曲率,而无需显式形成Hessian矩阵。在脑电图源成像上的实验表明,与深度展开和元学习基线相比,该方法在准确性、稳定性和跨数据集泛化方面均有改进。

英文摘要

Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees. While learning-based approaches such as deep unrolling and meta-learning achieve strong empirical performance, they typically lack explicit control over descent and curvature, limiting robustness. We propose a learned Majorization-Minimization (MM) framework for inverse problems within a bilevel optimization setting. Instead of learning a full optimizer, we learn a structured curvature majorant that governs each MM step while preserving classical MM descent guarantees. The majorant is parameterized by a lightweight recurrent neural network and explicitly constrained to satisfy valid MM conditions. For cosine-similarity losses, we derive explicit curvature bounds yielding diagonal majorants. When analytic bounds are unavailable, we rely on efficient Hessian-vector product-based spectral estimation to automatically upper-bound local curvature without forming the Hessian explicitly. Experiments on EEG source imaging demonstrate improved accuracy, stability, and cross-dataset generalization over deep-unrolled and meta-learning baselines.
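
The descent guarantee that the learned majorant is constrained to preserve can be seen in a classical, non-learned MM iteration. The sketch below is an assumption-laden toy: a one-dimensional objective chosen for illustration and a fixed analytic curvature bound in place of the recurrent network described in the abstract. Because the second derivative of this objective is at most 1.25, a quadratic with curvature 1.25 majorizes it at every point, and minimizing that majorant gives the update below.

```python
import math


def f(x):
    """Toy smooth objective: a quadratic data term plus a softplus penalty."""
    return 0.5 * (x - 3.0) ** 2 + math.log1p(math.exp(x))


def grad(x):
    """Derivative of f; the softplus term contributes the logistic sigmoid."""
    return (x - 3.0) + 1.0 / (1.0 + math.exp(-x))


# f''(x) = 1 + s(x) * (1 - s(x)) <= 1.25, with s the logistic sigmoid.
CURVATURE_BOUND = 1.25


def mm_minimize(x0, steps=50):
    """Repeatedly minimize the quadratic majorant f(x_k) + f'(x_k)(x - x_k) + (c/2)(x - x_k)^2."""
    x = x0
    history = [f(x)]
    for _ in range(steps):
        x = x - grad(x) / CURVATURE_BOUND
        history.append(f(x))
    return x, history


x_star, history = mm_minimize(0.0)
print(f"x* ~ {x_star:.6f}, f(x*) ~ {history[-1]:.6f}")
```

Any curvature value at or above the true bound keeps the objective non-increasing. A tighter majorant takes larger steps, which is the degree of freedom a learned majorant would exploit.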

2602.01665 2026-05-28 cs.MA cs.AI cs.LG

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

TABX:面向多智能体强化学习的高吞吐沙盒战斗模拟器

Hayeong Lee, JunHyeok Oh, Byung-Jun Lee

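AI summary: Proposes TABX, a JAX-based high-throughput sandbox simulator that supports efficient research and evaluation in multi-agent reinforcement learning through reconfigurable tasks and hardware acceleration.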

详情
AI中文摘要

环境的设计在塑造合作多智能体强化学习(MARL)算法的开发和评估中起着关键作用。虽然现有基准突出了关键挑战,但它们通常缺乏设计自定义评估场景所需的模块化。我们介绍了基于JAX的全加速战斗模拟器(TABX),这是一个专为可重构多智能体任务设计的高吞吐沙盒。TABX提供对环境参数的精细控制,允许系统地研究涌现的智能体行为和跨不同任务复杂度谱系的算法权衡。利用JAX在GPU上进行硬件加速执行,TABX实现了大规模并行化并显著降低了计算开销。通过提供一个快速、可扩展且易于定制的框架,TABX促进了复杂结构化领域中MARL智能体的研究,并作为未来研究的可扩展基础。我们的代码可在https://github.com/ku-dmlab/TABX获取。

英文摘要

The design of environments plays a critical role in shaping the development and evaluation of cooperative multi-agent reinforcement learning (MARL) algorithms. While existing benchmarks highlight critical challenges, they often lack the modularity required to design custom evaluation scenarios. We introduce the Totally Accelerated Battle Simulator in JAX (TABX), a high-throughput sandbox designed for reconfigurable multi-agent tasks. TABX provides granular control over environmental parameters, permitting a systematic investigation into emergent agent behaviors and algorithmic trade-offs across a diverse spectrum of task complexities. Leveraging JAX for hardware-accelerated execution on GPUs, TABX enables massive parallelization and significantly reduces computational overhead. By providing a fast, extensible, and easily customized framework, TABX facilitates the study of MARL agents in complex structured domains and serves as a scalable foundation for future research. Our code is available at: https://github.com/ku-dmlab/TABX.

2601.22519 2026-05-28 stat.ML cs.LG

Corrected Samplers for Discrete Flow Models

离散流模型的校正采样器

Zhengyan Wan, Yidong Ouyang, Liyan Xie, Hongyuan Zha, Fang Fang, Guang Cheng

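AI summary: Existing samplers for discrete flow models, such as tau-leaping and the Euler solver, incur large discretization error and need many iterations; this work proposes time-corrected and location-corrected samplers that reduce the error without increasing computational cost and proves that the location-corrected sampler has lower complexity.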

详情
AI中文摘要

离散流模型(DFMs)被提出用于学习有限状态空间上的数据分布,作为离散扩散模型的替代方案提供了灵活框架。近期一系列工作研究了离散扩散模型的采样器,如tau-leaping和Euler求解器。然而,这些采样器需要大量迭代来控制离散化误差,因为转移速率在时间上被冻结并在每个时间区间内以初始状态评估。此外,这些采样器的理论结果通常要求转移速率的有限性条件,或专注于特定类型的源分布。为解决这些限制,我们在离散流模型框架下,建立了这些采样器的非渐近离散化误差界,且对转移速率和源分布无任何限制。进一步,通过分析Euler求解器的一步下界,我们提出了两种校正采样器:时间校正采样器和位置校正采样器,它们几乎不增加额外计算成本即可减少tau-leaping和Euler求解器的离散化误差。我们严格证明位置校正采样器比现有并行采样器具有更低的复杂度。通过在模拟和文本到图像生成任务上以更少的推理时间获得更好的生成质量,验证了所提方法的有效性。代码见 https://github.com/WanZhengyan/Corrected-Samplers-for-Discrete-Flow-Models。

英文摘要

Discrete flow models (DFMs) have been proposed to learn data distributions on finite state spaces, offering a flexible alternative to discrete diffusion models. A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and the Euler solver. However, these samplers require a large number of iterations to control discretization error, since the transition rates are frozen in time and evaluated at the initial state within each time interval. Moreover, theoretical results for these samplers often require boundedness conditions on the transition rates or focus on a specific type of source distribution. To address these limitations, we establish non-asymptotic discretization error bounds for these samplers without any restriction on transition rates or source distributions, under the framework of discrete flow models. Furthermore, by analyzing a one-step lower bound of the Euler sampler, we propose two corrected samplers, the time-corrected sampler and the location-corrected sampler, which can reduce the discretization error of tau-leaping and the Euler solver with almost no additional computational cost. We rigorously show that the location-corrected sampler has a lower complexity than existing parallel samplers. We validate the effectiveness of the proposed method by achieving better generation quality with reduced inference time on simulations and text-to-image generation tasks. Code can be found at https://github.com/WanZhengyan/Corrected-Samplers-for-Discrete-Flow-Models.

2509.23019 2026-05-28 cs.CR cs.AI

LLM Watermark Evasion via Bias Inversion

通过偏差反转实现LLM水印规避

Jeongyeon Hwang, Sangdon Park, Jungseul Ok

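AI summary: Proposes the Bias-Inversion Rewriting Attack (BIRA); a theoretical analysis shows that lowering the average conditional probability of green tokens makes the detection probability decay exponentially, and the attack achieves high black-box evasion rates (>99%) while preserving semantic fidelity.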

详情
AI中文摘要

水印为检测LLM生成内容提供了一种有前景的解决方案,但在现实无查询(黑盒)规避下的鲁棒性仍是一个开放挑战。现有的无查询攻击往往成功率有限或严重扭曲语义。我们通过理论分析重写型规避来弥合这一差距,证明将绿色令牌的平均条件概率降低一个小幅度会导致检测概率指数级衰减。受此洞察启发,我们提出了偏差反转重写攻击(BIRA),一种实用的无查询方法,该方法对通过令牌惊讶度识别的代理抑制集应用负对数几率偏差。实验上,BIRA在多种水印方案中实现了最先进的规避率(>99%),同时语义保真度显著优于先前的基线。我们的发现揭示了当前水印方法的一个根本性漏洞,并强调了进行严格压力测试的必要性。我们的代码可在 https://github.com/ml-postech/LLM-Watermark-Evasion-via-Bias-Inversion 获取。

英文摘要

Watermarking offers a promising solution for detecting LLM-generated content, yet its robustness under realistic query-free (black-box) evasion remains an open challenge. Existing query-free attacks often achieve limited success or severely distort semantic meaning. We bridge this gap by theoretically analyzing rewriting-based evasion, demonstrating that reducing the average conditional probability of sampling green tokens by a small margin causes the detection probability to decay exponentially. Guided by this insight, we propose the Bias-Inversion Rewriting Attack (BIRA), a practical query-free method that applies a negative logit bias to a proxy suppression set identified via token surprisal. Empirically, BIRA achieves state-of-the-art evasion rates (>99%) across diverse watermarking schemes while preserving semantic fidelity substantially better than prior baselines. Our findings reveal a fundamental vulnerability in current watermarking methods and highlight the need for rigorous stress tests. Our code is available at https://github.com/ml-postech/LLM-Watermark-Evasion-via-Bias-Inversion.
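
The sensitivity of detection to the green-token fraction can be seen from the detector side. The abstract does not specify which detectors were evaluated, so this sketch assumes a common green-list design in which detection is a one-proportion z-test on the count of green tokens. The green-list fraction `gamma = 0.25` and the token counts are illustrative values, not figures from the paper.

```python
import math


def z_score(green_count, total_tokens, gamma=0.25):
    """One-proportion z-statistic for the number of green tokens in a text.

    gamma is the assumed fraction of the vocabulary placed on the green list.
    """
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


total = 200
for green_fraction in (0.50, 0.40, 0.30, 0.25):
    z = z_score(round(green_fraction * total), total)
    print(f"green fraction {green_fraction:.2f}: z = {z:.2f}")
```

Since the statistic scales with the square root of the text length, a modest drop in the green fraction moves a long text from well above a typical threshold to near zero.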

2507.14109 2026-05-28 cs.CR cs.LG eess.SP

An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting

一种对抗驱动的深度学习射频指纹识别实验研究

Xinyu Cao, Bimal Adhikari, Shangqing Zhao, Jingxian Wu, Yanjun Pan

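AI summary: Through adversarial experimental analysis, finds that deep learning RF fingerprinting systems misclassify consistently under domain shift, which can be exploited as a backdoor, and that training on raw signals entangles RF fingerprints with environmental features, creating attack vectors that post-processing such as confidence thresholds cannot mitigate.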

详情
Journal ref IEEE Military Communications Conference (MILCOM), 2025
AI中文摘要

射频指纹识别通过提取无线设备独特的硬件缺陷,已成为零信任架构和超5G网络中有前景的物理层设备识别机制。特别是,深度学习方法在该领域展示了最先进的性能。然而,现有方法主要侧重于增强系统对无线环境时空变化的鲁棒性,而基于深度学习的方法的安全漏洞常被忽视。在这项工作中,我们通过对抗驱动的实验分析,系统性地研究了基于深度学习的射频指纹识别系统的安全风险。我们观察到深度学习模型在域偏移下存在一致的误分类行为,即一个设备经常被误分类为另一个特定设备。基于广泛真实实验的分析表明,这种行为可以被利用为有效的后门,使外部攻击者能够入侵系统。此外,我们证明在原始接收信号上训练深度学习模型会导致模型将射频指纹与环境及信号模式特征纠缠在一起,产生无法仅通过置信度阈值等后处理安全方法缓解的额外攻击向量。

英文摘要

Radio frequency (RF) fingerprinting, which extracts unique hardware imperfections of radio devices, has emerged as a promising physical-layer device identification mechanism in zero trust architectures and beyond 5G networks. In particular, deep learning (DL) methods have demonstrated state-of-the-art performance in this domain. However, existing approaches have primarily focused on enhancing system robustness against temporal and spatial variations in wireless environments, while the security vulnerabilities of these DL-based approaches have often been overlooked. In this work, we systematically investigate the security risks of DL-based RF fingerprinting systems through an adversarial-driven experimental analysis. We observe a consistent misclassification behavior for DL models under domain shifts, where a device is frequently misclassified as another specific one. Our analysis based on extensive real-world experiments demonstrates that this behavior can be exploited as an effective backdoor to enable external attackers to intrude into the system. Furthermore, we show that training DL models on raw received signals causes the models to entangle RF fingerprints with environmental and signal-pattern features, creating additional attack vectors that cannot be mitigated solely through post-processing security methods such as confidence thresholds.

2601.01496 2026-05-28 cs.GT cs.AI cs.LG

The Optimal Sample Complexity of Linear Contracts

线性合约的最优样本复杂度

Mikael Møller Høgsgaard

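AI summary: Shows that the Empirical Utility Maximization algorithm needs only O(ln(1/δ)/ε²) samples to obtain an ε-approximation of the optimal linear contract, matching the lower bound and thereby establishing the optimal sample complexity.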

详情
AI中文摘要

在本文中,我们解决了离线环境下从数据中学习最优线性合约的问题,其中代理人类型来自未知分布,委托人的目标是设计一个最大化其期望效用的合约。具体来说,我们的分析表明,简单的经验效用最大化(EUM)算法仅需 $O(\ln(1/\delta)/\varepsilon^2)$ 个样本,就能以至少 $1-\delta$ 的概率得到最优线性合约的 $\varepsilon$-近似。这一结果改进了先前已知的界限,并在常数因子内匹配了 Dütting 等人 2025 年的下界,从而证明了其最优性。此外,我们的结果建立了更强的一致收敛保证:每个线性合约的经验效用以其真实期望的 $\varepsilon$-近似成立的概率至少为 $1-\delta$,且使用了相同的最优 $O(\ln(1/\delta)/\varepsilon^2)$ 样本复杂度。

英文摘要

In this paper, we settle the problem of learning optimal linear contracts from data in the offline setting, where agent types are drawn from an unknown distribution and the principal's goal is to design a contract that maximizes her expected utility. Specifically, our analysis shows that the simple Empirical Utility Maximization (EUM) algorithm yields an $\varepsilon$-approximation of the optimal linear contract with probability at least $1-\delta$, using just $O(\ln(1/\delta)/\varepsilon^2)$ samples. This result improves upon previously known bounds and matches a lower bound from Dütting et al. (2025) up to constant factors, thereby proving its optimality. Furthermore, our result establishes the stronger guarantee of uniform convergence: the empirical utility of every linear contract is an $\varepsilon$-approximation of its true expectation with probability at least $1-\delta$, using the same optimal $O(\ln(1/\delta)/\varepsilon^2)$ sample complexity.
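
Empirical Utility Maximization for linear contracts can be sketched as follows. The modelling details are assumptions for illustration, not taken from the paper: an agent type is a list of (cost, expected reward) actions, a linear contract pays the agent a share `alpha` of the reward, ties are broken in the principal's favour, and the search runs over a finite grid of shares rather than the full interval. The constant in `sample_size` is a placeholder, since the abstract gives the bound only up to constant factors.

```python
import math


def agent_best_response(alpha, agent_type):
    """Action maximizing the agent's utility alpha * reward - cost; ties go to the higher reward."""
    return max(agent_type, key=lambda action: (alpha * action[1] - action[0], action[1]))


def principal_utility(alpha, agent_type):
    """The principal keeps the share (1 - alpha) of the reward of the agent's chosen action."""
    _, reward = agent_best_response(alpha, agent_type)
    return (1.0 - alpha) * reward


def empirical_utility(alpha, samples):
    """Average principal utility of the contract alpha over the sampled agent types."""
    return sum(principal_utility(alpha, t) for t in samples) / len(samples)


def eum(samples, grid_size=101):
    """Pick the grid contract with the highest empirical utility (grid search is a simplification)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda alpha: empirical_utility(alpha, samples))


def sample_size(eps, delta, constant=1.0):
    """Number of samples of order ln(1/delta) / eps^2; the constant is a hypothetical placeholder."""
    return math.ceil(constant * math.log(1.0 / delta) / eps ** 2)


# Hypothetical samples: every agent can idle (cost 0, reward 0) or work (cost 0.2, reward 1).
samples = [[(0.0, 0.0), (0.2, 1.0)]] * 5
print(eum(samples), sample_size(0.1, 0.05))
```

In this toy instance the agent works only once the share covers the cost, so the empirical maximizer is the smallest share that makes working worthwhile.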

2501.06491 2026-05-28 cs.SE cs.AI cs.SY eess.SY

Improving Requirements Classification with SMOTE-Tomek Preprocessing

使用SMOTE-Tomek预处理改进需求分类

Barak Or

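AI summary: Addresses class imbalance in the PROMISE dataset with SMOTE-Tomek preprocessing combined with stratified K-fold cross-validation, markedly improving requirements classification accuracy; logistic regression reaches 76.16%.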

Comments 21 pages, 5 figures, Preprint

详情
AI中文摘要

本研究通过应用SMOTE-Tomek预处理技术,结合分层K折交叉验证,解决PROMISE数据集中类别不平衡问题,强调需求工程领域。该数据集包含969个分类需求,分为功能性和非功能性类型。所提出的方法在保持验证折完整性的同时,增强了少数类的表示,从而显著提高了分类准确率。逻辑回归达到了76.16%,大幅超过基线58.31%。这些结果凸显了机器学习模型作为可扩展且可解释解决方案的适用性和效率。

英文摘要

This study addresses class imbalance in the PROMISE dataset, within the domain of requirements engineering, by applying the SMOTE-Tomek preprocessing technique combined with stratified K-fold cross-validation. The dataset comprises 969 categorized requirements, classified into functional and non-functional types. The proposed approach enhances the representation of minority classes while maintaining the integrity of validation folds, leading to a notable improvement in classification accuracy. Logistic regression achieved 76.16%, significantly surpassing the baseline of 58.31%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions.

2512.20657 2026-05-28 cs.SI cs.LG

Graph Neural Networks for Source Detection: A Review and Benchmark Study

图神经网络用于源检测:综述与基准研究

Martin Sterchi, Nathan Brack, Lorenz Hilfiker

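AI summary: Systematically reviews GNN-based source detection methods and reproduces and benchmarks four representative GNN architectures under controlled conditions; experiments show that GNNs substantially outperform traditional methods and MLP baselines across a variety of network topologies.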

详情
AI中文摘要

当流行病过程在接触网络上展开时,源检测问题出现,目标是识别其起源点,即源节点。该问题的研究始于Shah和Zaman在2010年的开创性工作,他们正式定义了该问题并引入了谣言中心性的概念。随着图神经网络(GNN)的出现,多项研究提出了基于GNN的源检测方法。然而,这些工作在方法论的清晰度和可重复性方面仍有改进空间。因此,目前尚不清楚GNN在可比设置下是否真正优于更传统的源检测方法。在本文中,我们首先系统回顾了现有的基于GNN的源检测方法,清晰概述了每种方法所处理的具体设置及其采用的架构。然后,我们在受控的可比条件下,复现并基准测试了四种代表性GNN架构与多种传统和基于MLP的基线方法。我们还研究了围绕该问题的关键问题,包括可检测性如何随时间演变、性能如何随训练集大小扩展,以及方法对观测时间和流行病参数不确定性的敏感程度。我们的实验表明,GNN在多种网络拓扑上显著优于我们测试的所有其他方法。尽管我们最初旨在挑战GNN作为源检测解决方案的观点,但我们的结果反而证明了它们在此任务上的显著有效性。为确保完全可重复性,我们在GitHub上发布了所有代码和数据。最后,我们认为流行病源检测构成了评估GNN架构的一个自然且有吸引力的基准任务。

英文摘要

The source detection problem arises when an epidemic process unfolds over a contact network, and the objective is to identify its point of origin, i.e., the source node. Research on this problem began with the seminal work of Shah and Zaman in 2010, who formally defined it and introduced the notion of rumor centrality. With the emergence of Graph Neural Networks (GNNs), several studies have proposed GNN-based approaches to source detection. However, there is room to strengthen methodological clarity and reproducibility across these works. As a result, it remains unclear whether GNNs truly outperform more traditional source detection methods across comparable settings. In this paper, we first systematically review existing GNN-based methods for source detection, clearly outlining the specific settings each addresses and the architectures they employ. We then reproduce and benchmark four representative GNN architectures against a diverse set of traditional and MLP-based baselines under controlled, comparable conditions. We also investigate key questions surrounding this problem, including how detectability evolves over time, how performance scales with training set size, and how sensitive methods are to uncertainty in observation timing and epidemic parameters. Our experiments show that GNNs substantially outperform all other methods we test across a variety of network topologies. Although we initially set out to challenge the notion of GNNs as a solution to source detection, our results instead demonstrate their remarkable effectiveness for this task. To ensure full reproducibility, we release all code and data on GitHub. Finally, we argue that epidemic source detection constitutes a natural and attractive benchmark task for evaluating GNN architectures.
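
Rumor centrality, the classical baseline mentioned above, has a closed form on trees: for a candidate source v in a tree with n nodes, it is n! divided by the product of all subtree sizes when the tree is rooted at v. The sketch below implements that formula for a tree given as an adjacency dictionary. The function name and input layout are assumptions, and general graphs, which need a spanning-tree heuristic, are not covered.

```python
import math


def rumor_centrality(tree, source):
    """Number of infection orderings starting at `source` on a tree.

    tree maps each node to the list of its neighbours (undirected, acyclic, connected).
    """
    parent = {source: None}
    order = []
    stack = [source]
    while stack:  # iterative depth-first traversal rooted at the candidate source
        u = stack.pop()
        order.append(u)
        for v in tree[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    size = {u: 1 for u in tree}
    for u in reversed(order):  # accumulate subtree sizes from the leaves upward
        if parent[u] is not None:
            size[parent[u]] += size[u]

    return math.factorial(len(tree)) // math.prod(size.values())


# A star with centre 0 and three leaves: the centre admits the most infection orderings.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = {v: rumor_centrality(star, v) for v in star}
print(scores, "estimated source:", max(scores, key=scores.get))
```

The node with the highest score is the estimated source, which serves as a simple point of comparison for learned detectors.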