arXivDaily: arXiv daily academic digest, updated Monday through Friday
2511.09557 2026-05-21 cs.DC cs.LG

Understanding and Improving Communication Performance in Multi-node LLM Inference

Prajwal Singhania, Siddharth Singh, Lannie Dalton Hough, Akarsh Srivastava, Harshitha Menon, Charles Fredrick Jekel, Abhinav Bhatele

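AI summary This study examines how to optimize communication performance in multi-node distributed inference. By analyzing the strong-scaling behavior of different model-parallel schemes, it proposes NVRAR, a hierarchical all-reduce algorithm based on recursive doubling, which significantly reduces inference latency.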

Comments 17 Figures, To Appear in Proceedings of ACM Conference on AI and Agentic Systems 2026

Abstract

As large language models (LLMs) continue to grow in size, distributed inference has become increasingly important. Model-parallel strategies must now efficiently scale not only across multiple GPUs but also across multiple nodes. In this work, we present a detailed performance study of multi-node distributed inference using LLMs on GPU-based supercomputers. We conduct experiments with several state-of-the-art inference engines alongside YALIS, a research-oriented prototype engine designed for controlled experimentation. We analyze the strong-scaling behavior of different model-parallel schemes and identify key bottlenecks. Because all-reduce operations are a common performance bottleneck, we develop NVRAR, a hierarchical all-reduce algorithm based on recursive doubling with NVSHMEM. NVRAR achieves up to 1.9$\times$-3.6$\times$ lower latency than NCCL for message sizes between 128 KB and 2 MB on HPE Slingshot and InfiniBand interconnects. Integrated into YALIS, NVRAR achieves up to a 1.72$\times$ reduction in end-to-end batch latency for the Llama 3.1 405B model in multi-node decode-heavy workloads using tensor parallelism.
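The recursive-doubling pattern that NVRAR builds on can be illustrated with a small single-process simulation. The sketch below is not NVRAR: it leaves out the hierarchy and NVSHMEM entirely, and the function name `recursive_doubling_allreduce` is hypothetical. It only shows that p ranks (p a power of two) all obtain the full sum after log2(p) pairwise exchange rounds.

```python
def recursive_doubling_allreduce(values):
    """Simulate a sum all-reduce; values[r] is the initial value on rank r."""
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("rank count must be a power of two")
    buffers = list(values)
    rounds = 0
    distance = 1
    while distance < p:
        # Each rank exchanges its partial sum with the partner whose rank
        # differs in exactly one bit (r XOR distance), then both add.
        buffers = [buffers[r] + buffers[r ^ distance] for r in range(p)]
        distance *= 2
        rounds += 1
    return buffers, rounds


result, rounds = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(result, rounds)
```

With 8 simulated ranks holding 1 through 8, every rank ends with the sum 36 after 3 rounds, which is why the pattern is attractive for small, latency-bound messages.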

2510.15949 2026-05-21 q-fin.TR cs.AI

ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Charidimos Papadakis, Angeliki Dimitriou, Giorgos Filandrianos, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou

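AI summary This paper proposes the ATLAS framework, which uses dynamic prompt optimization and multi-agent coordination to address the adaptability problem of LLMs in financial trading and to improve the robustness and execution efficiency of trading decisions.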

Abstract

Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.

2510.04905 2026-05-21 cs.SE cs.CL

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Yicheng Tao, Yuante Li, Yao Qin, Yepang Liu

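AI summary This paper surveys retrieval-augmented code generation with a focus on repository-level approaches, analyzes the challenges of large-scale code generation and their solutions, and summarizes a taxonomy of existing methods along with key open challenges.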

Abstract

Recent advances in large language models (LLMs) have significantly improved automated code generation. While existing approaches have achieved strong performance at the function and file levels, real-world software engineering requires reasoning over entire repositories, including cross-file dependencies, evolving execution environments, and global semantic consistency. This challenge has led to the emergence of Repository-Level Code Generation (RLCG), where models must retrieve, organize, and utilize repository-scale context to generate coherent and executable code changes. To address these challenges, Retrieval-Augmented Generation (RAG) has become an increasingly important paradigm for repository-level code intelligence. In this survey, we present a comprehensive review of Retrieval-Augmented Code Generation (RACG), with a particular focus on repository-level approaches. Rather than viewing RACG as a static "retrieve-then-generate" pipeline, we characterize it as a coupled and evolving process involving context construction, retrieval optimization, generation, and environment interaction. We organize existing methods through a unified analytical framework spanning retrieval substrate, control regime, and evaluation setting. Based on this framework, we systematically examine retrieval strategies, graph-based and non-graph-based retrieval paradigms, training-driven optimizations, and autonomous agent architectures. We further summarize widely used datasets, benchmarks, and system configurations, and discuss key challenges including scalability, reliability, efficiency, and the necessity boundary between RACG and long-context LLMs. Through this survey, we aim to provide a structured understanding of the rapidly evolving RACG landscape and highlight promising directions for future AI-powered software engineering research.

2506.11060 2026-05-21 cs.SE cs.AI

Code Researcher: Deep Research Agent for Large Systems Code and Commit History

Ramneet Singh, Sathvik Joel, Abhav Mehrotra, Nalin Wadhwa, Ramakrishna B Bairi, Aditya Kanade, Nagarajan Natarajan

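AI summary This paper proposes Code Researcher, a deep research agent for large systems code and commit history. Through multi-step reasoning and global context gathering, it effectively addresses crash repair in systems code and significantly outperforms existing baselines.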

Abstract

Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexity of systems code, changing a systems codebase first requires researching many pieces of context, derived from the large codebase and its massive commit history. Inspired by the recent progress on deep research agents, we design the first deep research agent for code, called Code Researcher, and apply it to the problem of generating patches to mitigate crashes reported in systems code. Code Researcher performs multi-step reasoning about semantics, patterns, and commit history of code to retrieve all relevant context from the codebase and its commit history. We evaluate Code Researcher on kBenchSyz, a benchmark of Linux kernel crashes, and show that it significantly outperforms strong baselines, achieving a crash-resolution rate (CRR) of 48%, compared to 31.5% by SWE-agent and 31% by Agentless, using OpenAI's GPT-4o model. Scaling up the sampling budget to 10 trajectories increases Code Researcher's CRR to 54%. Code Researcher is also robust to model choices, reaching 67% with the newer Gemini 2.5-Flash model. Through another experiment on open-source multimedia software, we show the generalizability of Code Researcher and also conduct ablations. Our experiments highlight the importance of global context gathering and multi-faceted reasoning for large codebases.

2506.09521 2026-05-21 eess.AS cs.CL

You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks

Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters

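AI summary This paper studies how linguistic content affects attacks on voice privacy protection systems. By adapting a BERT model as an automatic speaker verification system, it evaluates the impact of intra-speaker linguistic content similarity on attack performance and proposes reworking the VoicePrivacy datasets for fairer privacy evaluation.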

Comments 5 pages, 6 figures, 1 table, accepted at INTERSPEECH 2025. Update reason: change to the acknowledgements

Abstract

Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets, by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenges the reliance on global EER for privacy evaluations.
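For readers less familiar with the metric, the equal error rate is the operating point at which the false-acceptance rate and the false-rejection rate coincide. In an attack setting, an EER near 50% means the attacker does no better than chance, and a lower EER means weaker privacy. A minimal sketch, assuming plain lists of similarity scores; the function name and the example scores are illustrative and not taken from the paper:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.5, 0.3, 0.2]))
```

On these made-up scores the two error rates meet at a threshold of 0.6, where one of four target trials is rejected and one of four non-target trials is accepted, so the EER is 0.25.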

2503.13549 2026-05-21 cs.SE cs.AI

A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks

Ronas Shakya, Sam Urmian, Mohammad Khalil

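AI summary This paper evaluates ChatGPT and DeepSeek on programming tasks and finds that ChatGPT performs better on medium-difficulty tasks, while both struggle on hard tasks.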

Abstract

The advancement of large language models (LLMs) has created a competitive landscape for AI-assisted programming tools. This study evaluates two leading models, ChatGPT o3-mini and DeepSeek-R1, on their ability to solve competitive programming tasks from Codeforces. Using 29 programming tasks across three difficulty levels (easy, medium, and hard), we assessed the outcome of both models by their accepted solutions, memory efficiency, and runtime performance. Our results indicate that while both models perform similarly on easy tasks, ChatGPT outperforms DeepSeek-R1 on medium-difficulty tasks, achieving a 54.5% success rate compared to DeepSeek-R1's 18.1%. Both models struggled with hard tasks, highlighting the ongoing challenges LLMs face in handling highly complex programming problems. These findings highlight key differences in both model capabilities and their computational power, offering valuable insights for developers and researchers working to advance AI-driven programming tools.

2503.00565 2026-05-21 stat.ML cs.LG math.ST stat.ME stat.TH

Batched Single-Index Global Multi-Armed Bandits with Covariates

Sakshi Arya, Hyebin Song

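AI summary This paper proposes a new semi-parametric framework for batched bandits with covariates, introducing a shared parameter and a single-index regression model to capture relationships between arm rewards. It presents the BIDS algorithm, derives theoretical regret bounds in two settings, and shows that the method attains the minimax-optimal rate of nonparametric batched bandits with covariate dimension 1.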

Abstract

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches. We propose a novel semi-parametric framework for batched bandits with covariates that incorporates a shared parameter across arms. We leverage the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings: one where a pilot direction is available and another where the direction is estimated from data, deriving theoretical regret bounds for both cases. When a pilot direction is available with sufficient accuracy and the number of arms $K$ is fixed, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2025batched}.
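The batched successive arm elimination idea named in the abstract can be sketched in a much simpler setting. The code below is not BIDS: it ignores covariates, the single-index model, and dynamic binning, and it assumes Bernoulli rewards with a Hoeffding confidence radius. The function name and all parameters are hypothetical. It only shows the core loop: pull every surviving arm equally within a batch, then drop arms whose upper confidence bound falls below the best lower confidence bound.

```python
import math
import random


def batched_successive_elimination(arm_means, n_batches, pulls_per_batch, delta, rng):
    """Return the arms still active after n_batches rounds of elimination."""
    k = len(arm_means)
    active = list(range(k))
    sums = [0.0] * k
    counts = [0] * k
    log_term = math.log(2 * k * n_batches / delta)
    for _batch in range(n_batches):
        # Feedback is only used once the whole batch has been collected.
        for arm in active:
            for _pull in range(pulls_per_batch):
                sums[arm] += 1.0 if rng.random() < arm_means[arm] else 0.0
            counts[arm] += pulls_per_batch
        mean = {a: sums[a] / counts[a] for a in active}
        # Hoeffding radius for rewards bounded in [0, 1] (an assumption here).
        rad = {a: math.sqrt(log_term / (2 * counts[a])) for a in active}
        best_lcb = max(mean[a] - rad[a] for a in active)
        active = [a for a in active if mean[a] + rad[a] >= best_lcb]
    return active


survivors = batched_successive_elimination(
    [0.2, 0.5, 0.9], n_batches=5, pulls_per_batch=200, delta=1e-6, rng=random.Random(0)
)
print(survivors)
```

The arm with the largest lower confidence bound always survives its own test, so the active set never becomes empty. With gaps as wide as in this toy instance, only the best arm is expected to remain after a few batches.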

2411.09593 2026-05-21 eess.IV cs.AI cs.CV

SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms

Soumick Chatterjee, Hendrik Mattern, Marc Dörner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero, Steven Niederren, Fengming Lin, Yan Xia, Jiacheng Wang, Riyu Qiu, Liansheng Wang, Arya Yazdan Panah, Rosana El Jurdi, Guanghui Fu, Janan Arslan, Ghislain Vaillant, Romain Valabregue, Didier Dormont, Bruno Stankoff, Olivier Colliot, Luisa Vargas, Isai Daniel Chacón, Ioannis Pitsiorlas, Pablo Arbeláez, Maria A. Zuluaga, Stefanie Schreiber, Oliver Speck, Andreas Nürnberger

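AI summary This study addresses the shortage of public annotated datasets by providing an annotated dataset of time-of-flight angiograms acquired with 7T MRI, and evaluates the performance of multiple deep learning methods on small-vessel segmentation.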

Abstract

The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
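The Dice scores quoted above measure the overlap between a predicted and a reference segmentation, 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here rather than something the challenge specifies.

```python
def dice_score(pred, truth):
    """Dice overlap between two equally long binary masks (flattened voxels)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly under the convention assumed here.
    return 1.0 if total == 0 else 2.0 * intersection / total


print(dice_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In this toy case each mask marks three voxels and two of them coincide, giving 2 * 2 / 6, or about 0.667. Thin vessels make the metric demanding, because a one-voxel shift can remove most of the overlap.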

2205.10995 2026-05-21 cs.DS cs.AI cs.DM cs.FL cs.LO

From Width-Based Model Checking to Width-Based Automated Theorem Proving

Mateus de Oliveira Oliveira, Sam Urmian

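AI summary This paper presents a general framework that converts a large class of width-based model-checking algorithms into algorithms for testing the validity of graph-theoretic conjectures on graph classes of bounded width, improving on previous theoretical upper bounds.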

Comments A preliminary version of this work was published in the proceedings of AAAI 2023

Abstract

In the field of parameterized complexity theory, the study of graph width measures has been intimately connected with the development of width-based model checking algorithms for combinatorial properties on graphs. In this work, we introduce a general framework to convert a large class of width-based model-checking algorithms into algorithms that can be used to test the validity of graph-theoretic conjectures on classes of graphs of bounded width. Our framework is modular and can be applied with respect to several well-studied width measures for graphs, including treewidth and cliquewidth. As a quantitative application of our framework, we prove analytically that for several long-standing graph-theoretic conjectures, there exists an algorithm that takes a number $k$ as input and correctly determines in time double-exponential in $k^{O(1)}$ whether the conjecture is valid on all graphs of treewidth at most $k$. These upper bounds, which may be regarded as upper bounds on the size of proofs/disproofs for these conjectures on the class of graphs of treewidth at most $k$, improve significantly on theoretical upper bounds obtained using previously available techniques.

2105.09034 2026-05-21 cs.GR cs.CV

Guided Facial Skin Color Correction

Keiichiro Shirai, Tatsuya Baba, Shunsuke Ono, Masahiro Okuda, Yusuke Tatesumi, Paul Perrotin

AI summary This paper proposes an automatic correction method for portrait photographs that improves the consistency of facial skin color by suppressing skin-color shifts caused by background colors. The final result is obtained through a color correction process with a guide image that does not need to be perfectly aligned.

Comments 12 pages, 16 figures

Journal ref Signals, vol. 2, no. 3, pp. 540-558, 2021

Abstract

This paper proposes an automatic image correction method for portrait photographs, which promotes consistency of facial skin color by suppressing skin color changes due to background colors. In portrait photographs, skin color is often distorted due to the lighting environment (e.g., light reflected from a colored background wall and over-exposure by a camera strobe), and if the photo is artificially combined with another background color, this color change is emphasized, resulting in an unnatural synthesized result. In our framework, after roughly extracting the face region and rectifying the skin color distribution in a color space, we perform color and brightness correction around the face in the original image to achieve a proper color balance of the facial image, which is not affected by luminance and background colors. Unlike conventional algorithms for color correction, our final result is attained by a color correction process with a guide image. In particular, our guided image filtering for the color correction does not require the perfectly aligned guide image that the original guided image filtering method proposed by He et al. requires. Experimental results show that our method generates more natural results than conventional methods on not only headshot photographs but also natural scene photographs. We also show automatic yearbook-style photo generation as another application.

2605.21253 2026-05-21 stat.ML cs.LG

Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

Camille Touron, Gabriel V. Cardoso, Julyan Arbel, Pedro L. C. Rodrigues

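AI summary This paper studies the theory of annealed Langevin dynamics for compositional score-based methods in simulation-based inference. It derives Wasserstein bounds that give theoretical guidance for hyperparameter selection and, in the Gaussian setting, proves that the composite score formulations differ in admissible step size and total number of Langevin steps.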

Abstract

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
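The three hyperparameters discussed above (step size, steps per level, number of annealing levels) are easiest to see in a one-dimensional toy. The sketch below is an assumption-laden illustration, not the paper's method: it uses exact scores of Gaussian bridging densities N(mu, sigma_k^2) with a hand-picked sigma schedule, whereas the paper analyzes approximate composite scores. The function name and the rule "step size proportional to sigma_k^2" are choices made for this example.

```python
import math
import random


def annealed_langevin(mu, sigmas, steps_per_level, rel_step, n_particles, rng):
    """Run unadjusted Langevin updates through a sequence of Gaussian levels."""
    # Broad initial distribution, matched in scale to the widest level.
    xs = [rng.gauss(0.0, sigmas[0]) for _ in range(n_particles)]
    for sigma in sigmas:
        h = rel_step * sigma ** 2          # step size for this level
        noise = math.sqrt(2.0 * h)
        for _step in range(steps_per_level):
            # Score of N(mu, sigma^2) at x is -(x - mu) / sigma^2.
            xs = [x - h * (x - mu) / sigma ** 2 + noise * rng.gauss(0.0, 1.0)
                  for x in xs]
    return xs


samples = annealed_langevin(3.0, [10.0, 5.0, 2.0, 1.0], 50, 0.1, 2000, random.Random(0))
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

For this linear update the stationary variance at the last level is sigma^2 / (1 - rel_step / 2), so a relative step of 0.1 inflates the variance by roughly 5%. That is a small instance of the discretization bias that step-size rules have to keep under control.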

2605.21251 2026-05-21 eess.IV cs.CV

Local-sensitive connectivity filter (ls-cf): A post-processing unsupervised improvement of the frangi, hessian and vesselness filters for multimodal vessel segmentation

Erick O Rodrigues, Lucas O Rodrigues, João HP Machado, Dalcimar Casanova, Marcelo Teixeira, Jeferson T Oliva, Giovani Bernardes, Panos Liatsis

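AI summary This paper proposes an unsupervised multimodal method that improves the response of the Frangi filter for automatic vessel segmentation. The local-sensitive connectivity filter (LS-CF) computes pixel-level vessel continuity and introduces a local tolerance heuristic to fill vessel discontinuities produced by the Frangi response. It achieves competitive results on a variety of multimodal datasets and, on the OSIRIX angiographic dataset, outperforms existing state-of-the-art methods in accuracy.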

Journal ref Journal of Imaging 2022

Abstract

Retinal vessel analysis is a procedure that can be used to assess risks to the eye. This work proposes an unsupervised multimodal approach that improves the response of the Frangi filter, enabling automatic vessel segmentation. We propose a filter that computes pixel-level vessel continuity while introducing a local tolerance heuristic to fill in vessel discontinuities produced by the Frangi response. This proposal, called the local-sensitive connectivity filter (LS-CF), is compared against the baseline thresholded Frangi filter response, a naive connectivity filter response combined with morphological closing, and current approaches in the literature. The proposal achieved competitive results on a variety of multimodal datasets. It was robust enough to outperform all the state-of-the-art approaches in the literature on the OSIRIX angiographic dataset in terms of accuracy and 4 out of 5 works on the IOSTAR dataset, while also outperforming several works on the DRIVE and STARE datasets and 6 out of 10 on the CHASE-DB dataset. On CHASE-DB, it also outperformed all the state-of-the-art unsupervised methods.

2605.21224 2026-05-21 physics.optics cs.AI eess.SP

Artificial Intelligence Reshapes Microwave Photonics

Peng Li, Xihua Zou, Jia Ye, Wei Pan, Lianshan Yan

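AI summary This paper examines how artificial intelligence is advancing microwave photonics: integrating AI with microwave photonic techniques has enabled innovations in signal generation, transmission, processing, and detection.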

Comments 13 pages, 12 figures

Abstract

As a rapidly emerging interdisciplinary field that intrinsically integrates microwave and photonics, microwave photonics (MWP) provides disruptive solutions to overcome the fundamental bandwidth limitations of conventional electronic systems. By exploiting the inherently ultra-wide bandwidth and low-loss characteristics of photonic technologies, MWP enables the generation, transmission, processing, and detection of microwave, millimeter-wave, and terahertz signals. Representative breakthroughs include fully photonic microwave radar systems, photonic analog-to-digital converters with bandwidth up to 320 GHz, and photonic wireless communication systems achieving data rates as high as 616 Gbit/s. Meanwhile, the rapid growth of artificial intelligence (AI) is reshaping scientific research, engineering, and daily life in unprecedented ways, such as AI for science/engineering and AI co-scientist/assistant. Correspondingly, AI is profoundly reshaping MWP in all aspects, from signal generation and transmission to signal processing and detection. AI has revolutionized the design, simulation, fabrication, testing, deployment, and maintenance of MWP systems, delivering autonomous operation and exceptional efficiency beyond traditional systems. Motivated by these developments, this Review Paper provides the first comprehensive overview of AI-enabled MWP, systematically summarizing the state-of-the-art advances and presenting insights for both the academic community and the broader public.

2605.21217 2026-05-21 stat.ML cs.LG

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

Shuaida He, Liwen Chen, Long Feng

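AI summary This paper studies parameter-efficient fine-tuning with LoRA in a federated learning setting. It proposes CLAIR, a framework that uses a structured low-rank plus block-sparse decomposition to recover the shared LoRA subspace and detect contaminated clients, with exact recovery in the noiseless case and stable, consistent collaborative-set recovery under further conditions.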

Abstract

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, and thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.

2605.21213 2026-05-21 quant-ph cs.AI cs.LG math.OC

Enhanced Reinforcement Learning-based Process Synthesis via Quantum Computing

通过量子计算增强的强化学习过程合成

Austin Braniff, Fengqi You, Yuhe Tian

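AI summary This paper proposes a quantum reinforcement learning approach to process synthesis. It builds a generalized framework that poses process synthesis as a Markov decision process, introduces quantum-enhanced RL algorithms to improve scalability, and benchmarks them against a classical RL baseline, showing that the quantum approaches remain competitive.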

详情
AI中文摘要

在本文中,我们提出量子强化学习(RL)作为解决过程合成问题的策略。基于我们先前的工作,我们开发了一个通用框架,将过程合成正式化为马尔可夫决策过程,并引入量子增强的强化学习算法来解决它,从而提高了可扩展性。早期的量子强化学习在过程合成中的实现受到量子位需求的限制,随着问题复杂度的增加,其扩展性较差。本文通过引入状态编码算法将量子位需求与问题规模解耦。使用经典强化学习作为基准,在相同的训练条件下评估量子算法。所有算法在具有递增单元数量的流程表合成问题上进行评估,以分析其性能和可扩展性。结果表明,所有方法都能在小设计空间中识别出最优的流程表设计。对于中等规模的单元数量,量子方法在每回合的基础上表现出竞争性的性能,并且在每参数的基础上具有改进的效率,优于经典强化学习基准。本文为未来量子计算在过程系统工程中的应用提供了基础,建立了比较经典和量子算法的受控基准,并展示了所提出的量子变体在本文研究的过程合成问题中仍具有竞争力。

英文摘要

In this work, we present quantum reinforcement learning (RL) as a solution strategy for process synthesis problems. Building on our prior work, we develop a generalized framework that formally poses process synthesis as a Markov decision process and introduces quantum-enhanced RL algorithms to solve it with improved scalability. Earlier implementations of quantum-based RL for process synthesis were limited by qubit requirements, which scaled poorly with problem complexity. This work overcomes this challenge by introducing state encoding algorithms to decouple qubit requirements from problem size. A classical RL-based solution strategy is used as a baseline to benchmark the quantum algorithms under identical training conditions. All algorithms are evaluated across a flowsheet synthesis problem of increasing unit counts to analyze their performance and scalability. Results show that all approaches are capable of identifying the optimal flowsheet designs in small design spaces. For moderate-scale unit counts, quantum approaches demonstrate competitive performance on a per-episode basis and improved efficiency on a per-parameter basis versus the classical RL benchmark. This work provides a foundation for future quantum computing applications within process systems engineering, establishes a controlled benchmark for comparing classical and quantum algorithms, and shows that the proposed quantum variants remain competitive for the process synthesis problem examined in this work.

2605.21211 2026-05-21 eess.SY cs.LG cs.SY math.OC

Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes

基于Y-wise affine神经网络的强化学习控制:化学过程的比较案例研究

Austin Braniff, Yuhe Tian

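AI summary This paper presents an efficient, practical RL-based control approach for chemical process systems. It uses Y-wise Affine Neural Network (YANN)-RL algorithms to address the difficulty of trusting RL algorithms and training reliable agents, and shows reduced training time and data requirements on three publicly available chemical engineering case studies.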

Comments Accepted for publication at the 23rd IFAC World Congress, 2026

详情
AI中文摘要

在本工作中,我们提出了一种高效且实用的方法,用于将基于强化学习(RL)的控制应用于化学过程系统。这是一个尚未广泛采用RL控制的领域,主要由于RL算法的固有挑战和训练可靠智能体的耗时过程。为了解决这些挑战,我们利用了一类称为Y-wise Affine Neural Network (YANN)-RL的RL算法,该算法在我们之前的研究所提出(Braniff和Tian,2025a)。通过战略性地初始化actor和critic网络,YANN-RL算法在控制方案中提供自信且可解释的起点。我们将这种基于RL的控制方法应用于三个不同的过程工程案例研究,这些研究在PC-Gym库(Bloor等人,2026)中公开:(i)连续搅拌釜反应器(CSTR),(ii)四塔系统,以及(iii)多级萃取柱。我们的方法与几种流行的RL算法(PPO、SAC、DDPG和TD3)以及非线性模型预测控制(NMPC)进行了比较。这些案例研究证明,YANN-RL可以显著减少训练时间和所需的数据,可以放心地部署在化学过程系统中,并且在不掌握完整非线性模型的情况下可以接近NMPC的性能。

英文摘要

In this work, we present an efficient and practically implementable approach for the application of reinforcement learning (RL)-based control in chemical process systems. This is an area that has yet to widely adopt RL-based control, largely due to inherent challenges in trusting RL algorithms and the time-consuming process of training reliable agents. To address these challenges, we leverage a class of RL algorithms termed Y-wise Affine Neural Network (YANN)-RL, which we have developed in our prior work (Braniff and Tian, 2025a). By strategically initializing actor and critic networks, YANN-RL algorithms provide confident and interpretable starting points within control schemes. We apply this RL-based control approach to three different process engineering case studies publicly available on the PC-Gym library (Bloor et al., 2026): (i) a continuous stirred tank reactor (CSTR), (ii) a four-tank system, and (iii) a multistage extraction column. Our approach is compared to several popular RL algorithms (PPO, SAC, DDPG, and TD3) and is benchmarked against nonlinear model predictive control (NMPC). These case studies demonstrate that YANN-RL can greatly reduce the training time and data needed, can be deployed with confidence for chemical process systems, and can approach the performance of NMPC without the knowledge of a full nonlinear model.

2605.21198 2026-05-21 cs.SI cs.AI

SURGE: An Event-Centric Social Media Sentiment Time Series Benchmark with Interaction Structure

SURGE:一个以事件为中心的社会媒体情感时间序列基准,包含交互结构

Chen Su, Pengsen Cheng, Yuanhe Tian, Yan Song

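AI summary The study introduces SURGE, a benchmark that pairs event-level time series with aligned text and interaction structure to evaluate how social interaction affects forecasting behavior. Experiments reveal strong local persistence, limited transfer of text-augmented forecasters, and increased difficulty on reply-dense periods.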

详情
AI中文摘要

社交媒体上的公共事件产生大量讨论,其集体动态对意见预测和危机响应具有直接价值。捕捉这些动态在事件生命周期中的演变需要将碎片化帖子组织成事件级时间序列。现有数据集仅涵盖少量事件,且在构建时间序列时通常丢弃帖子间的交互结构,限制了跨事件类型的迁移和对交互如何塑造集体动态的受控研究。我们提出了SURGE,一个多事件社交媒体基准,将事件级时间序列与对齐的文本和交互结构联系起来。SURGE通过自动化流程生成日历对齐的时间序列,覆盖五个事件类别中的67个事件和超过80万条帖子。每个时间单元配对来自相同选定帖子的扁平和结构化文本视图,使受控评估社交交互结构对预测行为的影响成为可能。在SURGE之上,我们定义了数值预测、文本增强预测、高交互评估和留一类别外推广的基准协议。实验表明,该基准具有强局部持久性,即在绝对误差下朴素基线难以超越;现有文本增强预测器在事件驱动社交媒体数据中的迁移有限;回复密集期的难度增加,聚合指标往往掩盖了这些挑战。我们还包含了一个轻量级结构感知探针作为参考实现,展示了SURGE如何支持交互感知预测研究。

英文摘要

Public events on social media generate large volumes of discussion whose collective dynamics carry direct value for opinion forecasting and crisis response. Capturing how these dynamics evolve across an event's lifecycle requires organizing fragmented posts into event-level time series. Existing datasets cover only a small number of events within a single category, and typically discard the interaction structure between posts when constructing time series, which restricts both transfer across event types and controlled study of how interactions shape the resulting collective dynamics. We present SURGE, a multi-event social media benchmark that pairs event-level time series with aligned text and interaction structure linking posts within an event. SURGE is built through an automated pipeline that produces calendar-aligned time series at three temporal granularities, covering 67 events and more than 800K posts across five event categories. Each time bin is paired with flat and structured textual views derived from the same selected posts, enabling controlled evaluation of whether social interaction structure affects forecasting behavior. On top of SURGE we define benchmark protocols for numerical-only forecasting, text-augmented forecasting, high-interaction evaluation, and leave-one-category-out generalization. Experiments with representative time-series and multimodal forecasting models reveal three properties of the benchmark: a strong local-persistence regime in which naive baselines remain hard to beat under absolute error, limited transfer of existing text-augmented forecasters to event-driven social-media data, and increased difficulty on reply-dense periods that aggregate metrics tend to obscure. We further include a lightweight structure-aware probe as a reference implementation, illustrating how SURGE can support interaction-aware forecasting research.

2605.21167 2026-05-21 stat.ML cs.LG

A Rigorous, Tractable Measure of Model Complexity

一个严格且可计算的模型复杂度度量

Oskar Allerbo, Thomas B. Schön

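AI summary This paper proposes a rigorous, easy-to-compute measure of model complexity based on the similarity of model gradients across inputs. It applies to parametric and kernel-based non-parametric models, generalizes model-specific measures such as polynomial degree and kernel length scale, and gives new insights into double descent for random Fourier features, random forests, neural networks, and gradient boosting.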

详情
AI中文摘要

对模型复杂度的准确评估对于解释、泛化和模型选择等主题至关重要。然而,大多数现有复杂度度量要么依赖于启发式假设,要么计算上不可行。在本文中,我们提出了一种数学上严谨且易于计算的模型复杂度度量方法,该方法基于模型在不同输入上的梯度相似性。因此,它适用于任何参数模型,也适用于基于核的非参数模型。我们证明了我们的复杂度度量可以推广到模型特定的复杂度度量,如多项式度数(多项式回归)、核长度尺度(Matérn核)、邻居数(k-近邻)、分割数(决策树)和树数(随机森林)。我们还利用我们的度量方法获得了关于随机傅里叶特征、随机森林、神经网络和梯度提升中的双下降现象的新见解。

英文摘要

An accurate assessment of a model's complexity is crucial for topics such as interpretation, generalization, and model selection. However, most existing complexity measures either rely on heuristic assumptions or are computationally prohibitive. In this paper, we present a mathematically rigorous yet easy-to-compute measure of model complexity that is based on the similarities between the model gradients across inputs. It is thus well-defined for any parametric model, but also for kernel-based non-parametric models. We prove that our measure of complexity generalizes model-specific complexity measures such as polynomial degree (for polynomial regression), kernel length scale (for Matérn kernels), number of neighbors (for k-nearest neighbors), number of splits (for decision trees), and number of trees (for random forests). We also use our measure to obtain new insights into the double descent phenomenon for random Fourier features, random forests, neural networks, and gradient boosting.

2605.21146 2026-05-21 cs.CR cs.AI cs.SE

Detecting Trojaned DNNs via Spectral Regression Analysis

通过谱回归分析检测被植入的深度神经网络

Samuele Pasini, Jinhan Kim, Paolo Tonella

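AI summary This paper proposes MIST, which detects implanted Trojans by analyzing how a model's internal representations change during fine-tuning. It uses pre-activation spectra to flag updates that are inconsistent with a benign reference, and outperforms state-of-the-art detection accuracy without any knowledge of the poisoned data or the trigger.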

详情
AI中文摘要

现代深度神经网络(DNN)经常被反复微调以整合新数据和功能。这种进化流程在更新数据不可信时引入了安全风险,因为攻击者可能在微调过程中植入后门。我们提出了MIST,一种后门检测方法,分析模型在微调过程中内部表示的变化。而不是尝试重建触发条件,MIST利用预激活谱特征来表征良性模型进化,并标记出与参考不一致的更新。这种框架将后门检测视为对模型更新的回归问题。在四个数据集和八个后门攻击上的实证评估表明,谱距离可靠地区分被植入的更新与干净的微调。MIST在单次更新后优于现有最先进检测精度,无需任何关于被污染数据或触发器的知识,并在多步良性进化中保持有效,具有优雅且有界的退化。这些结果表明,谱进化提供了一种稳定且假设轻量的信号,用于检测恶意模型更新。

英文摘要

Modern DNNs are repeatedly fine-tuned to incorporate new data and functionality. This evolutionary workflow introduces a security risk when updated data cannot be fully trusted, as adversaries may implant Trojans during fine-tuning. We present MIST, a Trojan detection approach that analyzes how a model's internal representations change during fine-tuning. Rather than attempting to reconstruct trigger conditions, MIST characterizes benign model evolution using pre-activation spectra and flags updates whose spectral deviations are inconsistent with this reference. This framing treats Trojan detection as a regression problem over model updates. An empirical evaluation across four datasets and eight Trojan attacks shows that spectral distances reliably distinguish Trojaned updates from clean fine-tuning. MIST outperforms state-of-the-art detection accuracy after a single update, without requiring any knowledge about the poisoned data or the trigger, and remains effective under multi-step benign evolution, with graceful and bounded degradation. These results indicate that spectral evolution provides a stable and assumption-light signal for detecting malicious model updates.

2605.21113 2026-05-21 cs.LO cs.AI

On the Complexity of Entailment for Cumulative Propositional Dependence Logics

关于累积命题依赖逻辑蕴含复杂性的研究

Kai Sauerwald, Juha Kontinen, Arne Meier

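AI summary This paper studies the complexity of the entailment problem for cumulative propositional dependence logic and for cumulative propositional logic with team semantics, establishing complexity results via relational models.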

Comments arXiv admin note: substantial text overlap with arXiv:2602.21360

详情
AI中文摘要

本文建立并证明了累积命题依赖逻辑和带有团队语义的累积命题逻辑的蕴含问题的复杂性结果。正如最近所显示的,累积逻辑以System C为特征,并且恰好由Kraus、Lehmann和Magidor的累积模型所刻画。这引出了基于关系模型的蕴含问题,本文专门考虑这一问题。

英文摘要

This paper establishes and proves complexity results for entailment for cumulative propositional dependence logic and for cumulative propositional logic with team semantics. As recently shown, cumulative logics are famously characterised by System C and exactly captured by the cumulative models of Kraus, Lehmann and Magidor. This gives rise to the entailment problem via relational models, which is specifically considered here.

2605.21085 2026-05-21 cs.MA cs.AI cs.LG

Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

分离通信与策略:在带宽限制下的鲁棒多智能体强化学习

Alexi Canesse, Benoît Goupil, Jesse Read, Sonia Vanier

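AI summary This paper introduces the $\beta$ bandwidth budget and the SLIM architecture, which decouples the communication pathway from the policy's latent representation, improving the robustness and performance of multi-agent reinforcement learning under bandwidth constraints.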

详情
AI中文摘要

通信在多智能体强化学习(MARL)中起到了协调作用,但许多实际应用,例如无人机编队的搜索与救援任务,在严重的带宽限制下运行。许多通信架构仍然存在耦合瓶颈,其中共享的潜在表示用于策略执行和智能体间通信。因此,减少信息量会直接限制策略的潜在空间,通常导致显著的性能下降。我们通过两个贡献来解决这个问题。首先,我们引入β,一个归一化的每智能体带宽预算,将稀疏性、轮次和信息维度统一为一个可比的约束。其次,我们提供SLIM,一个最小的架构,将通信路径与策略的潜在表示分离,使我们能够隔离带宽的影响与策略容量的影响,同时受益于步骤内通信。我们在几个部分可观测的MARL基准上评估了我们的方法,其中通信是至关重要的。我们的方法在状态空间中实现了最先进的性能,并且在有限的通信下表现出可扩展性和鲁棒性,随着带宽的减少,降级仅是轻微的。

英文摘要

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce $\beta$, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.

2605.21083 2026-05-21 physics.app-ph cs.LG physics.bio-ph physics.med-ph

AIMBio-Mat: An AI-Native FAIR Platform for Closed-Loop Materials Discovery and Biomedical Translation

AIMBio-Mat: 一个面向AI的FAIR平台,用于闭环材料发现与生物医学转化

D. -M. Mei, K. Acharya, C. M. Adhikari, M. Adhikari, S. Aryal, B. V. Benson, K. Bhatta, S. Bhattarai, N. Budhathoki, A. M. Castillo, D. Chakraborty, S. Chhetri, S. Choudhury, T. A. Chowdhury, R. D. Cruz, B. Cui, S. Dhital, K. -M. Dong, R. Gapuz, A. Ghasemi, E. Z. Gnimpieba, B. D. S. Gurung, H. A. Hashim, R. I. Harry, K. -E. Hasin, M. K. Hassanzadeh, M. K. Jha, D. Kim, K. -C. Kong, B. Lama, A. Mahat, N. Maharjan, A. Majeed, J. Mammo, M. M. Masud, K. S. Moore, A. Nawaz, H. Oli, S. A. Panamaldeniya, L. Pandey, R. Pandey, Z. Peng, A. Prem, M. M. Rana, K. Rana Magar, R. Rizk, C. S. Tadi, L. -W. Wang, Y. Yang, G. -L. Yin, C. -X. Yu, D. Zeng, M. Zhou, Q. Zhou

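AI summary This paper proposes the AIMBio-Mat platform, which links materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning to support cross-domain reasoning in materials discovery and biomedical translation, and provides a testable platform blueprint.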

Comments 35 pages, 4 figures, and 12 tables

详情
AI中文摘要

材料发现和生物医学转化日益需要能够跨组成、加工、结构、生物响应、可制造性、安全性和治理约束进行推理的模型。现有的材料和生物医学数据生态系统虽然强大,但仍然缺乏与AI指导发现相结合的能力。本文提出AIMBio,一个面向AI的、符合FAIR原则和治理意识的决策层框架,将材料来源、生物医学背景、知识图谱、不确定性感知的机器学习和人机协作主动学习联系起来。该框架将生物医学-材料发现建模为在不确定性下的约束多目标优化,并引入了元数据、模型文档、风险分层治理、评估指标和分阶段实施的实用要求。为使路线图可测试,我们增加了最小可行原型规范和一个用于AI指导的纳米材料药物输送的示范试点。AIMBio被定位为探索性和临床前发现基础设施,而不是临床决策支持软件;任何临床或受控设备使用都需要单独的验证、变更控制和监管审查。核心贡献是提供一个可发表的平台蓝图,将碎片化的材料和生物医学记录转化为可审计、实验可操作和转化负责任的发现工作流。

英文摘要

Materials discovery and biomedical translation increasingly require models that can reason across composition, processing, structure, biological response, manufacturability, safety, and governance constraints. Existing materials and biomedical data ecosystems are powerful but remain poorly coupled for AI-guided discovery. Here we present AIMBio, a conceptual framework for an AI-native, FAIR, and governance-aware decision layer that links materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning. The framework formulates biomedical-materials discovery as constrained multi-objective optimization under uncertainty and introduces practical requirements for metadata, model documentation, risk-tiered governance, evaluation metrics, and phased implementation. To make the roadmap testable, we add a minimum viable prototype specification and a worked pilot for AI-guided nanomaterials for drug delivery. AIMBio is positioned as exploratory and preclinical discovery infrastructure, not as clinical decision-support software; any clinical or regulated-device use would require separate validation, change control, and regulatory review. The central contribution is a publishable platform blueprint for converting fragmented materials and biomedical records into auditable, experimentally actionable, and translationally responsible discovery workflows.

2605.21055 2026-05-21 cs.NE cs.LG

Genetic Programming with Transformer-Based Mutation for Approximate Circuit Design

基于Transformer变异算子的遗传编程用于近似电路设计

Ondrej Galeta, Lukas Sekanina

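AI summary This paper proposes a transformer-based mutation operator for Cartesian genetic programming applied to the automated design of approximate arithmetic circuits. A hybrid scheme prevents stagnation of the approximation process, and for several target error constraints the evolved multipliers achieve better trade-offs than the highly optimized designs in the EvoApproxLib library.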

Comments To appear at IEEE World Congress on Computational Intelligence, Congress on Evolutionary Computation, Maastricht, NL, 2026

详情
AI中文摘要

近期的一个趋势是利用机器学习模型来改进进化设计和优化过程。我们提出了一种新的基于Transformer的变异算子,用于笛卡尔遗传编程(CGP),以实现近似算术电路的自动设计。我们引入了一种CGP的混合方案,其中所提出的变异算子与标准变异算子交替使用,以防止电路近似过程停滞。我们还为底层Transformer开发了一种新的训练方案,该方案利用由数千个CGP染色体组成的训练向量,这些染色体代表各种近似乘法器。对于若干目标误差约束,使用基于Transformer的变异算子的CGP进化出的近似乘法器,取得了比最先进的EvoApproxLib近似电路库中高度优化的设计更好的权衡。尽管训练和进化过程的计算开销都很大,但它们似乎是改进现有近似电路并产生新的、可能可申请专利的电路设计的必要步骤。

英文摘要

A recent trend is to leverage machine learning models to improve the evolutionary design and optimization process. We propose a novel transformer-based mutation operator for Cartesian genetic programming (CGP) for the automated design of approximate arithmetic circuits. We introduce a hybrid scheme for CGP in which the proposed mutation operator is switched with the standard mutation operator to prevent stagnation of the circuit approximation process. We also develop a new training scheme for the underlying transformer that utilizes training vectors composed of thousands of CGP chromosomes representing various approximate multipliers. For several target error constraints, the approximate multipliers evolved with CGP utilizing the transformer-based mutation achieve better trade-offs than the highly optimized designs available in the state-of-the-art EvoApproxLib library of approximate circuits. Although both training and evolutionary processes are computationally demanding, they appear to be necessary steps for improving existing approximate circuits and producing new, potentially patentable circuit designs.
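
To make the CGP setting concrete, the following is a minimal sketch of a feed-forward CGP genotype and a standard point mutation. It is an illustration only: the gate set, the tuple encoding, and the names `evaluate` and `point_mutate` are assumptions, not the authors' implementation, and the transformer-based operator itself is not reproduced here.

```python
import random

# Hypothetical gate set for a tiny single-bit circuit example.
FUNCTIONS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}


def evaluate(nodes, outputs, inputs):
    """Evaluate a CGP genotype: each node reads two earlier signals."""
    values = list(inputs)
    for func, src_a, src_b in nodes:
        values.append(FUNCTIONS[func](values[src_a], values[src_b]))
    return [values[i] for i in outputs]


def point_mutate(nodes, outputs, num_inputs, rng):
    """Return a copy with one gene resampled; connections stay feed-forward."""
    nodes, outputs = list(nodes), list(outputs)
    pick = rng.randrange(len(nodes) + len(outputs))
    if pick < len(nodes):
        func, src_a, src_b = nodes[pick]
        limit = num_inputs + pick  # only earlier signals are legal sources
        gene = rng.randrange(3)
        if gene == 0:
            func = rng.choice(sorted(FUNCTIONS))
        elif gene == 1:
            src_a = rng.randrange(limit)
        else:
            src_b = rng.randrange(limit)
        nodes[pick] = (func, src_a, src_b)
    else:
        outputs[pick - len(nodes)] = rng.randrange(num_inputs + len(nodes))
    return nodes, outputs


# A half adder: signal 2 = a AND b (carry), signal 3 = a XOR b (sum).
HALF_ADDER_NODES = [("AND", 0, 1), ("XOR", 0, 1)]
HALF_ADDER_OUTPUTS = [3, 2]  # [sum, carry]
```

In the hybrid scheme described in the abstract, a learned operator would be alternated with a standard operator of this kind; how the two are scheduled is not specified in the abstract.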

2605.21041 2026-05-21 stat.ML cs.LG stat.ME

Conditioning Gaussian Processes on Almost Anything

对几乎任何事物进行高斯过程的条件化

Henry Moss, Lachlan Astfalck, Thomas Cowperthwaite, Colin Doumont, Sam Willis, Philipp Hennig, Christopher Nemeth, Andrew Zammit-Mangion

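AI summary This paper establishes an equivalence between Gaussian processes and a class of linear diffusion models, yielding a general-purpose scheme for conditioning on any statement that admits point-wise likelihood evaluation, including non-linear physics and natural language, and thereby broadening the use of Gaussian processes in real-world modelling.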

详情
AI中文摘要

高斯过程(GPs)提供了一种基于函数的原理性概率模型,但精确推断仅限于线性-高斯范式。我们建立了GPs与一类线性扩散模型之间的显式等价关系,将预测采样重新表述为一个具有闭式高斯动力学和一个依赖似然的引导项的ODE,该引导项允许简单的蒙特卡洛近似。在线性-高斯设置中,我们精确恢复了标准GP条件化;超越共轭性之外,相同的机制能够处理任何允许逐点似然评估的条件语句——包括非线性物理模型,以及首次通过大型语言模型实现自然语言。白化分离了不可约的非高斯动力学,最小化了Wasserstein-2运输成本并消除了数值刚性。结果是一种通用的GP推断方案,无需专门推导。这些结果提供了一种通用机制,将现实世界知识的全部丰富性作为条件信息纳入其中,为现实世界问题的概率建模开辟了新的前沿。

英文摘要

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.
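
For reference, the standard linear-Gaussian conditioning that the abstract says is recovered exactly can be written as below. The notation is an assumption made here for illustration (zero-mean prior with kernel matrix $K$, training inputs $X$, test inputs $*$, observations $y$ with i.i.d. Gaussian noise of variance $\sigma^2$) and is not taken from the paper.

```latex
\begin{aligned}
\mu_{*} &= K_{*X}\,\bigl(K_{XX} + \sigma^{2} I\bigr)^{-1}\, y, \\
\Sigma_{*} &= K_{**} - K_{*X}\,\bigl(K_{XX} + \sigma^{2} I\bigr)^{-1}\, K_{X*}.
\end{aligned}
```

These closed forms exist only because the likelihood is Gaussian and linear in the function values; the guidance term described in the abstract is what takes over when that conjugacy is lost.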

2605.21002 2026-05-21 cs.CR cs.CV cs.CY cs.MM

Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

可验证的来源和水印技术用于生成式AI:一个用于国际作战法和国内法院的证据框架

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Nurana Abdullayeva

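AI summary This article presents a unified evidentiary framework that maps cryptographic content provenance, robust statistical watermarking, and zero-knowledge attestation to the proof requirements of each legal regime, and provides a reproducible reference pipeline, a public benchmark, and model annexes for lawyers, engineers, and operators.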

Comments 13 pages, 4 figures, 10 tables. Submitted to IEEE Transactions on Information Forensics and Security

详情
AI中文摘要

生成式人工智能现在能够以成本低廉的方式合成逼真图像、音频和视频,这超出了传统法医学的直觉。法律后果跨越了三个迄今为止孤立研究的制度:国际作战法、国内程序和产品监管。本文提出一个统一的证据框架,将加密内容来源、稳健的统计水印和零知识证明映射到各制度的证明要求。我们定义了一个跨越朴素再生、对抗性清洗、跨模型再生、主动水印移除和内部来源伪造的五级威胁模型。我们发布了一个包含12000个生成项目(图像、音频和视频模态)的公开基准,这些项目在六个清洗管道下进行72000次评估样本测试。我们评估了四种代表性方案,报告了固定假阳性率下的真阳性率、鲁棒性曲线下面积、计算开销以及受制度条件限制的法律充分性分数。我们将经验检测界限转化为国际武装冲突法下的命令决策法律充分性阈值,以及国内程序中的刑事和民事可采性阈值,以及欧盟人工智能法案和类似制度下的持续审计阈值。结果是一个可复现的参考流程、一个公开基准和模型附录,供法律专业人士、工程师和操作员共同使用。

英文摘要

Generative artificial intelligence now synthesizes photorealistic imagery, audio, and video at a cost that defeats traditional forensic intuition. The legal consequences span three regimes studied so far in isolation: international operational law, domestic procedure, and product regulation. This article presents a unified evidentiary framework that maps cryptographic content provenance, robust statistical watermarking, and zero-knowledge attestation to the proof requirements of each regime. We define a five-tier threat model spanning naive regeneration, adversarial laundering, cross-model regeneration, active watermark removal, and insider provenance forgery. We release a public benchmark of 12,000 generated items across image, audio, and video modalities under six laundering pipelines for 72,000 evaluation samples. We evaluate four representative schemes and report true positive rate at fixed false positive rate, robustness area under the curve, computational overhead, and a regime-conditioned legal sufficiency score. We translate empirical detection bounds into legal sufficiency thresholds for command decisions under the law of armed conflict, for criminal and civil admissibility under domestic procedure, and for persistence audits under the European Union Artificial Intelligence Act and analogous regimes. The result is a reproducible reference pipeline, a public benchmark, and model annexes that lawyers, engineers, and operators can deploy together.
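
The "true positive rate at fixed false positive rate" metric named above can be sketched as follows. This is a generic illustration under stated assumptions: detector scores are higher for watermarked content, the threshold is calibrated on unwatermarked (negative) samples, and the function name `tpr_at_fpr` is hypothetical rather than taken from the paper's pipeline.

```python
def tpr_at_fpr(positive_scores, negative_scores, target_fpr):
    """Return (TPR, threshold) with the empirical FPR held at or below target_fpr.

    An item is flagged when its score is strictly greater than the threshold.
    """
    negatives = sorted(negative_scores, reverse=True)
    # Number of negatives we may flag without exceeding the target FPR.
    allowed = int(target_fpr * len(negatives))
    if allowed < len(negatives):
        # Cut at the (allowed + 1)-th highest negative score; ties with the
        # threshold are not flagged, so the FPR bound holds.
        threshold = negatives[allowed]
    else:
        threshold = float("-inf")
    true_positives = sum(score > threshold for score in positive_scores)
    return true_positives / len(positive_scores), threshold


# Illustrative scores only; not data from the benchmark.
clean_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
watermarked_scores = [0.95, 0.85, 0.99, 0.5]
rate, cut = tpr_at_fpr(watermarked_scores, clean_scores, target_fpr=0.1)
```

Fixing the false positive rate first is what lets a detection result be compared against a legal sufficiency threshold, since the tolerable rate of false accusations is set by the regime rather than by the detector.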

2605.20999 2026-05-21 math.PR cs.LG math.OC stat.ML

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

一般随机逼近的集中性在重尾马尔可夫噪声下

Shubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia

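AI summary This paper establishes maximal concentration bounds for the iterates of stochastic approximation algorithms whose noise has a finite-state Markovian component plus a martingale-difference component. Using a novel Lyapunov function and an auxiliary projected algorithm, it characterizes how the step-size sequence and the properties of the random operator shape the error tail, and extends the results to unbounded martingale-difference noise.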

Comments 67 pages

详情
AI中文摘要

我们建立了由具有通用步长的随机逼近算法生成的迭代项的最大集中性界,其中噪声包含有限状态马尔可夫分量和马丁格尔差分分量。当马丁格尔差分噪声有界时,我们证明误差尾部可以是亚高斯、亚魏伯或比任何帕累托分布更轻但比任何魏伯分布更重,这取决于步长序列和随机算子是否几乎必然收缩、几乎必然非扩张或以正概率扩张。我们的分析依赖于一个涉及解泊松方程的矩生成函数的新型Lyapunov函数,以及一个辅助投影算法。我们通过最坏情况例子补充上界,表明更精确的上界不可能实现。我们进一步研究了当平均算子是收缩的且步长为$1/k$时无界马丁格尔差分噪声的情况,在此设置下,如果随机算子几乎必然非扩张,则误差尾部至多是噪声尾部的三倍重;如果随机算子以正概率扩张,则误差尾部可能显著更重。这些结果通过一种新的黑盒截断论证获得,将无界噪声情况转化为有界噪声情况。

英文摘要

We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.
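
As a toy instance of the setting, the sketch below runs a scalar stochastic approximation with step size $1/k$, a two-state Markov chain driving the operator, and bounded martingale-difference noise. Everything specific here is an assumption chosen for illustration (the update form, the operator $F(x, y) = 0.5x + b(y)$, the switching probability, and the name `run_sa`); it is not the paper's model or analysis.

```python
import random


def run_sa(num_steps, switch_prob=0.3, seed=0):
    """Iterate x_{k+1} = x_k + (1/k) * (F(x_k, Y_k) - x_k + M_k).

    F(x, y) = 0.5 * x + b(y) is contractive for every y, Y_k is a symmetric
    two-state Markov chain, and M_k is i.i.d. uniform noise on [-1, 1].
    """
    rng = random.Random(seed)
    offsets = (0.0, 2.0)  # b(y) for the two Markov states
    x, state = 0.0, 0
    for k in range(1, num_steps + 1):
        if rng.random() < switch_prob:
            state = 1 - state  # Markovian noise component
        martingale_noise = rng.uniform(-1.0, 1.0)  # bounded, zero mean
        operator_value = 0.5 * x + offsets[state]
        x += (1.0 / k) * (operator_value - x + martingale_noise)
    return x


# The symmetric chain has a uniform stationary law, so the averaged operator
# is x -> 0.5 * x + 1 and its fixed point is 2.
final_iterate = run_sa(50_000)
```

Repeating the run over many seeds and plotting the distribution of `final_iterate - 2` gives an empirical view of the error tail that the concentration bounds describe.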

2605.20982 2026-05-21 cs.DC cs.AI cs.LG

Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory

调度操作中的开销诊断:跨架构观测站

Bole Ma, Jan Eitzinger, Harald Koestler, Gerhard Wellein

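AI summary This study introduces DODOCO to test two assumptions underlying four families of AlltoAll mitigations. It finds that scaling expert parallelism (EP) changes the per-expert max/mean token ratio by at most 5%, and that mock-token benchmarks overestimate routing Gini and fabricate a batch-size scaling trend. The five architectures split into two stable bands, and these bands, rather than the EP degree or the mock-data profile, are the right workload input to AlltoAll-aware interconnect and dispatch design.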

详情
AI中文摘要

AlltoAll调度是MoE专家并行的主要瓶颈,互连社区对此提出了四类缓解方案:预测性样本放置、自适应专家重新布局、分层集合通信和EP感知拓扑。这四类方案都基于两个关于工作负载的假设。第一个假设是路由不平衡可以由系统层纠正。第二个假设是用于评估它们的mock-token基准忠实地代表了生产环境中的路由。我们引入DODOCO来测试这两个假设。我们对五个MoE检查点进行插桩,涵盖五种序列混合器设计(DeepSeek-V2-Lite MLA、DeepSeek-MoE-16B MHA、Qwen3-30B GQA、Nemotron-30B Mamba-2、Qwen3.5-35B GDN),在5×6的数据条件网格以及匹配的EP扫描(在H100上从4到32个rank)下进行测量;两个假设均不成立。在每种架构的可测量范围内,扩展EP对每专家最大/均值token比率的改变最多为5%:straggler是模型路由决策的固有属性,而不取决于其专家如何落在各个rank上。mock token将路由Gini系数高估至多2.35倍,并制造出一种批大小缩放趋势,一旦真实文本取代随机ID,该趋势就会消失。同一矩阵中还意外地出现了第三种模式:五种架构分裂成两个稳定的带。MHA和Mamba-2(数据鲁棒)在wikitext上降至Gini 0.105和0.150。MLA和GDN(持续集中)在所有真实文本条件下保持在0.24以上,并在mock数据上达到0.29到0.38。GQA是中间情况。这些带,而不是EP度或mock数据配置,才是AlltoAll感知的互连和调度设计的正确工作负载输入。

英文摘要

AlltoAll dispatch is the dominant bottleneck of MoE expert parallelism, and the interconnect community has responded with four families of mitigations: predictive sample placement, adaptive expert relayout, hierarchical collectives, and EP-aware topology. All four rest on two assumptions about the workload. The first is that routing imbalance is correctable by the system layer. The second is that the mock-token benchmarks evaluating them faithfully represent production routing. We introduce DODOCO to test both assumptions. We instrument five MoE checkpoints spanning five sequence-mixer designs (DeepSeek-V2-Lite MLA, DeepSeek-MoE-16B MHA, Qwen3-30B GQA, Nemotron-30B Mamba-2, Qwen3.5-35B GDN) under a 5 by 6 grid of data conditions plus a matched EP scan from 4 to 32 ranks on H100s; both assumptions fail. Scaling EP changes the per-expert max/mean token ratio by at most 5% within every architecture's measurable range: the straggler is intrinsic to the routing decision the model makes, not to how its experts land on ranks. Mock tokens overestimate routing Gini by up to a factor of 2.35 and fabricate a batch-size scaling trend that vanishes the moment real text replaces random IDs. A third pattern, unexpected, emerges from the same matrix: the five architectures cleave into two stable bands. MHA and Mamba-2 (data-resilient) drop to Gini 0.105 and 0.150 on wikitext. MLA and GDN (persistently concentrated) stay above 0.24 on every real-text condition and reach 0.29 to 0.38 on mock. GQA is the intermediate case. These bands, not the EP degree or the mock-data profile, are the right workload input to AlltoAll-aware interconnect and dispatch design.
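
The two imbalance metrics quoted in the abstract, the per-expert max/mean token ratio and the routing Gini coefficient, can be computed from per-expert token counts as in the sketch below. The rank-weighted Gini formula is the textbook one and is an assumption here; the abstract does not state which variant DODOCO uses, and the function names are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative per-expert token counts.

    0.0 means perfectly balanced routing; values near 1.0 mean almost all
    tokens go to a single expert.
    """
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(counts)
    # G = 2 * sum_i(i * x_i) / (n * sum_i x_i) - (n + 1) / n, ranks from 1.
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def max_mean_ratio(counts):
    """Load of the busiest expert relative to the average expert load."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Illustrative counts only; not measurements from the paper.
balanced = [10, 10, 10, 10]
concentrated = [0, 0, 0, 40]
```

The max/mean ratio tracks the straggler expert that sets AlltoAll completion time, while Gini summarizes imbalance across the whole distribution, which is presumably why the abstract reports both.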

2605.20923 2026-05-21 cs.LO cs.AI cs.PL

Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows

因果过去逻辑用于分布式大语言模型代理工作流的运行时验证

Benedikt Bollig

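AI summary This paper proposes Causal Past Logic (CPL) for runtime verification of distributed LLM agent workflows. Added to the ZipperGen framework, CPL guards are evaluated online over causally visible events and can influence control flow, making runtime verification part of the coordination language itself.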

Comments 20 pages

详情
AI中文摘要

分布式大语言模型代理工作流不应被当作产生单一顺序日志来监控。在异步执行中,一个决策只能依赖于对其生命线可见的因果事件:某些日志中出现较早的事件可能在本地仍未知。我们扩展ZipperGen代理工作流框架,引入因果过去逻辑(CPL),一种用于条件和while循环中守卫的简洁过去时间时序逻辑。除了标准的过去时间模态如previous和since外,守卫可以检查另一个生命线和所选变量中最新可见的事件。公式是一种源级守卫:它由拥有生命线在线评估,并可以在运行时影响控制流。我们给出了一个具有最新值视图的向量钟监控器,并证明本地计算的监控值与守卫在当前事件处的指称语义一致。因此,运行时验证成为协调语言本身的一部分,而不是执行日志上的事后检查。

英文摘要

Distributed LLM agent workflows should not be monitored as if they produced a single sequential log. In an asynchronous execution, a decision can only depend on events that are causally visible to the lifeline that makes it: an event that appears earlier in some log may still be unknown locally. We extend the ZipperGen agent-workflow framework with Causal Past Logic (CPL), a small past-time temporal logic for guards in conditionals and while loops. In addition to standard past-time modalities such as previous and since, a guard can inspect the latest causally visible event of another lifeline and selected variables stored there. The formula is a source-level guard: it is evaluated online by the owner lifeline and can influence control flow at runtime. We give a vector-clock monitor with latest-value views and prove that the locally computed monitor value coincides with the denotational semantics of the guard at the current event. Thus runtime verification becomes part of the coordination language itself, rather than a post-hoc check over an execution log.
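
The idea of a vector-clock monitor with latest-value views can be illustrated with the sketch below. It is a generic vector-clock example under assumptions made here: the `Lifeline` class, its methods, and the message layout are hypothetical and are not ZipperGen's API or the paper's monitor construction.

```python
class Lifeline:
    """One workflow participant with a vector clock and latest-value views."""

    def __init__(self, name, all_names):
        self.name = name
        self.clock = {n: 0 for n in all_names}
        # lifeline name -> (event index, variables) of its latest visible event
        self.latest = {}

    def local_event(self, **variables):
        self.clock[self.name] += 1
        self.latest[self.name] = (self.clock[self.name], dict(variables))

    def send(self, **variables):
        """Record a send event and return a message carrying the causal past."""
        self.local_event(**variables)
        return {"clock": dict(self.clock), "latest": dict(self.latest)}

    def receive(self, message):
        """Merge the sender's causal past, then record the receive event."""
        for name, count in message["clock"].items():
            if count > self.clock[name]:
                self.clock[name] = count
                self.latest[name] = message["latest"][name]
        own_vars = self.latest.get(self.name, (0, {}))[1]
        self.clock[self.name] += 1
        self.latest[self.name] = (self.clock[self.name], own_vars)

    def visible_value(self, other, variable, default=None):
        """Guard helper: a variable at the latest causally visible event of `other`."""
        if other not in self.latest:
            return default
        return self.latest[other][1].get(variable, default)


names = ["A", "B", "C"]
a, b, c = (Lifeline(n, names) for n in names)
c.local_event(status="rejected")     # happens first in wall-clock order
a.local_event(status="draft")
b.receive(a.send(status="approved"))
```

After this run, `b` sees A's status as "approved" but has no view of C's earlier event, which matches the abstract's point that an event appearing earlier in some log may still be unknown locally.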

2605.20867 2026-05-21 cs.MA cs.CV

ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

ProCrit: 通过批评引导的修订实现自激发多视角推理用于多模态讽刺检测

Yingjia Xu, Jiulong Wu, Bowen Zhang, Baokui Guo, Siyuan Chai, Min Cao

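AI summary This paper proposes ProCrit, a framework for multimodal sarcasm detection that combines self-elicited multi-perspective reasoning with critic-guided revision. It addresses the reliance of existing methods on fixed perspectives by generating the needed perspectives per sample and jointly optimizing the proposal and critic agents.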

详情
AI中文摘要

多模态讽刺检测需要对字面表达与意图意义之间的跨模态不一致进行推理,但因讽刺机制的多样性,所需的具体分析视角在样本间变化。尽管近期方法使分析过程显式化,但它们仍依赖于固定、预定义的视角,通过手工设计的路由规则独立运作。我们主张多模态讽刺检测应采用自激发多视角推理,即模型自主为每个样本生成所需的视角并逐步将其整合到一致的分析中。为实现这一目标,我们提出ProCrit,一种Proposal-Critic双智能体框架,包含用于多视角推理的提案智能体和用于外部评估和定向修订指导的批评智能体。首先,为克服现有讽刺数据集在过程级监督方面的不足,ProCrit通过动态角色智能体滚动生成过程级推理注释:一个强大的视觉-语言模型在共享上下文中依次生成分析角色,生成的多角色轨迹被展平为序列,保留跨视角依赖性的同时允许高效的自回归生成。其次,为提高推理可靠性,ProCrit采用草稿-批评-修订范式,其中独立的批评者识别推理缺陷并提供定向的自然语言反馈以指导修订。最后,我们开发了互为改进的训练框架,通过双阶段强化学习共同优化提案起草和反馈引导的修订,同时根据反馈的实际效果优化批评智能体。在三个广泛使用的基准测试上进行的实验验证了ProCrit的有效性。

英文摘要

Multimodal sarcasm detection requires reasoning over cross-modal incongruities between literal expression and intended meaning, yet the specific analytical perspectives needed vary across samples due to the diversity of sarcastic mechanisms. While recent methods make this analytical process explicit, they still rely on fixed, predefined perspectives that operate independently under hand-crafted routing rules. We argue that multimodal sarcasm detection instead calls for self-elicited multi-perspective reasoning, where a model autonomously generates the perspectives needed for each sample and progressively integrates them into a coherent analysis. To realize this goal, we propose ProCrit, a Proposal-Critic two-agent framework with a proposal agent for multi-perspective reasoning and a critic agent for external evaluation and targeted revision guidance. First, to overcome the lack of process-level supervision in existing sarcasm datasets, ProCrit synthesizes process-level reasoning annotations through a dynamic-role agentic rollout: a strong vision-language model sequentially spawns analytical roles within a shared context, and the resulting multi-role trajectories are flattened into sequences that preserve cross-perspective dependencies while enabling efficient autoregressive generation. Second, to improve reasoning reliability, ProCrit adopts a draft-critique-revise paradigm in which an independent critic identifies reasoning deficiencies and provides targeted natural-language feedback for directed revision. Finally, we develop a mutual-refinement training framework that jointly optimizes proposal drafting and feedback-guided revision via dual-stage reinforcement learning, while refining the critic agent according to the actual effectiveness of its feedback. Experiments on three widely used benchmarks demonstrate the effectiveness of ProCrit.

2605.20863 2026-05-21 cs.DC cs.LG

PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR

PlexRL: 服务化大语言模型执行在RLVR中的集群级编排

Yiqi Zhang, Fangzheng Jiao, Tian Tang, Boyu Tian, Hangyu Wang, Qiaoling Chen, Guoteng Wang, Zhen Jiang, Peng Sun, Ping Zhang, Xiaohe Hu, Ziming Liu, Menghao Zhang, Yanmin Jia, Yang You, Siyuan Feng

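AI summary This paper presents PlexRL, a cluster-level runtime that multiplexes serviceized LLM execution across RLVR jobs to address training inefficiency, improving effective cluster capacity and reducing GPU-hour cost while preserving algorithmic flexibility with minimal per-job overhead.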

详情
AI中文摘要

可验证奖励的强化学习(RLVR)最近在大语言模型(LLMs)中解锁了强大的推理能力,引发了对新算法和数据的快速探索。然而,RLVR训练出了名地低效:长尾rollout、工具引起的停滞以及rollout与训练之间不对称的资源需求引入了大量空闲时间,而这些空闲时间无法通过同步流水线、异步rollout或同置(colocated)执行等作业本地优化来消除。我们认为这种低效是结构性的。虽然单个RLVR作业内部的空闲间隙不可避免,但它们在不同作业之间在很大程度上呈负相关,因此可以在集群级别加以利用。基于这一观察,我们提出了PlexRL,一个在RLVR作业之间多路复用统一LLM服务的集群级运行时。通过在严格的亲和性约束下集中管理模型放置、状态转换和函数级调度,PlexRL将LLM执行按时间片分配给各作业,以填补原本空闲的时段,而无需昂贵的模型迁移。我们的实现和评估表明,PlexRL显著提高了有效集群容量,并将用户GPU小时成本最多降低37.58%,同时保持算法灵活性并仅引入极小的单作业开销。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously inefficient: long-tailed rollouts, tool-induced stalls, and asymmetric resource requirements between rollout and training introduce substantial idle time that cannot be eliminated by job-local optimizations such as synchronous pipelining, asynchronous rollout, or colocated execution. We argue that this inefficiency is structural. While idle gaps are unavoidable within individual RLVR jobs, they are largely anti-correlated across jobs and therefore exploitable at the cluster level. Leveraging this observation, we present PlexRL, a cluster-level runtime for multiplexing unified LLM services across RLVR jobs. By centrally managing model placement, state transitions, and function-level scheduling under strict affinity constraints, PlexRL time-slices LLM execution across jobs to fill otherwise idle periods without expensive model migration. Our implementation and evaluations demonstrate that PlexRL significantly improves effective cluster capacity and reduces user GPU-hour cost by up to 37.58% while preserving algorithmic flexibility and introducing minimal per-job overhead.