arXivDaily: daily arXiv academic digest, updated Monday to Friday
All subject categories: 4101
2606.00655 2026-06-02 cs.MA cs.AI cs.CY

Scaling Behavior of Single LLM-Driven Multi-Agent Systems

Jialing Li, Zhouhong Gu, Yin Cai, Hongwei Feng

Affiliation: Fudan University

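AI summary: By proposing the Sequential Iterative Multi-Agent System (SIMAS) framework, this paper systematically studies how the performance of homogeneous multi-agent systems scales with the number of agents. It finds that performance does not improve monotonically but follows a diminishing-returns pattern governed by the trade-off between collaborative synergy and coordination overhead.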

Abstract

The burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions regarding their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the variable of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, to clearly observe scaling effects. Through extensive experiments across diverse tasks and model scales, we establish that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality. The performance degradation stems from coordination overhead rather than merely from long-context failure, and the scaling tendency generalizes across interaction architectures such as structured debate topologies. This work provides a foundational understanding of MAS scaling laws, offering practical guidance for designing efficient collaborative systems and challenging the prevailing assumption that more agents invariably lead to better performance.
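
A minimal sketch of a SIMAS-style sequential loop makes the communication pattern concrete. It assumes each agent is just a function that reads the task and the shared transcript and returns one message. The names `run_sequential_mas`, `Agent`, and `make_stub_agent` are hypothetical and are not the paper's code. In a homogeneous MAS, every agent would wrap the same base LLM.

```python
from typing import Callable, List

# Hypothetical agent interface: (task, transcript so far) -> one new message.
Agent = Callable[[str, List[str]], str]


def run_sequential_mas(task: str, agents: List[Agent], rounds: int) -> List[str]:
    """Sequential iterative multi-agent loop (illustrative sketch).

    In every round, agents speak one after another. Each agent sees the full
    transcript produced so far, so later agents can refine earlier answers.
    The transcript grows with len(agents) * rounds, which is one plausible
    source of the coordination overhead discussed in the abstract.
    """
    transcript: List[str] = []
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent(task, transcript))
    return transcript


def make_stub_agent(name: str) -> Agent:
    # Stand-in for an LLM call: reports how many messages it has already seen.
    def agent(task: str, transcript: List[str]) -> str:
        return f"{name} saw {len(transcript)} messages for '{task}'"
    return agent


if __name__ == "__main__":
    team = [make_stub_agent(f"agent{i}") for i in range(3)]
    for line in run_sequential_mas("2+2?", team, rounds=2):
        print(line)
```

With this interface, the agent count becomes a single list length. That is the variable the paper scales while holding the base model fixed.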

2606.00636 2026-06-02 cs.AR cs.AI

LP5X-PIM Sim: A High-Fidelity HW/SW Integrated Simulator for LPDDR5X-PIM

SangHoon Cha, Jaewan Choi, Byeongho Kim, Yoonah Paik, Sukhan Lee, Kyomin Sohn

Affiliation: Samsung Electronics, South Korea

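AI summary: This paper introduces the LPDDR5X-PIM simulator developed by Samsung Electronics. The simulator integrates high-fidelity models of the hardware data paths and the software control layers to enable accurate evaluation of system performance and energy efficiency.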

Comments 4 pages, 4 figures, tech note

Abstract

This tech note describes the architecture and execution results of the LPDDR5X-PIM simulator, developed by Samsung Electronics. Based on the latest research and internal specifications, the simulator provides a high-fidelity model of both the hardware data paths and the software control layers of the LPDDR5X-PIM block. This integrated hardware-software simulation approach enables precise evaluation of system performance and energy efficiency while maximizing PIM resource utilization. We have refined existing simulation frameworks to align with the actual hardware implementation, ensuring consistent behavioral accuracy. Further technical details regarding the specific architecture and circuit design of the LPDDR5X-PIM will be disclosed in future publications.

2606.00621 2026-06-02 cs.CR cs.AI cs.CY

Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era

Shubhashis Sengupta, Benjamin McCarty, Milind Savagaonkar, Rhine Andotra

Affiliation: Accenture Services Pvt. Ltd.

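AI summary: Proposes the concept of authenticity debt. Based on Zero Trust Architecture principles, it designs a layered reference architecture that integrates cryptographic provenance, human verification, and continuous governance to counter the synthetic-content threats posed by generative AI.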

Abstract

Generative artificial intelligence has fundamentally changed how content is produced. It enables high-fidelity text, images, audio, and video to be created, modified, and redistributed at near-zero marginal cost. This shift exposes enterprises and ecosystems to a range of risks across four mutually reinforcing authenticity layers (authenticity, provenance, integrity, and accountability) that traditional controls cannot adequately address in isolation. We introduce the concept of authenticity debt: the institutional liability that accumulates when organizations deploy AI-generated content without preserving verifiable origin, integrity, and accountability, deferring exposure until it surfaces under regulatory, legal, or market scrutiny. This paper presents a comprehensive, multi-dimensional taxonomy of generative AI harms and attack vectors. It surveys the capabilities and failure modes of technical controls, including digital watermarking, provenance frameworks (C2PA, Adobe CAI), and detection technologies, and argues that no single mechanism is sufficient in open, adversarial, and evolving environments. Drawing on Zero Trust Architecture principles and enterprise governance frameworks, we propose a layered reference architecture that integrates cryptographic provenance, human-in-the-loop verification, and continuous governance to sustain defensible authenticity at scale. We further examine the regulatory landscape (EU AI Act, U.S. FTC, NIST AI RMF) and identify practical guiding principles for organizations seeking to build authenticity as institutional infrastructure rather than as an afterthought.
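
The core idea of cryptographic provenance is to bind a content hash to a signed manifest. The sketch below is a toy built on several assumptions. It uses a shared-secret HMAC from the Python standard library, whereas real provenance frameworks such as C2PA use public-key signatures and a standardized manifest format. The manifest fields and the function names `make_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
import hmac
import json


def make_manifest(content: bytes, creator: str, generator: str, key: bytes) -> dict:
    """Create a toy provenance manifest bound to the content's SHA-256 digest."""
    claims = {
        "sha256": hashlib.sha256(content).hexdigest(),  # integrity anchor
        "creator": creator,        # who is accountable for the asset
        "generator": generator,    # e.g. which AI tool produced it
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_manifest(content: bytes, manifest: dict, key: bytes) -> bool:
    """Check integrity (hash still matches) and origin (signature still matches)."""
    claims = manifest["claims"]
    if hashlib.sha256(content).hexdigest() != claims["sha256"]:
        return False  # content was altered after signing
    payload = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])
```

This toy covers only the cryptographic layer. The abstract argues that such a mechanism has to be combined with human verification and governance, because a valid signature says nothing about whether the signed claims are true.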

2606.00610 2026-06-02 cs.IR cs.AI cs.MA

MemGraphRAG: Memory-based Multi-Agent System for Graph Retrieval-Augmented Generation

Chuanjie Wu, Zhishang Xiang, Yunbo Tang, Zerui Chen, Qinggang Zhang, Jinsong Su

Affiliations: Xiamen University; Jilin University

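AI summary: Proposes the MemGraphRAG framework, which builds high-quality knowledge graphs with a memory-based multi-agent system and adds a memory-aware hierarchical retrieval algorithm. It outperforms existing models on multiple benchmarks.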

Comments Accepted by KDD 2026

Abstract

Retrieval-Augmented Generation (RAG) has become an essential method for mitigating hallucinations in Large Language Models (LLMs) by leveraging external knowledge. Although effective for simple queries, traditional RAG struggles with large-scale, unstructured corpora where information is highly fragmented. Graph-based RAG (GraphRAG) incorporates knowledge graphs to capture structural relationships, enabling more comprehensive retrieval for complex reasoning. However, existing GraphRAG methods rely on isolated, fragment-level extraction for graph construction, lacking a global perspective on the whole corpus. As a result, these methods frequently lead to thematically inconsistent, logically conflicting, and structurally fragmented graphs that degrade retrieval performance. In this paper, we propose MemGraphRAG, a novel framework that introduces a memory-based multi-agent system to ensure high-quality graph construction. Specifically, MemGraphRAG employs a collaborative society of agents supported by shared memory, which provides a unified global context throughout the extraction process. This mechanism allows agents to dynamically resolve logical conflicts and maintain structural connectivity throughout the corpus. Furthermore, we propose a memory-aware hierarchical retrieval algorithm tailored for the constructed graph. Extensive experiments on multiple benchmarks demonstrate that MemGraphRAG outperforms the state-of-the-art baseline models with comparable efficiency. Our code is available at https://github.com/XMUDeepLIT/MemGraphRAG.
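
The sketch below shows how a shared memory can give extraction agents a global view. It assumes that agents write (head, relation, tail) triples into one store, that the store canonicalizes entity names, and that it flags conflicts on relations declared single-valued. The class `SharedGraphMemory`, its normalization rule, and its conflict policy are hypothetical simplifications and do not describe MemGraphRAG's actual design.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class SharedGraphMemory:
    """Toy shared memory that several extraction agents write into."""

    def __init__(self, single_valued: Set[str]):
        self.single_valued = single_valued      # relations allowed only one tail
        self.aliases: Dict[str, str] = {}       # normalized name -> canonical name
        self.triples: Set[Triple] = set()
        self.conflicts: List[Tuple[Triple, Triple]] = []

    def canonical(self, name: str) -> str:
        # Crude normalization: case-fold and collapse whitespace. The first
        # spelling seen becomes the canonical form for every later agent.
        key = " ".join(name.lower().split())
        return self.aliases.setdefault(key, name.strip())

    def add(self, head: str, rel: str, tail: str) -> bool:
        """Insert a triple; return False and record a conflict if it clashes."""
        new = (self.canonical(head), rel, self.canonical(tail))
        if rel in self.single_valued:
            for old in self.triples:
                if old[0] == new[0] and old[1] == rel and old[2] != new[2]:
                    self.conflicts.append((old, new))
                    return False
        self.triples.add(new)
        return True
```

Because every agent goes through the same `canonical` lookup, different spellings of an entity merge into one node instead of producing a fragmented graph.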

2606.00590 2026-06-02 cs.IR cs.AI

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Affiliations: Center for Intelligent Information Retrieval; University of Massachusetts Amherst

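AI summary: Proposes the Critic-R framework, which introduces a critic model to evaluate the agent's introspective reasoning trace. This closes the feedback loop between the retrieval model and the reasoning agent and improves both retrieval quality and downstream answer accuracy without manual annotation.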

Abstract

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
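
The inference-time refinement loop described above can be sketched as follows. `retrieve`, `critic`, and `rewrite` are hypothetical stand-ins for the retriever, the critic model, and the query/instruction rewriter. The sketch also records the trajectory, which is the kind of automatic success/failure signal that the abstract says Critic-Embed uses for supervision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    steps: List[Tuple[str, List[str]]] = field(default_factory=list)  # (query, docs)
    success: bool = False


def refine_loop(
    query: str,
    retrieve: Callable[[str], List[str]],
    critic: Callable[[str, List[str]], bool],   # True if docs support the next step
    rewrite: Callable[[str, List[str]], str],   # new query given insufficient docs
    max_iters: int = 3,
) -> Trajectory:
    """Retrieve, let the critic judge sufficiency, and rewrite until it passes."""
    traj = Trajectory()
    for _ in range(max_iters):
        docs = retrieve(query)
        traj.steps.append((query, docs))
        if critic(query, docs):
            traj.success = True
            break
        query = rewrite(query, docs)
    return traj
```

Both successful and failed `Trajectory` objects are useful. Under this reading, pairing the final successful query with its retrieved documents, and the earlier failed ones with theirs, would give contrastive training data for the retriever without human relevance labels.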

2606.00584 2026-06-02 stat.ML cs.LG

Spectra-Guided Neural Tucker Factorization

Fusheng Wang, Yikai Hou

Affiliations: School of Automation, Chongqing University of Posts and Telecommunications; College of Computer and Information Science, School of Software, Southwest University

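AI summary: Proposes Spectra-Guided Neural Tucker Factorization (SG-NTF), which uses a continuous spectral-space mapping and a spatio-temporal co-gating mechanism to complete high-dimensional, incomplete tensors efficiently.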

Abstract

This paper proposes Spectra-Guided Neural Tucker Factorization (SG-NTF) for High-Dimensional and Incomplete (HDI) tensor completion. Circumventing discrete representational limits, SG-NTF maps scalar timestamps into a continuous spectral space to abstract temporal periodicities. Concurrently, a Spatio-Temporal Co-Gating (STCG) mechanism explicitly filters latent interactions via multiplicative modulation on spatiotemporal contexts. Evaluations on real-world HDI tensors verify that SG-NTF maintains competitive completion accuracy with parameter efficiency.
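
A common way to map a scalar timestamp into a continuous spectral space is a bank of sine/cosine (Fourier) features at chosen periods. A multiplicative gate can then modulate latent interactions. The sketch below assumes exactly that. The periods (24 and 168 hours), the sigmoid gate, and the function names are illustrative assumptions and are not SG-NTF's published formulation.

```python
import math
from typing import List, Sequence


def spectral_features(t: float, periods: Sequence[float]) -> List[float]:
    """Map a scalar timestamp to [sin, cos] pairs, one pair per period."""
    feats: List[float] = []
    for p in periods:
        angle = 2.0 * math.pi * t / p
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def co_gate(interaction: Sequence[float], context: Sequence[float]) -> List[float]:
    """Element-wise multiplicative gating: x * sigmoid(c)."""
    return [x / (1.0 + math.exp(-c)) for x, c in zip(interaction, context)]
```

Timestamps exactly one period apart map to the same features. A discrete index cannot provide this, and it is one way a model can abstract temporal periodicity from a continuous representation.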

2606.00552 2026-06-02 cs.OS cs.DC cs.NI cs.RO cs.SY eess.SY

Edge-Based QoS-Aware Adaptive Task Placement: A Closed-Loop Control in Multi-Robot Systems

Thien Tran, Jonathan Kua, Thuong Hoang, Minh Tran, Honghao Lyu, Jiong Jin

Affiliations: Deakin University; RMIT University; Zhejiang University; Swinburne University of Technology

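AI summary: Proposes a QoS-aware Adaptive Task Placement (ATP) controller that uses multi-metric cost scoring and closed-loop control to switch task placement dynamically on a shared edge node, reducing tail latency and deadline violations.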

Comments 6 pages, 2 figures, 1 algorithm, accepted as a regular paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July 2026, Melbourne, Australia

Abstract

Multi-robot systems (MRS) increasingly offload compute-intensive perception tasks to edge nodes to meet strict time-sensitive Quality-of-Service (QoS) constraints. However, static task orchestration on a shared edge node can severely degrade QoS due to network latency, jitter, and edge-resource contention. We present a pilot edge-centric MRS testbed using Raspberry Pi nodes to evaluate a camera-to-manipulator pipeline under three modes: local execution, static offloading, and a QoS-aware Adaptive Task Placement (ATP) controller. ATP scores candidate placements using a multi-metric cost (normalized latency, CPU utilization, and switching overhead) over two-second control windows. The closed-loop visual servoing testbed is instrumented with sub-millisecond clock synchronization, network emulation, and detailed monitoring of multiple metrics across nodes to capture realistic jitter. Experimental results under compute-stress and network-fault scenarios show that static edge offloading reduces on-board CPU load but amplifies tail latency and deadline misses. In contrast, the QoS-aware ATP controller, which switches task placement based on measured latency and utilization thresholds, consistently lowers deadline violations and tail latency. Overall, the results position ATP as a practical edge-side control primitive for MRS and provide concrete design guidelines for Cloud-Edge Robotics deployments within broader cloud-fog automation, while motivating QoS-aware multi-objective workload orchestration for industrial cyber-physical systems.
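
The ATP scoring step can be sketched as a weighted sum over the three metrics the abstract names, evaluated once per control window. The weights, the normalization of latency by a deadline, and the flat switching penalty are hypothetical choices. The paper's exact cost function and thresholds may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WindowStats:
    latency_ms: float   # measured over the last two-second control window
    cpu_util: float     # 0.0 .. 1.0 on the node that would run the task


def placement_cost(stats: WindowStats, deadline_ms: float, switching: bool,
                   w_lat: float = 0.6, w_cpu: float = 0.3, w_sw: float = 0.1) -> float:
    """Hypothetical multi-metric cost: normalized latency + CPU + switch penalty."""
    norm_latency = stats.latency_ms / deadline_ms
    switch_penalty = 1.0 if switching else 0.0
    return w_lat * norm_latency + w_cpu * stats.cpu_util + w_sw * switch_penalty


def choose_placement(current: str, candidates: Dict[str, WindowStats],
                     deadline_ms: float) -> str:
    """Pick the lowest-cost placement; moving away from `current` pays a penalty."""
    return min(
        candidates,
        key=lambda name: placement_cost(candidates[name], deadline_ms,
                                        switching=(name != current)),
    )
```

The switching term acts as hysteresis. A candidate must beat the current placement by more than the penalty before the controller moves the task, which keeps small metric fluctuations from causing the placement to oscillate.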

2606.00550 2026-06-02 cs.HC cs.ET cs.RO

A Four-Tier Communication Architecture and Sim-to-Real Validation of a Graphical Open-Source Platform for Robotic Engineering Education

Thien Tran, Khang Duong, Minh Tran, Jonathan Kua, Thuong Hoang, Jiong Jin

Affiliations: Deakin University; RMIT University; Swinburne University of Technology

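AI summary: Commercial digital twins are costly and ROS has a steep learning curve, and both hinder scaling manipulator education in university laboratories. To address this, the paper proposes a four-tier communication architecture that builds on the Graphical Open-Source Platform (GOSP) to bridge data between virtual environments and physical robots, and it verifies hardware-agnostic feasibility through sim-to-real validation.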

Comments 4 pages, 4 figures, accepted as a Work-in-Progress (WiP) paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July 2026, Melbourne, Australia

Abstract

The persistent challenge in scaling authentic manipulator education within university laboratories is a structural dichotomy: commercial digital twins are often cost-prohibitive and rigidly scripted, whereas open-source robotics middleware (ROS) imposes steep technical and syntax barriers for novices. To resolve this logistical and educational friction, this Work-in-Progress (WiP) paper proposes a scalable four-tier communication architecture tailored for sustainable robotics curricula. Rather than focusing on software application design, our study examines the underlying data exchange mechanisms required to bridge visual conceptual environments with physical robotic endpoints, using the Graphical Open-Source Platform (GOSP) as a foundational instantiation. This WiP details the framework's technical integration of 3D visual armature modeling with a robust ROS middleware backend, emphasizing the serialization, routing, and encapsulation of intricate communication routines. Preliminary sim-to-real validation using multi-axis spatial trajectories confirms that encapsulating these communication pipelines provides a hardware-agnostic pathway with sufficient fidelity. By bridging virtual design and physical execution, this architectural blueprint offers a viable infrastructure for engineering education.
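
The serialization, routing, and encapsulation steps can be illustrated with a small sketch. It assumes that trajectories are exchanged as JSON messages and that simulated and physical robots are registered as interchangeable endpoints behind a router. No ROS API is used here. The message fields, the `Router` class, and the endpoint names are hypothetical. In the described platform, the physical endpoint would sit behind the ROS middleware backend.

```python
import json
from typing import Callable, Dict, List

Waypoint = List[float]  # joint angles in radians, one value per axis


def serialize_trajectory(waypoints: List[Waypoint], duration_s: float) -> str:
    """Encapsulate a multi-axis trajectory as a transport-neutral JSON message."""
    return json.dumps({"type": "joint_trajectory", "duration_s": duration_s,
                       "points": waypoints})


class Router:
    """Dispatch messages to a simulated or physical endpoint by name."""

    def __init__(self) -> None:
        self.endpoints: Dict[str, Callable[[dict], str]] = {}

    def register(self, name: str, handler: Callable[[dict], str]) -> None:
        self.endpoints[name] = handler

    def send(self, name: str, message: str) -> str:
        # Deserialize once here so every endpoint receives the same structure.
        return self.endpoints[name](json.loads(message))
```

Because the visual front end only ever produces serialized messages, moving from simulation to the real arm amounts to changing the endpoint name passed to `send`. This is the hardware-agnostic property the abstract describes.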

2606.00520 2026-06-02 math.OC cs.LG stat.ML

In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise

Zijian Liu

Affiliation: Stern School of Business, New York University

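AI summary: Studies the convergence of stochastic gradient methods under heavy-tailed noise (finite p-th moment, p ∈ (1, 2)). Proves in-expectation convergence for Stochastic Mirror Descent (SMD) and Accelerated Stochastic Mirror Descent (ASMD) in convex optimization, and for SGD and SGD with momentum (SGDM) in nonconvex optimization, without algorithmic modifications or bounded-domain assumptions.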

Abstract

Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient Descent ($\textsf{SGD}$), without any modification to its update rule, can surprisingly converge in expectation for convex problems with bounded domains, highlighting the potential of classical stochastic gradient methods. Inspired by this recent progress, we provide a comprehensive study of stochastic optimization under heavy-tailed noise and establish new in-expectation convergence results for Stochastic Mirror Descent ($\textsf{SMD}$) and Accelerated Stochastic Mirror Descent ($\textsf{ASMD}$) in convex optimization, and for $\textsf{SGD}$ and Stochastic Gradient Descent with Momentum ($\textsf{SGDM}$) in nonconvex optimization. Notably, our results not only hold without algorithmic changes but also avoid restrictive assumptions, such as bounded domains, imposed in prior work. More importantly, our analysis provides a new, elegant, and powerful framework for studying heavy-tailed stochastic optimization, opening a new route to understanding first-order stochastic gradient methods.
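
A small simulation illustrates the setting. It runs vanilla SGD and SGD with momentum, left unmodified (no clipping), on a one-dimensional quadratic with symmetric Pareto-type noise whose tail index alpha lies in (1, 2), so only moments of order p < alpha are finite. The objective, step size, EMA-style momentum form, and noise model are illustrative assumptions. This toy shows the setup only. It does not reproduce the paper's analysis or rates.

```python
import random
from typing import Callable

NoiseFn = Callable[[random.Random], float]


def pareto_noise(alpha: float) -> NoiseFn:
    """Symmetric heavy-tailed noise: E|xi|^p is finite only for p < alpha."""
    def sample(rng: random.Random) -> float:
        magnitude = rng.paretovariate(alpha) - 1.0  # shift support to [0, inf)
        return magnitude if rng.random() < 0.5 else -magnitude  # zero mean by symmetry
    return sample


def sgd(x0: float, steps: int, lr: float, noise: NoiseFn,
        rng: random.Random, momentum: float = 0.0) -> float:
    """Minimize f(x) = x^2 / 2 using stochastic gradients g = x + noise.

    momentum=0.0 gives plain SGD. Otherwise the EMA-style SGDM update
    m <- beta*m + (1-beta)*g, x <- x - lr*m is used.
    """
    x, m = x0, 0.0
    for _ in range(steps):
        g = x + noise(rng)
        m = momentum * m + (1.0 - momentum) * g
        x = x - lr * m
    return x
```

For example, `sgd(5.0, 10_000, 0.01, pareto_noise(1.5), random.Random(0))` lets one observe the occasional large jumps that heavy tails cause, even though the update rule itself is unchanged.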

2606.00500 2026-06-02 cs.DS cs.LG math.ST stat.ML stat.TH

Easy, robust approximate message passing for planted spike models

Misha Ivkov, Tselil Schramm

Affiliation: Stanford University

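AI summary: For spiked matrix models with adversarial corruption, proposes an algorithm that combines spectral pre-processing with robust spectral initialization. This makes approximate message passing (AMP) robust without modification, and the algorithm outputs a vector close to the result of AMP on the uncorrupted matrix.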

Comments 32 pages

Abstract

We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor. Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that, given access only to the corrupted matrix $Y = X + E$, computes a vector $v_{\mathrm{ALG}}(Y)$ which is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$, for any of a class of AMP iterations which includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm consists of a spectral pre-processing step combined with a robust spectral initialization procedure; given these inputs, we prove that (perhaps surprisingly) AMP is robust out-of-the-box.
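
Generating an instance makes the corruption model concrete. The sketch below assumes a common spiked Wigner normalization, X = (lam/n) v v^T + W/sqrt(n), with a ±1 spike v as in Z2 synchronization. It also assumes an adversary that writes arbitrary values only on the principal submatrix indexed by a set S of size eps*n, so that Y = X + E with E supported on S x S. The scaling convention, the constant adversarial value, and the name `spiked_instance` are assumptions. The paper's robust AMP procedure itself is not reproduced.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def spiked_instance(n: int, lam: float, eps: float, seed: int = 0
                    ) -> Tuple[Matrix, Matrix, List[float], List[int]]:
    """Return (X, Y, v, S): clean matrix, corrupted matrix, spike, corrupted indices."""
    rng = random.Random(seed)
    v = [rng.choice([-1.0, 1.0]) for _ in range(n)]       # planted Z2 spike
    X = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            w = rng.gauss(0.0, 1.0) / math.sqrt(n)         # symmetric Gaussian noise
            X[i][j] = X[j][i] = lam / n * v[i] * v[j] + w
    S = sorted(rng.sample(range(n), int(eps * n)))         # eps*n corrupted indices
    Y = [row[:] for row in X]
    for i in S:
        for j in S:
            Y[i][j] = 10.0  # arbitrary adversarial entry, confined to S x S
    return X, Y, v, S
```

A robust method receives only `Y`. The guarantee in the abstract compares its output with what AMP would have produced on `X`.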

2606.00497 2026-06-02 cs.CR cs.CL

"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents

Soham Roy, Sarthakbrata Halder, Arya Bharaty, Vaibhav Bhaskar, Yash Sinha, Dhruv Kumar, Srikant Panda, Murari Mandal

Affiliations: KIIT Bhubaneshwar; BITS Pilani; Lam Research

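AI summary: Builds Scammer4U, a benchmark of 91 attacker-controlled environments and 10 benign-twin baselines, to evaluate the PII leakage risk of frontier autonomous web agents under social-engineering attacks. It finds critical-tier PII leakage rates of 54-93% and reveals a detection-action gap: agents that detect the attack still submit PII in 35.9% of sessions.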

Comments 24 pages

Abstract

Deceptive web content, widely instantiated across the internet and commonly known as social-engineering attacks, manipulates autonomous web agents into submitting users' personally identifiable information (PII) to attacker-controlled endpoints. In this paper, we show that social-engineering attacks are highly effective at extracting critical-tier PII from frontier web agents, posing a severe risk to deployed agentic systems. To quantify this risk, we introduce Scammer4U, a pre-registered benchmark of 91 attacker-controlled environments and 10 benign-twin baselines. It spans 8 attack vectors and 16 site categories on an 8-axis factorial taxonomy that isolates the causal contribution of individual attack design factors. Across frontier agents, we find that critical-tier PII leakage reaches 54-93% under no privacy guidance, compared to 0% on benign-twin baselines, confirming that leakage is attack-attributable rather than incidental form-filling. Escalating prompt-level mitigation yields sharply model-dependent reductions across the four model families and remains insufficient to reliably prevent critical PII submission at the pooled level. Most critically, we identify a detection-action gap. Even when an independent LLM judge confirms that an agent's reasoning has flagged the site as suspicious, the agent still submits critical PII in 35.9% of sessions, versus 66.1% when no suspicion is verbalized, a 30.2-percentage-point gap that is robust across all four model families. Our findings reveal that defenses conditioned on the agent's own recognition of an attack are gating on the wrong signal. This motivates output-level interception of outbound submissions that operates independently of the agent's reasoning loop.
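
The output-level interception the authors motivate can be sketched as a guard that inspects every outbound submission, regardless of what the agent's reasoning says. The field matching, the trusted-host allowlist, and the names `normalize` and `allow_submission` are hypothetical. A production guard would also need to catch encoded or partially transformed PII.

```python
from typing import Dict, Iterable, Set
from urllib.parse import urlparse


def normalize(value: str) -> str:
    # Strip spaces and dashes so "4111 1111-1111 1111" matches "4111111111111111".
    return "".join(ch for ch in value if ch not in " -").lower()


def allow_submission(url: str, form: Dict[str, str],
                     critical_pii: Iterable[str], trusted_hosts: Set[str]) -> bool:
    """Return False if critical PII would be sent to an untrusted host."""
    host = urlparse(url).hostname or ""
    if host in trusted_hosts:
        return True
    secrets = {normalize(p) for p in critical_pii}
    for value in form.values():
        v = normalize(value)
        if any(s and s in v for s in secrets):
            return False  # block regardless of the agent's own judgment
    return True
```

The guard never consults the agent's reasoning trace. This is the design point the detection-action gap argues for: a check on what actually leaves the browser cannot be bypassed by an agent that notices the scam and submits anyway.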

2606.00422 2026-06-02 cs.IR cs.LG

UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

Hanyu Li, Yi-Ping Hsu, Aditya Mantha, Prabhat Agarwal, Laksh Bhasin, Jialu Wang, Hongtao Lin, Bella Huang, Yaxin Li, Xinyi Li, Chuxi Wang, Kousik Rajesh, Hooshmand Shokri Razaghi, Shunyao Li, Zongyue Qin, Jaewon Yang, James Li, Dhruvil Deven Badani, Jiajing Xu, Charles Rosenberg

Affiliation: Pinterest

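AI summary: Proposes UniPinRec, which encodes user action sequences with a shared transformer and combines masked action modeling, blended training examples, and cross-stage KV cache sharing. It achieves the first full-stack unification of retrieval and ranking in Pinterest's production system, improving online engagement and reducing latency.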

Abstract

Modern recommendation systems predominantly train retrieval and ranking as separate models despite both increasingly relying on large transformers encoding the same user behavior data, duplicating parameters, compute, and serving cost. Prior work unifies the model architecture but not the full pipeline: input formats, training procedures, and serving stacks remain fragmented across stages. We present UniPinRec, which achieves full-stack unification of retrieval and ranking at Pinterest: one input format, one model, one training stage, deployed within existing serving infrastructure. A shared transformer encodes the user action sequence into candidate-independent representations that branch into retrieval (ANN dot-product) and ranking (cross-attention) via task-specific heads. Three ideas make this work: (1) Masked Action Modeling (MAM) eliminates interleaving, enabling weight sharing without doubling context length; (2) Blended training examples pair action sequences with feedview impression slates to satisfy both objectives jointly; (3) Cross-stage KV cache sharing reuses user-history computation from retrieval for ranking, reducing total FLOPs versus serving two independent models. Deployed in the Pinterest core surfaces, UniPinRec delivers approximately +1% online engagement lift while cutting end-to-end serving latency by 11.1% and lifting QPS by 63.6%. To our knowledge, this is the first full-stack unification of retrieval and ranking, covering inputs, model, training and serving, deployed in a production recommendation system.
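
The cross-stage sharing idea can be sketched with toy vectors. The user history is encoded once per request. A pooled, candidate-independent vector then serves retrieval by dot product, and the same cached per-action vectors serve ranking through candidate-to-history attention. Several simplifications apply: the transformer is replaced by an identity/mean-pooling stand-in, keys and values are the same vectors, and all names are hypothetical. This sketch shows only the reuse pattern, not Pinterest's implementation.

```python
import math
from typing import Dict, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


class UserEncoderCache:
    def __init__(self) -> None:
        self.cache: Dict[str, List[Vec]] = {}
        self.encode_calls = 0

    def encode(self, user_id: str, history: List[Vec]) -> List[Vec]:
        """Encode the action sequence once; later stages reuse the cached states."""
        if user_id not in self.cache:
            self.encode_calls += 1
            self.cache[user_id] = [list(h) for h in history]  # stand-in for a transformer
        return self.cache[user_id]


def retrieve_topk(states: List[Vec], items: Dict[str, Vec], k: int) -> List[str]:
    user = [sum(col) / len(states) for col in zip(*states)]  # candidate-independent
    return sorted(items, key=lambda i: dot(user, items[i]), reverse=True)[:k]


def rank_candidates(states: List[Vec], items: Dict[str, Vec], cands: List[str]) -> List[str]:
    def score(item: str) -> float:
        q = items[item]
        logits = [dot(q, s) for s in states]          # cross-attention over cached history
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]  # numerically stable softmax
        z = sum(weights)
        ctx = [sum(w * s[d] for w, s in zip(weights, states)) / z for d in range(len(q))]
        return dot(q, ctx)
    return sorted(cands, key=score, reverse=True)
```

In this toy, ranking reads the cached states instead of re-encoding the history. That is the source of the FLOP savings the abstract attributes to cross-stage KV cache sharing.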

2606.00417 2026-06-02 cs.NI cs.AI

AgentxGCore: Agentic AI for Next-Generation Mobile Core Network

Maria Katarine Santana Barbosa, Kelvin L. Dias

Affiliation: Centro de Informática, Universidade Federal de Pernambuco

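AI summary: Proposes AgentxGCore, which extends the 3GPP architecture with an Agentic AI-native layer and uses a multi-agent system to perform closed-loop optimization based on real-time information, supporting self-organization and self-adaptation.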

Comments This paper has been accepted for publication in IEEE Network

Abstract

To meet the stringent requirements of emerging applications and the increasingly complex network management and operation, Next Generation Mobile Networks (NextG), or 6G, will adopt an AI-native architecture in the Core Network (CN). As a first step toward integrating analytics, Artificial Intelligence (AI), and machine learning, the Third Generation Partnership Project (3GPP) has extended the cellular CN with new functions. However, these new functionalities are constrained by a centralized approach and managerial complexity. Furthermore, the rise of Large Language Models (LLMs) opens a new era in network orchestration and management, leveraging and empowering the Intent-based Networking (IBN) paradigm. In addition, AI agents and Agentic AI integrate Reasoning and Acting (ReAct), allowing such intents to be used to interact with the network continuously. Unlike state-of-the-art approaches that primarily employ Agentic AI to mitigate deployment and configuration complexity in the CN, this paper introduces AgentxGCore, which leverages an Agentic AI-Native layer to extend the 3GPP architecture and enable a system based on the existing APIs across the Beyond Next Generation Core (xGC) domain. This proposal establishes an AI-driven closed loop for continuous optimization based on real-time information, enabling self-organization and self-adaptation. Our approach involves a specialized multi-agent system with two roles: a network planner agent, capable of visualizing the network state and developing a plan to meet the intents, and a network executor, responsible for critiquing and executing the plan. To validate the proposed solution, an environment was built using an open-source CN and heterogeneous datasets, and different LLMs were employed to demonstrate its effectiveness.

2606.00402 2026-06-02 stat.ME cs.AI stat.AP

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

Yi Liu

Affiliation: Prorata.ai

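AI summary: Proposes a distribution-free statistical framework that turns any rewrite-based detector into one with finite-sample FDR guarantees, without retraining, by treating rewrite-based detection as a multiple hypothesis testing problem with knockoff structure.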

Abstract

We propose a distribution-free statistical framework that converts arbitrary rewrite-based detectors into detectors with finite-sample FDR guarantees without retraining. Our key observation is that rewrite-based detection implicitly constructs knockoff samples, enabling LLM-generated text detection to be formulated as a multiple hypothesis testing problem with knockoff structure. This perspective separates the design of detection statistics from the control of false discoveries, allowing existing rewrite detectors to inherit finite-sample false discovery rate (FDR) guarantees through a simple calibration procedure. We demonstrate reliable FDR control with meaningful detection power across three detection models, 19 domains, and four LLMs.
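
The calibration step can be sketched with the standard knockoff+ threshold of Barber and Candès. The sketch assumes that each text j has a statistic W_j whose sign is symmetric under the null, with large positive values counting as evidence for a discovery. One example would be a detector score on the original minus the score on its LLM rewrite. How the paper constructs W_j, and which hypothesis it treats as the discovery, are not specified here, so both are assumptions.

```python
import math
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        false_est = 1 + sum(1 for x in w if x <= -t)   # estimated false discoveries
        discoveries = max(1, sum(1 for x in w if x >= t))
        if false_est / discoveries <= q:
            return t
    return math.inf  # no threshold achieves the target: select nothing


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices selected at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The negative statistics act as a built-in estimate of how many positive ones arise by chance. This is why the detector's score design can be kept separate from false-discovery control.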

2606.00393 2026-06-02 eess.IV cs.CV

AutoIQ: An Ensemble Framework for Automatic Assessment of Geometric Distortion in Prostate Diffusion-Weighted Imaging

Haoran Sun, Lixia Wang, Yin-Chen Hsu, Hsu-Lei Lee, Chang Gao, Fei Han, Robert Grimm, Vibhas Deshpande, Ziyang Long, Hsin-Jung Yang, Rola Saouaf, Alessandro D'Agnolo, Timothy Daskivich, Hyung Kim, Debiao Li, Yibin Xie

Affiliations: Biomedical Imaging Research Institute, Cedars-Sinai Medical Center; Department of Bioengineering, University of California; Siemens Medical Solutions USA Inc.; Siemens Healthineers AG; Department of Imaging, Cedars-Sinai Medical Center; Department of Nuclear Medicine, Cedars-Sinai Medical Center; Department of Urology, Cedars-Sinai Medical Center

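AI summary: Proposes AutoIQ, an ensemble machine-learning framework that combines segmentation-based and registration-based methods to quantify DWI geometric distortion and automatically classify its severity. It reaches 0.95 accuracy on an independent test set.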

Comments Original research; 11 pages, 7 figures, 1 table

Abstract

Geometric distortion in prostate diffusion-weighted imaging (DWI) can impair lesion localization and reduce the reliability of MRI-based clinical assessment. We propose AutoIQ, an ensemble machine learning framework for automatic quantification and classification of DWI geometric distortion severity. A total of 140 retrospective prostate biparametric MRI examinations were analyzed, including 33 scans with severe distortion requiring repeat acquisition and 107 scans with acceptable distortion based on expert radiologist assessment. AutoIQ combines two complementary distortion quantification strategies: a segmentation-based method measuring prostate boundary mismatch between T2-weighted imaging (T2WI) and DWI, and a registration-based method estimating deformation magnitude after DWI-to-T2WI alignment. The resulting distortion scores were used to train individual classifiers and a logistic-regression ensemble model. Both computational methods significantly differentiated severe from acceptable distortion cases (p < 0.001). On an independent test set, the ensemble model achieved an accuracy of 0.95, F1-score of 0.93, and AUC of 0.98, outperforming individual models. These results suggest that AutoIQ can provide automated, quantitative quality assessment for prostate DWI and may help identify scans that require repeat acquisition.
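As a rough illustration of the ensemble step, the sketch below fits a plain logistic regression on two per-scan distortion scores (one segmentation-based, one registration-based) by batch gradient descent. The feature values are invented for illustration, and the paper's ensemble combines the outputs of individual classifiers whose details the abstract does not give; this is a simplified stand-in, not AutoIQ itself.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean cross-entropy loss; returns (weights, bias)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def prob_severe(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical (boundary mismatch, deformation magnitude) scores; 1 = severe distortion.
X = [(0.5, 1.0), (0.8, 1.2), (1.0, 0.9), (0.6, 1.5),
     (3.0, 4.0), (2.5, 3.5), (3.5, 5.0), (2.8, 4.2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
print(prob_severe(w, b, (3.0, 4.0)) > 0.5, prob_severe(w, b, (0.7, 1.0)) < 0.5)
```

A logistic model is a natural combiner here because its output is a calibrated-looking probability that can be thresholded to decide whether a scan should be repeated.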

2606.00370 2026-06-02 cs.HC cs.AI

Agentic Authoring of Interactive Multiview Visualizations in Genomics

Astrid van den Brandt, Kiroong Choe, Sehi L'Yi, Devin Lange, Nils Gehlenborg

Affiliations: Harvard Medical School; Boston College

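AI summary: Addresses limited customization and high programming barriers in authoring genomics visualizations with an LLM-based agentic approach that improves visualization quality through structured output and iterative refinement.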

Comments 11 pages, 12 figures

Abstract

Diverse genomics data, scientific questions, and analysis tasks typically demand highly specialized visualizations. Therefore, users often must customize or author new ones tailored to their data. Existing tools are usually either limited in customization or require substantial learning or programming, and even expressive tools assume visualization expertise many users lack. Agentic and large language model (LLM) approaches are increasingly applied to complex scientific tasks, including visualization. Natural-language conversational interfaces offer a promising path to democratizing the authoring of complex visualizations. In the context of genomics, these approaches face additional challenges: genomics visualizations typically integrate heterogeneous data types and are composed of multiple linked interactive views. These challenges motivate more structured LLM-based schemes. We first characterize where vanilla LLM generation succeeds and fails for genomics visualization, identifying eight quality dimensions. We then compare six schemes (direct generation, a fixed pipeline, and four agentic configurations varying in the number of specialist agents and the presence of a reviewer) across 159 cases spanning three levels of query ambiguity and specification complexity. All schemes use the Gosling visualization grammar as structured output. Agentic iteration substantially improves perceived quality over both baselines, while more complex agent architectures yield no additional benefit. We discuss implications for designing agentic systems for domain-specific visualization authoring. All supplemental materials are available at https://osf.io/uqe83.

2606.00369 2026-06-02 cs.CY cs.LG

Quantifying the Salience of Geo-Cultural Values for Pluralistic Safety Alignment

Arkadiy Saakyan, Charvi Rastogi, Lora Aroyo

Affiliations: University of Oxford; University of Cambridge

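AI summary: Multilevel modeling shows that cultural zone membership has a significant effect on safety ratings (p<0.05); about 10% of items are culturally sensitive; current LLMs cannot reliably replace human raters but can assist with triage.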

Comments 119 pages, 13 figures. ICML 2026 camera ready

Abstract

Safe global deployment of AI models requires alignment with human values that vary across cultures. Yet rater pools in safety evaluation datasets remain largely geographically homogeneous, failing to capture geo-cultural differences. Further, it remains unclear whether such differences persist after controlling for demographics such as age, gender, and ethnicity. Through a meta-analysis of safety datasets, we find that most do not report geo-cultural information, and those that do lack a unified methodology to jointly analyze geo-cultural and demographic correlates. Using the Inglehart-Welzel dimensions of cross-cultural variation, we demonstrate via multilevel modeling that cultural zone membership explains variance in safety ratings beyond standard demographics (p<0.05 across 6 datasets). Moreover, our analysis indicates that roughly 10% of items in the datasets we examined are culturally sensitive: likely to be misclassified as safe without adequate cultural representation. We evaluate LLMs as both rater surrogates and triage tools, finding that current LLMs do not reliably stand in for raters, though they can help prioritize culturally sensitive items for human annotation. Our findings motivate more culturally pluralistic safety evaluation and offer practical takeaways to support it.

2606.00327 2026-06-02 stat.ME cs.LG stat.AP stat.ML

Cluster Analysis with Resampling for Validation and Exploration (CARVE)

Kai R. Wycik, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

Affiliations: Department of Statistics, Columbia University, New York, NY, USA; Center for Theoretical Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA; School of Data and Information Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA

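AI summary: Presents CARVE, an open-source package that uses resampling to assess clustering stability and generalizability, providing diagnostics at the global, cluster, and sample levels and outperforming classical clustering validation indices.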

Abstract

Clustering is widely used across the sciences as the foundation for downstream data-driven scientific discoveries. However, clustering results are highly sensitive to the choice of algorithm, preprocessing, and the number of clusters $k$, producing scientific claims that are often not reproducible. The current state of the art for validating clustering solutions consists of clustering validation indices (CVIs) such as Silhouette, Davies-Bouldin, and Calinski-Harabasz, which rely on geometric assumptions that break down on the heavy-tailed, high-dimensional, and nonlinearly structured data encountered in biomedical research. Resampling-based alternatives, grounded in the ideas of clustering stability and generalizability, have been proposed but remain scattered across specialized tools with no unified, accessible software. We fill this gap with CARVE (Cluster Analysis with Resampling for Validation and Exploration), an open-source Python and R package that jointly evaluates multiple clustering algorithms and hyperparameters, returning stability and generalizability diagnostics at the global, cluster, and sample levels, together with principled selection rules and consensus-based cluster labels. Across six synthetic benchmarks, CARVE consistently recovers near-optimal clusterings where classical indices degrade substantially. On experimental genomics and proteomics datasets, CARVE recovers finer biological structure where classical CVIs collapse entirely. CARVE is available with a scikit-learn-compatible Python API and an analogous R interface compatible with Seurat workflows.
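The resampling idea can be made concrete with a small sketch: recluster random subsamples and measure how often pairs of points keep the same co-assignment as in the full-data clustering, using the (unadjusted) Rand index, which ignores label permutations. The function names, the subsampling fraction, and the choice of the Rand index are assumptions for illustration; CARVE's own diagnostics and API are not reproduced here.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree (same cluster vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


def resampling_stability(data, cluster_fn, n_resamples=20, frac=0.8, seed=0):
    """Mean Rand index between the full-data clustering and clusterings of random subsamples."""
    rng = random.Random(seed)
    reference = cluster_fn(data)
    m = max(2, int(frac * len(data)))
    scores = []
    for _ in range(n_resamples):
        idx = sorted(rng.sample(range(len(data)), m))
        sub_labels = cluster_fn([data[i] for i in idx])
        scores.append(rand_index([reference[i] for i in idx], sub_labels))
    return sum(scores) / len(scores)


def split_at_five(xs):
    # Hypothetical stand-in for a real clustering algorithm.
    return [0 if x < 5 else 1 for x in xs]


print(resampling_stability([1, 2, 3, 8, 9, 10], split_at_five))  # 1.0
```

A deterministic rule on well-separated data is perfectly stable; an algorithm with random initialization or a poorly chosen $k$ would score lower, which is the signal a stability-based selection rule exploits.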

2606.00324 2026-06-02 cs.IR cs.AI

LLMs Need Encoders for Semantic IDs Too

Xiangyi Chen, Zelun Wang, Xinyi Li, Yi-Ping Hsu, Jaewon Yang, Jiajing Xu

Affiliations: Pinterest, United States

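AI summary: Proposes PrefixMem, a lightweight Semantic ID encoder based on prefix n-gram memory tables that gives the LLM structured, prefix-conditioned representations, substantially improving Semantic ID accuracy and retrieval recall in generative recommendation.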

Abstract

Multimodal LLMs use dedicated encoders to bridge non-language modalities (vision encoders for images, depth models for audio codec tokens) because raw token embeddings alone cannot capture modality-specific structure. We argue that Semantic IDs (SIDs), the hierarchical codes used in generative recommendation, constitute another such modality: the meaning of a token at a given SID level depends on its prefix context, yet current systems simply add SID tokens to the vocabulary and rely on training to learn these context-dependent meanings from scratch. We propose PrefixMem, a lightweight SID encoder based on prefix n-gram memory tables that provides the LLM with structured, prefix-conditioned representations at SID token positions. Like vision encoders in multimodal LLMs, PrefixMem can be pre-trained independently and then attached to any LLM for joint training. We evaluate on large-scale data from Pinterest across multiple LLM families and show that PrefixMem improves deepest-level SID accuracy by up to 46% relative and full-SID retrieval recall by up to 22% relative at matched training compute. The encoder's benefit concentrates on hard examples where greedy decoding fails, with up to 77% relative accuracy gains, confirming that SID tokens benefit from a dedicated encoder just as other non-language modalities do.
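To make "prefix-conditioned" concrete, here is a toy lookup in which the vector at each SID level sums memory entries keyed by the n-grams of codes ending at that level (and by the level itself), so the same code at the same level gets a different representation under a different prefix. This is an assumed, simplified reading of a prefix n-gram memory table for illustration only: the class name, the lazily initialized dictionary in place of a fixed-size trainable table, and the summation are all hypothetical, not PrefixMem's actual design.

```python
import random


class ToyPrefixMemory:
    """Hypothetical prefix n-gram memory: one random vector per (level, n, n-gram) key."""

    def __init__(self, dim, max_n=2, seed=0):
        self.dim = dim
        self.max_n = max_n
        self.rng = random.Random(seed)
        self.table = {}

    def _lookup(self, key):
        # Lazily create an entry the first time a key is seen (a real system would train these).
        if key not in self.table:
            self.table[key] = [self.rng.gauss(0.0, 0.02) for _ in range(self.dim)]
        return self.table[key]

    def encode(self, sid):
        """Return one vector per SID level, summing the n-gram entries that end at that level."""
        out = []
        for pos in range(len(sid)):
            vec = [0.0] * self.dim
            for n in range(1, min(self.max_n, pos + 1) + 1):
                key = (pos, n) + tuple(sid[pos + 1 - n: pos + 1])
                vec = [v + e for v, e in zip(vec, self._lookup(key))]
            out.append(vec)
        return out


memory = ToyPrefixMemory(dim=4)
a = memory.encode((3, 7, 1))
b = memory.encode((5, 7, 1))
# Same level-1 code (7), different prefix (3 vs 5): the level-1 vectors differ.
print(a[1] != b[1])  # True
```

With max_n=2 the level-2 vectors of these two IDs coincide, because their last two codes match; a larger max_n would separate them too, which shows how the n-gram order controls how much prefix context each position sees.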

2606.00312 2026-06-02 math.NA cs.LG cs.NA

Stochastic Rounding Increases Small Singular Values

Linkai Ma, Tingzhou Yu, Petros Drineas

Affiliations: Department of Computer Science, Purdue University; Department of Mathematics, University of Alberta

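AI summary: Proves that stochastic rounding, as a quantization scheme for low-precision floating-point arithmetic, lifts the tail cluster of singular values not only for matrices with extreme aspect ratios but also for matrices with constant aspect ratio, so it acts as a spectral regularizer more broadly.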

Abstract

Over the past half-dozen years, stochastic rounding (SR) has regained significant attention as a quantization scheme for low-precision floating-point arithmetic, with applications spanning numerical analysis and modern machine learning systems. Recent work has shown that SR acts as an implicit regularizer by increasing the smallest singular value of extremely tall-and-thin (or, symmetrically, short-and-fat) matrices. In this work, we substantially sharpen and extend this understanding in two directions. First, we show that the regularization effect of SR is not restricted to extreme aspect ratio regimes: it persists for matrices with constant aspect ratio. Second, we demonstrate that SR does not merely regularize the smallest singular value, but instead lifts entire clusters of singular values at the tail of the spectrum. Together, these results provide a more general characterization of stochastic rounding as a spectral regularizer, revealing that its effects extend beyond extremal aspect ratios and act on a broader portion of the singular value spectrum.
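A quick numerical illustration of the simplest case: stochastically rounding an exactly rank-deficient tall matrix (two identical columns) to a coarse grid makes its smallest singular value clearly nonzero, because the two copies of each entry are rounded independently. The grid spacing and matrix sizes are arbitrary choices for the demo, and the function name is illustrative; this shows the qualitative effect only, not the paper's constant-aspect-ratio or tail-cluster results.

```python
import numpy as np


def stochastic_round(x, step, rng):
    """Round each entry to a multiple of `step`, rounding up with probability equal to
    the fractional distance to the lower grid point (so the result is unbiased)."""
    scaled = np.asarray(x, dtype=float) / step
    lower = np.floor(scaled)
    round_up = rng.random(scaled.shape) < (scaled - lower)
    return (lower + round_up) * step


rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
A[:, 4] = A[:, 3]  # duplicate a column: smallest singular value is (numerically) zero

s_min_before = np.linalg.svd(A, compute_uv=False)[-1]
s_min_after = np.linalg.svd(stochastic_round(A, 2.0 ** -4, rng), compute_uv=False)[-1]
print(f"{s_min_before:.2e} -> {s_min_after:.2e}")
```

Round-to-nearest would map both identical columns to the same values and leave the matrix singular; the independent random choices are what break the exact dependence.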

2606.00308 2026-06-02 cs.SE cs.AI cs.LG

How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval

Nazmus Ashrafi

Affiliations: GitHub

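AI summary: Paired experiments comparing code complexity across six multi-agent architectures on HumanEval find no positive relationship between architectural complexity and functional correctness; the simplest architectures match or exceed the more complex ones in accuracy.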

Comments 16 pages, 7 figures, 7 tables

Abstract

Large-language-model code generation has shifted from single-shot prompting to multi-agent orchestrations (analyst, coder, tester, and debugger pipelines) and is evaluated almost exclusively on functional correctness. Whether these architectures also affect the structural complexity of the code they produce, and which orchestration layers carry the cost, remains largely unexamined: prior work has documented prompt-level effects on code complexity, but the architecture-level question is open. We compare six widely used multi-agent configurations (Basic, AC, ACT, Debugger, AC+Debugger, ACT+Debugger) under two models from the GPT-4o family across all 164 HumanEval tasks (1,968 paired observations), using the five RADON complexity metrics (SLOC, cyclomatic complexity, and Halstead Volume, Difficulty, and Effort). We apply a paired non-parametric statistical pipeline (Friedman omnibus test, Wilcoxon signed-rank post-hoc tests with Holm correction, and Kendall's $W$ and matched-pairs rank-biserial effect sizes) in both all-completions and passing-only conditions. The six architectures collapse into two complexity clusters, statistically indistinguishable within each cluster and separated by a 50-130% gap; the partition is the same in both models and under both conditions. Among the architectural layers, the analyst-coder split inflates complexity; the runtime debugger does not, and on top of the analyst-coder split it actively deflates complexity; and the tester re-inflates it. The heavy cluster's additional complexity buys no pass@1 advantage: the leanest architectures match or beat the heaviest on accuracy. Architectural elaboration in LLM code generation should therefore be justified by measured benefit on the dimensions that matter, not assumed.
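Two pieces of the paired pipeline described above are short enough to sketch in plain Python: Holm's step-down adjustment of post-hoc p-values and the matched-pairs rank-biserial correlation used as an effect size. These are textbook definitions written for illustration (zero differences are dropped and tied absolute differences receive average ranks); the function names are illustrative, in practice one would typically rely on a statistics library, and the paper's exact implementation is not described in the abstract.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - step) * pvalues[i]))
        adjusted[i] = running_max  # enforce monotonicity across the sorted sequence
    return adjusted


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_biserial(x, y):
    """Matched-pairs rank-biserial correlation (T+ - T-) / (T+ + T-) for paired samples x, y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        return 0.0
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (t_plus - t_minus) / (t_plus + t_minus)


# Hypothetical post-hoc p-values for three architecture pairs.
print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```

The rank-biserial value ranges from -1 to 1: it is 1 when every nonzero paired difference favors the first condition, which makes it a natural companion to the Wilcoxon signed-rank test built on the same ranks.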

2606.00298 2026-06-02 math.NA cs.LG cs.NA cs.SY eess.SY math.DS math.OC

Symmetric Hermite quadrature-based balanced truncation for learning linear dynamical systems from derivative data

Sean Reiter, Steffen W. R. Werner

Affiliations: New York University; Virginia Tech

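AI summary: Proposes a symmetric Hermite formulation of quadrature-based balanced truncation that builds linear reduced-order models from transfer-function and derivative data while preserving state-space Hermiticity and asymptotic stability.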

Comments 14 pages, 2 figures, 4 tables

Abstract

Data-driven reduced-order modeling is an essential component in the computer-aided design of control systems. In this work, we present a novel symmetric Hermite formulation of the quadrature-based balanced truncation algorithm that constructs linear reduced-order models from evaluations of the full-order system's transfer function and its derivative. Significantly, the Hermite formulation preserves desirable qualitative properties of the system used to generate the data, such as state-space Hermiticity and, consequently, asymptotic stability.

2606.00297 2026-06-02 eess.SY cs.RO cs.SY

Predicted-Flow Control Barrier Functions for Real-Time Safe Optimal Control

Amirsaeid Safari, Jesse B. Hoagg

Affiliations: Department of Mechanical and Aerospace Engineering, University of Kentucky

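AI summary: Introduces predicted-flow control barrier functions (P-CBFs), which generalize CBFs to functionals of a predicted flow and, combined with a terminal candidate and a planning-time shift, provide a safety certificate over a finite prediction horizon while unifying finite-horizon integral-cost optimization with safety certification.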

Abstract

Control barrier functions (CBFs) provide real-time safety guarantees through pointwise conditions on the state. However, synthesizing a valid CBF is difficult and the resulting controllers are myopic. To address myopia, this article introduces predicted-flow control barrier functions (P-CBFs), which generalize the CBF from a function of the current state to a functional of a predicted flow under a parametrized control plan over a finite prediction horizon. For safety, a P-CBF can certify that the predicted flow is in a safe set over the entire prediction horizon. However, candidate P-CBFs suffer from the same challenge as candidate CBFs, namely, control constraints make it difficult to guarantee that the P-CBF is valid. This article resolves this challenge by introducing a terminal candidate P-CBF requiring that the predicted flow end in a backup safe set at the terminal time, and a planning-time shift that modulates the prediction horizon, providing an additional degree of freedom to ensure feasibility. The real-time control and the evolution of the control-plan parameter and planning-time shift are determined jointly by a single convex optimization that is guaranteed to be feasible and renders the associated safe set forward invariant. The resulting safe optimal flow control provides a safety certificate over the entire prediction horizon and unifies finite-horizon integral-cost optimization with safety certification. This optimization reduces to a quadratic program (QP) if the control constraints are a convex polytope. The QP implementation, termed FlowBarrier, is validated on a nonholonomic ground robot navigating a dense environment. FlowBarrier is compared to nonlinear model predictive control and two CBF-based safety filter methods across 100 trials, where FlowBarrier achieves the highest goal-reaching rate, zero safety violations, and the lowest computation time.
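For context, the classical pointwise CBF safety filter that P-CBFs generalize can be written in closed form for a single-integrator robot ($\dot{x} = u$) avoiding one circular obstacle with barrier $h(x) = \lVert x - c \rVert^2 - r^2$: the filter minimally changes a reference input so that $\nabla h(x)^\top u \ge -\alpha h(x)$. This is the standard single-constraint projection, assumed here without input bounds, and the function name is illustrative; it is not FlowBarrier, and it exhibits exactly the myopia the abstract describes, since it only looks at the current state.

```python
def cbf_filter(x, u_ref, center, radius, alpha=1.0):
    """Closed-form CBF-QP for x' = u with h(x) = ||x - center||^2 - radius^2:
    minimize ||u - u_ref||^2 subject to grad_h(x) . u >= -alpha * h(x)."""
    grad_h = [2.0 * (xi - ci) for xi, ci in zip(x, center)]
    h = sum((xi - ci) ** 2 for xi, ci in zip(x, center)) - radius ** 2
    slack = sum(g * u for g, u in zip(grad_h, u_ref)) + alpha * h
    if slack >= 0:
        return list(u_ref)  # reference input already satisfies the barrier condition
    norm_sq = sum(g * g for g in grad_h)
    if norm_sq == 0:
        raise ValueError("barrier gradient vanishes at the obstacle center")
    # Project u_ref onto the half-space boundary grad_h . u = -alpha * h.
    return [u - slack / norm_sq * g for u, g in zip(u_ref, grad_h)]


# Robot at (2, 0) commanded straight at a unit-radius obstacle centered at the origin.
print(cbf_filter([2.0, 0.0], [-5.0, 0.0], center=[0.0, 0.0], radius=1.0))  # [-0.75, 0.0]
```

Once input bounds are added, this single projection may have no feasible solution, which is the validity problem that the terminal candidate P-CBF and the planning-time shift are introduced to resolve.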

2606.00296 2026-06-02 stat.ML cs.LG math.AP

Is Zero-Shot Super-Resolution Possible in Operator Learning?

Unique Subedi, Ambuj Tewari

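AI summary: A systematic study of zero-shot super-resolution in operator learning that shows it can be information-theoretically impossible, identifies Hölder smoothness of the output functions as a sufficient condition, and derives generalization bounds.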

Abstract

Neural operators are often reported to exhibit zero-shot super-resolution, a phenomenon in which a model trained on coarse grids produces accurate predictions on finer testing grids without additional retraining. Despite strong empirical evidence, the theoretical foundations of this phenomenon remain unclear. In this work, we provide a systematic theoretical study of zero-shot super-resolution in operator learning. We first show that zero-shot super-resolution can be information-theoretically impossible even in benign settings such as when the input functions are available over the entire continuum and the ground truth is a simple rank-one linear operator. We then identify Hölder smoothness of the output functions as a sufficient condition for zero-shot super-resolution and derive corresponding generalization bounds. Finally, we also validate the identified failure modes through experimental results.

2606.00291 2026-06-02 cs.GT cs.LG

The Representation-Rationalizability Tradeoff in Reward Learning

Jing Dong, Yaoliang Yu, Pascal Poupart

Affiliations: Vector Institute; University of Waterloo

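AI summary: Studies the tradeoff between representation and rationalizability in reward learning for RLHF; decomposing the excess cross-entropy loss into a representational term and an aggregation term shows that richer representations expose more comparisons that cannot be rationalized, and that joint training does not automatically reach the optimal balance.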

Abstract

In RLHF, each training example contains a prompt $x$ and two candidate responses $y,y'$, and annotators provide pairwise preferences between these responses. The learning problem is to convert these heterogeneous pairwise judgments into a single scalar reward $r(x,y)$ that measures response quality for each prompt. Classical social choice implies an impossibility because heterogeneous annotator samples can induce pooled preferences with Condorcet cycles, so no scalar reward can evaluate all compared response pairs consistently. A growing literature analyzes RLHF as a social-choice problem, but usually assumes a fixed finite set of alternatives, i.e., a pre-enumerated finite set of candidate responses for each prompt. Modern pipelines instead score responses through a learned representation $\phi(x,y)$ before a scalar head, so $\phi$ determines which responses are treated as distinguishable alternatives and which comparisons are visible to the reward model. Once this embedding is part of the problem, the impossibility results from social choice theory become a tradeoff. We show that the excess cross-entropy loss of any reward built on $\phi$ decomposes exactly into a representational term, which a richer $\phi$ shrinks, and an aggregation term, which a richer $\phi$ enlarges by exposing more comparisons that no scalar can rank consistently. The same results extend to direct preference optimization (DPO), and jointly training the embedding and the reward is not guaranteed to recover the sweet spot of this tradeoff. Experiments on synthetic data and real preference datasets corroborate our results.
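The social-choice obstacle mentioned above can be checked by brute force on a toy example: pooling three annotators whose rankings form a Condorcet cycle yields pairwise majorities that no scalar reward (equivalently, no strict ordering of the responses) can reproduce. The helper names are hypothetical, and this illustrates only the classical impossibility, not the paper's loss decomposition.

```python
from itertools import combinations, permutations


def pooled_majority(rankings):
    """Return the set of (winner, loser) pairs preferred by a strict majority of rankings."""
    items = rankings[0]
    wins = set()
    for a, b in combinations(items, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
        if 2 * a_over_b > len(rankings):
            wins.add((a, b))
        elif 2 * a_over_b < len(rankings):
            wins.add((b, a))
    return wins


def scalar_reward_exists(wins, items):
    """Check whether some strict ordering (i.e. some scalar score) agrees with every pooled preference."""
    for order in permutations(items):
        position = {item: i for i, item in enumerate(order)}
        if all(position[w] < position[l] for w, l in wins):
            return True
    return False


# Three annotators with cyclically shifted rankings of responses A, B, C.
cyclic = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(pooled_majority(cyclic))                            # A>B, B>C, C>A
print(scalar_reward_exists(pooled_majority(cyclic), cyclic[0]))  # False
```

If the representation merged two of these responses into one alternative, the offending comparison would disappear from view, which is the intuition behind the representation side of the tradeoff.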

2606.00282 2026-06-02 cs.IR cs.AI

Synthetic Data from Cross-Domain Events for Large-Scale Recommendation Systems

Xiangyu Wang, Yawen He, Shivendra Pratap Singh, Han Huang, Mengtong Hu, Sharath Ciddu, Yi-Hsuan Hsieh, Erik Groving, Yi Ding, Jieming Di, Tony Wang, Min Yun, Xiaoyu Chen, Ling Leng, Rob Malkin

Affiliations: Meta

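AI summary: Proposes SCALR, a framework that generates synthetic target-domain user-item interaction events from source-domain events to mitigate data sparsity and noisy feedback, with significant gains in online A/B tests on an industrial recommendation platform.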

Comments 13 pages, 3 figures

Abstract

Large-scale recommendation systems operate across diverse domains, yet they face the challenges of data sparsity and noisy implicit feedback. Traditional approaches mitigate this via model-specific knowledge distillation from source domains to a target domain. Inspired by the transformative success of synthetic data generation in large language models (LLMs), we introduce Synthetic Cross-domain Augmentation and Learning for Recommendation (SCALR), a framework that generates synthetic user-item interaction events for a target recommendation domain by leveraging observed events from a source domain. SCALR decomposes cross-domain learning into two modular stages. First, it translates observed user events in source domains by framing event generation as estimating the likelihood that a user would interact with a target-domain item, conditioned on their observed interactions in a source domain. Second, downstream models train on these synthetic events as cross-domain learning objectives, where the synthetic events augment the target domain's training data in a model-agnostic manner. Our approach yields statistically significant improvements in online A/B tests on an industrial recommendation platform. To the best of our knowledge, this is among the first works to explicitly frame cross-domain event transfer as synthetic data generation for recommendation systems.

2606.00281 2026-06-02 physics.ao-ph cs.LG

Flow Matching for Convective-Scale Precipitation Downscaling

Tom Wetherell

Affiliations: Met Office

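AI summary: Proposes a flow matching generative model for convective-scale precipitation downscaling that outperforms a diffusion model in spatial skill but underestimates the upper tail of the precipitation distribution, producing a dry bias in the climatological mean.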

Abstract

Generative machine learning is an increasingly important complement to dynamical downscaling for producing high-resolution precipitation projections, with diffusion models currently the leading approach. Flow matching is a related generative framework that has recently achieved strong results across image, video and other domains, and shown early promise for downscaling. We train a flow matching model to map daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, and benchmark it against CPMGEM, a score-based diffusion model. Flow matching achieves consistently better spatial skill: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. However, flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
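Since the comparison hinges on the fractions skill score, a minimal sketch of it may help: threshold both fields, convert each to the fraction of exceeding cells in a square neighbourhood, and compare the fraction fields with a normalised mean squared error. This follows the usual FSS definition; the function names are illustrative, and the edge handling (truncated windows at the domain boundary) is an assumption, since operational implementations may pad or restrict the domain instead.

```python
def neighbourhood_fractions(field, threshold, half_width):
    """Fraction of cells at or above `threshold` in a (2*half_width+1)^2 window around each cell."""
    rows, cols = len(field), len(field[0])
    binary = [[1.0 if v >= threshold else 0.0 for v in row] for row in field]
    fractions = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            r0, r1 = max(0, i - half_width), min(rows, i + half_width + 1)
            c0, c1 = max(0, j - half_width), min(cols, j + half_width + 1)
            window = [binary[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            fractions[i][j] = sum(window) / len(window)
    return fractions


def fractions_skill_score(forecast, observed, threshold, half_width):
    """FSS = 1 - MSE(Pf, Po) / (mean(Pf^2) + mean(Po^2)); 1 is perfect, 0 is no skill."""
    pf = [v for row in neighbourhood_fractions(forecast, threshold, half_width) for v in row]
    po = [v for row in neighbourhood_fractions(observed, threshold, half_width) for v in row]
    n = len(pf)
    mse = sum((a - b) ** 2 for a, b in zip(pf, po)) / n
    reference = (sum(a * a for a in pf) + sum(b * b for b in po)) / n
    return 1.0 - mse / reference if reference > 0 else float("nan")


observed = [[0, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
forecast = [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # displaced by one cell
for hw in (0, 1):
    print(hw, round(fractions_skill_score(forecast, observed, threshold=1.0, half_width=hw), 3))
```

With half_width=0 the displaced rain cell scores zero; widening the neighbourhood to half_width=1 gives partial credit, which is why FSS is reported across several thresholds and neighbourhood scales.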

2606.00266 2026-06-02 cs.NI cs.LG

KISS: Keeping it Simple and Slotted when Learning to Communicate over Wireless

Kamil Szczech, Maksymilian Wojnar, Krzysztof Rusek, Katarzyna Kosek-Szott, Szymon Szott

Affiliations: AGH University of Krakow

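AI summary: Trains distributed agents on a slotted channel with an off-policy Double Deep Q-Network and Bayesian inference to learn random-access strategies autonomously, achieving near-theoretical efficiency with fair access; the learned behavior resembles slotted ALOHA with a dynamically adjusted transmission probability.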

Abstract

A long-standing challenge in distributed wireless systems is ensuring efficient and fair random channel access. Existing solutions often address specific constraints related to timing, periodicity, or centralization, but they typically rely on fixed heuristics. Motivated by recent advances in machine learning (ML), we investigate whether ML agents can autonomously learn efficient and fair access strategies, and whether such learning can offer new insights into medium access control (MAC) design. Rather than proposing a deployable protocol, our aim is to examine whether decentralized learning can rediscover or approximate theoretically efficient random-access mechanisms under minimal assumptions. To this end, we deploy an off-policy Double Deep Q-Network (DDQN) with Bayesian inference to train agents operating over a slotted channel. The resulting method is fully online (no pre-training), fully distributed (independent multi-agent learners), stochastic (non-periodic), and requires no coordination or explicit communication. Extensive simulations show that the learned strategy adapts to varying network conditions and achieves near-theoretical efficiency while maintaining fairness. Ablation studies further reveal that the learned behavior resembles slotted ALOHA with a dynamically adjusted transmission probability, leading us to refer to the method as KISS: Keeping It Simple and Slotted.
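The learned behavior is reported to resemble slotted ALOHA, whose throughput has a simple closed form: with $n$ saturated stations each transmitting independently with probability $p$, a slot succeeds only when exactly one station transmits, so the throughput is $S = n p (1-p)^{n-1}$, maximized at $p = 1/n$ and approaching $1/e$ as $n$ grows. The sketch below checks this formula against a small Monte Carlo simulation; the function names are illustrative, and this is background for the comparison rather than the DDQN agent itself.

```python
import random


def aloha_throughput(n, p):
    """Expected successful transmissions per slot for n saturated stations transmitting w.p. p."""
    return n * p * (1 - p) ** (n - 1)


def simulate_aloha(n, p, slots=50_000, seed=0):
    """Monte Carlo estimate: a slot succeeds when exactly one station transmits."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(slots):
        transmitters = sum(rng.random() < p for _ in range(n))
        successes += transmitters == 1
    return successes / slots


n = 10
print(round(aloha_throughput(n, 1 / n), 3))  # 0.387
print(round(simulate_aloha(n, 1 / n), 3))
```

Because the optimal probability depends on the number of active stations, a fixed $p$ is only optimal for one network size; adjusting $p$ as conditions change is what lets a slotted-ALOHA-like policy stay near this optimum.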

2606.00263 2026-06-02 eess.SP cs.LG

ReFLEX: Length-Generalizable CSI Denoising for MIMO-OFDM via Relative-Frequency Bias

Zhibin Zhang, Robert Potekhin, Ziwei Wan, Vladimir Lyashev, Zhen Gao

Affiliations: Moscow Institute of Physics and Technology (State University); Yangtze Delta Region Academy; Beijing Institute of Technology; School of Interdisciplinary Science

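AI summary: Proposes ReFLEX, a length-generalizable Transformer with a relative-frequency position bias (RFPB) for CSI denoising in MIMO-OFDM systems with variable RB allocations; a single model handles unseen RB lengths and sparse DM-RS without retraining and substantially improves performance in 3GPP channel and NR PUSCH simulations.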

Comments 5 pages, 3 figures, submitted to IEEE journal

Abstract

This letter studies CSI denoising for MIMO-OFDM with variable NR resource block (RB) allocations. ReFLEX is a length-generalizable Transformer whose frequency attention uses a relative-frequency position bias (RFPB) generated from subcarrier offsets. A single checkpoint handles unseen RB lengths and can be applied to sparse DM-RS observations in the tested RB5/RB10 PUSCH setup without retraining. In a 3GPP TR 38.901 UMa NLOS channel, ReFLEX achieves about -9.6 dB NMSE on unseen RB lengths. In NR PUSCH/UL-SCH simulations, ReFLEX denoising followed by time-frequency interpolation reduces the 10% BLER threshold by about 2-3 dB.
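The length-generalization argument can be illustrated with a toy frequency attention whose additive bias depends only on the offset between subcarriers, so the same function applies to any number of subcarriers. Here the bias is assumed to be a fixed linear penalty on the absolute offset (an ALiBi-style choice made purely for illustration), and the function names and the slope value are hypothetical; ReFLEX generates its RFPB from subcarrier offsets, but its exact parametrization and any learned components are not described in the abstract.

```python
import numpy as np


def relative_offset_bias(num_subcarriers, slope=0.1):
    """Hypothetical bias: -slope * |k - l| for subcarrier indices k, l (any length works)."""
    idx = np.arange(num_subcarriers)
    return -slope * np.abs(idx[:, None] - idx[None, :])


def frequency_attention(x, slope=0.1):
    """Single-head self-attention across subcarriers with an additive relative-offset bias.
    x has shape (num_subcarriers, d); queries, keys, and values are x itself for brevity."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d) + relative_offset_bias(n, slope)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


rng = np.random.default_rng(0)
for rb in (5, 10, 17):  # 12 subcarriers per NR resource block
    out, attn = frequency_attention(rng.standard_normal((12 * rb, 8)))
    print(rb, out.shape)
```

Because the bias is a function of the offset rather than a table of absolute positions, no parameter depends on the allocation length, which is the property that lets one checkpoint serve RB lengths it was not trained on.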

2606.00235 2026-06-02 physics.soc-ph cs.AI cs.CY cs.MA

Civilizational Metamaterials: Engineering Coordination Under Capability Gradients and Structural Turbulence

David Orban

Affiliations: Independent Researcher

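AI summary: Inspired by metamaterial physics, proposes a formal framework for turning governance from a normative into an engineering discipline, uses an effective coordination coefficient model to predict a phase transition between self-healing and self-destabilizing regimes, and designs testable hypotheses and an experimental protocol.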

Comments 19 pages, 4 figures. Accepted for presentation at AGI-26 (Springer LNAI, forthcoming). v2 corrects the sign of the synergy term in the constitutive law (Eq. 2) and reformulates H3 as a threshold-crossing claim, per peer review

Abstract

We argue that governance must transition from a normative discipline to an engineering discipline, and we develop a formal framework, inspired by the physics of metamaterials, to make this transition quantitative and testable. Artificial General Intelligence affects civilization primarily by increasing decision velocity while human verification capacity remains bounded. When the cost of validating AI-generated outputs exceeds the expected utility of acting on them, rational agents default to inaction: a stable but catastrophic Nash equilibrium we term the Freezing Equilibrium. Drawing on metamaterials, where emergent macro-properties arise from designed microstructure, we develop a phenomenological constitutive law for institutional coordination: $R_{\mathrm{eff}} = \beta \cdot (1-\rho) \cdot (1-\tau) \cdot (1-\gamma\rho\tau)$, where $\beta$ is the decision branching factor, $\rho$ is provenance fidelity, $\tau$ is the verification rate, and $\gamma \in [0,1]$ captures correlated-detection synergy between provenance and verification failures. The model predicts a sharp phase transition between self-healing ($R_{\mathrm{eff}} < 1$) and self-destabilizing ($R_{\mathrm{eff}} > 1$) regimes. We introduce a three-class provenance taxonomy (cryptographic, institutional, and context binding) and derive four falsifiable hypotheses with a proposed 12-week stepped-wedge cluster-randomized trial in government grant review panels. The framework bridges AI alignment theory and institutional design.
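The constitutive law above is easy to explore numerically. The sketch evaluates $R_{\mathrm{eff}}$ and, because $R_{\mathrm{eff}}$ is non-increasing in $\tau$ on $[0,1]$ whenever $\gamma\rho \le 1$ (guaranteed by the stated range of $\gamma$, assuming $\rho \in [0,1]$), finds by bisection the critical verification rate at which the system crosses from the self-destabilizing to the self-healing regime. The function names and parameter values are arbitrary examples for illustration, not estimates from the paper.

```python
def r_eff(beta, rho, tau, gamma):
    """Effective coordination coefficient R_eff = beta (1 - rho) (1 - tau) (1 - gamma rho tau)."""
    return beta * (1 - rho) * (1 - tau) * (1 - gamma * rho * tau)


def regime(beta, rho, tau, gamma):
    r = r_eff(beta, rho, tau, gamma)
    return "self-healing" if r < 1 else "self-destabilizing" if r > 1 else "critical"


def critical_tau(beta, rho, gamma, tol=1e-12):
    """Smallest verification rate with R_eff <= 1, found by bisection (R_eff is non-increasing in tau)."""
    if r_eff(beta, rho, 0.0, gamma) <= 1:
        return 0.0  # already at or below the threshold without any verification
    lo, hi = 0.0, 1.0  # invariant: R_eff(lo) > 1 and R_eff(hi) <= 1 (R_eff(1) = 0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if r_eff(beta, rho, mid, gamma) > 1:
            lo = mid
        else:
            hi = mid
    return hi


print(regime(5, 0.2, 0.2, 0.5))            # self-destabilizing (R_eff = 3.136)
print(round(critical_tau(5, 0.2, 0.5), 3))  # 0.73
```

For these example values the threshold can be checked by hand: $4(1-\tau)(1-0.1\tau) = 1$ gives $\tau^2 - 11\tau + 7.5 = 0$, whose root in $[0,1]$ is $(11 - \sqrt{91})/2 \approx 0.730$.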