arXivDaily: daily arXiv academic digest, updated Monday through Friday
2606.00684 2026-06-02 eess.AS cs.CL cs.SD

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection

Xinwei Cao, Mengxuan Lu, Torbjørn Svendsen, Giampiero Salvi

Affiliations: Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway

AI summary: For out-of-distribution detection of target observations that lie in a subspace of high-dimensional data, the paper proposes a Lagrangian sub-flow framework built on continuous normalizing flows and uses geometric diagnostic signals from the velocity field to design zero-shot phoneme-level mispronunciation detection metrics that outperform likelihood-based methods.

Comments: 16 pages, 5 figures

Abstract

We address the problem of out-of-distribution (OOD) detection for target observations embedded in a subspace of the high-dimensional data space. Using continuous normalizing flows (CNFs), we propose a Lagrangian sub-flow (LSF) framework designed to isolate and estimate the density of the relevant components in the representation while using the remaining components as context. Through experimentation with models for speech synthesis, we show that CNFs, like other deep generative models (DGMs), are susceptible to the "likelihood paradox", where high likelihood is erroneously assigned to OOD samples. This is attributed to the inductive bias of DGMs, which prioritizes low-level structural details over high-level semantic coherence. To mitigate this phenomenon, we propose a number of geometric diagnostic signals based on the velocity field over the sub-flow trajectory. Based on these signals, we design metrics for the challenging task of zero-shot phoneme-level mispronunciation detection. Finally, we demonstrate the superiority of these metrics compared to likelihood-based methods on a real-world mispronunciation detection benchmark.

2606.00667 2026-06-02 q-bio.NC cs.LG

Cortex and subcortex play distinct roles over learning when cortical memory is limited

Matthew Farrell, Taro Toyoizumi

Affiliations: Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science; Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo

AI summary: By constraining the memory resources of the model-based module, the paper studies the functional dissociation between cortical and subcortical systems during learning and finds that the cortex supports general structure learning while subcortical circuits specialize in reward-based learning.

Comments: Preprint. 19 pages, 4 figures

Abstract

It has been proposed that the brain integrates flexible, computationally expensive cortical processing with simpler, lower-cost subcortical mechanisms to achieve resource-efficient performance greater than that of either system alone. Despite the allure of this perspective, satisfying theoretical frameworks that explore this hypothesis are still limited. We extend existing frameworks in which a model-based module and a model-free module learn in tandem by explicitly constraining the memory resources of the model-based module, and investigate the impact of this constraint in a simple decision-making setting. Memory constraints naturally give rise to strategies for allocating memory resources. We evaluate the performance of different strategies in different situations and demonstrate that when the rewarded states change often, it can be advantageous for the model-based module to focus its memory resources not on exploiting the current reward, but on capturing the general structure of the environment. This work provides a theoretical foundation for a functional dissociation between cortical and subcortical systems during learning: the cortex supports general structure learning, while subcortical circuits specialize in reward-based learning. We further detail how these hypotheses can be tested on experimental data.

2606.00666 2026-06-02 cond-mat.mtrl-sci cs.LG physics.chem-ph

Manifold Diffusion for Structure Generation of Transition Metal Complexes

Luca Schaufelberger, Kjell Jorner

Affiliations: Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich; NCCR Catalysis, Switzerland

AI summary: Proposes TMCgen, a manifold diffusion model that efficiently generates accurate geometries of transition metal complexes by diffusing over metal-ligand coordination angles together with ligand torsions and rotations.

Abstract

Transition metal complexes are central to catalysis, drug design, and materials science, with relevant properties strongly sensitive to their three-dimensional geometry. However, the electronic diversity and unconventional bonding environments of transition metal complexes pose a major challenge for accurate structure generation. In this work, we introduce TMCgen, a manifold diffusion machine learning model that efficiently and accurately generates geometries of transition metal complexes. By formulating the diffusion process over the metal-ligand coordination angles, combined with torsional and rotational diffusion of the ligands, TMCgen focuses on the key geometric degrees of freedom of transition metal complexes. TMCgen shows strong performance in generating accurate coordination environments on a diverse set of experimentally derived bioinorganic and organometallic complexes while requiring only a few inference steps, enabling efficient generation. Our results demonstrate the potential of manifold-based generative modeling for data-efficient geometry generation, paving the way for property-conditioned design of transition metal complexes.

2606.00661 2026-06-02 stat.ML cs.LG

On Median of Incomplete U-Statistics

Nong Minh Hieu

Affiliations: School of Computing and Information Systems, Singapore Management University

AI summary: Establishes a finite-sample concentration rate for the Median-of-Incomplete-U-Statistics (MIU), an efficient robust estimator for the expectation of symmetric kernels.

Abstract

We establish the finite-sample concentration rate for the Median-of-Incomplete-U-Statistics (MIU), an efficient robust estimator for the expectation of symmetric kernels.
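
The abstract does not spell out how the estimator is built, so the following is only a minimal sketch of one natural reading of the name: split the sample into disjoint blocks, average a symmetric order-2 kernel over a few randomly drawn pairs inside each block (an incomplete U-statistic), and report the median of the block averages. The function names, the block-splitting scheme, and the pair-sampling rule are all assumptions made for illustration, not the paper's definition.

```python
import random
import statistics
from typing import Callable, List, Sequence


def incomplete_u_statistic(
    xs: Sequence[float],
    kernel: Callable[[float, float], float],
    num_pairs: int,
    rng: random.Random,
) -> float:
    """Average a symmetric kernel over randomly sampled pairs of distinct points."""
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(xs)), 2)  # two distinct indices
        total += kernel(xs[i], xs[j])
    return total / num_pairs


def median_of_incomplete_u(
    xs: Sequence[float],
    kernel: Callable[[float, float], float],
    num_blocks: int,
    pairs_per_block: int,
    seed: int = 0,
) -> float:
    """Median of per-block incomplete U-statistics (hypothetical construction)."""
    rng = random.Random(seed)
    data: List[float] = list(xs)
    rng.shuffle(data)
    block_size = len(data) // num_blocks
    if block_size < 2:
        raise ValueError("each block needs at least two points")
    estimates = []
    for b in range(num_blocks):
        block = data[b * block_size:(b + 1) * block_size]
        estimates.append(incomplete_u_statistic(block, kernel, pairs_per_block, rng))
    return statistics.median(estimates)


def half_squared_difference(x: float, y: float) -> float:
    """Symmetric kernel whose expectation is the variance of the distribution."""
    return 0.5 * (x - y) ** 2
```

With `half_squared_difference` as the kernel, the estimator targets the variance. A single gross outlier can contaminate at most one block, so the median over several blocks ignores it, which is the robustness property the abstract refers to.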

2606.00655 2026-06-02 cs.MA cs.AI cs.CY

Scaling Behavior of Single LLM-Driven Multi-Agent Systems

Jialing Li, Zhouhong Gu, Yin Cai, Hongwei Feng

Affiliations: Fudan University

AI summary: Using the proposed Sequential Iterative Multi-Agent System (SIMAS) framework, the paper systematically studies how the performance of a homogeneous multi-agent system scales with the number of agents, and finds that performance does not improve monotonically but shows diminishing returns governed by a trade-off between collaborative synergy and coordination overhead.

Abstract

The burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions regarding their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the variable of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, to clearly observe scaling effects. Through extensive experiments across diverse tasks and model scales, we establish that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality. The performance degradation stems from coordination overhead rather than merely long-context failure, and the scaling tendency generalizes across interaction architectures such as structured debate topologies. This work provides a foundational understanding of MAS scaling laws, offering practical guidance for designing efficient collaborative systems and challenging the prevailing assumption that more agents invariably lead to better performance.

2606.00643 2026-06-02 stat.ML cs.LG cs.NA math.NA math.OC math.ST stat.TH

Taming the Loss Landscape of PINNs with Noisy Feynman-Kac Supervision: Operator Preconditioning and Non-Asymptotic Error Bounds

Nathanael Tepakbong, Hanyu Hu, Chengyu Liu, Xiang Zhou

Affiliations: Department of Data Science, City University of Hong Kong; Department of Mathematics, City University of Hong Kong

AI summary: Introduces a pointwise data-fidelity term that acts as an operator-level preconditioner and substantially improves the condition number of the PINN loss landscape; generates the labels from the Feynman-Kac representation to obtain FK-PINNs, and derives non-asymptotic error bounds under gradient descent.

Comments: Accepted at ICML 2026 (poster), 59 pages

Abstract

Physics-Informed Neural Networks (PINNs) often train slowly or fail to converge on challenging partial differential equations (PDEs), a behavior recently linked to severely ill-conditioned loss landscapes inherited from the underlying differential operator. We study PINNs augmented with a pointwise data-fidelity term, added at a few points in the domain to the standard residual and boundary losses. We show that this supervision term acts as an operator-level preconditioner: for suitable weights, our comparison bounds guarantee a substantially smaller condition number than under the standard PINN loss, independently of how the pointwise labels are obtained. For a broad class of PDEs admitting a Feynman-Kac (FK) representation, we generate such labels by Monte Carlo averages of the FK functional, resulting in what we call "FK-PINNs". Using the excess risk decomposition approach, we derive non-asymptotic $L^2(\Omega)$-error bounds for FK-PINNs with $\tanh$ activation trained by finitely many steps of gradient descent. Along the way, we establish pseudo-dimension bounds for first- and second-order derivatives of $\tanh$ neural networks, which are of independent interest and, to the best of our knowledge, new. Numerical experiments on Poisson, Schrödinger, mean exit time, and committor problems corroborate the theory, and show that FK-PINNs can successfully solve PDEs for which standard PINNs exhibit severe failure modes.
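
To make the idea of a noisy Feynman-Kac label concrete, here is a small sketch for a textbook mean exit time problem, chosen for illustration and not taken from the paper. For standard Brownian motion on the interval (-1, 1), the mean exit time u(x) solves (1/2) u''(x) = -1 with u(-1) = u(1) = 0, and the exact solution is u(x) = 1 - x^2. The function name, the Euler-Maruyama time step, and the number of paths are assumptions.

```python
import math
import random


def fk_mean_exit_time_label(
    x0: float,
    dt: float = 1e-3,
    num_paths: int = 500,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the mean exit time of Brownian motion from (-1, 1).

    Each path is simulated with Euler-Maruyama steps of size dt until it leaves
    the interval; the label is the average exit time over all paths.
    """
    rng = random.Random(seed)
    step_std = math.sqrt(dt)
    total_time = 0.0
    for _ in range(num_paths):
        x, t = x0, 0.0
        while -1.0 < x < 1.0:
            x += step_std * rng.gauss(0.0, 1.0)
            t += dt
        total_time += t
    return total_time / num_paths
```

The returned value should be close to 1 - x0**2, up to Monte Carlo noise and a small time-discretization bias. In an FK-PINN-style setup, a handful of such estimates at interior points would serve as the noisy targets of the pointwise data-fidelity term.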

2606.00636 2026-06-02 cs.AR cs.AI

LP5X-PIM Sim: A High-Fidelity HW/SW Integrated Simulator for LPDDR5X-PIM

SangHoon Cha, Jaewan Choi, Byeongho Kim, Yoonah Paik, Sukhan Lee, Kyomin Sohn

Affiliations: Samsung Electronics, South Korea

AI summary: Describes the LPDDR5X-PIM simulator developed by Samsung Electronics, which integrates high-fidelity models of the hardware data paths and software control layers to enable precise evaluation of system performance and energy efficiency.

Comments: 4 pages, 4 figures, tech note

Abstract

This tech note describes the architecture and execution results of the LPDDR5X-PIM simulator, developed by Samsung Electronics. Based on the latest research and internal specifications, the simulator provides a high-fidelity model of both the hardware data paths and the software control layers of the LPDDR5X-PIM block. This integrated hardware-software simulation approach enables precise evaluation of system performance and energy efficiency while maximizing PIM resource utilization. We have refined existing simulation frameworks to align with actual hardware implementation, ensuring consistent behavioral accuracy. Further technical details regarding the specific architecture and circuit design of the LPDDR5X-PIM will be disclosed in future publications.

2606.00621 2026-06-02 cs.CR cs.AI cs.CY

Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era

Shubhashis Sengupta, Benjamin McCarty, Milind Savagaonkar, Rhine Andotra

Affiliations: Accenture Services Pvt. Ltd.

AI summary: Introduces the concept of authenticity debt and, drawing on Zero Trust Architecture principles, proposes a layered reference architecture that integrates cryptographic provenance, human-in-the-loop verification, and continuous governance to address synthetic-content threats from generative AI.

Abstract

Generative artificial intelligence has fundamentally changed how content is produced. It allows high-fidelity text, images, audio, and video to be created, modified, and redistributed at near-zero marginal cost. This shift exposes enterprises and ecosystems to a number of risks across four mutually reinforcing authenticity layers (authenticity, provenance, integrity, and accountability) that traditional controls are inadequate to address in isolation. We introduce the concept of authenticity debt: the cumulative institutional liability that accumulates when organizations deploy AI-generated content without preserving verifiable origin, integrity, and accountability, deferring exposure that surfaces under regulatory, legal, or market scrutiny. This paper presents a comprehensive, multi-dimensional taxonomy of generative AI harms and attack vectors, surveys the capabilities and failure modes of technical controls including digital watermarking, provenance frameworks (C2PA, Adobe CAI), and detection technologies, and argues that no single mechanism is sufficient in open, adversarial, and evolving environments. Drawing on Zero Trust Architecture principles and enterprise governance frameworks, we propose a layered reference architecture that integrates cryptographic provenance, human-in-the-loop verification, and continuous governance to sustain defensible authenticity at scale. We further examine the regulatory landscape (EU AI Act, U.S. FTC, NIST AI RMF) and identify practical guiding principles for organizations seeking to build authenticity as institutional infrastructure rather than an afterthought.

2606.00610 2026-06-02 cs.IR cs.AI cs.MA

MemGraphRAG: Memory-based Multi-Agent System for Graph Retrieval-Augmented Generation

Chuanjie Wu, Zhishang Xiang, Yunbo Tang, Zerui Chen, Qinggang Zhang, Jinsong Su

Affiliations: Xiamen University; Jilin University

AI summary: Proposes MemGraphRAG, a framework that builds high-quality knowledge graphs with a memory-based multi-agent system and adds a memory-aware hierarchical retrieval algorithm, outperforming existing models on multiple benchmarks.

Comments: Accepted by KDD 2026

Abstract

Retrieval-Augmented Generation (RAG) has become an essential method for mitigating hallucinations in Large Language Models (LLMs) by leveraging external knowledge. Although effective for simple queries, traditional RAG struggles with large-scale, unstructured corpora where information is highly fragmented. Graph-based RAG (GraphRAG) incorporates knowledge graphs to capture structural relationships, enabling more comprehensive retrieval for complex reasoning. However, existing GraphRAG methods rely on isolated, fragment-level extraction for graph construction, lacking a global perspective on the whole corpus. As a result, these methods frequently lead to thematically inconsistent, logically conflicting, and structurally fragmented graphs that degrade retrieval performance. In this paper, we propose MemGraphRAG, a novel framework that introduces a memory-based multi-agent system to ensure high-quality graph construction. Specifically, MemGraphRAG employs a collaborative society of agents supported by shared memory, which provides a unified global context throughout the extraction process. This mechanism allows agents to dynamically resolve logical conflicts and maintain structural connectivity throughout the corpus. Furthermore, we propose a memory-aware hierarchical retrieval algorithm tailored for the constructed graph. Extensive experiments on multiple benchmarks demonstrate that MemGraphRAG outperforms the state-of-the-art baseline models with comparable efficiency. Our code is available at https://github.com/XMUDeepLIT/MemGraphRAG.

2606.00590 2026-06-02 cs.IR cs.AI

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Affiliations: Center for Intelligent Information Retrieval, University of Massachusetts Amherst

AI summary: Proposes Critic-R, a framework in which a critic model evaluates the agent's introspective reasoning trace to close the feedback loop between the retrieval model and the reasoning agent, improving retrieval quality and downstream answer accuracy without manual annotation.

Abstract

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
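
The inference-time refinement loop described above can be pictured roughly as follows. This is a schematic sketch only: `retrieve`, `reason`, `critic`, and `rewrite` are hypothetical callables standing in for the retriever, the reasoning agent, the critic model, and the query rewriter, and the stopping rule and round limit are assumptions rather than details from the paper.

```python
from typing import Callable, List, Tuple

# One entry per round: (query, retrieval instruction, critic verdict).
Trajectory = List[Tuple[str, str, bool]]


def refine_until_supported(
    query: str,
    instruction: str,
    retrieve: Callable[[str, str], List[str]],
    reason: Callable[[str, List[str]], str],
    critic: Callable[[str], bool],
    rewrite: Callable[[str, str, str], Tuple[str, str]],
    max_rounds: int = 3,
) -> Tuple[List[str], Trajectory]:
    """Retrieve, let the agent reason, and rewrite the query until the critic is satisfied."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    trajectory: Trajectory = []
    docs: List[str] = []
    for _ in range(max_rounds):
        docs = retrieve(query, instruction)
        trace = reason(query, docs)      # agent's reasoning after reading the evidence
        supported = critic(trace)        # does the evidence support the next step?
        trajectory.append((query, instruction, supported))
        if supported:
            break
        query, instruction = rewrite(query, instruction, trace)
    return docs, trajectory
```

The recorded trajectory keeps both failed and successful rounds, which is the kind of signal the abstract says Critic-Embed uses as automatic supervision for the retriever.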

2606.00584 2026-06-02 stat.ML cs.LG

Spectra-Guided Neural Tucker Factorization

Fusheng Wang, Yikai Hou

Affiliations: School of Automation, Chongqing University of Posts and Telecommunications; College of Computer and Information Science, School of Software, Southwest University

AI summary: Proposes Spectra-Guided Neural Tucker Factorization (SG-NTF), which completes high-dimensional and incomplete tensors efficiently through a continuous spectral-space mapping and a spatio-temporal co-gating mechanism.

Abstract

This paper proposes Spectra-Guided Neural Tucker Factorization (SG-NTF) for High-Dimensional and Incomplete (HDI) tensor completion. Circumventing discrete representational limits, SG-NTF maps scalar timestamps into a continuous spectral space to abstract temporal periodicities. Concurrently, a Spatio-Temporal Co-Gating (STCG) mechanism explicitly filters latent interactions via multiplicative modulation on spatiotemporal contexts. Evaluations on real-world HDI tensors verify that SG-NTF maintains competitive completion accuracy with parameter efficiency.

2606.00552 2026-06-02 cs.OS cs.DC cs.NI cs.RO cs.SY eess.SY

Edge-Based QoS-Aware Adaptive Task Placement: A Closed-Loop Control in Multi-Robot Systems

Thien Tran, Jonathan Kua, Thuong Hoang, Minh Tran, Honghao Lyu, Jiong Jin

Affiliations: Deakin University; RMIT University; Zhejiang University; Swinburne University of Technology

AI summary: Proposes a QoS-aware Adaptive Task Placement (ATP) controller that uses a multi-metric cost score and closed-loop control to switch task placement dynamically on a shared edge node, reducing tail latency and deadline violations.

Comments: 6 pages, 2 figures, 1 algorithm; accepted as a regular paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July 2026, Melbourne, Australia

Abstract

Multi-robot systems (MRS) increasingly offload compute-intensive perception tasks to edge nodes to meet strict time-sensitive Quality-of-Service (QoS) constraints. However, static task orchestration on a shared edge node can severely degrade QoS due to network latency, jitter, and edge-resource contention. We present a pilot edge-centric MRS testbed using Raspberry Pi nodes to evaluate a camera-to-manipulator pipeline under three modes: local execution, static offloading, and a QoS-aware Adaptive Task Placement (ATP) controller. ATP scores candidate placements using a multi-metric cost (normalized latency, CPU utilization, and switching overhead) over two-second control windows. The closed-loop visual servoing testbed is instrumented with sub-millisecond clock synchronization, network emulation, and detailed monitoring of multiple metrics across nodes to capture realistic jitter. Experimental results under compute-stress and network-fault scenarios show that static edge offloading reduces on-board CPU load but amplifies tail latency and deadline misses. In contrast, the QoS-aware ATP controller, by switching task placement based on measured latency and utilization thresholds, consistently lowers deadline violations and tail latency. Overall, the results position ATP as a practical edge-side control primitive for MRS and provide concrete design guidelines for Cloud-Edge Robotics deployments within the broader cloud-fog automation, while motivating QoS-aware multi-objective workload orchestration for industrial cyber-physical systems.
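
As a rough illustration of the multi-metric cost named in the abstract, the sketch below scores each candidate placement from one control window's measurements and picks the cheapest. The abstract lists the three ingredients (normalized latency, CPU utilization, switching overhead) but not how they are combined, so the weighted sum, the weights, the normalization by a deadline, and all names here are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WindowStats:
    """Measurements for one candidate placement over a control window."""
    placement: str          # e.g. "local" or "edge"
    mean_latency_ms: float
    cpu_utilization: float  # fraction in [0, 1]


def placement_cost(
    stats: WindowStats,
    current: str,
    deadline_ms: float,
    w_latency: float = 0.6,
    w_cpu: float = 0.3,
    w_switch: float = 0.1,
) -> float:
    """Weighted sum of normalized latency, CPU utilization, and a switching penalty."""
    normalized_latency = stats.mean_latency_ms / deadline_ms
    switching = 0.0 if stats.placement == current else 1.0
    return (
        w_latency * normalized_latency
        + w_cpu * stats.cpu_utilization
        + w_switch * switching
    )


def choose_placement(
    candidates: Sequence[WindowStats],
    current: str,
    deadline_ms: float,
) -> str:
    """Return the placement with the lowest cost for this control window."""
    best = min(candidates, key=lambda s: placement_cost(s, current, deadline_ms))
    return best.placement
```

The switching penalty gives the controller some hysteresis: a candidate has to beat the current placement by more than `w_switch` before the task moves, which limits flapping between local and edge execution.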

2606.00550 2026-06-02 cs.HC cs.ET cs.RO

A Four-Tier Communication Architecture and Sim-to-Real Validation of a Graphical Open-Source Platform for Robotic Engineering Education

Thien Tran, Khang Duong, Minh Tran, Jonathan Kua, Thuong Hoang, Jiong Jin

Affiliations: Deakin University; RMIT University; Swinburne University of Technology

AI summary: To address the high cost of commercial digital twins and the steep entry barrier of ROS when scaling manipulator education in university laboratories, the paper proposes a four-tier communication architecture that uses the Graphical Open-Source Platform (GOSP) to bridge data between virtual environments and physical robots, and validates its hardware-agnostic feasibility through sim-to-real experiments.

Comments: 4 pages, 4 figures; accepted as a Work-in-Progress (WiP) paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July 2026, Melbourne, Australia

Abstract

The persistent challenge in scaling authentic manipulator education within university laboratories is a structural dichotomy: commercial digital twins are often cost-prohibitive and rigidly scripted, whereas open-source robotics middleware (ROS) imposes steep technical and syntax barriers for novices. To resolve this logistical and educational friction, this Work-in-Progress (WiP) paper proposes a scalable four-tier communication architecture tailored for sustainable robotic curricula. Rather than focusing on software application design, our study examines the underlying data exchange mechanisms required to bridge visual conceptual environments with physical robotic endpoints, utilizing the Graphical Open-Source Platform (GOSP) as a foundational instantiation. This WiP details the framework's technical integration of 3D visual armature modeling with a robust ROS middleware backend, emphasizing the serialization, routing, and encapsulation of intricate communication routines. Preliminary sim-to-real validation using multi-axis spatial trajectories confirms that encapsulating these communication pipelines provides a hardware-agnostic pathway of sufficient fidelity. By bridging virtual design and physical execution, this architectural blueprint offers a viable infrastructure for engineering education.

2606.00520 2026-06-02 math.OC cs.LG stat.ML

In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise

Zijian Liu

Affiliations: Stern School of Business, New York University

AI summary: For stochastic gradient methods under heavy-tailed noise (finite $p$-th moment, $p\in(1,2)$), the paper proves in-expectation convergence of Stochastic Mirror Descent (SMD) and Accelerated Stochastic Mirror Descent (ASMD) in convex optimization and of SGD and SGD with Momentum (SGDM) in nonconvex optimization, without algorithmic modifications or bounded-domain assumptions.

Abstract

Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient Descent ($\textsf{SGD}$), without any modification to its update rule, can surprisingly converge in expectation for convex problems with bounded domains, highlighting the potential of classical stochastic gradient methods. Inspired by this recent progress, we provide a comprehensive study of stochastic optimization under heavy-tailed noise and establish new in-expectation convergence results for Stochastic Mirror Descent ($\textsf{SMD}$) and Accelerated Stochastic Mirror Descent ($\textsf{ASMD}$) in convex optimization, and for $\textsf{SGD}$ and Stochastic Gradient Descent with Momentum ($\textsf{SGDM}$) in nonconvex optimization. Notably, our results not only hold without algorithmic changes but also avoid restrictive assumptions, such as bounded domains, imposed in prior work. More importantly, our analysis provides a new, elegant, and powerful framework for studying heavy-tailed stochastic optimization, opening a new route to understanding first-order stochastic gradient methods.
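
For readers unfamiliar with the setting, the heavy-tailed noise assumption is commonly written as below. This is the standard textbook form, given here as an assumption about notation: the symbols $g$, $\xi$, $\sigma$, and $\eta_t$ do not appear in the abstract, and the paper may use a different norm or a weaker unbiasedness condition.

```latex
\[
\mathbb{E}_{\xi}\bigl[g(x,\xi)\bigr] = \nabla f(x),
\qquad
\mathbb{E}_{\xi}\bigl[\lVert g(x,\xi) - \nabla f(x)\rVert^{p}\bigr] \le \sigma^{p},
\qquad p \in (1,2),
\]
\[
x_{t+1} = x_t - \eta_t\, g(x_t,\xi_t).
\]
```

The first line says the stochastic gradient is unbiased and its noise has a bounded $p$-th moment; taking $p = 2$ would recover the usual bounded-variance assumption, so for $p < 2$ the variance may be infinite. The second line is the unmodified SGD step, with no clipping or normalization, which is the kind of update the abstract refers to.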

2606.00500 2026-06-02 cs.DS cs.LG math.ST stat.ML stat.TH

Easy, robust approximate message passing for planted spike models

Misha Ivkov, Tselil Schramm

Affiliations: Stanford University

AI summary: For spiked matrix models with adversarial noise, the paper proposes an algorithm that combines spectral pre-processing with a robust spectral initialization so that unmodified approximate message passing (AMP) becomes robust, producing a vector close to the output of AMP on the uncorrupted matrix.

Comments: 32 pages

Abstract

We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor. Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that, given access only to the corrupted matrix $Y = X + E$, computes a vector $v_{\mathrm{ALG}}(Y)$ which is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$, for any of a class of AMP iterations which includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm consists of a spectral pre-processing step combined with a robust spectral initialization procedure; given these inputs, we prove that (perhaps surprisingly) AMP is robust out-of-the-box.

2606.00497 2026-06-02 cs.CR cs.CL

"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents

Soham Roy, Sarthakbrata Halder, Arya Bharaty, Vaibhav Bhaskar, Yash Sinha, Dhruv Kumar, Srikant Panda, Murari Mandal

Affiliations: KIIT Bhubaneshwar; BITS Pilani; Lam Research

AI summary: Builds Scammer4U, a benchmark of 91 attacker-controlled environments and 10 benign-twin baselines, to evaluate the PII leakage risk of frontier autonomous web agents under social-engineering attacks; finds critical PII leakage rates of 54-93% and a detection-action gap in which agents that detect the attack still submit PII in 35.9% of sessions.

Comments: 24 pages

Abstract

Deceptive web content, widely instantiated across the internet and commonly known as social-engineering attacks, manipulates autonomous web agents into submitting users' personally identifiable information (PII) to attacker-controlled endpoints. In this paper, we show that social-engineering attacks are highly effective at extracting critical-tier PII from frontier web agents, posing a severe risk to deployed agentic systems. To quantify this risk, we introduce Scammer4U, a pre-registered benchmark of 91 attacker-controlled environments and 10 benign-twin baselines, spanning 8 attack vectors and 16 site categories on an 8-axis factorial taxonomy that isolates the causal contribution of individual attack design factors. Across frontier agents, we find that critical-tier PII leakage reaches 54-93% under no privacy guidance, compared to 0% on benign-twin baselines, confirming that leakage is attack-attributable rather than incidental form-filling. Escalating prompt-level mitigation yields sharply model-dependent reductions across the four families and remains insufficient to reliably prevent critical PII submission at the pooled level. Most critically, we identify a detection-action gap: agents whose reasoning an independent LLM judge confirms has flagged the site as suspicious still submit critical PII in 35.9% of sessions, versus 66.1% when no suspicion is verbalized, a gap of 30.2 percentage points that is robust across all four model families. Our findings reveal that defenses conditioned on the agent's own recognition of an attack are gating on the wrong signal, motivating output-level interception of outbound submissions that operates independently of the agent's reasoning loop.

2606.00483 2026-06-02 q-bio.GN cs.LG

Annotation-Informed Block-Sparse Bayesian Modeling for cis-Expression Prediction

基于注释信息的块稀疏贝叶斯建模用于顺式表达预测

Lei Huang, Hui Shen, Kuan-Jui Su, Chuan Qiu, Martha Isabel Gonzalez-Ramirez, Anqi Liu, Zhe Luo, Yun Gong, Yipu Zhang, Dawei Li, Chaoyang Zhang, Hong-Wen Deng

Affiliations * School of Computing Sciences and Computer Engineering, University of Southern Mississippi; Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University; Texas Tech University Health Sciences Center, School of Medicine, Texas Tech University

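AI summary Proposes the block-sparse Bayesian sparse linear mixed model (bsBSLMM), which integrates LD-block spike-and-slab sparsity with a TSS-informed prior and improves cis-expression prediction and downstream TWAS discovery.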

Comments 16 pages manuscript; 38 pages supplementary

详情
AI中文摘要

基于基因型的顺式表达预测依赖于对局部调控架构的精确建模。我们提出了块稀疏贝叶斯稀疏线性混合模型(bsBSLMM),这是贝叶斯稀疏线性混合模型(BSLMM)的扩展,它整合了连锁不平衡(LD)块的尖峰-板稀疏性和转录起始位点(TSS)先验的SNP包含。在来自GEUVADIS欧洲血统淋巴母细胞系系的23,098个基因中,在匹配的评估标准下,bsBSLMM保留了比BSLMM、LASSO、BLUP、TIGAR弹性网和TIGAR狄利克雷过程回归更多可预测的基因。与BSLMM相比,bsBSLMM提高了大多数共享基因的留出预测性能,其增益主要由LD块稀疏性驱动,并通过TSS先验进一步增强。bsBSLMM选择的变异在GM12878 DNase和H3K27ac调控区域中显示出比BSLMM选择的变异更强的富集性。在全转录组关联研究(TWAS)分析中,bsBSLMM恢复了已建立的炎症性肠病信号,包括IL23R,并识别了BSLMM未检测到的其他全基因组显著基因。在路易斯安那州骨质疏松症研究中的独立验证重现了跨祖先的预测产量增加,并在下游TWAS和基因集富集分析中恢复了生物学相关的骨矿物质密度通路。这些结果表明,整合LD块结构和生物学先验的SNP改进了顺式表达预测并增强了下游TWAS发现。

英文摘要

Genotype-based cis-expression prediction depends on accurately modeling local regulatory architecture. We present block-sparse Bayesian sparse linear mixed model (bsBSLMM), an extension of Bayesian sparse linear mixed model (BSLMM) that incorporates linkage disequilibrium (LD)-block spike-and-slab sparsity and a transcription start site (TSS)-informed SNP inclusion prior. Across 23,098 genes from GEUVADIS European-ancestry lymphoblastoid cell lines, bsBSLMM retained more predictable genes than BSLMM, LASSO, BLUP, TIGAR elastic net, and TIGAR Dirichlet-process regression under matched evaluation criteria. Compared with BSLMM, bsBSLMM improved held-out prediction performance for most shared genes, with gains driven primarily by LD-block sparsity and further enhanced by the TSS-informed prior. Variants selected by bsBSLMM showed stronger enrichment in GM12878 DNase and H3K27ac regulatory regions than variants selected by BSLMM. In transcriptome-wide association study (TWAS) analysis, bsBSLMM recovered established inflammatory bowel disease signals, including IL23R, and identified additional genome-wide significant genes not detected by BSLMM. Independent validation in the Louisiana Osteoporosis Study reproduced the increased prediction yield across ancestries and recovered biologically relevant bone mineral density pathways in downstream TWAS and gene set enrichment analyses. These results demonstrate that incorporating LD-block structure and biologically informed SNP priors improves cis-expression prediction and enhances downstream TWAS discovery.

2606.00448 2026-06-02 cs.SE cs.AI cs.CR

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

当安全技能碰撞:衡量智能体技能生态系统中的组合风险

Su Wang, Pin Qian, Yihang Chen, Junxian You, Xiaoyuan Wang, Xiaochong Jiang, Lifei Liu, Haoran Yu, Jingzhou Xu

Affiliations * Carnegie Mellon University; Georgia Institute of Technology; University of Glasgow; Independent Researcher; Corespeed Inc.

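AI summary Proposes the SkillReact framework, which uses a static composition benchmark, a two-rater LLM-assisted human-adjudication pipeline, and action-based exploitability testing to study whether individually safe skills in LLM agents can combine into unsafe behavior; about 18.2% of flagged candidate combinations are judged to be real risks, and the host model determines whether the composed capabilities are actually used.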

详情
AI中文摘要

LLM智能体越来越依赖社区贡献的技能,这些技能扩展了智能体的操作能力集。我们研究了智能体AI系统中的一个核心安全问题:个体安全的技能是否可能组合成不安全的已安装技能集。我们提出了SkillReact,一个组合安全测量框架,包含三个组件:一个确定性静态组合基准、一个双评分者LLM辅助人工审核流程,以及一个基于动作的可利用性测试工具。在1,520个ClawHub技能中,651个通过个体检查并形成211,575对;基准标记其中22.25%为结构候选。我们将这个原始比率视为面向召回率的扫描上限,并根据人类判断进行校准:在按模式分层的审计中,大约五分之一的标记对模式命中被确认为真实的组合风险(人口加权有效性18.2%,我们的主要结果),这意味着在单个注册表中约有14K个真实风险成员,而按技能扫描由于构造原因会遗漏这些风险,因为每一对个体都是安全的。然后,基于动作的测试工具探测这些候选何时成为模型发出的工具调用,并发现实现受主机模型倾向的门控:在一个锚定条件的dropper子集上,Haiku-4-5在所有39次直接提示试验中发出了dropper阶段的工具调用(其中36次是完整的下载然后执行链,3次仅下载),Opus-4-7在下载阶段停止,而Sonnet-4-6直接拒绝。一个保持请求固定且仅改变已安装技能的对照实验发现,未安装任何技能时合规性最高:组合决定了哪些能力可达,而主机模型决定是否使用它们。这些结果共同表明,安装时组合检查和能力隔离是对按技能扫描的补充。

英文摘要

LLM agents increasingly rely on community-contributed skills that expand an agent's operational capability set. We study a core safety problem in agentic AI systems: whether individually safe skills can compose into unsafe installed skill sets. We present SkillReact, a compositional security measurement framework with three components: a deterministic static-composition benchmark, a two-rater LLM-assisted human-adjudication pipeline, and an action-based exploitability harness. On 1,520 ClawHub skills, 651 pass individual inspection and form 211,575 pairs; the benchmark flags 22.25% of these as structural candidates. We treat this raw rate as a recall-oriented scanner ceiling and calibrate it against human judgment: in a pattern-stratified audit, roughly one in five flagged pair-pattern hits survives as a real compositional risk (population-weighted validity 18.2%, our headline result), implying about 14K genuine risk memberships in a single registry that per-skill scanning misses by construction, since every pair is individually safe. An action-based harness then probes when these candidates become model-issued tool calls, and finds realization gated by host-model disposition: on an anchor-conditioned dropper subset, Haiku-4-5 issues the dropper-stage tool call on all 39 direct-prompt trials (36 of them the full download-then-execute chain, 3 download-only), Opus-4-7 stops at the download, and Sonnet-4-6 refuses outright. A control that holds the request fixed and varies only the installed skills finds compliance highest with no skills installed: a composition fixes which capabilities are reachable, while the host model decides whether to use them. Together these motivate install-time compositional checks and capability isolation as complements to per-skill scanning.
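The counts quoted in the abstract above can be sanity-checked with a few lines of arithmetic. The sketch below is not from the paper; it only recomputes the number of unordered pairs among the 651 individually safe skills and the approximate number of flagged pairs. The variable names and the rounding step are assumptions made for illustration.

```python
import math

safe_skills = 651      # skills that pass individual inspection (from the abstract)
flag_rate = 0.2225     # share of pairs flagged as structural candidates (from the abstract)

# Unordered pairs of distinct skills: C(651, 2) = 651 * 650 / 2
pairs = math.comb(safe_skills, 2)

# Approximate number of flagged pairs; rounding to an integer is our assumption
flagged_pairs = round(pairs * flag_rate)

print(pairs)          # 211575
print(flagged_pairs)  # 47075
```

The pair count matches the 211,575 reported in the abstract, which is why per-skill scanning cannot see these risks: the object being checked is a pair, and the number of pairs grows quadratically with the number of skills.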

2606.00422 2026-06-02 cs.IR cs.LG

UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

UniPinRec:在Pinterest规模下统一生成式检索与排序

Hanyu Li, Yi-Ping Hsu, Aditya Mantha, Prabhat Agarwal, Laksh Bhasin, Jialu Wang, Hongtao Lin, Bella Huang, Yaxin Li, Xinyi Li, Chuxi Wang, Kousik Rajesh, Hooshmand Shokri Razaghi, Shunyao Li, Zongyue Qin, Jaewon Yang, James Li, Dhruvil Deven Badani, Jiajing Xu, Charles Rosenberg

Affiliations * Pinterest

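AI summary Proposes UniPinRec, which encodes user action sequences with a shared transformer and combines Masked Action Modeling, blended training examples, and cross-stage KV cache sharing to unify retrieval and ranking across the full stack in Pinterest's production system, which the authors describe as a first, raising online engagement while reducing latency.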

详情
AI中文摘要

现代推荐系统主要将检索和排序作为独立模型训练,尽管两者都越来越依赖编码相同用户行为数据的大型Transformer,导致参数、计算和服务成本重复。先前的工作统一了模型架构,但未统一完整流程:输入格式、训练过程和服务栈在阶段间仍然分散。我们提出UniPinRec,在Pinterest实现了检索和排序的全栈统一:一种输入格式、一个模型、一个训练阶段,部署在现有服务基础设施中。共享Transformer将用户行为序列编码为候选无关的表示,通过任务特定的头部分支到检索(ANN点积)和排序(交叉注意力)。三个关键思想使此工作成立:(1)掩码动作建模(MAM)消除了交错,使得无需加倍上下文长度即可实现权重共享;(2)混合训练样本将动作序列与feedview曝光列表配对,以共同满足两个目标;(3)跨阶段KV缓存共享重用检索中的用户历史计算用于排序,相比服务两个独立模型减少了总FLOPs。部署在Pinterest核心表面,UniPinRec实现了约+1%的在线参与度提升,同时将端到端服务延迟降低11.1%,QPS提升63.6%。据我们所知,这是首个在生产推荐系统中实现检索和排序全栈统一的工作,涵盖输入、模型、训练和服务。

英文摘要

Modern recommendation systems predominantly train retrieval and ranking as separate models despite both increasingly relying on large transformers encoding the same user behavior data, duplicating parameters, compute, and serving cost. Prior work unifies the model architecture but not the full pipeline: input formats, training procedures, and serving stacks remain fragmented across stages. We present UniPinRec, which achieves full-stack unification of retrieval and ranking at Pinterest: one input format, one model, one training stage, deployed within existing serving infrastructure. A shared transformer encodes the user action sequence into candidate-independent representations that branch into retrieval (ANN dot-product) and ranking (cross-attention) via task-specific heads. Three ideas make this work: (1) Masked Action Modeling (MAM) eliminates interleaving, enabling weight sharing without doubling context length; (2) Blended training examples pair action sequences with feedview impression slates to satisfy both objectives jointly; (3) Cross-stage KV cache sharing reuses user-history computation from retrieval for ranking, reducing total FLOPs versus serving two independent models. Deployed in the Pinterest core surfaces, UniPinRec delivers approximately +1% online engagement lift while cutting end-to-end serving latency by 11.1% and lifting QPS by 63.6%. To our knowledge, this is the first full-stack unification of retrieval and ranking, covering inputs, model, training and serving, deployed in a production recommendation system.

2606.00417 2026-06-02 cs.NI cs.AI

AgentxGCore: Agentic AI for Next-Generation Mobile Core Network

AgentxGCore:面向下一代移动核心网络的智能体AI

Maria Katarine Santana Barbosa, Kelvin L. Dias

Affiliations * Centro de Informática - Universidade Federal de Pernambuco

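AI summary Proposes AgentxGCore, which extends the 3GPP architecture with an Agentic AI-native layer and uses a multi-agent system to run closed-loop optimization on real-time information, supporting self-organization and self-adaptation.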

Comments This paper has been accepted for publication in IEEE Network

详情
AI中文摘要

为满足新兴应用的严格要求以及日益复杂的网络管理和操作,下一代移动网络(NextG)或6G将在核心网(CN)上采用AI原生架构。在此进程中,第三代合作伙伴计划(3GPP)已通过新功能扩展蜂窝CN,作为集成分析、人工智能(AI)和机器学习的第一步。然而,这些新功能受限于集中式方法和管理复杂性。此外,随着大型语言模型(LLM)的兴起,网络编排和管理进入新时代,利用并赋能基于意图的网络(IBN)范式。同时,AI智能体和智能体AI集成了推理与行动(ReAct),使得能够利用此类意图持续与网络交互。与主要采用智能体AI来缓解CN中部署和配置复杂性的现有方法不同,本文介绍了AgentxGCore,它利用智能体AI原生层扩展3GPP架构,并基于超越下一代核心网(xGC)域中的现有API构建系统。该提案建立了基于实时信息的AI驱动闭环,用于持续优化,实现自组织和自适应。我们的方法涉及一个多智能体专用系统,分为网络规划智能体(能够可视化网络状态并制定满足意图的计划)和网络执行器(负责批评并执行计划)。为验证所提方案,使用开源CN、异构数据集构建了环境,并采用不同的LLM来证明其有效性。

英文摘要

To meet the stringent requirements of emerging applications and the increasingly complex network management and operation, the Next Generation Mobile Networks (NextG), or 6G, will adopt an AI-native architecture on the Core Network (CN). In this movement, the Third Generation Partnership Project (3GPP) has extended the cellular CN with new functions as a first step toward integrating analytics, Artificial Intelligence (AI), and machine learning. However, those new functionalities are constrained by a centralized approach and managerial complexity. Furthermore, with the rise of Large Language Models (LLMs), a new era in network orchestration and management begins, leveraging and empowering the Intent-based Networking (IBN) paradigm. In addition, AI agents and Agentic AI integrate Reasoning and Acting (ReAct), enabling the usage of such intents to continuously interact with the network. Unlike state-of-the-art approaches that primarily employ Agentic AI to mitigate deployment and configuration complexity in the CN, this paper introduces AgentxGCore, which leverages an Agentic AI-Native layer to extend the 3GPP architecture and enable a system based on the existing APIs across the Beyond Next Generation Core (xGC) domain. This proposal establishes an AI-driven closed loop for continuous optimization based on real-time information, enabling self-organization and self-adaptation. Our approach involves a specialized multi-agent system, divided into a network planner agent, capable of visualizing the network state and developing a plan to meet the intents, and a network executor, responsible for critiquing and executing the plan. To validate the proposed solution, an environment was built using an open-source CN and heterogeneous datasets, and different LLMs were employed to demonstrate its effectiveness.

2606.00413 2026-06-02 stat.ML cs.LG

Riemannian Stochastic Optimization for Sufficient Dimension Reduction

充分降维的黎曼随机优化

Thibault Pautrel, François Portier

Affiliations * Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France

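AI summary Proposes SMAVE, an algorithm based on Riemannian stochastic gradient ascent that recasts sufficient dimension reduction as a smooth maximization on the Stiefel manifold, enabling efficient recovery of the low-dimensional subspace.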

详情
AI中文摘要

充分降维(SDR)通过将协变量投影到保留响应条件均值的低维子空间,使高维回归变得易于处理。现有的基于梯度的估计器要么在原始空间中操作并遭受维数灾难,要么在降维空间中局部化,每次外迭代的代价至少与样本量成二次关系。我们证明了总体最小平均方差估计(MAVE)风险的最小化器与梯度外积(OPG)逼近相同的Grassmannian目标,并将经验准则重新表述为Stiefel流形上的光滑最大化,具有闭式黎曼梯度。由此产生的算法SMAVE结合了稀疏投影空间最近邻局部化和黎曼随机梯度上升。简化版本具有几乎必然收敛性和非渐近速率,匹配标准的非凸随机一阶缩放。实验上,SMAVE在中高维环境中匹配或改进了RMAVE的合成子空间恢复,在四个真实数据集上一致优于OPG,并且与RMAVE相比具有竞争力或更优,同时运行时间低几个数量级。

英文摘要

Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators either operate in the ambient space and suffer from the curse of dimensionality, or localize in the reduced space at a per-outer-iteration cost at least quadratic in the sample size. We show that minimizers of the population Minimum Average Variance Estimation (MAVE) risk approximate the same Grassmannian target as the Outer Product of Gradients (OPG), and recast the empirical criterion as a smooth maximization on the Stiefel manifold with closed-form Riemannian gradient. The resulting algorithm, SMAVE, combines sparse projected-space nearest-neighbor localization with Riemannian stochastic gradient ascent. A simplified version comes with almost-sure convergence and a non-asymptotic rate matching the standard non-convex stochastic first-order scaling. Empirically, SMAVE matches or improves on RMAVE's synthetic subspace recovery at moderate-to-high ambient dimension, and on four real datasets it uniformly improves over OPG and is competitive with or outperforms RMAVE at orders of magnitude lower runtime.

2606.00402 2026-06-02 stat.ME cs.AI stat.AP

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

基于重写的人类文本检测的无分布框架:通过Knockoff过滤

Yi Liu

Affiliations * Prorata.ai

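AI summary Proposes a distribution-free statistical framework that converts any rewrite-based detector into one with finite-sample FDR guarantees without retraining, by treating rewrite-based detection as a multiple hypothesis testing problem with knockoff structure.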

详情
AI中文摘要

我们提出了一种无分布统计框架,该框架无需重新训练即可将任意基于重写的检测器转化为具有有限样本FDR保证的检测器。我们的关键观察是,基于重写的检测隐式地构建了knockoff样本,使得LLM生成的文本检测可以被表述为具有knockoff结构的多重假设检验问题。这一视角将检测统计量的设计与错误发现的控制分离开来,通过一个简单的校准过程,使现有的重写检测器能够继承有限样本错误发现率(FDR)保证。我们在三个检测模型、19个领域和四个LLM上展示了可靠的FDR控制和有意义的检测能力。

英文摘要

We propose a distribution-free statistical framework that converts arbitrary rewrite-based detectors into detectors with finite-sample FDR guarantees without retraining. Our key observation is that rewrite-based detection implicitly constructs knockoff samples, enabling LLM-generated text detection to be formulated as a multiple hypothesis testing problem with knockoff structure. This perspective separates the design of detection statistics from the control of false discoveries, allowing existing rewrite detectors to inherit finite-sample false discovery rate (FDR) guarantees through a simple calibration procedure. We demonstrate reliable FDR control with meaningful detection power across three detection models, 19 domains, and four LLMs.
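To make the knockoff-style calibration mentioned above more concrete, here is a minimal sketch of the standard knockoff+ selection rule from the knockoff filtering literature. The abstract does not spell out the paper's exact calibration procedure, so this is an assumption about its general shape rather than the author's method. The scores `w` are hypothetical: each is imagined as a detection statistic contrasting a text with its rewrite, with large positive values counting as evidence for a discovery.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q, else infinity."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)   # stand-ins for false discoveries
        positives = sum(1 for x in w if x >= t)    # candidate discoveries
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select(w, q):
    """Indices whose statistic clears the knockoff+ threshold at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for 14 documents (illustration only)
w = [5.1, 4.2, 3.9, 3.5, 3.1, 2.8, 2.6, 2.2, -0.4, 0.3, -0.2, 1.9, 0.1, -1.0]
selected = select(w, q=0.2)
print(selected)
```

The design point this illustrates is the separation the abstract describes: the statistic `w` can come from any detector, while the threshold rule alone is responsible for controlling false discoveries.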

2606.00401 2026-06-02 physics.comp-ph cond-mat.mtrl-sci cs.LG cs.NA math.NA

Data-Driven Spectral Prediction for Accelerating Large-Scale Electronic Structure Calculations

数据驱动的光谱预测加速大规模电子结构计算

Abhiram Badrinarayanan, Davor Davidovic, Edoardo Di Napoli, Jurica Novak, Luigi Genovese, Gustavo Ramirez-Hidalgo, Xinzhe Wu

Affiliations * Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany; Ruđer Bošković Institute, Croatia

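AI summary Targets the generalized-eigenproblem bottleneck in large-scale electronic structure calculations with a data-driven spectral prediction framework: machine learning models predict Chebyshev polynomial coefficients, the predicted spectra provide initial guesses that skip early self-consistent field iterations, and the predictors are intended to tune rational-filter eigensolvers.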

详情
AI中文摘要

模拟包含数千个原子的大分子系统需要高度可扩展的方法。虽然现代密度泛函理论(DFT)代码具有线性标度性,但在百亿亿次架构上,求解相关的大规模稀疏广义本征问题仍然是关键的计算瓶颈。在LimitX项目背景下,我们提出了一个数据驱动框架来加速这些计算。通过将机器学习目标从离散特征值转移到插值切比雪夫多项式的系数,并比较全原子和基于片段的结构表示,我们成功克服了大规模光谱预测的维度限制。我们研究了三种机器学习模型(核岭回归、图神经网络和随机森林),这些模型在包含2 TB蛋白质二聚体的新数据集上进行训练。预测的光谱提供了初始猜测,有效跳过了BigDFT中的早期自洽场(SCF)迭代。最终,这些光谱预测器将被部署以动态优化即将推出的基于有理滤波器的本征求解器(如目前处于初期开发阶段的FrASE)。

英文摘要

Simulating large molecular systems comprising thousands of atoms requires highly scalable methodologies. While modern Density Functional Theory (DFT) codes exhibit linear scaling, solving the associated large, sparse generalized eigenproblems remains a critical computational bottleneck on exascale architectures. In the context of the LimitX project, we propose a data-driven framework to accelerate these calculations. By shifting the machine learning target from discrete eigenvalues to the coefficients of an interpolating Chebyshev polynomial, and by comparing both all-atom and fragment-based structural representations, we successfully overcome the dimensionality constraints of large-scale spectral prediction. We investigate three machine learning models (Kernel Ridge Regression, Graph Neural Networks, and Random Forests) trained on a novel 2 TB dataset of protein dimers. The predicted spectra provide initial guesses that effectively bypass early Self-Consistent Field (SCF) iterations in BigDFT. Ultimately, these spectral predictors will be deployed to dynamically optimize upcoming rational filter-based eigensolvers, such as FrASE, which is currently in initial development.
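The idea of replacing a long list of eigenvalues with a short vector of Chebyshev coefficients can be sketched as follows. The abstract does not say how the spectrum is turned into a function before interpolation, so the piecewise-linear map over the normalized eigenvalue index used here is an assumption, and all function names are hypothetical.

```python
import math


def spectrum_function(eigs):
    """Piecewise-linear map from x in [-1, 1] to the sorted eigenvalues (our assumption)."""
    vals = sorted(eigs)
    n = len(vals)

    def f(x):
        pos = (x + 1.0) / 2.0 * (n - 1)   # fractional index in [0, n - 1]
        i = min(int(pos), n - 2)
        frac = pos - i
        return vals[i] * (1.0 - frac) + vals[i + 1] * frac

    return f


def chebyshev_coefficients(f, m):
    """Coefficients c_0..c_{m-1} of the polynomial interpolating f at the m Chebyshev nodes."""
    nodes = [math.cos(math.pi * (j + 0.5) / m) for j in range(m)]
    fx = [f(x) for x in nodes]
    return [
        (2.0 / m) * sum(fx[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
        for k in range(m)
    ]


def chebyshev_eval(coeffs, x):
    """Evaluate c_0 / 2 + sum_{k >= 1} c_k T_k(x), using T_k(x) = cos(k * arccos(x))."""
    theta = math.acos(max(-1.0, min(1.0, x)))
    return coeffs[0] / 2.0 + sum(c * math.cos(k * theta) for k, c in enumerate(coeffs) if k >= 1)


# Toy spectrum: 11 equally spaced eigenvalues summarized by 4 coefficients
eigs = [float(i) for i in range(11)]
coeffs = chebyshev_coefficients(spectrum_function(eigs), 4)
print(coeffs)
```

A regression model would then be trained to predict `coeffs` (a fixed, small number of targets) instead of every eigenvalue, which is one way to read the abstract's remark about overcoming dimensionality constraints.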

2606.00393 2026-06-02 eess.IV cs.CV

AutoIQ: An Ensemble Framework for Automatic Assessment of Geometric Distortion in Prostate Diffusion-Weighted Imaging

AutoIQ:前列腺扩散加权成像中几何畸变自动评估的集成框架

Haoran Sun, Lixia Wang, Yin-Chen Hsu, Hsu-Lei Lee, Chang Gao, Fei Han, Robert Grimm, Vibhas Deshpande, Ziyang Long, Hsin-Jung Yang, Rola Saouaf, Alessandro D'Agnolo, Timothy Daskivich, Hyung Kim, Debiao Li, Yibin Xie

Affiliations * Biomedical Imaging Research Institute, Cedars-Sinai Medical Center; Department of Bioengineering, University of California; Siemens Medical Solutions USA Inc.; Siemens Healthineers AG; Department of Imaging, Cedars-Sinai Medical Center; Department of Nuclear Medicine, Cedars-Sinai Medical Center; Department of Urology, Cedars-Sinai Medical Center

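AI summary Proposes AutoIQ, an ensemble machine learning framework that combines segmentation-based and registration-based measures of DWI geometric distortion to classify distortion severity automatically, reaching 0.95 accuracy on an independent test set.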

Comments Original research; 11 pages, 7 figures, 1 table

详情
AI中文摘要

前列腺扩散加权成像(DWI)中的几何畸变会损害病灶定位并降低基于MRI的临床评估的可靠性。我们提出了AutoIQ,一个用于自动量化和分类DWI几何畸变严重程度的集成机器学习框架。共分析了140例回顾性前列腺双参数MRI检查,包括33次严重畸变需要重复采集的扫描和107次基于放射科专家评估可接受的畸变扫描。AutoIQ结合了两种互补的畸变量化策略:一种基于分割的方法,测量T2加权成像(T2WI)和DWI之间的前列腺边界不匹配;另一种基于配准的方法,估计DWI到T2WI对齐后的变形幅度。由此产生的畸变分数用于训练单个分类器和逻辑回归集成模型。两种计算方法均显著区分了严重和可接受的畸变病例(p < 0.001)。在独立测试集上,集成模型达到了0.95的准确率、0.93的F1分数和0.98的AUC,优于单个模型。这些结果表明,AutoIQ可以为前列腺DWI提供自动化的定量质量评估,并可能有助于识别需要重复采集的扫描。

英文摘要

Geometric distortion in prostate diffusion-weighted imaging (DWI) can impair lesion localization and reduce the reliability of MRI-based clinical assessment. We propose AutoIQ, an ensemble machine learning framework for automatic quantification and classification of DWI geometric distortion severity. A total of 140 retrospective prostate biparametric MRI examinations were analyzed, including 33 scans with severe distortion requiring repeat acquisition and 107 scans with acceptable distortion based on expert radiologist assessment. AutoIQ combines two complementary distortion quantification strategies: a segmentation-based method measuring prostate boundary mismatch between T2-weighted imaging (T2WI) and DWI, and a registration-based method estimating deformation magnitude after DWI-to-T2WI alignment. The resulting distortion scores were used to train individual classifiers and a logistic-regression ensemble model. Both computational methods significantly differentiated severe from acceptable distortion cases (p < 0.001). On an independent test set, the ensemble model achieved an accuracy of 0.95, F1-score of 0.93, and AUC of 0.98, outperforming individual models. These results suggest that AutoIQ can provide automated, quantitative quality assessment for prostate DWI and may help identify scans that require repeat acquisition.

2606.00370 2026-06-02 cs.HC cs.AI

Agentic Authoring of Interactive Multiview Visualizations in Genomics

交互式多视图基因组学可视化的智能体创作

Astrid van den Brandt, Kiroong Choe, Sehi L'Yi, Devin Lange, Nils Gehlenborg

Affiliations * Harvard Medical School; Boston College

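AI summary Addresses limited customization and high programming barriers in authoring genomics visualizations with large language model-based agentic schemes that use structured output and iterative refinement to improve visualization quality.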

Comments 11 pages, 12 figures

详情
AI中文摘要

多样化的基因组学数据、科学问题和分析任务通常需要高度专业化的可视化。因此,用户通常必须定制或创作适合其数据的新可视化。现有工具要么定制能力有限,要么需要大量学习或编程,即使表达力强的工具也假设用户具备可视化专业知识,而许多用户缺乏这一点。智能体和大型语言模型方法越来越多地应用于复杂的科学任务,包括可视化。自然语言对话界面为复杂可视化的创作民主化提供了一条有希望的途径。在基因组学背景下,这些方法面临额外挑战:基因组学可视化通常整合异构数据类型,并由多个链接的交互式视图组成。这些挑战促使我们设计更结构化的基于LLM的方案。我们首先描述了普通LLM生成在基因组学可视化中成功和失败的地方,确定了八个质量维度。然后,我们比较了六种方案——直接生成、固定流水线和四种智能体配置(在专业智能体数量和是否存在审查者方面有所不同)——跨越159个案例,涵盖三个查询模糊性和规范复杂性级别。所有方案都使用Gosling可视化语法作为结构化输出。智能体迭代在感知质量上显著优于两个基线,而更复杂的智能体架构没有带来额外收益。我们讨论了为特定领域可视化创作设计智能体系统的启示。所有补充材料可在https://osf.io/uqe83获取。

英文摘要

Diverse genomics data, scientific questions, and analysis tasks typically demand highly specialized visualizations. Therefore, users often must customize or author new ones tailored to their data. Existing tools are usually either limited in customization or require substantial learning or programming, and even expressive tools assume visualization expertise many users lack. Agentic and large language model (LLM) approaches are increasingly applied to complex scientific tasks, including visualization. Natural-language conversational interfaces offer a promising path to democratizing the authoring of complex visualizations. In the context of genomics, these approaches face additional challenges: genomics visualizations typically integrate heterogeneous data types and are composed of multiple linked interactive views. These challenges motivate more structured LLM-based schemes. We first characterize where vanilla LLM generation succeeds and fails for genomics visualization, identifying eight quality dimensions. We then compare six schemes--direct generation, a fixed pipeline, and four agentic configurations varying in the number of specialist agents and the presence of a reviewer--across 159 cases spanning three levels of query ambiguity and specification complexity. All schemes use the Gosling visualization grammar as structured output. Agentic iteration substantially improves perceived quality over both baselines, while more complex agent architectures yield no additional benefit. We discuss implications for designing agentic systems for domain-specific visualization authoring. All supplemental materials are available at https://osf.io/uqe83.

2606.00369 2026-06-02 cs.CY cs.LG

Quantifying the Salience of Geo-Cultural Values for Pluralistic Safety Alignment

量化地理文化价值对多元安全对齐的显著性

Arkadiy Saakyan, Charvi Rastogi, Lora Aroyo

Affiliations * University of Oxford; University of Cambridge

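AI summary Using multilevel modeling, finds that cultural zone membership has a significant effect on safety ratings (p<0.05) and that roughly 10% of items are culturally sensitive; current LLMs cannot reliably replace human raters but can help triage items for annotation.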

Comments 119 pages, 13 figures. ICML 2026 camera ready

详情
AI中文摘要

AI模型的安全全球部署需要与跨文化的人类价值观对齐。然而,安全评估数据集中的评分者群体在地理上仍然高度同质,未能捕捉地理文化差异。此外,在控制年龄、性别和种族等人口统计学因素后,这些差异是否仍然存在尚不清楚。通过对安全数据集的元分析,我们发现大多数数据集未报告地理文化信息,而那些报告的数据集缺乏统一的方法来联合分析地理文化和人口统计学相关性。利用Inglehart-Welzel跨文化变异维度,我们通过多层次模型证明,文化区域归属解释了超出标准人口统计学变量的安全评分方差(6个数据集中p<0.05)。此外,我们的分析表明,我们检查的数据集中大约10%的项目具有文化敏感性:如果没有充分的文化代表性,这些项目很可能被错误分类为安全。我们将LLM评估为评分替代工具和分诊工具,发现当前的LLM不能可靠地替代评分员,尽管它们可以帮助优先选择文化敏感项目进行人工标注。我们的发现推动了更多文化多元的安全评估,并提供了支持其实践的实用建议。

英文摘要

Safe global deployment of AI models requires alignment with human values that vary across cultures. Yet rater pools in safety evaluation datasets remain largely geographically homogeneous, failing to capture geo-cultural differences. Further, it remains unclear whether such differences persist after controlling for demographics such as age, gender, and ethnicity. Through a meta-analysis of safety datasets, we find that most do not report geo-cultural information, and those that do lack a unified methodology to jointly analyze geo-cultural and demographic correlates. Using the Inglehart-Welzel dimensions of cross-cultural variation, we demonstrate via multilevel modeling that cultural zone membership explains variance in safety ratings beyond standard demographics (p<0.05 across 6 datasets). Moreover, our analysis indicates that roughly 10% of items in the datasets we examined are culturally sensitive: likely to be misclassified as safe without adequate cultural representation. We evaluate LLMs as both rater surrogates and triage tools, finding that current LLMs do not reliably stand in for raters, though they can help prioritize culturally sensitive items for human annotation. Our findings motivate more culturally pluralistic safety evaluation and offer practical takeaways to support it.

2606.00329 2026-06-02 eess.SY cs.LG cs.SY stat.ML

Benchmarking Recursive-Collapse Warning Claims Under Matched False-Positive Control

在匹配假阳性控制下对递归崩溃警告声明的基准测试

David Mullett

发表机构 * Independent Researcher(独立研究者)

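AI summary Proposes the Loopzero benchmark framework, which evaluates recursive-collapse warning claims against a directional telemetry pattern (gain G, recursive persistence p, diversity δ) under a matched false-positive budget, and reports that neither the standard comparators nor Loopzero's own pre-registered detector reached an accepted operating point.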

Comments 29 pages, 7 figures, 2 tables; supplementary materials: 9 pages, 1 figure, 4 tables. Code, derived data packets, and Lean artifact: https://github.com/davidmullett/loopzero-paper-public (release tag lean-v1.0)

详情
AI中文摘要

递归系统在明显故障变得可见之前可能进入类似崩溃的状态——自我强化放大、持续递归和多样性缩小,这些掩盖了加速的内部退化。我们引入了Loopzero,一个声明约束的基准框架,用于测试递归故障是否遵循方向性遥测模式:上升增益(G)、递归持久性(p)和下降多样性(δ)。声明边界在Lean中指定;Lean构件不验证实际遥测、基准有效性或检测器性能。我们在两个冻结的公共构件基准上评估桥梁:一个分段公共市场基准(2018年Volmageddon,2020年COVID MWCB)和一个MovieLens-25M离线确定性推荐回放。检测器在锁定等假阳性合同(FP ∈ [0.03, 0.07],预注册)下进行评估,因此所有配置面临相同的警报预算。测试的标准比较器和Loopzero预注册的分位数检测器均未达到可接受的工作点。方向性证人对齐在两个规范基准上成立,并披露了相邻视野和行级限制。数字化Shumailov等人(2024)的LLM训练循环轨迹在方向上与模式一致;该领域的匹配假阳性评估被推迟。贡献是一个可复现、可证伪的基准框架,用于在显式警报预算合同下评估递归崩溃警告声明——将不接受报告为第一类科学结果。

英文摘要

Recursive systems can enter collapse-like regimes -- self-reinforcing amplification, persistent recursion, and narrowing diversity that mask accelerating internal degradation -- before overt failure becomes visible. We introduce Loopzero, a claim-bounded benchmark framework for testing whether recursive failures follow a directional telemetry pattern: rising gain (G), recursive persistence (p), and declining diversity (δ). The claim boundary is specified in Lean; the Lean artifact does not verify real telemetry, benchmark validity, or detector performance. We evaluate the bridge on two frozen public-artifact benchmarks: a segmented public-markets benchmark (Volmageddon 2018, COVID MWCB 2020) and a MovieLens-25M offline deterministic recommender replay. Detectors are evaluated under a locked equal-false-positive contract (FP ∈ [0.03, 0.07], pre-registered) so all configurations face the same alert budget. Neither tested standard comparators nor Loopzero's pre-registered quantile detector achieved an accepted operating point. Directional witness alignment held on both canonical benchmarks, with adjacent-horizon and row-level limitations disclosed. Digitized Shumailov et al. (2024) LLM training-loop trajectories are directionally consistent with the pattern; matched-FP evaluation in that domain is deferred. The contribution is a reproducible, falsifiable benchmark framework for evaluating recursive-collapse warning claims under an explicit alert-budget contract -- non-acceptance reported as a first-class scientific outcome.
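The equal-false-positive contract described above can be illustrated with a small calibration routine. This is only a sketch of the general idea, under the assumption that each detector emits a scalar score and that "false positive" means a score above the threshold on a non-collapse sample. The function names, the 0.05 target, and the acceptance check against the [0.03, 0.07] band are illustrative and are not taken from the Loopzero code.

```python
def threshold_for_fp_rate(negative_scores, target_fp):
    """Return (threshold, fp_rate): the smallest observed score whose empirical
    false-positive rate, the share of negatives with score > threshold, is <= target_fp."""
    if not negative_scores or target_fp < 0:
        raise ValueError("need at least one negative score and a non-negative target")
    ordered = sorted(negative_scores)
    n = len(ordered)
    for thr in ordered:
        fp = sum(1 for s in ordered if s > thr) / n
        if fp <= target_fp:
            return thr, fp   # always reached: the maximum score gives fp == 0


def evaluate(neg, pos, target_fp=0.05, band=(0.03, 0.07)):
    """Calibrate on negatives, then measure recall on positives at that threshold."""
    thr, fp = threshold_for_fp_rate(neg, target_fp)
    recall = sum(1 for s in pos if s > thr) / len(pos)
    return {"threshold": thr, "fp": fp, "recall": recall,
            "in_band": band[0] <= fp <= band[1]}


# Hypothetical detector scores (illustration only)
neg = [i / 100 for i in range(100)]
pos = [0.5, 0.96, 0.97, 0.2]
print(evaluate(neg, pos))
```

Running every detector through the same calibration means recall figures are compared at the same alert budget, so a detector cannot look better simply by raising more alarms.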

2606.00327 2026-06-02 stat.ME cs.LG stat.AP stat.ML

Cluster Analysis with Resampling for Validation and Exploration (CARVE)

基于重采样的聚类验证与探索分析(CARVE)

Kai R. Wycik, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

Affiliations * Department of Statistics, Columbia University, New York, NY, USA; Center for Theoretical Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA; School of Data and Information Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA

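AI summary Proposes CARVE, an open-source software package that uses resampling to assess clustering stability and generalizability, with diagnostics at the global, cluster, and sample level, and that outperforms traditional clustering validation indices.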

详情
AI中文摘要

聚类在科学领域被广泛用作下游数据驱动科学发现的基础。然而,聚类结果对算法选择、预处理和聚类数$k$高度敏感,导致科学声明往往不可重复。当前用于验证聚类解决方案的最先进技术包括轮廓系数、Davies-Bouldin和Calinski-Harabasz等聚类验证指标(CVI),这些指标依赖于几何假设,但在生物医学研究中遇到的重尾、高维和非线性结构数据上失效。基于重采样的替代方法——基于聚类稳定性和泛化性的思想——已被提出,但仍分散在专门的工具中,缺乏统一、易用的软件。我们通过CARVE(基于重采样的聚类验证与探索分析)填补了这一空白,这是一个开源的Python和R包,可联合评估多个聚类算法和超参数,在全局、簇和样本级别返回稳定性和泛化性诊断,以及基于原则的选择规则和基于共识的簇标签。在六个合成基准测试中,CARVE一致地恢复了接近最优的聚类,而经典指标则显著退化。在实验基因组学和蛋白质组学数据集上,当经典CVI完全失效时,CARVE恢复了更精细的生物结构。CARVE提供与scikit-learn兼容的Python API和与Seurat工作流兼容的类似R接口。

英文摘要

Clustering is widely used across the sciences as the foundation for downstream data-driven scientific discoveries. However, clustering results are highly sensitive to the choice of algorithm, preprocessing, and the number of clusters k, producing scientific claims that are often not reproducible. The current state of the art for validating clustering solutions consists of clustering validation indices (CVIs) such as Silhouette, Davies-Bouldin, and Calinski-Harabasz, which rely on geometric assumptions that break down on the heavy-tailed, high-dimensional, and nonlinearly structured data encountered in biomedical research. Resampling-based alternatives - grounded in the ideas of clustering stability and generalizability - have been proposed but remain scattered across specialized tools with no unified, accessible software. We fill this gap with CARVE (Cluster Analysis with Resampling for Validation and Exploration), an open-source Python and R package that jointly evaluates multiple clustering algorithms and hyperparameters, returning stability and generalizability diagnostics at the global, cluster, and sample level together with principled selection rules and consensus-based cluster labels. Across six synthetic benchmarks, CARVE consistently recovers near-optimal clusterings where classical indices degrade substantially. On experimental genomics and proteomics data sets, CARVE recovers finer biological structure when classical CVIs collapse entirely. CARVE is available with a scikit-learn-compatible Python API and an analogous R interface compatible with Seurat workflows.
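One common building block of resampling-based stability analysis is a measure of how well two clusterings of overlapping subsamples agree. The sketch below computes the plain Rand index over the samples two runs share. It does not use CARVE's API and is not claimed to be the metric CARVE implements; the function name and the example labelings are hypothetical.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Rand index over samples present in both labelings (dicts: sample id -> cluster label).

    A pair of samples counts as agreeing when both runs put the two samples
    together, or both runs keep them apart. Label names need not match across runs.
    """
    shared = sorted(set(labels_a) & set(labels_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two shared samples")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Two hypothetical clusterings fitted on different subsamples of the same data
run1 = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
run2 = {"s2": "a", "s3": "b", "s4": "b", "s5": "a"}
print(pair_agreement(run1, run2))
```

Averaging such a score over many pairs of resampled runs gives a global stability estimate for one algorithm and hyperparameter setting, which can then be compared across settings.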

2606.00324 2026-06-02 cs.IR cs.AI

LLMs Need Encoders for Semantic IDs Too

LLM 也需要语义 ID 的编码器

Xiangyi Chen, Zelun Wang, Xinyi Li, Yi-Ping Hsu, Jaewon Yang, Jiajing Xu

Affiliations * Pinterest, United States

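AI summary Proposes PrefixMem, a lightweight Semantic ID encoder based on prefix n-gram memory tables that gives the LLM structured, prefix-conditioned representations, substantially improving Semantic ID accuracy and retrieval recall in generative recommendation.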

详情
AI中文摘要

多模态 LLM 使用专用编码器来桥接非语言模态(图像用视觉编码器,音频编解码器令牌用深度模型),因为原始令牌嵌入无法捕获模态特定的结构。我们认为语义 ID(SID),即生成推荐中使用的层次化代码,构成了另一种这样的模态:SID 级别令牌的含义取决于其前缀上下文,但当前系统只是将 SID 令牌添加到词汇表中,并依赖训练从头学习这些上下文相关的含义。我们提出 PrefixMem,一种基于前缀 n-gram 记忆表的轻量级 SID 编码器,它在 SID 令牌位置为 LLM 提供结构化、前缀条件的表示。与多模态 LLM 中的视觉编码器类似,PrefixMem 可以独立预训练,然后附加到任何 LLM 上进行联合训练。我们在 Pinterest 的大规模数据上,跨多个 LLM 家族进行评估,结果表明,在相同的训练计算量下,PrefixMem 将最深层次 SID 准确率相对提升高达 46%,完整 SID 检索召回率相对提升高达 22%。编码器的优势集中在贪心解码失败的困难样本上,准确率相对提升高达 77%,这证实了 SID 令牌与其他非语言模态一样,受益于专用编码器。

英文摘要

Multimodal LLMs use dedicated encoders to bridge non-language modalities (vision encoders for images, depth models for audio codec tokens) because raw token embeddings alone cannot capture modality-specific structure. We argue that Semantic IDs (SIDs), the hierarchical codes used in generative recommendation, constitute another such modality: a SID level token's meaning depends on its prefix context, yet current systems simply add SID tokens to the vocabulary and rely on training to learn these context-dependent meanings from scratch. We propose PrefixMem, a lightweight SID encoder based on prefix n-gram memory tables that provides the LLM with structured, prefix-conditioned representations at SID token positions. Like vision encoders in multimodal LLMs, PrefixMem can be pre-trained independently and then attached to any LLM for joint training. We evaluate on large-scale data from Pinterest across multiple LLM families and show that PrefixMem improves deepest-level SID accuracy by up to 46% relative and full-SID retrieval recall by up to 22% relative at matched training compute. The encoder's benefit concentrates on hard examples where greedy decoding fails, with up to 77% relative accuracy gains, confirming that SID tokens benefit from a dedicated encoder just as other non-language modalities do.

2606.00312 2026-06-02 math.NA cs.LG cs.NA

Stochastic Rounding Increases Small Singular Values

随机舍入增加小奇异值

Linkai Ma, Tingzhou Yu, Petros Drineas

Affiliations * Department of Computer Science, Purdue University; Department of Mathematics, University of Alberta

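AI summary Shows that stochastic rounding, a quantization scheme for low-precision floating-point arithmetic, lifts clusters of tail singular values for matrices with constant aspect ratio as well as extreme aspect ratio, so it acts as a spectral regularizer more broadly than previously known.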

详情
AI中文摘要

在过去的六七年中,随机舍入(SR)作为一种低精度浮点运算的量化方案重新引起了广泛关注,其应用涵盖数值分析和现代机器学习系统。最近的研究表明,SR通过增加极瘦长(或对称地,极矮胖)矩阵的最小奇异值来充当隐式正则化器。在这项工作中,我们从两个方向大幅改进并扩展了这一理解。首先,我们证明SR的正则化效应并不局限于极端长宽比区域:它对于恒定长宽比的矩阵仍然存在。其次,我们证明SR不仅正则化最小奇异值,而是提升谱尾部整个奇异值簇。这些结果共同提供了随机舍入作为谱正则化器的更一般特征,揭示其效应超越极端长宽比,并作用于奇异值谱的更广泛部分。

英文摘要

Over the past half-dozen years, stochastic rounding (SR) has regained significant attention as a quantization scheme for low-precision floating-point arithmetic, with applications spanning numerical analysis and modern machine learning systems. Recent work has shown that SR acts as an implicit regularizer by increasing the smallest singular value of extremely tall-and-thin (or, symmetrically, short-and-fat) matrices. In this work, we substantially sharpen and extend this understanding in two directions. First, we show that the regularization effect of SR is not restricted to extreme aspect ratio regimes: it persists for matrices with constant aspect ratio. Second, we demonstrate that SR does not merely regularize the smallest singular value, but instead lifts entire clusters of singular values at the tail of the spectrum. Together, these results provide a more general characterization of stochastic rounding as a spectral regularizer, revealing that its effects extend beyond extremal aspect ratios and act on a broader portion of the singular value spectrum.
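For readers unfamiliar with the rounding scheme itself, here is a minimal sketch of stochastic rounding onto a uniform grid. The uniform grid with spacing `step` is a simplification of real floating-point formats, and the function name is hypothetical. The code only illustrates the definition and its unbiasedness; it does not reproduce the paper's results on singular values.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a neighboring multiple of `step`, choosing the upper neighbor
    with probability equal to x's fractional position between the two."""
    lower = math.floor(x / step)
    frac = x / step - lower                      # in [0, 1)
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)                           # seeded for reproducibility
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))               # close to 0.3 on average
```

Round-to-nearest would map 0.3 to 0 every time, a deterministic error. Stochastic rounding returns 0 or 1 at random so that the expected value equals the input. Applied entrywise to a matrix, this amounts to adding zero-mean random noise, which is the perturbation whose effect on the spectrum the abstract analyzes.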