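arXivDaily: a daily digest of academic papers, synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.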

2606.05150 2026-06-04 cs.NE cs.AI

Multi-Column RBF Neural Network Using Adaptive and Non-Adaptive Particle Swarm Optimization

使用自适应和非自适应粒子群优化的多列RBF神经网络

Ammar Hoori, Yuichi Motai

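Affiliations: Department of Biomedical Engineering, Case Western Reserve University; Department of Electrical and Computer Engineering, Virginia Commonwealth University

AI summary: To address the scalability of RBF neural network training on large datasets, the paper proposes multi-column RBF networks based on particle swarm optimization (PSO) and adaptive PSO (APSO), named MC-PSO and MC-APSO, which train multiple RBFNs in parallel and use subset specialization to improve accuracy and speed.

Comments: 15 pages, under review

AI Chinese abstract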

使用梯度下降算法训练的径向基函数神经网络（RBFN）在浅层和深层网络中提供了有效的全连接结构。误差校正（ErrCor）是一种先进的基于梯度的训练方法，它选择最优隐藏单元以提高精度。另外，作为基于种群的算法，粒子群优化算法（PSO）利用群体经验优化RBFN参数，提供全局搜索和对局部最小值的鲁棒性。自适应PSO（APSO）作为PSO的改进变体出现。APSO算法通过在优化过程中动态调整群体参数来提高收敛速度。ErrCor和PSO都显示出改进的结果和有竞争力的收敛性。然而，对于大规模数据集，这些方法面临可扩展性挑战，如过多的核计算和大的隐藏层结构。最近的多列RBFN方法（MCRN）通过在并行系统中部署小型RBFN来提高ErrCor性能。受MCRN成功的启发，我们提出了两种改进PSO性能的新方法：使用PSO的多列RBFN（MC-PSO）和使用APSO的多列RBFN（MC-APSO）。这些方法引入了使用进化群方法训练的并行RBFN结构。每个RBFN独立地在数据集的特定空间子集上使用PSO或APSO算法进行训练。这些经过专门训练的RBFN针对各自的子集进行了定制。在测试期间，只有测试实例邻居所在的选定RBFN对多列输出有贡献。这种专门化提高了精度，而并行性提高了速度。我们在各种基准数据集上评估了所提出的方法。MC-PSO和MC-APSO在精度和召回率方面优于ErrCor、PSO、APSO和MCRN。在大多数实验中，它们还表现出更快的训练和测试时间。

English abstract

The radial basis function neural network (RBFN) trained with a gradient descent algorithm provides an effective fully connected structure in both shallow and deep networks. Error correction (ErrCor), a state-of-the-art gradient-based training method, selects optimal hidden units to improve accuracy. Alternatively, as a population-based algorithm, the particle swarm optimization algorithm (PSO) uses the swarm experience to optimize RBFN parameters, offering global search and robustness to local minima. Adaptive PSO (APSO) has emerged as an improved variant of PSO. The APSO algorithm improves convergence speed by dynamically adjusting swarm parameters during optimization. Both ErrCor and PSO demonstrate improved results and competitive convergence. However, with large datasets, these methods face scalability challenges such as excessive kernel computations and large hidden layer structures. A recent multi-column RBFN approach (MCRN) improves ErrCor performance by deploying small RBFNs in a parallel system. Inspired by MCRN's success, we propose two novel approaches to improve PSO performance: the multi-column RBFN with PSO (MC-PSO) and the multi-column RBFN with APSO (MC-APSO). These methods introduce parallel RBFN structures trained using evolutionary swarm methods. Each RBFN is independently trained on a specific spatial subset of the dataset using either the PSO or the APSO algorithm. The resulting specialist RBFNs are tailored to their respective subsets. During testing, only the selected RBFNs, those where the test instance's neighbors are located, contribute to the multi-column output. This specialization improves accuracy, while parallelism enhances speed. We evaluate the proposed methods on various benchmark datasets. MC-PSO and MC-APSO outperform ErrCor, PSO, APSO, and MCRN in terms of accuracy and recall. They also demonstrate faster training and testing times in most experiments.


2606.05129 2026-06-04 cs.CR cs.LG

Preserving Data Privacy in Learning Causal Structure with Fully Homomorphic Encryption

在全同态加密下学习因果结构时保护数据隐私

Jian Yang, Yuan Tong, Qinbin Li, Zeyi Wen, Xiaofang Zhou

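Affiliations: Hong Kong University of Science and Technology (Guangzhou); Hong Kong University of Science and Technology; University of California, Berkeley

AI summary: To address privacy leakage in distributed causal structure learning, the paper proposes a method based on fully homomorphic encryption that uses circuit simplification, approximations of division and logarithm, and SIMD batching to learn causal structure efficiently on encrypted data, and that can be extended to differential privacy.

AI Chinese abstract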

保护数据隐私是结构数据管理和数据挖掘中的重要课题。然而，分布式因果结构学习中的隐私泄露问题是一个持续的挑战，特别是在需要数据传输和计算的情况下。在本文中，我们提出了一种基于全同态加密（FHE）的方法，该方法在密文上进行计算，保持数据在传输和计算过程中加密。然而，由于FHE计算成本高且对除法和对数运算的支持有限，将FHE应用于因果结构学习具有挑战性。为了应对这一挑战，我们提出了一系列新颖的技术，包括（i）电路简化以提高效率，（ii）通过牛顿-拉夫森倒数和泰勒展开近似除法和对数，以及（iii）使用SIMD加速的批处理技术来增强整个学习过程。此外，我们的方法可以轻松扩展到FHE之外，通过展示其可移植性来支持差分隐私。实验结果表明，我们的方法在测试的数据集上实现了与明文版本高度一致且可比的因果结构。最后，即使在FHE的隐私保护下，我们的方法也能在几十分钟内高效且实际地完成因果结构学习。

English abstract

Preserving data privacy is an important topic in structural data management and data mining. However, privacy leakage in distributed causal structure learning is a persistent challenge, especially in cases where data transmission and computation are required. In this paper, we propose a method based on fully homomorphic encryption (FHE) that performs calculations on ciphertexts, keeping data encrypted in transit and during computation. Nevertheless, applying FHE to causal structure learning is challenging due to the high computation cost and the limited support for division and logarithm operations in FHE. To tackle this challenge, we propose a series of novel techniques including (i) circuit simplification for better efficiency, (ii) approximation of division and logarithm through the Newton-Raphson reciprocal and Taylor expansion, and (iii) a batching technique with SIMD acceleration to enhance the whole learning process. Additionally, our method can be easily extended beyond FHE, as we demonstrate by porting it to support differential privacy. Empirical results show that our method achieves high consistency with the plaintext version and a comparable causal structure on the datasets tested. Lastly, our method is efficient and practical, completing causal structure learning in tens of minutes even under the privacy protection of FHE.
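
As a rough illustration of technique (ii), the sketch below approximates a reciprocal and a logarithm using only additions and multiplications, which is the kind of arithmetic FHE schemes evaluate natively. It runs on plaintext floats, not ciphertexts; the function names, the starting guess, the iteration count, and the number of Taylor terms are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def nr_reciprocal(a, x0, iters=6):
    """Approximate 1/a by Newton-Raphson: x <- x * (2 - a * x).

    Only multiplication and subtraction are used. The iteration converges
    when the starting guess satisfies 0 < x0 < 2/a (assumed known here).
    """
    x = x0
    for _ in range(iters):
        x = x * (2.0 - a * x)
    return x


def taylor_log(y, terms=12):
    """Approximate ln(y) by the Taylor series of ln(1 + t) with t = y - 1.

    Valid for 0 < y < 2. The coefficients (-1)^(k+1) / k are public
    constants, so evaluating the series needs only add and multiply.
    """
    t = y - 1.0
    total, power = 0.0, t
    for k in range(1, terms + 1):
        coeff = (-1.0) ** (k + 1) / k  # public constant, computed in the clear
        total += coeff * power
        power *= t
    return total


def approx_divide(num, den, x0):
    """Division expressed as multiplication by an approximate reciprocal."""
    return num * nr_reciprocal(den, x0)
```

In an encrypted setting the inputs would typically be rescaled into the convergence range first, and the iteration and term counts trade accuracy against multiplicative depth; how the paper handles either point is not stated in the abstract.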


2606.05124 2026-06-04 cs.GR cs.CV cs.LG

Geometry Gaussians: Decoupling Appearance and Geometry in Gaussian Splatting

几何高斯：在高斯泼溅中解耦外观与几何

Hongyu Zhou, Zorah Lähner

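Affiliations: University of Bonn; Lamarr Institut

AI summary: To address the conflict between geometric representation and appearance rendering in 3D Gaussian Splatting, the paper adds a geometry opacity parameter to each splat together with a transparency-curated optimization pipeline, decoupling geometry from appearance and improving rendering and geometry performance on complex scenes, especially those with transparent objects.

AI Chinese abstract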

在3D高斯泼溅（3DGS）成功用于新视角合成后，许多工作探索了如何将其用于几何表面表示。然而，直接从3DGS中提取准确的几何信息仍然具有挑战性，且往往会降低外观渲染质量。在这项工作中，我们通过使用完整的地面真值纹理和几何信息进行训练，证明了默认形式的3DGS本质上不适合同时表示纹理和几何。我们还提出了一种简单的解决方案，即为每个溅射应用一个额外的几何不透明度参数，并配合可选的透明度策划优化流程。我们的实验，无论是使用地面真值还是视觉基础模型的几何输入，都表明这一改变在多种数据集上提高了渲染和几何性能，尤其是对于包含透明物体的复杂场景，我们的方法带来了显著提升。

English abstract

After the success of 3D Gaussian Splatting (3DGS) for novel view synthesis, many works have explored how to also use it for geometric surface representation. However, extracting accurate geometric information directly from 3DGS remains challenging and can often reduce the appearance rendering quality. In this work, we show that 3DGS in its default form is inherently unsuited to represent texture and geometry at the same time, by training with complete ground-truth texture and geometry information. We also propose a simple solution by applying a single additional geometry opacity parameter to each splat, together with an optional transparency-curated optimization pipeline. Our experiments, both with ground-truth and vision foundation model geometric input, show that this change leads to improved rendering and geometry performance on a wide variety of datasets, and that complex scenes with transparent objects benefit especially from our method.


2606.05045 2026-06-04 math.DS cs.LG

Learning Control-Affine Reduced-Order Models via Autoencoders

通过自编码器学习控制仿射降阶模型

Ali Mjalled, Martin Mönnigmann

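Affiliations: Automatic Control and Systems Theory, Ruhr-Universität Bochum

AI summary: Proposes a framework that uses autoencoders to learn a reduced latent space and control-affine state-space dynamics simultaneously, extends it to a sequence-based model to improve prediction accuracy, and demonstrates its usefulness through feedback linearization.

AI Chinese abstract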

本文提出了一种用于识别控制仿射降阶模型（ROM）的框架。该方法利用自编码器（AE）将高维状态以及潜在的高维输入变换为适合控制仿射状态空间动力学的降维潜在变量。这是通过同时训练AE和状态空间模型实现的。此外，我们将离散ROM公式扩展为基于序列的模型，该模型处理状态和输入历史以提高预测精度，同时保持控制仿射结构。我们通过对导出的模型应用反馈线性化来激励我们的框架，并提出了有效使用它的指南。所提出的框架在两个数值示例上进行了评估，并将其性能与基线模型（其中AE识别具有线性状态空间动力学的潜在空间）进行了比较。评估涉及测试数据上ROM的预测精度及其将系统控制到期望状态或轨迹的有效性。

English abstract

We present in this paper a framework for the identification of control-affine reduced-order models (ROMs). The proposed method utilizes autoencoders (AEs) to transform the high-dimensional states, and potentially the high-dimensional inputs, into reduced latent ones suitable for control-affine state-space dynamics. This is achieved by simultaneous training of the AE and the state-space model. In addition, we extend the discrete ROM formulation to a sequence-based model, which processes state and input histories to improve prediction accuracy while preserving the control-affine structure. We motivate our framework by applying feedback linearization to the derived models, and we present guidelines for its efficient use. The proposed framework is assessed on two numerical examples and its performance is compared to a baseline model, where the AE identifies a latent space with linear state-space dynamics. The assessment involves evaluating the prediction accuracy of the ROM on test data and its effectiveness in controlling the system to a desired state or trajectory.
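
To make the motivation concrete, here is a minimal sketch of why a control-affine latent model is convenient for feedback linearization. It assumes a discrete-time latent model of the form z_{k+1} = f(z_k) + G(z_k) u_k; the particular `f` and `G` below are hand-written toy functions standing in for learned networks, and every name and number is hypothetical, not taken from the paper.

```python
import math


def f(z):
    """Toy drift term (stand-in for a learned network)."""
    return [0.9 * z[0] + 0.1 * math.sin(z[1]),
            0.8 * z[1] + 0.05 * z[0] ** 2]


def G(z):
    """Toy state-dependent input matrix; its determinant is always >= 0.9."""
    return [[1.0 + 0.1 * math.cos(z[0]), 0.0],
            [0.2, 1.0]]


def step(z, u):
    """Control-affine update: z_next = f(z) + G(z) u."""
    fz, Gz = f(z), G(z)
    return [fz[i] + Gz[i][0] * u[0] + Gz[i][1] * u[1] for i in range(2)]


def solve2x2(A, b):
    """Solve A x = b for a 2x2 matrix by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def fl_control(z, z_ref, gain=0.5):
    """Pick u so that z_next = z_ref + gain * (z - z_ref).

    Because the model is affine in u, this only requires solving the
    linear system G(z) u = v - f(z) for the desired next state v.
    """
    fz = f(z)
    v = [z_ref[i] + gain * (z[i] - z_ref[i]) for i in range(2)]
    return solve2x2(G(z), [v[i] - fz[i] for i in range(2)])


def simulate(z0, z_ref, steps=20):
    """Run the closed loop; the tracking error shrinks by `gain` each step."""
    z = list(z0)
    for _ in range(steps):
        z = step(z, fl_control(z, z_ref))
    return z
```

Calling `simulate([1.0, -2.0], [0.3, 0.7])` drives the toy latent state toward the reference. With a model that is not affine in the input, the same step would need a nonlinear solve at every time instant.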


2606.05037 2026-06-04 cs.SE cs.AI

Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

自反式API：结构优于冗长，助力AI代理恢复

Arquimedes Canedo, Grama Chethan

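Affiliations: Siemens Digital Industries Software, USA

AI summary: Proposes self-reflective APIs, which return machine-readable structured suggestions on validation failure so that an AI agent can repair the request and retry without external reasoning; on Anthropic models this raises task-completion rate by 36.7 to 40.0 percentage points and improves per-success token efficiency by 1.8 to 2.2 times.

AI Chinese abstract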

当AI代理调用API并遇到验证错误时，它需要的不仅仅是哪里出错了——它需要下一步该做什么。自反式API在验证失败时返回一个机器可读的 recovery_feedback.suggestions[] 负载，足以让代理修复请求并在无需外部推理的情况下重试。在一个经过泄露审计的试点实验（每单元N=30，3个LLM，10个对抗性任务）中，结构化建议在Anthropic模型上将任务完成率提升了+36.7至40.0个百分点（Fisher精确检验 p ≤ 0.0022），每成功令牌效率提高了1.8至2.2倍。在gpt-4o-mini上提升不显著（p=0.435）；在计费API上的第二个领域复制确认了这一模式。该比较仅在审计了LLM基准测试中两个未记录的答案泄露类别后才成立。我们提供了 audit_prompt_leakage.py 作为可重用的CI基础设施。代码和数据：https://github.com/arquicanedo/self-reflective-apis。

English abstract

When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning. On a leak-audited pilot ($N{=}30$ per cell, 3 LLMs, 10 adversarial tasks), structured suggestions lift task-completion rate by $+36.7$--$40.0$pp over plain-English diagnoses on Anthropic models (Fisher's exact $p \le 0.0022$), at $1.8$--$2.2\times$ better per-success token efficiency. The lift is not significant on gpt-4o-mini ($p{=}0.435$); a second-domain replication on a billing API confirms the pattern. The comparison only holds after auditing two undocumented classes of answer leakage in LLM benchmarks. We ship audit_prompt_leakage.py as reusable CI infrastructure. Code and data: https://github.com/arquicanedo/self-reflective-apis.
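
The idea can be sketched as a validation handler that returns suggestions an agent can apply mechanically. Only the `recovery_feedback.suggestions[]` path comes from the abstract; the toy invoice endpoint, the suggestion fields (`field`, `action`, `value`), the status codes, and the retry loop are all hypothetical and may differ from the authors' repository.

```python
ALLOWED_CURRENCIES = {"USD", "EUR"}  # hypothetical validation rule


def call_api(request):
    """Toy endpoint: returns 200, or 422 with structured recovery feedback."""
    suggestions = []
    amount = request.get("amount")
    currency = request.get("currency")

    # A numeric string can be repaired mechanically: suggest the parsed value.
    if isinstance(amount, str) and amount.replace(".", "", 1).isdigit():
        suggestions.append(
            {"field": "amount", "action": "set", "value": float(amount)})

    # A wrongly cased currency code can be repaired by upper-casing it.
    if (isinstance(currency, str) and currency not in ALLOWED_CURRENCIES
            and currency.upper() in ALLOWED_CURRENCIES):
        suggestions.append(
            {"field": "currency", "action": "set", "value": currency.upper()})

    valid = (isinstance(amount, (int, float))
             and currency in ALLOWED_CURRENCIES)
    if suggestions or not valid:
        return {"status": 422, "error": "validation_failed",
                "recovery_feedback": {"suggestions": suggestions}}
    return {"status": 200, "invoice": {"amount": amount, "currency": currency}}


def call_with_recovery(request, max_retries=2):
    """Agent-side loop: apply each 'set' suggestion, then retry."""
    response = call_api(request)
    for _ in range(max_retries):
        if response["status"] == 200:
            break
        suggestions = response["recovery_feedback"]["suggestions"]
        if not suggestions:
            break  # nothing machine-actionable; give up
        request = dict(request)
        for s in suggestions:
            if s["action"] == "set":
                request[s["field"]] = s["value"]
        response = call_api(request)
    return response
```

For example, `call_with_recovery({"amount": "12.50", "currency": "usd"})` fails once, applies both suggestions, and succeeds on the retry without any model reasoning in between.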


2606.05004 2026-06-04 cs.CR cs.AI

SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models

SharedRequest: 面向大型语言模型的隐私保护模型无关推理

Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du, Wei Liu, Yan Pang

Affiliations: National University of Singapore (Chongqing) Research Institute; Chongqing Key Laboratory of Trusted Perception and Interaction Technology for Intelligent and Connected Vehicles; National University of Singapore; Hebrew University of Jerusalem; State Key Laboratory of Intelligent Vehicle Safety Technology, Chongqing, China; CHONGQING CHANGAN AUTOMOBILE Co., Ltd

AI summary: Proposes SharedRequest, a model-agnostic privacy-preserving inference framework that achieves efficient privacy protection through batch-level obfuscation and semantic grouping, with over 20% higher utility than differential privacy baselines and up to 5× lower query cost.

Comments: accepted by ACL 2026 (main)

AI Chinese abstract

随着ChatGPT等公共大型语言模型（LLMs）的广泛部署，保护用户提示隐私已成为一个日益关键的问题。现有的隐私保护推理方法要么牺牲效用，要么牺牲效率，并且通常需要特定于模型的修改，限制了其兼容性。在本文中，我们提出了SharedRequest，一个模型无关的隐私保护LLM推理框架，它将隐私保护重新定义为批量级别而非单个提示级别。关键思想是通过将原始提示与噪声变体混合来混淆敏感信息，同时将语义等效的指令分组，以在大量查询批次中分摊推理成本，对LLM响应质量影响最小。该设计独立于LLM架构，无需访问模型参数或进行架构修改。实验结果表明，与先前的差分隐私基线相比，SharedRequest实现了超过20%的效用提升，并且其共享提示机制相比非批量推理将查询成本降低了5倍。

English abstract

With the widespread deployment of public large language models (LLMs) such as ChatGPT, protecting user prompt privacy has become an increasingly critical issue. Existing privacy-preserving inference methods sacrifice either utility or efficiency, and often require model-specific modifications that limit their compatibility. In this paper, we propose SharedRequest, a model-agnostic framework for privacy-preserving LLM inference that reformulates privacy protection at the batch level rather than the individual-prompt level. The key idea is to obscure sensitive information by mixing original prompts with noisy variants, while grouping semantically equivalent instructions to amortize the inference cost over a large batch of queries with minimal impact on LLM response quality. This design is independent of the LLM architecture, requiring no access to model parameters or architectural modification. Empirical results demonstrate that SharedRequest achieves over $20\%$ higher utility compared to prior differential privacy baselines, and its shared-prompt mechanism reduces query cost by up to $5\times$ compared to non-batched inference.


2606.04989 2026-06-04 cs.HC cs.RO

What Can Eye Gaze Teach Us About Real-World Cycling? Insights From the Oxford RobotCycle Project

眼动能教会我们关于真实世界骑行的什么？来自牛津RobotCycle项目的见解

Benjamin Hardin, Efimia Panagiotaki, Daniele De Martini, Lars Kunze

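Affiliations: University of Oxford; University of the West of England

AI summary: Using wearable eye-tracking glasses, this study analyzes eye gaze patterns across environments (such as bike lanes, car lanes, and shared bus lanes) and events (such as passes and pedestrians), reveals subconscious differences in perceived danger while cycling, and assesses the potential of eye tracking for estimating cyclist stress and cognitive load.

AI Chinese abstract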

尽管对骑行情境的身体危险已有较多了解，但对骑行的感知危险知之甚少。此外，危险感知可能在潜意识层面被过滤，因此难以自我报告。为此，这些潜意识感知可以通过眼动等生理指标揭示。本文探讨了英国牛津骑行的感知安全性，并研究了可穿戴眼动追踪眼镜在不同环境和事件下产生关于感知差异见解的能力。本文发现，在自行车道、汽车道和共享公交车道之间，眼动模式发生变化，代表了每种车道类型的不同认知挑战。本文表明，不同交叉路口的眼动模式显著不同，这可能对骑行者的压力有影响。最后，与无事件骑行相比，在超车和道路行人等事件发生时，眼动模式存在差异。本文总结了使用可穿戴眼动追踪器估计压力和骑行者工作量的优点和局限性。

English abstract

Although much is known about the physical danger of cycling situations, less is understood about the perceived danger of cycling. Furthermore, perception of danger may be filtered at a subconscious level and therefore difficult for one to self-report. To this end, these subconscious perceptions can be revealed through physiological metrics such as eye gaze. This paper explores the perceived safety of cycling in Oxford, United Kingdom and explores the ability of wearable eye tracking glasses to produce insights about the differences in perception under different environments and events. This paper finds that eye gaze patterns change between using bike lanes, car lanes and shared bus lanes, representing different cognitive challenges of each lane type. This paper presents that different intersections have significantly different eye gaze patterns which may have implications for cyclist stress. Finally, eye gaze patterns differ in the presence of events such as passes and pedestrians in the road compared to when cycling with no events. This paper draws conclusions on the benefits and limitations of using wearable eye trackers to estimate stress and cyclist workload.


2606.04967 2026-06-04 cs.SE cs.AI

From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

从提示到流程：支持AI软件开发智能体的框架流程分类与比较评估

Sanderson Oliveira de Macedo

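Affiliations: Federal Institute of Goias

AI summary: Proposes a six-dimension process taxonomy, scores and compares six AI software development frameworks with it, and reveals a structural trade-off between process depth and portability.

AI Chinese abstract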

AI编程工具不再仅仅是自动补全或聊天助手：它们组织为开发框架，包含流程、角色、工件和验证。最近的调查绘制了用于软件工程的智能体和LLM，但缺少一项以将这些能力转化为流程的操作框架为中心的研究。我们对主要来源进行了定向搜索，采用功能性纳入标准和牵引力测量，选择了六个框架：GitHub Spec Kit、OpenSpec、BMAD Method、Get Shit Done (GSD)、Spec Kitty和Reversa。每个框架通过不同路径攻击AI开发：完整和轻量变体的规范驱动开发、智能体驱动的敏捷规划、智能体上的上下文工程、工作树隔离与审查，以及从遗留系统中恢复操作规范。我们的核心贡献是一个六维流程分类法：规范、上下文、角色、执行、验证和可移植性，并附带一个评分标准，使其成为可复制的工具。我们将其应用于六个框架和一个样本外案例Spec-Flow。两个结果突出。在已经采用某种流程的框架中，存在趋同：孤立的提示失去中心地位，持久工件、工作合同、可追溯性和人工审查成为减少歧义和协调智能体的机制。并且没有框架强覆盖所有六个维度，暴露了流程深度与跨智能体可移植性之间的结构性权衡。我们还发现了反复出现的风险：规范与代码之间的漂移、对生成工件的过度信任、社区扩展的脆弱性、平台依赖性以及缺乏完整流程的基准测试。我们以一个研究议程结束，侧重于中间质量指标、上下文治理、安装安全性和可重复性。

English abstract

AI tools for programming are no longer just autocomplete or chat assistants: they organize themselves as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but a study centered on the operational frameworks that turn these capabilities into process is missing. We ran a directed search of primary sources, with a functional inclusion criterion and traction measurement, and selected six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty and Reversa. Each attacks AI development through a different path: spec-driven development in full and lightweight variants, agent-driven agile planning, context engineering over the agent, worktree isolation and review, and recovery of operational specifications from legacy systems. Our central contribution is a six-dimension process taxonomy: specification, context, roles, execution, validation and portability, with a scoring rubric that turns it into a replicable instrument. We apply it to the six frameworks and an out-of-sample case, Spec-Flow. Two results stand out. Among frameworks that already adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become mechanisms that reduce ambiguity and coordinate agents. And no framework strongly covers all six dimensions, exposing a structural trade-off between process depth and portability across agents. We also found recurring risks: drift between specification and code, excessive trust in generated artifacts, fragility of community extensions, platform dependence and a lack of benchmarks for the complete process. We close with a research agenda for empirical evaluation, focused on intermediate-quality metrics, context governance, installation security and reproducibility.


2606.04957 2026-06-04 cs.CR cs.IR cs.LG

NLLog: Lightweight, Explainable SOC Anomaly Detection via Log-to-Language Rewriting

NLLog: 通过日志到语言重写的轻量级、可解释的SOC异常检测

Samuel Ndichu, Tao Ban, Seiichi Ozawa, Takeshi Takahashi, Daisuke Inoue

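Affiliations: University of Tokyo; National Institute of Information and Communications Technology

AI summary: Proposes the NLLog pipeline, which rewrites log templates into natural-language sentences, combines TF-IDF weighting with tree-ensemble classification, and uses TreeSHAP for explainable anomaly detection, achieving low false-positive rates and low latency on the HDFS, BGL, and AIT datasets.

Comments: 15 pages, 11 figures, 12 tables; submitted to ACSAC 2026

AI Chinese abstract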

系统生成的日志是安全监控的基础，但其僵化的基于模板的格式阻碍了自动化分析和人类理解。我们提出NLLog（自然语言日志），一个轻量级流水线，它确定性地将解析后的模板重写为WHO-WHAT-SEVERITY句子，通过词频-逆文档频率加权进行池化，使用树集成对会话进行分类，并通过TreeSHAP反向投影证据供分析师审查。在Hadoop分布式文件系统（HDFS）和Blue Gene/L（BGL）语料库上，NLLog超过了两个复现的匹配协议基线；在HDFS、BGL和AIT警报数据集上，它保持了低误报率，且延迟适用于安全运营中心分类。覆盖度、稀疏与密集、忠实性和对抗性消融实验表明，回退充分性依赖于语料库，部署前的覆盖度检查可以揭示细化需求，并且可审计的确定性重写结合轻量级密集编码为日志异常检测和分类提供了可测量的表示层。

English abstract

System-generated logs underpin security monitoring, yet their rigid template-based format hinders both automated analysis and human comprehension. We present NLLog (Natural-Language Log), a lightweight pipeline that deterministically rewrites parsed templates into WHO-WHAT-SEVERITY sentences, pools them with term-frequency-inverse-document-frequency weighting, classifies sessions with tree ensembles, and back-projects evidence with TreeSHAP for analyst review. On Hadoop Distributed File System (HDFS) and Blue Gene/L (BGL) corpora, NLLog exceeds two reproduced matched-protocol baselines; across HDFS, BGL, and the AIT Alert Data Set, it sustains low false-positive rates with commodity-hardware latency suitable for security operations center triage. Coverage, sparse-versus-dense, faithfulness, and adversarial ablations show that fallback sufficiency is corpus-dependent, that an enrollment-time coverage check can surface refinement requirements before deployment, and that an auditable deterministic rewrite combined with lightweight dense encoding provides a measurable representation layer for log-anomaly detection and triage.


2606.04952 2026-06-04 cs.HC cs.CL

Clinical Assistant for Remote Engagement Link (CARE-link): A Web-Based Electronic Health Records Software for Managing Diabetes

临床远程参与助手（CARE-link）：一种用于管理糖尿病的基于网络的电子健康记录软件

Prince Ebenezer Adjei, Joshua Teye Tettey, Toufiq Musah, Audrey Agbeve, John Amuasi

Affiliations: Global One Health Research Group, Bernhard Nocht Institute of Tropical Medicine; Global Health and Infectious Diseases Research Group, Kumasi Centre for Collaborative Research in Tropical Medicine; Department of Computer Engineering, Kwame Nkrumah University of Science and Technology; Department of Global Health, School of Public Health, Kwame Nkrumah University of Science and Technology

AI summary: CARE-link is an open-source, web-based clinical support platform that connects clinicians and patients through LLM-mediated workflows to improve the management of gestational diabetes; the system aggregates patient-generated data from outside the hospital, provides clinical decision support, and gives patients explanations of their management plans and lifestyle guidance through a WhatsApp interface.

2606.04946 2026-06-04 cs.DS cs.LG stat.ML

A General Framework for Dynamic Consistent Submodular Maximization

动态一致子模最大化的通用框架

Paul Dütting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, Ola Svensson, Morteza Zadimoghaddam

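Affiliations: ETH Zurich; KTH Royal Institute of Technology; University of Toronto

AI summary: For submodular maximization in the fully dynamic setting, proposes a general algorithmic framework that yields the first constant-factor approximations with sublinear consistency.

Comments: Accepted at ICML 2026

AI Chinese abstract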

一致性是动态子模最大化中的一个重要性质，它要求算法始终维持一个接近最优的解，并且在每一步只对解进行少量调整。先前的工作仅在仅插入的情况下探讨了这个问题，其中算法面临 $n$ 个插入的流，并建立了基数约束版本的下界和上界。我们在全动态设置中考虑这个问题，其中操作流可能同时包含插入和删除。我们开发了一个通用框架来设计该设置下的算法，并通过实例化得到了首个具有次线性一致性的常数因子近似。对于基数约束，我们提出了一个 $\frac 12 - O(\varepsilon)$ 近似，其一致性为 $O\left(\frac{1}{\varepsilon^2}\right)$。对于秩-$k$ 拟阵约束，我们构造了一个 $\frac 14 - O(\varepsilon)$ 近似于动态最优解，其一致性为 $O\left(\frac{\log k}{\varepsilon^2}\right)$。

English abstract

Consistency is an important property in dynamic submodular maximization and entails maintaining a near-optimal solution at all times, making only a small number of adjustments to the solution in each step. Prior work has explored this question for the insertion-only case, where the algorithm faces a stream of $n$ insertions, and has established lower and upper bounds for the cardinality-constrained version of the problem. We consider this question in the fully dynamic setting, where the stream of operations may contain both insertions and deletions. We develop a general framework for designing algorithms for this setting, and instantiate it to obtain the first constant-factor approximations with sublinear consistency. For cardinality constraints, we propose a $\frac 12 - O(\varepsilon)$ approximation that is $O\left(\frac{1}{\varepsilon^2}\right)$ consistent. For rank-$k$ matroid constraints, we construct a $\frac 14 - O(\varepsilon)$ approximation to the dynamic optimum that is $O\left(\frac{\log k}{\varepsilon^2}\right)$ consistent.


2606.04909 2026-06-04 cs.IR cs.CL

BEATS: Bootstrapping E-commerce Attribute Taxonomies for Search through Iterative Human-AI Collaboration

BEATS: 通过迭代人机协作引导电商搜索属性分类

Yung-Yu Shih, Shang-Yu Su, Tzu-I Ho, Dongzhe Wang, Yun-Nung Chen

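Affiliations: National Taiwan University; Rakuten Group, Inc.; Taiwan Rakuten Ichiba, Inc.; Rakuten Asia Pte. Ltd.

AI summary: To address the lack of structured attribute schemas on e-commerce platforms in emerging markets, proposes the BEATS framework, which uses a human-in-the-loop LLM pipeline to build product attribute taxonomies from scratch and improves search system performance through attribute tagging.

Comments: 6 pages, 1 figure, 5 tables. Accepted to SIGIR 2026 Industry Track. Official version: https://doi.org/10.1145/3805712.3808520

DOI: 10.1145/3805712.3808520

AI Chinese abstract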

新兴市场的电商平台通常使用欠发达的产品目录，仅包含类别分类而缺乏结构化属性模式。缺乏细粒度产品属性限制了搜索能力——阻碍分面过滤、降低查询理解、削弱搜索系统使用的语义表示。我们提出BEATS，一种人机协作的LLM框架，用于从零开始引导产品属性分类。我们的方法扩展了一个多阶段LLM生成流水线，包含两个关键生产阶段：(1) 模型开发者主动进行质量检查以过滤错误输出，以及(2) 领域专家本地工作人员进行人工标注以验证生成的属性。该框架迭代运行——每个生成阶段的提示基于质量检查观察和标注者在连续轮次中的反馈进行优化，逐步提高属性质量。一旦属性分类建立，我们使用LLM对单个产品项目进行结构化属性标注，丰富其上下文表示。丰富的目录直接有益于搜索系统的多个组件：实现细粒度基于属性的过滤、为排序模型提供结构化特征、改善密集检索的语义表示。我们通过在属性丰富的产品数据上训练密集检索模型来验证生成的分类，证明相对于使用原始目录信息的基线有一致的改进。我们的系统已在台湾乐天部署，丰富了9个主要类别，涵盖2,694个子类别，生成了67,277个属性，超过540万产品已使用生成的属性进行标注，并计划丰富整个产品目录。

English abstract

E-commerce platforms in emerging markets often operate with underdeveloped product catalogs that contain only category taxonomies but lack structured attribute schemas. This absence of fine-grained product attributes limits search capabilities -- preventing faceted filtering, degrading query understanding, and weakening semantic representations used by search systems. We present BEATS, a human-in-the-loop LLM framework for bootstrapping product attribute taxonomies entirely from scratch. Our approach extends a multi-stage LLM generation pipeline with two critical production stages: (1) proactive quality checking by model developers to filter erroneous outputs, and (2) human annotation by domain-expert local staff to validate generated attributes. The framework operates iteratively -- prompts at each generation stage are refined based on quality check observations and annotator feedback across successive rounds, progressively improving attribute quality. Once the attribute taxonomy is established, we employ LLMs to perform structured attribute tagging on individual product items, enriching their contextual representations. The enriched catalog directly benefits multiple components of the search system: enabling granular attribute-based filtering, providing structured features for ranking models, and improving semantic representations for dense retrieval. We validate the generated taxonomy by training dense retrieval models on attribute-enriched product data, demonstrating consistent improvements over baselines using original catalog information. Our system has been deployed at Rakuten Taiwan, enriching 9 major categories spanning 2,694 sub-categories with 67,277 generated attributes, and over 5.4 million products have been tagged with the generated attributes, with plans to enrich the entire product catalog.


2606.04903 2026-06-04 cs.LO cs.AI cs.MA cs.PL

Provably Auditable and Safe LLM Agents from Human-Authored Ontologies

基于人类编写本体的可审计且安全的LLM智能体

Aaron Sterling

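Affiliations: Thistleseeds

AI summary: Proposes the Agentic Redux architecture, proves via a typed λ-calculus that its execution semantics are correct and its decisions auditable over appropriate domains, and introduces an ontology-first approach to agent design.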

2606.04877 2026-06-04 cs.LO cs.AI cs.PL cs.SE

Abduction Prover in Isabelle/HOL

Isabelle/HOL中的溯因证明器

Yutaka Nagashima, Daniel Sebastian Goc

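Affiliations: Institute of Computer Science, the Czech Academy of Sciences

AI summary: To address the limited automation of proof assistants based on expressive logics, proposes an abduction prover for Isabelle/HOL that uses abductive reasoning to identify useful conjectures and automatically construct proof scripts.

Comments: Accepted to Isabelle2026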

2606.04845 2026-06-04 stat.ML cs.LG math.ST stat.CO stat.TH

Bayesian learning for the stochastic shortest path problem

随机最短路径问题的贝叶斯学习

Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo

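Affiliations: Department of Engineering, University of Cambridge, UK; School of Mathematics and Physics, University of Wollongong, Wollongong, Australia

AI summary: For the stochastic shortest path problem, proposes a Bayesian framework that constructs the posterior over the optimal action-value function $Q^*$ directly from Bellman's optimality equations, and addresses the unidentifiability issues introduced by relaxing the likelihood, enabling uncertainty quantification and data-efficient learning.

Comments: 50 pages, 19 figures

AI Chinese abstract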

序列决策问题通常被建模为马尔可夫决策过程（MDP）。我们关注随机最短路径（SSP）问题，这是一个具有吸收终止状态的无限水平无折扣MDP。我们开发了一个贝叶斯框架，通过与决策任务的交互来学习最优决策策略。具体来说，我们学习最优动作价值函数$Q^*$，但与许多现有的贝叶斯方法不同，我们不依赖于不现实的建模假设和临时近似。我们的方法是通过贝尔曼最优方程直接构建$Q^*$的后验信念。对于确定性奖励，我们将后验描述为具有流形密度的分布。为了简化推理，我们放松了似然，使得勒贝格密度存在。但这样做的代价是产生不可识别性问题。具体来说，放松后的后验可能在不当决策规则上有显著质量，而精确后验则不会。我们还计算了$Q^*$的表格参数化、高斯似然放松和高斯先验下最优动作选择的精确后验概率，这在基准测试研究中很有用。对深海基准测试变体的数值研究验证了我们的发现。我们证明了我们的框架能够忠实地量化不确定性，并且与其他基于时间差分的贝叶斯方法相比，数据效率更高。最后，我们对未来工作提出了建议。

English abstract

Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. We develop a Bayesian framework to learn the optimal decision strategy through interactions with the decision-making task. Specifically, we learn the optimal action-value function $Q^*$, but unlike many existing Bayesian approaches, we do not rely on unrealistic modelling assumptions and ad-hoc approximations. Our approach is to directly construct the posterior beliefs for $Q^*$ through Bellman's optimality equations. For deterministic rewards, we characterise the posterior as a distribution with a manifold density. To facilitate simpler inference, we relax the likelihood so that a Lebesgue density exists. The flip side is to create unidentifiability issues. Specifically, the relaxed posterior can have significant mass on improper decision rules, while the exact posterior will not. We also calculate the exact posterior probabilities for optimal action selections for the tabular parametrisation of $Q^*$, a Gaussian likelihood relaxation and a Gaussian prior, which is useful in benchmarking studies. Numerical studies on variants of the Deep Sea benchmark verify our findings. We demonstrate that our framework faithfully quantifies uncertainty and, compared to other temporal-difference-based Bayesian methodologies, is more data efficient. We conclude with recommendations for future work.
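
For readers unfamiliar with the setting, the sketch below computes $Q^*$ for a tiny SSP by iterating Bellman's optimality equation, $Q^*(s,a) = r(s,a) + \sum_{s'} p(s' \mid s,a) \max_{a'} Q^*(s',a')$, with $Q^*$ fixed to zero at the terminal state. This is classical value iteration with a known model, shown only to illustrate the equations the paper builds on; it is not the Bayesian posterior construction, and the three-state chain, its rewards, and its transition probabilities are invented for illustration.

```python
def q_value_iteration(P, R, terminal, iters=200):
    """Undiscounted Q-value iteration for an SSP with absorbing terminal states.

    P[a][s][t] is the probability of moving from s to t under action a,
    R[s][a] is the deterministic reward, `terminal` is a set of states.
    """
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Greedy state values; terminal states are worth zero by definition.
        V = [0.0 if s in terminal else max(Q[s]) for s in range(n_states)]
        Q = [[0.0 if s in terminal else
              R[s][a] + sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q


# Toy chain 0 -> 1 -> 2 (terminal).
# Action 0: reward -1, advances with probability 0.5, otherwise stays.
# Action 1: reward -3, always advances.
P = [
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
R = [[-1.0, -3.0], [-1.0, -3.0], [0.0, 0.0]]
Q_star = q_value_iteration(P, R, terminal={2})
```

Solving the two fixed-point equations by hand gives $Q^*(1,\cdot) = (-2, -3)$ and $Q^*(0,\cdot) = (-4, -5)$, so the cheaper, slower action is optimal in both non-terminal states of this toy chain.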


2606.04769 2026-06-04 cs.CR cs.AI cs.SE

Description-Code Inconsistency in Real-world MCP Servers: Measurement, Detection, and Security Implications

现实世界 MCP 服务器中的描述-代码不一致性：测量、检测与安全影响

Yutao Shi, Xiaohan Zhang, Xiangjing Zhang, Xihua Shen, Hui Ouyang, Huming Qiu, Mi Zhang, Min Yang

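Affiliations: University of Science and Technology of China

AI summary: To address inconsistencies between tool descriptions and code implementations in MCP servers, proposes DCIChecker, an automated detection framework that combines structure-aware static analysis with the Direct-Reverse-Arbitration prompting method, and reports a 9.93% inconsistency rate and its security risks on a large-scale dataset.

Comments: Preprint

AI Chinese abstract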

模型上下文协议 (MCP) 已成为赋能大型语言模型 (LLM) 使用外部工具的关键标准。在此生态系统中，LLM 依赖 MCP 服务器提供的自然语言描述来选择和执行函数。这种交互隐含假设工具描述忠实地反映了其底层实现，而该假设在实践中并未得到强制验证。因此，MCP 部署可能遭受名为描述-代码不一致性 (DCI) 的问题，即工具对其能力和安全边界的描述与代码实际行为不一致。本文对现实世界 MCP 服务器中的 DCI 进行了全面研究。我们正式定义了该问题，并提出了一个涵盖功能不一致和未声明副作用的综合分类法。在此分类法指导下，我们开发了 DCIChecker，一个自动框架，结合结构感知静态分析与 Direct-Reverse-Arbitration 提示方法，交叉验证工具描述与实际代码实现。我们将该框架应用于一个大规模数据集，包含从 2,214 个现实世界 MCP 服务器中提取的 19,200 个描述-代码对。我们的测量揭示 DCI 普遍存在，其中 9.93% 的对存在不一致。我们进一步证明 DCI 造成了关键的防御盲点，助长了从操作故障到隐蔽恶意行为等多种风险。最后，我们提出了缓解策略以强制语义一致性并增强新兴智能体生态系统的可靠性。

English abstract

The Model Context Protocol (MCP) has emerged as a critical standard empowering Large Language Models (LLMs) to utilize external tools. In this ecosystem, LLMs rely on natural language descriptions provided by MCP servers to select and execute functions. This interaction implicitly assumes that tool descriptions faithfully reflect their underlying implementations, while this assumption is not mandatorily verified in practice. As a result, MCP deployments may suffer from a problem named Description-Code Inconsistency (DCI), where a tool's description of its capabilities and security boundaries is not consistent with what the code actually does. In this paper, we present a comprehensive study of DCI in real-world MCP servers. We formally define the problem and propose a comprehensive taxonomy spanning functionality inconsistencies and undeclared side effects. Guided by this taxonomy, we develop DCIChecker, an automated framework that combines structure-aware static analysis with the Direct-Reverse-Arbitration prompting method to cross-validate tool descriptions against actual code implementations. We apply this framework to a large-scale dataset comprising 19,200 description-code pairs extracted from 2,214 real-world MCP servers. Our measurement reveals that DCI is widespread, with 9.93% of these pairs exhibiting inconsistencies. We further demonstrate that DCI creates a critical defense blind spot, facilitating varied risks from operational failures to stealthy malicious behaviors. Finally, we propose mitigation strategies to enforce semantic consistency and enhance the reliability of the emerging agentic ecosystem.


2606.04757 2026-06-04 math.OC cs.LG

Near-Optimal Decentralized Stochastic Convex Optimization over Networks

网络上的近最优去中心化随机凸优化

Nitai Kluger, Amit Attia, Tomer Koren

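Affiliations: Blavatnik School of Computer Science, Tel Aviv University; Google Research Tel Aviv

AI summary: For decentralized stochastic smooth convex optimization, proposes an accelerated decentralized method that raises the number of workers that can be supported under a total budget of $N$ gradient samples to $M \lesssim \sqrt{\rho}\,N^{3/4}$, and proves that it is optimal up to logarithmic factors.

Comments: 12 papers

AI Chinese abstract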

我们研究去中心化随机光滑凸优化，其中$M$个工作者使用局部随机梯度并通过固定八卦网络上的仅邻居通信来最小化平均目标。该设置中的一个核心问题是，在总梯度样本预算为$N$的情况下，确定可以使用的最大工作者数量，同时仍保持集中式$O(1/\sqrt N)$统计速率。我们引入了一种加速去中心化方法，该方法在最多$\smash{M\lesssim \sqrt\rho\,N^{3/4}}$个工作者时保持该速率，其中$\rho$是八卦网络的谱间隙，改进了先前最佳的最大缩放$\smash{M\lesssim \rho\sqrt N}$。该方法基于一步延迟随机加速方案，使工作者能够将小批量与加速八卦交错进行，同时控制残差分歧，其保证仅对数依赖于最优-局部异质性。我们还为线性跨度去中心化一阶方法建立了匹配的下界，表明该方法在对数因子内是最优的。

英文摘要

We study decentralized stochastic smooth convex optimization, where $M$ workers minimize an average objective using local stochastic gradients and neighbor-only communication over a fixed gossip network. A central question in this setting is to determine the largest number of workers that can be used under a total budget of $N$ gradient samples while still preserving the centralized $O(1/\sqrt N)$ statistical rate. We introduce an accelerated decentralized method that preserves this rate for up to $\smash{M\lesssim \sqrt{\rho}\,N^{3/4}}$ workers, where $\rho$ is the spectral gap of the gossip network, improving the best prior maximal scaling of $\smash{M\lesssim \rho\sqrt N}$. The method is based on a one-step-delayed stochastic acceleration scheme that enables workers to interleave minibatching with accelerated gossip while controlling residual disagreement, and its guarantee depends only logarithmically on the optimum-local heterogeneity. We also establish a matching lower bound for linear-span decentralized first-order methods, showing that the method is optimal up to logarithmic factors.
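
To get a feel for the two scalings quoted in the abstract, the following minimal sketch evaluates both bounds for a given sample budget and spectral gap. It drops all constants and logarithmic factors, so the numbers are orders of magnitude only; the function names and the example values of `n_samples` and `spectral_gap` are illustrative assumptions, not taken from the paper.

```python
import math


def max_workers_prior(n_samples: int, spectral_gap: float) -> float:
    """Prior scaling M ~ rho * sqrt(N), with constants and log factors dropped."""
    return spectral_gap * math.sqrt(n_samples)


def max_workers_new(n_samples: int, spectral_gap: float) -> float:
    """Scaling stated in the abstract, M ~ sqrt(rho) * N^(3/4), constants dropped."""
    return math.sqrt(spectral_gap) * n_samples ** 0.75


if __name__ == "__main__":
    # Hypothetical example values: N = 10^8 gradient samples, rho = 0.01.
    n, rho = 10**8, 0.01
    print("prior bound:", max_workers_prior(n, rho))
    print("new bound:  ", max_workers_new(n, rho))
```

The ratio of the two bounds is $N^{1/4}/\sqrt{\rho}$, so the improvement grows both with the sample budget and with how poorly connected the network is.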

2606.04755 2026-06-04 hep-ex cs.AI cs.IR

Archi: Agentic Operations at the CMS Experiment

Archi: CMS实验中的代理操作

Pietro Lugato, Luca Lavezzo, Jason Mohoney, Hasan Ozturk, Muhammad Hassan Ahmed, Juan Pablo Salas, Viphava Ohm, Krittin Phornsiricharoenphant, Gabriele Benelli, Mariarosaria D'Alfonso, Manasvita Joshi, Warren Nam, Aron Soha, Samantha Sunnarborg, Austin Swinney, Jack Tucker, Dmytro Kovalskyi, Tim Kraska, Christoph Paus

Affiliations: Massachusetts Institute of Technology; CMS Collaboration; CERN; University of Wisconsin-Madison; Fermi National Accelerator Laboratory; Brown University; Harvard University

AI summary: Presents Archi, an open-source framework that ingests heterogeneous data sources and deploys configurable, private agents to support computing operations at the CMS experiment; it performs effectively on real operator queries.

AI中文摘要

我们提出Archi，一个面向科学合作的开源端到端框架，它结合了异构数据源的系统化摄取和组织，以及可配置、私有且可扩展的代理的部署，这些代理能够检索和推理这些数据。自2026年2月起，Archi的一个实例已部署在CERN大型强子对撞机的CMS实验计算操作团队中，作为技术操作员的辅助代理，通过结合文档、历史数据和实时监控系统提供检索和分析能力。我们根据操作员反馈和从生产使用中收集的问题集对系统进行评估，这些问题由人工和自动化专家组评分。该系统在操作任务中证明有效，解决了CMS操作员提出的真实世界查询。我们还观察到，本地托管的开源权重模型表现具有竞争力，从而能够对敏感数据进行完全私有管理。

英文摘要

We present Archi, an open-source, end-to-end framework for scientific collaborations that combines the systematic ingestion and organization of heterogeneous data sources with the deployment of configurable, private, and extensible agents that retrieve and reason over them. An instance of Archi has been deployed for the Computing Operations team of the CMS experiment at CERN's LHC since February 2026 as a support agent for technical operators, offering retrieval and analysis capabilities by combining documentation, historical data, and live monitoring systems. We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels. The system proves effective at operational tasks, resolving real-world queries posed by CMS operators. We also observe that locally-hosted, open-weight models perform competitively, enabling fully private management of sensitive data.

2606.04739 2026-06-04 cs.SE cs.AI

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

重新审视Vul-RAG：基于RAG的漏洞检测的可复现性与可复制性——使用开放权重模型

Sabrina Kaniewski, Fabian Schmidt, Tobias Heer

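Affiliations: Institute for Secure Networked Systems, Esslingen University; Institute for Intelligent Systems, Esslingen University

AI summary: Using local deployment and a range of open-weight models, this study reproduces and extends the Vul-RAG framework, finding that performance plateaus at roughly 0.30 pairwise accuracy and that gains in model capability do not substantially improve it.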

Comments Accepted at AI&CCPS 2026 workshop, co-located with the 21st International Conference on Availability, Reliability and Security (ARES 2026). This is the authors' preprint version

AI中文摘要

大型语言模型（LLMs）在自动化软件漏洞检测方面展现出强大潜力，尤其是在检索增强生成（RAG）设置中。然而，对于依赖专有模型和API的方法，可复现性和可复制性在很大程度上仍未得到探索，这引发了一个问题：报告的结果是否具有普遍性，还是主要依赖于特定的模型选择。在这项工作中，我们对Vul-RAG进行了可复现性研究，Vul-RAG是一个基于RAG的源代码漏洞检测框架，它利用高级漏洞知识增强LLMs。我们首先使用报告中的开放权重基线模型，在完全本地和开放权重的设置下复现了结果。然后，我们将评估扩展到一组多样化的最新开放权重LLMs，包括代码专用、通用和推理模型，参数规模各异。结果证实，Vul-RAG的发现可以在本地部署下复现，但存在微小偏差。在所有评估的模型中，我们观察到性能在约0.30成对准确率（即漏洞函数和修补函数都被正确分类的代码对）处达到平台期。值得注意的是，即使对于更新更先进的模型，这一平台期仍然存在，表明仅凭模型能力的提升并不能显著提高性能。最后，我们讨论了检测效果、模型能力和模型规模之间的实际影响和权衡。实现和评估工件可在 https://github.com/hs-esslingen-it-security/revisiting-Vul-RAG 公开获取。

英文摘要

Large language models (LLMs) have shown strong potential for automated software vulnerability detection, particularly in retrieval-augmented generation (RAG) settings. However, for approaches relying on proprietary models and APIs, reproducibility and replicability remain largely unexplored, raising the question of whether reported results generalize or depend primarily on specific model choices. In this work, we present a reproducibility study of Vul-RAG, a RAG-based framework for source code vulnerability detection that enhances LLMs with high-level vulnerability knowledge. We first replicate the results in a fully local and open-weights setting using the reported open-weight baseline models. We then extend the evaluation to a diverse set of recent open-weight LLMs, including code-specialized, general-purpose, and reasoning models of varying parameter sizes. The results confirm that the findings of Vul-RAG are reproducible under local deployment, but with minor deviations. Across all evaluated models, we observe a performance plateau at approximately 0.30 pairwise accuracy (code pairs for which both the vulnerable and the patched function are correctly classified). Notably, this plateau persists even for more recent and advanced models, indicating that improvements in model capacity alone do not substantially enhance performance. Finally, we discuss practical implications and trade-offs between detection effectiveness, model capabilities, and model scale. Implementation and evaluation artifacts are publicly available at https://github.com/hs-esslingen-it-security/revisiting-Vul-RAG.
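
The pairwise accuracy metric described in the abstract (a pair counts only when both the vulnerable and the patched function are classified correctly) can be sketched as follows. The input encoding, a pair of booleans meaning "flagged as vulnerable", is an assumption made for illustration and is not taken from the Vul-RAG artifacts.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of code pairs where both functions are classified correctly.

    Each pair is (flagged_vulnerable_version, flagged_patched_version), where
    True means the detector reported a vulnerability. A pair is correct only
    if the vulnerable version is flagged and the patched version is not.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for vuln_flag, patched_flag in pairs if vuln_flag and not patched_flag)
    return correct / len(pairs)


if __name__ == "__main__":
    # Four hypothetical pairs; only the first is fully correct.
    predictions = [(True, False), (True, True), (False, False), (False, True)]
    print(pairwise_accuracy(predictions))
```

A detector that flags everything, or nothing, scores zero under this metric, which is why it is stricter than per-function accuracy.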

2606.04689 2026-06-04 quant-ph cs.LG

QPredSGG: Hybrid Quantum Predicate Learning for Long-Tailed Scene Graph Generation

QPredSGG：面向长尾场景图生成的混合量子谓词学习

Prerana Ramkumar, Nouhaila Innan, Muhammad Shafique

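Affiliations: Department of Computer Science, University of Waterloo; Machine Learning Research Group, University of Waterloo

AI summary: To counter the classification bias caused by long-tailed predicate distributions in scene graph generation, the paper replaces the classical predicate head with a Quantum Predicate Head (QP-Head) that compresses features via amplitude embedding and strongly entangling layers, achieving parameter-efficient long-tail relation classification on Visual Genome 150.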

Comments 11 pages, 5 figures

AI中文摘要

场景图生成（SGG）需要对物体及其交互进行关系推理，但性能常受严重的长尾谓词不平衡限制。经典SGG模型通常依赖数据集统计，导致预测偏向频繁关系而非细粒度语义谓词。尽管现有去偏策略提高了平均召回率，但当前框架中的谓词分类仍常依赖参数成本高的大型经典决策模块。本文通过用加权交叉熵训练的量子谓词头（QP-Head）替换因果特征增强网络（CFEN）中的经典谓词头，引入了一种用于SGG的混合量子谓词分类器。据我们所知，这是首批评估混合量子架构在Visual Genome 150上进行场景图谓词分类的研究之一。我们研究了量子比特数、编码策略、纠缠结构和电路深度对关系预测的影响。最佳4量子比特QP-Head使用振幅嵌入和强纠缠层将4096维对特征压缩为16维量子兼容表示，对应256倍缩减。它实现了57.25%的mR@100，而经典CFEN参考为41.1%，同时仅使用96个可训练量子参数。扩展到8量子比特保持了强大的长尾性能，达到55.38%的mR@100，使用384个量子参数，而深度分析显示了表达能力和运行时间开销之间的权衡。这些结果表明，紧凑的混合量子谓词头可以支持复杂视觉推理任务中参数高效的长尾关系分类。

英文摘要

Scene Graph Generation (SGG) requires relational reasoning over objects and their interactions, but performance is often limited by severe long-tail predicate imbalance. Classical SGG models frequently rely on dataset statistics, leading to biased predictions toward frequent relations rather than fine-grained semantic predicates. Although existing debiasing strategies improve mean recall, predicate classification in current frameworks still often depends on large classical decision modules with high parameter cost. This work introduces a hybrid quantum predicate classifier for SGG by replacing the classical predicate head in Causal Feature Enhancement Network (CFEN) with a Quantum Predicate Head (QP-Head) trained using weighted cross-entropy. To the best of our knowledge, this is among the first studies to evaluate a hybrid quantum architecture for scene graph predicate classification on Visual Genome 150. We study the effect of qubit count, encoding strategy, entangling structure, and circuit depth on relational prediction. The best 4-qubit QP-Head uses Amplitude Embedding and Strongly Entangling Layers to compress 4096-dimensional pair features into a 16-dimensional quantum-compatible representation, corresponding to a 256$\times$ reduction. It achieves an mR@100 of 57.25%, compared with 41.1% for the classical CFEN reference, while using only 96 trainable quantum parameters. Scaling to 8 qubits maintains strong long-tail performance, reaching an mR@100 of 55.38% with 384 quantum parameters, while the depth analysis shows a trade-off between expressibility and runtime overhead. These results suggest that compact hybrid quantum predicate heads can support parameter-efficient long-tail relational classification in complex visual reasoning tasks.
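
The dimensions quoted in the abstract can be checked with simple arithmetic: amplitude embedding on $n$ qubits encodes $2^n$ values, so 4 qubits hold 16 amplitudes and 4096/16 gives the 256-fold reduction. The sketch below also computes the layer count that the reported parameter totals would imply under one assumption that the abstract does not state: that all trainable quantum parameters sit in strongly entangling layers with three rotation angles per qubit per layer (the convention of PennyLane's `StronglyEntanglingLayers` template). The helper names are hypothetical.

```python
def amplitude_dim(n_qubits: int) -> int:
    """Number of amplitudes, and hence input features, on n_qubits qubits."""
    return 2 ** n_qubits


def compression_ratio(input_dim: int, n_qubits: int) -> float:
    """Reduction factor when input_dim features are mapped to 2^n amplitudes."""
    return input_dim / amplitude_dim(n_qubits)


def implied_layers(n_params: int, n_qubits: int, angles_per_qubit: int = 3) -> int:
    """Layer count implied by a parameter total, ASSUMING every layer has
    angles_per_qubit trainable angles on each qubit and nothing else."""
    layers, remainder = divmod(n_params, n_qubits * angles_per_qubit)
    if remainder:
        raise ValueError("parameter count is not a whole number of layers")
    return layers


if __name__ == "__main__":
    print(amplitude_dim(4), compression_ratio(4096, 4))
    print(implied_layers(96, 4), implied_layers(384, 8))
```

Under that assumption, 96 parameters on 4 qubits and 384 parameters on 8 qubits would correspond to 8 and 16 layers respectively; the actual circuit depths should be taken from the paper itself.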

2606.04680 2026-06-04 eess.AS cs.CL cs.SD

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

读你所听：基于声学差异的无参考假设评估

Zhihan Li, Hankun Wang, Yiwei Guo, Bohan Li, Xie Chen, Kai Yu

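Affiliations: X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, Jiangsu Key Lab of Language Computing, China

AI summary: Proposes the READ metric, which uses a pretrained autoregressive TTS model to measure the acoustic discrepancy between speech and a text hypothesis, so that ASR hypotheses can be evaluated without reference transcripts; under noisy conditions it yields relative error-rate reductions of up to 20%.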

Comments Submitted to Interspeech 2026. 6 pages, 4 figures

2606.04670 2026-06-04 math.NA cs.LG cs.MS cs.NA

Fitting scattered data with optional monotonicity constraints on GPU: LipFit package

在GPU上拟合带有可选单调性约束的散乱数据：LipFit包

Gleb Beliakov

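Affiliations: School of Information Technology, Deakin University

AI summary: Proposes a method for interpolating and approximating multivariate scattered data that yields an optimal Lipschitz-continuous approximation subject to optional monotonicity constraints, implemented in LipFit, a GPU-parallel Python package.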

2606.04658 2026-06-04 cs.NE cs.LG

U-Net-Accelerated Quality-Diversity Optimization for Climate-Adaptive Urban Layouts

U-Net加速的质量-多样性优化用于气候适应性城市布局

Alexander Hagg, Tania Guerrero, Dirk Reith

Affiliations: Institute of Technology, Resource and Energy-efficient Engineering (TREE); Bonn-Rhein-Sieg University of Applied Sciences; Fraunhofer Institute for Algorithms and Scientific Computing (SCAI)

AI summary: Replaces a slow physics simulator with a U-Net surrogate inside an offline MAP-Elites loop, quickly generating thousands of diverse, climate-evaluated building layouts.

AI中文摘要

优化城市布局以适应气候需要在建筑密度与冷空气通风之间取得平衡。由于基于物理的气候模拟计算成本高昂，规划者通常只能评估少于十个手动设计方案。质量-多样性（QD）算法提供了一种系统性地照亮设计空间的方法，但需要代理模型才能实用。在本文中，我们用一个空间深度学习代理（U-Net）替换了缓慢的监管物理模拟器，并将其嵌入离线MAP-Elites循环中。我们系统地比较了这种空间方法与传统的高斯过程（GP）代理在不同训练数据策略（准随机Sobol采样 vs. 主动QD自举）下的表现。结果表明，标量GP代理在随机样本上训练时灾难性地失败，需要昂贵的、主动生成的QD存档才能泛化。相比之下，U-Net的空间归纳偏置使其能够稳健地学习底层物理映射（R² = 0.996），完全独立于训练数据来源。这使得离线QD优化仅需一次性随机训练样本批次即可实现高度准确的适应度排名（ρ = 0.994）。最终流程部署在开源OpenSKIZZE工具中，能在十分钟内生成数千个多样化且经气候评估的建筑布局。

英文摘要

Optimizing urban layouts for climate adaptation requires balancing building density with cold-air ventilation. Because physics-based climate simulations are computationally expensive, planners typically evaluate fewer than ten manual designs. Quality-Diversity (QD) algorithms offer a way to systematically illuminate the design space, but they require surrogate models to be practical. In this paper, we replace a slow, regulatory physics simulator with a spatial deep-learning surrogate (U-Net) inside an offline MAP-Elites loop. We systematically compare this spatial approach with a traditional Gaussian process (GP) surrogate across different training-data strategies (quasi-random Sobol sampling vs. active QD bootstrapping). Our results reveal that scalar GP surrogates fail catastrophically when trained on random samples, requiring expensive, actively generated QD archives to generalize. In contrast, the spatial inductive bias of the U-Net allows it to learn the underlying physics mapping robustly ($R^2 = 0.996$), completely independent of the training data source. This allows offline QD optimization to achieve highly accurate fitness rankings ($\rho = 0.994$) using only a one-time batch of random training samples. The resulting pipeline, deployed in the open-source OpenSKIZZE tool, generates thousands of diverse, climate-evaluated building layouts in under ten minutes.
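
The offline loop described above, in which a cheap surrogate replaces the simulator inside MAP-Elites, might look roughly like the following sketch. Everything here is a toy stand-in: `toy_surrogate` is a hypothetical placeholder for the trained U-Net, the genome is a short list of floats rather than a building layout, and the grid size, mutation scale, and 90% mutation probability are arbitrary choices, not values from the paper.

```python
import random
from typing import Callable, Dict, List, Tuple

Genome = List[float]
Cell = Tuple[int, ...]


def toy_surrogate(genome: Genome) -> Tuple[float, Tuple[float, float]]:
    """Hypothetical stand-in for the U-Net: returns (fitness, 2-D descriptor)."""
    fitness = -sum((x - 0.5) ** 2 for x in genome)
    descriptor = (genome[0], genome[1])  # both values lie in [0, 1]
    return fitness, descriptor


def map_elites(
    surrogate: Callable[[Genome], Tuple[float, Tuple[float, float]]],
    dim: int = 4,
    bins: int = 5,
    iterations: int = 2000,
    seed: int = 0,
) -> Dict[Cell, Tuple[float, Genome]]:
    """Minimal MAP-Elites: keep the best genome seen in each descriptor cell."""
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Genome]] = {}
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite, clipping genes to [0, 1].
            _, parent = rng.choice(list(archive.values()))
            child = [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in parent]
        else:
            child = [rng.random() for _ in range(dim)]
        fitness, descriptor = surrogate(child)  # surrogate call, no simulator
        cell = tuple(min(bins - 1, int(d * bins)) for d in descriptor)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, child)
    return archive


if __name__ == "__main__":
    elites = map_elites(toy_surrogate)
    print(len(elites), "cells filled out of", 5 * 5)
```

Because every evaluation is a surrogate call, the loop can afford thousands of iterations; the quality of the resulting archive then depends entirely on how well the surrogate ranks candidates, which is the property the abstract reports for the U-Net.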

2606.04603 2026-06-04 cs.IR cs.LG stat.ML

Distributional Approximate Nearest Neighbour Search for Uncertainty-Aware Retrieval

面向不确定性感知检索的分布近似最近邻搜索

Olivier Jeunen

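Affiliations: Antwerp, Belgium

AI summary: Proposes the DINOSAUR framework, which samples multiple embeddings per item to build the index and samples the user embedding at query time, implicitly marginalising over embedding uncertainty and improving coverage of long-tail items without changes to model architecture or index infrastructure.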

AI中文摘要

近似最近邻搜索索引构成了现实世界推荐系统的骨干，支持在百万级物品目录上进行实时候选检索。通常，为每个用户和每个物品学习一个点估计嵌入。在服务时，用户嵌入查询索引以获取相关物品。由于这些表示是从稀疏交互数据中学习的，它们带有噪声，可能无法捕捉所有有助于“相关性”的细微差别——忽略了其固有的基本不确定性。结果是检索管道系统性地偏向于少数嵌入估计良好的热门头部物品，而牺牲了长尾中多数小众、多样和偶然的内容。我们提出了DINOSAUR（面向不确定性感知检索的分布近似最近邻搜索）：一个简单且与基础设施兼容的框架，将嵌入不确定性纳入候选生成。DINOSAUR不为点估计建立索引，而是为每个物品采样$S_i$个嵌入，并在这一增强集上构建索引。类似地，在查询时，对用户嵌入进行采样。这种双边的随机检索过程隐式地边缘化了嵌入不确定性，无需改变模型架构或ANN索引基础设施。在分析方面，我们展示了当不确定性消失时，DINOSAUR恢复标准的点估计检索，并刻画了增加的嵌入方差如何扩展不确定物品可检索的潜在空间区域。可重复的实证观察与这些预期一致，显示出在离线召回率小幅损失的情况下，覆盖率大幅提升。

英文摘要

Approximate Nearest Neighbour search indices form the backbone of real-world recommender systems, enabling real-time candidate retrieval over million-item catalogues. Typically, a single point estimate embedding is learnt for every user and every item. At serving time, the user embedding queries the index for relevant items. Since these representations are learnt from sparse interaction data, they are noisy and might fail to capture all the nuances that contribute to ``relevance'' -- ignoring the fundamental uncertainty that is inherent to them. The result is a retrieval pipeline that is systematically biased toward the small minority of popular head items with well-estimated embeddings, at the expense of the long-tail majority of niche, diverse, and serendipitous content. We propose DINOSAUR (Distributional Approximate Nearest Neighbour Search for Uncertainty-Aware Retrieval): a simple and infrastructure-compatible framework to incorporate embedding uncertainty into candidate generation. Rather than indexing point estimates, DINOSAUR samples $S_i$ embeddings per item and constructs an index on this augmented set. Analogously, at query time, a user embedding is sampled. This two-sided stochastic retrieval process implicitly marginalises over embedding uncertainty, without requiring changes to model architecture or ANN index infrastructure. On the analytical side, we show that DINOSAUR recovers standard point-estimate retrieval as uncertainty vanishes, and we characterise how increased embedding variance expands the regions of latent space in which uncertain items are retrievable. Reproducible empirical observations align with these expectations, showing large coverage gains with small losses in offline recall.
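
The two-sided sampling idea can be sketched in a few lines. This is a toy illustration under several assumptions that the abstract does not specify: embeddings are drawn from isotropic Gaussians around a mean, similarity is the inner product, and a brute-force scan stands in for the ANN index. All function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sample_embedding(mean: Sequence[float], std: float, rng: random.Random) -> Vector:
    """Draw one embedding from an isotropic Gaussian (an assumed noise model)."""
    return [m + rng.gauss(0.0, std) for m in mean]


def build_index(
    item_means: Dict[str, Vector],
    item_stds: Dict[str, float],
    samples_per_item: int,
    rng: random.Random,
) -> List[Tuple[str, Vector]]:
    """Index several sampled embeddings per item instead of one point estimate."""
    index: List[Tuple[str, Vector]] = []
    for item_id, mean in item_means.items():
        for _ in range(samples_per_item):
            index.append((item_id, sample_embedding(mean, item_stds[item_id], rng)))
    return index


def retrieve(
    index: List[Tuple[str, Vector]],
    user_mean: Sequence[float],
    user_std: float,
    k: int,
    rng: random.Random,
) -> List[str]:
    """Sample a user embedding, then return the top-k distinct items by inner product."""
    query = sample_embedding(user_mean, user_std, rng)
    ranked = sorted(index, key=lambda entry: -sum(q * x for q, x in zip(query, entry[1])))
    results: List[str] = []
    for item_id, _ in ranked:
        if item_id not in results:
            results.append(item_id)
        if len(results) == k:
            break
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    means = {"head": [1.0, 0.0], "tail": [0.8, 0.3]}
    stds = {"head": 0.01, "tail": 0.5}  # the tail item is far less certain
    index = build_index(means, stds, samples_per_item=8, rng=rng)
    print(retrieve(index, user_mean=[1.0, 0.2], user_std=0.1, k=1, rng=rng))
```

With all standard deviations set to zero, every sample equals its mean and the procedure reduces to ordinary point-estimate retrieval, matching the limiting behaviour the abstract describes.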

2606.04594 2026-06-04 cs.DC cs.AI cs.SE

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

Ekka: LLM推理中静默错误的自动诊断

Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci

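Affiliations: University of Science and Technology of China; Tsinghua University

AI summary: Proposes Ekka, a system that automatically diagnoses silent errors in LLM inference frameworks by aligning and comparing intermediate execution states through differential debugging, reaching 80% pass@1 and 88% pass@5 diagnosis accuracy on a benchmark of real-world errors.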

Comments ICML 2026

AI中文摘要

LLM服务框架随着复杂的软件栈和大量优化而快速发展。快速开发过程可能引入静默错误，即输出质量在没有任何显式错误信号的情况下悄然下降。由于高层症状与底层根本原因之间存在巨大的语义鸿沟，诊断静默错误非常困难。我们观察到，通过利用语义正确的参考实现，静默错误的诊断可以有效地构建为差分调试问题。我们提出了Ekka，一个自动诊断系统，通过系统地对齐和比较目标框架与参考框架之间的中间执行状态来识别根本原因。我们构建了一个来自流行服务框架的真实静默错误基准，Ekka显示出80%的pass@1诊断准确率和88%的pass@5诊断准确率，优于现有系统。Ekka还诊断了服务框架中的4个新静默错误，所有错误均已得到开发者确认。

英文摘要

LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit error signals. Diagnosing silent errors is notoriously difficult due to the substantial semantic gap between the high-level symptoms and the low-level root causes. We observe that diagnosis of silent errors can be effectively framed as a differential debugging problem by leveraging the existence of semantically correct reference implementations. We propose Ekka, an automated diagnosis system that identifies root causes by systematically aligning and comparing intermediate execution states between a target and a reference framework. We constructed a benchmark of real-world silent errors from popular serving frameworks, where Ekka shows 80% pass@1 diagnosis accuracy and 88% pass@5 diagnosis accuracy, outperforming state-of-the-art systems. Ekka also diagnoses 4 new silent errors from serving frameworks, all of which have been confirmed by the developers.
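
The differential-debugging framing can be illustrated with a toy comparison of intermediate states. This is not Ekka's implementation: it assumes both frameworks expose a trace of named intermediate values that are already aligned by name, which is exactly the hard part the system is said to automate. The trace format, names, and tolerance are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A trace is an ordered list of (checkpoint_name, flattened_values).
Trace = List[Tuple[str, Sequence[float]]]


def first_divergence(target: Trace, reference: Trace, tol: float = 1e-3) -> Optional[str]:
    """Return the first checkpoint where the target deviates from the reference.

    Checkpoints missing from the reference are skipped, since there is
    nothing to compare them against. Returns None if no divergence is found.
    """
    reference_by_name = dict(reference)
    for name, values in target:
        expected = reference_by_name.get(name)
        if expected is None:
            continue
        if len(values) != len(expected):
            return name
        max_error = max((abs(a - b) for a, b in zip(values, expected)), default=0.0)
        if max_error > tol:
            return name
    return None


if __name__ == "__main__":
    reference = [("embed", [0.1, 0.2]), ("attn_0", [0.5, 0.5]), ("mlp_0", [1.0, 2.0])]
    target = [("embed", [0.1, 0.2]), ("attn_0", [0.5, 0.9]), ("mlp_0", [1.4, 2.6])]
    print(first_divergence(target, reference))
```

Reporting the earliest divergent checkpoint rather than every mismatch narrows the search to the stage where the error is introduced, since later mismatches are usually consequences of it.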

2606.04592 2026-06-04 cs.CY cs.AI cs.HC

Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?

合成人格：LLM 如何使用社会经济微观数据模仿个体受访者？

Leonard Kinzinger, Jochen Hartmann

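Affiliations: Technical University of Munich

AI summary: Builds individual-level digital twins from German Socio-Economic Panel data and scores more than 2 million twin responses across construction methods (model, information depth, embedding method, reasoning mode); the 75% entropy quartile of information depth is a cost-efficient Pareto point, and best-cell accuracy reaches 78.8%.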

AI中文摘要

基于LLM的数字孪生有望扩展和加速市场研究，但大多数已发表的孪生要么是基于少数人口统计问题的粗略角色机器人，要么是基于专门收集的调查和访谈记录构建的详细个体级孪生。这两种设置都不涉及营销实践中操作上最相关的情况：从企业通过CRM系统、忠诚度计划和重复调查积累的现有异构面板数据中构建详细的个体孪生。我们从德国社会经济面板（SOEP）构建详细的个体级孪生，并在一个$3 \times 5 \times 2 \times 2$的构建方法网格中评估它们，该网格涵盖三个开放权重的LLM、五个按归一化香农熵排序的累积信息深度、两种嵌入方法和两种推理模式，对500名参与者和183个保留问题评分超过210万个孪生响应。孪生质量随信息深度提高，但超过75%熵分位数后收益递减，该分位数相对于性能最佳的100%单元充当成本效益帕累托点。将嵌入从叙述性角色摘要切换到原始对话历史（过去响应）在100%深度下每个模型-推理单元中提高了保留准确率，而显式思考模式提高了秩次相关性但不改变准确率。最佳单元准确率达到78.8%，Fisher-$z$相关性在SOEP保留评估集上达到$r = 0.590$。研究结果表明，基于孪生的市场研究不再受数据设计限制，而是受项目数量、模型选择和本文现在映射的一小部分构建级决策限制。

英文摘要

LLM-based digital twins promise to scale and accelerate market research, but most published twins are either coarse persona bots conditioned on a few demographic questions or detailed individual-level twins built on purpose-collected surveys and interview transcripts. Neither setup speaks to the operationally most relevant case for marketing practice: building detailed individual twins from the pre-existing heterogeneous panel data that firms already accumulate through CRM systems, loyalty programs, and repeat surveys. We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a $3 \times 5 \times 2 \times 2$ construction-method grid that covers three open-weights LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes, scoring over 2.1 million twin responses on 500 participants and 183 held-out questions. Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells. Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth, while an explicit thinking mode raises rank-order correlation without moving accuracy. Best-cell accuracy reaches 78.8 percent and Fisher-$z$ correlation reaches $r = 0.590$ on the SOEP held-out evaluation set. The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions that this paper now maps.
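
Ranking survey items by normalized Shannon entropy, as the abstract mentions, could be computed along the following lines. Two choices here are assumptions rather than details from the paper: entropy is normalized by the logarithm of the number of observed answer categories, and items are sorted from most to least informative. The item names in the example are made up.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def normalized_entropy(responses: Sequence[Hashable]) -> float:
    """Shannon entropy of the answer distribution, scaled to [0, 1].

    Normalizes by log(number of observed categories), which is an assumed
    convention; an item with a single observed answer scores 0.
    """
    counts = Counter(responses)
    if len(counts) < 2:
        return 0.0
    total = len(responses)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def rank_items(answers_by_item: Dict[str, Sequence[Hashable]]) -> List[str]:
    """Item names sorted from highest to lowest normalized entropy."""
    return sorted(answers_by_item, key=lambda name: -normalized_entropy(answers_by_item[name]))


if __name__ == "__main__":
    panel = {
        "owns_car": ["yes", "no", "yes", "no"],
        "lives_in_germany": ["yes", "yes", "yes", "yes"],
        "job_satisfaction": [1, 1, 1, 5],
    }
    print(rank_items(panel))
```

An information depth of, say, 75 percent would then correspond to keeping a prefix of this ranking; how the paper defines its cumulative depths exactly should be checked against the full text.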

2606.04582 2026-06-04 physics.comp-ph cs.LG physics.app-ph

Reconstructing Unobservable Temperature Fields via Simulation-Aided Intelligent Sensing

通过仿真辅助智能感知重建不可观测温度场

Monika Stipsitz, Hèlios Sanchis-Alepuz, Jacob Reynvaan, Silvester Sabathiel

Affiliations: Silicon Austria Labs; Republic of Austria; Styrian Business Promotion Agency; federal state of Carinthia; Upper Austrian Research; Austrian Association for the Electric and Electronics Industry

AI summary: Proposes generating datasets from randomized physics simulations to train a neural network that reconstructs internal temperature fields from sparse sensors, enabling real-time online monitoring.

Comments Presented at IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Nancy, France, 2026

2606.04576 2026-06-04 stat.ML cs.LG econ.EM q-fin.RM

ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall

ReSGA: 一种用于学习风险价值和预期缺口的大尾部风险模型

Yichi Zhang, Ke Zhu, Zhoufan Zhu

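Affiliations: Hong Kong University; Xiamen University

AI summary: Proposes the retrieval-enhanced self-grouping autoencoder (ReSGA), which uses millions of parameters to capture cross-sectional dependence and long-term temporal dynamics of assets; on US equity data from 1926 to 2023 it outperforms twelve benchmark models and delivers economic gains through a new size-enhanced left-side momentum strategy.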

AI中文摘要

学习风险价值（VaR）和预期缺口（ES）对于有效管理金融风险至关重要。在大数据时代，参数有限的现有方法容易受到模型错误设定的影响。为了解决这一局限性，我们提出了一种大尾部风险模型——检索增强自分组自编码器（ReSGA），该模型设计有数百万个参数，利用资产的特征来挖掘丰富的横截面依赖性和长期时间动态。应用于1926年至2023年的月度美国股票收益数据，包含153个公司特征，ReSGA在样本外损失和统计回测方面优于十二种计量经济学和机器学习竞争对手。此外，其预测优势可以通过一种新的规模增强左尾动量策略构建的多空十分位投资组合转化为显著的经济收益。为了阐明复杂性的作用，我们进一步进行了系统的规模分析，并证明联合VaR-ES预测的改进主要由数据复杂性驱动，而非模型复杂性。最后，我们的组重要性和迁移学习分析展示了ReSGA的可解释性和跨市场泛化能力。

英文摘要

Learning Value-at-Risk (VaR) and Expected Shortfall (ES) is important for managing financial risks effectively. Existing approaches with limited parameters are vulnerable to model misspecification in the era of big data. To address this limitation, we propose a large tail risk model, the retrieval-enhanced self-grouping autoencoder (ReSGA), which is designed with millions of parameters to exploit the rich cross-sectional dependence and long-term temporal dynamics of assets using their characteristics. Applied to monthly US equity returns from 1926 to 2023 with 153 firm characteristics, ReSGA outperforms twelve econometric and machine learning competitors in terms of out-of-sample loss and statistical backtesting. In addition, its forecast advantages can translate into significant economic gains from long-short decile portfolios that are constructed by a new size-enhanced left-side momentum strategy. To clarify the role of complexity, we further conduct a systematic scaling analysis and demonstrate that improvements in joint VaR-ES forecasting are primarily driven by data complexity rather than model complexity. Finally, our analyses of group-importance and transfer-learning exhibit the interpretability and cross-market generalizability of ReSGA.
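
For readers less familiar with the two risk measures, the sketch below computes the textbook empirical (historical) estimates of VaR and ES from a sample of returns. It is not the ReSGA model, only a baseline illustration of the quantities that such models forecast. It uses the convention that both are reported as returns in the left tail; other sources report them as positive losses, and the tail-size rounding rule here is one of several in use.

```python
import math
from typing import Sequence, Tuple


def empirical_var_es(returns: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Historical VaR and ES at level alpha, both expressed as returns.

    VaR is taken as the largest return among the worst floor(alpha * n)
    observations (at least one), and ES as the mean of those observations.
    """
    if not returns:
        raise ValueError("need at least one return")
    ordered = sorted(returns)
    n_tail = max(1, math.floor(alpha * len(ordered)))
    tail = ordered[:n_tail]
    value_at_risk = tail[-1]
    expected_shortfall = sum(tail) / len(tail)
    return value_at_risk, expected_shortfall


if __name__ == "__main__":
    monthly = [-0.10, -0.05, -0.02, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
    print(empirical_var_es(monthly, alpha=0.2))
```

By construction ES is never above VaR in this convention, since it averages the observations at or beyond the VaR threshold.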

2606.04527 2026-06-04 cs.MM cs.CV cs.GR

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Echo-Infinity: 学习演化记忆用于实时无限视频生成

Yuxuan Bian, Zeyue Xue, Songchun Zhang, Shiyi Zhang, Weiyang Jin, Yaowei Li, Junhao Zhuang, Haoran Li, Jie Huang, Haoyang Huang, Nan Duan, Qiang Xu

Affiliations: The Chinese University of Hong Kong; Joy Future Academy, JD; The Hong Kong University of Science and Technology; Tsinghua University; The University of Hong Kong; University of Science and Technology of China

AI summary: Proposes Echo-Infinity, a framework whose learnable evolving memory dynamically filters, abstracts, and compresses history of any length at constant cost; combined with the Unified Relative RoPE Recipe, it demonstrates 24-hour real-time infinite video generation for the first time.

Comments Website: https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/

AI中文摘要

我们提出了Echo Infinity，一个面向实时无限视频生成的自回归（AR）框架，它采用可学习的演化记忆，以恒定成本动态过滤、抽象和压缩任意长度的历史。现有方法主要使用预定义的KV缓存调度、固定比例启发式压缩或推理时的RoPE适配来管理记忆。这些设计由于有限的缓存窗口和忽略自回归生成噪声，不可避免地丢失历史信息并放大复合误差。受人类记忆巩固的启发，Echo-Infinity用可学习的记忆查询替代手工设计的记忆管理，这些查询通过注意力和门控机制在过去的帧从局部窗口中被驱逐时更新。查询与视频扩散变换器（DiTs）进行端到端优化，形成一种演化记忆，支持任意压缩比，计算量恒定且与视频长度无关。它们还充当可泛化的生成先验，即使仅使用优化后的初始状态也能提高质量。我们进一步引入了统一相对RoPE方案，它将锚定帧固定从id 0开始，并让最新帧的id在训练和推理过程中最多增长到DiTs预训练的最大时间RoPE id，从而将模型从有限的RoPE约束中解放出来，并缩小训练-测试的RoPE外推差距。在长视频和短视频生成中，Echo-Infinity达到了最先进的性能，并且据我们所知，首次展示了有前景的24小时（>130万帧）实时滚动生成，为无限视频生成提供了一条实用路径。

英文摘要

We present Echo Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors due to their limited cache window and ignorance of autoregressive generation noise. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Query, which are updated by attention and a gating mechanism when past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce Unified Relative RoPE Recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In long and short video generation, Echo-Infinity achieves state-of-the-art performance, and, to our knowledge, demonstrates promising 24-hour (>1.3 M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.

2606.04522 2026-06-04 cs.IR cs.AI cs.DB cs.LG

ANN Search: Recall What Matters

ANN搜索：召回真正重要的

Dimitris Dimitropoulos, Nikos Mamoulis

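Affiliations: University of Ioannina; Archimedes, Athena RC

AI summary: Proposes replacing Recall@k with the inverse approximation ratio 1/Ratio@k for evaluating approximate nearest neighbor search quality; experiments show the latter tracks practical utility more accurately and lowers computational overhead.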

AI中文摘要

近似最近邻（ANN）搜索已成为信息检索和现代机器学习任务（从分类到检索增强生成）的核心原语。社区主要通过给定Recall@k（检索到的真实精确最近邻的比例）下的吞吐量来评估和调优ANN算法。我们认为，ANN搜索真正重要的是检索结果的质量，而非它们与真实kNN集合的重叠。我们证明，使用Recall@k评估检索质量会带来不必要的计算开销，并研究用逆近似比1/Ratio@k替代它。1/Ratio@k评估检索到的邻居与真实邻居之间距离的差异。它无需判断、无需超参数，仅通过标准ANN基准输入即可计算。我们在涵盖广泛内在维度的多样化数据集上对最先进的ANN算法进行基准测试，从效率、下游分类和检索增强生成三个维度全面评估这两个指标。在效率方面，优化1/Ratio@k达到操作质量阈值所需的计算成本远低于Recall@k。在下游任务中，即使Recall@k显著下降，性能指标（标签精度、语义相似度、BERTScore和LLM评分质量）仍保持高度稳定。相反，逆近似比紧密反映了这种稳定性，比Recall@k更好地追踪实际效用。最终，虽然Recall@k夸大了近似的真实成本，但1/Ratio@k提供了更准确、可部署的ANN实际质量代理。

英文摘要

Approximate nearest neighbor (ANN) search has become a core primitive in information retrieval and modern machine learning tasks, from classification to retrieval-augmented generation. The community evaluates and tunes ANN algorithms primarily on their throughput at a given Recall@k, the fraction of true exact neighbors retrieved. We argue that what really matters in ANN search is the quality of the retrieved results and not their overlap with the true kNN set. We show that using Recall@k to assess retrieval quality forces unnecessary computational overhead and investigate replacing it by 1/Ratio@k, the inverse approximation ratio. 1/Ratio@k evaluates the differences between the distances of the retrieved and true neighbors. It is judge-free, hyperparameter-free, and computable from standard ANN benchmark inputs alone. We benchmark state-of-the-art ANN algorithms across diverse datasets spanning a wide range of intrinsic dimensionalities, evaluating the two metrics comprehensively across efficiency, downstream classification, and retrieval-augmented generation. On the efficiency axis, optimizing for 1/Ratio@k reaches operational quality thresholds at a substantially lower computational cost than Recall@k. In downstream tasks, performance indicators (label precision, semantic similarity, BERTScore, and LLM-graded quality) remain highly stable even when Recall@k drops significantly. The inverse approximation ratio, on the other hand, closely mirrors this stability, tracking true utility much better than Recall@k. Ultimately, while Recall@k overstates the true cost of approximation, 1/Ratio@k offers a more accurate, deployable proxy for actual ANN quality.
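
The contrast between the two metrics is easy to see on a small example. The sketch below assumes a common definition of the approximation ratio, the mean over ranks of retrieved distance divided by true distance; the paper's exact definition of Ratio@k may differ, and the function names are illustrative.

```python
from typing import Hashable, Sequence


def recall_at_k(retrieved_ids: Sequence[Hashable], true_ids: Sequence[Hashable]) -> float:
    """Fraction of the true k nearest neighbours that were retrieved."""
    return len(set(retrieved_ids) & set(true_ids)) / len(true_ids)


def inverse_ratio_at_k(retrieved_dists: Sequence[float], true_dists: Sequence[float]) -> float:
    """1 / Ratio@k, with Ratio@k ASSUMED to be the mean rank-wise distance ratio.

    Both inputs hold k positive distances to the query; they are sorted so
    that the i-th retrieved neighbour is compared with the i-th true one.
    """
    pairs = list(zip(sorted(retrieved_dists), sorted(true_dists)))
    ratio = sum(r / t for r, t in pairs) / len(pairs)
    return 1.0 / ratio


if __name__ == "__main__":
    # The true 2nd neighbour "b" is missed, but the substitute "c" is almost as close.
    true_ids, true_dists = ["a", "b"], [1.0, 2.0]
    got_ids, got_dists = ["a", "c"], [1.0, 2.1]
    print(recall_at_k(got_ids, true_ids), inverse_ratio_at_k(got_dists, true_dists))
```

Here Recall@k drops to 0.5 while the inverse ratio stays close to 1, which illustrates the abstract's point that a missed neighbour replaced by a nearly equidistant one costs little in practice.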
