RAG / Retrieval-Augmented Generation

2605.29517 2026-06-18 cs.IR Version update 95%

FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Retrieval

FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Scoring

Roi Pony, Daniel Ezer, Adi Raz Goldfarb, Idan Friedman, Oshri Naparstek, Udi Barzelay

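Topic match (Retriever): Proposes the Flash-MaxSim kernel to accelerate late-interaction retrieval; the core contribution is retriever optimization.

AI summary: Proposes Flash-MaxSim, an IO-aware fused GPU kernel that streams tiles through on-chip SRAM and folds in the row-max reduction, avoiding materialization of the full similarity tensor. This substantially reduces memory use and speeds up MaxSim scoring for late-interaction retrieval (e.g., ColBERT, ColPali).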

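AI abstract (translated from Chinese)

Late-interaction retrieval (ColBERT, ColPali) scores queries against documents with the MaxSim operator: for each query token, take the maximum similarity over the document tokens, then sum over all query tokens. The standard implementation materializes the full query-token × document-token similarity tensor in GPU memory; for visual ColPali over 10K documents, this tensor alone occupies 21 GB in FP16, and it is created only to be reduced to one score per document and then discarded. This exhausts a 40 GB GPU and limits the achievable batch size in both inference and training. We propose Flash-MaxSim, an IO-aware fused GPU kernel that computes exactly the same scores without materializing the tensor, by streaming query and document tiles into on-chip SRAM and folding in the row-max reduction in the same pass. We extend the IO-aware principle to the training backward pass: an inverse-grid CSR construction that reuses the forward argmax to achieve atomic-free, destination-owned gradient reduction, plus INT8×INT8 quantization and variable-length (padding-free) scoring. On A100, Flash-MaxSim is 3.9x faster than naive PyTorch at the same precision (4.7x on H100), reduces inference memory by 16x and training memory by about 28x, unlocks corpus and batch sizes that PyTorch cannot handle at all, and preserves exact rankings (100% top-20 agreement with the FP32 reference).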

English abstract

Late-interaction retrieval (ColBERT, ColPali) scores a query against a document via the MaxSim operator. The standard PyTorch implementation materialises the full query-token x document-token similarity tensor only to reduce it away. At ColPali scale this is the single largest tensor in the pipeline (e.g. 21 GB in FP16 for 10K documents) and limits both candidate set size at inference and batch size during contrastive training. We present Flash-MaxSim (FM), an IO-aware fused GPU kernel that computes the same MaxSim scores without ever materialising the tensor, and extends the same principle to the training backward. At ColPali scale on A100 this cuts inference memory up to 9x and training memory by two orders of magnitude, unlocking candidate sets and contrastive batch sizes a single GPU could not previously reach. The kernel is a drop-in replacement, exact up to floating-point evaluation order under its stated FP32-accumulation protocol: rankings match the FP32 reference within 5e-4 of nDCG@10 on BEIR and REAL-MM-RAG. A separate INT8 path trades exactness for halved index storage at high fidelity. Released open-source.
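To make the MaxSim operator concrete, the following is a minimal NumPy sketch, not the released Flash-MaxSim kernel. It contrasts a naive scorer that materialises the full similarity tensor with a tiled scorer that keeps only a running row-max, which is the idea the abstract describes at the level of on-chip SRAM. The function names, the `tile` parameter, and the array shapes are illustrative assumptions.

```python
import numpy as np

def maxsim_naive(Q, D):
    """Score one query against many documents.

    Q: (n_q, d) query token embeddings.
    D: (n_docs, n_d, d) document token embeddings (assumed equal length).
    """
    # Full (n_docs, n_q, n_d) similarity tensor, built only to be reduced.
    sim = np.einsum("qd,ntd->nqt", Q, D)
    # Max over document tokens, then sum over query tokens.
    return sim.max(axis=2).sum(axis=1)

def maxsim_chunked(Q, D, tile=64):
    """Same scores, but only one tile of document tokens is scored at a time."""
    n_docs, n_d, _ = D.shape
    running_max = np.full((n_docs, Q.shape[0]), -np.inf)
    for start in range(0, n_d, tile):
        # Similarities for this tile only: (n_docs, n_q, <=tile).
        block = np.einsum("qd,ntd->nqt", Q, D[:, start:start + tile])
        # Fold the row-max into the running state and drop the tile.
        running_max = np.maximum(running_max, block.max(axis=2))
    return running_max.sum(axis=1)
```

Both functions return one score per document. The tiled version never holds more than `n_docs x n_q x tile` similarities at once, at the cost of a Python-level loop that a fused GPU kernel would not need.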

2606.01697 2026-06-18 cs.CL Version update 90%

RCEM: Robust Conversational Search EMbedder in Distributional Shift

RCEM: An Embedder Equipped with Query-Rewriting Skills for Robust Conversational Search under Distributional Shift

Kilho Son, Paul Hsu, Cha Zhang, Dinei Florencio

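Affiliation: Microsoft

Topic match (Retriever): A conversational search embedder that combines LLM query rewriting with retrieval.

AI summary: Proposes RCEM, which distills an LLM's query-rewriting capability into an embedding model, enabling context-aware retrieval without explicit rewriting and improving robustness under distributional shift.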

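AI abstract (translated from Chinese)

Conversational search is becoming increasingly important in retrieval-augmented generation (RAG) systems, where users interact with AI assistants through multi-turn conversations containing context-dependent queries. We propose RCEM, a conversational dense retrieval model that distills an LLM's query-rewriting capability into an embedding model, enabling context-aware retrieval at inference time without explicit query rewriting. Unlike prior conversational dense retrieval methods that learn direct conversation-to-document matching, RCEM aligns conversational query embeddings with rewritten-query embeddings, improving robustness under distributional shift. RCEM does not require conversational-query-to-document relevance mappings for training, which are typically expensive and hard to obtain at high quality. Extensive experiments on QReCC, TopiOCQA, and TREC CAsT show that RCEM consistently outperforms strong conversational retrieval baselines, with especially large gains under distributional shift, including up to a 20% improvement in Recall@10. RCEM further extends the base embedding model with conversational query-rewriting capability while retaining its original retrieval functionality, allowing a single model to encode both standalone and conversational queries and to search existing document indexes without rebuilding the retrieval database.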

English abstract

We propose RCEM, a Robust Conversational search EMbedder that is additionally equipped with an LLM's query reformulation capability without losing the base model's generalization. Unlike prior conversational dense retrieval approaches that learn direct conversation-to-passage matching, RCEM aligns conversations, prepended with a special token, to LLM-rewritten queries while preserving the original embedding space. The unchanged embedding space automatically maps the rewritten query to the relevant passages. As a result, RCEM (1) reduces overfitting by simplifying the alignment task from long passages to shorter rewritten queries, (2) eliminates the need for conversation-to-passage relevance labels for training, and (3) maintains its original embedding space, which allows conversational queries to be run against indexes built by the original embedder without rebuilding them. Extensive experiments show that RCEM consistently outperforms prior approaches, achieving up to 30% improvement under distributional shift.
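The alignment idea can be illustrated with a short NumPy sketch. It is a sketch under stated assumptions, not RCEM's training code: the abstract does not give the loss, so a cosine-distance objective is assumed here, and `conv_emb` and `rewrite_emb` are hypothetical arrays standing in for the embeddings of the conversations and of the LLM-rewritten queries (the latter produced by the unchanged base embedder).

```python
import numpy as np

def cosine_alignment_loss(conv_emb, rewrite_emb):
    """Mean cosine distance between paired embeddings.

    conv_emb:    (batch, dim) embeddings of conversations (trainable side).
    rewrite_emb: (batch, dim) embeddings of rewritten queries (fixed targets).
    The cosine form is an assumption made for illustration.
    """
    # Normalise each row to unit length.
    c = conv_emb / np.linalg.norm(conv_emb, axis=1, keepdims=True)
    r = rewrite_emb / np.linalg.norm(rewrite_emb, axis=1, keepdims=True)
    # 1 - cosine similarity, averaged over the batch.
    return float(np.mean(1.0 - np.sum(c * r, axis=1)))
```

Because the targets are short rewritten queries rather than passages, no conversation-to-passage relevance labels appear anywhere in this objective, which matches point (2) of the abstract.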

2601.08554 2026-06-18 cs.SI cs.DB cs.GR Version update 60%

Maintaining Leiden Communities in Large Dynamic Graphs

Maintaining Leiden Communities in Large-Scale Dynamic Graphs

Chunxu Lin, Yumao Xie, Yixiang Fang, Yongmin Hu, Yingqian Hu, Cheng Chen

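Topic match (Retriever): Community detection is used for hierarchical RAG indexing, but it is not the core focus.

AI summary: To address the inefficiency of existing dynamic Leiden algorithms under frequent updates, proposes HIT-Leiden, which narrows the range of affected vertices by maintaining connected components and hierarchical community structures, achieving speedups of up to five orders of magnitude.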

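AI abstract (translated from Chinese)

Community detection is a foundational capability in large-scale industrial graph analytics, underpinning applications such as fraud-ring discovery, recommendation systems, and hierarchical indexing for retrieval-augmented generation. Among modularity-based methods, the Leiden algorithm is widely adopted because it produces high-quality communities with connectivity guarantees. However, real-world graphs evolve constantly, and communities must be updated promptly to keep downstream features and retrieval indexes fresh. Meanwhile, existing dynamic Leiden methods recompute communities whenever vertices and edges change, and so nearly degrade to full recomputation under frequent updates. To address this efficiency problem, we study the efficient maintenance of Leiden communities in large dynamic graphs and propose a novel algorithm called Hierarchical Incremental Tree Leiden (HIT-Leiden). We first carry out a boundedness analysis showing that prior incremental Leiden methods can incur essentially unbounded work even for small updates. Guided by this analysis, we propose HIT-Leiden, which effectively narrows the range of affected vertices by maintaining connected components and hierarchical community structures. Extensive experiments on large real-world dynamic graphs show that HIT-Leiden achieves community quality comparable to state-of-the-art competitors while delivering speedups of up to five orders of magnitude over existing solutions. Production deployment results show that HIT-Leiden meets strict latency requirements under high-frequency updates.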

English abstract

Community detection is a foundational capability in large-scale industrial graph analytics, powering applications such as fraud-ring discovery, recommendation systems, and hierarchical indexing for retrieval-augmented generation. Among modularity-based methods, the Leiden algorithm has been widely adopted in production because it delivers high-quality communities with connectivity guarantees. However, real-world graphs evolve continuously, and timely community updates are needed to keep downstream features and retrieval indices fresh. Meanwhile, existing dynamic Leiden approaches recompute the communities whenever their vertices and edges change, thereby almost degrading to near-full recomputation under frequent updates. To alleviate the efficiency issue, we study the efficient maintenance of Leiden communities in large dynamic graphs and present a novel algorithm, called Hierarchical Incremental Tree Leiden (HIT-Leiden). We first provide a boundedness analysis showing that prior incremental Leiden methods can incur essentially unbounded work even for small updates. Guided by this analysis, we propose HIT-Leiden which effectively reduces the range of affected vertices by maintaining connected components and hierarchical community structures. Extensive experiments on large real-world dynamic graphs demonstrate that HIT-Leiden achieves community quality comparable to the state-of-the-art competitors while delivering speedups of up to five orders of magnitude over existing solutions. The production deployment results show that HIT-Leiden meets stringent latency requirements under high-rate updates at scale.
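For readers less familiar with modularity-based methods, here is a small Python sketch of the modularity score that such methods seek to improve. It is not HIT-Leiden and does no incremental maintenance; it only evaluates a fixed partition of an undirected, unweighted graph without self-loops. The function name and the input format (an edge list and a vertex-to-community dictionary) are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition: sum over communities c of
    (internal edges of c) / m - (total degree of c / (2 m)) ** 2,
    where m is the number of edges.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # sum of vertex degrees in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Recomputing this sum from scratch touches every edge, which is the kind of full pass that incremental maintenance tries to avoid when only a few edges change.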

2602.06495 2026-06-18 cs.CR Version update 85%

Graphs Don't Stay Secret: Practical Subgraph Reconstruction Attacks on Defended Graph RAG

Graphs Are Not Secret: Practical Subgraph Reconstruction Attacks on Defended Graph RAG

Minkyoo Song, Jaehan Kim, Myungchul Kang, Hanna Kim, Seungwon Shin, Sooel Son

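Topic match (Knowledge-base QA): A subgraph reconstruction attack on Graph RAG.

AI summary: Proposes GRASP, an attack that reconstructs subgraphs from defended Graph RAG systems through multi-turn queries, reaching 82.9 F1, and evaluates defenses.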

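AI abstract (translated from Chinese)

Graph-based retrieval-augmented generation (Graph RAG) is increasingly used to support LLM applications by augmenting user queries with structured knowledge retrieved from a knowledge graph. Although Graph RAG improves relational reasoning, it introduces an understudied threat: attackers can reconstruct subgraphs from the target RAG system's knowledge graph, enabling privacy inference and replication of carefully curated knowledge assets. We show that, even with simple prompt-based safeguards in place, existing attacks are largely ineffective against Graph RAG, because they expose explicit exfiltration intent and are therefore easily suppressed by lightweight safety prompts. We identify three technical challenges for practical Graph RAG extraction under realistic safeguards and introduce GRASP, a black-box, multi-turn subgraph reconstruction attack. GRASP (i) reframes extraction as a context-processing task, (ii) enforces format-compliant, instance-grounded outputs via per-record identifiers to reduce hallucinations and preserve relational details, and (iii) diversifies goal-driven attack queries with a discovery-aware scheduler to operate within a strict query budget. Across two real-world knowledge graphs, four safety-aligned LLMs, and multiple Graph RAG frameworks, GRASP achieves the strongest type-faithful reconstruction where prior methods fail, reaching 82.9 F1. We further evaluate defenses and propose two mitigations that effectively reduce reconstruction fidelity without loss of utility.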

English abstract

Graph-based retrieval-augmented generation (Graph RAG) is increasingly deployed to support LLM applications by augmenting user queries with structured knowledge retrieved from a knowledge graph. While Graph RAG improves relational reasoning, it introduces a largely understudied threat: adversaries can reconstruct subgraphs from a target RAG system's knowledge graph, enabling privacy inference and replication of curated knowledge assets. We show that existing attacks are largely ineffective against Graph RAG even with simple prompt-based safeguards, because these attacks expose explicit exfiltration intent and are therefore easily suppressed by lightweight safe prompts. We identify three technical challenges for practical Graph RAG extraction under realistic safeguards and introduce GRASP, a closed-box, multi-turn subgraph reconstruction attack. GRASP (i) reframes extraction as a context-processing task, (ii) enforces format-compliant, instance-grounded outputs via per-record identifiers to reduce hallucinations and preserve relational details, and (iii) diversifies goal-driven attack queries using a discovery-aware scheduler to operate within strict query budgets. Across two real-world knowledge graphs, four safety-aligned LLMs, and multiple Graph RAG frameworks, GRASP attains the strongest type-faithful reconstruction where prior methods fail, reaching up to 82.9 F1. We further evaluate defenses and propose two mitigations that effectively reduce reconstruction fidelity without utility loss.
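The F1 figure can be read against a simple set-based definition. The sketch below scores a reconstructed subgraph against a reference set of (head, relation, tail) triples; it is an assumed, generic metric for illustration, and the paper's type-faithful scoring may differ. The function name and the triple format are hypothetical.

```python
def triple_f1(predicted, gold):
    """Set-based F1 between reconstructed and reference triples.

    predicted, gold: iterables of (head, relation, tail) tuples.
    Returns 0.0 when either set is empty or nothing overlaps.
    """
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)  # triples recovered exactly
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)  # share of output that is real
    recall = hits / len(gold)          # share of the graph recovered
    return 2 * precision * recall / (precision + recall)
```

Under this definition a hallucinated triple lowers precision and a missed triple lowers recall, so an attack has to be both grounded and broad to score well.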

2604.06967 2026-06-18 cs.CR cs.DB Version update 70%

VulLink: A Dynamic Open-Access Vulnerability Graph Database for Cybersecurity Data Mining

VulLink: A Dynamic Open-Access Vulnerability Graph Database for Cybersecurity Data Mining

Luat Do, Jiao Yin, Jinli Cao, Hua Wang

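Topic match (Knowledge-base QA): A vulnerability graph database that can be viewed as a knowledge base; weakly related to RAG.

AI summary: Proposes VulLink, a dynamic open platform that integrates multi-source vulnerability data through an automated ETL pipeline and provides a graph database, a web interface, and an API, supporting downstream mining tasks such as exploitability prediction.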

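AI abstract (translated from Chinese)

The rapid growth of software vulnerabilities has turned cyber threat intelligence analysis into a challenging data mining problem involving heterogeneous and constantly changing data sources. Public repositories such as the National Vulnerability Database (NVD), Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), Exploit Database (EDB), and CVE Details provide valuable information, but their record-centric schemas make it difficult to capture cross-source relationships among vulnerabilities, weaknesses, exploits, affected products, vendors, and references. Existing graph-based vulnerability resources underline the value of relational threat modeling, but many remain static, offline, or hard to use for downstream graph mining. This paper presents VulLink, a deployed, dynamic, open-access vulnerability graph database for cybersecurity data mining. VulLink integrates multiple public repositories through an automated extract-transform-load (ETL) pipeline, converting isolated, record-centric vulnerability data into a continuously updated graph database with typed entities and explicit cross-source relationships. It provides an interactive web interface and a public API for exploring, querying, and exporting mining-ready vulnerability subgraphs. It also provides precomputed embeddings of vulnerability descriptions generated by pretrained language models, which users can query and download by model and embedding dimension as semantic features for downstream mining tasks such as exploitability prediction. To demonstrate VulLink's practical utility, we implement a downstream exploitability prediction use case that draws on heterogeneous graph context and semantic vulnerability features. The VulLink platform, including the web interface, public API, source code, and deployment resources, is publicly accessible online.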

English abstract

The rapid growth of software vulnerabilities has turned cyber threat intelligence analysis into a challenging data mining problem over heterogeneous and continuously changing sources. Public repositories such as the National Vulnerability Database (NVD), Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), Exploit Database (EDB), and CVE Details provide valuable information, but their record-centric schemas make it difficult to capture cross-source relationships among vulnerabilities, weaknesses, exploits, affected products, vendors, and references. Existing graph-based vulnerability resources highlight the value of relational threat modelling, yet many remain static, offline, or difficult to access for downstream graph mining. This paper presents VulLink, a deployed, dynamic, and open-access vulnerability graph database for cybersecurity data mining. VulLink integrates multiple public repositories through an automated Extract-Transform-Load (ETL) pipeline that converts isolated, record-centric vulnerability data into a continuously updated graph database with typed entities and explicit cross-source relationships. It provides an interactive Web interface and public API for exploring, querying, and exporting mining-ready vulnerability subgraphs. It also provides pre-computed embeddings of vulnerability descriptions generated by pretrained language models, which users can query and download by model and embedding dimension as semantic features for downstream mining tasks such as exploitability prediction. To demonstrate the practical utility of VulLink, we implement a downstream exploitability prediction use case that leverages heterogeneous graph context and semantic vulnerability features. The VulLink platform, including the Web interface, public API, source code, and deployment resources, is publicly available online.

2603.29247 2026-06-18 cs.CL cs.AI cs.LG Version update 70%

MemRerank: Preference Memory for Personalized Product Reranking

MemRerank: Preference Memory for Personalized Product Reranking

Zhiyuan Peng, Xuyang Wu, Huaixiao Tou, Yi Fang, Yu Gong

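Affiliations: Santa Clara University; Independent Researcher

Topic match (Knowledge-base QA): Preference memory for reranking in LLM shopping agents; involves retrieval.

AI summary: Proposes the MemRerank framework, which uses reinforcement learning to distill user purchase history into query-independent preference memory for personalized reranking in LLM shopping agents, improving accuracy on a 1-in-5 selection task by up to 10.61 percentage points.

Comments: correct author name in metadata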

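AI abstract (translated from Chinese)

LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization, yet simply appending raw history to the prompt often works poorly because of noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills a user's purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmark and evaluation framework around an LLM-based 1-in-5 selection task, which measures both memory quality and downstream reranking utility. We further train the memory extractor with reinforcement learning (RL), using downstream reranking performance as supervision. Experiments with two LLM-based rerankers show that MemRerank consistently outperforms no-memory, raw-history, and off-the-shelf memory baselines, improving 1-in-5 accuracy by up to +10.61 absolute percentage points. These results suggest that explicit preference memory is a practical and effective building block for personalization in agentic e-commerce systems.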

English abstract

LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization, yet naively appending raw history to prompts is often ineffective due to noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills user purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmark and evaluation framework centered on an LLM-based 1-in-5 selection task, which measures both memory quality and downstream reranking utility. We further train the memory extractor with reinforcement learning (RL), using downstream reranking performance as supervision. Experiments with two LLM-based rerankers show that MemRerank consistently outperforms no-memory, raw-history, and off-the-shelf memory baselines, yielding up to +10.61 absolute points in 1-in-5 accuracy. These results suggest that explicit preference memory is a practical and effective building block for personalization in agentic e-commerce systems.

2603.00026 2026-06-18 cs.CL cs.AI cs.IR Version update 70%

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Xiaohui Zhang, Zequn Sun, Chengyuan Yang, Yaqin Jin, Yazhong Zhang, Wei Hu

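Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, China; Alibaba Group, Hangzhou, China; National Institute of Healthcare Data Science, Nanjing University, China

Topic match (Knowledge-base QA): Memory management involves retrieval, but the focus is on reasoning.

AI summary: Proposes the ActMem framework, which converts unstructured dialogue history into a structured causal-semantic graph and combines counterfactual reasoning with commonsense completion to perform active causal reasoning, significantly improving LLM agents on complex memory-dependent tasks.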

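AI abstract (translated from Chinese)

Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat the agent as a passive "recorder" and retrieve information without understanding its deeper meaning. They can fail in scenarios that require reasoning and complex decision-making. To bridge this critical gap, we propose ActMem, a novel actionable memory framework that combines memory retrieval with active causal reasoning. ActMem converts unstructured dialogue history into a structured causal-semantic graph. By leveraging counterfactual reasoning and commonsense completion, it enables the agent to infer implicit constraints and resolve potential conflicts between past states and current intentions. In addition, we introduce ActMemEval, a comprehensive dataset for evaluating agent reasoning in logic-driven scenarios, going beyond the fact-retrieval focus of existing memory benchmarks. Experiments show that ActMem significantly outperforms baselines on complex, memory-dependent tasks, paving the way for more consistent and reliable intelligent assistants.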

English abstract

Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may fail in scenarios requiring reasoning and complex decision-making. To bridge this critical gap, we propose a novel actionable memory framework called ActMem that integrates memory retrieval with active causal reasoning. ActMem transforms unstructured dialogue history into a structured causal and semantic graph. By leveraging counterfactual reasoning and commonsense completion, it enables agents to deduce implicit constraints and resolve potential conflicts between past states and current intentions. Furthermore, we introduce a comprehensive dataset ActMemEval to evaluate agent reasoning capabilities in logic-driven scenarios, moving beyond the fact-retrieval focus of existing memory benchmarks. Experiments demonstrate that ActMem significantly outperforms baselines in handling complex, memory-dependent tasks, paving the way for more consistent and reliable intelligent assistants.

2601.14288 2026-06-18 astro-ph.CO cs.AI cs.CE gr-qc hep-th Version update 60%

DeepInflation: an AI agent for research and model discovery of inflation

DeepInflation: An AI Agent for Inflation Research and Model Discovery

Ze-Yu Peng, Hao-Shi Yuan, Qi Lai, Jun-Qian Jiang, Gen Ye, Jun Zhang, Yun-Song Piao

Affiliations: School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, 100190 Beijing, China; Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, 100049 Beijing, China; School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; Institute of Theoretical Physics, Chinese Academy of Sciences, P.O. Box 2735, Beijing 100190, China; Département de Physique Théorique, Université de Genève, 24 quai Ernest-Ansermet, CH-1211 Genève 4, Switzerland

Topic match (Knowledge-base QA): Integrates a RAG knowledge base to provide theoretical context.

AI summary: Proposes DeepInflation, an AI agent built on a multi-agent architecture that integrates large language models, a symbolic regression engine, and a retrieval-augmented generation knowledge base to automatically discover single-field slow-roll inflationary potentials consistent with the latest observations and to explain the theoretical background.

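AI abstract (translated from Chinese)

We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built on a multi-agent architecture, DeepInflation combines large language models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework lets the agent automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in the established theoretical literature. We show that DeepInflation can successfully discover simple, viable single-field slow-roll inflationary potentials consistent with the latest observations (taking the ACT DR6 results as an example) or with any given $n_s$ and $r$, and can provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, letting researchers and non-experts alike explore the inflationary landscape in natural language. The agent is available at https://github.com/pengzy-cosmo/DeepInflation.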

English abstract

We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built upon a multi-agent architecture, DeepInflation integrates Large Language Models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework enables the agent to automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in established theoretical literature. We demonstrate that DeepInflation can successfully discover simple and viable single-field slow-roll inflationary potentials consistent with the latest observations (with the ACT DR6 results taken as an example) or any given $n_s$ and $r$, and provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, which enables researchers and non-experts alike to explore the inflationary landscape using natural language. This agent is available at https://github.com/pengzy-cosmo/DeepInflation.
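As background for the $n_s$ and $r$ targets, the standard leading-order slow-roll relations are $n_s \approx 1 - 6\epsilon + 2\eta$ and $r \approx 16\epsilon$, with $\epsilon = \tfrac{1}{2}(V'/V)^2$ and $\eta = V''/V$ in reduced Planck units. The Python sketch below evaluates them by finite differences for a user-supplied potential. It is not part of DeepInflation; the function name, the step size `h`, and the quadratic example are assumptions chosen for illustration.

```python
import math

def slow_roll_observables(V, phi, h=1e-3):
    """Leading-order slow-roll estimates of (n_s, r) at field value phi.

    V:   callable potential V(phi), in reduced Planck units (M_pl = 1).
    phi: field value at which the observable modes leave the horizon.
    h:   finite-difference step (an illustrative default).
    """
    v0 = V(phi)
    dv = (V(phi + h) - V(phi - h)) / (2 * h)            # V'
    d2v = (V(phi + h) - 2 * v0 + V(phi - h)) / h ** 2   # V''
    eps = 0.5 * (dv / v0) ** 2
    eta = d2v / v0
    return 1 - 6 * eps + 2 * eta, 16 * eps

# Example: V proportional to phi^2, evaluated at phi^2 = 242, which is
# roughly 60 e-folds before the end of inflation in the slow-roll
# approximation for this potential.
n_s, r = slow_roll_observables(lambda p: p ** 2, math.sqrt(242.0))
```

A candidate potential proposed by a symbolic regression step could be screened this way before any more careful numerical treatment; the overall normalisation of `V` cancels in both quantities.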

1. Retriever (3 papers)

FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Retrieval

RCEM: Robust Conversational Search EMbedder in Distributional Shift

Maintaining Leiden Communities in Large Dynamic Graphs

2. Knowledge-base QA (5 papers)

Graphs Don't Stay Secret: Practical Subgraph Reconstruction Attacks on Defended Graph RAG

VulLink: A Dynamic Open-Access Vulnerability Graph Database for Cybersecurity Data Mining

MemRerank: Preference Memory for Personalized Product Reranking

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

DeepInflation: an AI agent for research and model discovery of inflation