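arXivDaily: arXiv daily academic digest, updated Monday through Friday

AI / Large Models

RAG / Retrieval-Augmented Generation

Retrieval-augmented generation, vector retrieval, knowledge-base question answering, and search systems for large language models.

Indexed today / current date: 9. Signal sources: cs.IR, cs.CL, cs.AI, cs.DB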
2506.20869 2026-06-18 cs.SE cs.AI cs.IR 95%

Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

Md Toufique Hasan, Muhammad Waseem, Kai-Kristian Kemell, Ayman Asad Khan, Mika Saari, Pekka Abrahamsson

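Affiliations Faculty of Information Technology and Communication Sciences, Tampere University

Topic match Knowledge-base QA: engineering practice across five domain-specific RAG systems

AI summary This paper presents five domain-specific RAG applications covering governance, cybersecurity, agriculture, industrial research, and medical diagnostics. The systems combine multilingual OCR, semantic vector retrieval, and domain-adapted LLMs; the authors evaluate them on six dimensions and distill twelve key lessons learned.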

Comments Published in the Proceedings of the 51st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2025. Lecture Notes in Computer Science, volume 16082, pages 143-158. Springer, 2026

Journal ref LNCS 16082, 143-158, 2026

Abstract

Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge, addressing limitations in factual accuracy and contextual relevance. However, there is a lack of empirical studies that report on the development of RAG-based implementations grounded in real-world use cases, evaluated through general user involvement, and accompanied by systematic documentation of lessons learned. This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system incorporates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, deployed through local servers or cloud APIs to meet distinct user needs. A web-based evaluation involving a total of 100 participants assessed the systems across six dimensions: (i) Ease of Use, (ii) Relevance, (iii) Transparency, (iv) Responsiveness, (v) Accuracy, and (vi) Likelihood of Recommendation. Based on user feedback and our development experience, we documented twelve key lessons learned, highlighting technical, operational, and ethical challenges affecting the reliability and usability of RAG systems in practice.
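
Illustration (not from the paper): the abstract mentions semantic retrieval via vector embeddings. A minimal sketch of that retrieval step, assuming documents and the query have already been embedded, might look like the following. The toy three-dimensional vectors, the document IDs, and the function names are hypothetical; a real system would use an embedding model and a vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, index, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical pre-computed embeddings (assumption: 3 dimensions for brevity).
TOY_INDEX = {
    "crop_rotation": [0.9, 0.1, 0.0],
    "firewall_policy": [0.0, 0.2, 0.9],
    "soil_nutrients": [0.8, 0.3, 0.1],
}

TOP_DOCS = retrieve_top_k([1.0, 0.2, 0.0], TOY_INDEX, k=2)
```

The retrieved passages would then be placed in the LLM prompt as context; that generation step is omitted here.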

2602.06495 2026-06-18 cs.CR Updated version 85%

Graphs Don't Stay Secret: Practical Subgraph Reconstruction Attacks on Defended Graph RAG

Minkyoo Song, Jaehan Kim, Myungchul Kang, Hanna Kim, Seungwon Shin, Sooel Son

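Topic match Knowledge-base QA: subgraph reconstruction attacks on Graph RAG

AI summary Proposes GRASP, an attack that reconstructs subgraphs from defended Graph RAG systems through multi-turn queries, reaching up to 82.9 F1, and evaluates defenses.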

Abstract

Graph-based retrieval-augmented generation (Graph RAG) is increasingly deployed to support LLM applications by augmenting user queries with structured knowledge retrieved from a knowledge graph. While Graph RAG improves relational reasoning, it introduces a largely understudied threat: adversaries can reconstruct subgraphs from a target RAG system's knowledge graph, enabling privacy inference and replication of curated knowledge assets. We show that existing attacks are largely ineffective against Graph RAG even with simple prompt-based safeguards, because these attacks expose explicit exfiltration intent and are therefore easily suppressed by lightweight safe prompts. We identify three technical challenges for practical Graph RAG extraction under realistic safeguards and introduce GRASP, a closed-box, multi-turn subgraph reconstruction attack. GRASP (i) reframes extraction as a context-processing task, (ii) enforces format-compliant, instance-grounded outputs via per-record identifiers to reduce hallucinations and preserve relational details, and (iii) diversifies goal-driven attack queries using a discovery-aware scheduler to operate within strict query budgets. Across two real-world knowledge graphs, four safety-aligned LLMs, and multiple Graph RAG frameworks, GRASP attains the strongest type-faithful reconstruction where prior methods fail, reaching up to 82.9 F1. We further evaluate defenses and propose two mitigations that effectively reduce reconstruction fidelity without utility loss.
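
Illustration (not from the paper): the abstract reports reconstruction quality as an F1 score but does not say how it is computed. One plausible reading, assumed here, is exact-match F1 over (head, relation, tail) triples. The function name and the toy triples below are hypothetical.

```python
def triple_f1(reconstructed, reference):
    """Exact-match F1 between two collections of (head, relation, tail) triples."""
    reconstructed, reference = set(reconstructed), set(reference)
    true_positives = len(reconstructed & reference)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(reconstructed)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground-truth subgraph and a partially correct reconstruction.
REFERENCE = [
    ("a", "treats", "b"),
    ("b", "causes", "c"),
    ("c", "part_of", "d"),
    ("d", "located_in", "e"),
]
RECONSTRUCTED = [
    ("a", "treats", "b"),
    ("b", "causes", "c"),
    ("a", "causes", "e"),  # spurious edge, counts against precision
]

SCORE = triple_f1(RECONSTRUCTED, REFERENCE)  # precision 2/3, recall 2/4
```

Under this reading a defender could use the same measure to check how much a mitigation lowers reconstruction fidelity.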

2602.20135 2026-06-18 cs.CL cs.AI cs.IR 80%

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

Mohammad Amanlou, Erfan Shafiee Moghaddam, Yasaman Amou Jafari, Mahdi Noori, Farhan Farsi, Behnam Bahrak

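Affiliations University of Tehran; Independent Researcher; Amirkabir University of Technology; TEIAS Institute

Topic match Knowledge-base QA: generating multiple-choice questions from knowledge graphs for RAG evaluation

AI summary KNIGHT builds domain-specific knowledge graphs to generate multiple-choice question datasets efficiently, supports adaptive difficulty control, improves generation efficiency and quality, and is validated across several domains.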

Comments Accepted at the Third Conference on Parsimony and Learning (CPAL 2026). 36 pages, 12 figures. (Equal contribution: Yasaman Amou Jafari and Mahdi Noori.)

Journal ref Conference on Parsimony and Learning, Proceedings of Machine Learning Research, 328:989-1024, 2026

Abstract

With the rise of large language models (LLMs), they have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the time and cost of building specialized assessment datasets. We introduce KNIGHT, an LLM-based, knowledge-graph-driven framework for generating multiple-choice question (MCQ) datasets from external sources. KNIGHT constructs a topic-specific knowledge graph, a structured and parsimonious summary of entities and relations, that can be reused to generate instructor-controlled difficulty levels, including multi-hop questions, without repeatedly re-feeding the full source text. This knowledge graph acts as a compressed, reusable state, making question generation a cheap read over the graph. We instantiate KNIGHT on Wikipedia/Wikidata while keeping the framework domain- and ontology-agnostic. As a case study, KNIGHT produces six MCQ datasets in History, Biology, and Mathematics. We evaluate quality on five criteria: fluency, unambiguity (single correct answer), topic relevance, option uniqueness, and answerability given the provided sources (as a proxy for hallucination). Results show that KNIGHT enables token- and cost-efficient generation from a reusable graph representation, achieves high quality across these criteria, and yields model rankings aligned with MMLU-style benchmarks, while supporting topic-specific and difficulty-controlled evaluation.
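
Illustration (not from the paper): the abstract describes question generation as a cheap read over a reusable graph, with multi-hop questions as a harder tier. A minimal sketch of that idea, assuming the graph is stored as (entity, relation) to entity lookups and that difficulty is approximated by hop count, could be written as below. The toy graph, the function names, and the distractor list are hypothetical, and the step that turns the result into a natural-language question (an LLM in the paper) is omitted.

```python
# Hypothetical toy knowledge graph: (head entity, relation) -> tail entity.
TOY_GRAPH = {
    ("Marie Curie", "born_in"): "Warsaw",
    ("Warsaw", "capital_of"): "Poland",
    ("Albert Einstein", "born_in"): "Ulm",
    ("Ulm", "located_in"): "Germany",
}


def follow_path(graph, start, relations):
    """Walk the relations in order; return the final entity or None."""
    node = start
    for relation in relations:
        node = graph.get((node, relation))
        if node is None:
            return None  # the path does not exist in the graph
    return node


def build_mcq(graph, start, relations, distractors):
    """Assemble a multiple-choice item whose answer is the end of the path."""
    answer = follow_path(graph, start, relations)
    if answer is None:
        raise ValueError("path not present in graph")
    return {
        "hops": len(relations),  # assumed proxy for difficulty
        "start": start,
        "relations": list(relations),
        "options": sorted({answer, *distractors}),
        "answer": answer,
    }


MCQ = build_mcq(TOY_GRAPH, "Marie Curie", ["born_in", "capital_of"],
                ["Germany", "France", "Italy"])
```

Because the graph is read rather than rebuilt, generating another item at a different hop count only requires another call to `build_mcq`.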

2606.18385 2026-06-18 cs.AI New submission 70%

CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework

Sneha Rao, Shaina Raza, Dhanesh Ramachandram

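Affiliations Vector Institute

Topic match Knowledge-base QA: evidence-grounded reasoning via retrieval-augmented generation

AI summary Proposes the CaVe-VLM-CoT framework, which enforces evidence-grounded reasoning through a five-stage closed-loop pipeline (Extractor, Retriever, Solver, Citation Injector, Verifier), introduces CaVeScore, a composite metric covering retrieval quality, citation faithfulness, and cross-modal grounding, and achieves performance gains on ScienceQA and MMMU.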

Abstract

Vision-Language Models (VLMs) remain prone to hallucinations, producing fluent but visually unfaithful outputs. Existing chain-of-thought and retrieval-augmented methods only partially address this, as they neither enforce step-level citation grounding nor route verification failures back to retrieval for correction. We present CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework that enforces evidence-grounded reasoning through a five-stage closed-loop pipeline: Extractor, Retriever, Solver, Citation Injector, and Verifier, in which detected ungrounded claims trigger structured feedback to the Extractor for targeted re-retrieval. Since no existing framework jointly measures retrieval quality, step-wise citation faithfulness, and cross-modal grounding, we propose a suite of 23 component-wise metrics across all stages, anchored by CaVeScore, a composite metric weighting accuracy, citation precision and recall, attribution, and evidence grounding. Without any architectural or prompt modifications, CaVe-VLM-CoT achieves 87.1% accuracy and 56.6% CaVeScore on ScienceQA, and 55.2% accuracy and 35.7% CaVeScore on MMMU (30 subjects).
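
Illustration (not from the paper): the abstract names the components that CaVeScore weights but gives neither the weights nor the aggregation rule. The sketch below assumes a normalized weighted average with equal weights, purely to show the shape of such a composite metric. `HYPOTHETICAL_WEIGHTS`, `composite_score`, and the example values are all assumptions and should not be read as the paper's definition.

```python
# Assumption: equal weights. The actual CaVeScore weights are not in the abstract.
HYPOTHETICAL_WEIGHTS = {
    "accuracy": 0.2,
    "citation_precision": 0.2,
    "citation_recall": 0.2,
    "attribution": 0.2,
    "evidence_grounding": 0.2,
}


def composite_score(components, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted average of component scores, each expected in [0, 1]."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * components[name] for name in weights)
    return weighted_sum / total_weight  # normalize in case weights do not sum to 1


# Hypothetical per-component scores for one evaluated answer.
EXAMPLE = {
    "accuracy": 1.0,
    "citation_precision": 0.5,
    "citation_recall": 0.5,
    "attribution": 0.0,
    "evidence_grounding": 1.0,
}

SCORE = composite_score(EXAMPLE)
```

A composite of this kind can sit well below raw accuracy when citations or grounding are weak, which is the pattern in the reported numbers (for example 87.1% accuracy against 56.6% CaVeScore on ScienceQA).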

2604.06967 2026-06-18 cs.CR cs.DB Updated version 70%

VulLink: A Dynamic Open-Access Vulnerability Graph Database for Cybersecurity Data Mining

Luat Do, Jiao Yin, Jinli Cao, Hua Wang

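Topic match Knowledge-base QA: a vulnerability graph database that can be viewed as a knowledge base; weakly related to RAG

AI summary Proposes VulLink, a dynamic open-access platform that integrates multi-source vulnerability data through an automated ETL pipeline and provides a graph database, a Web interface, and an API, supporting downstream mining tasks such as exploitability prediction.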

Abstract

The rapid growth of software vulnerabilities has turned cyber threat intelligence analysis into a challenging data mining problem over heterogeneous and continuously changing sources. Public repositories such as the National Vulnerability Database (NVD), Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), Exploit Database (EDB), and CVE Details provide valuable information, but their record-centric schemas make it difficult to capture cross-source relationships among vulnerabilities, weaknesses, exploits, affected products, vendors, and references. Existing graph-based vulnerability resources highlight the value of relational threat modelling, yet many remain static, offline, or difficult to access for downstream graph mining. This paper presents VulLink, a deployed, dynamic, and open-access vulnerability graph database for cybersecurity data mining. VulLink integrates multiple public repositories through an automated Extract-Transform-Load (ETL) pipeline that converts isolated, record-centric vulnerability data into a continuously updated graph database with typed entities and explicit cross-source relationships. It provides an interactive Web interface and public API for exploring, querying, and exporting mining-ready vulnerability subgraphs. It also provides pre-computed embeddings of vulnerability descriptions generated by pretrained language models, which users can query and download by model and embedding dimension as semantic features for downstream mining tasks such as exploitability prediction. To demonstrate the practical utility of VulLink, we implement a downstream exploitability prediction use case that leverages heterogeneous graph context and semantic vulnerability features. The VulLink platform, including the Web interface, public API, source code, and deployment resources, is publicly available online.

2603.29247 2026-06-18 cs.CL cs.AI cs.LG Updated version 70%

MemRerank: Preference Memory for Personalized Product Reranking

Zhiyuan Peng, Xuyang Wu, Huaixiao Tou, Yi Fang, Yu Gong

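Affiliations Santa Clara University; Independent Researcher

Topic match Knowledge-base QA: preference memory for reranking in LLM shopping agents; involves retrieval

AI summary Proposes the MemRerank framework, which uses reinforcement learning to distill user purchase history into query-independent preference memory for personalized reranking by LLM shopping agents, improving accuracy on a 1-in-5 selection task by up to 10.61 percentage points.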

Comments correct author name in metadata

Abstract

LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization, yet naively appending raw history to prompts is often ineffective due to noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills user purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmark and evaluation framework centered on an LLM-based 1-in-5 selection task, which measures both memory quality and downstream reranking utility. We further train the memory extractor with reinforcement learning (RL), using downstream reranking performance as supervision. Experiments with two LLM-based rerankers show that MemRerank consistently outperforms no-memory, raw-history, and off-the-shelf memory baselines, yielding up to +10.61 absolute points in 1-in-5 accuracy. These results suggest that explicit preference memory is a practical and effective building block for personalization in agentic e-commerce systems.
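
Illustration (not from the paper): the headline result is a gain in absolute points on a 1-in-5 selection task. Assuming each test case offers five candidates with exactly one correct item, accuracy and the absolute gain could be computed as sketched below. The function name and the toy predictions are hypothetical.

```python
NUM_CANDIDATES = 5  # "1-in-5": one correct item among five candidates


def one_in_five_accuracy(predictions, gold):
    """Percentage of cases where the chosen candidate index equals the gold index."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal length")
    if any(not 0 <= p < NUM_CANDIDATES for p in predictions):
        raise ValueError("each prediction must be a candidate index in 0..4")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical gold indices and two rerankers' choices on eight cases.
GOLD = [0, 3, 1, 4, 2, 2, 0, 1]
NO_MEMORY = [0, 1, 1, 0, 2, 3, 4, 1]
WITH_MEMORY = [0, 3, 1, 0, 2, 2, 4, 1]

BASELINE_ACC = one_in_five_accuracy(NO_MEMORY, GOLD)
MEMORY_ACC = one_in_five_accuracy(WITH_MEMORY, GOLD)
ABSOLUTE_GAIN = MEMORY_ACC - BASELINE_ACC  # in percentage points, not relative %
```

With five candidates, uniform random guessing scores 20% in expectation, which is a useful floor when reading absolute-point gains.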

2603.00026 2026-06-18 cs.CL cs.AI cs.IR Updated version 70%

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Xiaohui Zhang, Zequn Sun, Chengyuan Yang, Yaqin Jin, Yazhong Zhang, Wei Hu

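Affiliations State Key Laboratory for Novel Software Technology, Nanjing University, China; Alibaba Group, Hangzhou, China; National Institute of Healthcare Data Science, Nanjing University, China

Topic match Knowledge-base QA: memory management involves retrieval, but the focus is on reasoning

AI summary Proposes the ActMem framework, which converts unstructured dialogue history into a structured causal and semantic graph and combines counterfactual reasoning with commonsense completion to enable active causal reasoning, significantly improving LLM agents on complex memory-dependent tasks.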

Abstract

Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may fail in scenarios requiring reasoning and complex decision-making. To bridge this critical gap, we propose a novel actionable memory framework called ActMem that integrates memory retrieval with active causal reasoning. ActMem transforms unstructured dialogue history into a structured causal and semantic graph. By leveraging counterfactual reasoning and commonsense completion, it enables agents to deduce implicit constraints and resolve potential conflicts between past states and current intentions. Furthermore, we introduce a comprehensive dataset ActMemEval to evaluate agent reasoning capabilities in logic-driven scenarios, moving beyond the fact-retrieval focus of existing memory benchmarks. Experiments demonstrate that ActMem significantly outperforms baselines in handling complex, memory-dependent tasks, paving the way for more consistent and reliable intelligent assistants.

2606.18850 2026-06-18 cs.CL cs.IR New submission 60%

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

Bohou Zhang, Xiaoyu Tao, Mingyue Cheng, Huijie Liu, Qi Liu

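Affiliations State Key Laboratory of Cognitive Intelligence

Topic match Knowledge-base QA: uses knowledge-graph reasoning; not conventional RAG.

AI summary Proposes the ScholarSum framework, which builds a hierarchical knowledge graph to guide a student in generating an initial draft, then has a teacher-like reviewer iteratively check and correct it, yielding scientific summaries that are both fluent and factually consistent.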

Abstract

Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile these two requirements. Extractive methods rely on rigid sentence splicing that disrupts macro-level logical coherence, while large language model (LLM)-based generative approaches, despite mastering linguistic fluency, exhibit limited factual consistency. In this work, we propose ScholarSum, a hierarchical reflective graph-based framework that emulates a student-teacher writing process for fluent and faithful scientific summarization. ScholarSum first organizes the document into a hierarchical knowledge graph by segmenting it into semantically coherent units, whose multi-layered community structure captures global logic and macro-level themes. Guided by this global structure, the student generates an initial draft, which is subsequently refined through fine-grained evidence retrieval. To ensure factual consistency, a teacher-like reviewer then iteratively examines the draft, identifies unsupported content, and prompts targeted re-retrieval and rewriting until the summary meets rigorous quality standards. Extensive experiments demonstrate that ScholarSum significantly outperforms previous baselines in terms of both completeness and faithfulness. Our code is available at https://github.com/Xiaoyu-Tao/ScholarSum.

2601.14288 2026-06-18 astro-ph.CO cs.AI cs.CE gr-qc hep-th Updated version 60%

DeepInflation: an AI agent for research and model discovery of inflation

Ze-Yu Peng, Hao-Shi Yuan, Qi Lai, Jun-Qian Jiang, Gen Ye, Jun Zhang, Yun-Song Piao

Affiliations School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, 100190 Beijing, China; Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, 100049 Beijing, China; School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; Institute of Theoretical Physics, Chinese Academy of Sciences, P.O. Box 2735, Beijing 100190, China; Département de Physique Théorique, Université de Genève, 24 quai Ernest-Ansermet, CH-1211 Genève 4, Switzerland

Topic match Knowledge-base QA: integrates a RAG knowledge base to supply theoretical background

AI summary Proposes DeepInflation, an AI agent built on a multi-agent architecture that integrates large language models, a symbolic regression engine, and a retrieval-augmented generation knowledge base to automatically discover single-field slow-roll inflationary potentials consistent with the latest observations and to explain their theoretical background.

Abstract

We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built upon a multi-agent architecture, DeepInflation integrates Large Language Models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework enables the agent to automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in established theoretical literature. We demonstrate that DeepInflation can successfully discover simple and viable single-field slow-roll inflationary potentials consistent with the latest observations (with the ACT DR6 results taken as an example) or any given $n_s$ and $r$, and provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, which enables researchers and non-experts alike to explore the inflationary landscape using natural language. This agent is available at https://github.com/pengzy-cosmo/DeepInflation.
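
Illustration (not from the paper): the abstract speaks of matching a target $n_s$ and $r$ with a single-field slow-roll potential. As a rough sketch of how a candidate potential can be scored, the code below uses the standard first-order slow-roll relations $\epsilon = \tfrac{1}{2}(V'/V)^2$, $\eta = V''/V$, $n_s \approx 1 - 6\epsilon + 2\eta$, and $r \approx 16\epsilon$ in reduced Planck units. The quadratic potential is a textbook example chosen here for checking, not a result from the paper, and the function names and finite-difference step are assumptions; nothing here reflects how DeepInflation itself evaluates candidates.

```python
import math


def slow_roll_observables(potential, phi, step=1e-3):
    """First-order slow-roll n_s and r at field value phi (reduced Planck units)."""
    v = potential(phi)
    # Central finite differences for V'(phi) and V''(phi).
    dv = (potential(phi + step) - potential(phi - step)) / (2.0 * step)
    d2v = (potential(phi + step) - 2.0 * v + potential(phi - step)) / (step * step)
    epsilon = 0.5 * (dv / v) ** 2
    eta = d2v / v
    n_s = 1.0 - 6.0 * epsilon + 2.0 * eta
    r = 16.0 * epsilon
    return n_s, r


def quadratic_potential(phi, mass=1.0):
    """Textbook V = m^2 phi^2 / 2; the mass drops out of n_s and r."""
    return 0.5 * mass**2 * phi**2


def phi_at_efolds_quadratic(n_efolds):
    """For the quadratic potential, phi^2 = 4N + 2 when inflation ends at epsilon = 1."""
    return math.sqrt(4.0 * n_efolds + 2.0)


# Evaluate 60 e-folds before the end of inflation (a common convention).
N_S, R = slow_roll_observables(quadratic_potential, phi_at_efolds_quadratic(60))
```

A search procedure could compare such (n_s, r) pairs for many candidate potentials against an observational target and keep the closest matches.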