Large Language Models / LLM - arXivDaily Topic

2606.19710 2026-06-19 cs.CL cs.AI New submission 75%

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

Elijah Feldman, Dipak Meher, Carlotta Domeniconi

Affiliation: Thomas Jefferson High School for Science and Technology

Topic match: instruction fine-tuning. The paper fine-tunes an LLM to improve domain-specific information extraction.

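AI summary: Proposes FineREX, a pipeline built on a fine-tuned LLM that extracts entities and relationships from legal documents to construct knowledge graphs. It reports absolute F1 improvements of 15.50% for entities and 31.46% for relationships, and cuts end-to-end processing time by 50%.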

Comments: Code available at https://github.com/ElijahFeldman7/FineREX

Abstract

Court proceedings contain valuable evidence about human smuggling networks, but this information is often buried within unstructured, jargon-heavy legal documents. While large language models (LLMs) can support knowledge graph construction through automated information extraction, existing approaches rely on general-purpose models that are not tailored to the entity and relationship definitions required in this domain. We introduce FineREX, a streamlined knowledge graph construction pipeline built around a fine-tuned LLM for named entity recognition and relationship extraction (NER-RE). Using a manually annotated dataset of 512 text chunks, FineREX achieves absolute improvements of 15.50% and 31.46% in entity and relationship F1-score, respectively, compared to a larger general-purpose baseline. These gains translate into higher-quality knowledge graphs, reducing legal noise by nearly half and lowering node duplication on long documents from 17.78% to 11.17%. By eliminating document rewriting and redundant extraction stages, FineREX also reduces end-to-end processing time by 50.0%. Our results demonstrate that domain-specific fine-tuning can substantially outperform larger general-purpose models while improving both the quality and efficiency of knowledge graph construction for illicit network analysis.
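
The abstract reports entity and relationship F1-scores and a node duplication rate, but this listing does not say how they are computed. As a rough illustration only, the sketch below assumes exact-match scoring over sets of extracted items and a duplication rate based on case- and whitespace-normalized node names. The function names, the normalization rule, and the toy triples are all hypothetical and are not taken from the FineREX code.

```python
from typing import Hashable, Iterable


def prf1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between predicted and gold items."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def duplication_rate(node_names: list[str]) -> float:
    """Fraction of nodes that repeat another node after simple normalization.

    Assumption: two nodes count as duplicates when their names match after
    lowercasing and collapsing whitespace. The paper may define this differently.
    """
    if not node_names:
        return 0.0
    normalized = [" ".join(name.lower().split()) for name in node_names]
    return 1 - len(set(normalized)) / len(normalized)


# Hypothetical toy annotations: (head entity, relation, tail entity) triples.
gold_relations = {
    ("Person A", "PAID", "Driver B"),
    ("Driver B", "TRANSPORTED", "Group C"),
}
pred_relations = {
    ("Person A", "PAID", "Driver B"),
    ("Person A", "MET", "Driver B"),
}

precision, recall, f1 = prf1(pred_relations, gold_relations)
dup = duplication_rate(["Person A", "person  a", "Driver B", "Group C"])
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f} duplication={dup:.2%}")
```

Under this reading, an "absolute" improvement is a difference in percentage points. For example, the reported drop in node duplication from 17.78% to 11.17% is 6.61 points, or roughly a 37% relative reduction.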
