A Study on Knowledge Extraction and Knowledge Graph Construction of the Ming History Based on Large Language Models

Abstract: Large language models offer a new paradigm for extracting and organizing knowledge from ancient texts. Taking the Ming History (Mingshi) as its subject, this study evaluates five large language models, Qwen-long (Tongyi Qianwen), ERNIE-4.0-8K (ERNIE Bot), gpt-4o (ChatGPT), Claude3.5-sonnet (Claude), and Xunzi-Qwen1.5-7B_chat (Xunzi), on recognizing five entity types (person, time, location, official position, and event) and twelve relation types. For entity extraction, the models are also compared with classical named entity recognition methods: HMM, CRF, Bi-LSTM, Bi-LSTM-CRF, BERT, and ERNIE. In entity extraction, Qwen-long, ERNIE-4.0-8K, and gpt-4o show strong generalization, performing especially well on the time, location, and person categories, whereas recognition rates for official positions and events are generally lower, which reveals a limitation of the models. In relation extraction, Qwen-long performs best. Building on the Qwen-long extraction results, the study then constructs a knowledge graph of the Ming History. The work confirms the value of large language models for knowledge extraction from historical texts and offers new perspectives and technical support for traditional humanities research. Future work could develop specialized humanities models on top of general-purpose large models to support such research.
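The abstract compares models category by category but does not state how recognition is scored. A minimal sketch of one common choice, exact-match precision, recall, and F1 computed separately for each entity type, is below. The English type labels, the (type, text) pair format, the function name `per_type_scores`, and the set-based matching are all assumptions for illustration, not the authors' evaluation code.

```python
from collections import defaultdict

# Hypothetical English labels for the five entity types named in the abstract.
ENTITY_TYPES = ("person", "time", "location", "official_position", "event")


def per_type_scores(gold, pred):
    """Exact-match precision, recall and F1 for each entity type (a sketch).

    gold, pred: iterables of (entity_type, surface_text) pairs for one text.
    Both sides are treated as sets, so a repeated mention counts once.
    A type with no gold or no predicted entities scores 0.0 by convention here.
    Returns {entity_type: (precision, recall, f1)}.
    """
    gold_by_type = defaultdict(set)
    pred_by_type = defaultdict(set)
    for etype, text in gold:
        gold_by_type[etype].add(text)
    for etype, text in pred:
        pred_by_type[etype].add(text)

    scores = {}
    for etype in ENTITY_TYPES:
        g, p = gold_by_type[etype], pred_by_type[etype]
        tp = len(g & p)  # only identical strings count as a hit
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[etype] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Toy annotations for illustration only; not data from the study.
    gold = [("person", "朱元璋"), ("time", "洪武元年"), ("location", "应天"),
            ("official_position", "左丞相"), ("event", "即皇帝位")]
    # A hypothetical prediction that gets the position and event boundaries wrong.
    pred = [("person", "朱元璋"), ("time", "洪武元年"), ("location", "应天"),
            ("official_position", "丞相"), ("event", "即位")]
    for etype, (p, r, f1) in per_type_scores(gold, pred).items():
        print(f"{etype}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

On this toy input, person, time, and location score 1.00 while official_position and event score 0.00, because strict matching gives no credit for 丞相 against 左丞相. This only illustrates how exact boundary matching penalizes long, compositional spans such as official titles and event phrases; the abstract does not say which matching scheme was used or whether boundary errors explain the lower scores on those categories. A lenient (overlap-based) match would score such partial hits differently.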

Version history

[V1] 2025-01-08 17:40:29 PSSXiv:202501.01048V1