Journal of Hebei University (Natural Science Edition) ›› 2024, Vol. 44 ›› Issue (2): 199-207. DOI: 10.3969/j.issn.1000-1565.2024.02.011


A feature-enhanced named entity recognition method for traditional Chinese medicine herbal literature

MA Yuekun1,2, WU Guozhong1

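  • Received: 2023-01-10 Online: 2024-03-25 Published: 2024-04-10
  • About the author: MA Yuekun (b. 1976), female, professor at North China University of Science and Technology, Ph.D.; her main research area is natural language processing. E-mail: mayuekun@163.com
  • Funding:
    333 Talent Project of Hebei Province (A201803082)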

Research on named entity recognition of traditional Chinese medicine based on feature enhancement

MA Yuekun1,2, WU Guozhong1   

  1. Hebei Key Laboratory of Industrial Intelligent Perception, Institute for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China; 2. Beijing Key Laboratory of Knowledge Engineering in the Field of Materials, Institute for Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
  • Received:2023-01-10 Online:2024-03-25 Published:2024-04-10

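Abstract: Traditional Chinese medicine (TCM) herbal literature is rich in TCM knowledge and is an important carrier of TCM theoretical research. To mine this knowledge more effectively and to perform named entity recognition (NER) on TCM herbal literature accurately, a feature-enhanced Bert-BiGRU-CRF model for TCM herbal NER is proposed. A feature fusion component concatenates the word vectors generated by Bert with entity features to form the input, a bi-directional gated recurrent unit (BiGRU) serves as the feature extractor, and a conditional random field (CRF) predicts the labels. Feature enhancement helps the model identify entities such as herb name, property, taste, and meridian tropism, together with their boundaries. Experiments on a TCM herbal dataset show that the model with fused features reaches an F1 score of 90.54%, indicating that the proposed method improves the accuracy of named entity recognition for TCM herbs.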

Keywords: named entity recognition, Chinese herbal medicine, feature enhancement, dictionary information

Abstract: Traditional Chinese medicine (TCM) herbal literature contains rich knowledge of TCM and is an important carrier of theoretical research in TCM. In order to better explore the knowledge in TCM herbal literature and accurately perform named entity recognition on it, a Bert-BiGRU-CRF named entity recognition model for TCM herbal literature based on feature enhancement is proposed. The model uses a feature fusion module to concatenate the word vectors generated by Bert with entity features as input, with a bi-directional gated recurrent unit (BiGRU) as the feature extractor and a conditional random field (CRF) for tag prediction. Feature enhancement is used to better identify entities such as the name, property, taste, and meridian tropism of TCM herbs, along with their boundary information, and so complete the named entity recognition task for TCM herbs. The experimental results on a dataset of TCM herbs show that the F1 value of the model incorporating features reaches 90.54%, showing that the proposed method can improve the accuracy of named entity recognition in TCM herbs.

Revised: 2023-04-26. CLC number: TP391.1; R271.14. Document code: A. Article ID: 1000-1565(2024)02-0199-09.
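
The feature-fusion step named in the abstract (concatenating Bert word vectors with entity features) can be illustrated with a minimal, standard-library-only Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the begin/inside dictionary-match encoding, the `ENTITY_TYPES` labels, the helper names `dictionary_features` and `fuse`, the toy lexicon, and the zero vectors standing in for Bert output. The BiGRU and CRF layers are not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical label set, loosely mirroring the entity types listed in the abstract
# (herb name, property, taste, meridian tropism).
ENTITY_TYPES = ["NAME", "PROPERTY", "TASTE", "MERIDIAN"]


def dictionary_features(text: str, lexicon: Dict[str, str]) -> List[List[float]]:
    """Build one feature vector per character from dictionary matches.

    Each entity type gets two slots: [is_begin, is_inside]. This encoding is an
    assumption for illustration, not the paper's documented feature design.
    """
    width = 2 * len(ENTITY_TYPES)
    feats = [[0.0] * width for _ in text]
    for term, entity_type in lexicon.items():
        if not term:
            continue  # skip empty dictionary entries
        type_idx = ENTITY_TYPES.index(entity_type)
        start = text.find(term)
        while start != -1:
            feats[start][2 * type_idx] = 1.0  # first character of a match
            for pos in range(start + 1, start + len(term)):
                feats[pos][2 * type_idx + 1] = 1.0  # remaining characters
            start = text.find(term, start + 1)
    return feats


def fuse(embeddings: Sequence[Sequence[float]],
         feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each character's embedding with its entity-feature vector."""
    if len(embeddings) != len(feats):
        raise ValueError("embeddings and features must cover the same characters")
    return [list(e) + list(f) for e, f in zip(embeddings, feats)]


text = "人参味甘"  # toy example sentence
lexicon = {"人参": "NAME", "甘": "TASTE"}  # toy dictionary
embeddings = [[0.0] * 4 for _ in text]  # stand-in for Bert word vectors
fused = fuse(embeddings, dictionary_features(text, lexicon))
print(len(fused), len(fused[0]))  # 4 12
```

In a full model along the lines the abstract describes, the fused vectors would be passed to the BiGRU feature extractor, and the CRF layer would decode the label sequence.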

Key words: named entity recognition, Chinese herbal medicine, feature enhancement, dictionary information

CLC number: