Journal of Hebei University (Natural Science Edition), 2021, Vol. 41, Issue 2: 201-211. DOI: 10.3969/j.issn.1000-1565.2021.02.014


Text sentiment classification method integrating local and global key features

柴变芳1,杨蕾1,王建岭2,李仁玲2   

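  • Received: 2020-03-12 Online: 2021-03-25 Published: 2021-04-07
  • Corresponding author: LI Renling (born 1973)
  • About the author: CHAI Bianfang (born 1979), female, from Yuncheng, Shanxi; professor at Hebei GEO University, Ph.D.; mainly engaged in research on machine learning and complex network analysis.
    E-mail: chaibianfang@163.com
  • Funding:
    National Natural Science Foundation of China (81473773); Natural Science Foundation of Hebei Province (F2019403070); Key Project of the Hebei Provincial Department of Education (ZD2020175)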

Text sentiment classification approach with integrated local and global prominent features

CHAI Bianfang1, YANG Lei1, WANG Jianling2, LI Renling2

  • 1. College of Information Engineering, Hebei GEO University, Shijiazhuang 050031, China; 2. Library, Hebei University of Chinese Medicine, Shijiazhuang 050200, China
  • Received: 2020-03-12 Online: 2021-03-25 Published: 2021-04-07

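Abstract: CNN_BiLSTM, a sentiment analysis model that fuses a convolutional neural network (CNN) with a bi-directional long short-term memory network (BiLSTM), is a popular model. It learns local and global features of a text to perform sentiment classification, but it ignores how important each feature is to the classification result and does not make full use of the features between words, so its classification accuracy is limited. This paper proposes a text sentiment classification method, MCNN_Att-BiLSTM, that integrates features from a CNN with multiple convolution kernels and from an attention-based BiLSTM. The important local and global features are combined into the semantic features of the text, which are then used to train the text sentiment classifier XGBoost (eXtreme Gradient Boosting). The BiLSTM with an attention mechanism extracts the global key features that most affect classification, and the multi-kernel CNN obtains more comprehensive inter-word features, which together give the ensemble classifier effective features for classification. Experimental results show that the model achieves better sentiment classification accuracy: compared with the CNN_BiLSTM model, accuracy improves by 1.75% on the IMDB dataset, by 1.67% on the txt-sentoken dataset, and by 3.81% on the Tan Songbo hotel review dataset (谭松波-酒店评论).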
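The feature-fusion idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`conv_max_pool`, `attention_pool`, `fuse_features`), the kernel widths (2, 3, 4), the dot-product form of the attention, and all random weights are hypothetical. The `hidden_states` list merely stands in for BiLSTM outputs, and the fused vector is what would be passed to a classifier such as XGBoost, which is not called here.

```python
import math
import random


def conv_max_pool(embeddings, kernel, bias=0.0):
    """Slide one kernel (width x dim weights) over the word sequence,
    apply ReLU, and keep the maximum response (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # a ReLU output is never below zero
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        score = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        best = max(best, score)
    return best


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h . query) and return the
    weighted sum of the hidden states together with the weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def fuse_features(embeddings, hidden_states, kernels, query):
    """Concatenate local (multi-kernel CNN) and global (attention) features."""
    local = [conv_max_pool(embeddings, k) for k in kernels]
    global_feats, _ = attention_pool(hidden_states, query)
    return local + global_feats


# Toy data: random stand-ins for word embeddings and BiLSTM hidden states.
rng = random.Random(0)
seq_len, emb_dim, hid_dim = 6, 4, 5
embeddings = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(seq_len)]
hidden_states = [[rng.uniform(-1, 1) for _ in range(hid_dim)] for _ in range(seq_len)]
# One hypothetical kernel per width; a real model would learn many per width.
kernels = [
    [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(width)]
    for width in (2, 3, 4)
]
query = [rng.uniform(-1, 1) for _ in range(hid_dim)]

features = fuse_features(embeddings, hidden_states, kernels, query)
print(len(features))  # 3 local features + 5 global features = 8
```

In this sketch each kernel width contributes one local feature that summarizes word n-grams of that length, while the attention weights decide how much each time step contributes to the global feature. Concatenating the two gives one fixed-length vector per text, which is the kind of input a gradient-boosted tree classifier expects.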

Keywords: sentiment analysis, CNN, BiLSTM, XGBoost, feature fusion

Abstract: The CNN_BiLSTM model, which combines a convolutional neural network (CNN) with a bi-directional long short-term memory network (BiLSTM), is popular for sentiment analysis. It uses the local and global features of a text to classify its sentiment. However, it ignores how important each feature is to the classification result and does not make full use of the features between words, which results in low classification efficiency and accuracy. Thus, a text sentiment classification model, MCNN_Att-BiLSTM, is proposed that integrates features from a CNN with multiple convolution kernels and from an attention-based bidirectional long short-term memory network. It integrates local and global prominent features as the semantic features of the text, which are used as inputs to train an XGBoost (eXtreme Gradient Boosting) classifier. The attention-based BiLSTM captures the global prominent features that strongly affect the classification results, and the CNN with multiple convolution kernels obtains more comprehensive inter-word features. Experimental results show that the model outperforms the CNN_BiLSTM model, improving accuracy by 1.75% on the IMDB dataset, by 1.67% on the txt-sentoken dataset, and by 3.81% on the Tan Songbo hotel review dataset.

Key words: sentiment analysis, CNN, BiLSTM, XGBoost, feature fusion

CLC number: TP391; Document code: A; Article ID: 1000-1565(2021)02-0201-11

  • Corresponding author: LI Renling (born 1973), female, from Julu, Hebei; associate research librarian at Hebei University of Chinese Medicine; mainly engaged in research on data retrieval and big data analysis. E-mail: w3365@126.com
  • WANG Jianling (born 1973), male, from Julu, Hebei; professor at Hebei University of Chinese Medicine; mainly engaged in research on network security and big data mining. E-mail: wang_jl@126.com