Journal of East China Normal University (Natural Science), 2023, Vol. 2023, Issue (6): 85-94. doi: 10.3969/j.issn.1000-5641.2023.00.008

• Computer Science •

Sentence classification algorithm based on multi-kernel support vector machine

Kaiyan XIAO (肖开研), Jie LIAN (廉洁)*

  1. The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
  • Received: 2022-11-26 Online: 2023-11-25 Published: 2023-11-23
  • Contact: Jie LIAN E-mail: lianjie@shnu.edu.cn
  • Supported by:
    Natural Science Foundation of Shanghai (20ZR1440900)

Abstract:

Mainstream sentence classification algorithms rely on a single word vector model to obtain the text representation, which limits their ability to map text. To address this, text representations built from multiple word vector models are fused through multi-kernel learning to improve classification accuracy. When different kernel functions are fused, conventional kernel coefficient optimization methods suffer from long training times and difficulty in finding a local optimum. A new kernel coefficient optimization method is therefore proposed, which approaches the optimal coefficient values step by step through parameter space segmentation and breadth-first search. With a support vector machine (SVM) as the classifier, classification experiments were conducted on seven text datasets. The results show that multi-kernel learning clearly outperforms single-kernel learning, and that the proposed optimization method achieves good classification results even with fewer training runs than conventional methods.

Key words: natural language processing, sentence classification, multi-kernel learning, support vector machine (SVM), mixed kernel
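
The abstract does not give the details of the coefficient search, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes just two kernels combined as mu*K1 + (1 - mu)*K2 with a single coefficient mu in [0, 1]. The names combine_kernels and search_coefficient and the parameters depth, parts and keep are hypothetical. The score function is assumed to return a validation score, such as the accuracy of an SVM trained on the combined kernel.

```python
from collections import deque


def combine_kernels(k1, k2, mu):
    """Convex combination mu*k1 + (1-mu)*k2 of two Gram matrices (lists of lists)."""
    return [[mu * a + (1.0 - mu) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(k1, k2)]


def search_coefficient(score, depth=4, parts=4, keep=2):
    """Hypothetical search: subdivide [0, 1] level by level, breadth first.

    score(mu) is assumed to return a validation score (higher is better),
    e.g. the accuracy of an SVM trained on combine_kernels(k1, k2, mu).
    Returns (best_mu, best_score, number_of_score_evaluations).
    """
    queue = deque([(0.0, 1.0)])
    best_mu, best_score = None, float("-inf")
    evaluations = 0
    for _ in range(depth):
        candidates = []
        # Breadth-first: expand every interval on the current level.
        while queue:
            lo, hi = queue.popleft()
            width = (hi - lo) / parts
            for i in range(parts):
                sub_lo = lo + i * width
                sub_hi = sub_lo + width
                mid = (sub_lo + sub_hi) / 2.0
                s = score(mid)  # one "training run" per midpoint
                evaluations += 1
                candidates.append((s, sub_lo, sub_hi))
                if s > best_score:
                    best_mu, best_score = mid, s
        # Keep only the most promising sub-intervals for the next level.
        candidates.sort(key=lambda c: c[0], reverse=True)
        queue.extend((lo, hi) for _, lo, hi in candidates[:keep])
    return best_mu, best_score, evaluations
```

With the default settings this sketch evaluates the score 4 + 3 × 8 = 28 times, whereas a uniform grid with the same final resolution of 1/256 would need 256 evaluations. This illustrates how subdividing only the promising regions can reduce the number of training runs. With more than two kernels, the same idea would apply to a higher-dimensional coefficient space.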

CLC number: