Journal of East China Normal University (Natural Science) ›› 2021, Vol. 2021 ›› Issue (6): 100-111. doi: 10.3969/j.issn.1000-5641.2021.06.011

• Computer Science •

Network anomaly traffic detection based on ensemble feature selection

Qiwen HUANG, Liying LI, Fuke SHEN, Tongquan WEI*

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2020-06-28 Online: 2021-11-25 Published: 2021-11-26
  • Contact: Tongquan WEI E-mail: tqwei@cs.ecnu.edu.cn

Abstract:

As Internet technology continues to develop, network security is attracting increasing attention. Detecting anomalous network traffic provides an effective safeguard for blocking network attacks. However, accurate detection usually requires analyzing massive volumes of data, which not only consumes substantial computational resources and degrades real-time performance but may also reduce detection accuracy. To address these problems, we propose a network anomaly traffic detection method based on ensemble feature selection. Specifically, we apply five different feature selection algorithms and design a voting mechanism over their results to select a feature subset. Three machine learning algorithms, naive Bayes, decision tree, and XGBoost (eXtreme Gradient Boosting), are used to evaluate the feature selection algorithms, and the best-performing algorithm is then used to detect anomalous network traffic. Experimental results show that, on the optimal feature subset selected by the proposed algorithm, the runtime is 84.38% lower than on the original dataset, and the average accuracy is 16.93% higher than that obtained with a single feature selection algorithm.

Key words: anomaly traffic detection, ensemble feature selection, voting mechanism
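
The voting idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not name the five feature selection algorithms, the voting rule, or the dataset, so everything here is an assumption made for illustration: the function name `vote_features`, the `min_votes` threshold, and the feature names are hypothetical, and a simple "keep features chosen by at least k selectors" rule stands in for whatever rule the paper actually uses.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Return features chosen by at least `min_votes` selectors.

    `selections` is a list of feature-name lists, one per selector.
    This threshold rule is an assumed stand-in, not the paper's rule.
    """
    votes = Counter()
    for chosen in selections:
        # Each selector casts at most one vote per feature.
        votes.update(set(chosen))
    # Order by descending vote count, then by name for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_votes]


# Hypothetical outputs of five feature selectors (names are made up).
selections = [
    ["duration", "src_bytes", "dst_bytes", "flag"],
    ["src_bytes", "dst_bytes", "count"],
    ["duration", "src_bytes", "service"],
    ["src_bytes", "dst_bytes", "flag", "count"],
    ["dst_bytes", "src_bytes", "serror_rate"],
]

selected = vote_features(selections, min_votes=3)
print(selected)  # ['src_bytes', 'dst_bytes']
```

With a majority threshold of 3 out of 5, only the features that most selectors agree on survive. A downstream classifier would then be trained on this reduced subset, which is where the reported runtime savings would come from.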

CLC number: