Computer Science

Network anomaly traffic detection based on ensemble feature selection

  • Qiwen HUANG,
  • Liying LI,
  • Fuke SHEN,
  • Tongquan WEI

  • School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Received: 2020-06-28

Published online: 2021-11-26

Cite this article

HUANG Qiwen, LI Liying, SHEN Fuke, WEI Tongquan. Network anomaly traffic detection based on ensemble feature selection[J]. Journal of East China Normal University (Natural Science), 2021, 2021(6): 100-111. DOI: 10.3969/j.issn.1000-5641.2021.06.011

Abstract

With the continuous development of Internet technology, network security is receiving increasing attention. Detecting anomalous network traffic provides an effective safeguard for blocking network attacks. However, accurate detection usually requires analyzing massive volumes of data. Analyzing this data not only consumes substantial computational resources and degrades real-time performance, but may also reduce detection accuracy. To address these problems, we propose a network anomaly traffic detection method based on ensemble feature selection. Specifically, we apply five different feature selection algorithms and design a voting mechanism that combines their results to select a feature subset. Three different machine learning algorithms, namely naive Bayes, decision tree, and XGBoost (eXtreme Gradient Boosting), are used to evaluate the feature selection algorithms, and the best-performing algorithm is then used to detect anomalous network traffic. Experimental results show that, on the optimal feature subset selected by the proposed approach, the runtime is 84.38% lower than on the original data set, and the average accuracy is 16.93% higher than that of a single feature selection algorithm.
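The voting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact voting rule or name the five selectors, so everything below is an assumption made for illustration: each selector votes for its own top-k features, and a feature is kept when it collects at least `min_votes` votes. The function name `vote_select`, the selector names, and the feature names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def vote_select(
    rankings: Dict[str, Sequence[str]], top_k: int, min_votes: int
) -> List[str]:
    """Keep features that at least `min_votes` selectors rank in their top `top_k`.

    `rankings` maps a selector name to its features ordered best-first.
    This threshold rule is an assumed voting scheme, not the paper's exact one.
    """
    votes: Counter = Counter()
    for ranked in rankings.values():
        # Each selector casts one vote per distinct feature in its top-k list.
        votes.update(set(ranked[:top_k]))
    kept = [feature for feature, count in votes.items() if count >= min_votes]
    # Most-voted first; ties broken alphabetically so the output is deterministic.
    return sorted(kept, key=lambda feature: (-votes[feature], feature))


# Hypothetical rankings from five selectors over five made-up flow features.
rankings = {
    "selector_a": ["flow_duration", "pkt_len_mean", "syn_count", "dst_port"],
    "selector_b": ["pkt_len_mean", "flow_duration", "dst_port", "syn_count"],
    "selector_c": ["syn_count", "pkt_len_mean", "idle_max", "flow_duration"],
    "selector_d": ["pkt_len_mean", "idle_max", "flow_duration", "dst_port"],
    "selector_e": ["dst_port", "flow_duration", "pkt_len_mean", "idle_max"],
}

selected = vote_select(rankings, top_k=2, min_votes=3)
print(selected)  # ['pkt_len_mean', 'flow_duration']
```

In a full pipeline, the selected subset would then be passed to the classifiers named in the abstract (naive Bayes, decision tree, XGBoost) to compare accuracy and runtime against training on all features.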
