Computer Science

Network anomaly traffic detection based on ensemble feature selection

  • Qiwen HUANG
  • Liying LI
  • Fuke SHEN
  • Tongquan WEI
  • School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Received date: 2020-06-28

Published online: 2021-11-26

Abstract

As Internet technology continues to develop, network security is receiving increasing attention. Detecting anomalous network traffic provides an effective basis for blocking network attacks. Accurate detection, however, usually requires analyzing large volumes of data, which consumes substantial computational resources, weakens real-time detection capability, and may even reduce overall detection accuracy. To address these problems, we propose a network anomaly traffic detection method based on ensemble feature selection. Specifically, we combine five different feature selection algorithms through a voting mechanism to select feature subsets. Three machine learning algorithms (Naive Bayes, Decision Tree, and XGBoost) are used to evaluate the feature selection algorithms, and the best-performing algorithm is selected to detect anomalous network traffic. Experimental results show that, on the optimal feature subset selected by the proposed approach, the runtime is 84.38% lower than on the original data set, and the average accuracy is 16.93% higher than that of a single feature selection algorithm.
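The voting idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the five feature selection algorithms or state the voting rule, so everything below is an assumption made for illustration: the function `vote_features`, the `min_votes` threshold (a simple majority of three out of five), and the feature names are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote_features(selections: Iterable[Set[str]], min_votes: int) -> List[str]:
    """Return the features chosen by at least `min_votes` selectors.

    Assumed rule (not from the paper): each selector casts one vote per
    feature it selects, and a feature is kept if it reaches the threshold.
    """
    votes = Counter()
    for chosen in selections:
        votes.update(set(chosen))  # one vote per selector per feature
    kept = [name for name, count in votes.items() if count >= min_votes]
    # Most-voted first; ties broken by name so the order is deterministic.
    return sorted(kept, key=lambda name: (-votes[name], name))


# Hypothetical outputs of five feature selection algorithms.
selections = [
    {"flow_duration", "pkt_len_mean", "fwd_pkts"},
    {"flow_duration", "pkt_len_mean", "syn_count"},
    {"flow_duration", "bwd_pkts", "pkt_len_mean"},
    {"flow_duration", "syn_count", "fwd_pkts"},
    {"pkt_len_mean", "flow_duration", "idle_max"},
]

# Keep features selected by a majority (3 of 5) of the selectors.
subset = vote_features(selections, min_votes=3)
print(subset)
```

In a pipeline like the one the abstract describes, the resulting subset would then be used to train and compare the classifiers; raising or lowering `min_votes` trades a smaller feature set against the risk of discarding useful features.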

Cite this article

Qiwen HUANG, Liying LI, Fuke SHEN, Tongquan WEI. Network anomaly traffic detection based on ensemble feature selection[J]. Journal of East China Normal University (Natural Science), 2021, 2021(6): 100-111. DOI: 10.3969/j.issn.1000-5641.2021.06.011
