As power information systems have developed, users' requirements for the quality of power data have steadily increased, so it is important to ensure the accuracy, reliability, and integrity of massive power data. In this paper, an anomaly detection algorithm based on Isolation Forests is applied to large-scale electric energy data. The Isolation Forest algorithm builds random binary trees by randomly partitioning the training samples; together the trees form an isolation forest model, which is then used to detect abnormal data points. The algorithm processes massive data quickly while giving accurate and reliable results. The positive active total power (PAP) and reverse active total power (RAP) fields of large-scale electric energy data are examined for anomalies. The experimental results show that the algorithm has high detection efficiency and accuracy.
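The isolation idea summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`build_tree`, `fit_forest`, `anomaly_score`) are hypothetical, the (PAP, RAP) readings are synthetic values made up for illustration, and the 100 trees, subsample size of 256, height limit, and scoring formula are taken from the original Isolation Forest paper by Liu, Ting, and Zhou (2008), which may differ from the settings used in this study.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random field at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen field cannot be split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus c(size) for the unsplit leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees random trees, each on a random subsample of the data."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(point, trees, sample_size):
    """Score in (0, 1]; values closer to 1 mean the point is isolated sooner."""
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(sample_size))


# Synthetic (PAP, RAP) readings, assumed for illustration only.
data_rng = random.Random(42)
readings = [(data_rng.gauss(100.0, 5.0), data_rng.gauss(10.0, 1.0)) for _ in range(500)]
outlier = (400.0, 60.0)   # an implausibly large reading
typical = (100.0, 10.0)   # a reading at the centre of the cluster
readings.append(outlier)

trees, psi = fit_forest(readings)
print("outlier score:", anomaly_score(outlier, trees, psi))
print("typical score:", anomaly_score(typical, trees, psi))
```

A reading far from the bulk of the data is separated after only a few random splits, so its average path length is short and its score is closer to 1. Each tree is built on a small subsample, which is what keeps the method fast on large data sets.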
HUANG Fu-xing, ZHOU Guang-shan, DING Hong, ZHANG Luo-ping, QIAN Shu-yun, YUAN Pei-sen. Electric energy abnormal data detection based on Isolation Forests[J]. Journal of East China Normal University(Natural Science), 2019, 2019(5): 123-132.
DOI: 10.3969/j.issn.1000-5641.2019.05.010