Isolation-Based Anomaly Detection

FEI TONY LIU and KAI MING TING, Monash University
ZHI-HUA ZHOU, Nanjing University

Anomalies are data points that are few and different. We show that, as a result of these properties, anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation, without employing any distance or density measure, and is therefore fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity and a small memory requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF, and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data mining; I.2.6 [Artificial Intelligence]: Learning

General Terms: Algorithms, Design, Experimentation

Additional Key Words and Phrases: Anomaly detection, outlier detection, ensemble methods, binary tree, random tree ensemble, isolation

ACM Transactions on Knowledge Discovery from Data (TKDD), Association for Computing Machinery
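
The isolation idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the commonly described formulation: each tree is grown on a random subsample by choosing a random attribute and a random split value, and a point's anomaly score is derived from its average path length, normalized by the expected path length c(n) of an unsuccessful binary search tree lookup. The function names (c_factor, build_tree, path_length, fit_forest, anomaly_score) and the defaults of 100 trees and a subsample of 256 points are assumptions made for this illustration and are not stated in the abstract.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c_factor(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification: stop if the chosen attribute is constant here.
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Number of edges from the root to x's leaf, plus a leaf-size adjustment."""
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Grow n_trees isolation trees, each on a subsample drawn without replacement."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two points to build a tree")
    max_depth = math.ceil(math.log2(psi))  # height limit tied to subsample size
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))
```

In this sketch, data is a list of equal-length numeric tuples. Calling fit_forest(data) returns the trees and the subsample size, and anomaly_score(point, trees, psi) scores a single point. No distance or density computation appears anywhere, and each tree only ever sees a fixed-size subsample, which is where the low memory footprint comes from.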

