Arrays of (locality-sensitive) Count Estimators (ACE): High-Speed Anomaly Detection via Cache Lookups

06/20/2017
by   Chen Luo, et al.
0

Anomaly detection is one of the frequent and important subroutines deployed in large-scale data processing systems. Even being a well-studied topic, existing techniques for unsupervised anomaly detection require storing significant amounts of data, which is prohibitive from memory and latency perspective. In the big-data world existing methods fail to address the new set of memory and latency constraints. In this paper, we propose ACE (Arrays of (locality-sensitive) Count Estimators) algorithm that can be 60x faster than the ELKI package DBLP:conf/ssd/AchtertBKSZ09, which has the fastest implementation of the unsupervised anomaly detection algorithms. ACE algorithm requires less than 4MB memory, to dynamically compress the full data information into a set of count arrays. These tiny 4MB arrays of counts are sufficient for unsupervised anomaly detection. At the core of the ACE algorithm, there is a novel statistical estimator which is derived from the sampling view of Locality Sensitive Hashing(LSH). This view is significantly different and efficient than the widely popular view of LSH for near-neighbor search. We show the superiority of ACE algorithm over 11 popular baselines on 3 benchmark datasets, including the KDD-Cup99 data which is the largest available benchmark comprising of more than half a million entries with ground truth anomaly labels.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset