Online Anomaly Detection for Practical Scenarios

Many outlier detection methods have been proposed for practical scenarios. Existing approaches can be divided into three categories: distribution-based (statistical), distance-based, and density-based methods. Statistical approaches assume that the data follow some standard or predetermined distribution, and they aim to find the outliers that deviate from that distribution. Most distribution models are assumed to be univariate, so their lack of robustness for multidimensional data is a concern. Because these methods are typically applied directly in the original data space, their models can also suffer from the noise present in the data. Moreover, the distribution assumption, or prior knowledge of the data distribution, is not easy to determine for practical problems.

For distance-based methods, the distances between each data point of interest and its neighbors are calculated. If the result exceeds some predetermined threshold, the target instance is considered an outlier. While no prior knowledge of the data distribution is needed, these approaches can encounter problems when the data distribution is complex (e.g., a multi-clustered structure). In such cases, improper neighbors may be selected, and outliers cannot be correctly identified.

In online anomaly detection applications such as spam mail filtering, one typically designs an initial classifier using normal training data, and this classifier is then updated with newly received normal or outlier data. In practical scenarios, even the normal training data collected in advance can be contaminated by noise or incorrect labeling. To construct a simple yet effective model for online detection, one should remove these potentially deviated instances from the normal training set (collecting outlier training data is not practical anyway).
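
The distance-based criterion described in this section can be illustrated with a minimal sketch. It assumes Euclidean distance, scores each point by its mean distance to its k nearest neighbors, and flags the point when that score exceeds a fixed threshold. The function name `knn_distance_outliers`, the parameters `k` and `threshold`, and the sample points are hypothetical choices made for illustration only; they are not taken from any specific method in the literature.

```python
import math

def knn_distance_outliers(points, k=2, threshold=1.5):
    """Flag points whose mean distance to their k nearest neighbors exceeds threshold.

    Illustrative assumption: Euclidean distance and a fixed, hand-picked threshold.
    """
    flags = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Outlier score: average distance to the k closest neighbors.
        score = sum(dists[:k]) / k
        flags.append(score > threshold)
    return flags

# Four tightly grouped points and one point far from the group.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flags = knn_distance_outliers(points, k=2, threshold=1.5)
```

With these sample points, only the last one is flagged. A single global threshold of this kind is also what makes the approach fragile for multi-clustered data, where clusters of different spread would call for different thresholds.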
The flowchart of our online detection procedure consists of two phases: data cleaning and online detection.
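
A minimal sketch of how such a two-phase procedure might be organized is given below. It is not the detection model used in this work: as a stand-in, it scores each instance by its Euclidean distance to the centroid of the normal data. The names `clean_training_data` and `OnlineDetector`, and the parameters `drop_fraction` and `slack`, are hypothetical. For simplicity the sketch updates the model only with instances judged normal.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

def clean_training_data(rows, drop_fraction=0.1):
    """Phase 1 (data cleaning): drop the rows farthest from the centroid.

    Assumption: a fixed fraction of the training set is treated as deviated.
    """
    c = centroid(rows)
    ranked = sorted(rows, key=lambda r: math.dist(r, c))
    keep = len(rows) - int(len(rows) * drop_fraction)
    return ranked[:keep]

class OnlineDetector:
    """Phase 2 (online detection): score new instances and update the model."""

    def __init__(self, clean_rows, slack=3.0):
        self.center = centroid(clean_rows)
        self.count = len(clean_rows)
        # Assumed threshold: slack times the largest distance in the cleaned set.
        self.threshold = slack * max(math.dist(r, self.center) for r in clean_rows)

    def observe(self, row):
        """Return True if row is flagged as an outlier; otherwise update the center."""
        is_outlier = math.dist(row, self.center) > self.threshold
        if not is_outlier:
            # Incremental mean update using the newly accepted normal instance.
            self.count += 1
            self.center = tuple(
                c + (x - c) / self.count for c, x in zip(self.center, row)
            )
        return is_outlier

# Training set of nominally normal data containing one contaminated instance.
train = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1),
         (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (0.1, 0.2), (9.0, 9.0)]
cleaned = clean_training_data(train, drop_fraction=0.1)
detector = OnlineDetector(cleaned)
first = detector.observe((0.15, 0.1))   # close to the normal data
second = detector.observe((4.0, 4.0))   # far from the normal data
```

In this example the contaminated instance `(9.0, 9.0)` is removed during cleaning, the first incoming instance is accepted and used to update the center, and the second is flagged as an outlier.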