12.03.2016 Views

Anomaly Detection for Monitoring


figure out how unlikely it was for us to see a particular value. We decided to keep a histogram of all the values we’d seen and compute the percentiles of each value as we saw it. If a value fell above the 99.9th percentile, we reasoned, then we could consider it to be a one-in-a-thousand occurrence.

Not so! There were several reasons, primarily that we were computing our percentiles from the sample and trying to infer the probability of that value existing in the population. You can see the fallacy instantly, as we did, if you just postulate the observation of a value much higher than any we’d previously seen. How unlikely is it that we saw that value? Aside from the brain-hurting existential questions, there’s the obvious implication that we’d need to know the distribution of the population in order to answer that.
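To make the fallacy concrete, here is a minimal sketch of a percentile computed from the sample alone. The function name `empirical_percentile` and the values in `seen` are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

def empirical_percentile(sample, value):
    """Percentage of previously seen values that are <= value."""
    ordered = sorted(sample)
    return 100.0 * bisect_right(ordered, value) / len(ordered)

# Hypothetical metric values observed so far.
seen = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# Both candidates exceed everything in the sample, so the sample ranks
# them identically. It cannot say how much rarer the second one is.
slightly_high = empirical_percentile(seen, 17)
wildly_high = empirical_percentile(seen, 1700)
print(slightly_high, wildly_high)
```

Any value above the sample maximum gets the same rank, which is why the sample percentile by itself says nothing about how improbable the value is in the population.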

In general, these non-parametric methods, which work by comparing the distribution (usually via histograms) across sets of values, can’t be used online as each value arrives. That’s because it’s difficult to compare a single value (the current observation) to a distribution of a set of values.
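As a rough illustration of why set-to-set comparison does not carry over to single values, the sketch below compares two histograms using total variation distance. The distance measure, the bin width, the function names, and the data are all assumptions made for this example, not a method taken from the text.

```python
from collections import Counter

def histogram(values, bin_width):
    """Relative frequency of values in each fixed-width bin."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {b: c / total for b, c in counts.items()}

def histogram_distance(a, b, bin_width=5):
    """Total variation distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    ha, hb = histogram(a, bin_width), histogram(b, bin_width)
    bins = set(ha) | set(hb)
    return 0.5 * sum(abs(ha.get(k, 0.0) - hb.get(k, 0.0)) for k in bins)

# Hypothetical windows of metric values.
reference = [10, 11, 12, 13, 14, 11, 12, 13]
shifted = [30, 31, 32, 33, 34, 31, 32, 33]

# Comparing two windows is meaningful: the shift shows up clearly.
print(histogram_distance(reference, shifted))

# A single observation forms a one-bin histogram, so the comparison
# only reports whether it landed in an occupied bin.
print(histogram_distance(reference, [12]))
```

With only one point on one side, the histogram degenerates to a single bin, so the distance carries very little information about whether that point is unusual.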

Grubbs’ Test and ESD<br />

The Grubbs’ Test is used to test whether or not a set of data contains an outlier. This set is assumed to follow an approximately Gaussian distribution. Here’s the general procedure for the test, assuming you have an appropriate data set D.

1. Calculate the sample mean. Let’s call this μ.
2. Calculate the sample standard deviation. Let’s call this s.
3. For each element i in D:
4. Calculate abs( i - μ ) / s. This is the number of standard deviations that i is away from the sample mean.
5. Now you have the distance from the mean for each element in D. Take the maximum.
6. Now you have the maximum distance (in standard deviations) that any single element is away from the mean. This is the test statistic.
7. Compare this to the critical value. The critical value, which is just a threshold, is calculated from some significance level, i.e. some coverage proportion that you want. In other words, if you
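The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation: the function names and the sample data are hypothetical, and the critical value is passed in by the caller instead of being computed from a significance level. The threshold used in the usage lines is an illustrative placeholder only.

```python
from statistics import mean, stdev

def grubbs_statistic(data):
    """Return (test statistic, most extreme element) for the data set."""
    mu = mean(data)    # step 1: sample mean
    s = stdev(data)    # step 2: sample standard deviation (n - 1 denominator)
    # steps 3-4: distance of each element from the mean, in standard deviations
    distances = [abs(x - mu) / s for x in data]
    g = max(distances)  # steps 5-6: the maximum distance is the test statistic
    return g, data[distances.index(g)]

def has_outlier(data, critical_value):
    """Step 7: compare the test statistic to a caller-supplied threshold."""
    g, _ = grubbs_statistic(data)
    return g > critical_value

# Hypothetical data set D with one suspicious value.
D = [10, 11, 12, 11, 10, 12, 11, 10, 11, 30]
g, suspect = grubbs_statistic(D)
print(round(g, 3), suspect)

# The threshold below is an assumed placeholder; in practice it depends
# on the size of D and the chosen significance level.
print(has_outlier(D, critical_value=2.5))
```

Taking the critical value as a parameter keeps the sketch self-contained. A real implementation would derive it from the size of the data set and the significance level, typically with the help of a statistics library or a published table.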
