Anomaly Detection for Monitoring
anomaly-detection-monitoring
anomaly-detection-monitoring
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
figure out how unlikely it was <strong>for</strong> us to see a particular value. We<br />
decided to keep a histogram of all the values we’d seen and compute<br />
the percentiles of each value as we saw it. If a value fell above the<br />
99.9th percentile, we reasoned, then we could consider it to be a<br />
one-in-a-thousand occurrence.<br />
Not so! For several reasons, primarily that we were computing our<br />
percentiles from the sample, and trying to infer the probability of<br />
that value existing in the population. You can see the fallacy<br />
instantly, as we did, if you just postulate the observation of a value<br />
much higher than we’d previously seen. How unlikely is it that we<br />
saw that value? Aside from the brain-hurting existential questions,<br />
there’s the obvious implication that we’d need to know the distribution<br />
of the population in order to answer that.<br />
In general, these non-parametric methods that work by comparing<br />
the distribution (usually via histograms) across sets of values can’t<br />
be used online as each value arrives. That’s because it’s difficult to<br />
compare single values (the current observation) to a distribution of<br />
a set of values.<br />
Grubbs’ Test and ESD<br />
The Grubbs’ Test is used to test whether or not a set of data contains<br />
an outlier. This set is assumed to follow an approximately Gaussian<br />
distribution. Here’s the general procedure <strong>for</strong> the test, assuming you<br />
have an appropriate data set D.<br />
1. Calculate the sample mean. Let’s call this μ.<br />
2. Calculate the sample standard deviation. Let’s call this s.<br />
3. For each element i in D<br />
4. Calculate abs( i - μ ) / s. This is the number of standard deviations<br />
away i is from the sample mean.<br />
5. Now you have the distance from the mean <strong>for</strong> each element in<br />
D. Take the maximum.<br />
6. Now you have the maximum distance (in standard deviations)<br />
any single element is away from the mean. This is the test statistic.<br />
7. Compare this to the critical value. The critical value, which is<br />
just a threshold, is calculated from some significance level, i.e.<br />
some coverage proportion that you want. In other words, if you<br />
Grubbs’ Test and ESD | 57