Anomaly Detection for Monitoring
No. You’ve stumbled into statistical quicksand:
• It’s not important whether the data is Gaussian. What matters is whether the residuals are Gaussian.
• The histogram shows a sample of the data, but the population, not the sample, is what’s important.
Let’s explore each of these topics.
The Data Doesn’t Need to Be Gaussian
The residuals, not the data, need to be Gaussian (normal) to use three-sigma rules and the like.
What are residuals? Residuals are the errors in prediction: the difference between the values you actually observe and the predictions your model makes.
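As a minimal illustration of that definition (a sketch; the function name `residuals` and the sample numbers are hypothetical), the residual for each point is simply the observed value minus the predicted value:

```python
def residuals(observed, predicted):
    """Return the residual (observed minus predicted) for each point."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


# Hypothetical example: the model predicted 100, 105, 110; we observed 102, 104, 113.
print(residuals([102, 104, 113], [100, 105, 110]))  # [2, -1, 3]
```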
If you measure a system whose behavior is log-normal, base your predictions on a model whose predictions are also log-normal, and find that the errors in prediction are normally distributed, then a standard SPC control chart of the residuals with three-sigma control limits can actually work very well.
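The following Python sketch simulates that scenario under stated assumptions: the hypothetical model’s predictions are drawn from a log-normal distribution (μ = 3.0, σ = 0.8), and each observation equals its prediction plus Gaussian noise with standard deviation 2. The parameters and seed are illustrative choices, not values from the text. The raw observations come out strongly right-skewed, while the residuals are close to symmetric, and a three-sigma rule on the residuals flags roughly the 0.27% of points expected for Gaussian data, whereas the same rule on the raw observations flags noticeably more.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable
N = 10_000

# Assumption: the model's predictions are log-normally distributed.
predicted = [random.lognormvariate(3.0, 0.8) for _ in range(N)]
# Assumption: observations differ from predictions by Gaussian noise (sd = 2).
observed = [p + random.gauss(0.0, 2.0) for p in predicted]
resid = [o - p for o, p in zip(observed, predicted)]


def skewness(xs):
    """Population skewness: near 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def three_sigma_outliers(xs):
    """Count points more than three standard deviations from the mean."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(1 for x in xs if abs(x - m) > 3 * s)


for name, xs in (("observed", observed), ("residuals", resid)):
    print(f"{name}: skewness={skewness(xs):.2f}, "
          f"beyond 3 sigma={three_sigma_outliers(xs)} of {len(xs)}")
```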
Likewise, if you have multimodal data (whose distribution looks like a camel’s humps, perhaps) and your model’s predictions result in normally distributed residuals, you’re doing fine.
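A sketch of the multimodal case, under illustrative assumptions: a hypothetical metric alternates between a “quiet” regime around 20 and a “busy” regime around 80 every 50 samples, and the model is assumed to know which regime it is in. One injected value of 50 during a quiet period sits right in the middle of the raw data’s distribution, so a three-sigma rule on the raw values misses it, while the same rule applied to the residuals catches it.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical two-regime metric: "quiet" (mean 20) and "busy" (mean 80),
# switching every 50 samples, so its histogram has two humps.
predicted = [20.0 if (t // 50) % 2 == 0 else 80.0 for t in range(1000)]
# Assumption: the model knows the regime, and noise around it is Gaussian (sd = 3).
observed = [m + random.gauss(0.0, 3.0) for m in predicted]
observed[10] = 50.0  # injected anomaly during a quiet period


def three_sigma_flags(xs):
    """Indices of points more than three standard deviations from the mean."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - m) > 3 * s]


raw_flags = three_sigma_flags(observed)
resid = [o - p for o, p in zip(observed, predicted)]
resid_flags = three_sigma_flags(resid)

print(10 in raw_flags)    # False: 50 is near the middle of the raw data
print(10 in resid_flags)  # True: the residual 50 - 20 = 30 is far beyond 3 residual sigmas
```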
In fact, your data can look any kind of crazy. It doesn’t matter; what matters is whether the residuals are Gaussian. This is super-important to understand. Every type of control chart we discussed previously actually works like this:
• It models the metric’s behavior somehow. For example, the EWMA control chart’s implied model is “the next value is likely to be close to the current value of the EWMA.”
• It subtracts the prediction from the observed value.
• It effectively puts control limits on the residual. The idea is that the residual is now a stable value, centered around zero.
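The three steps above can be sketched for the EWMA chart as follows. This is a minimal illustration, not the book’s implementation: the smoothing factor `alpha`, the `warmup` length, and the multiplier `k` are hypothetical defaults, and the spread of the residuals is estimated as their root mean square around zero, consistent with the idea that residuals are centered around zero.

```python
import math
import statistics


def ewma_residual_chart(values, alpha=0.3, warmup=10, k=3.0):
    """Return indices whose residual against the EWMA forecast exceeds k sigma.

    Step 1: the current EWMA is the prediction for the next value.
    Step 2: residual = observed value minus that prediction.
    Step 3: control limits at +/- k * sigma around zero, where sigma is the
    root mean square of the residuals seen so far.
    """
    if not values:
        return []
    ewma = values[0]          # the first value seeds the model
    history = []              # residuals observed so far
    flags = []
    for i, x in enumerate(values[1:], start=1):
        resid = x - ewma                              # step 2
        if len(history) >= warmup:                    # wait for a spread estimate
            sigma = math.sqrt(statistics.fmean([r * r for r in history]))
            if sigma > 0 and abs(resid) > k * sigma:  # step 3
                flags.append(i)
        history.append(resid)
        ewma = alpha * x + (1 - alpha) * ewma         # step 1 for the next point
    return flags


# Hypothetical metric cycling 10, 11, 12 with a spike to 25 at index 30.
values = [10 + (i % 3) for i in range(40)]
values[30] = 25
print(ewma_residual_chart(values))  # [30]
```

Because the residual history includes the spike once it happens, the estimated spread widens afterward; a production version might exclude flagged points or use a rolling window, but that choice is outside the scope of this sketch.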
Any control chart can be implemented either way:
• Predict, take the residual, find control limits, evaluate whether the residual is out of bounds
Common Myths About Statistical Anomaly Detection | 29