12.03.2016 Views

Anomaly Detection for Monitoring



No. You’ve stumbled into statistical quicksand:

• It’s not important that the data is Gaussian. What matters is whether the residuals are Gaussian.

• The histogram is of the sample of data, but the population, not the sample, is what’s important.

Let’s explore each of these topics.

The Data Doesn’t Need to Be Gaussian

The residuals, not the data, need to be Gaussian (normal) to use three-sigma rules and the like.

What are residuals? Residuals are the errors in prediction. They’re the difference between the values you actually observe and the predictions your model makes.
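Written out, the definition and the three-sigma rule applied to it might look like the following. The symbols are our own notation, not the book’s: y_t is the observed value at time t, ŷ_t is the model’s prediction, and μ_r and σ_r are the mean and standard deviation of the residuals.

```latex
r_t = y_t - \hat{y}_t
\qquad\text{and flag } y_t \text{ as anomalous when}\qquad
\lvert r_t - \mu_r \rvert > 3\,\sigma_r
```

If the model is unbiased, μ_r is close to zero and the rule reduces to checking whether |r_t| exceeds 3σ_r.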

If you measure a system whose behavior is log-normal, and base your predictions on a model whose predictions are log-normal, and the errors in prediction are normally distributed, a standard SPC control chart of the results using three-sigma confidence intervals can actually work very well.

Likewise, if you have multi-modal data (whose distribution looks like a camel’s humps, perhaps) and your model’s predictions result in normally distributed residuals, you’re doing fine.
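To make the camel’s-humps case concrete, here is a minimal sketch using only the Python standard library. Everything in it is an illustrative assumption rather than something from the text: the metric alternates between two made-up levels (100 and 500), the “model” simply knows that schedule, and the noise is Gaussian by construction. The raw data is strongly bimodal, yet the residuals behave the way a three-sigma rule expects.

```python
import random
import statistics


def simulate_two_level_metric(n=2000, seed=42):
    """Hypothetical metric that alternates between a low and a high level."""
    rng = random.Random(seed)
    observed, predicted = [], []
    for t in range(n):
        # Two "humps": the level switches every 50 samples.
        level = 100.0 if (t // 50) % 2 == 0 else 500.0
        predicted.append(level)  # assumed model: it knows the schedule
        observed.append(level + rng.gauss(0.0, 10.0))  # Gaussian noise
    return observed, predicted


def fraction_within(values, k):
    """Fraction of values within k standard deviations of their own mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)


observed, predicted = simulate_two_level_metric()
# Residual = observed value minus predicted value.
residuals = [o - p for o, p in zip(observed, predicted)]

print("raw data within 0.5 sigma: ", fraction_within(observed, 0.5))
print("residuals within 1 sigma:  ", fraction_within(residuals, 1))
print("residuals within 3 sigma:  ", fraction_within(residuals, 3))
```

None of the raw points fall within half a standard deviation of the raw mean, because the mean sits in the empty valley between the humps. The residuals, by contrast, land within one and three standard deviations at roughly the rates a normal distribution predicts.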

In fact, your data can look any kind of crazy. It doesn’t matter; what matters is whether the residuals are Gaussian. This is super important to understand. Every type of control chart we discussed previously actually works like this:

• It models the metric’s behavior somehow. For example, the EWMA control chart’s implied model is “the next value is likely to be close to the current value of the EWMA.”

• It subtracts the prediction from the observed value.

• It effectively puts control lines on the residual. The idea is that the residual is now a stable value, centered around zero.
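The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not the book’s implementation: the function name, the smoothing factor `alpha`, the multiplier `k`, the `warmup` length, and the synthetic series are all assumptions chosen for the example. For simplicity, flagged points still update the EWMA and the residual statistics, so the points right after a large spike may also be flagged.

```python
import math
import random


def ewma_residual_chart(values, alpha=0.3, k=3.0, warmup=20):
    """Return indices whose residual against an EWMA prediction exceeds k sigma.

    alpha, k, and warmup are illustrative defaults, not values from the text.
    """
    ewma = values[0]
    # Running count, mean, and sum of squared deviations of the residuals
    # (Welford's online algorithm).
    n, mean, m2 = 0, 0.0, 0.0
    flagged = []
    for i, x in enumerate(values[1:], start=1):
        prediction = ewma            # 1. model: next value is near the EWMA
        residual = x - prediction    # 2. observed minus predicted
        if n >= warmup:
            sigma = math.sqrt(m2 / (n - 1))
            # 3. control lines on the residual, centered on its running mean
            if abs(residual - mean) > k * sigma:
                flagged.append(i)
        # Update the residual statistics, then the EWMA itself.
        n += 1
        delta = residual - mean
        mean += delta / n
        m2 += delta * (residual - mean)
        ewma = alpha * x + (1 - alpha) * ewma
    return flagged


# Hypothetical series: noise around 50 with one injected spike at index 150.
rng = random.Random(0)
series = [50.0 + rng.gauss(0.0, 1.0) for _ in range(300)]
series[150] = 80.0

flagged = ewma_residual_chart(series)
print(flagged)
```

Note that the control limits are computed from the residuals, never from the raw values, which is why the shape of the raw data’s distribution does not enter into the check.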

Any control chart can be implemented either way:

• Predict, take the residual, find control limits, evaluate whether the residual is out of bounds

Common Myths About Statistical Anomaly Detection | 29
