Anomaly Detection for Monitoring
attempt to be as descriptive as possible about a collection of
data. With summary statistics, your prediction formula then
becomes something like “the next value will be the same as the
current average of recent values.” Now you’re predicting that
values will most likely be close to what they’ve typically been
like recently. You can replace “average” with median, EWMA, or
other descriptive summary statistics.
3. One step beyond this is predicting a likely range of values centered
around a summary statistic. This usually boils down to a
simple mean for the central value and standard deviation for the
spread, or an EWMA with EWMA control limits (analogous to
mean and standard deviation, but exponentially smoothed).
4. All of these methods use parameters (e.g., the mean and standard
deviation). Non-parametric methods, such as histograms
of historical values, can also be used. We’ll discuss these in more
detail later in this book.
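
The prediction styles in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used later in this book: the function names, the smoothing factor `alpha`, the band `width` of three standard deviations, and the sample data are all hypothetical. Empirical percentiles stand in for a histogram in the non-parametric case, and the exponentially smoothed control limits are left out for brevity.

```python
import statistics


def predict_point(recent, method="mean", alpha=0.3):
    """Predict the next value as a summary statistic of recent values."""
    if method == "mean":
        return statistics.fmean(recent)
    if method == "median":
        return statistics.median(recent)
    if method == "ewma":
        # Exponentially weighted moving average; alpha is an assumed
        # smoothing factor (higher means recent values weigh more).
        ewma = recent[0]
        for x in recent[1:]:
            ewma = alpha * x + (1 - alpha) * ewma
        return ewma
    raise ValueError(f"unknown method: {method}")


def predict_range(recent, width=3.0):
    """Parametric range: mean plus or minus width standard deviations."""
    center = statistics.fmean(recent)
    spread = statistics.stdev(recent)
    return center - width * spread, center + width * spread


def predict_range_nonparametric(history, lower_pct=1, upper_pct=99):
    """Non-parametric range taken from empirical percentiles of history."""
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def is_anomalous(value, bounds):
    """A value is flagged when it falls outside the predicted range."""
    low, high = bounds
    return not (low <= value <= high)


# Hypothetical recent observations of some metric.
recent = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 9.0, 11.0]

point_mean = predict_point(recent, "mean")
point_median = predict_point(recent, "median")
point_ewma = predict_point(recent, "ewma")
band = predict_range(recent)
band_np = predict_range_nonparametric(recent)
```

A new observation can then be checked with `is_anomalous(value, band)`. With only eight points the percentile band is very tight, which hints at why non-parametric methods generally want a longer history than parametric ones.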
We can take prediction to an even higher level of sophistication
using more complicated models, such as those from the ARIMA
family. Furthermore, you can also attempt to build your own models
based on a combination of metrics, and use the corresponding output
to feed into a control chart. We’ll also discuss that later in this
book.
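
As a rough sketch of feeding a model's output into a control chart, the example below fits a first-order autoregressive model, AR(1), which is one of the simplest members of the ARIMA family, and then flags live values whose prediction error exceeds a multiple of the training residuals' standard deviation. The function names, the `width` of three, and the data are assumptions made for illustration; a real ARIMA fit would normally come from a statistics library rather than this hand-rolled least-squares fit.

```python
import statistics


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    mean_prev = statistics.fmean(prev)
    mean_curr = statistics.fmean(curr)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var  # assumes the training series is not constant
    c = mean_curr - phi * mean_prev
    return c, phi


def residual_control_chart(train, live, width=3.0):
    """Flag live values whose one-step prediction error is unusually large."""
    c, phi = fit_ar1(train)
    # Residuals on the training data set the control limits.
    train_resid = [x - (c + phi * p) for p, x in zip(train[:-1], train[1:])]
    sigma = statistics.stdev(train_resid)
    flags = []
    prev = train[-1]
    for x in live:
        resid = x - (c + phi * prev)
        flags.append(abs(resid) > width * sigma)
        prev = x  # the observed value, anomalous or not, drives the next prediction
    return flags


# Hypothetical training history and a short live window containing a spike.
train = [10, 11, 10, 9, 10, 11, 12, 11, 10, 9, 10, 11, 10, 9, 10]
live = [11, 25, 10]
flags = residual_control_chart(train, live)
```

Because each prediction depends on the previous observed value, the point immediately after a spike can also be flagged even when it is back to normal. That is one small example of how a model can misbehave on machine data.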
Prediction is a difficult problem in general, but it’s especially difficult
when dealing with machine data. Machine data comes in many
shapes and sizes, and it’s unreasonable to expect a single method or
approach to work for all cases.
In our experience, most anomaly detection success stories work
because the specific data they’re using doesn’t hit a pathology. Lots
of machine data has simple pathologies that break many models
quickly. That makes accurate, robust⁶ predictions harder than you
might think.

⁶ In statistics, robust generally means that outlying values don’t throw things for a loop;
for example, the median is more robust than the mean.
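
To make the idea of robustness concrete, the short sketch below adds one extreme value to a small sample and compares how far the mean and the median move. The data are made up for illustration.

```python
import statistics

# Hypothetical metric values, then the same values plus one extreme outlier.
values = [10, 11, 9, 10, 12, 10, 9, 11]
with_outlier = values + [1000]

# How far each summary statistic moves when the outlier is added.
mean_shift = statistics.fmean(with_outlier) - statistics.fmean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

The mean jumps by more than 100 while the median does not move at all, so a prediction built on the median is far less disturbed by a single bad data point.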
26 | Chapter 3: Modeling and Predicting