
Anomaly Detection for Monitoring


CHAPTER 5

Practical Anomaly Detection for Monitoring

Recall that one of our goals for this book is to help you actually get anomaly detection running in production and solving monitoring problems you have with your current systems.

Typical goals for adding anomaly detection probably include:

• To avoid setting or changing thresholds per server, because machines differ from each other

• To avoid modifying thresholds when servers, features, and workloads change over time

• To avoid static thresholds that throw false alerts at some times of the day or week, and miss problems at other times

In general you can probably describe these goals as “just make Nagios a little better for some checks.”
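
To make the threshold problem concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended detector: the sample values, the function names, the window size, and the multiplier k are all assumptions chosen for the example. It compares one fixed threshold shared by two servers against a limit computed from each server's own recent history.

```python
from statistics import mean, stdev

def static_alerts(values, threshold):
    """Return the indices of values above one fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def adaptive_alerts(values, window=5, k=3.0):
    """Return the indices of values above mean + k * stdev of the
    preceding `window` samples (hypothetical parameters)."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        limit = mean(history) + k * stdev(history)
        if values[i] > limit:
            alerts.append(i)
    return alerts

# Made-up CPU utilization samples (percent) for two servers.
busy_server = [80, 82, 79, 81, 80, 83, 81, 80]   # normally busy, nothing wrong
quiet_server = [10, 11, 9, 10, 11, 10, 40, 10]   # unusual spike at index 6

for name, series in [("busy", busy_server), ("quiet", quiet_server)]:
    print(name, "static:", static_alerts(series, 75),
          "adaptive:", adaptive_alerts(series))
```

With these made-up numbers, the shared threshold of 75 fires on every sample from the busy server and never on the quiet one, while the per-server limit stays silent on the busy server and flags only the spike. Note that a perfectly flat history gives a standard deviation of zero, so this sketch would then flag any increase at all; a real check needs a floor on the limit.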

Another goal might be to find all metrics that are abnormal without generating alerts, for use in diagnosing problems. We consider this to be a pretty hard problem because it is very general. You probably understand why at this point in the book. We won’t focus on this goal in this chapter, although you can easily apply the discussion in this chapter to that approach on a case-by-case basis.

The best place to begin is often where you experience the most painful monitoring problem right now. Take a look at your alert history
