Anomaly Detection for Monitoring
CHAPTER 5
Practical Anomaly Detection for Monitoring
Recall that one of our goals for this book is to help you actually get anomaly detection running in production and solving monitoring problems you have with your current systems.
Typical goals for adding anomaly detection probably include:
• To avoid setting or changing thresholds per server, because machines differ from one another
• To avoid modifying thresholds when servers, features, and workloads change over time
• To avoid static thresholds that throw false alerts at some times of the day or week, and miss problems at other times
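All three goals share one idea: a threshold should come from each server's own history, taken at the same time of day or week, rather than from one hand-set number. The following is a minimal Python sketch of that idea, assuming samples arrive as `(server, hour_of_week, value)` tuples. The names `build_baselines` and `is_anomalous`, the input format, and the mean ± k standard deviations band are illustrative assumptions, not a method prescribed by this chapter.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baselines(samples, k=3.0):
    """Build a (low, high) band for every (server, hour_of_week) pair.

    samples: iterable of (server, hour_of_week, value) tuples, where
    hour_of_week is an int in 0..167 (hypothetical input format).
    """
    grouped = defaultdict(list)
    for server, hour, value in samples:
        grouped[(server, hour)].append(value)

    baselines = {}
    for key, values in grouped.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation of this slot
        # Band is the slot's mean plus or minus k standard deviations.
        baselines[key] = (mu - k * sigma, mu + k * sigma)
    return baselines


def is_anomalous(baselines, server, hour, value):
    """Return True if value falls outside the band for this server and hour.

    Raises KeyError when there is no history for the (server, hour) slot.
    """
    low, high = baselines[(server, hour)]
    return value < low or value > high


history = [
    ("web1", 9, 100), ("web1", 9, 110), ("web1", 9, 90),
    ("web2", 9, 400), ("web2", 9, 420), ("web2", 9, 380),
]
bands = build_baselines(history)
print(is_anomalous(bands, "web1", 9, 400))  # True: far above web1's usual level
print(is_anomalous(bands, "web2", 9, 400))  # False: normal for web2
```

The same value of 400 is anomalous on one server and normal on another, which a single static threshold cannot express. With only three samples per slot the band is very rough; a real baseline would need much more history per slot.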
In general, you can probably describe these goals as “just make Nagios a little better for some checks.”
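Read literally, “make Nagios a little better” can mean keeping the Nagios plugin contract (print one status line, exit 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) while replacing the fixed threshold with one derived from the metric's history. The Python sketch below assumes the mean and standard deviation are computed elsewhere; `check_against_baseline` and the 2σ/3σ cutoffs are hypothetical choices for illustration, not the book's recommendation.

```python
import math

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_against_baseline(value, mu, sigma, warn_k=2.0, crit_k=3.0):
    """Classify value by how many standard deviations it sits from mu.

    Returns (exit_code, message). mu and sigma are assumed to come from the
    metric's own recent history; pass None for either when none exists.
    """
    if mu is None or sigma is None:
        return UNKNOWN, "UNKNOWN - no baseline for this metric"

    deviation = abs(value - mu)
    if sigma == 0:
        # Flat history: any change at all counts as an extreme deviation.
        z = 0.0 if deviation == 0 else math.inf
    else:
        z = deviation / sigma

    detail = f"value={value} baseline={mu} z={z:.1f}"
    if z >= crit_k:
        return CRITICAL, f"CRITICAL - {detail}"
    if z >= warn_k:
        return WARNING, f"WARNING - {detail}"
    return OK, f"OK - {detail}"


code, message = check_against_baseline(125, mu=100, sigma=10)
print(code, message)  # 1 WARNING - value=125 baseline=100 z=2.5
```

A real plugin script would print `message` and exit with `code` (for example via `sys.exit(code)`), so Nagios treats it like any other check; only the source of the threshold changes.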
Another goal might be to find all abnormal metrics without generating alerts, for use in diagnosing problems. We consider this a hard problem because it is so general; at this point in the book, you probably understand why. We won't focus on this goal in this chapter, although you can apply the discussion in this chapter to that approach on a case-by-case basis.
The best place to begin is often with the most painful monitoring problem you have right now. Take a look at your alert history