Anomaly Detection for Monitoring

CHAPTER 5
Practical Anomaly Detection for Monitoring

Recall that one of our goals for this book is to help you actually get anomaly detection running in production and solving monitoring problems you have with your current systems.

Typical goals for adding anomaly detection probably include:

• To avoid setting or changing thresholds per server, because machines differ from each other
• To avoid modifying thresholds when servers, features, and workloads change over time
• To avoid static thresholds that throw false alerts at some times of the day or week, and miss problems at other times

In general you can probably describe these goals as “just make Nagios a little better for some checks.”

Another goal might be to find all metrics that are abnormal without generating alerts, for use in diagnosing problems. We consider this to be a pretty hard problem because it is very general. You probably understand why at this point in the book. We won’t focus on this goal in this chapter, although you can easily apply the discussion in this chapter to that approach on a case by case basis.

The best place to begin is often where you experience the most painful monitoring problem right now. Take a look at your alert history
