Anomaly Detection for Monitoring
work. When a system or process is highly unstable, it becomes
extremely difficult for models to work well. We highly recommend
implementing filters to reduce the number of false positives. Some
of the filters we’ve used include:
• Instead of sending an alert when an anomaly is detected, send
an alert when N anomalies are detected within an interval of
time.
• Suppress anomalies when systems appear to be too unstable to
determine any kind of normal behavior. For example, the
variance-to-mean ratio (index of dispersion), or another dimensionless
metric, can be used to indicate whether a system’s
behavior is stable.
• If a system violates a threshold and you trigger an anomaly or
send an alert, don’t allow another one to be sent unless the system
resets back to normal first. This can be implemented by
having a reset threshold, below which the metrics of interest
must dip before they can trigger above the upper threshold
again.
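The three filters above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the class and function names, the use of the population variance, and every threshold value are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pvariance


class CountInWindowFilter:
    """Alert only when at least n anomalies fall within window_seconds."""

    def __init__(self, n, window_seconds):
        self.n = n
        self.window_seconds = window_seconds
        self.timestamps = deque()  # times of recent anomalies, oldest first

    def observe(self, timestamp, is_anomaly):
        if is_anomaly:
            self.timestamps.append(timestamp)
        # Drop anomalies that have aged out of the window.
        while self.timestamps and timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return is_anomaly and len(self.timestamps) >= self.n


def is_stable(values, max_dispersion):
    """Return True when the variance-to-mean ratio is at most max_dispersion."""
    m = mean(values)
    if m <= 0:
        return False  # ratio is undefined or not meaningful; treat as unstable
    return pvariance(values) / m <= max_dispersion


class ResetThresholdFilter:
    """Alert once above `upper`, then stay silent until the value dips below `reset`."""

    def __init__(self, upper, reset):
        self.upper = upper
        self.reset = reset
        self.armed = True

    def observe(self, value):
        if self.armed and value > self.upper:
            self.armed = False  # disarm until the metric resets
            return True
        if not self.armed and value < self.reset:
            self.armed = True  # back to normal; the next violation may alert
        return False


if __name__ == "__main__":
    hysteresis = ResetThresholdFilter(upper=90, reset=70)
    for value in (95, 96, 80, 65, 92):
        print(value, hysteresis.observe(value))
```

In practice the filters would likely be chained: check stability first, then pass detected anomalies through the count and reset filters before sending an alert.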
Filters don’t have to be complicated. Sometimes it’s simpler
and more efficient to ignore metrics that are likely to
cause nuisance alerts. Ruxit recently published a blog post titled
“Parameterized anomaly detection settings” 5 in which they describe
their anomaly detection settings. Although they don’t call it a “filter,”
one of their settings disables anomaly detection for low-traffic applications
and services to avoid unnecessary alerts.
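A low-traffic filter of this kind can be as small as a single comparison. The sketch below is hypothetical and does not reproduce Ruxit's actual setting: the function name, the requests-per-minute measure, and the default cutoff of 10 are all assumptions made for illustration.

```python
def services_to_monitor(traffic, min_requests_per_minute=10.0):
    """Return the names of services with enough traffic to evaluate.

    `traffic` maps a service name to its recent requests per minute.
    Services below the (hypothetical) cutoff are skipped entirely, so
    they can never raise an anomaly alert.
    """
    return [
        name
        for name, requests_per_minute in traffic.items()
        if requests_per_minute >= min_requests_per_minute
    ]


if __name__ == "__main__":
    recent_traffic = {"checkout": 1200.0, "internal-admin": 0.4}
    print(services_to_monitor(recent_traffic))
```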
Tools<br />
You generally don’t have to implement an entire anomaly detection
framework yourself. As a significant component of monitoring,
anomaly detection has been the focus of many monitoring projects
and companies that have implemented many of the things we’ve
discussed in this book.
5 http://bit.ly/ruxitblog
60 | Chapter 6: The Broader Landscape