The Hybrid Approach: Benefit from Both Multivariate and Univariate Anomaly Detection Techniques
In our previous post, we explained what time series data is and described how Anodot’s real-time anomaly detection system spots anomalies in it. We also discussed the importance of choosing a model for a metric’s normal behavior, including any seasonal patterns in the metric, and the specific algorithm Anodot uses to find those patterns.
A concise explanation of conciseness
At the end of that post, we concluded that it’s possible to get a sense of the bigger picture from many individual anomalies. Conciseness is a requirement of any large-scale anomaly detection system because monitoring millions of metrics is guaranteed to generate a flood of reported anomalies, even if there are zero false positives.
Stable applications and operating systems often ship with errors, even if those errors don’t result in a failure state right away. For context, Google’s bug bounty program paid out nearly $3 million in 2017. This, however, doesn’t mean that Google’s programs were constantly crashing. These bugs were present but dormant, and could be triggered only under certain conditions. An anomaly detection system might detect and flag them even if they’re not currently causing your application to crash.
Achieving conciseness in this context is analogous to distilling many individual symptoms into a single diagnosis. A mechanic does much the same thing when diagnosing a car problem: they listen to the pitch, volume and duration of the sounds the car makes, and watch the dials and indicator lights on the dashboard, before naming a single underlying fault.
Univariate and multivariate anomaly detection techniques
After employing the anomaly detection techniques described in our last post (and in our previous series), how does a practical real-time system like Anodot’s actually achieve concise reporting of detected anomalies? How does the system determine the diagnosis? The answer is that after the system detects anomalies in individual metrics, a second layer of machine learning takes over and groups anomalies from related metrics together. This grouping of related anomalies condenses the original flood of individual alerts into a smaller, more manageable number of underlying incidents.
This two-step approach actually combines two different anomaly detection techniques: univariate and multivariate. Univariate anomaly detection looks for anomalies in each individual metric, while multivariate anomaly detection learns a single model for all the metrics in the system.
Univariate methods are simpler, so they are easier to scale to many metrics and large datasets. However, someone would then need to unravel the causal relationships between the anomalies in the resulting alert storm. Companies need to quickly understand what all those alerts mean before they can decide what to do in response, and many companies simply don’t have the time. With outages costing up to $5,600 per minute, every second counts when determining the response to an anomaly.
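To make the scalability point concrete, here is a minimal sketch of univariate detection. It is not Anodot’s algorithm: it assumes a plain z-score test against a fixed baseline window, and the metric names, values, baseline length and threshold are all hypothetical.

```python
from statistics import mean, stdev


def detect_univariate(series, baseline_len, threshold=3.0):
    """Flag points after the baseline that sit more than `threshold`
    standard deviations away from the baseline mean.

    Illustrative z-score rule only; a production system would also
    model seasonality and adapt the baseline over time.
    """
    baseline = series[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for i in range(baseline_len, len(series)):
        deviation = abs(series[i] - mu)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is unusual.
            is_anomaly = deviation > 0
        else:
            is_anomaly = deviation / sigma > threshold
        if is_anomaly:
            flagged.append(i)
    return flagged


# Hypothetical metrics; each one is modeled on its own.
metrics = {
    "checkout_latency_ms": [100, 102, 98, 101, 99, 100, 180, 101],
    "orders_per_minute": [50, 51, 49, 50, 52, 50, 12, 50],
    "cpu_percent": [40, 41, 39, 40, 42, 40, 41, 40],
}

anomalies = {
    name: detect_univariate(values, baseline_len=6)
    for name, values in metrics.items()
}
print(anomalies)
```

Because each metric is checked independently, the work grows linearly with the number of metrics. The output, however, is just a list of flagged points per metric: nothing in it says that the latency spike and the drop in orders belong to the same incident.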
Multivariate approaches, on the other hand, detect anomalies as complete incidents, yet are difficult to scale both in terms of computation and accuracy of the models. The alerts they produce are also hard to interpret, because all the metrics are inputs that generate a single output from the anomaly detection system, so an alert does not say which metrics are responsible.
With multivariate methods, each added metric introduces interactions between itself and all the other metrics. Since multivariate anomaly detection methods have to model this entire complex system, the computational cost increases rapidly as the number of modeled metrics increases. In addition, individual metrics need to have similar statistical behavior for multivariate methods to work accurately.
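A rough way to see how quickly this grows is to count only the pairwise interactions between metrics. That is a simplifying assumption (a real multivariate model may also capture higher-order interactions, which only makes things worse), and the function name below is hypothetical.

```python
from math import comb


def pairwise_interactions(n_metrics):
    """Number of distinct metric pairs: n * (n - 1) / 2."""
    return comb(n_metrics, 2)


for n in (10, 1_000, 1_000_000):
    print(f"{n:>9,} metrics -> {pairwise_interactions(n):,} pairs")
```

Adding one metric to a system that already has n metrics adds n new pairs, so a million metrics imply roughly half a trillion pairwise relationships, compared with a million independent models in the univariate case.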
Revolutionizing anomaly detection techniques: Anodot’s two-step approach
Anodot effectively combines the strengths of each of these techniques into a hybrid approach. Since univariate anomaly detection is used first on individual metrics, Anodot’s approach benefits from its scalability and simplicity. Anodot then applies additional AI techniques to discover relationships between the metrics, and uses this multivariate view to group and interpret related anomalies, satisfying the requirement of conciseness we spoke of earlier.
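The second step can be sketched as follows. This is only an illustration under stated assumptions, not Anodot’s implementation: it assumes the univariate step has already produced anomalies as (metric, start, end) tuples, that the relationships between metrics are already known and supplied as a set of related pairs (in a real system they would be learned), and that two anomalies belong to the same incident when their metrics are related and their time ranges overlap. All names and values are hypothetical.

```python
from itertools import combinations


def group_anomalies(anomalies, related_pairs):
    """Group per-metric anomalies into incidents.

    anomalies: list of (metric_name, start, end) tuples.
    related_pairs: set of frozenset({metric_a, metric_b}).
    Returns a sorted list of incidents, each a sorted list of metric names.
    """
    # Union-find over anomaly indices, so chains of related
    # anomalies (A~B, B~C) end up in the same incident.
    parent = list(range(len(anomalies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(anomalies)), 2):
        metric_a, start_a, end_a = anomalies[i]
        metric_b, start_b, end_b = anomalies[j]
        overlap = start_a <= end_b and start_b <= end_a
        related = frozenset((metric_a, metric_b)) in related_pairs
        if overlap and related:
            parent[find(i)] = find(j)

    incidents = {}
    for i, (metric, _start, _end) in enumerate(anomalies):
        incidents.setdefault(find(i), []).append(metric)
    return sorted(sorted(group) for group in incidents.values())


# Hypothetical output of the univariate step.
detected = [
    ("checkout_latency_ms", 6, 7),
    ("orders_per_minute", 6, 8),
    ("payment_errors", 7, 9),
    ("cpu_percent", 30, 31),
]
# Hypothetical, pre-computed metric relationships.
related = {
    frozenset(("checkout_latency_ms", "orders_per_minute")),
    frozenset(("orders_per_minute", "payment_errors")),
}

incidents = group_anomalies(detected, related)
print(incidents)
```

Here four individual alerts collapse into two incidents: one covering latency, orders and payment errors, and one isolated CPU anomaly. Note that the pairwise comparison runs only over the anomalies that were actually detected, not over every metric in the system, which is what keeps this second step tractable.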
This blending of univariate and multivariate anomaly detection is similar to Anodot’s combination of the supervised and unsupervised anomaly detection techniques we discussed in our previous series. At each layer of its anomaly detection system, Anodot uses the most appropriate data science and machine learning techniques for that layer, even combining them in sophisticated ways to provide businesses with actionable information in real time.
Stay tuned to learn how your business can sift through thousands or even millions of metrics in real time using automated anomaly detection.