
Blog Post 6 min read

Evaluating the Maturity of Your Analytics System

I'm a big fan of maturity models. They help teams clearly articulate their vision and define a path forward. You can tie the product roadmap and projects to the model and justify the budgets needed to reach the desired maturity level.

Gartner offers the following "Analytics Spectrum" that describes how analytics platforms evolve along two main dimensions: the sophistication level of the analytics, and the amount of human intervention required in the decision-making process toward a desired action.

The most common form of analytics is descriptive, with a few platforms offering some level of diagnostics. Predictive analytics is not yet mature, but we clearly see increasing demand for better prediction models over longer durations. As for prescriptive analytics -- the icing on the cake -- very few organizations have reached that level of maturity, and those that have apply it in very specific use cases.

As you can imagine, at the highest maturity level, an analytics platform provides insights about what is going to happen in the future and takes automated actions to react to those predictions. For example, an e-commerce web site can increase the price of a specific product if demand is expected to increase significantly. Additionally, if the system detects a price increase by competitors, it can send a marketing campaign to customers interested in that product to head off declining sales, or scale infrastructure up or down based on changes in traffic volumes.

Taking the Gartner model into consideration, I have developed a new maturity model which takes a slightly different (but very much related) approach to help you evaluate the current state of your monitoring/analytics system and plan in which areas you want to invest. This model should be used as a guide, since each company will be at its own level of maturity for each of the monitoring system capabilities.

Moving down the left side of the table below, we see the Monitoring System Key Capabilities: Collect (business and infrastructure metrics), Detect, Alert, Triage, Remediate. The numbers from left to right are the different maturity levels of each capability. And lastly, on the right, are the KPIs affected by each capability, which I explained in more detail in the first post of this series: TTD (Time to Detect), TTA (Time to Acknowledge), TTT (Time to Triage), TTR (Time to Recover), and SNR (Signal to Noise Ratio).

Collect (Business Metrics) -- affected KPIs: TTD, TTR
Level 1: Key metrics at site/company level
Level 2: Key metrics at product line, geography level
Level 3: Secondary-level metrics at product line, geography, customer/partner level
Level 4: Key and secondary metrics at page, OS and browser level
Level 5: Fine-grain dimensions per transaction

Collect (Infrastructure Metrics) -- affected KPIs: TTD, TTR
Level 1: Key metrics for key components at site level
Level 2: Key metrics for key components at availability zone/data center level
Level 3: Key metrics per component in the entire technology stack (database, network, storage, compute, etc.)
Level 4: Key metrics per instance of each component
Level 5: Fine-grain dimensions per component/instance

Detect -- affected KPIs: TTD
Level 1: Human factor (using dashboards, customer input, etc.)
Level 2: Static thresholds
Level 3: Basic statistical methods (week over week, month over month, standard deviation), ratios between different metrics
Level 4: Anomaly detection based on machine learning
Level 5: Dynamic anomaly detection based on machine learning with prediction

Alert -- affected KPIs: SNR, TTA
Level 1: Human factor (using dashboards, customer input, etc.)
Level 2: Alert is triggered whenever detection happens on a single metric
Level 3: The system can suppress alerts using de-duping, snoozing, minimum duration
Level 4: Alert simulation, enriched alerts
Level 5: Correlated and grouped alerts to reduce noise level and support faster triaging

Triage -- affected KPIs: TTT
Level 1: Ad hoc (tribal knowledge)
Level 2: Initial playbook for key flows
Level 3: Well-defined playbook with a set of dashboards/scripts to help identify the root cause
Level 4: Set of dynamic dashboards with drill-down/through capabilities to help identify the root cause
Level 5: Auto-triaging based on advanced correlations

Remediate -- affected KPIs: TTR
Level 1: Ad hoc
Level 2: Well-defined standard operating procedure (SOP), manual restore
Level 3: Suggested actions for remediation, manual restore
Level 4: Partial auto-remediation (scale up/down, fail over, rollback, invoke business process)
Level 5: Self-healing

One thing to consider is that the "collect" capability refers to how much surface area is covered by the monitoring system. Due to the dynamic nature of the way we do business today, it's something of a moving target -- new technologies are introduced, new services are deployed, architectures change, and so on. Keep this in mind, as you may want to prioritize and measure progress in data coverage.

You can use a spider diagram to visualize the current state vs. the desired state across the different dimensions. If you want to enter your own maturity levels and see a personalized diagram, let me know and I'll send you a spreadsheet template to use (for free, of course).

The ideal monitoring solution is completely aware of ALL components and services in the ecosystem it is monitoring and can auto-remediate issues as soon as they are detected. In other words, it is a self-healing system. Some organizations have partial auto-remediation (mainly around core infrastructure components) by leveraging automation tools integrated into the monitoring solution. Obviously, getting to that level of automation requires a high level of confidence in the quality of the detection and alerting system, meaning the alerts should be very accurate with low (near zero) false positives.

When you are looking to invest in a monitoring solution, you should consider what impact it will have on the overall maturity level. Most traditional analytics solutions may have good collectors (mainly for infrastructure metrics), but may fall short when it comes to accurate detection and alerting; the inevitable result, of course, is a flood of alerts. A recent survey revealed that the top two monitoring challenges organizations face are 1) quickly remediating service disruptions and 2) reducing alert noise. The most effective way to address those challenges is by applying machine learning-based anomaly detection that can accurately detect issues before they become crises, enabling teams to quickly resolve them and prevent a significant impact on the business.
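If you want to sketch such a spider diagram yourself, here is a minimal Python example (a hypothetical illustration, not the spreadsheet template mentioned above). It assumes matplotlib and numpy are installed, and the capability names and level values are placeholders to replace with your own assessment:

```python
# Minimal sketch: current vs. desired maturity levels on a spider (radar) diagram.
# Capability names follow the model above; the level values are made-up examples.
import numpy as np
import matplotlib.pyplot as plt

capabilities = ["Collect (Business)", "Collect (Infrastructure)",
                "Detect", "Alert", "Triage", "Remediate"]
current = [2, 3, 2, 2, 1, 1]   # example current maturity levels (1-5)
desired = [4, 4, 4, 4, 3, 3]   # example desired maturity levels (1-5)

# One angle per capability; close the polygon by repeating the first point.
angles = np.linspace(0, 2 * np.pi, len(capabilities), endpoint=False).tolist()
angles += angles[:1]
current += current[:1]
desired += desired[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, current, label="Current")
ax.fill(angles, current, alpha=0.2)
ax.plot(angles, desired, label="Desired")
ax.fill(angles, desired, alpha=0.2)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(capabilities)
ax.set_yticks([1, 2, 3, 4, 5])
ax.set_ylim(0, 5)
ax.legend(loc="upper right")
plt.show()
```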
Blog Post 6 min read

Deliver Results at Scale: Supervised vs. Unsupervised Machine Learning Anomaly Detection Techniques

In this final installment of our three-part series, let’s recap our previous discussions of anomalies – what they are and why we need to find them. Our starting point was that every business has many metrics that it records and analyzes. Each of these business metrics takes the form of a time series of data...
Documents 1 min read

Case Study: Autonomous Monitoring for Telco BSS

Learn how leading telcos are using Anodot's ML-based anomaly detection to ensure business support systems can keep pace with the high level of service required for mission-critical applications.
Documents 1 min read

Case Study: Autonomous Monitoring for Telco OSS

Learn how leading telcos are using Anodot and its real-time alerts to automatically monitor their network operations for proactive incident management.
Blog Post 3 min read

The App Trap: Why Every Mobile App Needs Anomaly Detection

If you're one of the many consumers using native apps 90% of the time you're on your smartphone, you know firsthand that mobile apps are big business. So big, in fact, that they are expected to generate 188.9 billion U.S. dollars in revenues via app stores and in-app advertising by 2020. There's an app for just about everything -- games, ebooks, dating, cooking, shipping, sharing photos and more -- and businesses are developing more and more mobile apps to reach and engage their customers.

But do these apps make money? In addition to charging for the app itself, many app developers monetize through advertising, in-app purchases, referrals and cross-promotions.

https://youtu.be/UBzh4McuFDc

Once a user is in an app, businesses offer targeted advertising, options for premium content such as access to extra levels or additional features, and suggestions for other apps by the same company or for related content, earning referral revenue whenever someone clicks through and converts.

With so many moving parts (e.g. frontend, backend, advertising platforms, partners), there are many opportunities for something to break: partner integration or data format changes, device changes like OS updates or new devices, external changes like media coverage or social media exposure, and company changes like deployments, new game releases, A/B tests and more. Just like the butterfly effect, where the flap of a butterfly's wings can cause a string of events leading to a huge storm, if one element of an application is working less than optimally, it can cause major problems elsewhere, which translates into unhappy customers, uninstalls, revenue losses and drops in market share.

With traditional BI and monitoring tools like dashboards and alerts, you may only realize that something has broken down once your uninstall numbers begin to rise or you notice that users have stopped returning. Only a small percentage of very dedicated users will try a crashing app more than twice, so fixing the problem before you've lost users in droves is of key importance.

So, how can you mitigate problems on your business's mobile app, keeping users happy and engaged? In a recent session at Strata Data San Jose, Ira Cohen, Anodot's Chief Data Scientist and co-founder, presented "The App Trap: Why Every Mobile App Needs Anomaly Detection," showing how to use automated anomaly detection to monitor all areas of your mobile app to fully optimize it. Watch the full video to learn more about the processes involved in automated anomaly detection -- metric collection, normal behavior learning, abnormal behavior learning, behavior topology learning and feedback-based learning -- and how, together, they can keep your app on track, making money, and keeping users happy.
Documents 1 min read

Case Study: Autonomous Monitoring for Telco - OSS, BSS, CEM and More

Learn how telcos are using Anodot to automatically monitor their OSS, BSS and CEM layers and use real-time alerts for proactive incident management.
Blog Post 4 min read

Real-Time Anomaly Detection: Solving Problems, Seizing Opportunities

The business case

In the first of our three-part series, What is anomaly detection?, we summarized how machine learning is enabling real-time, automated incident management. In this second post, we'll discuss the reasons why this capability is so essential to today's data-driven business.

The necessity

In our previous post, we gave an example of a software update causing online sales from Asia to plummet. Obviously an anomaly in online sales volume for any specific region or device type needs to be detected immediately, and the same is true for other anomalies. This is because many real-life business anomalies require immediate action. That bad software update is causing you to lose a lot of money every second. And since discovering the problem is the first step in resolving it, eliminating the delay between when the problem occurs and when it is detected brings you one crucial step closer to rolling back that update and restoring revenue flow from Asia.

This is also true for anomalies which aren't problems to be solved, but opportunities to be seized. For example, an unusual uptick in mobile app installations from a specific geographical area may be due to a successful social media marketing campaign that has gone viral in that region. Given the short lifespan of such surges, your business has a limited time window in which to capitalize on this popularity and turn all those shares, likes and tweets into sales.

Real-time anomaly detection is advantageous even when the detected anomalies include ones which don't require an immediate response. This is because you can always choose to postpone action on an instant alert, but you can never react in real time to a delayed alert. In other words, real-time anomaly detection is always preferable to delayed detection.

But let's think about it: what kind of anomaly detection system is able to provide this type of real-time notification? For only one or a few KPIs, a human monitoring a dashboard may work. This manual approach, however, is not scalable to thousands or millions of metrics while maintaining real-time responsiveness. Beyond the mere number of metrics in many businesses is the complexity of each individual metric: different metrics have different patterns (or no patterns at all) and different amounts of variability in the values of the sampled data. In addition, the metrics themselves are often changing, exhibiting different patterns as the data settles into a new "normal."

Manual vs. automated anomaly detection

If manual anomaly detection is inadequate, then automated anomaly detection must be used to achieve real-time anomaly detection at large scale, and it must be sophisticated enough to handle all the complexity described above at the scale of millions of data points or more, updating every second. The machine learning algorithms that power Anodot's automated anomaly detection system utilize the latest in AI research to meet this task.

Our patented machine learning algorithms fall under the "online" category. This means that each data point in the sequence is processed only once and then never considered again. Online machine learning applications have the added benefit of scalability to the massive number of metrics businesses keep track of. As each data point is processed, the online machine learning algorithms work in a way similar to the human brain in the jogger example of the previous post:

1. A model which fits the data is created.
2. This model, in turn, is used to predict the value of the next data point.
3. If the next data point differs significantly from what the model predicted, that data point is flagged as a potential anomaly.
4. Anodot's machine learning algorithms use each new data point to intelligently update the model.

AI anomaly detection in the real world

This application of AI to spot anomalies, and the opportunities they present, far faster than humans could has already been put to great scientific use. An AI system developed by NASA's Jet Propulsion Laboratory was able to detect and command an orbital satellite to image a rare volcanic event in Ethiopia -- before volcanologists even asked NASA for that satellite to take images of the eruption.

When working with thousands or millions of metrics, real-time decision making requires online machine learning algorithms. Whether it's saving your business money or gleaning scientific insights from a brief volcanic eruption, real-time anomaly detection has enormous potential for catching the important deviations in the data. In the third post, we'll dive a little deeper into the anomaly detection techniques which power Anodot's software.
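To make the online approach concrete, here is a minimal, hypothetical sketch (an illustration only, not Anodot's actual patented algorithm): a simple model built from an exponentially weighted mean and variance predicts the next value, flags points that deviate too far from that prediction as potential anomalies, and updates itself with every new point, so each data point is processed only once. The parameters (alpha, threshold, min_points) and the sample stream are assumptions for illustration.

```python
# Minimal, illustrative sketch of online anomaly detection (not Anodot's actual algorithm).
# A simple model (exponentially weighted mean and variance) predicts the next value;
# points that deviate too far from the prediction are flagged as potential anomalies,
# and the model is updated with every new point, so each point is processed only once.
import math

class OnlineAnomalyDetector:
    def __init__(self, alpha=0.1, threshold=3.0, min_points=5):
        self.alpha = alpha            # how quickly the model adapts to new data
        self.threshold = threshold    # flag deviations larger than threshold * std
        self.min_points = min_points  # warm-up before flagging anything
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, value):
        """Process one data point; return True if it looks anomalous."""
        if self.mean is None:         # the first point just initializes the model
            self.mean = value
            self.n = 1
            return False
        std = math.sqrt(self.var)
        is_anomaly = (self.n >= self.min_points and std > 0
                      and abs(value - self.mean) > self.threshold * std)
        # Update the model with the new point (online: single pass, no history kept).
        diff = value - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        self.n += 1
        return is_anomaly

detector = OnlineAnomalyDetector()
stream = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]   # made-up metric values
for t, value in enumerate(stream):
    if detector.update(value):
        print(f"potential anomaly at t={t}: value={value}")
```

A production system would replace this single fixed model with a model chosen per metric (seasonal, trending, sparse, and so on), which is exactly the complexity-at-scale problem described above.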
Documents 1 min read

Case Study: How 5 Leading Adtech Companies Used AI Analytics to Save Millions

Learn how leading adtech companies -- including Rubicon Project, Uprise and NetSeer -- are leveraging the power of machine learning to find outliers in time series data and turn them into valuable business insights.
Documents 1 min read

White Paper: The Build or Buy Dilemma For AI-Based Anomaly Detection

Leveraging the vast amount of business data available today to better meet customer needs and detect business incidents presents organizations with the challenge of whether to build their own anomaly detection system or buy one ready-made. Before organizations make this critical decision, it is important to weigh the benefits and challenges of each approach.