• Beyond accuracy, the False Positive and False Negative rates are sensible, intuitive ways of assessing performance
  • Not all anomaly detectors are equal: performance scores can differ substantially between detectors operating on the same real-life time-series data for business metrics
  • On our test data, Avora’s anomaly detector achieves better performance than Facebook Kats, with significantly lower False Positive and False Negative rates and comparable accuracy
  • Even lower False Positive/Negative rates can be achieved with hyperparameter tuning, with no reduction in accuracy

Businesses around the world have ever more data they can use to analyse performance and make data-driven decisions. However, many companies find themselves with more data than people can possibly track and analyse. As a result, AI-powered business intelligence tools, and anomaly detection in particular, play an increasingly important role in business success.

There is no shortage of business intelligence offerings, but it is often hard to evaluate their performance and quantify the potential impact a tool can have on the business. Among the reasons that make evaluation hard are:

  1. A lack of comparative benchmark datasets that reflect noisy, real-life business performance data
  2. Performance being described with complex scientific metrics that do not translate easily into business terms

At Avora, we have created an evaluation pipeline using real-life business time series to benchmark Avora’s performance against the well-known Facebook Kats anomaly detector, which is closely linked to the popular Facebook Prophet package.
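For context, running the Kats outlier detector on a single metric looks roughly like the sketch below. It follows the usage shown in the Kats documentation; the file name sales.csv and the column names are placeholders for your own data.

```python
import pandas as pd
from kats.consts import TimeSeriesData
from kats.detectors.outlier import OutlierDetector

# Placeholder input: a daily Sales metric with 'time' and 'value' columns
df = pd.read_csv("sales.csv", parse_dates=["time"])
ts = TimeSeriesData(df)

# "additive" treats anomalies as deviations from an additive
# seasonal decomposition of the series
detector = OutlierDetector(ts, "additive")
detector.detector()

# Timestamps flagged as anomalous
print(detector.outliers[0])
```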

Beyond accuracy, the most commonly used metrics for evaluating anomaly detection solutions are F1 Score, Precision and Recall. One can think about these metrics in the following way:

  • Recall answers the question: what proportion of true anomalies was identified? It is calculated as Recall = TP / (TP + FN), where TP is the count of true positives (correctly flagged anomalies) and FN the count of false negatives (missed anomalies)
  • Precision answers the question: what proportion of identified anomalies are true anomalies? It is calculated as Precision = TP / (TP + FP), where FP is the count of false positives (normal points flagged as anomalous)
  • F1 Score summarises the overall performance of the anomaly detection model by combining Recall and Precision using their harmonic mean: F1 = 2 × Precision × Recall / (Precision + Recall)
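
To make these definitions concrete, here is a minimal sketch that computes all three metrics from the confusion-matrix counts (the function name and signature are ours, not part of any library):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute Precision, Recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged points that were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # real anomalies that were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return precision, recall, f1
```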

For example:

You are tracking the Sales metric as one of your KPIs. You receive a notification that 10 anomalies have been identified. You check the graph and confirm that only 6 of the 10 dates are indeed anomalies. However, you also notice that there are 9 other dates for which the Sales metric…
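
If we assume the 9 other dates are true anomalies that the detector missed, the counts become TP = 6, FP = 4 and FN = 9, and the helper above gives:

```python
# Assumption: the 9 other dates are real anomalies the detector missed.
# 6 of the 10 flagged dates are true positives, 4 are false positives,
# and 9 true anomalies went undetected.
precision, recall, f1 = precision_recall_f1(tp=6, fp=4, fn=9)
print(f"Precision = {precision:.2f}")  # 6 / 10 = 0.60
print(f"Recall    = {recall:.2f}")     # 6 / 15 = 0.40
print(f"F1        = {f1:.2f}")         # 2 * 0.60 * 0.40 / 1.00 = 0.48
```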

Continue reading: https://towardsdatascience.com/anomaly-detection-how-to-tell-good-performance-from-bad-b57116d71a10?source=rss—-7f60cf5620c9—4
