Guidelines for Anomaly detection and root cause analysis in large-scale networks

Requirements

This page details the use-case-specific requirements, in addition to those expressed for the Reference demonstration environment (link).

Note that the interaction between the components listed on this page is illustrated schematically on the corresponding use case description webpage (link). This page assumes knowledge of these interactions and only gives guidelines to set up and start each component.

 

 

Hardware list

  • One dedicated machine running Tstat as a passive traffic probe
  • One dedicated machine for the repository running DBStream
  • One machine running DisNETPerf (not necessarily dedicated; it can be the same machine that runs the use-case reasoner)

 

Software list

  • mPlane framework for the use-case (GitHub repository), containing the mPlane Supervisor, the Reasoner (mpAD_Reasoner) and the proxy for ADTool. 

Components

Repositories

Reasoner

 

Software installation

  • Download and install Tstat, following the instructions at the GitHub repository
  • Download and install DBStream, following the instructions at the GitHub repository
  • Download and install the ADTool analysis module, following the instructions at the corresponding mPlane page
  • Download and install the mPlane framework for the use-case, which already provides the mPlane proxy to the ADTool, from the GitHub repository
  • Download and install the mPlane RIPE Atlas proxy, following the instructions at the GitHub repository
  • Download and install DisNETPerf, following the instructions at the GitHub repository

 

Software configuration

  • Run the mPlane Supervisor:

./scripts/mpsup --config ./conf/supervisor.conf

  • Run the mPlane Client:

./scripts/mpcli --config ./conf/client.conf

  • Run the Tstat proxy:

./scripts/mpcom --config ./mplane/components/tstat/conf/tstat.conf

  • Run the Repository proxy:

./scripts/mpcom --config ./mplane/components/tstat/conf/tstatrepository.conf

Components

  • Run the ADTool proxy:

./scripts/mpcom --config ./mplane/components/ADTool/conf/adtool.conf

  • Run the RIPE Atlas proxy:

./scripts/mpcom --config ./mplane/components/ripe-atlas/conf/component.conf
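The start-up commands above can be collected into a small launcher script. The following is only a minimal sketch: the `launch` helper and the `DRY_RUN` variable are assumptions introduced for illustration and are not part of the mPlane framework. It assumes the script is run from the root of the mPlane framework checkout and that the Supervisor should be started before the proxies. The mPlane Client is left out because it is an interactive shell.

```shell
#!/bin/sh
# Sketch only: start the Supervisor and the component proxies listed above.
# DRY_RUN and launch() are hypothetical helpers, not part of mPlane.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

launch() {
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be started, without running anything.
        echo "would run: $*"
    else
        # Start the process in the background so the next one can follow.
        "$@" &
    fi
}

# Supervisor first (assumption: the proxies need it to be up).
launch ./scripts/mpsup --config ./conf/supervisor.conf

# Component proxies, with the configuration paths given on this page.
launch ./scripts/mpcom --config ./mplane/components/tstat/conf/tstat.conf
launch ./scripts/mpcom --config ./mplane/components/tstat/conf/tstatrepository.conf
launch ./scripts/mpcom --config ./mplane/components/ADTool/conf/adtool.conf
launch ./scripts/mpcom --config ./mplane/components/ripe-atlas/conf/component.conf
```

Setting `DRY_RUN=0` in the environment makes the script actually start the processes. In practice a delay or a readiness check between the Supervisor and the proxies may be needed.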

Repositories

  • Run both DBStream and the MATH importer module, math_repo:

./hydra --config sc_tstat.xml

./math_repo

  • Run Tstat and the MATH exporter module, math_probe, using the mPlane Client shell:

|mplane| runcap tstat-log_tcp_complete-core

|when| = now + inf

|mplane| runcap tstat-exporter_log

repository.url = localhost:3000

Reasoner

  • no specific configuration

Demonstration environment

  • Import into DBStream the external data provided by geolocation services such as MaxMind and by IP address analysis services such as the one provided by the Team Cymru community.


 

Step-by-step walkthrough

 

Warmup

The use-case is run by starting the mpAD_Reasoner, which interacts with all the mPlane components through the mPlane Supervisor using the mPlane RI protocol. It orchestrates all the tasks needed to automate the detection and diagnosis of anomalies occurring in the distribution of YouTube videos.

  • Run the mpAD_Reasoner:

./scripts/mpadtoolreasoner --config ./conf/mpadclient.conf

This use-case shall run for several months, collecting and analyzing YouTube measurements in search of anomalies. However, we cannot guarantee that major anomalous events will occur in the YouTube video provisioning during the deployment period. To showcase the detection and diagnosis procedure for a major event, we rely on the analysis of historical data in which we have detected and investigated a major anomaly; these data have been pre-imported into the use-case repository.

Consequently, the use-case demo consists of two scenarios.

 

Scenario 1: use-case proof-of-concept

This scenario is intended to showcase that the mPlane Anomaly Detection modules can effectively detect anomalous behaviors related to both QoS-based and QoE-based performance metrics, and help in the root cause analysis investigation. The demo is based on historical YouTube traces pre-loaded in DBStream.

Trigger

Once the mpAD_Reasoner is started, the ADTool starts analysing the preloaded YouTube traces. After a few analysis rounds, ADTool starts flagging anomalies related to the degradation of YouTube users' QoE. Along with each anomaly, the tool returns to the reasoner the list of server IP addresses involved.

Observe

  • The reasoner client displays the list of server IP addresses involved in the anomaly, along with the affected traffic features.
  • DisNETPerf is used to locate the RIPE Atlas probes closest to the servers involved in the anomaly.
  • The reasoner triggers, via the mPlane RIPE Atlas proxy, traceroutes from the identified probes towards the passive vantage point where Tstat is deployed.
  • The reasoner combines the results returned by RIPE Atlas with the historical passive measurements to generate a report about the flagged YouTube server IP addresses.
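The sequence of steps above can be pictured with the following control-flow sketch. It is purely illustrative: every function name, probe ID and output format is hypothetical, the IP addresses are taken from the reserved documentation ranges, and nothing here calls ADTool, DisNETPerf or RIPE Atlas. It only shows the order in which the reasoner chains the steps.

```shell
#!/bin/sh
# Illustrative stubs only: these names and values are NOT part of mPlane,
# ADTool, DisNETPerf or RIPE Atlas. They stand in for the real steps.

flagged_servers() {
    # Stand-in for the server IPs returned by ADTool with an anomaly
    # (made-up addresses from the documentation range 192.0.2.0/24).
    printf '%s\n' 192.0.2.10 192.0.2.20
}

closest_probe() {
    # Stand-in for a DisNETPerf lookup: server IP -> made-up probe ID.
    case "$1" in
        192.0.2.10) echo probe-A ;;
        192.0.2.20) echo probe-B ;;
        *)          echo unknown ;;
    esac
}

# Placeholder address for the passive vantage point where Tstat runs.
VANTAGE_POINT="198.51.100.1"

# For each flagged server, pick the closest probe and note the traceroute
# that the reasoner would request through the RIPE Atlas proxy.
flagged_servers | while read -r ip; do
    probe=$(closest_probe "$ip")
    echo "server=$ip probe=$probe action=traceroute target=$VANTAGE_POINT"
done
```

In the real use-case these steps are carried out by the mpAD_Reasoner through the mPlane Supervisor; the stubs merely make the data flow between the steps explicit.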

 

Scenario 2: real-time correlation of passive and active measurements 

This part of the demo is meant to showcase that the mPlane reasoner can orchestrate the live collection of passive and periodic active measurements and trigger further active measurements on the fly. DBStream stores and analyzes all the collected measurements in near real time. This part of the demo uses live traffic from the Tstat probe at PoliTo/Fastweb (passive), active measurements returned by DisNETPerf (periodic/continuous, based on RIPE Atlas), and the RIPE Atlas mPlane proxy instantiation (reactive, on-demand).

The mpAD_Reasoner instructs ADTool to run on DBStream tables containing live data collected from the Tstat probe. The demo workflow for Scenario 2 resembles that of Scenario 1, the only difference being that there is no guarantee that any anomaly is flagged at run time.

Trigger

For demo purposes, a subset of YouTube servers (e.g., those serving most of the traffic) is selected via the reasoner terminal to be further investigated through the RIPE Atlas active measurements.

Observe

Analysis results are reported in a manner similar to Scenario 1.

 

The pictures below show the typical output of the ADTool running on a number of traffic features. The detector outputs are then used by the reasoner to generate the anomaly report. For instance, the red dots in the pictures mark situations that can trigger alarms for anomalies detected by the reasoner.