
Reasoner for Multimedia Content Delivery Analysis - RC1 Reasoner

General overview

For the Measurements for Multimedia Content Delivery Use Case (MMUC) NETVISOR has developed the RC1 Reasoner.

The RC1 Reasoner has been released together with NETVISOR's EZrepo repository to support the special monitoring and troubleshooting needs of multimedia content delivery networks. The figure below illustrates a typical multimedia delivery scenario.

In this scenario, the Reasoner works in a troubleshooting/diagnostic role. The event that triggers the Reasoner process is a measured quality impairment, and the output of the diagnostics is an estimate of one or more causes responsible for the impairment. The Reasoner is used by the network operator (ISP) that provides access for clients requesting multimedia (streaming) content.

The typical error detection process has the following stages:

  • Isolation of the problem/bottleneck domain. The very first task is to determine whether the problem occurred within the ISP's network or outside it, and, if within, where. If it is an outside error, the operator should notify and assist the content provider and inform the affected customers; otherwise the operator should trigger the internal troubleshooting process.
  • Additional measurements to find the root cause. In most cases the root cause is not trivial, and additional measurements / data collections need to be deployed in order to pinpoint it exactly.
  • Find the root cause and remediate. Once the root cause has been found, the operator starts the remediation process.

The RC1 Reasoner helps in stage 1, i.e. in determining the problem domain. We assume that network operators usually have diagnostic tools and procedures already in place to deal with internal network problems. Thus the mPlane Reasoner just triggers this process by proving that a capacity or other problem is present in the network, and points at a location or component where the investigation should start.

Operation of the Reasoner

 

I - Architecture

As shown in the picture, RC1 Reasoner has 3 main modules:

  • Reasoner GUI
  • Topology database
  • Diagnosis Ruleset

RC1 makes its decisions based on the metrics received from the probes and/or repositories and on the topology information stored in the Topology DB, using the Diagnosis Ruleset.

 

II - Periods

 

The starting concept of Reasoner operation is the Period: a tuple of attributes describing the availability and access quality of some service (e.g. network connectivity or availability of some piece of content) during a certain period of time. Periods are created based on the qualification, i.e. "grading", of Probe measurements along various criteria.

As an example, an OTT probe measuring a piece of content may experience "EXCELLENT" quality for a long time, then the quality may drop to "NOT_AVAILABLE", possibly going through a "POOR" period before that. (We recommend simple grading with a low number of options, like this 3-step classification.)

The structure of each Period is as follows:

  • MeasType: the type of the probe/measurement represented by the Period
  • Begin: start time specification for the Period
  • End: end time specification for the Period
  • ProbeID: the probe producing the measurement
  • ClientID: the initiator of this measurement Period. For active probes, ClientID and ProbeID are equal.
  • ServerID: the server or responder terminating the measurement
  • ContentID: which program/content has been accessed
  • NodePath: a list of relevant topological nodes on the path between the client and the server. It is not necessary to enumerate all routers/switches on the signal path individually; only the devices that distribute or join the path from/to various services and vantage points are needed. As an example, the typical access of a subscriber of provider 'A' to some video content CDN server of provider 'B' could span the following signal path: (1) the subscriber's home network, (2) the access node (DSLAM, OLT, CMTS) of Prov_A that serves the subscriber, (3) the Prov_A core network cloud, (4) the Prov_A peering router, (5) the Prov_B peering router, (6) the Prov_B data center.
  • OverallGrade: the overall quality classification for the Period. { EXCELLENT | POOR | NOT_AVAILABLE }
  • BandwidthGrade: the classification of the bandwidth experienced through this Period, i.e. whether any slow downloads or slow IGMP responses were recorded. { EXCELLENT | FAIR | POOR | NOT_AVAILABLE }
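
The Period structure listed above can be sketched as a small record type. This is only an illustration, not the RC1 implementation: the Python field names, the sample identifiers and the validation in __post_init__ are assumptions made for this example; only the attribute list and the grade values come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

# Grade values as listed in the text.
OVERALL_GRADES = ("EXCELLENT", "POOR", "NOT_AVAILABLE")
BANDWIDTH_GRADES = ("EXCELLENT", "FAIR", "POOR", "NOT_AVAILABLE")


@dataclass(frozen=True)
class Period:
    meas_type: str              # MeasType
    begin: datetime             # Begin
    end: datetime               # End
    probe_id: str               # ProbeID
    client_id: str              # ClientID
    server_id: str              # ServerID
    content_id: str             # ContentID
    node_path: Tuple[str, ...]  # NodePath, ordered from client to server
    overall_grade: str          # OverallGrade
    bandwidth_grade: str        # BandwidthGrade

    def __post_init__(self):
        # Hypothetical sanity checks, not part of the original description.
        if self.overall_grade not in OVERALL_GRADES:
            raise ValueError(f"unknown overall grade: {self.overall_grade}")
        if self.bandwidth_grade not in BANDWIDTH_GRADES:
            raise ValueError(f"unknown bandwidth grade: {self.bandwidth_grade}")
        if self.end < self.begin:
            raise ValueError("a Period cannot end before it begins")


# Made-up example values; the node names mirror the six-hop path in the text.
example = Period(
    meas_type="OTT",
    begin=datetime(2015, 6, 1, 12, 0),
    end=datetime(2015, 6, 1, 12, 30),
    probe_id="probe-1",
    client_id="probe-1",  # active probe: ClientID equals ProbeID
    server_id="prov_b-cdn-1",
    content_id="content-42",
    node_path=("home-net", "prov_a-access", "prov_a-core",
               "prov_a-peering", "prov_b-peering", "prov_b-dc"),
    overall_grade="EXCELLENT",
    bandwidth_grade="FAIR",
)
```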

Periods are created from probe measurements by aggregating subsequent measurements into longer periods until any of the classification criteria produces a different "grade". The example above shows two grade properties (for bandwidth and for overall quality); we expect that Periods will generally have only a few, i.e. up to 5, grade properties each. This avoids excessive segmentation of measurements into small periods.

It should be emphasized that a Period only covers repeated samples of a certain measurement (specification) on a certain probe.
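
The aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions: samples are taken to be (timestamp, overall grade, bandwidth grade) tuples sorted by time and coming from one measurement on one probe, the function name build_periods is hypothetical, and the end of a Period is taken to be the timestamp of its last sample.

```python
from itertools import groupby


def build_periods(samples):
    """Merge consecutive samples with identical grades into Periods.

    `samples` is a time-ordered list of (timestamp, overall, bandwidth)
    tuples from a single measurement on a single probe (assumed layout).
    """
    periods = []
    # groupby only merges *adjacent* samples, so a new Period starts as
    # soon as any grade property changes.
    for grades, run in groupby(samples, key=lambda s: (s[1], s[2])):
        run = list(run)
        periods.append({
            "begin": run[0][0],
            "end": run[-1][0],
            "overall": grades[0],
            "bandwidth": grades[1],
            "samples": len(run),
        })
    return periods


# Made-up samples, one per minute (timestamps in seconds).
samples = [
    (0, "EXCELLENT", "EXCELLENT"),
    (60, "EXCELLENT", "EXCELLENT"),
    (120, "EXCELLENT", "FAIR"),
    (180, "POOR", "POOR"),
    (240, "POOR", "POOR"),
    (300, "NOT_AVAILABLE", "NOT_AVAILABLE"),
]
periods = build_periods(samples)
```

With these six samples the sketch yields four Periods, because the grade pair changes three times.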

The low number of grade properties and a reasonable classification system (offering only a few grade values for each property) should result in relatively few Periods, i.e. a large number of measurements will be represented very concisely through Periods.

 

III - A priori topological information

 

A principal assumption of our UC and our Reasoner is that the topology of the network and the probes are fully known to the Reasoner. Consequently we have exact knowledge of the location and address of each node in the network: probes, content servers, client machines reporting degradations, etc.

This knowledge is also supposed to include the capability of recognizing the path between any client/probe and server/responder, in the form of the NodePath list property explained above.

While the identification of the path of nodes is feasible for most typical network scenarios (especially if the NodePath is not too detailed, as shown in the example above), we understand that this is a strict assumption which cannot easily handle redundancies (i.e. clients with multiple access lines) and dynamic path selection. With this in mind, the assumption may be revisited and possibly relaxed at a later phase.

A practical source of topological information could be the network management systems (NMSs) of the network operators involved, which continuously monitor the state of all devices in the network and can periodically produce a full snapshot of the topology.

 

IV - Diagnosis/Iterative Analysis Graph

 

The Diagnosis mechanism is based on Criteria-based searches over the set of all available Periods; each query basically produces a count of the matching Periods. These count values are then used in the rules as operands of an arithmetic/logical expression. If the expression evaluates to true, the rule is fired; otherwise no action is taken.

 

Criteria for Period selection

The Criteria used in the searches have some key features that make them suitable for use in diagnosis rules later in the Reasoner. In short, we define a practical criteria grammar for each property type to cover all typical query scenarios.

Begin and End: to compare a Period with a certain point in time T. The available comparison primitives are:

BB(<T>) [begins before]
BA(<T>) [begins after]
EB(<T>) [ends before]
EA(<T>) [ends after]

Here T can be an absolute time or (more frequently) a time relative to "Now" or "Today", like "Now - 20 mins". An example of a full time criterion for the latest 5-min period:

BB(Now-5min) AND EA(Now)
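
The four time primitives can be sketched as predicate factories over a Period. This is an illustration only: representing a Period as a dict with "begin" and "end" keys, the AND helper, and treating the boundaries as inclusive are all assumptions; the text does not say whether the comparisons are strict.

```python
from datetime import datetime, timedelta


def BB(t):
    """Begins before t (boundary assumed inclusive)."""
    return lambda period: period["begin"] <= t


def BA(t):
    """Begins after t (boundary assumed inclusive)."""
    return lambda period: period["begin"] >= t


def EB(t):
    """Ends before t (boundary assumed inclusive)."""
    return lambda period: period["end"] <= t


def EA(t):
    """Ends after t (boundary assumed inclusive)."""
    return lambda period: period["end"] >= t


def AND(*criteria):
    """Hypothetical combinator: true only if every criterion matches."""
    return lambda period: all(c(period) for c in criteria)


now = datetime(2015, 6, 1, 12, 0)  # fixed stand-in for "Now"
latest_5_min = AND(BB(now - timedelta(minutes=5)), EA(now))

long_period = {"begin": now - timedelta(minutes=30), "end": now}
short_period = {"begin": now - timedelta(minutes=2), "end": now}
```

Here latest_5_min(long_period) holds because the Period spans the whole window, while short_period started only 2 minutes ago and is rejected.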

ServerID, ProbeID, ClientID, ContentID, MeasType: the supported filtering primitives are EQ(<id>) (for strict equality), MATCH(<regexp>) for partial textual matching (on the ID), and MEMBEROF(<set>) for matching based on (predefined) sets. For network nodes only, an additional primitive NETMATCH(<netaddr/netmask>) is also available for network address based matching.
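
A possible reading of these ID filtering primitives, sketched with the Python standard library. The assumptions are: MATCH uses an unanchored regular-expression search, sets are plain collections of ID strings, and NETMATCH is applied to nodes identified by an IP address. None of these details are specified in the text.

```python
import ipaddress
import re


def EQ(value):
    """Strict equality on the ID."""
    return lambda ident: ident == value


def MATCH(pattern):
    """Partial textual match; assumed to be an unanchored regexp search."""
    rx = re.compile(pattern)
    return lambda ident: rx.search(ident) is not None


def MEMBEROF(members):
    """Membership in a predefined set of IDs."""
    members = frozenset(members)
    return lambda ident: ident in members


def NETMATCH(net):
    """Network-address match; accepts 'addr/prefix' or 'addr/netmask'."""
    network = ipaddress.ip_network(net, strict=False)
    return lambda addr: ipaddress.ip_address(addr) in network


# Hypothetical IDs and addresses, for illustration only.
is_cdn_server = MATCH(r"^cdn-")
in_dc_subnet = NETMATCH("10.1.0.0/255.255.0.0")
monitored_probes = MEMBEROF({"probe-1", "probe-2"})
```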

Grades can be compared with the usual '<=' '<' '=' '!=' '>' '>=' comparison operators, which make use of the grades' ordered nature, i.e. that they always represent a "ladder of qualities".

The NodePath matching primitive is PASS(<node>), which selects traffic passing through a given node. Using a compound expression like 'PASS(nodeA) AND !PASS(nodeB)', we can select all traffic running through nodeA but branching off before nodeB.
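
The grade ladder and the PASS primitive can be illustrated as follows. This is a sketch, not the RC1 code: the numeric ranks are an assumption (only their order matters, with a higher rank meaning better quality), and the node and client names are made up.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Assumed ladder of qualities; a higher value means better quality."""
    NOT_AVAILABLE = 0
    POOR = 1
    FAIR = 2
    EXCELLENT = 3


def PASS(node):
    """True if the given node appears on the NodePath."""
    return lambda node_path: node in node_path


def through_a_not_b(node_path):
    # Equivalent of 'PASS(nodeA) AND !PASS(nodeB)'.
    return PASS("nodeA")(node_path) and not PASS("nodeB")(node_path)


# Hypothetical NodePaths keyed by client.
paths = {
    "client1": ["home1", "nodeA", "nodeB", "server1"],
    "client2": ["home2", "nodeA", "cache1"],
    "client3": ["home3", "nodeC", "nodeB", "server1"],
}
selected = [name for name, path in paths.items() if through_a_not_b(path)]

# A test like BandwidthGrade <= "POOR" selects POOR and everything below it.
poor_or_worse = [g.name for g in Grade if g <= Grade.POOR]
```

Only client2 is selected here, since its path runs through nodeA and leaves before reaching nodeB.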

Note that criteria can be parametrized, i.e. the same criterion can be applied to searches on traffic with serverA and with serverB by choosing different parameters. An example of a "complete" real-world criterion could be

MeasType EQ("OTT") AND BandwidthGrade <= "POOR" AND PASS($node1) AND PASS($node2)
AND BB(Now - 10m) AND EA(Now)

which selects Periods of OTT measurements with a poor or worse bandwidth grade that pass through certain nodes (defined through parameters) and fully cover the last 10 minutes.
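
The complete criterion and its match count could be sketched as below. This is illustrative only: Periods are represented as plain dicts, the grade ranks and inclusive time boundaries are assumptions, and make_criterion and count_matches are hypothetical helper names. Binding $node1 and $node2 is modelled as ordinary function arguments.

```python
from datetime import datetime, timedelta

# Assumed ordering of the grade ladder (higher rank = better quality).
GRADE_RANK = {"NOT_AVAILABLE": 0, "POOR": 1, "FAIR": 2, "EXCELLENT": 3}


def make_criterion(node1, node2, now):
    """Bind the parameters of the example criterion and return a predicate."""
    window_start = now - timedelta(minutes=10)

    def matches(p):
        return (
            p["meas_type"] == "OTT"
            and GRADE_RANK[p["bandwidth_grade"]] <= GRADE_RANK["POOR"]
            and node1 in p["node_path"]     # PASS($node1)
            and node2 in p["node_path"]     # PASS($node2)
            and p["begin"] <= window_start  # BB(Now - 10m)
            and p["end"] >= now             # EA(Now)
        )

    return matches


def count_matches(criterion, periods):
    """Number of Periods matching the criterion (the value used in rules)."""
    return sum(1 for p in periods if criterion(p))


now = datetime(2015, 6, 1, 12, 0)  # fixed stand-in for "Now"
half_hour_ago = now - timedelta(minutes=30)
path_1 = ["access-1", "core", "peering-a"]
path_2 = ["access-2", "core", "peering-a"]

# Made-up Periods for illustration.
periods = [
    {"meas_type": "OTT", "bandwidth_grade": "POOR", "node_path": path_1,
     "begin": half_hour_ago, "end": now},
    {"meas_type": "OTT", "bandwidth_grade": "EXCELLENT", "node_path": path_1,
     "begin": half_hour_ago, "end": now},
    {"meas_type": "OTT", "bandwidth_grade": "NOT_AVAILABLE", "node_path": path_2,
     "begin": half_hour_ago, "end": now},
    {"meas_type": "OTT", "bandwidth_grade": "POOR", "node_path": path_1,
     "begin": now - timedelta(minutes=3), "end": now},
]

count_access_1 = count_matches(make_criterion("access-1", "peering-a", now), periods)
count_core = count_matches(make_criterion("core", "peering-a", now), periods)
```

The same criterion yields different counts depending on the bound nodes, which is what lets one rule be reused across servers or network segments.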

 

Analysis rules

At the conceptual level, our Reasoner continually matches Analysis rules onto the set of Periods produced from probe measurements, and fires rules where the left-hand side matches. In practice this has to be constrained to some periodic re-checking of Criteria in the rules, and recalculation of the expressions, but we will try to apply this conceptual model quite closely.

Analysis rules have the generic form

<criteria-expression> --> <action>

where <criteria-expression> is a standard arithmetic/logical expression built from constants and the match counts of certain Criteria (possibly with specified parameters), and <action> can be any of:

  • Diagnose(<message>): create a diagnosis with a message.
  • Start<meas>: start an on-demand measurement, which will eventually produce additional Periods and possibly fire additional Reasoner rules.
  • Stop<meas>: clean up on-demand measurements which are considered superfluous.

An example:
CRIT_A(ServerC) + CRIT_A(ServerD) > 0.5 * CRIT_C - 0.1 * CRIT_F --> Diagnose("Network error in Data Center X");
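
A rule of this form can be evaluated with a very small loop once the match counts are known. The following is a sketch under stated assumptions: the counts are supplied as a precomputed dict with made-up values, each rule is a (condition, action) pair, and evaluate_rules is a hypothetical name; how RC1 actually schedules and fires rules is not described here.

```python
def evaluate_rules(rules, counts):
    """Return the actions of all rules whose expression evaluates to true."""
    fired = []
    for condition, action in rules:
        if condition(counts):
            fired.append(action)
    return fired


# The example rule, with match counts looked up by (criterion, parameter).
rules = [
    (
        lambda c: c["CRIT_A", "ServerC"] + c["CRIT_A", "ServerD"]
        > 0.5 * c["CRIT_C"] - 0.1 * c["CRIT_F"],
        ("Diagnose", "Network error in Data Center X"),
    ),
]

# Made-up match counts for illustration.
counts_bad = {("CRIT_A", "ServerC"): 3, ("CRIT_A", "ServerD"): 1,
              "CRIT_C": 6, "CRIT_F": 10}
counts_ok = {("CRIT_A", "ServerC"): 0, ("CRIT_A", "ServerD"): 0,
             "CRIT_C": 6, "CRIT_F": 10}

fired_bad = evaluate_rules(rules, counts_bad)  # 4 > 3.0 - 1.0, rule fires
fired_ok = evaluate_rules(rules, counts_ok)    # 0 > 2.0 is false, no action
```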

 

Software Location

 

The RC1 Reasoner is downloadable from GitHub.