The mPlane platform is built around a simple architecture consisting of just two entities: components, which make measurement and analysis available by advertising capabilities; and clients, which make use of those capabilities by sending specifications to the components matching those capabilities. From this simple interaction, measurement infrastructures of arbitrary complexity can be built. But doesn't this basically mean we've just re-invented middleware? Why invent a new way to send measurement instructions from one device to another?
mPlane is architected around three principles, derived from the challenges presented by the distributed, multiscalar, heterogeneous environment in which it must operate: schema-centric measurement definition, explicit iterative measurement support, and weak imperativeness.
Schema-Centric Measurement Definition
Translating mPlane into traditional remote procedure call (RPC) terms, a capability is roughly equivalent to a procedure declaration, and a specification to a procedure call. The capability has parameters, values which must be filled in to invoke that capability; as well as results, which will be returned in a result message or via indirect export to some third entity once the measurement is complete. Unlike in traditional RPC, schema-centric measurement definition means the "identity" of the function bound to a capability is defined by the parameters it accepts and the data it returns in the result. Capabilities have no names, only labels for human readability, so it is the names of the parameters and results that are significant, not the name of the function. Parameter and result names are themselves defined in extensible element registries, identified by URL and supporting inheritance. This allows anyone to extend a registry with new elements should the set of available elements not meet the specific requirements of a given measurement.
In addition to parameters and results, capabilities can also carry metadata defined in terms of these elements. Metadata describes the environment in which measurements are taken, and can be thought of as parameters which cannot be changed. A client can use metadata to determine whether or not to invoke a given capability, while metadata in result messages can be used to help interpret and analyze these results.
When used with carefully defined elements, this design pattern allows measurement comparability and repeatability in heterogeneous environments to a far greater extent than traditional RPC, as function definitions are composed from the data types they generate, as opposed to requiring a registry of function names.
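As a rough illustration of matching by schema rather than by name, the sketch below models a capability as a label plus two sets of element names. The `Capability` class, the `find_capabilities` helper, and the element names used here are assumptions made for this example; they are not the SDK's API or a statement of what the registry contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Illustrative stand-in for a capability; not the SDK's class."""
    label: str             # human-readable only, never used for matching
    parameters: frozenset  # element names a specification must fill in
    results: frozenset     # element names the measurement returns

    def schema(self):
        # The identity of the measurement: what goes in and what comes out.
        return (self.parameters, self.results)


def find_capabilities(capabilities, have_parameters, want_results):
    """Keep capabilities whose parameters we can fill and whose
    results include everything we want."""
    have, want = frozenset(have_parameters), frozenset(want_results)
    return [c for c in capabilities
            if c.parameters <= have and want <= c.results]


# Element names below are assumed for illustration.
ENDPOINTS = frozenset({"source.ip4", "destination.ip4"})
ping_a = Capability("probe-a-ping", ENDPOINTS,
                    frozenset({"delay.twoway.icmp.us"}))
ping_b = Capability("vendor-b-latency", ENDPOINTS,
                    frozenset({"delay.twoway.icmp.us"}))
trace = Capability("probe-a-trace", ENDPOINTS, frozenset({"hops.ip"}))

# Different labels, same schema: the two are the same measurement.
same_measurement = ping_a.schema() == ping_b.schema()

found = find_capabilities([ping_a, ping_b, trace],
                          ["source.ip4", "destination.ip4"],
                          ["delay.twoway.icmp.us"])
```

In this sketch the client never consults `label`; two components from different vendors are interchangeable as long as they agree on the element names.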
Explicit Iterative Measurement Support
In a typical manual troubleshooting workflow, an analyst will start by running general measurements or queries to isolate the most likely cause of a given problem, or to eliminate possible causes with easy-to-run measurements. The results of these first measurements are input to the manual decision-making process, which either determines the next measurement to run to further narrow down possible causes, or makes a final diagnosis. The Reasoner in mPlane takes the role of this troubleshooter, analyzing high-level information in result messages and automatically issuing new specifications matching the capabilities available to it from the measurement network it uses. In order to support this interaction, the mPlane protocol explicitly blends control and data, allowing iterative measurement to work in a tight loop.
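The loop described here can be sketched in a few lines. This is a toy model only: `iterate_measurements`, `fake_run`, and `choose_next` are hypothetical names, specifications are reduced to plain strings, and the canned results stand in for real result messages.

```python
def iterate_measurements(run, first_spec, choose_next, max_steps=10):
    """Run a specification, inspect its result, and let choose_next
    decide which specification (if any) to issue next."""
    history = []
    spec = first_spec
    for _ in range(max_steps):
        if spec is None:       # choose_next reached a diagnosis
            break
        result = run(spec)     # the result feeds straight back into control
        history.append((spec, result))
        spec = choose_next(spec, result)
    return history


def fake_run(spec):
    # Canned results standing in for components in a measurement network.
    canned = {
        "ping": {"loss": 0.3},
        "traceroute": {"lossy_hop": 4},
    }
    return canned[spec]


def choose_next(spec, result):
    # Start general; run the narrower measurement only if loss is seen.
    if spec == "ping" and result["loss"] > 0.1:
        return "traceroute"
    return None


history = iterate_measurements(fake_run, "ping", choose_next)
```

The `max_steps` bound is a design choice of the sketch, so that a decision function that never settles cannot loop forever.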
Weak Imperativeness
The most unusual architectural principle behind mPlane is weak imperativeness. In short, in heterogeneous measurement environments, failure is inevitable, so the architecture should embrace it. This has the following concrete implications for the mPlane protocol:
- All messages within the mPlane protocol contain all the state required to interpret them. A transfer of a message therefore implies a transfer of responsibility for that message. If a component or client loses state (because it restarts, for example, or because one component has taken over for another in a cloud of measurement probes), an incoming message gives it enough information to recover that state.
- Messages within the mPlane protocol are idempotent: sending the same message to the same client or component twice results in only a single action being taken. The only exception is specifications using relative temporal scopes (e.g. "do this now"), but two different messages with relative scope are notionally not the same message anyway, because the meaning of "now" has changed in the meantime.
- Nominal failures (usually a component restarting or losing connectivity, for instance because it runs on an end-user terminal) are treated as normal events. It is up to clients or other components receiving data to determine whether results are missing from an expected set. This property seems odd, especially compared to traditional RPC, but consider measurements taken from thousands or tens of thousands of vantage points. Here, the failure of any specific vantage point to report isn't an error: it's data about the state of the network, just as the results from the other components are.
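The second and third points above can be illustrated with a small sketch. It assumes one possible way to get idempotence, namely keying each action on a hash of the message's own content. The `message_token` helper, the `Component` class, the `missing_reports` function, and the message layout are all hypothetical and are not taken from the protocol specification or the SDK.

```python
import hashlib
import json


def message_token(message):
    """Derive a stable key from the message content alone (illustrative)."""
    canonical = json.dumps(message, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Component:
    """Toy component that acts at most once per distinct message."""

    def __init__(self):
        self.scheduled = {}

    def receive(self, specification):
        key = message_token(specification)
        # A repeated message maps to the same key, so nothing new happens.
        self.scheduled.setdefault(key, specification)
        return key


def missing_reports(expected, results):
    """The receiver works out which vantage points did not report."""
    return sorted(set(expected) - set(results))


# Assumed message layout with an absolute temporal scope.
spec = {
    "parameters": {"destination.ip4": "192.0.2.1"},
    "results": ["delay.twoway.icmp.us"],
    "when": "2015-01-01 00:00:00",
}

component = Component()
first = component.receive(spec)
second = component.receive(dict(spec))  # same content, sent again

silent = missing_reports(["probe-1", "probe-2", "probe-3"],
                         {"probe-1": 20500, "probe-3": 31000})
```

Here the silence of `probe-2` is not raised as an exception; it is returned as an ordinary value that the client can analyze alongside the delays it did receive.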
The mPlane protocol is described in the protocol specification; the live version of this document is available at https://github.com/fp7mplane/protocol-ri/tree/master/doc/protocol-spec.md. The software development kit is available at https://github.com/fp7mplane/protocol-ri; a Python module will be installable from PyPI shortly, and SDK docs are at http://fp7mplane.github.io/protocol-ri/