
Passive Content Promotion and Curation

 

General description:

The goal of this use case is to help users find relevant content on the web, based on the passive observation of content requests flowing through the network. The intuition is that the crowd of users who browse the web anyway can be seen as smart robots that surf the Internet looking for new content to consume. This smart behavior is reflected in the content requests observed on the network. By analyzing these requests, it is possible to infer which content on the web is relevant. For instance, the more a web page is visited, the higher the chances that it is attractive and relevant. This use case therefore implements a form of crowdsourcing that requires no user engagement.

An overview of the use case is shown in the figure below.

The use case analyzes HTTP traffic extracted from the network to determine (1) User-URLs, pages actually visited by humans rather than automatically requested by the browser; (2) Interesting-URLs, pages among the User-URLs that are likely to attract the crowd; (3) Content-URLs, pages that correspond to a single piece of content (e.g. an article), as opposed to portal pages that aggregate and promote multiple content items; and finally (4) Promotion, that is, the decision of which content should be promoted to users.
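The four stages can be pictured as a chain of filters over HTTP log records. The following is only a minimal sketch under stated assumptions: the record fields, the function names, and all three heuristics (HTML content type for User-URLs, a distinct-client threshold for Interesting-URLs, URL path depth for Content-URLs) are illustrative placeholders, not the actual analysis modules and not the Tstat log_http schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple
from urllib.parse import urlparse


class HttpRecord(NamedTuple):
    """One simplified HTTP log entry (hypothetical fields)."""
    client: str        # anonymized client identifier
    url: str           # requested URL
    content_type: str  # Content-Type of the response


def user_urls(records: Iterable[HttpRecord]) -> List[HttpRecord]:
    """Stage 1 (placeholder heuristic): keep HTML pages, drop embedded objects."""
    return [r for r in records if r.content_type.startswith("text/html")]


def interesting_urls(records: Iterable[HttpRecord], min_clients: int = 2) -> Dict[str, int]:
    """Stage 2 (placeholder heuristic): keep URLs seen from enough distinct clients."""
    clients = defaultdict(set)
    for r in records:
        clients[r.url].add(r.client)
    return {url: len(c) for url, c in clients.items() if len(c) >= min_clients}


def content_urls(popularity: Dict[str, int], min_depth: int = 2) -> Dict[str, int]:
    """Stage 3 (placeholder heuristic): treat deep paths as single content items."""
    kept = {}
    for url, count in popularity.items():
        segments = [s for s in urlparse(url).path.split("/") if s]
        if len(segments) >= min_depth:
            kept[url] = count
    return kept


def promote(popularity: Dict[str, int], top_n: int = 3) -> List[str]:
    """Stage 4: rank by distinct-client count, ties broken by URL."""
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [url for url, _ in ranked[:top_n]]


def curate(records: Iterable[HttpRecord]) -> List[str]:
    """Chain the four stages."""
    return promote(content_urls(interesting_urls(user_urls(records))))
```

Counting distinct clients rather than raw requests keeps a single client that reloads a page from inflating its popularity; this is a design choice of the sketch, not something stated by the use case.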

Architecture components and pointers to software

The mPlane architecture facilitates the deployment and supervision of this use case. Probes such as Tstat can extract HTTP logs from network traffic, and the mPlane architecture makes it easy to stream these logs from multiple probes to one or more repositories. The analysis modules of the content curation use case, plugged into the repository importers, analyze the HTTP logs online to infer relevant content that will be promoted to users. The reasoner orchestrates these operations through the supervisor and can perform on-demand analysis of the popularity of web pages at different time scales.
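As an illustration of popularity analysis at different time scales, the sketch below counts visits per URL inside a sliding time window and returns the most visited ones. It is a hypothetical example: the function name, the `(timestamp, url)` input format, and the parameters are assumptions made for this sketch and do not describe the interface of any mPlane capability.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_popular_urls(
    visits: Iterable[Tuple[float, str]],  # (timestamp in seconds, URL), assumed format
    now: float,
    window_s: float,
    k: int = 3,
) -> List[Tuple[str, int]]:
    """Return the k most visited URLs in the window (now - window_s, now]."""
    counts = Counter(url for ts, url in visits if now - window_s < ts <= now)
    # Sort by decreasing count, then by URL so that ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Calling the function with a short window (for example one hour) and a long one (for example one week) on the same visits gives the two time scales; the ranking can differ because recent bursts dominate the short window.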

The figure below shows how the content curation use case maps to the mPlane architecture.

 

Tstat exports HTTP logs to the repository. The analysis modules of the reasoner run continuously on the repository to select the content that will be promoted on the content curation promotion web page.

 

How to set up and deploy the use case

 

The figure below shows, at a high level, the steps involved in running the content curation use case, mapped to the mPlane components.

For detailed instructions on the deployment, setup, and demo of the use case, we refer interested readers to the corresponding demonstration guidelines page (link).

 


 

HOWTO set up and run it

We list the steps below. Note that the first four steps are detailed in the mPlane GitHub repository of Tstat.

1- Installing and running Tstat.

2- Installing the mPlane framework and the Tstat proxy: configure the proxy with the IP address of the repository, the port to which streamed data is sent, and the type of stream (log_http in the case of content curation).

3- Running the supervisor.

4- Running the repository proxy: the capabilities that are automatically activated when the repository starts include the repository streaming import capability, which is needed for the content curation use case.

5- Activating the popularity statistics capability on the repository: if the reasoner is used to run the content curation use case (rather than the mPlane client command line), the repository-top_popular_urls capability must first be activated on the repository.

6- Launching the content curation reasoner script: this triggers the Tstat export capability, which in turn automatically activates the import capability at the repository and launches the content curation use case.

Integration, performance, and validation tests

Because of a failure of the Tstat probe, we could not test the entire setting (with the mPlane components and the reasoner). A command-line version (through an mPlane client) ran successfully for a few weeks on the Polito network prior to the failure. We will start testing the full setting as soon as the Tstat probe is back.

Links to all software used*

 * Note that, due to IPR issues, the open source analysis modules will have limited capabilities, and their release is pending internal publication permission, which should come in the following weeks. The full version will be provided to all mPlane partners who request it, for research and demo purposes, during and beyond the life of the project.