Gold mining in a River of Internet Content Traffic

Title: Gold mining in a River of Internet Content Traffic
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Ben-Houidi, Z., G. Scavo, S. Ghamri-Doudane, A. Finamore, S. Traverso, and M. Mellia
Conference Name: 6th International Workshop on Traffic Monitoring and Analysis, TMA
Date Published: 04/2014
Publisher: Springer
Conference Location: London
Keywords: Content mining, HTTP Traffic, URL extraction
Abstract

With the advent of Over-The-Top content providers
(OTTs), Internet Service Providers (ISPs) saw their portfolio of
services shrink to the low margin role of data transporters. In
order to counter this effect, some ISPs started to follow big OTTs
like Facebook and Google in trying to turn their data into a
valuable asset. In this paper, we explore the questions of what
meaningful information can be extracted from network data, and
what interesting insights it can provide. To this end, we tackle
the first challenge of detecting “user-URLs”, i.e., those links that
were clicked by users as opposed to those objects automatically
downloaded by browsers and applications. We devise algorithms
to pinpoint such URLs, and validate them on manually collected
ground truth traces. We then apply them on a three-day long
traffic trace spanning more than 19,000 residential users that
generated around 190 million HTTP transactions. We find that
only 1.6% of these observed URLs were actually clicked by users.
As a first application for our methods, we answer the question
of which platforms participate most in promoting the Internet
content. Surprisingly, we find that, despite its notoriety, only 11%
of the user URL visits are coming from Google Search.

Citation Key: Ben2014
Project year: 
Second year
WP(s) associated with the paper: 
WP3 - Large-scale data analysis
WP4 - mPlane Supervisor: Iterative and Adaptive Analysis
Partner(s) associated with the paper's author(s): 
Politecnico di Torino
Alcatel-Lucent Bell Labs
Is this an OFFICIALLY supported mPlane paper?: 
Yes