Title | Gold Mining in a River of Internet Content Traffic |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Ben-Houidi, Z., G. Scavo, S. Ghamri-Doudane, A. Finamore, S. Traverso, and M. Mellia |
Conference Name | 6th International Workshop on Traffic Monitoring and Analysis, TMA |
Date Published | 04/2014 |
Publisher | Springer |
Conference Location | London |
Keywords | Content mining, HTTP Traffic, URL extraction |
Abstract | With the advent of Over-The-Top content providers
(OTTs), Internet Service Providers (ISPs) saw their portfolio of
services shrink to the low-margin role of data transporters. In
order to counter this effect, some ISPs started to follow big OTTs
like Facebook and Google in trying to turn their data into a
valuable asset. In this paper, we explore the questions of what
meaningful information can be extracted from network data, and
what interesting insights it can provide. To this end, we tackle
the first challenge of detecting “user-URLs”, i.e., those links that
were clicked by users as opposed to those objects automatically
downloaded by browsers and applications. We devise algorithms
to pinpoint such URLs, and validate them on manually collected
ground truth traces. We then apply them to a three-day-long
traffic trace spanning more than 19,000 residential users that
generated around 190 million HTTP transactions. We find that
only 1.6% of these observed URLs were actually clicked by users.
As a first application of our methods, we answer the question
of which platforms participate most in promoting Internet
content. Surprisingly, we find that, despite its notoriety, only 11%
of user-URL visits come from Google Search.
|
Citation Key | Ben2014 |
WP(s) associated with the paper:
WP3 - Large-scale data analysis
WP4 - mPlane Supervisor: Iterative and Adaptive Analysis
Partner(s) associated with the paper's author(s):
Politecnico di Torino
Alcatel-Lucent Bell Labs
Is this an OFFICIALLY supported mPlane paper?: