<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Zied Ben-Houidi</style></author><author><style face="normal" font="default" size="100%">Giuseppe Scavo</style></author><author><style face="normal" font="default" size="100%">Samir Ghamri-Doudane</style></author><author><style face="normal" font="default" size="100%">Alessandro Finamore</style></author><author><style face="normal" font="default" size="100%">Stefano Traverso</style></author><author><style face="normal" font="default" size="100%">Marco Mellia</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Gold mining in a River of Internet Content Traffic</style></title><secondary-title><style face="normal" font="default" size="100%">6th International Workshop on Traffic Monitoring and Analysis, TMA</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Content mining</style></keyword><keyword><style  face="normal" font="default" size="100%">HTTP Traffic</style></keyword><keyword><style  face="normal" font="default" size="100%">URL extraction</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2014</style></year><pub-dates><date><style  face="normal" font="default" size="100%">04/2014</style></date></pub-dates></dates><publisher><style face="normal" font="default" size="100%">Springer</style></publisher><pub-location><style face="normal" font="default" size="100%">London</style></pub-location><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">With the advent of Over-The-Top content providers
(OTTs), Internet Service Providers (ISPs) saw their portfolio of
services shrink to the low-margin role of data transporters. In
order to counter this effect, some ISPs started to follow big OTTs
like Facebook and Google in trying to turn their data into a
valuable asset. In this paper, we explore the questions of what
meaningful information can be extracted from network data, and
what interesting insights it can provide. To this end, we tackle
the first challenge of detecting “user-URLs”, i.e., those links that
were clicked by users as opposed to those objects automatically
downloaded by browsers and applications. We devise algorithms
to pinpoint such URLs, and validate them on manually collected
ground-truth traces. We then apply them to a three-day-long
traffic trace spanning more than 19,000 residential users that
generated around 190 million HTTP transactions. We find that
only 1.6% of these observed URLs were actually clicked by users.
As a first application for our methods, we answer the question
of which platforms participate most in promoting Internet
content. Surprisingly, we find that only 11% of user-URL visits
come from Google Search, despite its notoriety.
</style></abstract></record></records></xml>