WP3 - Large-scale data analysis

Work package title: Large-scale data analysis
Start date or starting event: M4
Activity Type: RTD
Leader: EURECOM - Pietro Michiardi (Pietro.Michiardi@eurecom.fr)
Participants: POLITO SSB TI ALBLF ENST NEC TID FW NETVISOR FTW A-LBELL

Objectives

The goal of WP3 is to build systems to store and process the data collected through the mPlane probes developed within WP2. WP3 relies on a parallel processing framework to carry out large-scale data processing: in particular, we will focus on systems akin to Hadoop, the open-source implementation of MapReduce, which is nowadays a de-facto standard for large-scale data mining. Hadoop comprises a versatile storage layer and an orchestration framework that can be used to execute data analysis on a cluster of commodity hardware. We define the following objectives:
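To make the processing model concrete, the following is a minimal, self-contained sketch of the map/reduce abstraction that Hadoop implements, applied to a toy network-measurement aggregation (the record names and values are purely illustrative, not mPlane data formats):

```python
from collections import defaultdict

# Toy records as probes might emit them: (source_ip, bytes) pairs.
records = [("10.0.0.1", 500), ("10.0.0.2", 1200), ("10.0.0.1", 300)]

def map_phase(recs):
    # Map: emit (key, value) pairs -- here, byte counts keyed by source IP.
    for ip, nbytes in recs:
        yield ip, nbytes

def reduce_phase(pairs):
    # Shuffle + reduce: group values by key, then aggregate each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

totals = reduce_phase(map_phase(records))
print(totals)  # {'10.0.0.1': 800, '10.0.0.2': 1200}
```

In Hadoop, the map and reduce phases run in parallel across a cluster and the shuffle is handled by the framework; the sketch only shows the programming contract.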

  • Design and implement scalable algorithms that operate on very large amounts of data. As illustrated in WP4, we expect to receive, in an asynchronous manner, input data from the multitude of probes defined in WP2. Based on the use cases defined in WP1, we will design a comprehensive set of network data analysis jobs, which provide aggregated results to be further analysed in WP4.
  • Design, implement and evaluate scheduling protocols for the efficient and fair allocation of computing resources to network data analysis jobs. Essentially, our objective is to design a new scheduling component for analytic tasks in parallel processing frameworks that takes into account the particular computational workloads generated by the mPlane infrastructure.
  • Design and deploy a distributed database system to expose aggregated and external network data to other mPlane components. The objective is to offer an interface for WP4 to access the data elaborated in WP3, as well as data available in digital repositories external to the mPlane infrastructure. Part of this objective is the design and implementation of specific indexing schemes to support a variety of queries aimed at retrieving a small subset of the data available in the repository.
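As a toy illustration of why size-based scheduling matters for the second objective, the sketch below compares mean job completion time under arrival order (FIFO) and under shortest-job-first ordering. The job sizes are invented for the example; in practice they would come from a job-length estimator:

```python
# Hypothetical estimated job sizes (processing time units), in arrival order.
fifo = [120, 5, 40]

def mean_completion(sizes):
    # Mean completion time when jobs run back-to-back in the given order:
    # each job completes at the cumulative sum of the sizes before it plus its own.
    elapsed, total = 0, 0
    for size in sizes:
        elapsed += size
        total += elapsed
    return total / len(sizes)

sjf = sorted(fifo)  # size-based (shortest-job-first) ordering

print(round(mean_completion(fifo), 1))  # 136.7
print(round(mean_completion(sjf), 1))   # 71.7
```

Serving small jobs first roughly halves the mean completion time in this example, which is the intuition behind the size-based scheduling protocols studied in T3.2; fairness towards large jobs is the competing concern a real scheduler must balance.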

Description of work

The work package contains three tasks: T3.1 (scalable data analysis algorithms), T3.2 (scheduling of data analysis jobs), and T3.3 (database layer design and deployment).

Partner contributions

  • EURECOM (WP leader) will work on T3.1 (with a particular focus on defining “design patterns” underlying network data processing patterns), T3.2 (investigating techniques to estimate analytic job execution times and size-based scheduling protocols) and T3.3 (with a focus on distributed, column-oriented databases, the implementation of ad-hoc indexing mechanisms and the implementation of glue software to ingest external data stored in existing repositories).
  • POLITO will work on T3.1, defining and studying algorithms to correlate measurements coming from different probes at the same time, or from the same probe at different times. The goal is to study methodologies to extract the behaviour of measurements by considering spatial or temporal diversity in the context of the “change detection” use case. We will also contribute to T3.3 on advanced indexing techniques to support aggregate queries. Furthermore, data mining techniques will be exploited to infer aggregation hierarchies from the analysed data.
  • SSB will support the integration and implementation of the database architecture, taking into account the access control and data protection mechanisms developed in T1.4.
  • TI will mainly contribute its expertise as an ISP to help define relevant problems to be solved in WP3.
  • ALBLF will work on data analysis algorithms to extract useful information from the large collection of data that the mPlane system will have to handle.
  • ENST will work on T3.1 (especially to stress-test the pattern-based algorithm design) and, to a lesser extent, on T3.2 (investigating whether pattern-based design can lead to efficient workflows, e.g., by exploiting the caching of intermediate results in a pattern-aware task scheduler).
  • NEC will work on T3.1 on the design of systems for the continuous processing of data, where batch and stream processing co-exist. Furthermore, NEC will contribute by comparing batch and stream processing systems (e.g., the performance of Hadoop versus packet capture and analysis frameworks in analysing network traffic). Finally, NEC will work on T3.2 to investigate effective resource scheduling schemes for jobs running on a cluster of computers.
  • TID will work on T3.2 and T3.3 to define an efficient database structure and to study the data partitioning problem.
  • FW will support the installation of the testbed developed in WP3.
  • NETVISOR will leverage its experience to support the deployment of large-scale databases in a distributed infrastructure.
  • FTW will work on T3.1, studying and defining how to develop efficient and scalable data analysis algorithms to be executed on a parallel processing framework. FTW will also contribute to T3.2, studying the resource scheduling problem and its interaction with the underlying parallel processing framework. Finally, FTW will lead T3.3, working on indexing mechanisms and efficient selective queries.
  • A-LBELL will focus on the application of on-line and sequential learning in order to perform traffic prediction at multiple timescales, the design of adaptive learning models, and means to perform cooperative monitoring by interconnecting distributed mPlane components running on individual routers.
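As a minimal illustration of the indexing idea behind T3.3 (selective queries that retrieve a small subset of the repository), the sketch below answers a range query over a sorted key column with binary search instead of a full scan. The records, keys and value labels are invented for the example:

```python
import bisect

# Hypothetical aggregated records, stored sorted by timestamp key.
records = [(t, f"agg-{t}") for t in (10, 20, 30, 40, 50)]
keys = [t for t, _ in records]  # the sorted index over the key column

def range_query(lo, hi):
    # Binary-search the index for the query bounds, then slice:
    # O(log n) lookups instead of scanning the whole repository.
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return records[i:j]

print(range_query(15, 40))  # [(20, 'agg-20'), (30, 'agg-30'), (40, 'agg-40')]
```

Real indexing schemes for the distributed, column-oriented databases considered in T3.3 are far richer (multi-attribute, partitioned, updated incrementally), but the principle is the same: touch only the fraction of the data a selective query needs.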

Deliverables

  • D3.1 – (M6; editor EURECOM): Basic Network Data Analysis. This deliverable is meant to be Public. In this deliverable we describe and map basic algorithms to perform analytic tasks, which correspond to the different use cases addressed in mPlane and defined in WP1. Essentially, we will define such algorithms as black boxes, focusing on their input and output data, which are essential to start task T3.3.
  • D3.2 – (M12; editor FTW): Database Layer Design (Including External Repositories Selection). This deliverable is meant to be Public. This is a preliminary version (i.e., query performance is not yet optimized) that allows data to be accessed by WP4 and by external users.
  • D3.3 – (M22; editor NEC): Algorithm and Scheduler Design and Implementation. This software deliverable is meant to be Public. It consists of two parts. In part 1, we will detail the algorithms developed in the first year after their definition in D3.1, including their design and performance on a parallel computing testbed. This part is also dedicated to the definition of new basic algorithms that were not devised in D3.1. The second part of the deliverable is dedicated to the design and a preliminary implementation of the job scheduler. In particular, we will use basic techniques to estimate job length, which we will improve in the second release.
  • D3.4 – (M32; editor EURECOM): Final Implementation and Evaluation of the Data Processing and Storage Layer. This software deliverable is meant to be Public. It will include the design (but not the full implementation) of a sophisticated method to infer job duration (e.g., for recurring jobs this will be based on statistical analysis, while for other jobs it will be based on a training phase). Additionally, based on the input provided by WP4, we will design the key ingredients for query optimization, including indexing and data placement. Finally, we will focus on a detailed description of the “design patterns” that emerge from the implementation of several basic algorithms for data analysis.
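As a toy illustration of the statistical estimation of recurring-job duration mentioned for D3.4, the sketch below contrasts the mean and the median of a hypothetical runtime history; a robust statistic such as the median is far less sensitive to occasional outlier runs (all numbers are invented):

```python
import statistics

# Hypothetical runtime history (seconds) for a recurring analysis job,
# including one outlier run on a congested cluster.
history = [58, 61, 340, 59, 62, 60]

mean_est = statistics.mean(history)      # pulled up by the outlier
median_est = statistics.median(history)  # robust to the outlier

print(round(mean_est, 1), median_est)  # 106.7 60.5
```

A size-based scheduler fed the mean would treat this routine job as twice as long as it usually is; feeding it the median keeps the estimate close to the typical run, which is why robust statistics are a natural starting point for recurring jobs.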