The goal of this course is to provide a comprehensive view of recent topics and trends in distributed systems and cloud computing. We will discuss the software techniques used to build and program reliable, highly scalable systems, with a particular focus on data-intensive computing systems.
Specifically, the course will cover the MapReduce programming model, its connection to relational algebra, and the high-level programming models that build on MapReduce. In addition, the course delves into the details of the underlying execution framework that supports and executes parallel MapReduce programs, including distributed file systems and the Hadoop implementation. The course is complemented by a series of practical, hands-on exercises run on a small-scale cluster, in which students will learn the tools needed to program in MapReduce and Pig. The exercises are drawn from real-world case studies, for example network-traffic analysis.
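To give a flavour of the programming model, the sketch below simulates the map, shuffle, and reduce phases in a single Python process. It is only an illustration, not the Hadoop API and not course material: the flow records, the function names (`map_flow`, `reduce_bytes`, `run_mapreduce`), and the record layout are all assumptions made for this example. The job computes the total bytes sent per source IP, which corresponds to a group-by aggregation in relational terms.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP, bytes transferred).
FLOWS = [
    ("10.0.0.1", "10.0.0.9", 500),
    ("10.0.0.2", "10.0.0.9", 1200),
    ("10.0.0.1", "10.0.0.7", 300),
]


def map_flow(record):
    """Map: emit one (source IP, bytes) pair per flow record."""
    src, _dst, nbytes = record
    yield src, nbytes


def reduce_bytes(key, values):
    """Reduce: sum all byte counts observed for one source IP."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    # Map and shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: process each key together with all of its grouped values.
    output = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            output[out_key] = out_value
    return output


totals = run_mapreduce(FLOWS, map_flow, reduce_bytes)
print(totals)
```

In a real framework the grouping step is performed across machines by the execution layer, so the programmer supplies only the two functions playing the roles of `map_flow` and `reduce_bytes`.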
| | Wednesday 18th | Thursday 19th | Friday 20th |
| --- | --- | --- | --- |
| 9-12 | Introduction to MapReduce | Hadoop Internals | High-level Languages |
| 14-17 | Laboratory Session | Laboratory Session | Laboratory Session: The basics |