<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Arian Bär</style></author><author><style face="normal" font="default" size="100%">Lukasz Golab</style></author><author><style face="normal" font="default" size="100%">Stefan Ruehrup</style></author><author><style face="normal" font="default" size="100%">Mirko Schiavone</style></author><author><style face="normal" font="default" size="100%">Pedro Casas</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Cache Oblivious Scheduling of Shared Workloads</style></title><secondary-title><style face="normal" font="default" size="100%">31st IEEE International Conference on Data Engineering (ICDE)</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2015</style></year><pub-dates><date><style  face="normal" font="default" size="100%">05/2015</style></date></pub-dates></dates><publisher><style face="normal" font="default" size="100%">IEEE</style></publisher><pub-location><style face="normal" font="default" size="100%">Seoul, Korea</style></pub-location><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Shared workload optimization is feasible if the set of tasks to be executed is known in advance, as is the case in updating a set of materialized views or executing an extract-transform-load workflow. In this paper, we consider data-intensive shared workloads with precedence constraints arising from data dependencies, i.e., before executing some task, other tasks may have to run first and generate some data needed by the next task(s).
While there has been previous work on identifying common subexpressions in shared workloads and task re-ordering to enable shared scans, in this paper we go a step further and solve the problem of scheduling shared data-intensive workloads in a cache-oblivious way. Our solution relies on a novel formulation of precedence-constrained scheduling with the additional constraint that once a data item is in the cache, all tasks that require this data item should execute as soon as possible thereafter. The intuition behind this formulation is that the longer a data item remains in the cache, the more likely it is to be evicted regardless of the cache size. We give an optimal ordering algorithm using A* search over the space of possible orderings, and we propose efficient and effective heuristics that obtain nearly optimal results in much less time. We present experimental results on real-life data warehouse workloads and the TPC-DS benchmark to validate our claims.&lt;/p&gt;</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Arian Baer</style></author><author><style face="normal" font="default" size="100%">Pedro Casas</style></author><author><style face="normal" font="default" size="100%">Lukasz Golab</style></author><author><style face="normal" font="default" size="100%">Alessandro Finamore</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">DBStream: an Online Aggregation, Filtering and Processing System for Network Traffic Monitoring</style></title><secondary-title><style face="normal" font="default" size="100%">TRAC</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2014</style></year><pub-dates><date><style  face="normal" font="default" size="100%">08/2014</style></date></pub-dates></dates><publisher><style face="normal"
font="default" size="100%">IEEE</style></publisher><pub-location><style face="normal" font="default" size="100%">Nicosia, Cyprus</style></pub-location><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Arian Bär</style></author><author><style face="normal" font="default" size="100%">Alessandro Finamore</style></author><author><style face="normal" font="default" size="100%">Pedro Casas</style></author><author><style face="normal" font="default" size="100%">Lukasz Golab</style></author><author><style face="normal" font="default" size="100%">Marco Mellia</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Large-Scale Network Traffic Monitoring with DBStream, a System for Rolling Big Data Analysis</style></title><secondary-title><style face="normal" font="default" size="100%">International Conference on Big Data, IEEE BigData</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Big Data Analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">Data Stream Processing</style></keyword><keyword><style  face="normal" font="default" size="100%">network data analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">System Performance</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2014</style></year><pub-dates><date><style  face="normal" font="default" size="100%">11/2014</style></date></pub-dates></dates><publisher><style face="normal" font="default" size="100%">IEEE</style></publisher><pub-location><style face="normal" font="default" size="100%">Washington D.C., USA</style></pub-location><language><style face="normal" font="default" 
size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling-over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, which is an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads. Although our performance evaluation is based on network monitoring data, our results can be generalized to other big data problems with high volume and velocity.&lt;/p&gt;</style></abstract></record></records></xml>