Efficiently Handling Skew in Outer Joins on Distributed Systems

Long Cheng, Spyros Kotoulas, Tomas E. Ward, Georgios Theodoropoulos
Efficiently Handling Skew in Outer Joins on Distributed Systems
Proc. 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'14), 295--304, May 2014. IEEE
  • Abstract
    Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging, as real-world datasets are characterized by data skew, which leads to performance issues. Although skew-handling techniques have been extensively studied for inner joins, there is little published work solving the corresponding problem for parallel outer joins. Conventional approaches to this problem, such as ones based on hash redistribution, often lead to load balancing problems, while duplication-based approaches incur significant overhead in terms of network communication. In this paper, we propose a new algorithm, query with counters (QC), for directly handling skew in outer joins on distributed architectures. We present an efficient implementation of our approach based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skew. Experimental results show that our method is scalable and, in cases of high skew, faster than the state-of-the-art.
  • Further Information: Link
  • Research Group: Knowledge-Based Systems
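
The load-balancing problem that the abstract attributes to hash redistribution can be illustrated with a small sketch. This is not the paper's QC algorithm. It only shows, under assumed inputs, how routing tuples to nodes by a hash of the join key concentrates work on one node when a single key dominates. The function name `hash_partition_load`, the integer keys, the modulo hash, and the 4-node setup are all hypothetical choices made for illustration.

```python
def hash_partition_load(keys, num_nodes):
    """Count how many tuples each node receives under hash redistribution.

    Assumption: keys are non-negative integers and the hash function is a
    plain modulo, which stands in for whatever hash a real system uses.
    """
    load = [0] * num_nodes
    for key in keys:
        load[key % num_nodes] += 1
    return load


# Hypothetical workloads of 1000 tuples each, redistributed over 4 nodes.
uniform_keys = list(range(1000))             # every key appears once
skewed_keys = [7] * 700 + list(range(300))   # key 7 holds 70% of the tuples

uniform_load = hash_partition_load(uniform_keys, 4)
skewed_load = hash_partition_load(skewed_keys, 4)

print("uniform:", uniform_load)
print("skewed: ", skewed_load)
```

With the uniform input every node receives the same number of tuples, while with the skewed input the node that owns the hot key receives most of them, so it becomes the bottleneck of the join regardless of how many other nodes are available.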
@inproceedings{CKWT2014,
  author    = {Long Cheng and Spyros Kotoulas and Tomas E. Ward and Georgios
               Theodoropoulos},
  title     = {Efficiently Handling Skew in Outer Joins on Distributed Systems},
  booktitle = {Proc. 14th {IEEE/ACM} International Symposium on Cluster, Cloud
               and Grid Computing (CCGrid'14)},
  publisher = {IEEE},
  year      = {2014},
  month     = {May},
  pages     = {295--304},
  doi       = {10.1109/CCGrid.2014.35}
}