QbDJ: A Novel Framework for Handling Skew in Parallel Join Processing on Distributed Memory
From International Center for Computational Logic
Long Cheng, Spyros Kotoulas, Tomas E. Ward, Georgios Theodoropoulos
Proc. 15th IEEE International Conference on High Performance Computing and Communications (HPCC'13), 1519-1527, November 2013. IEEE
- Abstract
The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied in both the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable and, under high data skew, runs faster with less network communication than the state-of-the-art PRPD approach in [1].
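To make the skew problem concrete, here is a toy Python sketch contrasting hash-based redistribution with a simplified query-based scheme. It is an illustration under our own assumptions, not the authors' APGAS implementation: relation R is assumed to be already hash-partitioned by join key, and each node sends only the distinct join keys of its local S partition to R's owners, receives the matching R tuples, and joins locally. The placement rule `owner`, the helper names, and the message-counting model (one message per remote key sent or R tuple returned) are all hypothetical.

```python
from collections import defaultdict


def owner(key, num_nodes):
    # Hypothetical placement rule: key k belongs to node hash(k) % num_nodes.
    return hash(key) % num_nodes


def hash_partition(tuples, num_nodes):
    """Split (key, payload) tuples across nodes by owner(key)."""
    parts = [[] for _ in range(num_nodes)]
    for key, payload in tuples:
        parts[owner(key, num_nodes)].append((key, payload))
    return parts


def hash_join_cost(s_parts):
    """Hash-based join: every S tuple is shipped to owner(key).
    Returns (S tuples held per node after shuffling, S tuples sent remotely)."""
    n = len(s_parts)
    load = [0] * n
    sent = 0
    for node, part in enumerate(s_parts):
        for key, _ in part:
            dest = owner(key, n)
            load[dest] += 1
            if dest != node:
                sent += 1
    return load, sent


def query_based_join(r_parts, s_parts):
    """Simplified query-based join (assumption, not the paper's code):
    S stays in place; each node queries R's owners with its distinct keys."""
    n = len(r_parts)
    # Index each node's R partition: key -> list of R payloads.
    r_index = []
    for part in r_parts:
        idx = defaultdict(list)
        for key, payload in part:
            idx[key].append(payload)
        r_index.append(idx)

    results, sent = [], 0
    for node, part in enumerate(s_parts):
        # 1. Extract the distinct join keys of the local S partition.
        unique_keys = {key for key, _ in part}
        # 2. Query each key's owner; a remote query costs one message for
        #    the key plus one per matching R tuple returned.
        matches = {}
        for key in unique_keys:
            dest = owner(key, n)
            found = r_index[dest].get(key, [])
            if dest != node:
                sent += 1 + len(found)
            if found:
                matches[key] = found
        # 3. Join locally: the skewed S tuples never leave this node.
        for key, s_payload in part:
            for r_payload in matches.get(key, []):
                results.append((key, r_payload, s_payload))
    return results, sent


NUM_NODES = 4
R = [(k, f"r{k}") for k in range(8)]
r_parts = hash_partition(R, NUM_NODES)
# Skewed S: every node holds 100 tuples with key 0 plus one other key.
s_parts = [[(0, f"s{i}_{j}") for j in range(100)] + [(i + 1, f"s{i}_x")]
           for i in range(NUM_NODES)]

load, hash_sent = hash_join_cost(s_parts)
results, query_sent = query_based_join(r_parts, s_parts)
print(load, hash_sent)           # [401, 1, 1, 1] 304
print(len(results), query_sent)  # 404 14
```

In this made-up run, hash-based redistribution concentrates 401 of the 404 S tuples on node 0 because key 0 dominates, while the query-based variant keeps each node's 101 S tuples local and exchanges 14 messages instead of 304, since its communication grows with the number of distinct keys rather than with tuple counts. These numbers come only from the toy input and say nothing about the paper's measured results.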
- Research Group: Knowledge-Based Systems
@inproceedings{CKWT2013,
author = {Long Cheng and Spyros Kotoulas and Tomas E. Ward and Georgios
Theodoropoulos},
title = {{QbDJ}: A Novel Framework for Handling Skew in Parallel Join
Processing on Distributed Memory},
booktitle = {Proc. 15th {IEEE} International Conference on High Performance
Computing and Communications (HPCC'13)},
publisher = {IEEE},
year = {2013},
month = {November},
pages = {1519--1527},
doi = {10.1109/HPCC.and.EUC.2013.214}
}