Design and Evaluation of Small-Large Outer Joins in Cloud Computing Environments

From International Center for Computational Logic

Long Cheng, Ilias Tachmazidis, Spyros Kotoulas, Grigoris Antoniou
Design and Evaluation of Small-Large Outer Joins in Cloud Computing Environments
Journal of Parallel and Distributed Computing, 110:2-15, 2017
  • Abstract
    Large-scale analytics is a key application area for data processing and parallel computing research. One of the most common (and challenging) operations in this domain is the join. Though inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work analyzing outer joins, especially in the extremely popular cloud computing environments. A common type of outer join is the small-large outer join, where one relation is relatively small and the other is large. Conventional implementations for this case, such as those based on hash redistribution, often incur significant network communication, while duplication-based approaches are complex and inefficient. In this work, we present a new method called DDR (duplication and direct redistribution), which aims to enable efficient small-large outer joins in cloud computing environments while being easy to implement using existing predicates in data processing frameworks. We present the detailed implementation of our approach and evaluate its performance through extensive experiments over the widely used MapReduce and Spark platforms. We show that the proposed method is scalable and can achieve significant performance improvements over the conventional approaches. Compared to the state-of-the-art method, the DDR algorithm is shown to be easier to implement and can achieve very similar or better performance under different outer join workloads, and thus can be considered a new option for current data analysis applications. Moreover, our detailed experimental results also provide insights into current small-large outer join implementations, thereby allowing system developers to make a more informed choice for their data analysis applications.
  • Project: Cfaed, DIAMOND, HAEC, HAEC B08
  • Research Group: Knowledge-Based Systems
@article{CTKA2017,
  author    = {Long Cheng and Ilias Tachmazidis and Spyros Kotoulas and Grigoris
               Antoniou},
  title     = {Design and Evaluation of Small-Large Outer Joins in Cloud
               Computing Environments},
  journal   = {Journal of Parallel and Distributed Computing},
  volume    = {110},
  publisher = {Elsevier},
  year      = {2017},
  pages     = {2--15},
  doi       = {10.1016/j.jpdc.2017.02.007}
}
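
To make the small-large outer join setting from the abstract concrete, the following is a minimal single-process sketch of a generic duplication-based left outer join, where the small relation is the preserved side. It is not the DDR algorithm from the paper, whose details are not given on this page. The function name, the tuple layout, and the use of Python lists to stand in for cluster partitions are all assumptions made for illustration. The sketch shows why duplication needs care: each partition only sees part of the large relation, so unmatched small rows can be emitted only after matches from all partitions are known.

```python
from collections import defaultdict


def small_large_left_outer_join(small, large_partitions):
    """Toy left outer join: small (preserved side) joined with a partitioned large relation.

    small: list of (key, value) pairs.
    large_partitions: list of partitions, each a list of (key, value) pairs.
    Returns a list of (key, small_value, large_value_or_None) tuples.
    Hypothetical helper for illustration only; not the paper's DDR method.
    """
    # Index the small relation by key. In a real cluster this index would be
    # duplicated (broadcast) to every node holding a partition of the large relation.
    small_index = defaultdict(list)
    for key, value in small:
        small_index[key].append(value)

    results = []
    matched_keys = set()

    # Each partition is joined locally against its copy of the small relation.
    for partition in large_partitions:
        for key, large_value in partition:
            for small_value in small_index.get(key, ()):
                results.append((key, small_value, large_value))
                matched_keys.add(key)

    # Unmatched small rows are emitted once, after all partitions have reported
    # their matches; emitting them per partition would produce duplicates.
    for key, values in small_index.items():
        if key not in matched_keys:
            for small_value in values:
                results.append((key, small_value, None))

    return results


small = [(1, "a"), (2, "b"), (3, "c")]
large_partitions = [[(1, "x"), (4, "w")], [(1, "y"), (3, "z")]]
rows = small_large_left_outer_join(small, large_partitions)
for row in rows:
    print(row)
```

In this sketch the large relation never moves; only the small relation and the set of matched keys would cross the network in a distributed setting, which is the general appeal of duplication over hash redistribution when one side is small.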