Inproceedings4048: Difference between revisions

From International Center for Computational Logic
Long Cheng (Talk | contribs)
(Created page with "{{Publikation Erster Autor |ErsterAutorNachname=Cheng |ErsterAutorVorname=Long }} {{Publikation Author |Rank=2 |Author=Spyros Kotoulas }} {{Publikation Author…")
 
Long Cheng (Talk | contribs)
No edit summary
Line 17: Line 17:
{{Inproceedings
{{Inproceedings
|Referiert=1
|Referiert=1
|Title=Efficiently Handling Skew in Outer Joins on Distributed Systems
|Title=QbDJ: A Novel Framework for Handling Skew in Parallel Join Processing on Distributed Memory
|To appear=0
|To appear=0
|Year=2014
|Year=2013
|Month=Mai
|Month=November
|Booktitle=Proc. 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'14)
|Booktitle=Proc. 15th IEEE International Conference on High Performance Computing and Communications (HPCC'13)
|Pages=295--304
|Pages=1519-1527
|Publisher=IEEE
|Publisher=IEEE
}}
}}
{{Publikation Details
{{Publikation Details
|Bild=Ccgrid-logo.gif
|Abstract=The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied in both the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable and, under high data skew, runs faster with less network communication than the state-of-the-art PRPD approach in [1].
|Abstract=Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging, as real-world datasets are characterized by data skew, which leads to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is little published work solving the corresponding problem for parallel outer joins. Conventional approaches to this problem, such as those based on hash redistribution, often lead to load balancing problems, while duplication-based approaches incur significant overhead in terms of network communication. In this paper, we propose a new algorithm, query with counters (QC), for directly handling skew in outer joins on distributed architectures. We present an efficient implementation of our approach based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skew. Experimental results show that our method is scalable and, in cases of high skew, faster than the state-of-the-art.
|Download=2013-A-Novel.pdf
|ISBN=978-1-4799-2784-5
|Link=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6832096
|Download=2014-Efficiently-Handling.pdf
|DOI Name=10.1109/HPCC.and.EUC.2013.214
|Link=http://www.computer.org/csdl/proceedings/ccgrid/2014/2784/00/2784a295-abs.html
|DOI Name=10.1109/CCGrid.2014.35
|Projekt=DIAMOND
|Projekt=DIAMOND
|Forschungsgruppe=Information Systems
|Forschungsgruppe=Information Systems
}}
}}

Revision as of 23:35, 21 October 2014

QbDJ: A Novel Framework for Handling Skew in Parallel Join Processing on Distributed Memory

Long Cheng, Spyros Kotoulas, Tomas E. Ward, Georgios Theodoropoulos
QbDJ: A Novel Framework for Handling Skew in Parallel Join Processing on Distributed Memory
Proc. 15th IEEE International Conference on High Performance Computing and Communications (HPCC'13), 1519-1527, November 2013. IEEE
  • Abstract
    The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied in both the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable and, under high data skew, runs faster with less network communication than the state-of-the-art PRPD approach in [1].
  • Further Information: Link
  • Project: DIAMOND
  • Research Group: Information Systems, Knowledge-Based Systems
@inproceedings{CKWT2013,
  author    = {Long Cheng and Spyros Kotoulas and Tomas E. Ward and Georgios
               Theodoropoulos},
  title     = {QbDJ: A Novel Framework for Handling Skew in Parallel Join
               Processing on Distributed Memory},
  booktitle = {Proc. 15th {IEEE} International Conference on High Performance
               Computing and Communications (HPCC'13)},
  publisher = {IEEE},
  year      = {2013},
  month     = {November},
  pages     = {1519--1527},
  doi       = {10.1109/HPCC.and.EUC.2013.214}
}