Inproceedings4050: Difference between revisions

From International Center for Computational Logic
Long Cheng (Talk | contribs)
(Page created: "{{Publikation Erster Autor |ErsterAutorNachname=Cheng |ErsterAutorVorname=Long |FurtherAuthors=Spyros Kotoulas ; Tomas E. Ward ; Georgios Theodoropoulos }} {{I…")
 
Long Cheng (Talk | contribs)
No edit summary
Line 1: Line 1:
{{Publikation Erster Autor
{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|ErsterAutorNachname=Cheng
|ErsterAutorVorname=Long
|FurtherAuthors=Spyros Kotoulas; Tomas E. Ward; Georgios Theodoropoulos
|FurtherAuthors=Spyros Kotoulas
; Tomas E. Ward
; Georgios Theodoropoulos
}}
}}
{{Inproceedings
{{Inproceedings
Line 11: Line 9:
|To appear=0
|To appear=0
|Year=2014
|Year=2014
|Month=November
|Month=September
|Booktitle=Proc. 25th ACM International Conference on Hypertext and Social Media (HT'14)
|Booktitle=Proc. 25th ACM International Conference on Hypertext and Social Media (HT'14)
|Pages=300-302
|Publisher=ACM
|Publisher=ACM
}}
}}
{{Publikation Details
{{Publikation Details
|Abstract=The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied both in the parallel processing and the database communities. Nevertheless, most of the algorithms so far developed do not consider the data skew, which naturally exists in various applications. State of the art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins - the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable, and also runs faster with less network communication compared to state-of-art PRPD approach in [1] under high data skew.
|Bild=HT14.jpg
|Download=2013-A-Novel.pdf
|Abstract=We propose an efficient method for fast processing large RDF data over distributed memory. Our approach adopts a two-tier index architecture on each computation node: (1) a light-weight primary index, to keep loading times low, and (2) a dynamic, multi-level secondary index, calculated as a by-product of query execution, to decrease or remove inter-machine data movement for subsequent queries that contain the same graph patterns. Experimental results on a commodity cluster show that we can load large RDF data very quickly in memory while remaining within an interactive range for query processing with the secondary index.
|ISBN=978-1-4503-2954-5
|Download=2014-A-Two.pdf
|Link=http://dl.acm.org/citation.cfm?doid=2631775.2631789
|Link=http://dl.acm.org/citation.cfm?doid=2631775.2631789
|DOI Name=10.1145/2631775.2631789
|DOI Name=10.1145/2631775.2631789
|Projekt=DIAMOND
|Forschungsgruppe=Knowledge Systems
|Forschungsgruppe=Knowledge Systems
}}
}}

Revision as of 19:38, 26 October 2014

A Two-tier Index Architecture for Fast Processing Large RDF Data over Distributed Memory

Long Cheng, Spyros Kotoulas, Tomas E. Ward, Georgios Theodoropoulos
A Two-tier Index Architecture for Fast Processing Large RDF Data over Distributed Memory
Proc. 25th ACM International Conference on Hypertext and Social Media (HT'14), 300-302, September 2014. ACM
  • Abstract
    We propose an efficient method for fast processing large RDF data over distributed memory. Our approach adopts a two-tier index architecture on each computation node: (1) a light-weight primary index, to keep loading times low, and (2) a dynamic, multi-level secondary index, calculated as a by-product of query execution, to decrease or remove inter-machine data movement for subsequent queries that contain the same graph patterns. Experimental results on a commodity cluster show that we can load large RDF data very quickly in memory while remaining within an interactive range for query processing with the secondary index.
  • Further Information: http://dl.acm.org/citation.cfm?doid=2631775.2631789
  • Project: DIAMOND
  • Research Group: Knowledge-Based Systems
@inproceedings{CKWT2014,
  author    = {Long Cheng and Spyros Kotoulas and Tomas E. Ward and Georgios
               Theodoropoulos},
  title     = {A Two-tier Index Architecture for Fast Processing Large {RDF}
               Data over Distributed Memory},
  booktitle = {Proc. 25th {ACM} International Conference on Hypertext and Social
               Media (HT'14)},
  publisher = {ACM},
  year      = {2014},
  month     = {September},
  pages     = {300--302},
  doi       = {10.1145/2631775.2631789}
}