Scale-Out Processing of Large RDF Datasets
From International Center for Computational Logic
Long Cheng, Spyros Kotoulas
IEEE Transactions on Big Data, 1(4):138-150, December 2015
- Abstract
Distributed RDF data management systems become increasingly important with the growth of the Semantic Web. Regardless, current methods meet performance bottlenecks either on data loading or querying when processing large amounts of data. In this work, we propose efficient methods for processing RDF using dynamic data re-partitioning to enable rapid analysis of large datasets. Our approach adopts a two-tier index architecture on each computation node: (1) a lightweight primary index, to keep loading times low, and (2) a series of dynamic, multi-level secondary indexes, calculated as a by-product of query execution, to decrease or remove inter-machine data movement for subsequent queries that contain the same graph patterns. In addition, we propose methods to replace some secondary indexes with distributed filters, so as to decrease memory consumption. Experimental results on a commodity cluster with 16 nodes show that the method presents good scale-out characteristics and can indeed vastly improve loading speeds while remaining competitive in terms of performance. Specifically, our approach can load a dataset of 1.1 billion triples at a rate of 2.48 million triples per second and provide competitive performance to RDF-3X and 4store for expensive queries.
- Project: DIAMOND, HAEC B08
- Research Group: Knowledge-Based Systems
@article{CK2015,
author = {Long Cheng and Spyros Kotoulas},
title = {Scale-Out Processing of Large {RDF} Datasets},
journal = {IEEE Transactions on Big Data},
volume = {1},
number = {4},
publisher = {IEEE},
year = {2015},
month = {December},
pages = {138--150}
}