Fast Compression of Large Semantic Web Data using X10

Long Cheng, Avinash Malik, Spyros Kotoulas, Tomas E. Ward, Georgios Theodoropoulos

Long Cheng, Avinash Malik, Spyros Kotoulas, Tomas E. Ward, Georgios Theodoropoulos
Fast Compression of Large Semantic Web Data using X10
IEEE Transactions on Parallel and Distributed Systems, 27(9):2603-2617, September 2016

Abstract
The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are inefficient for applications that perform computations over large volumes of such information. A common way to alleviate this problem is to use compression methods that produce more compact representations of the data. Dictionary encoding is particularly prevalent in Semantic Web database systems for this purpose. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we propose an efficient algorithm for fast encoding of large Semantic Web data. Specifically, we present the detailed implementation of our approach based on the state-of-the-art asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-the-art approach, we demonstrate a speed-up of 2.6x to 7.4x and excellent scalability. These results also illustrate the significant potential of the APGAS model for efficient implementation of dictionary encoding and contribute to the engineering of more efficient, larger-scale Semantic Web applications.
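As an illustration of the dictionary encoding idea mentioned in the abstract, the following is a minimal single-process sketch in Python. It is not the distributed APGAS/X10 algorithm proposed in the paper; the function names `encode_triples` and `decode_triples` and the example URIs are assumptions made only for this illustration. It shows how long string terms can be replaced by compact integer identifiers and recovered again.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def encode_triples(
    triples: Iterable[Triple],
) -> Tuple[List[EncodedTriple], Dict[str, int]]:
    """Replace every term with an integer ID (hypothetical helper).

    IDs are assigned in order of first appearance, starting at 0.
    """
    dictionary: Dict[str, int] = {}
    encoded: List[EncodedTriple] = []
    for subject, predicate, obj in triples:
        ids = []
        for term in (subject, predicate, obj):
            # A term seen for the first time gets the next free ID.
            if term not in dictionary:
                dictionary[term] = len(dictionary)
            ids.append(dictionary[term])
        encoded.append((ids[0], ids[1], ids[2]))
    return encoded, dictionary


def decode_triples(
    encoded: Iterable[EncodedTriple], dictionary: Dict[str, int]
) -> List[Triple]:
    """Invert the dictionary to recover the original string triples."""
    reverse = {term_id: term for term, term_id in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in encoded]


if __name__ == "__main__":
    # Example data; the example.org URIs are made up for illustration.
    data = [
        ("<http://example.org/alice>",
         "<http://xmlns.com/foaf/0.1/knows>",
         "<http://example.org/bob>"),
        ("<http://example.org/bob>",
         "<http://xmlns.com/foaf/0.1/knows>",
         "<http://example.org/alice>"),
    ]
    encoded, dictionary = encode_triples(data)
    print(encoded)
    print(decode_triples(encoded, dictionary) == data)
```

In this sketch the single shared dictionary is the centralized bottleneck the abstract refers to: every term lookup goes through one data structure, which is what a distributed encoding scheme has to avoid.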
Note: This paper is the extended journal version of the article "Efficient Parallel Dictionary Encoding for RDF Data". The source code in X10 is available at: https://github.com/longcheng11/rdf_encoding
Project: DIAMOND, HAEC B08
Research Group: Knowledge-Based Systems

@article{CMKWT2016,
  author    = {Long Cheng and Avinash Malik and Spyros Kotoulas and Tomas E.
               Ward and Georgios Theodoropoulos},
  title     = {Fast Compression of Large Semantic Web Data using X10},
  journal   = {IEEE Transactions on Parallel and Distributed Systems},
  volume    = {27},
  number    = {9},
  publisher = {IEEE},
  year      = {2016},
  month     = {September},
  pages     = {2603-2617},
  doi       = {10.1109/TPDS.2015.2496579}
}