International Center for Computational Logic - Benutzerbeiträge [de]

Article3035

2017-03-07T21:44:12Z

Long Cheng: Die Seite wurde neu angelegt: „{{Publikation Erster Autor |ErsterAutorVorname=Long |ErsterAutorNachname=Cheng |FurtherAuthors=Ilias Tachmazidis; Spyros Kotoulas; Grigoris Antoniou; }} {{Art…“

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Ilias Tachmazidis; Spyros Kotoulas; Grigoris Antoniou;
}}
{{Article
|Referiert=1
|Title=Design and Evaluation of Small-Large Outer Joins in Cloud Computing Environments
|To appear=1
|Year=2017
|Journal=Journal of Parallel and Distributed Computing
|Publisher=Elsevier
}}
{{Publikation Details
|Abstract=Large-scale analytics is a key application area for data processing and parallel computing research. One of the most common (and challenging) operations in this domain is the join. Though inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work providing analysis of outer joins, especially in the extremely popular cloud computing environments. A common type of outer join is the small-large outer join, where one relation is relatively small and the other is large. Conventional implementations on this condition, such as one based on hash redistribution, often incur significant network communication, while the duplication-based approaches are complex and inefficient. In this work, we present a new method called DDR (duplication and direct redistribution), which aims to enable efficient small-large outer joins in cloud computing environments while being easy to implement using existing predicates in data processing frameworks. We present the detailed implementation of our approach and evaluate its performance through extensive experiments over the widely used MapReduce and Spark platforms. We show that the proposed method is scalable and can achieve significant performance improvements over the conventional approaches. Compared to the state-of-art method, the DDR algorithm is shown to be easier to implement and can achieve very similar or better performance under different outer join workloads, and thus, can be considered as a new option for current data analysis applications. Moreover, our detailed experimental results also have provided insights of current small-large outer join implementations, thereby allowing system developers to make a more informed choice for their data analysis applications.
|DOI Name=10.1016/j.jpdc.2017.02.007
|Projekt=Cfaed, DIAMOND, HAEC, HAEC B08
|Forschungsgruppe=Wissensbasierte Systeme
}}

Inproceedings3139/en

2017-02-08T21:31:17Z

2015-12-02T09:29:37Z

Long Cheng:

Article3018

2015-11-10T10:25:24Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Avinash Malik; Spyros Kotoulas; Tomas E. Ward; Georgios Theodoropoulos
}}
{{Article
|Referiert=0
|Title=Fast Compression of Large Semantic Web using X10
|To appear=1
|Year=2015
|Journal=IEEE Transactions on Parallel and Distributed Systems
|Note=This paper is the extended journal version of the article [[Inproceedings4049/en|Efficient Parallel Dictionary Encoding for RDF Data]]. The source code in X10 is available at: https://github.com/longcheng11/rdf_encoding
}}
{{Publikation Details
|Abstract=The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of applications that perform computations over large volumes of such information. A common approach to alleviate this problem is through the use of compression methods that produce more compact representations of the data. The use of dictionary encoding is particularly prevalent in Semantic Web database systems for this purpose. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we propose an efficient algorithm for fast encoding large Semantic Web data. Specially, we present the detailed implementation of our approach based on the state-of-art asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-art approach, we demonstrate a speed-up of 2.6 - 7.4x and excellent scalability. In the meantime, these results also illustrate the significant potential of the APGAS model for efficient implementation of dictionary encoding and contributes to the engineering of more efficient, larger scale Semantic Web applications.
|Download=Encodings.pdf
|DOI Name=10.1109/TPDS.2015.2496579
|Projekt=DIAMOND, HAEC
|Forschungsgruppe=Knowledge Systems
}}

2015-10-27T22:51:41Z

Long Cheng:

Article3018

2015-10-27T13:08:31Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Avinash Malik; Spyros Kotoulas; Tomas E. Ward; Georgios Theodoropoulos
}}
{{Article
|Referiert=0
|Title=Fast Compression of Large Semantic Web using X10
|To appear=1
|Year=2015
|Journal=IEEE Transactions on Parallel and Distributed Systems
|Note=This paper is the extended journal version of the article [[Inproceedings4049/en|Efficient Parallel Dictionary Encoding for RDF Data]].
}}
{{Publikation Details
|Abstract=The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of applications that perform computations over large volumes of such information. A common approach to alleviate this problem is through the use of compression methods that produce more compact representations of the data. The use of dictionary encoding is particularly prevalent in Semantic Web database systems for this purpose. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we propose an efficient algorithm for fast encoding large Semantic Web data. Specially, we present the detailed implementation of our approach based on the state-of-art asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-art approach, we demonstrate a speed-up of 2.6 - 7.4x and excellent scalability. In the meantime, these results also illustrate the significant potential of the APGAS model for efficient implementation of dictionary encoding and contributes to the engineering of more efficient, larger scale Semantic Web applications.
|Download=Tpds encodings.pdf
|Projekt=DIAMOND, HAEC
|Forschungsgruppe=Knowledge Systems
}}

2015-09-17T20:52:19Z

Long Cheng: Page created automatically by parser function on page Inproceedings3054

#REDIRECT [[Inproceedings3054]]

Inproceedings3054

2015-09-17T20:52:18Z

Long Cheng: Die Seite wurde neu angelegt: „{{Publikation Erster Autor |ErsterAutorVorname=Long |ErsterAutorNachname=Cheng |FurtherAuthors=Spyros Kotoulas; Tomas E Ward; Georgios Theodoropoulos }} {{Inp…“

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Spyros Kotoulas; Tomas E Ward; Georgios Theodoropoulos
}}
{{Inproceedings
|Referiert=1
|Title=High Throughput Indexing for Large-scale Semantic Web Data
|To appear=0
|Year=2015
|Month=April
|Booktitle=Proc. 30th ACM/SIGAPP Symposium On Applied Computing (SAC'15)
|Pages=416-422
|Publisher=ACM
}}
{{Publikation Details
|Abstract=Distributed RDF data management systems become increasingly important with the growth of the Semantic Web. Currently, several such systems have been proposed, however, their indexing methods meet performance bottlenecks either on data loading or querying when processing large amounts of data. In this work, we propose a high throughout index to enable rapid analysis of large datasets. We adopt a hybrid structure to combine the loading speed of similar-size based methods with the execution speed of graph-based approaches, using dynamic data repartitioning over query workloads. We introduce the design and detailed implementation of our method. Experimental results show that the proposed index can indeed vastly improve loading speeds while remaining competitive in terms of performance. Therefore, the method could be considered as a good choice for RDF analysis in large-scale distributed scenarios.
|Download=2015-High-Throughput.pdf
|Link=http://dl.acm.org/citation.cfm?doid=2695664.2695920
|DOI Name=10.1145/2695664.2695920
|Projekt=DIAMOND
|Forschungsgruppe=Knowledge Systems
}}

Datei:2015-High-Throughput.pdf

2015-09-17T20:43:37Z

Long Cheng:

Inproceedings4055

2015-09-17T20:38:33Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Spyros Kotoulas; Tomas E. Ward; Georgios Theodoropoulos
}}
{{Inproceedings
|Referiert=1
|Title=Design and Evaluation of Parallel Hashing over Large-scale Data
|To appear=0
|Year=2014
|Month=Dezember
|Booktitle=Proc. 21st IEEE International Conference on High Performance Computing (HiPC'14)
|Pages=1-10
|Publisher=IEEE
}}
{{Publikation Details
|Abstract=High-performance analytical data processing systems often run on servers with large amounts of memory. A common data structure used in such environment is the hash tables. This paper focuses on investigating efficient parallel hash algorithms for processing large-scale data. Currently, hash tables on distributed architectures are accessed one key at a time by local or remote threads while shared-memory approaches focus on accessing a single table with multiple threads. A relatively straightforward “bulk-operation” approach seems to have been neglected by researchers. In this work, using such a method, we propose a high-level parallel hashing framework, Structured Parallel Hashing, targeting efficiently processing massive data on distributed memory.

We present a theoretical analysis of the proposed method and describe the design of our hashing implementations. The evaluation reveals a very interesting result - the proposed straightforward method can vastly outperform distributed hashing methods and can even offer performance comparable with approaches based on shared memory supercomputers which use specialized hardware predicates. Moreover, we characterize the performance of our hash implementations through extensive experiments, thereby allowing system developers to make a more informed choice for their high-performance applications.
|Download=2014-Design-Evaluation.pdf
|Link=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7116909
|DOI Name=10.1109/HiPC.2014.7116909
|Projekt=DIAMOND
|Forschungsgruppe=Knowledge Systems
}}

Inproceedings4052

2015-09-17T20:36:49Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Spyros Kotoulas; Tomas E. Ward; Georgios Theodoropoulos
}}
{{Inproceedings
|Referiert=1
|Title=Robust and Skew-resistant Parallel Joins in Shared-nothing Systems
|To appear=0
|Year=2014
|Month=November
|Booktitle=Proc. 23rd ACM International Conference on Information and Knowledge Management (CIKM'14)
|Pages=1399-1408
|Publisher=ACM
}}
{{Publikation Details
|Abstract=The performance of joins in parallel database management systems is critical for data intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specially, compared to the state-of-art PRPD method, we achieve 16% - 167% performance improvement and 24% - 54% less network communication under different join workloads.
|Download=2014-Robust-Skew.pdf
|Link=http://dl.acm.org/citation.cfm?doid=2661829.2661888
|DOI Name=10.1145/2661829.2661888
|Projekt=DIAMOND
|Forschungsgruppe=Knowledge Systems
}}

Inproceedings4053

2015-09-17T20:34:14Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Ilias
|ErsterAutorNachname=Tachmazidis
|FurtherAuthors=Long Cheng; Spyros Kotoulas; Grigoris Antoniou; Tomas E Ward
}}
{{Inproceedings
|Referiert=1
|Title=Massively Parallel Reasoning under the Well-Founded Semantics using X10
|To appear=0
|Year=2014
|Month=November
|Booktitle=Proc. 26th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'14)
|Pages=162-169
|Publisher=IEEE
}}
{{Publikation Details
|Abstract=Academia and industry are investigating novel approaches for processing vast amounts of data coming from enterprises, the Web, social media and sensor readings in an area that has come to be known as Big Data. Logic programming has traditionally focused on complex knowledge structures/programs. The question arises whether and how it can be applied in the context of Big Data. In this paper, we study how the well-founded semantics can be computed over huge amounts of data using mass parallelization. Specifically, we propose and evaluate a parallel approach based on the X10 programming language. Our experiments demonstrate that our
approach has the ability to process up to 1 billion facts within minutes.
|Download=2014-Massively- Parallel.pdf
|Projekt=DIAMOND
|Forschungsgruppe=Knowledge Systems
}}

Inproceedings4054

2015-09-17T20:32:16Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Long
|ErsterAutorNachname=Cheng
|FurtherAuthors=Yue Ma
}}
{{Inproceedings
|Referiert=1
|Title=Investigating Distributed Approaches to Efficiently Extract Textual Evidences for Biomedical Ontologies
|To appear=0
|Year=2014
|Month=November
|Booktitle=Proc. 14th IEEE International Conference on BioInformatics and BioEngineering (BIBE'14)
|Pages=220-225
|Publisher=IEEE
}}
{{Publikation Details
|Abstract=Heterogeneous data resources in biomedicine become available both in structured and unstructured formats, such as scientific publications, healthcare guidelines, controlled vocabularies, and formal ontologies. Bridging the gaps among these heterogeneous data is useful to discovery implicit knowledge. To make this happen, efficient computational approaches are a necessity for applications in such a knowledge- and data- intensive domain. In this paper, we first define a particular task, relation alignment, which is to identify textual evidences for biomedical ontologies. Then, we investigate two parallel approaches for this task over distributed systems and present the details of their implementations. Moreover, we characterize the performance of our methods through extensive experiments, thereby allowing researchers to make a more informed choice in the presence of large-scale biomedical data.
|Download=2014-Inves-Distributed.pdf
|Projekt=DIAMOND
|Forschungsgruppe=Automatentheorie, Knowledge Systems
}}

Long Cheng

2015-09-02T13:53:36Z

Long Cheng:

2014-11-02T16:42:49Z

Long Cheng:

{{Publikation Erster Autor
|ErsterAutorVorname=Ilias
|ErsterAutorNachname=Tachmazidis
|FurtherAuthors=Long Cheng; Spyros Kotoulas; Grigoris Antoniou; Tomas E Ward
}}
{{Inproceedings
|Referiert=1
|Title=Massively Parallel Reasoning under the Well-Founded Semantics using X10
|To appear=0
|Year=2014
|Month=November
|Booktitle=Proc. 26th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'14)
|Publisher=IEEE
}}
{{Publikation Details
|Abstract=Academia and industry are investigating novel approaches for processing vast amounts of data coming from enterprises, the Web, social media and sensor readings in an area that has come to be known as Big Data. Logic programming has traditionally focused on complex knowledge structures/programs. The question arises whether and how it can be applied in the context of Big Data. In this paper, we study how the well-founded semantics can be computed over huge amounts of data using mass parallelization. Specifically, we propose and evaluate a parallel approach based on the X10 programming language. Our experiments demonstrate that our
approach has the ability to process up to 1 billion facts within minutes.
|Download=2014-Massively- Parallel.pdf
|Forschungsgruppe=Knowledge Systems
}}