Indexing for Datalog Materialisation with Leapfrog Triejoin

From International Center for Computational Logic

Indexing for Datalog Materialisation with Leapfrog Triejoin

Talk by Philipp Hanisch
Datalog is a well-understood relational query language and there are efficient reasoners based on different concepts and technologies. Nevertheless, the search for faster and more efficient reasoners continues. A promising approach for further performance improvements is the so-called leapfrog triejoin by Veldhuizen. Leapfrog triejoin is a variable-oriented join algorithm, similar to an ordered merge join, and it computes the matches of a Datalog rule as a result trie following a previously defined variable order. The literature about leapfrog triejoin focuses on its application to a single join or, respectively, Datalog rule. Unfortunately, applying leapfrog triejoin to whole Datalog programs during materialization yields new challenges. We concentrate on these challenges as well as on potential solutions.

As leapfrog triejoin requires suitable data structures for each rule, it is prone to be inefficient when considering the rules of a program in isolation, as this might introduce avoidable redundancy. Moreover, the choice of ‘good’ variable orders is crucial for the performance of leapfrog triejoin. Hence, we propose criteria for the quality of variable orders. For finding good variable orders, we propose an optimal approach based on Answer Set Programming as well as a heuristic approach. Furthermore, we show the trade-off between the optimality of the ASP approach and the required time in an evaluation.

Moreover, we generalize leapfrog triejoin to use partial variable orders. We discuss the resulting generalization of the data structures, i.e., f-representations instead of tries, and we show how to efficiently prepare them for leapfrog triejoin. To realize the potential of considering independent variables separately and the more succinct representation of relations, we adapt leapfrog triejoin for partial variable orders. We show that, on a set of benchmarks, partial variable orders are of higher quality than total ones w.r.t. our optimization criteria.

This is a Diploma project defense and will be given online via BigBlueButton. To access the room, use one of the following links:

With ZIH-Login: https://selfservice.zih.tu-dresden.de/l/link.php?m=154204&p=b7e638fd

Without ZIH-Login:

https://selfservice.zih.tu-dresden.de/link.php?m=154204&p=19a754b0