Logical Foundations of Linked Data Anonymisation

From International Center for Computational Logic

Talk by Bernardo Cuenca Grau
The widespread adoption of the Linked Data paradigm has been driven by the increasing demand for information exchange between organisations, as well as by regulations in domains such as health care and governance that require certain data to be published. In this setting, sensitive information is at high risk of disclosure, since published data can often be seamlessly linked with arbitrary external data sources.


In this talk I will discuss the logical foundations of privacy-preserving data publishing (PPDP) in the context of Linked Data. Specifically, we consider anonymisations of RDF graphs (and, more generally, relational datasets with labelled nulls) and define notions of safe and optimal anonymisations. Safety ensures that an anonymised dataset can be published with provable protection guarantees against linking attacks, whereas optimality ensures that the published data preserves as much information from the original data as possible while still satisfying the safety requirement. I will establish the computational complexity of the underlying decision problems under both the open-world semantics inherent to RDF and a closed-world semantics, where we assume that an attacker has complete knowledge of some part of the original data.
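
To make the idea of a linking attack concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the talk: the data, the policy query, and the helper names (anonymise, answers, leaked) are all hypothetical. It assumes that an RDF graph is a set of triples, that labelled nulls are written with a "_:" prefix, that the sensitive information is described by a simple conjunctive query, and that each hidden constant is replaced by a single null throughout. It also tests an anonymisation against one particular external dataset, whereas the safety notion in the abstract is a guarantee against linking attacks in general.

```python
from itertools import count

def is_null(term):
    # Assumption: labelled nulls (blank nodes) are written "_:b0", "_:b1", ...
    return term.startswith("_:")

def anonymise(graph, to_hide):
    # Replace every constant in to_hide by its own fresh labelled null.
    fresh = count()
    mapping = {c: f"_:b{next(fresh)}" for c in sorted(to_hide)}
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}

def match(pattern, triple, binding):
    # Extend binding so that pattern equals triple, or return None.
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # "?x" marks a query variable
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def answers(graph, patterns, answer_vars):
    # Naive evaluation of a conjunctive query given as triple patterns.
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in graph:
                b2 = match(pattern, triple, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return {tuple(b[v] for v in answer_vars) for b in bindings}

def leaked(graph, patterns, answer_vars):
    # An answer counts as disclosed only if it contains no nulls.
    return {a for a in answers(graph, patterns, answer_vars)
            if not any(is_null(t) for t in a)}

# Hypothetical original data and policy: hide who is treated by an oncologist.
original = {("alice", "seenBy", "drSmith"),
            ("drSmith", "specialty", "oncology")}
policy = [("?p", "seenBy", "?d"), ("?d", "specialty", "oncology")]

# Hypothetical external source, e.g. a public staff directory.
external = {("drSmith", "specialty", "oncology")}

weak = anonymise(original, {"oncology"})   # hides the specialty only
strong = anonymise(original, {"alice"})    # hides the patient instead

print(leaked(weak, policy, ["?p"]))               # set()
print(leaked(weak | external, policy, ["?p"]))    # {('alice',)}
print(leaked(strong | external, policy, ["?p"]))  # set()
```

In this toy example the first anonymisation reveals nothing on its own but discloses the patient once it is combined with the external triple, while the second still yields no disclosed answer against this particular external source. Both replace exactly one constant, which hints at why safety and information preservation have to be studied together.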