Project | Querying Archives of Dynamic Linked Open Data

The emerging Web of Data is not a static structure of linked datasets but a dynamic, continuously evolving framework. In a distributed fashion and without notice, new datasets are added, while others are modified, abandoned to obsolescence, or removed from the Web. All of this happens without centralized monitoring or a predefined policy, following the scale-free nature of the Web.

Applications and businesses that rely on the availability of certain data over time, or that seek to track data or study its evolution, therefore need to build their own infrastructures to preserve and query data over time.

Preservation policies for Linked Data collections thus emerge as a novel topic, with the goal of assuring the quality and traceability of datasets over time. However, previous experience with traditional Web archives such as the Internet Archive, which holds petabytes of archived information, already highlights scalability problems when managing evolving volumes of information at Web scale, making longitudinal querying across time a formidable challenge with current tools.

It should be stressed that querying Web archives deals mainly with text, whereas archiving structured interlinked data must focus on structured queries across time. In particular, several research challenges arise when representing and querying evolving structured interlinked data:

Representation

How can we represent archives of continuously evolving linked datasets? How can huge archives remain processable?

Compression

How can we minimize the redundant information of archives?

Querying

How can we capture the expressiveness of emerging retrieval demands in archiving (e.g. time-traversing, traceability, evolution) and design a query language for evolving interlinked data?

Indexing

How can we index these archives at large scale so that the demanded queries can still be processed efficiently?
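To make these four questions concrete, the following is a minimal sketch of one possible archive design, not the representation proposed by the project. It assumes a change-based (delta) policy: each version is stored only as the sets of triples added and deleted since the previous one, which reduces redundancy but requires replaying deltas to answer queries. The names `Triple`, `DeltaArchive`, `commit`, `materialize`, `diff` and `versions_of` are hypothetical, and the example triples are invented for illustration.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A single RDF-like statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


class DeltaArchive:
    """Toy change-based archive: every version is kept as (added, deleted) sets."""

    def __init__(self) -> None:
        self._deltas: List[Tuple[frozenset, frozenset]] = []
        self._latest: frozenset = frozenset()

    def commit(self, snapshot: Iterable[Triple]) -> int:
        """Archive a new snapshot and return its version number."""
        snapshot = frozenset(snapshot)
        added = snapshot - self._latest
        deleted = self._latest - snapshot
        self._deltas.append((added, deleted))
        self._latest = snapshot
        return len(self._deltas) - 1

    def materialize(self, version: int) -> Set[Triple]:
        """Rebuild the full dataset as it was at the given version."""
        if not 0 <= version < len(self._deltas):
            raise IndexError(f"unknown version {version}")
        current: Set[Triple] = set()
        for added, deleted in self._deltas[: version + 1]:
            current -= deleted
            current |= added
        return current

    def diff(self, v_from: int, v_to: int) -> Tuple[Set[Triple], Set[Triple]]:
        """Return (added, deleted) triples between two versions."""
        old, new = self.materialize(v_from), self.materialize(v_to)
        return new - old, old - new

    def versions_of(self, triple: Triple) -> List[int]:
        """List every version in which the triple is present."""
        present, result = False, []
        for version, (added, deleted) in enumerate(self._deltas):
            if triple in added:
                present = True
            elif triple in deleted:
                present = False
            if present:
                result.append(version)
        return result


# Invented example data: one value changes between two versions.
OLD = Triple("ex:budget", "ex:amount", "100")
NEW = Triple("ex:budget", "ex:amount", "120")
KEPT = Triple("ex:budget", "ex:publisher", "ex:city")

archive = DeltaArchive()
v0 = archive.commit({OLD, KEPT})
v1 = archive.commit({NEW, KEPT})
added, deleted = archive.diff(v0, v1)
```

In this sketch, `materialize` answers time-traversing queries, `diff` covers evolution queries, and `versions_of` supports traceability. All three scan the delta chain linearly, which is exactly the cost that a dedicated index over the archive would aim to avoid.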

The proposed project tackles the problem of archiving and querying evolving Semantic Web data. To that end, we aim to provide a novel representation leading to compressed, queryable linked data archives. Under this scenario, we will investigate the expressiveness required to query archives across time, and we will propose a structured query language matching the specific needs of consuming local and federated archives.

The project therefore spans several research areas, ranging from optimized representations for archiving evolving linked data to large-scale indexing of archives, time-based query languages, federation, and performance optimization. Finally, we plan to validate all our steps on real data, using the specific use case of archiving governmental Open Data. The resulting project objectives are summarized below.

Project Description

Representation

Compression

Querying

Indexing

Project Goals

Representation (O1)

Query Language (O2)

Indexing (O3)

Query Optimization (O4)

Application (O5)

Support
