3rd Workshop on Managing the Evolution and Preservation of the Data Web
May-2017
We are co-organizing the 3rd MEPDaW workshop, co-located with the 14th European Semantic Web Conference (ESWC 2017) in Portoroz, Slovenia.
There is a vast and rapidly increasing quantity of scientific, corporate, government, and crowd-sourced data published on the emerging Data Web. Open Data is expected to play a catalytic role in how structured information is exploited at large scale. This offers great potential for building innovative products and services that create new value from already collected data. It is also expected to foster active citizenship (e.g., around journalism, greenhouse gas emissions, food supply chains, and smart mobility) and worldwide research according to the “fourth paradigm of science”.
Published datasets are openly available on the Web. The traditional view of digital preservation, “pickling them and locking them away” for future use like groceries, conflicts with their continuous evolution. A number of approaches and frameworks, such as the Linked Data Stack, manage the full life-cycle of the Data Web. More specifically, these techniques are expected to tackle major issues such as the synchronisation problem (how to monitor changes), the curation problem (how to repair data imperfections), the appraisal problem (how to assess the quality of a dataset), the citation problem (how to cite a particular version of a linked dataset), the archiving problem (how to retrieve the most recent or a particular version of a dataset), and the sustainability problem (how to support preservation at scale, ensuring long-term access).
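To make the archiving problem concrete, here is a minimal sketch that keeps each version of a dataset in its own named graph and retrieves a particular version with a SPARQL query. This version-per-named-graph layout (sometimes called the “independent copies” policy) is just one possible archive design, and all URIs, labels, and version identifiers below are hypothetical.

```python
# A minimal sketch, assuming a version-per-named-graph archive layout
# (each dataset version stored as its own named graph). All URIs and
# labels below are hypothetical.
from rdflib import Dataset, Literal, URIRef
from rdflib.namespace import RDFS

ds = Dataset()

# Archive two versions of a tiny dataset as separate named graphs.
subject = URIRef("http://example.org/resource/Portoroz")
v1 = ds.graph(URIRef("http://example.org/dataset/version/1"))
v1.add((subject, RDFS.label, Literal("Portoroz")))
v2 = ds.graph(URIRef("http://example.org/dataset/version/2"))
v2.add((subject, RDFS.label, Literal("Portorož", lang="sl")))

# Retrieve a particular version by addressing its named graph.
query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        GRAPH <http://example.org/dataset/version/2> {
            ?s rdfs:label ?label .
        }
    }
"""
for row in ds.query(query):
    print(row.label)  # -> Portorož
```

Storing full copies per version keeps retrieval trivial but duplicates unchanged triples; delta-based and hybrid layouts trade that space cost for more expensive version reconstruction.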
Preserving linked open datasets poses a number of challenges, mainly stemming from the Linked Data principles and the RDF data model. Since resources are globally interlinked, effective citation measures are required. Another challenge is to determine the consequences that changes to one LOD dataset may have for other datasets linked to it. The distributed nature of LOD datasets introduces additional complexity, since external sources that are being linked to may change or become unavailable. Finally, means are needed to continuously assess the quality of dynamic datasets.
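One concrete instance of the availability challenge is link rot. The sketch below, assuming a plain HTTP dereferencing check (the outbound URIs are hypothetical), flags links that no longer resolve; a real monitor would add politeness delays, redirect handling, and content negotiation for RDF.

```python
# A minimal sketch of monitoring the availability of external links,
# assuming a plain HTTP HEAD check. The outbound URIs are hypothetical.
import urllib.error
import urllib.request

EXTERNAL_LINKS = [
    "http://example.org/resource/Portoroz",
    "http://example.org/resource/Slovenia",
]

def is_dereferenceable(uri: str, timeout: float = 5.0) -> bool:
    """Return True if the URI still answers an HTTP HEAD request."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False

broken = [uri for uri in EXTERNAL_LINKS if not is_dereferenceable(uri)]
print(f"{len(broken)} of {len(EXTERNAL_LINKS)} outbound links are broken")
```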
At last year’s workshop, a number of open research questions were raised in the keynote and discussions:
- How can we represent archives of continuously evolving linked datasets (trading off efficiency against compactness of representation)?
- How can we measure the performance of systems for archiving evolving datasets, in terms of representation, efficiency and compactness?
- How can we improve the completeness of archiving?
- How can emerging retrieval demands in archiving (e.g. time-traversal and traceability) be satisfied? What type of data analytics can we perform on top of the archived Web of Data?
- How can certain time-specific queries over archives be answered? Can we re-use existing technologies (e.g. SPARQL or temporal extensions thereof)? What is the right query language for such queries? (A minimal sketch of one such query follows this list.)
- Is there an actual and urgent need in the community for handling the dynamicity of the Data Web?
- Is there a need for a killer app to kick-start the management of the evolving Web of Data?
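To ground the query questions above, here is a minimal sketch of one kind of time-specific query: a delta query computing which triples were added between two archived versions. It again assumes the hypothetical version-per-named-graph layout from the earlier sketch; dedicated archive stores and temporal SPARQL extensions are exactly the kind of alternatives these questions ask about.

```python
# A minimal delta-query sketch, assuming the hypothetical
# version-per-named-graph layout: which triples are present in
# version 2 but not in version 1 (i.e. the additions)?
from rdflib import Dataset, Literal, URIRef
from rdflib.namespace import RDFS

subject = URIRef("http://example.org/resource/Portoroz")
ds = Dataset()
ds.graph(URIRef("http://example.org/dataset/version/1")).add(
    (subject, RDFS.label, Literal("Portoroz")))
ds.graph(URIRef("http://example.org/dataset/version/2")).add(
    (subject, RDFS.label, Literal("Portorož", lang="sl")))

# Triples in version 2 with no counterpart in version 1.
delta_query = """
    SELECT ?s ?p ?o WHERE {
        GRAPH <http://example.org/dataset/version/2> { ?s ?p ?o }
        FILTER NOT EXISTS {
            GRAPH <http://example.org/dataset/version/1> { ?s ?p ?o }
        }
    }
"""
for row in ds.query(delta_query):
    print("added:", row.s, row.p, row.o)
```

Such a diff is expressible in plain SPARQL 1.1, but evaluating it naively over full copies of every version is costly, which is precisely the representation and performance trade-off raised in the questions above.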