'17
May

3rd Workshop on Managing the Evolution and Preservation of the Data Web

We co-organize the 3rd MEPDaW workshop co-located with the 14th European Semantic Web Conference 2017 in Portorož, Slovenia.

There is a vast and rapidly increasing quantity of scientific, corporate, government, and crowd-sourced data published on the emerging Data Web. Open Data are expected to play a catalytic role in the way structured information is exploited on a large scale. This offers great potential for building innovative products and services that create new value from already collected data. It is expected to foster active citizenship (e.g., around the topics of journalism, greenhouse gas emissions, food supply-chains, smart mobility, etc.) and world-wide research according to the “fourth paradigm of science”.

Published datasets are openly available on the Web. A traditional view of digitally preserving them by “pickling them and locking them away” for future use, like groceries, conflicts with their evolution. There are a number of approaches and frameworks, such as the Linked Data Stack, that manage a full life-cycle of the Data Web. More specifically, these techniques are expected to tackle major issues such as the synchronisation problem (how to monitor changes), the curation problem (how to repair data imperfections), the appraisal problem (how to assess the quality of a dataset), the citation problem (how to cite a particular version of a linked dataset), the archiving problem (how to retrieve the most recent or a particular version of a dataset), and the sustainability problem (how to support preservation at scale, ensuring long-term access).

Preserving linked open datasets poses a number of challenges, mainly related to the nature of the Linked Data principles and the RDF data model. Since resources are globally interlinked, effective citation measures are required. Another challenge is to determine the consequences that changes to one LOD dataset may have for other datasets linked to it. The distributed nature of LOD datasets furthermore introduces additional complexity, since external sources that are being linked to may change or become unavailable. Finally, another challenge is to identify means to continuously assess the quality of dynamic datasets.

At last year’s workshop, a number of open research questions were raised during the keynote and the discussions:

  1. How can we represent archives of continuously evolving linked datasets? (efficiency vs. compact representation)
  2. How can we measure the performance of systems for archiving evolving datasets, in terms of representation, efficiency and compactness?
  3. How can we improve completeness of archiving?
  4. How can emerging retrieval demands in archiving (e.g. time-traversing and traceability) be satisfied? What type of data analytics can we perform on top of the archived Web of data?
  5. How can certain time-specific queries over archives be answered? Can we re-use existing technologies (e.g. SPARQL or temporal extensions)? What is the right query language for such queries? (See the sketch after this list.)
  6. Is there an actual and urgent need in the community for handling the dynamicity of the Data Web?
  7. Is there a need for a killer app to kick-start the management of the evolving Web of Data?
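
As a concrete illustration of question 5, below is a minimal sketch of one possible approach: each version of a dataset is archived as a named graph annotated with its creation timestamp, and a time-specific query is answered with plain SPARQL over that encoding. The endpoint URL, the graph-naming scheme, and the `ex:createdAt` property are illustrative assumptions, not an established vocabulary or an existing system.

```python
# A minimal sketch, assuming each dataset version is archived as a named
# graph whose creation time is recorded via a hypothetical ex:createdAt
# property. ENDPOINT and the vocabulary are illustrative, not a real service.
import requests

ENDPOINT = "http://example.org/archive/sparql"  # hypothetical SPARQL endpoint

# Time-traversal query: the population of Vienna as recorded by the latest
# archived version created on or before 2016-05-29.
QUERY = """
PREFIX ex:  <http://example.org/archive#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?population ?stamp WHERE {
  GRAPH ?version { <http://dbpedia.org/resource/Vienna> dbo:populationTotal ?population }
  ?version ex:createdAt ?stamp .
  FILTER (?stamp <= "2016-05-29T00:00:00Z"^^xsd:dateTime)
}
ORDER BY DESC(?stamp)
LIMIT 1
"""

resp = requests.post(
    ENDPOINT,
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["stamp"]["value"], row["population"]["value"])
```

Whether such an encoding stays efficient and compact as datasets keep evolving is exactly the trade-off raised in questions 1 and 2.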


'16
May

2nd Workshop on Managing the Evolution and Preservation of the Data Web

We co-organize the 2nd MEPDaW workshop co-located with the 13th European Semantic Web Conference 2016 in Heraklion, Crete.

This workshop targets one of the emerging and fundamental problems in the Semantic Web, specifically the preservation of evolving linked datasets. There is a vast and rapidly increasing quantity of scientific, corporate, government and crowd-sourced data published on the emerging Data Web. Open Data are expected to play a catalytic role in the way structured information is exploited on a large scale. This offers great potential for building innovative products and services that create new value from already collected data. It is expected to foster active citizenship (e.g., around the topics of journalism, greenhouse gas emissions, food supply-chains, smart mobility, etc.) and world-wide research according to the “fourth paradigm of science”. The most noteworthy advantage of the Data Web is that, rather than documents, facts are recorded, which become the basis for discovering new knowledge that is not contained in any individual source, and for solving problems that were not originally anticipated. In particular, Open Data published according to the Linked Data Paradigm are essentially transforming the Web into a vibrant information ecosystem.

Published datasets are openly available on the Web. A traditional view of digitally preserving them by “pickling them and locking them away” for future use, like groceries, would conflict with their evolution. There are a number of approaches and frameworks, such as the LOD2 stack, that manage a full life-cycle of the Data Web. More specifically, these techniques are expected to tackle major issues such as the synchronisation problem (how can we monitor changes), the curation problem (how can data imperfections be repaired), the appraisal problem (how can we assess the quality of a dataset), the citation problem (how can we cite a particular version of a linked dataset), the archiving problem (how can we retrieve the most recent or a particular version of a dataset), and the sustainability problem (how can we support preservation at scale, ensuring long-term access).
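
To make the synchronisation and archiving problems concrete, here is a minimal sketch of computing the low-level change set (added and removed triples) between two archived versions of a dataset. The file names are placeholders, and rdflib is used purely for illustration.

```python
# A minimal sketch: diff two archived versions of a dataset at the triple
# level. The file names are placeholders for two serialized versions.
from rdflib import Graph

old, new = Graph(), Graph()
old.parse("dataset-v1.nt", format="nt")  # earlier archived version
new.parse("dataset-v2.nt", format="nt")  # most recent version

added = set(new) - set(old)    # triples introduced since v1
removed = set(old) - set(new)  # triples deleted since v1

print(f"{len(added)} triples added, {len(removed)} triples removed")
for triple in list(added)[:10]:  # show a sample of the additions
    print("+", triple)
```

Such change sets are the raw material of the synchronisation problem, and storing them instead of full copies is one common trade-off in the archiving problem.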

Preserving linked open datasets poses a number of challenges, mainly related to the nature of the LOD principles and the RDF data model. In LOD, datasets representing real-world entities are structured; thus, when managing and representing facts we need to take into account any constraints that may hold. Since resources might be interlinked, effective citation measures must be in place to enable, for example, ranking datasets according to their measured quality. Another challenge is to determine the consequences that changes to one LOD dataset may have for other datasets linked to it. The distributed nature of LOD datasets furthermore makes archiving a headache, since external sources that are linked to may change or become unavailable.

This workshop aims to address the above-mentioned challenges and issues by providing a forum for researchers and practitioners who apply Linked Data technologies to discuss, exchange, and disseminate their work.


'15
Sep

Best poster at SEMANTiCS 2015

We are happy to announce that our work "The DBpedia wayback machine" received the Best Poster & Demo Award at the 11th International Conference on Semantic Systems (SEMANTiCS 2015).

This work, by Javier D. Fernández, Patrik Schneider and Jürgen Umbrich, presents the DBpedia Wayback Machine, which extends the DBpedia services with a wayback mechanism to retrieve the version of a resource at any given point in time. We also model and offer revision metadata and diff comparisons between versions, all within the Linked Open Data paradigm and publicly available via a RESTful API.
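
As an illustration only, a client interaction with such a wayback-style RESTful API might look like the sketch below. The base URL, paths, and parameter names are hypothetical placeholders, not the service's actual interface.

```python
# A hypothetical client sketch for a wayback-style RESTful API. BASE, the
# paths and the "at"/"from"/"to" parameters are invented placeholders and
# do not describe the actual DBpedia Wayback Machine endpoints.
import requests

BASE = "http://example.org/dbpedia-wayback"  # placeholder base URL

# Retrieve a DBpedia resource as it stood at a given point in time.
resp = requests.get(
    f"{BASE}/resource/Vienna",
    params={"at": "2015-09-01T00:00:00Z"},
    headers={"Accept": "text/turtle"},
)
resp.raise_for_status()
print(resp.text)  # RDF serialization of the historical version

# Retrieve revision metadata, then a diff between two revisions.
revisions = requests.get(f"{BASE}/revisions/Vienna").json()
diff = requests.get(
    f"{BASE}/diff/Vienna",
    params={"from": revisions[0]["id"], "to": revisions[-1]["id"]},
).json()
print(diff)
```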

The DBpedia Wayback Machine is openly available, hosted by the Institute for Information Business at WU Vienna University of Economics and Business.