Inferring Dataset Relations using Knowledge Graph Metadata
Abstract: The website dataportalen.se aims to increase the availability of Swedish open datasets. It does so by collecting metadata about the open datasets provided by Swedish organizations. At the time of writing, the portal holds metadata for more than two thousand datasets, and this number is set to increase. As the number of datasets grows, browsing for relevant information becomes increasingly difficult and time-consuming. The website supports text search, followed by filtering of the results by theme, organization, file format, and license. We believe there is potential to connect the datasets to one another, making it easier to find a dataset of interest. The idea is to find common denominators in the metadata of the datasets. As no user data is available, the datasets had to be connected based solely on their metadata. The datasets are annotated with metadata such as title, description, keywords, and themes. By comparing the metadata of different datasets, a measure of similarity can be computed. This measure can then be used to find the datasets most relevant to a given dataset. The results suggest that it is indeed possible to find similar datasets by analyzing only the metadata. By exploring various methods, we found that the text data holds useful information that can be used to find relations between datasets. Using a related work as a benchmark, we found that our results are as good, if not better. Furthermore, the approach taken in this project is quite general and should, in theory, be applicable in other scenarios where textual data is available.
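To make the idea of a metadata-based similarity measure concrete, the following is a minimal sketch in Python. The abstract does not state which methods the thesis uses, so the choice of TF-IDF weighting with cosine similarity is an assumption made purely for illustration. The sample records, the dataset names, and the helper functions (tokenize, tfidf_vectors, cosine, most_similar) are likewise hypothetical; only the field names title, description, and keywords come from the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical metadata records (not taken from dataportalen.se).
DATASETS = {
    "air-quality": {
        "title": "Air quality measurements",
        "description": "Hourly air pollution levels in the city",
        "keywords": ["air", "pollution", "environment"],
    },
    "water-quality": {
        "title": "Water quality measurements",
        "description": "Bathing water pollution samples",
        "keywords": ["water", "pollution", "environment"],
    },
    "bus-stops": {
        "title": "Bus stops",
        "description": "Locations of bus stops and timetables",
        "keywords": ["transport", "bus"],
    },
}


def tokenize(record):
    """Join the textual metadata fields and split them into lowercase words."""
    text = " ".join(
        [record["title"], record["description"], " ".join(record["keywords"])]
    )
    return re.findall(r"[a-zåäö]+", text.lower())


def tfidf_vectors(datasets):
    """Build one sparse TF-IDF vector (a dict of term weights) per dataset."""
    counts = {name: Counter(tokenize(rec)) for name, rec in datasets.items()}
    n_docs = len(counts)
    # Document frequency: in how many datasets each term occurs.
    doc_freq = Counter(term for c in counts.values() for term in c)
    return {
        name: {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        }
        for name, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(name, datasets, top_k=2):
    """Rank all other datasets by similarity to the named dataset."""
    vectors = tfidf_vectors(datasets)
    scores = [
        (other, cosine(vectors[name], vectors[other]))
        for other in vectors
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    for other, score in most_similar("air-quality", DATASETS):
        print(f"{other}: {score:.3f}")
```

In this toy example the two environmental datasets share several terms and therefore rank closest to each other, while the transport dataset shares no terms with them and scores zero. A real portal would likely need language-specific preprocessing, such as Swedish stop-word removal or stemming, which this sketch omits.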