Information extraction and mapping for KG construction with learned concepts from scientific documents: Experimentation with relations data for development of concept learner

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Systematic review of research manuscripts is a common procedure in which research studies pertaining to a particular field or domain are classified and structured in a methodological way. This process involves, among other steps, an extensive review and consolidation of scientific metrics and attributes of the manuscripts, such as citations, type, or venue of publication. Extracting and mapping the relevant publication data is very laborious when performed manually. Automating these systematic mapping steps is intended to reduce the human effort required and can therefore potentially reduce the time the process takes.

The objective of this thesis is to automate the data extraction and mapping steps when systematically reviewing studies. The manual process is replaced by novel graph modelling techniques for effective knowledge representation, as well as novel machine learning techniques that aim to learn these representations. This automates the process by characterising the publications on the basis of certain sub-properties and qualities that give the reviewer a quick high-level overview of each research study. The final model is a concept learner that predicts these sub-properties and, in addition, addresses the inherent concept drift of novel manuscripts over time. Different models were developed and explored in this research study for the development of the concept learner.

Results show that: (1) Graph reasoning techniques that leverage the expressive power of modern graph databases are very effective in capturing the extracted knowledge in a so-called knowledge graph, which allows us to form concepts that can be learned using standard machine learning techniques such as logistic regression, decision trees, and neural networks. (2) Neural network models and ensemble models outperformed other standard machine learning techniques, such as logistic regression and decision trees, on the evaluation metrics. (3) The concept learner is able to detect and avoid concept drift by retraining the model.
