Generating corpora of semantic graphs based on graph extension grammar

This is a Bachelor's thesis from Umeå University / Department of Computing Science

Author: Eric Andersson; [2023]


Abstract: This thesis introduces Lovelace, a tool for generating corpora of semantic graphs, and uses it to investigate which functionalities, design aspects, and implementation aspects are important in a corpus generator. Lovelace generates these corpora using the graph grammar formalism graph extension grammar (GEG). A GEG consists of two parts: a regular tree grammar (RTG) and graph operations. A tree generated by the RTG serves as an instruction for how the graph operations are applied to create a semantic graph. Because Lovelace can express variables as word classes, the combination of semantic graphs and well-formed word classes means that the corpus generated by Lovelace is well-formed. In addition, Lovelace lets the user configure parameters that specify the corpus to be generated. Such corpora could be used as a tool for translating and processing natural language. The thesis ends with a discussion of what is missing and what could be improved in the corpus generator, along with new insights into which functionalities are important for users of a corpus generator.
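
The two-part pipeline described in the abstract (an RTG produces a tree, and the tree tells graph operations how to build a graph) can be illustrated with a small toy sketch. This is not Lovelace's implementation and not the formal definition of a GEG: the grammar `RTG`, the functions `derive`, `evaluate`, and `generate_corpus`, the edge labels `arg0`, `arg1`, and the `size`, `seed`, and `max_depth` parameters are all hypothetical names chosen for illustration. As a simplification, each operation here only adds one new node and connects it to the roots of its argument graphs, so the results are tree-shaped, whereas real semantic graphs may share nodes.

```python
import random

# Hypothetical toy regular tree grammar (an assumption, not taken from the thesis):
# each nonterminal maps to rules of the form (symbol, [child nonterminals]).
RTG = {
    "S": [("want", ["NP", "S"]), ("sleep", ["NP"])],
    "NP": [("girl", []), ("boy", [])],
}


def derive(nonterminal, rng, depth=0, max_depth=3):
    """Randomly derive a tree (symbol, children) from a nonterminal."""
    rules = RTG[nonterminal]
    if depth >= max_depth:
        # Past the depth limit, avoid rules that recurse on "S" so that
        # the derivation is guaranteed to terminate.
        rules = [r for r in rules if "S" not in r[1]] or rules
    symbol, child_nts = rng.choice(rules)
    children = [derive(nt, rng, depth + 1, max_depth) for nt in child_nts]
    return (symbol, children)


def evaluate(tree, graph=None):
    """Interpret a tree bottom-up as instructions that extend a graph.

    Each symbol is read as an operation that adds one node labelled with
    the symbol and one edge to the root of each argument graph.
    Returns (graph, id of the node added for this tree's root symbol).
    """
    if graph is None:
        graph = {"nodes": {}, "edges": []}
    symbol, children = tree
    # Build the argument graphs first, then extend with the new node.
    child_roots = [evaluate(child, graph)[1] for child in children]
    node_id = len(graph["nodes"])
    graph["nodes"][node_id] = symbol
    for i, root in enumerate(child_roots):
        graph["edges"].append((node_id, f"arg{i}", root))
    return graph, node_id


def generate_corpus(size, seed=0, max_depth=3):
    """Generate `size` graphs; `seed` makes the corpus reproducible."""
    rng = random.Random(seed)
    return [evaluate(derive("S", rng, max_depth=max_depth)) for _ in range(size)]
```

Calling `generate_corpus(5, seed=1)` returns five `(graph, root)` pairs. The `size`, `seed`, and `max_depth` arguments stand in for the kind of user-configurable parameters the abstract mentions; the parameters Lovelace actually exposes are not specified here.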
