Information extraction from text recipes in a web format
Abstract: Searching the Internet for recipes to find interesting ideas for meals is becoming increasingly popular. It can, however, be difficult to find a recipe for a dish that can be prepared with the items someone has available at home. This thesis presents a solution to part of that problem. It investigates a method for extracting the various parts of a recipe from the Internet so that they can be saved in a searchable database, where users can search for recipes based on the ingredients they have available. The system works for both English and Swedish and is able to identify both languages. This is a problem within Natural Language Processing and its subfield Information Extraction. To solve the Information Extraction problem, rule-based techniques based on Named Entity Recognition, Content Extraction and general rule-based extraction are used. The results indicate generally good, but not flawless, functionality. For English, the rule-based algorithm achieved an F1-score of 83.8% for ingredient identification and 94.5% for identification of cooking instructions, and an accuracy of 88.0% and 96.4% for cooking time and number of portions respectively. For Swedish, ingredient identification worked slightly better, but the other parts worked slightly worse. The results are comparable to those of other similar methods and can hence be considered good. They are, however, not good enough for the system to be used independently without a supervising human.
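To give a feel for what a rule-based extraction step of the kind described in the abstract might look like, the following is a minimal sketch in Python that splits a single ingredient line into quantity, unit and name with a regular expression. It is an illustration only and does not reproduce the rules used in the thesis: the unit list `UNITS`, the `Ingredient` type and the `parse_ingredient` function are all hypothetical names introduced here.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical unit list with a few English and Swedish units.
# This is an assumption for illustration, not the list used in the thesis.
UNITS = ["cups", "cup", "tbsp", "tsp", "msk", "tsk", "dl", "ml", "kg", "g", "l"]


class Ingredient(NamedTuple):
    quantity: Optional[float]
    unit: Optional[str]
    name: str


# Rule: optional number, optional known unit, then the ingredient name.
_PATTERN = re.compile(
    r"^(?P<qty>\d+(?:[.,]\d+)?)?\s*"
    r"(?:(?P<unit>" + "|".join(UNITS) + r")\b\.?\s+)?"
    r"(?P<name>.+)$",
    re.IGNORECASE,
)


def parse_ingredient(line: str) -> Optional[Ingredient]:
    """Split one ingredient line into (quantity, unit, name)."""
    line = line.strip()
    if not line:
        return None
    match = _PATTERN.match(line)
    if match is None:
        return None
    qty = match.group("qty")
    unit = match.group("unit")
    return Ingredient(
        # Accept both "1.5" and the Swedish decimal comma "1,5".
        quantity=float(qty.replace(",", ".")) if qty else None,
        unit=unit.lower() if unit else None,
        name=match.group("name"),
    )


if __name__ == "__main__":
    for text in ["2 dl mjölk", "1,5 cups flour", "3 eggs", "salt"]:
        print(text, "->", parse_ingredient(text))
```

A single hand-written rule like this handles simple lines only; free-form recipe text contains fractions, ranges and parenthetical notes that such a rule does not cover, which is consistent with the abstract's observation that a rule-based system works well but not flawlessly.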