Crowdsourcing of Data for Hybrid Code Networks

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Task-oriented dialogue systems are a popular way for organisations to generate added value, both internally and for customers. Modern approaches that use neural networks to train directly on written dialogues are very data-hungry, which complicates their implementation. Crowdsourcing is an attractive way to generate this type of training data, but it comes with several difficulties of its own. We introduce a new method for generating training data based on parallel crowdsourcing of dialogues combined with crowdsourced quality review. We use this method to collect a small dataset in the bus driver-traveler domain. We believe that the method offers an efficient way to collect new, high-quality datasets. Hybrid Code Networks is a dialogue system model that combines a neural network with domain-specific knowledge and therefore requires significantly less training data than comparable dialogue systems to achieve similar performance. By combining Hybrid Code Networks with our new method for generating training data, we believe that the threshold for implementing task-oriented dialogue systems in domains with insufficient training data can be lowered. We implement Hybrid Code Networks, train the implementation on the collected dataset, and achieve good results.
