Continual imitation learning: Enhancing safe data set aggregation with elastic weight consolidation

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: The field of machine learning currently draws massive attention due to advancements and successful applications announced in the last few years. One of these applications is self-driving vehicles. A machine learning model can learn to drive through behavior cloning. Behavior cloning uses an expert's behavioral traces as training data. However, the model's steering predictions influence the succeeding input to the model, and thus the model's input data will vary depending on earlier predictions. Eventually the vehicle may deviate from the expert's behavioral traces and fail because it encounters data it has not been trained on. This is the problem of sequential predictions. DAGGER and its improvement SafeDAGGER are algorithms that enable training models in the sequential prediction domain. Both algorithms iteratively collect new data, aggregate new and old data, and retrain models on all data to avoid catastrophically forgetting previous knowledge. The aggregation of data leads to increasing model training times and memory requirements, and requires that previous data be kept forever. The purpose of this thesis is to investigate whether SafeDAGGER can be improved with continual learning to create a more scalable and flexible algorithm. This thesis presents an improved algorithm, EWC-SD, that uses the continual learning algorithm EWC to protect a model's previous knowledge and thereby train only on new data. Training only on new data gives EWC-SD lower training times and memory requirements than the original SafeDAGGER and avoids storing old data forever. The algorithms are evaluated in the context of self-driving vehicles on three tracks in the VBS3 simulator. The results show that EWC-SD, when trained on new data only, does not reach the performance of SafeDAGGER. Adding a rehearsal buffer containing only 23 training examples to EWC-SD allows it to outperform SafeDAGGER by reaching the same performance in half as many iterations. The conclusion is that EWC-SD with rehearsal solves the problems of increasing model training times, growing memory requirements, and required access to all previous data that are imposed by data aggregation.
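As a rough illustration of the EWC mechanism the abstract refers to, the sketch below shows the standard diagonal-Fisher EWC penalty (Kirkpatrick et al., 2017) in PyTorch. It is a minimal sketch under assumptions of my own: the function names, the choice of lambda, and the iterable of (input, target) pairs are illustrative, not the thesis's actual EWC-SD implementation.

    import torch

    def fisher_diagonal(model, data, loss_fn):
        # Approximate the diagonal Fisher information as the mean squared
        # gradient of the loss over (input, target) pairs from old data.
        fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        for x, y in data:
            model.zero_grad()
            loss_fn(model(x), y).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
        return {n: f / max(len(data), 1) for n, f in fisher.items()}

    def ewc_loss(model, task_loss, fisher, theta_star, lam=100.0):
        # Task loss on new data plus a quadratic penalty that resists moving
        # parameters the Fisher estimate marks as important for old data.
        penalty = sum(
            (fisher[n] * (p - theta_star[n]) ** 2).sum()
            for n, p in model.named_parameters()
        )
        return task_loss + (lam / 2.0) * penalty

In an EWC-SD-style iteration one would, presumably, snapshot theta_star = {n: p.detach().clone() for n, p in model.named_parameters()} and the Fisher estimate after training on the previous data, then minimize ewc_loss on the newly collected examples (plus, in the rehearsal variant, the small rehearsal buffer) instead of on the full aggregated data set.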
