Imitation Learning using Reward-Guided DAgger

This is a Master's thesis from KTH / School of Electrical Engineering and Computer Science (EECS)

Author: Nora Al-naami; [2020]


Abstract: End-to-end autonomous driving can be approached by finding a policy function that maps observations (e.g., the driving view of the road) to driving actions, learned by imitating an expert driver. This can be done with supervised learning, where the policy function is tuned to minimize the difference between the ground-truth and predicted driver actions. However, this method leads to poor performance, since the policy function is trained only on the states reached by the expert. An imitation learning algorithm that addresses this problem is Dataset Aggregation (DAgger). The main idea of DAgger is to train the policy iteratively on data collected both from the expert and from the policy itself, which requires a rule governing the interaction between the expert and the policy function. Current DAgger variants require querying the expert for a long time and do not explore the state space both safely and efficiently. In this thesis, we present an extension to DAgger that introduces a decision rule treating safety in the state space as a probability measure, in order to minimize expert queries and guide exploration during training. We evaluate the proposed algorithm, Reward-Guided DAgger (RG-DAgger), against three known algorithms: Behavior Cloning, Vanilla DAgger, and Ensemble DAgger. The algorithms are evaluated in the context of self-driving cars, on a track with twenty minutes of driving in the Virtual Battlespace 3 (VBS3) simulator. Training is carried out on ten randomly generated tracks with a human acting as the expert. The results show trends in expert time during training and in the number of falls during testing, for a single trial. Behavior Cloning performed worst, with the highest expert time, while RG-DAgger had the lowest total expert time over all DAgger iterations; Ensemble DAgger showed the highest average number of falls. In conclusion, Ensemble DAgger and Vanilla DAgger learn more robustly than RG-DAgger, while RG-DAgger queries the expert for labels only when the expert controls the car, making it friendlier to a human expert.
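To make the DAgger loop concrete, below is a minimal Python sketch of the iterative train/rollout/aggregate cycle described above. It is an illustration under stated assumptions, not the thesis's implementation: the environment interface (env.reset, env.step), the rollout helper, and the safety_prob gate standing in for RG-DAgger's probability-based decision rule are all hypothetical names, and the scikit-learn MLPRegressor is only a stand-in policy model.

    # Hypothetical sketch of the DAgger training loop, with an optional
    # probability-of-safety gate approximating the RG-DAgger idea.
    # The env interface (reset() -> state, step(a) -> (state, done))
    # is assumed, not taken from the thesis.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def rollout(env, policy, episode_len):
        """Yield (state, action) pairs from one episode under `policy`."""
        s = env.reset()
        for _ in range(episode_len):
            a = policy(s)
            yield s, a
            s, done = env.step(a)
            if done:
                break

    def dagger(expert_policy, env, n_iterations=10, episode_len=200,
               safety_threshold=0.8, safety_prob=None):
        """Iteratively aggregate expert-labeled data and retrain the policy.

        safety_prob: optional callable state -> estimated probability that
        the learner can act safely there (an RG-DAgger-style gate, assumed
        here). When None, the expert labels every visited state, which is
        the Vanilla DAgger behavior.
        """
        states, actions = [], []

        # Iteration 0: behavior cloning on expert-only rollouts.
        for s, a in rollout(env, expert_policy, episode_len):
            states.append(s)
            actions.append(a)
        policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
        policy.fit(np.array(states), np.array(actions))

        for _ in range(n_iterations):
            s = env.reset()  # states are assumed to be numpy arrays
            for _ in range(episode_len):
                learner_safe = (safety_prob is None
                                or safety_prob(s) >= safety_threshold)
                if learner_safe:
                    # Learner drives.
                    a = policy.predict(s.reshape(1, -1))[0]
                    if safety_prob is None:
                        # Vanilla DAgger: expert labels every visited state.
                        states.append(s)
                        actions.append(expert_policy(s))
                else:
                    # Expert takes over; only now is the expert queried,
                    # mirroring the label-saving behavior the abstract
                    # attributes to RG-DAgger.
                    a = expert_policy(s)
                    states.append(s)
                    actions.append(a)
                s, done = env.step(a)
                if done:
                    break
            # Retrain on the aggregated dataset after each iteration.
            policy.fit(np.array(states), np.array(actions))
        return policy

With safety_prob left as None, the loop reduces to Vanilla DAgger, where the expert labels every state the learner visits; supplying a gate means the expert is consulted only when it takes control, which is the human-friendly property claimed for RG-DAgger above.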
