Latent Representation of Tasks for Faster Learning in Reinforcement Learning

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Felix Engström; [2019]


Abstract: Reinforcement learning (RL) is a field of machine learning (ML) which attempts to approach learning in a manner inspired by the human way of learning through rewards and penalties. As with other forms of ML, it is strongly dependent on large amounts of data, the acquisition of which can be costly and time consuming. One way to reduce the need for data is transfer learning (TL), in which knowledge stored in one model can be used to help in the training of another model. In an attempt at performing TL in the context of RL, we suggest a multitask Q-learning model. This model is trained on multiple tasks that are assumed to come from some family of tasks sharing traits. The model combines contemporary Q-learning methods from the field of RL with ideas from variational autoencoders (VAEs), and thus suggests a probabilistically motivated model in which the Q-network is parameterized on a latent variable z ∈ Z representing the task. This is done in a way intended to allow the Z space to be searched for solutions when encountering new tasks from the same family. To evaluate the model, we designed a simple grid world environment called HillWalk, and two models are trained, each on a separate set of tasks from this environment. The results are then evaluated by comparison against a baseline Q-learning model from the OpenAI project, as well as through an investigation of the final models' behaviour in relation to the latent variable z.
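To make the idea of a Q-network parameterized on a latent task variable concrete, here is a minimal sketch in the spirit of the abstract. It assumes a PyTorch-style setup; the environment name HillWalk comes from the thesis, but all dimensions, layer sizes, and the exact way z is injected are illustrative assumptions rather than the thesis's actual architecture.

```python
# Sketch: a Q-network conditioned on a latent task variable z (assumed setup,
# not the exact model from the thesis).
import torch
import torch.nn as nn


class LatentConditionedQNetwork(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # The Q-function takes both the state and the task embedding z,
        # so one network can represent Q-values for a whole family of tasks.
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # Concatenate state and latent task vector; output one Q-value per action.
        return self.net(torch.cat([state, z], dim=-1))


if __name__ == "__main__":
    # Hypothetical dimensions for a small grid world such as HillWalk.
    q_net = LatentConditionedQNetwork(state_dim=4, latent_dim=2, n_actions=4)
    state = torch.randn(1, 4)                    # one observation
    z = torch.zeros(1, 2, requires_grad=True)    # latent task variable to search over
    q_values = q_net(state, z)
    print(q_values.shape)                        # torch.Size([1, 4])
```

The point of conditioning on z is that, when a new task from the same family is encountered, z can be treated as a free parameter and searched over (for example by gradient steps or a grid search in Z) while the network weights stay fixed, which is the transfer mechanism the abstract describes.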
