Multi Agent Reinforcement Learning

This is a Master's thesis from Göteborgs universitet / Institutionen för matematiska vetenskaper

Author: Rikard Isaksson; [2019-06-13]


Abstract: Machine learning and artificial intelligence have been hot topics in the last few years; thanks to improved computational power, the machine learning framework can now be applied to larger data sets. Reinforcement learning is a group of machine learning algorithms where the correct answer is not known in advance, much like unsupervised learning. However, in contrast to unsupervised learning, the quality of a decision can be measured as a number. By trial and error, a program can learn to find the optimal decisions to take based on this measure. The reinforcement learning framework has been shown to find solutions to complex problems in confined game environments and in control systems such as balancing tasks and bipedal walking. With reinforcement learning, usable solutions or strategies have been found for many problems which in theory could be solved to optimality but in practice are intractable. The successes with reinforcement learning in games such as Chess, Backgammon and Go are examples of such strategies [11]. A problem with reinforcement learning in general is the so-called curse of dimensionality: as the problem gets more complex, it naturally takes longer for the program to learn, and the computational time often grows quickly with the complexity of the problem. This issue with scalability carries over to reinforcement learning systems with multiple agents, where new issues arise concerning the stability of a learned solution. In this thesis we present three algorithms which attempt to tackle the issue of solution stability in systems with cooperating or competing agents. The algorithms minimax Q, Nash Q and win or learn fast are presented and implemented on a set of selected problems, and the algorithms' performance is discussed. We also discuss scalability and attempt to interpret the assumptions in these algorithms in order to draw conclusions about their applicability to real-world problems.
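
To make the trial-and-error idea in the abstract concrete, the sketch below shows a minimal, generic tabular Q-learning update (the single-agent building block that multi-agent variants such as minimax Q, Nash Q and win or learn fast extend). It is not code from the thesis; the environment interface (reset(), step(), env.actions) is a hypothetical assumption for illustration only.

```python
# Minimal tabular Q-learning sketch, assuming a hypothetical environment with
# reset() -> state, step(action) -> (next_state, reward, done), and env.actions
# as a list of discrete actions. Illustrative only, not the thesis implementation.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration: the "trial" part of trial and error.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Temporal-difference update: move the estimate toward the observed
            # reward plus the discounted value of the best next action.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The multi-agent algorithms named in the abstract replace the max over the agent's own actions with a game-theoretic value of the joint-action stage game (a minimax value or a Nash equilibrium value), which is where the stability questions discussed in the thesis arise.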
