MIXED MEMORY Q-LEARNER: An adaptive reinforcement learning algorithm for the Iterated Prisoner’s Dilemma

This is a Bachelor's thesis from the Department of Applied Information Technology (Institutionen för tillämpad informationsteknologi).

Abstract: The success of future societies is likely to depend on cooperative interactions between humans and artificial agents. As such, it is important to investigate how machines can learn to cooperate. By looking at how machines handle complex social situations, so-called social dilemmas, knowledge about the components necessary for cooperation in artificial agents can be acquired. In this study, a reinforcement learning algorithm was used to study the Iterated Prisoner’s Dilemma (IPD), a common social dilemma game. A reinforcement learning algorithm can make decisions in the IPD by considering a given number of its opponent’s last actions, thus representing the agent’s memory. This study investigated the role of different memory lengths on the performance of the agent in the IPD. The results showed that different memory lengths are preferable depending on the opponent. A new algorithm called Mixed Memory Q-Learner (MMQL) was created, which could switch memory length during play to adapt to its opponent. It could also recognise its opponent between games, thus continuing its learning over several interactions. MMQL performed better against certain opponents in the IPD but did not learn to cooperate with cooperative players. Further capabilities might therefore be added to the algorithm to invite cooperation, or the environment can be manipulated. The results suggest that flexibility in how a situation is represented and the ability to recognise opponents are important capabilities for artificial agents in social dilemmas.
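
To illustrate what "memory length" means here, the following is a minimal sketch of a tabular Q-learner whose state is the opponent's last few actions. It is not the thesis's MMQL implementation: the class name `MemoryQLearner`, the helper `play_vs_tit_for_tat`, the payoff values (the conventional 5/3/1/0 matrix), and the learning parameters are all assumptions made for illustration, and the sketch uses a single fixed memory length rather than switching between lengths.

```python
import random
from collections import defaultdict, deque

# Conventional IPD payoffs (assumed, not taken from the thesis):
# (my_action, opponent_action) -> my reward; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


class MemoryQLearner:
    """Hypothetical tabular Q-learner; state = opponent's last `memory` actions."""

    def __init__(self, memory, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Assumption: the opponent is treated as having cooperated before play.
        self.history = deque(["C"] * memory, maxlen=memory)
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def state(self):
        return tuple(self.history)

    def act(self):
        # Epsilon-greedy choice over the two actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        s = self.state()
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def learn(self, action, opponent_action):
        # Standard one-step Q-learning update after observing the round.
        s = self.state()
        reward = PAYOFF[(action, opponent_action)]
        self.history.append(opponent_action)
        s_next = self.state()
        best_next = max(self.q[(s_next, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(s, action)] += self.alpha * (target - self.q[(s, action)])
        return reward


def play_vs_tit_for_tat(agent, rounds=1000):
    """Play against tit-for-tat, which repeats the agent's previous move."""
    total, my_last = 0, "C"
    for _ in range(rounds):
        opponent_action = my_last
        action = agent.act()
        total += agent.learn(action, opponent_action)
        my_last = action
    return total
```

Calling `play_vs_tit_for_tat(MemoryQLearner(memory=1))` and then repeating with `memory=2` or `memory=3` gives a rough feel for how the size of the state space, and therefore what the agent can learn, changes with memory length.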
