Comparison of cumulative reward with one, two and three layered artificial neural network in a simple environment when using ml-agents

This is a bachelor's thesis from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Author: David Björkberg; [2021]

Keywords: Machine learning; ml-agents

Abstract:

Background. In machine learning you let the computer play a scenario, often millions of times. When the computer plays it receives feedback based on preset guidelines, and it adjusts its behaviour based on that feedback. The computer stores this feedback in its artificial neural network (ANN). The ANN consists of an input layer, a set amount of hidden layers and an output layer. The ANN calculates actions using weights between the nodes in each layer and modifies those weights when it receives feedback. ml-agents is Unity Technologies' implementation of machine learning.

Objectives. ml-agents is a complex system with many different configurations, so users need guidance on which configuration gives the best results. Our thesis aimed to answer the question of how many hidden layers yield the best results. We did this by attempting to answer our research question: "How many layers are required to make the network capable of capturing the complexities of the environment?"

Methods. We used a prebuilt environment provided by Unity, in which the agent aims to keep a ball on its head for as long as possible. Training data was collected by TensorFlow, which then provided graphs for each training session. We used these graphs to evaluate the training sessions, and we ran each training session several times to get more consistent results. To evaluate the training sessions we looked primarily at the peak of their cumulative reward graph and secondarily at how fast they reached this peak.

Results. We found that with just one layer, the agent could only get roughly a fifth of the way to capturing the complexity of the environment. With two and three layers, however, the agent was capable of capturing the complexity of the environment. The three-layered training sessions reached their cumulative reward peak 22 percent faster than the two-layered ones.

Conclusions. We managed to answer our research question: the minimum number of hidden layers required to capture the complexity of the environment is two. However, with an additional layer the agent was able to reach the same result faster, which is worth taking into consideration.
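To illustrate the hyperparameter the thesis varies, the sketch below builds a fully connected network with a configurable number of hidden layers. This is a minimal PyTorch sketch, not ml-agents' own implementation; the function name build_policy_network and the observation/action sizes are assumptions made for the example, while num_layers and hidden_units loosely mirror the hidden-layer settings exposed in ml-agents' trainer configuration.

```python
# Minimal sketch (not ml-agents' internal code): a fully connected network
# whose depth is controlled by num_layers, mirroring the "hidden layers"
# hyperparameter compared in the thesis.
import torch.nn as nn

def build_policy_network(obs_size: int, action_size: int,
                         num_layers: int = 2, hidden_units: int = 128) -> nn.Sequential:
    """Stack num_layers hidden layers between the input and output layer."""
    layers = []
    in_features = obs_size
    for _ in range(num_layers):
        layers.append(nn.Linear(in_features, hidden_units))
        layers.append(nn.ReLU())
        in_features = hidden_units
    layers.append(nn.Linear(in_features, action_size))  # output layer
    return nn.Sequential(*layers)

# Example sizes are illustrative assumptions, not values taken from the thesis.
net = build_policy_network(obs_size=8, action_size=2, num_layers=2)
print(net)
```

In ml-agents itself this choice is made in the trainer configuration file rather than in user code; the point of the sketch is only to show how the hidden-layer count changes the architecture being trained.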
