SELECTION OF FEATURES FOR ML BASED COMMANDING OF AUTONOMOUS VEHICLES

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Traffic coordination is an essential challenge in vehicle automation. The challenge is not only to maximize the revenue or productivity of a fleet of vehicles, but also to avoid infeasible states, such as collisions and low energy levels, that could make the fleet inoperable. The problem is hard because of the complex nature of real-time traffic and the large state space involved. Reinforcement learning and simulation-based search techniques have been successful in handling complex problems with large state spaces [1] and are potential candidates for traffic coordination. In this degree project, a variant of these techniques known as Dyna-2 [2] is investigated for traffic coordination. A long-term memory of past experiences is approximated by a neural network and is used to guide a Temporal Difference (TD) search. Various features are proposed and evaluated, and a feature representation is then chosen to build the neural network model. The Dyna-2 Traffic Coordinator (TC) is investigated for its ability to provide supervision for handling vehicle bunching and charging. Two existing traffic coordinators, one based on simple rules and another based on TD search, serve as baselines for the performance evaluation. The results indicate that, by incorporating learning via a long-term memory, the Dyna-2 TC is robust to vehicle bunching and ensures a good balance in charge levels over time. The performance of the Dyna-2 TC depends on the choice of features used to build the function approximator: a bad feature choice does not provide good generalization and hence results in bad performance. The previous approaches based on rule-based planning and TD search, in contrast, made poor decisions resulting in collisions and low energy states. The search-based approach is comparatively better than the rule-based approach, but it is not able to find an optimal solution due to its depth limitations. With guidance from a long-term memory, the search was able to generate a higher return and ensure a good balance in charge levels.
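
The idea of a long-term memory guiding a TD search can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis uses a neural network for the long-term memory, whereas this sketch assumes linear function approximation for brevity, and the names `q_value`, `td_search_update`, `theta_long`, `theta_short`, `alpha` and `gamma` are hypothetical. The value used during search is taken as the sum of a fixed long-term estimate and a short-term correction, and only the correction is updated from simulated transitions.

```python
import numpy as np


def q_value(phi, theta_long, theta_short):
    """Combined estimate: long-term memory plus search-time correction."""
    return float(phi @ theta_long + phi @ theta_short)


def td_search_update(phi, reward, phi_next, theta_long, theta_short,
                     alpha=0.1, gamma=0.9):
    """One TD(0) step on a simulated transition.

    Only the short-term weights change; the long-term weights are
    treated as fixed guidance during the search (an assumption of
    this sketch).
    """
    td_error = (reward
                + gamma * q_value(phi_next, theta_long, theta_short)
                - q_value(phi, theta_long, theta_short))
    return theta_short + alpha * td_error * phi


# Toy example with two hand-picked features (purely illustrative values).
theta_long = np.array([1.0, 0.0])    # stands in for the learned long-term memory
theta_short = np.zeros(2)            # reset at the start of each search
phi = np.array([1.0, 0.0])           # features of the current state-action pair
phi_next = np.array([0.0, 1.0])      # features of the simulated successor

theta_short = td_search_update(phi, 2.0, phi_next, theta_long, theta_short)
```

In this toy setup the search starts from the long-term estimate rather than from zero, which is one way to read the claim that the memory compensates for the limited search depth.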
