Investigating Multi-Objective Reinforcement Learning for Combinatorial Optimization and Scheduling Problems : Feature Identification for multi-objective Reinforcement Learning models

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Reinforcement Learning (RL) has in recent years become a core method for sequential decision making in complex dynamical systems, and is of great interest for improving solutions to scheduling problems. This could prove important to areas in the newer generation of cellular networks. One such area is the base station scheduler, which allocates radio resources to users. This is posed as a large-scale optimization problem which needs to be solved in millisecond intervals, while at the same time accounting for multiple, sometimes conflicting, objectives like latency or Quality of Service requirements. In this thesis, multi-objective RL (MORL) solutions are proposed and evaluated in order to identify desired features for novel applications to the scheduling problem. The proposed solution classes were tested in common MORL benchmark environments such as Deep Sea Treasure for efficient and informative evaluation of features. They were ultimately tested in environments that pose combinatorial optimization and scheduling problems. The results indicate that outer-loop multi-policy solutions are able to produce models that comply with the desired features for scheduling. A multi-policy multi-objective deep Q-network was implemented and was shown to produce a discrete model that is adaptive at run time, based on an outer-loop approach that calls a single-policy algorithm. The presented approach does not increase in complexity when objectives are added, but generally requires more samples for convergence. Different scalarization techniques for the reward were tested, indicating an effect on variance that could affect performance in environments with certain characteristics.
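The outer-loop idea and the role of reward scalarization mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the candidate outcomes, and the toy "trainer" (which simply picks the best of a few fixed outcomes instead of running a deep Q-network) are all assumptions made for illustration, and the numbers are invented rather than taken from Deep Sea Treasure.

```python
from typing import Callable, Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def linear_scalarize(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the per-objective rewards."""
    return sum(w * r for w, r in zip(weights, reward))


def chebyshev_scalarize(
    reward: Sequence[float], weights: Sequence[float], utopia: Sequence[float]
) -> float:
    """Negative weighted Chebyshev distance to a utopia point (higher is better)."""
    return -max(w * abs(z - r) for w, r, z in zip(weights, reward, utopia))


# Hypothetical (treasure, -time) outcomes; invented values for illustration only.
CANDIDATES: Tuple[Vector, ...] = ((1.0, -1.0), (5.0, -3.0), (10.0, -12.0))


def toy_trainer(scalarize: Callable[[Vector], float]) -> Vector:
    """Stand-in for a single-policy learner: return the outcome with the
    highest scalarized reward. A real learner would train a policy here."""
    return max(CANDIDATES, key=scalarize)


def outer_loop(
    weight_grid: Sequence[Vector],
    train_single_policy: Callable[[Callable[[Vector], float]], Vector],
) -> Dict[Vector, Vector]:
    """Call the single-policy learner once per weight vector.

    The resulting dictionary acts as a multi-policy model: at run time,
    a preference (weight vector) selects which trained policy to use.
    """
    return {
        tuple(w): train_single_policy(lambda r, w=w: linear_scalarize(r, w))
        for w in weight_grid
    }


if __name__ == "__main__":
    policies = outer_loop([(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)], toy_trainer)
    for weights, outcome in policies.items():
        print(weights, "->", outcome)
```

In this sketch the learner itself never sees more than one scalar reward, so adding an objective only lengthens the weight vectors; the cost is that one training run is needed per weight vector, which is consistent with the abstract's remark about larger sampling requirements.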
