Enhanced Experience Generation for Reinforcement Learning Pre-training in Telecommunication Systems

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Tianxiao Zhao [2020]


Abstract: In recent years, the rise of Reinforcement Learning (RL) in robotics and games has attracted growing attention from other industries, such as telecommunications. One novel application of RL within telecommunications is training an agent to auto-scale Virtualized Network Functions (VNFs) in a core network environment. Such an approach could bring greater flexibility and intelligence to workload handling and resource allocation, but it suffers from an unacceptably long training period. Traditionally, this issue is mitigated by pre-training the agent on historical data before online training begins. Since the amount of historical data collected in the VNF framework is limited, however, a directly pre-trained agent may start with a poorly generalized policy, ultimately degrading its learning performance. To remedy these problems, this thesis proposes a method, referred to as EGAN pre-training, that combines enhanced experience generation with pre-training. In this method, an Enhanced Generative Adversarial Network (EGAN) is trained on historical data and used to generate synthetic samples, which serve as a supplementary source of pre-training data. The enhancer of the EGAN improves the quality of the generated samples by adjusting them to follow the latent transitional relation of the given environment. A Deep Q-Network (DQN) agent is then pre-trained on a combination of the historical data and the enhanced synthetic data to achieve faster learning. The EGAN pre-trained DQN is evaluated against a baseline DQN, a pre-trained DQN, a GAN pre-trained DQN, and a Dyna-Q agent in two toy environments and a simulated VNF scaling environment. Experimental results show that the EGAN pre-trained DQN generally outperforms the others in the chosen environments, in terms of peak reward performance and rise time gap. Plots of the Fréchet Inception Distance (FID) in these environments reveal that the data generated by EGAN is of consistently higher quality than that produced by a conventionally trained GAN.
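The abstract describes the pipeline only at a high level; the sketch below, in PyTorch, shows one way its pieces could fit together. All names and dimensions (Generator, Enhancer, enhanced_samples, pretrain_step, STATE_DIM, and so on) are illustrative assumptions rather than the thesis's actual code, and the generator and enhancer are assumed to have already been trained on the historical data.

    # Hypothetical sketch of EGAN pre-training; names and sizes are placeholders.
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, NOISE_DIM = 4, 2, 16
    TRANSITION_DIM = 2 * STATE_DIM + 2  # a flat transition (s, a, r, s')

    class Generator(nn.Module):
        # Maps noise to a raw synthetic transition (s, a, r, s').
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, TRANSITION_DIM))

        def forward(self, z):
            return self.net(z)

    class Enhancer(nn.Module):
        # Predicts (r, s') from (s, a), so generated transitions can be
        # adjusted to follow the environment's transition dynamics.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(STATE_DIM + 1, 64), nn.ReLU(),
                                     nn.Linear(64, 1 + STATE_DIM))

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=1))

    def enhanced_samples(generator, enhancer, n):
        # Draw raw transitions from the generator, then replace their (r, s')
        # part with the enhancer's prediction from the generated (s, a).
        raw = generator(torch.randn(n, NOISE_DIM))
        s, a = raw[:, :STATE_DIM], raw[:, STATE_DIM:STATE_DIM + 1]
        return torch.cat([s, a, enhancer(s, a)], dim=1).detach()

    def pretrain_step(q_net, optimizer, real, synthetic, gamma=0.99):
        # One DQN pre-training step on mixed historical + synthetic data,
        # using the standard one-step TD target.
        batch = torch.cat([real, synthetic], dim=0)
        s = batch[:, :STATE_DIM]
        a = batch[:, STATE_DIM].round().clamp(0, ACTION_DIM - 1).long()
        r = batch[:, STATE_DIM + 1]
        s_next = batch[:, STATE_DIM + 2:]
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * q_net(s_next).max(dim=1).values
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    gen, enh = Generator(), Enhancer()  # assumed pre-trained on history
    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, ACTION_DIM))
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    historical = torch.randn(32, TRANSITION_DIM)  # stand-in for logged data
    loss = pretrain_step(q_net, opt, historical, enhanced_samples(gen, enh, 32))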
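For reference, the FID mentioned in the abstract compares Gaussian fits to real and generated feature sets. A minimal sketch of the standard formula, assuming NumPy and SciPy are available, with the fitted means and covariances passed in:

    import numpy as np
    from scipy.linalg import sqrtm

    def frechet_distance(mu_r, cov_r, mu_g, cov_g):
        # FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^(1/2)),
        # between Gaussians fitted to real and generated samples.
        covmean = sqrtm(cov_r @ cov_g)
        if np.iscomplexobj(covmean):  # drop tiny imaginary parts from sqrtm
            covmean = covmean.real
        return float(np.sum((mu_r - mu_g) ** 2)
                     + np.trace(cov_r + cov_g - 2.0 * covmean))

A lower value indicates that the generated distribution lies closer to the real one, which is how the abstract's claim about EGAN versus a conventionally trained GAN should be read.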
