Bridging Sim-to-Real Gap in Offline Reinforcement Learning for Antenna Tilt Control in Cellular Networks
Sammanfattning: Antenna tilt is the angle subtended by the radiation beam and horizontal plane. This angle plays a vital role in determining the coverage and the interference of the network with neighbouring cells and adjacent base stations. Traditional methods for network optimization rely on rule-based heuristics to do decision making for antenna tilt optimization to achieve desired network characteristics. However, these methods are quite brittle and are incapable of capturing the dynamics of communication traffic. Recent advancements in reinforcement learning have made it a viable solution to overcome this problem but even this learning approach is either limited to its simulation environment or is limited to off-policy offline learning. So far, there has not been any effort to overcome the previously mentioned limitations, so as to make it applicable in the real world. This work proposes a method that consists of transferring reinforcement learning policies from a simulated environment to a real environment i.e. sim-to-real transfer through the use of offline learning. The approach makes use of a simulated environment and a fixed dataset to compensate for the underlined limitations. The proposed sim-to-real transfer technique utilizes a hybrid policy model, which is composed of a portion trained in simulation and a portion trained on the offline real-world data from the cellular networks. This enables to merge samples from the real-world data to the simulated environment consequently modifying the standard reinforcement learning training procedures through knowledge sharing between the two environment’s representations. On the one hand, simulation enables to achieve better generalization performance with respect to conventional offline learning as it complements offline learning with learning through unseen simulated trajectories. On the other hand, the offline learning procedure enables to close the sim-to-real gap by exposing the agent to real-world data samples. Consequently, this transfer learning regime enable us to establish optimal antenna tilt control which in turn results in improved coverage and reduced interference with neighbouring cells in the cellular network.
HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)