Fault-Tolerant Over-the-Air Federated Learning with Clustered Aggregation : Using a hierarchical architecture with hybrid digital and analog communication, to deal with byzantine users in Federated Learning over wireless networks.

This is a Master's thesis from Linköping University / Communication Systems

Author: David Nordlund; [2023]

Keywords: federated learning; aircomp

Abstract: Rapid advancements in modern AI applications have placed unprecedented demands on large-scale connectivity and data aggregation. The vision of the Internet of Things (IoT) is supported by a massive number of distributed sensors and wireless devices that generate useful data for these applications. This trend of ever-increasing data traffic puts high pressure on current wireless network capacity. In response, over-the-air (OtA) computation has emerged as a method of wireless data aggregation that uses the available wireless resources more efficiently. OtA computation takes advantage of the superposition property of multiple-access channels to perform communication and computation simultaneously. In particular, it has attracted a lot of attention in distributed learning for aggregating local model updates at a central parameter server. The main topic of this thesis is one such distributed learning scheme, Federated Learning (FL). The integration of OtA computation with FL has been studied extensively, but not from a fault tolerance perspective. Byzantine fault tolerance is the ability of a distributed system to operate even when some of its components exhibit unexpected behaviour. Many methods have been suggested for robust aggregation in FL that deal with such byzantine users. However, they generally assume an orthogonal multiple-access scheme based on digital communication design. In this work, we propose and evaluate a hierarchical communication architecture with clustered aggregation and mixed digital and OtA transmission schemes. The motivation behind the design is to combine the resource efficiency of OtA computation with digital transmission, which enables robust aggregation methods that increase system resilience.
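As a rough illustration of the superposition idea described above, the following Python sketch models an idealized OtA aggregation step. It assumes real-valued updates, unit channel gains (as if every device had perfectly compensated its own channel), and additive Gaussian receiver noise. The function name `ota_aggregate` and the parameter `noise_std` are hypothetical and are not taken from the thesis.

```python
import random

def ota_aggregate(updates, noise_std, rng):
    """Idealized over-the-air averaging of local model updates.

    Assumption: all devices transmit at the same time over a
    multiple-access channel with unit gain, so the receiver observes
    only the element-wise SUM of the updates plus noise.
    """
    n = len(updates)
    dim = len(updates[0])
    # The channel itself performs the summation (superposition).
    received = [
        sum(u[k] for u in updates) + rng.gauss(0.0, noise_std)
        for k in range(dim)
    ]
    # The server rescales the noisy sum into an average.
    return [r / n for r in received]

rng = random.Random(0)
local_updates = [[1.0, 2.0], [3.0, 4.0]]
noiseless = ota_aggregate(local_updates, noise_std=0.0, rng=rng)
noisy = ota_aggregate(local_updates, noise_std=0.1, rng=rng)
```

In this model the server never sees an individual update, only their noisy sum. That is one way to see why robust aggregation rules that compare individual updates rely on digital, orthogonal transmission.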
We analyze how the proposed architecture compares to pure digital and pure OtA transmission, focusing on the tradeoff between their advantages and disadvantages as the number of clusters K varies. We implement Krum for robust aggregation, which places a lower bound of 3 + f on K, where f is the number of byzantine users in the system. Increasing K improves the signal-to-noise ratio because the communication distance is reduced, but each further increase in K yields a smaller gain. These observations depend on the clustering algorithm applied, which is K-means in this thesis. The diminishing improvement in signal-to-noise ratio places a soft upper bound on K that should be enforced unless the lower bound exceeds it. We conclude that to achieve efficient and resilient data aggregation in OtA FL, the number of clusters needs to be chosen carefully. The optimal number of clusters will depend on the channel quality, the number of byzantine users, and the amount of available wireless resources.
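The Krum rule mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not the thesis implementation. It assumes that Krum is applied at the server to K cluster-level aggregates, and the function name `krum` and the example values are hypothetical. Each candidate is scored by the sum of squared distances to its n - f - 2 nearest neighbours, and the candidate with the lowest score is selected. The score needs at least one neighbour, which is consistent with a bound of the form K >= f + 3.

```python
def krum(updates, f):
    """Return the index of the update selected by Krum.

    updates: list of equal-length lists of floats (assumed here to be
             one aggregate per cluster)
    f:       assumed number of byzantine updates
    """
    n = len(updates)
    m = n - f - 2  # number of nearest neighbours used in each score
    if m < 1:
        raise ValueError("Krum needs at least f + 3 updates")
    scores = []
    for i, u in enumerate(updates):
        # Squared Euclidean distances to all other updates, ascending.
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(u, v))
            for j, v in enumerate(updates)
            if j != i
        )
        scores.append(sum(dists[:m]))
    # Select the update whose closest neighbours are nearest overall.
    return min(range(n), key=scores.__getitem__)

# Four plausible cluster aggregates and one outlier (hypothetical values).
cluster_updates = [
    [1.0, 1.0],
    [1.2, 0.9],
    [0.8, 1.1],
    [1.0, 1.3],
    [50.0, -50.0],
]
selected = krum(cluster_updates, f=1)
```

With K = 5 and f = 1, each score uses the two nearest neighbours, so the outlier's score is dominated by large distances and it is not selected.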
