WinBro: A Window and Broadcast-based Parallel Streaming Graph Partitioning Framework for Apache Flink

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Författare: Adrian Ackva; [2019]

Nyckelord: ;

Sammanfattning: The past years have shown an increasing demand to process data of various kinds and size in real-time. A common representation for many real-world scenarios is a graph, which shows relations between entities, such as users of social networks or pages on the Internet. These graphs increase in size over time and can easily exceed the capacity of single machines.Graph partitioning is used to divide graphs into multiple subgraphs on different servers. Traditional partitioning techniques work in an offline manner where the whole graph is processed before partitioning. Due to the recently increased demand for real-time analysis, online partitioning algorithms have been introduced. They are able to partition a graph arriving as a stream, also referred to as a streaming graph, without any pre-processing step.The goal of a good graph partitioning algorithm is to maintain the data locality and to balance partitions’ load at the same time. Although different algorithms have proven to achieve both goals for real-world graphs, they often require to maintain a state. However, modern stream processing systems, such as Apache Flink, work with a shared-nothing architecture in a data-parallel manner. Therefore, they do not allow to exchange information along with parallel computations. These systems usually use Hash-based partitioning, that is a fast stateless technique but ignores the graph structure. Hence, it can lead to longer analysis times for streaming applications which could benefit from preserved structures.This work aims to develop a state-sharing parallel streaming graph partitioner for Apache Flink, called WinBro, implementing well-performing partitioning algorithms. In order to do this, existing streaming graph algorithms are studied for possible implementation and then integrated into WinBro.For validation, different experiments were made with real-world graphs. In these experiments, the partitioning quality, and partitioning speed were measured. Moreover, the performance of different streaming applications using WinBro was measured and compared with the default Hash-based partitioning method.Results show that the new partitioner WinBro provides better partitioning quality in terms of data locality and also higher performance for applications with requirements for locality-based input data. Nonetheless, the Hash-based partitioner shows the highest throughput and better performance for data localityagnostic streaming applications.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)