Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra

Detta är en Master-uppsats från Blekinge Tekniska Högskola/Institutionen för programvaruteknik

Sammanfattning: Background: Cassandra is a NoSQL database, where the data in the background is stored in the immutable tables which are called SSTables. These SSTables are subjected to a method called Compaction to reclaim the disk space and to improve READ performance. Size Tiered Compaction Strategy and Leveled Compaction Strategy are the most used generic compaction strategies for different use cases. Space Amplification and Write Amplification are the main limitations of the above compaction strategies, respectively. This research aims to address the limitations of existing generic compaction strategies. Objectives: A new random compaction strategy will be created to improve the efficiency and effectiveness of compaction. This newly created random compaction strategy will be evaluated by comparing the read, write and space amplification with the existing generic compaction strategies, for different use cases. Methods: In this study, Design Science has been used as a research method to answer both the research questions. Focus groups meetings have been conducted to gain knowledge on the limitations of existing compaction strategies, newly created random compaction strategy, and it’s appropriate solutions. During the evaluation, The metrics have been collected from Prometheus server and visualization is carried out in Grafana server. The compaction strategies are compared significantly by performing statistical tests. Results: The results in this study showed that the random compaction strategy is performing almost similar to Leveled Compaction Strategy. The Random Compaction Strategy solves the space amplification problem and write amplification problem in the Size Tiered Compaction Strategy and Leveled Compaction Strategy, respectively. In this section, eight important metrics have been analyzed for all three compaction strategies. Conclusions: The main artefact of this research is a new Random Compaction Strategy. After performing two iterations, a new stable random compaction strategy is designed. The results were analyzed by comparing the Size Tiered Compaction Strategy, Leveled Compaction Strategy and Random Compaction Strategy on two different use cases. The new random compaction strategy has performed great for Ericsson buffer management use case.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)