RocksDB Read Optimization Strategies for Streaming Applications

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Modern stream processors rely on embedded key-value stores to manage state that accumulates over long-running computations and exceeds available memory. One such store is RocksDB, which is widely used in applications that require high-performance, low-latency storage. RocksDB uses local disk as its persistent storage medium. Disk reads are costly in terms of latency compared with reads from main memory, so reducing the time applications spend waiting for read operations could improve their overall performance. Several strategies can shorten read latency. One of them is micro-batching, which can exploit parallelism to increase performance and lower latency. In this project, a RocksDB method called MultiGet, which fetches multiple keys at the same time, was used to micro-batch read operations. Using MultiGet on a batch of keys was benchmarked against repeated Get calls, each of which fetches a single key. The benchmarks measured the difference in latency between the two strategies on synthetic key batches and on simulated streaming traces. When reading large batches from disk, the latency of MultiGet was as low as 11% of the latency of Get. On the simulated streaming workloads, the MultiGet-based strategies executed streaming traces with latency as low as 58% of that of the Get strategy. These results show that micro-batching with MultiGet in RocksDB can potentially be effective, depending on the streaming workload. However, some issues need to be addressed first, such as batching operations in a way that preserves the semantics of the streaming data.
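
The two read strategies compared in the abstract can be illustrated with a minimal sketch. It is not the thesis's benchmark code and does not use RocksDB: `ToyStore`, `read_with_get`, and `read_with_micro_batches` are hypothetical names, and the in-memory dictionary only stands in for a store that offers a single-key lookup and a multi-key lookup. The call counter shows how many store requests each strategy issues; it is not a latency measurement.

```python
from typing import Dict, List, Optional


class ToyStore:
    """Hypothetical in-memory stand-in for a key-value store (not RocksDB)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.calls = 0  # number of lookup requests issued to the store

    def get(self, key: str) -> Optional[str]:
        """Single-key lookup, analogous in spirit to a Get call."""
        self.calls += 1
        return self._data.get(key)

    def multi_get(self, keys: List[str]) -> List[Optional[str]]:
        """Multi-key lookup, analogous in spirit to a MultiGet call."""
        self.calls += 1
        return [self._data.get(k) for k in keys]


def read_with_get(store: ToyStore, keys: List[str]) -> List[Optional[str]]:
    """Baseline strategy: one store request per key."""
    return [store.get(k) for k in keys]


def read_with_micro_batches(
    store: ToyStore, keys: List[str], batch_size: int
) -> List[Optional[str]]:
    """Micro-batching strategy: group keys and issue one request per group.

    Results are returned in the same order as the input keys.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[Optional[str]] = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        results.extend(store.multi_get(batch))
    return results
```

In this sketch, ten keys read with a batch size of four result in three store requests instead of ten. Keeping the results in input order is only one part of preserving streaming semantics; interleaved writes to the same keys are not modeled here.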
