High performance querying of time series market data

This is a Master's thesis from Umeå University, Department of Computing Science

Author: Erik Sandberg; [2022]


Abstract: Time series data is a sequence of data points collected over a time interval, at either equally or unequally spaced points. Most people encounter time series data daily without realizing it. For instance, during the COVID-19 pandemic, more people than ever consume time series data and demand fast and accurate information about daily trends in various COVID-19 statistics. Another example is the financial markets, which rely on autonomous trading algorithms that continuously collect data on how the markets are changing. Time series databases are growing in popularity every year, but there is limited research on how working with financial market data affects their performance. Therefore, the purpose of this thesis is to evaluate and compare a set of state-of-the-art time series data storage solutions in terms of write throughput, read throughput, and query flexibility when storing transactional and frequently updated market data. A field study was conducted to identify candidate time series data storage solutions that were easy to use, popular, and high-performing, which resulted in the databases MongoDB, TimescaleDB, InfluxDB, QuestDB and OpenTSDB. Because of the limited duration of the thesis work, an implementation feasibility study was then conducted to verify that the candidate solutions could be implemented in reasonable time, which resulted in OpenTSDB being discarded from the evaluation. To evaluate the remaining candidates, a Java program was implemented that executed write and read queries against each of them and measured their performance under a workload of historical time series market data. The results of the performance tests show that, for the workloads tested in this thesis, MongoDB and QuestDB display the best write throughput, read throughput, and query flexibility.
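
The kind of throughput measurement the abstract describes can be illustrated with a minimal Java sketch. This is not the thesis's actual program: the class `ThroughputSketch`, its methods `opsPerSecond` and `timeAll`, the in-memory list standing in for a database client, and the sample tick format are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a throughput measurement; not the thesis's benchmark code.
final class ThroughputSketch {

    /** Converts an operation count and an elapsed time in nanoseconds to operations per second. */
    static double opsPerSecond(long operations, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return operations * 1_000_000_000.0 / elapsedNanos;
    }

    /** Runs the action once per item and returns the elapsed wall-clock time in nanoseconds (at least 1). */
    static <T> long timeAll(List<T> items, Consumer<T> action) {
        long start = System.nanoTime();
        for (T item : items) {
            action.accept(item);
        }
        return Math.max(1L, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Assumption: an in-memory list stands in for a real database client.
        List<String> fakeStore = new ArrayList<>();

        // Assumption: made-up rows in a "symbol,sequence,price" format.
        List<String> ticks = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            ticks.add("ACME," + i + ",101.5");
        }

        long elapsed = timeAll(ticks, fakeStore::add);
        System.out.printf("write throughput: %.0f rows/s%n",
                opsPerSecond(ticks.size(), elapsed));
    }
}
```

In a real comparison, the `Consumer` passed to `timeAll` would wrap a write or read call against each database's client, and the measurement would typically be repeated after a warm-up phase so that JIT compilation does not distort the numbers.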
