Study of bitwise operations on non-scarce attribute based data structures in PostgreSQL

Detta är en Kandidat-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: This report investigates the viability of bitwise operations on non-scarce attribute based data structures in PostgreSQL. For applications where computation can’t be avoided, it most probably can be optimized. In an attempt of bringing the computation closer to hardware and the underlying data, operations directly on the database system are explored, taking inspiration from the research field of comparative genomics. With the case-study of an online job platform in mind, where possible matchings between candidate and job descriptions are calculated by a matching engine, a binary encoding is proposed and the computational components identified. The ultimate goal was to evaluate the scalability of the bitwise strategy with respect to the current matching engine. Through an iterative approach, this report conducts quantitative experiments on the presented components. Most notably, an implementation of the population count in the form of a C extension was introduced. It was found, that even for large sequence lengths, the operation is highly efficient. Among the chosen algorithms Lookup Table, Hamming Weight, Intrinsic functions and Unrolled Inline Assembly, the 64 bit intrinsic function displayed the best performance. Benchmarks determined, that the proposed bitwise approach is an excellent strategy for the outlined use-case. Despite the tradeoff of additional complexity in the encoding and decoding of data, the speedup is so significant, that the targeted user base of 100000 can easily be managed and allows for the deprecation of caching mechanisms.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)