Optimering av blockbaserad Choleskyfaktorisering för moderna datorsystem : En studie om vektorisering och dess effekt på en befintlig blockalgoritm

Detta är en Kandidat-uppsats från KTH/Skolan för datavetenskap och kommunikation (CSC)

Författare: Jonathan Rosell; Andreas Vestergren; [2016]

Nyckelord: Cholesky;

Sammanfattning: Linear systems and the solving of those is an important tool in many areas of science. The solving of linear systems is an operation of high complexity, and there are applications where systems of thousands of variables are used. Therefore, it is important to use methods and algorithms that can take full advantage of the performance of modern computers. Factorizing a matrix that represents a linear system makes solving it faster. If the matrix is symmetrical and positive definite Cholesky factorization can be used. J. Chen et. al. (2013) studied a block-based algorithm that gives better performance by using the cache memory more efficently when the matrix size increases. Since then, the conditions have changed. The cache memory of modern processors are subject to constant change, and modern processors capability of improving performance by vectorization have been vastly improved. This report examine how this block-based Cholesky factorization can be optimized for modern Intel processors. By using AVX2 instructions, the parts of the algorithm where most of the arithmetic operations are performed are vectorized. The report also study how the optimal block size, as well as how the breaking point between the naive algorithm and the block-based algorithm changes as the hardware develops. Using a fairly simple implementation with vectorization, the time required to factorize matrices of all sizes are cut in half. The breaking point between the naive and the block-based algorithm is now at matrices of sizes as small as 100 × 100. This is an interesting fact as prior research showed a trend where the breaking point seemed to move towards bigger matrices as the hardware developed.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)