Low-Overhead Memory Access Sampler: An Efficient Method for Data-Locality Profiling

This is a professional-degree thesis at advanced level from the Department of Information Technology (Institutionen för informationsteknologi)

Author: Peter Vestberg; [2011]

Abstract: There is an ever-widening performance gap between processors and main memory. The gap is bridged by cache memories: small intermediate memories that store recently referenced data. A cache miss is expensive because the data must be fetched from main memory, so it is crucial to understand application cache behavior. Caches only work well for applications with good data locality; insufficient data locality leads to poor cache utilization, which quickly becomes a major performance bottleneck. Analysing and understanding cache behavior helps in improving data locality and identifying such bottlenecks. In this thesis, we study a method for efficiently analysing application cache behavior and implement it in a cache analysis tool. The method uses a statistical cache model that only requires a sparse data-locality fingerprint as input. The input is based on reuse distances between cache lines. By adjusting architecture-specific parameters, such as cache line size, the tool can output working-set graphs for a wide range of architectures. Readily available hardware performance counters combined with intelligent sampling enable an implementation with low overhead. We evaluate our cache analysis tool using the SPEC CPU2006 benchmarks, and our results show good accuracy and performance. The difference between the cache miss ratio estimated by our tool and that of a reference tool was nearly always below one percentage point. The run-time overhead was on average 17%. We also analyse the overhead to identify the components of our implementation that are most costly and should be the focus of optimization. We propose a number of optimizations that could reduce the overhead further. The key proposal is phase-guided sampling, in which application phase behavior is used to determine when to sample memory references. We also build a prototype implementation of this optimization, and the preliminary results are promising.
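
To make the notion of a reuse-distance fingerprint concrete, here is a minimal Python sketch. It is not the thesis implementation. It assumes that the reuse distance of an access is the number of memory references issued between two consecutive accesses to the same cache line; the abstract does not spell out the definition. The function name `reuse_distances` and the example trace are hypothetical. The sketch also processes every reference, whereas the tool described above samples sparsely using hardware performance counters.

```python
from collections import Counter

def reuse_distances(addresses, line_size=64):
    """Histogram of reuse distances for a trace of byte addresses.

    Assumption: reuse distance = number of references between two
    consecutive accesses to the same cache line.
    """
    last_seen = {}         # cache line number -> index of its latest access
    histogram = Counter()  # reuse distance -> number of occurrences
    for i, addr in enumerate(addresses):
        line = addr // line_size  # map the byte address to its cache line
        if line in last_seen:
            histogram[i - last_seen[line] - 1] += 1
        last_seen[line] = i
    return histogram

# Hypothetical trace of byte addresses.
trace = [0, 8, 64, 0, 128, 64]
print(dict(reuse_distances(trace, line_size=64)))
print(dict(reuse_distances(trace, line_size=8)))
```

Changing `line_size` regroups the same addresses into different cache lines and therefore changes the histogram. This illustrates, in a simplified way, how one architecture-specific parameter affects the locality data.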