Design and Implementation of a Heterogeneous Multicore Architecture using Field Programmable Technology

This is a Master's thesis from KTH/School of Information and Communication Technology (ICT)

Author: Muhammad Sharjeel Khilji; [2013]

Keywords: SPARC V8; AMBA; Avalon; IEEE;

Abstract: The latest trend in multicore architectures is to integrate heterogeneous cores on a single chip in order to achieve task- and thread-level parallelism, high performance and energy efficiency. Examples of heterogeneous multicore processors include Tegra by NVIDIA, Cell by IBM and Fusion by AMD. The goal of this thesis work is to design a heterogeneous 2x2 network on chip that can run different tasks in parallel on all four cores in the network. The heterogeneous network on chip was developed by integrating Leon3, a soft-core processor by Aeroflex Gaisler that conforms to the IEEE 1754 (SPARC V8) architecture, at one node of a homogeneous network on chip built from four Nios II/s cores (a soft-core processor by Altera). The integration replaces the Nios II/s processor at one of the four nodes of the homogeneous network with a Leon3 processor. To translate the signals between the node's resource-to-network interface and the Leon3 processor, an AMBA bus to Avalon bus signal translation wrapper was designed. All processors in the network on chip communicate through a message passing interface. To exploit the potential of the heterogeneous network on chip, three applications were run on it: sparse LU factorization, n-queens and Fibonacci number calculation. These applications were started on the Leon3 SPARC processor, which generated a number of tasks that can run in parallel on all cores of the network. Parallel execution of n-queens and Fibonacci number calculation resulted in a speedup compared to serial execution of the same applications on the Leon3 SPARC processor alone. Because of the limited size of the on-chip memory available to the Leon3 processor, it was not possible to run sparse LU factorization for larger matrix sizes, and this constraint resulted in no speedup for sparse LU factorization.
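The task generation described in the abstract can be illustrated with a small sketch. This is not the thesis code: it assumes, purely for illustration, that n-queens is split into one independent task per first-row column and that a pool of four worker threads stands in for the four nodes of the 2x2 network. The names `NUM_CORES`, `subtask`, `nqueens_serial` and `nqueens_parallel` are hypothetical, and the thread pool only models how work is divided, not the message passing interface or any speedup measured on the hardware.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 4  # assumption: one worker per node of the 2x2 network


def count_completions(n, row, cols, diag1, diag2):
    """Count ways to fill rows row..n-1 given the occupied columns and diagonals."""
    if row == n:
        return 1
    total = 0
    for col in range(n):
        d1, d2 = row + col, row - col
        if col in cols or d1 in diag1 or d2 in diag2:
            continue  # square is attacked by an earlier queen
        total += count_completions(n, row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
    return total


def subtask(n, first_col):
    """One independent task: the queen in row 0 is fixed in column first_col."""
    return count_completions(n, 1, {first_col}, {first_col}, {-first_col})


def nqueens_serial(n):
    """Reference count computed by a single worker."""
    return count_completions(n, 0, set(), set(), set())


def nqueens_parallel(n, workers=NUM_CORES):
    """Generate n subtasks, hand them to the worker pool, and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda col: subtask(n, col), range(n)))
```

Calling `nqueens_parallel(8)` and `nqueens_serial(8)` should give the same count; the only difference is how the search is divided among workers.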
