Performance assessment of Apache Spark applications

This is a Bachelor's thesis from Linnaeus University (Linnéuniversitetet), Department of Computer Science and Media Technology (DM).

Abstract: This thesis addresses the challenges of large, data-intensive software systems. We discuss a Big Data system that involves a substantial amount of Linux configuration, some Scala code, and a set of frameworks that work together to keep the system running smoothly. The thesis focuses on the Apache Spark framework and on the challenge of measuring Spark's lazily evaluated transformation operations. Investigating these challenges is essential for performance engineers. It helps them study how the system behaves and make decisions in early design iterations. To this end, we conducted a set of experiments and measurements. After analyzing the results, we derived a formula intended to help engineers predict the performance of the system in production.
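Lazy evaluation makes measurement difficult. In Spark, transformations such as `rdd.map(...)` only record what should be computed. The actual work runs when an action such as `rdd.count()` is called, so timing a transformation on its own measures almost nothing.

The sketch below illustrates the effect with an analogy. It uses Scala's standard-library lazy views in place of Spark so that it runs without a cluster. It is not Spark's API, and it is not the thesis's measurement method. The names `LazyTimingSketch`, `expensive`, and `timed` are hypothetical.

```scala
// Analogy only: Scala's lazy collection views stand in for Spark's
// transformation/action split; this is not the Spark API.
object LazyTimingSketch {
  // Counts how many times the "expensive" function actually runs.
  var calls = 0

  // Hypothetical per-element work with a small simulated cost.
  def expensive(x: Int): Long = {
    calls += 1
    var acc = 0L
    var i = 0
    while (i < 10000) { acc += (x.toLong * i) % 7; i += 1 }
    acc
  }

  // Runs a by-name block and returns (result, elapsed milliseconds).
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    // "Transformation": builds a lazy pipeline; expensive() is not called yet.
    val (pipeline, tBuild) = timed((1 to 1000).view.map(expensive))
    val callsAfterBuild = calls

    // "Action": forcing the view triggers all 1000 calls, so the time lands here.
    val (total, tRun) = timed(pipeline.sum)
    val callsAfterRun = calls

    println(f"build: $tBuild%.3f ms, calls=$callsAfterBuild")
    println(f"run:   $tRun%.3f ms, calls=$callsAfterRun, total=$total")
  }
}
```

Running `main` shows `calls=0` after the build step and `calls=1000` after the forcing step. Nearly all of the elapsed time is attributed to the step that forces evaluation. A naive per-operation timer in Spark runs into the same attribution problem.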
