Code Generation for Accelerating Data Flow: Enhancing Pentaho Data Integration Performance

This is a professional degree thesis at advanced level from Umeå University/Department of Physics

Author: Alexander Svensson; [2023]


Abstract: Pentaho Data Integration, also known as Kettle, is a no-code ETL tool. Implemented in Java, it lets users build data flow structures in a graphical user interface and store them as XML files, which can then be edited or executed. In some applications, the current execution method does not deliver satisfactory performance. To reduce execution times, we propose a Java code generator that works by analyzing the existing XML setup together with Kettle’s existing source code. We also conduct exploratory work with Apache Hop, another Kettle-based ETL tool, and provide comparative insights. Our analysis demonstrates the potential for significant speed improvements, with execution times reduced by 60% or more. We discuss the challenges and limitations of this method and propose solutions to overcome them. Overall, our research contributes to the field of no-code programming by highlighting the potential of code generation for optimizing performance in data engineering processes.
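As a rough illustration of the general idea of generating Java from an XML flow description, the sketch below reads step names from a small XML document and emits Java source that chains them as direct method calls. It is not the thesis's generator: the XML layout (`transformation`, `step`, `name`, `type`), the step names, and the class `FlowCodeGenSketch` are simplified assumptions made for this example, and the emitted code leaves the step methods themselves undefined.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FlowCodeGenSketch {

    // Hypothetical, simplified stand-in for a stored data flow.
    // Real flow files carry far more detail (settings, fields, connections between steps).
    static final String SAMPLE_XML =
        "<transformation>"
      + "<step><name>read_rows</name><type>Input</type></step>"
      + "<step><name>to_upper</name><type>Transform</type></step>"
      + "<step><name>write_rows</name><type>Output</type></step>"
      + "</transformation>";

    /** Reads the step names from the XML in document order. */
    static List<String> stepNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList steps = doc.getElementsByTagName("step");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < steps.getLength(); i++) {
            Element step = (Element) steps.item(i);
            names.add(step.getElementsByTagName("name").item(0).getTextContent());
        }
        return names;
    }

    /**
     * Emits Java source text that applies the steps one after another.
     * The step methods are only referenced, not generated (assumption of this sketch).
     */
    static String generate(String className, List<String> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public static Object[] run(Object[] row) {\n");
        for (String step : steps) {
            src.append("        row = ").append(step).append("(row);\n");
        }
        src.append("        return row;\n");
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print the generated source for the sample flow.
        System.out.println(generate("GeneratedFlow", stepNames(SAMPLE_XML)));
    }
}
```

The sketch assumes a linear flow in document order; a generator for real flows would also have to resolve how steps are connected and translate each step's configuration.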
