Data Pipeline Design for Audit Analytics: Data Ingestion Tools Evaluation & Proof of Concept

This is a professional degree thesis at advanced level from Umeå University / Department of Applied Physics and Electronics

Author: Adam Bylund; [2023]

Abstract: The amount of data, and its importance, is increasing in many industries. To take full advantage of the data, it must be easily accessible and understandable for the user. From a software perspective, the applications used for analyzing the data should be developed with this in mind. Link Visualizer is a Software as a Service (SaaS) solution developed by the company Senseworks, which aims to provide this service for the audit industry. As the number of active users has grown, so has the demand for gathering data from different third-party sources, a development that the current solution, built from different microservices, has difficulty handling. This functionality can be implemented with various software architectures. The overall architecture of the new data integration solution has been determined to be a data pipeline. This thesis aims to investigate which data extraction tool within a data pipeline is the most suitable for Link, based on requirements regarding scalability, maintainability, and supportability. The initial part of the study is an evaluation of different tools that can be considered contenders for the new solution. The final decision is based on an Architecture Decision Record, which supports the process and increases the chances of making a correct architecture decision. A Proof of Concept (PoC) is then developed to provide practical insight into the chosen tool and to determine whether it is suited for implementation in a production version that replaces the current solution. The evaluation showed Airbyte to be the most suitable option. Therefore, a PoC with two third-party data integrations was implemented using the Airbyte Open Source framework. The finished PoC showed possibilities for improvement in all areas of the requirements. However, it should be pointed out that a PoC does not prove that a solution is correct or perfect, only that it can be done.
