Search: "apache spark"

Showing results 1 - 5 of 49 theses containing the words apache spark.

  1. Big Data and Analytics with Driving Data : Implementation and Analysis of Data Pipeline and Data Processing Resources

    Master's thesis, Uppsala universitet/Institutionen för informationsteknologi

    Authors: Ivar Blohm; Erik Jarvis; [2023]
    Keywords: none listed

    Abstract: This thesis project was conducted in cooperation with Zenseact to investigate the possible use of Google BigQuery and its capabilities to store and provide insights into large time-series data. An end-to-end data pipeline was built to facilitate the movement of data from Zenseact's local servers and its ingestion into BigQuery.

  2. Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter)

    Master's thesis (one-year, Magister), Högskolan i Halmstad/Akademin för informationsteknologi

    Author: Manjunath Kakkepalya Puttaswamy; [2023]
    Keywords: Apache Flink; Apache Spark; Big Data; Twitter; X

    Abstract: The exponential growth of social media usage has led to massive data sharing, posing challenges for traditional systems in managing and analyzing such vast amounts of data. This surge in data exchange has also resulted in an increase in cyber threats from individuals and criminal groups.

  3. Auto-Tuning Apache Spark Parameters for Processing Large Datasets

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Shidi Zhou; [2023]
    Keywords: Apache Spark; Cloud Environment; Spark Configuration Parameter; Resource Utilization; Ridge Regression; Elastic Net; Random Forest; Deep Neural Network; Bayesian Optimization; Particle Swarm Optimization

    Abstract: Apache Spark is a popular open-source distributed processing framework that enables efficient processing of large amounts of data. Apache Spark has a large number of configuration parameters that are strongly related to performance. Selecting an optimal configuration for an Apache Spark application deployed in a cloud environment is a complex task.

  4. Resource-efficient and fast Point-in-Time joins for Apache Spark : Optimization of time travel operations for the creation of machine learning training datasets

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Axel Pettersson; [2022]
    Keywords: Apache Spark; Point-in-Time; ASOF; Join; Optimizations; Time travel

    Abstract: One scenario in which modern machine learning models are trained is using past data to make predictions about the future. When working with multiple structured and time-labeled datasets, it has become more common practice to use a join operator called the Point-in-Time join, or PIT join, to construct these datasets.

  5. A performance study for autoscaling big data analytics containerized applications : Scalability of Apache Spark on Kubernetes

    Master's thesis, Blekinge Tekniska Högskola/Institutionen för datavetenskap

    Authors: Vinay Kumar Vennu; Sai Ram Yepuru; [2022]
    Keywords: Containers; Container Orchestration; Big data analytics; Autoscaling; Resource Management

    Abstract: Container technologies are rapidly changing how distributed applications are executed and managed on cloud computing resources. As containers can be deployed on a large scale, there is a tremendous need for container orchestration tools like Kubernetes that are highly automated in deployment, scaling, and management.