Search: "Apache Spark"

Showing results 11-15 of 49 theses containing the words Apache Spark.

  11. Spark on Kubernetes using HopsFS as a backing store: Measuring performance of Spark with HopsFS for storing and retrieving shuffle files while running on Kubernetes

    Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)

    Author: Shivam Saini; [2020]
    Keywords: Spark; Kubernetes; HopsFS; Data processing; Distributed and Parallel processing;

    Abstract: Data is a raw collection of facts and details, such as numbers, words, measurements or observations, that is not useful to us on its own. Data processing is a technique for processing data in order to extract useful information from it. Today, the world produces huge amounts of data that cannot be processed using traditional methods.

  12. Machine Learning for Predictive Maintenance on Wind Turbines: Using SCADA Data and the Apache Hadoop Ecosystem

    Master's thesis, Linköping University/Department of Computer and Information Science

    Author: John Eriksson; [2020]
    Keywords: Predictive maintenance; machine learning; Hadoop; Spark; MLlib; Apache; wind turbine; wind turbines; stacking; bagging; multilayer perceptron; decision tree; random forest;

    Abstract: This thesis explores how to implement a predictive maintenance system for wind turbines in Apache Spark using SCADA data. It evaluates how to balance and scale the data set, together with the effects of applying the algorithms available in Spark MLlib to the given problem.

  13. Hudi on Hops: Incremental Processing and Fast Data Ingestion for Hops

    Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)

    Author: Netsanet Gebretsadkan Kidane; [2019]
    Keywords: Hudi; Hadoop; Hops; Upsert; SQL; Spark; Kafka;

    Abstract: In the era of big data, data flows in from numerous sources, and many companies use different types of tools to load and process data from various sources in a data lake. A major challenge companies face today is how to update an existing dataset without reading the entire dataset and overwriting it to accommodate the changes, which has a negative impact on performance.

  14. Ablation Programming for Machine Learning

    Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)

    Author: Sina Sheikholeslami; [2019]
    Keywords: Distributed Machine Learning; Distributed Systems; Ablation Studies; Apache Spark; Keras; Hopsworks;

    Abstract: As machine learning systems are used in a growing number of applications, from analysis of satellite sensor data and health-care analytics to smart virtual assistants and self-driving cars, they are also becoming increasingly complex. This means that more time and computing resources are needed to train the models, and the number of design choices and hyperparameters increases as well.

  15. Intelligent Resource Management for Large-scale Data Stream Processing

    Degree project for a professional degree (advanced level), Uppsala University/Department of Information Technology

    Author: Oliver Stein; [2019]
    Keywords: none listed

    Abstract: With the increasing use of cloud computing resources, using these resources efficiently becomes more and more important. Data stream processing is a paradigm gaining popularity, with tools such as Apache Spark Streaming and Kafka widely available, and companies are shifting toward real-time monitoring of data such as sensor networks and financial data, or toward anomaly detection.
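
The update problem described in the Hudi on Hops abstract (result 13) can be illustrated with a small copy-on-write upsert. The block below is a minimal conceptual sketch in plain Python and does not use Apache Hudi's API. It rests on several hypothetical assumptions. The table is an in-memory set of "files", where each file is a dict keyed by a record key. An `index` maps each key to the file that holds it, so the whole table never has to be scanned. Each record carries a `ts` field, which is used to keep only the newest version of a key within one incoming batch. Only files that contain updated keys are rewritten, and new keys go into one fresh file.

```python
from collections import defaultdict


def precombine(batch, ts_field="ts"):
    """Keep only the newest record per key within one batch (assumed `ts` field)."""
    latest = {}
    for rec in batch:
        current = latest.get(rec["key"])
        if current is None or rec[ts_field] >= current[ts_field]:
            latest[rec["key"]] = rec
    return list(latest.values())


def upsert(files, index, batch, new_file_id):
    """Apply a batch of inserts/updates, rewriting only the affected files.

    files: hypothetical table, file id -> {record key: record}
    index: hypothetical key -> file id lookup (avoids a full-table scan)
    new_file_id: id for a fresh file holding unseen keys (assumed unused)
    Returns the set of file ids that were written.
    """
    updates_by_file = defaultdict(dict)
    inserts = {}
    for rec in precombine(batch):
        file_id = index.get(rec["key"])
        if file_id is None:
            inserts[rec["key"]] = rec                    # unseen key: insert
        else:
            updates_by_file[file_id][rec["key"]] = rec   # known key: update

    written = set()
    for file_id, changes in updates_by_file.items():
        # Copy-on-write: build a new version of just this file.
        new_version = dict(files[file_id])
        new_version.update(changes)
        files[file_id] = new_version
        written.add(file_id)

    if inserts:
        files[new_file_id] = inserts
        for key in inserts:
            index[key] = new_file_id
        written.add(new_file_id)
    return written
```

Consider a batch that updates one key stored in file `f1` and adds one new key. Applying it rewrites `f1`, creates one new file and leaves every other file untouched. A real system would also persist the files, keep the index durable and version each rewrite. Those parts are omitted here.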
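
The balancing and scaling steps evaluated in the predictive-maintenance thesis (result 12) can be illustrated with two common preprocessing techniques. The abstract does not say which methods were used, so this is a minimal sketch under stated assumptions. Balancing is done by random oversampling, which duplicates randomly chosen minority-class rows until every class matches the majority count. Scaling is per-feature min-max scaling to [0, 1]. The sketch uses plain Python for illustration. For the scaling step on DataFrames, Spark MLlib offers `MinMaxScaler` in `pyspark.ml.feature`.

```python
import random


def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes match the majority count."""
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels
```

In practice, both steps should be fitted on the training split only. The scaler's minimum and maximum should be reused unchanged on the test split. Oversampling should also be limited to training data, so that duplicated rows do not leak into evaluation.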