Search: "Dataprocessering"

Found 4 theses containing the word Dataprocessering.

  1. A Comparative Study on Efficiency and Scalability of Integer and String Datasets in cuDF and pandas

    Bachelor's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)

    Authors: Anton Schulz; Emil Sjölander [2023]
    Keywords:

    Abstract: This thesis presents a comparative analysis of cuDF and pandas, two Python data processing libraries, with a focus on performance, limitations, and scalability when handling integer and string datasets. The study aims to assess the efficiency and suitability of cuDF as a potential alternative to pandas in scenarios where high-performance data processing is required.

  2. Highly Available Task Scheduling in Distinctly Branched Directed Acyclic Graphs

    Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)

    Author: Patrik Zhong [2023]
    Keywords: Distributed Scheduling; Fault-tolerance; Graph Partitioning; Task Graphs; Dask; Dask Distributed; Data Processing; Distribuerad Schemaläggning; Feltolerans; Grafpartitionering; Uppgiftsgrafer; Dask; Dask Distributed; Dataprocessering

    Abstract: Big data processing frameworks utilizing distributed frameworks to parallelize the computing of datasets have become a staple part of the data engineering and data science pipelines. One of the more known frameworks is Dask, a widely utilized distributed framework used for parallelizing data processing jobs.

  3. Scaling cloud-native Apache Spark on Kubernetes for workloads in external storages

    Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)

    Author: Piotr Mrowczynski [2018]
    Keywords: Cloud Computing; Spark on Kubernetes; Kubernetes Operator; Elastic Resource Provisioning; Cloud-Native Architectures; Openstack Magnum; Data Mining; Cloud Computing; Spark över Kubernetes; Kubernetes Operator; Elastic Resource Provisioning; Cloud-Native Architectures; Openstack Magnum; Containers; Data Mining

    Abstract: CERN Scalable Analytics Section currently offers shared YARN clusters to its users as monitoring, security and experiment operations. YARN clusters with data in HDFS are difficult to provision, complex to manage and resize. This imposes new data and operational challenges to satisfy future physics data processing requirements.

  4. Integrating Pig and Stratosphere

    Master's thesis, KTH/School of Information and Communication Technology (ICT)

    Author: Vasiliki Kalavri [2012]
    Keywords:

    Abstract: MapReduce is a wide-spread programming model for processing big amounts of data in parallel. PACT is a generalization of MapReduce, based on the concept of Parallelization Contracts (PACTs). Writing efficient applications in MapReduce or PACT requires strong programming skills and in-depth understanding of the systems’ architectures.