Search: "Spark SQL"

Showing results 1 - 5 of 8 theses containing the words Spark SQL.

  1. Transitioning from on-premise computing to cloud computing : A cost comparison case study on a Swedish grocery retail company

    Bachelor thesis, Uppsala universitet/Institutionen för informationsteknologi

    Author: Per Bondeson; [2023]
    Keywords:

    Abstract: Using public cloud platforms for managing applications and compute is becoming increasingly common among organizations. In this thesis, a case study was performed at a Swedish grocery retailer that was in pursuit of migrating all analytics-related IT resources, hosted both on-premise and in a private cloud, to a public cloud platform.

  2. Data Build Tool (DBT) Jobs in Hopsworks

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Zidi Chen; [2022]
    Keywords: feature engineering; Structured Query Language SQL;

    Abstract: Feature engineering at scale is always critical and challenging in the machine learning pipeline. Modern data warehouses enable data analysts to do feature engineering by transforming, validating and aggregating data in Structured Query Language (SQL).

  3. Accelerating geospatial database services with Graphical Processing Units

    Bachelor thesis, Göteborgs universitet/Institutionen för data- och informationsteknik

    Authors: Andreas Fransson; Johan Johansson; [2019-11-12]
    Keywords: Accelerated database; SPARK database; Windows SQL Server; Geospatial Data; Emergency Systems;

    Abstract: With the growing need for instant or almost instant processing and retrieval when working on large data-sets, we ask ourselves the following question: “What impact would switching from a conventional CPU database to a GPU accelerated database have on emergency systems using large geospatial data-sets?” [Methodology] We chose to use Design Science and, more specifically, the method called optimization.

  4. Hudi on Hops : Incremental Processing and Fast Data Ingestion for Hops

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Netsanet Gebretsadkan Kidane; [2019]
    Keywords: Hudi; Hadoop; Hops; Upsert; SQL; Spark; Kafka;

    Abstract: In the era of big data, data is flooding in from numerous sources, and many companies use different types of tools to load and process data from various sources in a data lake. A major challenge these companies face today is how to update data in an existing dataset without having to read the entire dataset and overwrite it to accommodate the changes, which has a negative impact on performance.

  5. Hive, Spark, Presto for Interactive Queries on Big Data

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Nikita Gureev; [2018]
    Keywords: Hadoop; SQL; interactive analysis; Hive; Spark; Spark SQL; Presto; Big Data;

    Abstract: Traditional relational database systems cannot be used efficiently to analyze data of large volume and varied formats, i.e. big data. Apache Hadoop is one of the first open-source tools that provides a distributed data storage system and resource manager.