Search: "apache spark"

Showing results 1-5 of 49 theses containing the words apache spark.

1. Big Data and Analytics with Driving Data : Implementation and Analysis of Data Pipeline and Data Processing Resources
Master's thesis, Uppsala University/Department of Information Technology
Authors: Ivar Blohm; Erik Jarvis; [2023]
Keywords: (none listed)

Abstract: This thesis project was conducted in cooperation with Zenseact to investigate the possible use of Google BigQuery and its capabilities to store, and provide insights into, large time-series data. An end-to-end data pipeline was built to move data from Zenseact's local servers and ingest it into BigQuery.

2. Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter)
One-year master's thesis (Magister), Halmstad University/School of Information Technology
Author: Manjunath Kakkepalya Puttaswamy; [2023]
Keywords: Apache Flink; Apache Spark; Big Data; Twitter; X

Abstract: The exponential growth of social media usage has led to massive data sharing, posing challenges for traditional systems in managing and analyzing such vast amounts of data. This surge in data exchange has also resulted in an increase in cyber threats from individuals and criminal groups.

3. Auto-Tuning Apache Spark Parameters for Processing Large Datasets
Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)
Author: Shidi Zhou; [2023]
Keywords: Apache Spark; Cloud Environment; Spark Configuration Parameter; Resource Utilization; Ridge Regression; Elastic Net; Random Forest; Deep Neural Network; Bayesian Optimization; Particle Swarm Optimization

Abstract: Apache Spark is a popular open-source distributed processing framework that enables efficient processing of large amounts of data. Apache Spark has a large number of configuration parameters that strongly affect performance. Selecting an optimal configuration for an Apache Spark application deployed in a cloud environment is a complex task.

4. Resource-efficient and fast Point-in-Time joins for Apache Spark : Optimization of time travel operations for the creation of machine learning training datasets
Master's thesis, KTH/School of Electrical Engineering and Computer Science (EECS)
Author: Axel Pettersson; [2022]
Keywords: Apache Spark; Point-in-Time; ASOF; Join; Optimizations; Time travel

Abstract: A common scenario for training modern machine learning models is to use past data to make predictions about the future. When working with multiple structured, time-labeled datasets, it has become increasingly common to construct these training datasets with a join operator called the Point-in-Time join, or PIT join.

5. A performance study for autoscaling big data analytics containerized applications : Scalability of Apache Spark on Kubernetes
Master's thesis, Blekinge Institute of Technology/Department of Computer Science
Authors: Vinay Kumar Vennu; Sai Ram Yepuru; [2022]
Keywords: Containers; Container Orchestration; Big data analytics; Autoscaling; Resource Management

Abstract: Container technologies are rapidly changing how distributed applications are executed and managed on cloud computing resources. Because containers can be deployed at large scale, there is a tremendous need for container orchestration tools such as Kubernetes that automate deployment, scaling, and management.
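The Point-in-Time (PIT) join named in result 4 can be illustrated with a small, self-contained sketch. It assumes the common "as of" semantics: for each labeled row, take the most recent feature row for the same entity whose timestamp is less than or equal to the label's timestamp, so no future information leaks into the training data. This is plain Python, not Apache Spark and not the thesis author's implementation; the function `pit_join` and the sample `labels`/`features` data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def pit_join(left, right):
    """Hypothetical PIT join sketch.

    left:  list of (key, timestamp, label) rows
    right: list of (key, timestamp, feature) rows
    Returns (key, timestamp, label, feature) rows, where feature is taken from
    the latest right row with the same key and timestamp <= the left timestamp,
    or None if no such row exists.
    """
    # Group right-side rows by key and sort each group by timestamp.
    groups = defaultdict(list)
    for key, ts, feature in right:
        groups[key].append((ts, feature))
    for rows in groups.values():
        rows.sort(key=lambda row: row[0])
    # Separate timestamp lists so bisect can search them directly.
    timestamps = {key: [ts for ts, _ in rows] for key, rows in groups.items()}

    result = []
    for key, ts, label in left:
        rows = groups.get(key, [])
        # Index of the last right row whose timestamp is <= ts (-1 if none).
        pos = bisect_right(timestamps.get(key, []), ts) - 1
        feature = rows[pos][1] if pos >= 0 else None
        result.append((key, ts, label, feature))
    return result


# Hypothetical example data: labels and features keyed by vehicle id.
labels = [("car1", 10, 0), ("car1", 25, 1), ("car2", 5, 0)]
features = [("car1", 8, 0.5), ("car1", 20, 0.7), ("car1", 30, 0.9), ("car2", 7, 1.2)]

joined = pit_join(labels, features)
print(joined)
# [('car1', 10, 0, 0.5), ('car1', 25, 1, 0.7), ('car2', 5, 0, None)]
```

Note that the feature row for `car1` at time 30 is never used, and `car2` gets no feature because its only feature row lies after the label time. A distributed engine has to achieve the same result without the per-key sorted index used here, which is the kind of cost the thesis title refers to optimizing.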
