Project based multi-tenant managed RStudio on Kubernetes for Hopsworks

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: In order to fully benefit from cloud computing, services are designed following the “multi-tenant” architectural model which is aimed at maximizing resource sharing among users. However, multi-tenancy introduces challenges of security, performance isolation, scaling and customization. RStudio server is an open source Integrated Development Environment (IDE) accessible over a web browser for R programming language. The purpose of this thesis is to develop an open source multi-user distributed system on Hopsworks, a data intensive and AI platform, following the multi-tenant model, that provides RStudio as Software as a Service (SaaS). Our goal is to promote collaboration among users when using RStudio and the learning and teaching of R by enabling users easily have access to same computational environments and resources while eliminating installation and maintenance tasks. Hopsworks introduces project-based multi-tenancy where users within a project share project resources (e.g datasets, programs and services) for collaboration which introduces one more challenge of sharing project resources in RStudio server instances. To achieve our purpose and goal we therefore needed to solve the following problems: performance isolation, security, project resources sharing, scaling and customization. Our proposed model is on demand single user RStudio server instances per project. Our system is built around Docker and Kubernetes to solve the problems of performance isolation, security and scaling. We introduce HopsFS-mount, that allows securely mounting HopsFS via FUSE to solve the project resources (datasets and programs) sharing problem. We integrate our system with Apache Spark which can scale and handle Big Data processing workloads. Also we provide a UI where users can provide custom configuration and have full control of their own RStudio server instances. Our system was tested on a GCP cluster with four worker nodes each with 30GB of RAM allocated to them. The tests on this cluster showed that 44 RStudio servers, each with 2GB of RAM, can be run concurrently. Our system can scale out to potentially support hundreds of concurrently running RStudio servers by adding more resources (CPUs and RAM) to the cluster or system. 

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)