DevOps for Data Science System

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: Commercialization potential is important to data science. Whether the problems encountered by data science in production can be solved determines the success or failure of the commercialization of data science. Recent research shows that DevOps theory is a great approach to solve the problems that software engineering encounters in production. And from the product perspective, data science and software engineering both need to provide digital services to customers. Therefore it is necessary to study the feasibility of applying DevOps in data science. This paper describes an approach of developing a delivery pipeline line for a data science system applying DevOps practices. I applied four practices in the pipeline: version control, model server, containerization, and continuous integration and delivery. However, DevOps is not a theory designed specifically for data science. This means the currently available DevOps practices cannot cover all the problems of data science in production. I expended the set of practices of DevOps to handle that kind of problem with a practice of data science. I studied and involved transfer learning in the thesis project. This paper describes an approach of parameterbased transfer where parameters learned from one dataset are transferred to another dataset. I studied the effect of transfer learning on model fitting to a new dataset. First I trained a convolutional neural network based on 10,000 images. Then I experimented with the trained model on another 10,000 images. I retrained the model in three ways: training from scratch, loading the trained weights and freezing the convolutional layers. The result shows that for the problem of image classification when the dataset changes but is similar to the old one, transfer learning a useful practice to adjust the model without retraining from scratch. Freezing the convolutional layer is a good choice if the new model just needs to achieve a similar level of performance as the old one. Loading weights is a better choice if the new model needs to achieve better performance than the original one. In conclusion, there is no need to be limited by the set of existing practices of DevOps when we apply DevOps to data science.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)