Video classification with memory and computation-efficient convolutional neural network

This is a Master's thesis from KTH / School of Electrical Engineering and Computer Science (EECS)

Author: Benjamin Naoto Chiche; [2019]


Abstract: Video understanding involves problems such as video classification, which consists of labeling videos based on their contents and frames. In many real-world applications such as robotics, self-driving cars, augmented reality, and the Internet of Things (IoT), video understanding tasks need to be carried out in real time on devices with limited memory resources and computation capabilities, while meeting latency requirements. In this context, whereas neural networks that are memory- and computation-efficient, i.e., that present a reasonable trade-off between accuracy and efficiency with respect to memory size and computational speed, have been developed for image recognition tasks, studies on video classification have not made the most of these networks. To fill this gap, this project answers the following research question: how can video classification pipelines be built on top of a memory- and computation-efficient convolutional neural network (CNN), and how do such pipelines perform?

To answer this question, the project builds and evaluates video classification pipelines that are new artefacts. This research involves triangulation (i.e., it is both qualitative and quantitative), and the empirical research method is used for the evaluation. The artefacts are based on one of the existing memory- and computation-efficient CNNs, and their evaluation is based on a public video classification dataset and multiclass classification performance metrics. The case study research strategy is adopted: we try to generalize the obtained results as far as possible to other memory- and computation-efficient CNNs and video classification datasets. The abductive research approach is used to verify or falsify hypotheses. As a result, the artefacts are built and show satisfactory performance metrics compared to baseline pipelines that are also developed in this thesis and to metric values reported in other papers that used the same dataset.
To conclude, video classification pipelines based on a memory- and computation-efficient CNN can be built by designing and developing artefacts that combine approaches inspired by existing papers with new approaches, and these artefacts show satisfactory performance. In particular, we observe that the drop in accuracy induced by a memory- and computation-efficient CNN on individual video frames is, to some extent, compensated by capturing temporal information through consideration of sequences of these frames.
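The overall pipeline shape described above, per-frame features from an efficient CNN followed by temporal aggregation over the frame sequence, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual architecture: the efficient CNN backbone is stood in for by a single linear-plus-ReLU projection, and temporal aggregation is plain average pooling; the function names, weight shapes, and toy dimensions are all hypothetical.

```python
import numpy as np

def extract_frame_features(frame, W):
    # Stand-in for a memory- and computation-efficient CNN backbone
    # (hypothetical: a single linear projection of the flattened frame + ReLU).
    return np.maximum(frame.flatten() @ W, 0.0)

def classify_video(frames, W, W_cls):
    # Per-frame features from the (stand-in) efficient CNN...
    feats = np.stack([extract_frame_features(f, W) for f in frames])
    # ...aggregated over time (average pooling), so that temporal context
    # across the frame sequence informs the final prediction.
    clip_feat = feats.mean(axis=0)
    logits = clip_feat @ W_cls
    # Softmax over class logits (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

rng = np.random.default_rng(0)
frames = [rng.standard_normal((8, 8)) for _ in range(16)]  # 16 toy frames
W = rng.standard_normal((64, 32))       # flattened frame -> feature
W_cls = rng.standard_normal((32, 5))    # clip feature -> 5 classes
probs = classify_video(frames, W, W_cls)
```

In practice the averaging step could be replaced by a recurrent layer or temporal convolution; the point of the sketch is only that classifying an aggregated sequence representation, rather than a single frame, is what lets a lightweight backbone recover some of the accuracy it loses per frame.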
