Human pose augmentation for facilitating Violence Detection in videos: a combination of the deep learning methods DensePose and VioNet

This is a Master's thesis from Mittuniversitetet (Mid Sweden University), Department of Information Systems and Technology

Abstract: In recent years, deep learning, a critical technology in computer vision, has achieved remarkable milestones in many fields, such as image classification and object detection. It has also been applied to violence detection, a challenging problem given how difficult it is to establish an exact definition of violence. Thanks to the steady development of surveillance technologies, we now have access to an enormous body of video that can be analyzed for abnormal behavior. However, with such a huge amount of data, manually examining all of it is unrealistic. Deep learning techniques, by contrast, can automatically study, learn and perform classification. In the context of violence detection, extracting harmful visual patterns makes it possible to design descriptors that represent the features identifying them. In this research we tackle the task of generating new augmented datasets in order to simplify the identification step performed by a deep learning violence detection technique. The novelty of this work is the use of the DensePose model to enrich the images in a dataset by highlighting (i.e. identifying and segmenting) all the human beings present in them. With this approach we learned how this algorithm performs on videos with a violent context and how the violence detection network benefits from this procedure. Performance was evaluated in terms of segmentation accuracy and efficiency of the violence detection network, as well as from a computational point of view.
The results show that the context of the scene is the main factor that leads the DensePose model to segment human beings correctly, and that violence does not seem to be the most suitable setting for this model, since the frequent overlap of bodies (a distinctive aspect of violence) works against the segmentation. For this reason, the violence detection network does not reach its full potential. Finally, we found that such augmented datasets can speed up training by reducing the time needed for the weight-update phase, making this procedure a helpful add-on for implementations in other contexts where the identification of human beings still plays the major role.
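
The highlighting step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a per-frame boolean person mask is already available (in the thesis this would come from DensePose; here it is a synthetic array), and the function name `highlight_people`, the tint color and the blending weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def highlight_people(frame, mask, color=(0, 255, 0), alpha=0.5):
    """Blend `color` into the pixels of `frame` where `mask` is True.

    frame: (H, W, 3) uint8 image.
    mask:  (H, W) boolean array marking person pixels (assumed to be
           produced by a segmentation model; synthetic in this sketch).
    """
    if frame.shape[:2] != mask.shape:
        raise ValueError("mask must match the frame's height and width")
    out = frame.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    # Only masked pixels are changed; the background stays as it was.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return out.round().astype(np.uint8)

# Synthetic stand-ins for a video frame and a person mask.
frame = np.full((4, 6, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True

augmented = highlight_people(frame, mask)
```

Applying such a function to every frame of a video would produce an augmented dataset in the sense of the abstract, with the background left untouched and the detected people visually emphasized.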
