Detection of Violent Activities in Video Images with an LSTM Network

Although action recognition in computer vision has been studied extensively on both RGB videos and depth maps, the detection of violent actions remains a relatively understudied area. Thanks to advancing technology and the internet, large amounts of video data are now easily accessible, and as a result many videos with violent content have become easily accessible as well. Labeling videos that contain violent scenes is important for security and for content-based video retrieval systems. Surveillance camera systems are generally ill-suited to detecting violence and inappropriate actions, and in a large-scale surveillance camera system it is impossible for a single operator to monitor all cameras at once. There is also a growing need for automatic video assessment and labeling systems that can check the videos uploaded to video streaming sites. For these reasons, violence detection has become an increasingly important topic. This study proposes a method for video footage based on Transfer Learning and a Long Short-Term Memory (LSTM) network. Deep features are extracted with GoogleNet directly from the RGB frames and from velocity images obtained by computing the derivative of the optical flow values and of the RGB frame sequences. The resulting deep feature sequences are given to the LSTM network as input. The proposed method is tested on the Hockey Fight and Violent Flow datasets, which are widely used in the literature to evaluate this kind of work. The experimental results are comparable to those of existing studies in the literature.
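The velocity images mentioned above can be approximated by a temporal finite difference between consecutive frames, and applying the same difference twice gives second-derivative (acceleration) images. The sketch below is a minimal illustration under stated assumptions: grayscale frames stored as nested lists, a plain forward difference, and hypothetical function names (`temporal_derivative`, `velocity_and_acceleration`). The abstract does not specify the exact discretization, any normalization, or how difference images are mapped to GoogleNet's RGB input, so those steps are not shown. RGB frames or optical-flow fields could be processed the same way, one channel at a time.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame: rows of pixel intensities


def temporal_derivative(frames: List[Frame]) -> List[Frame]:
    """Forward difference in time: out[t][r][c] = frames[t+1][r][c] - frames[t][r][c].

    T input frames give T - 1 difference frames.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    return [
        [[b - a for a, b in zip(row0, row1)] for row0, row1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


def velocity_and_acceleration(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """First difference ~ velocity images; second difference ~ acceleration images."""
    velocity = temporal_derivative(frames)
    acceleration = temporal_derivative(velocity)  # needs at least 3 input frames
    return velocity, acceleration


if __name__ == "__main__":
    # Toy 3-frame clip of 2x2 images whose intensity grows quadratically in time.
    clip = [
        [[0.0, 1.0], [2.0, 3.0]],
        [[1.0, 2.0], [3.0, 4.0]],
        [[4.0, 5.0], [6.0, 7.0]],
    ]
    vel, acc = velocity_and_acceleration(clip)
    print(vel)  # [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    print(acc)  # [[[2.0, 2.0], [2.0, 2.0]]]
```

Each derivative step shortens the sequence by one frame, so a clip of T frames yields T - 1 velocity frames and T - 2 acceleration frames; how the original method aligns these sequences before feature extraction is not stated in the abstract.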

Keywords:

Violent Action, Deep Learning, CNN, GoogleNet, LSTM

Detection of Violent Activities in Video Images with an LSTM Network

Although action recognition has been widely studied on both RGB videos and depth maps, violent activity detection is a relatively understudied area. With developing technology and the growth of the internet, large-scale video data has become easily accessible, and so have videos with violent scenes. Labeling violent scenes in videos is important for content-based multimedia retrieval systems. Standard surveillance systems are incapable of detecting violent and improper activities, and in a large-scale surveillance system it is impossible for a human operator to watch all the recordings. There is also an increasing demand for automatic labeling systems that check the videos uploaded to video streaming sites. For these reasons, the automatic detection of violent activities is becoming increasingly important. In this study, we propose a method based on Transfer Learning and an LSTM (Long Short-Term Memory) network. Deep features are extracted with GoogleNet from RGB sequences and from velocity and acceleration sequences, computed as the first and second derivatives of the pixel values, and are given to the LSTM network as input. The proposed method is tested on the Hockey Fight and Violent Flow datasets, which are commonly used in the literature. The experimental results are comparable to those in the literature.
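To make the last stage concrete, here is a minimal sketch of how a sequence of per-frame feature vectors can be classified with an LSTM: the cell is run over the sequence and its final hidden state is mapped to a violence probability. This is a generic single-layer LSTM with input, forget, and output gates, written in pure Python with random, untrained weights; it is not the authors' network. The class name `TinyLSTMClassifier`, the hidden size, the logistic output (instead of, for example, a two-class softmax), and the use of random vectors in place of GoogleNet features are all illustrative assumptions. In practice a framework LSTM layer would be used and trained on labeled clips.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM + logistic output (untrained, random weights)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One input matrix W, recurrent matrix U and bias b per gate:
        # i (input), f (forget), o (output), g (candidate cell value).
        self.W = {k: rand_matrix(hidden_dim, input_dim, rng) for k in "ifog"}
        self.U = {k: rand_matrix(hidden_dim, hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _gate(self, k: str, x: Vector, h: Vector) -> Vector:
        # Pre-activation W_k x + U_k h + b_k for gate k.
        wx = matvec(self.W[k], x)
        uh = matvec(self.U[k], h)
        return [a + c + bb for a, c, bb in zip(wx, uh, self.b[k])]

    def final_hidden(self, features: Sequence[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in features:  # one deep-feature vector per frame
            i = [sigmoid(z) for z in self._gate("i", x, h)]
            f = [sigmoid(z) for z in self._gate("f", x, h)]
            o = [sigmoid(z) for z in self._gate("o", x, h)]
            g = [math.tanh(z) for z in self._gate("g", x, h)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def predict_proba(self, features: Sequence[Vector]) -> float:
        # Probability that the clip is violent, from the last hidden state.
        h = self.final_hidden(features)
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for 20 frames of deep features (8-dimensional here for brevity).
    clip_features = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    model = TinyLSTMClassifier(input_dim=8, hidden_dim=4)
    p = model.predict_proba(clip_features)
    print(f"P(violent) = {p:.3f}")  # untrained weights, so the value is meaningless
```

Using only the last hidden state is one common design for sequence-level labels; since the abstract does not say how the LSTM output is pooled, treat this choice as an assumption as well.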

Keywords:

Violent Activity, Deep Learning, CNN, GoogleNet, LSTM

