An adversarial framework for open-set human action recognition using skeleton data

Human action recognition is a fundamental problem applied in various domains and widely studied in the literature. The majority of studies model action recognition as a closed-set problem. However, in real-life applications it usually arises as an open-set problem, where some actions are unavailable during training but are introduced to the system during testing. In this study, we propose an open-set action recognition system, the human action recognition and novel action detection system (HARNAD), which consists of two stages and uses only 3D skeleton information. In the first stage, HARNAD recognizes a given action; in the second stage, it decides whether the action truly belongs to one of the a priori known classes or is a novel action. We evaluate the system's performance experimentally in terms of both recognition and novelty detection, and compare it with state-of-the-art open-set recognition methods. Our experiments show that HARNAD is comparable to state-of-the-art methods in novelty detection, while it is superior to them in recognition.
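The two-stage decision described above can be illustrated with a minimal sketch. This is not the paper's actual adversarial architecture; it only shows the control flow of an open-set pipeline in which a classifier first picks the most likely known class and a separate novelty check (here, a simple confidence threshold standing in for a learned novelty detector) then accepts the label or flags the sample as a novel action. All function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def stage1_classify(scores):
    """Stage 1: return the index of the most likely known class."""
    return int(np.argmax(scores))

def stage2_is_novel(scores, threshold=0.5):
    """Stage 2: flag the sample as a novel action when the winning
    score falls below a confidence threshold (a stand-in for a
    learned novelty detector)."""
    return float(np.max(scores)) < threshold

def open_set_decision(scores, threshold=0.5):
    """Combine both stages: either a known-class label or 'novel'."""
    label = stage1_classify(scores)
    if stage2_is_novel(scores, threshold):
        return ("novel", None)
    return ("known", label)

# A confident sample is accepted as a known class; a flat score
# distribution is rejected as novel.
print(open_set_decision(np.array([0.1, 0.8, 0.1])))    # ('known', 1)
print(open_set_decision(np.array([0.34, 0.33, 0.33]))) # ('novel', None)
```

In a real system the threshold test would be replaced by the second-stage model's own accept/reject decision, but the overall recognize-then-verify flow is the same.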
