Saman NIKZAD, Afshin EBRAHIMI

Two person interaction recognition based on a dual-coded modified metacognitive (DCMMC) extreme learning machine

Human action recognition has been an active research area for over three decades. However, state-of-the-art algorithms are still far from error-free, fully generalized systems that perform accurate interaction recognition. This work proposes a new method for two-person interaction recognition from videos, based on well-known cognitive theories. The main idea is to perform classification based on a theory of cognition known as dual coding theory. The theory states that the human brain processes and represents two types of information to learn and classify data, named analogue and symbolic codes (visual information as analogue codes and verbal information as symbolic codes). To implement this theory in a two-person interaction classification system, we use dense trajectories as analogue codes and a bag of words as symbolic codes, the two code types hypothesized in the theory. In addition to dual coding theory, we propose a metacognitive classifier model, which adds a metalevel with its own rules to make the training process more accurate. We also propose a modification to a metacognitive component to prevent the cognitive interference known as the Stroop effect. Evaluations on the SBU interaction and UT-interaction datasets show that the method offers comparable recognition accuracy (95.6% and 91.1%, respectively).
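For readers unfamiliar with the base classifier, the following is a minimal sketch of a plain extreme learning machine (random, untrained hidden weights and output weights solved by a pseudoinverse) applied to two concatenated feature blocks. It is not the DCMMC method itself: the metacognitive level and the Stroop-effect modification are not modeled, the function names `train_elm` and `predict_elm` are hypothetical, and the synthetic arrays `analogue` and `symbolic` are assumed stand-ins for dense-trajectory descriptors and bag-of-words histograms.

```python
import numpy as np

def train_elm(X, y, n_hidden=64, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    W = rng.standard_normal((X.shape[1], n_hidden))  # input weights, never trained
    b = rng.standard_normal(n_hidden)                # hidden biases, never trained
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    T = np.eye(n_classes)[y]                         # one-hot class targets
    beta = np.linalg.pinv(H) @ T                     # output weights via pseudoinverse
    return W, b, beta

def predict_elm(model, X):
    """Return the class index with the largest output for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Synthetic two-class data (an assumption for illustration only).
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 50)
analogue = rng.normal(loc=labels[:, None] * 3.0, scale=0.5, size=(100, 8))
symbolic = rng.normal(loc=labels[:, None] * 3.0, scale=0.5, size=(100, 4))

# Concatenate the two code types into one feature vector per sample.
features = np.hstack([analogue, symbolic])
model = train_elm(features, labels)
predictions = predict_elm(model, features)
accuracy = (predictions == labels).mean()
```

Simple concatenation is only one possible way to combine the two code types; the abstract does not state how the proposed method fuses them.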
