Xue MEI, Yao DING, Jiali BIAN, Yu XUE, Liang WU

Efficient hierarchical temporal segmentation method for facial expression sequences

Temporal segmentation of facial expression sequences is important for understanding and analyzing human facial expressions. It is challenging, however, because facial muscle movements are complex, a suitable metric is needed to distinguish among different expressions, and real-world recordings are subject to uncontrolled environmental factors. This paper presents a two-step unsupervised segmentation method, composed of a rough segmentation stage and a fine segmentation stage, that computes the optimal segmentation positions in video sequences in order to separate different facial expressions. The proposed method localizes facial expression patches to aid in the recognition and extraction of specific features. In the rough segmentation stage, facial sequences are segmented into distinct facial behaviors based on the similarity between sequence frames; in the fine segmentation stage, the similarity between segments is computed to obtain the optimal segmentation positions. The proposed method was evaluated in experiments on the MMI dataset and on real videos. Compared with other state-of-the-art methods, the experimental results indicate better performance for the proposed method.
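The two-stage idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the similarity metric, or the decision rules, so everything below is an assumption made for illustration. Each frame is assumed to be already reduced to a numeric feature vector, cosine similarity stands in for the unspecified metric, segments are compared through their mean feature vectors, and the function names and thresholds (`rough_segmentation`, `fine_segmentation`, `frame_threshold`, `segment_threshold`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rough_segmentation(frames, frame_threshold=0.9):
    """Rough stage (illustrative): cut wherever two consecutive frames
    are less similar than frame_threshold.

    Returns half-open (start, end) index pairs covering all frames.
    """
    if not frames:
        return []
    cuts = [0]
    for i in range(1, len(frames)):
        if cosine_similarity(frames[i - 1], frames[i]) < frame_threshold:
            cuts.append(i)
    cuts.append(len(frames))
    return [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]


def segment_mean(frames, segment):
    """Mean feature vector of the frames in a half-open segment."""
    start, end = segment
    count = end - start
    dim = len(frames[start])
    return [sum(frames[i][d] for i in range(start, end)) / count
            for d in range(dim)]


def fine_segmentation(frames, segments, segment_threshold=0.9):
    """Fine stage (illustrative): greedily merge adjacent segments whose
    mean feature vectors are at least segment_threshold similar."""
    if not segments:
        return []
    merged = [segments[0]]
    for seg in segments[1:]:
        prev = merged[-1]
        sim = cosine_similarity(segment_mean(frames, prev),
                                segment_mean(frames, seg))
        if sim >= segment_threshold:
            merged[-1] = (prev[0], seg[1])  # same behavior: extend segment
        else:
            merged.append(seg)              # different behavior: keep the cut
    return merged


if __name__ == "__main__":
    # Toy 2-D features: a drifting first behavior, then a distinct second one.
    toy_frames = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.5],
                  [0.9, 0.5], [0.0, 1.0], [0.0, 1.0]]
    rough = rough_segmentation(toy_frames, frame_threshold=0.95)
    fine = fine_segmentation(toy_frames, rough, segment_threshold=0.85)
    print(rough)
    print(fine)
```

In this toy input, the rough stage over-segments the gradual drift in the first behavior, and the fine stage removes the spurious cut by comparing whole segments rather than single frames. The greedy left-to-right merge is only one possible way to act on segment similarity.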
