Segmentation of Portrait Images Using a Deep Residual Network Architecture

Segmenting portrait images into semantic areas is an important step towards scene understanding and image analysis. Although segmentation is a very active field of study, few studies address portrait segmentation specifically. One of the most crucial steps in portrait segmentation is precise segmentation, in which semantically related pixels are grouped into regions such as hair, face, body, and background. This is a challenging problem because of the extreme variations in hair shape, color, and background. To handle such variations, we propose a deep residual network based on the ERFNet architecture. The network takes geometrically normalized faces as input. Experimental studies on the EG1800 dataset (two classes) and the LFW Part Labels dataset (three classes) showed that the proposed method provides state-of-the-art mIoU (mean intersection over union) and pixel-based accuracy. We obtained 96.37% mIoU and 98.17% pixel-based accuracy on the EG1800 dataset, and 90.1% mIoU and 97.14% pixel-based accuracy on the LFW dataset.
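
The two reported metrics can be illustrated with a minimal sketch in plain Python. It assumes the common definitions: pixel-based accuracy is the fraction of correctly labeled pixels, and mIoU is the per-class intersection over union averaged over the classes that occur. The function names, the flattened label lists, and the class ids (0 = background, 1 = hair, 2 = face) are hypothetical and chosen only for illustration; whether the reported figures are averaged per image or over the whole dataset is not stated here, so this sketch simply accumulates one confusion matrix.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the ground truth."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # class c pixels missed
        fp = sum(matrix[r][c] for r in range(n)) - tp    # pixels wrongly given c
        union = tp + fp + fn
        if union > 0:                                    # skip absent classes
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened label maps (0 = background, 1 = hair, 2 = face).
truth = [0, 0, 0, 1, 1, 1, 2, 2]
pred = [0, 0, 1, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print(f"pixel accuracy: {pixel_accuracy(cm):.4f}")
print(f"mIoU:           {mean_iou(cm):.4f}")
```

Because mIoU weights every class equally, a small class such as hair influences it far more than it influences pixel-based accuracy, which is one reason the two figures can differ noticeably on the same predictions.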
