Taner DANIŞMAN

Segmentation of Portrait Images Using a Deep Residual Network Architecture

Segmenting portrait images into semantic regions is an important step in scene understanding and image analysis. Although segmentation is a very active research area, few studies address portrait segmentation. One of the most important steps in portrait segmentation is fine-grained segmentation, in which semantically related pixels such as hair, face, body, and background are grouped together. However, this is a difficult problem because of extreme variation in hair shape, hair color, and background. To cope with this variation, we propose a deep residual network based on the ERFNet architecture. We use geometrically normalized faces as input to the network. Experiments on the two-class EG1800 dataset and the three-class LFW Part Labels dataset show that the proposed method achieves high mean intersection over union (mIoU) and pixel-based accuracy. We obtained 96.37% mIoU and 98.17% pixel-based accuracy on the EG1800 dataset, and 90.1% mIoU and 97.14% accuracy on the LFW dataset.

Keywords:

Portrait Segmentation, Deep Learning, Deep Residual Networks, Geometric Normalization, Encoder-Decoder Networks

Segmentation of Portrait Images Using a Deep Residual Network Architecture

Segmenting portrait images into semantic areas is an important step towards scene understanding and image analysis. Although segmentation is a very active field of study, few studies address portrait segmentation. One of the most crucial steps in portrait segmentation is precise segmentation, in which semantically related pixels are grouped together into regions such as hair, face, body, and background. However, this is a challenging problem due to the extreme variations in hair shape, hair color, and background. To handle such variations, we propose a deep residual network based on the ERFNet architecture. We use geometrically normalized faces as input to the network. Experimental studies on Adobe's Portrait Segmentation dataset (EG1800, two classes) and the LFW Part Labels dataset (three classes) show that the proposed method provides state-of-the-art mean intersection over union (mIoU) and pixel-based accuracy. We obtained 96.37% mIoU and 98.17% pixel-based accuracy on the EG1800 dataset, and 90.1% mIoU and 97.14% accuracy on the LFW dataset.
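The two metrics reported above can be illustrated with a minimal sketch. It assumes the usual definitions (pixel accuracy as the fraction of correctly labeled pixels, and mIoU as the per-class ratio TP / (TP + FP + FN) averaged over classes); the abstract does not state the exact evaluation protocol, so the function names and the toy masks below are hypothetical and serve only as an illustration.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels by (true class, predicted class) over flat label lists."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        counts[t][p] += 1
    return counts


def pixel_accuracy(counts):
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[c][c] for c in range(len(counts)))
    return correct / total


def mean_iou(counts):
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(counts)
    ious = []
    for c in range(n):
        tp = counts[c][c]
        fn = sum(counts[c]) - tp                         # true c, predicted other
        fp = sum(counts[r][c] for r in range(n)) - tp    # predicted c, true other
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy two-class example (0 = background, 1 = portrait), masks flattened to lists.
# These labels are made up for illustration and are not data from the paper.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1]

counts = confusion_matrix(truth, pred, num_classes=2)
accuracy = pixel_accuracy(counts)
miou = mean_iou(counts)
print(f"pixel accuracy = {accuracy:.3f}, mIoU = {miou:.3f}")
```

In this toy case a single mislabeled background pixel lowers mIoU more than pixel accuracy, because the error counts against the IoU of both classes. This is one reason the two figures are usually reported together.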

Keywords:

Portrait Segmentation, Deep Learning, Deep Residual Networks, Geometric Normalization, Encoder-Decoder Networks

