LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION

Abstract

Semantic segmentation, one of the key problems in computer vision, has been applied in many domains, such as autonomous driving, robot navigation, and medical imaging. Recently, deep learning methods, especially deep neural networks, have shown significant performance improvements over conventional semantic segmentation methods. In this paper, we present XSeNet, a novel encoder-decoder deep neural network for semantic segmentation that can be trained end-to-end in a supervised manner. We adopt ResNet-50 layers as the encoder and design a cascaded decoder composed of a stack of X-Modules, which enables the network to learn dense contextual information and to have a wider field of view. We evaluate our method on the CamVid dataset, and experimental results show that it segments most parts of the scene accurately and even outperforms previous state-of-the-art methods.
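The abstract does not specify how the X-Modules widen the field of view. The sketch below is a generic illustration only and is not the paper's X-Module. It computes the receptive field, in input pixels, of a stack of convolutions, using the standard recurrence in which each layer with kernel size k and dilation d enlarges the receptive field by (d(k - 1)) times the current stride product. Dilated (atrous) convolution is one common way to widen the field of view without adding depth, but whether XSeNet uses it is an assumption here. The function name `receptive_field` and both layer configurations are hypothetical.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Return the receptive field (in input pixels) of a stack of conv layers.

    Each layer is given as (kernel_size, stride, dilation).
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent feature positions
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical configurations, NOT taken from XSeNet:
plain = [(3, 1, 1)] * 3                       # three ordinary 3x3 convolutions
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]   # same depth, growing dilation

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and parameters, the dilated stack covers a 15-pixel window instead of 7, which is the kind of wider field of view the abstract refers to.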
