Deep Learning and Deep Learning Models Used in Image Analysis

To define a model or build a machine learning system with classical machine learning techniques, the feature vector must first be extracted. Extracting the feature vector requires domain experts. This work is time-consuming and places a heavy load on the expert. For this reason, these techniques cannot process raw data without preprocessing and without expert help. Deep Learning has made great progress by removing this problem, which researchers in machine learning had struggled with for many years, because deep networks, unlike traditional machine learning and image processing techniques, learn directly from raw data. While processing the raw data, they obtain the necessary information through the representations they build in successive layers. Deep Learning first drew attention in 2012 with its success in the large-scale visual recognition (ImageNet) competition for object classification. Although the foundations of Deep Learning go back many years, it has become popular in recent years for two main reasons: first, enough data is now available for training, and second, the hardware infrastructure needed to process this data now exists. This study gives detailed information about Deep Learning. The layers of the Convolutional Neural Network (CNN) architecture, namely the convolution, pooling, ReLU, dropout, fully connected, and classification layers, are explained. In addition, the AlexNet, ZFNet, GoogLeNet, Microsoft ResNet, and R-CNN architectures, which can be regarded as fundamental architectures in Deep Learning, are described.

Keywords:

Deep Learning, CNN, Convolution, Pooling, AlexNet, ZFNet, GoogLeNet, Microsoft ResNet, R-CNN

Deep Learning and Deep Learning Models Used in Image Analysis

To define a model or build a machine learning system with classical machine learning techniques, the feature vector must first be extracted, and extracting it requires domain experts. For this reason, these techniques fall short when raw data has to be processed directly. Deep learning has made tremendous progress by eliminating this problem, which had been a challenge in machine learning for many years. Unlike traditional machine learning and image processing techniques, deep learning performs the learning process on raw data. It obtains the necessary information from the representations that it forms in different layers. Deep learning is used in many areas, such as image recognition, speech recognition, natural language processing, and gene analysis. It first attracted attention in 2012 with its success in the Large Scale Visual Recognition (ImageNet) competition for object classification. The foundations of deep learning go back many years, but it has become popular in recent years for two main reasons: first, enough data is now available for training, and second, the hardware infrastructure needed to process this data now exists. This study gives information about deep learning and describes in detail the convolution, pooling, ReLU, and fully connected layers of the Convolutional Neural Network (CNN) architecture. It also describes the AlexNet, ZFNet, GoogLeNet, Microsoft ResNet, and Region-based Convolutional Neural Network (R-CNN) architectures, which can be considered basic architectures for deep learning.

Keywords:

Deep Learning, CNN, Convolution, Pooling, AlexNet, ZFNet, GoogLeNet, Microsoft ResNet, R-CNN
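As a rough illustration of how the CNN layers named in the abstract fit together, the sketch below chains convolution, ReLU, pooling, dropout, a fully connected layer, and a softmax classification layer in plain Python. It is not taken from the paper: the function names, the toy 6x6 input, the hand-picked 3x3 filter, and the random weights are all assumptions chosen only to make the data flow visible, and no training is performed.

```python
import math
import random

def conv2d(image, kernel):
    """'Valid' 2-D convolution (computed as cross-correlation, as most CNN libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def dropout(vec, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]

def fully_connected(vec, weights, biases):
    """One output per weight row: dot(row, vec) + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Classification layer: turn scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases, rng, training=False):
    """Raw pixels -> conv -> ReLU -> pool -> dropout -> fully connected -> softmax."""
    fmap = max_pool(relu(conv2d(image, kernel)))
    flat = [v for row in fmap for v in row]
    flat = dropout(flat, rate=0.5, rng=rng, training=training)
    return softmax(fully_connected(flat, weights, biases))

# Hypothetical toy setup: all values below are illustrative assumptions.
rng = random.Random(0)
image = [[rng.random() for _ in range(6)] for _ in range(6)]   # 6x6 "raw" input
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]                   # 3x3 edge-like filter
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 classes
biases = [0.0, 0.0, 0.0]
probs = forward(image, kernel, weights, biases, rng)
```

With these sizes, the 6x6 input becomes a 4x4 feature map after the 3x3 convolution and a 2x2 map after pooling, so the fully connected layer sees 4 values and emits one score per class. In a real network the filter and weights would be learned from data rather than fixed by hand.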

