Application of Deep Learning Algorithms to GPU and CPU Hardware Architectures and Performance Analysis: Experimental Research

With today's rapidly developing technology, the variety and volume of data are increasing. This growth has given rise to different designs in computer architecture. The number of cores in CPU and GPU architectures can provide solutions for reaching results at run time. When developing software, attention should be paid to processing performance and power consumption. CPUs execute applications with longer processing times than GPUs, and this processing time is directly proportional to the power consumed during execution. GPUs deliver faster and more successful results than CPUs for deep learning algorithms. The size and diversity of the dataset, the most important criteria in the learning phase, increase learning success proportionally. In this study, applications were run on processors with different architectures, considering the criteria of dataset size and processing time, and the power consumed by the GPU architectures was measured. The CNN, RNN, and LSTM deep learning algorithms were applied to three datasets of different sizes. Six experiments were performed to compare performance and energy consumption. Based on the results obtained, time and energy criteria were taken as the basis when working with these algorithms. The findings indicate that deep learning algorithms can be used as an auxiliary tool for making high-accuracy predictions on GPU systems. The results of the study not only contain important information on CPU and GPU systems in terms of energy and time, but are also valuable for future applications in various fields.
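The relationship underlying the experiments — that a longer processing time translates proportionally into more energy consumed — can be sketched in a few lines of Python. This is an illustrative sketch only: the `measure_energy` helper, the toy workload, and the constant 150 W power figure are hypothetical, whereas in the study the actual power draw of the GPU was sampled during execution (e.g. via vendor tools such as nvidia-smi).

```python
import time

def measure_energy(workload, avg_power_watts):
    """Time a workload and estimate energy as power x time (E = P * t).

    avg_power_watts is assumed constant here for illustration; in a real
    measurement it would be sampled from the hardware while running.
    """
    start = time.perf_counter()
    workload()
    elapsed_s = time.perf_counter() - start
    energy_joules = avg_power_watts * elapsed_s
    return elapsed_s, energy_joules

# Toy CPU-bound loop standing in for a training run (hypothetical numbers).
def toy_workload():
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

elapsed, energy = measure_energy(toy_workload, avg_power_watts=150.0)
print(f"time: {elapsed:.3f} s, estimated energy: {energy:.1f} J")
```

Under this model, a device that finishes the same workload in half the time at the same power draw consumes half the energy, which is why the study compares architectures on both criteria together rather than on speed alone.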
