A Comprehensive Performance Comparison of Dedicated and Embedded GPU Systems

General-purpose computing on graphics processing units (GPGPU) is becoming increasingly important as GPUs grow more powerful and see widespread use in performance-oriented computing. GPGPUs are mainstream performance hardware in workstation and cluster environments, and their behavior in such setups has been extensively analyzed. Recently, NVIDIA, the leading hardware and software vendor in GPGPU computing, started to produce more energy-efficient embedded GPGPU systems, the Jetson series, to make GPGPU computing more applicable in domains where energy and space are limited. Although the architecture of the GPUs in Jetson systems is the same as that of traditional dedicated desktop graphics cards, the interaction between the GPU and the other components of the system, such as main memory, CPU, and hard disk, is very different from that of traditional desktop solutions. To fully understand the capabilities of the Jetson series embedded solutions, in this paper we run several applications from many different domains and compare their performance characteristics on both embedded and dedicated desktop GPUs. After analyzing the collected data, we identify certain application domains and program behaviors for which the Jetson series can deliver performance comparable to that of dedicated GPUs.

Keywords:

NVIDIA Jetson, Embedded GPGPU, CUDA
