Adnan ÖZSOY

A Comprehensive Performance Comparison of Dedicated and Embedded GPU Systems

General-purpose computing on graphics processing units (GPGPU) is becoming increasingly important as graphics processing units (GPUs) grow more powerful and see widespread use in performance-oriented computing. GPGPUs are mainstream performance hardware in workstation and cluster environments, and their behavior in such setups has been extensively analyzed. Recently, NVIDIA, the leading hardware and software vendor in GPGPU computing, started to produce more energy-efficient embedded GPGPU systems, the Jetson series, to make GPGPU computing more applicable in domains where energy and space are limited. Although the architecture of the GPUs in Jetson systems is the same as that of traditional dedicated desktop graphics cards, the interaction between the GPU and the other components of the system, such as main memory, the central processing unit (CPU), and the hard disk, differs considerably from traditional desktop solutions. To fully understand the capabilities of the Jetson series embedded solutions, in this paper we run several applications from many different domains and compare the performance characteristics of these applications on both embedded and dedicated desktop GPUs. After analyzing the collected data, we have identified certain application domains and program behaviors for which the Jetson series can deliver performance comparable to that of dedicated GPUs.

