Duff's Device Based Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) is an operation used frequently in many engineering problems and scientific computations. Accelerating SpMV therefore benefits a wide range of applications. In this paper, we examine the effect of the loop-unrolling technique known as Duff's device on the performance of SpMV. Our proposed Duff's-device-based SpMV implementation can be used after a low-cost preprocessing of the CSR format, the most widely used sparse matrix storage format. We carried out an experimental evaluation on a dataset of matrices drawn from real-world problems and observed that substantial speedups can be achieved.
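The paper's own implementation and its CSR preprocessing step are not reproduced in this abstract; purely as an illustration, the following is a minimal C sketch of how Duff's device can unroll the inner loop of a plain CSR SpMV kernel (the function and variable names are ours, not the authors'). Duff's device interleaves a `switch` with a `do`-`while` loop so that the remainder iterations of an unrolled loop are handled by jumping into the middle of the loop body instead of by a separate cleanup loop.

```c
#include <assert.h>

/* CSR SpMV: y = A * x, with the per-row inner loop unrolled by a
 * factor of 8 using Duff's device. The switch jumps into the loop
 * body to consume the (len % 8) leftover elements on the first pass;
 * every subsequent pass performs all 8 unrolled accumulations. */
static void spmv_duff(int n, const int *rowptr, const int *colidx,
                      const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        int k = rowptr[i];
        int len = rowptr[i + 1] - k;    /* nonzeros in row i */
        if (len > 0) {
            int passes = (len + 7) / 8; /* ceil(len / 8) switch passes */
            switch (len % 8) {
            case 0: do { sum += val[k] * x[colidx[k]]; k++;
            case 7:      sum += val[k] * x[colidx[k]]; k++;
            case 6:      sum += val[k] * x[colidx[k]]; k++;
            case 5:      sum += val[k] * x[colidx[k]]; k++;
            case 4:      sum += val[k] * x[colidx[k]]; k++;
            case 3:      sum += val[k] * x[colidx[k]]; k++;
            case 2:      sum += val[k] * x[colidx[k]]; k++;
            case 1:      sum += val[k] * x[colidx[k]]; k++;
                    } while (--passes > 0);
            }
        }
        y[i] = sum;
    }
}
```

The appeal of this construct for CSR is that row lengths vary, so a fixed-factor unrolled loop normally needs a remainder loop per row; Duff's device folds that remainder handling into the first pass of the unrolled body at no extra branching cost inside the loop.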

Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi
  • ISSN: 1302-9304
  • Publication frequency: 3 issues per year
  • First published: 1999
  • Publisher: Dokuz Eylül Üniversitesi Mühendislik Fakültesi