Incremental Banerjee test conditions committing for robust parallelization framework

This paper describes the design of an automatic parallelization framework. The kernel supplied at its front end is proposed as an instrument for assessing parallelization potential; we used it to measure the maximum achievable speedups on most programs of the CHStone benchmark suite. Within this framework, we propose to release parallelism incrementally. We introduce a transformation method, driven by a data-dependence heuristic, that dissociates true dependences and yields an internal representation (IR2) in which the Banerjee test conditions are met: two of the three conditions are committed by the transformation itself, and on shared-memory many-/multicore platforms the third can be satisfied by privatization. Among the thread-mapping scenarios that the IR2 structure makes available, we can then choose a safe and opportune pairwise (mapping, privatization) scheme. As a proof of validity, we instrumented a subset of the CHStone benchmarks; the results confirm that our framework kernel is robust.
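To make the role of the Banerjee conditions concrete: for two affine subscripts, the test reduces to checking whether the constant of the dependence equation lies between the extreme values of its left-hand side over the loop bounds. The following is a minimal sketch in C, assuming a single normalized loop and one-dimensional affine subscripts; the function and variable names are illustrative, not taken from the paper.

```c
#include <stdio.h>

/* Minimum and maximum of coef * i for i in [lo, hi]. */
static long term_min(long coef, long lo, long hi) {
    return coef >= 0 ? coef * lo : coef * hi;
}
static long term_max(long coef, long lo, long hi) {
    return coef >= 0 ? coef * hi : coef * lo;
}

/* Banerjee inequality for accesses A[a1*i + a0] and A[b1*j + b0]
 * with lo <= i, j <= hi. The dependence equation is
 * a1*i - b1*j = b0 - a0; a dependence is possible only if the
 * constant lies inside [lb, ub], the range of the left-hand side.
 * Returns 1 = dependence possible, 0 = independence proven. */
int banerjee_may_depend(long a1, long a0, long b1, long b0,
                        long lo, long hi) {
    long c  = b0 - a0;
    long lb = term_min(a1, lo, hi) + term_min(-b1, lo, hi);
    long ub = term_max(a1, lo, hi) + term_max(-b1, lo, hi);
    return lb <= c && c <= ub;
}

int main(void) {
    /* for (i = 0; i < 100; i++) A[i] = A[i + 200];
     * subscripts i and i+200 never overlap on [0, 99]. */
    printf("%d\n", banerjee_may_depend(1, 0, 1, 200, 0, 99)); /* prints 0 */
    return 0;
}
```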
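On the privatization side, in a shared-memory OpenMP setting the remaining condition is typically discharged by giving each thread its own copy of the conflicting storage. A small illustrative sketch follows; the arrays a, b, c and the scalar tmp are hypothetical, not from the paper.

```c
/* Compile with: gcc -fopenmp priv.c. Without OpenMP the pragma is
 * ignored and the loop runs sequentially with the same result. */
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], b[N], c[N];
    double tmp;
    int i;

    for (i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }

    /* tmp is written before it is read in every iteration, so a
     * private copy per thread removes the anti- and output
     * dependences that would otherwise serialize the loop. */
    #pragma omp parallel for private(tmp)
    for (i = 0; i < N; i++) {
        tmp  = b[i] * b[i];   /* defined in this iteration ...      */
        a[i] = tmp + c[i];    /* ... and read only in this iteration */
    }

    printf("%f\n", a[N - 1]);
    return 0;
}
```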

References

  • Li J, Sun J, Song Y, Zhao J. Accelerating MRI reconstruction via three-dimensional dual-dictionary learning using CUDA. J Supercomput 2015; 71: 2381-2396.
  • Glowacz A, Pietron M. Implementation of digital watermarking algorithms in parallel hardware accelerators. Int J Parallel Prog 2017; 45: 1108-1127.
  • Hidalgo-Paniagua A, Vega-Rodriguez MA, Pavon N, Ferruz J. A comparative study of parallel RANSAC implementations in 3D space. Int J Parallel Prog 2015; 43: 703-720.
  • Okuyan E, Güdükbay U. Direct volume rendering of unstructured tetrahedral meshes using CUDA and OpenMP. J Supercomput 2014; 67: 324-344.
  • Dagum L, Menon R. OpenMP: An industry standard API for shared-memory programming. IEEE Comput Sci Eng 1998; 5: 46-55.
  • Ayguade E, Copty N, Duran A, Hoeflinger J, Lin Y, Massaioli F, Teruel X, Unnikrishnan P, Zhang G. The design of OpenMP tasks. IEEE Trans Parallel Distrib Syst 2009; 20: 404-418.
  • Wang CK, Chen PS. Automatic scoping of task clauses for the OpenMP tasking model. J Supercomput 2015; 71: 808-823.
  • Gonçalves R, Amaris M, Okada T, Bruel P, Goldman A. OpenMP is not as easy as it appears. In: IEEE 2016 49th Hawaii International Conference on System Sciences; 5–8 Jan 2016; Koloa, HI, USA. New York, NY, USA: IEEE. pp. 5742-5751.
  • Blume W, Doallo R, Eigenmann R, Grout J, Hoeflinger J, Lawrence T, Lee J, Padua D, Paek Y, Pottenger B et al. Parallel programming with Polaris. Computer 1996; 29: 78-82.
  • Bae H, Mustafa D, Lee JW, Aurangzeb, Lin H, Dave C, Eigenmann R, Midkiff SP. The Cetus source-to-source compiler infrastructure: overview and evaluation. Int J Parallel Prog 2013; 41: 753-767.
  • Campanoni S, Jones TM, Holloway G, Wei GY, Brooks D. Helix: making the extraction of thread-level parallelism mainstream. IEEE Micro 2012; 32: 8-18.
  • Liao C, Quinlan D, Panas T, de Supinski BR. A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries. In: International Workshop on OpenMP (IWOMP); 14–16 June 2010; Tsukuba, Japan. Heidelberg, Berlin: Springer. pp. 15-28.
  • Zhang X, Navabi A, Jagannathan S. Alchemist: a transparent dependence distance profiling infrastructure. In: IEEE/ACM 2009 7th Annual International Symposium on Code Generation and Optimization; 22–25 March 2009; Seattle, WA, USA. New York, NY, USA: IEEE. pp. 47-58.
  • Chen T, Lin J, Dai X, Hsu WC, Yew PC. Data dependence profiling for speculative optimizations. In: International Conference on Compiler Construction; 29 March–2 April 2004; Barcelona, Spain. Heidelberg, Berlin: Springer. pp. 57-72.
  • Kim M, Kim H, Luk CK. SD3: A scalable approach to dynamic data-dependence profiling. In: IEEE/ACM 2010 43rd Annual International Symposium on Microarchitecture; 4–8 December 2010; Atlanta, GA, USA. New York, NY, USA: IEEE. pp. 535-546.
  • Li Z, Jannesari A, Wolf F. An efficient data-dependence profiler for sequential and parallel programs. In: IEEE 2015 International Parallel and Distributed Processing Symposium; 25–29 May 2015; Hyderabad, India. New York, NY, USA: IEEE. pp. 484-493.
  • Sato Y, Inoguchi Y, Nakamura T. Whole program data dependence profiling to unveil parallel regions in the dynamic execution. In: IEEE 2012 International Symposium on Workload Characterization; 4–6 November 2012; La Jolla, CA, USA. New York, NY, USA: IEEE. pp. 69-80.
  • Tian C, Feng M, Nagarajan V, Gupta R. Speculative parallelization of sequential loops on multicores. Int J Parallel Prog 2009; 37: 508-535.
  • Campanoni S, Jones TM, Holloway G, Janapa Reddi V, Wei GY, Brooks D. Helix: automatic parallelization of irregular programs for chip multiprocessing. In: ACM 2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization; 31 March–4 April 2012; San Jose, CA, USA. New York, NY, USA: ACM. pp. 84-93.
  • Johnson NP, Kim H, Prabhu P, Zaks A, August DI. Speculative separation for privatization and reductions. In: ACM 2012 Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation; 11–16 June 2012; Beijing, China. New York, NY, USA: ACM. pp. 359-370.
  • Tu P, Padua D. Automatic array privatization. In: International Workshop on Languages and Compilers for Parallel Computing; 12–14 August 1993; Oregon, USA. Heidelberg, Berlin: Springer. pp. 500-521.
  • Li Z. Array privatization for parallel execution of loops. In: ACM 1992 Proceedings of the 6th International Conference on Supercomputing; 19–24 July 1992; Washington, DC, USA. New York, NY, USA: ACM. pp. 313-322.
  • Li M, Zhao Y, Tao Y. Dynamically spawning speculative threads to improve speculative path execution. In: International Conference on Algorithms and Architecture for Parallel Processing; 24–27 August 2014; Dalian, China. Heidelberg, Berlin: Springer. pp. 192-206.
  • Amini M, Creusillet B, Even S, Keryell R, Goubier O, Guelton S, Mcmahon JO, Pasquier FX, Péan G, Villalon P. Par4All: from convex array regions to heterogeneous computing. In: IMPACT 2012 2nd International Workshop on Polyhedral Compilation Techniques; Jan 2012; Paris, France.
  • Blume W, Eigenmann R, Faigin K, Grout J, Hoeflinger J, Padua D, Petersen P, Pottenger W, Rauchwerger L, Tu P et al. Polaris: Improving the effectiveness of parallelizing compilers. In: International Workshop on Languages and Compilers for Parallel Computing; 8–10 August 1994; Ithaca, NY, USA. Heidelberg, Berlin: Springer. pp. 141-154.
  • Dave C, Bae H, Min SJ, Lee S, Eigenmann R, Midkiff S. Cetus: a source-to-source compiler infrastructure for multicores. Computer 2009; 42: 36-42.
  • Psarris K, Klappholz D, Kong X. On the accuracy of the Banerjee test. J Parallel Distrib Comput 1991; 12: 152-157.
  • Hara Y, Tomiyama H, Honda S, Takada H, Ishii H. CHStone: A benchmark program suite for practical C-based high-level synthesis. In: IEEE 2008 International Symposium on Circuits and Systems; 18–21 May 2008; Seattle, WA, USA. New York, NY, USA: IEEE. pp. 1192-1195.