Hybrid MPI+UPC parallel programming paradigm on an SMP cluster

The symmetric multiprocessing (SMP) cluster, which consists of shared-memory nodes with several multicore central processing units connected by a high-speed network to form a distributed-memory system, is the most widely available hardware architecture in the high-performance computing community. Today, the Message Passing Interface (MPI) is the most widely used parallel programming paradigm for SMP clusters; MPI provides programming both within an SMP node and across nodes. Unified Parallel C (UPC) is an emerging alternative that supports the partitioned global address space model and can likewise be employed both within and across the nodes of a cluster. In this paper, we describe a hybrid parallel programming paradigm that combines the MPI and UPC programming models. Its objective is to join MPI's data locality control and scalability with UPC's fine-grained parallelism and ease of programming, thereby exploiting multiple levels of parallelism on the SMP cluster, which itself has a multilevel parallel architecture. Using the proposed hybrid model, this paper presents a detailed description of a Cannon's algorithm benchmark application, together with performance results for a random-access benchmark and the Barnes-Hut N-body simulation, each compared against MPI-only and UPC-only implementations. Experiments indicate that the hybrid MPI+UPC model can deliver performance increases of up to a factor of two over the UPC-only implementation and up to 20% over the MPI-only implementation. Furthermore, an optimization improved the hybrid performance by an additional 20%.
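
To make the nested organization of such a hybrid model concrete, the following is a minimal sketch of one way an MPI+UPC program can be structured, assuming a funneled arrangement in which each UPC group acts as a single MPI rank and only UPC thread 0 of a group issues MPI calls; the constants, variable names, and the reduction used here are illustrative assumptions, not taken from the paper. Fine-grained work inside a group uses UPC shared arrays and upc_forall, while coarse-grained exchange between groups goes through MPI.

/*
 * Minimal sketch of a nested (funneled) hybrid MPI+UPC program.
 * Assumptions (not from the paper): each UPC group runs as a single
 * MPI rank, only UPC thread 0 of a group performs MPI calls, and the
 * runtime/conduit supports this organization.
 */
#include <mpi.h>
#include <upc.h>
#include <stdio.h>

#define ELEMS_PER_THREAD 1024

/* Shared array distributed cyclically across the UPC threads of this group. */
shared double block[ELEMS_PER_THREAD * THREADS];

/* MPI rank of this UPC group, published by thread 0 for the whole group. */
shared int group_rank;

int main(int argc, char **argv)
{
    int i, rank = 0, nranks = 1;

    if (MYTHREAD == 0) {                 /* funneled: only thread 0 touches MPI */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        group_rank = rank;
    }
    upc_barrier;                         /* MPI is up and group_rank is visible */

    /* Fine-grained intra-group parallelism: each UPC thread updates the
       elements of the shared array that have affinity to it. */
    upc_forall (i = 0; i < ELEMS_PER_THREAD * THREADS; i++; &block[i])
        block[i] = (double)(group_rank * ELEMS_PER_THREAD * THREADS + i);

    upc_barrier;                         /* intra-group work done before inter-group exchange */

    if (MYTHREAD == 0) {                 /* coarse-grained inter-group exchange via MPI */
        double local_sum = 0.0, global_sum = 0.0;
        for (i = 0; i < ELEMS_PER_THREAD * THREADS; i++)
            local_sum += block[i];       /* mix of local and intra-group remote reads */
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        printf("group %d of %d: global sum = %f\n", rank, nranks, global_sum);
        MPI_Finalize();
    }
    upc_barrier;
    return 0;
}

The two upc_barrier calls bracketing the MPI region separate the intra-group (UPC) phase from the inter-group (MPI) phase, mirroring the division of roles the abstract describes: MPI for locality control and scalability across nodes, UPC for fine-grained parallelism within a node.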
