Jacobi
The implementation details for the normal CUDA & MPI version. More...
#include "Jacobi.h"
Functions

void SetDeviceBeforeInit()
    Allows the MPI process to set the CUDA device before the MPI environment is initialized.

void SetDeviceAfterInit(int rank)
    Allows the MPI process to set the CUDA device after the MPI environment is initialized.

void ExchangeHalos(MPI_Comm cartComm, real *devSend, real *hostSend, real *hostRecv, real *devRecv, int neighbor, int elemCount)
    Exchanges halo values between two direct neighbors.
void ExchangeHalos(MPI_Comm cartComm, real *devSend, real *hostSend, real *hostRecv, real *devRecv, int neighbor, int elemCount)
Exchanges halo values between two direct neighbors. This is the main difference between the normal CUDA & MPI version and the CUDA-aware MPI version. In the former, the exchange first requires a copy from device to host memory, then an MPI call using the host buffers, and lastly a copy of the received host buffer back to device memory. In the latter, the host buffers are skipped entirely, as the MPI environment uses the device buffers directly.
Parameters:
    [in] cartComm   The Cartesian MPI communicator
    [in] devSend    The device buffer that needs to be sent
    [in] hostSend   The host buffer to which the device buffer is first copied
    [in] hostRecv   The host buffer that receives the halo values directly
    [in] devRecv    The device buffer to which the receiving host buffer is copied
    [in] neighbor   The rank of the neighboring MPI process in the Cartesian communicator
    [in] elemCount  The number of elements to transfer
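The staged device-host-device exchange described above can be sketched as follows. This is an illustrative sketch, not the sample's actual implementation; it assumes `real` maps to `double` (so `MPI_DOUBLE` is the matching datatype), uses an arbitrary tag value, and omits error checking on the CUDA and MPI calls:

```c
#include <mpi.h>
#include <cuda_runtime.h>

typedef double real;  /* assumption: the sample's "real" type maps to double */

/* Sketch of the staged halo exchange used by the normal CUDA & MPI version:
 * device-to-host copy, MPI exchange on host buffers, host-to-device copy. */
void ExchangeHalosSketch(MPI_Comm cartComm, real *devSend, real *hostSend,
                         real *hostRecv, real *devRecv, int neighbor,
                         int elemCount)
{
    const int tag = 0;  /* hypothetical tag value */
    size_t bytes = elemCount * sizeof(real);

    /* Stage 1: copy the halo from device memory to the host staging buffer. */
    cudaMemcpy(hostSend, devSend, bytes, cudaMemcpyDeviceToHost);

    /* Stage 2: exchange host buffers with the neighbor in one combined call. */
    MPI_Sendrecv(hostSend, elemCount, MPI_DOUBLE, neighbor, tag,
                 hostRecv, elemCount, MPI_DOUBLE, neighbor, tag,
                 cartComm, MPI_STATUS_IGNORE);

    /* Stage 3: copy the received halo from the host buffer back to the device. */
    cudaMemcpy(devRecv, hostRecv, bytes, cudaMemcpyHostToDevice);
}
```

A CUDA-aware MPI implementation would collapse this to a single `MPI_Sendrecv` on `devSend`/`devRecv`, with no host staging buffers at all.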
void SetDeviceAfterInit(int rank)
Allows the MPI process to set the CUDA device after the MPI environment is initialized. For the normal CUDA & MPI version, this is the only place where the MPI process actually sets the CUDA device. Since more than one MPI process may work with a given CUDA device, the device is selected using the global rank. This yields the best results (the least GPU contention) when the ranks are consecutive.

For the CUDA-aware MPI version, there is nothing to be done here.
Parameters:
    [in] rank  The global rank of the calling MPI process
void SetDeviceBeforeInit()
Allows the MPI process to set the CUDA device before the MPI environment is initialized. For the normal CUDA & MPI version, the device will be set later on, so this implementation does nothing.

For the CUDA-aware MPI version, this is the only place where the device gets set. In order to do this, we rely on the node's local rank, as the MPI environment has not been initialized yet.