Jacobi
Host.c File Reference

This contains the host functions for data allocations, message passing and host-side computations.

#include <string.h>
#include <math.h>
#include "Jacobi.h"

Functions

void Initialize (int *argc, char ***argv, int *rank, int *size)
 Initializes the MPI environment, allowing the CUDA device to be selected beforehand (if necessary).
void Finalize (real *devBlocks[2], real *devSideEdges[2], real *devHaloLines[2], real *hostSendLines[2], real *hostRecvLines[2], real *devResidue, cudaStream_t copyStream)
 Close (finalize) the MPI environment and deallocate buffers.
int ApplyTopology (int *rank, int size, const int2 *topSize, int *neighbors, int2 *topIndex, MPI_Comm *cartComm)
 Generates the 2D topology and establishes the neighbor relationships between MPI processes.
void InitializeDataChunk (int topSizeY, int topIdxY, const int2 *domSize, const int *neighbors, cudaStream_t *copyStream, real *devBlocks[2], real *devSideEdges[2], real *devHaloLines[2], real *hostSendLines[2], real *hostRecvLines[2], real **devResidue)
 This allocates and initializes all the relevant data buffers before the Jacobi run.
void PreRunJacobi (MPI_Comm cartComm, int rank, int size, double *timerStart)
 This function is called immediately before the main Jacobi loop.
void PostRunJacobi (MPI_Comm cartComm, int rank, int size, const int2 *topSize, const int2 *domSize, int iterations, int useFastSwap, double timerStart, double avgTransferTime)
 This function is called immediately after the main Jacobi loop.
double TransferAllHalos (MPI_Comm cartComm, const int2 *domSize, const int2 *topIndex, const int *neighbors, cudaStream_t copyStream, real *devBlocks[2], real *devSideEdges[2], real *devHaloLines[2], real *hostSendLines[2], real *hostRecvLines[2])
 Exchanges all necessary halos between 2 neighboring MPI processes.
void RunJacobi (MPI_Comm cartComm, int rank, int size, const int2 *domSize, const int2 *topIndex, const int *neighbors, int useFastSwap, real *devBlocks[2], real *devSideEdges[2], real *devHaloLines[2], real *hostSendLines[2], real *hostRecvLines[2], real *devResidue, cudaStream_t copyStream, int *iterations, double *avgTransferTime)
 This is the main Jacobi loop, which handles device computation and data exchange between MPI processes.

Detailed Description

This contains the host functions for data allocations, message passing and host-side computations.


Function Documentation

int ApplyTopology ( int *  rank,
int  size,
const int2 *  topSize,
int *  neighbors,
int2 *  topIndex,
MPI_Comm *  cartComm 
)

Generates the 2D topology and establishes the neighbor relationships between MPI processes.

Parameters:
[in,out] rank: The rank of the calling MPI process
[in] size: The total number of MPI processes available
[in] topSize: The desired topology size (this must match the number of available MPI processes)
[out] neighbors: The list that will be populated with the direct neighbors of the calling MPI process
[out] topIndex: The 2D index that the calling MPI process will have in the topology
[out] cartComm: The Cartesian MPI communicator
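
The relationship between rank, topIndex and neighbors can be illustrated without MPI. The following is a minimal sketch, not the code in Host.c: it assumes row-major rank numbering and a non-periodic grid, and the names SketchTopology, Int2 (a stand-in for CUDA's int2), the DIR_* slots and NO_NEIGHBOR are all hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's int2, so the sketch compiles without CUDA headers. */
typedef struct { int x, y; } Int2;

/* Hypothetical neighbor slots; the real file may order them differently. */
enum { DIR_TOP = 0, DIR_RIGHT = 1, DIR_BOTTOM = 2, DIR_LEFT = 3, NO_NEIGHBOR = -1 };

/* Maps a rank to its 2D index and fills in its 4 direct neighbors.
 * Assumes row-major rank numbering and a non-periodic grid.
 * Returns 0 on success, -1 if topSize does not match the process count. */
int SketchTopology(int rank, int size, Int2 topSize, int neighbors[4], Int2 *topIndex)
{
    /* The topology must use exactly the available processes. */
    if (topSize.x <= 0 || topSize.y <= 0 || topSize.x * topSize.y != size)
        return -1;
    assert(rank >= 0 && rank < size);

    topIndex->x = rank % topSize.x;
    topIndex->y = rank / topSize.x;

    /* Processes on the border of the grid have no neighbor on that side. */
    neighbors[DIR_LEFT]   = (topIndex->x > 0)             ? rank - 1         : NO_NEIGHBOR;
    neighbors[DIR_RIGHT]  = (topIndex->x < topSize.x - 1) ? rank + 1         : NO_NEIGHBOR;
    neighbors[DIR_TOP]    = (topIndex->y > 0)             ? rank - topSize.x : NO_NEIGHBOR;
    neighbors[DIR_BOTTOM] = (topIndex->y < topSize.y - 1) ? rank + topSize.x : NO_NEIGHBOR;
    return 0;
}
```

In an MPI program the same information would typically come from the Cartesian communicator itself, which is also free to renumber ranks; that is one plausible reason rank is documented as [in,out].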


void Finalize ( real *  devBlocks[2],
real *  devSideEdges[2],
real *  devHaloLines[2],
real *  hostSendLines[2],
real *  hostRecvLines[2],
real *  devResidue,
cudaStream_t  copyStream 
)

Close (finalize) the MPI environment and deallocate buffers.

Parameters:
[in] devBlocks: The 2 device blocks that were used during the Jacobi iterations
[in] devSideEdges: The 2 device side edges that were used to hold updated halos before sending
[in] devHaloLines: The 2 device lines that were used to hold received halos
[in] hostSendLines: The 2 host send buffers that were used at halo exchange in the normal CUDA & MPI version
[in] hostRecvLines: The 2 host receive buffers that were used at halo exchange in the normal CUDA & MPI version
[in] devResidue: The global residue, kept in device memory
[in] copyStream: The stream used to overlap top & bottom halo exchange with side halo copy to host memory

void Initialize ( int *  argc,
char ***  argv,
int *  rank,
int *  size 
)

Initializes the MPI environment, allowing the CUDA device to be selected beforehand (if necessary).

Parameters:
[in,out] argc: The number of command-line arguments
[in,out] argv: The list of command-line arguments
[out] rank: The global rank of the current MPI process
[out] size: The total number of MPI processes available
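
Selecting the device before MPI is initialized means the MPI rank is not yet available, so a common approach is to read a node-local rank that the MPI launcher exports as an environment variable. The sketch below shows only the mapping from that value to a device index; it is an illustration under that assumption, not the logic of Initialize, and SketchDeviceFromLocalRank is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Maps a launcher-provided local rank (as a string) to a device index.
 * Returns -1 when no choice can be made, so the caller can keep the default device. */
int SketchDeviceFromLocalRank(const char *localRankStr, int deviceCount)
{
    char *end = NULL;
    long localRank;
    int device;

    if (localRankStr == NULL || deviceCount <= 0)
        return -1;

    localRank = strtol(localRankStr, &end, 10);
    if (end == localRankStr || *end != '\0' || localRank < 0)
        return -1; /* not a non-negative integer */

    /* Round-robin the processes of one node over the devices of that node. */
    device = (int)(localRank % deviceCount);
    assert(device >= 0 && device < deviceCount);
    return device;
}
```

A caller might pass getenv("OMPI_COMM_WORLD_LOCAL_RANK") under Open MPI or getenv("MV2_COMM_WORLD_LOCAL_RANK") under MVAPICH2, together with the count reported by cudaGetDeviceCount, and hand a non-negative result to cudaSetDevice. Which variable, if any, this project reads is not stated in the reference above.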


void InitializeDataChunk ( int  topSizeY,
int  topIdxY,
const int2 *  domSize,
const int *  neighbors,
cudaStream_t *  copyStream,
real *  devBlocks[2],
real *  devSideEdges[2],
real *  devHaloLines[2],
real *  hostSendLines[2],
real *  hostRecvLines[2],
real **  devResidue 
)

This allocates and initializes all the relevant data buffers before the Jacobi run.

Parameters:
[in] topSizeY: The size of the topology in the Y direction
[in] topIdxY: The Y index of the calling MPI process in the topology
[in] domSize: The size of the local domain (for which only the current MPI process is responsible)
[in] neighbors: The neighbor ranks, according to the topology
[in] copyStream: The stream used to overlap top & bottom halo exchange with side halo copy to host memory
[out] devBlocks: The 2 device blocks that will be updated during the Jacobi run
[out] devSideEdges: The 2 side edges (parallel to the Y direction) that will hold the packed halo values before sending them
[out] devHaloLines: The 2 halo lines (parallel to the Y direction) that will hold the packed halo values after receiving them
[out] hostSendLines: The 2 host send buffers that will be used during the halo exchange by the normal CUDA & MPI version
[out] hostRecvLines: The 2 host receive buffers that will be used during the halo exchange by the normal CUDA & MPI version
[out] devResidue: The global device residue, which will be updated after every Jacobi iteration

void PostRunJacobi ( MPI_Comm  cartComm,
int  rank,
int  size,
const int2 *  topSize,
const int2 *  domSize,
int  iterations,
int  useFastSwap,
double  timerStart,
double  avgTransferTime 
)

This function is called immediately after the main Jacobi loop.

Parameters:
[in] cartComm: The Cartesian communicator
[in] rank: The rank of the calling MPI process
[in] size: The total number of MPI processes available
[in] topSize: The size of the topology
[in] domSize: The size of the local domain
[in] iterations: The number of successfully completed Jacobi iterations
[in] useFastSwap: The flag indicating if fast pointer swapping was used to exchange blocks
[in] timerStart: The Jacobi loop starting moment (measured as wall-time)
[in] avgTransferTime: The average time spent performing MPI transfers (per process)

void PreRunJacobi ( MPI_Comm  cartComm,
int  rank,
int  size,
double *  timerStart 
)

This function is called immediately before the main Jacobi loop.

Parameters:
[in] cartComm: The Cartesian communicator
[in] rank: The rank of the calling MPI process
[in] size: The total number of MPI processes available
[out] timerStart: The Jacobi loop starting moment (measured as wall-time)

void RunJacobi ( MPI_Comm  cartComm,
int  rank,
int  size,
const int2 *  domSize,
const int2 *  topIndex,
const int *  neighbors,
int  useFastSwap,
real *  devBlocks[2],
real *  devSideEdges[2],
real *  devHaloLines[2],
real *  hostSendLines[2],
real *  hostRecvLines[2],
real *  devResidue,
cudaStream_t  copyStream,
int *  iterations,
double *  avgTransferTime 
)

This is the main Jacobi loop, which handles device computation and data exchange between MPI processes.

Parameters:
[in] cartComm: The Cartesian MPI communicator
[in] rank: The rank of the calling MPI process
[in] size: The number of available MPI processes
[in] domSize: The 2D size of the local domain
[in] topIndex: The 2D index of the calling MPI process in the topology
[in] neighbors: The list of ranks which are direct neighbors to the caller
[in] useFastSwap: This flag indicates if blocks should be swapped through pointer copy (faster) or through element-by-element copy (slower)
[in,out] devBlocks: The 2 device blocks that are updated during the Jacobi run
[in,out] devSideEdges: The 2 side edges (parallel to the Y direction) that hold the packed halo values before sending them
[in,out] devHaloLines: The 2 halo lines (parallel to the Y direction) that hold the packed halo values after receiving them
[in,out] hostSendLines: The 2 host send buffers that are used during the halo exchange by the normal CUDA & MPI version
[in,out] hostRecvLines: The 2 host receive buffers that are used during the halo exchange by the normal CUDA & MPI version
[in,out] devResidue: The global device residue, which gets updated after every Jacobi iteration
[in] copyStream: The stream used to overlap top & bottom halo exchange with side halo copy to host memory
[out] iterations: The number of successfully completed iterations
[out] avgTransferTime: The average time spent performing MPI transfers (per process)
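
The difference between the two modes selected by useFastSwap can be shown on ordinary host arrays. This is a sketch under stated assumptions rather than the file's implementation: it assumes blocks[1] holds the values just computed and blocks[0] is the input of the next iteration, it uses host memory in place of device memory, and SketchSwapBlocks and the float definition of real are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef float real; /* assumption: the actual "real" type comes from Jacobi.h */

/* Makes blocks[0] refer to the newest values, which are assumed to be in blocks[1]. */
void SketchSwapBlocks(real *blocks[2], size_t count, int useFastSwap)
{
    size_t i;
    assert(blocks[0] != NULL && blocks[1] != NULL);

    if (useFastSwap) {
        /* Pointer copy: constant time, no data is moved. */
        real *tmp = blocks[0];
        blocks[0] = blocks[1];
        blocks[1] = tmp;
    } else {
        /* Element-by-element copy: time proportional to the block size. */
        for (i = 0; i < count; ++i)
            blocks[0][i] = blocks[1][i];
    }
}
```

Either way blocks[0] ends up holding the newest values; the pointer version avoids touching the data, which is consistent with the "faster" and "slower" labels in the parameter description.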


double TransferAllHalos ( MPI_Comm  cartComm,
const int2 *  domSize,
const int2 *  topIndex,
const int *  neighbors,
cudaStream_t  copyStream,
real *  devBlocks[2],
real *  devSideEdges[2],
real *  devHaloLines[2],
real *  hostSendLines[2],
real *  hostRecvLines[2] 
)

Exchanges all necessary halos between 2 neighboring MPI processes.

Parameters:
[in] cartComm: The Cartesian MPI communicator
[in] domSize: The 2D size of the local domain
[in] topIndex: The 2D index of the calling MPI process in the topology
[in] neighbors: The list of ranks which are direct neighbors to the caller
[in] copyStream: The stream used to overlap top & bottom halo exchange with side halo copy to host memory
[in,out] devBlocks: The 2 device blocks that are updated during the Jacobi run
[in,out] devSideEdges: The 2 side edges (parallel to the Y direction) that hold the packed halo values before sending them
[in,out] devHaloLines: The 2 halo lines (parallel to the Y direction) that hold the packed halo values after receiving them
[in,out] hostSendLines: The 2 host send buffers that are used during the halo exchange by the normal CUDA & MPI version
[in,out] hostRecvLines: The 2 host receive buffers that are used during the halo exchange by the normal CUDA & MPI version
Returns:
The time spent during the MPI transfers
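
The side edges are described as holding packed halo values. A likely reason is that a column of a block is strided in memory, so it has to be gathered into a contiguous buffer before it can be sent as one message. The sketch below shows that packing step on host memory; it assumes a row-major block with X as the contiguous direction, and SketchPackSideEdges and the float definition of real are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef float real; /* assumption: the actual "real" type comes from Jacobi.h */

/* Gathers the leftmost and rightmost columns of a row-major sizeX-by-sizeY block
 * into two contiguous buffers of sizeY elements each. */
void SketchPackSideEdges(const real *block, int sizeX, int sizeY, real *sideEdges[2])
{
    int y;
    assert(sizeX > 0 && sizeY > 0);

    for (y = 0; y < sizeY; ++y) {
        size_t rowStart = (size_t)y * (size_t)sizeX;
        sideEdges[0][y] = block[rowStart];               /* left column  */
        sideEdges[1][y] = block[rowStart + (sizeX - 1)]; /* right column */
    }
}
```

Top and bottom halos are whole rows, which are already contiguous in this layout and would not need such a step.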

