Jacobi
Functions
Device.cu File Reference

This contains the device kernels as well as the host wrappers for these kernels. More...

#include "Jacobi.h"
Include dependency graph for Device.cu:

Functions

template<>
__device__ void AtomicMax (float *const address, const float value)
 Compute the maximum of 2 single-precision floating point values using an atomic operation.
template<>
__device__ void AtomicMax (double *const address, const double value)
 Compute the maximum of 2 double-precision floating point values using an atomic operation.
__global__ void HaloToBufferKernel (real *__restrict__ const block, const real *__restrict__ const haloLeft, const real *__restrict__ const haloRight, int2 size, int hasLeftNeighbor, int hasRightNeighbor)
 The device kernel for copying (unpacking) the values from the halo buffers to the left and right side of the data block.
__global__ void BufferToHaloKernel (const real *__restrict__ const block, real *__restrict__ const leftSideEdge, real *__restrict__ const rightSideEdge, int2 size, int hasLeftNeighbor, int hasRightNeighbor)
 The device kernel for copying (packing) the values on the left and right side of the data block to separate, contiguous buffers.
__global__ void JacobiComputeKernel (const real *__restrict__ const oldBlock, real *__restrict__ const newBlock, int4 bounds, int stride, real *devResidue)
 The device kernel for one Jacobi iteration.
__global__ void CopyBlockKernel (const real *__restrict__ const srcBlock, real *__restrict__ const dstBlock, int4 bounds, int stride)
 The host wrapper for copying the updated block over the old one, after a Jacobi iteration finishes.
void CheckCudaCall (cudaError_t command, const char *commandName, const char *fileName, int line)
 The host function for checking the result of a CUDA API call.
real CallJacobiKernel (real *devBlocks[2], real *devResidue, const int4 *bounds, const int2 *size)
 The host wrapper for one Jacobi iteration.
void CopyDeviceBlock (real *devBlocks[2], const int4 *bounds, const int2 *size)
 The host wrapper for copying the updated block over the old one, after a Jacobi iteration finishes.
void CopyDevHalosToBlock (real *devBlock, const real *devHaloLineLeft, const real *devHaloLineRight, const int2 *size, const int *neighbors)
 The host wrapper for copying (unpacking) the values from the halo buffers to the left and right side of the data block.
void CopyDevSideEdgesFromBlock (const real *devBlock, real *devSideEdges[2], const int2 *size, const int *neighbors, cudaStream_t copyStream)
 The host wrapper for copying (packing) the values on the left and right side of the data block to separate, contiguous buffers.

Detailed Description

This contains the device kernels as well as the host wrappers for these kernels.


Function Documentation

template<>
__device__ void AtomicMax ( float *const  address,
const float  value 
)

Compute the maximum of 2 single-precision floating point values using an atomic operation.

Parameters:
[in]addressThe address of the reference value which might get updated with the maximum
[in]valueThe value that is compared to the reference in order to determine the maximum
template<>
__device__ void AtomicMax ( double *const  address,
const double  value 
)

Compute the maximum of 2 double-precision floating point values using an atomic operation.

Parameters:
[in]addressThe address of the reference value which might get updated with the maximum
[in]valueThe value that is compared to the reference in order to determine the maximum
__global__ void BufferToHaloKernel ( const real *__restrict__ const  block,
real *__restrict__ const  leftSideEdge,
real *__restrict__ const  rightSideEdge,
int2  size,
int  hasLeftNeighbor,
int  hasRightNeighbor 
)

The device kernel for copying (packing) the values on the left and right side of the data block to separate, contiguous buffers.

Parameters:
[in]blockThe 2D device block containing the updated values after a Jacobi run
[out]leftSideEdgeThe left edge, holding the left-most updated halos
[out]rightSideEdgeThe right edge, holding the right-most updated halos
[in]sizeThe 2D size of data block, excluding the edges which hold the halo values for the next iteration
[in]hasLeftNeighborMarks if the calling MPI process has a left neighbor
[in]hasRightNeighborMarks if the calling MPI process has a right neighbor
real CallJacobiKernel ( real *  devBlocks[2],
real *  devResidue,
const int4 *  bounds,
const int2 *  size 
)

The host wrapper for one Jacobi iteration.

Parameters:
[in,out]devBlocksThe 2 blocks involved: the first is the current one, the second is the one to be updated
[out]devResidueThe global residue that is to be updated through the Jacobi iteration
[in]boundsThe bounds of the rectangular block region that holds only computable values
[in]sizeThe 2D size of data blocks, excluding the edges which hold the halo values

Here is the call graph for this function:

void CheckCudaCall ( cudaError_t  command,
const char *  commandName,
const char *  fileName,
int  line 
)

The host function for checking the result of a CUDA API call.

Parameters:
[in]commandThe result of the previously-issued CUDA API call
[in]commandNameThe name of the issued API call
[in]fileNameThe name of the file where the API call occurred
[in]lineThe line in the file where the API call occurred
__global__ void CopyBlockKernel ( const real *__restrict__ const  srcBlock,
real *__restrict__ const  dstBlock,
int4  bounds,
int  stride 
)

The host wrapper for copying the updated block over the old one, after a Jacobi iteration finishes.

Parameters:
[in]srcBlockThe source block, from which data will be read
[out]dstBlockThe destination block, where data will be written
[in]boundsThe bounds of the rectangular updated region (holding only computable values)
[in]sizeThe stride of the data blocks, excluding the halo values
void CopyDevHalosToBlock ( real *  devBlock,
const real *  devHaloLineLeft,
const real *  devHaloLineRight,
const int2 *  size,
const int *  neighbors 
)

The host wrapper for copying (unpacking) the values from the halo buffers to the left and right side of the data block.

Parameters:
[out]devBlockThe 2D device block that will contain the halo values after unpacking
[in]devHaloLineLeftThe halo buffer for the left side of the data block
[in]devHaloLineRightThe halo buffer for the right side of the data block
[in]sizeThe 2D size of data block, excluding the edges which hold the halo values
[in]neighborsThe ranks of the neighboring MPI processes

Here is the call graph for this function:

void CopyDeviceBlock ( real *  devBlocks[2],
const int4 *  bounds,
const int2 *  size 
)

The host wrapper for copying the updated block over the old one, after a Jacobi iteration finishes.

Parameters:
[in,out]devBlocksThe 2 blocks involved: the first is the old one, the second is the updated one
[in]boundsThe bounds of the rectangular updated region (holding only computable values)
[in]sizeThe 2D size of data blocks, excluding the edges which hold the halo values

Here is the call graph for this function:

void CopyDevSideEdgesFromBlock ( const real *  devBlock,
real *  devSideEdges[2],
const int2 *  size,
const int *  neighbors,
cudaStream_t  copyStream 
)

The host wrapper for copying (packing) the values on the left and right side of the data block to separate, contiguous buffers.

Parameters:
[in]devBlockThe 2D device block containing the updated values after a Jacobi run
[out]devSideEdgesThe buffers where the edge values will be packed in
[in]sizeThe 2D size of data block, excluding the edges which hold the halo values for the next iteration
[in]neighborsThe ranks of the neighboring MPI processes
[in]copyStreamThe stream on which this kernel will be executed

Here is the call graph for this function:

__global__ void HaloToBufferKernel ( real *__restrict__ const  block,
const real *__restrict__ const  haloLeft,
const real *__restrict__ const  haloRight,
int2  size,
int  hasLeftNeighbor,
int  hasRightNeighbor 
)

The device kernel for copying (unpacking) the values from the halo buffers to the left and right side of the data block.

Parameters:
[out]blockThe 2D device block that will contain the halo values after unpacking
[in]haloLeftThe halo buffer for the left side of the data block
[in]haloRightThe halo buffer for the right side of the data block
[in]sizeThe 2D size of data block, excluding the edges which hold the halo values
[in]hasLeftNeighborMarks if the calling MPI process has a left neighbor
[in]hasRightNeighborMarks if the calling MPI process has a right neighbor
__global__ void JacobiComputeKernel ( const real *__restrict__ const  oldBlock,
real *__restrict__ const  newBlock,
int4  bounds,
int  stride,
real *  devResidue 
)

The device kernel for one Jacobi iteration.

Parameters:
[in]oldBlockThe current (old) block providing the values at the start of the iteration
[out]newBlockThe new block that will hold the updated values after the iteration finishes
[in]boundsThe bounds of the rectangular block region that holds only computable values
[in]strideThe stride of the data blocks, excluding the edges holding the halo values
[out]devResidueThe global residue that is to be updated through the iteration
 All Files Functions Defines