3.6. Execution Control
This section describes the execution control functions of the CUDA runtime application programming interface.
Some functions have overloaded C++ API template versions documented separately in the C++ API Routines module.
Functions
- cudaError_t cudaConfigureCall ( dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, cudaStream_t stream = 0 )
- Configures a device launch.
- cudaError_t cudaFuncGetAttributes ( cudaFuncAttributes* attr, const void* func )
- Retrieves the attributes of a given function.
- cudaError_t cudaFuncSetCacheConfig ( const void* func, cudaFuncCache cacheConfig )
- Sets the preferred cache configuration for a device function.
- cudaError_t cudaFuncSetSharedMemConfig ( const void* func, cudaSharedMemConfig config )
- Sets the shared memory configuration for a device function.
- cudaError_t cudaLaunch ( const void* func )
- Launches a device function.
- cudaError_t cudaSetDoubleForDevice ( double* d )
- Converts a double argument to be executed on a device.
- cudaError_t cudaSetDoubleForHost ( double* d )
- Converts a double argument after execution on a device.
- cudaError_t cudaSetupArgument ( const void* arg, size_t size, size_t offset )
- Pushes an argument for a device launch.
Functions
- cudaError_t cudaConfigureCall ( dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, cudaStream_t stream = 0 )
- Configures a device launch.
Parameters
- gridDim
- - Grid dimensions
- blockDim
- - Block dimensions
- sharedMem
- - Shared memory
- stream
- - Stream identifier
Returns
Description
Specifies the grid and block dimensions for the device call to be executed, analogous to the execution configuration syntax. cudaConfigureCall() is stack based. Each call pushes data on top of an execution stack. This data contains the dimensions of the grid and thread blocks, together with any arguments for the call.
Note:
- This function uses standard NULL stream semantics.
- This function may also return error codes from previous, asynchronous launches.
See also:
cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API)
- cudaError_t cudaFuncGetAttributes ( cudaFuncAttributes* attr, const void* func )
- Retrieves the attributes of a given function.
Parameters
- attr
- - Return pointer to function's attributes
- func
- - Device function symbol
Description
This function obtains the attributes of a function specified via func. func is a device function symbol and must be declared as a __global__ function. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.
Note that some function attributes such as maxThreadsPerBlock may vary based on the device that is currently being used.
Note:
- This function may also return error codes from previous, asynchronous launches.
- Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
See also:
cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C++ API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API)
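As an illustration of the query described above, the following is a minimal sketch, not a definitive usage pattern. The kernel scaleKernel and the helper printScaleKernelAttributes are hypothetical names introduced only for this example; the sketch assumes a CUDA-capable device is present and prints three of the fields of cudaFuncAttributes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so that there is a __global__ symbol to query.
__global__ void scaleKernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: fetches the attributes of scaleKernel for the
// current device and prints a few of them. Returns the runtime error code.
cudaError_t printScaleKernelAttributes()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, (const void*)scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return err;
    }
    // maxThreadsPerBlock may differ from one device to another.
    printf("maxThreadsPerBlock: %d\n", attr.maxThreadsPerBlock);
    printf("numRegs:            %d\n", attr.numRegs);
    printf("sharedSizeBytes:    %zu\n", attr.sharedSizeBytes);
    return cudaSuccess;
}
```

Because maxThreadsPerBlock can vary by device, a caller would typically run such a query after selecting the device it intends to launch on.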
- cudaError_t cudaFuncSetCacheConfig ( const void* func, cudaFuncCache cacheConfig )
- Sets the preferred cache configuration for a device function.
Parameters
- func
- - Device function symbol
- cacheConfig
- - Requested cache configuration
Description
On devices where the L1 cache and shared memory use the same hardware resources, this function sets the preferred cache configuration, given by cacheConfig, for the function specified via func. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute func.
func is a device function symbol and must be declared as a __global__ function. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.
This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.
Launching a kernel with a preference that differs from the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are:
- cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
- cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
- cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
Note:
- This function may also return error codes from previous, asynchronous launches.
- Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
See also:
cudaConfigureCall, cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig
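A minimal sketch of requesting a cache preference for one kernel follows. The kernel tileKernel and the helper preferSharedForTileKernel are hypothetical names used only for illustration; the choice of cudaFuncCachePreferShared assumes a kernel that benefits from more shared memory, which has to be verified by measurement in a real application.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that stages each element through dynamic shared memory.
__global__ void tileKernel(float* data, int n)
{
    extern __shared__ float tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = data[i];          // each thread touches only its own slot
        data[i] = tile[threadIdx.x] * 2.0f;
    }
}

// Hypothetical helper: asks the runtime to favor shared memory over L1
// for tileKernel. This is only a preference; the runtime may still pick
// a different configuration, and the call has no effect on devices with
// fixed L1 and shared memory sizes.
cudaError_t preferSharedForTileKernel()
{
    return cudaFuncSetCacheConfig((const void*)tileKernel,
                                  cudaFuncCachePreferShared);
}
```

Setting the preference once, before the first launch of the kernel, avoids alternating preferences between launches, which the description above notes may insert a device-side synchronization point.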
- cudaError_t cudaFuncSetSharedMemConfig ( const void* func, cudaSharedMemConfig config )
- Sets the shared memory configuration for a device function.
Parameters
- func
- - Device function symbol
- config
- - Requested shared memory configuration
Returns
cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue
Description
On devices with configurable shared memory banks, this function forces all subsequent launches of the specified device function to use the given shared memory bank size configuration. On any given launch of the function, the shared memory configuration of the device is temporarily changed, if needed, to suit the function's preferred configuration. Changes in shared memory configuration between subsequent launches of functions may introduce a device-side synchronization point.
Any per-function shared memory bank size set via cudaFuncSetSharedMemConfig overrides the device-wide setting made by cudaDeviceSetSharedMemConfig.
Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.
This function will do nothing on devices with fixed shared memory bank size.
The supported bank configurations are:
- cudaSharedMemBankSizeDefault: use the device's shared memory configuration when launching this function.
- cudaSharedMemBankSizeFourByte: set shared memory bank width to be four bytes natively when launching this function.
- cudaSharedMemBankSizeEightByte: set shared memory bank width to be eight bytes natively when launching this function.
Note:
- This function may also return error codes from previous, asynchronous launches.
- Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
See also:
cudaConfigureCall, cudaDeviceSetSharedMemConfig, cudaDeviceGetSharedMemConfig, cudaDeviceSetCacheConfig, cudaDeviceGetCacheConfig, cudaFuncSetCacheConfig
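The per-function bank size setting can be sketched as follows. The kernel sumPairsKernel and the helper useEightByteBanks are hypothetical; the sketch assumes a toolkit version matching this reference and a kernel whose shared memory traffic consists of 8-byte double values, which is the kind of access pattern where an eight-byte bank width might help.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds its own value to that of the next
// thread in the block, reading both from dynamic shared memory.
__global__ void sumPairsKernel(const double* in, double* out, int n)
{
    extern __shared__ double buf[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0;
    __syncthreads();                           // all slots written before any is read
    if (i < n) {
        int neighbor = (threadIdx.x + 1) % blockDim.x;
        out[i] = buf[threadIdx.x] + buf[neighbor];
    }
}

// Hypothetical helper: requests eight-byte shared memory banks for every
// subsequent launch of sumPairsKernel. On devices with a fixed bank size
// the call does nothing.
cudaError_t useEightByteBanks()
{
    return cudaFuncSetSharedMemConfig((const void*)sumPairsKernel,
                                      cudaSharedMemBankSizeEightByte);
}
```

The setting applies only to this kernel; other kernels continue to use the device-wide configuration unless they have their own per-function setting.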
- cudaError_t cudaLaunch ( const void* func )
- Launches a device function.
Parameters
- func
- - Device function symbol
Returns
cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorSharedObjectInitFailed
Description
Launches the function func on the device. func must be a device function symbol declared as a __global__ function. cudaLaunch() must be preceded by a call to cudaConfigureCall(), since it pops the data that cudaConfigureCall() pushed onto the execution stack.
Note:
- This function may also return error codes from previous, asynchronous launches.
- Use of a string naming a function as the func parameter was removed in CUDA 5.0.
See also:
cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C++ API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig
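The push/pop sequence described above (configure, push arguments, launch) can be sketched as below. The kernel fillKernel and the helper launchFill are hypothetical names; the sketch assumes a toolkit version that still provides cudaConfigureCall, cudaSetupArgument and cudaLaunch with the signatures listed in this section, which newer toolkits may not.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread writes `value` to its own element.
__global__ void fillKernel(int* out, int value)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = value;
}

// Hypothetical helper: launches fillKernel on `blocks` blocks of `threads`
// threads using the stack-based API instead of the <<<...>>> syntax.
cudaError_t launchFill(int* d_out, int value, int blocks, int threads)
{
    // 1. Push the grid and block dimensions (no dynamic shared memory,
    //    NULL stream) onto the execution stack.
    cudaError_t err = cudaConfigureCall(dim3(blocks), dim3(threads), 0, 0);
    if (err != cudaSuccess) return err;

    // 2. Push the kernel arguments in declaration order. The int lands
    //    directly after the pointer, which already satisfies its alignment.
    size_t offset = 0;
    err = cudaSetupArgument(&d_out, sizeof(d_out), offset);
    if (err != cudaSuccess) return err;
    offset += sizeof(d_out);

    err = cudaSetupArgument(&value, sizeof(value), offset);
    if (err != cudaSuccess) return err;

    // 3. Pop the configuration and arguments, then launch.
    return cudaLaunch((const void*)fillKernel);
}
```

A caller would allocate d_out with at least blocks * threads elements before calling launchFill, and synchronize or copy the result back before reading it on the host.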
- cudaError_t cudaSetDoubleForDevice ( double* d )
- Converts a double argument to be executed on a device.
Parameters
- d
- - Double to convert
Returns
Description
Converts the double value of d to an internal float representation if the device does not support double arithmetic. If the device does natively support doubles, then this function does nothing.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForHost, cudaSetupArgument (C API)
- cudaError_t cudaSetDoubleForHost ( double* d )
- Converts a double argument after execution on a device.
Parameters
- d
- - Double to convert
Returns
Description
Converts the double value of d back from an internal float representation, if one was used because the device does not support double arithmetic. If the device does natively support doubles, then this function does nothing.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetupArgument (C API)
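The two conversion functions are meant to bracket the use of a double on the device. The following is a minimal sketch of that pairing; roundTripDouble is a hypothetical helper, the kernel launch in the middle is left out, and the sketch assumes a toolkit version that still provides both functions.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: converts *value for device use, then converts it
// back for host use. On devices that natively support doubles both calls
// do nothing, so *value is left unchanged.
cudaError_t roundTripDouble(double* value)
{
    // Before the value is passed to a kernel.
    cudaError_t err = cudaSetDoubleForDevice(value);
    if (err != cudaSuccess) return err;

    // (A kernel launch that consumes *value would go here.)

    // After the value comes back from the device.
    return cudaSetDoubleForHost(value);
}
```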
- cudaError_t cudaSetupArgument ( const void* arg, size_t size, size_t offset )
- Pushes an argument for a device launch.
Parameters
- arg
- - Argument to push for a kernel launch
- size
- - Size of argument
- offset
- - Offset in argument stack to push new arg
Returns
Description
Pushes size bytes of the argument pointed to by arg at offset bytes from the start of the parameter passing area, which starts at offset 0. The arguments are stored at the top of the execution stack. cudaSetupArgument() must be preceded by a call to cudaConfigureCall().
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C++ API)
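When several arguments of different types are pushed, each offset has to be computed by the caller. The sketch below shows one way to do that bookkeeping. The helpers alignUp and pushArgument are hypothetical, and the rule that each argument is placed at an offset aligned to its own type is an assumption of this sketch, not something stated in this section. It also assumes a C++11-capable compiler for alignof.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: rounds `offset` up to the next multiple of
// `alignment`, which must be a power of two.
inline size_t alignUp(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: pushes one argument at the next suitably aligned
// offset (assumed requirement) and advances `offset` past it.
// Must be called after cudaConfigureCall().
template <typename T>
cudaError_t pushArgument(const T& arg, size_t& offset)
{
    offset = alignUp(offset, alignof(T));
    cudaError_t err = cudaSetupArgument(&arg, sizeof(T), offset);
    offset += sizeof(T);
    return err;
}
```

With these helpers, a call sequence would start with offset set to 0 and call pushArgument once per kernel parameter, in declaration order, before calling cudaLaunch().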