Debugger API :: CUDA Toolkit Documentation

1. Release Notes

6.0 Release

New optimized routines: The following API calls were added to read bulk information about the device and speed up debugging. ReadWarpState() reads the whole state of a warp in a single API call. ReadRegisterRange() reads the values of a range of N registers in a single API call. ResumeWarpsUntilPC() resumes a set of warps until a given PC instead of single-stepping several times.
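The benefit of a range read over per-register reads can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in (MockRegDevice, mock_read_register, mock_read_register_range and the call counter are invented for illustration); it does not use the real debugger API or its signatures, and only models the number of round trips a client makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; NOT the real CUDA debugger API. */
enum { MOCK_NUM_REGS = 16 };

typedef struct {
    uint32_t regs[MOCK_NUM_REGS]; /* simulated register file */
    unsigned api_calls;           /* round trips made to the "device" */
} MockRegDevice;

/* Reads one register per call, as a client would without a bulk routine. */
static int mock_read_register(MockRegDevice *dev, size_t index, uint32_t *out)
{
    if (index >= MOCK_NUM_REGS) return -1;
    dev->api_calls++;
    *out = dev->regs[index];
    return 0;
}

/* Reads `count` registers in one call, in the spirit of ReadRegisterRange(). */
static int mock_read_register_range(MockRegDevice *dev, size_t first,
                                    size_t count, uint32_t *out)
{
    if (first > MOCK_NUM_REGS || count > MOCK_NUM_REGS - first) return -1;
    dev->api_calls++;
    for (size_t i = 0; i < count; i++) out[i] = dev->regs[first + i];
    return 0;
}

/* Returns how many API calls were needed to read registers 0..count-1 singly. */
static unsigned calls_one_by_one(MockRegDevice *dev, size_t count, uint32_t *out)
{
    unsigned before = dev->api_calls;
    for (size_t i = 0; i < count; i++) {
        int rc = mock_read_register(dev, i, &out[i]);
        assert(rc == 0);
        (void)rc;
    }
    return dev->api_calls - before;
}

/* Returns how many API calls were needed to read the same registers in bulk. */
static unsigned calls_bulk(MockRegDevice *dev, size_t count, uint32_t *out)
{
    unsigned before = dev->api_calls;
    int rc = mock_read_register_range(dev, 0, count, out);
    assert(rc == 0);
    (void)rc;
    return dev->api_calls - before;
}
```

In this model, reading N registers costs N calls one at a time and a single call in bulk, which is the saving the new routines are meant to provide.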
Adjusted Code Address: On some architectures, some PCs may be invalid and should not be referenced. To help debugger clients, the API now provides the routine getAdjustedCodeAddress(). Given a code address, the function returns the corresponding valid PC.
Precise Error Reporting: Exceptions are not always precise. The device may stop at a PC other than the address of the instruction that triggered an exception. On some device architectures, it is sometimes possible to recover the address of that instruction. That address can now be retrieved using the newly introduced readErrorPC() API routine.
ELF Image Notification Events: When the ELF image is unloaded from the device, the debugger client is now notified with a new CUDBG_EVENT_ELF_IMAGE_UNLOADED event type. The ELF image load/unload events also include a new properties field that is currently only used to indicate whether an ELF image corresponds to a set of system kernels or not, which may need to be hidden from the user. Also, the ELF image events now include a handle to the actual copy of the ELF image instead of including the ELF image itself. To retrieve the ELF image, use the getElfImageByHandle() routine.
Unified Memory Support: Existing routines were modified to support Unified Memory and should be used instead of the old ones: read/writeGenericMemory() and read/writeGlobalMemory(). GetManagedMemoryRegionInfo was added to identify the address segments that are considered managed memory and that therefore require special attention when accessed.
Cleanup on Detach: The detach procedure was simplified and is now symmetrical with the attach procedure. The debugger client must now check if the application needs to be resumed to complete the detach process, just as it is done for the attach process.
State examination on a running device: The state collection functions in the debug API will return CUDBG_ERROR_RUNNING_DEVICE if called without first calling suspendDevice to ensure the device is stopped.
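A client can handle the running-device case by suspending and retrying. The following is a minimal sketch under stated assumptions: MockRunDevice, mock_suspend_device, mock_read_pc and the MOCK_ result codes are hypothetical stand-ins that imitate the behavior described above, not the real API entry points or their signatures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes modeled on the behavior described in the text. */
typedef enum {
    MOCK_SUCCESS = 0,
    MOCK_ERROR_RUNNING_DEVICE = 1
} MockResult;

/* Hypothetical stand-in for a device; NOT the real CUDA debugger API. */
typedef struct {
    bool running;           /* true while the device has not been suspended */
    uint64_t pc;            /* simulated program counter */
    unsigned suspend_calls; /* how many times the client suspended the device */
} MockRunDevice;

/* Stops the simulated device so that its state can be examined. */
static MockResult mock_suspend_device(MockRunDevice *dev)
{
    dev->running = false;
    dev->suspend_calls++;
    return MOCK_SUCCESS;
}

/* State collection call: refuses to run while the device is running. */
static MockResult mock_read_pc(const MockRunDevice *dev, uint64_t *pc)
{
    if (dev->running) return MOCK_ERROR_RUNNING_DEVICE;
    *pc = dev->pc;
    return MOCK_SUCCESS;
}

/* Reads the PC, suspending the device first if the read reports it running. */
static MockResult read_pc_stopped(MockRunDevice *dev, uint64_t *pc)
{
    MockResult rc = mock_read_pc(dev, pc);
    if (rc == MOCK_ERROR_RUNNING_DEVICE) {
        rc = mock_suspend_device(dev);
        if (rc != MOCK_SUCCESS) return rc;
        rc = mock_read_pc(dev, pc); /* retry now that the device is stopped */
    }
    return rc;
}
```

Suspending only on demand keeps the common path (device already stopped) to a single call; a client that knows the device is running could equally suspend up front.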
Using singleStepWarp() in an application using CUDA Dynamic Parallelism: Due to changes in the way CUDA Dynamic Parallelism operates, the debug API's singleStepWarp() entry point can now return CUDBG_ERROR_WARP_RESUME_NOT_POSSIBLE. To correctly handle such cases, the debugger client must set a breakpoint at the return address of the current frame and must resume all devices and resume all host threads. When singleStepWarp() returns CUDBG_ERROR_WARP_RESUME_NOT_POSSIBLE, there is no guarantee that hardware state has not been modified. In particular, when running with software preemption, there is no guarantee that any GPU state is valid across the singleStepWarp() call. As a result, debugger clients must invalidate and reanalyze all GPU state after the call if singleStepWarp() returns an error.
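The recovery procedure described above can be sketched as client-side control flow. This is an illustration only: MockSession, mock_single_step_warp and the STEP_ result codes are hypothetical names invented here, the breakpoint and resume actions are modeled as flags, and only the error path is modeled (a real client would also refresh the stepped warp after a successful step).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes; the real API uses CUDBGResult values. */
typedef enum {
    STEP_OK = 0,
    STEP_WARP_RESUME_NOT_POSSIBLE = 1,
    STEP_OTHER_ERROR = 2
} MockStepResult;

/* Hypothetical client session; NOT the real CUDA debugger API. */
typedef struct {
    MockStepResult next_step_result; /* what the mocked step call returns */
    uint64_t return_address;         /* return address of the current frame */
    uint64_t breakpoint_at;          /* 0 means no breakpoint was set */
    bool devices_resumed;
    bool host_threads_resumed;
    bool gpu_state_cache_valid;      /* client-side cache of GPU state */
} MockSession;

/* Mocked single-step call: returns whatever the test configured. */
static MockStepResult mock_single_step_warp(MockSession *s)
{
    return s->next_step_result;
}

/* Single-steps a warp and applies the fallback described in the text. */
static MockStepResult step_with_fallback(MockSession *s)
{
    MockStepResult rc = mock_single_step_warp(s);
    if (rc == STEP_OK) return rc;

    /* Any error: hardware state may have changed, so drop cached GPU state. */
    s->gpu_state_cache_valid = false;

    if (rc == STEP_WARP_RESUME_NOT_POSSIBLE) {
        /* Break at the caller's return address, then let everything run. */
        s->breakpoint_at = s->return_address;
        s->devices_resumed = true;
        s->host_threads_resumed = true;
    }
    return rc;
}
```

The cache is invalidated before the breakpoint is set so that no later step in the fallback can act on stale GPU state.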
Miscellaneous: New error values were added to support the newly added API routines. The getNextSync/AsyncEvent() routines were merged into a single getNextEvent() routine that takes an extra parameter instead.
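The merged event routine can be pictured as one function that selects the queue through a parameter. The sketch below assumes that the extra parameter identifies the sync or async queue; MockQueueType, MockEventSource, mock_get_next_event and drain_queue are hypothetical names, events are reduced to plain integers, and the real routine's signature may differ.

```c
#include <stddef.h>

/* Hypothetical queue selector; assumed meaning of the extra parameter. */
typedef enum { MOCK_QUEUE_SYNC = 0, MOCK_QUEUE_ASYNC = 1 } MockQueueType;

enum { MOCK_QUEUE_CAP = 4 };

/* Small ring buffer of pending events, reduced to integers for the sketch. */
typedef struct {
    int events[MOCK_QUEUE_CAP];
    size_t head;  /* index of the oldest pending event */
    size_t count; /* number of pending events */
} MockQueue;

/* Hypothetical event source holding one queue per type. */
typedef struct {
    MockQueue queues[2];
} MockEventSource;

/* Single entry point for both queues: returns 1 and stores the oldest
   event of the selected queue, or returns 0 if that queue is empty. */
static int mock_get_next_event(MockEventSource *src, MockQueueType type,
                               int *event)
{
    MockQueue *q = &src->queues[type];
    if (q->count == 0) return 0;
    *event = q->events[q->head];
    q->head = (q->head + 1) % MOCK_QUEUE_CAP;
    q->count--;
    return 1;
}

/* Drains up to `max` events from one queue; returns how many were read. */
static size_t drain_queue(MockEventSource *src, MockQueueType type,
                          int *out, size_t max)
{
    size_t n = 0;
    while (n < max && mock_get_next_event(src, type, &out[n])) n++;
    return n;
}
```

With a single entry point, the same drain loop serves both queues, so a client no longer needs two near-identical event loops.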