2. Stream synchronization behavior
NULL stream
The NULL stream, or stream 0, is an implicit stream which synchronizes with all other streams in the same CUcontext except for non-blocking streams, described below. (For applications using the runtime APIs only, there will be one context per device.) When an action such as a kernel launch or cudaStreamWaitEvent() is taken in the NULL stream, the NULL stream first waits on all blocking streams, the action is then queued in the NULL stream, and finally all blocking streams wait on the NULL stream.
For example, the following code launches a kernel k_1 in stream s, then k_2 in the NULL stream, then k_3 in stream s:
k_1<<<1, 1, 0, s>>>();
k_2<<<1, 1>>>();
k_3<<<1, 1, 0, s>>>();
The resulting behavior is that k_2 will block on k_1 and k_3 will block on k_2.
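The ordering above can be observed with a small self-checking program. The following is a minimal sketch, not a definitive test: the kernel record, the helper run_ordering_demo, and the CHECK macro are hypothetical names introduced for illustration, and the sketch assumes the file is compiled with the default (legacy) default-stream behavior, so that a launch with no stream argument targets the NULL stream described here. Each kernel appends its id to a device-side log; under these assumptions the log is expected to read 1, 2, 3.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: report a failing runtime call and return 1.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Appends `id` to the next free slot of `log`.
__global__ void record(int *log, int *cursor, int id)
{
    int slot = atomicAdd(cursor, 1);
    log[slot] = id;
}

// Launches three kernels as in the example above and copies the
// resulting log to host_log. Returns 0 on success, 1 on a CUDA error.
int run_ordering_demo(int host_log[3])
{
    int *log = nullptr;
    int *cursor = nullptr;
    CHECK(cudaMalloc(&log, 3 * sizeof(int)));
    CHECK(cudaMalloc(&cursor, sizeof(int)));
    // cudaMemset takes no stream parameter, so it is NULL-stream work;
    // the blocking stream s created below will wait for it.
    CHECK(cudaMemset(log, 0, 3 * sizeof(int)));
    CHECK(cudaMemset(cursor, 0, sizeof(int)));

    cudaStream_t s;
    CHECK(cudaStreamCreate(&s));           // a blocking stream

    record<<<1, 1, 0, s>>>(log, cursor, 1);  // k_1 in stream s
    record<<<1, 1>>>(log, cursor, 2);        // k_2 in the NULL stream
    record<<<1, 1, 0, s>>>(log, cursor, 3);  // k_3 in stream s
    CHECK(cudaGetLastError());

    CHECK(cudaDeviceSynchronize());
    CHECK(cudaMemcpy(host_log, log, 3 * sizeof(int),
                     cudaMemcpyDeviceToHost));

    CHECK(cudaStreamDestroy(s));
    CHECK(cudaFree(log));
    CHECK(cudaFree(cursor));
    return 0;
}

int main()
{
    int host_log[3] = {0, 0, 0};
    if (run_ordering_demo(host_log) != 0) return 1;
    printf("execution order: %d %d %d\n",
           host_log[0], host_log[1], host_log[2]);
    return 0;
}
```

If the file is instead compiled so that the default stream is per-thread, the unqualified launch no longer targets the NULL stream and this ordering is not guaranteed.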
Actions are added to the NULL stream by passing 0 as the cudaStream_t parameter of applicable APIs, or by calling the corresponding synchronous APIs which do not take a cudaStream_t parameter. The NULL stream is also the default for kernel launches if a stream is not specified.
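The three ways of targeting the NULL stream can be shown side by side. This is a minimal sketch under the same assumption of legacy default-stream compilation; the kernel add_one, the helper run_null_stream_demo, and the CHECK macro are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: report a failing runtime call and return -1.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return -1;                                                \
        }                                                             \
    } while (0)

__global__ void add_one(int *v) { *v += 1; }

// Starts from 41, increments twice on the device, and returns the
// final value (or -1 on a CUDA error).
int run_null_stream_demo()
{
    int host_value = 41;
    int *dev_value = nullptr;
    CHECK(cudaMalloc(&dev_value, sizeof(int)));

    // 1. Synchronous API with no cudaStream_t parameter.
    CHECK(cudaMemcpy(dev_value, &host_value, sizeof(int),
                     cudaMemcpyHostToDevice));

    // 2. Kernel launch with no stream specified.
    add_one<<<1, 1>>>(dev_value);

    // 3. Passing 0 explicitly as the stream argument.
    add_one<<<1, 1, 0, 0>>>(dev_value);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(&host_value, dev_value, sizeof(int),
                          cudaMemcpyDeviceToHost, 0));

    // Wait for everything queued in the NULL stream before reading.
    CHECK(cudaStreamSynchronize(0));
    CHECK(cudaFree(dev_value));
    return host_value;
}

int main()
{
    int result = run_null_stream_demo();
    if (result < 0) return 1;
    printf("result: %d\n", result);  // 41 + 1 + 1 = 43
    return 0;
}
```

All four operations land in the same stream, so they execute in issue order without any explicit synchronization between them.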
Non-blocking streams, which do not synchronize with the NULL stream, can be created by passing the cudaStreamNonBlocking flag to the stream creation APIs (for example, cudaStreamCreateWithFlags()).
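Because a non-blocking stream is not implicitly ordered against the NULL stream, any dependency between the two has to be expressed explicitly, for instance with an event. The following is a minimal sketch of that pattern; the kernels produce and consume, the helper run_non_blocking_demo, and the CHECK macro are hypothetical names, and the event-based ordering is one possible approach rather than the only one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: report a failing runtime call and return -1.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return -1;                                                \
        }                                                             \
    } while (0)

__global__ void produce(int *v) { *v = 7; }
__global__ void consume(const int *in, int *out) { *out = *in * 6; }

// Returns the value computed by consume (or -1 on a CUDA error).
int run_non_blocking_demo()
{
    int *in = nullptr;
    int *out = nullptr;
    CHECK(cudaMalloc(&in, sizeof(int)));
    CHECK(cudaMalloc(&out, sizeof(int)));
    CHECK(cudaMemset(in, 0, sizeof(int)));
    CHECK(cudaMemset(out, 0, sizeof(int)));
    // The memsets are NULL-stream work, and the non-blocking stream
    // will not wait for them, so finish the setup explicitly.
    CHECK(cudaDeviceSynchronize());

    cudaStream_t nb;
    CHECK(cudaStreamCreateWithFlags(&nb, cudaStreamNonBlocking));

    // Confirm the flag was recorded on the stream.
    unsigned int flags = 0;
    CHECK(cudaStreamGetFlags(nb, &flags));
    if ((flags & cudaStreamNonBlocking) == 0) return -1;

    cudaEvent_t produced;
    CHECK(cudaEventCreateWithFlags(&produced, cudaEventDisableTiming));

    produce<<<1, 1, 0, nb>>>(in);
    CHECK(cudaEventRecord(produced, nb));

    // Without this wait, the NULL-stream kernel below could run
    // before produce has finished.
    CHECK(cudaStreamWaitEvent(0, produced, 0));
    consume<<<1, 1>>>(in, out);
    CHECK(cudaGetLastError());

    int host_out = 0;
    // Synchronous copy in the NULL stream, ordered after consume.
    CHECK(cudaMemcpy(&host_out, out, sizeof(int),
                     cudaMemcpyDeviceToHost));

    CHECK(cudaEventDestroy(produced));
    CHECK(cudaStreamDestroy(nb));
    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return host_out;
}

int main()
{
    int result = run_non_blocking_demo();
    if (result < 0) return 1;
    printf("result: %d\n", result);  // 7 * 6 = 42
    return 0;
}
```

With a stream created by cudaStreamCreate() instead, the event would be unnecessary, since the NULL-stream launch would already wait on the blocking stream.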