Last time we saw two ways to create managed variables, which can be accessed by either the host or the device and are paged between them automatically. This makes programming much easier.
- Create static variables with __device__ __managed__. See /parallel-class/stanford/tutorials/vector_addition2.cu on parallel.
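  A minimal sketch of the static-variable approach (this is my reconstruction, not the course file itself): the arrays are declared once with `__device__ __managed__`, and both the host loop and the kernel touch them through the same names, with no copies.

  ```cuda
  #include <cstdio>

  #define N 1024

  // Visible to both host and device; pages migrate on demand.
  __device__ __managed__ float a[N], b[N], c[N];

  __global__ void add(void) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < N)
          c[i] = a[i] + b[i];
  }

  int main(void) {
      for (int i = 0; i < N; i++) {   // host writes the managed arrays directly
          a[i] = i;
          b[i] = 2 * i;
      }
      add<<<(N + 255) / 256, 256>>>();
      cudaDeviceSynchronize();        // wait for the kernel before reading c
      printf("c[10] = %g\n", c[10]);  // host reads the result directly
      return 0;
  }
  ```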
- Use cudaMallocManaged. See /parallel-class/stanford/tutorials/vector_addition3.cu on parallel.
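  The dynamic version might look like this (again a sketch under my assumptions, not the actual course file): `cudaMallocManaged` returns one pointer that is valid on both the host and the device, so the explicit copies disappear.

  ```cuda
  #include <cstdio>

  __global__ void add(float *a, float *b, float *c, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
          c[i] = a[i] + b[i];
  }

  int main(void) {
      const int n = 1024;
      float *a, *b, *c;
      // One allocation call per array; each pointer works on host and device.
      cudaMallocManaged(&a, n * sizeof(float));
      cudaMallocManaged(&b, n * sizeof(float));
      cudaMallocManaged(&c, n * sizeof(float));
      for (int i = 0; i < n; i++) {
          a[i] = i;
          b[i] = 2 * i;
      }
      add<<<(n + 255) / 256, 256>>>(a, b, c, n);
      cudaDeviceSynchronize();        // required before the host reads c
      printf("c[10] = %g\n", c[10]);
      cudaFree(a);
      cudaFree(b);
      cudaFree(c);
      return 0;
  }
  ```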
- In either case, the host must call cudaDeviceSynchronize() after launching a kernel and before reading the data. The reason is that kernel launches are asynchronous: control returns to the host while the kernel is still executing.
- When the Linux kernel gets HMM (heterogeneous memory management), all data on the heap will automatically be managed.
- The reason is that virtual addresses are wide enough to carry a tag saying which device a page lives on, so the VM page mapper can read and write pages to various devices, not just to swap files.
- Any CUDA example that uses cudaMemcpy is now obsolete (on Pascal GPUs).
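  For contrast, this is the pre-managed-memory pattern that the examples above replace: a separate device allocation, an explicit host-to-device copy before the kernel, and a device-to-host copy after it. (A schematic reconstruction of the old style, not code from the course.)

  ```cuda
  #include <cstdio>

  __global__ void add(float *a, float *b, float *c, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
          c[i] = a[i] + b[i];
  }

  int main(void) {
      const int n = 1024;
      float h_a[n], h_b[n], h_c[n];     // host copies of the data
      for (int i = 0; i < n; i++) {
          h_a[i] = i;
          h_b[i] = 2 * i;
      }
      float *d_a, *d_b, *d_c;           // separate device allocations
      cudaMalloc(&d_a, n * sizeof(float));
      cudaMalloc(&d_b, n * sizeof(float));
      cudaMalloc(&d_c, n * sizeof(float));
      cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
      cudaMemcpy(d_b, h_b, n * sizeof(float), cudaMemcpyHostToDevice);
      add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
      // cudaMemcpy on the default stream waits for the kernel to finish.
      cudaMemcpy(h_c, d_c, n * sizeof(float), cudaMemcpyDeviceToHost);
      printf("h_c[10] = %g\n", h_c[10]);
      cudaFree(d_a);
      cudaFree(d_b);
      cudaFree(d_c);
      return 0;
  }
  ```

  With managed memory, every `cudaMemcpy` and the duplicated `h_`/`d_` arrays simply disappear, which is the point of the bullet above.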