PAR Lecture 9, Thurs Feb 16

Table of contents

2   CUDA

Last time we saw two ways to create managed variables. They can be accessed by either the host or the device, and their pages are migrated between the two automatically. This makes programming much easier.

  1. Create static variables with __device__ __managed__. See /parallel-class/stanford/tutorials/vector_addition2.cu on parallel.
  2. Use cudaMallocManaged. See /parallel-class/stanford/tutorials/vector_addition3.cu on parallel.
  3. In either case, you need to call cudaDeviceSynchronize(); on the host after launching a parallel kernel and before reading the data on the host. The reason is that the kernel is launched asynchronously: control returns to the host while the kernel is still executing.
  4. When the linux kernel gets HMM (heterogeneous memory management), all data on the heap will automatically be managed.
  5. The reason is that virtual addresses are wide enough to contain a tag saying which device a page resides on. The VM page mapper will then read and write pages to various devices, not just swap files.
  6. Any CUDA example using explicit cudaMemcpy is now obsolete (on Pascal and later GPUs, where managed memory makes the copies unnecessary).
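The two approaches above can be combined in one short sketch, modeled loosely on the vector_addition examples cited above (the variable names here are illustrative, not the actual file contents):

```c++
#include <cstdio>

const int N = 1 << 20;

// Way 1: a static managed variable, visible to both host and device.
__device__ __managed__ float scale = 1.0f;

__global__ void add(int n, const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = scale * a[i] + b[i];
}

int main() {
    // Way 2: heap allocation with cudaMallocManaged.
    float *a, *b, *c;
    cudaMallocManaged(&a, N * sizeof(float));
    cudaMallocManaged(&b, N * sizeof(float));
    cudaMallocManaged(&c, N * sizeof(float));

    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    scale = 2.0f;   // host writes the managed static; device reads it

    add<<<(N + 255) / 256, 256>>>(N, a, b, c);

    // The launch returns immediately; synchronize before the host
    // reads any managed data the kernel wrote (point 3 above).
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);   // 2*1 + 2 = 4

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note that there is no cudaMemcpy anywhere: both the arrays and the static variable are touched directly from whichever side needs them, and paging does the rest.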
