W Randolph Franklin home page
ParallelComputingSpring2014/ home page


Homework 3, due Thurs 2014-02-28

  1. Research and then describe the main changes from NVidia Fermi to Kepler.
  2. Why did NVidia lower the clock frequency for their higher performance GPUs?
  3. Why do GPUs like the K20x have 14 SMXs? That's an unusual number; why not make it 16?
  4. NVidia just announced Maxwell. What are its main points?
  5. Although a thread can use 255 registers, that might be bad for performance. Why?
  6. If a thread needs more local variables than it has registers, where do the extras go?
  7. How do the various threads in a block share data with each other?
  8. Reading a word from global memory might take 400 cycles. Does that mean that a thread that reads many words from global memory will always take hundreds of times longer to complete?
  9. What is divergence in a warp, and why is it bad?
  10. Since the threads in a warp are executed in a SIMD fashion, how can an if-then-else block be executed?