Homework 3, due Thurs 2014-02-28
- Research and then describe the main changes from NVidia Fermi to Kepler.
- Why did NVidia lower the clock frequency for their higher performance GPUs?
- Why do GPUs like the K20x have 14 SMXs? That's an unusual number; why
not make it 16?
- NVidia just announced Maxwell. What are its main points?
- Although a thread can use up to 255 registers, using that many might be
bad for performance. Why?
- If a thread needs more local variables than it has registers, where do the
extras go?
- How do the various threads in a block share data with each other?
- Reading a word from global memory might take 400 cycles. Does that mean
that a thread that reads many words from global memory will always take
hundreds of times longer to complete?
- What is divergence in a warp, and why is it bad?
- Since the threads in a warp are executed in a SIMD fashion, how can an
if-then-else block be executed?
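
To make the last two questions concrete, here is a minimal CPU-side sketch in Python of one common way to model a warp handling an if-then-else: every lane evaluates the condition, and each side of the branch is then issued once for the whole warp, with the lanes that did not choose that side masked off. This is a simplified teaching model, not a description of the actual hardware. The function name `run_warp_if_else`, the example branch bodies, and the pass counter are all assumptions made up for illustration.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def run_warp_if_else(values):
    """Model one warp running: if v is even: v // 2, else: 3 * v + 1.

    Returns (results, passes), where passes counts how many branch
    bodies the whole warp had to step through.
    """
    assert len(values) == WARP_SIZE
    # Every lane evaluates the condition, giving a per-lane mask.
    mask = [v % 2 == 0 for v in values]
    out = list(values)
    passes = 0

    # "then" side: issued only if at least one lane wants it.
    if any(mask):
        passes += 1
        for lane in range(WARP_SIZE):
            if mask[lane]:  # lanes with a false mask sit idle here
                out[lane] = values[lane] // 2

    # "else" side: issued only if at least one lane wants it.
    if not all(mask):
        passes += 1
        for lane in range(WARP_SIZE):
            if not mask[lane]:  # lanes with a true mask sit idle here
                out[lane] = 3 * values[lane] + 1

    return out, passes


if __name__ == "__main__":
    uniform = [2 * i for i in range(WARP_SIZE)]  # every lane takes "then"
    mixed = list(range(WARP_SIZE))               # lanes disagree
    print("uniform passes:", run_warp_if_else(uniform)[1])
    print("mixed passes:  ", run_warp_if_else(mixed)[1])
```

Comparing the pass count for the uniform and mixed inputs is a reasonable starting point for thinking about what divergence costs.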