1. In Part 1, our naive scheduling of the subsequent rounds of Black-Scholes jobs incurs a performance penalty due to divergence. When only a handful of threads are active inside a warp, the entire warp of threads must participate in the computation whether they like it or not. On the other hand, when all the threads of a warp are inactive, the warp may quickly "early out" of the kernel.

   1a. Derive an expression for the probability that a given warp of threads will be completely inactive during the sparse rounds of Part 1. You may assume that the distribution of active options across the option set is uniform.

   1b. Compare the number of options of Part 1's sparse rounds to the total number of threads launched in those rounds. What is the expected number of inactive warps for these launches?

2. In Part 2, our implementation of stream compaction relied on a composition of two parallel patterns, prefix sum (scan) and scatter. While this modularity matches our mental model of the algorithm, physically partitioning the computation into separate kernels may incur overhead due to kernel launch and redundant global memory bandwidth.

   2a. Explain which discrete kernels involved in the compaction process may be fused.

   2b. Explain why prefix sum is less useful in serial codes.

3. In Part 3, we reorganized the sparse rounds of Black-Scholes by compacting all active work such that we would spawn no inactive warps. Reordering data in this way comes with a substantial memory bandwidth cost.

   3a. Time the individual components of stream compaction. Which is the bottleneck for your implementation? Is there more than one?

   3b. How many sparse rounds of Black-Scholes are necessary for your compacted schedule of Part 3 to break even with Part 1?

4. Extra Credit

   4a. For extra credit, implement the fusion opportunities you identified in 2a.

   4b. Our implementation of Black-Scholes uses an approximation to the cumulative normal distribution function. Higher precision algorithms are available. Implement one (or more) and study what effect the cost of this function has on the benefit of stream compaction.
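
For reference, the scan-and-scatter composition that question 2 refers to can be sketched serially. This is a minimal host-side illustration in Python, not the assignment's CUDA implementation; the names `exclusive_scan`, `compact`, and the `is_active` predicate are hypothetical and chosen only for this sketch. Each stage is kept as a separate step to mirror the modular structure described in question 2.

```python
from itertools import accumulate


def exclusive_scan(flags):
    """Exclusive prefix sum: result[i] is the sum of flags[0..i-1]."""
    if not flags:
        return []
    inclusive = list(accumulate(flags))
    return [0] + inclusive[:-1]


def compact(values, is_active):
    """Keep only the active elements, preserving their original order.

    is_active is a hypothetical predicate standing in for whatever test
    marks an option as still needing work.
    """
    # Stage 1: mark each element with a 0/1 flag.
    flags = [1 if is_active(v) else 0 for v in values]

    # Stage 2: scan the flags to get each active element's output index.
    offsets = exclusive_scan(flags)

    # The number of survivors is the last offset plus the last flag.
    count = offsets[-1] + flags[-1] if flags else 0

    # Stage 3: scatter. Every iteration writes to a distinct slot, so the
    # iterations are independent of one another.
    out = [None] * count
    for i, v in enumerate(values):
        if flags[i]:
            out[offsets[i]] = v
    return out


if __name__ == "__main__":
    data = [5, -1, 7, -3, 9]
    print(exclusive_scan([1, 0, 1, 0, 1]))   # [0, 1, 1, 2, 2]
    print(compact(data, lambda v: v > 0))    # [5, 7, 9]
```

In this sketch the flag, scan, and scatter stages each make a full pass over the data, which is the kind of repeated traffic that question 2 asks you to think about when the stages are separate kernels.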