Sample: CUDA Parallel Prefix Sum with Shuffle Intrinsics (SHFL_Scan)
Minimum spec: SM 3.0

This example demonstrates how to use the shuffle intrinsic __shfl_up to perform a scan operation across a thread block. A GPU with Compute Capability 3.0 (SM 3.0) or higher is required to run the sample.

Key concepts:
Data-Parallel Algorithms
Performance Strategies
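
The block-wide scan described above can be sketched roughly as follows. This is a minimal illustration, not the sample's actual source: the names warp_inclusive_scan, block_inclusive_scan, and scan_on_gpu are hypothetical, the input is assumed to fit in a single block of at most 1024 threads, and CUDA error checking is omitted for brevity. It also assumes CUDA 9 or later, where __shfl_up_sync replaces the older __shfl_up named in the description.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;
constexpr unsigned kFullMask = 0xffffffffu;  // all 32 lanes participate

// Inclusive scan inside one warp: each step pulls a value from the lane
// `offset` positions below and adds it if that lane exists.
__device__ int warp_inclusive_scan(int value, int lane) {
    for (int offset = 1; offset < kWarpSize; offset <<= 1) {
        int neighbor = __shfl_up_sync(kFullMask, value, offset);
        if (lane >= offset) value += neighbor;
    }
    return value;
}

// Inclusive scan across one thread block (hypothetical kernel).
// Assumes blockDim.x is a multiple of 32 and at most 1024.
__global__ void block_inclusive_scan(const int* in, int* out, int n) {
    __shared__ int warp_totals[kWarpSize];
    int tid = threadIdx.x;
    int lane = tid % kWarpSize;
    int warp = tid / kWarpSize;
    int num_warps = blockDim.x / kWarpSize;

    // Step 1: scan within each warp; out-of-range threads contribute 0.
    int value = (tid < n) ? in[tid] : 0;
    value = warp_inclusive_scan(value, lane);

    // Step 2: the last lane of each warp publishes that warp's total.
    if (lane == kWarpSize - 1) warp_totals[warp] = value;
    __syncthreads();

    // Step 3: the first warp scans the per-warp totals.
    if (warp == 0) {
        int total = (lane < num_warps) ? warp_totals[lane] : 0;
        warp_totals[lane] = warp_inclusive_scan(total, lane);
    }
    __syncthreads();

    // Step 4: add the combined total of all preceding warps.
    if (warp > 0) value += warp_totals[warp - 1];
    if (tid < n) out[tid] = value;
}

// Hypothetical host helper: scans up to 1024 ints with a single block.
std::vector<int> scan_on_gpu(const std::vector<int>& input) {
    int n = static_cast<int>(input.size());
    assert(n > 0 && n <= 1024);
    // Round the block size up to a whole number of warps.
    int threads = ((n + kWarpSize - 1) / kWarpSize) * kWarpSize;

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_inclusive_scan<<<1, threads>>>(d_in, d_out, n);

    std::vector<int> output(n);
    cudaMemcpy(output.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return output;
}
```

Calling scan_on_gpu on a vector of 100 ones should return the running sums 1 through 100. The shuffle exchanges values directly between registers, so shared memory is only needed for the per-warp totals rather than for every element.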