NVIDIA CUDA Sample "radixSortThrust" ---------------------------------- -------- OVERVIEW -------- This sample demonstrates a very fast and efficient parallel radix sort implemented in C for CUDA. The included RadixSort class can sort either key-value pairs (with float or unsigned integer keys) or keys only. It can also sort unsigned integer keys based on a varying number of least-significant bits ranging from 4 to 32 in multiples of 4. This radix sort code and the underlying algorithm is discussed in detail in the paper "Designing Efficient Sorting Algorithms for Manycore GPUs". A PDF version of this paper is available at http://mgarland.org/files/papers/gpusort-ipdps09.pdf ----- USAGE ----- To run a sort with default options (Sort 1M unsigned integer key-value pairs), just invoke the executable ("radixSort.exe" on Windows, "radixSort" otherwise). The following command line options are available: -n= : number of elements to sort -keysonly : sort only an array of keys (the default is to sort key-value pairs) -float : use 32-bit float keys -keybits= : Use only the B least-significant bits of the keys for the sort : B must be a multiple of 4. This option does not apply to float keys -quiet : Output only the number of elements and the time to sort -help : Output a help message The RadixSort class can also be used within your application by building the radixsort.cu file into your application or library, and including the radixsort.h header file. -------- CITATION -------- Satish, N., Harris, M., and Garland, M. "Designing Efficient Sorting Algorithms for Manycore GPUs". In Proceedings of IEEE International Parallel & Distributed Processing Symposium 2009 (IPDPS 2009). PDF: http://mgarland.org/files/papers/gpusort-ipdps09.pdf