TIME thru Starting, CPU: 0.33 (d= 0.33)
Thrust v1.7, CUDA_VERSION: 6,000, CUDA_ARCH: 0
N=500,000,000

Comparing sort on host and device:
TIME thru Create int vec on host, CPU: 1.06 (d= 0.73)
TIME thru Transf random int vec on host, CPU: 1.82 (d= 0.76)
TIME thru Sort on host, CPU: 12.98 (d=11.16)
Before creating device vector: cuda_mem_tot=6,039,339,008, cuda_mem_free=5,951,418,368
TIME thru Create int vec on device, CPU: 12.98 (d= 0.00)
After creating device vector: cuda_mem_tot=6,039,339,008, cuda_mem_free=3,951,276,032
TIME thru Transf random int vec on device, CPU: 13.00 (d= 0.02)
TIME thru Sort on device, CPU: 13.65 (d= 0.65)
TIME thru Transferring back to host, CPU: 14.17 (d= 0.52)

Comparing reduce on host and device:
TIME thru Create float vec on host, CPU: 14.90 (d= 0.73)
TIME thru Transf random float vec on host, CPU: 15.73 (d= 0.83)
TIME thru Reduce on host, CPU: 15.73 (d= 0.00)
TIME thru Create float vec on device, CPU: 15.74 (d= 0.01)
TIME thru Transf random float vec on device, CPU: 15.75 (d= 0.01)
TIME thru Reduce on device, CPU: 15.77 (d= 0.02)