File sortreduce.cu compiled on Mar 23 2015 at 05:06:38
Device name: Quadro K2100M
CUDA Capability version: 3.0
CUDA Driver / Runtime Version: 7,000/7,000, CUDA_VERSION: 7,000
Thrust v1.8
TIME thru Starting : CPU: 0.007 (d= 0.007)
Some redundant ops are done for timing purposes.
N= 100,000,000

Comparing sort on 4 types of memory:
TIME thru Sync : CPU: 1.151 (d= 1.144)

Host:
TIME thru Create int vec on host : CPU: 1.198 (d= 0.047)
TIME thru Transform rand int vec on host : CPU: 1.359 (d= 0.161)
TIME thru Sort on host : CPU: 3.150 (d= 1.791)
TIME thru Sync : CPU: 3.150 (d= 0.000)

Device:
Before creating device vector: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,250,750,464
TIME thru Before create int vec on device : CPU: 3.150 (d= 0.000)
TIME thru Create int vec on device : CPU: 3.635 (d= 0.485)
After creating device vector: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,250,750,464
TIME thru Copy host to device : CPU: 3.698 (d= 0.062)
TIME thru Transform rand int vec on device : CPU: 4.439 (d= 0.741)
TIME thru Sort on device : CPU: 11.076 (d= 6.638)
TIME thru Copy device to host : CPU: 11.139 (d= 0.062)

Mapped:
Starting mapped test: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,250,750,464
TIME thru cudaHostAllocMapped : CPU: 11.242 (d= 0.103)
TIME thru Transform mapped array : CPU: 11.980 (d= 0.737)
TIME thru sort mapped array : CPU: 18.708 (d= 6.729)
TIME thru free and sync : CPU: 18.791 (d= 0.083)

Managed:
Starting managed test: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,249,939,456
TIME thru cudaMallocManaged : CPU: 18.797 (d= 0.006)
after cudaMallocManaged: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 849,608,704
TIME thru Transform managed array : CPU: 20.889 (d= 2.092)
TIME thru Transform managed device array : CPU: 21.607 (d= 0.718)
TIME thru sort managed array : CPU: 28.227 (d= 6.619)
TIME thru sort managed device array : CPU: 34.076 (d= 5.850)
TIME thru free and sync : CPU: 34.118 (d= 0.042)

Comparing reduce on 4 types of memory:
TIME thru Starting testreduce : CPU: 34.119 (d= 0.001)

Host:
TIME thru Create float vec on host : CPU: 34.166 (d= 0.047)
TIME thru Transform rand float vec on host : CPU: 34.435 (d= 0.269)
TIME thru Generate rand float vec on host : CPU: 36.455 (d= 2.020)
sumh= 36,028,797,018,963,968.000
TIME thru Reduce on host : CPU: 36.568 (d= 0.113)

Device:
TIME thru Create float vec on device : CPU: 36.979 (d= 0.411)
TIME thru Transform rand float vec on device : CPU: 37.496 (d= 0.516)
sumd= 4,941,592,339,152,896.000
TIME thru Reduce on device : CPU: 37.664 (d= 0.168)

Mapped:
TIME thru cudaHostAllocMapped : CPU: 37.788 (d= 0.124)
TIME thru Transform rand float vec on mapped : CPU: 38.298 (d= 0.511)
sumd= 4,941,592,339,152,896.000
TIME thru Reduce on mapped : CPU: 38.465 (d= 0.167)

Managed:
Starting managed test: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,283,850,240
TIME thru cudaMallocManaged : CPU: 38.471 (d= 0.006)
after cudaMallocManaged: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 883,818,496
TIME thru Transform managed array : CPU: 41.104 (d= 2.633)
TIME thru Transform managed device array : CPU: 41.568 (d= 0.463)
sumd= 4,941,592,339,152,896.000
TIME thru Reduce on managed : CPU: 41.742 (d= 0.174)
TIME thru free and sync : CPU: 41.782 (d= 0.041)