File sortreduce.cu compiled on Mar 23 2015 at 04:59:52
Device name: Quadro K2100M
CUDA Capability version: 3.0
CUDA Driver / Runtime Version: 7,000/7,000, CUDA_VERSION: 7,000
Thrust v1.8
TIME thru Starting : CPU: 0.007 (d= 0.007)
Some redundant ops are done for timing purposes.
N= 100,000,000

Comparing sort on 4 types of memory:
TIME thru Sync : CPU: 0.065 (d= 0.058)

Host:
TIME thru Create int vec on host : CPU: 0.111 (d= 0.046)
TIME thru Transform rand int vec on host : CPU: 0.508 (d= 0.397)
TIME thru Sort on host : CPU: 2.269 (d= 1.761)
TIME thru Sync : CPU: 2.269 (d= 0.000)

Device:
Before creating device vector: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,283,903,488
TIME thru Before create int vec on device : CPU: 2.270 (d= 0.000)
TIME thru Create int vec on device : CPU: 2.270 (d= 0.001)
After creating device vector: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 883,847,168
TIME thru Copy host to device : CPU: 2.340 (d= 0.070)
TIME thru Transform rand int vec on device : CPU: 2.340 (d= 0.000)
TIME thru Sort on device : CPU: 2.750 (d= 0.410)
TIME thru Copy device to host : CPU: 2.812 (d= 0.062)

Mapped:
Starting mapped test: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,283,854,336
TIME thru cudaHostAllocMapped : CPU: 2.916 (d= 0.104)
TIME thru Transform mapped array : CPU: 2.916 (d= 0.000)
TIME thru sort mapped array : CPU: 3.552 (d= 0.635)
TIME thru free and sync : CPU: 3.623 (d= 0.071)

Managed:
Starting managed test: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,283,043,328
TIME thru cudaMallocManaged : CPU: 3.628 (d= 0.005)
after cudaMallocManaged: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 882,712,576
TIME thru Transform managed array : CPU: 3.628 (d= 0.000)
TIME thru Transform managed device array : CPU: 3.628 (d= 0.000)
TIME thru sort managed array : CPU: 4.044 (d= 0.416)
TIME thru sort managed device array : CPU: 4.439 (d= 0.395)
TIME thru free and sync : CPU: 4.440 (d= 0.001)

Comparing reduce on 4 types of memory:
TIME thru Starting testreduce : CPU: 4.441 (d= 0.001)

Host:
TIME thru Create float vec on host : CPU: 4.487 (d= 0.046)
TIME thru Transform rand float vec on host : CPU: 4.950 (d= 0.463)
TIME thru Generate rand float vec on host : CPU: 6.964 (d= 2.013)
sumh= 36,028,797,018,963,968.000
TIME thru Reduce on host : CPU: 7.077 (d= 0.113)

Device:
TIME thru Create float vec on device : CPU: 7.077 (d= 0.001)
TIME thru Transform rand float vec on device : CPU: 7.077 (d= 0.000)
sumd= 4,994,873,019,072,512.000
TIME thru Reduce on device : CPU: 7.119 (d= 0.042)

Mapped:
TIME thru cudaHostAllocMapped : CPU: 7.222 (d= 0.103)
TIME thru Transform rand float vec on mapped : CPU: 7.222 (d= 0.000)
sumd= 4,994,873,019,072,512.000
TIME thru Reduce on mapped : CPU: 7.292 (d= 0.070)

Managed:
Starting managed test: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 1,273,700,352
TIME thru cudaMallocManaged : CPU: 7.293 (d= 0.001)
after cudaMallocManaged: cuda_mem_tot= 2,147,287,040, cuda_mem_free= 873,668,608
TIME thru Transform managed array : CPU: 7.293 (d= 0.000)
TIME thru Transform managed device array : CPU: 7.293 (d= 0.000)
sumd= 4,994,873,019,072,512.000
TIME thru Reduce on managed : CPU: 7.344 (d= 0.051)
TIME thru free and sync : CPU: 7.345 (d= 0.001)