
PAR Class 21, Mon 2021-04-12

1 Final projects

  1. See the syllabus.

  2. April 19: team and title, and 1-2 minute proposal presentation to class.

  3. April 26: 100-word project report.

  4. April 26, 29, or May 3: 10-15 minute presentations. Email me with preferred dates.

  5. May 5: report due.

2 Thrust, conclusion

  1. thrust::universal_vector uses unified memory, so one vector is visible to both the host and the device; there is no need for separate host and device vectors. E.g.,

    /parclass/2021/files/thrust/rpi/tiled_range2u.cu vs tiled_range2.cu

  2. Unified memory might carry up to a 2x performance penalty; I haven't measured it.

  3. You can force an algorithm to execute on the host, or on the device, by passing an execution policy (thrust::host or thrust::device) as an extra 1st argument.

  4. There are compiler switches to define what the host or device backend should be. Common backends are host single-threaded C++, host OpenMP, host TBB, and the GPU (CUDA). You could conceivably add your own.

    The theory is that no source code changes are required.
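Points 1 and 3 above can be sketched as follows. This is my own minimal example, not the course's tiled_range2u.cu; it assumes the CUDA toolkit (compile with nvcc) and a GPU:

```cuda
#include <thrust/universal_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

// Callable on host or device; Thrust picks based on the execution policy.
struct square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

int main() {
    // One vector in unified memory; no host_vector/device_vector pair,
    // no explicit cudaMemcpy.
    thrust::universal_vector<float> v(10);
    thrust::sequence(v.begin(), v.end());          // v = 0, 1, ..., 9

    // The extra 1st argument (execution policy) forces where this runs:
    // thrust::device for the GPU, thrust::host for the CPU.
    thrust::transform(thrust::device, v.begin(), v.end(), v.begin(), square());

    // The host can read the result directly.
    return v[9] == 81.0f ? 0 : 1;
}
```

The same source illustrates point 4: compiling with, e.g., -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP retargets the "device" to OpenMP on the host with no source changes.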

3 Several parallel Nvidia tools

  1. OpenMP, OpenACC, Thrust, CUDA, C++17, ...

  2. https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/

  3. https://developer.nvidia.com/hpc-compilers

  4. Nvidia's developer site has other interesting pages.

  5. To first order, they have similar performance: runtime is dominated by host-device data transfer, which they all handle the same way.

  6. C++17 parallel algorithms may have the greatest potential, but I don't know whether the implementations are mature enough yet.
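The stdpar idea from the first link can be sketched like this. This is my own example, assuming Nvidia's nvc++ from the HPC SDK: nvc++ -stdpar=gpu offloads the standard parallel algorithms to the GPU, while g++ (linked with TBB) runs the identical source on host threads:

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(1 << 20);
    std::iota(v.begin(), v.end(), 0.0);

    // Plain standard C++17: no vendor extensions in the source.
    // Where this runs is decided entirely by compiler switches.
    std::transform(std::execution::par_unseq, v.begin(), v.end(), v.begin(),
                   [](double x) { return 2.0 * x; });
    double s = std::reduce(std::execution::par_unseq, v.begin(), v.end(), 0.0);
    return s > 0.0 ? 0 : 1;
}
```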

4 Nvidia GTC

  1. This week, free. https://www.nvidia.com/en-us/gtc/

  2. Jensen Huang's keynote should be good.

  3. Browse around.

5 My parallel research

  1. I use Thrust (and OpenMP) for some computational geometry (CAD and GIS) research, running efficient algorithms on large datasets.

  2. One example is to process a set of 3D points to find all pairs closer than a given distance.

  3. Another is to overlay two polyhedra or triangulations to find all intersecting pieces. That uses big rational arithmetic to avoid roundoff errors and Simulation of Simplicity to handle degenerate cases.