Skip to main content

PAR Class 19, Thu 2022-03-24

1 Driver/library version mismatch error

was caused by my updating some packages but not rebooting, causing new libraries to try to use the old kernel driver.

2 Thrust, 5

  1. Examples

    1. bucket_sort2d.cu overlays a grid on a set of 2D points and finds the points in each grid cell (bucket).

      1. The tuple is an efficient class for a short vector of fixed length.

      2. Note how random numbers are generated. You combine an engine that produces random output with a distribution.

        However you might need more complicated coding to make the numbers good when executing in parallel. See monte_carlo_disjoint_sequences.cu.

      3. The problem is that the number of points in each cell is unpredictable.

      4. The cell containing each point is computed and that and the points are sorted to bring together the points in each cell.

      5. Then lower_bound and upper_bound are used to find each bucket in that sorted vector of points.

      6. See this lower_bound description.

    2. mode.cu shows:

      1. Counting the number of unique keys in a vector.

        1. Sort the vector.

        2. Do an inner_product. However, instead of the operators being times and plus, they are not equal to the next element and plus.

      2. Counting their multiplicity.

        1. Construct vectors, sized at the number of unique keys, to hold the unique keys and counts.

        2. Do a reduce_by_keys on a constant_iterator using the sorted vector as the keys. For each range of identical keys, it sums the constant_iterator. That is, it counts the number of identical keys.

        3. Write a vector of unique keys and a vector of the counts.

      3. Finding the most used key (the mode).

        1. Do max_element on the counts vector.

    3. repeated_range.cu repeats each element of an N-vector K times: repeated_range([0, 1, 2, 3], 2) -> [0, 0, 1, 1, 2, 2, 3, 3]. It's a lite version of expand.cu, but uses a different technique.

      1. Here, N=4 and K=2.

      2. The idea is to construct a new iterator, repeated_range, that, when read and incremented, will return the proper output elements.

      3. The construction stores the relevant info in structure components of the variable.

      4. Treating its value like a subscript in the range [0,N*K), it divides that value by K and returns that element of its input.

      See also strided_range.cu and tiled_range.cu.

2.1 Backends

  1. The Thrust device can be CUDA, OpenMP, TBB, etc.

  2. You can spec it in 2 ways:

    1. by adding an extra arg at the start of a function's arg list.

    2. with an envar

      https://github.com/thrust/thrust/wiki/Host-Backends

      https://github.com/thrust/thrust/wiki/Device-Backends

3 Linux HMM (Heterogeneous Memory Management)

This is hardware support to make programming devices like GPUs easier. It took years to get into the official kernel, presumably because you don't want bugs in memory management.

This is also a nice intro into current operating systems concerns. Do our operating systems courses talk about this?

  1. https://www.kernel.org/doc/html/v4.18/vm/hmm.html

  2. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=2ahUKEwiCuM-y2t7gAhUD24MKHZWlCeYQFjAEegQIBhAC&url=http%3A%2F%2Fon-demand.gputechconf.com%2Fgtc%2F2017%2Fpresentation%2Fs7764_john-hubbardgpus-using-hmm-blur-the-lines-between-cpu-and-gpu.pdf&usg=AOvVaw1c7bYo2YO5n8OtD0Vw9hbs

4 Final projects

  1. See the syllabus.

  2. April 4: team and title.

  3. April 11: 100 word project report.

  4. April 18, 25: 10 minute presentations. Email me with preferred date.

  5. Wed Apr 27: report due.

5 Quantum computing, 1

My quantum computing summary .

6 No classes

April 7, 21. If you wish, I can assign some excellent quantum computing videos to watch.