Skip to main content

PAR Class 21, Thu 2020-04-09

1   Final project presentations

Remember to sign up.

2   Thrust

  1. The Thrust examples teach several non-intuitive paradigms. As I figure them out, I'll describe a few. My descriptions are modified and expanded versions of the comments in the programs. This is not a list of all the useful programs, but only of some where I am adding to their comments.

    1. arbitrary_transformation.cu and dot_products_with_zip.cu. show the very useful zip_iterator. Using it is a 2-step process.

      1. Combine the separate iterators into a tuple.
      2. Construct a zip iterator from the tuple.

      Note that operator() is now a template.

    2. boundingbox.cu finds the bounding box around a set of 2D points.

      The main idea is to do a reduce. However, the combining operation, instead of addition, is to combine two bounding boxes to find the box around them.

      The combining op can be any associative op.

    3. bucket_sort2d.cu overlays a grid on a set of 2D points and finds the points in each grid cell (bucket).

      1. The tuple is an efficient class for a short vector of fixed length.

      2. Note how random numbers are generated. You combine an engine that produces random output with a distribution.

        However you might need more complicated coding to make the numbers good when executing in parallel. See monte_carlo_disjoint_sequences.cu.

      3. The problem is that the number of points in each cell is unpredictable.

      4. The cell containing each point is computed and that and the points are sorted to bring together the points in each cell.

      5. Then lower_bound and upper_bound are used to find each bucket in that sorted vector of points.

      6. See this lower_bound description.

    4. mode.cu shows:

      1. Counting the number of unique keys in a vector.
        1. Sort the vector.
        2. Do an inner_product. However, instead of the operators being times and plus, they are not equal to the next element and plus.
      2. Counting their multiplicity.
        1. Construct vectors, sized at the number of unique keys, to hold the unique keys and counts.
        2. Do a reduce_by_keys on a constant_iterator using the sorted vector as the keys. For each range of identical keys, it sums the constant_iterator. That is, it counts the number of identical keys.
        3. Write a vector of unique keys and a vector of the counts.
      3. Finding the most used key (the mode).
        1. Do max_element on the counts vector.
    5. repeated_range.cu repeats each element of an N-vector K times: repeated_range([0, 1, 2, 3], 2) -> [0, 0, 1, 1, 2, 2, 3, 3]. It's a lite version of expand.cu, but uses a different technique.

      1. Here, N=4 and K=2.
      2. The idea is to construct a new iterator, repeated_range, that, when read and incremented, will return the proper output elements.
      3. The construction stores the relevant info in structure components of the variable.
      4. Treating its value like a subscript in the range [0,N*K), it divides that value by K and returns that element of its input.

      See also strided_range.cu and tiled_range.cu.

2.1   Backends

  1. The Thrust device can be CUDA, OpenMP, TBB, etc.

  2. You can spec it in 2 ways:

    1. by adding an extra arg at the start of a function's arg list.

    2. with an envar

      https://github.com/thrust/thrust/wiki/Host-Backends

      https://github.com/thrust/thrust/wiki/Device-Backends