PAR Class 14, Thu 2019-02-28

2   Several forms of C++ functions

  1. Traditional top level function

    auto add(int a, int b) { return a+b;}

    You can pass this to another function; what is really passed is a pointer to the function. The compiler usually cannot inline or optimize across a call made through that pointer.
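    A minimal sketch of passing a top-level function by pointer. The helper name call_binary is hypothetical, and add is written with an explicit int return type so that the sketch also compiles as C++11.

```cpp
// Traditional top-level function.
int add(int a, int b) { return a + b; }

// Hypothetical helper: receives the function as a plain function pointer
// and calls through it.
int call_binary(int (*f)(int, int), int x, int y) { return f(x, y); }
```

    Writing call_binary(add, 2, 3) converts add to a pointer implicitly; call_binary(&add, 2, 3) is equivalent.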

  2. Overload operator() in a new class

    Each object (instance) of the class is a different function. The function can use the values stored in the object's members. This is a closure.

    This is local to the containing block.

    This form optimizes well.
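    A minimal sketch of this form. The names AddOffset and shifted are hypothetical; std::transform from the standard library stands in for any algorithm that accepts a function object.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical functor: each object stores its own offset, so each object
// behaves as a different function.
struct AddOffset {
    int offset;
    explicit AddOffset(int o) : offset(o) {}
    int operator()(int x) const { return x + offset; }
};

// Apply an AddOffset object to every element of a vector.
std::vector<int> shifted(const std::vector<int>& in, int offset) {
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), AddOffset(offset));
    return out;
}
```

    Because the type AddOffset is known at compile time, the call to operator() can be inlined into the loop.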

  3. Lambda, or anonymous function.

    auto add = [](int a, int b) { return a+b;};

    This is local to the containing block.

    This form optimizes well.
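    A minimal sketch of using the lambda as an argument. The function name pairwise_sums is hypothetical, and std::transform stands in for the corresponding parallel algorithm.

```cpp
#include <algorithm>
#include <vector>

// Add two vectors element by element using a lambda that is local to this
// block. Assumes b is at least as long as a.
std::vector<int> pairwise_sums(const std::vector<int>& a,
                               const std::vector<int>& b) {
    auto add = [](int x, int y) { return x + y; };
    std::vector<int> out(a.size());
    std::transform(a.begin(), a.end(), b.begin(), out.begin(), add);
    return out;
}
```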

  4. Placeholder notation.

    As an argument to, e.g., transform, you can do this (in Thrust, the placeholders _1 and _2 are in namespace thrust::placeholders):

    transform(..., _1+_2);

    This is nice and short.

    Because this is implemented by overloading operators on the placeholder types, an expression can use only the operators that were overloaded. E.g., you cannot call an arbitrary function such as sqrt on _1.
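    To illustrate the idea, here is a toy sketch of placeholders built from operator overloading. It is not Thrust's implementation; the namespace toy and the types First, Second, and Sum are hypothetical, and it handles only int arguments.

```cpp
namespace toy {

// Placeholder types: return the first or the second argument.
struct First  { int operator()(int a, int) const { return a; } };
struct Second { int operator()(int, int b) const { return b; } };

// Expression node built by operator+: evaluates both sides, then adds.
template <class L, class R>
struct Sum {
    L lhs;
    R rhs;
    int operator()(int a, int b) const { return lhs(a, b) + rhs(a, b); }
};

// Only operator+ is overloaded, so _1 + _2 compiles but _1 * _2 does not.
template <class L, class R>
Sum<L, R> operator+(L lhs, R rhs) { return Sum<L, R>{lhs, rhs}; }

const First  _1{};
const Second _2{};

}  // namespace toy
```

    Here toy::_1 + toy::_2 is not evaluated where it is written. It builds a Sum object, and calling that object with (3, 4) does the addition.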

3   Thrust

  1. Continue Stanford's parallel course notes.
    1. Lecture 8-: Thrust ctd.

3.1   Examples

  1. I rewrote /parallel-class/thrust/examples-1.8/tiled_range.cu into /parallel-class/thrust/rpi/tiled_range2.cu .

    It is now much shorter and much clearer. All the work is done here:

    gather(make_transform_iterator(make_counting_iterator(0), _1%N),
           make_transform_iterator(make_counting_iterator(N*C), _1%N),
           data.begin(), V.begin());

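    1. make_counting_iterator(0) returns an iterator over the virtual sequence 0, 1, 2, ... The sequence is never stored; dereferencing the iterator computes the value.
    2. _1%N is a function computing its argument modulo N.
    3. make_transform_iterator(make_counting_iterator(0), _1%N) returns an iterator over the sequence 0%N, 1%N, ... The second argument of gather is the matching end iterator, N*C entries later.
    4. gather populates V. The i-th element of V gets the element of data whose index is the i-th value of that sequence, i.e., V[i] = data[i%N].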
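    The effect of the gather call can be sketched in plain sequential C++. This is a stand-in for checking the index arithmetic, not the Thrust code; the function name tiled_range and its parameters are hypothetical, with N taken to be data.size().

```cpp
#include <cstddef>
#include <vector>

// Sequential stand-in for the gather call: V[i] = data[i % N] for
// i in [0, N*C), where N = data.size() and C is the number of tiles.
std::vector<int> tiled_range(const std::vector<int>& data, std::size_t C) {
    const std::size_t N = data.size();
    std::vector<int> V(N * C);
    for (std::size_t i = 0; i < V.size(); ++i) {
        V[i] = data[i % N];
    }
    return V;
}
```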
  2. tiled_range3.cu is even shorter. Instead of writing an output vector, it constructs an iterator for a virtual output vector:

    auto output=make_permutation_iterator(data, make_transform_iterator(make_counting_iterator(0), _1%N));

    1. *(output+i) is *(data+(i%N)).
    2. You can get as many tiles as you want by iterating.
    3. tiled_range3.cu also constructs an iterator for a virtual input vector (in this case a vector of squares) instead of storing the data:

    auto data = make_transform_iterator(make_counting_iterator(0), _1*_1);
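    The idea of a virtual vector can be sketched without Thrust. The struct TiledSquares below is hypothetical; its operator[] plays the role of *(output+i), and nothing is stored except N.

```cpp
// Hypothetical virtual vector: element i is (i % N) squared, computed on
// demand. This combines the virtual input (squares) with the tiling index.
struct TiledSquares {
    long N;
    long operator[](long i) const {
        const long k = i % N;  // position within the tile
        return k * k;          // element k of the virtual input vector
    }
};
```

    With N = 3 the elements are 0, 1, 4, 0, 1, 4, ... for as long as you keep indexing.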

  3. tiled_range5.cu shows how to use a lambda instead of the _1 notation:

    auto output=make_permutation_iterator(data, make_transform_iterator(make_counting_iterator(0), [](const int i){return i%N;} ));

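    1. You have to compile with --std c++11 . Because the lambda captures nothing, N must be a compile-time constant or a variable with static storage duration, e.g., a global.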

    2. This can be rewritten thus:

      auto f = [](const int i){return i%N;};
      auto output = make_permutation_iterator(data, make_transform_iterator(make_counting_iterator(0), f));

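    3. A minimal lambda, which takes no arguments and does nothing, is this:

      auto f = [](){};

      The empty parameter list can also be omitted, giving []{}.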

  4. repeated_range2.cu is my improvement on repeated_range.cu:

    auto output=make_permutation_iterator(data.begin(), make_transform_iterator(make_counting_iterator(0), _1/3));

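    1. make_transform_iterator(make_counting_iterator(0), _1/3) returns an iterator over the sequence 0,0,0,1,1,1,2,2,2, ... Integer division by 3 maps each group of three consecutive output positions to one element of data.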
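    The index arithmetic can be checked with a sequential sketch. This is not the Thrust code: the function name repeated_range is hypothetical, and the repeat count R generalizes the 3 used above. It materializes the output that the permutation iterator produces lazily.

```cpp
#include <cstddef>
#include <vector>

// Sequential stand-in: out[i] = data[i / R], so each element of data
// appears R times in a row.
std::vector<int> repeated_range(const std::vector<int>& data, std::size_t R) {
    std::vector<int> out(data.size() * R);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = data[i / R];
    }
    return out;
}
```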