PAR Class 14, Thu 2019-02-28

W Randolph Franklin, RPI

2019-02-28 00:00


Table of contents:

1 Linux HMM (Heterogeneous Memory Management)
2 Several forms of C++ functions
3 Thrust
- 3.1 Examples

1 Linux HMM (Heterogeneous Memory Management)

2 Several forms of C++ functions

Traditional top level function

auto add(int a, int b) { return a+b;}

You can pass this to another function, but what is really passed is a pointer to the function. The compiler generally cannot optimize (e.g., inline) across a call made through that pointer.
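A minimal sketch in plain standard C++ (not Thrust), with std::transform standing in for the Thrust algorithm. The helper name add_vectors is hypothetical. Writing add as an argument converts it to a function pointer of type int (*)(int, int), which is the indirect call the compiler usually cannot see through.

```cpp
#include <algorithm>
#include <vector>

// Traditional top-level function (int return type instead of auto, so C++11 suffices).
int add(int a, int b) { return a + b; }

// Hypothetical helper: element-wise sum of two equal-length vectors.
std::vector<int> add_vectors(const std::vector<int>& x, const std::vector<int>& y) {
    std::vector<int> out(x.size());
    // Passing `add` really passes a pointer of type int (*)(int, int).
    int (*op)(int, int) = add;
    std::transform(x.begin(), x.end(), y.begin(), out.begin(), op);
    return out;
}
```

For example, add_vectors({1, 2, 3}, {10, 20, 30}) yields {11, 22, 33}.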

Overload operator() in a new class

Each object (variable) of the class is a different function, and operator() can use that object's member values. This is a closure.

This is local to the containing block.

This form optimizes well.
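A small function-object sketch in standard C++; AddN and apply_all are hypothetical names. Each AddN object stores its own n, so AddN(1) and AddN(10) behave as two different functions.

```cpp
#include <algorithm>
#include <vector>

// Function object: each instance carries its own n, so each instance
// acts as a different function x -> x + n.
class AddN {
public:
    explicit AddN(int n) : n_(n) {}
    int operator()(int x) const { return x + n_; }  // uses this object's value
private:
    int n_;
};

// Hypothetical helper: apply f to every element of v.
template <typename F>
std::vector<int> apply_all(std::vector<int> v, F f) {
    std::transform(v.begin(), v.end(), v.begin(), f);
    return v;
}
```

Because the type AddN is known when apply_all is instantiated, the compiler can inline operator() into the loop, which is why this form usually optimizes well.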

Lambda, or anonymous function.

auto add = [](int a, int b) { return a+b;};

This is local to the containing block.

This form optimizes well.
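A minimal lambda sketch in standard C++; add_n is a hypothetical name. The compiler turns each lambda into an unnamed class with an operator(), so this is the function-object form with less typing. The capture [n] copies n into the closure.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: add n to every element of v using a lambda.
std::vector<int> add_n(std::vector<int> v, int n) {
    // The lambda is local to this block; [n] captures n by value,
    // making it a closure.
    auto addn = [n](int x) { return x + n; };
    std::transform(v.begin(), v.end(), v.begin(), addn);
    return v;
}
```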

Placeholder notation.

As an argument to, e.g., transform, you can write this:

transform(..., _1+_2);

This is nice and short.

Since this is implemented by overloading operators, the expression can use only the operators that were overloaded for the placeholders; for example, you cannot call an arbitrary function on _1.
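To show why the syntax is limited, here is a toy placeholder library in standard C++. The namespace mini and the names Arg1, Arg2, PlusExpr and zip_with are hypothetical; this is a sketch of the operator-overloading idea, not Thrust's actual implementation. _1 + _2 builds an expression object whose operator() evaluates the sum.

```cpp
#include <algorithm>
#include <vector>

namespace mini {  // hypothetical names, not Thrust's implementation

// Leaf expressions: return the first or the second argument.
struct Arg1 { int operator()(int a, int) const { return a; } };
struct Arg2 { int operator()(int, int b) const { return b; } };

// Expression node for l + r; evaluating it evaluates both sides.
template <typename L, typename R>
struct PlusExpr {
    L l;
    R r;
    int operator()(int a, int b) const { return l(a, b) + r(a, b); }
};

// The only overloaded operator: _1 + _2 compiles, but _1 * _2 does not.
inline PlusExpr<Arg1, Arg2> operator+(Arg1 l, Arg2 r) { return {l, r}; }

const Arg1 _1{};
const Arg2 _2{};

// Element-wise binary transform, standing in for thrust::transform.
template <typename F>
std::vector<int> zip_with(const std::vector<int>& x, const std::vector<int>& y, F f) {
    std::vector<int> out(x.size());
    std::transform(x.begin(), x.end(), y.begin(), out.begin(), f);
    return out;
}

}  // namespace mini
```

Usage: mini::zip_with(x, y, mini::_1 + mini::_2) sums element-wise. Writing mini::_1 * mini::_2 fails to compile because no operator* was defined, which is the limitation described above.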

3 Thrust

Continue Stanford's parallel course notes.
1. Lecture 8-: Thrust ctd.

3.1 Examples

I rewrote /parallel-class/thrust/examples-1.8/tiled_range.cu as /parallel-class/thrust/rpi/tiled_range2.cu.

It is now much shorter and much clearer. All the work is done here:

gather(make_transform_iterator(make_counting_iterator(0), _1%N), make_transform_iterator(make_counting_iterator(N*C), _1%N), data.begin(), V.begin());
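1. make_counting_iterator(0) returns an iterator over the sequence 0, 1, 2, ...; the values are computed on demand, not stored.
2. _1%N is a function computing modulo N.
3. make_transform_iterator(make_counting_iterator(0), _1%N) returns an iterator over the sequence 0%N, 1%N, ..., i.e., 0, 1, ..., N-1, 0, 1, and so on.
4. gather populates V: V[i] gets data[map[i]], where map is the transform iterator from step 3, so V[i] is data[i%N]. Since the counting iterators run from 0 to N*C-1, V receives C copies (tiles) of the N elements of data.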
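The same computation as a plain sequential C++ loop (not Thrust), assuming N is the size of data and C is the number of tiles; tile is a hypothetical name. The map value i % N is computed on the fly, just as the transform iterator does, and then used to gather from data.

```cpp
#include <cstddef>
#include <vector>

// Sequential sketch of gather(map_first, map_last, data, V): V[i] = data[map[i]],
// where map[i] = i % N is computed on demand instead of being stored.
std::vector<int> tile(const std::vector<int>& data, std::size_t C) {
    const std::size_t N = data.size();
    std::vector<int> V(N * C);
    for (std::size_t i = 0; i < N * C; ++i) {
        std::size_t map_i = i % N;  // i-th element of the virtual map
        V[i] = data[map_i];         // gather step
    }
    return V;
}
```

For example, tile({7, 8, 9}, 2) gives {7, 8, 9, 7, 8, 9}. The Thrust version has the same structure, but the loop is parallel and the map never exists in memory.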
tiled_range3.cu is even shorter. Instead of writing an output vector, it constructs an iterator for a virtual output vector:

auto output=make_permutation_iterator(data, make_transform_iterator(make_counting_iterator(0), _1%N));
1. *(output+i) is *(data+(i%N)).
2. You can get as many tiles as you want by iterating.
3. tiled_range3.cu also constructs an iterator for a virtual input vector (in this case a vector of squares) instead of storing the data:
auto data = make_transform_iterator(make_counting_iterator(0), _1*_1);
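A rough plain-C++ analogy (not Thrust) of the two virtual vectors: nothing is stored, and each element is computed when it is read. TiledSquares, data and output are hypothetical names, and N is assumed to be positive.

```cpp
// Hypothetical sketch of the virtual input and output vectors:
// data(i)   plays make_transform_iterator(make_counting_iterator(0), _1*_1)
// output(i) plays make_permutation_iterator(data, ... _1%N), i.e. data(i%N).
struct TiledSquares {
    int N;  // tile length, assumed > 0
    int data(int i) const { return i * i; }          // virtual vector of squares
    int output(int i) const { return data(i % N); }  // *(output+i) == *(data+(i%N))
};
```

With N = 3, output(0), output(1), output(2), output(3), ... reads 0, 1, 4, 0, 1, 4, ..., for as many tiles as you care to iterate over.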
tiled_range5.cu shows how to use a lambda instead of the _1 notation:

auto output=make_permutation_iterator(data, make_transform_iterator(make_counting_iterator(0), [](const int i){return i%N;} ));
1. You have to compile with --std c++11 (or a later standard).
2. This can be rewritten thus:
  
  auto f = [](const int i){return i%N;};
  auto output = make_permutation_iterator(data, make_transform_iterator(make_counting_iterator(0), f));
3. A minimal lambda takes no arguments and does nothing:
  
  auto f = [](){};  // even shorter: []{} (an empty parameter list may be omitted)
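A standard-C++ sketch of why the lambda above can use N with an empty capture list []. This assumes N is a compile-time constant, which the notes do not show. mod_n and minimal_lambdas are hypothetical names, and no Thrust is involved.

```cpp
// Compile with -std=c++11 or later.

// Hypothetical helper: evaluates the lambda from tiled_range5.cu at x.
int mod_n(int x) {
    const int N = 4;  // constant expression, so the captureless [] lambda may read it
    // If N were a plain (non-const) int, the lambda would need [N] or [=].
    auto f = [](const int i) { return i % N; };
    return f(x);
}

// The minimal lambda from the notes, plus its even shorter form.
void minimal_lambdas() {
    auto f = [](){};
    auto g = []{};  // an empty parameter list may be omitted
    f();
    g();
}
```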
repeated_range2.cu is my improvement on repeated_range.cu:

auto output=make_permutation_iterator(data.begin(), make_transform_iterator(make_counting_iterator(0), _1/3));
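1. make_transform_iterator(make_counting_iterator(0), _1/3) returns an iterator over the sequence 0,0,0,1,1,1,2,2,2, ..., so output reads data[0], data[0], data[0], data[1], ..., i.e., each element of data repeated 3 times.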
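The same mapping as a sequential C++ loop (not Thrust). repeat_each is a hypothetical name, and the repeat count R is a parameter here, where the notes hard-code 3.

```cpp
#include <cstddef>
#include <vector>

// Sequential sketch of repeated_range2.cu's output: out[i] = data[i / R],
// so the index map is 0,...,0,1,...,1,... with R copies of each index.
std::vector<int> repeat_each(const std::vector<int>& data, std::size_t R) {
    std::vector<int> out(data.size() * R);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = data[i / R];  // integer division plays the role of _1/3
    return out;
}
```

For example, repeat_each({5, 6}, 3) gives {5, 5, 5, 6, 6, 6}. The Thrust version avoids materializing out by reading through the permutation iterator instead.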