PAR Class 15, Thu 2022-03-03
1 NVIDIA GTC conference
It runs March 21-24.
It is mostly free.
I encourage you to browse around.
There will probably be a homework or two based on this.
2 Types of virtualization
There are many possible levels of virtualization.
At the lowest level, one might emulate the hardware. This is quite flexible but usually too slow.
-
At a higher level, a basic OS runs separate virtual machines, each with its own file system.
Harmless machine instructions execute normally.
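Powerful (privileged) ones are trapped and emulated by the host.
This requires a properly designed instruction set, one in which every sensitive instruction traps when run without privilege.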
-
IBM has been doing this commercially for about 50 years, starting with something originally called CP/CMS.
I think that IBM lucked out with their instruction set design, and didn't plan it.
-
Well-behaved clients might have problematic code rewritten before it runs, to speed execution.
I think that VMware does that.
It seems that compute-intensive clients might have almost no overhead.
However, the emulated file system can be pretty slow.
With VMware, several clients can all be different OSs, and the host can be any compatible OS.
E.g., I've had a Linux VMware host simultaneously running both Linux and Windows clients.
In Linux, root no longer has infinite power.
-
The next level of virtualization has a nontrivial host OS, but separates the clients from each other.
They see a private view of the process space, file system, and other resources.
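This is lighter weight: a client starts faster and has less overhead.
The host and client must use the same OS kernel, because they share it.
This is usually called OS-level virtualization or containerization. (Paravirtualization is something different: the guest OS is modified to call the hypervisor directly.)
Linux supports this with things like namespace isolation and control groups (cgroups). Wikipedia et al. describe this.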
-
Ubuntu snaps do something like this.
E.g., Firefox is distributed as a snap to increase isolation and security.
However, starting Firefox now takes 15 sec.
-
The next level up is normal Linux security.
You can see all the processes and similar resources.
The file system has the usual protections.
This is hard to make secure when doing something complicated.
How do I protect myself from Firefox going bad?
It's easy to describe what it should be allowed to do, but almost impossible to implement.
That remains true even with tools such as AppArmor.
Who guards the guards? I get spammed at a unique address that I used only to register with AppArmor.
In theory, an app packaged in a virtual machine has fewer external dependencies and is more secure.
You can run the VM without changes on different hosts.
A VMware client can run without change on both Linux and Windows hosts.
You can run a client on your own hardware, then spill over to a commercial cloud platform when necessary.
3 Docker
Docker is a popular lightweight virtualization system, which NVIDIA uses to distribute software.
Docker runs images that define lightweight virtual machines (containers).
Docker images share resources with the host, in a controlled manner.
For simple images (nvidia/cuda is not one of them), starting the image is so cheap that you can do it to run a single command, and wrap the whole process in a shell function.
Docker is worth learning, apart from its use by NVIDIA for parallel computing. You might also look up Kubernetes.
-
More info:
I installed Docker on parallel to run NVIDIA images like pgc++. Then I removed it because it was unnecessary, complicated, and insecure.
4 Several forms of C++ functions
-
Traditional top-level function
auto add(int a, int b) { return a+b;}
You can pass this to another function. This really passes a pointer to the function, so the compiler often cannot inline or optimize across the call.
These have global scope.
Note the auto return type. It's underused.
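As a minimal sketch of passing a top-level function to another function: the helper names fold and sum_demo below are hypothetical, chosen only for illustration. The call through f is an indirect call through a function pointer.

```cpp
#include <vector>

// Traditional top-level function; `auto` lets the compiler deduce
// the return type (int here). Requires C++14 or later.
auto add(int a, int b) { return a + b; }

// Hypothetical helper: combines all elements of v with the function
// that f points to, starting from init.
int fold(const std::vector<int>& v, int init, int (*f)(int, int)) {
    int acc = init;
    for (int x : v) {
        acc = f(acc, x);  // indirect call through the function pointer
    }
    return acc;
}

// Sums 1..4 by passing `add`; the function name decays to a pointer.
int sum_demo() {
    return fold({1, 2, 3, 4}, 0, add);
}
```

Here sum_demo() returns 10. Because fold sees only a pointer, it cannot know at compile time which function it will call, unless the compiler manages to inline fold itself.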
-
Overload operator() in a new class
Each object of the class is a different function. The function can use the values stored in its object. This is a closure.
This is local to the containing block.
This form optimizes well.
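A small sketch of this form, using standard C++ only. The names AddOffset and shifted are made up for this example.

```cpp
#include <algorithm>
#include <vector>

// A class whose objects can be called like functions.
// Each object stores its own offset, so each object is a different function.
struct AddOffset {
    int offset;
    explicit AddOffset(int o) : offset(o) {}
    int operator()(int x) const { return x + offset; }
};

// Adds `by` to every element. std::transform is a template, so it is
// instantiated for the exact type AddOffset and the call can be inlined.
std::vector<int> shifted(std::vector<int> v, int by) {
    std::transform(v.begin(), v.end(), v.begin(), AddOffset(by));
    return v;
}
```

For example, AddOffset add3(3); makes add3(4) evaluate to 7, while AddOffset(10)(4) evaluates to 14: same class, two different functions.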
-
Lambda, or anonymous function.
auto add = [](int a, int b) { return a+b;};
This is local to the containing block.
This form optimizes well.
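A lambda can also capture local variables, which makes it a closure with less typing than a hand-written class. The following is a sketch in standard C++; the function names scaled and add_demo are illustrative only.

```cpp
#include <algorithm>
#include <vector>

// Multiplies every element by `factor`, using a lambda that captures
// `factor` by value from the enclosing block.
std::vector<int> scaled(std::vector<int> v, int factor) {
    auto scale = [factor](int x) { return x * factor; };
    std::transform(v.begin(), v.end(), v.begin(), scale);
    return v;
}

// The lambda from the text, defined and called inside one block.
int add_demo() {
    auto add = [](int a, int b) { return a + b; };
    return add(2, 3);
}
```

The compiler turns each lambda into an unnamed class with an operator(), so it is handled much like the class form.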
-
Placeholder notation.
As an argument to, e.g., Thrust's transform, with the names from namespace thrust::placeholders in scope, you can do this:
transform(..., _1+_2);
This is nice and short.
As this is implemented by overloading the operators, the syntax of the expression is limited to what was overloaded.
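To see why the syntax is limited, here is a toy version of the idea in standard C++. This is NOT how Thrust implements its placeholders; the namespace toy and everything in it are assumptions made up for this sketch. Only operator+ is overloaded, so only + can appear in an expression.

```cpp
#include <algorithm>
#include <type_traits>
#include <vector>

namespace toy {  // toy sketch, not Thrust's real implementation

struct Expr {};  // tag marking types that take part in expressions

// Placeholder for the first argument: when called, returns it.
struct First : Expr {
    template <class A, class B> A operator()(A a, B) const { return a; }
};
// Placeholder for the second argument.
struct Second : Expr {
    template <class A, class B> B operator()(A, B b) const { return b; }
};

// Node representing "lhs + rhs"; calling it evaluates both sides and adds.
template <class L, class R>
struct Sum : Expr {
    L lhs;
    R rhs;
    Sum(L l, R r) : lhs(l), rhs(r) {}
    template <class A, class B>
    auto operator()(A a, B b) const { return lhs(a, b) + rhs(a, b); }
};

// Writing _1 + _2 does not add anything yet; it builds a Sum object.
template <class L, class R,
          class = std::enable_if_t<std::is_base_of<Expr, L>::value &&
                                   std::is_base_of<Expr, R>::value>>
Sum<L, R> operator+(L l, R r) { return Sum<L, R>(l, r); }

const First _1{};
const Second _2{};

}  // namespace toy

// Elementwise sum; assumes b has at least as many elements as a.
std::vector<int> add_vectors(const std::vector<int>& a,
                             const std::vector<int>& b) {
    std::vector<int> out(a.size());
    std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                   toy::_1 + toy::_2);
    return out;
}
```

In this toy, toy::_1 * toy::_2 would not compile, because no operator* was overloaded. A real library overloads many more operators, but the same kind of limit applies to anything it did not overload.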
5 Thrust - 1
Thrust is an API that looks like STL. Its backend can be GPU-CUDA, OpenMP, TBB, or sequential host-based code.
Functional-programming philosophy.
Easier programming, once you get used to it.
Source code is compact.
Compiled code runs efficiently.
Uses some unusual C++ techniques, like overloading operator().
Since the Stanford slides were created, Thrust has adopted unified addressing, so that pointers know whether they are host or device.
-
Links:
https://developer.nvidia.com/thrust Developer doc
-
Introduction to GPU Programming with CUDA and Thrust 1:18:19
All of the preceding contain other interesting links.
6 Just using parallelism
You just want to run a big application, and figure a parallel computer might be useful.
You don't care about the theory.
Then try to implement your application using existing parallel apps like MATLAB or Mathematica.
Parallel matrix ops and Fourier transforms are already available.