PAR Syllabus

1   Catalog info

Titles:
  • ECSE-4740-01 Applied Parallel Computing for Engineers, CRN 39207
Semesters:

Spring term annually

Credits:

3 credit hours

Time and place:

Wed 3-5, JEC6115

2   Description

  1. This is a computer engineering course intended to provide students with knowledge and hands-on experience in developing application software for affordable parallel processors. It will cover hardware that any lab can afford to purchase, and the software that, in the prof's opinion, is the most useful. There will also be some theory.
  2. The target audiences are ECSE seniors and grads and others with comparable background who wish to develop parallel software.
  3. This course will have minimal overlap with parallel courses in Computer Science. We will not teach the IBM BlueGene, because it is so expensive, nor cloud computing and MPI, because most big data problems are in fact small enough to fit on our hardware.
  4. You may usefully take all the parallel courses at RPI.
  5. The unique features of this course are as follows:
    1. Use of only affordable hardware that any lab might purchase, such as Nvidia GPUs. This is currently the most widely used and least expensive parallel platform.
    2. Emphasis on learning several programming packages, at the expense of theory. However you will learn a lot about parallel architecture.
  6. Hardware taught, with reasons:
    Multicore Intel Xeon:
      universally available and inexpensive, comparatively easy to program, powerful
    Intel Xeon Phi: affordable, potentially powerful, somewhat harder to program
    Nvidia GPU accelerator:
      widely available (Nvidia external graphics processors are on 1/3 of all PCs), very inexpensive, powerful, but harder to program. Good cards cost only a few hundred dollars.
  7. Software that might be taught, with reasons:
    OpenMP C++ extension:
      widely used, easy to use if your algorithm is parallelizable, backend is multicore Xeon.
    Thrust C++ functional programming library:
      FP is nice, hides low level details, backend can be any major parallel platform.
    MATLAB: easy-to-use parallelism for operations that MathWorks has implemented in parallel, etc.
    Mathematica: interesting, powerful front end.
    CUDA C++ extension and library for Nvidia:
      low level access to Nvidia GPUs.
  8. The techniques learned here will also be applicable to larger parallel machines -- number 3 on the top 500 list uses NVIDIA GPUs, while number 2 uses Intel Xeon Phis. (Number 4 is a BlueGene.)
  9. Effectively programming these processors will require in-depth knowledge about parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors.
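The flavor of the course shows even in a few lines of OpenMP, the first package listed above. The following is a minimal sketch (the function name `parallel_sum` is my own illustration, not course material); compiled without OpenMP support, the pragma is ignored and the code runs serially with the same result:

```cpp
#include <vector>

// Sum the elements of v. Compile with OpenMP enabled (e.g., g++ -fopenmp)
// to run the loop across all cores; without the flag the pragma is
// simply ignored and the loop runs serially, producing the same answer.
double parallel_sum(const std::vector<double>& v) {
    double sum = 0.0;
    // reduction(+:sum) gives each thread a private partial sum and
    // combines them at the end, avoiding a data race on sum.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        sum += v[i];
    return sum;
}
```

Note how the parallelism is expressed as one annotation on an ordinary serial loop; that "easy if your algorithm is parallelizable" quality is why OpenMP is taught first.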

3   Prerequisite

ECSE-2660 CANOS or equivalent, knowledge of C++.

4   Instructors

4.1   Professor

W. Randolph Franklin. BSc (Toronto), AM, PhD (Harvard)

Office:

Jonsson Engineering Center (JEC) 6026

Phone:

+1 (518) 276-6077 (forwards)

Email:

frankwr@rpi.edu

Email is my preferred communication medium.

Sending from a non-RPI account is fine (and is what I often do). However, please use an account that shows your name, at least in the comment field. A subject prefix of PAR is helpful. GPG encryption is welcomed.

Web:

https://wrf.ecse.rpi.edu/

Office hours:

After each class, usually as long as anyone wants to talk. Also by appointment. That means that if you write me, we can probably meet in the next day or two.

Informal meetings:
If you would like to lunch with me, either individually or in a group, just mention it. We can then talk about most anything legal and ethical.

5   Course websites

The homepage has lecture summaries, syllabus, homeworks, etc.

6   Reading material

6.1   Text

There is no required text, but the following inexpensive books may be used.

  1. Sanders and Kandrot, CUDA by Example. It gets excellent reviews, although it is several years old. Amazon has many options, including Kindle editions and hardcopy rentals.
  2. Kirk and Hwu, Programming Massively Parallel Processors, 2nd edition. It concentrates on CUDA.

One problem is that even recent books may be obsolete. For instance they may ignore the recent CUDA unified memory model, which simplifies CUDA programming at a performance cost. Even if the current edition of a book was published after unified memory was released, the author might not have updated the examples.

6.2   Web

There is a lot of free material on the web, which I'll reference, and may cache locally.

7   Computer systems used

This course will use (remotely via ssh) geoxeon.ecse.rpi.edu and parallel.ecse.rpi.edu.

Parallel has:

  1. a dual 14-core Intel Xeon E5-2660 2.0GHz
  2. 256GB of DDR4-2133 ECC Reg memory
  3. Nvidia GPU, perhaps GeForce GTX 1080 processor with 8GB
  4. Intel Xeon Phi 7120A
  5. Samsung Pro 850 1TB SSD
  6. WD Red 6TB 6 Gb/s hard drive
  7. CUDA
  8. OpenMP 4.0
  9. Thrust

Parallel is less available because I sometimes boot it into Windows to run the Vive.

Geoxeon has:

  1. Dual 8-core Intel Xeon.
  2. 128GB DRAM.
  3. Nvidia GPUs:
    • GM200 GeForce GTX Titan X
    • GK110GL Tesla K20Xm
  4. Ubuntu 17.04
  5. CUDA, Thrust, OpenMP, etc.

Geoxeon should be always available.

8   Assessment measures, i.e., grades

  1. There will be no exams.

  2. The grade will be based on a term project and class presentations.

  3. Deliverables for the term project:

    1. A 2-minute project proposal given to the class around the middle of the semester.
    2. A 5-minute project presentation given to the class in the last week.
    3. Some progress reports.
    4. A write-up uploaded on the last class day. This will contain an academic paper, code and perhaps video or user manual.

8.1   Term project

  1. For the latter part of the course, most of your homework time will be spent on a term project.
  2. You are encouraged to do it in teams of up to 3 people. A team of 3 people would be expected to do twice as much work as 1 person.
  3. You may combine this with work for another course, provided that both courses know about this and agree. I always agree.
  4. If you are a grad student, you may combine this with your research, if your prof agrees, and you tell me.
  5. You may build on existing work, either your own or others'. You have to say what's new, and have the right to use the other work. E.g., using any GPLed code or any code on my website is automatically allowable (because of my Creative Commons licence).
  6. You will implement, demonstrate, and document something vaguely related to parallel computing.
  7. You will give a 5 minute fast forward Powerpoint talk in class. A fast forward talk is a timed Powerpoint presentation, where the slides advance automatically.
  8. You may demo it to the TA if you wish.

8.1.1   Size of term project

It's impossible to specify how many lines of code make a good term project. E.g., I take pride in writing code that can be simultaneously shorter, more robust, and faster than other implementations. See my 8-line program for testing whether a point is in a polygon: Pnpoly.
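The crossing-number idea behind Pnpoly can be sketched as follows. This is a paraphrase, not the published code; the function name `in_polygon` and the vector interface are my own:

```cpp
#include <vector>

// Even-odd (crossing-number) point-in-polygon test, in the spirit of
// Pnpoly: cast a horizontal ray rightward from the test point (tx, ty)
// and flip a flag at each polygon edge the ray crosses; an odd number
// of crossings means the point is inside.
bool in_polygon(const std::vector<double>& vx, const std::vector<double>& vy,
                double tx, double ty) {
    bool inside = false;
    const std::size_t n = vx.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        // Count edge (j,i) only if it straddles the ray's height and the
        // ray hits the edge to the right of the test point.
        if (((vy[i] > ty) != (vy[j] > ty)) &&
            (tx < (vx[j] - vx[i]) * (ty - vy[i]) / (vy[j] - vy[i]) + vx[i]))
            inside = !inside;
    }
    return inside;
}
```

The point is not the line count but the density: the entire algorithm is one loop with one carefully constructed condition.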

According to Big Blues, when Bill Gates was collaborating with IBM around 1980, he once rewrote a code fragment to be shorter. However, according to the IBM metric, number of lines of code produced, he had just caused that unit to officially do negative work.

8.1.2   Deliverables

  1. An implementation showing parallel computing.
  2. An extended abstract or paper on your project. You should follow the style guide for some major conference (I don't care which, but I can point you to one).
  3. A more detailed manual, showing how to use it.
  4. A talk in class.

A 10-minute demonstration to the TA is optional. If you do one, she will give me a modifier of up to 10 points either way; i.e., a good demo will help, a bad one will hurt.

8.2   Early warning system (EWS)

As required by the Provost, we may post notes about you to EWS, for example, if you're having trouble doing homeworks on time, or miss an exam. E.g., if you tell me that you had to miss a class because of family problems, then I may forward that information to the Dean of Students office.

9   Academic integrity

See the Student Handbook for the general policy. The summary is that students and faculty have to trust each other. After you graduate, your most important possession will be your reputation.

Specifics for this course are as follows.

  1. You may collaborate on homeworks, but each team of 1 or 2 people must write up the solution separately (one writeup per team) using their own words. We willingly give hints to anyone who asks.
  2. The penalty for two teams handing in identical work is a zero for both.
  3. You may collaborate in teams of up to 3 people for the term project.
  4. You may get help from anyone for the term project. You may build on a previous project, either your own or someone else's. However you must describe and acknowledge any other work you use, and have the other person's permission, which may be implicit. E.g., my web site gives a blanket permission to use it for nonprofit research or teaching. You must add something creative to the previous work. You must write up the project on your own.
  5. However, writing assistance from the Writing Center and similar sources is allowed, if you acknowledge it.
  6. The penalty for plagiarism is a zero grade.
  7. Cheating will be reported to the Dean of Students Office.

10   Student feedback

Since it's my desire to give you the best possible course in a topic I enjoy teaching, I welcome feedback during (and after) the semester. You may tell me in person, write to me or the TA, or contact a third party, such as Prof Gary Saulnier, the ECSE undergrad head, or Prof Mike Wozny, the ECSE Dept head.