PAR Lecture 27, Thurs Apr 27

W Randolph Franklin, RPI

2017-04-24 00:00

Table of contents

1 Course recap
2 Course survey
3 Term project deliverables
4 Parallel computing videos

1 Course recap

My teaching style is to work from particulars to the general.
You've seen OpenMP, a tool for shared memory parallelism.
You've seen the architecture of NVidia's GPU, a widely used parallel system, and CUDA, a tool for programming it.
You've seen Thrust, a tool on top of CUDA, built in the C++ STL style.
You've seen how widely used numerical tools like BLAS and FFT have versions built on CUDA.
You've had a chance to program in all of them on parallel.ecse, with dual 14-core Xeons, a Pascal NVidia board, and a Xeon Phi coprocessor.
You've seen talks by leaders in high performance computing, such as Jack Dongarra.
You've seen quick references to parallel programming using Matlab, Mathematica, and the cloud.
Now, you can inductively reason towards general design rules for shared and non-shared parallel computers, and for the SW tools to exploit them.

2 Course survey

If you liked the course, then please officially tell RPI by completing the survey. Thanks.

3 Term project deliverables

3.1 Talk in class on Monday May 1

5 minutes per team, plus 2 minutes for questions.

3.2 Project report due by 11:59pm on Wed May 3

Upload it to LMS, or upload a note saying from where to retrieve it.
This should contain:
1. an academic paper in the style of a good computing conference, or in the style of a paper published in a professional journal of the ACM or IEEE,
2. the code,
3. a video or user manual showing how to use it, and
4. a description of how it works, problems you solved, etc.

3.3 Optional demo to Yin Li

Schedule this with him.
If you want to do this, then spend time to make it good.
Try to schedule it by May 3 (although a little slippage might be ok).

3.4 Inspiration for finishing your term projects

The Underhanded C Contest

"The goal of the contest is to write code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function. To be more specific, it should do something subtly evil. Every year, we will propose a challenge to coders to solve a simple data processing problem, but with covert malicious behavior. Examples include miscounting votes, shaving money from financial transactions, or leaking information to an eavesdropper. The main goal, however, is to write source code that easily passes visual inspection by other programmers."
The International Obfuscated C Code Contest
https://www.awesomestories.com/asset/view/Space-Race-American-Rocket-Failures

Moral: After early disasters, sometimes you can eventually get things to work.
The 'Wrong' Brothers Aviation's Failures (1920s)
Early U.S. rocket and space launch failures and explosion
Numerous US Launch Failures

4 Parallel computing videos

We'll see some subset of these.

4.1 Welcome Distributed Systems in One Lesson

Niko Peikrishvili, 11 min

https://www.youtube.com/watch?v=T9ej3NcE2gQ

This is the first of several short videos.

4.2 Paying for Lunch: C++ in the ManyCore Age

CppCon 2014: by Herb Sutter

https://www.youtube.com/watch?v=AfI_0GzLWQ8

Published on Sep 29, 2014, 1h15m

http://www.cppcon.org

Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/CppCon/CppCon2014

Concurrency is one of the major focuses of C++17 and one of the biggest challenges facing C++ programmers today. Hear what this panel of experts has to say about how to write concurrent C++ now and in the future.

MODERATOR: Herb Sutter - Author, chair of the ISO C++ committee, software architect at Microsoft.

SPEAKERS:

PABLO HALPERN - Pablo Halpern has been programming in C++ since 1989 and has been a member of the C++ Standards Committee since 2007. He is currently the Parallel Programming Languages Architect at Intel Corp., where he coordinates the efforts of teams working on Cilk Plus, TBB, OpenMP, and other parallelism languages, frameworks, and tools targeted to C++, C, and Fortran users. Pablo came to Intel from Cilk Arts, Inc., which was acquired by Intel in 2009. During his time at Cilk Arts, he co-authored the paper "Reducers and other Cilk++ Hyperobjects", which won best paper at the SPAA 2009 conference. His current work is focused on creating simpler and more powerful parallel programming languages and tools for Intel's customers and promoting adoption of parallel constructs into the C++ and C standards. He lives with his family in southern New Hampshire, USA. When not working on parallel programming, he enjoys studying the viola, skiing, snowboarding, and watching opera. Twitter handle: @PabloGHalpern

JARED HOBEROCK - Jared Hoberock is a research scientist at NVIDIA where he develops the Thrust parallel algorithms library and edits the Technical Specification on Extensions for Parallelism for C++. Website: http://github.com/jaredhoberock

ARTUR LAKSBERG - Artur Laksberg leads the Visual C++ Libraries development team at Microsoft. His interests include concurrency, programming language and library design, and modern C++. Artur is one of the co-authors of the Parallel STL proposal; his team is now working on the prototype implementation of the proposal.

ADE MILLER - Ade Miller writes C++ for fun. He wrote his first N-body model in BASIC on an 8-bit microcomputer 30 years ago and never really looked back. He started using C++ in the early 90s. Recently, he's written two books on parallel programming with C++: "C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++" and "Parallel Programming with Microsoft Visual C++". Ade spends the long winters in Washington contributing to the open source C++ AMP Algorithms Library as well as a few other projects. His summers are mostly spent crashing expensive bicycles into trees. Website: http://www.ademiller.com/blogs/tech/ Twitter handle: @ademiller

GOR NISHANOV - Gor Nishanov is a Principal Software Design Engineer on the Microsoft C++ team. He works on the 'await' feature. Prior to joining the C++ team, Gor was working on distributed systems in the Windows Clustering team.

MICHAEL WONG - You can talk to me about anything including C++ (even C and that language that shall remain nameless but starts with F), Transactional Memory, Parallel Programming, OpenMP, astrophysics (where my degree came from), tennis (still trying to see if I can play for a living), travel, and the best food (which I am on a permanent quest to eat). Michael Wong is the CEO of OpenMP. He is the IBM and Canadian representative to the C++ Standard and OpenMP Committee. And did I forget to say he is a Director of ISOCPP.org and a VP, Vice-Chair of Programming Languages for Canada's Standard Council. He has so many titles, it's a wonder he can get anything done. Oh, and he chairs the WG21 SG5 Transactional Memory, and is the co-author of a number of C++11/OpenMP/TM features including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. Having been the past C++ team lead to IBM's XL C++ compiler means he has been messing around with designing C++ compilers for twenty years. His current research interest, i.e. what he would like to do if he had time, is in the area of parallel programming, transactional memory, C++ benchmark performance, object model, generic programming and template metaprogramming. He holds a B.Sc from University of Toronto, and a Masters in Mathematics from University of Waterloo. He has been asked to speak at ACCU, C++Now, Meeting C++, CASCON, and many Universities, research centers and companies, except his own, where he has to listen. Now he and his wife love to teach their two children to be curious about everything.

4.3 Combine Lambdas and weak_ptrs to make concurrency easy

https://www.youtube.com/watch?v=fEnnmpdZllQ

CppCon 2016: Dan Higgins, 4min
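
As a rough illustration of the idea named in the title, here is a minimal sketch, not taken from the talk itself. A lambda captures a std::weak_ptr instead of a raw pointer or a shared_ptr, so a task that runs after its target object has been destroyed simply does nothing. The names Counter, make_increment_task, and run_on_thread are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical object that hands out asynchronous tasks (illustrative only).
struct Counter : std::enable_shared_from_this<Counter> {
    int value = 0;

    // Build a callback that holds only a weak reference to this object.
    // Must be called on an object that is owned by a std::shared_ptr.
    std::function<bool()> make_increment_task() {
        std::weak_ptr<Counter> weak = shared_from_this();
        return [weak]() -> bool {
            // lock() returns a shared_ptr that keeps the object alive for
            // the duration of the call, or an empty one if it is gone.
            if (auto self = weak.lock()) {
                ++self->value;
                return true;
            }
            return false;  // object already destroyed: skip the work
        };
    }
};

// Run a task on another thread, wait for it, and return its result.
bool run_on_thread(const std::function<bool()>& task) {
    assert(task);  // the std::function must not be empty
    bool result = false;
    std::thread worker([&result, &task] { result = task(); });
    worker.join();
    return result;
}
```

If a Counter is created with std::make_shared and a task from make_increment_task is run while the object is alive, the task increments value and returns true. After the last shared_ptr is reset, running the same task returns false instead of touching freed memory.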

4.4 A Pragmatic Introduction to Multicore Synchronization

by Samy Al Bahra.

https://www.youtube.com/watch?v=LX4ugnzwggg

Published on Jun 15, 2016, 1h 2m.

This talk will introduce attendees to the challenges involved in achieving high performance multicore synchronization. The tour will begin with fundamental scalability bottlenecks in multicore systems and memory models, and then extend to advanced synchronization techniques involving scalable locking and lock-less synchronization. Expect plenty of hacks and real-world war stories in the fight for vertical scalability. Some of the topics introduced include memory coherence and consistency, memory organization, scalable locking, biased asymmetric synchronization, non-blocking synchronization and safe memory reclamation.

4.5 Synchronization - Blocking & Non-Blocking (1/2)

by Petr Kuznetsov, 15min

https://www.youtube.com/watch?v=k8uOOvd6Uj8

4.6 Lock-Free Programming (or, Juggling Razor Blades), Part I

CppCon 2014, by Herb Sutter

Published on Oct 16, 2014

http://www.cppcon.org

Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/CppCon/CppCon2014

Example-driven talk on how to design and write lock-free algorithms and data structures using C++ atomic -- something that can look deceptively simple, but contains very deep topics. (Important note: This is not the same as my "atomic Weapons" talk; that talk was about the "what they are and why" of the C++ memory model and atomics, and did not cover how to actually use atomics to implement highly concurrent algorithms and data structures.)
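
To give a flavor of what such code looks like, here is a minimal sketch of a lock-free stack built on std::atomic; it is an illustration under stated assumptions, not code from the talk. The names Node, LockFreeStack, and concurrent_push_count are hypothetical. The stack supports only push and a "take everything" operation, which avoids the ABA and memory reclamation problems that a single-element pop would have to solve.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical node type for the illustration.
struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
public:
    ~LockFreeStack() { pop_all(); }  // free any nodes still on the stack

    // Push by retrying a compare-and-swap on the head pointer.
    void push(int value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak stores the current head into
        // node->next, so the loop simply retries with the fresh value.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Detach the whole list in one atomic exchange and return its values,
    // most recently pushed first.
    std::vector<int> pop_all() {
        Node* node = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<int> out;
        while (node != nullptr) {
            out.push_back(node->value);
            Node* next = node->next;
            delete node;
            node = next;
        }
        return out;
    }

private:
    std::atomic<Node*> head_{nullptr};
};

// Push per_thread items from each of `threads` threads concurrently and
// return how many items ended up on the stack.
std::size_t concurrent_push_count(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    LockFreeStack stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&stack, per_thread] {
            for (int i = 0; i < per_thread; ++i) stack.push(i);
        });
    }
    for (auto& w : workers) w.join();
    return stack.pop_all().size();
}
```

No push is lost even when several threads race on the head pointer, because a failed compare-and-swap just retries. For example, concurrent_push_count(4, 1000) should report 4000 items.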