W Randolph Franklin home page



This page describes GeoXeon, our GPU server. Its purpose is to do exciting research that we couldn't do before. Parts of this description are a little outdated.

System

  1. The name is geoxeon.ecse.rpi.edu.
  2. GeoXeon arrived on 3/20/13.
  3. It was bought from Colfax International, except for the monitor, which came from the RPI computer store.
    Colfax has an online configurator, allowing you to see the price for different options.
  4. This is the quote: geoxeon-quote-feb2013.pdf .
  5. This is Colfax's general description: http://www.colfax-intl.com/nd/workstations/sxt8600.aspx# .
  6. The cost was $13,576.93 (incl $200 shipping) to Colfax plus about $700 to the RPI computer store for the monitor, paid for by my NSF terrain grant.
  7. The HW is:
    1. Supermicro X9DRG-QF motherboard.
    2. Dual Intel Xeon E5-2687 3.1GHz 8 core / 16 thread CPUs,
    3. 128GB 1600MHz Registered ECC DDR3 memory,
    4. Dual 3TB Seagate Constellation ES.2 7200RPM 64MB cache 6.0GB/s disks,
    5. NVIDIA Tesla Kepler K20xm Computing Processor, with
      1. CUDA 3.5,
      2. 2688 CUDA processing cores,
      3. 4 TFLOPS single precision floating point,
      4. 1.3 TFLOPS double precision floating point,
      5. 6GB memory.
      Re the K20xm: m means that the card doesn't have an internal fan, but relies on external cooling. The K20x is slightly better than the K20. E.g., it has 6GB instead of 5GB of memory.
    6. NVIDIA Quadro K5000 graphics card, with
      1. CUDA 3.0,
      2. 1536 CUDA processing cores,
      3. 2.1 TFLOPS single precision floating point,
      4. 4GB memory,
      5. OpenGL 4.3.

Admin

  1. I (WRF) will initially do a lot of the maintenance.
  2. The systems programmers are:
    1. Wenli Li
    2. Salles Viana Gomes de Magalhães
  3. To sysadmins: when adding users, copy the RPI RCS user name and uid.
  4. To prospective users: send me your RCS ID and SSH public key.
    If you want a key to the lab, give me your RIN.

Access

  1. It is in JEC6115, in the back right corner.
  2. It is accessible
    1. from outside, via the VPN, if you have an RPI computer account, or
    2. from other RPI machines.
  3. For remote access, you must use an SSH key; password login is disabled.

File system

  1. All partitions except root use ZFS. So your home dir is automatically and frequently snapshotted; see /home/.zfs/snapshot/ . Also, files are transparently compressed, so do not compress your files yourself. Other partitions, like /opt, are also in ZFS. However, the root partition is not (because that would complicate booting).
    One consequence of snapshots is that large temporary files continue to take up space as long as any snapshot contains them. It's impossible to delete one file from a snapshot; the whole snapshot must be deleted.
    This is not a problem so long as we have lots of disk space. Indeed, I've also configured the pool as a ZFS mirror, so every ZFS file is stored twice.
  2. /local (linked to /usr/local) contains some extra installed SW.
  3. /opt contains more installed SW, particularly SW that wasn't installed into a fixed place by a package.
  4. /opt also contains other files intended for public consumption, such as some of my programs (in /opt/wrf).
  5. If you want to use very large files, I can create a separate filesystem that is not snapshotted. E.g., GISCup files are in /tank/giscup .

CUDA

  1. The CUDA installation is in /local/cuda/. Especially see the documentation and code samples. However, the code samples are hard to read.
  2. The sample programs are in /local/NVIDIA_CUDA-7.5_Samples/.
    Copy them to your own directory to modify or recompile.
  3. There's a lot of good material on the web; search for "CUDA tutorial". One example is a Stanford class.
  4. Thrust examples: http://docs.nvidia.com/cuda/thrust/index.html
  5. Stanford course: http://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialPrerequisites

Software

Extra SW Installed

  1. This SW is generally in /opt/ , with possible links in /local/bin .
  2. Matlab 2016a. Matlab can make good use of only half of the available CPU cores (because of licensing restrictions).
  3. Mathematica 10. Its parallel capabilities didn't impress me. However, Mathematica has excellent visualization tools, and would be a good research and prototyping environment.
  4. libQGLViewer. This is a layer on top of OpenGL that adds viewing tools. It looks very interesting for use in your programs; see the examples. It's installed to /opt/libQGLViewer/ , including the examples.
  5. CGAL. Computational Geometry Algorithms Library. Big C++ set of classes and algorithms with extensive doc and examples.
  6. boost
  7. Slicer. 3D segmentation, registration, and visualization of medical data. Installed to /opt/slicer . However, VISIT appears more general.
  8. VTK, the Visualization Toolkit. Installed to /opt/vtk . Some examples in /opt/vtk/VTK5.10.1/bin/ work. I'm still (3/28/13) trying to understand it; it's a big package.
  9. VISIT. Parallel interactive visualization tool. Installed to /opt/visit/ . It installed cleanly, but I haven't run it yet (3/28/13). It looks good for visualizing your data.
  10. Assorted other math-like libraries, used by CGAL, such as gmp and gmpxx.
  11. Intel TBB (Threaded Building Blocks). It's apparently harder to use than OpenMP but more powerful. It uses only the CPUs.
  12. /tank/home/parcomp/Class/ has files from my class on Engineering Parallel Computing Spring 2014. They include stuff on CUDA, OpenMP, etc.

Recommended SW set

  1. boost for programming.
  2. VISIT for interactively visualizing data.
  3. libQGLViewer for programming visualizations into your code.
  4. OpenMP for parallel programming on the CPUs.
  5. Thrust for simple parallel programming on the GPUs.
  6. CUDA for when Thrust isn't good enough.

Interesting SW not installed or not recommended

  1. COIN3d: partly installed. High-level graphics API, but we're not doing that.
  2. Paraview: comparable to VISIT. It is said to be more powerful but harder to use. I'm guessing that VISIT is powerful enough.
  3. MeVisLab: nice volume rendering but specialized to medical images.
  4. PETSc: looks good, does parallel sparse matrices; haven't studied it.
  5. AZTEC: looks good, does parallel sparse matrices; haven't studied it.
  6. Various other meshing and PDE tools.

My opinions about numerical SW

Here are my opinions about various choices for numerical SW, e.g., to work with matrices. This is more about libraries than standalone systems.

Packages

  1. MATLAB: often excellent, but its interface to C++ is messy.
  2. Eigen: perhaps the best free C++ template library, for mostly single-threaded computation.
  3. Boost: the numerical part is obsolete.
  4. BLAS: many versions; the non-free versions beat the free ones.
  5. Intel MKL (Math Kernel Library): fast replacements for LAPACK and BLAS for multi-threaded CPU apps.
  6. IT++: nice C++ templates for matrices; uses other packages underneath.
  7. Armadillo: nice C++ templates for matrices; uses BLAS etc. Only preliminary support for sparse matrices.
  8. OpenBLAS: multithreaded; claims to be comparable to MKL. Can use OpenMP. Reported slower than MATLAB on multithreading: http://list.rdcps.ac.cn/pipermail/openblas/2012-September/000142.html
  9. ATLAS: self-optimizing BLAS.
  10. Blaze: claims to be the fastest. Doesn't do inversion etc. Doesn't appear to be parallel.
  11. SuperLU: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
  12. AZTEC: http://www.cs.sandia.gov/CRF/aztec1.html . Solves sparse systems on parallel machines. The website was last updated in 1997.
  13. PETSc: http://www.mcs.anl.gov/petsc/ . Mostly solves PDEs (irrelevant to this project); supports GPUs. (To check.)
  14. cuBLAS: apparently the only other free GPU tool.
  15. CUSP. (To check.)

Tentative recommendations

  1. For CPUs, use Eigen on top of OpenBLAS (or Blaze?), or use PETSc.
  2. However, probably skip the above and go straight to cuBLAS.

Interesting pages

  1. OpenCV vs. Armadillo vs. Eigen vs. more! Round 3: pseudoinverse test
  2. http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
  3. http://www.netlib.org/utk/people/JackDongarra/la-sw.html - lists many sparse iterative solvers.
  4. http://en.wikipedia.org/wiki/List_of_numerical_libraries
  5. http://scicomp.stackexchange.com/questions/104/libraries-for-solving-sparse-linear-systems?rq=1
  6. http://www.cs.uoregon.edu/research/paracomp/tau/tautools/ - looks useful, profiles parallel programs.

Changelog

  1. Reformat zfs pool to be a mirror. This wastes 1/2 the disk, but we have the space. (12/15/13)
  2. Update Matlab and Mathematica to current RPI versions. (12/1/13)
  3. Installed a public version of my TIN program to /opt/wrf/tin/ . TIN greedily processes a square array of elevations, repeatedly inserting the worst point into a triangulation. Instructions are in /opt/wrf/tin/tin.help. (5/3/13)
    I was the first person in the GIS field to write a TIN program. I was an undergrad working in the summer of 1973 at Simon Fraser University under the direction of Tom Poiker (Peucker) and David Douglas. This is an updated version. It is quite fast. There are faster programs that process all the input points in a batch, but mine is greedy and incremental. There may be a faster incremental program; I'm not sure. My internal data structures are more compact than most others', so I can process larger datasets internally.
    The largest dataset processed to date is ps16k, the 16385x16385 Puget Sound dataset. That's 268,468,225 points. TIN used 85GB of memory. (Luckily geoxeon has 128GB.) In 19 elapsed seconds, TIN found the 1M most important points. In 1185 seconds, it inserted 172,279,108 points, producing 344,508,515 triangles. (Those times exclude writing the results to disk.)
    TIN can process smaller datasets faster. On the 1201x1201 Lake Champlain West USGS level-1 DEM, finding the 1000 most important points takes 0.55 seconds. Running until the max error is under 0.5, which means inserting about 500K points, takes 30 secs.
  4. Got boost working with CUDA by applying https://svn.boost.org/trac/boost/ticket/8048. (4/24/13)
  5. Installed on 4/23/13:
    1. OpenBLAS
    2. LEDA 6.3-free.
    3. Armadillo.
    4. LAPACK.
    5. ATLAS.
    6. CUSP 0.3.1, which appears to be identical to 0.3.0, including the stated version number. This might indicate an error on the download site?
    7. CUSP examples.
    8. Thrust 1.6.0
    9. CGAL 4.2.
      1. using the lapack-atlas library.
      2. w/o LEDA, which wouldn't link to X11.
      3. Use make -j30 to speed up the compilation; the examples take about a CPU-hour to compile.
  6. Change default g++ from 4.6 to 4.8 (4/21/13). Revert 4/24/13 for fear that this might mess up nvcc, which is fragile. You can use 4.8 explicitly: g++-4.8.
  7. Start Numerical SW Comments (4/14/13).
  8. Install TBB (4/13/13).
  9. Install Mint 17 Mate (approx 7/7/14).
  10. Install Ubuntu 16.04 Xenial (5/3/2016).
    1. Deleted /dev/sda4 with previous 2012 root partition (a copy is on /tank/old/...) and used it for new root.
    2. /dev/sda3 has root1404.
    3. Previous root partitions, usr/local etc, old /home:
      1. now in /tank/old/
      2. a few in /tank/{whatisthis,home-old.tar,DeletableQ}
    4. Problems powering on, as earlier.
    5. It takes several tries.
    6. Updating the IPMI fixed that. Do bios st.
  11. Install nvidia drivers. Use the run file from http://www.nvidia.com/Download/index.aspx because the deb package was refused due to an obsolete signing key.
    In case of X problems, the following might be useful.
    1. http://askubuntu.com/questions/760934/graphics-issues-after-installing-ubuntu-16-04-with-nvidia-graphics
    2. http://askubuntu.com/questions/760997/how-to-recover-from-a-nvidia-fail-on-ubuntu-16-04
    3. https://linuxconfig.org/how-to-install-the-latest-nvidia-drivers-on-ubuntu-16-04-xenial-xerus
    4. http://www.ubuntumaniac.com/2015/11/install-nvidia-linux-display-driver_22.html
    5. https://sites.google.com/site/easylinuxtipsproject/12
    6. https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
    7. https://devtalk.nvidia.com/default/topic/926383/testing-ubuntu-16-04-for-cuda-development-awesome-integration-/
    8. http://www.omgubuntu.co.uk/2016/04/10-things-to-do-after-installing-ubuntu-16-04-lts
  12. Install CUDA 7.5.
    1. Had to remove g++-5 and install g++-4.9, perhaps to compile the examples.
    2. Then reinstalled g++-5.
    3. Must set the library path etc.
  13. Install Matlab R2016a (5/2016).
    1. installing required setting: install -javaxxxx .....jre
    2. Using it with local graphics crashes.
    3. Remote use is ok; running without graphics is ok.
    4. matlab -softwareopengl is ok.
  14. Install Mathematica 10.4 (5/2016).
  15. Flashed IPMI 2.60.
    1. Now powering on works the first time.
    2. Details on request.