W Randolph Franklin home page
... (old version) Login

WRF Home Page

Keynote talk at GeoInfo 2013, XIV Brazilian Symposium on GeoInformatics

(Conference site.)

This talk will survey specialized algorithms, libraries, and development environments to facilitate processing huge geoinformatic databases on modern hardware.

Modern hardware is generally parallel, with the most economical platform, with the cheapest entry point, being NVidia GPUs. Desktop computers with current graphics cards, and even most laptop computers sold today, contain this platform, which was intended for realistic graphics, but which is also accessible by a programmer. Now, almost anyone can access a platform running thousands of parallel threads; the major problem being, "now what?".

Large datasets prefer data structures and algorithms like the uniform grid with linear expected compute time and I/O performance. With dataset sizes lying farther out on the asymptotic size-cost curve, even N*log(N) cost is prohibitive.

Modern hardware is often parallel, which favors simple regular data structures and algorithms. Modern hardware may have hundreds of gigabytes of main memory, which is surprisingly affordable. Despite being ignored by most researchers, it permits internal algorithms that access the data randomly, in contrast to the more limited external algorithms, to solve large problems.

Sophisticated development environments like Matlab and Mathematica can increase research productivity. They permit ideas to be realized in fewer lines of code, and may embed the best numerical techniques with implementation details hidden. This optimizes the often most important cost, the researcher's own time.

Specialized libraries can solve difficult problems and enhance productivity. CGAL, the Computational Geometry Algorithms Library, enables computing in the field of rational numbers extended with square roots. Here, lines and circles may intersect without any roundoff error (at a considerable cost in execution time). The Thrust package makes CUDA easier to use, and is even efficient. CUSP uses CUDA to process sparse matrices.

However, life is not perfect with these tools. They often do not play well with each other. The CUDA compiler can't parse C++11. GPUs have only modest amounts of high-speed memory. Tools developed for the current hardware may be buggy, badly documented, and not work on the latest hardware. Once you have committed to using a tool, switching away is difficult. Tools may become orphans whose development ceases. Commercial tools, with their captive audiences, get ever more expensive. In some domains, such as C++ compilers, the free tools are so good that the commercial tools have almost vanished.

A selection of these tools should be part of any productive GIS researcher's toolkit. However, since they do have learning curves, a modest amount of agency money spent producing tutorial material and providing programming support might increase the community's productivity.