Here are some techniques that have served me well in writing efficient programs. I learned to program on small slow machines; the standard Computer Science machine in grad school, on which much of the original famous research was done in the 1970s, was a PDP-10 with about 1MB of memory running under 1 MIPS. Paradoxically, when running large problems on current machines, efficiency is still important because external storage is perhaps {$10^5$} slower than main memory. Also, with larger {$N$}, you are farther out on the asymptotic curve. When {$N=10^9$}, as for some of my implementations, {$\lg N$} is a very large factor, i.e., 30.
My implementations include:
- Finding Connected Components on a {$ 1000\times1000\times1000 $} array of bits in {$E^3$}.
- Preprocessing almost {$2\cdot10^8$} points in {$E^3$} to support efficient nearest neighbor queries.
- Computing the volume and surface area of the Union of {$2\cdot10^7$} cubes.
Here are the tips.
- Do not dynamically allocate, 1-by-1, millions of elements, such as points. That appears to take superlinear time. Either allocate one array of them, or better:
- Allocate storage at compile time. Advantages:
- Fewer pointers are needed, e.g., for multi-dimensional arrays.
- The storage overhead for heap objects is avoided.
- Those 2 items allow more data to be stored in core.
- Addresses are known at compile time, allowing possible optimizations.
- If the size of, say, a grid, is variable, then allocate a maximum size at compile time, and recompile the program if that is too small.
- Then you can use boost arrays, which are especially good for multi-dimensional arrays. There are other good things in boost; study it.
- Also study the several books on C++ programming techniques.
- Rather than using a pointer to a small object, consider storing the object locally in place of the pointer. E.g., consider a balanced binary search tree of 4-byte objects. With {$N$} objects, there are {$N-1$} internal nodes and {$2N-2$} internal pointers, at 4 bytes each. Note that, as {$N\rightarrow\infty$}, the pointers take twice as much space as the objects. Alternatively, you could pack the objects sequentially, w/o any pointers, saving 2/3 of the total space. If the tree is not balanced, you have to store its topology, which takes 2 bits per internal node. In either case, however, access time rose from {$\log N$} to {$N$}. You can replace the bottom layer of pointers by storing the objects inside the bottom layers of internal nodes, in place of the pointers to the objects. That deletes 1/2 of the pointers and 1/3 of the total space. Access time, including insertion and deletion, is still {$\log N$}. You need 1 bit per field to distinguish a pointer from an object. A similar process is possible for the bottom {$k$} levels of the tree, to trade off time and space. The result is a pointer-based tree of smaller compact trees.
- With millions of objects, minimize the size of each object, e.g. with short int instead of int and float instead of double (when reasonable).
- Instead of using a pointer to one of many objects of the same type, if they are all part of a global array, use the index in that array. This may not be faster, but is easier to debug, since the index is a smaller number and so easier to understand.
- Here's how I optimize (in time and space) an array of sets of
objects, such as a grid with a different number of points in each
cell.
- Read the points in, and keep a count of the number in each cell. Do not store the points yet.
- Allocate a ragged array just big enough.
- Read the points a second time and populate the ragged array.
- My method uses no pointers, and so is smaller. For 4-byte objects, the space reduction is a factor of 2.
- The points in each cell can be randomly accessed in constant time since you don't have to trace along a linked list.
- However, objects cannot be added later; the data structure is constant after creation.