Introduction

In the map overlay problem, two maps, or planar graphs, are combined to make a third, each of whose polygons represents the intersection of one polygon from the first input map with one from the second. For instance, if the first map's polygons are countries and the second watersheds, then one output polygon might be that part of Canada that drains into Hudson Bay. The overlay problem is one of the more difficult computational issues in GIS since the algorithm is complex, the data are large, and the numerical inaccuracy of computers can confound even correct algorithms, White[ ]. Thematic map overlays were used by the Irish Railway Commissioners in about the 18 century, Bartlett[ ]. For a history of other manual methods, see Coppack[ ]. The first operational GIS to use overlay was Roger Tomlinson's CGIS. There have been several theses on overlay, Aronson[ ], Guevara[, Lam ], Wagner[. See Marble ] for some of the broader implications of overlay.

Often the overlay operation is performed only to find some property of the output polygons, such as their area, or even just to list the non-empty ones. The polygons' actual vertices and edges are not used otherwise. Nevertheless, the usual method is to find the overlay polygons first, and then to calculate the desired properties.

For example, assume that we need the population of each hydrographic polygon in the USA, but have only the population of each county. One reasonable method to find the population of one particular hydrographic polygon is areal interpolation, as follows. Find each county that it overlaps, the overlapping area with each county, and then the percentage of each county that is overlapped. Assign the same percentage of the county's population to the hydrographic polygon. This does assume that each county's population is uniformly distributed, which is false, but perhaps approximately correct.

Our algorithm, OVERPROP, presents a simpler method for finding properties of overlay polygons. It is much more efficient, can process the largest databases, and is less sensitive to numerical errors since they cannot cause topological difficulties. It is based on our simpler, local topological data structures for representing polygons and polyhedra. Its speed comes from the uniform grid data structure. These concepts have been described in the context of computer aided design (CAD) most recently in Franklin[ ]. A preliminary description of this algorithm and implementation were presented in Franklin[ ]; this paper contains tests of an improved implementation on larger maps and more details. Wu[ ] has done overlaying with rational numbers in Prolog, and Sun[ and Sivaswami ] have looked at related issues. One parallel implementation of the uniform grid is Hopkins[ ].

OVERPROP is fast enough to be used dynamically during a data investigation. It is not necessary explicitly to overlay all the data layers at the start of a job, as some commercial GIS packages do, unless the complete overlay is needed for other reasons, such as display, which is often the case.

Monte Carlo methods could be used approximately to calculate overlay areas. However, the accuracy improves with the square root of the number of sample points, which makes high accuracy infeasible. Also Monte Carlo methods do not extend to determining other properties, such as perimeter.

The rest of the paper is as follows. First, we will see the logical data structures, and the theoretical formulæ underlying the algorithm. We give the algorithm in two stages. First, a brief conceptual description ignores efficiency considerations. Then we see a more detailed description. Finally, we see several deeper issues, such as controlling numerical errors, design tradeoffs, and several possible extensions.

Next: Major Data Structures Up: Calculating the Area of Previous: List of Figures

Wm Randolph Franklin
Wed Dec 14 14:03:28 EST 1994