Terrain ideas

1 Summary

Here are some research ideas for terrain. I mostly formulated them in 2015. See also an incomplete list of my publications on terrain representation.

2 Consistently representing multiple related layers

The problem is to represent multiple related layers of data for a single map consistently, so that the relations between them survive when the map is lossily compressed and generalized. Indeed, internal consistency is more important than accuracy.

For example, consider a map with these three layers:

elevation contour lines,
stream lines, and
shore lines.

In the real world, streams almost always cross contours perpendicularly, and shore lines never cross contours. However, see the two attached images, taken from a commercial (albeit old) mapping tool. One shows a stream running diagonally down a slope; the other shows the shore of a lake crossing multiple contours.

Perhaps the contours were derived from a lossily compressed elevation matrix, while the streams and shore were stored separately as generalized polylines.

To see a possible solution strategy, consider the problem of lossily compressing a layer of contour lines. One representation is to store them individually as polylines, which are then generalized. Since each contour is treated separately from the others, there is nothing to stop generalized contours from crossing each other.

One solution is to store the elevations as a DEM, an array of elevation posts. Lossily compress the array by any means whatever; the result is still a legal elevation array. Finally, rederive the contours from that DEM; marching squares is a popular algorithm. The resulting contours will never cross each other.

We solved the problem of prohibiting crossing contours by transforming the data representation to another domain.
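A minimal sketch of this pipeline in Python with NumPy. Plain rounding stands in for the lossy compressor (an assumption; any scheme whose output is still an array would do), and `marching_squares` is a bare-bones version that resolves the ambiguous saddle case one fixed way. The point is that every contour level is re-derived from the same array, rather than generalized as an independent polyline.

```python
import numpy as np

def quantize(dem, step):
    # Stand-in lossy compressor (an assumption, not a specific codec):
    # round each post to a multiple of `step`. The result is still a legal DEM.
    return np.round(dem / step) * step

def marching_squares(dem, level):
    """Minimal marching squares: segments ((x0, y0), (x1, y1)) approximating
    the contour dem == level, with x = column and y = row."""
    segments = []
    rows, cols = dem.shape
    for i in range(rows - 1):
        for j in range(cols - 1):
            tl, tr, bl, br = (i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)
            # Edges in cyclic order (top, right, bottom, left). Each edge is oriented
            # canonically, so two cells sharing an edge compute the identical point.
            points = []
            for p, q in [(tl, tr), (tr, br), (bl, br), (tl, bl)]:
                vp, vq = dem[p], dem[q]
                if (vp >= level) != (vq >= level):
                    t = (level - vp) / (vq - vp)  # linear interpolation along the edge
                    points.append((p[1] + t * (q[1] - p[1]), p[0] + t * (q[0] - p[0])))
            # A cell has 0, 2 or 4 crossings; the 4-crossing saddle is paired in
            # cyclic order, one of its two consistent resolutions.
            for k in range(0, len(points), 2):
                segments.append((points[k], points[k + 1]))
    return segments

i, j = np.mgrid[0:21, 0:21]
dem = 10.0 - np.hypot(i - 10, j - 10)          # a cone, 10 m high
segs = marching_squares(quantize(dem, 0.5), level=5.25)
```

Because this contour lies entirely inside the grid, every segment endpoint is shared by exactly two segments, i.e. the re-derived contour is a closed loop even after compression.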

How can that strategy be applied to the previous example? A shoreline might be encoded by labeling the appropriate contour. If the contours are generalized, then the compatible generalized shoreline can be computed from them. (There are some technical details.)
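One way to picture "labeling the appropriate contour" is sketched below, under assumptions of my own: the lake is stored only as a hypothetical label (a seed post inside it and its water level), and after the DEM is lossily compressed the lake is re-derived by flood-filling from the seed through posts below that level. Its shoreline is then, by construction, the water-level contour of the compressed DEM, so it cannot cross the other contours.

```python
from collections import deque
import numpy as np

def lake_from_label(dem, seed, level):
    """Re-derive a lake from a hypothetical label (seed post, water level):
    every post 4-connected to `seed` whose elevation is below `level`."""
    rows, cols = dem.shape
    lake = np.zeros(dem.shape, dtype=bool)
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < rows and 0 <= c < cols) or lake[r, c] or dem[r, c] >= level:
            continue
        lake[r, c] = True
        queue.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return lake

i, j = np.mgrid[0:21, 0:21]
bowl = np.hypot(i - 10, j - 10)     # a circular basin, lowest at the centre
compressed = np.round(bowl)         # any lossy step; here, rounding to whole metres
lake = lake_from_label(compressed, (10, 10), level=4.5)
```

Every dry post that touches the lake is at or above the water level, which is exactly the relation the stored polyline in the screenshot failed to keep.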

Here's a more complicated example of two correlated layers:

elevation, and
gradient vector, or its magnitude, the slope.

Elevation is mathematically a scalar field over a 2D domain. The gradient is a vector field defined as the derivative of the elevation (ignoring messy things like cliffs, and whether we mean data values at points or averaged over square pixels). However, when the elevation is represented as a discrete array that also has errors, the slope computed from it has much larger errors. So deriving the slope from the stored elevation alone is useless.
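To put a rough number on "much larger errors", here is a standard error-propagation sketch. It assumes a 1D profile with post spacing h, a central-difference slope estimate, and independent elevation errors e_i with standard deviation σ; all three are illustrative assumptions.

```latex
\hat z_i = z_i + e_i, \qquad
\hat s_i = \frac{\hat z_{i+1} - \hat z_{i-1}}{2h}
         = \frac{z_{i+1} - z_{i-1}}{2h} + \frac{e_{i+1} - e_{i-1}}{2h},
\qquad
\operatorname{sd}\!\left(\frac{e_{i+1} - e_{i-1}}{2h}\right)
  = \frac{\sqrt{2}\,\sigma}{2h} = \frac{\sigma}{\sqrt{2}\,h}.
```

For example, with h = 30 m and σ = 1 m, the slope error has a standard deviation of about 1/(1.414 · 30) ≈ 0.024, roughly 1.35°. On a gentle 5% slope that is almost half the signal, although a 1 m error is tiny compared with the elevations themselves.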

Second try: we might store the elevation and slope layers separately, and ignore the inefficient use of space. However, even that fails once both are lossily compressed: they become inconsistent. Remember that, to users, inconsistency is worse than inaccuracy.

Third try: we might store the elevations, compute the slopes from them, and store a correction layer containing the difference between the computed and correct slopes. This has yet to be tested.
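Since this idea is untested, the following is only a sketch of how the correction layer could be encoded, assuming NumPy's `np.gradient` (central differences) as the slope operator and simple quantization of the correction. The function names `encode` and `decode` are hypothetical. The decoder recomputes the slope from the same stored DEM, so the reconstructed slope is within half a quantization step of the correct slope, however noisy the DEM is.

```python
import numpy as np

def slope_from_dem(dem, spacing):
    """Slope magnitude (rise over run) from central differences."""
    dzdy, dzdx = np.gradient(dem, spacing)
    return np.hypot(dzdx, dzdy)

def encode(dem, correct_slope, spacing, step):
    # Correction layer: quantized difference between correct and computed slope.
    residual = correct_slope - slope_from_dem(dem, spacing)
    return np.round(residual / step).astype(np.int32)  # small integers compress well

def decode(dem, codes, spacing, step):
    return slope_from_dem(dem, spacing) + codes * step

rng = np.random.default_rng(0)
i, j = np.mgrid[0:50, 0:50]
true_dem = 0.05 * 30.0 * j                                  # a 5% slope to the east, 30 m posts
stored_dem = true_dem + rng.normal(0.0, 1.0, true_dem.shape)  # elevation with 1 m errors
correct_slope = np.full(true_dem.shape, 0.05)
codes = encode(stored_dem, correct_slope, 30.0, 0.005)
slope = decode(stored_dem, codes, 30.0, 0.005)
```

The open question the text raises is whether these codes are cheap to store; for a noisy DEM they may not be.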

A deeper version of the problem involves feature layers with more complicated structure. Consider a map with two layers: rivers and roads. Roads often run along rivers, but rarely run in or cross them. (However, here is an exception.) How do we compress a map with roads and rivers while maintaining this probabilistic restriction?

An idea for a general solution technique might come from PCA: "Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components." (from Wikipedia). I'm not saying that PCA itself is useful, but that the technique of transforming variables into an uncorrelated set may be. JPEG compression also does this.
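To make the transformation concrete, here is a minimal PCA-style sketch with NumPy that turns co-registered layers into linearly uncorrelated component layers and back. The two random layers are synthetic stand-ins, not real map data, and whether the components are any easier to compress consistently is exactly the open question in the text.

```python
import numpy as np

def decorrelate(layers):
    """PCA-style transform of co-registered layers (k x rows x cols) into k
    linearly uncorrelated component layers; returns what is needed to invert."""
    k = layers.shape[0]
    X = layers.reshape(k, -1)                # one row of samples per layer
    mean = X.mean(axis=1, keepdims=True)
    _, basis = np.linalg.eigh(np.cov(X))     # orthonormal eigenvectors as columns
    comps = basis.T @ (X - mean)
    return comps.reshape(layers.shape), mean, basis

def recorrelate(comps, mean, basis):
    """Exact inverse of decorrelate, since basis is orthogonal."""
    k = comps.shape[0]
    return (basis @ comps.reshape(k, -1) + mean).reshape(comps.shape)

rng = np.random.default_rng(1)
a = rng.normal(size=(32, 32))                  # stand-in for one layer
b = 0.8 * a + 0.2 * rng.normal(size=(32, 32))  # a second layer strongly correlated with it
layers = np.stack([a, b])
comps, mean, basis = decorrelate(layers)
```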

Misaligned shoreline and contour lines
Stream not perpendicular to contour lines

3 Terrain description by features

The goal is to describe terrain by means of features. Features are components, like hills and river valleys, that people naturally use. The feature description should let us reconstruct the terrain accurately enough for other people to recognize it, and for computer applications to use it.

One problem is to pick a feature set that is rich enough to be interesting, but small enough that we can research it.

Describing terrain by features is important because it matches the mental model by which people recognize terrain and describe it to others. We do not remember that Tupona (to pick a fictional territory) has a point at location (x,y) with elevation z and a slope of 10 degrees to the east. Rather, we remember that it has two mountains, the northern one slightly higher, with a river flowing east from the pass between them.

Indeed, when we draw a map, those will be the features on it. Although there will be some numbers for distances and elevations on the map, it is the features and their relations that will dominate. We intuit these relations from our naive physics view of terrain, such as that water flows downhill.

This description of Tupona is also amenable to "progressive transmission". When describing Tupona to a recipient, we would list the major features first, followed in order by ever more minor ones. The longer the recipient listened, the more accurate a picture he would have. At any point in time, he would have a description of some legal terrain, only omitting some details.

This is how terrain was described from antiquity until computers were invented. Now, terrain is generally encoded as a Digital Elevation Model (DEM): a square array of elevation posts, i.e., heights above a geoid (the notional sea level). This quantification has none of the advantageous properties of the descriptive model. E.g., water no longer always flows downhill. Because the representation is discrete, the elevation array may have local minima (pits or basins) that do not really exist.
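Spurious pits are easy to find mechanically. The sketch below, assuming NumPy, flags interior posts that are strictly lower than all 8 neighbours; it only catches single-post pits, not flat-bottomed or multi-post depressions. The example plants one spurious dip, as a measurement or compression error might, in an otherwise tilted plane.

```python
import numpy as np

def find_pits(dem):
    """(row, col) of interior posts strictly lower than all 8 neighbours."""
    rows, cols = dem.shape
    centre = dem[1:-1, 1:-1]
    pit = np.ones(centre.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                neighbour = dem[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
                pit &= centre < neighbour
    return np.argwhere(pit) + 1   # shift back to full-grid indices

dem = np.add.outer(np.arange(10.0), np.arange(10.0))  # a tilted plane: no pits
dem[5, 5] -= 3.0                                      # one spurious dip
print(find_pits(dem).tolist())                        # [[5, 5]]
```

Water routed over this DEM would stop at (5, 5) instead of continuing downhill.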

Conversely, the DEM formalization contains apparently precise information of less interest, such as the elevation in the middle of a smooth slope. ("Apparently precise" because, depending on the source technology, the error bar may be surprisingly large.) If this elevation array is progressively transmitted with a Fourier decomposition, a standard technique in image processing, the intermediate reconstructions violate many rules of legal terrain. E.g., fictitious local minima are created and sharp features such as cliffs are rounded.
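A 1D illustration, assuming NumPy's real FFT: a profile that is flat and then rises up a 100 m cliff is reconstructed from only its first few Fourier coefficients, as an early stage of a Fourier-based progressive transmission would deliver it. The FFT treats the profile as periodic, so there is implicitly also a downward cliff at the wrap-around. The partial reconstruction rounds the cliff, overshoots above 100 m and below 0 m, and ripples, creating local minima that the original profile does not have.

```python
import numpy as np

def lowpass(profile, keep):
    """Reconstruct a profile from its first `keep` rfft coefficients."""
    spectrum = np.fft.rfft(profile)
    spectrum[keep:] = 0
    return np.fft.irfft(spectrum, n=len(profile))

def strict_local_minima(z):
    """Count interior samples strictly lower than both neighbours."""
    return int(np.sum((z[1:-1] < z[:-2]) & (z[1:-1] < z[2:])))

cliff = np.where(np.arange(64) < 32, 0.0, 100.0)  # flat, then a 100 m cliff
partial = lowpass(cliff, keep=8)
print(strict_local_minima(cliff), strict_local_minima(partial))
print(partial.min(), partial.max())
```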

These limitations are so embedded in our thinking of terrain that we do not even realize what we've lost. From being our servant, the computer has become our master. This leads to the following interdisciplinary proposal, which has both spatio-geographical and computer-science components.

The spatio-geographical component is to extend and formalize descriptive geography from a human perspective. The goal is a language to describe terrain that is intuitive, but that, in the limit, has sufficient precision to specify the terrain unambiguously.

The computer-science component is, first, to devise a mathematics that easily represents such terrain. In this mathematics, legal terrain should be more natural than illegal terrain. E.g., up and down are not equivalent in real terrain: there are many local maxima but few local minima. Therefore, the mathematical representation should ideally treat them differently. The second part is to implement this mathematics.

The impact of this project will be a computer representation of terrain that matches how people think of it. Using this representation, we will be able to generate more realistic synthetic terrain. We will also be able to store and transmit in a more natural way the ever larger volumes of terrain data generated by LIDAR. When terrain is lossily compressed, as the volume of data requires, our techniques will minimize the production of clearly illegal terrain.

Features will still need some numbers. However, the goal is to use numbers in a more natural way than a DEM does. The numbers describing a hill might include, among others, its height, its width at mid elevation, and its steepness.
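As an illustration of what such numbers could look like, here is a hypothetical `Hill` feature in Python. The generalized bell profile is my own choice, picked because its parameters map directly onto the three numbers above: the elevation is exactly `height` at the summit and exactly `height / 2` at half of `mid_width` from it, while `steepness` controls how abrupt the flanks are.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Hill:
    """Hypothetical hill feature described by a few human-scale numbers."""
    x: float
    y: float
    height: float     # summit height above the surrounding base
    mid_width: float  # diameter of the hill at half its height
    steepness: float  # larger values give steeper flanks and a flatter top

    def elevation(self, px, py):
        r = np.hypot(px - self.x, py - self.y)
        # Generalized bell: height at r = 0, height / 2 at r = mid_width / 2.
        return self.height / (1.0 + (2.0 * r / self.mid_width) ** (2.0 * self.steepness))

hill = Hill(x=0.0, y=0.0, height=100.0, mid_width=40.0, steepness=2.0)
print(hill.elevation(0.0, 0.0), hill.elevation(20.0, 0.0))  # 100.0 50.0
```

`elevation` also accepts NumPy arrays of coordinates, so the same five numbers can be rendered onto a grid of any resolution.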

Some features might be modeled on the terrain's history, in addition to its present form. We might say that the terrain originated as a plane, that was faulted, then uplifted, and so on.

One deep mathematical question is how much structure is necessary for a useful system. Some systems have incredible amounts of implicit structure. Two examples are Euclidean geometry and abstract group theory. Others, not so much.

I've been thinking about this topic for some time. A few years ago, my students and I did some preliminary experiments, using a scooping operation as the sole feature type. We start with a high plateau, and apply a series of material removals with a scoop or shovel, as one might remove sand or ice cream. In each step, a moving scoop touches down at some point and then follows some trajectory while digging ever deeper. This stops when the scoop goes off the edge of the world.

A terrain formed by a series of scoops will have no interior local minima. Thus it is hydrologically valid. Forming sharp peaks and cliffs is easy. Unfortunately, our progress with the idea was slower than desired.
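A small Python sketch of the scooping operation, under assumptions of my own (a square scoop, straight trajectories, a fixed deepening per step); it is not the code from those experiments. Because each scoop position lowers a block of at least 2 × 2 posts to the same bottom, every carved post shares a scoop position with a neighbour that ends up no higher, so no post can end up strictly below all 8 of its neighbours. The helper `interior_pits` checks that.

```python
import numpy as np

def scoop(z, start, direction, bottom, deepen, half_width=1):
    """Hypothetical scoop: a (2*half_width+1)-square scoop touches down at `start`
    with its bottom at elevation `bottom`, moves one post per step along
    `direction` (row and column steps in {-1, 0, 1}, not both 0), and digs
    `deepen` deeper each step until it goes off the edge of the grid."""
    rows, cols = z.shape
    r, c = start
    dr, dc = direction
    while 0 <= r < rows and 0 <= c < cols:
        r0, r1 = max(r - half_width, 0), min(r + half_width + 1, rows)
        c0, c1 = max(c - half_width, 0), min(c + half_width + 1, cols)
        window = z[r0:r1, c0:c1]                # a view into z
        np.minimum(window, bottom, out=window)  # only ever removes material
        r, c = r + dr, c + dc
        bottom -= deepen
    return z

def interior_pits(z):
    """Number of interior posts strictly lower than all 8 neighbours."""
    rows, cols = z.shape
    centre = z[1:-1, 1:-1]
    lower = np.ones(centre.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                lower &= centre < z[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
    return int(lower.sum())

terrain = np.full((30, 30), 100.0)          # the high plateau
scoop(terrain, (5, 5), (1, 1), 90.0, 1.0)
scoop(terrain, (20, 8), (0, 1), 95.0, 0.5)
scoop(terrain, (12, 25), (-1, 0), 85.0, 2.0)
print(interior_pits(terrain))               # 0
```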

My PhD student Wenli Li and I also investigated another terrain compression method based on my ODETLAP representation. Preliminary results suggest that, for a given file size, it has a smaller maximum error than JPEG2000.

Major research problems include these:

what set of features to use. Some features will be local, such as a hill, others more global, such as water erosion that carries material downstream far away.
how they need to be quantified, i.e., what numbers are needed as part of each feature's description.
how overlapping features should combine, and how geologically correct this needs to be. You cannot just add the elevations of two overlapping features, post by post.
what terrain properties are important, and what others are less important, to preserve. E.g., perhaps we need to preserve hydrography and visibility.
relatedly, what evaluation metric should be used?
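The warning against adding overlapping features post by post can be seen with two Gaussian hills, each 100 m high, whose centres are 4 posts apart. Both the Gaussian shape and the alternative rule below (keep the higher feature at each post, `np.maximum`) are illustrative assumptions; neither is claimed to be the geologically correct combination.

```python
import numpy as np

def gaussian_hill(px, py, cx, cy, height, sigma):
    """Illustrative hill shape: a Gaussian bump centred at (cx, cy)."""
    return height * np.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2 * sigma ** 2))

y, x = np.mgrid[0:40, 0:40].astype(float)
a = gaussian_hill(x, y, 16, 20, 100.0, 3.0)
b = gaussian_hill(x, y, 20, 20, 100.0, 3.0)
added = a + b              # post-by-post sum: the two hills fuse into one much taller hill
merged = np.maximum(a, b)  # one alternative: the higher feature wins at each post
print(added.max(), merged.max())
```

The sum produces a summit about 60% higher than either hill, which nobody describing "two 100 m hills" intended; the maximum keeps both summits at 100 m but leaves a sharp crease where the two surfaces meet, so it is not obviously right either.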

The series of operations that forms a terrain might have qualitatively different steps. (This is how the best data and image compression algorithms work.) Perhaps, after several features are applied, an error matrix might be computed, compressed, and added to the representation as the final feature. If the total number of bytes needed to represent all the features, including that error matrix, is sufficiently small, then we have a winner.
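A sketch of that final step, assuming NumPy and Python's standard `zlib` module as a stand-in for a real entropy coder. A single Gaussian hill plays the role of the feature-based approximation, and synthetic noise plays the role of everything the features missed; the error matrix is quantized (bounding the maximum error by half the step) and deflated, and its compressed size is the number that would be added to the few bytes of feature parameters.

```python
import zlib
import numpy as np

def encode_residual(dem, approx, step):
    """Final 'feature': the error matrix dem - approx, quantized to multiples
    of `step` (so the max error is at most step / 2) and deflated with zlib."""
    codes = np.round((dem - approx) / step).astype(np.int16)
    return zlib.compress(codes.tobytes(), 9)

def decode_residual(blob, approx, step):
    codes = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
    return approx + codes.reshape(approx.shape) * step

y, x = np.mgrid[0:64, 0:64].astype(float)
approx = 100.0 * np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 10.0 ** 2))  # one hill feature
rng = np.random.default_rng(2)
dem = approx + rng.normal(0.0, 0.3, approx.shape)  # synthetic 'real' terrain
blob = encode_residual(dem, approx, step=0.5)
restored = decode_residual(blob, approx, step=0.5)
print(len(blob), "bytes for the error matrix; max error", np.abs(restored - dem).max())
```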

My preferred research strategy would be to start by building a prototyping testbed, in which to test ideas. The programming effort would be non-trivial, but worthwhile.

4 Inferring features from terrain and vice versa

Given a terrain, to infer the network of features.
Given a network of features, to deduce the terrain.

One problem is to pick a feature set that is rich enough to be interesting, but small enough to be tractable.

Another theme is handling multiple layers of data consistently. The difficulty is that the data have error bars and are compressed lossily. Here are some examples.

Terrain slope is theoretically derivable from elevation. However, when the elevation has errors, the computed slope has much larger errors, so it is useless.

However, you can't just store the slope as well. First, that takes a lot of space. Second, as you lossily compress both layers, they become inconsistent. To users, inconsistency is worse than inaccuracy.

Another example is elevation and hydrology. Theoretically, shorelines are horizontal and rivers flow straight downhill. In practice, they often do not; see the attached figures. The first shows a shoreline crossing contours. The second shows a stream crossing contour lines obliquely.

Related to hydrology is the problem of rasterization, which Larry Stanislawski of the USGS explained to me. Imagine a smooth cone pointed up, and let it rain on the apex. The water should sheet smoothly and evenly down in all directions. However, if you represent the surface with an elevation array (DEM), the computed water flow collects in a small number of channels. This is independent of the fineness of the rasterization.
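The effect can be reproduced with D8, a common single-flow-direction routing rule that is my choice here, not necessarily what the USGS uses: each post sends all its water to the neighbour with the steepest drop. Rain on the apex of a rasterized cone then runs down one line of posts, about n/2 posts out of n², at every grid size. Multiple-flow-direction schemes split the flow, but still only among at most 8 neighbours.

```python
import numpy as np

# D8 neighbours as (row step, column step).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def cone_dem(n):
    """An n x n DEM of an upward-pointing cone; with odd n the apex is a post."""
    centre = (n - 1) // 2
    i, j = np.mgrid[0:n, 0:n]
    return -np.hypot(i - centre, j - centre)

def d8_path(z, start):
    """Follow steepest descent (D8) from `start` until no neighbour is lower,
    returning every post the water passes through."""
    rows, cols = z.shape
    r, c = start
    path = [start]
    while True:
        best, best_slope = None, 0.0
        for dr, dc in OFFSETS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                slope = (z[r, c] - z[rr, cc]) / np.hypot(dr, dc)
                if slope > best_slope:  # on ties, the first direction found wins
                    best, best_slope = (rr, cc), slope
        if best is None:
            return path
        r, c = best
        path.append(best)

for n in (41, 81, 161):
    z = cone_dem(n)
    apex = ((n - 1) // 2, (n - 1) // 2)
    wet = d8_path(z, apex)
    print(n, len(wet), len(wet) / z.size)  # wetted posts vs. all posts
```

Refining the grid only makes the wetted fraction smaller; the flow never spreads into a sheet.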

This is a poster from my talk at ISPRS 2004 in Istanbul, with the commutative diagram for evaluating lossily compressed terrain.

../../files/100-istanbul-isprs04-siting-poster.jpg