
3 Compression Review

Here is a quick summary of some of the major themes in compression. Huffman coding counts the number of times each symbol in a file occurs, and uses fewer bits to represent the more common symbols. In English text, instead of using 7 bits for each character, E might be stored as 00, while Q might be stored as 10110010. The limitation is that adjacencies, such as the fact that Q is almost always followed by U, are not exploited. Huffman coding is optimal if there are no such relations, except for a roundoff error, since each symbol must be coded with an integral number of bits even if its probability is not exactly a negative power of two.
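For concreteness, here is a minimal Python sketch of building such a code by repeatedly merging the two rarest symbols; the sample text is made up, and a production coder would also have to store the code table with the compressed file.

    import heapq
    from collections import Counter

    def huffman_code(text):
        # Build a Huffman code: frequent symbols get shorter bit strings.
        freq = Counter(text)
        # Heap entries: (frequency, tiebreak id, {symbol: code so far}).
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)   # two rarest subtrees
            f2, _, c2 = heapq.heappop(heap)
            # Merge them, prefixing 0 or 1 onto each subtree's codes.
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    text = "how much wood would a woodchuck chuck"
    codes = huffman_code(text)
    bits = "".join(codes[ch] for ch in text)

Since no code word is a prefix of another, the resulting bit stream parses unambiguously during decoding.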

Arithmetic coding improves on Huffman in that case by effectively coding symbols with fractional numbers of bits. Only enough bits to uniquely identify the symbol are stored. Arithmetic coding usually uses floating point math, and can be subject to roundoff errors, which would corrupt the decompressed file unless care is taken in the implementation. Huffman and arithmetic coding are often used inside other schemes to compress calculated coefficients.
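A toy float-based coder shows the idea. The probability model here is made up, and this version exhibits exactly the roundoff fragility mentioned above on long messages; real implementations use scaled integer arithmetic with renormalization.

    def build_intervals(probs):
        # Give each symbol a subinterval of [0, 1) sized by its probability.
        intervals, low = {}, 0.0
        for sym, p in probs.items():
            intervals[sym] = (low, low + p)
            low += p
        return intervals

    def encode(message, probs):
        # Narrow [low, high) once per symbol; any number inside the final
        # interval identifies the whole message.
        low, high = 0.0, 1.0
        intervals = build_intervals(probs)
        for sym in message:
            span = high - low
            s_low, s_high = intervals[sym]
            low, high = low + span * s_low, low + span * s_high
        return (low + high) / 2

    def decode(code, length, probs):
        # Invert the narrowing: find which subinterval the code falls in.
        intervals = build_intervals(probs)
        out = []
        for _ in range(length):
            for sym, (s_low, s_high) in intervals.items():
                if s_low <= code < s_high:
                    out.append(sym)
                    code = (code - s_low) / (s_high - s_low)
                    break
        return "".join(out)

    probs = {"a": 0.6, "b": 0.3, "c": 0.1}
    message = "aabac"
    assert decode(encode(message, probs), len(message), probs) == message

A symbol of probability p consumes about -log2(p) bits of the interval, so a common symbol can cost less than one bit, which integral-length Huffman codes cannot achieve.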

Run-length encoding represents a string of the same repeated symbol as a single count. The GIF (Graphics Interchange Format) image format common on PCs uses the related LZW dictionary method (among other techniques). This approach is excellent for line drawings. A step more sophisticated than this is delta encoding, which calculates the difference of each pixel of an image from the previous pixel, and then uses some other method, such as run-length encoding, to compress the differences. Facsimile transmission does this in 2-D, calculating the difference of each row from the previous row, then, within that row of differences, calculating second-order differences across it. The 2-D differencing is about 10% better than 1-D.
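A Python sketch of the 1-D case, on a made-up row of pixel values:

    def delta_encode(values):
        # Replace each value with its difference from the previous one.
        prev, out = 0, []
        for v in values:
            out.append(v - prev)
            prev = v
        return out

    def run_length_encode(values):
        # Collapse runs of a repeated value into (value, count) pairs.
        out = []
        for v in values:
            if out and out[-1][0] == v:
                out[-1][1] += 1
            else:
                out.append([v, 1])
        return out

    row = [10, 10, 10, 11, 12, 12, 12, 12]
    deltas = delta_encode(row)           # [10, 0, 0, 1, 1, 0, 0, 0]
    packed = run_length_encode(deltas)   # [[10, 1], [0, 2], [1, 2], [0, 3]]

The differences cluster near zero, so they form longer runs than the raw values and would also Huffman code more compactly.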

The above two methods are lossless; the original file is completely reconstructible. For images, lossy methods are often used, since they may allow much greater compression without visibly degrading the image.

Wavelets[13] offer a new technique that appears quite powerful; some of the best compression systems described below are based on them.
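The Haar wavelet, the simplest case, conveys the flavor: one transform level splits a signal into pairwise averages (a half-resolution copy) and pairwise differences (detail). A sketch on a made-up signal:

    import numpy as np

    def haar_step(x):
        # One Haar level: pairwise averages (coarse) and differences (detail).
        x = np.asarray(x, dtype=float)
        return (x[0::2] + x[1::2]) / 2, (x[0::2] - x[1::2]) / 2

    signal = [10, 10, 12, 12, 20, 22, 21, 21]
    coarse, detail = haar_step(signal)
    # coarse = [10. 12. 21. 21.]   detail = [0. 0. -1. 0.]

Recursing on the coarse half gives a multiresolution pyramid; on smooth data most detail coefficients are near zero and compress well, and discarding the smallest ones gives a lossy scheme.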

JPEG, named for its designers, the Joint Photographic Experts Group, is a standard image compression method that splits the image into blocks, performs a discrete cosine transform (DCT) frequency analysis of each block (similar to a Fourier transform), quantizes away components that should not be very visible, and Huffman codes the result. JPEG is usually lossy. It has a parameter to control how lossy it is, with a lossier file being smaller. JPEG was designed to compress high-quality photographic images with 24 bpp (3 colors at 8 bits each). This contrasts with GIF, which assumes only 8 bpp. JPEG does not do so well compressing line drawings and text, which have sharp jumps in intensity between adjacent pixels. There is also LJPEG, lossless JPEG. LJPEG may use various techniques, such as 2-D differencing, or a lossy JPEG followed by a compressed correction table. Huang and Smith[15] have an implementation that compresses binary (raw) PBM files. 2-D differencing also appears in other methods.
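The core transform step can be sketched in Python as follows. This shows only an 8x8 DCT plus a single uniform quantizer step; JPEG proper layers per-frequency quantization tables, zig-zag ordering, and entropy coding on top of this, and the block contents and step size here are made up.

    import numpy as np

    def dct_matrix(n=8):
        # Orthonormal DCT-II basis matrix for an n-point transform.
        k = np.arange(n)
        m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        m[0] /= np.sqrt(2)
        return m * np.sqrt(2 / n)

    D = dct_matrix()
    block = np.tile(np.arange(8) * 4.0, (8, 1))  # smooth 8x8 ramp block
    coeffs = D @ block @ D.T          # 2-D DCT: energy moves to low frequencies
    q = 16.0                          # quantizer step: larger = lossier, smaller
    quantized = np.round(coeffs / q)  # the lossy step: small coefficients become 0
    restored = D.T @ (quantized * q) @ D  # the decoder's approximation

A smooth block concentrates its energy in a few low-frequency coefficients, so most quantized entries are zero and code cheaply; a sharp edge spreads energy across many frequencies, which is why line drawings and text fare worse.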

  







Wm Randolph Franklin
Tue Jun 13 14:43:17 EDT 1995