Lockless Inc

Lossless Image Compression

This article describes various methods by which one may compress digital images losslessly. This differs from lossy image compression in that the reconstructed image must be identical to the original. The state of the art in lossless image compression involves the PNG standard. A PNG file contains a series of chunks which hold the information within the compressed image. These chunks are created by applying a filter to a part of the image, and then compressing the result with the DEFLATE algorithm from zlib.

There are several filters available within the PNG specification. Most of these construct linear combinations of previous pixels to predict the "current" pixel. By saving the difference between the prediction and the actual pixel, the data may be compressed. This delta encoding produces signed numbers near zero if the filter is any good, and the resulting non-uniform distribution of deltas may then be compressed efficiently. Note that these filters are applied to the red, green, and blue colour values separately; no inter-colour filtering is done.
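
As a concrete illustration (a sketch only, not the full PNG scanline format, which also prepends a filter-type byte to each row), the simplest PNG filter, "Sub", predicts each byte from the corresponding byte of the pixel to its left:

static void sub_filter_row(const unsigned char *row, unsigned char *out, int width)
{
	int i;
	
	for (i = 0; i < width * 3; i++)
	{
		/* Predict each byte from the same colour channel of the previous
		   pixel, three bytes earlier for 24-bit RGB data */
		unsigned char left = (i >= 3) ? row[i - 3] : 0;
		
		/* Deltas wrap modulo 256, and cluster near zero on smooth images */
		out[i] = row[i] - left;
	}
}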

The better PNG encoders will carefully select different filters that work best for the different regions of an image to produce maximal compression. This means that the best encoders can produce compressed images that are about half the size of the simpler encoders. This variance is problematic for comparison purposes. To simplify things, we will choose the Transcode library as the PNG encoder implementation. This encoder is commonly used, and can provide a reasonable baseline.

The next problem is to choose a set of images to compress. Since compression varies with image content, this choice matters. Note that traditionally, many image compressors have been tested with the 512×512 Lenna test image. These days, however, such an image is rather small and doesn't quite correspond to the typical output of today's digital cameras. Thus we choose to use the RGB 8 bit images from www.imagecompression.info, which are quite large.

The initial sizes of these images in PPM format, and the file sizes after compression into PNG format with the Transcode convert command, are:

Image                   Initial size in bytes   PNG size in bytes
artificial.ppm          18874385                1697864
big_building.ppm        117158993               62185647
big_tree.ppm            83101217                48102709
bridge.ppm              33392120                19747376
cathedral.ppm           18048017                9357854
deer.ppm                32032706                21197466
fireworks.ppm           22127633                5810922
flower_foveon.ppm       10287665                3222147
hdr.ppm                 18874385                7053417
leaves_iso_1600.ppm     18048017                11329792
leaves_iso_200.ppm      18048017                10119799
nightshot_iso_100.ppm   22127633                7817322
nightshot_iso_1600.ppm  22127633                12525799
spider_web.ppm          36363281                10660063

The first step is to replace the DEFLATE compression algorithm with something more modern. The technique used in the bzip2 compression program (the Burrows-Wheeler Transform) was relatively new when the PNG format was created. Since not enough experience with its usage existed, together with uncertainty about whether or not the transform was patented, the older DEFLATE algorithm was used instead. Today, the bz2 format is in widespread use, and such concerns have evaporated.

A complication with replacing the compression algorithm is that PNG encoders have been tuned for DEFLATE, so their choice of filters may no longer be optimal. Thus, instead of allowing multiple filters, we will use just one, the Paeth filter. This filter is the one most commonly used for RGB images, so this shouldn't affect the results too much.

The resulting code which filters and compresses all of the images above is


#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#define LMAX (1ULL << 62)

static void dump_data(const unsigned char *data, int width, int height, const char *name)
{
	FILE *f = fopen(name, "wb");
	size_t size = (size_t) width * height * 3;
	
	if (!f) errx(1, "Couldn't open output file\n");
	
	fprintf(f, "P6 %d %d %d\n", width, height, 255);
	
	if (fwrite(data, 1, size, f) != size) errx(1, "Output truncated\n");
	
	fclose(f);
}

static unsigned char *read_data(int *width, int *height, const char *name)
{
	int max;
	size_t size;
	
	unsigned char *data;
	
	FILE *f = fopen(name, "rb");
	if (!f) errx(1, "File not found\n");

	if (fscanf(f, "P6 %d %d %d", width, height, &max) != 3) errx(1, "Cannot parse PPM header\n");
	
	/* Consume the single whitespace character separating the header from the pixel data */
	fgetc(f);
	
	if (max > 255) errx(1, "Can't handle non 8-bit images\n");
	
	size = (size_t) *width * *height * 3;
	
	data = malloc(size);
	if (!data) errx(1, "Out of memory reading image\n");
	
	if (fread(data, 1, size, f) != size) errx(1, "Cannot read file %s\n", name);
	
	fclose(f);
	
	return data;
}

static unsigned char paeth(unsigned char *data, int width, size_t loc)
{
	/* Offsets of the upper-left, upper, and left neighbours. Locations
	   before the start of the image wrap around to huge values, which the
	   comparisons against LMAX below treat as missing (left as zero). */
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	unsigned char a = 0, b = 0, c = 0;
	int p, pa, pb, pc;
	
	if (l2 < LMAX) c = data[l2];
	if (l3 < LMAX) b = data[l3];
	if (l4 < LMAX) a = data[l4];
	
	/* Linear extrapolation; return whichever neighbour is nearest to it */
	p = (int) a + b - c;
	pa = abs(p - a);
	pb = abs(p - b);
	pc = abs(p - c);
	if ((pa <= pb) && (pa <= pc)) return a;
	
	if (pb <= pc) return b;
	
	return c;
}

static unsigned char *filter_data(unsigned char *data, int width, int height)
{
	size_t i;
	size_t size = (size_t) width * height * 3;

	unsigned char *d2 = malloc(size);
	if (!d2) errx(1, "Out of memory filtering image\n");

	/* Do transform */
	for (i = 0; i < size; i++)
	{
		d2[i] = data[i] - paeth(data, width, i);
	}

	return d2;
}

static unsigned char *unfilter_data(unsigned char *data, int width, int height)
{
	size_t i;
	size_t size = (size_t) width * height * 3;
		
	unsigned char *d2 = malloc(size);
	if (!d2) errx(1, "Out of memory unfiltering image\n");
	
	/* Do inverse transform */
	for (i = 0; i < size; i++)
	{
		d2[i] = data[i] + paeth(d2, width, i);
	}
	
	return d2;
}

int main(void)
{
	const char *file_list[] =
	{
		"artificial", "big_building", "big_tree", "bridge", "cathedral",
		"deer", "fireworks", "flower_foveon", "hdr", "leaves_iso_1600",
		"leaves_iso_200", "nightshot_iso_100", "nightshot_iso_1600", "spider_web"
	};

	int width, height, i;
	
	char buf[1024];
	
	unsigned char *data, *d2;
	
	char suffix[] = "paeth";
	
	for (i = 0; i < sizeof(file_list) / sizeof(const char *); i++)
	{
		sprintf(buf, "rgb/%s.ppm", file_list[i]);
		data = read_data(&width, &height, buf);
		
		d2 = filter_data(data, width, height);
		
		sprintf(buf, "rgb/%s_%s.ppm", file_list[i], suffix);
		dump_data(d2, width, height, buf);
		
		free(d2);
		free(data);
		
		sprintf(buf, "bzip2 -9 rgb/%s_%s.ppm", file_list[i], suffix);
		system(buf);
	}

	return 0;
}

The above assumes that the test images all lie within an rgb/ sub-directory. It also invokes the bzip2 command via the system() function. A better program would call the bzip2 library directly, but the above is definitely simpler.
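
Since the scheme is only lossless if the filter is exactly invertible, it is worth checking the round trip through the filter_data() and unfilter_data() functions above. A minimal sketch of such a check (the helper name check_roundtrip() is ours):

#include <string.h>	/* for memcmp() */

static void check_roundtrip(const char *name)
{
	int width, height;
	size_t size;
	
	unsigned char *data, *filtered, *restored;
	
	data = read_data(&width, &height, name);
	filtered = filter_data(data, width, height);
	restored = unfilter_data(filtered, width, height);
	
	size = (size_t) width * height * 3;
	
	/* The deltas wrap modulo 256, so the reconstruction must be exact */
	if (memcmp(data, restored, size)) errx(1, "Round trip failed for %s\n", name);
	
	free(restored);
	free(filtered);
	free(data);
}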

The resulting file sizes are (lower relative percentages are better):

Image                   PNG size in bytes   Paeth Filter + bzip2   Percentage difference
artificial.ppm          1697864             1252185                -26.25%
big_building.ppm        62185647            62032189               -0.25%
big_tree.ppm            48102709            51764366               7.61%
bridge.ppm              19747376            20964675               6.16%
cathedral.ppm           9357854             9683631                3.48%
deer.ppm                21197466            23357031               10.19%
fireworks.ppm           5810922             5330074                -8.27%
flower_foveon.ppm       3222147             3078053                -4.47%
hdr.ppm                 7053417             6993162                -0.85%
leaves_iso_1600.ppm     11329792            11573688               2.15%
leaves_iso_200.ppm      10119799            9789791                -3.26%
nightshot_iso_100.ppm   7817322             7424994                -5.02%
nightshot_iso_1600.ppm  12525799            13374661               6.78%
spider_web.ppm          10660063            10033575               -5.88%

As can be seen, even though we have lost the advantage of choosing different filters based on image content, the superior compression of the bzip2 algorithm has made up for that. The resulting sizes are close to the PNG files, with the one exception being artificial.ppm. It appears that either Transcode wasn't very good at picking filters, or that bzip2 is particularly good in that case.

Now that we have a baseline, let's investigate other filters, and see how much we can improve the compression ratio. If we find a good filter that helps many of the above image files, then it may be worthwhile adding it to a later revision of the PNG specification.

The Paeth Filter

The Paeth filter is rather complicated... what does it do? It first calculates a linear combination of the values of the three pixels adjacent to the current pixel (those immediately to the left, top left, and top). It then uses the result of that calculation to pick one of those three pixels to return as a predicted value.

By Taylor-expanding the functional form of the image about the current pixel, we can see what the linear combination actually does.

f(x+Δx, y+Δy) = f(x, y) + Δx df/dx + Δy df/dy + (Δx²/2) d²f/dx² + Δx Δy d²f/dxdy + (Δy²/2) d²f/dy² + ...

Setting the scale of Δx and Δy to be the pixel distance allows us to substitute their values into the linear combination. Doing that gives the result that the linear combination is an approximation to f(x, y) - d²f/dxdy. Both of the first order derivatives (and the pure second derivatives) have vanished, leaving only the cross term, so the Paeth filter's prediction is exact wherever the intensity varies linearly in each colour.
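
To make the cancellation explicit, substitute the three neighbour offsets into the expansion (a worked check, keeping terms up to second order), with a = f(x-1, y), b = f(x, y-1) and c = f(x-1, y-1) as in the code above:

a ≈ f - df/dx + (1/2) d²f/dx²
b ≈ f - df/dy + (1/2) d²f/dy²
c ≈ f - df/dx - df/dy + (1/2) d²f/dx² + d²f/dxdy + (1/2) d²f/dy²

a + b - c ≈ f(x, y) - d²f/dxdy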

Linear Extrapolation

Why does the Paeth filter then choose a pixel based on the extrapolated value, rather than using that value itself as the prediction? To see the effect of doing that, we create a linear-extrapolation filter:


static unsigned char linear(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	int val = 0;
	
	if (l2 < LMAX) val -= data[l2];
	if (l3 < LMAX) val += data[l3];
	if (l4 < LMAX) val += data[l4];
	
	return val;
}

Using this, we obtain the results:

Image                   PNG size in bytes   Linear Filter   Percentage difference
artificial.ppm          1697864             1444446         -14.93%
big_building.ppm        62185647            62490689        0.49%
big_tree.ppm            48102709            53222760        10.64%
bridge.ppm              19747376            21356980        8.15%
cathedral.ppm           9357854             9985083         6.70%
deer.ppm                21197466            24797781        16.98%
fireworks.ppm           5810922             5591188         -3.78%
flower_foveon.ppm       3222147             3138160         -2.61%
hdr.ppm                 7053417             7394127         4.83%
leaves_iso_1600.ppm     11329792            11681533        3.10%
leaves_iso_200.ppm      10119799            9655370         -4.59%
nightshot_iso_100.ppm   7817322             7718566         -1.26%
nightshot_iso_1600.ppm  12525799            14009457        11.84%
spider_web.ppm          10660063            9620745         -9.75%

The results are generally worse across the images, the very smooth spider_web.ppm being the notable exception that improves somewhat. By choosing an old pixel value instead of creating a new one, the Paeth filter tends to produce more predictable deltas. The zero-curvature linear extrapolation may be technically better, but it produces more random results which the compressor finds harder to use.

Third order Extrapolation

If a first-order extrapolation is fairly good, then what about using a stencil three pixels wide instead of two? This allows us to cancel the derivative terms up to third order in the extrapolation, and so obtain an even better prediction. By solving the linear algebra problem in the Taylor expansion coefficients, one can obtain the coefficient by which to multiply each of the pixel values. The resulting predictor looks like:


static unsigned char thirdorder(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	size_t l5 = loc - width * 6 - 6;
	size_t l6 = loc - width * 6;
	size_t l7 = loc - 6;
	size_t l8 = loc - width * 3 - 6;
	size_t l9 = loc - width * 6 - 3;
	
	int val;
	
	val = 0;
	
	if (l2 < LMAX) val -= 4 * data[l2];
	if (l3 < LMAX) val += 2 * data[l3];
	if (l4 < LMAX) val += 2 * data[l4];
	if (l5 < LMAX) val -= data[l5];
	if (l6 < LMAX) val -= data[l6];
	if (l7 < LMAX) val -= data[l7];
	if (l8 < LMAX) val += 2 * data[l8];
	if (l9 < LMAX) val += 2 * data[l9];

	return val;
}

The resulting file sizes are

Image                   PNG size in bytes   Third Order Filter   Percentage difference
artificial.ppm          1697864             2229046              31.29%
big_building.ppm        62185647            79223766             27.40%
big_tree.ppm            48102709            66500264             38.25%
bridge.ppm              19747376            25862671             30.97%
cathedral.ppm           9357854             12780598             36.58%
deer.ppm                21197466            30653297             44.61%
fireworks.ppm           5810922             7815819              34.50%
flower_foveon.ppm       3222147             4863693              50.95%
hdr.ppm                 7053417             10388594             47.28%
leaves_iso_1600.ppm     11329792            14405768             27.15%
leaves_iso_200.ppm      10119799            12189466             20.45%
nightshot_iso_100.ppm   7817322             10859303             38.91%
nightshot_iso_1600.ppm  12525799            17778236             41.93%
spider_web.ppm          10660063            13819992             29.64%

This is absolutely horrible! What has gone wrong? The problem here is due to error. The results from a digital camera need to be truncated to fit into 8 bits, whereas the real colour intensity at a given point would require an infinite amount of accuracy to express. The difference between the two is the truncation error, which averages about one bit per pixel. We can use error analysis to determine how this error affects the results of the extrapolation.

Since the formula is a linear combination of independent pixels, we can assume their errors are also independent. If that is the case, then the error in the answer will be proportional to the square root of the sum of the absolute values of the coefficients. For the linear combination used within the Paeth filter, the error is thus twice [sqrt(1+1+1+1)] that of a single pixel. For the third order calculation, the error is four times that of a single pixel.
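
Working that rule through explicitly (a worked check; the leading term of one in each sum accounts for the current pixel's own error, since the delta is the actual pixel minus the prediction):

Paeth combination a + b - c:  sqrt(1 + 1 + 1 + 1) = sqrt(4) = 2
Third order stencil:          sqrt(1 + 4 + 2 + 2 + 2 + 2 + 1 + 1 + 1) = sqrt(16) = 4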

The next problem is that many more pixels contribute to the result, and conversely a given pixel will affect many more results. This "spreads the error about", making it harder to compress, by another factor of about 8/3. The result is that the linear filter, which commonly outputs deltas between ±1, is replaced by something which outputs randomly between ±16. This reduces the compression factor considerably.

Mixed-order Filter

The one saving grace of the third order method is that it is a superior filter when the intensity gradient is changing rapidly. It can correctly predict pixel values along diagonal edges, which are not handled well by the lower order method. Thus we might want to try a combination filter: choose the linear filter most of the time, and switch to the third order method when the linear prediction is far from zero. Such a mixed-order filter looks like:


static unsigned char mixed(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	size_t l5 = loc - width * 6 - 6;
	size_t l6 = loc - width * 6;
	size_t l7 = loc - 6;
	size_t l8 = loc - width * 3 - 6;
	size_t l9 = loc - width * 6 - 3;
	
	int val1 = 0, val3 = 0;
	
	if (l2 < LMAX) val3 -= 4 * data[l2];
	if (l3 < LMAX) val3 += 2 * data[l3];
	if (l4 < LMAX) val3 += 2 * data[l4];
	if (l5 < LMAX) val3 -= data[l5];
	if (l6 < LMAX) val3 -= data[l6];
	if (l7 < LMAX) val3 -= data[l7];
	if (l8 < LMAX) val3 += 2 * data[l8];
	if (l9 < LMAX) val3 += 2 * data[l9];
	
	if (l2 < LMAX) val1 -= data[l2];
	if (l3 < LMAX) val1 += data[l3];
	if (l4 < LMAX) val1 += data[l4];

	if (abs(val1) > 64) return val3;
	
	return val1;
}

It produces results that are better than the third order method but on average not as good as the simple linear method, let alone the Paeth filter results.

Image                   PNG size in bytes   Mixed Order Filter   Percentage difference
artificial.ppm          1697864             1392039              -18.01%
big_building.ppm        62185647            66065143             6.24%
big_tree.ppm            48102709            53842791             11.93%
bridge.ppm              19747376            21553052             9.14%
cathedral.ppm           9357854             10141400             8.37%
deer.ppm                21197466            23490136             10.82%
fireworks.ppm           5810922             5643103              -2.89%
flower_foveon.ppm       3222147             3332580              3.43%
hdr.ppm                 7053417             7358505              4.33%
leaves_iso_1600.ppm     11329792            12119893             6.97%
leaves_iso_200.ppm      10119799            10486267             3.62%
nightshot_iso_100.ppm   7817322             8018354              2.57%
nightshot_iso_1600.ppm  12525799            13795393             10.14%
spider_web.ppm          10660063            11183194             4.91%

Third order Paeth

Another possibility is to combine the third order method with the Paeth filter: use the better prediction produced to choose a more accurate pixel. This reduces the error problem significantly.


static unsigned char thirdpaeth(unsigned char *data, int width, size_t loc)
{
	unsigned char a = 0, b = 0, c = 0;
	int pa, pb, pc;
	
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	size_t l5 = loc - width * 6 - 6;
	size_t l6 = loc - width * 6;
	size_t l7 = loc - 6;
	size_t l8 = loc - width * 3 - 6;
	size_t l9 = loc - width * 6 - 3;
	
	int val3 = 0;
	
	if (l2 < LMAX) val3 += 4 * data[l2];
	if (l3 < LMAX) val3 -= 2 * data[l3];
	if (l4 < LMAX) val3 -= 2 * data[l4];
	if (l5 < LMAX) val3 += data[l5];
	if (l6 < LMAX) val3 += data[l6];
	if (l7 < LMAX) val3 += data[l7];
	if (l8 < LMAX) val3 -= 2 * data[l8];
	if (l9 < LMAX) val3 -= 2 * data[l9];
	
	
	if (l2 < LMAX) c = data[l2];
	if (l3 < LMAX) b = data[l3];
	if (l4 < LMAX) a = data[l4];
	
	pa = abs(val3 - a);
	pb = abs(val3 - b);
	pc = abs(val3 - c);
	if ((pa <= pb) && (pa <= pc)) return a;
	
	if (pb <= pc) return b;
	
	return c;
}

Image                   PNG size in bytes   Third Order Paeth Filter   Percentage difference
artificial.ppm          1697864             1350325                    -20.47%
big_building.ppm        62185647            63108706                   1.48%
big_tree.ppm            48102709            53506653                   11.23%
bridge.ppm              19747376            21725785                   10.02%
cathedral.ppm           9357854             10064168                   7.55%
deer.ppm                21197466            23431197                   10.54%
fireworks.ppm           5810922             5570510                    -4.14%
flower_foveon.ppm       3222147             2941472                    -8.71%
hdr.ppm                 7053417             6877964                    -2.49%
leaves_iso_1600.ppm     11329792            11884991                   4.90%
leaves_iso_200.ppm      10119799            9941807                    -1.76%
nightshot_iso_100.ppm   7817322             7677026                    -1.79%
nightshot_iso_1600.ppm  12525799            13932117                   11.23%
spider_web.ppm          10660063            10141032                   -4.87%

The results are encouraging, but still not as good as the normal Paeth algorithm. It seems that the extra power of the third order method is not useful at 8 bits per colour channel. At 16 bits per channel, it may be worth investigating again. However, most image files today are still 8 bits per channel, so we will have to try something else.

Scaled Paeth Filter

Since the Paeth filter has had the best performance so far, let's investigate altering it to see if we can improve its results. One possibility is to change the pixel choice function. The function currently picks a pixel based on the absolute difference between the extrapolated value and the adjacent pixel values. However, the diagonal pixel is slightly further away than the orthogonal neighbours. The Paeth algorithm takes this into account by making it slightly less likely to choose the diagonal pixel, converting distance ties into wins for the adjacent pixels. Perhaps we can improve on this even more?

A possibility is to add a scale factor to the intensity-distance calculation. Since the diagonal pixel is further away, we can give it a higher scale factor. Picking a low-order rational approximation to sqrt(2) ≈ 3/2, the ratio of the two distances, we choose 2 and 3 as the scale factors.


static unsigned char paethmult(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	unsigned char a = 0, b = 0, c = 0;
	int p, pa, pb, pc;
	
	if (l2 < LMAX) c = data[l2];
	if (l3 < LMAX) b = data[l3];
	if (l4 < LMAX) a = data[l4];
	
	p = (int) a + b - c;
	pa = abs(p - a) * 2;
	pb = abs(p - b) * 2;
	pc = abs(p - c) * 3;
	if ((pa <= pb) && (pa <= pc)) return a;
	
	if (pb <= pc) return b;
	
	return c;
}

Image                   PNG size in bytes   Scaled Paeth Filter   Percentage difference
artificial.ppm          1697864             1249525               -26.41%
big_building.ppm        62185647            62113368              -0.12%
big_tree.ppm            48102709            51884320              7.86%
bridge.ppm              19747376            21023215              6.46%
cathedral.ppm           9357854             9702980               3.69%
deer.ppm                21197466            23399397              10.39%
fireworks.ppm           5810922             5335176               -8.19%
flower_foveon.ppm       3222147             3081458               -4.37%
hdr.ppm                 7053417             6999945               -0.76%
leaves_iso_1600.ppm     11329792            11595496              2.35%
leaves_iso_200.ppm      10119799            9802280               -3.14%
nightshot_iso_100.ppm   7817322             7425314               -5.01%
nightshot_iso_1600.ppm  12525799            13395527              6.94%
spider_web.ppm          10660063            10038984              -5.83%

These results are very similar to those of the normal Paeth filter.

Paeth Linear

The above filter tried to improve things by returning the further pixel less often. Perhaps instead we could return a "better" value? One choice is to return the linear extrapolation itself in that case, since it may be more accurate under the assumption of zero intensity-gradient curvature. In addition, we will alter the scale factors a little more. By choosing scale factors of two for the adjacent pixels, and unity for the diagonal pixel, we correctly scale the prediction for extrapolated diagonal gradients.


static unsigned char paethlinear(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	unsigned char a = 0, b = 0, c = 0;
	int p, pa, pb, pc;
	
	if (l2 < LMAX) c = data[l2];
	if (l3 < LMAX) b = data[l3];
	if (l4 < LMAX) a = data[l4];
	
	p = (int) a + b - c;
	pa = abs(p - a) * 2;
	pb = abs(p - b) * 2;
	pc = abs(p - c);
	
	if ((pa <= pb) && (pa <= pc)) return a;
	
	if (pb <= pc) return b;
	
	return p;
}

Image                   PNG size in bytes   Paeth + Linear Filter   Percentage difference
artificial.ppm          1697864             1262986                 -25.61%
big_building.ppm        62185647            60893082                -2.08%
big_tree.ppm            48102709            51246367                6.54%
bridge.ppm              19747376            20728464                4.97%
cathedral.ppm           9357854             9559193                 2.15%
deer.ppm                21197466            23259134                9.73%
fireworks.ppm           5810922             5256649                 -9.54%
flower_foveon.ppm       3222147             3003368                 -6.79%
hdr.ppm                 7053417             6888476                 -2.34%
leaves_iso_1600.ppm     11329792            11394825                0.57%
leaves_iso_200.ppm      10119799            9563985                 -5.49%
nightshot_iso_100.ppm   7817322             7295457                 -6.68%
nightshot_iso_1600.ppm  12525799            13283144                6.05%
spider_web.ppm          10660063            9606000                 -9.89%

These results are about 1-2% better than the original Paeth filter algorithm. Unfortunately, further tweaks to the Paeth algorithm do not appear to help.

Average Filter

Another filter within the PNG specification is the "average" filter, which averages the pixels to the top and left of the given pixel to obtain a prediction. Perhaps we could add the diagonal top-left pixel to this average, and improve the prediction? Doing so yields the following filter:


static unsigned char avg(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	unsigned char a = 0, b = 0, c = 0;
	
	if (l2 < LMAX) c = data[l2];
	if (l3 < LMAX) b = data[l3];
	if (l4 < LMAX) a = data[l4];
	
	return ((int) a + b + c) / 3;
}

Image                   PNG size in bytes   Average Filter   Percentage difference
artificial.ppm          1697864             1407796          -17.08%
big_building.ppm        62185647            61702509         -0.78%
big_tree.ppm            48102709            51627295         7.33%
bridge.ppm              19747376            21070946         6.70%
cathedral.ppm           9357854             9573063          2.30%
deer.ppm                21197466            22322037         5.31%
fireworks.ppm           5810922             5151559          -11.35%
flower_foveon.ppm       3222147             2881957          -10.56%
hdr.ppm                 7053417             6712619          -4.83%
leaves_iso_1600.ppm     11329792            11446947         1.03%
leaves_iso_200.ppm      10119799            9625705          -4.88%
nightshot_iso_100.ppm   7817322             7519221          -3.81%
nightshot_iso_1600.ppm  12525799            13316395         6.31%
spider_web.ppm          10660063            9807088          -8.00%

This filter does fairly well for smooth images. However, on average it is about 1-2% worse than the "Paeth Linear" filter.

Rounded Average

One possible way to improve the previous filter is to note that we truncated towards zero in the division. Perhaps rounding would be better. This can be done by adding one to the sum before the division, which rounds a remainder of two up instead of down. (For example, a sum of 5 gives 5/3 = 1 truncated, whereas (5+1)/3 = 2, which is closer to the exact 5/3 ≈ 1.67.)


static unsigned char avg1(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	unsigned char a = 0, b = 0, c = 0;
	
	if (l2 < LMAX) c = data[l2];
	if (l3 < LMAX) b = data[l3];
	if (l4 < LMAX) a = data[l4];
	
	return ((int) a + b + c + 1) / 3;
}

Image                   PNG size in bytes   Average Rounded Filter   Percentage difference
artificial.ppm          1697864             1409587                  -16.98%
big_building.ppm        62185647            61709238                 -0.77%
big_tree.ppm            48102709            51628607                 7.33%
bridge.ppm              19747376            21071873                 6.71%
cathedral.ppm           9357854             9588982                  2.47%
deer.ppm                21197466            22323825                 5.31%
fireworks.ppm           5810922             5263566                  -9.42%
flower_foveon.ppm       3222147             2890246                  -10.30%
hdr.ppm                 7053417             6703214                  -4.97%
leaves_iso_1600.ppm     11329792            11444210                 1.01%
leaves_iso_200.ppm      10119799            9621637                  -4.92%
nightshot_iso_100.ppm   7817322             7553172                  -3.38%
nightshot_iso_1600.ppm  12525799            13330827                 6.43%
spider_web.ppm          10660063            9839851                  -7.69%

Unfortunately, this is fractionally worse than not rounding.

Weighted Average

Another improvement could be obtained by using some form of weighted average, just as was done with improving the predictor within the Paeth filter. Weighting inversely according to distance from the predicted pixel yields:


static unsigned char wavg(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	if ((l2 < LMAX) && (l3 < LMAX) && (l4 < LMAX))
	{
		return ((int) 3 * data[l4] + 3 * data[l3] + 2 * data[l2]) / 8;
	}
	
	return linear(data, width, loc);
}

Image                   PNG size in bytes   Weighted Average Filter   Percentage difference
artificial.ppm          1697864             1400640                   -17.51%
big_building.ppm        62185647            61034876                  -1.85%
big_tree.ppm            48102709            51182742                  6.40%
bridge.ppm              19747376            20851591                  5.59%
cathedral.ppm           9357854             9478662                   1.29%
deer.ppm                21197466            22262276                  5.02%
fireworks.ppm           5810922             5090054                   -12.41%
flower_foveon.ppm       3222147             2858825                   -11.28%
hdr.ppm                 7053417             6675546                   -5.36%
leaves_iso_1600.ppm     11329792            11341998                  0.11%
leaves_iso_200.ppm      10119799            9521299                   -5.91%
nightshot_iso_100.ppm   7817322             7427027                   -4.99%
nightshot_iso_1600.ppm  12525799            13206765                  5.44%
spider_web.ppm          10660063            9650187                   -9.47%

The resulting filter is better than the Paeth filter, and in fact better than the "Paeth Linear" variant.

Linear Combination

The previous filter gained its improvements by dropping the requirement that the weights be the same for all the pixels. It, however, still required the weights to be positive. If we drop that requirement, then further improvements can be gained. In effect we can form a meta-average between the linear extrapolation filter and the weighted-average filter. After trying many possible weightings, the following predictor returned the best results.


static unsigned char lin2(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	if ((l2 < LMAX) && (l3 < LMAX) && (l4 < LMAX))
	{
		return ((int) 3 * data[l4] + 3 * data[l3] - data[l2]) / 5;
	}
	
	return linear(data, width, loc);
}

Image                   PNG size in bytes   Linear Combination Filter   Percentage difference
artificial.ppm          1697864             1392588                     -17.98%
big_building.ppm        62185647            59041261                    -5.06%
big_tree.ppm            48102709            50052447                    4.05%
bridge.ppm              19747376            20179323                    2.19%
cathedral.ppm           9357854             9267513                     -0.97%
deer.ppm                21197466            22584117                    6.54%
fireworks.ppm           5810922             4991609                     -14.10%
flower_foveon.ppm       3222147             2881769                     -10.56%
hdr.ppm                 7053417             6690031                     -5.15%
leaves_iso_1600.ppm     11329792            11037293                    -2.58%
leaves_iso_200.ppm      10119799            9199784                     -9.09%
nightshot_iso_100.ppm   7817322             7227097                     -7.55%
nightshot_iso_1600.ppm  12525799            13026931                    4.00%
spider_web.ppm          10660063            9219357                     -13.51%

This filter weights the orthogonally adjacent pixels with a scale factor of three, and the diagonal pixel with a weight of minus one. The results are quite a bit better than the Paeth filter, and nearly always better than the PNG results. Remember that the PNG specification allows the encoder to dynamically choose which filter it applies, depending on which works best in that particular part of an image.
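
As a quick sanity check, the weights sum to the divisor, so the predictor reproduces flat regions of intensity v exactly:

(3v + 3v - v) / 5 = v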

This particular filter algorithm is a fair bit simpler to vectorize than the Paeth filter, due to the lack of conditional statements within the kernel. (The existing check is just there to make sure the code doesn't read outside the image. It can easily be removed by unrolling the first row and column in the main image loop.) The only small annoyance is the division. Divisions can be quite slow, so it would be nice to rescale the coefficients to avoid one.

Linear Combination 2

By rescaling so that the division is a power of two, it may be implemented as a shift. This results in faster image compression and decompression routines. Unfortunately, the compression results aren't quite as good as the above algorithm, but they are still quite good.


static unsigned char lin3(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	if ((l2 < LMAX) && (l3 < LMAX) && (l4 < LMAX))
	{
		return ((int) 5 * data[l4] + 5 * data[l3] - 2 * data[l2]) / 8;
	}
	
	return linear(data, width, loc);
}
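
Note that the listing above still keeps the division by 8. In C, signed division truncates toward zero while an arithmetic right shift rounds toward negative infinity (and right-shifting a negative signed value is implementation-defined), so a true shift version is not bit-identical; the encoder and decoder merely have to agree on one convention. A sketch of the shift form (the name lin3_shift() is ours):

static unsigned char lin3_shift(unsigned char *data, int width, size_t loc)
{
	size_t l2 = loc - width * 3 - 3;
	size_t l3 = loc - width * 3;
	size_t l4 = loc - 3;
	
	if ((l2 < LMAX) && (l3 < LMAX) && (l4 < LMAX))
	{
		/* On two's complement machines this floors negative sums,
		   whereas the /8 above truncates them toward zero */
		return ((int) 5 * data[l4] + 5 * data[l3] - 2 * data[l2]) >> 3;
	}
	
	return linear(data, width, loc);
}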

Image                   PNG size in bytes   Linear Combination Filter 2   Percentage difference
artificial.ppm          1697864             1398936                       -17.61%
big_building.ppm        62185647            59002412                      -5.12%
big_tree.ppm            48102709            50059094                      4.07%
bridge.ppm              19747376            20162747                      2.10%
cathedral.ppm           9357854             9274397                       -0.89%
deer.ppm                21197466            22674926                      6.97%
fireworks.ppm           5810922             4991611                       -14.10%
flower_foveon.ppm       3222147             2889360                       -10.33%
hdr.ppm                 7053417             6698769                       -5.03%
leaves_iso_1600.ppm     11329792            11030648                      -2.64%
leaves_iso_200.ppm      10119799            9189492                       -9.19%
nightshot_iso_100.ppm   7817322             7216634                       -7.68%
nightshot_iso_1600.ppm  12525799            13044513                      4.14%
spider_web.ppm          10660063            9204097                       -13.66%

Conclusion

The (3,3,-1) linear combination filter appears to be well-suited as a preconditioner to a compression algorithm in lossless image compression. The (5,5,-2) algorithm is almost as good, but is a bit faster. The linear combination filter and bzip2 pair performed very well considering no subimage filter-matching was performed, yielding compression ratios on average a few percent better than those with PNG. Perhaps the new linear combination filter should be added as an extension to the PNG specification.

Comments

Greg Roelofs said...
Very nice writeup!

Of course, one reason PNG has succeeded is compatibility, so we should be certain that any incompatible update to the spec wrings every (reasonable) bit of compression out. To that end, there are at least two tweaks to the above that are likely to pay off:

(1) There is a lossless YUV-like transformation of RGB samples about which we learned too late to go into PNG, but it is in the MNG spec. IIRC, it improved compression of photographic images by about 5% over the existing filters. (It basically doubles the filter selection; any given row can be YUV-transformed or not before applying the existing filters.)

(2) In my own testing of a moderately large and broad file corpus, bzip2 was clearly obsoleted by xz (LZMA2). xz level 2 was significantly faster (both compressing and decompressing) and compressed as well as or better than any level of bzip2, while the higher levels compressed much better and decompressed faster, albeit at a significant cost in compression speed.

There's also been some recent work on better dynamic-filter-selection strategies, which should be folded in, too.

With respect to the image corpus used above, it's true that digital cameras are widely used, but I'm not convinced there's a huge number of them being used in lossless mode--and even if there were, I don't believe it would be in 8bps mode; I think (but haven't verified) that raw image data tends to be at least 10 to 12bps, which gets us into 16bps (48-bit) PNG territory. Compression there is a very different beast.

I think a realistic, spec-change-supporting corpus must be decently large (at least hundreds of images) and include a reasonable sampling of online web comics, cluster and web-site graphs (e.g., as produced by RRD, nagios, etc., which frequently include lots of embedded text), perhaps a suite of exported presentation slides, and a large population of colormapped images (not icons--nobody cares how well tiny images compress anymore--but those with dimensions between, say, 400 and 1600 pixels in at least one dimension).

Of course, all of that is a lot of work, and the choices made in selecting the corpus will probably lead to lengthy arguments, discussion, etc., etc. Just sayin'. ;-)