## Lossless Image Compression

This article describes various methods by which one may compress digital images losslessly. This differs from lossy image compression, in that the reconstructed image must be identical to the original. The state of the art in lossless image compression is the PNG standard. A PNG file contains a series of chunks which hold the information within the compressed image. These chunks are created by applying a filter to a part of the image, and then compressing the result with the DEFLATE algorithm.

There are several filters available within the PNG specification. Most of these construct linear combinations of previous pixels to predict the "current" pixel. By saving the difference between the prediction and the actual pixel, the data may be compressed. This delta-encoding produces signed numbers near zero if the filter is any good, and the resulting non-uniform distribution of deltas may then be efficiently compressed. Note that these filters are applied to the red, green and blue colour values separately; no inter-colour filtering is done.

The better PNG encoders will carefully select the filters that work best for the different regions of an image to produce maximal compression. This means that the best encoders can produce compressed images that are about half the size of those from the simpler encoders. This variance is problematic for comparison purposes. To simplify things, we will choose the Transcode library as the PNG encoder implementation. This encoder is commonly used, and provides a reasonable baseline.

The next problem is to choose a set of images to compress. Since compression varies with image content, this choice matters. Traditionally, many image compressors have been tested with the 512×512 Lenna test image. However, these days such an image is rather small, and doesn't correspond to the typical output of today's digital cameras.
Thus we choose to use the RGB 8-bit images from www.imagecompression.info, which are quite large. The sizes of these images in PPM format, and after compression with the Transcode PNG encoder, form our baseline.
The first step is to replace the DEFLATE compression algorithm with something more modern. A complication with replacing the compression algorithm is that PNG encoders have been tuned for DEFLATE, so their choice of filters may no longer be optimal. Thus instead of allowing multiple filters, we will use just one: the Paeth filter. This filter is the one most commonly used for RGB images, so this shouldn't affect the results too much. The resulting code filters and compresses all of the images above.
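The original code listing is not reproduced here. The following Python sketch shows the general shape of the pipeline for a single greyscale channel (a simplification; the article processes each RGB channel separately), using bzip2 via the standard `bz2` module as a stand-in for whichever modern compressor replaces DEFLATE:

```python
import bz2

def paeth_predict(left, above, upper_left):
    """The PNG Paeth predictor: estimate via linear extrapolation,
    then return whichever neighbour lies closest to the estimate.
    Ties favour left, then above, then the diagonal."""
    est = left + above - upper_left
    pa, pb, pc = abs(est - left), abs(est - above), abs(est - upper_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left

def filter_and_compress(rows):
    """Paeth-filter an 8-bit image (a list of rows of pixel values)
    and compress the resulting deltas.  Out-of-image neighbours are
    treated as zero, as in PNG."""
    deltas = bytearray()
    for y, row in enumerate(rows):
        for x, pixel in enumerate(row):
            left = row[x - 1] if x > 0 else 0
            above = rows[y - 1][x] if y > 0 else 0
            ul = rows[y - 1][x - 1] if x > 0 and y > 0 else 0
            deltas.append((pixel - paeth_predict(left, above, ul)) & 0xFF)
    return bz2.compress(bytes(deltas))
```

A flat image produces a delta stream that is almost entirely zeros, which the back-end compressor handles very efficiently.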
The above assumes that the test images all lie within a single directory. The resulting file sizes are as follows (lower relative percentages are better):
As can be seen, even though we have lost the advantage of choosing different filters based on image content, the superior compression of the replacement algorithm makes up for it. Now that we have a baseline, let's investigate other filters, and see how much we can improve the compression ratio. If we find a good filter that helps many of the above image files, then it may be worthwhile adding it to a later revision of the PNG specification.

## The Paeth Filter

The Paeth filter is rather complicated... what does it do? It first calculates a linear combination of the values of the three pixels adjacent to the current pixel (those immediately to the left, top left, and top). It then uses the result of that calculation to pick one of those three pixels to return as the predicted value. By Taylor-expanding the functional form of the image about the current pixel, we can see what the linear combination actually does:

f(x+Δx, y+Δy) = f(x, y) + Δx df/dx + Δy df/dy + Δx^{2}/2 d^{2}f/dx^{2} + Δx Δy d^{2}f/dxdy + Δy^{2}/2 d^{2}f/dy^{2} + ...
Setting the scale of Δx and Δy to be the pixel distance allows us to substitute their values into the linear combination. Doing that gives the result that the linear combination is an approximation to f(x, y): the first-derivative terms cancel exactly, leaving a leading error proportional to the mixed derivative d^{2}f/dxdy.

## Linear Extrapolation

Why does the Paeth filter then choose a pixel based on the extrapolated value, rather than using that value itself as the prediction? To see the effect of doing that, we create a linear-extrapolation filter.
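The filter's listing is missing above; a minimal sketch of the idea in Python:

```python
def linear_predict(left, above, upper_left):
    # Use the extrapolated plane value directly as the prediction,
    # instead of picking one of the three neighbours as Paeth does.
    return left + above - upper_left
```

For a pixel on a linear intensity ramp this predicts exactly: for example, `linear_predict(10, 12, 10)` gives 12.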
Using this, we obtain the results:
The results are in general worse for all of the images.

## Third order Extrapolation

If a first-order extrapolation is fairly good, then what about using a stencil three pixels wide instead of two? This allows us to zero out the first three derivatives in the extrapolation, and obtain an even better prediction. By solving the linear algebra problem in the Taylor-expansion coefficients, one can obtain the coefficients by which to multiply each of the pixel values.
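The original predictor listing is not shown. One consistent choice of coefficients (an assumption, though it matches the error analysis later in the text, as the absolute values of the coefficients sum to 15) comes from expanding 1 - (1-L)^{2}(1-U)^{2}, where L and U shift one pixel left and up; all derivatives up to third order then cancel:

```python
def third_order_predict(img, x, y):
    """Predict img[y][x] from the eight neighbours above and to the
    left of it (requires x >= 2 and y >= 2)."""
    def f(dx, dy):
        return img[y - dy][x - dx]
    # Coefficients from expanding 1 - (1-L)^2 (1-U)^2.
    return (2 * f(1, 0) + 2 * f(0, 1) - 4 * f(1, 1)
            - f(2, 0) - f(0, 2)
            + 2 * f(1, 2) + 2 * f(2, 1) - f(2, 2))
```

As a sanity check, the predictor is exact on any polynomial of total degree three or less in x and y.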
The resulting file sizes are
This is absolutely horrible! What has gone wrong? The problem here is due to error. The results from a digital camera need to be truncated to fit into 8 bits, whereas the real colour intensity at a given point requires infinite accuracy to express. The difference between the two is the truncation error, which averages one bit per pixel.

We can use error analysis to determine how this error affects the results of the extrapolation. Since the formula is a linear combination of independent pixels, we can assume their errors are also independent. If that is the case, then the error in the answer will be proportional to the square root of the sum of the absolute values of the coefficients. For the linear combination used within the Paeth filter, the error is thus twice [sqrt(1+1+1+1)] that of a single pixel. For the third order calculation, the error is four times that of a single pixel. The next problem is that many more pixels contribute to the result, and correspondingly a given pixel will affect many more results. This "spreads the error about", making it harder to compress by another factor of 8/3. The result is that the linear filter, which commonly outputs deltas between ±1, is replaced by something which outputs randomly between ±16. This reduces the compression factor considerably.

## Mixed-order Filter

The one saving grace of the third order method is that it is a superior filter when the intensity gradient is changing rapidly. It can correctly predict pixel values along diagonal edges, which are not handled well by the lower order method. Thus we might want to try a combination filter: choose the linear filter most of the time, and choose the third order method when it predicts a wildly different result.
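A sketch of such a mixed-order filter; the disagreement threshold is an assumed tuning parameter, and the third order coefficients follow the stencil described above:

```python
def mixed_predict(img, x, y, threshold=4):
    """Use the cheap linear prediction unless the third order
    extrapolation disagrees with it strongly (requires x, y >= 2)."""
    linear = img[y][x - 1] + img[y - 1][x] - img[y - 1][x - 1]
    third = (2 * img[y][x - 1] + 2 * img[y - 1][x]
             - 4 * img[y - 1][x - 1]
             - img[y][x - 2] - img[y - 2][x]
             + 2 * img[y - 2][x - 1] + 2 * img[y - 1][x - 2]
             - img[y - 2][x - 2])
    # The threshold value is an assumption; the article does not
    # state how "wildly different" was quantified.
    return third if abs(third - linear) > threshold else linear
```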
It produces results that are better than the third order method but on average not as good as the simple linear method, let alone the Paeth filter results.
## Third order Paeth

Another possibility is to combine the third order method with the Paeth filter: use the (hopefully better) third order prediction to choose a more accurate pixel. This reduces the error problem significantly.
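A sketch of this combination: the third order extrapolation (coefficients from zeroing derivatives up to third order) replaces the linear estimate inside Paeth's pixel-selection step, so the returned value is still one of the three adjacent pixels:

```python
def third_order_paeth(img, x, y):
    """Paeth-style selection driven by a third order estimate
    (requires x, y >= 2)."""
    left, above, ul = img[y][x - 1], img[y - 1][x], img[y - 1][x - 1]
    est = (2 * left + 2 * above - 4 * ul
           - img[y][x - 2] - img[y - 2][x]
           + 2 * img[y - 2][x - 1] + 2 * img[y - 1][x - 2]
           - img[y - 2][x - 2])
    # Return the neighbour closest to the estimate, as in Paeth.
    pa, pb, pc = abs(est - left), abs(est - above), abs(est - ul)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return ul
```

Because the output is always an actual neighbouring pixel, the error amplification of the raw third order extrapolation is avoided.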
The results are encouraging, but still not as good as the normal Paeth algorithm. It seems that the extra power of the third order method is not useful at 8 bits per colour channel. At 16 bits per colour, it may be worth investigating again. However, most image files today are still 8 bits per channel, so we will have to try something else.

## Scaled Paeth Filter

Since the Paeth filter has had the best performance so far, let's investigate altering it to see if we can improve its results. One possibility is to change the pixel-choice function. The function currently picks a pixel based on the absolute difference between the extrapolated value and the adjacent pixel values. However, the diagonal pixel is slightly further away than the orthogonal neighbours. The Paeth algorithm takes this into account by making it slightly less likely to choose the diagonal pixel: distance ties are converted into wins for the adjacent pixels. Perhaps we can improve on this? One possibility is to add a scale factor to the intensity-distance calculation. Since the diagonal pixel is further away, we can give it a higher scale factor. Picking a low-order approximation to sqrt(2), which is the ratio of the two distances, we choose multipliers of 2 and 3 as these scale factors.
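A sketch of this variant, applying the multiplier 2 to the orthogonal neighbours and 3 to the diagonal (3/2 approximating sqrt(2)):

```python
def scaled_paeth(left, above, upper_left):
    est = left + above - upper_left
    # Scale the intensity distances by the (approximate) spatial
    # distance of each neighbour from the current pixel.
    pa = 2 * abs(est - left)
    pb = 2 * abs(est - above)
    pc = 3 * abs(est - upper_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left
```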
These results are similar to those of the normal Paeth filter.

## Paeth Linear

The above filter tried to improve things by returning the further pixel less often. Perhaps instead, we could return a "better" pixel in that case? One choice is to return the linear extrapolation, since it may be more accurate under the assumption of no intensity-gradient curvature. In addition, we will alter the scale factors a little more. By choosing scale factors of two for the adjacent pixels, and unity for the diagonal pixel, we correctly scale the prediction for extrapolated diagonal gradients.
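A sketch of the "Paeth Linear" variant; where the diagonal would have been chosen, the linear extrapolation itself is returned instead:

```python
def paeth_linear(left, above, upper_left):
    est = left + above - upper_left
    # Scale factors: 2 for the orthogonal neighbours, 1 for the diagonal.
    pa = 2 * abs(est - left)
    pb = 2 * abs(est - above)
    pc = abs(est - upper_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    # The diagonal "won": return the extrapolated value itself.
    return est
```

For example, on a horizontal ramp `paeth_linear(10, 12, 10)` returns the top pixel 12, while `paeth_linear(5, 15, 10)` falls through to the extrapolated value 10.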
These results are about 1-2% better than the original Paeth filter algorithm. Unfortunately, further tweaks to the Paeth algorithm do not appear to help.

## Average Filter

Another filter within the PNG specification is the "average" filter, which averages the pixels to the top and left of the given pixel to obtain a prediction. Perhaps we could add the diagonal top-left pixel to this average, and improve the prediction?
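A sketch of the resulting three-pixel average filter, with the division truncating as in the original:

```python
def average3(left, above, upper_left):
    # Integer division truncates (for the non-negative sums of
    # 8-bit pixel values seen here, floor and truncation agree).
    return (left + above + upper_left) // 3
```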
This filter does fairly well on smooth images. However, on average it is about 1-2% worse than the "Paeth Linear" filter.

## Rounded Average

One possible way to improve the previous filter is to note that we truncated towards zero in the division. Perhaps rounding would be better. This can be done by adding one to the sum before the division.
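The rounded variant is a one-character change:

```python
def average3_rounded(left, above, upper_left):
    # Adding one before dividing biases the result upwards,
    # partially rounding instead of always truncating.
    return (left + above + upper_left + 1) // 3
```

For instance, the sum 11 truncates to 3 but rounds to 4 here.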
Unfortunately, this is fractionally worse than not rounding.

## Weighted Average

Another improvement might be obtained by using some form of weighted average, just as was done when improving the predictor within the Paeth filter. We weight the pixels inversely according to their distance from the predicted pixel.
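The exact weights used in the article are not shown. One plausible integer approximation of inverse-distance weighting (1 : 1 : 1/sqrt(2)) uses the ratio 3 : 3 : 2; these particular numbers are an assumption:

```python
def weighted_average(left, above, upper_left):
    # 3:3:2 approximates weighting the orthogonal neighbours by 1
    # and the diagonal neighbour by 1/sqrt(2); divide by the weight sum.
    return (3 * left + 3 * above + 2 * upper_left) // 8
```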
The resulting filter is better than the Paeth filter, and in fact better than the "Paeth Linear" variant.

## Linear Combination

The previous filter gained its improvements by dropping the requirement that the weights be the same for all the pixels. However, it still required the weights to be positive. If we drop that requirement, then further improvements can be gained. In effect, we can form a meta-average between the linear-extrapolation filter and the weighted-average filter. After trying many possible weightings, the following predictor returned the best results.
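From the weights described below (three for each orthogonal neighbour, minus one for the diagonal, normalised by their sum of five), the predictor is:

```python
def linear_combination(left, above, upper_left):
    # The (3, 3, -1) weights sum to 5, hence the division.
    # Note: Python's // floors, whereas C division truncates; the
    # difference only matters if the weighted sum goes negative.
    return (3 * left + 3 * above - upper_left) // 5
```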
This filter weights the orthogonally adjacent pixels with a scale factor of three, and the diagonal pixel with a weight of minus one. The results are quite a bit better than the Paeth filter, and nearly always better than the PNG results. Remember that the PNG specification allows the encoder to dynamically choose which filter it applies, depending on which works best in a particular part of an image. This filter algorithm is also a fair bit simpler to vectorize than the Paeth filter, due to the lack of conditional statements within the kernel. (The existing check is just there to make sure the code doesn't read outside the image. That check can easily be removed by unrolling the first row and column in the main image loop.) The only small annoyance is that there is a division. Divisions can be quite slow, so it would be nice to rescale the coefficients to avoid one.

## Linear Combination 2

By rescaling so that the division is by a power of two, it may be implemented as a shift. This results in faster image compression and decompression routines. Unfortunately, the compression results aren't quite as good as those of the above algorithm, but they are still quite good.
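With the (5, 5, -2) weights named in the conclusion, the weight sum is 8 and the division becomes a right shift:

```python
def linear_combination2(left, above, upper_left):
    # (5, 5, -2) sums to 8, so dividing is a shift by 3.  Python's >>
    # floors, matching an arithmetic shift on negative sums in C.
    return (5 * left + 5 * above - 2 * upper_left) >> 3
```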
## Conclusion

The (3,3,-1) linear combination filter appears to be well-suited as a preconditioner to a compression algorithm in lossless image compression. The (5,5,-2) variant compresses almost as well, and is a bit faster.

Copyright © Lockless Inc. All Rights Reserved.

## Comments

Greg Roelofs said...

Of course, one reason PNG has succeeded is compatibility, so we should be certain that any incompatible update to the spec wrings every (reasonable) bit of compression out. To that end, there are at least two tweaks to the above that are likely to pay off:

(1) There is a lossless YUV-like transformation of RGB samples about which we learned too late to go into PNG, but it is in the MNG spec. IIRC, it improved compression of photographic images by about 5% over the existing filters. (It basically doubles the filter selection; any given row can be YUV-transformed or not before applying the existing filters.)

(2) In my own testing of a moderately large and broad file corpus, bzip2 was clearly obsoleted by xz (LZMA2). xz level 2 was significantly faster (both compressing and decompressing) and compressed as well as or better than any level of bzip2, while the higher levels compressed much better and decompressed faster, albeit at a significant cost in compression speed.

There's also been some recent work on better dynamic-filter-selection strategies, which should be folded in, too.

With respect to the image corpus used above, it's true that digital cameras are widely used, but I'm not convinced there's a huge number of them being used in lossless mode--and even if there were, I don't believe it would be in 8bps mode; I think (but haven't verified) that raw image data tends to be at least 10 to 12bps, which gets us into 16bps (48-bit) PNG territory. Compression there is a very different beast.

I think a realistic, spec-change-supporting corpus must be decently large (at least hundreds of images) and include a reasonable sampling of online web comics, cluster and web-site graphs (e.g., as produced by RRD, nagios, etc., which frequently include lots of embedded text), perhaps a suite of exported presentation slides, and a large population of colormapped images (not icons--nobody cares how well tiny images compress anymore--but those with dimensions between, say, 400 and 1600 pixels in at least one dimension).

Of course, all of that is a lot of work, and the choices made in selecting the corpus will probably lead to lengthy arguments, discussion, etc., etc. Just sayin'. ;-)