2023-08-11

Study of modern image formats

Illustration of an AVIF, JXL or JPEG image format

What's the state of modern image formats, and which one should I choose for the web today? In this article, let's study modern image formats (AVIF, HEIF, WebP, WebP2 and JPEG XL) as precisely as possible. 🖼️

First, I'll explain how I studied these image formats. Then I'll compare the encoders' performance, graph it and analyse it. Finally, on the basis of that data analysis, I'll compare the image formats and their use cases.

This article is going to be long and technical. Feel free to go straight to the analysis section, or even the conclusion.

Table of contents

  1. Study protocol
    1. Expected results
    2. Dataset
    3. Encoders
      1. Browser support
      2. Features
  2. Analysis of produced images
    1. Encoding errors 🐛 🐞
      1. heic-enc to encode AVIF and HEIF
      2. cjxl encoder
      3. cwebp and cwp2 encoders
      4. Conclusion of the study of errors
    2. Compression rate compared to the quality of the image produced
      1. Low definition images
      2. Random PNGs
      3. Personal holiday photos
      4. High definition images
      5. Photos of the INRIA dataset
      6. PNG images from XKCD website
    3. Overall encoder speed
    4. Study of decoding time
    5. A look back at the study 👀
  3. Conclusion

Study protocol

I've put together a set of free-to-use images found on the web. This dataset is made up of images of widely varying dimensions, sizes, origins and uses. In total, 169 images from 6 different datasets were analysed.

Each encoder converts the images by varying the speed (sometimes called effort) and quality options for each image. So for each image, 308 images were produced.
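The figure of 308 images can be reproduced with a small sketch of the parameter sweep. The step values are assumptions, not taken from the study scripts: quality goes from 0 to 100 in steps of 10, JPEG XL effort from 1 to 9, WebP effort from 0 to 6 and WebP2 effort from 0 to 9, while AVIF and HEIF have no effort option. With these assumed ranges the total happens to match.

```python
from itertools import product

# Assumed quality steps: 0, 10, ..., 100 (11 values).
QUALITIES = range(0, 101, 10)

# Assumed effort levels per format; [None] means "no effort option".
EFFORTS = {
    "AVIF": [None],
    "HEIF": [None],
    "JXL": range(1, 10),    # 1..9
    "WebP": range(0, 7),    # 0..6
    "WebP2": range(0, 10),  # 0..9
}


def parameter_grid():
    """Yield every (format, quality, effort) combination to encode."""
    for fmt, efforts in EFFORTS.items():
        for quality, effort in product(QUALITIES, efforts):
            yield fmt, quality, effort


grid = list(parameter_grid())
print(len(grid))  # 11 + 11 + 99 + 77 + 110 combinations per source image
```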

Since the formats studied all use lossy compression, it is necessary to estimate the quality lost during the encoding process. Each image produced was decoded into PNG format. The decoded PNG was then compared with the source image by computing their structural similarity, in order to estimate the loss of quality induced by the encoder. [1]
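To give an idea of what such a score measures, here is a deliberately simplified sketch. It is not the tool used in the study: it computes a single global SSIM over two greyscale pixel lists (real implementations work on local windows, several scales and colour), and it assumes the dissimilarity is derived as 1/SSIM - 1, so that 0 means identical images. The function names are hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length greyscale pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and have the same size")
    n = len(x)
    # Standard stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def dissimilarity(x, y):
    """Assumed convention: 1/SSIM - 1 (0 for identical images).

    Only meaningful when SSIM is positive, which is the case for
    images that still resemble each other.
    """
    return 1.0 / global_ssim(x, y) - 1.0
```

With this convention, a lower score means a more faithful image, which is why the rest of the article treats values close to 0 as good.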

For each image, the data collected from this process is as follows:


Expected results

I expect WebP to underperform the other formats on all counts. Also, AVIF should perform less well than WebP2. If the withdrawal of support for JPEG XL in Chrome and Firefox is anything to go by, I expect JPEG XL to perform less well than WebP2 and offer little in comparison with AVIF.

Dataset

I compiled this dataset myself. I don't own the rights to the images, but I chose royalty-free images whenever possible. I have favoured variety over freedom of use. A completely free dataset could have been put together, but it would have taken a lot more time. Although I find this ridiculous, technically I own the rights to the dataset. You are therefore free to reuse and modify it as long as you respect the original licences for the images.

You can download it from this torrent file or this magnet link 🧲.

Dataset source | Format | Number of images | Description | License
--- | --- | --- | --- | ---
low_def_imgs_set_jpg | jpg | 39 | Subset of the USC-SIPI dataset. [3] | Complex, but it is accepted that they can be used for research
png_random | png | 38 | A variety of PNG images from a few hours' experience on the internet. | A mix of "free for personal use", CC and certainly some shared under a restrictive licence.
photos_imgs_set_jpg | jpg | 16 | Personal holiday and travel photos. | I own the rights and authorise use under the CC-BY 2.0 ache licence.
high_def_imgs_set_jpg | jpg | 28 | Professional-quality, royalty-free images from the Pexels website. The attribution of each image is given in the name of the photo. | Free to use
selected_holliday_photos_jpg | jpg | 24 | Subset of the INRIA dataset. | Property of INRIA [4]. Use conditional on citing the paper from which it is taken.
xkcd | png | 24 | Subset of the panels from the online comic strip XKCD. | CC-BY-NC

Encoders

For each encoder I used one specific version, the most recent available at the time. Pinning the versions improves the reproducibility of the experiment.

Format | Transparency support | Quality interval | Effort interval | Encoder / Decoder | Version
--- | --- | --- | --- | --- | ---
AVIF | ✔️ | 0-100 | Option unavailable | heic-enc / heic-convert | 1.15.2
HEIF | ✔️ | 0-100 | Option unavailable | heic-enc / heic-convert | 1.15.2
JXL | ✔️ | 0-100 | 1-9 [5] | cjxl / djxl | v0.8.1 (c27d4992)
WebP | ✔️ | 0-100 | 0-7 | cwebp / dwebp | 1.3.0 (libsharpyuv: 0.2.0)
WebP2 | ✔️ | 0-100 | 0-10 | cwp2 / dwp2 | 0.0.1 (b80553d)

It should be noted that not all formats and encoders offer the same functionality. AVIF and WebP2, for example, do not offer progressive decoding. In any case, I never activated this option in the cjxl encoder.

Where possible, that is for WebP and WebP2, I enabled multithreading. For WebP2 I limited the number of threads to 8, whereas WebP does not offer this option.

Similarly, the parameters are not comparable from one encoder to another! ⚠️ A quality of 50 for AVIF does not correspond to a quality of 50 for WebP2.

One mistake I made was to include the 100 quality of JPEG XL, which automatically activates its lossless compression mode, whereas the other encoders need an option for this.

Browser support

As of August 2023, at the time of writing:

WebP support by web browsers
AVIF support by web browsers
JPEG XL support by web browsers

Support for the various formats, as currently reported on the caniuse.com website.

Features

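Format | Animation | HDR | Progressive decoding | Maximum number of channels | Maximum dimensions
--- | --- | --- | --- | --- | ---
WebP | ✔️ | ✖️ | ✖️ | 4 | 16 383 x 16 383
AVIF | ✔️ | 10 bits | ✖️ | 5 | 8193 x 4320
JXL | ✔️ | 12 bits | ✔️ | 5 | 1 073 741 823 x 1 073 741 823
WebP2 | ✔️ | 10 bits | 🤷 | 4099 | 16 383 x 16 383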

AVIF supports arbitrary dimensions, but in that case the image is split into 8193x4320 blocks. Nothing ensures consistency at the boundaries between these blocks, which can cause artefacts.

Is this synonymous with progressive decoding? It certainly is. Is it already implemented? I don't know.

Analysis of produced images

We will first analyse the errors and then the performance of each encoder.

Encoding errors 🐛 🐞

There are two types of error:

  1. When the encoding process crashes or fails, returning an error code other than 0.
  2. When, at the end of the encoding process, the generated image is particularly different from the source image. I arbitrarily set a dissimilarity threshold of 0.4 [2]: any image whose DSSIM exceeds this threshold is considered to be in error.
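The two error types above can be expressed as a small classification function. This is only a sketch: the function name, the labels and the idea of passing `None` when no score could be measured are assumptions made for illustration; only the 0.4 threshold comes from the text.

```python
DSSIM_THRESHOLD = 0.4  # arbitrary threshold stated in the text


def classify(exit_code, dssim):
    """Classify one encoding attempt as 'failure', 'glitch' or 'ok'.

    exit_code: return code of the encoding process (0 means success).
    dssim: measured dissimilarity, or None if it could not be computed
           (hypothetical convention, e.g. when the file cannot be decoded).
    """
    if exit_code != 0 or dssim is None:
        return "failure"
    if dssim > DSSIM_THRESHOLD:
        return "glitch"
    return "ok"
```

Treating a missing score as a failure is a design choice: an encoder that exits with 0 but produces an undecodable file is then still counted as an error.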

heic-enc to encode AVIF and HEIF

The heic-encoder has a problem handling EXIF data, and the image is not rotated correctly.

As a result, the calculation of structural similarity could not work, as the original and produced images do not have the same dimensions. This error affects the AVIF and HEIF formats. In order to deal with this case, the EXIF data has been removed from all the images. This should not affect the size of the image produced. As this error has been circumvented, it has not been counted as an error.

Also, on some PNGs, the image produced was particularly different from the source image.

This seems to be a problem in decoding the source image, since both formats are affected. In addition, the image produced was recognisable but badly damaged. 9 source images were affected, and the error was systematic for each image produced from these source images.

cjxl encoder

The cjxl encoder always reports success, but the decoding step failed 3 times.

This suggests that cjxl did not produce a valid image. In a way, this is the worst possible bug. Testing with another JPEG XL decoder implementation, we get a more explicit "Unexpected end of file" error.

There were also 3 images generated that are particularly different from the source image.

However, unlike the heic-encoder, this error only occurs when images are encoded with a quality of 100 and a level 1 effort, called lightning ⚡. The 3 images affected by this bug are unrecognisable but curiously, the set of colours used is recognisable. It should be noted that these 3 images were also images that were not faithfully represented by heic-enc. It's unlikely to be a coincidence, but it's possible.

cwebp and cwp2 encoders

The WebP encoder fails on 6 images and the WebP2 encoder fails on 1 image.

The WebP2 encoding error is clear: WP2_STATUS_BAD_DIMENSION. In fact, the image in question is particularly large. This is the only error.

The WebP encoder fails on the same image, for the same reason, but also on 5 other images. In these cases the error returned is:

Saving file 'test_crash.webp'
Error! Cannot encode picture as WebP
Error code: 6 (PARTITION0_OVERFLOW: Partition #0 is too big to fit 512k.
To reduce the size of this partition, try using less segments with the -segments option, and eventually reduce the number of header bits using -partition_limit. More details are available in the manual (`man cwebp`)

Given that the error only occurs with quality at 90 or 100 and effort at 0 or 1 simultaneously, I consider this to be a borderline case that the encoder has difficulty handling. I didn't investigate further.

I note that, for these two encoders, no image produced has an excessively low similarity to its source image. This is a very good thing. 👌

Conclusion of the study of errors

Only the JPEG XL encoder fails to encode some images correctly and also produces glitches. The PNG decoding part of heic-enc needs to be improved, as it has many bugs which also produce glitches. The WebP and WebP2 formats run into the limited dimensions they support. WebP also has a bug in its official implementation in a few edge cases.

From this point, we will exclude cases with errors from our statistics. As a result, AVIF and HEIF will have 9 fewer images. JPEG XL, WebP and WebP2 will have no data corresponding to certain parameters.

The bugs encountered with JPEG XL have now been resolved, but the fix is not yet available in the latest released version. I reported the heif-encoding bug, and it was fixed promptly. I didn't do the same for WebP, as the process requires a Google account.

Compression rate compared to the quality of the image produced

For ease of comparison, I'm going to compare the formats dataset by dataset. It turns out that many images in the same dataset follow the same trends, so I'm going to try to list the different trends.

Low definition images

Within the low_def_imgs_set_jpg dataset there are 3 trends: a small set of images where the AVIF and HEIF formats do particularly badly, 3 images where the variation in performance is very large, and the rest, where all the image formats have roughly equivalent performance.

In around 13 cases, the AVIF and HEIF formats performed less well than all the others. The best dissimilarity score they achieve remains far from 0. Looking at the images concerned, they do not seem to have anything in common.

This poor performance can be explained either by a decoding problem or by the fact that the algorithms are not effective on these images.

On 3 images, it's clear that the results are very varied. The performance of each format depends very much on the effort and quality configured. Despite the wide variations, there is a clear segmentation where WebP2 clearly outperforms its competitors. AVIF and HEIC are OK, but JPEG XL lags behind WebP.

The images concerned are clearly greyscale test images.

Finally, for the rest of the low-resolution images, the performance of all the image formats is more or less identical. Only WebP lags slightly behind.

Random PNGs

Here, it is not easy to define a trend. Generally speaking, AVIF is equivalent to WebP2 and both do better than JPEG XL. WebP is generally pretty bad, but it's not uncommon for it to equal JXL in performance. However, this is only clearly visible in some of the graphs.

General trend in PNGs, compression ratio as a function of quality for different image formats

However, the slight dominance of AVIF over HEIF is clear in this data set. Also, on a fairly large number of images, the AVIF and HEIF formats manage to reduce the size of the image by reducing its quality, whereas the other algorithms reach a threshold (each with a different threshold) that they are unable to exceed.

Some images (👽) have a very unusual graph!

Plot of quality as a function of size for different formats of the cat_donus_pixel_art image

This is the case with cat_donus_pixel_art where JPEG XL is very good at 100 quality (lossless) but does slightly less well than WebP otherwise. WebP also gives very good results, whereas WebP2 has difficulty optimising image size. AVIF gives very poor results, while HEIF systematically increases image size (except at quality 0).

Also, on large, highly detailed images, JPEG XL can become interesting. This is the case, for example, on the screenshot of the home page of https://wikipédia.org, where JPEG XL becomes as interesting as WebP2, itself better than AVIF. JPEG XL performs better on images with large dimensions such as eiffel_tower_pngwing and screenshot_firefox_c_quirks_en_ache_one.

Graph of quality as a function of size for different formats for a Wikipedia screenshot.

Personal holiday photos

The trend is very clear in this dataset. All formats except WebP are excellent.

General trend in personal photos, compression ratio as a function of quality for different image formats

Note the mediocre performance of WebP2 when the quality is 100: the size is multiplied by almost 6 on the nantes_église image and by almost 5 on sleeping_bridge_beauty. AVIF suffers from the same problem at qualities 100 and 90, but not to such an extent. JPEG XL never encounters this problem.

High definition images

The trend is identifiable. All formats except WebP are excellent. We can even rank JPEG XL first, AVIF/HEIF second and WebP2 third. It should be noted that this ranking is not true for all images. Also, all three formats have very good results, so the difference between WebP2 and JPEG XL is not that noticeable.

General trend for high-resolution photos, compression ratio as a function of quality for different image formats

Broadly the same trend as the personal holiday photo dataset. The performance achieved is on average better, however.

No image stands out from the crowd, except for WebP2 performance at 100% quality and effort 0. An initial size multiplied by 8.5!

Photos of the INRIA dataset

Same analysis as for the two previous datasets. 🤷

General trend for photos in the INRIA dataset, compression ratio as a function of quality for different image formats

PNG images from XKCD website

Very poor results for JPEG XL, worse overall than WebP, except on one image (1416.jpg), where it excels in lossless mode. WebP2 does well, and AVIF/HEIF is roughly equivalent to it. There are two trends.

General trend in PNGs from XKCD, compression ratio as a function of quality for different image formats

The first, and most common, is the dominance of WebP2 and AVIF/HEIF (equivalent) over the other formats. However, WebP2 performs better on certain images (136, 209, 399, 695, 731, 1071). What these images have in common is that they have colours, often a lot of colours. But there are some colour images where the performance gain is not significant.

The second, which can be seen in images 1123, 1144 and to a lesser extent 376, 1163, 1195 and 1445, is that JXL is better than the others when the quality is good and WebP is better than the others when the quality is poor. This trend was difficult to predict. What these images have in common is their small size.

Trend in PNGs from XKCD, where WebP and JXL alternately dominate the other formats, compression ratio as a function of quality for the different image formats

One exception is image 1416, which triggered the bug in the heic-enc encoder.

Overall encoder speed

For the same quality, the encoders take very different times. Here, taking the AVIF data points as a reference and comparing them with the points of the other formats that have a similar similarity score and a similar compression ratio, we obtain this graph:

The scale is logarithmic in order to display all the data on the graph.

Encoders encoding time
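One possible way to build such a comparison is sketched below. It is an assumption about the method, not the code used for the graph: each AVIF measurement is paired with the closest measurement of another format whose dissimilarity and compression ratio fall within a tolerance, and the encoding times are then divided. The `Point` fields and the tolerance values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Point:
    fmt: str
    dssim: float    # dissimilarity with the source image
    ratio: float    # produced size / source size
    seconds: float  # encoding time


def closest_match(reference, candidates, dssim_tol=0.001, ratio_tol=0.05):
    """Return the candidate nearest to `reference` within both tolerances, or None."""
    near = [
        c for c in candidates
        if abs(c.dssim - reference.dssim) <= dssim_tol
        and abs(c.ratio - reference.ratio) <= ratio_tol
    ]
    if not near:
        return None
    return min(
        near,
        key=lambda c: (abs(c.dssim - reference.dssim), abs(c.ratio - reference.ratio)),
    )


def relative_times(avif_points, other_points):
    """Encoding time of each matched point divided by the AVIF encoding time."""
    result = []
    for ref in avif_points:
        match = closest_match(ref, other_points)
        if match is not None and ref.seconds > 0:
            result.append(match.seconds / ref.seconds)
    return result
```

A value above 1 means the other encoder was slower than AVIF for a comparable result. AVIF points without any close match are simply skipped.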

WebP is the fastest format overall. Next come AVIF and HEIF, but JPEG XL is very close. It is even faster on average. The anomaly here is WebP2, which is much slower overall (note that the scale is logarithmic ...)

AVIF and HEIF are the most consistent, being the only encoders to encode all images in less than 10s. JPEG XL took up to 2 minutes to encode certain images, while WebP2 sometimes took 2 hours to encode an image!

Study of decoding time

I looked at the decoding times for the different formats, but they're all pretty much the same. All images are decoded in 10s maximum by all image formats.

This result is quite surprising. It would not be surprising if an error had made its way into these measurements and if, for example, it was the time taken to write to disk that was measured here.

The most likely scenario is that the PNG encoding time is of a higher order than the decoding time for the various formats.

There is a slight tendency for JPEG XL to be slower than AVIF, and for WebP2 to be faster than AVIF.

Median decoding time by decoder.

Mean decoding time by decoder.

Here, I have not enabled progressive decoding of JPEG XL. It would have been interesting to see whether progressive decoding of JPEG XL would have resulted in a faster display.

A look back at the study 👀

I think this study can be improved. I plan to repeat this comparison in July 2024 and would like to list the points for improvement.

  1. Improve reproducibility with a Docker image.

  2. Compare at equal BPP, i.e. bits per pixel (looking at time and quality), and at equal quality (looking at BPP and time). This is particularly complicated, but it can be done.

  3. Study more encoders. Here, I have conflated encoder and format (encoder ≈ format), but the quality of the encoder clearly plays a role in how well a format is represented.

  4. Use several image quality metrics.

  5. Include the BPG (Better Portable Graphic) format for comparison.

  6. I wanted to compare only lossy image encodings. Except that JPEG XL at quality 100 is automatically a lossless format. ❌ Next time I'd like to include the lossless versions of the different encoders.

  7. Check the validity of the decoding times.
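For the BPP comparison mentioned in point 2, the metric itself is simple to compute: it is the file size in bits divided by the number of pixels. A minimal sketch, with a hypothetical function name:

```python
def bits_per_pixel(file_size_bytes, width, height):
    """Bits per pixel of an encoded image: file size in bits / pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return 8 * file_size_bytes / (width * height)


# A 512 x 512 image stored in 65 536 bytes uses 2 bits per pixel.
print(bits_per_pixel(65536, 512, 512))  # 2.0
```

Unlike a compression ratio, BPP does not depend on how well the source file was compressed, which makes it easier to compare images coming from different datasets.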

Conclusion

Note that in all cases AVIF, WebP2 and JPEG XL are generally more interesting than JPEG or PNG.

Taking into account current browser support (August 2023): only use AVIF if Edge (or QQ Browser) is not essential for you. If you can, use the picture tag (https://developer.mozilla.org/fr/docs/Web/HTML/Element/picture) to prepare for future image formats. You can then offer AVIF to the browsers that support it, and fall back to WebP otherwise. [6]
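As an illustration of the picture tag approach, here is a small Python sketch that generates the markup. The function name and the file naming scheme (same base name, one file per format) are assumptions made for the example. The browser picks the first source whose type it supports and otherwise uses the img fallback.

```python
from html import escape


def picture_markup(stem, alt, fallback_ext="webp"):
    """Build a <picture> element offering AVIF first, then a fallback image.

    stem: hypothetical base path shared by both files, e.g. "img/photo".
    """
    stem_attr = escape(stem, quote=True)
    alt_attr = escape(alt, quote=True)
    return (
        "<picture>\n"
        f'  <source srcset="{stem_attr}.avif" type="image/avif">\n'
        f'  <img src="{stem_attr}.{fallback_ext}" alt="{alt_attr}">\n'
        "</picture>"
    )


print(picture_markup("img/photo", "A bridge at sunset"))
```

Passing `fallback_ext="png"` would produce the variant for very old browsers.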

If we take into account only the qualities and shortcomings of the various encoders, for example for storage purposes, or for a future in which browsers support these formats, my recommendations are as follows.

JPEG XL should be used for all photographs. For PNGs, it is difficult to choose between AVIF and WebP2. In the general case, choose WebP2 with an effort of 6 and a quality of 90, or AVIF with a quality between 80 and 90. If you process your images manually, test AVIF and WebP2 and see which one best suits your data. If you process them automatically, prefer WebP2, but bear in mind that AVIF is also a very good candidate, especially if you are constrained by encoding time or CPU load, or if you have images of arbitrarily large dimensions.

So what I'm observing contradicts other studies on the subject. This may be due to the encoders' progress. In fact, libjxl and libheif are in active development while libwp2 is on hold. Despite the encoder's maturity, it seems that WebP2 will not go beyond the experimental stage.

Additional Resources

  1. The implementation (in Rust) used is made by Kornel, whom I thank.

  2. Technically, a dissimilarity calculation. Here, we are looking for structural similarities between two images using the SSIM algorithm (https://fr.wikipedia.org/wiki/Structural_Similarity). This is one of the methods for estimating the difference in perception between two images.

  3. Low-resolution images, mainly from the misc category.

  4. From the paper "Hamming Embedding and Weak geometry consistency for large scale image search", presented at the 10th European Conference on Computer Vision in October 2008.

  5. Each level of effort has a textual representation and I find that very funny.

    Level | Textual name
    --- | ---
    1 | lightning
    2 | thunder
    3 | falcon
    4 | cheetah
    5 | hare
    6 | wombat
    7 | squirrel
    8 | kitten
    9 | tortoise
  6. Or PNG if you need to support very old browsers.
