A very compact representation of an image placeholder.
Store it inline with your data and show it while the real image loads for a smoother loading experience.
It’s similar to BlurHash but with the following advantages:

  • Encodes more detail in the same space
  • Much faster to encode and decode
  • Also encodes the aspect ratio
  • Gives more accurate colors
  • Supports images with alpha

Despite doing all of these additional things, the code for ThumbHash is still similar in complexity to the code for
BlurHash. One potential drawback compared to BlurHash is that the parameters of the algorithm are not configurable
(everything is automatically configured).

The code for this is available at
https://github.com/evanw/thumbhash and contains implementations for
JavaScript, Rust, Swift, and Java. You can use npm install thumbhash to install the
JavaScript package and cargo add thumbhash to
install the Rust package.

#Comparisons

The table below compares ThumbHash to several other similar approaches:

  • ThumbHash:
    ThumbHash encodes a higher-resolution luminance channel, a lower-resolution color channel, and an optional alpha
    channel. The format is described in detail in the Details section below. There are no
    parameters to configure.

  • BlurHash:
    Uses BlurHash with 3×3 components for square images, 4×3
    components for landscape images, and 3×4 components for portrait images. This is the configuration recommended
    in the documentation, and is roughly the same size as a ThumbHash encoded using base64.

  • Potato WebP:
    This is an experiment of mine to see how Google’s
    WebP image format does at this. The
    “hash” is just the contents of the “VP8” chunk in a minimal WebP file: 0% quality (i.e.
    potato quality) and a size of 16×16, since
    WebP encodes everything in 16×16 blocks. The image is reconstructed by blurring a scaled-up copy of a minimal
    WebP file with the VP8 chunk reinserted.

[Comparison grid of sample images: Original image · ThumbHash · BlurHash · Potato WebP]

#Details

The image is approximated using the
Discrete Cosine Transform. Luminance is
encoded using up to 7 terms in each dimension while chrominance (i.e. color) is encoded using 3 terms in each
dimension. The optional alpha channel is encoded using 5 terms in each dimension if present. If alpha is present,
luminance is only encoded using up to 5 terms in each dimension.

Each channel of DCT coefficients comes in three parts: the DC term, the AC terms, and the scale. The DC term is the
coefficient for the 0th order cosine and the AC terms are the coefficients of all other cosines (DC and AC are terms
from signal processing). All values are quantized to only a few bits each. To maximize the useful numeric range, AC
values are normalized by their maximum magnitude, and that scale is saved separately. In addition, ThumbHash omits the
high-frequency half of the coefficients and only keeps the low-frequency half. If you are familiar with JPEG’s
zig-zag coefficient order, this roughly corresponds to stopping halfway through that sequence. The rationale is that
the low-frequency coefficients carry most of the information, and we also want a smooth image.
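
To make this concrete, here is a minimal sketch of how one channel could be evaluated from its DC term, its kept
low-frequency AC terms, and its scale. This is only an illustration in the spirit of the C snippets later in this
article, not the reference decoder, and it omits the exact normalization factors; nx and ny are the numbers of
coefficients in each dimension:

#include <math.h>

// Sketch: evaluate one channel at pixel (x, y) of a w-by-h image. "ac" holds
// the kept low-frequency coefficients in the scan order shown later in this
// article (normalization simplified).
float sample_channel(float dc, const float *ac, float scale,
                     int nx, int ny, int w, int h, int x, int y) {
  const float pi = 3.14159265358979f;
  float value = dc;
  int i = 0;
  for (int fy = 0; fy < ny; fy++)
    for (int fx = 0; fx < nx; fx++)
      if ((fx != 0 || fy != 0) && fx * ny + fy * nx < nx * ny)
        value += ac[i++] * scale
               * cosf(pi * (x + 0.5f) * fx / w)
               * cosf(pi * (y + 0.5f) * fy / h);
  return value;
}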

Luminance and chrominance are represented in a simple color space that’s easy to encode and decode. It uses the
values L for luminance, P for yellow vs. blue, and Q for red vs. green (inspired by human eyesight). The
advantage of LPQ over RGB is that variation in luminance is typically more important than variation in chrominance,
so we can make better use of the available bits by spending more of them on luminance and fewer on chrominance. Note that the
range of L is 0 to 1 but the range of P and Q is -1 to 1 because they each represent a subtraction.

To convert from RGB to LPQ:

l = (r + g + b) / 3;
p = (r + g) / 2 - b;
q = r - g;

And to convert from LPQ back to RGB:

b = l - 2 / 3 * p;
r = (3 * l - b + q) / 2;
g = r - q;
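
These two sets of formulas invert each other exactly, which a standalone round-trip check makes easy to verify (a
sketch, not part of the reference code):

#include <stdio.h>

int main(void) {
  float r = 0.2f, g = 0.7f, b = 0.4f;

  // RGB -> LPQ
  float l = (r + g + b) / 3;
  float p = (r + g) / 2 - b;
  float q = r - g;

  // LPQ -> RGB recovers the original values (up to rounding error)
  float b2 = l - 2.0f / 3 * p;
  float r2 = (3 * l - b2 + q) / 2;
  float g2 = r2 - q;

  printf("%f %f %f\n", r2, g2, b2); // prints 0.200000 0.700000 0.400000
  return 0;
}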

The file format is tightly packed and each number uses fewer than 8 bits.
If the ThumbHash file format were to be represented as a C++ struct, it might look something like this:

struct ThumbHash {
  // 24-bit header
  uint8_t l_dc : 6;
  uint8_t p_dc : 6;
  uint8_t q_dc : 6;
  uint8_t l_scale : 5;
  uint8_t has_alpha : 1;

  // 16 more bits of header
  uint8_t l_count : 3;
  uint8_t p_scale : 6;
  uint8_t q_scale : 6;
  uint8_t is_landscape : 1;

  // 8 more bits of header, but only if alpha is present
  #if has_alpha
    uint8_t a_dc : 4;
    uint8_t a_scale : 4;
  #endif

  // AC coefficients, 4 bits each
  uint8_t l_ac[] : 4;
  uint8_t p_ac[] : 4;
  uint8_t q_ac[] : 4;

  // Alpha AC coefficients, only if alpha is present
  #if has_alpha
    uint8_t a_ac[] : 4;
  #endif
};
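
Note that C bit fields can’t portably express this tightly packed layout (and arrays of bit fields don’t exist at
all), so a real decoder reads the fields out of a packed bit stream instead. Here is a minimal sketch of such a
reader; the least-significant-bit-first bit order is an assumption matching the reference implementations:

#include <stdint.h>

// Sketch: read "count" bits starting at bit offset *pos, LSB-first.
static uint32_t read_bits(const uint8_t *hash, int *pos, int count) {
  uint32_t value = 0;
  for (int i = 0; i < count; i++, (*pos)++)
    value |= (uint32_t)((hash[*pos >> 3] >> (*pos & 7)) & 1) << i;
  return value;
}

// Example: pull the 24-bit header off the front of a hash.
void read_header(const uint8_t *hash) {
  int pos = 0;
  uint32_t l_dc = read_bits(hash, &pos, 6);
  uint32_t p_dc = read_bits(hash, &pos, 6);
  uint32_t q_dc = read_bits(hash, &pos, 6);
  uint32_t l_scale = read_bits(hash, &pos, 5);
  uint32_t has_alpha = read_bits(hash, &pos, 1);
  (void)l_dc; (void)p_dc; (void)q_dc; (void)l_scale; (void)has_alpha;
}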

The colon syntax after each field is the number of bits used by that field. The length of each AC array is the
number of coefficients left after removing the 0th component (i.e. the DC component) and also removing the
high-frequency half of the components (for example, a 3×3 channel ends up with 5 AC values and a 7×7 channel with
27). Representing that in C code might look something like this for a single channel, where nx and ny are the
numbers of coefficients in each dimension:

for (int y = 0; y < ny; y++)
  for (int x = 0; x < nx; x++)
    // Skip the DC term and the high-frequency half (where x/nx + y/ny >= 1)
    if ((x != 0 || y != 0) && (x * ny + y * nx < nx * ny))
      readAC();

The number of luminance components is derived as follows:

if (is_landscape) {
  lx = max(3, has_alpha ? 5 : 7); // the longer axis is implicit: 7 terms, or 5 with alpha
  ly = max(3, l_count);           // the shorter axis comes from the stored l_count
} else {
  lx = max(3, l_count);
  ly = max(3, has_alpha ? 5 : 7);
}

Using the is_landscape and has_alpha flags like this to make the number of coefficients in
one dimension implicit is a way to save space. Since the number of components is automatically derived from the
aspect ratio of the original image, you can also use this information to derive an approximation of the original
aspect ratio.
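
For example, a decoder could approximate the original aspect ratio from nothing but the two flags and l_count. This
is a sketch following the derivation above, not the reference API:

// Sketch: approximate the original aspect ratio (width / height) using only
// header fields, since more luminance terms are assigned to the longer axis.
float approximate_aspect_ratio(int is_landscape, int has_alpha, int l_count) {
  int long_axis = has_alpha ? 5 : 7;
  int short_axis = l_count > 3 ? l_count : 3;
  return is_landscape ? (float)long_axis / short_axis
                      : (float)short_axis / long_axis;
}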

If you just want the average color of the image (e.g. in a situation where showing a placeholder image is
impractical), you can get that by transforming the l_dc, p_dc, and q_dc
values from LPQ to RGB. These values are conveniently at the front of the file for this purpose.
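
Here is a sketch of that. The least-significant-bit-first packing of the 24-bit header is again an assumption
matching the reference implementations, and the dequantization ranges follow from the field widths and the LPQ
ranges above:

#include <stdint.h>

// Sketch: compute the average color of a ThumbHash from its first 3 bytes.
// The results may need clamping to [0, 1] before use.
void average_color(const uint8_t *hash, float *r, float *g, float *b) {
  // First 24 bits: l_dc (6), p_dc (6), q_dc (6), l_scale (5), has_alpha (1)
  uint32_t header = hash[0] | (uint32_t)hash[1] << 8 | (uint32_t)hash[2] << 16;

  // Dequantize: L is in [0, 1] while P and Q are in [-1, 1]
  float l = (header & 63) / 63.0f;
  float p = ((header >> 6) & 63) / 31.5f - 1.0f;
  float q = ((header >> 12) & 63) / 31.5f - 1.0f;

  // LPQ -> RGB using the formulas above
  *b = l - 2.0f / 3.0f * p;
  *r = (3.0f * l - *b + q) / 2.0f;
  *g = *r - q;
}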

Reference implementations for this algorithm can be found at
https://github.com/evanw/thumbhash.
