A very compact representation of an image placeholder.
Store it inline with your data and show it while the real image is loading for a smoother loading experience.
It’s similar to BlurHash but with the following advantages:
 Encodes more detail in the same space
 Much faster to encode and decode
 Also encodes the aspect ratio
 Gives more accurate colors
 Supports images with alpha
Despite doing all of these additional things, the code for ThumbHash is still similar in complexity to the code for
BlurHash. One potential drawback compared to BlurHash is that the parameters of the algorithm are not configurable
(everything is automatically configured).
The code for this is available at https://github.com/evanw/thumbhash and contains implementations for
JavaScript, Rust, Swift, and Java. You can use npm install thumbhash to install the JavaScript package and
cargo add thumbhash to install the Rust package.
#Comparisons
The table below compares ThumbHash to several other similar approaches:

ThumbHash:
ThumbHash encodes a higher-resolution luminance channel, a lower-resolution color channel, and an optional alpha
channel. The format is described in detail in the details section. There are no
parameters to configure.
BlurHash:
Uses BlurHash with 3×3 components for square images, 4×3
components for landscape images, and 3×4 components for portrait images. This is the configuration recommended
in the documentation, and is roughly the same size as a ThumbHash encoded using base64. 
Potato WebP:
This is an experiment of mine to see how Google’s WebP image format does at this. The
“hash” is just the contents of the “VP8” chunk in a minimal WebP file: 0% quality (i.e.
potato quality) and a size of 16×16, since WebP encodes everything in 16×16 blocks. The image
is reconstructed by blurring a scaled-up copy of a minimal WebP file with the VP8 chunk reinserted.
[Table of sample images with columns: Original image, ThumbHash, BlurHash, Potato WebP]

#Details
The image is approximated using the
Discrete Cosine Transform. Luminance is
encoded using up to 7 terms in each dimension while chrominance (i.e. color) is encoded using 3 terms in each
dimension. The optional alpha channel is encoded using 5 terms in each dimension if present. If alpha is present,
luminance is only encoded using up to 5 terms in each dimension.
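To make the idea of "terms" concrete, here is a minimal one-dimensional sketch of a cosine-transform approximation. It is an illustration only, not the reference encoder: the function names dct_terms and reconstruct are hypothetical, and the normalization (mean as the DC term, a factor of 2 on the other terms when decoding) is one common convention that I am assuming here.

```python
import math

def dct_terms(samples, count):
    """Return the first `count` cosine coefficients of `samples` (term 0 is the mean)."""
    n = len(samples)
    return [
        sum(s * math.cos(math.pi / n * k * (i + 0.5)) for i, s in enumerate(samples)) / n
        for k in range(count)
    ]

def reconstruct(terms, n):
    """Rebuild `n` samples from cosine coefficients; fewer terms give a smoother result."""
    out = []
    for i in range(n):
        value = terms[0]  # the 0th-order term
        for k in range(1, len(terms)):
            value += 2 * terms[k] * math.cos(math.pi / n * k * (i + 0.5))
        out.append(value)
    return out

row = [0.0, 0.1, 0.4, 0.9, 1.0, 0.8, 0.3, 0.2]  # one row of a luminance channel
smooth = reconstruct(dct_terms(row, 3), len(row))  # keep only 3 terms
exact = reconstruct(dct_terms(row, len(row)), len(row))  # all terms: lossless
```

Keeping all 8 terms reproduces the row up to rounding error, while keeping 3 terms yields a blurred version of it, which is the effect a placeholder wants.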
Each channel of DCT coefficients comes in three parts: the DC term, the AC terms, and the scale. The DC term is the
coefficient for the 0th order cosine and the AC terms are the coefficients of all other cosines (DC and AC are terms
from signal processing). All values are quantized to only a few bits each. To maximize the useful numeric range, AC
values are scaled up so that the largest magnitude spans the full quantized range, and that scale is saved
separately. In addition, ThumbHash omits the high-frequency half of the coefficients and only keeps the
low-frequency half. If you are familiar with JPEG’s zigzag coefficient order, this roughly corresponds to stopping
halfway through that sequence. The rationale is that the low-frequency coefficients carry most of the information,
and we also want a smooth image.
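The scale-then-quantize step can be sketched as follows. This is a simplified illustration under my own assumptions, not the reference encoder: quantize_ac and dequantize_ac are hypothetical names, the scale is kept as a float here (the real format also stores it in a few bits), and 4 bits per AC value is taken from the struct layout in this section.

```python
def quantize_ac(ac, bits=4):
    """Scale AC values by their maximum magnitude, then map [-1, 1] to integer codes."""
    levels = (1 << bits) - 1  # 15 for 4 bits
    scale = max((abs(v) for v in ac), default=0.0)
    codes = []
    for v in ac:
        unit = 0.5 * v / scale if scale else 0.0  # now in [-0.5, 0.5]
        codes.append(round((0.5 + unit) * levels))
    return scale, codes

def dequantize_ac(scale, codes, bits=4):
    """Invert quantize_ac: map codes back to [-1, 1] and reapply the scale."""
    levels = (1 << bits) - 1
    return [(c / levels * 2.0 - 1.0) * scale for c in codes]

scale, codes = quantize_ac([0.1, -0.2, 0.05])
restored = dequantize_ac(scale, codes)
```

Because every value is divided by the largest magnitude first, the codes always use the full 0 to 15 range, and the worst-case error per value is scale / 15.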
Luminance and chrominance are represented in a simple color space that’s easy to encode and decode. It uses the
values L for luminance, P for yellow vs. blue, and Q for red vs. green (inspired by human eyesight). The
advantage of LPQ over RGB is that variation in luminance is typically more important than variation in chrominance,
so we can make better use of space by using more space for luminance and less space for chrominance. Note that the
range of L is 0 to 1 but the range of P and Q is -1 to 1 because they each represent a subtraction.
To convert from RGB to LPQ:
    l = (r + g + b) / 3;
    p = (r + g) / 2 - b;
    q = r - g;
And to convert from LPQ back to RGB:
    b = l - 2 / 3 * p;
    r = (3 * l - b + q) / 2;
    g = r - q;
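As a quick check that the two conversions are inverses of each other, here is a small sketch. The function names rgb_to_lpq and lpq_to_rgb are hypothetical, and the inputs are assumed to be floats in the range 0 to 1.

```python
def rgb_to_lpq(r, g, b):
    """Convert RGB (each 0..1) to luminance L (0..1) and chrominance P, Q (-1..1)."""
    l = (r + g + b) / 3
    p = (r + g) / 2 - b  # yellow vs. blue
    q = r - g            # red vs. green
    return l, p, q

def lpq_to_rgb(l, p, q):
    """Invert rgb_to_lpq."""
    b = l - 2 / 3 * p
    r = (3 * l - b + q) / 2
    g = r - q
    return r, g, b

yellow = rgb_to_lpq(1.0, 1.0, 0.0)  # P is at its maximum for pure yellow
blue = rgb_to_lpq(0.0, 0.0, 1.0)    # P is at its minimum for pure blue
```

Pure yellow gives P = 1 and pure blue gives P = -1, which shows why P and Q span -1 to 1 while L stays within 0 to 1.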
The file format is tightly packed and each number uses fewer than 8 bits.
If the ThumbHash file format were to be represented as a C++ struct, it might look something like this:
    struct ThumbHash {
      uint8_t l_dc : 6;
      uint8_t p_dc : 6;
      uint8_t q_dc : 6;
      uint8_t l_scale : 5;
      uint8_t has_alpha : 1;

      uint8_t l_count : 3;
      uint8_t p_scale : 6;
      uint8_t q_scale : 6;
      uint8_t is_landscape : 1;

      #if has_alpha
      uint8_t a_dc : 4;
      uint8_t a_scale : 4;
      #endif

      uint8_t l_ac[] : 4;
      uint8_t p_ac[] : 4;
      uint8_t q_ac[] : 4;

      #if has_alpha
      uint8_t a_ac[] : 4;
      #endif
    };
The number after the colon in each field is the number of bits used by that field. The length of each AC array is
the number of coefficients left after removing the 0th component (i.e. the DC component) and also removing the
high-frequency half of the components. Representing that in C code might look something like this for a single
channel, where nx and ny are the numbers of coefficients in each dimension:
    for (int y = 0; y < ny; y++)
      for (int x = 0; x < nx; x++)
        if ((x != 0 || y != 0) && (x * ny + y * nx < nx * ny))
          readAC();
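To see how many AC values that loop actually reads, the same condition can be enumerated directly. This is a small sketch, with ac_positions as a hypothetical helper name.

```python
def ac_positions(nx, ny):
    """List the (x, y) coefficient indices kept for one channel, excluding the DC term."""
    return [
        (x, y)
        for y in range(ny)
        for x in range(nx)
        if (x != 0 or y != 0) and x * ny + y * nx < nx * ny
    ]

luminance_square = len(ac_positions(7, 7))  # 27 values for a 7x7 luminance channel
chrominance = len(ac_positions(3, 3))       # 5 values for each 3x3 color channel
```

For a 7×7 channel, 27 of the 48 non-DC coefficients survive, so a little over half are kept; the kept ones form the triangle of low frequencies nearest the DC term.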
The number of luminance components is derived as follows:
    if (is_landscape) {
      lx = max(3, has_alpha ? 5 : 7);
      ly = max(3, l_count);
    } else {
      lx = max(3, l_count);
      ly = max(3, has_alpha ? 5 : 7);
    }
Using the is_landscape and has_alpha flags like this to make the number of coefficients in
one dimension implicit is a way to save space. Since the number of components is automatically derived from the
aspect ratio of the original image, you can also use this information to derive an approximation of the original
aspect ratio.
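A sketch of that derivation, assuming the three header fields have already been unpacked into plain values. The function names are hypothetical, and using the ratio lx / ly as the aspect-ratio estimate is my assumption about how the component counts would be turned into a ratio.

```python
def luminance_counts(is_landscape, has_alpha, l_count):
    """Return (lx, ly), the number of luminance components in each dimension."""
    long_side = 5 if has_alpha else 7  # implied by the flags, never stored
    if is_landscape:
        return max(3, long_side), max(3, l_count)
    return max(3, l_count), max(3, long_side)

def approximate_aspect_ratio(is_landscape, has_alpha, l_count):
    """Estimate width / height from the component counts alone."""
    lx, ly = luminance_counts(is_landscape, has_alpha, l_count)
    return lx / ly

wide = approximate_aspect_ratio(True, False, 4)   # 7 / 4 = 1.75
tall = approximate_aspect_ratio(False, True, 3)   # 3 / 5 = 0.6
```

The estimate is coarse because the counts are small integers, but it is enough to reserve roughly the right amount of space before the real image arrives.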
If you just want the average color of the image (e.g. in a situation where showing a placeholder image is
impractical), you can get that by transforming the l_dc, p_dc, and q_dc values from LPQ to RGB. These values
are conveniently at the front of the file for this purpose.
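Here is a minimal sketch of reading the average color from the first three bytes. It rests on assumptions that should be checked against the reference implementations: that the fields are packed least-significant bit first in the order listed in the struct, that l_dc maps to 0..1 as code / 63, and that p_dc and q_dc map to -1..1 as code / 31.5 - 1. The function names unpack_dc and average_rgb are hypothetical, and alpha is ignored.

```python
def unpack_dc(hash_bytes):
    """Read l_dc, p_dc, q_dc from the first 24 bits (assumed bit order, see above)."""
    header = hash_bytes[0] | (hash_bytes[1] << 8) | (hash_bytes[2] << 16)
    l = (header & 63) / 63
    p = ((header >> 6) & 63) / 31.5 - 1
    q = ((header >> 12) & 63) / 31.5 - 1
    return l, p, q

def average_rgb(hash_bytes):
    """Convert the DC terms from LPQ to RGB and clamp each channel to 0..1."""
    l, p, q = unpack_dc(hash_bytes)
    b = l - 2 / 3 * p
    r = (3 * l - b + q) / 2
    g = r - q
    return tuple(max(0.0, min(1.0, v)) for v in (r, g, b))

# Hypothetical header bytes: l code 0, p and q codes 32 (nearly neutral chrominance)
nearly_black = average_rgb(bytes([0x00, 0x08, 0x02, 0x00, 0x00]))
```

Clamping is needed because the 6-bit P and Q codes cannot represent exactly zero under this mapping, so a neutral color can land slightly outside the 0 to 1 range after conversion.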
Reference implementations for this algorithm can be found at
https://github.com/evanw/thumbhash.