The problem

I ran into a problem while working on a project to clean up images with borders on Wikimedia Commons. This project required lossless cropping of JPEG images (for which I used jpegtran). However, some images would stubbornly refuse to be cropped to the specified dimensions. For example, these two:

Example of image with a 16x16 pixel MCU

Example of image with 8x16 pixel MCU

The theory

It turns out that the problem is caused by a part of the JPEG format known as the minimum coded unit (MCU). To understand this, you need to know two key bits of information about how JPEG images are constructed (see Wikipedia and this great Wikibook for additional details and explanation):

  1. A regular colour JPEG image is composed of 3 layers (called “components”) that each encode a particular aspect of the image. More specifically, these are the components that make up the YCbCr colour space: Y, Cb and Cr. The Y component encodes luma information (brightness, basically), whereas the Cb and Cr components encode the blue-difference and red-difference chroma (roughly, how much blue and red the colour contains relative to its brightness).

  2. JPEG encodes information in chunks of 8 by 8.

The human visual system is much better at detecting differences in luma than it is at detecting colour differences. JPEG can take advantage of this by encoding the colour components (Cb and Cr) in much less detail than the luma component (Y). This is called chroma subsampling.

For example, given an 8 by 8 chunk from the Y component, instead of specifying the colour value for each of the points in this chunk, the chunk can be divided into 2 by 2 blocks to which the colour values are assigned. Thus, we only need 4 by 4 Cb and Cr chunks to specify the colours for this 8 by 8 Y chunk. Using this particular chroma subsampling method, a 64 by 64 pixel image will have a 64 by 64 Y component and two 32 by 32 colour components (Cb and Cr) that map onto the Y component.

Remember the second key point though: JPEG only encodes information into 8 by 8 chunks! Sticking with the example subsampling method, it is clear that an 8 by 8 chunk from the Cb or Cr component corresponds with a 16 by 16 chunk from the Y component. Because we cannot make the colour component chunks any smaller, the minimum coded unit would in this case be 16 by 16 pixels.
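As a minimal sketch of that reasoning (the function name `mcu_size` is made up for this example, and it assumes the sampling factors are passed as two-digit strings with the horizontal factor first), the MCU is 8 times the largest horizontal and the largest vertical sampling factor found among the components:

```shell
#!/bin/bash
# Hypothetical helper: print the MCU width and height in pixels.
# Arguments: one two-digit sampling factor per component (e.g. 22 11 11),
# first digit = horizontal factor, second digit = vertical factor.
mcu_size() {
  local max_h=1 max_v=1 sf h v
  for sf in "$@"; do
    h=${sf:0:1}
    v=${sf:1:1}
    if (( h > max_h )); then max_h=$h; fi
    if (( v > max_v )); then max_v=$v; fi
  done
  # Every component is coded in 8 by 8 chunks, so scale by 8.
  echo "$((max_h * 8)) $((max_v * 8))"
}

mcu_size 22 11 11   # colour halved in both directions: prints "16 16"
mcu_size 11 11 11   # no subsampling: prints "8 8"
```

With no subsampling at all, the MCU is simply a single 8 by 8 chunk.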

Now why is this a problem for cropping? It’s because we must crop along MCU borders if we want to crop losslessly (see Wikipedia). Think of it like cutting bubble wrap: if you cut right along the seams, the bubbles will be fine; if you cut across them, they’ll break.
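To make the bubble wrap picture concrete, here is a small sketch that moves a requested crop origin up and left to the nearest seam. The helper name `snap_down` is hypothetical, and the example assumes the MCU size is already known. As far as I understand, it is the top-left corner that has to sit on an MCU boundary, since JPEG already allows partial blocks at the right and bottom edges of an image.

```shell
#!/bin/bash
# Hypothetical helper: round a coordinate down to the nearest multiple
# of the MCU size along that axis.
snap_down() {
  local coord=$1 mcu=$2
  echo $(( coord - coord % mcu ))
}

# Example: a requested crop origin of (21, 37) in an image with a 16x16 MCU.
x=$(snap_down 21 16)
y=$(snap_down 37 16)
echo "requested origin 21,37 -> lossless origin $x,$y"
```

For this example the origin ends up at 16,32, so the cropped image keeps 5 extra pixels on the left and 5 on top.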

The solution

For cropping to work on these images, we need to know their MCU. This has proven to be a rather obscure bit of information. It is not readily available in JPEG metadata (e.g. EXIF), nor is it stored as such in the JPEG header. Instead, it has to be calculated from the sampling factors that are stored in the JPEG header. To my knowledge there is only one tool that may be able to do this, called JPEGsnoop (I haven’t tested it). As this is a GUI program, it is unfit for use in scripts.

Luckily however, the procedure is fairly straightforward:

First, the sampling factors for each of the components have to be obtained.

The sampling factors can be found in the Start-of-frame (SOF) header of the JPEG file. With the help of exiv2 the structure of the JPEG file can be obtained:

ubuntu@ubuntu:~$ exiv2 -p S example_16x16.jpg 
STRUCTURE OF JPEG FILE: example_16x16.jpg
 address | marker     | length  | data
       2 | 0xd8 SOI   |       0 
       4 | 0xe0 APP0  |      16 | JFIF.............;CREATOR: gd-jp
      22 | 0xfe COM   |      59 
      83 | 0xdb DQT   |      67 
     152 | 0xdb DQT   |      67 
     221 | 0xc0 SOF0  |      17 
     240 | 0xc4 DHT   |      31 
     273 | 0xc4 DHT   |     181 
     456 | 0xc4 DHT   |      31 
     489 | 0xc4 DHT   |     181 
     672 | 0xda SOS   |      12

As can be seen, the SOF header (starting with its length field) begins at byte offset 221 in the JPEG file and is 17 bytes in length.

Next, we read these bytes into an array:

sof0string=$(hexdump -s 221 -n 17 -v -e '/1 "%02x "' "$1" | sed 's/\ $//')
read -r -a sof0 <<< "${sof0string}"

This page has a good overview of the innards of the JPEG header. From it, we learn that the first 8 bytes are boring and every set of 3 bytes that follows after holds component specific parameters. Let’s parse our SOF header:

    length of SOF header in bytes
    |  data precision
    |  |     image height
    |  |     |     image width
    |  |     |     |  number of components
    |  |     |     |  |
----- -- ----- ----- --
00 11 08 03 21 04 b0 03 01 22 00 02 11 01 03 11 01
                        -------- -------- --------
                       /  /  / |        |        |
                      /  /  /  |        |        Cr component
                     /  /  /   |        Cb component
                    /  /  /    Y component       
                   /  /   quantization table number
                  /  /
                 /   sampling factor (first digit=horizontal, second=vertical)
                 component id (1=Y, 2=Cb, 3=Cr, 4=I, 5=Q)

I enjoyed drawing the above, but all we’re really interested in are the sampling factors, i.e. bytes 9, 12 and 15 (counting from 0) for the Y, Cb and Cr sampling factors respectively. In this case, they are: 22, 11, 11. This can be read as follows: for every 2 by 2 block of pixels, there is a 2 by 2 block of Y values, but only a single Cb value and a single Cr value. In other words, the colour values are defined in blocks of 4 pixels. Since the colour components are themselves encoded in chunks of 8 by 8 (key point 2, see The theory), the MCU of this image is 16 by 16 pixels.
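To double-check the drawing, the same 17 bytes can be decoded by hand in bash. This is only a sketch for this one example: the byte values are copied from the dump above, and the variable names are my own.

```shell
#!/bin/bash
# The 17 SOF0 bytes from the example above, as hexdump prints them.
sof0=(00 11 08 03 21 04 b0 03 01 22 00 02 11 01 03 11 01)

# Two-byte fields are big-endian; the 16# prefix makes bash read them as hex.
height=$((16#${sof0[3]}${sof0[4]}))
width=$((16#${sof0[5]}${sof0[6]}))
ncomp=$((16#${sof0[7]}))

echo "image: ${width}x${height}, ${ncomp} components"

# Each component takes 3 bytes starting at index 8:
# component id, sampling factor, quantization table number.
for ((i = 0; i < ncomp; i++)); do
  base=$((8 + 3 * i))
  echo "component ${sof0[base]}: sampling ${sof0[base + 1]}, qtable ${sof0[base + 2]}"
done
```

This reports a 1200x801 image with 3 components, and sampling factors 22, 11 and 11 for components 01, 02 and 03.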

Here’s a script that puts it all together. Please note that it will only work with YCbCr color space JPEG files and assumes that the 3-byte component parts are in the order Y, Cb, Cr:


#!/bin/bash
# Usage: pass the JPEG file as the first argument.
file="$1"

# Get position and length of SOF0 header in file.
sof0hex=$(exiv2 -p S "$file" | grep SOF0 | sed 's/|.*|//')
read -r -a sof0hexfields <<< "${sof0hex}"
offset=${sof0hexfields[0]}
length=${sof0hexfields[1]}

# Read SOF0 values into array.
sof0string=$(hexdump -s "$offset" -n "$length" -v -e '/1 "%02x "' "$file" | \
             sed 's/\ $//')
read -r -a sof0 <<< "${sof0string}"

# Check if length of SOF0 is as expected.
sof0length=${#sof0[@]}
sof0explength=$((16#${sof0[0]}${sof0[1]}))
if [ "$sof0length" -ne "$sof0explength" ]; then
  echo "Length of SOF0 in bytes ($sof0length) not as expected ($sof0explength)."
  exit 1
fi

# Check if the image is a YCbCr image (the only encoding this script handles).
if [ "${sof0[7]}" -ne 3 ]; then
  echo "Image has ${sof0[7]} instead of 3 components: not YCbCr."
  exit 1
fi
if [ "${sof0[14]}" -ne 3 ]; then
  echo "Image is not YCbCr (most likely YIQ instead, or else just screwed)."
  exit 1
fi

# Check if Cb and Cr are both 11.
if [ "${sof0[12]}" -ne 11 ] || [ "${sof0[15]}" -ne 11 ]; then
  echo "Sampling factors of Cb and/or Cr is/are not equal to 11."
  exit 1
fi

# Determine MCU based on Y component sampling factors
# (first digit = horizontal, second digit = vertical).
y_hor=${sof0[9]:0:1}
y_ver=${sof0[9]:1:1}

mcu_x=$((y_hor * 8))
mcu_y=$((y_ver * 8))

echo -e "mcu_x\t$mcu_x"
echo -e "mcu_y\t$mcu_y"

Just run it on an image and it will return the width and height of its MCU:

ubuntu@ubuntu:~$ ./ example_16x16.jpg 
mcu_x   16
mcu_y   16

Now you can properly crop your images using jpegtran!
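As a rough sketch of that last step, the function below prints (rather than runs) a jpegtran command whose crop origin has been moved onto the MCU grid. The function name `aligned_crop_cmd`, the output file name and the crop numbers are all made up for illustration. The only jpegtran options used are `-copy`, `-crop WxH+X+Y` and `-outfile`.

```shell
#!/bin/bash
# Hypothetical helper: print a jpegtran command with an MCU-aligned crop.
# Arguments: mcu_x mcu_y x y width height input-file output-file
aligned_crop_cmd() {
  local mcu_x=$1 mcu_y=$2 x=$3 y=$4 w=$5 h=$6 src=$7 dst=$8
  # Distance from the requested origin to the MCU boundary above/left of it.
  local dx=$(( x % mcu_x )) dy=$(( y % mcu_y ))
  # Move the origin up/left and grow the region by the same amount,
  # so the lower right corner stays where it was requested.
  echo "jpegtran -copy all -crop $((w + dx))x$((h + dy))+$((x - dx))+$((y - dy)) -outfile $dst $src"
}

# Example: 300x200 crop at (21, 37) in an image with a 16x16 MCU.
aligned_crop_cmd 16 16 21 37 300 200 example_16x16.jpg cropped.jpg
```

Printing the command first makes it easy to check the adjusted geometry (here 305x205+16+32) before touching any files.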