Determining JPEG minimum coded unit size
The problem
I ran into a problem while working on a project to clean up images with borders on Wikimedia Commons. This project required lossless cropping of JPEG images (for which I used jpegtran). However, some images would stubbornly refuse to be cropped to the specified dimensions. For example, these two:
The theory
It turns out that the problem is caused by a part of the JPEG format known as the minimum coded unit (MCU). To understand this, you need to know two key bits of information about how JPEG images are constructed (see Wikipedia and this great Wikibook for additional details and explanation):
- A regular colour JPEG image is composed of 3 layers (called “components”) that each encode a particular aspect of the image. More specifically, these are the components of the YCbCr color space: Y, Cb and Cr. The Y component encodes luma information (brightness, basically), whereas the Cb and Cr components encode chroma: how far the colour leans towards blue and red respectively.
- JPEG encodes the information of each component in chunks of 8 by 8 samples.
The human visual system is much better at detecting differences in luma than it is at detecting colour differences. JPEG can take advantage of this by encoding the color components (Cb and Cr) in much less detail than the luma component (Y). This is called chroma subsampling.
For example, given an 8 by 8 chunk from the Y component, instead of specifying the colour value for each of the points in this chunk, the chunk can be divided into 2 by 2 blocks to which the colour values are assigned. Thus, we only need 4 by 4 Cb and Cr chunks to specify the colours for this 8 by 8 Y chunk. Using this particular chroma subsampling method, a 64 by 64 pixel image will have a 64 by 64 Y component and two 32 by 32 colour components (Cb and Cr) that map onto the Y component.
Remember the second key point though: JPEG only encodes information into 8 by 8 chunks! Sticking with the example subsampling method, it is clear that an 8 by 8 chunk from the Cb or Cr component corresponds with a 16 by 16 chunk from the Y component. Because we cannot make the colour component chunks any smaller, the minimum coded unit would in this case be 16 by 16 pixels.
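To make the arithmetic concrete, here is a minimal Python sketch. It is only an illustration: the names chroma_size and mcu_size are made up, and it assumes that Cb and Cr share the same subsampling ratio.

```python
BLOCK = 8  # JPEG codes every component in 8 by 8 chunks


def chroma_size(width, height, h_ratio, v_ratio):
    """Size of one colour component when a single chroma sample
    covers h_ratio by v_ratio pixels (rounded up at the edges)."""
    return -(-width // h_ratio), -(-height // v_ratio)


def mcu_size(h_ratio, v_ratio):
    """An 8 by 8 chroma chunk spans 8*h_ratio by 8*v_ratio pixels,
    and that span is the minimum coded unit."""
    return BLOCK * h_ratio, BLOCK * v_ratio


print(chroma_size(64, 64, 2, 2))  # (32, 32)
print(mcu_size(2, 2))             # (16, 16)
print(mcu_size(1, 1))             # (8, 8), no subsampling
```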
Now why is this a problem for cropping? It’s because we must crop along MCU borders if we want to crop losslessly (see Wikipedia). Think of it like cutting bubble wrap: if you cut right along the seams, the bubbles will be fine; if you cut across them, they’ll break.
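As a rough sketch of what "cutting along the seams" means in practice, the hypothetical helper below moves a crop rectangle's top-left corner up and left onto the nearest MCU boundary and grows the rectangle so that its bottom-right corner stays put. It assumes that only the top-left corner has to be aligned, which is my understanding of how jpegtran treats its -crop argument; check the jpegtran documentation before relying on it.

```python
def align_crop(x, y, width, height, mcu_w, mcu_h):
    """Snap the top-left corner (x, y) down to a multiple of the MCU
    size and enlarge the rectangle by the amount it was moved."""
    ax = (x // mcu_w) * mcu_w
    ay = (y // mcu_h) * mcu_h
    return ax, ay, width + (x - ax), height + (y - ay)


# A 100 by 80 crop starting at (20, 35) in an image with a 16 by 16 MCU
print(align_crop(20, 35, 100, 80, 16, 16))  # (16, 32, 104, 83)
```

A crop that already starts on an MCU boundary comes back unchanged.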
The solution
For cropping to work on these images, we need to know their MCU. This has proven to be a rather obscure bit of information. It is not readily available in JPEG metadata (e.g. EXIF), nor is it stored as such in the JPEG header. Instead, it has to be calculated from the sampling factor information that is available in the JPEG header. To my knowledge there is only one tool that may be able to do this, called JPEGsnoop (I haven’t tested it). As this is a GUI program, it is obviously unfit for use in scripts.
Luckily however, the procedure is fairly straightforward:
First, the sampling factors for each of the components have to be obtained.
The sampling factors can be found in the Start-of-frame (SOF) header of the JPEG file. With the help of exiv2 the structure of the JPEG file can be obtained:
ubuntu@ubuntu:~$ exiv2 -p S example_16x16.jpg
STRUCTURE OF JPEG FILE: example_16x16.jpg
address | marker | length | data
2 | 0xd8 SOI | 0
4 | 0xe0 APP0 | 16 | JFIF.............;CREATOR: gd-jp
22 | 0xfe COM | 59
83 | 0xdb DQT | 67
152 | 0xdb DQT | 67
221 | 0xc0 SOF0 | 17
240 | 0xc4 DHT | 31
273 | 0xc4 DHT | 181
456 | 0xc4 DHT | 31
489 | 0xc4 DHT | 181
672 | 0xda SOS | 12
As can be seen, the SOF header starts at byte offset 221 of the JPEG file and is 17 bytes in length.
Next, we read these bytes into an array:
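A minimal Python sketch of this step (an illustration, not the shell code this post was written around). The helper name read_bytes is made up, and it assumes that the address reported by exiv2 is the offset of the first byte we want, i.e. the length field that follows the two-byte marker.

```python
import io


def read_bytes(stream, offset, length):
    """Read `length` bytes starting at `offset` from a binary stream
    and return them as a list of integers."""
    stream.seek(offset)
    chunk = stream.read(length)
    if len(chunk) != length:
        raise ValueError("file ends before the requested range")
    return list(chunk)


# With the real file this would be:
#   with open("example_16x16.jpg", "rb") as f:
#       sof = read_bytes(f, 221, 17)

# Stand-in so the sketch runs without the image: 221 padding bytes
# followed by the SOF bytes of the example file.
fake = io.BytesIO(
    bytes(221)
    + bytes.fromhex("00 11 08 03 21 04 b0 03 01 22 00 02 11 01 03 11 01")
)
sof = read_bytes(fake, 221, 17)
print(" ".join(f"{b:02x}" for b in sof))
```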
This page has a good overview of the innards of the JPEG header. From it, we learn that the first 8 bytes are boring and that each group of 3 bytes that follows holds component-specific parameters. Let’s parse our SOF header:
length of SOF header in bytes
|     data precision
|     |  image height
|     |  |     image width
|     |  |     |     number of components
|     |  |     |     |
----- -- ----- ----- --
00 11 08 03 21 04 b0 03 01 22 00 02 11 01 03 11 01
                        -------- -------- --------
                       /  /  / |        |        |
                      /  /  /  |        |        Cr component
                     /  /  /   |        Cb component
                    /  /  /    Y component
                   /  /  quantization table number
                  /  /
                 /  sampling factor (first digit=horizontal, second=vertical)
                component id (1=Y, 2=Cb, 3=Cr, 4=I, 5=Q)
I enjoyed drawing the above, but all we’re really interested in are the sampling factors, i.e. bytes 9, 12 and 15 (counting from 0) for the Y, Cb and Cr sampling factors respectively. In this case, they are: 22, 11, 11. This can be read as follows: for every 2 by 2 block of pixels, there is a 2 by 2 block of Y values, but only a single value for each of the two colour components. In other words, the colour values are defined in blocks of 4 pixels. Applying key point 2 (see Theory), we learn that the MCU of this image is 16 by 16 pixels.
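The same parsing can be sketched in a few lines of Python. The function names are made up for illustration. The sketch relies on two things from the JPEG format: the high nibble of a sampling factor byte is the horizontal factor and the low nibble the vertical one, and the MCU measures 8 times the largest horizontal factor by 8 times the largest vertical factor.

```python
def parse_sof(sof):
    """Split an SOF header (starting at its length field) into the
    image size and a list of (id, h_factor, v_factor, qtable) tuples."""
    height = int.from_bytes(sof[3:5], "big")
    width = int.from_bytes(sof[5:7], "big")
    count = sof[7]
    components = []
    for i in range(count):
        cid, sampling, qtable = sof[8 + 3 * i: 11 + 3 * i]
        components.append((cid, sampling >> 4, sampling & 0x0F, qtable))
    return width, height, components


def mcu_from_components(components):
    """MCU width and height in pixels for the given components."""
    h_max = max(h for _, h, _, _ in components)
    v_max = max(v for _, _, v, _ in components)
    return 8 * h_max, 8 * v_max


sof = bytes.fromhex("00 11 08 03 21 04 b0 03 01 22 00 02 11 01 03 11 01")
width, height, components = parse_sof(sof)
print(width, height)                    # 1200 801
print(components)                       # [(1, 2, 2, 0), (2, 1, 1, 1), (3, 1, 1, 1)]
print(mcu_from_components(components))  # (16, 16)
```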
Here’s a script that puts it all together. Please note that it will only work with YCbCr color space JPEG files and assumes that the 3-byte component parts are in the order Y, Cb, Cr:
Just run it on an image and it will return the width and height of its MCU:
ubuntu@ubuntu:~$ ./get_mcu_dimensions.sh example_16x16.jpg
mcu_x 16
mcu_y 16
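For anyone without exiv2 at hand, the whole procedure can also be sketched in Python. This is not the shell script itself, just an illustration under a few assumptions: the function names are made up, the file is a well-formed JPEG whose SOF segment comes before the first SOS marker, and the scan is interleaved (the normal case for colour images). Unlike the script, it does not depend on the order of the components, because it takes the largest sampling factor over all of them.

```python
# SOF0..SOF15, minus DHT (0xC4), JPG (0xC8) and DAC (0xCC)
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def find_sof(data):
    """Walk the marker segments and return the SOF header,
    starting at its two-byte length field."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker == 0xDA:  # SOS: image data starts, stop looking
            break
        length = int.from_bytes(data[pos + 2: pos + 4], "big")
        if marker in SOF_MARKERS:
            return data[pos + 2: pos + 2 + length]
        pos += 2 + length  # skip marker plus segment
    raise ValueError("no SOF segment found")


def mcu_dimensions(data):
    """Return (mcu_x, mcu_y) in pixels for JPEG file contents."""
    sof = find_sof(data)
    factors = [sof[9 + 3 * i] for i in range(sof[7])]
    h_max = max(f >> 4 for f in factors)
    v_max = max(f & 0x0F for f in factors)
    return 8 * h_max, 8 * v_max


def mcu_from_file(path):
    with open(path, "rb") as f:
        return mcu_dimensions(f.read())


# Synthetic stand-in: SOI, the SOF0 segment of the example file, SOS marker
demo = (
    b"\xff\xd8\xff\xc0"
    + bytes.fromhex("00 11 08 03 21 04 b0 03 01 22 00 02 11 01 03 11 01")
    + b"\xff\xda"
)
mcu_x, mcu_y = mcu_dimensions(demo)
print("mcu_x", mcu_x)  # mcu_x 16
print("mcu_y", mcu_y)  # mcu_y 16
```

For a real image, call mcu_from_file with the path of the JPEG file instead of using the synthetic bytes.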
Now you can properly crop your images using jpegtran!