cora

Decoding Canon Raw Files
Sun Apr 26 2020

What is a "raw" camera file?

Most consumer cameras will (if not told otherwise) save .png or .jpg files to the digital storage medium of your choice. However, this is already a heavily post-processed version of the image, which can in certain situations limit the number of options you have when editing the image afterwards.

This is why camera vendors offer the option (at least in prosumer and professional products) to save the raw sensor data to an image file. Besides the raw sensor data, this file also stores a color JPEG thumbnail and EXIF data such as location, exposure, aperture, ISO, date and time, which lens was used, and much more.

This allows us to choose the post-sensor algorithms (like de-bayering, chromatic aberration correction, lens correction and so on) when we get back home to our computer, not while we are looking at a 5 cm screen in the field.

Canon's CR2

I found a complete specification on the archive. This page was also very helpful. In short, a CR2 file is a TIFF file with multiple layers, called image file directories (IFDs). From the link:

* The Canon CR2 file format is an encapsulated TIFF shell having 4 IFD sets.
* These IFDs are different versions of the same image.
*
*     +=====================================+ Start of TIFF/CR2 file
*     | TIFF Header                         |
*     | Size = 8                            |
*     +=====================================+
*     | Various TIFF Tags describing File   | IFD #1 Segment
*     |   EXIF (TIFF subdirectory)          | Canon 5D image size 2496x1664
*     |- - - - - - - - - - - - - - - - - - -|
*     | JPEG data (baseline compression)    |
*     +=====================================+
*     | JpegInterchangeFormat               | IFD #2 Segment
*     |                                     | unknown image size
*     |- - - - - - - - - - - - - - - - - - -|
*     | JPEG Compressed data                |
*     +=====================================+
*     | Few TIFF Tags describing segment    | IFD #3 Segment
*     |                                     | Canon 5D image size 384x256
*     |- - - - - - - - - - - - - - - - - - -|
*     | JPEG data (unknown compression)     |
*     +=====================================+
*     | Few TIFF Tags describing segment    | IFD #4 Segment - RAW image
*     |                                     | Canon 5D image size 4476x2954
*     |- - - - - - - - - - - - - - - - - - -|
*     | JPEG data (lossless compression)    |
*     +=====================================+
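
To make the layout above more concrete, here is a minimal sketch of how the chain of IFDs can be walked. It relies only on the generic TIFF structure: an 8-byte header (byte-order mark, the magic number 42, offset of the first IFD), and IFDs that consist of a 2-byte entry count, 12-byte entries and a 4-byte offset to the next IFD. The names ByteReader and listIfdOffsets are made up for this example and are not taken from my decoder; Canon-specific header fields and the tag contents are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads integers from a byte buffer with bounds checks.
struct ByteReader {
    const std::vector<std::uint8_t>& data;
    bool littleEndian;

    std::uint16_t u16(std::size_t pos) const {
        if (pos + 2 > data.size()) throw std::out_of_range("u16 past end of file");
        std::uint16_t a = data[pos], b = data[pos + 1];
        return static_cast<std::uint16_t>(littleEndian ? (a | (b << 8)) : ((a << 8) | b));
    }
    std::uint32_t u32(std::size_t pos) const {
        if (pos + 4 > data.size()) throw std::out_of_range("u32 past end of file");
        std::uint32_t a = data[pos], b = data[pos + 1], c = data[pos + 2], d = data[pos + 3];
        return littleEndian ? (a | (b << 8) | (c << 16) | (d << 24))
                            : ((a << 24) | (b << 16) | (c << 8) | d);
    }
};

// Returns the file offset of every IFD in the chain that starts at the header.
std::vector<std::uint32_t> listIfdOffsets(const std::vector<std::uint8_t>& file) {
    if (file.size() < 8) throw std::runtime_error("too small for a TIFF header");
    ByteReader r{file, true};
    if (file[0] == 'I' && file[1] == 'I') r.littleEndian = true;        // Intel order
    else if (file[0] == 'M' && file[1] == 'M') r.littleEndian = false;  // Motorola order
    else throw std::runtime_error("bad byte-order mark");
    if (r.u16(2) != 42) throw std::runtime_error("bad TIFF magic number");

    std::vector<std::uint32_t> offsets;
    std::uint32_t ifd = r.u32(4);                  // offset of the first IFD
    while (ifd != 0 && offsets.size() < 16) {      // cap guards against offset loops
        offsets.push_back(ifd);
        std::uint16_t entries = r.u16(ifd);        // number of 12-byte tag entries
        // The offset of the next IFD follows directly after the entries.
        ifd = r.u32(static_cast<std::size_t>(ifd) + 2 + 12u * entries);
    }
    return offsets;
}
```

For a file shaped like the diagram, this should return four offsets, with the last one pointing at the IFD that describes the raw data.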

The fourth IFD is the one we want to decode: it holds the raw, losslessly compressed sensor data. The data is scrambled (probably because it is read in parallel at multiple points on the sensor chip) and encoded in an off-spec, lossless JPEG format. See also this very useful visual representation of that JPEG format.
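
Lossless JPEG does not store pixel values directly. Each sample is stored as the Huffman-coded difference to a prediction made from already decoded neighbours. The sketch below shows only the last step, turning decoded differences back into samples, for the simplest case: a single component and predictor 1 (the left neighbour). It follows the standard lossless JPEG rules and assumes the differences have already been Huffman-decoded; it does not handle Canon's off-spec quirks or the unscrambling. The function name undoPrediction is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Rebuilds samples from already decoded differences using predictor 1.
// 'precision' is the sample bit depth from the JPEG frame header (e.g. 12).
std::vector<std::uint16_t> undoPrediction(const std::vector<int>& diffs,
                                          std::size_t width, std::size_t height,
                                          int precision) {
    if (diffs.size() < width * height)
        throw std::invalid_argument("not enough differences for the image size");
    std::vector<std::uint16_t> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            int pred;
            if (x > 0)      pred = out[y * width + x - 1];  // left neighbour
            else if (y > 0) pred = out[(y - 1) * width];    // row start: sample above
            else            pred = 1 << (precision - 1);    // very first sample
            // Lossless JPEG arithmetic wraps modulo 2^16.
            out[y * width + x] =
                static_cast<std::uint16_t>((pred + diffs[y * width + x]) & 0xFFFF);
        }
    }
    return out;
}
```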

~300 lines of C++ later we can render the sensor data as a greyscale image:

Decoded File
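
One simple way to look at decoded sensor data is to dump it as a binary PGM file, which most image viewers can open. The sketch below is not the code from my decoder; it assumes the samples sit in a row-major vector and that you know the brightest value to map to white (for example 4095 for 12-bit data). The function name toPgm is made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encodes sensor samples as a binary 8-bit PGM ("P5") image in memory.
// 'maxValue' is the sample value that should become white; larger values clip.
std::string toPgm(const std::vector<std::uint16_t>& samples,
                  std::size_t width, std::size_t height, std::uint16_t maxValue) {
    if (maxValue == 0 || samples.size() < width * height)
        throw std::invalid_argument("need maxValue > 0 and width*height samples");
    std::string pgm = "P5\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    for (std::size_t i = 0; i < width * height; ++i) {
        std::uint32_t v = std::min<std::uint32_t>(samples[i], maxValue);
        pgm.push_back(static_cast<char>(v * 255 / maxValue));  // scale to 0..255
    }
    return pgm;
}
```

Writing the returned string to a file opened in binary mode (std::ofstream with std::ios::binary) gives a viewable greyscale image.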

Keep in mind that a camera does not actually capture all three colors per pixel. Instead, the sensor is covered with a pattern of colored and translucent material called a "Bayer filter", so each pixel records only one color.

De-bayering

I stopped at this point because I got distracted by something else and had already implemented de-bayering in another project. Thankfully, there are projects like libcraw2 you can use instead.
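
If you just want to see colors, the crudest form of de-bayering is to collapse every 2x2 cell of the pattern into one RGB pixel, which halves the resolution. The sketch below is not what I implemented; it assumes an RGGB cell layout (red top-left, blue bottom-right), which you would have to check for your camera, and does no interpolation or white balance. The names Rgb and debayerHalf are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Rgb { std::uint16_t r, g, b; };

// Half-resolution de-bayering: each 2x2 cell becomes one RGB pixel.
// Assumed cell layout:   R G
//                        G B
std::vector<Rgb> debayerHalf(const std::vector<std::uint16_t>& raw,
                             std::size_t width, std::size_t height) {
    if (width % 2 != 0 || height % 2 != 0 || raw.size() < width * height)
        throw std::invalid_argument("need even dimensions and width*height samples");
    std::vector<Rgb> out;
    out.reserve((width / 2) * (height / 2));
    for (std::size_t y = 0; y < height; y += 2) {
        for (std::size_t x = 0; x < width; x += 2) {
            std::uint16_t r  = raw[y * width + x];
            std::uint32_t g1 = raw[y * width + x + 1];
            std::uint32_t g2 = raw[(y + 1) * width + x];
            std::uint16_t b  = raw[(y + 1) * width + x + 1];
            // Average the two green samples of the cell.
            out.push_back(Rgb{r, static_cast<std::uint16_t>((g1 + g2) / 2), b});
        }
    }
    return out;
}
```

Real de-bayering algorithms keep the full resolution by interpolating the two missing colors at every pixel from its neighbours.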