2/23/2010

02-23-10 - Image Compression - Color, SCIELAB

The last time I wrote about anything technical it was to comment on image coding perceptual targets and chroma. Let's get into that a bit more.

There are these standard weapons available to us : 1. colorspace transform (lossy or lossless), 2. relative scaling of the color channels, 3. downsampling, 4. non-flat quantization matrices.

Many image compressors use some combination of these. For example, JPEG uses the YCbCr colorspace, which has a built-in downscaling of the chroma channels, also optionally downsamples chroma, and also usually uses a very high-frequency-killing quantization matrix. The result is that chroma is attacked in many ways - the DC accuracy is destroyed by the scaling in the color conversion as well as by the [0] entry of the quantization matrix, and the high frequency info is killed both by downsampling and by the high entries in the quantization matrix.
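To make that built-in chroma scaling concrete, here's a little sketch of the JFIF-style RGB -> YCbCr conversion (standard coefficients; the helper name is mine) :

```python
def rgb_to_jpeg_ycbcr(r, g, b):
    # JFIF YCbCr : the raw color differences (B-Y) and (R-Y) are
    # divided by 1.772 and 1.402 respectively, then offset by 128,
    # so they fit back into [0,255].
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr
```

Cb is just 128 + (B-Y)/1.772 : the raw difference spans roughly [-226,227] but gets squeezed into about [-127.5,127.5], a scale factor around 0.56, before the quantizer ever sees it.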

But is this good? Obviously it's all bad in terms of RMSE (* not completely true, but close enough), so we need something that approximates the human eye's less sensitive chroma receptors.

For a long time I put off this question because it seemed the only way to attack it was by showing a ton of images to test subjects and asking "is this better?". (Furthermore, there's the ugly problem that any perceptual metric is heavily tied to viewing conditions, and without knowing the viewing conditions you may be optimizing for the wrong thing). But maybe I've found a solution.

Let me be clear briefly that I am here only trying to address the issue of how the human eye sees chroma vs luma. This is not a full "psychovisual perceptual metric" which would have to account for the brain identifying areas of noise vs. areas of smoothness, repeated patterns, linear ramps, etc. Basically the only thing I'm trying to capture here is the importance of luma bits vs. chroma bits.

Well, it turns out there's this thing from color research called SCIELAB. You may be familiar with "CIE LAB" aka the "Lab color space" which is considered to be pretty close to "perceptually uniform", that is, one unit of distance between two Lab colors has the same perceptual error importance no matter what the two colors are. Well, SCIELAB is the extension of CIELAB to images (not just single colors). You can read the paper (see links below), but the basic thing it does is very simple :

SCIELAB takes the image and transforms it to "opponent color" (luma, red-green, and blue-yellow) , which is roughly the color space that eyes use to see light (rods see luma, cones see chroma) (note that here we are transforming "pixel values" into real light values, so we have to make an assumption about the brightness and color calibration of the viewing device). In opponent color space, each channel is filtered. The filter used for each channel represents the angular resolution that a rod or cone has. Basically this is a gaussian whose sdev is proportional to the angular resolution in that channel. This depends on the DPI of the viewing device and the viewing distance (eg. how many pixels fit into one degree at the eye). The gaussian is narrow for luma, indicating good precision, and wider for chroma. The filter also has a wide negative lobe around the center peak, which captures the fact that we see values as relative to their neighborhood - eg. 100 on a background of 10 looks brighter than 100 on a background of 50.

The gaussian filters represent the probability of a photon from a given pixel hitting and activating a rod or cone. The wider filters for chroma indicate that a half-toned image in red-green or blue-yellow will be indistinguishable from the original at a much shorter distance than a half-toned luma image.

Once you do this filtering, you transform back to CIELAB and then you can just do a normal MSE to create a "delta E". (CIE also defines a more elaborate, more uniform "delta E" metric for LAB, but for our purposes the plain L2 distance is very close and much simpler). The result is a "SCIELAB delta E" metric that is analytic and can be used in place of MSE for comparing images. Having this SCIELAB metric now lets us try various things and it tells us whether they are perceptually better or not (in terms of optical perception of color, anyway).
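The whole pipeline is easy to sketch. To be clear : this is not the published SCIELAB - the real thing converts through XYZ to a calibrated opponent space and uses fitted spatial filters tied to viewing conditions - this is just the shape of the computation, with made-up sigmas :

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1D normalized gaussian, radius ~ 3 sigma
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(chan, sigma):
    # separable gaussian blur, reflect-padded so edges stay sane
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    p = np.pad(chan, r, mode='reflect')
    p = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 0, p)
    p = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 1, p)
    return p

def toy_scielab_de(a, b, sigmas=(1.0, 4.0, 4.0)):
    # a, b : (H,W,3) images assumed already in an opponent-like space
    # (luma, red-green, blue-yellow).  Chroma channels get wider blurs,
    # modeling the eye's lower chroma acuity; the sigmas here are
    # arbitrary - the real ones depend on DPI and viewing distance.
    err = 0.0
    for c, s in enumerate(sigmas):
        err += np.mean((blur(a[..., c], s) - blur(b[..., c], s)) ** 2)
    return np.sqrt(err)
```

The behavior to note : a constant (DC) error costs the same in any channel since the blurs preserve DC, but a noisy high-frequency error costs much less in the wide-blur chroma channels than in luma.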

So far as I know this has never been used in the mainstream image compression literature ; the only place I found it was this Stanford school project tech report : Direction-Adaptive Partitioned Block Transform for Color Image Coding . This paper is pretty interesting; they aren't actually doing anything with the DA-PBT , they're just evaluating color spaces and how to do color coding starting with a grayscale image compressor.

Let's go through the EE398 paper in detail.

First they use YCbCr because they claim it produces better scielab results than RGB. True enough, but there are a lot of other color spaces they could have tried. Furthermore, they don't mention this, but they are using the JPEG-style YCbCr, which has a built-in 0.5 scaling of the chroma channels (chroma should have a range of [-256,256] but JPEG offsets and scales it to put it back into [0,256]) - they have effectively killed the chroma precision just by using YCbCr.

They then look at whether sub-sampling helps or not. They find it to be roughly neutral - but when you try subsampling or not subsampling you should also re-optimize all the other free options (scaling of the chroma channels, quantization matrix).

The most interesting part to me is "Rate Allocation". They try giving different fractions of the bit budget to Y or CbCr. They find that the optimal delta E almost always occurs somewhere around Y bits = 66% of the total, that is, bit ratios like [4:1:1]. In order to achieve this ratio they had to use smaller quantization step sizes for CbCr than for Y, but that is an anomaly caused by the fact that the YCbCr they use has already killed the chroma - if you use a non-scaling YCbCr you find that the chroma quantization values should be *larger* than luma to achieve the 66% bit allocation. (note that using different quantization values on each channel is equivalent to scaling the channels relative to each other).

They also found that using non-uniform quantization matrices (ala JPEG) hurt. I believe this was just an anomaly of their flawed testing methodology.

This paper was the most serious study of color in image compression that I've ever seen, but it is still flawed in some simple ways that we can fix. The big problem is that they make the classic blunder of many people working in compression : optimizing parameters one by one. That is, say you have a compressor with options {A,B,C}. The blunderer finds the optimal value for option A and holds that fixed, then the optimal for B, then the optimal for C. They then try out some experimental new mode for step A, and their tests show it doesn't help - but they failed to retry every option for B and C in the new mode for A. For example, downsampling might hurt if you're using YCbCr, but if you use some other color space, or scale your colors in some way, or whatever, then downsampling might help, and the result of doing all those steps together may be the best configuration.

Let's go back through it carefully :

First of all, the color conversion. Let me note that we use the color conversion in image compression for really two separate purposes which are mixed up. One use is for decorrelation (or energy compaction if you prefer) - this helps compression even for lossless mode. The second is for perceptual separation of chroma from luma so that we can smack the chroma around. Obviously here we need a color transform which gives us {luma/chroma} separation - that is, we cannot use something like the KLT which doesn't necessarily have a perceptual "luma" axis.

From my earlier color studies, I found that YCoCg produces good results, usually within 1% of the best color transform on each image, so we'll just use that. But we will be careful and use a float <-> float YCoCg which doesn't scale any of the channels.
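For reference, here's a lifting-style float YCoCg (this is one common convention, not necessarily the exact one I use; the point is that it round-trips float -> float exactly and doesn't hide any scale factor you didn't ask for) :

```python
def rgb_to_ycocg(r, g, b):
    # lifting form : Co = R-B, Cg = G-(R+B)/2, Y = R/4 + G/2 + B/4.
    # No hidden 0.5 chroma squeeze like JPEG's YCbCr; any relative
    # scaling of the channels is applied explicitly afterwards.
    co = r - b
    tmp = b + co / 2.0          # == (r + b) / 2
    cg = g - tmp
    y = tmp + cg / 2.0
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # exact inverse of the lifting steps, in reverse order
    tmp = y - cg / 2.0
    g = cg + tmp
    b = tmp - co / 2.0
    r = b + co
    return r, g, b
```

Because the halving steps are exact in binary floating point, the round trip is bit-exact, which is what you want for the lossless path.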

We will then scale Y relative to CoCg. This scaling is equivalent to variable quantizers and is one of the ways we will control the bit allocation to Y vs. chroma. This scaling gives you a difference in "value resolution" ; it doesn't kill high frequencies.
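The equivalence is trivial but worth writing down : quantize(x*s, Q) = round(x*s/Q) = round(x/(Q/s)) = quantize(x, Q/s), and the reconstructions match too once you divide the scale back out. In sketch form :

```python
def quantize(x, q):
    # plain uniform quantizer
    return round(x / q)

def dequantize(i, q):
    return i * q

# scaling a channel by s then quantizing with step q is the same as
# quantizing the unscaled channel with step q/s
x, s, q = 37.0, 0.7, 10.0
a = quantize(x * s, q)              # scaled-channel path
b = quantize(x, q / s)              # per-channel-quantizer path
recon_a = dequantize(a, q) / s      # undo the scale on decode
recon_b = dequantize(b, q / s)
```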

You can then optionally downsample chroma. Note that in naive tests I have found in the past that downsampling chroma sometimes helps visual quality; and in fact in some cases it even helps MSE measured on the RGB data. I now know that that was just an anomaly due to the fact that I wasn't considering chroma scaling. That is, downsampling was just a crude way of allocating fewer bits to chroma, which does in fact sometimes help, but if you also have the ability to change the chroma bit allocation by relative scaling of the channels, the advantage of downsampling vanishes.

I optimized the scaling of CoCg relative to Y on lots of images. Obviously the true optimum value is highly image dependent (you could compute this per image and store it with the image of course), but in most cases a scale near 0.7 is optimal if you are not downsampling, and a scale near 1.1 is close to optimal when downsampling ( 1.0 is not bad when downsampling ). When not downsampling, the optimal bit allocation is usually in the area of Y ~= 66% of the bits, as seen in the EE398 paper. When downsampling, the optimal bit allocation tends to be closer to Y = 80% of the bits. Downsampling generally hurts RGB MSE and SCIELAB delta E, but I find it sometimes helps RGB SSIM.

Obviously downsampling results in more bits being used on luma, which means you'll have sharper edges and better preservation of texture and a visual appearance of more "detail", at the cost of the color values being far off. By my own examination, I often find that if I just stare at the image made from downsampled chroma it looks "better" - eg. I see more edge detail, and it has less of that obvious appearance of being compressed, eg. less ringing artifacts, halos, stair-steps, etc. However, when I switch back and forth between the original and the compressed, the version made from downsampled chroma shows obvious color errors. The version made from non-downsampled chroma obviously has much better color preservation, but appears generally blurrier, has more block artifacts, etc. The non-downsampled version wins according to "delta E", but by my eyes I can't really clearly say one is better than the other, they're just different errors.

The last tool we have is a non-uniform quantization matrix. NUQM lets us give more bits to the low frequencies vs. the high frequencies. Generally NUQM hurts MSE, but it might help "delta E", because SCIELAB accounts for the "fuzziness" of human vision (insensitivity to high frequency patterns). To test this, what we need to try is various different NUQM's for both luma and chroma, as well as optimizing the relative scaling value in each case. I haven't completed this yet, but early results show that NUQM's do in fact help delta E. Note that I'm not talking about doing a per-image optimal NUQM like "dctopt" does or something, just finding something like the JPEG style skewed matrix to use globally.
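A tiny demonstration of the tradeoff, using the standard JPEG luminance table scaled by 0.5 so the DC step is finer than a flat matrix while the high-frequency steps stay coarser (the coefficient block here is made up, just decaying with frequency) :

```python
import numpy as np

# standard JPEG luminance quantization table (Annex K of the spec)
JPEG_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

def quantize_block(coeffs, qm):
    return np.round(coeffs / qm)

# hypothetical 8x8 DCT coefficient block, magnitude falling off
# with frequency like typical image data
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
coeffs = 400.0 / (1.0 + u + v) ** 2

flat = np.full((8, 8), 16.0)   # uniform matrix, step 16 everywhere
nuqm = JPEG_LUMA * 0.5         # skewed matrix with a finer DC step (8)
```

The skewed matrix zeroes more high-frequency coefficients (saving rate there) even though its DC step is half the flat matrix's - which is exactly the "lower DC quantizer at the same bit rate" effect in the numbers below.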

Some example numbers :


On a 512x512 color image of a face , at 1.0 bits per pixel , 
optimizing quality at constant bit rate


baseline : delta E = 2.2933

not downsampled , optimal CoCg scale = 0.625 : delta E = 2.155  (bits Y = 72%)

    downsampled , optimal CoCg scale = 1.188 : delta E = 2.381  (bits Y = 80%)

best NUQM and scaling (no downsampling) : delta E = 1.899  (bits Y = 61%)

( JPEG delta E = 2.7339 )

One obvious thing NUQM does is give a lot more bits to the DCs. In this case :


not downsampled, same cases as previous
UQM = uniform quantization matrix

 UQM , bits DC = 15.4% , Q = 14.0  , delta E = 2.155 , bits Y = 72%

NUQM , bits DC = 19.4% , Q =  7.25 , delta E = 1.899 , bits Y = 61%

Here Q is the quantizer of the DC component of Y - in the UQM case all Q's are the same (though the Q for chroma is effectively scaled). In the NUQM case the higher frequency AC components get much higher Q's. We can see from the above that because of NUQM, the quantizer for the DC can be much lower at the same bit rate.

Personal visual inspection indicates that the NUQM images just have much more "JPEG-like" artifacts. That is, they generally look more speckly. They obviously preserve flat areas and simple ramps somewhat better. The tradeoff is much worse ringing artifacts and destruction of high frequency detail like fine edges. (in my case the lower Q from NUQM also means a much weaker deblocking filter is used which may be part of the reason for more speckly appearance).

In any case, NUQM clearly helps delta E due to the ability to take bits away from the high frequency chroma data - much better than just scaling and downsampling can.

This is all very interesting and promising, but we have to ask ourselves at some point - how much do we trust this "scielab delta E" ? eg. by optimizing for this metric are we actually making better results? More and more I am convinced that the biggest thing missing from data compression is a better image quality metric (and then once you have that, you need to go back to basics and re-test all your assumptions against it in the correct way).

Color links :

Working Space Comparison sRGB vs. Adobe RGB 1998
Using SCIELAB for image and video quality evaluation (IEEE Xplore)
Video compression's quantum leap - 12112003 - EDN
Useful Color Equations
Useful Color Data
Standard illuminant - Wikipedia, the free encyclopedia
S-CIELAB Matlab implementation
References related to S-CIELAB
Lab color space - Wikipedia, the free encyclopedia
help - sRGB versus Adobe RGB (1998)
efg's Chromaticity Diagrams Lab Report
CIECAM02 - Wikipedia, the free encyclopedia
Chromatic Adaptation
Brian A. Wandell -- Reference Page
Ask a Color Scientist!
A top down description of S-CIELAB and CIEDE2000. Garrett M. Johnson. 2003; Color Research & Application - Wiley InterScienc
A proposal for the modification of s-CIELAB

2/10/2010

02-10-10 - Some little image notes

1. Code stream structure implies a perceptual model. Often we'll say that uniform quantization is optimal for RMSE but is not optimal for perceptual quality. We think of JPEG-style quantization matrices that crush high frequencies as being better for human-visual perceptual quality. I want to note and remind myself that the coding structure itself targets perceptual quality even if you are using uniform quantizers. (obviously there are gross ways this is true such as if you subsample chroma but I'm not talking about that).

1.A. One way is just with coding order. In something like a DCT with zig-zag scan, we are assuming there will be more zeros in the high frequency. Then when you use something like an RLE coder or End of Block codes, or even just a context coder that will correlate zeros to zeros, the result is that you will want to crush values in the high frequencies when you do RDO or TQ (rate distortion optimization and trellis quantization). This is sort of subtle and important; RDO and TQ will pretty much always kill high frequency detail, not because you told it anything about the HVS or any weighting, but just because that is where it can get the most rate back for a given distortion gain - and this is just because of the way the code structure is organized (in concert with the statistics of the data). The same thing happens with wavelet coders and something like a zerotree - the coding structure is not only capturing correlation, it's also implying that we think high frequencies are less important and thus where you should crush things. These are perceptual coders.
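Here's a toy of that effect : with a "1 unit per coefficient up to the last nonzero" rate model (a crude stand-in for EOB coding; the function and rate model are made up), a greedy lagrangian truncation kills small trailing high-frequency values without ever being told anything about the HVS :

```python
def rdo_truncate(coeffs, lam):
    # coeffs : quantized values in zig-zag order.  Toy rate model :
    # cost = 1 unit per coefficient position up to the last nonzero,
    # then an EOB.  Zeroing the last nonzero saves the positions back
    # to the previous nonzero, at a cost of its squared value in
    # distortion.  Keep zeroing while the lagrangian improves.
    out = list(coeffs)
    while True:
        last = max((i for i, c in enumerate(out) if c != 0), default=-1)
        if last < 0:
            break
        prev = max((i for i in range(last) if out[i] != 0), default=-1)
        rate_saved = last - prev
        dist_added = out[last] ** 2
        if dist_added <= lam * rate_saved:
            out[last] = 0
        else:
            break
    return out
```

No frequency weighting anywhere - the high frequencies die simply because they sit at the end of the scan where zeroing them buys the most rate.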

1.B. Any coder that makes decisions using a distortion metric (such as any lagrange RD based coder) is making perceptual decisions according to that distortion metric. Even if the sub-modes are not overtly "perceptual", if the decision is based on some distortion other than MSE you can have a very perceptual coder.
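That is, the decision machinery is metric-agnostic; swap the distortion function and you've changed the perceptual model. A sketch (the "pixels" and candidate modes here are made-up two-component values, luma then chroma) :

```python
def choose_mode(orig, candidates, distortion, lam):
    # candidates : list of (reconstruction, rate) pairs.
    # The "perceptuality" of the coder lives entirely in `distortion`.
    return min(candidates, key=lambda c: distortion(orig, c[0]) + lam * c[1])

def mse(o, r):
    return sum((a - b) ** 2 for a, b in zip(o, r))

def chroma_weighted(o, r, w=(1.0, 0.1)):
    # toy perceptual metric : chroma errors cost 10x less than luma
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, o, r))
```

With the same candidates and rates, MSE picks the mode with the small luma error, while the chroma-downweighting metric happily takes a big chroma error to keep luma exact - two different "perceptual" coders from one decision rule.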

2. Chroma. It's widely just assumed that "chroma is less important" and that "subsampling is a good way to capture this". I think that those contentions are a bit off. What is true, is that subsampling chroma is *okay* on *most* images, and it gives you a nice speedup and sometimes a memory use reduction (half as many samples to code). But if you don't care about speed or memory use, it's not at all clear that you should be subsampling chroma for human visual perceptual gain.

It is true that we see high frequencies of chroma worse than we see high frequencies of luma. But we are still pretty good at locating a hard edge, for example. What is true is that a half-tone printed image in red or blue will appear similar to the original at a closer distance than one in green.

One funny thing with JPEG for example is that the quantization matrices are already smacking the fuck out of the high frequencies, and then they do it even harder for chroma. It's also worth noting that there are two major ways you can address the importance of chroma : one is by killing high frequencies in some way (quantization matrices or subsampling) - the other is how fine the DC value of the chroma should be; eg. how should the chroma planes be scaled vs. the luma plane (this is equivalent to asking - should the quantizers be the same?).
