02-11-11 - Some notes on EZ-trees

I realized a few weeks ago that there is an equivalence between EZ-tree coding and NOSB unary less-than-parent coding. Let me explain what that means.

EZ-tree coding means coding values in bitplanes with tree-structured flagging of significance and insignificance. "NOSB" means "number of significant bits". "Significant" at bit level b means the value is >= 2^(b-1), ie. its NOSB is at least b. (if you like, NOSB is just a bitscan for the top bit : 0 means there is no top bit, 1 means the top bit is the bottom bit, etc)

"NOSB" encoding is a way of sending variable length values. You take the number, find the number of signficant bits, then you send that number using some scheme (such as unary), and then send the bits. So, eg. the value 30 (30 = 11110) needs 5 bits, so first you send 5 (using unary that would be 111110), then you send the bottom 4 bits = 1110.

A few unary-NOSB encoded values for example :


0 : 0
1 : 10
2 : 110,0
3 : 110,1
4 : 1110,00
5 : 1110,01
6 : 1110,10
7 : 1110,11
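
Just to be concrete, here's a tiny sketch of a unary-NOSB encoder ; put_bit() is a stand-in for whatever bit writer you actually have, not a real API :

#include <cstdio>
#include <cstdint>

static void put_bit(int b) { putchar(b ? '1' : '0'); }   // stand-in bit writer

static int nosb(uint32_t v)   // number of significant bits ; 0 for v == 0
{
    int n = 0;
    while (v) { n++; v >>= 1; }
    return n;
}

static void encode_unary_nosb(uint32_t v)
{
    int n = nosb(v);
    for (int i = 0; i < n; i++) put_bit(1);  // NOSB in unary : n ones ...
    put_bit(0);                              // ... then a terminating zero
    for (int i = n - 2; i >= 0; i--)         // then the bits below the top bit
        put_bit((v >> i) & 1);
}

int main()
{
    encode_unary_nosb(30);   // puts out 111110 then 1110 , as in the example above
    return 0;
}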

Okay, now about EZ-trees. To be concrete I'll talk about a 4x4 image :


+-+-+-+-+
|A|B|b|b|
+-+-+-+-+
|C|D|b|b|
+-+-+-+-+
|c|c|d|d|
+-+-+-+-+
|c|c|d|d|
+-+-+-+-+

The traditional EZ-tree uses a parent-child relationship where the lower case quartets (bbbb, cccc, dddd) are children of the upper case letters (B, C, D). The spot B has its own value, and it also acts as the parent of the b quartet. In a larger image, each of the b's would have 4 kids of its own, and so on.

In all cases we are talking about the absolute value (the magnitude) and we will send the sign bit separately (unless the value is zero).

EZ-tree encoding goes like this :


1. At each value, set 
significance_level(me) = NOSB(me)
tree_significance_level(me) = MAX( significance_level(me), tree_significance_level(children) )

so tree_significance_level of any parent is >= that of its kids (and its own value)

2. Send the max tree_significance_level of B,C, and D
  this value tells us where to start our bitplane iteration

3. Count down from level = max tree_significance_level down to 1
  For each level you must rewalk the whole tree

4. Walk the values in tree order

5. If the value has already been marked significant, then transmit its bit at the current level

5.B. If this is the first on bit, send the sign

6. If the value has not already been sent as significant, send tree_significant ? 1 : 0

6.B. If tree not significant, done (skip this whole subtree at this level)

6.C. If tree significant, send my bit (and sign if on) and proceed to children 
    ( (**) see note later)
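
Here's a rough sketch of that walk for a single branch (a parent and its four children, treated as nodes rather than final leaves). Signs are dropped to keep it short, put_bit() and nosb() are placeholder helpers, and the interleaving of flags and bits within a level can differ slightly from the worked example below - the bits themselves are the same :

#include <cstdio>
#include <vector>
#include <algorithm>

struct Node
{
    int value = 0;              // magnitude
    int tree_nosb = 0;          // max NOSB over me and my descendants
    bool tree_sent = false;     // tree already flagged significant
    std::vector<Node> kids;
};

static void put_bit(int b) { putchar(b ? '1' : '0'); }
static int nosb(int v) { int n = 0; while (v) { n++; v >>= 1; } return n; }

// step 1 : compute tree significance bottom-up
static int set_tree_nosb(Node & n)
{
    n.tree_nosb = nosb(n.value);
    for (Node & k : n.kids)
        n.tree_nosb = std::max(n.tree_nosb, set_tree_nosb(k));
    return n.tree_nosb;
}

// steps 4-6 : visit one node at one bitplane level
static void walk(Node & n, int level)
{
    int bit = (n.value >> (level-1)) & 1;
    if ( n.tree_sent )
    {
        put_bit(bit);                       // step 5 : already significant, raw bit
    }
    else
    {
        int sig = (n.tree_nosb >= level) ? 1 : 0;
        put_bit(sig);                       // step 6 : tree significance flag
        if ( ! sig ) return;                // 6.B : whole subtree still zero, stop
        n.tree_sent = true;
        put_bit(bit);                       // 6.C : my own bit (sign would go here)
    }
    for (Node & k : n.kids) walk(k, level); // proceed to children
}

int main()
{
    Node B; B.value = 6;
    int vals[4] = { 3, 0, 1, 2 };
    for (int v : vals) { Node k; k.value = v; B.kids.push_back(k); }

    int top = set_tree_nosb(B);                 // step 2 : send the starting level
    printf("start level = %d\n", top);
    for (int level = top; level >= 1; level--)  // step 3 : count down the bitplanes
    {
        walk(B, level);
        putchar('\n');
    }
    return 0;
}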

In the terms of the crazy EZW terminology, if your tree is significant but the value is not significant, that's called an "isolated zero". When you and your children are all not significant, that's called a "zerotree". etc.

Let's assume for the moment that we don't truncate the stream, that is we repeat this for all significance levels down to the bottom bit, so it is a lossless encoder. We get compression because significance level (that is, log2 magnitude) is well correlated between parent and child, and also between spatial neighbors within a subband (the quartets). In particular, we're making very heavy use of the fact that significance_level(child) <= significance_level(parent) usually.

The thing I realized is that this encoding scheme is exactly equivalent to NOSB coding as a delta from parent :


1. At each value, set NOSB(me) = number of significant bits of my value, then
NOSB(me) = MAX( NOSB(me) , NOSB(children) )

2. Send the maximum NOSB of B,C, and D 
  this value will be used as the "parent" of B,C,D

3. Walk down the tree from parents to children in one pass

4. At each value :
    Send ( NOSB(parent) - NOSB(me) ) in unary
    (here unary means a run of zeros terminated by a 1)
   note that NOSB(parent) >= NOSB(me) is guaranteed

5. If NOSB(me) is zero, then send no further bits and don't descend to any of my children

6. If NOSB(me) is not zero, send my bottom NOSB(me) bits plus my sign and continue to my children

This is not just similar, it is exactly the same. It produces the exact same output bits, just permuted.

In particular, let's do a small example of just one tree branch :


B = 6 (= 110)

bbbb = {3,0,1,2}

Significance(B) = 3

Significance(bbbb) = {2,0,1,2}

And the EZ-tree encoding :

Send 3 to indicate the top bit level.

Level 3 :

send 1  (B is on)
  send 1 (a bit of B)
  send a sign here as well which I will ignore

Go to children
send 0000 (no children on)

Level 2 :

B is on already, send a bit = 1

Go to children
send significance flags :
1001

For significant values, send their bits :
1  1

Level 1 :

B is on already, send a bit = 0

Go to children
send significance flags for those not sent :
 01

send bits for those already significant :
1 10

------------

Bits sent :

1
1
 0000
1
 1001
 1  1
0
  01
 1 10

but if we simply transpose the bits sent (rows<->columns) we get :

1110

0111
000
0011
0110

Which is clearly unary + values :

1 + 110

01 + 11
000 (*)
001 + 1
01 + 10

* = unary for 3 would be 0001 , but there's no need to send the last 1
because we know the delta is <= 3 , so after three zeros it's implied

exactly the same !

(**) = actually at the bottom level (leaves) when you send a significance flag you don't need to send the top bit. The examples worked here treat the b,c,d groups as nodes, not final leaves. If they were leaves, the top bits should be omitted.
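
To tie it together, here's a rough sketch of the one-pass NOSB formulation run on this same branch ; it prints exactly those transposed rows (1110 / 0111 / 000 / 0011 / 0110). As before, signs are ignored and the helpers are just stand-ins :

#include <cstdio>
#include <vector>
#include <algorithm>

struct Node
{
    int value = 0;       // magnitude
    int tree_nosb = 0;   // max NOSB over me and my descendants
    std::vector<Node> kids;
};

static void put_bit(int b) { putchar(b ? '1' : '0'); }
static int nosb(int v) { int n = 0; while (v) { n++; v >>= 1; } return n; }

// step 1 : NOSB(me) = max of my own NOSB and my children's
static int set_tree_nosb(Node & n)
{
    n.tree_nosb = nosb(n.value);
    for (Node & k : n.kids)
        n.tree_nosb = std::max(n.tree_nosb, set_tree_nosb(k));
    return n.tree_nosb;
}

// steps 3-6 : one top-down pass
static void send(const Node & n, int parent_nosb)
{
    for (int i = 0; i < parent_nosb - n.tree_nosb; i++)
        put_bit(0);                          // step 4 : delta from parent as unary zeros
    if ( n.tree_nosb == 0 )                  // step 5 : zero, stop, skip my children
    {
        putchar('\n');                       // (the terminating 1 is implied, see (*))
        return;
    }
    put_bit(1);                              // terminating 1 of the unary
    for (int i = n.tree_nosb - 1; i >= 0; i--)
        put_bit((n.value >> i) & 1);         // step 6 : my bottom NOSB(me) bits
    putchar('\n');
    for (const Node & k : n.kids)
        send(k, n.tree_nosb);                // continue to children ; I'm now the parent
}

int main()
{
    Node B; B.value = 6;
    int vals[4] = { 3, 0, 1, 2 };
    for (int v : vals) { Node k; k.value = v; B.kids.push_back(k); }

    int top = set_tree_nosb(B);
    printf("max NOSB = %d\n", top);   // step 2 : transmitted up front
    send(B, top);                     // prints 1110 / 0111 / 000 / 0011 / 0110
    return 0;
}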

So, that's pretty interesting to me. Lots of modern coders (like ADCTC) use NOSB encoding, because it gives you a nice small value (the log2) with most of the compressibility, and then the bits under the top bit are nearly incompressible, and generally follow a simple falloff model which you can context-code using the NOSB as the context. That is, in modern coders the NOSB of a value is first arithmetic coded using lots of neighbor and parent information as context, and then the bits under the top bit are coded using some kind of Laplacian or Gaussian simple curve, using the NOSB to select the curve.

We can see that EZW is just a NOSB coder that does two crucial things : set NOSB(parent) >= NOSB(child) , and transmit NOSB as | NOSB(parent) - NOSB(child) |. This relies on the assumption that parents are generally larger than kids, and that magnitude levels are correlated between parents and kids.

Forcing parent >= child means we can send the delta unsigned. It also helps efficiency a lot because it lets us stop an entire tree descent when you hit a zero. In the more general case, you would not force parent to be >= child, you would simply use the correlation by coding ( NOSB(parent) - NOSB(child) ) as a signed value, and arithmetic-code / model that delta ( using at least NOSB(parent) as context, because NOSB(parent) = 0 should very strongly predict NOSB(child) = 0 as well). The big disadvantage of this is that because you can send child > parent, you can never stop processing, you must walk to all values.

Of course we can use any parent-child relationship, we don't have to use the standard square quartets.

The NOSB method is vastly preferable to the traditional EZ-Tree method for speed, because it involves only one walk over all the data - none of this repeated scanning at various bitplane levels.

A few more notes on the EZ-Tree encoding :

At step 6, when you send the flag that a tree is significant, there are some options in the encoding. If your own value is on, then it's possible that all your children are off. So you could send another flag bit indicating if all your children are 0 or not. Generally off is more likely than on, so you could also send the number of children that are on in unary, and then an identifier of which ones are on; really this is just a fixed encoding for the 4 bit flags so maybe you just want to Huffman them.

The more interesting case is if you send that your tree is significant, but your own value is *off*. In that case you know at least one of your children must be significant, and in fact the case 0000 (all kids insignificant) is impossible. What this suggests is that a 5 bit value should be made - one bit from the parent + the four child flags, and it should be Huffman'ed together. Then the 5 bit value 00000 should never occur.
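
A trivial sketch of forming that 5 bit symbol (pack_flags() is just a name made up for illustration) :

#include <cstdio>

// pack the parent's bit at this level together with the four child
// tree-significance flags ; since the parent's tree was just flagged
// significant, the symbol 00000 can never occur, so the table over
// these symbols gets that probability for free
static int pack_flags(int parent_bit, const int child_sig[4])
{
    int sym = parent_bit;
    for (int i = 0; i < 4; i++)
        sym = (sym << 1) | (child_sig[i] ? 1 : 0);
    return sym;    // in 1..31
}

int main()
{
    int kids[4] = { 1, 0, 0, 1 };
    printf("%d\n", pack_flags(0, kids));  // isolated zero parent + kids 1001 -> 9
    return 0;
}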

It's a little unclear how to get this kind of action in the NOSB formulation. In particular, the fact that if the parent is significant, but the parent's bits so far are zero, then one of the kids must be on - that requires coding the children together as a unit. That could be done thusly : rather than using unary, take the delta of NOSB from parent for all of the children. Take the first two bits or so of each delta and put them together to make an 8 bit value. Use parent bit = 0 as a 1 bit context to select two different Huffmans and use that to encode the 8 bits.

Finally, a few notes on the "embedded" property of EZ-trees ; that is, the ability to truncate and get lower bitrate encodings of the image.

Naively it appears that the NOSB formulation of the encoding is not truncatable in the same way, but in fact it is. First of all, if you truncate entire bit levels off the bottom, you can simply send the number of bit levels to truncate and then you effectively just shift everything down by that number of bits and proceed as normal. If you wish to truncate in the middle of a bit level, that means only sending the bottom bit for the first N values in that bit level, and then storing 0 implicitly for the remaining values. So you just have to send N and check in the decoder : for the first N values it reads all the bits, and for the remaining values it reads NOSB-1 bits and puts a zero in the bottom. Now you may say "that's an extra value you have to send" ; well, not really. In the EZ-tree if you just truncate the file you are effectively sending N in the file size - that is, you're cheating and using the file size as an extra information channel to send your truncation point.

One thing I don't see discussed much is that EZ-tree truncated values should not just be restored with zeros. In particular, truncation is not the same as quantization at a coarser level, because you should sometimes round up and set a higher bit. eg. say you have the value 7 and you decided to cut off the bottom 3 bits. You should not send 0, you should send 8 >> 3 = 1.
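
That is, quantize with a rounding shift rather than a plain truncating shift ; a minimal sketch (assumes cut >= 1) :

#include <cstdio>

static unsigned round_shift(unsigned v, int cut)
{
    return (v + (1u << (cut-1))) >> cut;   // round to nearest instead of truncating
}

int main()
{
    printf("%u\n", round_shift(7, 3));   // 1 , where a plain 7 >> 3 would give 0
    return 0;
}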

A related issue is how you restore missing bottom bits when you have some top bits. Say you got 110 and then 3 bits are cut off the bottom, so you have [110???] ; you should not just fill with zeros - in fact you know your value is in the range 110000 - 110111, and filling zeros puts you at the bottom of the range, which is clearly wrong. You could go to the middle of the range, but that's also slightly wrong, because image residuals have a Laplacian distribution, so the expected value is somewhere below the middle of the range. I have more complex solutions for this, but one very simple bit-shifty method is like this :


To fill the missing bits
Add one 0 bit
Then repeat the top bits

So 110??? -> 110 + 0 + 11 = 110011 , 101???? -> 1010101 , etc.
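
A sketch of that fill (fill_missing_bits() is just illustrative, not from any real codec) :

#include <cstdio>

// append "missing" low bits to the known top bits : one 0, then the top bits
// repeated high to low ; eg. 110 + 3 missing -> 110011 , 101 + 4 missing -> 1010101
static unsigned fill_missing_bits(unsigned top, int top_bits, int missing)
{
    unsigned v = top;
    for (int i = 0; i < missing; i++)
    {
        int bit = 0;   // the first fill bit is a 0
        if ( i > 0 )   // then repeat the top bits
            bit = (top >> (top_bits - 1 - ((i-1) % top_bits))) & 1;
        v = (v << 1) | bit;
    }
    return v;
}

int main()
{
    printf("%u\n", fill_missing_bits(0x6, 3, 3));  // 110???  -> 110011  = 51
    printf("%u\n", fill_missing_bits(0x5, 3, 4));  // 101???? -> 1010101 = 85
    return 0;
}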

Of course the big deal in EZ-trees is to sort the way you send data so that the most important bits come first. This is like R/D optimizing your truncation. See SPIHT, EBCOT, etc. Modern implementations of JPEG2000 like Kakadu have some perceptual D heuristic so that they do more truncation where bits are less important *perceptually* instead of just by MSE.
