4/25/2016

Huffman Correction

A reader pointed out an error in my blog post - 08-12-10 - The Lost Huffman Paper

I had :


if ( bits >= huff_branchCodeLeftAligned[TABLE_N_BITS] )
{
    U32 peek = bits >> (WORD_SIZE - TABLE_N_BITS);
    Consume( table[peek].codeLen );
    return table[peek].symbol;
}

it should have been :

if ( bits < huff_branchCodeLeftAligned[TABLE_N_BITS] )
{
    U32 peek = bits >> (WORD_SIZE - TABLE_N_BITS);
    Consume( table[peek].codeLen );
    return table[peek].symbol;
}

it's corrected now.

In my convention, branchCodeLeftAligned is the left-aligned bitbuff value that means you must go to a higher codelen.

I thought for clarity I'd go ahead and post the example I did with him :


You have this alphabet :

symbol_id, codeLen, code:
0 ; 2 ; 00
1 ; 3 ; 010
2 ; 3 ; 011
3 ; 3 ; 100
4 ; 4 ; 1010
5 ; 4 ; 1011
6 ; 4 ; 1100
7 ; 4 ; 1101
8 ; 4 ; 1110
9 ; 5 ; 11110
10; 5 ; 11111


baseCode[n] = first code of len n - # of codes of lower len


baseCode,
    [2]     0
    [3]     1       = 010 - 1
    [4]     6       = 1010 - 4
    [5]     21      = 11110 - 9

huff_branchCodeLeftAligned
    [2]   0x4000000000000000      010000...
    [3]   0xa000000000000000      101000...
    [4]   0xf000000000000000      111100...
    [5]   0xffffffffffffffff      111111...


My decode loop is :

for(int codeLen=1;;codeLen++) // actually unrolled, not a loop
{

if ( bitbuff < huff_branchCodeLeftAligned[codeLen] ) return symbolUnsort[ getbits(codeLen) - baseCode[codeLen] ];

}

or

int codeLen = minCodeLen;
while ( bitbuff >= huff_branchCodeLeftAligned[codeLen] ) codeLen++;
sym = symbolUnsort[ getbits(codeLen) - baseCode[codeLen] ];

so if bitbuff is

11010000...

codeLen starts at 2
we check

if ( 11010000.. < 0x4000... ) - false
if ( 11010000.. < 0xa000... ) - false
if ( 11010000.. < 0xf000... ) - true
  return ( 1101 - baseCode[4] ); = 13 - 6 = 7

And a full table-accelerated decoder for this code might be :


// 3-bit acceleration table :
#define TABLE_N_BITS 3
if ( bits < huff_branchCodeLeftAligned[TABLE_N_BITS] )
{
    U32 peek = bits >> (WORD_SIZE - TABLE_N_BITS);
    Consume( table[peek].codeLen );
    return table[peek].symbol;
}

if ( bitbuff < huff_branchCodeLeftAligned[4] ) return symbolUnsort[ getbits(4) - baseCode[4] ];

// 5 is max codelen
// this compare is not always true (because of the bitbuff=~0 problem), but we take it anyway
//if ( bitbuff < huff_branchCodeLeftAligned[5] )

return symbolUnsort[ getbits(5) - baseCode[5] ];

And there you go. MSB-first Huffman that supports long code lengths that exceed the acceleration table size.

No comments:

old rants