9/09/2012

09-09-12 - A Simple Tight-Packed Array

Trivial snippet for a tight-packed array with bit mask indicating which elements exist.

In the same family as this post :

cbloom rants 06-14-11 - A simple allocator

On newer chips with POPCNT this should be reasonably fast (assuming query is common and insert is rare, since insert requires a realloc/memmove or something similar). (and of course everything is a disaster on platforms where variable shift is slow).


struct PackedArray
{
    uint32  mask; // numItems = num_bits_set(mask)
    int32   items[1]; // variable allocation size
};


static inline uint32 num_bits_set( uint32 v )
{
    //return _mm_popcnt_u32(v);
    // from "Bit Twiddling Hacks" :
    v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
    uint32 c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
    return c;
}

bool PackedArray_Get(const PackedArray * a,int index,int32 * pgot)
{
    ASSERT( index >= 0 && index < 32 );

    uint32 mask = 1UL << index;
    if ( a->mask & mask )
    {
        // it exists, find it
        uint32 numPreceding = num_bits_set( a->mask & (mask-1) );   
        *pgot = a->items[numPreceding];
        return true;
    }
    else
    {
        return false;
    }
}

bool PackedArray_Put(PackedArray * a,int index,int32 item)
{
    ASSERT( index >= 0 && index < 32 );
    
    uint32 mask = 1UL << index;
    uint32 numPreceding = num_bits_set( a->mask & (mask-1) );   
        
    if ( a->mask & mask )
    {
        // it exists, replace
        a->items[numPreceding] = item;
        return true;
    }
    else
    {
        // have to add it
        // realloc items here or whatever your scheme is
        
        // make room :
        uint32 numFollowing = num_bits_set(a->mask) - numPreceding;
        
        // slide up followers :
        int32 * pitem = a->items + numPreceding;
        memmove(pitem+1,pitem,numFollowing*sizeof(int32));
        
        // put me in
        *pitem = item;
        
        a->mask |= mask;
        return false;
    }
}

(BTW the GCC builtins are annoying. What they should have done is provide a way to test if the builtin actually corresponds to a machine instruction, because if it doesn't I generally don't want their fallback implementation, I'll do my own thank you.)

2 comments:

ryg said...

"(and of course everything is a disaster on platforms where variable shift is slow)"
This one can be resolved for the low, low price of a single cache line on the most annoying platforms with that particular misfeature. :)

Similarly, a small LUT (32-64 entries) for small integer->float works well.

cbloom said...

True, I guess 1 << n is easy; x << n is hard.

On most platforms this code is faster with a cached occupation count, but I hate redundant state so I left it out of the post.

old rants