5/15/2016

PS4 Battle : LZ4 vs LZSSE vs Oodle

PS4 is the arena. Three compressors enter. (revised 05-16)

Advantages of the PS4 : consistent well-defined hardware for reproducible testing. Slow platform that game developers care about being fast on. Builds with clang which is the target of choice for some compression libraries that don't build so easily on MSVC.

After the initial version of this post, I went and fuzz-safed LZB16, so they're all directly comparable. To compare apples, look at LZ4_decompress_safe. I also include LZ4_decompress_fast for reference.

LZ4_decompress_safe : fuzz safe (*)
LZ4_decompress_fast : NOT fuzz safe
LZSSE8_Decode : fuzz safe
Oodle LZB16 : fuzz safe
LZSSE and Oodle both use multiple copies of the core loop to minimize the pain of fuzz safing. LZ4's code is much simpler, it doesn't do a ton of macro or .inl nastiness. (* = this is the only one that I would actually trust to put in a satellite or a bank or something critical, it's just so simple, it's way easier to be sure that it's correct)

Conclusion :

Comparing the two Safe open-source options, LZ4_safe vs. LZSSE8 : LZSSE8 is pretty consistently faster than LZ4_Safe on PS4 (though the difference is small). PS4 is a better platform for LZSSE than x64 (PS4 is actually a pretty bad platform for LZ4; there're a variety of issues; see GDC "Taming the Jaguar" slides ; but particular issues for LZ4 are the front-end bottleneck and cache latency). When I tested on x64 before, it was much more mixed, sometimes LZ4 was faster.

I was surprised to find that Oodle LZB16 is quite a lot faster than LZ4 on PS4. (for example, that's not the case on Windows/x64, it's much closer there). I've never run third party codecs on PS4 before. I suppose this reflects the per-platform tweaking that we spend so much time on, and I'm sure LZ4 would catch up with some time spent fiddling with the PS4 codegen.

The compression ratios are mostly close enough to not care, though LZSSE8 does a bit worse on some of the DXTC/BCn files (lightmap.bc3 and d.dds).

On some files I include Oodle LZBLW numbers (LZB-bytewise-large-window). Sometimes Oodle LZBLW is a pretty big free compression win at about the same speed. Sometimes it gets worse ratio, sometimes much worse speed. If I was a client using this, I might try LZBLW and drop down to LZB16 any time it's not a good tradeoff.

Full data :

REMINDER : LZ4_decompress_fast is not directly comparable to the others, it's not fuzz safe, the others are!


PS4 clang-3.6.1

------------------

lzt99 :

lz4hc : 24,700,820 ->14,801,510 =  4.794 bpb =  1.669 to 1
LZ4_decompress_safe_time : 0.035 seconds, 440.51 b/kc, rate= 702.09 mb/s
LZ4_decompress_fast_time : 0.032 seconds, 483.55 b/kc, rate= 770.67 mb/s

LZSSE8  : 24,700,820 ->15,190,395 =  4.920 bpb =  1.626 to 1
LZSSE8_Decode_Time : 0.033 seconds, 467.32 b/kc, rate= 744.81 mb/s

Oodle LZB16 : lzt99 : 24,700,820 ->14,754,643 =  4.779 bpb =  1.674 to 1 
decode           : 0.027 seconds, 564.72 b/kc, rate= 900.08 mb/s

Oodle LZBLW : lzt99 : 24,700,820 ->13,349,800 =  4.324 bpb =  1.850 to 1
decode           : 0.033 seconds, 470.39 b/kc, rate= 749.74 mb/s

------------------

texture.bc1

lz4hc : 2,188,524 -> 2,068,268 =  7.560 bpb =  1.058 to 1
LZ4_decompress_safe_time : 0.004 seconds, 322.97 b/kc, rate= 514.95 mb/s
LZ4_decompress_fast_time : 0.004 seconds, 353.08 b/kc, rate= 562.89 mb/s

LZSSE8 : 2,188,524 -> 2,111,182 =  7.717 bpb =  1.037 to 1
LZSSE8_Decode_Time : 0.004 seconds, 360.21 b/kc, rate= 574.42 mb/s

Oodle LZB16 : texture.bc1 :  2,188,524 -> 2,068,823 =  7.562 bpb =  1.058 to 1 
decode           : 0.004 seconds, 368.67 b/kc, rate= 587.84 mb/s

------------------

lightmap.bc3

lz4hc : 4,194,332 ->   632,974 =  1.207 bpb =  6.626 to 1
LZ4_decompress_safe_time : 0.005 seconds, 521.54 b/kc, rate= 831.38 mb/s
LZ4_decompress_fast_time : 0.005 seconds, 564.63 b/kc, rate= 900.46 mb/s

LZSSE8 encode    :  4,194,332 ->   684,062 =  1.305 bpb =  6.132 to 1
LZSSE8_Decode_Time : 0.005 seconds, 551.85 b/kc, rate= 879.87 mb/s

Oodle LZB16 : lightmap.bc3 :  4,194,332 ->   630,794 =  1.203 bpb =  6.649 to 1 
decode           : 0.005 seconds, 525.10 b/kc, rate= 837.19 mb/s

------------------

silesia_mozilla

lz4hc : 51,220,480 ->22,062,995 =  3.446 bpb =  2.322 to 1
LZ4_decompress_safe_time : 0.083 seconds, 385.47 b/kc, rate= 614.35 mb/s
LZ4_decompress_fast_time : 0.075 seconds, 427.14 b/kc, rate= 680.75 mb/s

LZSSE8 : 51,220,480 ->22,148,366 =  3.459 bpb =  2.313 to 1
LZSSE8_Decode_Time : 0.070 seconds, 461.53 b/kc, rate= 735.59 mb/s

Oodle LZB16 : silesia_mozilla : 51,220,480 ->22,022,002 =  3.440 bpb =  2.326 to 1 
decode           : 0.065 seconds, 492.03 b/kc, rate= 784.19 mb/s

Oodle LZBLW : silesia_mozilla : 51,220,480 ->20,881,772 =  3.261 bpb =  2.453 to 1
decode           : 0.112 seconds, 285.68 b/kc, rate= 455.30 mb/s

------------------

breton.dds

lz4hc :   589,952 ->   116,447 =  1.579 bpb =  5.066 to 1
LZ4_decompress_safe_time : 0.001 seconds, 568.65 b/kc, rate= 906.22 mb/s
LZ4_decompress_fast_time : 0.001 seconds, 624.81 b/kc, rate= 996.54 mb/s

LZSSE8 encode    :  589,952 ->   119,659 =  1.623 bpb =  4.930 to 1
LZSSE8_Decode_Time : 0.001 seconds, 604.14 b/kc, rate= 962.40 mb/s

Oodle LZB16 : breton.dds :    589,952 ->   113,578 =  1.540 bpb =  5.194 to 1 
decode           : 0.001 seconds, 627.56 b/kc, rate= 1001.62 mb/s

Oodle LZBLW : breton.dds :    589,952 ->   132,934 =  1.803 bpb =  4.438 to 1
decode           : 0.001 seconds, 396.04 b/kc, rate= 630.96 mb/s

------------------

d.dds

lz4hc encode     :  1,048,704 ->   656,706 =  5.010 bpb =  1.597 to 1
LZ4_decompress_safe_time : 0.001 seconds, 554.69 b/kc, rate= 884.24 mb/s
LZ4_decompress_fast_time : 0.001 seconds, 587.20 b/kc, rate= 936.34 mb/s

LZSSE8 encode    :  1,048,704 ->   695,583 =  5.306 bpb =  1.508 to 1
LZSSE8_Decode_Time : 0.001 seconds, 551.13 b/kc, rate= 879.05 mb/s

Oodle LZB16 : d.dds :  d.dds :  1,048,704 ->   654,014 =  4.989 bpb =  1.603 to 1 
decode           : 0.001 seconds, 537.78 b/kc, rate= 857.48 mb/s

------------------

all_dds

lz4hc : 79,993,099 ->47,848,680 =  4.785 bpb =  1.672 to 1
LZ4_decompress_safe_time : 0.158 seconds, 316.67 b/kc, rate= 504.70 mb/s
LZ4_decompress_fast_time : 0.143 seconds, 350.66 b/kc, rate= 558.87 mb/s

LZSSE8 : 79,993,099 ->47,807,041 =  4.781 bpb =  1.673 to 1
LZSSE8_Decode_Time : 0.140 seconds, 358.61 b/kc, rate= 571.54 mb/s

Oodle LZB16 : all_dds : 79,993,099 ->47,683,003 =  4.769 bpb =  1.678 to 1 
decode           : 0.113 seconds, 444.38 b/kc, rate= 708.24 mb/s

----------

baby_robot_shell.gr2

lz4hc : 58,788,904 ->32,998,567 =  4.490 bpb =  1.782 to 1
LZ4_decompress_safe_time : 0.090 seconds, 412.04 b/kc, rate= 656.71 mb/s
LZ4_decompress_fast_time : 0.080 seconds, 460.55 b/kc, rate= 734.01 mb/s

LZSSE8 : 58,788,904 ->33,201,406 =  4.518 bpb =  1.771 to 1
LZSSE8_Decode_Time : 0.076 seconds, 485.14 b/kc, rate= 773.20 mb/s

Oodle LZB16 : baby_robot_shell.gr2 : 58,788,904 ->32,862,033 =  4.472 bpb =  1.789 to 1
decode           : 0.070 seconds, 530.45 b/kc, rate= 845.42 mb/s

Oodle LZBLW : baby_robot_shell.gr2 : 58,788,904 ->30,207,635 =  4.111 bpb =  1.946 to 1
decode           : 0.090 seconds, 409.88 b/kc, rate= 653.26 mb/s


After posting the original version with non-fuzz-safe LZB16, I decided to just go and do the fuzz-safing for LZB16.


LZB16, PS4 clang-3.6.1

post-fuzz-safing :

    lzt99 : 24,700,820 ->14,754,643 =  4.779 bpb =  1.674 to 1 
    decode           : 0.027 seconds, 564.72 b/kc, rate= 900.08 mb/s
    
    texture.bc1 :  2,188,524 -> 2,068,823 =  7.562 bpb =  1.058 to 1 
    decode           : 0.004 seconds, 368.67 b/kc, rate= 587.84 mb/s
    
    lightmap.bc3 :  4,194,332 ->   630,794 =  1.203 bpb =  6.649 to 1 
    decode           : 0.005 seconds, 525.10 b/kc, rate= 837.19 mb/s
    
    silesia_mozilla : 51,220,480 ->22,022,002 =  3.440 bpb =  2.326 to 1 
    decode           : 0.065 seconds, 492.03 b/kc, rate= 784.19 mb/s
    
    breton.dds :    589,952 ->   113,578 =  1.540 bpb =  5.194 to 1 
    decode           : 0.001 seconds, 627.56 b/kc, rate= 1001.62 mb/s
    
    d.dds :  1,048,704 ->   654,014 =  4.989 bpb =  1.603 to 1 
    decode           : 0.001 seconds, 537.78 b/kc, rate= 857.48 mb/s
    
    all_dds : 79,993,099 ->47,683,003 =  4.769 bpb =  1.678 to 1 
    decode           : 0.113 seconds, 444.38 b/kc, rate= 708.24 mb/s
    
    baby_robot_shell.gr2 : 58,788,904 ->32,862,033 =  4.472 bpb =  1.789 to 1
    decode           : 0.070 seconds, 530.45 b/kc, rate= 845.42 mb/s

pre-fuzz-safing reference :

    lzt99 912.92 mb/s
    texture.bc1 598.61 mb/s
    lightmap.bc3 876.19 mb/s
    silezia_mozilla 794.72 mb/s
    breton.dds 1078.52 mb/s
    d.dds 888.73 mb/s
    all_dds 701.45 mb/s
    baby_robot_shell.gr2 877.81 mb/s

Mild speed penalty on most files.

No comments:

old rants