9/30/2011

09-30-11 - String Match Results Part 4

Okay, finally on to greedy parsing. Note with greedy parsing the average match length per byte is always <= 1.0 (it's actually the % of bytes matched in this case).

Two charts for each , the first is clocks per byte, the second is average match length. Note that Suffix5 is just for reference and is neither windowed nor greedy.

got arg : window_bits = 16

got arg : window_bits = 17

got arg : window_bits = 18

Commentary :

Okay, finally MMC beats Suffix Trie and LzFind, this is what it's good at. Both MMC and LzFind get slower as the window gets larger. Surprisingly, the good old Zip-style Hash1 is significantly faster and finds almost all the matches on these files. (note that LzFind1 and Hash1 both have search limits but MMC does not)


test set :

0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin


The same matchers ; greedy, 16 bit window, on the stress tests :

LzFind does not do well at all on the stress tests. (note that LzFind1 and MMC1 are length-limitted; LzFind1 and Hash1 are "amortized" (step limitted)).

0 = stress_all_as 
1 = stress_many_matches 
2 = stress_search_limit 
3 = stress_sliding_follow 
4 = stress_suffix_forward 
5 = twobooks

No comments:

old rants