Hi to all who appreciate extreme speeds.
Recently, I was lucky enough to finish my quest of more than two years: writing in C the most optimized function for finding one block of memory inside another, the so-called MEMMEM.
My function, named Railgun, is 100% free, and you can find it among the external links of the Wikipedia article on the BMH algorithm:
https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm
The problem is that the beautiful Boyer–Moore–Horspool algorithm has reigned for 33 years (since 1980), and no one bothered to raise the order of the characters checked at the rightmost end of the window. I simply raised the order from 1 to 2, and to 12 for larger patterns. This led to a thunderous boost in speed, even on my old Core 2 T7500 laptop. But since I am greedy, I want to see how fast it can go on different modern CPUs (especially on AMD Vishera, with its reduced L1 cache). I would be glad to see results from my benchmark on Haswell. That is my request.
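To make the idea of "order 2" concrete, here is a minimal sketch (not the actual Railgun code) of a Horspool-style search whose skip table is indexed by the last two bytes of the window instead of one. The name bmh_order2 and the 64K-entry table are assumptions made for illustration only; Railgun itself may organize this quite differently.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: Horspool search with an order-2 (two-byte) skip table.
   The name and layout are hypothetical, not taken from the Railgun source.
   Not thread-safe, because the table is static. */
const unsigned char *bmh_order2(const unsigned char *hay, size_t n,
                                const unsigned char *pat, size_t m)
{
    static size_t shift[65536];   /* one entry per possible 2-byte value */
    size_t i, j;

    if (m == 0) return hay;
    if (m > n)  return NULL;
    if (m == 1) return memchr(hay, pat[0], n);

    /* A byte pair that never occurs in the pattern allows a jump of m-1. */
    for (j = 0; j < 65536; j++)
        shift[j] = m - 1;

    /* For every pair except the final one, store its distance to the end
       of the pattern; rightmost occurrences overwrite earlier ones. */
    for (j = 0; j + 2 < m; j++)
        shift[(pat[j] << 8) | pat[j + 1]] = m - 2 - j;

    i = 0;
    while (i <= n - m) {
        const unsigned char *w = hay + i;
        if (memcmp(w, pat, m) == 0)
            return w;
        /* Skip according to the last two bytes of the current window. */
        i += shift[(w[m - 2] << 8) | w[m - 1]];
    }
    return NULL;
}
```

Calling bmh_order2(haystack, haystack_len, needle, needle_len) returns a pointer to the first match or NULL. With two bytes as the index, far fewer windows share a table entry with the pattern, so the maximum jump of m-1 is taken more often than with the classic one-byte table; the price is a much larger table, which is why cache size matters.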
Machinely yours,
Georgi 'Sanmayce'