[KinoSearch] improving vbyte decode performance
Nathan Kurz
nate at verse.com
Mon Dec 29 10:54:43 PST 2008
Hi Marvin --
I had a little bit of time to spend on improving the vbyte decode
performance. I haven't actually tried any of the new algorithms yet,
but I've got enough of the testing framework built to have a little
more insight.
The main surprise was that the current KINO_MATH_DECODE_C32 macro
seems very sensitive to gcc compilation options, and not in the
direction I would have expected. Specifically, at least on my laptop
with gcc 3.4.5, -O1 consistently runs about twice as fast as -O2 or
-O3.
Maybe I've messed something up in the testing? Here's the snippet I'm testing:
    uint32_t decoded;
    while (size--) {
        KINO_MATH_DECODE_C32(decoded, source);
        *result++ = decoded;
    }
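For anyone who wants to reproduce the loop outside the KinoSearch tree, here is a minimal sketch. SKETCH_DECODE_C32, sketch_encode_c32 and sketch_decode_all are hypothetical names, not KinoSearch code. The byte layout is an assumption: 7 data bits per byte, high bit set on every byte except the last, most significant group first. The real KINO_MATH_DECODE_C32 may be laid out differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for KINO_MATH_DECODE_C32 (assumed layout: 7 data
 * bits per byte, continuation bit 0x80 on all but the last byte, most
 * significant group first).  Advances `source` past the decoded value. */
#define SKETCH_DECODE_C32(value, source)                                  \
    do {                                                                  \
        (value) = (uint32_t)(*(source) & 0x7f);                           \
        while (*(source)++ & 0x80) {                                      \
            (value) = ((value) << 7) | (uint32_t)(*(source) & 0x7f);      \
        }                                                                 \
    } while (0)

/* Encode one value in the assumed layout so the test has input data.
 * Writes at most 5 bytes to dest and returns the number written. */
static size_t
sketch_encode_c32(uint32_t value, uint8_t *dest)
{
    uint8_t groups[5];
    size_t  n = 0;
    size_t  written;

    /* Collect 7-bit groups, least significant first. */
    do {
        groups[n++] = (uint8_t)(value & 0x7f);
        value >>= 7;
    } while (value != 0);

    /* Emit most significant group first, flagging all but the last. */
    written = n;
    while (n > 1) {
        *dest++ = (uint8_t)(groups[--n] | 0x80);
    }
    *dest = groups[0];
    return written;
}

/* The loop under test: decode `size` values from source into result.
 * Returns the position just past the last byte consumed. */
static const uint8_t *
sketch_decode_all(const uint8_t *source, size_t size, uint32_t *result)
{
    uint32_t decoded;
    while (size--) {
        SKETCH_DECODE_C32(decoded, source);
        *result++ = decoded;
    }
    return source;
}
```

Building this one file at -O1 and again at -O2 or -O3 and timing sketch_decode_all over a large buffer should show whether the difference is specific to the macro or to the surrounding loop.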
Anyway, I think there is still room for lots of other improvement, but
it's possible that we could get a real speedup from just a small
compilation tweak. I don't recall where you are with the other
improvements aimed at reducing the number of function calls, but if
you get a chance you might try an -O1 compile (ideally of just the
decode routines) to see if this is a real-world effect.
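One way to limit -O1 to the decode routines is to put them in their own source file and compile only that file with -O1. A second option, which assumes a newer compiler than the gcc 3.4.5 used above, is the per-function optimize attribute that GCC accepts from 4.4 onward. The sketch below shows the attribute approach with a simple clock()-based timer. SKETCH_O1, sketch_decode_run and sketch_time_decode are hypothetical names, and the decode layout is the same assumed one (7 bits per byte, high bit as continuation flag, most significant group first), not necessarily what KinoSearch uses. The GCC manual describes this attribute as intended for debugging rather than production builds, so treat it as an experiment aid.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* GCC 4.4 and later accept a per-function optimize attribute.  On other
 * compilers (older GCC, clang, etc.) this expands to nothing and the
 * function simply gets the file-wide optimization level. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#define SKETCH_O1 __attribute__((optimize("O1")))
#else
#define SKETCH_O1
#endif

/* Hypothetical decode routine: decodes `count` values from source into
 * result and returns the number of bytes consumed. */
SKETCH_O1 static size_t
sketch_decode_run(const uint8_t *source, size_t count, uint32_t *result)
{
    const uint8_t *start = source;
    while (count--) {
        uint32_t value = (uint32_t)(*source & 0x7f);
        while (*source++ & 0x80) {
            value = (value << 7) | (uint32_t)(*source & 0x7f);
        }
        *result++ = value;
    }
    return (size_t)(source - start);
}

/* Run the decode `passes` times and return the CPU time used, in
 * seconds, as measured by clock(). */
static double
sketch_time_decode(const uint8_t *source, size_t count,
                   uint32_t *result, int passes)
{
    volatile size_t consumed = 0;  /* keeps the calls from being dropped */
    clock_t begin = clock();
    int i;
    for (i = 0; i < passes; i++) {
        consumed += sketch_decode_run(source, count, result);
    }
    return (double)(clock() - begin) / CLOCKS_PER_SEC;
}
```

Compiling the file at -O2 with and without the attribute, and comparing the times reported by sketch_time_decode, gives a rough check of whether the -O1 advantage carries over when only the decode routine is built at that level.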
--nate