[KinoSearch] improving vbyte decode performance

Nathan Kurz nate at verse.com
Mon Dec 29 10:54:43 PST 2008


Hi Marvin --

I had a little bit of time to spend on improving the vbyte decode
performance.  I haven't actually tried any of the new algorithms yet,
but I've got enough of the testing framework built to have a little
more insight.

The main surprise was that the current KINO_MATH_DECODE_C32 macro
seems very sensitive to gcc compilation options, and not in the
direction I would have expected.  Specifically, at least on my laptop
with gcc 3.4.5, a -O1 build consistently runs about twice as fast as a
-O2 or -O3 build.

Maybe I've messed something up in the testing?  Here's the snippet I'm
testing:

    uint32_t decoded;
    while (size--) {
        KINO_MATH_DECODE_C32(decoded, source);
        *result++ = decoded;
    }
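
For anyone who wants to reproduce the timing, a self-contained version of
that loop might look like the sketch below.  The real KINO_MATH_DECODE_C32
is not reproduced here, so SKETCH_DECODE_C32 is a hypothetical stand-in
that assumes a common vbyte layout (7 data bits per byte, low-order group
first, high bit set on every byte except the last); the actual macro's
layout may differ.  sketch_encode_c32, sketch_decode_array and
sketch_time_decode are likewise made-up names, and the decoder assumes
well-formed input of at most 5 bytes per value.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for KINO_MATH_DECODE_C32 (assumed layout:
 * 7 data bits per byte, least-significant group first, high bit set
 * on every byte except the last).  Advances `source` past the value.
 * Assumes well-formed input: at most 5 bytes per 32-bit value. */
#define SKETCH_DECODE_C32(num, source)                        \
    do {                                                      \
        uint32_t shift_ = 0;                                  \
        uint8_t  byte_;                                       \
        (num) = 0;                                            \
        do {                                                  \
            byte_ = (uint8_t)*(source)++;                     \
            (num) |= (uint32_t)(byte_ & 0x7f) << shift_;      \
            shift_ += 7;                                      \
        } while (byte_ & 0x80);                               \
    } while (0)

/* Encode one value in the same assumed layout; returns bytes written
 * (1 to 5).  Only needed to build test input for the decoder. */
static size_t
sketch_encode_c32(uint32_t value, uint8_t *dest)
{
    size_t n = 0;
    while (value >= 0x80) {
        dest[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + continue flag */
        value >>= 7;
    }
    dest[n++] = (uint8_t)value;               /* final byte, high bit clear */
    return n;
}

/* The loop under test: decode `size` values from `source` into `result`.
 * Returns a pointer just past the last byte consumed. */
static const uint8_t *
sketch_decode_array(const uint8_t *source, uint32_t *result, size_t size)
{
    uint32_t decoded;
    while (size--) {
        SKETCH_DECODE_C32(decoded, source);
        *result++ = decoded;
    }
    return source;
}

/* Run the decode loop `reps` times and return elapsed CPU seconds. */
static double
sketch_time_decode(const uint8_t *source, uint32_t *result,
                   size_t size, int reps)
{
    int     i;
    clock_t start = clock();
    for (i = 0; i < reps; i++) {
        sketch_decode_array(source, result, size);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building this file once with -O1 and once with -O2 or -O3, then comparing
what sketch_time_decode returns over a large buffer, is one way to check
whether the difference shows up on a given machine.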

Anyway, I think there is still plenty of room for other improvements,
but it's possible that we could get a real speedup from just a small
compilation tweak.  I don't recall where you are with the other
improvements regarding reducing the number of function calls, but if
you get a chance you might try a -O1 compile (ideally of just the
decode routines) to see if this is a real-world effect.
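
If splitting the decode routines into their own -O1 compilation unit is
inconvenient, one possible alternative is a per-function override.  The
sketch below assumes GCC 4.4 or later (newer than the 3.4.5 mentioned
above), where the optimize function attribute exists; the GCC manual
describes that attribute as a debugging aid rather than something for
production builds, so treat this as an experiment only.
SKETCH_DECODE_OPT and sketch_decode_o1 are hypothetical names, and the
byte layout is an assumed vbyte format (7 data bits per byte, low-order
group first, high bit as the continuation flag), not necessarily
KinoSearch's.

```c
#include <stddef.h>
#include <stdint.h>

/* Request -O1 for a single function where the compiler supports it.
 * Assumption: GCC >= 4.4.  On other compilers the macro expands to
 * nothing and the function is built with the file-wide options. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#define SKETCH_DECODE_OPT __attribute__((optimize("O1")))
#else
#define SKETCH_DECODE_OPT
#endif

/* Decode `size` vbyte-encoded values from `source` into `result`.
 * Returns a pointer just past the last byte consumed.  Assumes
 * well-formed input of at most 5 bytes per value. */
SKETCH_DECODE_OPT
static const uint8_t *
sketch_decode_o1(const uint8_t *source, uint32_t *result, size_t size)
{
    while (size--) {
        uint32_t decoded = 0;
        uint32_t shift   = 0;
        uint8_t  byte;
        do {
            byte = *source++;
            decoded |= (uint32_t)(byte & 0x7f) << shift;
            shift += 7;
        } while (byte & 0x80);
        *result++ = decoded;
    }
    return source;
}
```

Compiling the whole file at -O2 with and without the attribute would
isolate the effect to this one routine, which is close to the "just the
decode routines" test suggested above.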

--nate



