[KinoSearch] fast phrase matching [patch]
Nathan Kurz
nate at verse.com
Mon Sep 10 22:27:01 PDT 2007
On 9/10/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> > Looking at the object code [it] generates,
>
> This is something I've dabbled in, but would like to pursue in
> earnest. Can you suggest some links or a course of study to get me
> on my way?
Unfortunately, I'm at best an advanced beginner at such things. The
approach I used here was to compile with -ggdb3 and run 'objdump -S'
on the object file. Then I stared at the output until it started to
make sense. My analysis went no further than the generalization
that shorter code with fewer branches is better. While that is
generally true, with modern processors one probably needs to test. I
had the advantage that this algorithm is a modification of something
similar I did earlier, which I had actually tested on an Opteron,
although the modifications were major enough that the earlier testing
might no longer apply.
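
To make the workflow concrete, here is a minimal sketch. It is not code
from the patch: the file name count.c and both function names are
hypothetical, made up for illustration. The same loop is written with
and without an explicit branch so the two disassemblies can be compared
side by side. Whether the compiler emits different code for them depends
on the compiler version and optimization level, which is the reason to
look at the output and then measure.

```c
#include <stddef.h>

/* Hypothetical example, not from the KinoSearch patch.
 * To see the generated code interleaved with the source:
 *
 *     gcc -O2 -ggdb3 -c count.c
 *     objdump -S count.o
 */

/* Version with an explicit branch in the loop body. */
size_t count_below_branchy(const unsigned *vals, size_t n, unsigned limit)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (vals[i] < limit) {
            count++;
        }
    }
    return count;
}

/* Version without an explicit branch: the comparison evaluates to
 * 0 or 1, and that value is added to the count directly. */
size_t count_below_branchless(const unsigned *vals, size_t n, unsigned limit)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        count += (vals[i] < limit);
    }
    return count;
}
```

Both functions return the same result; only the shape of the generated
code is of interest here.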
That said, here are two links that I've found useful (and that greatly
exceed my knowledge):
Software Optimization Guide for the AMD64 Processors
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
Specific to AMD, but includes a lot of background.
Software optimization resources
http://www.agner.org/optimize/
Concentrating on C++ and Assembly, but helpful for C as well.
For the purposes of improving KinoSearch, though, I think that the
biggest room for improvement is going to be through integrating better
with the Virtual Memory Manager
(http://www.informit.com/content/images/0131453483/downloads/gorman_book.pdf),
through profiling to find bottlenecks with Oprofile, and through L2
cache optimization with Cachegrind. I still think proper use of mmap
has tremendous potential.
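
For anyone unfamiliar with mmap, here is a minimal sketch of reading a
file through a read-only mapping on a POSIX system. It says nothing
about how KinoSearch reads its index files; the function name
sum_file_bytes is hypothetical and the byte sum is only a stand-in for
real work. The point is that the kernel pages the data in on demand and
the code reads it as ordinary memory, with no read() calls and no
user-space buffer to manage.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of a file via a read-only
 * mapping. Returns 0 on success and -1 on failure. */
int sum_file_bytes(const char *path, unsigned long *sum_out)
{
    struct stat st;
    const unsigned char *data;
    unsigned long sum = 0;
    size_t len;
    size_t i;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return -1;
    }
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;
    if (len == 0) {
        /* mmap rejects zero-length mappings, so handle this here. */
        close(fd);
        *sum_out = 0;
        return 0;
    }

    data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (data == MAP_FAILED) {
        return -1;
    }

    /* Pages are faulted in by the kernel as they are touched. */
    for (i = 0; i < len; i++) {
        sum += data[i];
    }

    munmap((void *)data, len);
    *sum_out = sum;
    return 0;
}
```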
> > What is your opinion on C99 versus -pedantic?
>
> MSVC is a target, and it doesn't support C99.
>
> I really hate the declaration after statement limitation in
> particular, but KS needs to be maximally portable.
OK. This is probably a fine decision, but I think it will definitely
have a cost. I was hoping that targeting gcc under Cygwin would be
enough. Are there people actively compiling this under MSVC
currently? I know nothing about Windows.
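
For reference, a small sketch of the restriction under discussion. The
function sum_squares and the file name decl.c are hypothetical, not
taken from KinoSearch. The code compiles as strict C89, and the trailing
comment shows the C99 forms that a C89 compiler would reject.

```c
/* Hypothetical example of the C89 declaration rule. Checked with:
 *
 *     gcc -std=c89 -pedantic -c decl.c
 */
long sum_squares(int n)
{
    long total = 0; /* C89: declarations come before the first statement */
    int i;

    if (n < 0) {
        return 0;
    }
    for (i = 1; i <= n; i++) {
        long sq = (long)i * i; /* allowed: this is the start of a new block */
        total += sq;
    }
    return total;
}

/* C99-only forms that -std=c89 -pedantic complains about:
 *
 *     if (n < 0) return 0;
 *     long total = 0;                   declaration after a statement
 *     for (int i = 1; i <= n; i++)      declaration in the for initializer
 */
```

With gcc, -Wdeclaration-after-statement flags the first form even when
compiling in a later dialect, which may help keep a C89-targeted code
base honest on a C99 compiler.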
> Often, assertions are clarifying. Other times, they're just paranoid
> -- like checking every dang pointer that could be a NULL. I see the
> addition of clarifying or important ones as beneficial and would be
> pleased to have them in the code base.
I'll send you a version tomorrow with such things included for you to
decide where you want to draw that line. It's out of sync with the
patch right now.
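
As a rough illustration of the "clarifying" kind, here is a sketch with
a made-up function; first_at_or_after is hypothetical and not part of
the patch. The assertions state the invariant the search depends on and
the postcondition callers rely on, instead of checking every pointer for
NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: return the index of the first element >= target, or n
 * if there is none. The positions array must be sorted ascending. */
size_t first_at_or_after(const unsigned *positions, size_t n, unsigned target)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        /* Clarifying: the search only works on sorted input, so the
         * ends of the current window must be in order. */
        assert(positions[lo] <= positions[hi - 1]);

        if (positions[mid] < target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }

    /* Clarifying: the postcondition that callers depend on. */
    assert(lo == n || positions[lo] >= target);
    return lo;
}
```

Compiling with -DNDEBUG removes the checks entirely, so they cost
nothing in a release build.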
> PS: Tabs suck.
Oops. Have I been including them in things I send?
Nathan Kurz
nate at verse.com