[KinoSearch] fast phrase matching [patch]
Nathan Kurz
nate at verse.com
Sun Sep 30 14:11:15 PDT 2007
On 9/28/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> I'm tempted to try re-implementing InStream to use mmap exclusively. I think
> that the i/o usage patterns of KS might make it a suitable candidate for that
> treatment. But then your idea of allowing internal classes to bypass the
> stream classes altogether is interesting.
I think this is worth looking at. I think the major problem is going
to be how to fake mmap() on Windows systems, where it does not
exist. One compromise would be to keep the stream classes, but change
them to return raw data in page-size-multiple chunks, with the Windows
implementation double-buffering and the Linux implementation doing
nothing other than returning a pointer into a mapped region.
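
A minimal sketch of the compromise described above, written in Python
for brevity rather than in the C that KinoSearch itself uses. The class
names MappedChunks and BufferedChunks and the chunk() method are
hypothetical and do not come from KinoSearch. Both classes hand back
data in multiples of the mapping granularity: one returns a zero-copy
view into a mapped region, the other copies through read() as a
stand-in for the double-buffering fallback.

```python
import mmap

# Offsets passed to mmap must be multiples of this value; it is used
# here as the "page size" for chunking (an assumption for the sketch).
PAGE = mmap.ALLOCATIONGRANULARITY


class MappedChunks:
    """Return chunks as zero-copy views into a read-only mapping."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file (the file must not be empty).
        self._map = mmap.mmap(self._file.fileno(), 0,
                              access=mmap.ACCESS_READ)
        self._view = memoryview(self._map)

    def chunk(self, index, pages=1):
        start = index * PAGE
        # Slicing a memoryview does not copy; the last chunk may be short.
        return self._view[start:start + pages * PAGE]

    def close(self):
        # Callers must release() any chunks they still hold first.
        self._view.release()
        self._map.close()
        self._file.close()


class BufferedChunks:
    """Fallback with the same interface: copy each chunk via read()."""

    def __init__(self, path):
        self._file = open(path, "rb")

    def chunk(self, index, pages=1):
        self._file.seek(index * PAGE)
        return memoryview(self._file.read(pages * PAGE))

    def close(self):
        self._file.close()
```

Because both classes expose the same chunk() call, code built on top of
them would not need to know which one it was given.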
> RAMFolder is mostly for testing, but I'd
> be crying in my beer if all the KS tests had to use disk i/o.
The goal is to get everything running as fast as (or faster than)
RAMFolder works now. There is not going to be any physical disk i/o
happening during testing other than that needed to get the data cached
by the system page buffer the first time it is read.
> Another thing about mmap: how well does it work on 32-bit systems when dealing
> with large files (which are common with KS)?
I don't think this is going to be a problem, although I haven't
thought it through in detail. The underlying implementation of
system read() is essentially mmap(), so we shouldn't hit any
fundamental problems. The total amount mapped at one time can't be
larger than the address space (< 4GB for 32-bit Linux), but I think we
can solve this by mapping and unmapping as necessary.
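
A rough sketch of "mapping and unmapping as necessary", again in Python
rather than C. WindowedReader, its window parameter, and its read()
method are hypothetical names invented for this illustration. The
reader keeps at most one window of the file mapped at a time and
remaps whenever a request falls outside the current window, so the
address space in use stays bounded no matter how large the file is.

```python
import mmap
import os

# mmap offsets must be multiples of the allocation granularity.
GRAN = mmap.ALLOCATIONGRANULARITY


class WindowedReader:
    """Read from a large file while mapping at most `window` bytes."""

    def __init__(self, path, window=64 * GRAN):
        if window < GRAN or window % GRAN:
            raise ValueError("window must be a positive multiple of GRAN")
        self._file = open(path, "rb")
        self._size = os.fstat(self._file.fileno()).st_size
        self._window = window
        self._base = 0
        self._map = None

    def _remap(self, offset):
        # Drop the old window, then map a new one that contains `offset`.
        if self._map is not None:
            self._map.close()
        self._base = (offset // GRAN) * GRAN
        length = min(self._window, self._size - self._base)
        self._map = mmap.mmap(self._file.fileno(), length,
                              access=mmap.ACCESS_READ, offset=self._base)

    def read(self, offset, nbytes):
        if offset < 0 or nbytes < 0 or offset + nbytes > self._size:
            raise ValueError("read outside file")
        out = bytearray()
        while nbytes > 0:
            if (self._map is None or offset < self._base
                    or offset >= self._base + len(self._map)):
                self._remap(offset)
            lo = offset - self._base
            # The slice is cut short at the end of the window; the loop
            # then remaps and continues with the remainder.
            piece = self._map[lo:lo + nbytes]
            out += piece
            offset += len(piece)
            nbytes -= len(piece)
        return bytes(out)

    def close(self):
        if self._map is not None:
            self._map.close()
            self._map = None
        self._file.close()
```

This version copies the requested bytes out of the window for
simplicity. A real implementation would more likely hand out pointers
into the window and would have to make sure nothing still refers to a
window before unmapping it.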
Nathan Kurz
nate at verse.com