[KinoSearch] Queries with large number of hits.
Nathan Kurz
nate at verse.com
Wed Sep 17 13:16:01 PDT 2008
On Wed, Sep 17, 2008 at 12:23 PM, Dan <dmarkham at gmail.com> wrote:
> I warmed up the index then reset the opreport and ran the query once..
>
> Here is the report for that one query.
Thanks for posting that, Dan. Looks great! This is presumably for one
of the expensive queries?
My quick impression is that while there is probably room for improvement
here, there is nothing terribly amiss. Streaming the data from the
index is taking about 2/3 of the time, and the actual searching is
taking about 1/3. This is expensive, but nothing short of a
massively impractical mmap'd uncompressed data format :) is going to
get rid of that whole 2/3. But since the processing time is
probably close to proportional to the file size, maybe this is where
Lucene has the advantage.
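
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the streaming cost scales linearly with the bytes read and that the searching cost does not change with file size. The function name and the size ratios are made up for illustration; only the 2/3 vs 1/3 split comes from the report.

```python
def estimated_query_time(total_time, stream_fraction, size_ratio):
    """Estimate query time after the streamed file changes size.

    total_time      -- measured time for the query today
    stream_fraction -- share of that time spent streaming index data
    size_ratio      -- new file size divided by current file size
    """
    if not 0.0 <= stream_fraction <= 1.0:
        raise ValueError("stream_fraction must be between 0 and 1")
    if size_ratio < 0.0:
        raise ValueError("size_ratio must be non-negative")
    stream_time = total_time * stream_fraction
    search_time = total_time - stream_time
    # Assumption: streaming cost is proportional to bytes read,
    # while the searching cost is unaffected by the file size.
    return search_time + stream_time * size_ratio


if __name__ == "__main__":
    # Hypothetical size ratios; 0.0 is the impractical "no streaming" limit.
    for ratio in (1.0, 0.5, 0.0):
        print(ratio, estimated_query_time(1.0, 2.0 / 3.0, ratio))
```

Under those assumptions, halving the streamed bytes would cut the total to about 2/3 of the current time, and even eliminating streaming entirely would leave the 1/3 spent searching.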
An interesting quick test might be to try some phrase queries. As
Marvin pointed out, Lucene keeps the position data in a separate file
and thus doesn't have to read it for the queries you are testing. If
the KinoSearch time stays about the same, but the Lucene time jumps
significantly, this would implicate the single-file architecture.
Re-indexing KinoSearch without positions and re-running your previous
searches would test the same hypothesis from the opposite direction.
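
A minimal sketch of how such a comparison could be timed, in Python. Here run_query is a hypothetical callable standing in for whatever actually executes a search against KinoSearch or Lucene; it is not part of either library's API, and the query strings are placeholders.

```python
import statistics
import time


def median_seconds(run_query, query, repeats=5):
    """Median wall-clock time of run_query(query) over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def phrase_penalty(run_query, term_query, phrase_query, repeats=5):
    """Ratio of phrase-query time to plain term-query time."""
    # Warm up once per query so cold-cache effects are not measured.
    run_query(term_query)
    run_query(phrase_query)
    term = median_seconds(run_query, term_query, repeats)
    phrase = median_seconds(run_query, phrase_query, repeats)
    # Guard against a timer too coarse to register the term query.
    return phrase / term if term > 0 else float("inf")
```

Running phrase_penalty once per engine gives two ratios to compare. A ratio near 1 for KinoSearch alongside a much larger one for Lucene would be consistent with the hypothesis above, though it would not prove it on its own.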
Nathan Kurz
nate at verse.com