[KinoSearch] Queries with large number of hits.

Nathan Kurz nate at verse.com
Sun Sep 14 22:02:02 PDT 2008


On Sun, Sep 14, 2008 at 4:36 PM, Marvin Humphrey <marvin at rectangular.com> wrote:
>> Is there anything I can do to make these searches perform better?
>
> There are a couple of known issues on the todo list that affect search
> speed.  One is a bugfix (SegPList_Skip_To had to be temporarily disabled due
> to corrupt .skip files), and the other is a design flaw, described in
> <http://www.mail-archive.com/java-dev@lucene.apache.org/msg15825.html>.
>  Additionally, implementing the PForDelta compression algorithm for postings
> should speed up searching, but I'd planned to put that off.

Hi Marvin ---

Taking Dan's tests at face value for the moment, I don't quite
understand how the issues you are pointing at would affect speed
this much. It seems unlikely that his chosen terms occur so many
times per document that the extra position decoding could be this
significant. But maybe I'm not understanding the Lucene thread well
enough. Is the Lucene position data kept in a separate stream? Or is
it just not processed until requested?
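
To make the question concrete, here is a rough sketch of the two
layouts I have in mind. The layouts and function names are made up
for illustration and are not the actual KinoSearch or Lucene file
formats. It only counts how many values a plain term scan has to
step through when positions are interleaved with doc ids versus kept
in a separate stream.

```python
# Hypothetical postings layouts for illustration only; these are NOT
# the real KinoSearch or Lucene on-disk formats.

def encode_interleaved(postings):
    """Flatten [(doc_id, [positions])] into one stream: doc, freq, pos..."""
    stream = []
    for doc_id, positions in postings:
        stream.append(doc_id)
        stream.append(len(positions))
        stream.extend(positions)
    return stream


def scan_interleaved(stream):
    """Collect doc ids; each position has to be stepped over one by one."""
    docs, values_read, i = [], 0, 0
    while i < len(stream):
        doc_id, freq = stream[i], stream[i + 1]
        i += 2
        values_read += 2
        for _ in range(freq):
            # Assumes variable-length ints, so there is no cheap jump.
            i += 1
            values_read += 1
        docs.append(doc_id)
    return docs, values_read


def encode_separate(postings):
    """Put doc/freq pairs in one stream and positions in another."""
    doc_stream, pos_stream = [], []
    for doc_id, positions in postings:
        doc_stream.extend([doc_id, len(positions)])
        pos_stream.extend(positions)
    return doc_stream, pos_stream


def scan_separate(doc_stream):
    """Collect doc ids without ever touching the position stream."""
    return doc_stream[0::2], len(doc_stream)


# Three documents containing the term 2, 1, and 3 times.
postings = [(1, [3, 17]), (4, [8]), (9, [2, 5, 40])]

docs_a, read_a = scan_interleaved(encode_interleaved(postings))
docs_b, read_b = scan_separate(encode_separate(postings)[0])
print(docs_a, read_a)
print(docs_b, read_b)
```

In this toy model the extra work is exactly one value per term
occurrence, which is why I would expect the overhead to stay modest
unless the terms occur many times per document.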

Dan, my quick summary as a long-term observer is that there should be
no unsolvable reason that KinoSearch should be significantly slower
than Solr here, presuming you do indeed have caching turned off.  If
it is this much slower, it's probably a bug that can be fixed, and
Marvin is remarkable about fixing well-reported bugs quickly.  If
creating a real benchmark (a good idea) seems too difficult, finding
the hotspot with something like OProfile might be a good way to focus
his attention.

Nathan Kurz
nate at verse.com


