[KinoSearch] Optimize on finish is affecting search results

Matt seabrook at gmail.com
Wed Aug 1 11:34:41 PDT 2007


Just a curiosity...

It's my understanding that passing "optimize => 1" to finish() after
making a lot of changes will result in an index that's optimized for
speed.  However, in addition to that, I'm finding that it's having an
effect on search results as well, albeit a positive one.  My problem
is that some queries only work on an optimized index.

Consider a document with the following content:

     "salad.robot mercenary"

Just random words that won't be gobbled up by the stop list.  Consider
also that the tokenizing expression just looks for words.  The content
would be split like: "salad|robot|mercenary".
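
To make the splitting concrete, here is a minimal sketch in Python (not Perl, and not KinoSearch's own tokenizer). The `\w+` pattern and the `tokenize` helper are assumptions for illustration; the token expression in my actual setup may differ in detail.

```python
import re

# Assumed token pattern: runs of word characters.
# This only approximates a "just looks for words" tokenizer.
TOKEN_RE = re.compile(r"\w+")

def tokenize(text):
    """Return the word tokens in text, discarding whatever separates them."""
    return TOKEN_RE.findall(text)

tokens = tokenize("salad.robot mercenary")
print("|".join(tokens))  # salad|robot|mercenary
```

The period and the space are both treated purely as token barriers, so neither survives into the index.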

After adding this document to my index for the first time, I can find
it with any of the following queries:

    salad
    robot
    mercenary
    salad.robot
    "robot mercenary"
    "salad.robot mercenary"

If I then re-index the document without making any changes to the
content (essentially just removing it and adding it again), and then
call the non-optimizing finish(), all of the above queries continue to
work except for "salad.robot".  That query does work if I optimize the
index after re-adding the document, however.

Perhaps I don't fully understand what KinoSearch is doing with that
query, but I suspect "salad.robot" is equivalent to asking for
"token salad followed by token robot".  Indeed, I should be able to
replace the period with any other token barrier.  For example, this
should work equally well:

    salad!?!!!robot

and indeed it does, but only after optimizing the index.
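
The behaviour I expect can be sketched as follows, again in Python and under my own assumptions: the query is tokenized with the same `\w+` pattern as the document, and the resulting tokens must appear consecutively and in order. The `phrase_matches` helper is hypothetical and says nothing about how KinoSearch actually evaluates the query.

```python
import re

TOKEN_RE = re.compile(r"\w+")  # assumed token pattern, not KinoSearch's own

def tokenize(text):
    """Return the word tokens in text, discarding whatever separates them."""
    return TOKEN_RE.findall(text)

def phrase_matches(query, document):
    """True if the query's tokens occur consecutively, in order, in the document."""
    q = tokenize(query)
    d = tokenize(document)
    if not q:
        return False
    # Slide a window of len(q) tokens across the document tokens.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

doc = "salad.robot mercenary"
for query in ["salad.robot", "salad!?!!!robot", "robot salad"]:
    print(query, phrase_matches(query, doc))
```

Under this model the two punctuated queries are indistinguishable, since both reduce to the tokens salad and robot, and neither should depend on whether the index has been optimized.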

Granted, this may seem like an odd sort of search to perform.  If the
period were important to me, I could change the tokenizer so that it
includes it in the list of characters to keep, and I may end up doing
that anyway.  I guess I'm just curious to know why that query only
works after using optimize.

I should point out that I'm using KinoSearch 0.15.


