[KinoSearch] Optimize on finish is affecting search results
Marvin Humphrey
marvin at rectangular.com
Wed Aug 1 21:38:20 PDT 2007
On Aug 1, 2007, at 11:34 AM, Matt wrote:
> Consider a document with the following content:
>
> "salad.robot mercenary"
>
> Just random words that won't be gobbled up by the stop list. Consider
> also that the tokenizing expression just looks for words. The content
> would be split like: "salad|robot|mercenary".
Yes, that's right. Tokenizing is almost certainly unrelated to the
issue you describe.
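For reference, the split Matt describes can be reproduced with a plain word-matching regular expression. This is a rough Python sketch, not KinoSearch's own Perl tokenizer. The `\w+` pattern and the `split_words` name are assumptions made for illustration.

```python
import re

# Assumed pattern: a token is any run of word characters, so "." and
# whitespace both act as separators.
WORD_PATTERN = re.compile(r"\w+")

def split_words(text):
    """Return the word tokens found in text, in order."""
    return WORD_PATTERN.findall(text)

print("|".join(split_words("salad.robot mercenary")))  # salad|robot|mercenary
```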
> salad
> robot
> mercenary
> salad.robot
> "robot mercenary"
> "salad.robot mercenary"
>
> If I then re-index the document without making any changes to the
> content, essentially just remove it and add it, and then call the
> non-optimizing finish(), all of the above queries continue to work
> except for "salad.robot".
For the record, there is a subtle difference between the way
QueryParser parses 'salad.robot' and the way it parses 'salad
robot': the first is interpreted as a phrase, while the second is
treated as two separate terms.
However, that should not impact the search results pre- and post-
optimize.
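A toy parser shows the distinction: a single query chunk that the tokenizer splits into several words becomes a phrase, while whitespace-separated words stay independent terms. This is a hypothetical Python sketch, not KinoSearch's QueryParser. The `parse` function and its tuple output are invented for illustration, and it ignores quoted phrases and boolean operators.

```python
import re

WORD_PATTERN = re.compile(r"\w+")  # assumed word-matching tokenizer

def parse(query):
    """Hypothetical parser: a whitespace-separated chunk that tokenizes
    into several words becomes a phrase; a single word stays a term."""
    clauses = []
    for chunk in query.split():
        words = WORD_PATTERN.findall(chunk)
        if len(words) > 1:
            clauses.append(("phrase", tuple(words)))
        elif words:
            clauses.append(("term", words[0]))
    return clauses

print(parse("salad.robot"))  # [('phrase', ('salad', 'robot'))]
print(parse("salad robot"))  # [('term', 'salad'), ('term', 'robot')]
```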
> That query does work if I optimize the
> index after re-adding the document, however.
When you delete by term, KS marks any documents in the old segments
that match the term as "deleted". When you re-add, the new document
ends up in a new segment.
The re-added document ought to be available from the new segment.
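The delete-then-re-add sequence can be modeled roughly as follows. This is a simplified Python sketch under the assumption that a segment is just a list of documents plus a set of positions flagged as deleted. `ToyIndex`, `Segment`, and the `"id:doc1"` unique-id term are hypothetical names, not KinoSearch classes.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    docs: list                                  # dicts with "id" and "terms"
    deleted: set = field(default_factory=set)   # positions flagged as deleted

class ToyIndex:
    """Hypothetical model: deletions only flag docs; additions go to new segments."""
    def __init__(self):
        self.segments = []

    def add_segment(self, docs):
        self.segments.append(Segment(list(docs)))

    def delete_by_term(self, term):
        # Nothing is removed or rewritten; matching docs are only flagged.
        for seg in self.segments:
            for i, doc in enumerate(seg.docs):
                if term in doc["terms"]:
                    seg.deleted.add(i)

    def search(self, term):
        return [doc["id"] for seg in self.segments
                for i, doc in enumerate(seg.docs)
                if i not in seg.deleted and term in doc["terms"]]

def make_doc():
    # "id:doc1" stands in for a hypothetical unique-id field used for deletion.
    return {"id": "doc1", "terms": {"id:doc1", "salad", "robot", "mercenary"}}

idx = ToyIndex()
idx.add_segment([make_doc()])
idx.delete_by_term("id:doc1")   # old copy flagged in segment 0
idx.add_segment([make_doc()])   # re-added copy lands in segment 1
print(idx.search("salad"))      # ['doc1']
print(idx.segments[0].deleted)  # {0}
```

The old copy still physically exists in segment 0; only the flag keeps it out of the results, so the hit must come from segment 1.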
When you optimize, KS merges all existing segments into a single new
segment, and documents may be reordered. Search results from the
same index pre- and post-optimize should be identical except for the
order of documents which have identical scores against the search query.
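One way to check the "identical except for ordering" expectation is to compare result lists as multisets. This is a rough Python sketch, assuming segments are `(docs, deleted_positions)` pairs. The `optimize` function here is hypothetical, and sorting by id is an arbitrary stand-in for "documents may be reordered". The toy model has no scoring, so every hit counts as a tie.

```python
def live_docs(docs, deleted):
    """Yield the documents in one segment that are not flagged deleted."""
    for i, doc in enumerate(docs):
        if i not in deleted:
            yield doc

def search(segments, term):
    """Return ids of live documents containing term, in segment order."""
    return [doc["id"] for docs, deleted in segments
            for doc in live_docs(docs, deleted) if term in doc["terms"]]

def optimize(segments):
    """Merge every segment into one, dropping deleted documents.
    Sorting by id is an arbitrary stand-in for reordering."""
    merged = [doc for docs, deleted in segments for doc in live_docs(docs, deleted)]
    merged.sort(key=lambda doc: doc["id"])
    return [(merged, set())]

a = {"id": "a", "terms": {"salad", "robot", "mercenary"}}
b = {"id": "b", "terms": {"salad"}}
segments = [([a, b], {0}),   # old copy of "a" flagged deleted
            ([a], set())]    # re-added copy of "a" in a new segment

before = search(segments, "salad")
after = search(optimize(segments), "salad")
print(before, after)                    # ['b', 'a'] ['a', 'b']
print(sorted(before) == sorted(after))  # True
```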
> I guess I'm just curious to know why that query only
> works after using optimize.
It is possible that one of KinoSearch's iterators is misbehaving and
stopping before the last document. Then, when the segments are
merged, the document appears in a new place and KS can find it
again. If that's true, it's a bug.
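The suspected failure mode can be illustrated with a deliberately broken iterator. This is a hypothetical Python sketch, not KinoSearch code. `live_docs_buggy` contains an assumed off-by-one that skips the last document in every segment. A re-added document that sits alone in a new segment is always last, so it vanishes. After a merge it lands elsewhere and becomes visible again.

```python
def live_docs(docs, deleted):
    """Correct iterator: yields every non-deleted doc in a segment."""
    for i in range(len(docs)):
        if i not in deleted:
            yield docs[i]

def live_docs_buggy(docs, deleted):
    """Hypothetical off-by-one iterator that stops before the last doc."""
    for i in range(len(docs) - 1):
        if i not in deleted:
            yield docs[i]

def search(segments, term, iterate):
    return [d["id"] for docs, deleted in segments
            for d in iterate(docs, deleted) if term in d["terms"]]

def optimize(segments):
    """Merge into one segment; sorting by id stands in for reordering."""
    live = [d for docs, deleted in segments for d in live_docs(docs, deleted)]
    return [(sorted(live, key=lambda d: d["id"]), set())]

a = {"id": "a", "terms": {"salad", "robot"}}
z = {"id": "z", "terms": {"mercenary"}}
segments = [([a, z], {0}),   # old copy of "a" flagged deleted
            ([a], set())]    # re-added copy: the only (and last) doc

print(search(segments, "salad", live_docs))                  # ['a']
print(search(segments, "salad", live_docs_buggy))            # []
print(search(optimize(segments), "salad", live_docs_buggy))  # ['a']
```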
There may also be some concurrency issues depending on how your
indexing/search apps are set up.
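One common setup issue of this kind is a long-lived searcher. The following is a minimal Python sketch, assuming a searcher captures the published segment list once, when it is opened. Under that assumption, a searcher opened before finish() keeps seeing the old state until it is reopened. `Index`, `Searcher`, and the tuple layout are hypothetical.

```python
class Index:
    """Toy index: finish() publishes a new list of segments."""
    def __init__(self, segments):
        self.segments = list(segments)

    def finish(self, segments):
        self.segments = list(segments)

class Searcher:
    """Captures the published segment list once, at open time."""
    def __init__(self, index):
        self.snapshot = list(index.segments)

    def search(self, term):
        return [doc_id for seg in self.snapshot
                for doc_id, terms in seg if term in terms]

index = Index([[("doc1", {"salad"})]])
old = Searcher(index)
index.finish([[("doc1", {"salad"})], [("doc2", {"salad"})]])
print(old.search("salad"))              # ['doc1']
print(Searcher(index).search("salad"))  # ['doc1', 'doc2']
```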
> I should point out that I'm using KinoSearch 0.15.
If we can reduce this to a problem case that I can duplicate locally,
I will try to fix it.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch