[KinoSearch] Optimize on finish is affecting search results

Marvin Humphrey marvin at rectangular.com
Wed Aug 1 21:38:20 PDT 2007




On Aug 1, 2007, at 11:34 AM, Matt wrote:

> Consider a document with the following content:
>
>      "salad.robot mercenary"
>
> Just random words that won't be gobbled up by the stop list.  Consider
> also that the tokenizing expression just looks for words.  The content
> would be split like: "salad|robot|mercenary".

Yes, that's right.  Tokenizing is almost certainly unrelated to the  
issue you describe.

>     salad
>     robot
>     mercenary
>     salad.robot
>     "robot mercenary"
>     "salad.robot mercenary"
>
> If I then re-index the document without making any changes to the
> content, essentially just remove it and add it, and then call the
> non-optimizing finish(), all of the above queries continue to work
> except for "salad.robot".

For the record, there is a subtle difference between the way
QueryParser parses 'salad.robot' and the way it parses 'salad
robot'.  The first will be interpreted as a phrase; the second as
two separate terms.

However, that should not impact the search results pre- and post- 
optimize.
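
To illustrate the parsing difference, here is a toy sketch in Python.
It is not KinoSearch's QueryParser: the `parse` function, the `WORD`
pattern, and the clause tuples are all assumptions made up for this
example, and it ignores quoted phrases and boolean operators.  It only
shows how a chunk with no whitespace can still yield two tokens and so
end up as a phrase.

```python
import re

# Assumed stand-in for a tokenizer that "just looks for words".
WORD = re.compile(r"\w+")

def parse(query):
    """Split on whitespace, then tokenize each chunk.

    A chunk that yields more than one token becomes a phrase clause;
    a chunk that yields exactly one token becomes a term clause.
    """
    clauses = []
    for chunk in query.split():
        tokens = WORD.findall(chunk.lower())
        if len(tokens) > 1:
            clauses.append(("phrase", tokens))
        elif tokens:
            clauses.append(("term", tokens[0]))
    return clauses

dotted = parse("salad.robot")   # one chunk, two tokens
spaced = parse("salad robot")   # two chunks, one token each
print(dotted)
print(spaced)
```

Under these assumptions the dotted query produces a single phrase
clause and the spaced query produces two term clauses, which is why
the two can behave differently against the same document.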

> That query does work if I optimize the
> index after re-adding the document, however.

When you delete-by-term, what KS does is mark any documents which  
match the term in old segments as "deleted".  When you re-add, the  
new document ends up in a new segment.

The re-added document ought to be available from the new segment.

When you optimize, KS merges all existing segments into a single new  
segment, and documents may be reordered.   Search results from the  
same index pre- and post-optimize should be identical except for the  
order of documents which have identical scores against the search query.
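
The delete, re-add, and optimize steps described above can be sketched
with a toy model in Python.  Everything here (`Segment`, `ToyIndex`,
the method names, the second sample document) is hypothetical and much
simpler than what KS actually does on disk; the point is only the
invariant that the set of matching documents should be the same before
and after the merge.

```python
import re

class Segment:
    """A toy segment: a list of documents plus a set of deleted positions."""
    def __init__(self, docs):
        self.docs = list(docs)   # each doc is a dict with "id" and "content"
        self.deleted = set()     # positions marked as deleted

class ToyIndex:
    def __init__(self):
        self.segments = []

    def add_docs(self, docs):
        # Each indexing session writes its documents to a fresh segment.
        self.segments.append(Segment(docs))

    def delete_by_term(self, field, value):
        # Mark matching documents in existing segments; nothing is rewritten.
        for seg in self.segments:
            for pos, doc in enumerate(seg.docs):
                if doc.get(field) == value:
                    seg.deleted.add(pos)

    def live_docs(self):
        # Yield every document that has not been marked as deleted.
        for seg in self.segments:
            for pos, doc in enumerate(seg.docs):
                if pos not in seg.deleted:
                    yield doc

    def optimize(self):
        # Merge all live documents into a single new segment.
        self.segments = [Segment(self.live_docs())]

    def search(self, word):
        # Return sorted ids of live documents whose content contains the word.
        return sorted(doc["id"] for doc in self.live_docs()
                      if word in re.findall(r"\w+", doc["content"].lower()))

index = ToyIndex()
index.add_docs([
    {"id": "a", "content": "salad.robot mercenary"},
    {"id": "b", "content": "unrelated filler text"},
])

# Re-index document "a": delete by term, then add it again.
index.delete_by_term("id", "a")
index.add_docs([{"id": "a", "content": "salad.robot mercenary"}])

before = index.search("robot")   # old copy deleted, new copy in new segment
segments_before = len(index.segments)
index.optimize()
after = index.search("robot")    # single merged segment
print(before, after, segments_before, len(index.segments))
```

In this model the re-added document is found both before and after
`optimize()`; only the number of segments changes.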

> I guess I'm just curious to know why that query only
> works after using optimize.

The possibility exists that one of KinoSearch's iterators is messing  
up and quitting before the last document. Then, when the segments are  
merged, the document appears in a new place and KS can find it  
again.  If that's true, it's a bug.

There may also be some concurrency issues depending on how your  
indexing/search apps are set up.

> I should point out that I'm using KinoSearch 0.15.

If we can reduce this to a problem case that I can duplicate locally,  
I will try to fix it.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


