[KinoSearch] QueryFilter Crashings and Smashings

Marvin Humphrey marvin at rectangular.com
Wed Jun 20 14:15:30 PDT 2007


On Jun 4, 2007, at 9:18 AM, Chris Nandor wrote:

> Error:Slash::SearchToo::Kinosearch:/usr/local/lib/perl5/site_perl/5.8.4/Slash/SearchToo/Kinosearch.pm:215:21052:
> kinosearcher failed (attempt 1).  Trying again ... : Error in function
> refill at ../c_src/KinoSearch/Store/InStream.c:92: Read past EOF of _1.p1
> (start: 912 len 912)

I believe I've finally got this bug hunted and killed, though I'll  
have to wait for official confirmation from you, Pudge.  In any case,  
I've found something that can result in both crashes and incorrect  
search results.  It's sufficiently nasty that I'm going to release  
version 0.20_04 ASAP.

This bug had both search-time and index-time components.  Indexes  
created with 0.20_03 are probably corrupt; however, they need not be  
recreated from scratch.  It is only the .skip file which contains bad  
information, and this gets completely regenerated for each new  
segment.  To repair existing indexes, optimize them to a single, new  
segment:

   $invindexer->finish( optimize => 1 );

Details:

Your query produced a BooleanScorer with several TermScorers.  These  
TermScorers iterate over document numbers using PostingList objects.

SegPList_skip_to() was not living up to its contract.  It is supposed
to be an optimization of PList_skip_to which produces the same result,
but it was leaving the SegPostingList object's internal state wrong in
two ways: it sought to the wrong place in the .p1 postings file, and
it set the count of docs already seen too low.
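
To make that contract concrete, here is a minimal C sketch.  It is not
the KinoSearch source: the postings layout (delta, freq pairs), the
skip interval, and every name in it (ToyPList, SkipEntry, toy_encode,
toy_next, toy_linear_skip_to, toy_skip_to) are assumptions made up for
illustration.  The point it tries to show is that an optimized skip
must restore read offset, doc count, and current doc number together,
and must end in exactly the state a plain linear scan would reach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKIP_INTERVAL 4            /* hypothetical: one skip entry per 4 docs */
#define NO_MORE_DOCS  UINT32_MAX   /* sentinel: the list is exhausted */

/* Snapshot recorded by the writer after every SKIP_INTERVAL docs. */
typedef struct {
    uint32_t doc_num;    /* last doc number written before this point */
    size_t   pos;        /* offset of the next unread posting */
    uint32_t doc_count;  /* number of docs written before this point */
} SkipEntry;

typedef struct {
    const uint32_t  *postings;   /* (doc delta, freq) pairs */
    uint32_t         doc_freq;   /* total docs in the list */
    const SkipEntry *skips;
    size_t           num_skips;
    size_t           pos;        /* iterator state: read offset */
    uint32_t         count;      /* iterator state: docs already seen */
    uint32_t         doc_num;    /* iterator state: current doc number */
} ToyPList;

/* Writer side: delta-encode sorted doc numbers and record skip entries.
 * Returns the number of skip entries written. */
static size_t
toy_encode(const uint32_t *docs, uint32_t n, uint32_t *postings,
           SkipEntry *skips)
{
    size_t   num_skips = 0;
    uint32_t last      = 0;
    for (uint32_t i = 0; i < n; i++) {
        postings[2 * i]     = docs[i] - last;  /* delta from previous doc */
        postings[2 * i + 1] = 1;               /* placeholder freq */
        last = docs[i];
        if ((i + 1) % SKIP_INTERVAL == 0) {
            skips[num_skips].doc_num   = last;
            skips[num_skips].pos       = 2 * (size_t)(i + 1);
            skips[num_skips].doc_count = i + 1;
            num_skips++;
        }
    }
    return num_skips;
}

/* Advance one doc; the count check is what stops the iterator. */
static uint32_t
toy_next(ToyPList *p)
{
    if (p->count >= p->doc_freq) { return NO_MORE_DOCS; }
    assert(p->pos + 1 < 2 * (size_t)p->doc_freq);  /* state is consistent */
    p->doc_num += p->postings[p->pos];             /* apply the delta */
    p->pos     += 2;                               /* step over delta, freq */
    p->count   += 1;
    return p->doc_num;
}

/* Reference behavior: advance until doc >= target or the list ends. */
static uint32_t
toy_linear_skip_to(ToyPList *p, uint32_t target)
{
    uint32_t doc;
    do { doc = toy_next(p); } while (doc != NO_MORE_DOCS && doc < target);
    return doc;
}

/* Optimized version: jump to the furthest skip entry that lies ahead of
 * the current position and before target, then finish linearly. */
static uint32_t
toy_skip_to(ToyPList *p, uint32_t target)
{
    const SkipEntry *best = NULL;
    for (size_t i = 0; i < p->num_skips; i++) {
        const SkipEntry *e = &p->skips[i];
        if (e->doc_num >= target)    { break; }
        if (e->doc_count > p->count) { best = e; }
    }
    if (best != NULL) {
        /* All three fields move together; restoring only some of them
         * leaves the iterator in an inconsistent state. */
        p->pos     = best->pos;
        p->count   = best->doc_count;
        p->doc_num = best->doc_num;
    }
    return toy_linear_skip_to(p, target);
}
```

In this sketch, two iterators over the same list, one driven by
toy_skip_to and one by toy_linear_skip_to, should return the same doc
and hold identical pos, count, and doc_num after every call.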

The crash you saw arose from the counting issue: because the count
was too low, the iterator kept reading after it should have quit,
resulting in the "Read past EOF".
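
A toy model of that failure, under assumptions of my own: a
bounds-checked stream holding one delta per doc, and an iterator that
stops only when its count reaches doc_freq.  ToyInStream, toy_read,
and toy_drain are hypothetical names, not KinoSearch functions.  If a
seek restores a count lower than the number of docs actually consumed
at that offset, the loop asks for more postings than the file holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint32_t *buf;
    size_t          len;       /* number of values in the "file" */
    size_t          pos;       /* next value to read */
    int             past_eof;  /* set when a read runs off the end */
} ToyInStream;

/* Bounds-checked read: flags the error instead of reading garbage. */
static uint32_t
toy_read(ToyInStream *in)
{
    if (in->pos >= in->len) {
        in->past_eof = 1;
        return 0;
    }
    return in->buf[in->pos++];
}

/* Resume iteration from a restored (pos, count) state and read until
 * count reaches doc_freq.  Returns the number of docs read, or -1 if
 * the stream was read past its end. */
static int
toy_drain(const uint32_t *deltas, uint32_t doc_freq, size_t pos,
          uint32_t count)
{
    ToyInStream in = { deltas, doc_freq, pos, 0 };
    int docs_read  = 0;
    assert(pos <= doc_freq);       /* the seek itself lands inside the file */
    while (count < doc_freq) {     /* termination depends only on count */
        (void)toy_read(&in);
        if (in.past_eof) { return -1; }
        count++;
        docs_read++;
    }
    return docs_read;
}
```

With a five-doc list, resuming at offset 3 with count 3 reads the two
remaining docs and stops.  Resuming at offset 3 with count 1 makes the
loop want four more docs when only two remain, so the third read runs
past the end.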

However, this bug can also produce search results which are simply
incorrect.  After an optimized seek, the SegPostingList object reads
from the wrong place in the postings file, so it decodes incorrect
delta doc numbers and produces bogus matches.  The larger the index,
the more wrong results you're likely to see.

The crucial component of the fix was a set of stress tests for  
SegPList_skip_to.  Once they were in place, the necessary fixes in  
PostingsWriter.c and SegPostingList.c could be identified.
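
The general shape of such a stress test can be sketched as follows.
This is not the test that went into KinoSearch; it is a simplified
stand-in which assumes an in-memory sorted array of doc numbers, a
cursor as the only iterator state, and a small deterministic
pseudo-random generator.  All names (ref_skip_to, fast_skip_to,
broken_skip_to, stress) are hypothetical.  The idea is to run a
trusted linear scan and the optimized routine side by side over many
random lists and targets, comparing both the returned doc and the
resulting state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_MORE_DOCS  UINT32_MAX
#define SKIP_INTERVAL 8      /* hypothetical checkpoint spacing */
#define MAX_DOCS      1000

typedef uint32_t (*SkipToFn)(const uint32_t *docs, size_t n,
                             size_t *cursor, uint32_t target);

/* Trusted reference: plain linear scan from the cursor. */
static uint32_t
ref_skip_to(const uint32_t *docs, size_t n, size_t *cursor, uint32_t target)
{
    while (*cursor < n) {
        uint32_t doc = docs[(*cursor)++];
        if (doc >= target) { return doc; }
    }
    return NO_MORE_DOCS;
}

/* Candidate: hop over whole checkpoints whose last doc is < target. */
static uint32_t
fast_skip_to(const uint32_t *docs, size_t n, size_t *cursor, uint32_t target)
{
    size_t k = (*cursor / SKIP_INTERVAL + 1) * SKIP_INTERVAL;
    while (k <= n && docs[k - 1] < target) {
        *cursor = k;
        k += SKIP_INTERVAL;
    }
    return ref_skip_to(docs, n, cursor, target);
}

/* Deliberately wrong variant (lands one doc too far), included only to
 * show that the harness notices.  Not the actual 0.20_03 bug. */
static uint32_t
broken_skip_to(const uint32_t *docs, size_t n, size_t *cursor,
               uint32_t target)
{
    size_t k = (*cursor / SKIP_INTERVAL + 1) * SKIP_INTERVAL;
    while (k <= n && docs[k - 1] < target) {
        *cursor = k + 1;
        k += SKIP_INTERVAL;
    }
    return ref_skip_to(docs, n, cursor, target);
}

/* Small linear congruential generator so runs are reproducible. */
static uint32_t
lcg(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Returns the number of rounds in which candidate disagreed with the
 * reference on either the returned doc or the cursor. */
static int
stress(SkipToFn candidate, uint32_t seed, int rounds)
{
    static uint32_t docs[MAX_DOCS];
    int mismatches = 0;
    for (int r = 0; r < rounds; r++) {
        size_t   n   = 1 + lcg(&seed) % MAX_DOCS;
        uint32_t doc = 0;
        for (size_t i = 0; i < n; i++) {     /* strictly increasing docs */
            doc += 1 + lcg(&seed) % 50;
            docs[i] = doc;
        }
        size_t   ref_cur = 0, cand_cur = 0;
        uint32_t target  = 0;
        for (;;) {
            target += 1 + lcg(&seed) % 500;  /* targets only move forward */
            uint32_t want = ref_skip_to(docs, n, &ref_cur, target);
            uint32_t got  = candidate(docs, n, &cand_cur, target);
            if (got != want || cand_cur != ref_cur) { mismatches++; break; }
            if (want == NO_MORE_DOCS) { break; }
        }
    }
    return mismatches;
}
```

Calling stress(fast_skip_to, 1u, 200) should report zero mismatches,
while the broken variant should be caught.  Comparing the cursor as
well as the return value matters: a routine can hand back the right
doc while leaving its internal state wrong for the next call.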

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/