[KinoSearch] QueryFilter Crashings and Smashings
Marvin Humphrey
marvin at rectangular.com
Wed Jun 20 14:15:30 PDT 2007
On Jun 4, 2007, at 9:18 AM, Chris Nandor wrote:
> Error:Slash::SearchToo::Kinosearch:/usr/local/lib/perl5/site_perl/
> 5.8.4/Slash/SearchToo/Kinosearch.pm:215:21052:
> kinosearcher failed (attempt 1). Trying again ... : Error in function
> refill at ../c_src/KinoSearch/Store/InStream.c:92: Read past EOF of
> _1.p1
> (start: 912 len 912)
I believe I've finally got this bug hunted and killed, though I'll
have to wait for official confirmation from you, Pudge. In any case,
I've found something that can result in both crashes and incorrect
search results. It's sufficiently nasty that I'm going to release
version 0.20_04 ASAP.
This bug had both search-time and index-time components. Indexes
created with 0.20_03 are probably corrupt; however, they need not be
recreated from scratch. It is only the .skip file which contains bad
information, and this gets completely regenerated for each new
segment. To repair existing indexes, optimize them to a single, new
segment:

    $invindexer->finish( optimize => 1 );

Details:
Your query produced a BooleanScorer with several TermScorers. These
TermScorers iterate over document numbers using PostingList objects.
SegPList_skip_to() was not living up to its contract. It is supposed
to be an optimization of PList_skip_to(), but it was malfunctioning and
setting the SegPostingList object's internal state incorrectly. It
was both seeking to the wrong place in the .p1 postings file and
setting the count of docs already seen to an incorrect, lower number.
The crash you saw arose because of the counting issue -- the iterator
kept going after it should have quit, resulting in the "Read past EOF".
However, this bug can also result in search results which are simply
incorrect. After an optimized seek, the SegPostingList object is
reading from the wrong place in the postings file. It will now read
incorrect delta doc numbers, and produce bogus matches. The larger
the index, the more wrong results you're likely to see.
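
To illustrate the two failure modes, here is a minimal sketch in C. It is not KinoSearch's actual code: ToyPostingList, toy_next() and toy_jump() are hypothetical names, and the "file" is a plain in-memory array of delta-encoded doc numbers. It only shows why a wrong read position yields bogus doc numbers and why an undercounted "docs seen" value lets the iterator run off the end of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified postings stream: each entry stores the gap
 * (delta) from the previous doc number, not the doc number itself. */
typedef struct {
    const uint32_t *deltas;   /* delta-encoded doc numbers */
    size_t          doc_freq; /* total number of postings in the stream */
    size_t          pos;      /* index of the next delta to read */
    size_t          count;    /* postings consumed so far */
    uint32_t        doc_num;  /* current doc number (running sum of deltas) */
} ToyPostingList;

/* Advance to the next posting.
 * Returns 1 on success, 0 when the list is exhausted, and -1 when the
 * read position has run past the end of the data (the toy analogue of
 * "Read past EOF"). Termination depends on `count`, so a `count` that
 * is too low keeps the iterator going after it should have quit. */
int toy_next(ToyPostingList *pl) {
    if (pl->count >= pl->doc_freq) return 0;
    if (pl->pos >= pl->doc_freq)   return -1;
    pl->doc_num += pl->deltas[pl->pos++];
    pl->count++;
    return 1;
}

/* Overwrite the iterator state, as an optimized seek would after
 * consulting skip data. All three values must agree with each other. */
void toy_jump(ToyPostingList *pl, size_t pos, size_t count,
              uint32_t doc_num) {
    assert(pos <= pl->doc_freq);
    pl->pos     = pos;
    pl->count   = count;
    pl->doc_num = doc_num;
}
```

With deltas {3, 4, 2, 6} the real doc numbers are 3, 7, 9, 15. Jumping to (pos 2, count 2, doc 7) and calling toy_next() gives doc 9, as it should. Jumping to (pos 1, count 2, doc 7) instead re-reads the delta 4 and reports doc 11, which is not in the list at all. Jumping to (pos 3, count 1, doc 9) reads doc 15 and then, because the count is too low, tries to read a fifth posting that does not exist.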
The crucial component of the fix was a set of stress tests for
SegPList_skip_to. Once they were in place, the necessary fixes in
PostingsWriter.c and SegPostingList.c could be identified.
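
For anyone curious what a stress test of this kind can look like, here is a rough sketch in C. It is not the test that went into KinoSearch, and every name in it (SkipPList, SkipEntry, sp_next, naive_skip_to, fast_skip_to, build_skips, stress_skip_to, SKIP_INTERVAL) is made up for illustration. The idea is simply to treat a plain "call next until we get there" loop as the reference, and to require that the skip-data version leaves the iterator in exactly the same state for many targets and many sequences of calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKIP_INTERVAL 4   /* hypothetical: one skip entry per 4 postings */
#define NUM_DOCS      200

typedef struct {
    uint32_t doc_num; /* doc number of the last posting before `pos` */
    size_t   pos;     /* next delta to read (a byte offset in a real file) */
    size_t   count;   /* postings consumed up to this point */
} SkipEntry;

typedef struct {
    const uint32_t  *deltas;    /* delta-encoded doc numbers */
    size_t           doc_freq;  /* total number of postings */
    const SkipEntry *skips;
    size_t           num_skips;
    size_t           pos, count;
    uint32_t         doc_num;
} SkipPList;

static int sp_next(SkipPList *pl) {
    if (pl->count >= pl->doc_freq) return 0;
    pl->doc_num += pl->deltas[pl->pos++];
    pl->count++;
    return 1;
}

/* Reference behavior: advance at least once, stop at first doc >= target. */
static int naive_skip_to(SkipPList *pl, uint32_t target) {
    while (sp_next(pl)) {
        if (pl->doc_num >= target) return 1;
    }
    return 0;
}

/* Optimized version: jump to the last skip entry that lies ahead of the
 * current position and before the target, then finish with the naive loop.
 * Position, count and doc number must all be restored together. */
static int fast_skip_to(SkipPList *pl, uint32_t target) {
    const SkipEntry *best = NULL;
    for (size_t i = 0; i < pl->num_skips; i++) {
        const SkipEntry *e = &pl->skips[i];
        if (e->doc_num >= target) break;
        if (e->count > pl->count) best = e;
    }
    if (best) {
        pl->pos     = best->pos;
        pl->count   = best->count;
        pl->doc_num = best->doc_num;
    }
    return naive_skip_to(pl, target);
}

/* Index-time side: record one skip entry after every SKIP_INTERVAL docs. */
static size_t build_skips(const uint32_t *deltas, size_t n, SkipEntry *out) {
    size_t   num = 0;
    uint32_t doc = 0;
    for (size_t i = 0; i < n; i++) {
        doc += deltas[i];
        if ((i + 1) % SKIP_INTERVAL == 0) {
            out[num].doc_num = doc;
            out[num].pos     = i + 1;
            out[num].count   = i + 1;
            num++;
        }
    }
    return num;
}

/* Run both versions side by side over many target sequences and return
 * the number of calls after which their states differed. */
unsigned stress_skip_to(void) {
    uint32_t  deltas[NUM_DOCS];
    SkipEntry skips[NUM_DOCS / SKIP_INTERVAL];
    uint32_t  seed = 12345u, max_doc = 0;
    for (size_t i = 0; i < NUM_DOCS; i++) {
        seed = seed * 1103515245u + 12345u;   /* small deterministic LCG */
        deltas[i] = 1 + (seed >> 16) % 7;     /* gaps of 1..7 */
        max_doc += deltas[i];
    }
    size_t num_skips = build_skips(deltas, NUM_DOCS, skips);
    assert(num_skips <= NUM_DOCS / SKIP_INTERVAL);

    unsigned mismatches = 0;
    for (uint32_t first = 0; first <= max_doc + 2; first++) {
        for (uint32_t step = 1; step <= 40; step += 13) {
            SkipPList a = { deltas, NUM_DOCS, skips, num_skips, 0, 0, 0 };
            SkipPList b = a;
            for (uint32_t t = first; t <= max_doc + 2; t += step) {
                int ra = naive_skip_to(&a, t);
                int rb = fast_skip_to(&b, t);
                if (ra != rb || a.doc_num != b.doc_num
                    || a.count != b.count || a.pos != b.pos) mismatches++;
                if (!ra) break;
            }
        }
    }
    return mismatches;
}
```

A correct fast_skip_to() should make stress_skip_to() return 0. Comparing the count and the read position after every call, rather than only the doc number, is the part that matters here: a skip implementation that lands on the right doc but leaves either value slightly off can look fine for one call and only misbehave later in the iteration.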
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/