[KinoSearch] QueryFilter Crashings and Smashings

Marvin Humphrey marvin at rectangular.com
Fri Jun 1 13:42:28 PDT 2007


On May 31, 2007, at 11:05 PM, Chris Nandor wrote:

> The second time through, though, it works: the first time I call ->search,
> the above error is produced, but something happens with the QueryFilter so
> that it works the second time through.  Dumping the object, I notice that
> cached_bits is populated before the second call, and it was empty on the
> first call.  That's the only obvious difference.

I think "works" and "populated" may be misleading here.  I assume  
you're running these inside an eval, because otherwise you'd never  
get to a "second time through".  In the QueryFilter code, the cached  
BitVector is stored via QueryFilter->store_cached_bits before  
Searcher->collect is run to actually flip the bits.  It's the call to  
Searcher->collect that's crashing.  The QueryFilter code is not to  
blame.
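
To illustrate the ordering described above, here is a minimal sketch in
Python rather than KinoSearch's own code.  ToyFilter, BitVector and
crashing_collect are made-up names for this example only; the one point
it tries to show is that a cache stored before it is filled survives a
crash in the fill step, so a later call returns without error but with
incomplete bits.

```python
class BitVector:
    """Toy stand-in for a bit vector: just a set of doc numbers."""
    def __init__(self):
        self.bits = set()


class ToyFilter:
    """Hypothetical filter that caches its BitVector before filling it."""
    def __init__(self, collect):
        self.collect = collect      # callable that fills the bits; may raise
        self.cached_bits = None

    def bits(self):
        if self.cached_bits is None:
            bv = BitVector()
            self.cached_bits = bv   # stored first ...
            self.collect(bv)        # ... filled second; a crash here leaves the cache set
        return self.cached_bits


def crashing_collect(bv):
    """Assumed failure mode: one bit gets set, then collection dies."""
    bv.bits.add(1)
    raise RuntimeError("scorer blew up mid-collection")


f = ToyFilter(crashing_collect)
try:
    f.bits()
    first_call_failed = False
except RuntimeError:
    first_call_failed = True

# The second call finds the cache and skips collect entirely, so it
# "works", but the bits are whatever was set before the crash.
second_call_bits = f.bits().bits
```

In this toy version, wrapping the first call in an eval-like try block is
what makes a "second time through" possible at all.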

> Now using the same BooleanQuery that I created the QueryFilter from, and
> passing that as the query parameter.  It crashes too, but now it doesn't
> work on subsequent attempts, as the QueryFilter did.
>
> The BooleanQuery itself is hard to pin down.  I have these terms:
>
> 	uid => 2,
> 	accepted => 'no',
> 	rejected => 'no',
> 	public => 'yes',
> 	editorpop => 25,
> 	category => 'none'
>
> If I do just the first two, it works.  The first three, it doesn't.  If I
> remove just the first one and do the other five, it works.  And so on.

Since you indicate that these are all added to your BooleanQuery with  
'MUST', the final BooleanScorer will be a thin wrapper around an  
ANDScorer with several TermScorers as its subscorers.

I suspect that the problem will be found within ANDScorer_skip_to().   
There's probably an extra call to a subscorer's Scorer_Skip_To() or  
Scorer_Doc() methods after that subscorer has been exhausted.  (Once  
Scorer_Next returns false, it's invalid to call either Scorer_Skip_To  
or Scorer_Doc.) In the effort to make that code as efficient as  
possible, it came out a mite tortured.
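
For reference, the contract above can be sketched in a few lines of
Python.  This is not the KinoSearch C code: ListScorer and and_docs are
hypothetical names, skip_to here is assumed to mean "move to the first
doc >= target, staying put if already there", and the assertions exist
only to make any call on an exhausted subscorer fail loudly.

```python
class ListScorer:
    """Toy term scorer over a sorted list of doc numbers (hypothetical)."""
    def __init__(self, docs):
        self.docs = sorted(docs)
        self.pos = -1
        self.exhausted = False

    def next(self):
        assert not self.exhausted, "next() after exhaustion"
        self.pos += 1
        if self.pos >= len(self.docs):
            self.exhausted = True
            return False
        return True

    def skip_to(self, target):
        assert not self.exhausted, "skip_to() after exhaustion"
        if self.pos < 0:
            self.pos = 0
        while self.pos < len(self.docs) and self.docs[self.pos] < target:
            self.pos += 1
        if self.pos >= len(self.docs):
            self.exhausted = True
            return False
        return True

    def doc(self):
        assert not self.exhausted and self.pos >= 0, "doc() on invalid scorer"
        return self.docs[self.pos]


def and_docs(scorers):
    """Docs matched by every scorer (needs at least one scorer).

    Returns as soon as any subscorer is exhausted and never touches a
    subscorer again after that point.
    """
    hits = []
    for s in scorers:                 # prime each subscorer once
        if not s.next():
            return hits
    while True:
        target = max(s.doc() for s in scorers)
        for s in scorers:             # bring laggards up to target
            if s.doc() < target and not s.skip_to(target):
                return hits
        if all(s.doc() == target for s in scorers):
            hits.append(target)
            if not scorers[0].next():
                return hits


result = and_docs([ListScorer([1, 2, 5, 9]),
                   ListScorer([2, 3, 5]),
                   ListScorer([2, 5, 9, 10])])
```

The sketch favors an obviously safe loop over speed; every early return
sits directly on the call that reported exhaustion.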

Fixing it will take an effort similar to what it took to fix  
BitVec_Flip_Range.  The function should be rethought and simplified  
if possible.  It will also need more aggressive tests.  The bug  
you're seeing now isn't revealed by the test suite because it depends  
on a peculiar sequence of document numbers within the subscorers.   
Hopefully we can come up with a pattern that covers more possible  
combinations.
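
One way to cover more combinations, sketched here in Python under
stated assumptions, is to generate random sorted doc-number lists and
compare a conjunction routine against plain set intersection.  None of
this is from the KinoSearch test suite: random_doc_lists,
merge_intersect and find_counterexample are illustrative names, and
merge_intersect is only a stand-in for whatever implementation is
under test.

```python
import random


def random_doc_lists(rng, num_lists, max_doc, density):
    """Sorted doc-number lists; each doc is kept with probability `density`."""
    return [
        [d for d in range(max_doc) if rng.random() < density]
        for _ in range(num_lists)
    ]


def merge_intersect(lists):
    """Stand-in implementation under test: leapfrog over sorted lists."""
    if not lists or any(not l for l in lists):
        return []
    cursors = [0] * len(lists)
    hits = []
    while True:
        target = max(l[c] for l, c in zip(lists, cursors))
        for i, l in enumerate(lists):
            while cursors[i] < len(l) and l[cursors[i]] < target:
                cursors[i] += 1
            if cursors[i] == len(l):
                return hits           # one list ran out: no more matches
        if all(l[c] == target for l, c in zip(lists, cursors)):
            hits.append(target)
            cursors[0] += 1
            if cursors[0] == len(lists[0]):
                return hits


def find_counterexample(intersect, trials=500, seed=0):
    """Return the first input where `intersect` disagrees with set logic, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        lists = random_doc_lists(
            rng,
            num_lists=rng.randint(1, 5),
            max_doc=rng.randint(0, 40),
            density=rng.choice([0.1, 0.5, 0.9]),
        )
        expected = sorted(set.intersection(*map(set, lists)))
        if intersect(lists) != expected:
            return lists
    return None


counterexample = find_counterexample(merge_intersect)
```

Mixing sparse and dense lists of different lengths is meant to hit the
case where one list runs out well before the others; a fixed seed keeps
any failure reproducible.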

I'll try to get to this this weekend.  In the meantime, if you want  
to scratch the itch, throw in a couple debug "Warn" calls and see if  
you can isolate the failing line within ANDScorer_skip_to.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




