[KinoSearch] QueryFilter Crashings and Smashings
Marvin Humphrey
marvin at rectangular.com
Fri Jun 1 13:42:28 PDT 2007
On May 31, 2007, at 11:05 PM, Chris Nandor wrote:
> The second time through, though, it works: the first time I call ->search,
> the above error is produced, but something happens with the QueryFilter so
> that it works the second time through. Dumping the object, I notice that
> cached_bits is populated before the second call, and it was empty on the
> first call. That's the only obvious difference.

I think "works" and "populated" may be misleading here. I assume
you're running these inside an eval, because otherwise you'd never
get to a "second time through". In the QueryFilter code, the cached
BitVector is stored via QueryFilter->store_cached_bits before
Searcher->collect is run to actually flip the bits. So after the first
failure there is a cached BitVector, but presumably an incomplete one,
and the second call "works" only in the sense that it reuses it. It's
the call to Searcher->collect that's crashing. The QueryFilter code is
not to blame.
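
The ordering described above can be modeled in a few lines. This is a sketch in Python, not KinoSearch code: ToyFilter, BitVector.set and crashing_collect are hypothetical names, and the assumption that a second call returns the cached bits without re-running the search is inferred from the report, not confirmed from the source.

```python
class BitVector:
    """Toy bit vector backed by a set of doc numbers (hypothetical)."""

    def __init__(self):
        self.bits = set()

    def set(self, doc_num):
        self.bits.add(doc_num)


class ToyFilter:
    """Stand-in for a caching filter; not the KinoSearch API."""

    def __init__(self):
        self.cached_bits = None

    def bits(self, collect):
        if self.cached_bits is None:
            bits = BitVector()
            self.cached_bits = bits   # cached BEFORE collection runs
            collect(bits)             # may raise part-way through
        return self.cached_bits


def crashing_collect(bits):
    """Sets one bit, then fails, like a search that dies mid-run."""
    bits.set(1)
    raise RuntimeError("scorer failed part-way through")


toy = ToyFilter()
try:
    toy.bits(crashing_collect)        # first call: the error surfaces
except RuntimeError:
    pass

# Second call: no error, because the half-filled cache is reused and
# crashing_collect is never invoked again.
second = toy.bits(crashing_collect)
```

Under these assumptions the second call succeeds, but the BitVector it returns only holds whatever was set before the failure.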

> Now using the same BooleanQuery that I created the QueryFilter from, and
> passing that as the query parameter. It crashes too, but now it doesn't
> work on subsequent attempts, as the QueryFilter did.
>
> The BooleanQuery itself is hard to pin down. I have these terms:
>
> uid => 2,
> accepted => 'no',
> rejected => 'no',
> public => 'yes',
> editorpop => 25,
> category => 'none'
>
> If I do just the first two, it works. The first three, it doesn't. If I
> remove just the first one and do the other five, it works. And so on.

Since you indicate that these are all added to your BooleanQuery with
'MUST', the final BooleanScorer will be a thin wrapper around an
ANDScorer with several TermScorers as its subscorers.

I suspect that the problem will be found within ANDScorer_skip_to().
There's probably an extra call to a subscorer's Scorer_Skip_To() or
Scorer_Doc() methods after that subscorer has been exhausted. (Once
Scorer_Next returns false, it's invalid to call either Scorer_Skip_To
or Scorer_Doc.) In the effort to make that code as efficient as
possible, it came out a mite tortured.
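
To make the contract concrete, here is a minimal sketch of a conjunction scorer that stops touching its subscorers the moment one of them is exhausted. It is written in Python as a model, not taken from the KinoSearch source: ListScorer, AndScorer and matches are hypothetical names, and the skip_to semantics used here (stay put if already at or past the target) are an assumption that may differ from KinoSearch's.

```python
class ListScorer:
    """Toy subscorer over a sorted list of doc numbers (hypothetical)."""

    def __init__(self, docs):
        self.docs = sorted(docs)
        self.pos = -1              # not yet positioned on any doc
        self.exhausted = False

    def skip_to(self, target):
        """Move to the first doc >= target; False means exhausted."""
        if self.exhausted:
            raise RuntimeError("skip_to() called on exhausted subscorer")
        self.pos = max(self.pos, 0)
        while self.pos < len(self.docs) and self.docs[self.pos] < target:
            self.pos += 1
        self.exhausted = self.pos >= len(self.docs)
        return not self.exhausted

    def doc(self):
        if self.exhausted or self.pos < 0:
            raise RuntimeError("doc() called with no current doc")
        return self.docs[self.pos]


class AndScorer:
    """Matches only docs that every subscorer matches."""

    def __init__(self, subscorers):
        self.subs = subscorers
        self.done = not subscorers
        self.current = None

    def next(self):
        target = 0 if self.current is None else self.current + 1
        return self.skip_to(target)

    def skip_to(self, target):
        if self.done:
            return False           # never touch the subscorers again
        agreed = 0                 # consecutive subscorers sitting on target
        i = 0
        while agreed < len(self.subs):
            sub = self.subs[i]
            if not sub.skip_to(target):
                self.done = True   # one exhausted subscorer ends the AND
                return False
            if sub.doc() > target:
                target = sub.doc() # overshoot: the others must catch up
                agreed = 1
            else:
                agreed += 1
            i = (i + 1) % len(self.subs)
        self.current = target
        return True

    def doc(self):
        return self.current


def matches(*doc_lists):
    """Collect every doc number the conjunction matches."""
    scorer = AndScorer([ListScorer(docs) for docs in doc_lists])
    hits = []
    while scorer.next():
        hits.append(scorer.doc())
    return hits
```

The toy subscorers raise on misuse rather than returning stale data, so a stray call after exhaustion fails loudly instead of depending on the sequence of document numbers.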

Fixing it will take an effort similar to what it took to fix
BitVec_Flip_Range. The function should be rethought and simplified
if possible. It will also need more aggressive tests. The bug
you're seeing now isn't revealed by the test suite because it depends
on a peculiar sequence of document numbers within the subscorers.
Hopefully we can come up with a pattern that covers more possible
combinations.
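
One pattern that covers many combinations is to enumerate every small set of document numbers for every subscorer and compare the result against a plain set intersection. The sketch below is in Python and is only an illustration of that idea, not the KinoSearch test suite; find_counterexample, all_doc_lists and the two sample functions are hypothetical names, and `intersect` stands for whatever conjunction code is under test.

```python
from itertools import combinations, product


def all_doc_lists(max_doc):
    """Every sorted subset of range(max_doc), including the empty one."""
    docs = range(max_doc)
    return [list(c) for r in range(max_doc + 1) for c in combinations(docs, r)]


def find_counterexample(intersect, num_subscorers=3, max_doc=4):
    """Return the first input on which `intersect` misbehaves, or None."""
    for doc_lists in product(all_doc_lists(max_doc), repeat=num_subscorers):
        expected = sorted(set.intersection(*map(set, doc_lists)))
        try:
            got = intersect(doc_lists)
        except Exception as err:   # a crash counts as a failure too
            return doc_lists, expected, repr(err)
        if got != expected:
            return doc_lists, expected, got
    return None


def reference_intersect(doc_lists):
    """Trivially correct implementation, used as a sanity check."""
    return sorted(set.intersection(*map(set, doc_lists)))


def buggy_intersect(doc_lists):
    """Deliberate bug: reads docs[0] without checking for an empty list,
    a stand-in for touching a subscorer that is already exhausted."""
    first_docs = [docs[0] for docs in doc_lists]
    return [d for d in reference_intersect(doc_lists) if d >= min(first_docs)]
```

With three subscorers and four possible doc numbers this runs 4096 combinations, which is small enough to execute on every test run and returns the first failing input it finds.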

I'll try to get to this this weekend. In the meantime, if you want
to scratch the itch, throw in a couple debug "Warn" calls and see if
you can isolate the failing line within ANDScorer_skip_to.
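
The same kind of instrumentation can be modeled outside the C code. The following Python sketch wraps a subscorer and warns whenever a method is called after exhaustion; CheckedScorer and StubScorer are hypothetical names, and nothing here reflects KinoSearch's actual debugging facilities.

```python
import warnings


class StubScorer:
    """Minimal subscorer that tolerates misuse silently (hypothetical)."""

    def __init__(self, docs):
        self.it = iter(sorted(docs))
        self.cur = None

    def next(self):
        self.cur = next(self.it, None)
        return self.cur is not None

    def skip_to(self, target):
        # Always advances at least once, then stops at the first doc >= target.
        while self.next():
            if self.cur >= target:
                return True
        return False

    def doc(self):
        return self.cur


class CheckedScorer:
    """Wraps a scorer and reports any call made after exhaustion."""

    def __init__(self, inner, name):
        self.inner = inner
        self.name = name
        self.exhausted = False
        self.violations = []       # kept so a test can inspect them

    def _guard(self, method):
        if self.exhausted:
            message = f"{self.name}: {method}() called after exhaustion"
            self.violations.append(message)
            warnings.warn(message, stacklevel=3)

    def next(self):
        self._guard("next")
        ok = self.inner.next()
        self.exhausted = self.exhausted or not ok
        return ok

    def skip_to(self, target):
        self._guard("skip_to")
        ok = self.inner.skip_to(target)
        self.exhausted = self.exhausted or not ok
        return ok

    def doc(self):
        self._guard("doc")
        return self.inner.doc()
```

Wrapping each subscorer before handing it to the conjunction logic shows which term and which method trigger the first invalid call.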

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/