[KinoSearch] I'm getting fewer than expected results when supplying multiple fields
Marvin Humphrey
marvin at rectangular.com
Thu Nov 15 15:52:45 PST 2007
On Nov 10, 2007, at 3:23 AM, Adam . wrote:
> I'll strip out the irrelevant code/data and send my data and test case
> to you off-list once I've got a refined example.
I've isolated the problem and can provide a workaround. It turns out
not to be ANDScorer after all. ANDScorer has held up well under more
thorough testing.
Instead, the problem manifests during SegPList_Skip_To. Either
something is awry in the SegPList_skip_to function itself, or one of
PostingsWriter and LexWriter isn't writing skip information
correctly.
SegPList_Skip_To is only an optimization though. If we disable it,
then SegPostingList inherits the definitive method from Scorer...
    bool_t
    Scorer_skip_to(Scorer *self, u32_t target)
    {
        do {
            if ( !Scorer_Next(self) )
                return false;
        } while ( target > Scorer_Doc(self) );
        return true;
    }
... which produces the correct results.
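To make the fallback's behavior concrete, here is a minimal standalone sketch of the same loop run against a sorted array of doc numbers. It is not KinoSearch code: ArrayScorer and its three functions are hypothetical stand-ins invented for this illustration. Only the do/while loop mirrors Scorer_skip_to above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a posting list: a sorted array of doc numbers. */
typedef struct {
    const uint32_t *docs;
    size_t          size;
    size_t          tick;  /* entries consumed so far; 0 means "before first" */
} ArrayScorer;

/* Advance to the next doc.  Returns false once the list is exhausted. */
static bool
ArrayScorer_next(ArrayScorer *self)
{
    assert(self != NULL);
    if (self->tick >= self->size)
        return false;
    self->tick++;
    return true;
}

/* Doc number at the current position; only valid after a successful next. */
static uint32_t
ArrayScorer_doc(const ArrayScorer *self)
{
    assert(self != NULL && self->tick > 0);
    return self->docs[self->tick - 1];
}

/* Same shape as the generic fallback: step one posting at a time until
 * the current doc is >= target.  Always advances at least once. */
static bool
ArrayScorer_skip_to(ArrayScorer *self, uint32_t target)
{
    do {
        if ( !ArrayScorer_next(self) )
            return false;
    } while ( target > ArrayScorer_doc(self) );
    return true;
}
```

Because the loop is a do/while, each call moves forward by at least one posting, even if the current doc already satisfies the target. An optimized skip_to has to preserve exactly that contract to be a drop-in replacement.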
To implement the workaround, comment out the declaration of Skip_To
in SegPostingList.h...
/*
chy_bool_t
kino_SegPList_skip_to(kino_SegPostingList *self, chy_u32_t target);
KINO_METHOD("Kino_SegPList_Skip_To");
*/
... then run this sequence:
./Build distclean
perl Build.PL
./Build [test, install, code, whatever]
Note: the distclean step is *essential*.
The main consequence of disabling Skip_To is that intersections which
contain at least one rare term will proceed more slowly.
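A rough way to see the cost, as a sketch only: without skip data, intersecting a rare term with a common one means stepping through the common term's postings one at a time. The function below is a hypothetical illustration (intersect_linear and IntersectStats are made-up names, and plain sorted arrays stand in for posting lists); it counts how many comparison steps a purely linear intersection takes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of a linear intersection: match count plus work done. */
typedef struct {
    size_t matches;  /* docs present in both lists */
    size_t steps;    /* loop iterations, i.e. postings examined */
} IntersectStats;

/* Intersect two sorted doc-number arrays by stepping one posting at a
 * time, the way a scorer must when no skip information is used. */
static IntersectStats
intersect_linear(const uint32_t *a, size_t a_len,
                 const uint32_t *b, size_t b_len)
{
    IntersectStats stats = { 0, 0 };
    size_t i = 0;
    size_t j = 0;
    assert(a != NULL && b != NULL);
    while (i < a_len && j < b_len) {
        stats.steps++;
        if (a[i] < b[j]) {
            i++;                 /* a is behind: advance it by one */
        }
        else if (b[j] < a[i]) {
            j++;                 /* b is behind: advance it by one */
        }
        else {
            stats.matches++;     /* same doc in both lists */
            i++;
            j++;
        }
    }
    return stats;
}
```

With a rare term matching only doc 500 and a common term matching docs 1 through 1000, this takes 500 steps to find the single hit. Working skip data would let the common list jump close to doc 500 directly, which is the speedup that disabling Skip_To gives up.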
The investigation of this bug has produced some happy side effects.
* The newly introduced MockScorer class has made it possible to write
much more robust tests for ANDScorer; I'll soon be adding similarly
robust tests for ORScorer, ANDORScorer, and ANDNOTScorer.
* Even better, I've more-or-less solved the problem of how to override
C methods with Perl methods, making it possible to implement
MockScorer entirely in Perl. The same technique can be applied to
other classes, making it possible, for instance, to write a
HitCollector with a Perl collect() method.
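For readers unfamiliar with the general idea, one common way to make a C method overridable is to dispatch through a function pointer that other code can replace. The sketch below shows only that generic pattern and is not the mechanism KinoSearch uses: Collector, BestHit, and every function name here are hypothetical. In a real binding, the replacement function would presumably be a trampoline that calls into the host language; here it is plain C so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Collector Collector;
typedef void (*collect_t)(Collector *self, uint32_t doc_num, float score);

struct Collector {
    collect_t  collect;    /* method slot; callers always go through it */
    void      *host_data;  /* opaque state owned by the overriding code */
    uint32_t   count;      /* number of hits seen */
};

/* Default C implementation: just count hits. */
static void
Collector_collect_default(Collector *self, uint32_t doc_num, float score)
{
    (void)doc_num;
    (void)score;
    self->count++;
}

static void
Collector_init(Collector *self)
{
    assert(self != NULL);
    self->collect   = Collector_collect_default;
    self->host_data = NULL;
    self->count     = 0;
}

/* Dispatch wrapper: invokes whatever currently occupies the slot. */
static void
Collector_Collect(Collector *self, uint32_t doc_num, float score)
{
    self->collect(self, doc_num, score);
}

/* Example override: remember the highest-scoring doc. */
typedef struct {
    uint32_t best_doc;
    float    best_score;
} BestHit;

static void
best_hit_collect(Collector *self, uint32_t doc_num, float score)
{
    BestHit *best = (BestHit*)self->host_data;
    if (score > best->best_score) {
        best->best_doc   = doc_num;
        best->best_score = score;
    }
    self->count++;
}
```

Installing the override is a matter of pointing the slot at best_hit_collect and setting host_data. Code that calls Collector_Collect never needs to know which implementation it is running.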
The next step is to figure out what's causing SegPList_Skip_To to
misbehave. Happily, even if the problem is a write-time bug, skip
information is completely recoverable, so it won't be necessary to
regenerate indexes from scratch.
Cheers,
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/