[KinoSearch] I'm getting fewer than expected results when supplying multiple fields

Marvin Humphrey marvin at rectangular.com
Thu Nov 15 15:52:45 PST 2007

On Nov 10, 2007, at 3:23 AM, Adam . wrote:

> I'll strip out the irrelevant code/data and send my data and test case
> to you off-list once I've got a refined example.

I've isolated the problem and can provide a workaround.  It turns out  
not to be ANDScorer after all.  ANDScorer has held up well under more  
thorough testing.

Instead, the problem manifests during SegPList_Skip_To.  Either
something is awry in the SegPList_skip_to function itself, or
PostingsWriter or LexWriter isn't writing skip information correctly.

SegPList_Skip_To is only an optimization, though.  If we disable it,
then SegPostingList inherits the definitive method from Scorer...

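     bool_t
     Scorer_skip_to(Scorer *self, u32_t target)
     {
         /* Advance one doc at a time until we reach or pass target.
          * Note that Next() is always called at least once. */
         do {
             if ( !Scorer_Next(self) )
                 return false;
         } while ( target > Scorer_Doc(self) );

         return true;
     }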

... which produces the correct results.
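The reason disabling Skip_To is a safe workaround can be expressed as a differential check. Any skip-accelerated skip_to must land on exactly the same doc as the linear loop above, from any starting position and for any target. The following is a minimal sketch in C, assuming a simplified in-memory posting list. PList, skip_to_linear, skip_to_fast and count_mismatches are hypothetical names, not KinoSearch APIs, and the skip table layout (one entry at the last posting of each block) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory posting list: sorted doc numbers plus a skip
 * table with one entry at the last posting of each fixed-size block. */
typedef struct {
    const unsigned *docs;       /* strictly increasing doc numbers */
    size_t          count;
    long            pos;        /* -1 = before first posting, count = exhausted */
    const unsigned *skip_docs;  /* doc number each skip entry claims */
    const long     *skip_pos;   /* posting index each skip entry points at */
    size_t          num_skips;
} PList;

static bool plist_next(PList *pl) {
    if (pl->pos + 1 >= (long)pl->count) {
        pl->pos = (long)pl->count;  /* exhausted */
        return false;
    }
    pl->pos++;
    return true;
}

static unsigned plist_doc(const PList *pl) {
    assert(pl->pos >= 0 && pl->pos < (long)pl->count);
    return pl->docs[pl->pos];
}

/* The definitive version: the same loop as Scorer_skip_to. */
static bool skip_to_linear(PList *pl, unsigned target) {
    do {
        if (!plist_next(pl))
            return false;
    } while (target > plist_doc(pl));
    return true;
}

/* The optimized version: trust the skip table to jump ahead, then finish
 * with the linear loop.  Only correct if the skip table is correct. */
static bool skip_to_fast(PList *pl, unsigned target) {
    for (size_t i = 0; i < pl->num_skips; i++) {
        if (pl->skip_docs[i] >= target)
            break;
        if (pl->skip_pos[i] > pl->pos)
            pl->pos = pl->skip_pos[i];
    }
    return skip_to_linear(pl, target);
}

/* Differential check: count (start, target) pairs where the two disagree. */
static size_t count_mismatches(PList proto, unsigned max_target) {
    size_t bad = 0;
    for (long start = -1; start < (long)proto.count; start++) {
        for (unsigned t = 0; t <= max_target; t++) {
            PList a = proto, b = proto;
            a.pos = b.pos = start;
            bool ra = skip_to_linear(&a, t);
            bool rb = skip_to_fast(&b, t);
            if (ra != rb || (ra && plist_doc(&a) != plist_doc(&b)))
                bad++;
        }
    }
    return bad;
}
```

With a correct skip table the count should be zero.  A table whose entries claim the wrong doc numbers (the kind of write-time problem suspected above) shows up as a nonzero count.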

To implement the workaround, comment out the declaration of Skip_To  
in SegPostingList.h...

   /*
   chy_bool_t
   kino_SegPList_skip_to(kino_SegPostingList *self, chy_u32_t target);
   KINO_METHOD("Kino_SegPList_Skip_To");
   */

... then run this sequence:

    ./Build distclean
    perl Build.PL
    ./Build [test, install, code, whatever]

Note: the distclean step is *essential*.
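Why commenting out the declaration is enough can be pictured with a generic single-inheritance method table in C. A subclass's table starts as a copy of its parent's, and only the slots the subclass declares get overwritten. This is an illustrative assumption about how such tables typically work, not KinoSearch's actual generated code. VTable, Obj, make_subclass_vt, Obj_Skip_To and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Obj Obj;
typedef bool (*NextFunc)(Obj *self);
typedef bool (*SkipToFunc)(Obj *self, unsigned target);

/* Hypothetical method table: one function pointer per method. */
typedef struct {
    const char *class_name;
    NextFunc    next;
    SkipToFunc  skip_to;
} VTable;

/* Toy object whose "postings" are every doc from 1 to last. */
struct Obj {
    const VTable *vt;
    unsigned      doc;   /* 0 = not started */
    unsigned      last;
};

static bool toy_next(Obj *self) {
    if (self->doc >= self->last)
        return false;
    self->doc++;
    return true;
}

/* Base-class implementation: the same loop as Scorer_skip_to. */
static bool scorer_skip_to(Obj *self, unsigned target) {
    do {
        if (!self->vt->next(self))
            return false;
    } while (target > self->doc);
    return true;
}

/* A subclass override that jumps straight to the target
 * (stands in for an optimized SegPList_skip_to). */
static bool seg_skip_to(Obj *self, unsigned target) {
    unsigned want = target > self->doc ? target : self->doc + 1; /* always advance */
    if (want > self->last) {
        self->doc = self->last;
        return false;
    }
    self->doc = want;
    return true;
}

static const VTable SCORER_VT = { "Scorer", toy_next, scorer_skip_to };

/* Build a subclass table: copy every parent slot, then overwrite only the
 * methods the subclass declares.  NULL means "not declared". */
static VTable make_subclass_vt(const VTable *parent, const char *name,
                               SkipToFunc skip_to_override) {
    VTable vt = *parent;
    vt.class_name = name;
    if (skip_to_override != NULL)
        vt.skip_to = skip_to_override;
    return vt;
}

/* Callers always dispatch through the table. */
static bool Obj_Skip_To(Obj *self, unsigned target) {
    assert(self->vt != NULL && self->vt->skip_to != NULL);
    return self->vt->skip_to(self, target);
}
```

With the override left undeclared, the skip_to slot simply keeps pointing at the base-class loop, so callers get the definitive behavior without any change on their side.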

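The main consequence of disabling Skip_To is that intersections which
contain at least one rare term will proceed more slowly, because the
other terms' postings must then be stepped through one document at a
time rather than skipped over.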
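A rough cost model for the slowdown, assuming the rare term drives the intersection and the common term's list must be advanced to each rare doc: without skip data, every intervening posting is stepped over; with skip data, the list can hop ahead a block at a time. This is a toy model, not KinoSearch's ANDScorer. The function names and the work counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Move *pos forward one posting at a time until docs[*pos] >= target.
 * Returns the number of postings stepped over. */
static size_t advance_linear(const unsigned *docs, size_t n, size_t *pos,
                             unsigned target) {
    size_t work = 0;
    while (*pos < n && docs[*pos] < target) {
        (*pos)++;
        work++;
    }
    return work;
}

/* Same final position, but first hop ahead `interval` postings at a time
 * while the posting at the end of the hop is still below target (a stand-in
 * for consulting skip data).  Each hop counts as one unit of work. */
static size_t advance_skipping(const unsigned *docs, size_t n, size_t *pos,
                               unsigned target, size_t interval) {
    assert(interval > 0);
    size_t work = 0;
    while (*pos + interval < n && docs[*pos + interval] < target) {
        *pos += interval;
        work++;
    }
    return work + advance_linear(docs, n, pos, target);
}

/* Intersect a rare term's doc list with a common term's doc list by
 * advancing through the common list once per rare doc.  interval == 0
 * means "no skip data".  Matches go to out; *work gets the total cost. */
static size_t intersect(const unsigned *rare, size_t nr,
                        const unsigned *common, size_t nc, size_t interval,
                        unsigned *out, size_t *work) {
    size_t pos = 0, hits = 0;
    *work = 0;
    for (size_t i = 0; i < nr; i++) {
        *work += interval ? advance_skipping(common, nc, &pos, rare[i], interval)
                          : advance_linear(common, nc, &pos, rare[i]);
        if (pos == nc)
            break;          /* common list exhausted: no more matches */
        if (common[pos] == rare[i])
            out[hits++] = rare[i];
    }
    return hits;
}
```

In this model the linear cost grows with the distance covered in the common list, while the skipping cost grows roughly with that distance divided by the skip interval, plus up to one interval's worth of single steps per rare doc.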

The investigation of this bug has produced some happy side effects.

   * The newly introduced MockScorer class has made it possible to write
     much more robust tests for ANDScorer; I'll soon be adding similarly
     robust tests for ORScorer, ANDORScorer, and ANDNOTScorer.
   * Even better, I've more-or-less solved the problem of how to override
     C methods with Perl methods, making it possible to implement
     MockScorer entirely in Perl.  The same technique can be applied to
     other classes, making it possible, for instance, to write a
     HitCollector with a Perl collect() method.
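One common way to let a host language override a C method is to put a trampoline function in the method slot and have it forward each call to a stored host-side callback. The sketch below shows that pattern in plain C, with a C function standing in for a Perl collect() method. It is not necessarily the technique described above; the HitCollector struct here is a hypothetical stand-in rather than KinoSearch's class, and collect_via_host, HitColl_init_overridden, feed_hits and HostTally are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct HitCollector HitCollector;
typedef void (*CollectFunc)(HitCollector *self, unsigned doc, float score);
typedef void (*HostCallback)(void *host_obj, unsigned doc, float score);

/* Hypothetical collector: a method slot plus a handle to a host-language
 * object (in a Perl binding, something like the blessed object). */
struct HitCollector {
    CollectFunc  collect;
    HostCallback host_cb;
    void        *host_obj;
};

/* Trampoline placed in the method slot when a host-language subclass
 * overrides collect().  A real binding would call into the interpreter here. */
static void collect_via_host(HitCollector *self, unsigned doc, float score) {
    assert(self->host_cb != NULL);
    self->host_cb(self->host_obj, doc, score);
}

static void HitColl_init_overridden(HitCollector *self, HostCallback cb,
                                    void *obj) {
    self->collect  = collect_via_host;
    self->host_cb  = cb;
    self->host_obj = obj;
}

/* C-side code never knows whether collect() is native or overridden. */
static void feed_hits(HitCollector *hc, const unsigned *docs,
                      const float *scores, size_t n) {
    for (size_t i = 0; i < n; i++)
        hc->collect(hc, docs[i], scores[i]);
}

/* Stand-in for the Perl-side object and its collect() method. */
typedef struct {
    unsigned hits;
    float    best;
} HostTally;

static void host_tally_collect(void *obj, unsigned doc, float score) {
    HostTally *t = obj;
    (void)doc;
    t->hits++;
    if (score > t->best)
        t->best = score;
}
```

The design point is that the C caller only ever goes through the method slot, so swapping in the trampoline requires no changes to the code that drives the collector.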

The next step is to figure out what's causing SegPList_Skip_To to  
misbehave.  Happily, even if the problem is a write-time bug, skip  
information is completely recoverable, so it won't be necessary to  
regenerate indexes from scratch.
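In a simplified model, skip entries only summarize data already present in the postings, so each entry can be checked against, and regenerated from, the doc numbers it points at. A minimal sketch, assuming a hypothetical layout with one entry at the last posting of every block of `interval` postings; real formats typically also store file offsets, which are omitted here. skips_valid and rebuild_skips are illustrative names, not KinoSearch functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every skip entry agrees with the posting it points at.
 * (Hypothetical layout: each entry records a posting index and that
 * posting's doc number.) */
static bool skips_valid(const unsigned *docs, size_t n,
                        const unsigned *skip_docs, const long *skip_pos,
                        size_t num_skips) {
    for (size_t i = 0; i < num_skips; i++) {
        if (skip_pos[i] < 0 || (size_t)skip_pos[i] >= n)
            return false;
        if (docs[skip_pos[i]] != skip_docs[i])
            return false;
    }
    return true;
}

/* Regenerate the skip table from the postings alone: one entry at the last
 * posting of every block of `interval` postings.  Returns the number of
 * entries written (at most cap). */
static size_t rebuild_skips(const unsigned *docs, size_t n, size_t interval,
                            unsigned *skip_docs, long *skip_pos, size_t cap) {
    assert(interval > 0);
    size_t m = 0;
    for (size_t i = interval - 1; i < n && m < cap; i += interval) {
        skip_docs[m] = docs[i];
        skip_pos[m] = (long)i;
        m++;
    }
    return m;
}
```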

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch