[KinoSearch] Possible Phrase Query Bug

Marvin Humphrey marvin at rectangular.com
Mon Sep 10 18:04:47 PDT 2007




On Sep 10, 2007, at 5:50 PM, Nathan Kurz wrote:

>     /* create an anchor set */
>     first_posting = (ScorePosting*)PList_Get_Posting(plists[0]);
>     BB_Copy_Str(anchor_set, (char*)first_posting->prox,
>         first_posting->freq * sizeof(u32_t));
>     anchors_start = (u32_t*)anchor_set->ptr;
>     anchors       = anchors_start;
>     anchors_end   = (u32_t*)BBEND(anchor_set);
>     while(anchors < anchors_end) {
>         ASSERT(*anchors > phrase_offset, "anchor underflow");
>         *anchors++ -= phrase_offset;
>     }
>
> This ASSERT() will fail on certain phrases. An underflow occurs when
> the phrase offset for the initial term is greater than zero and the
> term's first occurrence sits at a smaller position: for example, the
> term 'a' in the document "a b c" is at position 0.  So far as I can
> tell, we still always get correct results when we fall through the
> loop, but I'm worried that this is good luck rather than reliable
> design.

Yeah, that's not good.  You could end up with an anchor set like this...

     ((2**32 - 3), 62,  190)

Much of the code assumes that positions are in ascending order, and a
wrapped-around first value like that breaks the assumption.

We want code that will "shift" invalid values off the front of the  
anchor set.
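
A minimal sketch of that "shift" idea, for illustration only.  The
helper name shift_anchors and its signature are hypothetical and not
part of KinoSearch; it works on a plain array rather than a ByteBuf, and
it assumes the positions arrive sorted in ascending order, so any value
that would underflow is at the front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not KinoSearch API): rebase every position by
 * phrase_offset, dropping positions that would underflow.  Assumes the
 * input is sorted ascending, so all invalid values are at the front.
 * Returns the number of anchors kept. */
static size_t
shift_anchors(uint32_t *anchors, size_t count, uint32_t phrase_offset)
{
    size_t skip = 0;
    size_t i;

    assert(anchors != NULL || count == 0);

    /* Skip leading positions too small to absorb the offset. */
    while (skip < count && anchors[skip] < phrase_offset) {
        skip++;
    }

    /* Slide the survivors to the front, subtracting as we go. */
    for (i = skip; i < count; i++) {
        anchors[i - skip] = anchors[i] - phrase_offset;
    }

    return count - skip;
}
```

With positions (0, 65, 193) and a phrase offset of 3, this keeps two
anchors, (62, 190), instead of producing the wrapped first value shown
above.  A position exactly equal to the offset is kept and rebased to 0.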

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



