[KinoSearch] Possible Phrase Query Bug
Marvin Humphrey
marvin at rectangular.com
Mon Sep 10 18:04:47 PDT 2007
On Sep 10, 2007, at 5:50 PM, Nathan Kurz wrote:
> /* create an anchor set */
> first_posting = (ScorePosting*)PList_Get_Posting(plists[0]);
> BB_Copy_Str(anchor_set, (char*)first_posting->prox,
> first_posting->freq * sizeof(u32_t));
> anchors_start = (u32_t*)anchor_set->ptr;
> anchors = anchors_start;
> anchors_end = (u32_t*)BBEND(anchor_set);
> while(anchors < anchors_end) {
> ASSERT(*anchors > phrase_offset, "anchor underflow");
> *anchors++ -= phrase_offset;
> }
>
> This ASSERT() will fail on certain phrases. An underflow occurs if
> the phrase offset for the initial term 'a' is greater than zero when
> searching the document "a b c", since the first occurrence of the term
> 'a' is at position 0. So far as I can tell, we still always get
> correct results when we fall through the loop, but I'm worried that
> this is good luck rather than reliable design.
Yeah, that's not good. You could end up with an anchor set like this...
((2**32 - 3), 62, 190)
Much code assumes that positions are in ascending order, and a wrapped
value at the front of the set breaks that assumption.
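
To make the wraparound concrete, here is a minimal sketch of the subtraction with the ASSERT taken out. It is illustrative only and not KinoSearch code: subtract_offset() and is_ordered() are hypothetical names, and uint32_t stands in for u32_t. It assumes positions (0, 65, 193) and a phrase offset of 3, which is one way to arrive at the anchor set above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: subtract phrase_offset from each position in
 * place, as the quoted loop does, but without the ASSERT.  Unsigned
 * arithmetic wraps modulo 2**32, so a position smaller than
 * phrase_offset becomes a very large value rather than a negative one. */
static void
subtract_offset(uint32_t *positions, size_t count, uint32_t phrase_offset)
{
    assert(positions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        positions[i] -= phrase_offset;
    }
}

/* Hypothetical helper: return 1 if the values are in non-decreasing
 * order, 0 otherwise. */
static int
is_ordered(const uint32_t *positions, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (positions[i - 1] > positions[i]) {
            return 0;
        }
    }
    return 1;
}
```

Calling subtract_offset() on (0, 65, 193) with an offset of 3 leaves (2**32 - 3) in the first slot, and is_ordered() then returns 0 for the result.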
We want code that will "shift" invalid values off the front of the
anchor set.
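
One way the "shift" could look, as a rough sketch under stated assumptions rather than the actual fix: the positions are assumed to arrive sorted in ascending order, so every value that would underflow sits at the front. build_anchor_set() is a hypothetical name, it works on a plain array instead of a ByteBuf, and uint32_t stands in for u32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not KinoSearch API.
 *
 * Turns an ascending array of positions into an anchor set in place:
 * positions smaller than phrase_offset are dropped from the front, the
 * survivors are moved down and have phrase_offset subtracted.
 * Returns the number of anchors that remain. */
static size_t
build_anchor_set(uint32_t *anchors, size_t count, uint32_t phrase_offset)
{
    size_t skip = 0;

    assert(anchors != NULL || count == 0);

    /* Count the leading positions that would underflow. */
    while (skip < count && anchors[skip] < phrase_offset) {
        skip++;
    }

    /* Shift the valid positions to the front, applying the offset. */
    size_t kept = count - skip;
    for (size_t i = 0; i < kept; i++) {
        anchors[i] = anchors[skip + i] - phrase_offset;
    }
    return kept;
}
```

With positions (0, 65, 193) and an offset of 3, this returns 2 and leaves (62, 190) at the front of the array. Note that the sketch treats a position equal to phrase_offset as valid, giving an anchor of 0, whereas the quoted ASSERT uses a strict ">" and would reject that case.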
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/