[KinoSearch] Possible Phrase Query Bug

Nathan Kurz nate at verse.com
Mon Sep 10 17:50:32 PDT 2007



On 9/10/07, Marvin Humphrey <marvin at rectangular.com> wrote:
>
> On Sep 10, 2007, at 5:01 PM, Nathan Kurz wrote:
>
> > 1) What happens when phrase_offsets[0] is greater than the first
> > occurrence of the anchor_set?
>
> I'm not sure what you mean by occurrence.  Do you mean the first
> position in the anchor set?
>
> > It seems like there is going to be
> > another underflow problem, although it doesn't seem to cause problems
> > when I test for it.
>
> Can you please send your test code so I can see what you're concerned
> about?

Sorry for my lack of clarity.

    /* create an anchor set */
    first_posting = (ScorePosting*)PList_Get_Posting(plists[0]);
    BB_Copy_Str(anchor_set, (char*)first_posting->prox,
        first_posting->freq * sizeof(u32_t));
    anchors_start = (u32_t*)anchor_set->ptr;
    anchors       = anchors_start;
    anchors_end   = (u32_t*)BBEND(anchor_set);
    while(anchors < anchors_end) {
        ASSERT(*anchors >= phrase_offset, "anchor underflow");
        *anchors++ -= phrase_offset;
    }

This ASSERT() will fail on certain phrases. An underflow occurs when the
phrase offset for the initial term is greater than the position of that
term's first occurrence. For example, when searching the document "a b c",
the term 'a' first occurs at position 0, so any phrase offset greater than
zero underflows. So far as I can tell, we still always get correct results
when we fall through the loop, but I'm worried that this is good luck
rather than reliable design.
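
To make the concern concrete, here is a minimal standalone sketch of the
wraparound and of one possible guard. It assumes u32_t is an unsigned 32-bit
type (uint32_t here). The names shift_anchors and unguarded_shift are
hypothetical helpers for illustration and are not part of KinoSearch. The
guard drops any anchor smaller than the offset, on the assumption that a
phrase cannot start before position 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the unguarded subtraction yields: unsigned arithmetic wraps
 * modulo 2^32 instead of going negative. */
uint32_t
unguarded_shift(uint32_t anchor, uint32_t phrase_offset)
{
    return anchor - phrase_offset;
}

/* Hypothetical helper (not KinoSearch API): subtract phrase_offset from
 * each anchor in place, compacting the array and dropping anchors that
 * would underflow.  Returns the number of anchors kept. */
size_t
shift_anchors(uint32_t *anchors, size_t count, uint32_t phrase_offset)
{
    size_t kept = 0;
    size_t i;

    assert(anchors != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (anchors[i] >= phrase_offset) {
            anchors[kept++] = anchors[i] - phrase_offset;
        }
        /* Otherwise the phrase would have to begin before position 0,
         * so this anchor is assumed unable to start a match. */
    }
    return kept;
}
```

With anchors {0, 1, 5} and a phrase offset of 1, shift_anchors() keeps two
entries, {0, 4}, while unguarded_shift(0, 1) wraps to UINT32_MAX. Whether
dropping such anchors is the right behavior for the real scorer is an open
question; the sketch only shows where the wrapped values come from.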

> I implemented your REFCOUNT_INC idea. :)
>
> http://www.rectangular.com/pipermail/kinosearch-commits/2007-September/000344.html
>
> That one was a nice incremental change.

I hadn't noticed that you did that. Thanks!

Nathan Kurz
nate at verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



