[KinoSearch] Possible Phrase Query Bug
Nathan Kurz
nate at verse.com
Mon Sep 10 17:50:32 PDT 2007
On 9/10/07, Marvin Humphrey <marvin at rectangular.com> wrote:
>
> On Sep 10, 2007, at 5:01 PM, Nathan Kurz wrote:
>
> > 1) What happens when phrase_offsets[0] is greater than the first
> > occurrence of the anchor_set?
>
> I'm not sure what you mean by occurrence. Do you mean the first
> position in the anchor set?
>
> > It seems like there is going to be
> > another underflow problem, although it doesn't seem to cause problems
> > when I test for it.
>
> Can you please send your test code so I can see what you're concerned
> about?
Sorry for my lack of clarity.
    /* create an anchor set */
    first_posting = (ScorePosting*)PList_Get_Posting(plists[0]);
    BB_Copy_Str(anchor_set, (char*)first_posting->prox,
        first_posting->freq * sizeof(u32_t));
    anchors_start = (u32_t*)anchor_set->ptr;
    anchors       = anchors_start;
    anchors_end   = (u32_t*)BBEND(anchor_set);
    while (anchors < anchors_end) {
        /* a position smaller than the offset would wrap around below */
        ASSERT(*anchors >= phrase_offset, "anchor underflow");
        *anchors++ -= phrase_offset;
    }
This ASSERT() will fail on certain phrases. An underflow occurs when
the phrase offset for the initial term 'a' is greater than zero and the
document is "a b c": the first occurrence of the term 'a' is at
position 0, so subtracting the offset wraps the unsigned value around.
So far as I can tell, we still always get correct results when we fall
through the loop, but I'm worried that this is good luck rather than
reliable design.
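
To make the concern concrete, here is a minimal standalone sketch of a
guarded version of that loop. It is not KinoSearch code: the function name
shift_anchors and the policy of dropping positions smaller than the offset
are assumptions made purely for illustration, and uint32_t stands in for
u32_t. Unsigned subtraction in C is well defined and wraps modulo 2^32, so
the unguarded loop turns position 0 minus offset 1 into 4294967295 rather
than crashing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of KinoSearch: shift each position down
 * by phrase_offset in place, dropping any position smaller than the
 * offset instead of letting the unsigned subtraction wrap around.
 * Returns the number of positions kept at the front of the array. */
size_t
shift_anchors(uint32_t *positions, size_t count, uint32_t phrase_offset)
{
    size_t kept = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        if (positions[i] >= phrase_offset) {
            positions[kept++] = positions[i] - phrase_offset;
        }
        /* else: the subtraction would wrap to a huge value, so this
         * position is skipped under the assumed policy */
    }
    return kept;
}
```

Called on the positions {0, 3, 7} with an offset of 1, this keeps two
anchors, 2 and 6, and discards the one that would have wrapped. Whether
discarding is the right policy depends on how the rest of the phrase
matching treats such anchors.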
> I implemented your REFCOUNT_INC idea. :)
>
> http://www.rectangular.com/pipermail/kinosearch-commits/2007-September/000344.html
>
> That one was a nice incremental change.
I hadn't noticed that you did that. Thanks!
Nathan Kurz
nate at verse.com
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch