[KinoSearch] Possible Phrase Query Bug
Marvin Humphrey
marvin at rectangular.com
Mon Sep 10 17:29:37 PDT 2007
On Sep 10, 2007, at 5:01 PM, Nathan Kurz wrote:
> 1) What happens when phrase_offsets[0] is greater than the first
> occurrence of the anchor_set?
I'm not sure what you mean by occurrence. Do you mean the first
position in the anchor set?
> It seems like there is going to be
> another underflow problem, although it doesn't seem to cause problems
> when I test for it.
Can you please send your test code so I can see what you're concerned
about?
> 2) I think we continue going through the outer loop even if we have
> run out of anchors.
You're right. We could add a break after setting anchor_set->len.
/* winnow down the size of the anchor set */
anchor_set->len = (char*)new_anchors - (char*)anchors_start;
+
+ /* Bail if we've exhausted all positions for the rarest term. */
+ if (anchor_set->len == 0)
+ break;
}
> Again, this doesn't seem to cause problems, but seems suboptimal.
I doubt it has any impact on performance, but I like it because
it's the kind of thing a human would do -- stop when it becomes clear
that no match will be found.
> it's the closest thing I've had to an incremental improvement in a
> while,
I implemented your REFCOUNT_INC idea. :)
http://www.rectangular.com/pipermail/kinosearch-commits/2007-September/000344.html
That one was a nice incremental change.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch