[KinoSearch] Possible Phrase Query Bug

Marvin Humphrey marvin at rectangular.com
Mon Sep 10 17:29:37 PDT 2007




On Sep 10, 2007, at 5:01 PM, Nathan Kurz wrote:

> 1) What happens when phrase_offsets[0] is greater than the first
> occurrence of the anchor_set?

I'm not sure what you mean by occurrence.  Do you mean the first  
position in the anchor set?

> It seems like there is going to be
> another underflow problem, although it doesn't seem to cause problems
> when I test for it.

Can you please send your test code so I can see what you're concerned  
about?
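
If the worry is plain unsigned wraparound, the following is a rough sketch of what I would expect to see. It assumes positions and offsets are stored as uint32_t, and checked_sub_u32 is a hypothetical helper written for this illustration, not a function in KinoSearch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of KinoSearch.
 * Assumption: positions and phrase offsets are unsigned 32-bit values.
 * Computes position - offset, refusing to wrap around when the offset
 * is larger than the position. */
bool
checked_sub_u32(uint32_t position, uint32_t offset, uint32_t *result)
{
    assert(result != NULL);
    if (offset > position) {
        /* A bare `position - offset` would wrap modulo 2^32 here. */
        return false;
    }
    *result = position - offset;
    return true;
}
```

A caller would skip any position for which this returns false instead of comparing against the wrapped value.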

> 2) I think we continue going through the outer loop even if we have
> run out of anchors.

You're right.  We could add a break after setting anchor_set->len.

         /* winnow down the size of the anchor set */
         anchor_set->len = (char*)new_anchors - (char*)anchors_start;
+
+        /* Bail if we've exhausted all positions for the rarest term. */
+        if (anchor_set->len == 0)
+            break;
     }

> Again, this doesn't seem to cause problems, but seems suboptimal.

I doubt it has any impact on performance, but I like it because
it's the kind of thing a human would do -- stop when it becomes clear
that no match will be found.
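
Here is a minimal standalone sketch of that early exit, separate from the real scorer code. The names TermPositions, winnow_anchors and match_phrase are made up for illustration, and it assumes each term's positions are a sorted array of uint32_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the positions of one non-anchor term, plus its
 * offset from the anchor term within the phrase. */
typedef struct {
    const uint32_t *positions;  /* sorted ascending (assumption) */
    size_t          count;
    uint32_t        offset;
} TermPositions;

/* Keep only the anchors for which (anchor + offset) occurs in the
 * term's positions.  Compacts `anchors` in place and returns the new
 * count.  Assumes `anchors` is sorted ascending as well. */
size_t
winnow_anchors(uint32_t *anchors, size_t num_anchors,
               const TermPositions *term)
{
    size_t kept = 0;
    size_t c    = 0;
    assert(anchors != NULL && term != NULL);
    for (size_t a = 0; a < num_anchors; a++) {
        uint32_t target = anchors[a] + term->offset;
        while (c < term->count && term->positions[c] < target) {
            c++;
        }
        if (c == term->count) {
            break;  /* no candidates left, so no later anchor can match */
        }
        if (term->positions[c] == target) {
            anchors[kept++] = anchors[a];
        }
    }
    return kept;
}

/* Winnow the anchor set against each term in turn, stopping as soon as
 * the set is empty.  `terms_checked` reports how many terms were
 * actually examined, which makes the early exit observable. */
size_t
match_phrase(uint32_t *anchors, size_t num_anchors,
             const TermPositions *terms, size_t num_terms,
             size_t *terms_checked)
{
    *terms_checked = 0;
    for (size_t i = 0; i < num_terms; i++) {
        num_anchors = winnow_anchors(anchors, num_anchors, &terms[i]);
        *terms_checked = i + 1;
        if (num_anchors == 0) {
            break;  /* bail: the anchor set is exhausted */
        }
    }
    return num_anchors;
}
```

Without the break, the remaining terms would each be winnowed against an empty anchor set, which does no harm but also does no useful work.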

> it's the closest thing I've had to an incremental improvement in a
> while,

I implemented your REFCOUNT_INC idea. :)

http://www.rectangular.com/pipermail/kinosearch-commits/2007-September/000344.html

That one was a nice incremental change.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



