[KinoSearch] Scorer->next

webmasters at ctosonline.org webmasters at ctosonline.org
Fri May 15 21:38:44 PDT 2009


On May 15, 2009, at 12:12 PM, Marvin Humphrey wrote:

> On Thu, May 14, 2009 at 08:42:32AM -0700, webmasters at ctosonline.org  
> wrote:
>> I’ve just discovered the hard way that Scorer->next has to return doc
>> ids in order. Here is a patch to mention it in the cookbook:
>
> Thanks, applied as r4594.
>
>> Is it this way to allow KS to tiptoe over deleted documents  
>> efficiently?
>
> It's all about scalability.
>
> When scoring, all doc id sets are represented as iterators.  To get  
> the
> intersection of iterators, they need to proceed in an orderly,  
> predictable
> fashion.
>
> Say you want the top hits for "foo AND bar".  If you weren't worried  
> about
> scalability, you could cache separate result sets as arrays -- one  
> for 'foo'
> and one for 'bar' -- then intersect them, sort, and grab the top few  
> results.
> However, that doesn't scale up to millions of hits because of the  
> size of
> those result sets.
>
> So instead, the iterators for 'foo' and 'bar' ascend through doc  
> nums in sync,
> hits are collected one at a time into a priority queue which only  
> holds as
> many hits as absolutely necessary, and less relevant hits fall out  
> the bottom
> of the queue, keeping memory costs under control.
>
> A better implementation of PrefixQuery would actually use a priority  
> queue to
> hold the PostingLists rather than grab all the results at  
> construction time.
> But then the cookbook entry would have to be a lot longer and more  
> elaborate.

Since you’ve taken the time to write this, how about including it in a  
comment in one of the source files, or even in documentation?

Father Chrysostomos


More information about the kinosearch mailing list