[KinoSearch] Re: KinoSearch feature suggestions

Marvin Humphrey marvin at rectangular.com
Fri Jan 25 02:26:34 PST 2008




On Jan 24, 2008, at 8:20 PM, Father Chrysostomos wrote:

> I’m trying to have a go at this.
>
> How many times is the disk accessed when one does a boolean search  
> (e.g., 'this OR that OR the-other')? And what are those times?

The stack is pretty deep.  The Perl side looks something like...

    KinoSearch::Search::Searchable::search
    KinoSearch::Searcher::top_docs
    KinoSearch::Searcher::collect

Then the C stack looks something like...

    Scorer_collect          (Scorer.c)
    BoolScorer_skip_to      (BooleanScorer.c)
    ORScorer_skip_to        (ORScorer.c)
    advance_after_current   (ORScorer.c)
    ScorerDocQ_top_next     (ScorerDocQueue.c)
    TermScorer_next         (TermScorer.c)
    SegPList_next           (SegPostingList.c)
    ScorePost_read_record   (ScorePosting.c)
    InStream_read_c32       (InStream.c)
    InStream_read_u8        (InStream.c)
    refill                  (InStream.c)
    read_internal           (InStream.c)
    FSFileDes_fdread        (FSFileDes.c)
    fread                   (stdio)

The InStream class is analogous to a filehandle.  But we won't really  
have to concern ourselves with InStream for this purpose.   The  
deepest we'll need to get is PostingList.
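
To give a rough feel for why the bottom of that stack does not hit the
disk on every call, here is a minimal sketch of a buffered reader.  It
is not KinoSearch's InStream: the SketchStream type, the sketch_*
function names, and the 1024-byte buffer size are all made up for
illustration, and a memcpy from an in-memory array stands in for the
fread() call.  The only point is that a byte-level read touches the
underlying source only when the buffer runs dry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE 1024   /* assumed buffer size, for illustration only */

/* Hypothetical buffered reader; "source" stands in for the file on disk. */
typedef struct {
    const uint8_t *source;      /* pretend on-disk bytes */
    size_t         source_len;
    size_t         source_pos;  /* next unread byte in source */
    uint8_t        buf[SKETCH_BUF_SIZE];
    size_t         buf_len;     /* number of valid bytes in buf */
    size_t         buf_pos;     /* next unread byte in buf */
    size_t         refills;     /* how many times we "hit the disk" */
} SketchStream;

void
sketch_stream_init(SketchStream *s, const uint8_t *source, size_t len)
{
    assert(s != NULL && source != NULL);
    memset(s, 0, sizeof(*s));
    s->source     = source;
    s->source_len = len;
}

/* Refill the buffer.  A real stream would call fread() here.
 * Returns 1 if any bytes were loaded, 0 at end of data. */
static int
sketch_refill(SketchStream *s)
{
    size_t remaining = s->source_len - s->source_pos;
    size_t amount = remaining < SKETCH_BUF_SIZE ? remaining : SKETCH_BUF_SIZE;
    if (amount == 0) { return 0; }
    memcpy(s->buf, s->source + s->source_pos, amount);
    s->source_pos += amount;
    s->buf_len = amount;
    s->buf_pos = 0;
    s->refills++;
    return 1;
}

/* Read one byte, refilling only when the buffer is exhausted.
 * Returns the byte value, or -1 at end of data. */
int
sketch_read_u8(SketchStream *s)
{
    if (s->buf_pos == s->buf_len && !sketch_refill(s)) { return -1; }
    return s->buf[s->buf_pos++];
}
```

Under these assumptions, reading 3000 bytes one at a time costs three
refills rather than 3000 reads, which is why the per-posting calls
higher up the stack are mostly served from memory.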

> I could find the answer myself by reading more source code, but  
> it’s awfully time consuming....

In order to create legitimate subclasses to implement WildCard  
queries, a bunch of stuff that isn't yet public will have to become  
public.  I'm starting that off by exposing the Lexicon class, along  
with the factory method $index_reader->blank_lexicon($field_name).

The first thing we'll need to do is accumulate an array of terms that  
match the wildcard using the Lexicon object.
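
As a rough sketch of that accumulation step, assuming the terms of a
field are available in sorted order and the wildcard is a simple
trailing one ("the*"): seek to the first term at or after the prefix,
then walk forward collecting terms until one no longer starts with the
prefix.  The SketchLexicon type and sketch_* functions below are
hypothetical stand-ins written in C, not the Lexicon API itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Lexicon: one field's terms in sorted
 * (strcmp) order.  This is a model, not KinoSearch's API. */
typedef struct {
    const char **terms;
    size_t       count;
} SketchLexicon;

/* Binary search for the index of the first term >= target, which
 * models seeking the lexicon to the start of the candidate range. */
static size_t
sketch_lex_seek(const SketchLexicon *lex, const char *target)
{
    size_t lo = 0, hi = lex->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(lex->terms[mid], target) < 0) { lo = mid + 1; }
        else                                     { hi = mid; }
    }
    return lo;
}

/* Collect up to max_out terms matching the trailing wildcard
 * "prefix*".  Returns the number of matches stored in out. */
size_t
sketch_collect_prefix(const SketchLexicon *lex, const char *prefix,
                      const char **out, size_t max_out)
{
    assert(lex != NULL && prefix != NULL && out != NULL);
    size_t prefix_len = strlen(prefix);
    size_t found = 0;
    for (size_t i = sketch_lex_seek(lex, prefix); i < lex->count; i++) {
        /* Sorted order keeps matching terms contiguous, so the first
         * non-match ends the scan. */
        if (strncmp(lex->terms[i], prefix, prefix_len) != 0) { break; }
        if (found == max_out) { break; }
        out[found++] = lex->terms[i];
    }
    return found;
}
```

A leading wildcard ("*ing") would not benefit from the seek and would
need a scan of the whole term list instead.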

Soon, we'll need the PostingList class and
$index_reader->blank_posting_list($field_name).

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/






