[KinoSearch] Re: KinoSearch feature suggestions
Marvin Humphrey
marvin at rectangular.com
Fri Jan 25 02:26:34 PST 2008
On Jan 24, 2008, at 8:20 PM, Father Chrysostomos wrote:
> I’m trying to have a go at this.
>
> How many times is the disk accessed when one does a boolean search
> (e.g., 'this OR that OR the-other')? And what are those times?
The stack is pretty deep. The Perl side looks something like...
    KinoSearch::Search::Searchable::search
    KinoSearch::Searcher::top_docs
    KinoSearch::Searcher::collect
Then the C stack looks something like...
    Scorer_collect (Scorer.c)
    BoolScorer_skip_to (BooleanScorer.c)
    ORScorer_skip_to (ORScorer.c)
    advance_after_current (ORScorer.c)
    ScorerDocQ_top_next (ScorerDocQueue.c)
    TermScorer_next (TermScorer.c)
    SegPList_next (SegPostingList.c)
    ScorePost_read_record (ScorePosting.c)
    InStream_read_c32 (InStream.c)
    InStream_read_u8 (InStream.c)
    refill (InStream.c)
    read_internal (InStream.c)
    FSFileDes_fdread (FSFileDes.c)
    fread (stdio)
The InStream class is analogous to a filehandle. But we won't really
have to concern ourselves with InStream for this purpose. The
deepest we'll need to get is PostingList.
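To give a rough feel for why the bottom of that stack matters for disk access, here is a minimal sketch of a buffered stream in C. It is not KinoSearch's InStream. BufStream, BUF_SIZE and the bufstream_* functions are hypothetical names, the buffer size is an arbitrary assumption, and the variable-length integer layout (7 bits per byte, low-order group first, high bit meaning "more bytes follow") is assumed for illustration rather than taken from the KinoSearch file format. The point is only that fread() runs once per buffer refill, not once per integer decoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE 1024  /* assumed size, not KinoSearch's actual value */

/* Hypothetical stand-in for an InStream: a FILE* plus a read buffer. */
typedef struct {
    FILE   *fh;
    uint8_t buf[BUF_SIZE];
    size_t  pos;      /* next unread byte in buf */
    size_t  len;      /* number of valid bytes in buf */
    size_t  refills;  /* how many times fread() has been called */
} BufStream;

static void bufstream_init(BufStream *s, FILE *fh) {
    s->fh = fh;
    s->pos = 0;
    s->len = 0;
    s->refills = 0;
}

/* Pull the next chunk from the file. Returns 0 when nothing is left. */
static int bufstream_refill(BufStream *s) {
    s->len = fread(s->buf, 1, BUF_SIZE, s->fh);
    s->pos = 0;
    s->refills++;
    return s->len > 0;
}

/* Read one byte from the buffer, refilling only when it is exhausted.
 * Returns -1 at end of file. */
static int bufstream_read_u8(BufStream *s) {
    if (s->pos >= s->len && !bufstream_refill(s)) return -1;
    return s->buf[s->pos++];
}

/* Decode one variable-length integer (assumed layout, see above).
 * Returns 0 on success, -1 on end of file or an over-long encoding. */
static int bufstream_read_c32(BufStream *s, uint32_t *out) {
    uint32_t value = 0;
    int shift = 0;
    for (;;) {
        int byte = bufstream_read_u8(s);
        if (byte < 0) return -1;
        value |= (uint32_t)(byte & 0x7F) << shift;
        if (!(byte & 0x80)) break;   /* high bit clear: last byte */
        shift += 7;
        if (shift > 28) return -1;   /* more than 5 bytes: malformed */
    }
    *out = value;
    return 0;
}
```

With a layout like this, decoding a few thousand small integers touches the file only a handful of times, and the refills counter makes that visible.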
> I could find the answer myself by reading more source code, but
> it’s awfully time consuming....
In order to create legitimate subclasses to implement WildCard
queries, a bunch of stuff that isn't yet public will have to become
public. I'm starting that off by exposing the Lexicon class, along
with the factory method $index_reader->blank_lexicon($field_name).
The first thing we'll need to do is accumulate an array of terms that
match the wildcard using the Lexicon object.
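As a rough illustration of that first step, here is a sketch in C that treats the lexicon as a plain sorted array of strings and handles only the simplest wildcard, a trailing '*' (i.e. a prefix match). The real Lexicon class is a Perl-visible object and its interface is not shown here; lex_seek and collect_prefix_matches are hypothetical names, and the seek-then-iterate approach is an assumption about how one might use a sorted term list, not a description of the actual API.

```c
#include <stddef.h>
#include <string.h>

/* Binary search: index of the first term >= prefix in a sorted array. */
static size_t lex_seek(const char *const *terms, size_t n,
                       const char *prefix) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(terms[mid], prefix) < 0) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Collect up to max_out terms that start with prefix.
 * Because the terms are sorted, all matches are contiguous, so we can
 * stop at the first term that no longer shares the prefix.
 * Returns the number of terms written to out. */
static size_t collect_prefix_matches(const char *const *terms, size_t n,
                                     const char *prefix,
                                     const char **out, size_t max_out) {
    size_t plen = strlen(prefix);
    size_t count = 0;
    for (size_t i = lex_seek(terms, n, prefix);
         i < n && count < max_out; i++) {
        if (strncmp(terms[i], prefix, plen) != 0) break;
        out[count++] = terms[i];
    }
    return count;
}
```

A wildcard in the middle or at the start of the pattern would not get away with a single seek; it would need a scan over a wider range of terms.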
Soon, we'll need the PostingList class and
$index_reader->blank_posting_list($field_name).
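Once there is one posting list per matching term, the wildcard behaves like an OR over those terms. The sketch below shows that union step in C under simplifying assumptions: each posting list is just an ascending array of doc numbers held in memory, and the smallest current doc is found with a linear scan rather than the priority queue that ScorerDocQueue provides in the stack above. PList and plist_union are hypothetical names, and no scoring is done.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory posting list: ascending doc numbers. */
typedef struct {
    const uint32_t *docs;
    size_t len;
    size_t pos;   /* index of the current doc */
} PList;

/* Write the union of all lists to out, ascending and without duplicates.
 * Returns the number of doc numbers written (at most max_out). */
static size_t plist_union(PList *lists, size_t n_lists,
                          uint32_t *out, size_t max_out) {
    size_t count = 0;
    while (count < max_out) {
        int found = 0;
        uint32_t min_doc = 0;

        /* Find the smallest current doc across all unexhausted lists. */
        for (size_t i = 0; i < n_lists; i++) {
            if (lists[i].pos >= lists[i].len) continue;
            uint32_t doc = lists[i].docs[lists[i].pos];
            if (!found || doc < min_doc) {
                min_doc = doc;
                found = 1;
            }
        }
        if (!found) break;   /* every list is exhausted */

        out[count++] = min_doc;

        /* Advance every list currently positioned on that doc. */
        for (size_t i = 0; i < n_lists; i++) {
            if (lists[i].pos < lists[i].len
                && lists[i].docs[lists[i].pos] == min_doc) {
                lists[i].pos++;
            }
        }
    }
    return count;
}
```

The linear scan is fine for a handful of terms; a wildcard that expands to hundreds of terms is where a priority queue starts to pay off.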
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch