[KinoSearch] the lifecycle of a Posting
Marvin Humphrey
marvin at rectangular.com
Thu Sep 27 10:53:49 PDT 2007
I wrote:
> There's one thing that's really wacky about PostingList and the Postings that
> TermScorer sees.
>
> malloc() and free() are expensive ops. And a hell of a lot of Postings go by
> during scoring.
>
> So... to save time, PList_Bulk_Read doesn't actually create individual Posting
> objects. It reads new data into the *same* master Posting over and over and
> stacks copies of the master end to end within a ByteBuf. Instead of creating
> and destroying many many Postings, we create and destroy a single ByteBuf.
>
> These copies are what the Scorers actually see.
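As a rough illustration of the scheme quoted above (not the actual KinoSearch source), the sketch below decodes into one reusable master Posting and stacks byte copies of it end to end in a single buffer. The struct layouts, the delta-encoded input arrays, and the name bulk_read are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified Posting; the real struct carries more data. */
typedef struct {
    uint32_t doc_num;
    uint32_t freq;
} Posting;

/* Hypothetical stand-in for a ByteBuf: one contiguous allocation. */
typedef struct {
    char   *ptr;
    size_t  len;  /* bytes in use */
    size_t  cap;  /* bytes allocated */
} ByteBuf;

/* Decode num_wanted postings into `master`, one after another, and stack
 * a copy of each end to end in `buf`.  Returns the number of postings
 * stacked, or 0 if the buffer could not be grown. */
static size_t
bulk_read(const uint32_t *doc_deltas, const uint32_t *freqs,
          size_t num_wanted, Posting *master, ByteBuf *buf)
{
    assert(master != NULL && buf != NULL);

    /* Grow the single buffer once, rather than allocating per Posting. */
    size_t needed = num_wanted * sizeof(Posting);
    if (needed > buf->cap) {
        char *grown = realloc(buf->ptr, needed);
        if (grown == NULL) { return 0; }
        buf->ptr = grown;
        buf->cap = needed;
    }

    buf->len = 0;
    for (size_t i = 0; i < num_wanted; i++) {
        /* Overwrite the same master Posting with the next record. */
        master->doc_num += doc_deltas[i];
        master->freq     = freqs[i];

        /* Append a byte copy of the master to the buffer. */
        memcpy(buf->ptr + buf->len, master, sizeof(Posting));
        buf->len += sizeof(Posting);
    }
    return num_wanted;
}
```

A consumer would then treat buf->ptr as an array of Posting and walk it, and the only allocation to release afterwards is the buffer itself.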
Explaining this got me thinking. The "bulk read" functionality is a Lucene
artifact. Of necessity, it's implemented differently in KS. But I don't
think we really need it at all.
Hard drive buffering is already handled by InStream, and also by the
underlying FILE* object: I've left stdio buffering on, since I've never been
able to figure out why turning it off with setvbuf slows things down. With
two layers of buffering underneath, there's really no reason to buffer a
bunch of Posting objects as well.
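A minimal sketch of what the no-bulk-read alternative might look like, assuming a plain buffered FILE* and a made-up on-disk format of raw native-endian uint32_t pairs (doc delta, freq). The Posting layout and the name next_posting are hypothetical, and the real KinoSearch file format and InStream API differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified Posting. */
typedef struct {
    uint32_t doc_num;
    uint32_t freq;
} Posting;

/* Read the next record straight into the caller's single Posting.
 * The stream's own buffering absorbs the small reads, so no array of
 * Postings is built up in advance.
 * Returns 1 on success, 0 at end of file or on a short read. */
static int
next_posting(FILE *fh, Posting *posting)
{
    uint32_t pair[2];

    assert(fh != NULL && posting != NULL);
    if (fread(pair, sizeof(uint32_t), 2, fh) != 2) { return 0; }

    posting->doc_num += pair[0];  /* doc numbers are stored as deltas */
    posting->freq     = pair[1];
    return 1;
}
```

A scorer would call next_posting in a loop on the same Posting until it returns 0. Whether this keeps pace with the bulk read is a question for a benchmark; the sketch only shows that the Posting-level buffer is not structurally required.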
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/