[KinoSearch] the lifecycle of a Posting

Marvin Humphrey marvin at rectangular.com
Thu Sep 27 10:53:49 PDT 2007



I wrote:
> There's one thing that's really wacky about PostingList and the Postings that
> TermScorer sees.
> 
> malloc() and free() are expensive ops.  And a hell of a lot of Postings go by
> during scoring.
> 
> So... to save time, PList_Bulk_Read doesn't actually create individual Posting
> objects.  It reads new data into the *same* master Posting over and over and
> stacks copies of the master end to end within a ByteBuf.  Instead of creating
> and destroying many many Postings, we create and destroy a single ByteBuf.  
> 
> These copies are what the Scorers actually see.
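
For anyone trying to picture the scheme described in the quoted text, here is a
rough sketch in C.  It is not the actual KinoSearch code: SketchPosting,
SketchByteBuf and every function name below are hypothetical stand-ins, and the
posting data comes from plain arrays rather than from an index file.  The only
point is that one master struct gets refilled over and over while copies of it
are stacked end to end in a single growable buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified posting; the real KinoSearch struct differs. */
typedef struct {
    uint32_t doc_num;
    uint32_t freq;
} SketchPosting;

/* Hypothetical stand-in for a ByteBuf: one growable block of bytes. */
typedef struct {
    char   *ptr;
    size_t  len;   /* bytes in use */
    size_t  cap;   /* bytes allocated */
} SketchByteBuf;

/* Append one copy of the master posting to the end of the buffer.
 * Returns 0 on success, -1 if the buffer could not be grown. */
int
sketch_buf_cat_posting(SketchByteBuf *buf, const SketchPosting *master)
{
    if (buf->len + sizeof(*master) > buf->cap) {
        size_t  new_cap = buf->cap ? buf->cap * 2 : 64 * sizeof(*master);
        char   *new_ptr = realloc(buf->ptr, new_cap);
        if (new_ptr == NULL) { return -1; }
        buf->ptr = new_ptr;
        buf->cap = new_cap;
    }
    memcpy(buf->ptr + buf->len, master, sizeof(*master));
    buf->len += sizeof(*master);
    return 0;
}

/* Refill the same master posting for each entry and stack a copy of it
 * in the buffer.  Returns the number of postings stacked. */
size_t
sketch_bulk_read(SketchByteBuf *buf, const uint32_t *doc_nums,
                 const uint32_t *freqs, size_t count)
{
    SketchPosting master;
    size_t i;

    buf->len = 0;   /* reuse the allocation left over from the last read */
    for (i = 0; i < count; i++) {
        master.doc_num = doc_nums[i];
        master.freq    = freqs[i];
        if (sketch_buf_cat_posting(buf, &master) != 0) { break; }
    }
    return i;
}

/* Copy the posting at position `tick` out of the buffer.  Using memcpy
 * avoids assuming anything about the alignment of the buffer. */
SketchPosting
sketch_buf_get(const SketchByteBuf *buf, size_t tick)
{
    SketchPosting p;
    assert((tick + 1) * sizeof(p) <= buf->len);
    memcpy(&p, buf->ptr + tick * sizeof(p), sizeof(p));
    return p;
}
```

A caller would start from a zeroed SketchByteBuf, call sketch_bulk_read() once
per batch, and free() the buffer's ptr once at the very end.  That gives one
allocation that grows occasionally, instead of a malloc/free pair for every
posting.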

Explaining this got me thinking.  The "bulk read" functionality is a Lucene
artifact.  Of necessity, it's implemented differently in KS.  But I don't
think we really need it at all.  

Hard drive buffering is handled by InStream, and even by the FILE* object too,
since turning off its buffering with setvbuf slows things down, for reasons
I've never been able to figure out.  There's really no reason to buffer a bunch
of Posting objects on top of that.
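
For reference, turning off stdio buffering looks roughly like the sketch below.
This is not taken from InStream.  sketch_count_bytes is a hypothetical helper
that reads a file in small chunks, with or without the FILE* buffer.  The
setvbuf call with _IONBF is standard C, and it has to come after the fopen and
before any other operation on the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a whole file through stdio in small chunks
 * and return the number of bytes read, or -1 on error.  If `unbuffered`
 * is nonzero, stdio buffering is switched off first. */
long
sketch_count_bytes(const char *path, int unbuffered)
{
    FILE   *fh;
    char    chunk[16];
    long    total = 0;
    size_t  got;

    assert(path != NULL);
    fh = fopen(path, "rb");
    if (fh == NULL) { return -1; }

    /* With _IONBF the buffer and size arguments are ignored, so small
     * reads are no longer served from a stdio buffer. */
    if (unbuffered && setvbuf(fh, NULL, _IONBF, 0) != 0) {
        fclose(fh);
        return -1;
    }

    while ((got = fread(chunk, 1, sizeof(chunk), fh)) > 0) {
        total += (long)got;
    }
    if (ferror(fh)) { total = -1; }
    fclose(fh);
    return total;
}
```

Both modes return the same byte count.  Any difference shows up only in
timing, which this sketch does not measure.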

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

