[KinoSearch] KinoSearch::Docs::Cookbook::ReusingSearchers

Marvin Humphrey marvin at rectangular.com
Fri Sep 14 15:32:53 PDT 2007




On Sep 14, 2007, at 1:39 AM, Nathan Kurz wrote:

> I see this being of benefit only to really gigantic
> loads/indexes with the hardware customized to the role of each node.

In fact, it was during a conversation with someone about just such a  
situation that the divide-by-task idea was initially sketched out.

> What are the cases you are thinking it would benefit?

The sort cache problem is described here:

   http://www.rectangular.com/pipermail/kinosearch/2007-June/000993.html

(You may recall one other scaling challenge: the BitVectors used by  
the Filter subclasses are too big to pass between nodes in a cluster.)

> I think that coming up with a good way of returning the field value to
> the requester is going to be a better final solution.

I don't think we can get optimum performance until the whole term  
dictionary resides in RAM.  Certainly to solve the sort cache  
problem, we need at least the whole sort field in RAM.

Another avenue of attack might be to load the sort field's .lex files  
into RAM, but not decompress them.  Then we'd use an InStream with an  
inner RAMFileDes (instead of an FSFileDes).  No more disk seeks.
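
To make the idea concrete, here is a rough sketch in Python rather than
in KinoSearch's own C/Perl. It is only an analogy: load_into_ram and
demo are hypothetical names, io.BytesIO stands in for an InStream
backed by a RAMFileDes, and the payload bytes are made up. The point is
just that the raw, still-compressed bytes are read from disk once, and
every later seek and read is served from memory.

```python
import io
import os
import tempfile


def load_into_ram(path):
    """Read a file's raw bytes once and return a seekable in-memory stream.

    Nothing is decoded or decompressed here; the bytes are kept exactly
    as they were on disk. (Hypothetical helper, not a KinoSearch API.)
    """
    with open(path, "rb") as fh:
        return io.BytesIO(fh.read())


def demo():
    # Made-up bytes standing in for a compressed .lex file.
    payload = b"\x00\x05apple\x03\x02ly"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        ram_stream = load_into_ram(path)
    finally:
        # Remove the on-disk copy to show later reads never touch the disk.
        os.remove(path)
    ram_stream.seek(7)          # random access is a pointer move, not a disk seek
    return ram_stream.read(4)   # the four bytes starting at offset 7
```

The caller still has to decode entries on each lookup, so this trades
CPU for RAM: the footprint is the compressed size, not the expanded one.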

This stratagem is a little dicey, because it's hard to predict how  
much space a field's terms will occupy.  It depends not only on how  
many values the sort field has and how long they are, but also on how  
well they compress.
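
A small Python sketch of why the footprint is hard to predict. It
assumes a simple front-coding scheme (each term stored as a one-byte
shared-prefix length, a one-byte suffix length, and the suffix bytes);
that is an illustration only, the real .lex encoding may differ, and
front_coded_size is a hypothetical name. One-byte lengths also assume
terms shorter than 256 bytes.

```python
def front_coded_size(sorted_terms):
    """Estimate bytes needed to store sorted terms with front coding.

    Assumed layout per term (illustrative, not the actual .lex format):
    1 byte shared-prefix length + 1 byte suffix length + suffix bytes.
    """
    total = 0
    prev = b""
    for term in sorted_terms:
        raw = term.encode("utf-8")
        # Count the leading bytes this term shares with the previous one.
        shared = 0
        limit = min(len(prev), len(raw))
        while shared < limit and prev[shared] == raw[shared]:
            shared += 1
        total += 2 + (len(raw) - shared)
        prev = raw
    return total


dates = ["2007-09-01", "2007-09-02", "2007-09-03"]  # long shared prefixes
names = ["apple", "mango", "zebra"]                 # nothing shared

print(front_coded_size(dates))  # 18 bytes for 30 bytes of raw text
print(front_coded_size(names))  # 21 bytes for 15 bytes of raw text
```

Same number of values in both fields, but the field with the longer
terms ends up smaller, because its terms share prefixes and the other
field's don't.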

> The fear of
> disk seeks seems like a red herring --- if a block is being read
> often, it's going to be cached, if it's not often, it doesn't matter.
> And if for some reason we are thrashing the page cache and forcing a
> re-read, let's figure out how to change that!

The .lex files for the sort cache field will tend to be scattered  
across the disk, because the segmented index format that allows for  
incremental indexing writes each segment's files separately.  You can  
consolidate them by optimizing the index, but that's expensive.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



