[KinoSearch] KinoSearch::Docs::Cookbook::ReusingSearchers
Marvin Humphrey
marvin at rectangular.com
Fri Sep 14 15:32:53 PDT 2007
On Sep 14, 2007, at 1:39 AM, Nathan Kurz wrote:
> I see this being of benefit only to really gigantic
> loads/indexes with the hardware customized to the role of each node.
In fact, it was during a conversation with someone about just such a
situation that the divide-by-task idea was initially sketched out.
> What are the cases you are thinking it would benefit?
The sort cache problem is described here:
http://www.rectangular.com/pipermail/kinosearch/2007-June/000993.html
(You may recall one other scaling challenge: the BitVectors used by
the Filter subclasses are too big to pass between nodes in a cluster.)
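For a rough sense of the scale involved, here is a back-of-the-envelope sketch. It assumes a filter BitVector holds one bit per document in the index; that is an assumption made for illustration, not a statement about KinoSearch internals, and the function name is made up.

```python
def bitvector_bytes(num_docs):
    """Bytes needed for one bit per document, rounded up to a whole byte."""
    return (num_docs + 7) // 8

# A hypothetical 100-million-document index needs 12,500,000 bytes
# per filter, before any compression.
print(bitvector_bytes(100_000_000))
```

Under that assumption, every cached filter on a large index means shipping megabytes per query between nodes.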
> I think that coming up with a good way of returning the field value to
> the requester is going to be a better final solution.
I don't think we can get optimum performance until the whole term
dictionary resides in RAM. Certainly to solve the sort cache
problem, we need at least the whole sort field in RAM.
Another avenue of attack might be to load the sort field's .lex files
into RAM, but not decompress them. Then we'd use an InStream with an
inner RAMFileDes (instead of an FSFileDes). No more disk seeks.
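The general idea can be sketched in Python, with io.BytesIO standing in for an InStream wrapped around a RAM-backed file descriptor. This is only an illustration of the technique, not KinoSearch code: load_into_ram and read_at are hypothetical helpers, and the demo file is a throwaway stand-in for a still-compressed .lex file.

```python
import io
import os
import tempfile

def load_into_ram(path):
    """Read the whole file once and return a seekable in-memory stream."""
    with open(path, "rb") as fh:
        return io.BytesIO(fh.read())

def read_at(stream, offset, length):
    """Seek and read; on a BytesIO this never touches the disk."""
    stream.seek(offset)
    return stream.read(length)

# Throwaway file standing in for a compressed .lex file (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    demo_path = tmp.name

ram_stream = load_into_ram(demo_path)
os.remove(demo_path)  # the file is gone, but reads still work from RAM

print(read_at(ram_stream, 10, 3))  # b'abc'
```

The bytes stay in their on-disk (compressed) form; only the source of the reads changes, so the decoding logic layered on top would not need to know the difference.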
This stratagem is a little dicey, because it's hard to predict how
much space a field's terms will occupy. It depends not only on how
many values the sort field has and how long they are, but also on how
well they compress.
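To illustrate why the size is hard to predict, here is a small sketch that estimates the footprint of a sorted term list under shared-prefix ("front") coding. That the .lex files use this particular scheme is an assumption made for the example; front_coded_size and varint_len are hypothetical helpers, and the byte layout (two variable-length ints plus the suffix bytes) is a simplification.

```python
def varint_len(n):
    """Bytes needed to store n as a 7-bits-per-byte variable-length int."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size

def front_coded_size(terms):
    """Estimate bytes used by sorted terms under shared-prefix coding."""
    total = 0
    prev = b""
    for term in sorted(t.encode("utf-8") for t in terms):
        # Count the leading bytes this term shares with the previous one.
        shared = 0
        limit = min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        suffix_len = len(term) - shared
        # Stored per term: shared length, suffix length, suffix bytes.
        total += varint_len(shared) + varint_len(suffix_len) + suffix_len
        prev = term
    return total

# Ten values of ten characters each in both cases.
dates = ["2007-09-%02d" % day for day in range(1, 11)]
scattered = [chr(ord("a") + i) * 10 for i in range(10)]

print(front_coded_size(dates))      # 40
print(front_coded_size(scattered))  # 120
```

Same number of values, same length per value, and a threefold difference in size, purely because of how much adjacent terms have in common.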
> The fear of
> disk seeks seems like a red herring --- if a block is being read
> often, it's going to be cached, if it's not often, it doesn't matter.
> And if for some reason we are trashing the page cache and forcing a
> re-read, let's figure out how to change that!
The .lex files for the sort cache field will tend to be scattered
because of the segmented index format that allows for incremental
indexing. You can consolidate them by optimizing the index, but
that's expensive.
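Conceptually, consolidating the per-segment term lists for one field is a k-way merge of sorted lists. The sketch below shows only that step, with made-up data and a hypothetical consolidate function; it is not how KinoSearch implements optimization, which presumably also has to rewrite the rest of the index data and is costly for that reason.

```python
import heapq

def consolidate(segment_term_lists):
    """Merge several sorted term lists into one sorted, de-duplicated list."""
    merged = []
    for term in heapq.merge(*segment_term_lists):
        if not merged or merged[-1] != term:
            merged.append(term)
    return merged

# Each inner list stands for the sort field's terms in one segment.
segments = [
    ["2007-01-03", "2007-06-14"],
    ["2007-02-20", "2007-06-14", "2007-09-14"],
    ["2007-05-01"],
]

print(consolidate(segments))
```

Until such a merge happens, a lookup on the sort field has to consult one term list per segment, each stored in a different place on disk.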
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/