[KinoSearch] anecdotal evidence: index size and search speed

Marvin Humphrey marvin at rectangular.com
Tue Jan 5 21:14:54 PST 2010


On Tue, Jan 05, 2010 at 08:27:53PM -0600, Peter Karman wrote:

> I've been pimping for KinoSearch on the Swish-e list recently and I was asked 
> about index sizes and search speed. My own development playground has modest 
> size (1M small docs in the test collection) so I'm wondering about KS use in the 
> wild, and what folks are seeing in terms of the ratio of collection size to 
> index size, how big you've grown your indexes, and what search speed looks like.

Somewhere between a few hundred thousand and a few million docs, you're
probably going to start feeling the size.  But it depends on usage pattern.
We've got a 5 GB, 16 million document index here and because our searches
never hit any large posting lists, the size doesn't cause any search-time
problems.  Large posting lists are where the rubber hits the road.
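
As a toy illustration of the posting-list point (a Python sketch, not
KinoSearch code; the index layout and the names build_index and search are
invented for this example): the work done for a term is roughly proportional
to the length of its posting list, so a rare term stays cheap however many
documents the index holds.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the list of doc ids that contain it (its posting list)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            index[term].append(doc_id)
    return index

def search(index, term):
    """Return (matching doc ids, number of postings visited)."""
    postings = index.get(term, [])
    hits = []
    visited = 0
    for doc_id in postings:
        # A real engine would score each posting here; we only count the visits.
        visited += 1
        hits.append(doc_id)
    return hits, visited

# Hypothetical collection: one very common term, one very rare term.
docs = ["the quick fox"] * 1000 + ["the rare aardvark"]
index = build_index(docs)

_, common_cost = search(index, "the")       # walks every document's posting
_, rare_cost = search(index, "aardvark")    # walks a single posting
print(common_cost, rare_cost)
```

The collection size is the same for both queries; only the posting list
length differs.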

Incidentally, I haven't focused on raw search speed in a long time -- my
primary focus last year was on reopen time and integrating mmap.  (Opening a
searcher on that 16-million-doc index, which has 900 MB of sort cache data,
takes about 22 milliseconds.)  There are a number of known improvements
waiting for us to make.  The big one is to fan out posting data into multiple
files again -- the current unified format is suboptimal. (: But it's still
fast enough for a lot of things.  :)
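
A rough sketch of why mmap helps open time, using Python's standard mmap
module rather than anything from KinoSearch (the temp file stands in for a
sort cache file, and open_mapped is a made-up helper): mapping a file only
sets up address space, and the operating system pages data in when it is
actually touched, so the cost of opening need not grow with the size of the
file.

```python
import mmap
import os
import tempfile

def open_mapped(path):
    """Map a whole file read-only. Returns (file object, mmap object)."""
    f = open(path, "rb")
    # Length 0 means "map the entire file"; no bulk read happens here.
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, m

# Hypothetical stand-in for a large cache file: 1 MiB of zeros plus a marker.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as out:
    out.write(b"\x00" * (1 << 20))
    out.write(b"tail")

f, m = open_mapped(path)
size = len(m)
# Only the page(s) holding these bytes need to be brought into memory.
last = m[size - 4:size]

m.close()
f.close()
os.remove(path)
```

Reading the last four bytes touches only the end of the mapping; the rest of
the file is never copied into the process by this code.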

Marvin Humphrey



