[KinoSearch] anecdotal evidence: index size and search speed
Marvin Humphrey
marvin at rectangular.com
Tue Jan 5 21:14:54 PST 2010
On Tue, Jan 05, 2010 at 08:27:53PM -0600, Peter Karman wrote:
> I've been pimping for KinoSearch on the Swish-e list recently and I was asked
> about index sizes and search speed. My own development playground has modest
> size (1M small docs in the test collection) so I'm wondering about KS use in the
> wild, and what folks are seeing in terms of the ratio of collection size to
> index size, how big you've grown your indexes, and what search speed looks like.
Somewhere between a few hundred thousand and a few million docs, you're
probably going to start feeling the size. But it depends on usage pattern.
We've got a 5 GB, 16 million document index here and because our searches
never hit any large posting lists, the size doesn't cause any search-time
problems. Large posting lists are where the rubber hits the road.
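To illustrate the point about posting lists, here is a toy sketch in Python. It is not KinoSearch code: build_index and search are hypothetical names, and the index is a plain in-memory dict. The assumption is simply that the work done for a term scales with the length of that term's posting list rather than with the number of docs in the index.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to an ascending list of doc ids (its posting list)."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            postings[term].append(doc_id)
    return postings

def search(postings, term):
    """Return (matching doc ids, number of postings scanned)."""
    plist = postings.get(term, [])
    hits = []
    for doc_id in plist:  # one step per posting, regardless of index size
        hits.append(doc_id)
    return hits, len(plist)

# Toy collection: "the" appears in every doc, "zebra" in exactly one.
docs = ["the quick fox %d" % i for i in range(10000)]
docs.append("the zebra")
index = build_index(docs)

rare_hits, rare_cost = search(index, "zebra")
common_hits, common_cost = search(index, "the")
print(rare_cost, common_cost)
```

Both queries run against the same index, but the rare term touches a single posting while the common term touches one posting per document.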
Incidentally, I haven't focused on raw search speed in a long time -- my
primary focus last year was on reopen time and integrating mmap. (Opening a
searcher on that 16-million-doc index, which has 900 MB of sort cache data,
takes about 22 milliseconds.) There are a number of known improvements
waiting for us to make. The big one is to fan out posting data into multiple
files again -- the current unified format is suboptimal. (: But it's still
fast enough for a lot of things. :)
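For anyone wondering how opening a searcher can be that quick with so much cache data, the general mmap idea can be sketched in Python. This is only an illustration of memory mapping, not the KinoSearch implementation: open_mapped and the temp file standing in for a sort cache are hypothetical. Mapping a file typically sets up address space without reading the contents, and pages are faulted in only when touched.

```python
import mmap
import os
import tempfile

def open_mapped(path):
    """Map a file read-only; data is typically paged in only when accessed."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file. The mapping stays valid after
        # the file object is closed.
        return mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)

# Hypothetical stand-in for a large sort cache file (8 MB here).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as fh:
    fh.write(b"\x00" * (8 * 1024 * 1024))
    fh.write(b"tail")

cache = open_mapped(path)
size = len(cache)      # known from file metadata, no data read needed
last = cache[-4:]      # touches only the final page of the mapping
cache.close()
os.remove(path)
```

Slicing the last four bytes touches only the end of the mapping, so the cost of getting started does not have to grow with the size of the file.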
Marvin Humphrey