[KinoSearch] getting total hits before a seek

Brett Paden paden at multiply.com
Tue Mar 13 07:02:24 PDT 2007


On Tuesday, Mar 13, 2007, Henka wrote:
> > Using v0.15 (still)
> >
> > I have a pretty healthy document collection (around 15 million) that gets
> > moderate traffic (260k searches a day) and have been working on
> > improving performance as searches have crept into the >1s range.
> 
> You can also try out Marvin's distributed multi-index search facility (an
> extension to 0.15 I think - Marv can provide more details).  For an index
> of your size (~61% of ours) splitting the index into smaller chunks over
> several search nodes might be a good idea.

I was planning on experimenting with this service.  Do you have any
hints on the best way to split up an index?  Or can I just make some
arbitrary divisions (5M documents per server in a three-server setup,
for example)?  Finally, can the search server handle multiple
simultaneous connections?
 
> However, Marvin *did* say that 0.20.x might not require this... :-)
> Unless, of course, the search load requires it - 260,000 searches a day is
> quite a bit.  Spread the load, as we intend to.

Our search server is housed on a Dell 1850 with Quad Xeon processors and
8 GB of RAM.  I use Net::Server::PreForkSimple to spawn 20 child
processes, each of which caches its own searcher.  Each child is
responsible for cleaning up after itself and will respawn if its memory
usage grows too large.
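
For anyone wanting to try a similar setup, here is a minimal sketch of a
preforking server that builds one searcher per child.  It is not our
production code.  The package name, index path, port, analyzer choice and
the line-based wire protocol are all assumptions made for illustration,
and max_requests stands in for the memory check described above (a child
exits after that many requests and the parent replaces it).

```perl
package SearchServer;
use strict;
use warnings;
use base qw(Net::Server::PreForkSimple);
use KinoSearch::Searcher;
use KinoSearch::Analysis::PolyAnalyzer;

my $searcher;    # one per child, built after the fork

# Net::Server calls this once in each child right after it is forked.
sub child_init_hook {
    my $self = shift;
    $searcher = KinoSearch::Searcher->new(
        invindex => '/path/to/invindex',    # hypothetical path
        # assumption: the analyzer must match the one used at index time
        analyzer => KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' ),
    );
}

# Called once per client connection; Net::Server maps the client socket
# onto STDIN and STDOUT.  Hypothetical protocol: one query per line in,
# then the hit count, the matching ids, and a blank line out.
sub process_request {
    my $self = shift;
    while ( defined( my $query = <STDIN> ) ) {
        $query =~ s/\r?\n\z//;
        last unless length $query;
        my $hits = $searcher->search( query => $query );
        $hits->seek( 0, 10 );
        print $hits->total_hits, "\n";
        while ( my $hit = $hits->fetch_hit_hashref ) {
            print "$hit->{id}\n";
        }
        print "\n";
    }
}

# Start listening only when run as a script, not when loaded as a module.
__PACKAGE__->run(
    port         => 7890,    # hypothetical port
    max_servers  => 20,      # 20 searcher children, as described above
    max_requests => 1000,    # recycle each child after this many requests
) unless caller;

1;
```

Since every child is a separate process, the parent can hand out up to
max_servers connections at once.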

The search index itself is pretty stripped down at 4.5 GB, as the text
being searched is neither stored nor vectorized.  This allows the OS to
cache the whole index in memory with plenty left over for the search
children to use.

Our index definition:

    id => {
        indexed => 1,
        analyzed => 0,
        stored => 1,
        vectorized => 0,
    },
    bodytext => {
        indexed => 1,
        analyzed => 1,
        stored => 0,
        vectorized => 0,
    },
    type => {
        indexed => 1,
        analyzed => 0,
        stored => 0,
        vectorized => 0,
    },
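
A rough sketch of how a definition like the one above could be fed to the
0.15 indexer, assuming it is kept in a hash and applied through
spec_field.  The hash name %field_specs and the helper spec_all_fields
are hypothetical; the $invindexer argument is expected to be an already
constructed KinoSearch::InvIndexer.

```perl
use strict;
use warnings;

# Field definition from above, keyed by field name (hypothetical hash name).
my %field_specs = (
    id       => { indexed => 1, analyzed => 0, stored => 1, vectorized => 0 },
    bodytext => { indexed => 1, analyzed => 1, stored => 0, vectorized => 0 },
    type     => { indexed => 1, analyzed => 0, stored => 0, vectorized => 0 },
);

# Register every field on the indexer, one spec_field call per field.
# Hypothetical helper; call it before adding the first document.
sub spec_all_fields {
    my ( $invindexer, $specs ) = @_;
    for my $name ( sort keys %$specs ) {
        $invindexer->spec_field( name => $name, %{ $specs->{$name} } );
    }
    return;
}
```

Usage would be spec_all_fields( $invindexer, \%field_specs ).  Only id is
stored, so id is the only field that comes back with a hit.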
 

On average, we're seeing 0.9s searches, with a fair number exceeding 2s.
This is down significantly from the 4s average we saw when I stupidly
asked for total hits before seeking to my ten hits.
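
The ordering that fixed it can be sketched as follows: seek to the page
you want first, then ask for the total.  This is only an illustration of
the call order against the 0.15 Hits interface; the helper name
first_page is hypothetical, and the timings above are the only evidence
offered here for why the order matters.

```perl
use strict;
use warnings;

# Return the total hit count and the ids of the first ten hits.
# $searcher is expected to be a KinoSearch::Searcher (0.15).
sub first_page {
    my ( $searcher, $query_string ) = @_;
    my $hits = $searcher->search( query => $query_string );

    # Seek first, so only the ten requested hits are collected ...
    $hits->seek( 0, 10 );

    # ... and only then read the total.
    my $total = $hits->total_hits;

    my @ids;
    while ( my $hit = $hits->fetch_hit_hashref ) {
        push @ids, $hit->{id};    # id is the only stored field
    }
    return ( $total, \@ids );
}
```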

I'll let the list know how MultiSearch works out for us.
