[KinoSearch] getting total hits before a seek

Henka henka at cityweb.co.za
Tue Mar 13 07:07:08 PST 2007



> On Tuesday, Mar 13, 2007, Henka wrote:
>>
>>
>> > Using v0.15 (still)
>> >
>> > I have a pretty healthy document collection (around 15 million) that
>> > gets moderate traffic (260k searches a day) and have been working on
>> > improving performance as searches have crept into the >1s range.
>>
>> You can also try out Marvin's distributed multi-index search facility
>> (an extension to 0.15 I think - Marv can provide more details).  For an
>> index of your size (~61% of ours) splitting the index into smaller
>> chunks over several search nodes might be a good idea.
>
> I was planning on experimenting with this service.  Do you have any
> hints on the best way to split up an index?  Or can I just make some
> arbitrary divisions (5M documents per server in a three server setup,
> for example)?  Finally, can the search server handle multiple
> simultaneous connections?

We just use a round-robin approach (3x sub-indexes).  The target index is
chosen at crawl-time, making life simple.
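
To illustrate the round-robin idea, here is a minimal sketch (in Python
rather than Perl, and not our actual crawler code).  The sub-index names
and the assign_round_robin helper are made up for the example; the only
thing taken from our setup is that each crawled document goes to the next
of three sub-indexes in turn.

```python
from itertools import cycle

# Hypothetical sub-index names; the only real detail is that there are three.
SUB_INDEXES = ["index_a", "index_b", "index_c"]


def assign_round_robin(doc_ids, targets=SUB_INDEXES):
    """Assign each crawled document to a sub-index, cycling through targets in order."""
    assignment = {name: [] for name in targets}
    # cycle() repeats the target list endlessly; zip() stops when doc_ids runs out.
    for doc_id, target in zip(doc_ids, cycle(targets)):
        assignment[target].append(doc_id)
    return assignment


if __name__ == "__main__":
    # Ten dummy document ids spread over the three sub-indexes.
    for name, docs in assign_round_robin(range(10)).items():
        print(name, docs)
```

Because the target is fixed when the document is crawled, the sub-indexes
stay within one document of each other in size and nothing has to be
rebalanced afterwards.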

>> However, Marvin *did* say that 0.20.x might not require this... :-)
>> Unless, of course, the search load requires it - 260,000 searches a day
>> is quite a bit.  Spread the load, as we intend to.
>
> Our search server is housed on a Dell 1850 with Quad Xeon processors and
> 8G of ram.  I use Net::Server::PreForkSimple to spawn off 20 searchers,
> thus caching a searcher on each child.  The child is responsible for
> cleaning up after itself and will respawn if its memory usage grows too
> large.
>
> The search index itself is pretty stripped down at 4.5G, as the text
> being searched is not stored nor vectorized.  This allows the OS to
> cache the index in memory with plenty left over for search children to
> use.
>
> Our index definition:

Hmm - our approach is a bit different, with LOTS of special fields and
indexing on all of them, so the indexes are huge (several hundred GB and
growing).

You should be OK with the single machine considering your index size -
even better if you migrate to KS 0.20.x when it's a bit more stable.

Cheers
h



