[KinoSearch] Merging indexes, etc

henka at cityweb.co.za henka at cityweb.co.za
Fri Oct 20 09:25:07 PDT 2006


>> Backing up the amount of data involved is, well, becoming difficult...
>
> Do you have a plan in place in case the unlikely occurs and an index
> does become corrupted?  Obviously, this isn't supposed to happen, and
> bug reports regarding busted KS indexes have been very rare.  But I'm
> human, and so are computers.

Yes, the approach is either to have a dedicated backup machine whose sole
purpose is index backup, or to store index copies on the "search" nodes (as
opposed to NFS-mounting the index on the search nodes).  I noticed the
earlier discussion touching on NFS, so I'm not sure NFS is a good approach
(there are also performance penalties, etc.).

>> Here's the error btw:
>>
>> Can't locate object method "_release_locks" via package "self" (perhaps
>> you forgot to load "self"?) at
>> /usr/lib/perl5/site_perl/5.8.7/i486-linux/KinoSearch/InvIndexer.pm line 273.
>
> Looks like the DESTROY method is misbehaving.  Maybe this will help?
>
> -sub DESTROY { shift->_release_locks }
> +sub DESTROY {
> +    my $self = shift;
> +    $self->_release_locks;
> +}
>
> I suspect what's happening is that the object is being reclaimed
> before _release_locks() gets dispatched, because of the funny way I
> used shift().  By making a local copy of $self, the object's refcount
> should be increased and it should stick around until the end of the
> block.

Will test and let you know; thanks.

> Adding skipTo() to SegTermDocs and MultiTermDocs is easy, and should
> yield some improvement of speed on phrase queries right away.
> However, more significant benefits will accrue when I port Lucene's
> BooleanScorer2 and its dependencies.  That's more work.
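Why skipTo() helps phrase queries: a phrase query can only match documents that contain every one of its terms, so the per-term document lists are typically intersected first and positions are checked only for the surviving candidates. skipTo() lets a list jump straight to (or past) a candidate document instead of stepping through every entry in between. The following is a minimal Perl sketch over in-memory sorted arrays, not KinoSearch code; the PostingList class and its advance/doc/skip_to methods are hypothetical, with skip_to() modeled on the contract of Lucene's TermDocs.skipTo() (move to the first document beyond the current one whose number is >= the target).

```perl
use strict;
use warnings;

# Hypothetical in-memory posting list: a sorted array of document numbers.
package PostingList;

sub new {
    my ( $class, @docs ) = @_;
    return bless { docs => [ sort { $a <=> $b } @docs ], pos => -1 }, $class;
}

# Current document number, or undef before the first advance / after the end.
sub doc {
    my $self = shift;
    return $self->{pos} >= 0 ? $self->{docs}[ $self->{pos} ] : undef;
}

# Step to the next document; false once the list is exhausted.
sub advance {
    my $self = shift;
    $self->{pos}++;
    return $self->{pos} < @{ $self->{docs} };
}

# Jump to the first document >= $target beyond the current one, using a
# binary search instead of visiting each intervening entry.
sub skip_to {
    my ( $self, $target ) = @_;
    my ( $lo, $hi ) = ( $self->{pos} + 1, scalar @{ $self->{docs} } );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if   ( $self->{docs}[$mid] < $target ) { $lo = $mid + 1 }
        else                                   { $hi = $mid }
    }
    $self->{pos} = $lo;
    return $lo < @{ $self->{docs} };
}

package main;

# Documents present in every list: the candidate set a phrase query would
# then check for adjacent term positions.
sub intersect {
    my ( $lead, @others ) = @_;
    my @hits;
    my $more = $lead->advance;
    CANDIDATE: while ($more) {
        my $candidate = $lead->doc;
        for my $list (@others) {
            if ( !defined $list->doc || $list->doc < $candidate ) {
                return @hits unless $list->skip_to($candidate);
            }
            if ( $list->doc > $candidate ) {
                # Candidate is missing from this list; leapfrog the lead past it.
                $more = $lead->skip_to( $list->doc );
                next CANDIDATE;
            }
        }
        push @hits, $candidate;
        $more = $lead->advance;
    }
    return @hits;
}

my @hits = intersect(
    PostingList->new( 1, 3, 5, 7, 9 ),
    PostingList->new( 3, 4, 7, 10 ),
    PostingList->new( 2, 3, 7, 8 ),
);
print "@hits\n";    # prints "3 7"
```

The speedup comes from the rare lists driving the jumps: when one term appears in few documents, the other lists are skipped forward in large strides rather than scanned entry by entry.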

Hmm.  Can you quantify (even intuitively) the improvement in search
performance one can expect for keyword queries in general, and for phrase
queries specifically?

> At some point, index size becomes too great for any one machine to
> handle gracefully.  What needs to happen then is for documents to be
> distributed around several machines on several indexes -- so you no
> longer have one monolithic index.  Each machine then searches against
> its smaller index, the results are pooled, and there's something like
> a runoff election to determine which documents get returned.
>
> KinoSearch does not yet have the infrastructure to support this, but
> the design is out there and just needs to be implemented.
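A minimal Perl sketch of the pooling step, assuming each machine has already searched its own smaller index and returned its top hits; the merge_shard_hits function and the { shard, doc, score } hit format are hypothetical, not part of KinoSearch. It pools every machine's hits and keeps the best $num_wanted overall. This assumes scores from different indexes are directly comparable; in practice a distributed searcher generally has to share term statistics (such as document frequencies) across machines for that to hold.

```perl
use strict;
use warnings;

# Hypothetical result format: each shard returns an array ref of hits,
# each hit a hash ref { shard => ..., doc => ..., score => ... }.
sub merge_shard_hits {
    my ( $num_wanted, @shard_results ) = @_;

    # Pool every shard's hits into one list.
    my @pooled;
    push @pooled, @$_ for @shard_results;

    # Rank globally by descending score; break ties by shard name and doc
    # number so the final order is deterministic.
    my @ranked = sort {
               $b->{score} <=> $a->{score}
            or $a->{shard} cmp $b->{shard}
            or $a->{doc} <=> $b->{doc}
    } @pooled;

    # Keep only the overall winners.
    splice( @ranked, $num_wanted ) if @ranked > $num_wanted;
    return @ranked;
}

my @top = merge_shard_hits(
    3,
    [ { shard => 'a', doc => 4, score => 0.9 }, { shard => 'a', doc => 1, score => 0.2 } ],
    [ { shard => 'b', doc => 7, score => 0.7 }, { shard => 'b', doc => 2, score => 0.5 } ],
);
print join( ' ', map {"$_->{shard}:$_->{doc}"} @top ), "\n";    # prints "a:4 b:7 b:2"
```

Since each machine only needs to send back its own top $num_wanted hits, the amount of data pooled stays proportional to the number of machines times the page size, not to the size of the indexes.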

This sounds interesting.  Care to hazard a guess at how long it would take
to implement?




More information about the kinosearch mailing list