[KinoSearch] Merging indexes, etc
Marvin Humphrey
marvin at rectangular.com
Wed Oct 18 09:36:19 PDT 2006
On Oct 17, 2006, at 11:56 PM, henka at cityweb.co.za wrote:
> 1. Would the master index survive an interrupted
> $iv->add_invindexes($a,$b,...), and be receptive to further
> add_invindexes()s without a problem? If not, how can it be resolved?
There's one critical split-second during finish(), when
"segments.new" gets renamed to "segments". Up until that point, no
changes are permanent. The new files are there, but they won't be
used. If the InvIndexer ceases operations at any point before the
renaming of "segments.new", then the next time an InvIndexer is
created against that invindex location, it will overwrite the unused
segment files.
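For illustration only, here is a minimal sketch of the general write-then-rename pattern described above, in plain Perl with core modules. It is not KinoSearch's actual code: the helper name commit_segments(), the temp directory, and the file contents are all made up for the example. Only the "segments.new" and "segments" file names come from the description above.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

# Hypothetical helper (not part of KinoSearch): write the new state to
# "segments.new", then rename it over "segments" as the commit step.
sub commit_segments {
    my ( $dir, $content ) = @_;
    my $new  = File::Spec->catfile( $dir, 'segments.new' );
    my $live = File::Spec->catfile( $dir, 'segments' );

    open( my $fh, '>', $new ) or die "Can't open $new: $!";
    print $fh $content;
    close($fh) or die "Can't close $new: $!";

    # Until this rename succeeds, readers still see the old "segments".
    rename( $new, $live ) or die "Can't rename $new to $live: $!";
    return $live;
}

# Assumption: a throwaway directory stands in for an invindex location.
my $dir = tempdir( CLEANUP => 1 );
commit_segments( $dir, "generation 1\n" );
commit_segments( $dir, "generation 2\n" );
```

If the process dies before the rename, a leftover "segments.new" is simply overwritten by the next call, which mirrors the behavior described above.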
> 2. The docs aren't specific: I presume one can
> $iv->delete_docs_by_term($term) on the same $iv as the one being
> operated on with add_invindexes()? I've encountered no errors so far,
> but was wondering.
Hmm, so you have something like this?
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => $invindex,
        analyzer => $analyzer,
    );
    $invindexer->delete_docs_by_term($term);
    $invindexer->add_invindexes( $another_invindex,
        $yet_another_invindex );
    $invindexer->finish;
I didn't consider that use case, so there's no test written for it,
but I think it ought to work.
Eventually, the merge logic is going to change some and the
restriction against performing add_doc and add_invindexes on the same
InvIndexer object will be lifted. The reason that restriction exists
is that merging of indexes/segments which may have different field
defs is complex. However, Dave Balmain has come up with a design
which solves that problem and I'm going to implement it.
/me .oO( now if only Dave and I didn't always have to duplicate
efforts... )
> 3. During a merge operation of many temp indexes into a master index,
> if no call is made to add_invindexes() before a finish() (maybe
> because of an empty/invalid temp index, etc), it generates an error
> (sorry, busy with a run at the moment, so will paste sample error
> later). I've recoded the logic to side-step the error (ie, don't
> finish() if nothing is added), but I wonder if this might have any
> repercussions (ie, calling "my $iv =
> KinoSearch::InvIndexer->new(...)" on the same index without calling
> finish() in a loop).
There's some stuff in there to make finish() a no-op if nothing's
being changed. Sounds like that's failing, but I don't understand
why. What's being called on $iv in between the last spec_field() and
finish()?
> 4. What's a good test to detect a bad/invalid/broken temp index? At
> the moment I just check if the "segments" file exists and is non-zero.
If an indexing session is interrupted before finish completes, the
segments file will exist, and it will have a length -- however when
KS reads it, KS will see that the invindex doesn't have any
segments. That is, or ought to be, a valid state (I don't think I
have a test written guaranteeing that it will be).
> However, the segments file *will* exist if a temp index run is
> interrupted - what other files *shouldn't* exist (and indicate a temp
> index which is half-baked) so that I can refine the temp index
> validation?
The problem is that you can't tell from the file contents of the
invindex whether or not an indexing session was interrupted. I'd
suggest adding failsafe logic to the app that creates your sub-index
which tells you whether or not the indexing session completed.
    $invindexer->finish;
    session_succeeded();
If session_succeeded() doesn't fire, assume that you have a broken
sub-index and need to repeat whatever actions it took to build it.
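One possible way to implement a session_succeeded() failsafe is a marker file, sketched below in plain Perl. Everything here is an assumption for illustration: the ".ok" marker name, the session_started() and session_completed() helpers, and the choice to put the marker next to the sub-index directory rather than inside it (so that nothing foreign ends up among the index files). None of it is part of KinoSearch.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

# Hypothetical convention: the marker lives beside the sub-index
# directory, e.g. "sub_index_1" gets "sub_index_1.ok".
sub marker_path {
    my ($invindex_dir) = @_;
    return "$invindex_dir.ok";
}

# Call before indexing begins, so a stale marker from an earlier run
# can't vouch for a rebuild that later gets interrupted.
sub session_started {
    my ($invindex_dir) = @_;
    my $marker = marker_path($invindex_dir);
    if ( -e $marker ) {
        unlink($marker) or die "Can't remove $marker: $!";
    }
    return 1;
}

# Call only after $invindexer->finish has returned.
sub session_succeeded {
    my ($invindex_dir) = @_;
    my $marker = marker_path($invindex_dir);
    open( my $fh, '>', $marker ) or die "Can't write $marker: $!";
    print $fh scalar( localtime() ), "\n";
    close($fh) or die "Can't close $marker: $!";
    return 1;
}

# True only if the last session on this sub-index left its marker.
sub session_completed {
    my ($invindex_dir) = @_;
    my $marker = marker_path($invindex_dir);
    return -e $marker ? 1 : 0;
}

# Assumption: a scratch workspace stands in for the real sub-index path.
my $workspace = tempdir( CLEANUP => 1 );
my $sub_index = File::Spec->catdir( $workspace, 'sub_index_1' );
mkdir($sub_index) or die "Can't mkdir $sub_index: $!";

session_started($sub_index);
# ... build the sub-index and call $invindexer->finish here ...
session_succeeded($sub_index);
```

The merging app would then check session_completed() for each sub-index before handing it to add_invindexes(), and rebuild any sub-index whose marker is missing.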
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/