[KinoSearch] KS 0.3007 optimize on frequent index updates/commits? Deal with out of memory?
Marvin Humphrey
marvin at rectangular.com
Wed Nov 25 11:10:41 PST 2009
On Wed, Nov 25, 2009 at 10:19:28AM -0800, Ashley Pond V wrote:
> When and if should you do optimize()?
Rarely. Maybe never. Probably when you're creating an index which is never
going to be modified again, like an index of documentation that gets
distributed along with the docs.
It might also be somewhat useful if you're running a big merge once per day to
catch up, so that you start each day with zero fragmentation.
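If you do go the once-a-day route, here is a minimal pure-Perl sketch of how a script could decide when that daily merge is due. `first_run_today` and the stamp file are hypothetical names of my own, not part of KinoSearch. The helper returns true the first time it is called on a given local date and records that date in the stamp file.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not a KinoSearch API): returns 1 the first time it is
# called on a given local date, recording that date in $stamp_file; returns 0
# on later calls the same day. $now is optional and defaults to time().
sub first_run_today {
    my ( $stamp_file, $now ) = @_;
    $now = time unless defined $now;
    my $today = strftime( '%Y-%m-%d', localtime $now );

    # Read the date of the last recorded run, if the stamp file exists.
    my $last = '';
    if ( open my $in, '<', $stamp_file ) {
        $last = <$in>;
        close $in;
        $last = '' unless defined $last;
        chomp $last;
    }
    return 0 if $last eq $today;

    # First run today: remember it so later runs skip the big merge.
    open my $out, '>', $stamp_file or die "Can't write '$stamp_file': $!";
    print {$out} "$today\n";
    close $out;
    return 1;
}
```

In an update script this might be used as `$indexer->optimize if first_run_today($stamp_file);` just before `$indexer->commit;`, so only the first session of the day pays for the full merge.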
In general, though, searches aren't going to slow down much when you have 10
segments instead of 1. Segment proliferation is going to start being a
serious issue when you have hundreds.
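If you want to keep an eye on segment counts, here is a rough sketch. It assumes that each segment of a 0.30 index lives in its own `seg_*` subdirectory under the index root; please check that against your own index directory before relying on it. Both subs and the default threshold of 100 are hypothetical choices reflecting the "hundreds" rule of thumb above, not anything KinoSearch provides.

```perl
use strict;
use warnings;

# Hypothetical helper: count segment directories in an index.
# ASSUMPTION: segments are stored as subdirectories named "seg_*".
sub count_segments {
    my ($index_dir) = @_;
    opendir( my $dh, $index_dir ) or die "Can't open '$index_dir': $!";
    my $count = grep { /^seg_/ && -d "$index_dir/$_" } readdir $dh;
    closedir $dh;
    return $count;
}

# Hypothetical policy: 10 segments instead of 1 costs little at search time,
# so only flag the index once segments number in the hundreds.
sub should_optimize {
    my ( $segment_count, $threshold ) = @_;
    $threshold = 100 unless defined $threshold;
    return $segment_count >= $threshold ? 1 : 0;
}
```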
> In here-
>
> http://search.cpan.org/~creamyg/KinoSearch-0.30_07/lib/KinoSearch/Docs/Cookbook/FastUpdates.pod
>
> -it suggests
>
> $indexer->optimize; # optional
> $indexer->finish;
>
> I think finish() is a 0.1 method, though, not a 0.3 method. So that should be-
>
> $indexer->optimize;
> $indexer->commit;
>
> Right?
Right, that's a doc glitch. Thanks, fixed by commit r5518, which also
eliminates all references to optimize() in the FastUpdates cookbook entry.
Aside from that glitch, the FastUpdates pod is up-to-date and offers what I
think is good general advice.
> I have a script right now watching a database for changes and then
> updating my KinoSearch index with new things when there are changes. I
> was doing optimize() on it every call. Watching the index files I see
> that they balloon for a moment during optimize and then settle back
> down to an ideal size.
Right, the index file size ballooning is due to temp files, which get zapped
just before the commit, and obsolete files, which get zapped just after the
commit.
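If you want to watch that growth and shrinkage with numbers, one option is to poll the total size of the index directory before, during and after a commit. This is a minimal sketch using the core File::Find module. `index_size_bytes` is a hypothetical helper name, and it simply sums every regular file under the directory, temp files included.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: total size in bytes of all regular files under
# $index_dir, including any temp or obsolete files present at that moment.
sub index_size_bytes {
    my ($index_dir) = @_;
    my $total = 0;
    find(
        sub { $total += ( -s $_ || 0 ) if -f $_ },
        $index_dir
    );
    return $total;
}

# Example (path is a placeholder):
# printf "%.1f MB\n", index_size_bytes('/path/to/index') / ( 1024 * 1024 );
```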
> My script got an out of memory in the middle of one of these. Perhaps
> because too many other things were going on so I'm not blaming KS.
I'm guessing you have some sortable fields? An OOM during indexing is almost
certainly due to SortWriter, which is the only index component with memory
requirements that grow continuously during an indexing session. It's a known
issue, but not easy to fix.
> Will the index files remain reasonable on their own over long spans of time
> with frequent but small document changes and deletions?
Yes.
> * Is there a way to call optimize() safely? If it bombs out in the
> middle it can leave a set of index files which are too big to deal
> with and cause "out of memory" on any further attempts to
> commit/optimize them.
A crashed indexing session has no effect on subsequent indexing sessions. So
long as nothing was committed, each new process sweeps away leftover files and
starts over.
> This is a doc index of about 20K files and the index files occupy 80MB
> after a fresh rebuild. They've crept up to 86MB doing various
> adds/deletes/commits without the optimize call.
That's not very big. You might simply watch a full indexing session that
includes an optimize() to see what the worst-case RAM usage is.
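On Linux, one way to take that measurement from inside the indexing script is to read the process's peak resident set size (the VmHWM line in /proc/self/status, reported in kB) once the session is done. This sketch assumes a Linux /proc filesystem. `peak_rss_kb` is a hypothetical helper, not a KinoSearch API, and it returns undef where /proc is unavailable.

```perl
use strict;
use warnings;

# Hypothetical, Linux-only helper: peak resident set size of this process
# in kB, taken from the VmHWM line of /proc/self/status.
sub peak_rss_kb {
    open( my $fh, '<', '/proc/self/status' ) or return;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmHWM:\s+(\d+)\s+kB/;
    }
    return;
}

# ... the full indexing session (add_doc calls, optimize, commit) would run here ...
my $peak = peak_rss_kb();
print defined $peak ? "peak RSS: $peak kB\n" : "VmHWM not available\n";
```

Because VmHWM is a high-water mark, reading it once at the end captures the worst moment of the session, including any spike during optimize().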
Marvin Humphrey