[KinoSearch] Index optimize failure
Marvin Humphrey
marvin at rectangular.com
Thu Sep 18 07:45:53 PDT 2008
On Sep 18, 2008, at 1:49 AM, Henka wrote:
> Revision 3875 consistently fails on an optimize of *one* of my
> indexes.
Do you recall the last time you updated before r3875? r3875 itself is
definitely not the culprit -- it's a one-line fix for a memory leak in
a testing-only file.
I know you watch the commits list for stuff like this, but did r3737
from a month ago slip by?
------------------------------------------------------------------------
r3737 | creamyg | 2008-08-19 12:53:41 -0700 (Tue, 19 Aug 2008) | 3 lines
Change field numbers to start at 1 instead of 0. This is a
backwards-incompatible index format change.
> The script simply opens the index (a previously merged multi-index),
> then closes it with optimize => 1.
>
> Other index optimizes run successfully.
>
> The error:
> ----------
> Out of bounds: -2147406182 >= 166543 at ../c_src/KinoSearch/Util/I32Array.c:33 kino_I32Arr_get
> at /etc/test/testindexer/optimize_master_index line 97
>
>
> Line 97 is the expected:
> ------------------------
> $invindexer->finish( optimize => 1 );
That's an index-out-of-bounds error from I32Array, a "safe" array
class that throws exceptions when ordinary C array access would
trigger a memory error. I32Array is used in a few places around KS,
but not many where the capacity would be as high as 166543 and
that would be called during InvIndexer_Finish(). My guess is that those
are doc numbers and the call in question is coming from PostingPool.c:
    /* Skip deletions. */
    if (doc_map != NULL) {
        const i32_t remapped = I32Arr_Get(doc_map,
            raw_posting->doc_num - doc_base);
        if ( !remapped )
            continue;
        raw_posting->doc_num = remapped;
    }
It would be helpful to see a C stack trace to confirm the suspicion.
If it's the same number every time, can you put a watch point into
I32Arr_get() looking for it?
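To make the suspected failure mode concrete, here is a minimal sketch of a bounds-checked lookup fed by a "doc_num - doc_base" subtraction. The names SafeI32Array, safe_i32arr_get and remap_doc are hypothetical stand-ins, not the actual KinoSearch source, and the unsigned comparison is only an assumption that would be consistent with a negative number showing up in a ">=" message.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a bounds-checked i32 array.
 * This is NOT the real KinoSearch I32Array. */
typedef struct {
    const int32_t *elems;
    uint32_t       size;
} SafeI32Array;

/* Store elems[tick] in *out and return 1 on success.
 * Report on stderr and return 0 if tick is out of range
 * (the real class throws an exception instead). */
static int
safe_i32arr_get(const SafeI32Array *arr, int32_t tick, int32_t *out)
{
    /* Assumption: comparing as unsigned folds "negative" into
     * "too large", so one test catches both cases. */
    if ((uint32_t)tick >= arr->size) {
        fprintf(stderr, "Out of bounds: %ld >= %lu\n",
                (long)tick, (unsigned long)arr->size);
        return 0;
    }
    *out = arr->elems[tick];
    return 1;
}

/* Same shape as the PostingPool lookup: the index is the doc number
 * relative to the segment's base. If doc_num is smaller than doc_base
 * (or is garbage), the index goes negative and the lookup fails. */
static int
remap_doc(const SafeI32Array *doc_map, int32_t doc_num, int32_t doc_base,
          int32_t *remapped)
{
    return safe_i32arr_get(doc_map, doc_num - doc_base, remapped);
}
```

With a three-element map and a doc_base of 100, a doc_num of 101 resolves normally, while 99 or 103 is rejected. The failing branch in safe_i32arr_get is the natural place for a breakpoint conditioned on the bad value.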
> A test case might be a bit difficult considering the size of the
> index (couple of gigs) and the number of subindexes merged therein,
> but I'll keep digging to narrow it down.
Will you be able to recreate the circumstances that led to this bug,
even if we can't condense a test case? Meaning, can you duplicate the
sequence of subindex creation and merging?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch