[KinoSearch] Index optimize failure

Marvin Humphrey marvin at rectangular.com
Thu Sep 18 07:45:53 PDT 2008




On Sep 18, 2008, at 1:49 AM, Henka wrote:

> Revision 3875 consistently fails on an optimize of *one* of my  
> indexes.

Do you recall the last time you updated before r3875?  r3875 itself is  
definitely not the culprit -- it's a one-line fix for a memory leak in  
a testing-only file.

I know you watch the commits list for stuff like this, but did r3737  
from a month ago slip by?

   ------------------------------------------------------------------------
   r3737 | creamyg | 2008-08-19 12:53:41 -0700 (Tue, 19 Aug 2008) | 3 lines

   Change field numbers to start at 1 instead of 0.  This is a
   backwards-incompatible index format change.

> The script simply opens the index (a previously merged multi-index),  
> then closes it with optimize => 1.
>
> Other index optimizes run successfully.
>
> The error:
> ----------
> Out of bounds: -2147406182 >= 166543 at ../c_src/KinoSearch/Util/I32Array.c:33 kino_I32Arr_get
>         at /etc/test/testindexer/optimize_master_index line 97
>
>
> Line 97 is the expected:
> ------------------------
> $invindexer->finish( optimize => 1 );

That's an index-out-of-bounds error from I32Array, a "safe" array
class that throws an exception where ordinary C array access would
trigger a memory error.  I32Array is used in a few places around KS,
but in few of them would the capacity be as high as 166543 and the
access happen during InvIndexer_Finish().  My guess is that those
are doc numbers and that the call in question is coming from
PostingPool.c:

         /* Skip deletions. */
         if (doc_map != NULL) {
             const i32_t remapped = I32Arr_Get(doc_map,
                 raw_posting->doc_num - doc_base);
             if ( !remapped )
                 continue;
             raw_posting->doc_num = remapped;
         }
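
For illustration, here is a minimal sketch of how a bounds check like this can print a negative index that is nonetheless ">=" the capacity.  It is not the actual I32Array.c source: SafeI32Arr, safe_i32arr_get() and map_index() are hypothetical names, and the sketch assumes that the index is compared as an unsigned 32-bit value but displayed as a signed one.  Under that assumption, the reported -2147406182 corresponds to the unsigned index 2147561114 (that is, -2147406182 + 2^32).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bounds-checked int32 array; not KS code. */
typedef struct {
    int32_t  *ints;
    uint32_t  size;
} SafeI32Arr;

/* Return 1 and store the element in *result if num is in range.
 * Otherwise write a message shaped like the one in the report into
 * err and return 0. */
static int
safe_i32arr_get(const SafeI32Arr *self, uint32_t num, int32_t *result,
                char *err, size_t err_len)
{
    assert(self != NULL && result != NULL && err != NULL);
    if (num >= self->size) {
        /* Assumption: the index is shown as signed.  Converting a huge
         * unsigned value to int32_t wraps to a negative number on
         * two's-complement platforms. */
        snprintf(err, err_len, "Out of bounds: %ld >= %lu",
                 (long)(int32_t)num, (unsigned long)self->size);
        return 0;
    }
    *result = self->ints[num];
    return 1;
}

/* Compute an index as "doc_num - doc_base".  The subtraction is done in
 * unsigned arithmetic so that wraparound is well defined in C. */
static uint32_t
map_index(int32_t doc_num, int32_t doc_base)
{
    return (uint32_t)doc_num - (uint32_t)doc_base;
}
```

With a three-element array, map_index(5, 10) yields the index 4294967291, which fails the check and is reported as "Out of bounds: -5 >= 3".  A doc_num smaller than doc_base, or a corrupted doc_num, would show up the same way.
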

It would be helpful to see a C stack trace to confirm the suspicion.
If the out-of-bounds index is the same number every time, can you put
a watchpoint into I32Arr_get() looking for it?
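
If a debugger watchpoint is awkward to set up, one alternative is a temporary trap compiled into the code.  The following is only a sketch under assumptions: is_suspect_tick() and trap_if_suspect() are hypothetical helpers, not part of KS, and the value is the one from the error message above.  Calling abort() raises SIGABRT, so a debugger stops at that point, or the process leaves a core dump where core dumps are enabled; either way a C stack trace can be pulled from it.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* The index from the report, as it was displayed (signed). */
#define SUSPECT_TICK INT32_C(-2147406182)

/* Return 1 if the index matches the suspect value.  The comparison is
 * done on the unsigned bit pattern, so it works whether the caller
 * holds the index as signed or unsigned. */
static int
is_suspect_tick(uint32_t tick)
{
    return tick == (uint32_t)SUSPECT_TICK;
}

/* Hypothetical debugging hook: call this at the top of the accessor
 * under investigation.  It aborts when the suspect index shows up. */
static void
trap_if_suspect(uint32_t tick)
{
    if (is_suspect_tick(tick)) {
        fprintf(stderr, "suspect index %lu seen, aborting\n",
                (unsigned long)tick);
        abort();
    }
}
```

The hook is meant to be removed again once the caller has been identified.
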

> A test case might be a bit difficult considering the size of the  
> index (couple of gigs) and the number of subindexes merged therein,  
> but I'll keep digging to narrow it down.

Will you be able to recreate the circumstances that led to this bug,  
even if we can't condense a test case?  Meaning, can you duplicate the  
sequence of subindex creation and merging?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




