[KinoSearch] revision 3552 SEGV during indexing

Marvin Humphrey marvin at rectangular.com
Wed Jul 2 15:36:25 PDT 2008




On Jun 30, 2008, at 8:54 PM, Henry wrote:

> Revision 3552 seems to be SEGV'ing.

OK, the recent big leaks were cleaned up as of r3551, so my guess is  
that this isn't an out-of-memory error.

Just to verify, the whole trunk is up-to-date, not just trunk/perl,  
right?

> Program terminated with signal 11, Segmentation fault.
> #0  0x00000086 in ?? ()
> (gdb) bt
> #0  0x00000086 in ?? ()
> #1  0x00360e2d in kino_Inverter_clear (self=0x972eea8)
>    at ../c_src/h/KinoSearch/Obj.h:166

Tracking this down, the crash appears to be in the section of  
Inverter_clear() where the Inverter's stored Doc object gets its  
refcount decremented.  
That's puzzling.  I don't see a scenario where an invalid Doc object  
could be sitting in the inverter->doc slot.
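
To make the pattern under discussion concrete, here is a minimal sketch of a
refcounted object whose destructor is reached through a vtable, plus a
"clear" routine that releases a stored object.  Every name in it (Obj,
VTable, Obj_dec_ref, Inverter_set_doc, and so on) is hypothetical and chosen
for illustration; it is not the code in KinoSearch's Obj.h.  In a layout like
this, a stale or garbage pointer in the slot would make the indirect
destroy call jump to a meaningless address, which is one way a frame such as
"#0 0x00000086 in ?? ()" can arise.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT KinoSearch's Obj.h. */
typedef struct Obj Obj;

typedef struct VTable {
    void (*destroy)(Obj *self);   /* called when the refcount hits zero */
} VTable;

struct Obj {
    const VTable *vtable;
    unsigned      refcount;
};

static int destroyed_count = 0;   /* instrumentation for this sketch only */

static void Obj_destroy(Obj *self) {
    destroyed_count++;
    free(self);
}

static const VTable OBJ_VTABLE = { Obj_destroy };

static Obj *Obj_new(void) {
    Obj *self = malloc(sizeof(Obj));
    if (self == NULL) { abort(); }
    self->vtable   = &OBJ_VTABLE;
    self->refcount = 1;           /* the caller owns the first reference */
    return self;
}

static Obj *Obj_inc_ref(Obj *self) {
    self->refcount++;
    return self;
}

/* If `self` were a stale or garbage pointer, self->vtable->destroy would
 * be garbage as well, and the indirect call below would jump to it. */
static void Obj_dec_ref(Obj *self) {
    if (self == NULL) { return; }
    assert(self->refcount > 0);   /* catches some double releases early */
    if (--self->refcount == 0) {
        self->vtable->destroy(self);
    }
}

typedef struct Inverter {
    Obj *doc;                     /* current document, or NULL */
} Inverter;

static void Inverter_set_doc(Inverter *self, Obj *doc) {
    /* Take the new reference first so that doc == self->doc is safe. */
    Obj *incoming = (doc != NULL) ? Obj_inc_ref(doc) : NULL;
    Obj_dec_ref(self->doc);
    self->doc = incoming;
}

static void Inverter_clear(Inverter *self) {
    Obj_dec_ref(self->doc);
    self->doc = NULL;             /* never leave a released pointer behind */
}
```

In this sketch the slot always owns exactly one reference and is reset to
NULL on clear, so calling Inverter_clear() twice is harmless.  A crash in
the decrement step would then point at something outside this discipline,
such as a missing inc_ref when the slot was filled or memory corruption
elsewhere.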

> As you can see above, the segv is happening on a
> $invindexer->add_doc($doc) call (a normal doc, I tried several).

Can you tell me a little more?  What does this document look like?   
How long has the indexing session been running when this happens?

Although throughout most of the KS test suite $invindexer->add_doc()  
gets fed a hashref rather than a Doc, there are instances where an  
actual Doc gets used (in t/602-boosts.t at the least), so we have a  
test already.

BTW, the instability people like you and Edward are experiencing right  
now is annoying, but the refactoring is paying off.  SVN trunk is now  
about 30% faster on the benchmark test than the last dev release, but  
the real-world gains are likely to be bigger: on the same system,  
t/001-build_invindexes.t completes in 0.8 seconds for trunk vs. 7.6  
seconds for the last dev release.

My guess is that improvements to Stemmer, LCNormalizer, and  
PolyAnalyzer are contributing the most, but there have also been  
improvements to InvIndexer, SegWriter, Inverter, and DocWriter.  I'd  
be surprised if everyone sees such gains, especially since KS probably  
isn't the bottleneck in most indexing apps, but still... :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

