[KinoSearch] revision 3552 SEGV during indexing

Henry henka at cityweb.co.za
Wed Jul 2 20:00:43 PDT 2008



On Thu, July 3, 2008 12:36 am, Marvin Humphrey wrote:
> Just to verify, the whole trunk is up-to-date, not just trunk/perl,
> right?

Yes, I checked out a fresh copy and recompiled a few times.  I also tried
on other nodes in the cluster and they segfault the same way.

> Can you tell me a little more?  What does this document look like?
> How long has the indexing session been running when this happens?

The docs are run-of-the-mill HTML files.  It happens on the third file in
the run - consistently.

> Although throughout most of the KS test suite $invindexer->add_doc()
> gets fed a hashref rather than a Doc, there are instances where an
> actual Doc gets used (in t/602-boosts.t at the least), so we have a
> test already.

I've got a sneaking suspicion my code hasn't kept pace with KS in svn
(meaning I'm sure I've missed some change in how KS in svn is supposed to
be used - my indexing code hasn't changed significantly in a few months).
I'll whip together a test case and post it here.

> BTW, the instability people like you and Edward are experiencing right
> now is annoying, but the refactoring is paying off.  SVN trunk is now
> about 30% faster on the benchmark test than the last dev release, but
> the real-world gains are likely to be bigger: on the same system,
> t/001-build_invindexes.t completes in 0.8 seconds for trunk vs. 7.6
> seconds for the last dev release.

No worries - we who walk barefoot in the head in svn-land do so with full
knowledge and at our own peril ;-)

> My guess is that improvements to Stemmer, LCNormalizer, and
> PolyAnalyzer are contributing the most, but there have also been
> improvements to InvIndexer, SegWriter, Inverter, and DocWriter.  I'd
> be surprised if everyone sees such gains, especially since KS probably
> isn't the bottleneck in most indexing apps, but still... :)

Indexing has become zippy indeed (not that it was slow to begin with).
Your suggestion to use HTML::Parser has worked well, with XML::LibXML
covering the few cases where my HTML::Parser skills are lacking.

regards
Henry


_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch