[KinoSearch] revision 3552 SEGV during indexing
Henry
henka at cityweb.co.za
Wed Jul 2 20:00:43 PDT 2008
On Thu, July 3, 2008 12:36 am, Marvin Humphrey wrote:
> Just to verify, the whole trunk is up-to-date, not just trunk/perl,
> right?
Yes, I checked out a fresh copy and recompiled a few times. I also tried
on other nodes in the cluster and they're doing the same.
> Can you tell me a little more? What does this document look like?
> How long has the indexing session been running when this happens?
The docs are run-of-the-mill HTML files. It happens on the third file in
the run - consistently.
> Although throughout most of the KS test suite $invindexer->add_doc()
> gets fed a hashref rather than a Doc, there are instances where an
> actual Doc gets used (in t/602-boosts.t at the least), so we have a
> test already.
I've got a sneaking suspicion my code hasn't kept pace with KS in svn
(meaning I'm sure I've missed some change in how KS in svn is supposed to
be used; my indexing code hasn't changed significantly in a few months).
I'll whup together a test case and post it here.
> BTW, the instability people like you and Edward are experiencing right
> now is annoying, but the refactoring is paying off. SVN trunk is now
> about 30% faster on the benchmark test than the last dev release, but
> the real-world gains are likely to be bigger: on the same system,
> t/001-build_invindexes.t completes in 0.8 seconds for trunk vs. 7.6
> seconds for the last dev release.
No worries - we who walk barefoot in the head in svn-land do so with full
knowledge and at our own peril ;-)
> My guess is that improvements to Stemmer, LCNormalizer, and
> PolyAnalyzer are contributing the most, but there have also been
> improvements to InvIndexer, SegWriter, Inverter, and DocWriter. I'd
> be surprised if everyone sees such gains, especially since KS probably
> isn't the bottleneck in most indexing apps, but still... :)
Indexing has become zippy indeed (not that it was slow to begin with).
Your suggestion of using HTML::Parser has been used to good effect, with
XML::LibXML rounding out a few cases where my skills with HTML::Parser are
lacking.
regards
Henry
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch