[KinoSearch] Another minimal test case: File::Find causes crash

Edward Betts edwardbetts at gmail.com
Tue Jun 5 08:25:36 PDT 2007


Here is my code:

#!/usr/bin/perl
use strict;
use warnings;

package Schema;
use base qw( KinoSearch::Schema );
use KinoSearch::Analysis::PolyAnalyzer;

our %fields = ( title => 'KinoSearch::Schema::FieldSpec' );

sub analyzer { KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' ) }

package main;

use File::Find;
use KinoSearch::InvIndexer;

my $index = KinoSearch::InvIndexer->new(invindex => Schema->clobber('index'));

find(\&wanted, "en");

$index->finish();

sub wanted {
    /\.html$/ or return;
    my $filename = $_;

    my %field;
    open my $fh, '<', $filename or die "$filename: $!";
    while (<$fh>) {
        m!<body>! and last;
        if (m!<title>(.*)</title>!) {
            $field{title} = $1;
            last;
        }
    }
    close $fh;

    $index->add_doc(\%field);
}

I'm running this with KinoSearch-0.20_03 from CPAN. The crash needs a
reasonably big collection of files, around 50,000 of them. I've used a
static dump from Wikipedia. If you want to try that, you need to
install 7zip; on Debian the package name is p7zip-full.

Assuming you want to use the wiki dump and you've put the code in
index_wiki.pl, the steps to run look like this:

wget http://static.wikipedia.org/downloads/April_2007/en/wikipedia-en-html.0.7z
7z x wikipedia-en-html.0.7z
perl index_wiki.pl

The output I get is:

Error in function kino_FSFolder_open_outstream at c_src/KinoSearch/Store/FSFolder.c:56: Can't open '_1.skip': No such file or directory
        at /home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/Index/SegWriter.pm line 121
        KinoSearch::Index::SegWriter::add_doc('KinoSearch::Index::SegWriter=HASH(0x816bdfc)', 'HASH(0x890e790)', 1) called at /home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/InvIndexer.pm line 114
        KinoSearch::InvIndexer::add_doc('KinoSearch::InvIndexer=HASH(0x816b7c0)', 'HASH(0x890e790)') called at ./index_wiki.pl line 42
        main::wanted() called at /usr/share/perl/5.8/File/Find.pm line 886
        File::Find::_find_dir('HASH(0x816c00c)', 'en', 8) called at /usr/share/perl/5.8/File/Find.pm line 700
        File::Find::_find_opt('HASH(0x816c00c)', 'en') called at /usr/share/perl/5.8/File/Find.pm line 1223
        File::Find::find('CODE(0x8337cac)', 'en') called at ./index_wiki.pl line 23

The line numbers in index_wiki.pl are wrong because I took out the
'use lib' line in the sample above.
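
One thing that may be worth checking (an assumption on my part, not
something confirmed here): by default File::Find changes the working
directory to each directory it visits, unless the no_chdir option is
set. If the relative invindex path 'index' is resolved when files are
opened rather than when the indexer is created, it could end up pointing
somewhere else inside wanted(). The sketch below does not touch
KinoSearch at all; it only prints the working directory the callback
sees in each mode. The temporary tree and the name page.html are made
up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Cwd qw( getcwd );
use File::Find;
use File::Spec;
use File::Temp qw( tempdir );

my $orig = getcwd();

# Build a throwaway tree: <tmp>/en/a/page.html (hypothetical names).
my $root = tempdir( CLEANUP => 1 );
my $en   = File::Spec->catdir( $root, 'en' );
my $sub  = File::Spec->catdir( $en,   'a' );
mkdir $en  or die "mkdir $en: $!";
mkdir $sub or die "mkdir $sub: $!";
open my $fh, '>', File::Spec->catfile( $sub, 'page.html' )
    or die "open: $!";
close $fh;

chdir $root or die "chdir $root: $!";
my $start = getcwd();

# Record the working directory seen inside the callback in each mode.
my ( @default_cwd, @no_chdir_cwd );

# Default mode: File::Find chdir()s into each directory and $_ is the
# bare file name.
find( sub { push @default_cwd, getcwd() if /\.html$/ }, 'en' );

# no_chdir mode: the working directory is left alone and $_ holds the
# path relative to the starting directory.
find(
    {   wanted   => sub { push @no_chdir_cwd, getcwd() if /\.html$/ },
        no_chdir => 1,
    },
    'en'
);

print "started in: $start\n";
print "default:    $_\n" for @default_cwd;
print "no_chdir:   $_\n" for @no_chdir_cwd;

# Leave the temporary directory so it can be cleaned up on exit.
chdir $orig or die "chdir $orig: $!";
```

If the two modes print different directories on your machine as well,
passing an absolute path to clobber() (for example via
File::Spec->rel2abs('index')) or calling find() with no_chdir => 1 would
be cheap ways to test whether this is related to the crash.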

Let me know if you need any more info.
-- 
Edward Betts


