[KinoSearch] newbie: Indexing and searching text not working
Mike Barborak
barborak at basikgroup.com
Mon Aug 25 12:56:12 PDT 2008
Hi,
There is a utility that comes with the KinoSearch distribution called
dump_index. Running that shows these terms associated with the body field:
Terms:
body:a
Doc 0 (1 occurrences)
body:bodi
Doc 0 (1 occurrences)
body:here
Doc 0 (1 occurrences)
body:is
Doc 0 (1 occurrences)
body:short
Doc 0 (1 occurrences)
body:this
Doc 0 (1 occurrences)
body:veri
Doc 0 (1 occurrences)
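So you can see that the PolyAnalyzer stemmed "very" to "veri" (and "body"
to "bodi"). Your TermQuery looks up the literal term "very", which is not in
the index, so it finds nothing. To get your example to work, either search
for "veri" or run the word "very" through the PolyAnalyzer first.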
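
To make the mismatch concrete, here is a small sketch in plain Perl that
does not use KinoSearch at all. The term list is copied from the dump_index
output above. The toy_analyze and term_matches functions are hypothetical
stand-ins I made up for illustration: toy_analyze only lowercases and
rewrites a trailing "y" to "i", which reproduces the two rewrites visible in
the dump but is not the stemmer PolyAnalyzer really uses.

```perl
use strict;
use warnings;

# Terms for the body field, copied from the dump_index output above.
my %body_terms = map { $_ => 1 } qw(a bodi here is short this veri);

# Hypothetical stand-in for the analyzer: lowercases the word and rewrites
# a trailing "y" to "i". This is NOT the real PolyAnalyzer stemmer; it only
# mimics the rewrites seen in the dump (very -> veri, body -> bodi).
sub toy_analyze {
    my ($word) = @_;
    my $term = lc $word;
    $term =~ s/y\z/i/ if length($term) > 2;
    return $term;
}

# Exact lookup of one term, which is roughly what a TermQuery amounts to.
sub term_matches {
    my ($term) = @_;
    return exists $body_terms{$term} ? 1 : 0;
}

# Compare a raw lookup with a lookup of the analyzed form of each word.
for my $word (qw(very body short)) {
    my $analyzed = toy_analyze($word);
    printf "%-6s raw: %d  analyzed (%s): %d\n",
        $word, term_matches($word), $analyzed, term_matches($analyzed);
}
```

The raw lookups for "very" and "body" miss while the analyzed forms hit,
which is the same thing happening with your TermQuery. In real code you
would not write your own stemmer; the QueryParser mentioned earlier in this
thread takes an analyzer and should apply it to the query text for you.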
Best,
Mike
On Mon, Aug 25, 2008 at 2:58 PM, <kinosearch-request at rectangular.com> wrote:
> Date: Mon, 25 Aug 2008 11:40:10 +0530
> From: ram <ram at netcore.co.in>
> Subject: Re: [KinoSearch] newbie: Indexing and searching text not
> working
> To: KinoSearch discussion forum <kinosearch at rectangular.com>
> Message-ID: <1219644610.22357.61.camel at darkstar.netcore.co.in>
> Content-Type: text/plain
>
>
> On Sat, 2008-08-23 at 15:22 -0400, Mike Barborak wrote:
> > Hi,
> >
> > After creating your index with PolyAnalyzer, your body field will have
> > the terms "short" and "body" but not "short body." Take a look at
> > KinoSearch::QueryParser::QueryParser as it will likely do what you
> > want.
>
> I think my installation has some issue. I can't search on a single
> word either.
>
>
>
> ---------------------------------------
> use KinoSearch::InvIndexer;
> use KinoSearch::Analysis::PolyAnalyzer;
> use KinoSearch::Searcher;
> use strict;
> #
> # Start on a clean slate
> #
> system("rm -rf /tmp/invindex/*");
> my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
> @gl::headers = qw(from to cc subject body date reply-to message-id
>                   in-reply-to filename);
> my $invindexer = KinoSearch::InvIndexer->new(
>     invindex => '/tmp/invindex',
>     create   => 1,
>     analyzer => $analyzer,
> );
> foreach (@gl::headers) {
>     $invindexer->spec_field( name => $_, indexed => 1 );
> }
> my $doc = $invindexer->new_doc;
> my %mail = (
>     'date'       => 'Mon, 07 Jan 2008 14:04:35 +0530',
>     'to'         => 'myteam at example.com',
>     'subject'    => 'subject test here ',
>     'body'       => 'This is a very short body here ',
>     'cc'         => 'ram at example.com',
>     'from'       => 'sagar at example.com',
>     'message-id' => '<1199694875.14998.392.camel at sagar.example.com>',
>     'filename'   => '/abc/def'
> );
> foreach (keys %mail) {
>     next unless($mail{$_});
>     $doc->set_value( $_ => $mail{$_} );
> }
> $invindexer->add_doc($doc);
> $invindexer->finish;
>
> $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
> my $searcher = KinoSearch::Searcher->new(
>     invindex => '/tmp/invindex',
>     analyzer => $analyzer,
> );
> #
> # Search on body
> #
> my $term = KinoSearch::Index::Term->new("body","very");
> my $term_query = KinoSearch::Search::TermQuery->new(term => $term);
> my $hits = $searcher->search( query => $term_query );
> while ( my $hit = $hits->fetch_hit_hashref ) {
>     print "Found HIT in body" . $hit->{body} . "\n";
> }
>
> -----------------------------------------------------------------
>
> I am using Fedora-8 and perl-5.10 and latest kinosearch installed via
> CPAN
>