[KinoSearch] Index cleared when using add_doc, delete_by_term, et al.

Darian Anthony Patrick darian at criticode.com
Tue Mar 31 21:20:09 PDT 2009


Marvin Humphrey wrote:
> On Wed, Apr 01, 2009 at 12:00:15AM -0400, Darian Anthony Patrick wrote:
>>> What's the analyzer for the listing_id field?  My immediate guess is that
>>> there's a stemmer involved that's normalizing the listing_id, so that the
>>> delete hits multiple entries instead of just one.
>> I'm using the default KinoSearch::Analysis::PolyAnalyzer for all fields, 
>> including listing_id, constructed from my schema like so:
>>
>> sub analyzer {
>>    return KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
>> }
>>
>> Here is an example of the contents of listing_id:
>>
>> cl-philadelphia-apa-1100510822
> 
> Because of that PolyAnalyzer, when you do this:
> 
>     $invindexer->delete_by_term(
>         field => 'listing_id',
>         term  => 'cl-philadelphia-apa-1100510822'
>     );
> 
> You're actually deleting everything that contains 'cl'.
> 
> The solution is to turn off analysis for that field.
> 
>     package UnAnalyzed;
>     use base qw( KinoSearch::FieldSpec::TextField );
>     sub analyzed { 0 }
> 
>     package MySchema;
>     use base qw( KinoSearch::Schema );
> 
>     our %fields = (
>         content    => 'text',
>         title      => 'text',
>         listing_id => 'UnAnalyzed',
>     );
> 

Awesome!  Thanks a lot for that help, Marvin.  I just came to the same 
conclusion with this test:

#!/usr/bin/env perl

use Modern::Perl;

use FindBin qw($Bin);
use PAR {repository => "$Bin/../par/repo"};
use KinoSearch::Analysis::PolyAnalyzer;
use Data::Dumper;

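# Run a sample listing_id through the same analyzer the schema uses.
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $token_batch = $analyzer->analyze_text('cl-newyork-aap-1101422114');

# Print each token the analyzer produced, one per line.
while ( my $token = $token_batch->next ) {
    my $text = $token->get_text;
    print "$text\n";
}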
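
For anyone reading this in the archive without KinoSearch installed, the
effect can be approximated in core Perl.  This is only a sketch under stated
assumptions: toy_tokens is a hypothetical stand-in that lowercases the text
and splits it into runs of word characters, and it leaves out the stemming
that the real PolyAnalyzer also performs.  The %docs_for_token hash is a toy
model of a term-to-document mapping, not KinoSearch's actual index structure.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the analyzer (an assumption for illustration):
# lowercase the text, then return each run of word characters as a token.
sub toy_tokens {
    my ($text) = @_;
    my $lower = lc $text;
    return $lower =~ /(\w+)/g;
}

my @ids = qw(
    cl-philadelphia-apa-1100510822
    cl-newyork-aap-1101422114
);

# Show how each listing_id breaks apart at the hyphens.
for my $id (@ids) {
    print join( ' ', toy_tokens($id) ), "\n";
}

# Toy term-to-document mapping: file each ID under every token it yields.
my %docs_for_token;
for my $id (@ids) {
    push @{ $docs_for_token{$_} }, $id for toy_tokens($id);
}

# Both IDs end up filed under 'cl', so a delete keyed on that token
# would hit both of them.
print scalar @{ $docs_for_token{cl} }, " docs share the token 'cl'\n";
```

Under those assumptions each listing_id comes out as four separate tokens,
and every ID starts with the token 'cl', which matches what Marvin
described.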

-- 
Darian Anthony Patrick, Criticode LLC
Office:     (215) 789-9956
Facsimile:  (866) 789-2992
XMPP/SMTP:  darian at criticode.com
Web:        http://criticode.com
=================================================
BCF1 E7AD 15AD 8A99 F613 AF5F 2A9C C45C F580 E087
=================================================


