[KinoSearch] Index cleared when using add_doc, delete_by_term, et al.
Darian Anthony Patrick
darian at criticode.com
Tue Mar 31 21:20:09 PDT 2009
Marvin Humphrey wrote:
> On Wed, Apr 01, 2009 at 12:00:15AM -0400, Darian Anthony Patrick wrote:
>>> What's the analyzer for the listing_id field? My immediate guess is that
>>> there's a stemmer involved that's normalizing the listing_id, so that the
>>> delete hits multiple entries instead of just one.
>> I'm using the default KinoSearch::Analysis::PolyAnalyzer for all fields,
>> including listing_id, constructed from my schema like so:
>>
>> sub analyzer {
>>     return KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
>> }
>>
>> Here is an example of the contents of listing_id:
>>
>> cl-philadelphia-apa-1100510822
>
> Because of that PolyAnalyzer, when you do this:
>
> $invindexer->delete_by_term(
>     field => 'listing_id',
>     term  => 'cl-philadelphia-apa-1100510822'
> );
>
> You're actually deleting everything that contains 'cl'.
>
> The solution is to turn off analysis for that field.
>
> package UnAnalyzed;
> use base qw( KinoSearch::FieldSpec::TextField );
> sub analyzed { 0 }
>
> package MySchema;
> use base qw( KinoSearch::Schema );
>
> our %fields = (
>     content    => 'text',
>     title      => 'text',
>     listing_id => 'UnAnalyzed',
> );
>
Awesome! Thanks a lot for the help, Marvin. I just came to the same
conclusion with this test:
#!/usr/bin/env perl
use Modern::Perl;
use FindBin qw($Bin);
use PAR {repository => "$Bin/../par/repo"};
use KinoSearch::Analysis::PolyAnalyzer;
use Data::Dumper;
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $token_batch = $analyzer->analyze_text('cl-newyork-aap-1101422114');
while ( my $token = $token_batch->next ) {
    my $text = $token->get_text;
    print "$text\n";
}
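
For anyone following along without KinoSearch installed, the splitting can
be approximated in plain Perl. This is only a rough sketch under stated
assumptions: approximate_tokens is a hypothetical helper, not a KinoSearch
function, and the regex is my assumption about how the tokenizer breaks up
text. It lowercases and tokenizes but skips stemming entirely, so the real
analyzer output may differ.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): lowercase the input and
# pull out word-like tokens. Hyphens are not word characters, so they act
# as separators. Stemming is deliberately left out.
sub approximate_tokens {
    my ($text) = @_;
    my @tokens = lc($text) =~ /\w+(?:'\w+)*/g;
    return @tokens;
}

# The listing_id from earlier in the thread.
my @tokens = approximate_tokens('cl-philadelphia-apa-1100510822');
print "$_\n" for @tokens;
```

Under these assumptions the ID comes apart into four separate tokens, the
first of which is 'cl', which fits the explanation for why the delete
matched far more than one document.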
--
Darian Anthony Patrick, Criticode LLC
Office: (215) 789-9956
Facsimile: (866) 789-2992
XMPP/SMTP: darian at criticode.com
Web: http://criticode.com
=================================================
BCF1 E7AD 15AD 8A99 F613 AF5F 2A9C C45C F580 E087
=================================================