[KinoSearch] Index cleared when using add_doc, delete_by_term, et al.

Darian Anthony Patrick darian at criticode.com
Tue Mar 31 21:11:44 PDT 2009


Darian Anthony Patrick wrote:
>> On Tue, Mar 31, 2009 at 08:59:40PM -0400, Darian Anthony Patrick wrote:
>>
>>> $invindexer->delete_by_term(
>>> 	'listing_id' => $listing{'listing_id'}
>>> );
>>> to create the new entry.  I do this in a loop over all entries in the 
>>> RSS feed.  I'm seeing behavior where occasionally my entire index gets 
>>> totally blown away with only entries created during the current 
>>> invocation of my indexing script existing in the index.
>> What's the analyzer for the listing_id field?  My immediate guess is that
>> there's a stemmer involved that's normalizing the listing_id, so that the
>> delete hits multiple entries instead of just one.
>>
> 
> I'm using the default KinoSearch::Analysis::PolyAnalyzer for all fields, 
> including listing_id, constructed from my schema like so:
> 
> sub analyzer {
>    return KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
> }
> 
> Here is an example of the contents of listing_id:
> 
> cl-philadelphia-apa-1100510822
> cl-philadelphia-apa-1101384542
> cl-philadelphia-apa-1101378600
> cl-newyork-aap-1101426145
> cl-newyork-aap-1101425002
> cl-newyork-aap-1101408072
> 

A quick test using Lingua::Stem::Snowball directly shows that the stemmer 
lets the listing_id through unchanged:

#!/usr/bin/env perl

use Modern::Perl;

use FindBin qw($Bin);
use PAR {repository => "$Bin/../par/repo"};
use Lingua::Stem::Snowball;

my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
print $stemmer->stem( 'cl-newyork-aap-1101422114' ), "\n";

# prints "cl-newyork-aap-1101422114"
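
The stemmer is only one stage of PolyAnalyzer, though, so the test above 
does not cover whatever happens to the value before stemming.  As a rough 
sketch only, assuming the analyzer lowercases the text and then tokenizes 
it with a pattern along the lines of \w+(?:'\w+)* (that pattern and the 
helper name rough_tokens are assumptions for illustration, not checked 
against the installed KinoSearch), plain Perl can approximate those 
earlier stages:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical stand-in for the lowercase + tokenize stages that run
# before the stemmer.  The token pattern is an assumption, not taken
# from the KinoSearch source.
sub rough_tokens {
    my ($text) = @_;
    # In list context, a /g match returns every captured token.
    return lc($text) =~ /(\w+(?:'\w+)*)/g;
}

my @tokens = rough_tokens('cl-newyork-aap-1101422114');
print join( ' | ', @tokens ), "\n";

# prints "cl | newyork | aap | 1101422114"
```

If the real analyzer splits the ID in a similar way, terms such as "cl" 
would be shared by every listing, which seems worth ruling out before 
blaming the stemmer alone.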

-- 
Darian Anthony Patrick, Criticode LLC
Office:     (215) 789-9956
Facsimile:  (866) 789-2992
XMPP/SMTP:  darian at criticode.com
Web:        http://criticode.com
=================================================
BCF1 E7AD 15AD 8A99 F613 AF5F 2A9C C45C F580 E087
=================================================


