[KinoSearch] Index cleared when using add_doc, delete_by_term, et al.
Darian Anthony Patrick
darian at criticode.com
Tue Mar 31 21:11:44 PDT 2009
Darian Anthony Patrick wrote:
>> On Tue, Mar 31, 2009 at 08:59:40PM -0400, Darian Anthony Patrick wrote:
>>
>>> $invindexer->delete_by_term(
>>> 'listing_id' => $listing{'listing_id'}
>>> );
>>> to create the new entry. I do this in a loop over all entries in the
>>> RSS feed. I'm seeing behavior where occasionally my entire index gets
>>> totally blown away with only entries created during the current
>>> invocation of my indexing script existing in the index.
>> What's the analyzer for the listing_id field? My immediate guess is that
>> there's a stemmer involved that's normalizing the listing_id and so that the
>> delete hits multiple entries instead of just one.
>>
>
> I'm using the default KinoSearch::Analysis::PolyAnalyzer for all fields,
> including listing_id, constructed from my schema like so:
>
> sub analyzer {
> return KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
> }
>
> Here is an example of the contents of listing_id:
>
> cl-philadelphia-apa-1100510822
> cl-philadelphia-apa-1101384542
> cl-philadelphia-apa-1101378600
> cl-newyork-aap-1101426145
> cl-newyork-aap-1101425002
> cl-newyork-aap-1101408072
>
A quick test using Lingua::Stem::Snowball lets the listing_id through
unchanged:
#!/usr/bin/env perl
use Modern::Perl;
use FindBin qw($Bin);
use PAR {repository => "$Bin/../par/repo"};
use Lingua::Stem::Snowball;
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
print $stemmer->stem( 'cl-newyork-aap-1101422114' ), "\n";
# prints "cl-newyork-aap-1101422114"
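
PolyAnalyzer also runs a tokenizer ahead of the stemmer, so the stemmer
test alone may not show everything the analyzer does to a listing_id. Here
is a minimal sketch in plain Perl, with no KinoSearch involved. It assumes
the tokenizer's pattern is roughly \w+(?:'\w+)*, which is my understanding
of the KinoSearch::Analysis::Tokenizer default and is not verified against
this install. The tokenize() helper is hypothetical and exists only for
illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# ASSUMPTION: this pattern approximates the tokenizer's default; check the
# KinoSearch::Analysis::Tokenizer docs for the installed version.
my $token_re = qr/\w+(?:'\w+)*/;

# Hypothetical helper: lowercase the text, then collect every match of the
# token pattern. The pattern has no capturing groups, so a list-context
# match with /g returns the whole matched substrings.
sub tokenize {
    my ($text) = @_;
    my @tokens = lc($text) =~ /$token_re/g;
    return @tokens;
}

my @tokens = tokenize('cl-newyork-aap-1101422114');
print join( ' | ', @tokens ), "\n";
# prints "cl | newyork | aap | 1101422114"
```

Whether this splitting has any bearing on delete_by_term depends on how the
listing_id field is defined in the schema, which this sketch does not cover.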