[KinoSearch] Unicode problem

Father Chrysostomos sprout at cpan.org
Tue Mar 4 14:43:59 PST 2008


On Mar 3, 2008, at 3:10 PM, Marvin Humphrey wrote:

> The problem was a missing SvUTF8_on in the XS binding for  
> Lexicon_Get_Term.  Fixed by r3103.  Thanks for the report.
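
For anyone following along, the symptom can be sketched in pure Perl. This is only a rough analogy, not the actual XS change in r3103: encode_utf8 stands in for the UTF-8 octets the C layer hands back, and decode_utf8 stands in for what SvUTF8_on achieves (SvUTF8_on merely sets the flag on an existing buffer, whereas decode_utf8 validates and decodes). The variable names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # the string literal below is decoded characters
use Encode qw( encode_utf8 decode_utf8 );

# A term outside the Latin-1 range: six Cyrillic characters.
my $term = "змейка";

# Assumption: this mimics a term coming back from C as raw UTF-8 octets
# with the UTF8 flag left off.
my $bytes = encode_utf8($term);

# With the flag off, Perl sees twelve separate octets, not six characters,
# so the string no longer compares equal to the original term.
my $flag_is_on  = utf8::is_utf8($bytes) ? 1 : 0;
my $octet_count = length $bytes;
my $same_as_is  = $bytes eq $term ? 1 : 0;

# Decoding restores the character semantics.
my $decoded       = decode_utf8($bytes);
my $same_decoded  = $decoded eq $term ? 1 : 0;

print "flag on: $flag_is_on, length: $octet_count, equal: $same_as_is\n";
print "after decoding, equal: $same_decoded\n";
```

An ASCII-only term such as "cat" would compare equal either way, which is why the old word list could not catch the bug.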

Here’s a test for it.

-------------- next part --------------
Index: t/207-seg_lexicon.t
===================================================================
--- t/207-seg_lexicon.t	(revision 3105)
+++ t/207-seg_lexicon.t	(working copy)
@@ -1,5 +1,6 @@
 use strict;
 use warnings;
+use utf8;
 
 use Test::More tests => 5;
 
@@ -36,7 +37,9 @@
 );
 
 my $invindexer = KinoSearch::InvIndexer->new( invindex => $invindex );
-my @animals = qw( cat dog sloth );
+# We need to test strings that exceed the Latin-1 range to make sure that
+# get_term treats them correctly. (See change 3103 in the svn repo.)
+my @animals = qw( cat dog sloth λεοντάρι змейка );
 for my $animal (@animals) {
     $invindexer->add_doc(
         {   a => $animal,
@@ -69,7 +72,8 @@
         push @terms,  $lexicon->get_term;
     }
 }
-is_deeply( \@fields, [qw( a a a b b b c c c )], "correct fields" );
+is_deeply( \@fields, [qw( a a a a a b b b b b c c c c c )],
+    "correct fields" );
 my @correct_texts = (@animals) x 3;
 is_deeply( \@terms, \@correct_texts, "correct terms" );
 