[KinoSearch] Unicode problem
Father Chrysostomos
sprout at cpan.org
Tue Mar 4 14:43:59 PST 2008
On Mar 3, 2008, at 3:10 PM, Marvin Humphrey wrote:
> The problem was a missing SvUTF8_on in the XS binding for
> Lexicon_Get_Term. Fixed by r3103. Thanks for the report.
Here’s a test for it.
-------------- next part --------------
Index: t/207-seg_lexicon.t
===================================================================
--- t/207-seg_lexicon.t (revision 3105)
+++ t/207-seg_lexicon.t (working copy)
@@ -1,5 +1,6 @@
use strict;
use warnings;
+use utf8;
use Test::More tests => 5;
@@ -36,7 +37,9 @@
);
my $invindexer = KinoSearch::InvIndexer->new( invindex => $invindex );
-my @animals = qw( cat dog sloth );
+# We need to test strings that exceed the Latin-1 range to make sure that
+# get_term treats them correctly. (See change 3103 in the svn repo.)
+my @animals = qw( cat dog sloth λεοντάρι змейка );
for my $animal (@animals) {
$invindexer->add_doc(
{ a => $animal,
@@ -69,7 +72,8 @@
push @terms, $lexicon->get_term;
}
}
-is_deeply( \@fields, [qw( a a a b b b c c c )], "correct fields" );
+is_deeply( \@fields, [qw( a a a a a b b b b b c c c c c )],
+ "correct fields" );
my @correct_texts = (@animals) x 3;
is_deeply( \@terms, \@correct_texts, "correct terms" );
-------------- next part --------------
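For anyone reading the archive later, here is a rough pure-Perl sketch of why the test needs terms outside the Latin-1 range. It is not the XS code from r3103; it only imitates, from the Perl side, what a scalar looks like when it holds UTF-8 bytes but never had its UTF8 flag turned on. The helper name as_unflagged_octets and the sample word are made up for illustration. With plain ASCII terms such as "cat" the flagged and unflagged strings compare equal, so a test using only those could not catch the bug.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are UTF-8 source text

# Hypothetical helper (not part of KinoSearch): return the same text as
# raw UTF-8 octets, i.e. what the string looks like without the UTF8 flag.
sub as_unflagged_octets {
    my ($text) = @_;
    my $octets = $text;
    utf8::encode($octets);    # in place: characters -> UTF-8 bytes, flag off
    return $octets;
}

my $term   = "змейка";                     # six Cyrillic characters
my $broken = as_unflagged_octets($term);   # same bytes, seen as 12 Latin-1 chars

printf "flagged:   length %d, is_utf8 %d\n",
    length($term), utf8::is_utf8($term) ? 1 : 0;
printf "unflagged: length %d, is_utf8 %d\n",
    length($broken), utf8::is_utf8($broken) ? 1 : 0;
print "equal? ", ( $term eq $broken ? "yes" : "no" ), "\n";

# Decoding the octets restores the original character string.
my $fixed = $broken;
utf8::decode($fixed);
print "equal after decode? ", ( $fixed eq $term ? "yes" : "no" ), "\n";
```

The unflagged copy is what an is_deeply comparison against the original terms would trip over, which is the mismatch the patched test is meant to expose.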