[KinoSearch] [Lucy] Re: Invalid UTF-8
Peter Karman
peter at peknet.com
Mon Jan 25 12:11:40 PST 2010
Marvin Humphrey wrote on 01/25/2010 11:48 AM:
>
> It would be interesting to see a hexdump of "lextemp" starting at byte 12464.
> That's where the PostingPool run starts. The combining sequence that triggers
> the exception starts two bytes later, at 12466.
$ hexdump -C -s 12464 -n 16 sources.index.ks/seg_1/lextemp
000030b0  00 00 1f 00 00 00 c1 5c  3c 20 62 20 3e 20 57 69  |.......\< b > Wi|
the sequence c1 5c 3c 20 looks odd to me. It's definitely not UTF-8.
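
For reference, here is a minimal Python sketch (not part of the original exchange) of why that sequence cannot be UTF-8. The byte 0xc1 is never legal in UTF-8, since it could only begin an overlong two-byte encoding, and the 0x5c that follows is not a continuation byte in any case. The helper name first_invalid_utf8 is hypothetical; the input is just the four bytes quoted above.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# The four suspicious bytes from the hexdump above.
suspect = bytes.fromhex("c1 5c 3c 20")

print(first_invalid_utf8(suspect))      # prints (0, 193), i.e. offset 0, byte 0xc1
print(first_invalid_utf8(b"< b > Wi"))  # prints None: plain ASCII is valid UTF-8
```

The same check could be pointed at a larger slice of the file to find where the first bad byte sits relative to the start of the run.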
[... /me debugs ... hours pass ...]
the problem is in libswish3, not in KinoSearch, Search::Tools, or the
original docs.
Thanks for the tips on how UTF-8 works in KS, though. It was helpful.
--
Peter Karman . http://peknet.com/ . peter at peknet.com