[KinoSearch] [Lucy] Re: Invalid UTF-8

Peter Karman peter at peknet.com
Mon Jan 25 12:11:40 PST 2010


Marvin Humphrey wrote on 01/25/2010 11:48 AM:

> 
> It would be interesting to see a hexdump of "lextemp" starting at byte 12464.
> That's where the PostingPool run starts.  The combining sequence that triggers
> the exception starts two bytes later, at 12466.

$ hexdump -C -s 12464 -n 16 sources.index.ks/seg_1/lextemp
000030b0  00 00 1f 00 00 00 c1 5c  3c 20 62 20 3e 20 57 69  |.......\< b > Wi|

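the sequence c1 5c 3c 20 looks odd to me. It's definitely not UTF-8: 0xc1 is
never a valid byte in UTF-8, and the 0x5c that follows it is plain ASCII ('\'),
not a continuation byte.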
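
For anyone who wants to check a dump like this without reading the hex by eye, here is a minimal sketch in Python (not part of KinoSearch or libswish3; the helper name first_invalid_utf8 is made up for illustration). It relies only on Python's strict UTF-8 decoder and reports the offset of the first byte it rejects. The 16 bytes are copied from the hexdump above.

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks strict UTF-8
    decoding, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")  # strict mode: raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


# The 16 bytes from the hexdump, which starts at file offset 12464.
chunk = bytes.fromhex("00 00 1f 00 00 00 c1 5c 3c 20 62 20 3e 20 57 69")

bad = first_invalid_utf8(chunk)
if bad is not None:
    print(f"invalid byte 0x{chunk[bad]:02x} at file offset {12464 + bad}")
```

Run against those bytes, this should point at the 0xc1, six bytes into the dump. To scan the real file, the same helper could be fed the bytes read from lextemp instead of the hard-coded chunk.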

[... /me debugs ... hours pass ...]

the problem is in libswish3, not in KinoSearch, Search::Tools, or the
original docs.

Thanks for the tips on how UTF-8 works in KS, though. It was helpful.

-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


