[KinoSearch] Invalid UTF-8

Peter Karman peter at peknet.com
Tue Jan 26 18:24:06 PST 2010


Marvin Humphrey wrote on 1/26/10 8:03 PM:

>>  perl docmaker.pl \
>>     --utf_factor=0 \
>>     --write_files \
>>     --tmp_dir path/to/my/testdocs/ \
>>     --max_files 33000 \
>>     --max_words 3 \
>>     --tmp_dir_segments 2
> 
> I wonder whether this produces the same corpus on my OS X 10.5.8 MBPro as on
> your system.

no, definitely different. docmaker.pl creates random strings based on your
system dictionary, so the corpus will differ from one machine to the next.


> No matter what, I see the following output:
> 
> marvin@smokey:~/projects/ks/perl $ rm -rf test-ks-utf8/ ; perl -Mblib karpet_utf8_test.pl testdocs/
> Crawled 33000 documents
> marvin@smokey:~/projects/ks/perl $ 
> 

damn.

> 
> Before we go further, what kind of system are you having trouble on?  Is it a
> 64-bit box?

yes, 64-bit. Tested on both RHEL 4 and Mac OS X 10.6.

However, when I try to build on the two Linux boxen I have (32- and 64-bit) with
the most recent KS trunk, I get this:

Initializing Charmonizer/Core/OperatingSystem...
Trying to find a bit-bucket a la /dev/null...
Creating compiler object...
Trying to compile a small test file...
_charm_run.c: In function ‘main’:
_charm_run.c:26: error: expected expression before ‘/’ token
_charm_run.c:26: error: too few arguments to function ‘freopen’
_charm_run.c:27: error: expected expression before ‘/’ token
_charm_run.c:27: error: too few arguments to function ‘freopen’
failed to compile _charm_run helper utility
Failed to write charmony.h at buildlib/KinoSearch/Build.pm line 183.
make: *** [all] Error 25


could one of the changes you committed in the last 48 hours have caused that?
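
For reference, the pair of errors reported at lines 26 and 27 of _charm_run.c
is consistent with a path such as /dev/null being pasted into the C source
without surrounding quotes. That is only a guess at the cause, since the
generated file is not shown here. A minimal sketch of the difference, where
BIT_BUCKET_PATH and silence_stream are hypothetical names chosen for
illustration and are not Charmonizer's:

```c
#include <stdio.h>

/* Hypothetical names for illustration; not taken from Charmonizer. */

/* Broken form, kept in a comment because it does not compile:
 *
 *     #define BIT_BUCKET_PATH /dev/null
 *     freopen(BIT_BUCKET_PATH, "w", stdout);
 *
 * The preprocessor expands the call to freopen(/dev/null, "w", stdout),
 * so the compiler meets a bare '/' where it expects an expression.
 */

/* Working form: the path is a quoted string literal. */
#define BIT_BUCKET_PATH "/dev/null"

/* Point an already-open stream at the bit bucket.
 * Returns 1 on success, 0 if freopen() fails. */
int silence_stream(FILE *stream) {
    return freopen(BIT_BUCKET_PATH, "w", stream) != NULL;
}
```

On a POSIX system, calling silence_stream(stdout) should discard everything
written to stdout afterwards. On a platform without /dev/null the call is
expected to return 0 instead.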

-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


