[KinoSearch] utf8 warnings/error
Scott Beck
scottbeck at gmail.com
Fri Aug 24 10:54:10 PDT 2007
Hi Marvin,
I still can't reproduce these errors on a small test case :(
I did, however, get some output from valgrind, although I don't know
how helpful it is. I thought I would post it here as a follow-up. I
will continue to debug this and see if I can figure it out.
valgrind errors from my tests:
==1766== Invalid read of size 1
==1766== at 0x814C06F: Perl_swash_fetch (utf8.c:1747)
==1766== by 0x813BF82: S_find_byclass (regexec.c:1248)
==1766== by 0x813E456: Perl_regexec_flags (regexec.c:1945)
==1766== by 0x8138538: Perl_pregexec (regexec.c:323)
==1766== by 0x61284B9: XS_KinoSearch__Analysis__Tokenizer__do_analyze (KinoSearch.xs:4741)
==1766== by 0x80DE048: Perl_pp_entersub (pp_hot.c:2854)
==1766== by 0x80BCA83: Perl_runops_debug (dump.c:1442)
==1766== by 0x8064024: S_run_body (perl.c:1921)
==1766== by 0x8063AE5: perl_run (perl.c:1840)
==1766== by 0x805F69A: main (perlmain.c:86)
==1766== Address 0x630C48F is 6 bytes after a block of size 17 alloc'd
==1766== at 0x401B507: malloc (vg_replace_malloc.c:149)
==1766== by 0x80BCFEB: Perl_safesysmalloc (util.c:67)
==1766== by 0x80E1817: Perl_sv_grow (sv.c:1637)
==1766== by 0x80E6E24: Perl_sv_setsv_flags (sv.c:4019)
==1766== by 0x80EDD93: Perl_newSVsv (sv.c:7049)
==1766== by 0x814BC09: Perl_swash_fetch (utf8.c:1717)
==1766== by 0x8149F03: Perl_is_utf8_alnum (utf8.c:1191)
==1766== by 0x813BF0C: S_find_byclass (regexec.c:1246)
==1766== by 0x813E456: Perl_regexec_flags (regexec.c:1945)
==1766== by 0x8138538: Perl_pregexec (regexec.c:323)
==1766== by 0x61284B9: XS_KinoSearch__Analysis__Tokenizer__do_analyze (KinoSearch.xs:4741)
==1766== by 0x80DE048: Perl_pp_entersub (pp_hot.c:2854)
==1766== by 0x80BCA83: Perl_runops_debug (dump.c:1442)
==1766== by 0x8064024: S_run_body (perl.c:1921)
==1766== by 0x8063AE5: perl_run (perl.c:1840)
==1766== by 0x805F69A: main (perlmain.c:86)
==1766== Invalid read of size 1
==1766== at 0x814C06F: Perl_swash_fetch (utf8.c:1747)
==1766== by 0x8145D17: S_regrepeat (regexec.c:4089)
==1766== by 0x814497E: S_regmatch (regexec.c:3732)
==1766== by 0x813EE8D: S_regtry (regexec.c:2185)
==1766== by 0x813BFA8: S_find_byclass (regexec.c:1249)
==1766== by 0x813E456: Perl_regexec_flags (regexec.c:1945)
==1766== by 0x8138538: Perl_pregexec (regexec.c:323)
==1766== by 0x61284B9: XS_KinoSearch__Analysis__Tokenizer__do_analyze (KinoSearch.xs:4741)
==1766== by 0x80DE048: Perl_pp_entersub (pp_hot.c:2854)
==1766== by 0x80BCA83: Perl_runops_debug (dump.c:1442)
==1766== by 0x8064024: S_run_body (perl.c:1921)
==1766== by 0x8063AE5: perl_run (perl.c:1840)
==1766== by 0x805F69A: main (perlmain.c:86)
==1766== Address 0x630C48F is 6 bytes after a block of size 17 alloc'd
==1766== at 0x401B507: malloc (vg_replace_malloc.c:149)
==1766== by 0x80BCFEB: Perl_safesysmalloc (util.c:67)
==1766== by 0x80E1817: Perl_sv_grow (sv.c:1637)
==1766== by 0x80E6E24: Perl_sv_setsv_flags (sv.c:4019)
==1766== by 0x80EDD93: Perl_newSVsv (sv.c:7049)
==1766== by 0x814BC09: Perl_swash_fetch (utf8.c:1717)
==1766== by 0x8149F03: Perl_is_utf8_alnum (utf8.c:1191)
==1766== by 0x813BF0C: S_find_byclass (regexec.c:1246)
==1766== by 0x813E456: Perl_regexec_flags (regexec.c:1945)
==1766== by 0x8138538: Perl_pregexec (regexec.c:323)
==1766== by 0x61284B9: XS_KinoSearch__Analysis__Tokenizer__do_analyze (KinoSearch.xs:4741)
==1766== by 0x80DE048: Perl_pp_entersub (pp_hot.c:2854)
==1766== by 0x80BCA83: Perl_runops_debug (dump.c:1442)
==1766== by 0x8064024: S_run_body (perl.c:1921)
==1766== by 0x8063AE5: perl_run (perl.c:1840)
==1766== by 0x805F69A: main (perlmain.c:86)
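To make the "6 bytes after a block of size 17" line concrete, here is a small Python sketch of the arithmetic. It assumes valgrind's usual convention that "N bytes after a block of size S" is measured from the end of the block; the variable names are illustrative and nothing here is taken from the Perl or KinoSearch sources.

```python
# Values copied from the valgrind report above.
bad_address = 0x630C48F   # address of the invalid 1-byte read
block_size = 17           # size of the malloc'd block
bytes_after = 6           # "6 bytes after a block of size 17"

# Assumption: "N bytes after" is measured from the end of the block,
# so the block ends N bytes before the bad address.
block_end = bad_address - bytes_after      # first byte past the block
block_start = block_end - block_size       # first byte of the block
read_offset = bad_address - block_start    # index of the read within the buffer
last_valid_offset = block_size - 1

print(hex(block_start), hex(block_end), read_offset, last_valid_offset)
```

Under that assumption, both traces report a read at offset 23 of a buffer whose last valid offset is 16. The buffer itself was allocated inside Perl_swash_fetch (utf8.c:1717), according to the allocation stack.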
I don't know if this is related, but after I index and then do a
delete/insert, my index is badly broken. I wrote a small command-line
tool to test with, similar to the mysql command-line client. Here is
what searches return after all this:
kino> flag_deleted:0
flag_deleted  path                                             subject
0             ./new/1182974233.2388.1.vmware.nmsrv.com,S=1598  Re:Re: Free Porn NOW
Hits 1
kino> flag_deleted:1
flag_deleted  path                                                       subject
1             .Drafts/new/1187976985.1766.4.vmware.nmsrv.com,S=205:2,DT
Hits 1
kino> subject:a
flag_deleted  path                                             subject
0             ./cur/1182972472.1867.1.vmware.nmsrv.com,S=2454  Hot sex with Viagra pills
0             ./cur/1182972878.1983.1.vmware.nmsrv.com,S=1410  Hello!
0             ./cur/1182973219.2096.1.vmware.nmsrv.com,S=1704  We are here for you to live a healthier and happier life!
Hits 3
kino>
As you can see, the search for "a" returns 3 results with
flag_deleted=0, but the search for flag_deleted:0 returns only one.
There should actually be 57 results in the database, which I can see
from this tool before I do the delete/insert.
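The inconsistency can be stated as a simple cross-check: every document that one query shows with flag_deleted=0 should also come back from flag_deleted:0. Below is a minimal Python sketch of that check, using the paths from the session above as hard-coded data. The helper name and data layout are hypothetical and do not use the KinoSearch API.

```python
# Paths returned by the query flag_deleted:0 in the session above.
flag_deleted_0_hits = {
    "./new/1182974233.2388.1.vmware.nmsrv.com,S=1598",
}

# Paths returned by subject:a, mapped to the flag_deleted value shown per row.
subject_a_hits = {
    "./cur/1182972472.1867.1.vmware.nmsrv.com,S=2454": 0,
    "./cur/1182972878.1983.1.vmware.nmsrv.com,S=1410": 0,
    "./cur/1182973219.2096.1.vmware.nmsrv.com,S=1704": 0,
}


def missing_from_flag_query(field_hits, flag_hits, flag_value=0):
    """Return paths whose displayed flag equals flag_value but which the
    flag query itself did not return (hypothetical helper)."""
    return sorted(
        path
        for path, flag in field_hits.items()
        if flag == flag_value and path not in flag_hits
    )


missing = missing_from_flag_query(subject_a_hits, flag_deleted_0_hits)
for path in missing:
    print("not found by flag_deleted:0:", path)
```

With the data above, all three subject:a hits are reported as missing from the flag_deleted:0 result. The same check could be run against real query output to see which documents a field query loses after the delete/insert.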
I will continue to try and reduce the problem to as small a case as
possible. Thanks for all your time and effort.
Scott
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch