[KinoSearch] Error in KinoSearch::Searcher::search

Marvin Humphrey marvin at rectangular.com
Wed Mar 7 20:28:55 PST 2007


On Mar 5, 2007, at 1:50 PM, Karel K. wrote:

> And another Note:
> I guess, besides my specific problem, there is some 64bit related
> issue.  Maybe in the mathfunctions library, because the length seems
> not to be reported correctly.  (Filesize is equal) Should be more than
> 228.
>
> 32-bit:
> Error in function refill at c_src/KinoSearch/Store/InStream.c:100:
> Read past EOF of
> /var/www/kinosearch/KinoSearch-0.20_01/sample/uscon_invindex/_1.cf
> (start: 4294960869 len 4294967295)
>
> 64-bit:
> Error in function refill at c_src/KinoSearch/Store/InStream.c:100:
> Read past EOF of
> /var/www/wikipedia/kslokal/constitution/uscon_invindex/_1.cf (start:
> 18446744073709545189 len 228)

I am almost certain that this discrepancy is an artifact of how I
prepare error messages: with Perl's sprintf-style formatting, or more
specifically the XS function sv_vcatpvf().  (The relevant code is in
c_src/KinoSearch/Util/Carp.c.)  The file pointers are 64-bit integers
in the KS C library; when they have to pass through Perl scalars,
they are converted to doubles, which can represent every integer up
to 2**53 exactly, more than enough for any real file size.  However,
that conversion does not happen for the error messages, and
sv_vcatpvf() handles integers differently depending on whether your
Perl is 32-bit or 64-bit.
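
To make the arithmetic concrete, here is a small sketch in plain C (not
KinoSearch code).  It assumes the underlying "start" value was -6427
stored in a 64-bit integer; that is a guess which happens to match the
numbers in both reports, not something I have confirmed.  The helper
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit value into its 32-bit halves. */
uint32_t low_word(uint64_t v)  { return (uint32_t)(v & 0xFFFFFFFFu); }
uint32_t high_word(uint64_t v) { return (uint32_t)(v >> 32); }

/* Check the assumed value against the numbers quoted in the reports. */
void check_reported_values(void) {
    int64_t  start  = -6427;            /* assumption, see above */
    uint64_t as_u64 = (uint64_t)start;  /* same bits, read as unsigned */

    /* The 64-bit report shows the full unsigned value. */
    assert(as_u64 == UINT64_C(18446744073709545189));
    /* The 32-bit report's "start" matches the low 32 bits... */
    assert(low_word(as_u64) == UINT32_C(4294960869));
    /* ...and its "len" matches the high 32 bits of the same value. */
    assert(high_word(as_u64) == UINT32_C(4294967295));
}
```

If that guess is right, the 32-bit build is reading one 64-bit argument
as two 32-bit ones, which would also explain why "len" is not 228 there.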

I have to use sv_vcatpvf() for error messages because the C sprintf()
function is vulnerable to buffer overflows, and error messages could
be manipulated by something as simple as a maliciously crafted query
string.  snprintf() solves this problem when it's available, but it
isn't always -- I don't think MSVC provides it.
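
For reference, a minimal sketch of the bounded-formatting pattern with
C99 snprintf().  The function name and message text are hypothetical,
not anything from Carp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format an error message into a caller-supplied
 * buffer.  Returns 0 if the message fit, 1 if it was truncated, and
 * -1 if snprintf() reported an error. */
int format_query_error(char *buf, size_t size, const char *query) {
    assert(buf != NULL && size > 0);
    /* snprintf() writes at most size bytes, including the trailing NUL,
     * and returns the length the full message would have had. */
    int needed = snprintf(buf, size, "Bad query: '%s'", query);
    if (needed < 0) {
        return -1;
    }
    return ((size_t)needed >= size) ? 1 : 0;
}
```

An overlong query string gets cut off at the buffer boundary instead of
overwriting whatever follows the buffer, which is the property sprintf()
lacks.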

It might be worth working up alternative code for the routines in  
Carp.c using snprintf() when it's detected.  Another remedy might be  
to always convert file pointers in error messages to doubles.  That  
would be kind of annoying, though, because the error messages are  
scattered throughout the library rather than concentrated in one file.
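
A rough sketch of the second remedy, again in plain C with a
hypothetical helper name.  It uses snprintf() only to keep the example
self-contained; in the library the cast value would go to whatever
formatter Carp.c ends up using, and I am assuming a "%.0f"-style
conversion behaves the same way there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print a 64-bit file position by converting it
 * to a double first, so the format string never depends on the width
 * of the platform's integer types.  Exact for positions up to 2**53.
 * Returns snprintf()'s result: the length of the formatted number. */
int format_file_pos(char *buf, size_t size, uint64_t pos) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.0f", (double)pos);
}
```

Positions above 2**53 would be rounded to the nearest representable
double, so a corrupt value like the one in these reports would print as
a huge approximate number rather than an exact one.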

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




