[KinoSearch] Error in KinoSearch::Searcher::search
Marvin Humphrey
marvin at rectangular.com
Wed Mar 7 20:28:55 PST 2007
On Mar 5, 2007, at 1:50 PM, Karel K. wrote:
> And another Note:
> I guess, besides my specific problem, there is some 64bit related
> issue. Maybe in the mathfunctions library, because the length seems
> not to be reported correctly. (Filesize is equal) Should be more than
> 228.
>
> 32-bit:
> Error in function refill at c_src/KinoSearch/Store/InStream.c:100:
> Read past EOF of
> /var/www/kinosearch/KinoSearch-0.20_01/sample/uscon_invindex/_1.cf
> (start: 4294960869 len 4294967295)
>
> 64-bit:
> Error in function refill at c_src/KinoSearch/Store/InStream.c:100:
> Read past EOF of
> /var/www/wikipedia/kslokal/constitution/uscon_invindex/_1.cf (start:
> 18446744073709545189 len 228)
I am almost certain that this discrepancy is an artifact of how I
prepare error messages: with Perl's sprintf, or more specifically the
Perl API function sv_vcatpvf(). (The relevant code is in
c_src/KinoSearch/Util/Carp.c.) The file pointers are 64-bit integers
in the KS C library; when they have to pass through Perl scalars,
they are converted to doubles, which can hold integers up to 2**53
exactly, more than enough for any real file size. However, the
conversion does not happen for the error messages, and sv_vcatpvf()
handles integers differently depending on whether your Perl is 32-bit
or 64-bit.
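As a rough illustration of the numbers in the two messages (a sketch,
not code from KinoSearch): both "start" values are what a 64-bit -6427
looks like when printed as unsigned. One plausible reading, assuming a
little-endian 32-bit build where each integer conversion consumes only
32 bits, is that "start" and "len" in the 32-bit message are the two
halves of that single value. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these exist in KinoSearch. */

/* Low 32 bits of a 64-bit value, read as unsigned. */
uint32_t low_word(int64_t v)
{
    return (uint32_t)((uint64_t)v & 0xFFFFFFFFu);
}

/* High 32 bits of a 64-bit value, read as unsigned. */
uint32_t high_word(int64_t v)
{
    return (uint32_t)((uint64_t)v >> 32);
}

/* 1 if v comes back unchanged after a round trip through double.
 * Only meaningful for magnitudes well below 2**63. */
int survives_double(int64_t v)
{
    return (int64_t)(double)v == v;
}
```

With v = -6427, low_word() gives 4294960869 and high_word() gives
4294967295, the two numbers in the 32-bit message; the same value read
as an unsigned 64-bit integer is 18446744073709545189.
survives_double() holds for 2**53 but not for 2**53 + 1.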
I have to use sv_vcatpvf() for error messages because the C sprintf()
function is vulnerable to buffer overflows, and error messages
could be manipulated by something as simple as a maliciously crafted
query string. snprintf() solves this problem when it's available,
but it isn't always; I don't think MSVC provides it.
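A minimal sketch of the bounded formatting that snprintf() offers,
assuming a C99 snprintf() is present. The function name and message
text are hypothetical, not taken from Carp.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not KinoSearch code: formats an error message
 * containing untrusted text into a fixed-size buffer. snprintf() never
 * writes more than buf_size bytes, including the terminating NUL.
 * Returns 1 if the whole message fit, 0 if it was truncated or failed. */
int format_bounded(char *buf, size_t buf_size, const char *query)
{
    int needed = snprintf(buf, buf_size, "Bad query: '%s'", query);
    return needed >= 0 && (size_t)needed < buf_size;
}
```

An overlong query string is cut off at the end of the buffer instead
of overrunning it, and the return value reports the truncation.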
It might be worth working up alternative code for the routines in
Carp.c using snprintf() when it's detected. Another remedy might be
to always convert file pointers in error messages to doubles. That
would be kind of annoying, though, because the error messages are
scattered throughout the library rather than concentrated in one file.
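The second remedy could look roughly like this. It is only a sketch:
plain snprintf() stands in for sv_vcatpvf() so that the example is
standalone, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not KinoSearch code: writes "start: N len M"
 * into buf, passing both 64-bit values as doubles so the format
 * arguments have the same width on every platform.
 * Returns 1 if the message fit, 0 if it was truncated or failed. */
int format_read_range(char *buf, size_t buf_size, int64_t start, int64_t len)
{
    int needed = snprintf(buf, buf_size, "start: %.0f len %.0f",
                          (double)start, (double)len);
    return needed >= 0 && (size_t)needed < buf_size;
}
```

Called with start = -6427 and len = 228, this produces
"start: -6427 len 228" whether integers are 32 or 64 bits wide.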
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/