[KinoSearch] KinoSearch 0.163 - Couldn't open file : File exists
Clifton Kussmaul
ckussmaul at elegancetech.com
Mon Mar 2 09:35:56 PST 2009
I haven't had any "file exists" errors with 4221,
though I've also stopped indexing attachments,
so I'm not 100% sure the problem has been completely crushed.
Thanks for your help, Marvin!!!
Clif
Clif Kussmaul 484-431-0722 ckussmaul at elegancetech.com
Elegance Technologies, Inc http://www.elegancetech.com
-----Original Message-----
From: Marvin Humphrey [mailto:marvin at rectangular.com]
Sent: Thursday, February 26, 2009 03:58 PM
To: ckussmaul at elegancetech.com; KinoSearch discussion list.
Subject: Re: [KinoSearch] KinoSearch 0.163 - Couldn't open file : File exists
On Thu, Feb 26, 2009 at 11:42:10AM -0500, Clifton Kussmaul wrote:
> I tried 4217, and it still gets stuck, unfortunately.
>
> Couldn't open file '<...>/index/_1.srt": File exists
> at <...>/KinoSearch/Store/FSInvIndex.pm
Yeah, that little cockroach had escaped. Please try 4221.
> Also, I think I (finally) found the error which breaks the index:
> Out of memory during "large" request for 16781312 bytes,
> total sbrk() is 376035328 bytes at <...>/KinoSearch/Index/SegWriter.pm line 74.
> (That's a 16MB request and the total sbrk() is 376MB.)
> I guess that's the request that sbrk()'s the Kino's back :-)
>
> I am indexing files >10MB, so maybe more RAM will fix this.
For KS 0.163 on a 32-bit machine, each Token takes up 28 bytes in addition to
the space required by the text itself. That's before inversion...
struct Token {
    char   *text;
    STRLEN  len;
    I32     start_offset;
    I32     end_offset;
    I32     pos_inc;
    Token  *next;
    Token  *prev;
};
So, yes, indexing huge documents takes a lot of memory, and more RAM will
probably prevent that crash. KS uses external sorting so that it can handle a
lot of docs, but a single huge doc can cause problems on a memory-limited
machine.
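To put rough numbers on that, here is a minimal sketch of the arithmetic. It is not KinoSearch code: `Token32` and `estimate_token_bytes` are hypothetical names, and the struct models the 32-bit layout by using 4-byte integers in place of the pointers, STRLEN, and I32 fields. It ignores malloc bookkeeping and anything allocated during inversion.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical stand-in for the KS 0.163 Token on a 32-bit build:
 * each pointer, STRLEN, and I32 is modeled as a 4-byte field. */
typedef struct Token32 {
    uint32_t text;          /* char *text   */
    uint32_t len;           /* STRLEN len   */
    int32_t  start_offset;  /* I32          */
    int32_t  end_offset;    /* I32          */
    int32_t  pos_inc;       /* I32          */
    uint32_t next;          /* Token *next  */
    uint32_t prev;          /* Token *prev  */
} Token32;

/* Seven 4-byte fields, no padding: matches the 28 bytes quoted above. */
static_assert(sizeof(Token32) == 28, "expected 28 bytes per token");

/* Rough pre-inversion footprint of one document's token stream:
 * fixed per-token overhead plus the bytes of token text. */
uint64_t estimate_token_bytes(uint64_t n_tokens, uint64_t text_bytes)
{
    return n_tokens * (uint64_t)sizeof(Token32) + text_bytes;
}
```

As an illustration with assumed figures, a document that tokenizes into 2,000,000 tokens holding 10,000,000 bytes of text comes to 2,000,000 * 28 + 10,000,000 = 66,000,000 bytes before inversion. The real token count depends on the analyzer and the document.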
Marvin Humphrey