[KinoSearch] KinoSearch 0.163 - Couldn't open file : File exists

Clifton Kussmaul ckussmaul at elegancetech.com
Mon Mar 2 09:35:56 PST 2009


I haven't had any "file exists" errors with 4221,
though I've also stopped indexing attachments, 
so I'm not 100% sure the problem has been completely crushed.

Thanks for your help, Marvin!!!
Clif

Clif Kussmaul  484-431-0722  ckussmaul at elegancetech.com
Elegance Technologies, Inc  http://www.elegancetech.com

-----Original Message-----
From: Marvin Humphrey [mailto:marvin at rectangular.com] 
Sent: Thursday, February 26, 2009 03:58 PM
To: ckussmaul at elegancetech.com; KinoSearch discussion list.
Subject: Re: [KinoSearch] KinoSearch 0.163 - Couldn't open file : File exists

On Thu, Feb 26, 2009 at 11:42:10AM -0500, Clifton Kussmaul wrote:

> I tried 4217, and it still gets stuck, unfortunately.
> 
> Couldn't open file '<...>/index/_1.srt": File exists
>        at <...>/KinoSearch/Store/FSInvIndex.pm

Yeah, that little cockroach had escaped.  Please try 4221.

> Also, I think I (finally) found the error which breaks the index:
> Out of memory during "large" request for 16781312 bytes, 
> total sbrk() is 376035328 bytes at <...>/KinoSearch/Index/SegWriter.pm line 74.
> (That's a 16MB request and the total sbrk() is 376MB.)
> I guess that's the request that sbrk()'s the Kino's back :-)
> 
> I am indexing files >10MB, so maybe more RAM will fix this.

For KS 0.163 on a 32-bit machine, each Token takes up 28 bytes in addition to
the space required by the text itself.  That's before inversion...

    struct Token {
        char   *text;
        STRLEN  len;
        I32     start_offset;
        I32     end_offset;
        I32     pos_inc;
        Token  *next;
        Token  *prev;
    };

So, yes, indexing huge documents takes a lot of memory, and more RAM will
probably prevent that crash.  KS uses external sorting so that it can handle
a lot of docs, but a single huge doc can cause problems on a memory-limited
machine.
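
To put rough numbers on the per-Token overhead, here is a minimal sketch in C.
It is not KinoSearch code: `Token32` is a stand-in that models pointers and
STRLEN as 32-bit integers so the 28-byte figure can be checked on any host,
and `estimate_token_bytes` is a hypothetical helper.  The token count and
average token length are assumptions the caller supplies, and malloc overhead
is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 0.163 Token layout on a 32-bit build.  Pointer and
 * STRLEN fields are modeled as 32-bit integers (an assumption made here so
 * that the size does not depend on the machine compiling this sketch). */
typedef struct Token32 {
    uint32_t text;          /* char*  in the real struct */
    uint32_t len;           /* STRLEN in the real struct */
    int32_t  start_offset;
    int32_t  end_offset;
    int32_t  pos_inc;
    uint32_t next;          /* Token* in the real struct */
    uint32_t prev;          /* Token* in the real struct */
} Token32;

/* Seven 4-byte fields, no padding: 28 bytes per token. */
static_assert(sizeof(Token32) == 28, "Token32 should be 28 bytes");

/* Hypothetical helper: approximate bytes needed to hold one document's
 * tokens before inversion, i.e. struct overhead plus the token text. */
uint64_t estimate_token_bytes(uint64_t num_tokens, uint64_t avg_token_len) {
    return num_tokens * (sizeof(Token32) + avg_token_len);
}
```

For example, if a large document yields 2,000,000 tokens averaging 5 bytes
each (both figures are assumptions, not measurements), then
`estimate_token_bytes(2000000, 5)` comes to 66,000,000 bytes, so about 66 MB
for the token list alone.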

Marvin Humphrey





