[KinoSearch] large index size

Marvin Humphrey marvin at rectangular.com
Sat Jul 26 01:12:39 PDT 2008




On Jul 25, 2008, at 7:25 PM, hao chen wrote:

> After indexing some html files (4.7G), I got a _1.cfs file that is  
> 8.4G. Is this normal?

Probably.  There are the index files used for lookup/scoring, the stored
fields returned when retrieving hits, and the data used by the
highlighter, which basically duplicates what's in the index files.

If you don't care about highlighting/excerpting and you just want to  
fetch titles, set "stored" and "vectorized" to 0 for everything but  
the "title" field and you'll cut down significantly on disk usage.

> I only modified the directory of the sample invindex.plx file for my  
> indexing

I strongly recommend using a real HTML parser rather than the cheesy  
regex tag stripper in the sample app.  It's only there because it's  
easy to grok at a glance.
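
To illustrate why a parser beats a regex tag stripper, here is a minimal sketch in Python using only the standard library's html.parser. It is not code from KinoSearch or its sample app; the names TextExtractor, html_to_text and regex_strip are made up for this example, and regex_strip is only a stand-in for a typical naive stripper. In Perl, a CPAN module such as HTML::Parser plays the same role.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent block elements do not fuse,
        # then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML document (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def regex_strip(html):
    """Naive tag stripper, shown only for comparison."""
    return " ".join(re.sub(r"<[^>]*>", " ", html).split())


if __name__ == "__main__":
    page = (
        "<html><head><style>p { color: red }</style>"
        "<script>if (a < b) { go(); }</script></head>"
        "<body><p>Fish &amp; chips</p></body></html>"
    )
    print(html_to_text(page))
    print(regex_strip(page))
```

On a page like the one in the sketch, the regex version leaves stylesheet text and undecoded entities in the output, all of which would end up as terms in the index, while the parser-based version keeps only the body text.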

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





