[Kinosearch] Problem during delete_docs_by_term

Matt Williamson matt at sanasecurity.com
Mon Nov 10 19:18:19 PST 2008


I am indexing a large number of files in batches. To allow the files to
change and be re-indexed, I call

$invindexer->delete_docs_by_term($term);

before each insertion.

 

I managed to index around 140000 files, but then I hit a problem with
the following message:

 

Couldn't open file '/home/qatest/kinosearch/invindex/_10613.f0': File exists at /usr/local/lib/perl/5.8.8/KinoSearch/Store/FSInvIndex.pm line 88
        KinoSearch::Store::FSInvIndex::open_outstream('KinoSearch::Store::FSInvIndex=HASH(0x88829bc)', '_10613.f0') called at /usr/local/lib/perl/5.8.8/KinoSearch/Index/SegWriter.pm line 40
        KinoSearch::Index::SegWriter::init_instance('KinoSearch::Index::SegWriter=HASH(0x8c9e634)') called at /usr/local/lib/perl/5.8.8/KinoSearch/Util/Class.pm line 31
        KinoSearch::Util::Class::new('KinoSearch::Index::SegWriter', 'invindex', 'KinoSearch::Store::FSInvIndex=HASH(0x88829bc)', 'seg_name', '_10613', 'finfos', 'KinoSearch::Index::FieldInfos=HASH(0x8c9e1d8)', 'field_sims', 'HASH(0x8882ee4)', ...) called at /usr/local/lib/perl/5.8.8/KinoSearch/InvIndexer.pm line 152
        KinoSearch::InvIndexer::_delayed_init('KinoSearch::InvIndexer=HASH(0x8882b78)') called at /usr/local/lib/perl/5.8.8/KinoSearch/InvIndexer.pm line 262
        KinoSearch::InvIndexer::delete_docs_by_term('KinoSearch::InvIndexer=HASH(0x8882b78)', 'KinoSearch::Index::Term=HASH(0x8882cbc)') called at index.pl line 185
        main::handleJob('KinoSearch::InvIndexer=HASH(0x8882b78)', '26ba90ca1fa8354ffb00f43b59b58223f2f07b35') called at index.pl line 80
        eval {...} called at index.pl line 79

 

The contents of the invindex directory are as follows:

-rw-r--r-- 1 qatest qatest 2360312453 2008-10-20 20:44 _10455.cfs
-rw-r--r-- 1 qatest qatest 2331193220 2008-10-20 22:51 _10583.cfs
-rw-r--r-- 1 qatest qatest  351002378 2008-10-20 23:06 _10599.cfs
-rw-r--r-- 1 qatest qatest   13298110 2008-10-20 23:07 _10600.cfs
-rw-r--r-- 1 qatest qatest   41360185 2008-10-20 23:08 _10601.cfs
-rw-r--r-- 1 qatest qatest   20375600 2008-10-20 23:09 _10602.cfs
-rw-r--r-- 1 qatest qatest   60127418 2008-10-20 23:11 _10603.cfs
-rw-r--r-- 1 qatest qatest   14264840 2008-10-20 23:12 _10604.cfs
-rw-r--r-- 1 qatest qatest   14161480 2008-10-20 23:12 _10605.cfs
-rw-r--r-- 1 qatest qatest   14208046 2008-10-20 23:13 _10606.cfs
-rw-r--r-- 1 qatest qatest   14408859 2008-10-20 23:14 _10607.cfs
-rw-r--r-- 1 qatest qatest   15721448 2008-10-20 23:15 _10608.cfs
-rw-r--r-- 1 qatest qatest   28738202 2008-10-20 23:16 _10609.cfs
-rw-r--r-- 1 qatest qatest   47820729 2008-10-20 23:17 _10610.cfs
-rw-r--r-- 1 qatest qatest   16592842 2008-10-20 23:18 _10611.cfs
-rw-r--r-- 1 qatest qatest   89080573 2008-11-10 13:05 _10612.cfs
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f0
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f1
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f10
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f11
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f12
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f2
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f3
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f4
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f5
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f6
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f7
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f8
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.f9
-rw-r--r-- 1 qatest qatest    8111867 2008-11-10 13:06 _10613.fdt
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.fdx
-rw-r--r-- 1 qatest qatest          0 2008-11-10 13:06 _10613.srt
-rw-r--r-- 1 qatest qatest 4111442230 2008-10-11 01:17 _228.cfs
-rw-r--r-- 1 qatest qatest 4448882741 2008-10-19 17:08 _9338.cfs
-rw-r--r-- 1 qatest qatest        215 2008-11-10 13:05 segments

 

If I move all the _10613.* files to another directory, indexing appears
to resume, but I suspect that 'finish' is not completing, so my next
batch run will hit the same error. I assume that if I delete these
files, the data they hold is simply never added to the index. Is that
true? I tried passing optimize => 1 to finish, but it made no
difference.
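
One way to spot segments in this state is to look for segment names that have loose files but no .cfs compound file, as _10613 does in the listing above. The following is a minimal sketch using only core Perl. The helper name segments_without_cfs is made up for illustration, and it assumes that every committed segment ends up as a single .cfs file, which the listing suggests but which is not confirmed here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): group the files in an
# invindex directory by segment name (the "_NNN" prefix) and return the
# segment names that have files but no ".cfs" compound file.
# Assumption: a fully committed segment is stored as one ".cfs" file.
sub segments_without_cfs {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open directory '$dir': $!";
    my %exts_for;
    for my $file ( readdir $dh ) {
        # Skip "segments", ".", ".." and anything else without a "_name.ext" shape.
        next unless $file =~ /^(_[0-9a-z]+)\.(\w+)$/;
        push @{ $exts_for{$1} }, $2;
    }
    closedir $dh;

    my @incomplete = grep {
        my $seg = $_;
        !grep { $_ eq 'cfs' } @{ $exts_for{$seg} };
    } keys %exts_for;
    my @sorted = sort @incomplete;
    return @sorted;
}
```

Calling segments_without_cfs('/home/qatest/kinosearch/invindex') in list context returns the suspect segment names. It only reads the directory and never moves or deletes anything.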

 

I create the indexer, process a batch of documents (e.g. 200 or so, each
a delete_docs_by_term followed by an add_doc), then call

$invindexer->finish;
$invindexer->_release_locks();
$invindexer = undef;

and exit.
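
For reference, the batch flow described above can be sketched as follows. This is an illustration, not the actual index.pl. The function name run_batch and the [ $term, $doc ] pair layout are invented for the example, and it calls only the three indexer methods named in this post (delete_docs_by_term, add_doc, finish). Dying without calling finish when an item fails is an assumption about the desired behaviour, chosen so that a failed batch is reported rather than silently half-written.

```perl
use strict;
use warnings;

# Hypothetical sketch of one batch run. $invindexer is any object that
# provides delete_docs_by_term, add_doc and finish; $pairs is an array
# reference of [ $term, $doc ] pairs (a layout made up for this example).
# Returns the number of documents added, or dies naming the failing item.
sub run_batch {
    my ( $invindexer, $pairs ) = @_;
    my $added = 0;
    for my $i ( 0 .. $#$pairs ) {
        my ( $term, $doc ) = @{ $pairs->[$i] };
        eval {
            # Remove any previous copy of the document, then add the new one.
            $invindexer->delete_docs_by_term($term);
            $invindexer->add_doc($doc);
            1;
        } or die "Batch aborted at item $i, finish() not called: $@";
        $added++;
    }
    # Commit once, after every item in the batch has been processed.
    $invindexer->finish;
    return $added;
}
```

Reporting the index of the failing item makes it easier to tell whether a run stopped before finish was reached.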

 

I read some other posts about update problems, e.g.
http://www.gossamer-threads.com/lists/kinosearch/discuss/3249, and
upgraded to the latest from svn (3883 at the time I did it). Both the
CPAN release and the svn version give exactly the same error message.

 

Any suggestions? 

 

Thanks in advance

 

Matt Williamson

 
