[Kinosearch] Problem during delete_docs_by_term
Matt Williamson
matt at sanasecurity.com
Mon Nov 10 19:18:19 PST 2008
I am indexing a large number of files, in batches. In order to allow the
files to change and be re-indexed, I call
$invindexer->delete_docs_by_term($term);
before each insertion.
I managed to index around 140000 files, but then I hit a problem with
the following message:
Couldn't open file '/home/qatest/kinosearch/invindex/_10613.f0': File exists at /usr/local/lib/perl/5.8.8/KinoSearch/Store/FSInvIndex.pm line 88
KinoSearch::Store::FSInvIndex::open_outstream('KinoSearch::Store::FSInvIndex=HASH(0x88829bc)', '_10613.f0') called at /usr/local/lib/perl/5.8.8/KinoSearch/Index/SegWriter.pm line 40
KinoSearch::Index::SegWriter::init_instance('KinoSearch::Index::SegWriter=HASH(0x8c9e634)') called at /usr/local/lib/perl/5.8.8/KinoSearch/Util/Class.pm line 31
KinoSearch::Util::Class::new('KinoSearch::Index::SegWriter', 'invindex', 'KinoSearch::Store::FSInvIndex=HASH(0x88829bc)', 'seg_name', '_10613', 'finfos', 'KinoSearch::Index::FieldInfos=HASH(0x8c9e1d8)', 'field_sims', 'HASH(0x8882ee4)', ...) called at /usr/local/lib/perl/5.8.8/KinoSearch/InvIndexer.pm line 152
KinoSearch::InvIndexer::_delayed_init('KinoSearch::InvIndexer=HASH(0x8882b78)') called at /usr/local/lib/perl/5.8.8/KinoSearch/InvIndexer.pm line 262
KinoSearch::InvIndexer::delete_docs_by_term('KinoSearch::InvIndexer=HASH(0x8882b78)', 'KinoSearch::Index::Term=HASH(0x8882cbc)') called at index.pl line 185
main::handleJob('KinoSearch::InvIndexer=HASH(0x8882b78)', '26ba90ca1fa8354ffb00f43b59b58223f2f07b35') called at index.pl line 80
eval {...} called at index.pl line 79
The contents of the invindex directory are the following:
-rw-r--r-- 1 qatest qatest 2360312453 2008-10-20 20:44 _10455.cfs
-rw-r--r-- 1 qatest qatest 2331193220 2008-10-20 22:51 _10583.cfs
-rw-r--r-- 1 qatest qatest 351002378 2008-10-20 23:06 _10599.cfs
-rw-r--r-- 1 qatest qatest 13298110 2008-10-20 23:07 _10600.cfs
-rw-r--r-- 1 qatest qatest 41360185 2008-10-20 23:08 _10601.cfs
-rw-r--r-- 1 qatest qatest 20375600 2008-10-20 23:09 _10602.cfs
-rw-r--r-- 1 qatest qatest 60127418 2008-10-20 23:11 _10603.cfs
-rw-r--r-- 1 qatest qatest 14264840 2008-10-20 23:12 _10604.cfs
-rw-r--r-- 1 qatest qatest 14161480 2008-10-20 23:12 _10605.cfs
-rw-r--r-- 1 qatest qatest 14208046 2008-10-20 23:13 _10606.cfs
-rw-r--r-- 1 qatest qatest 14408859 2008-10-20 23:14 _10607.cfs
-rw-r--r-- 1 qatest qatest 15721448 2008-10-20 23:15 _10608.cfs
-rw-r--r-- 1 qatest qatest 28738202 2008-10-20 23:16 _10609.cfs
-rw-r--r-- 1 qatest qatest 47820729 2008-10-20 23:17 _10610.cfs
-rw-r--r-- 1 qatest qatest 16592842 2008-10-20 23:18 _10611.cfs
-rw-r--r-- 1 qatest qatest 89080573 2008-11-10 13:05 _10612.cfs
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f0
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f1
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f10
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f11
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f12
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f2
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f3
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f4
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f5
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f6
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f7
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f8
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.f9
-rw-r--r-- 1 qatest qatest 8111867 2008-11-10 13:06 _10613.fdt
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.fdx
-rw-r--r-- 1 qatest qatest 0 2008-11-10 13:06 _10613.srt
-rw-r--r-- 1 qatest qatest 4111442230 2008-10-11 01:17 _228.cfs
-rw-r--r-- 1 qatest qatest 4448882741 2008-10-19 17:08 _9338.cfs
-rw-r--r-- 1 qatest qatest 215 2008-11-10 13:05 segments
If I move all the _10613.* files to another directory, indexing appears
to resume, but I suspect that 'finish' is not completing, so on my next
batch run I will hit the same problem again. I assume that if I delete
these files, the data they hold is simply never added to the index.
Is that true? I tried passing optimize => 1 to finish, but it made no
difference.
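To spot leftovers like the _10613.* files before starting a batch, a check along the following lines might help. This is only a sketch: find_unfinished_segments is a hypothetical helper, not part of KinoSearch, and it assumes that a finished segment is always stored as a single .cfs file, an assumption based only on the listing above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch.
# Assumption: a finished segment is stored as one "_NAME.cfs" file, so any
# segment that has loose files but no .cfs was left behind by a run that
# did not complete.
sub find_unfinished_segments {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open directory '$dir': $!";

    my ( %has_cfs, %loose );
    for my $name ( readdir $dh ) {
        # Match names such as "_10613.f0" or "_10612.cfs"; files without a
        # leading underscore and an extension (e.g. "segments") are skipped.
        next unless $name =~ /^(_[0-9a-z]+)\.(\w+)$/;
        my ( $seg, $ext ) = ( $1, $2 );
        if ( $ext eq 'cfs' ) {
            $has_cfs{$seg} = 1;
        }
        else {
            push @{ $loose{$seg} }, $name;
        }
    }
    closedir $dh;

    # Keep only the segments that have loose files and no compound file.
    my %unfinished;
    for my $seg ( keys %loose ) {
        next if $has_cfs{$seg};
        $unfinished{$seg} = [ sort @{ $loose{$seg} } ];
    }
    return \%unfinished;
}
```

Calling find_unfinished_segments('/home/qatest/kinosearch/invindex') returns a hash reference keyed by segment name, each value being the sorted list of loose files for that segment. It only reports the files; whether they can safely be moved or deleted is exactly the open question here.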
I create the indexer, process a batch of documents (e.g. 200 or so, each
a delete_docs_by_term followed by an add_doc), then call
$invindexer->finish;
$invindexer->_release_locks();
$invindexer = undef;
and exit.
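The batch flow described above can be sketched roughly as follows. This is an illustration rather than the actual index.pl: run_batch and the 'term' and 'doc' keys are names invented for the sketch, and the only methods it calls on the indexer are the three named in this post (delete_docs_by_term, add_doc, finish).

```perl
use strict;
use warnings;

# Hypothetical driver: $invindexer is any object providing the three
# methods named in this post; $jobs is an array ref of hashes with
# 'term' and 'doc' keys (key names invented for this sketch).
sub run_batch {
    my ( $invindexer, $jobs ) = @_;
    for my $job (@$jobs) {
        # Remove any earlier copy of the document, then add the new one.
        $invindexer->delete_docs_by_term( $job->{term} );
        $invindexer->add_doc( $job->{doc} );
    }
    # Commit the whole batch once, after the last document.
    $invindexer->finish;
    return scalar @$jobs;
}
```

If any call dies partway through, finish is never reached, and any files written for the in-progress segment would presumably stay on disk.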
I read some other posts about update problems, e.g.
http://www.gossamer-threads.com/lists/kinosearch/discuss/3249, and
upgraded to the latest from svn (revision 3883 at the time). Both the
CPAN version and the svn version produce exactly the same error message.
Any suggestions?
Thanks in advance
Matt Williamson