[KinoSearch] _write_postings hanging in _02
Marvin Humphrey
marvin at rectangular.com
Wed Mar 7 19:35:11 PST 2007
On Mar 7, 2007, at 5:33 PM, Chris Nandor wrote:
> I added some entries to an index, deleted them all, then added them
> again, then deleted them again. On the second time through, on the
> delete, when I call finish on the writer, _write_postings ends up
> hanging.
Thanks for the report. I've reproduced and identified the problem,
and am working on a fix.
Your narrative brings up a related issue, recently uncovered. As
currently implemented, delete_by_term() only operates on documents
which were already in the index before the indexing session started.
So if you do this...
$invindexer->add_doc( { content => 'foo' } );
$invindexer->delete_by_term( content => 'foo' );
$invindexer->finish;
... the doc added just before the call to delete_by_term() won't be
deleted. That's because docs added via add_doc() exist in limbo until
finish() is called; it's not possible to search against them until
the segment is completed.
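A toy in-memory model may make the "limbo" concrete. The ToyIndexer class and its methods below are hypothetical. They mimic only the call pattern from the snippet above and are not KinoSearch's implementation or API. Docs from earlier sessions are searchable and can be matched by delete_by_term(), while docs added in the current session sit in a buffer that the deletion never looks at.

```perl
use strict;
use warnings;

# Hypothetical ToyIndexer: an in-memory model of the behavior described
# above. It is NOT KinoSearch's implementation or API.
package ToyIndexer;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        existing => $args{existing} || [],  # committed docs, searchable
        pending  => [],    # docs added this session, in limbo until finish()
        deleted  => {},    # positions in 'existing' marked as deleted
    };
    return bless $self, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    push @{ $self->{pending} }, $doc;    # buffered, not yet searchable
}

sub delete_by_term {
    my ( $self, $field, $term ) = @_;
    # Only committed docs can be searched, so only they can match.
    my $existing = $self->{existing};
    for my $i ( 0 .. $#$existing ) {
        my $value = $existing->[$i]{$field};
        $self->{deleted}{$i} = 1 if defined $value && $value eq $term;
    }
}

sub finish {
    my $self      = shift;
    my $existing  = $self->{existing};
    my @survivors = map { $existing->[$_] }
        grep { !$self->{deleted}{$_} } 0 .. $#$existing;
    # Pending docs become the new segment, untouched by any
    # delete_by_term() call made during this session.
    return [ @survivors, @{ $self->{pending} } ];
}

package main;

my $invindexer = ToyIndexer->new( existing => [ { content => 'foo' } ] );
$invindexer->add_doc( { content => 'foo' } );
$invindexer->delete_by_term( content => 'foo' );
my $docs = $invindexer->finish;
print scalar(@$docs), "\n";    # prints 1: the newly added 'foo' survives
```

The old 'foo' is gone, but the one added during the session comes through, which is the surprise described above.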
Changing things so that that doc gets deleted would be hard: it
would be necessary to cache each term, then go back and delete docs
later, but only ones added before that particular call to
delete_by_term(). The implementation would have to be elaborate and
therefore fragile. Either that or you'd have to write out the segment
before each call to delete_by_term(), which is a non-starter:
performance would nosedive.
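The "cache each term, then delete later" bookkeeping can be sketched in miniature, assuming each deletion simply records how many docs were buffered when it was issued. DeferredDeleteIndexer is hypothetical and not a proposed KinoSearch API. Because this toy keeps everything in memory, it does not reflect the complexity that makes the approach elaborate and fragile in a real segment-based indexer.

```perl
use strict;
use warnings;

# Hypothetical sketch of deferred deletion: each delete_by_term() call
# remembers the number of docs buffered at that moment, so at finish()
# it is applied only to buffered docs added before the call.
package DeferredDeleteIndexer;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        existing  => $args{existing} || [],  # committed docs
        pending   => [],                     # docs added this session
        deletions => [],   # [ field, term, pending count at call time ]
    }, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    push @{ $self->{pending} }, $doc;
}

sub delete_by_term {
    my ( $self, $field, $term ) = @_;
    push @{ $self->{deletions} },
        [ $field, $term, scalar @{ $self->{pending} } ];
}

sub _matches {
    my ( $doc, $field, $term ) = @_;
    return defined $doc->{$field} && $doc->{$field} eq $term;
}

sub finish {
    my $self = shift;
    my @out;
    # Committed docs: every recorded deletion applies.
    DOC: for my $doc ( @{ $self->{existing} } ) {
        for my $del ( @{ $self->{deletions} } ) {
            next DOC if _matches( $doc, $del->[0], $del->[1] );
        }
        push @out, $doc;
    }
    # Buffered docs: a deletion applies only if the doc came first.
    my $pending = $self->{pending};
    PENDING: for my $i ( 0 .. $#$pending ) {
        for my $del ( @{ $self->{deletions} } ) {
            next PENDING if $i < $del->[2]
                && _matches( $pending->[$i], $del->[0], $del->[1] );
        }
        push @out, $pending->[$i];
    }
    return \@out;
}

package main;

my $indexer = DeferredDeleteIndexer->new;
$indexer->add_doc( { content => 'foo' } );      # buffered at position 0
$indexer->delete_by_term( content => 'foo' );   # recorded with count 1
$indexer->add_doc( { content => 'foo' } );      # buffered at position 1
my $result = $indexer->finish;
print scalar(@$result), "\n";    # prints 1: only the later 'foo' survives
```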
My inclination is to document the method's actual behavior, but I
don't think that's enough -- the name delete_by_term suggests a
certain behavior (I'm thinking SQL DELETE with a WHERE clause) and
it's bad design to have it do something subtly different. Perhaps
renaming the method to something more descriptive, like
"delete_existing" would help.
I haven't decided what to do yet. Thoughts?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/