[KinoSearch] _write_postings hanging in _02

Marvin Humphrey marvin at rectangular.com
Wed Mar 7 19:35:11 PST 2007


On Mar 7, 2007, at 5:33 PM, Chris Nandor wrote:

> I added some entries to an index, deleted them all, then added them
> again, then deleted them again.  On the second time through, on the
> delete, when I call finish on the writer, _write_postings ends up
> hanging.

Thanks for the report.  I've reproduced and identified the problem,  
and am working on a fix.

Your narrative brings up a related issue, recently uncovered.  As  
currently implemented, delete_by_term() only operates on documents  
which were already in the index before the indexing session started.   
So if you do this...

   $invindexer->add_doc( { content => 'foo' } );
   $invindexer->delete_by_term( content => 'foo' );
   $invindexer->finish;

... the doc added just before the call to delete_by_term() won't be
deleted.  That's because docs added via add_doc() exist in limbo until
finish() is called; they can't be searched, and so can't be matched by
a term, until the segment is completed.
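
A toy model may make the limbo state concrete.  ToyIndexer below is
hypothetical and is not KinoSearch code: it assumes committed docs and
buffered docs live in two separate lists, and it matches a term by
exact string equality on a field, which is a simplification.

```perl
use strict;
use warnings;

# Toy model only: ToyIndexer is hypothetical and is not KinoSearch code.
package ToyIndexer;

sub new {
    my ( $class, @committed ) = @_;
    # 'committed' stands in for segments already in the index;
    # 'pending' stands in for docs buffered by add_doc.
    return bless { committed => [@committed], pending => [] }, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    push @{ $self->{pending} }, $doc;
}

sub delete_by_term {
    my ( $self, $field, $term ) = @_;
    # Only committed docs can be matched; pending docs are never examined.
    $self->{committed} = [
        grep { !defined( $_->{$field} ) || $_->{$field} ne $term }
            @{ $self->{committed} }
    ];
}

sub finish {
    my $self = shift;
    # Pending docs become visible only at this point.
    push @{ $self->{committed} }, @{ $self->{pending} };
    $self->{pending} = [];
    return scalar @{ $self->{committed} };
}

package main;

# One matching doc is already in the index before the session starts.
my $indexer = ToyIndexer->new( { content => 'foo' } );
$indexer->add_doc( { content => 'foo' } );
$indexer->delete_by_term( content => 'foo' );
my $remaining = $indexer->finish;
print "docs remaining: $remaining\n";    # prints "docs remaining: 1"
```

The pre-existing doc is removed, while the one added during the session
survives the delete.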

Changing things so that the doc gets deleted would be hard -- it
would be necessary to cache each term, then go back and delete docs
later, but only ones added before that particular call to
delete_by_term().  The implementation would have to be elaborate and
therefore fragile.  Either that or you'd have to write out the segment
before each call to delete_by_term(), which is a non-starter --
performance would nosedive.
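
For illustration, the "cache each term" approach might look roughly
like the sketch below.  DeferredDeleter and its 'limit' bookkeeping are
hypothetical names, not KinoSearch code, and the sketch only covers
docs buffered in the current session.  Term matching is again
simplified to exact equality on a field.

```perl
use strict;
use warnings;

# Hypothetical sketch, not KinoSearch code: queue each deletion together
# with the number of docs buffered at the moment of the call.
package DeferredDeleter;

sub new {
    my ($class) = @_;
    return bless { pending => [], deletions => [] }, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    push @{ $self->{pending} }, $doc;
}

sub delete_by_term {
    my ( $self, $field, $term ) = @_;
    # 'limit' records how many buffered docs this deletion may touch.
    my $limit = scalar @{ $self->{pending} };
    push @{ $self->{deletions} },
        { field => $field, term => $term, limit => $limit };
}

sub finish {
    my $self = shift;
    my @kept;
    for my $i ( 0 .. $#{ $self->{pending} } ) {
        my $doc = $self->{pending}[$i];
        # A deletion applies only to docs added before it was requested.
        my $deleted = grep {
            $i < $_->{limit}
                && defined( $doc->{ $_->{field} } )
                && $doc->{ $_->{field} } eq $_->{term}
        } @{ $self->{deletions} };
        push @kept, $doc unless $deleted;
    }
    return \@kept;
}

package main;

my $indexer = DeferredDeleter->new;
$indexer->add_doc( { content => 'foo' } );    # added before the delete
$indexer->delete_by_term( content => 'foo' );
$indexer->add_doc( { content => 'foo' } );    # added after the delete
my $kept = $indexer->finish;
print "docs kept: ", scalar @$kept, "\n";     # prints "docs kept: 1"
```

Even in this toy form, every buffered doc is checked against every
queued deletion at finish time, which hints at the extra bookkeeping a
real implementation would need.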

My inclination is to document the method's actual behavior, but I  
don't think that's enough -- the name delete_by_term suggests a  
certain behavior (I'm thinking SQL DELETE with a WHERE clause) and  
it's bad design to have it do something subtly different.  Perhaps
renaming the method to something more descriptive, like
"delete_existing", would help.

I haven't decided what to do yet.  Thoughts?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




