[KinoSearch] _write_postings hanging in _02

Marvin Humphrey marvin at rectangular.com
Wed Mar 7 21:26:53 PST 2007


On Mar 7, 2007, at 8:36 PM, Chris Nandor wrote:

> I was updating my searcher code, and previously I had been setting the
> offset passed to seek() using $hits->total_hits.

I don't understand what this is for, unless it's a naive way to
retrieve the last (worst) matches.

FYI, in KS 0.15 and earlier, calling total_hits before seek()  
actually triggers a call to seek(0, 100) internally.  It's not  
possible to know how many documents a query matches without running  
the whole scoring routine.

Note that there's not much difference between calling seek(0, 10) and  
seek(0, 100).  The only change is the size of a priority queue; the  
cost of matching and scoring remains the same.
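
To illustrate the point, here is a rough sketch in Python of a bounded priority queue collecting hits. This is not KinoSearch code: collect_top_hits and its arguments are hypothetical names invented for the example, and the check on num_wanted is just one possible way to reject a zero-sized queue. Every match is scored and counted regardless of num_wanted; only the number of entries retained changes.

```python
import heapq

def collect_top_hits(scored_docs, num_wanted):
    """Return (total_hits, top hits best-first) from (doc_num, score) pairs.

    Hypothetical helper, not part of KinoSearch.
    """
    if num_wanted <= 0:
        raise ValueError("num_wanted must be a positive integer")
    queue = []       # min-heap of (score, doc_num), never longer than num_wanted
    total_hits = 0
    for doc_num, score in scored_docs:
        total_hits += 1              # every match is visited and counted
        entry = (score, doc_num)
        if len(queue) < num_wanted:
            heapq.heappush(queue, entry)
        elif entry > queue[0]:
            # Better than the current worst retained hit: swap it in.
            heapq.heapreplace(queue, entry)
    top = sorted(queue, reverse=True)
    return total_hits, [(doc_num, score) for score, doc_num in top]

matches = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7)]
total, top = collect_top_hits(matches, 2)
```

With either a small or a large num_wanted, the loop runs once per match and total_hits comes out the same; only the size of the heap differs.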

KS 0.15 also performed unnecessary seeks in some cases -- for  
instance, calling seek(0, 10) when you've already called seek(0, 100)  
shouldn't be necessary, but KS was doing that if you called seek(0,  
10) after total_hits().  This has changed in 0.20.  Credit to Henry  
for identifying the issue.

> But now I can't get that
> before I call seek(),

While you were "able" to get it before, you were still paying for the
search twice.

> and as a result, I was passing num_wanted => 0 to
> search().  This bug in my code caused a bus error in KS.

Heh.  I'll go fix that.

> That said, I wonder if 0 or something similar might be a way to denote
> "send everything."

There are memory and performance implications for setting a large  
num_wanted.  Hits are collected in a priority queue, and the size of  
the queue is determined by num_wanted.

> My workaround now is to send $reader->num_docs instead,
> which is fine too, I think.

That will work -- sort of.  If your index is large, that's gonna be a  
huge priority queue.  Each element in the queue is either a ScoreDoc  
(16 bytes) or, when sorting, a FieldDoc (20 bytes presently, and  
probably about to grow to take in an arbitrary string).
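
As a back-of-the-envelope check, the arithmetic can be sketched in Python. The 16 and 20 byte figures come from the paragraph above; the function name and the 10-million-document index are assumptions made up for the example, and the result counts element payload only, ignoring any overhead from the queue itself or the allocator.

```python
SCORE_DOC_BYTES = 16   # per-element size quoted above
FIELD_DOC_BYTES = 20   # per-element size when sorting, "presently"

def queue_payload_bytes(num_wanted, sorting=False):
    """Estimate payload bytes for a queue sized to num_wanted (hypothetical helper)."""
    per_element = FIELD_DOC_BYTES if sorting else SCORE_DOC_BYTES
    return num_wanted * per_element

# Assumed index size for illustration: 10 million documents.
num_docs = 10_000_000
plain_bytes = queue_payload_bytes(num_docs)
sorted_bytes = queue_payload_bytes(num_docs, sorting=True)
```

Under those assumptions, passing num_docs as num_wanted comes to 160,000,000 bytes of ScoreDocs, or 200,000,000 bytes of FieldDocs, versus 160 bytes for num_wanted => 10.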

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




