[KinoSearch] fast way to collect results? (KinoSearch .15)

Marvin Humphrey marvin at rectangular.com
Sun Sep 2 17:52:47 PDT 2007




On Aug 22, 2007, at 12:48 PM, Matthew Berk wrote:

> I'm looking for the fastest way to collect the full set of results  
> from a search. Here's what I'm using currently:
>
> my $hits = $index->search($query);
> my $collector = KinoSearch::Search::BitCollector->new();
> $hits->{searcher}->search_hit_collector(
>    hit_collector => $collector,
>    weight => $hits->{weight}
>    );
> my @result_ids = @{$collector->get_bit_vector()->to_arrayref};
>
> What I'm finding is that calling search_hit_collector takes MUCH  
> longer than I'd expect relative to the initial search. The  
> initial search on my index takes something like 0.004s, while the  
> search_hit_collector call brings processing time to 0.11s.

On first look, I was confused myself.  However, now that I've had a  
chance to peruse things more closely, I believe the slowdown is due  
to the BitVector object continually reallocating its storage as it  
records ever-larger document numbers.  Try giving the collector its  
full capacity up front:

   # $searcher is the searcher in use (in your snippet, $hits->{searcher})
   my $collector = KinoSearch::Search::BitCollector->new(
       capacity => $searcher->max_doc,
   );
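
To make the reallocation point concrete, here is a toy sketch in plain Perl. It is not KinoSearch's BitVector and says nothing definite about how the real one grows its buffer. ToyBitVector, its capacity argument, and the grows counter are all hypothetical names invented for illustration. The assumption being modeled is a bit set that grows just enough to hold each new bit, so that storing increasing document numbers forces a grow step on nearly every call, while a presized one never grows.

```perl
use strict;
use warnings;

# Toy model only: NOT KinoSearch's BitVector.  A growable bit set that
# counts how many times it has to enlarge its buffer.
package ToyBitVector;

sub new {
    my ( $class, %args ) = @_;
    my $capacity = $args{capacity} || 0;              # capacity in bits
    my $bytes    = int( ( $capacity + 7 ) / 8 );      # round up to bytes
    return bless {
        bits  => "\0" x $bytes,
        bytes => $bytes,
        grows => 0,
    }, $class;
}

sub set {
    my ( $self, $num ) = @_;
    my $needed = int( $num / 8 ) + 1;                 # bytes needed for bit $num
    if ( $needed > $self->{bytes} ) {
        # Assumed growth policy: extend just far enough for the new bit.
        $self->{bits} .= "\0" x ( $needed - $self->{bytes} );
        $self->{bytes} = $needed;
        $self->{grows}++;
    }
    vec( $self->{bits}, $num, 1 ) = 1;
}

sub grows { $_[0]{grows} }

package main;

# Hypothetical index size and hit list: every tenth document matches.
my $max_doc  = 100_000;
my @doc_nums = grep { $_ % 10 == 0 } 0 .. $max_doc - 1;

my $lazy = ToyBitVector->new;
$lazy->set($_) for @doc_nums;

my $presized = ToyBitVector->new( capacity => $max_doc );
$presized->set($_) for @doc_nums;

printf "grown on demand: %d grow steps\n", $lazy->grows;
printf "presized:        %d grow steps\n", $presized->grows;
```

In this model the on-demand vector grows once per hit, because each new document number lands past the end of the buffer, while the presized vector never grows at all.  Whether the real slowdown has exactly this shape depends on the actual growth policy, which the sketch only assumes.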

PS: Just for the record, this is mostly private API we're accessing  
here.  Those document numbers may change with any index revision,  
making them difficult to match up against external data.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



