[KinoSearch] fast way to collect results? (KinoSearch .15)
Marvin Humphrey
marvin at rectangular.com
Sun Sep 2 17:52:47 PDT 2007
On Aug 22, 2007, at 12:48 PM, Matthew Berk wrote:
> I'm looking for the fastest way to collect the full set of results
> from a search. Here's what I'm using currently:
>
> my $hits = $index->search($query);
> my $collector = KinoSearch::Search::BitCollector->new();
> $hits->{searcher}->search_hit_collector(
>     hit_collector => $collector,
>     weight        => $hits->{weight}
> );
> my @result_ids = @{$collector->get_bit_vector()->to_arrayref};
>
> What I'm finding is that calling search_hit_collector takes MUCH
> longer than the initial search, far more than I'd expect. The
> initial search on my index takes something like .004s, while the
> search_hit_collector call brings processing time to 0.11s.
On first look, I was confused myself. However, now that I've had a
chance to peruse things more closely, I believe the slowdown is due
to the BitVector object continually reallocating as it stores
increasing document numbers. Try this:
    # $searcher is the Searcher object ($hits->{searcher} in your code).
    my $collector = KinoSearch::Search::BitCollector->new(
        capacity => $searcher->max_doc,
    );
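
To illustrate why presizing can matter, here is a toy model in plain Perl. It is only a sketch and not KinoSearch's BitVector: the ToyBitVector class, its grow-to-fit strategy, the grows counter, and the sample document numbers are all assumptions made up for the example. It counts how often a bit vector has to be resized when ascending document numbers are set, with and without a starting capacity.

```perl
use strict;
use warnings;

# Toy model only, NOT KinoSearch's BitVector. Capacity is counted in bits,
# and the vector grows just enough to hold each new bit (assumed strategy).
package ToyBitVector;

sub new {
    my ( $class, %args ) = @_;
    my $capacity = $args{capacity} || 0;
    return bless {
        capacity => $capacity,
        bits     => "\0" x int( ( $capacity + 7 ) / 8 ),
        grows    => 0,
    }, $class;
}

sub set {
    my ( $self, $num ) = @_;
    if ( $num >= $self->{capacity} ) {
        # Grow to fit: each higher number forces another resize.
        $self->{capacity} = $num + 1;
        my $bytes = int( ( $self->{capacity} + 7 ) / 8 );
        $self->{bits} .= "\0" x ( $bytes - length $self->{bits} );
        $self->{grows}++;
    }
    vec( $self->{bits}, $num, 1 ) = 1;
}

sub grows { $_[0]{grows} }

sub to_arrayref {
    my $self = shift;
    return [ grep { vec( $self->{bits}, $_, 1 ) } 0 .. $self->{capacity} - 1 ];
}

package main;

my $max_doc  = 10_000;                      # hypothetical index size
my @doc_nums = map { $_ * 10 } 0 .. 999;    # ascending document numbers

# No starting capacity: the vector is resized as the numbers climb.
my $unsized = ToyBitVector->new;
$unsized->set($_) for @doc_nums;

# Sized up front, analogous to passing capacity => $searcher->max_doc.
my $presized = ToyBitVector->new( capacity => $max_doc );
$presized->set($_) for @doc_nums;

printf "unsized: %d grows, presized: %d grows\n",
    $unsized->grows, $presized->grows;
```

In this model the unsized vector is resized once per collected document (1000 times), while the presized one is never resized. Both end up holding the same set of document numbers.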
PS: Just for the record, this is mostly private API we're accessing
here. Those document numbers may change with any index revision,
making them difficult to match up against external data.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch