[KinoSearch] Queries with large number of hits.
Marvin Humphrey
marvin at rectangular.com
Fri Sep 19 17:28:45 PDT 2008
On Sep 19, 2008, at 11:25 AM, Nathan Kurz wrote:
> The third thing (tiny, but perhaps easy to fix) is that
> Scorepost_read_record is spending 40% of its time in REALLOC. Is the
> enlarged position buffer not getting reused for some reason?
Oi, good catch! With one line of code, we see a 10-20% search-time
speed improvement:
Index: ../c_src/KinoSearch/Posting/ScorePosting.c
===================================================================
--- ../c_src/KinoSearch/Posting/ScorePosting.c (revision 3882)
+++ ../c_src/KinoSearch/Posting/ScorePosting.c (working copy)
@@ -145,6 +145,7 @@
     num_prox = self->freq;
     if (num_prox > self->prox_cap) {
         self->prox = REALLOCATE(self->prox, num_prox, u32_t);
+        self->prox_cap = num_prox;
     }
     positions = self->prox;
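
To illustrate why the missing assignment matters, here is a minimal standalone sketch, not KinoSearch code. DemoPosting, demo_reserve, and demo_count_reallocs are hypothetical names, plain realloc stands in for the REALLOCATE macro, and the series of record sizes is made up. If the capacity is never recorded, the "num_prox > prox_cap" test succeeds for every non-empty record and the buffer is reallocated each time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the posting object; only the fields that
 * appear in the diff (freq, prox, prox_cap) are modeled. */
typedef struct {
    uint32_t  freq;      /* number of positions in the current record */
    uint32_t *prox;      /* reusable position buffer */
    uint32_t  prox_cap;  /* capacity of prox, in elements */
} DemoPosting;

unsigned realloc_calls = 0;  /* counts trips through realloc */

/* Grow the buffer only when the current record needs more room.
 * record_cap selects the behavior: 0 mimics the code before the
 * patch (capacity never stored), 1 mimics the code after it. */
uint32_t*
demo_reserve(DemoPosting *self, int record_cap)
{
    uint32_t num_prox = self->freq;
    if (num_prox > self->prox_cap) {
        uint32_t *grown = realloc(self->prox, num_prox * sizeof(uint32_t));
        assert(grown != NULL);
        self->prox = grown;
        realloc_calls++;
        if (record_cap) {
            self->prox_cap = num_prox;  /* the one-line fix */
        }
    }
    return self->prox;
}

/* Feed a fixed, made-up series of record sizes through demo_reserve
 * and report how many reallocations were needed. */
unsigned
demo_count_reallocs(int record_cap)
{
    static const uint32_t freqs[] = { 4, 2, 8, 8, 3, 8, 1, 5 };
    DemoPosting posting = { 0, NULL, 0 };
    size_t i;

    realloc_calls = 0;
    for (i = 0; i < sizeof(freqs) / sizeof(freqs[0]); i++) {
        posting.freq = freqs[i];
        demo_reserve(&posting, record_cap);
    }
    free(posting.prox);
    return realloc_calls;
}
```

With this series, demo_count_reallocs(0) reallocates for all 8 records, while demo_count_reallocs(1) reallocates only twice (when the size first reaches 4, then 8).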
> ps. The directions for building the Reuters benchmark index seem out
> of date. '-Mblib' no longer finds the uninstalled KinoSearch.so in
> the parent hierarchy.
I'll try to get updates committed later this evening.
Incidentally, although there are c. 19,000 unique documents in the
Reuters corpus, the indexing benchmarker will loop over them again if
you specify a larger number, e.g. --docs=1000000.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/