[KinoSearch] Freshness queries
Marvin Humphrey
marvin at rectangular.com
Mon Aug 10 11:51:35 PDT 2009
On Mon, Aug 10, 2009 at 03:18:52PM +0200, Nick Wellnhofer wrote:
> I'd like to boost search results depending on their age. I.e. newer
> documents get promoted, older ones get demoted. AFAICS this is currently
> not possible.
Right -- as of now you can sort by "date" or any other sortable field, but not
influence scores so that sort order is determined by a combination of
relevance and "date".
> I think I could implement it quite easily by stealing code
> from MatchAllQuery and RangeQuery.
>
> I would propose the parameters fresh_age, fresh_score, stale_age,
> stale_score. The score of documents newer than fresh_age would be
> multiplied by fresh_score. The score of documents older than stale_age
> would be multiplied by stale_score. For documents in between I would
> interpolate linearly.
Scores within compound matchers are generally added rather than multiplied.
So, this structure wouldn't work...
    my $and_query = KinoSearch::Search::ANDQuery->new(
        children => [ $user_query, $freshness_query ],
    );
... but something like this would:
    my $freshness_query = KSx::Search::FreshnessQuery->new(
        child => $user_query,
    );
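To illustrate why adding and multiplying behave differently, here is a toy calculation. The relevance scores and the freshness value are invented for the example and don't come from any real index.

```perl
use strict;
use warnings;

# Invented relevance scores for a strong match and a weak match.
my @relevance = ( 4.0, 0.5 );

# Invented freshness value for a recent document.
my $freshness = 2.0;

# Additive combination: every score shifts by the same constant,
# regardless of how relevant the document was.
my @added = map { $_ + $freshness } @relevance;

# Multiplicative combination: the boost scales with relevance.
my @multiplied = map { $_ * $freshness } @relevance;

print "added:      @added\n";
print "multiplied: @multiplied\n";
```

With addition, the weak match gains as much as the strong one; with multiplication, the boost stays proportional to the score the child query produced.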
The next() and advance() methods for FreshnessScorer would be implemented by
invoking next() or advance() on the child Matcher.
    sub next {
        my $self = shift;
        return $child{$$self}->next;
    }
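advance() would presumably follow the same pattern. Below is a rough, self-contained sketch of the delegation. MockMatcher and MockFreshnessScorer are hypothetical stand-ins written only for this example, not KinoSearch classes, and the convention that an exhausted matcher returns 0 is an assumption of the mock.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a child Matcher: iterates a sorted list of doc ids.
package MockMatcher;

sub new {
    my ( $class, @doc_ids ) = @_;
    return bless { doc_ids => [@doc_ids], tick => -1 }, $class;
}

# Current doc id, or 0 when not positioned on a document (mock convention).
sub get_doc_id {
    my $self = shift;
    my $tick = $self->{tick};
    return 0 if $tick < 0 || $tick > $#{ $self->{doc_ids} };
    return $self->{doc_ids}[$tick];
}

# Step to the next doc id.
sub next {
    my $self = shift;
    $self->{tick}++;
    return $self->get_doc_id;
}

# Skip ahead to the first doc id >= $target.
sub advance {
    my ( $self, $target ) = @_;
    while ( my $doc_id = $self->next ) {
        return $doc_id if $doc_id >= $target;
    }
    return 0;
}

# Hypothetical wrapper using inside-out storage keyed by $$self.
package MockFreshnessScorer;

my %child;
my $next_id = 0;

sub new {
    my ( $class, %args ) = @_;
    my $id   = ++$next_id;
    my $self = bless \$id, $class;
    $child{$$self} = $args{child};
    return $self;
}

# Both iteration methods simply delegate to the child.
sub next {
    my $self = shift;
    return $child{$$self}->next;
}

sub advance {
    my ( $self, $target ) = @_;
    return $child{$$self}->advance($target);
}

sub DESTROY {
    my $self = shift;
    delete $child{$$self};
}

package main;

my $scorer = MockFreshnessScorer->new(
    child => MockMatcher->new( 3, 7, 12, 20 ),
);
my $first  = $scorer->next;
my $jumped = $scorer->advance(10);
my $done   = $scorer->advance(100);
print "$first $jumped $done\n";
```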
What's left is accessing the field's sort cache during score().
    sub score {
        my $self = shift;
        my $doc_id = $child{$$self}->get_doc_id;
        my $score = $child{$$self}->score;
        my $date_stamp_field_cache = $date_stamp_field_cache{$$self};
        my $date_stamp = $date_stamp_field_cache->value($doc_id);
        if ($date_stamp < $stale_age{$$self}) {
            $score *= $stale_score{$$self};
        }
        elsif ($date_stamp > $fresh_age{$$self}) {
            $score *= $fresh_score{$$self};
        }
        else {
            # interpolate
            ...
        }
        return $score;
    }
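One way the interpolation branch might be filled in is sketched below as a standalone function. freshness_multiplier is a hypothetical helper, not part of KinoSearch. It assumes that date stamps are plain numbers, that stale_age and fresh_age are cutoff date stamps as in the comparison above, and that stale_age is less than fresh_age.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the factor by which to multiply a score.
sub freshness_multiplier {
    my (%args) = @_;
    my ( $date_stamp, $stale_age, $fresh_age, $stale_score, $fresh_score )
        = @args{qw( date_stamp stale_age fresh_age stale_score fresh_score )};

    # Outside the interpolation window, use the fixed multipliers.
    return $stale_score if $date_stamp <= $stale_age;
    return $fresh_score if $date_stamp >= $fresh_age;

    # Inside the window: how far along we are from stale (0) to fresh (1).
    my $fraction = ( $date_stamp - $stale_age ) / ( $fresh_age - $stale_age );

    # Linear interpolation between the two multipliers.
    return $stale_score + $fraction * ( $fresh_score - $stale_score );
}

# Invented parameters for illustration.
my %params = (
    stale_age   => 100,
    fresh_age   => 200,
    stale_score => 0.5,
    fresh_score => 1.5,
);

my $old    = freshness_multiplier( %params, date_stamp => 50 );
my $middle = freshness_multiplier( %params, date_stamp => 150 );
my $recent = freshness_multiplier( %params, date_stamp => 250 );
print "$old $middle $recent\n";
```

Inside score(), the else branch would then multiply $score by the returned factor.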
SortCache doesn't presently have a public API, but this is the kind of
application that would drive us to expose one.
(Seems like maybe it ought to be called "FieldCache" instead, since it will be
used for more than sorting...)
> What do you think?
This is exactly the kind of thing that would work well as a pure-Perl KSx
module, and eventually a C-based KSx module. I think it's probably too
specific for core -- but then I'd be saying that about just about any Query
module, because I think it's important to keep the core at a manageable size.
In the short term, I'm happy to work to expose a field cache API. If the
pure-Perl implementation performs well enough to meet your present needs, that
would be great. If not, then we have to think about what it takes to get a C
API exposed.
Marvin Humphrey