[KinoSearch] more abstract interfaces to kinosearch

Nathan Kurz nate at verse.com
Mon Jul 2 12:32:49 PDT 2007


On 7/2/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> If you don't care about scoring and you can reuse Filters, you should
> use as many as practical.

I've been trying to understand the role of filters as well. My
impression is that they are a good stopgap while custom scoring is
hard (i.e., what Marvin says is probably good practice for the
present), but that once it becomes easy to subclass scorers, filters
can go away.  That is, they should be folded into the scoring
apparatus rather than wrapped around the HitCollector.

To my mind, the main disadvantage of filters is that they happen too
late, after all the rest of the query has already run.  Consider some
expensive query that we want to filter, say a search for a mention of
the band "The The" in the category "Music".

With the current filtering approach, we have to do an expensive phrase
match with position checks on every document that contains the word
'the', and only afterward is the category checked.  While the category
check against a bit vector is very efficient, the rest of the query is
terribly expensive and disk intensive.

Contrast this with an ANDScorer with ordered subclauses
'category:music' and 'text:"the-the"'.  Checking whether the category
field contains the term 'music' is really fast and efficient, and if
the music category is relatively small we only have to do the
expensive part of the query on a very small subset of the documents.
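
To make the difference concrete, here is a toy sketch in Python.  It is
not KinoSearch code: the DOCS table and the functions phrase_match,
filter_after and and_scorer are all hypothetical stand-ins, and a real
scorer would walk posting lists rather than a dict.  It only counts how
many times the expensive phrase check runs under each ordering.

```python
# Hypothetical toy corpus: doc id -> (category, text). Not KinoSearch's API.
DOCS = {
    1: ("music", "the the released a new album"),
    2: ("news", "the mayor said the the budget"),
    3: ("music", "the band played"),
    4: ("sports", "the the match"),
}


def phrase_match(text, phrase):
    """Stand-in for the expensive positional phrase check."""
    words = text.split()
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def filter_after(docs, category, phrase):
    """Filter style: phrase check on every candidate, category tested last."""
    checks, hits = 0, []
    for doc_id, (cat, text) in sorted(docs.items()):
        if phrase[0] not in text.split():
            continue  # doc does not even contain the first term
        checks += 1  # expensive work happens here
        if phrase_match(text, phrase) and cat == category:
            hits.append(doc_id)
    return hits, checks


def and_scorer(docs, category, phrase):
    """Ordered AND style: cheap category clause first, phrase check second."""
    checks, hits = 0, []
    for doc_id, (cat, text) in sorted(docs.items()):
        if cat != category:
            continue  # cheap clause rejects the doc before any phrase work
        checks += 1
        if phrase_match(text, phrase):
            hits.append(doc_id)
    return hits, checks
```

Both orderings return the same hits; the only difference is how many
documents reach the phrase check, which is the point of putting the
cheap, selective clause first.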

Sure, you say, but what about restricting to ranges?  Surely you need
a filter for that?  Currently, yes, but only because it's hard to
write a custom scorer.  I don't see any inherent limitation that would
prevent a custom scorer from dealing with ranges directly.  And if one
wanted, I don't see any reason that its Next doc method has to use an
index at all: it could work from a cached bit vector just as well.
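
As a rough illustration of that last idea, here is a Python sketch of a
scorer-like object whose next-doc method walks a cached bit vector
instead of an index.  The class BitVectorRangeScorer, its next()
method, and the helper build_range_bits are hypothetical names invented
for this sketch, not part of KinoSearch; a Python int stands in for the
bit vector.

```python
def build_range_bits(values, lo, hi):
    """Build the cacheable bit vector: bit n is set if doc n is in [lo, hi]."""
    bits = 0
    for doc_id, value in values.items():
        if lo <= value <= hi:
            bits |= 1 << doc_id
    return bits


class BitVectorRangeScorer:
    """Hypothetical scorer that iterates the set bits of a cached bit vector."""

    def __init__(self, bits):
        self.bits = bits  # Python int used as a bit vector
        self.doc = -1     # current doc id; -1 means "not started"

    def next(self):
        """Advance to the next matching doc; return False when exhausted."""
        remaining = self.bits >> (self.doc + 1)
        if remaining == 0:
            return False
        # Isolate the lowest set bit to find the distance to the next match.
        offset = (remaining & -remaining).bit_length() - 1
        self.doc += 1 + offset
        return True


def matching_docs(scorer):
    """Drain a scorer into a list of doc ids (convenience for the example)."""
    found = []
    while scorer.next():
        found.append(scorer.doc)
    return found
```

The bit vector can be built once per range and reused across searches,
which is the same caching benefit a filter offers, but an object like
this could sit inside an AND clause and drive the iteration rather than
being applied after the fact.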

> If that were already the case, somebody could whip up
> KSx::Search::RangeQuery and you could use it without waiting for me
> to act.

Yes, yes, that!  With that in mind, and thinking of the long term, are
there things that can currently be done with the Filter apparatus that
couldn't be done with a Scorer as efficiently or more so?  This isn't
to say that discriminating HitCollectors need to go away, only that
maybe they don't need to be part of the primary API.

Nathan Kurz
nate at verse.com


