[KinoSearch] opening up the scorers

Nathan Kurz nate at verse.com
Thu Apr 17 22:15:11 PDT 2008



On Thu, Apr 17, 2008 at 4:10 PM, Marvin Humphrey <marvin at rectangular.com> wrote:
>  On Apr 17, 2008, at 10:43 AM, Nathan Kurz wrote:
> > I still can't keep straight how Queries and Scorers relate.
> >
>
>  Query is the abstract specification.  It's little more than a parse tree
> for a search string[1][2].
>
>  Scorer does the hard work of actually scoring documents.  It's the
> practical application of a Query, where Query meets the real world.

I appreciate the details.  I'm not being willfully obtuse, I just
haven't been thinking about this for a while.

So the tree of Queries is used to build a tree (typically) of Scorers,
and each Query class has a one-to-one relationship with a Scorer
class?  Is there any query-specific code in the Query beyond the
name of the Scorer class?  My desire for simplicity makes me wonder if
one could just have a single 'QueryNode' class that instantiates a
customizable Scorer.
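
To make the 'QueryNode' idea concrete, here is a minimal sketch in
Python (for brevity, not KinoSearch code).  Every name in it is
hypothetical: QueryNode, TermScorer, SumOrScorer, and the toy
{term: {doc_id: score}} index are assumptions for illustration only.
It shows one generic node class whose only job is to remember which
Scorer to build, and a compile step that turns the node tree into a
Scorer tree.

```python
# Hypothetical sketch; none of these names are KinoSearch API.

class TermScorer:
    """Leaf scorer: reads scores from a toy {doc_id: score} posting map."""
    def __init__(self, term, index):
        self.postings = index.get(term, {})

    def score(self, doc_id):
        # 0.0 stands in for "no match" in this toy model.
        return self.postings.get(doc_id, 0.0)


class SumOrScorer:
    """Inner scorer: sums the scores of its children (a plain OR)."""
    def __init__(self, children):
        self.children = children

    def score(self, doc_id):
        return sum(child.score(doc_id) for child in self.children)


class QueryNode:
    """Generic parse-tree node: knows only which scorer to build."""
    def __init__(self, make_scorer, children=()):
        self.make_scorer = make_scorer
        self.children = list(children)

    def compile(self, index):
        # Build the child scorers first, then hand them to this node's factory.
        child_scorers = [child.compile(index) for child in self.children]
        return self.make_scorer(index, child_scorers)


def term(text):
    """Leaf node: its factory ignores children and builds a TermScorer."""
    return QueryNode(lambda index, _children: TermScorer(text, index))


def any_of(*nodes):
    """Inner node: its factory wraps the compiled children in a SumOrScorer."""
    return QueryNode(lambda _index, children: SumOrScorer(children), nodes)


index = {"a": {1: 0.5, 2: 1.0}, "b": {2: 0.25}}
scorer = any_of(term("a"), term("b")).compile(index)
```

In this sketch the only per-node state is the factory, which is the
"name of the Scorer class" in disguise.  Whether real Query classes
carry more than that is exactly the question above.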

>  People could potentially publish KSx subclasses that compile down to
> scorers that behave differently from those in core.

For a custom OrScorer that I'm interested in (short-circuit OR,
returns the score of the first match of the ordered children) what
would I subclass and how would I call it?  My instinct is it would be
simplest just to build the Scorer tree myself and stick my
FirstMatchScorer in at the appropriate places.  But what would the
right way be?
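
For reference, the behavior I mean by short-circuit OR can be
sketched like this, in Python for brevity.  FirstMatchScorer is my
own working name, and MapScorer plus the convention that score()
returns None for "no match" are assumptions made up for this
example, not anything from KinoSearch.

```python
# Hypothetical sketch of a short-circuit OR; not KinoSearch API.

class MapScorer:
    """Toy child scorer backed by a {doc_id: score} map."""
    def __init__(self, scores):
        self.scores = scores

    def score(self, doc_id):
        # None signals that this child does not match the document.
        return self.scores.get(doc_id)


class FirstMatchScorer:
    """Returns the score of the first child, in order, that matches."""
    def __init__(self, children):
        self.children = list(children)

    def score(self, doc_id):
        for child in self.children:
            result = child.score(doc_id)
            if result is not None:
                # Short-circuit: later children are never consulted.
                return result
        return None


first = FirstMatchScorer([
    MapScorer({1: 0.9}),
    MapScorer({1: 0.2, 2: 0.4}),
])
```

Here document 1 gets the first child's score and the second child's
score for it is ignored, while document 2 falls through to the second
child.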

>   ANDQuery    - Search for 'a AND b'.
>   ORQuery     - Search for 'a OR b'.
>   ANDNOTQuery - Search for 'a AND NOT b'.

Why not just have a NotQuery?  It seems like it would be more general,
and one could always build the 'a AND NOT b' using an AND and a NOT.
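
A minimal sketch of that composition, in Python and ignoring scores
entirely.  TermMatcher, NotMatcher, and AndMatcher are hypothetical
names invented for this example, and the document sets are toy data.

```python
# Hypothetical matcher classes for illustration; not KinoSearch API.

class TermMatcher:
    """Matches the documents listed for one term in a toy index."""
    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)

    def matches(self, doc_id):
        return doc_id in self.doc_ids


class NotMatcher:
    """Matches exactly the documents its child does not match."""
    def __init__(self, child):
        self.child = child

    def matches(self, doc_id):
        return not self.child.matches(doc_id)


class AndMatcher:
    """Matches only when every child matches."""
    def __init__(self, children):
        self.children = list(children)

    def matches(self, doc_id):
        return all(child.matches(doc_id) for child in self.children)


a = TermMatcher([1, 2, 3])
b = TermMatcher([2])

# 'a AND NOT b' built from a general AND and a general NOT.
a_and_not_b = AndMatcher([a, NotMatcher(b)])
hits = [doc_id for doc_id in range(1, 5) if a_and_not_b.matches(doc_id)]
```

One caveat with this layering: a free-standing NOT matches every
document that lacks b, so it is only cheap when a sibling under the
AND narrows the candidates first.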

>  ANDORQuery is the odd one out, because it doesn't really mean 'a AND/OR b'.
> What it does is combine one optional clause and one required clause.

Ditto.  Why not just layer an AND and an OR?  Or an AND with a
hypothetical 'OptionalTermScorer' that returns some non-zero score if
the term is not found?
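
Sketched in Python, the second variant might look like the following.
All names are hypothetical (OptionalScorer, SumAndScorer, MapScorer),
and the sketch assumes score() returns None for "no match", so the
fallback score only has to be something other than None.

```python
# Hypothetical sketch of required + optional via a plain AND; not KinoSearch API.

class MapScorer:
    """Toy scorer backed by a {doc_id: score} map; None means no match."""
    def __init__(self, scores):
        self.scores = scores

    def score(self, doc_id):
        return self.scores.get(doc_id)


class OptionalScorer:
    """Always matches: falls back to a default score when the child misses."""
    def __init__(self, child, default):
        self.child = child
        self.default = default

    def score(self, doc_id):
        result = self.child.score(doc_id)
        return self.default if result is None else result


class SumAndScorer:
    """Matches only if every child matches; the score is the sum."""
    def __init__(self, children):
        self.children = list(children)

    def score(self, doc_id):
        total = 0.0
        for child in self.children:
            result = child.score(doc_id)
            if result is None:
                return None  # one required child missed, so no match
            total += result
        return total


required = MapScorer({1: 1.0, 2: 0.5})
optional = MapScorer({1: 0.25})
combined = SumAndScorer([required, OptionalScorer(optional, default=0.125)])
```

Document 1 matches both clauses and gets the summed score, document 2
matches only the required clause and still comes back with the
fallback added, and a document missing the required clause does not
match at all.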

>  I chose those names because they seemed clearer than the Lucene
> equivalents.  Here's the mapping of Scorer subclasses:

Yes, these are improvements.  I do like that the Lucene names mention
that they are 'Sum' scorers, though, as it seems useful to distinguish
how the actual scoring is done.

Nathan Kurz
nate at verse.com

ps. The ice cream goes pretty well: http://screamsorbet.com/

_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



