[KinoSearch] opening up the scorers
Nathan Kurz
nate at verse.com
Thu Apr 17 10:43:09 PDT 2008
> I've been thinking about adding new public classes ORQuery, ANDQuery,
> ANDNOTQuery and ANDORQuery. BooleanQuery would either be deprecated or
> removed; the logic from the compilation phase of BooleanScorer's first
> iteration would be moved to QueryParser.
This sounds like a good idea to me, especially changing QueryParser to
build the query directly from the components. I think it would be
great to have a toolbox of component scorers (core or KSx) that can be
wired together in different ways by custom QueryParsers.
I'm not sure I understand the differences between the component
classes you are proposing though. Or maybe I do, and I just don't
understand the names. Also, related to this and probably evident from
my sloppy terminology, I still can't keep straight how Queries and
Scorers relate.
AndQuery: short-circuit AND, scored in some way as a product of subqueries?
OrQuery: score equal to the best-scoring subquery; could short-circuit if sorted?
AndOrQuery: score all subqueries and add them, possibly normalized?
AndNotQuery: not sure why this isn't a NotQuery; scored as a constant?
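One way to pin down the guesses above is a toy doc-at-a-time model in Python. The function names are hypothetical, and the scoring rules (product for AND, max for OR, normalized sum for AND/OR, AND NOT keeping only the required side's score) are just the guesses from the list, not KinoSearch's actual behavior. A "scorer" here is simply an iterable of (doc_id, score) pairs in ascending doc_id order.

```python
import heapq
import math
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

# Toy model only: a "scorer" is an iterable of (doc_id, score) pairs in
# ascending doc_id order. Scoring rules follow the guesses in the list above.
Posting = Tuple[int, float]


def and_scorer(subs: List[Iterable[Posting]]) -> Iterator[Posting]:
    """AND: leapfrog the subscorers so a doc that any of them lacks is never
    scored; score = product of the subscores (a guess)."""
    if not subs:
        return
    iters = [iter(s) for s in subs]
    heads = [next(it, None) for it in iters]
    while all(h is not None for h in heads):
        target = max(h[0] for h in heads)
        # Advance every lagging subscorer up to the current candidate doc.
        for i, it in enumerate(iters):
            while heads[i] is not None and heads[i][0] < target:
                heads[i] = next(it, None)
        if any(h is None for h in heads):
            return  # one subscorer is exhausted, so there are no more matches
        if all(h[0] == target for h in heads):
            yield target, math.prod(h[1] for h in heads)
            heads = [next(it, None) for it in iters]


def _grouped(subs: List[Iterable[Posting]]) -> Iterator[Tuple[int, List[float]]]:
    """Doc-at-a-time merge: walk all subscorers in doc_id order and collect
    the scores of every subscorer that matched each doc."""
    merged = heapq.merge(*subs, key=itemgetter(0))
    for doc_id, group in groupby(merged, key=itemgetter(0)):
        yield doc_id, [score for _, score in group]


def or_scorer(subs: List[Iterable[Posting]]) -> Iterator[Posting]:
    """OR: score is that of the best-scoring matching subscorer."""
    for doc_id, scores in _grouped(subs):
        yield doc_id, max(scores)


def and_or_scorer(subs: List[Iterable[Posting]]) -> Iterator[Posting]:
    """Additive combination: sum the matching subscores, then divide by the
    number of subscorers (that normalization is an assumption)."""
    for doc_id, scores in _grouped(subs):
        yield doc_id, sum(scores) / len(subs)


def and_not_scorer(required: Iterable[Posting],
                   excluded: Iterable[Posting]) -> Iterator[Posting]:
    """AND NOT: keep docs from `required` that `excluded` does not match;
    the excluded side contributes nothing to the score."""
    ex = iter(excluded)
    ex_head = next(ex, None)
    for doc_id, score in required:
        while ex_head is not None and ex_head[0] < doc_id:
            ex_head = next(ex, None)
        if ex_head is None or ex_head[0] != doc_id:
            yield doc_id, score


def not_scorer(excluded: Iterable[Posting], doc_count: int,
               const: float = 1.0) -> Iterator[Posting]:
    """Standalone NOT: every doc id in [0, doc_count) that `excluded` misses,
    each with a constant score. It has to visit every doc in the index."""
    excluded_ids = {doc_id for doc_id, _ in excluded}
    for doc_id in range(doc_count):
        if doc_id not in excluded_ids:
            yield doc_id, const
```

With `a = [(1, 2.0), (3, 1.0), (5, 4.0)]` and `b = [(3, 3.0), (4, 1.0), (5, 0.5)]`, feeding `not_scorer(b, 6)` into `and_scorer` yields the same hits as `and_not_scorer(a, b)`. That is the sense in which an AndNot could be an And over a Not; the catch in this sketch is that the standalone NOT walks every document ID.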
> > ps. Marvin --- the term-by-term approach might be a useful general
> > optimization for a special purpose additive OrScorer.
> >
>
> Yeah, term-at-a-time scoring is great stuff, it's just that the combining
> scorers in KS all need to go doc-at-a-time in order to handle boolean
> constraints without blowing up.
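To make the term-at-a-time versus doc-at-a-time contrast concrete, here is a toy additive OR written both ways in Python. Postings-as-lists, the function names, and the weights are made up for illustration; this is not how KinoSearch's scorers are implemented. Term-at-a-time reads one postings list at a time into per-doc accumulators; doc-at-a-time advances all lists together and finishes each document before the next, which is what lets it apply a constraint such as "these terms are required" on the spot.

```python
import heapq
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Dict, FrozenSet, Iterator, List, Tuple

Posting = Tuple[int, float]  # (doc_id, term score), sorted by doc_id


def or_term_at_a_time(postings: Dict[str, List[Posting]]) -> Dict[int, float]:
    """Additive OR, term-at-a-time: read one term's whole postings list,
    add it into per-doc accumulators, then move on to the next term."""
    acc: Dict[int, float] = defaultdict(float)
    for plist in postings.values():
        for doc_id, score in plist:
            acc[doc_id] += score
    # No doc's score is final until the last list has been read, and the
    # accumulator holds an entry for every doc that matched any term.
    return dict(acc)


def or_doc_at_a_time(postings: Dict[str, List[Posting]],
                     required: FrozenSet[str] = frozenset()) -> Iterator[Posting]:
    """The same additive OR, doc-at-a-time: advance all lists together in
    doc_id order and finish each doc before looking at the next one."""
    # Tag each posting with its term so per-doc constraints can be checked.
    tagged = [[(doc_id, term, score) for doc_id, score in plist]
              for term, plist in postings.items()]
    merged = heapq.merge(*tagged, key=itemgetter(0))
    for doc_id, group in groupby(merged, key=itemgetter(0)):
        hits = list(group)
        matched = {term for _, term, _ in hits}
        # Every posting for this doc is in hand, so a boolean constraint
        # (here: "these terms are required") can be decided immediately.
        if required <= matched:
            yield doc_id, sum(score for _, _, score in hits)
```

Without constraints both functions produce the same scores; the naive term-at-a-time version above has no equivalent point at which a per-doc constraint could be checked before all lists are read.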
I agree that it probably can't be the default OrQuery/OrScorer, but it
strikes me as a useful piece of rope to tempt users who are creating
their own queries. It might also be useful to think about how Queries
could be split across cores/servers. If that worked, splitting the work
per term rather than partitioning the corpus would bring some
performance benefits.
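A rough picture of the two ways of splitting work contrasted above, as a Python toy. The function names, the modulo-based assignment of terms and docs to nodes, and the additive OR scoring are all illustrative assumptions, and nothing here measures performance. With term partitioning each node owns whole postings lists for some terms; with document partitioning each node owns every term but only a slice of the docs.

```python
from collections import Counter
from typing import Dict, List, Tuple

Posting = Tuple[int, float]
Index = Dict[str, List[Posting]]  # term -> postings sorted by doc_id


def score_additive_or(index: Index, terms: List[str]) -> Counter:
    """Term-at-a-time additive OR over whatever slice of the index is held."""
    acc: Counter = Counter()
    for term in terms:
        for doc_id, score in index.get(term, []):
            acc[doc_id] += score
    return acc


def split_by_term(index: Index, n: int) -> List[Index]:
    """Term partitioning: each node owns whole postings lists for some terms."""
    nodes: List[Index] = [{} for _ in range(n)]
    for i, term in enumerate(sorted(index)):
        nodes[i % n][term] = index[term]
    return nodes


def split_by_doc(index: Index, n: int) -> List[Index]:
    """Document partitioning: each node holds every term, but only some docs."""
    nodes: List[Index] = [{} for _ in range(n)]
    for term, plist in index.items():
        for doc_id, score in plist:
            nodes[doc_id % n].setdefault(term, []).append((doc_id, score))
    return nodes


def query_cluster(nodes: List[Index], terms: List[str]) -> Counter:
    """Coordinator: add up each node's partial accumulators. Under term
    partitioning, a node that owns none of the query terms contributes nothing."""
    total: Counter = Counter()
    for node in nodes:
        total.update(score_additive_or(node, terms))
    return total
```

Either split reproduces the single-index scores here only because the scoring is a plain sum; a combining scorer with boolean constraints would need more coordination than adding up partial results.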
Nathan Kurz
nate at verse.com
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch