[KinoSearch] more abstract interfaces to kinosearch
Hans Dieter Pearcey
hdp at pobox.com
Mon Jul 2 08:38:59 PDT 2007
On Mon, Jul 02, 2007 at 08:17:55AM -0700, Marvin Humphrey wrote:
> If you add two clauses to a BooleanQuery with SHOULD, then their
> result sets get OR'd together.
>
> $bool_query->add_clause( query => $term_query_a, occur => 'SHOULD' );
> $bool_query->add_clause( query => $term_query_b, occur => 'SHOULD' );
Is this true even when (like me) you are only interested in matching?
Also, is there some reason that this isn't documented? I wanted to make
Searcher::Abstract support arbitrary nesting of conditions, but some operations
appeared to be supported only as filters (PolyFilter and RangeFilter have no
Query analogues), which would make getting the correct precedence tricky. That
was a big reason I didn't.
One thing I thought about doing was turning *everything* into a Filter, and
having a Query that did something like "match all documents", but I wasn't sure
how to build such a query.
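
To make the matching-only idea concrete, here is a toy sketch in Python rather than Perl, so that it runs without KinoSearch installed. Nothing in it is KinoSearch API: `DOCS`, `term_match`, `should`, `match_all`, and `apply_filters` are hypothetical names invented for illustration. It assumes that, when scores are ignored, SHOULD clauses reduce to a set union, a "match all documents" query is the set of every doc id, and each filter is an intersection.

```python
# Toy in-memory model of match-only search. None of these names are
# KinoSearch API; they are hypothetical and exist only for illustration.
DOCS = {
    1: {"color": "red", "year": 2005},
    2: {"color": "blue", "year": 2006},
    3: {"color": "red", "year": 2007},
}


def term_match(field, value):
    """Return the set of doc ids whose field equals value (no scoring)."""
    return {doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value}


def should(*result_sets):
    """Model SHOULD clauses: OR the result sets together (set union)."""
    combined = set()
    for result_set in result_sets:
        combined |= result_set
    return combined


def match_all():
    """Model a 'match all documents' query: every doc id in the index."""
    return set(DOCS)


def apply_filters(hits, *filters):
    """Model filters: each one narrows the hits by set intersection."""
    for allowed in filters:
        hits = hits & allowed
    return hits


# Two SHOULD clauses OR'd together.
red_or_blue = should(term_match("color", "red"), term_match("color", "blue"))

# "Everything is a filter": start from match-all, then narrow.
red_in_2007 = apply_filters(
    match_all(), term_match("color", "red"), term_match("year", 2007)
)
```

Because every operation takes sets and returns a set, conditions nest arbitrarily in this model, and precedence is simply the order of the function calls.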
> If you don't care about scoring and you can reuse Filters, you should
> use as many as practical.
What if I can't reuse Filters, but I don't care about scoring?
> Sorry for the delayed response -- I had to think this over.
>
> I've resisted making MatchFieldQuery public because I didn't feel
> like its API was mature enough. I'm still not sure about it, and I
> don't want to add it to the list of things that have to get done
> prior to the release of 0.20. For the time being, I suggest you go
> ahead and use MatchFieldQuery as is, but mark that aspect of your
> module experimental. Looking forward, you can help move things along
> by participating in design discussions about subclassing strategies.
> This is successful modularization, "divide and conquer", "loose
> coupling", etc, in action. Every class has its own reasonably
> contained problem domain. There are no "God Objects" that know too
> much or do too much. The components tolerate being assembled into
> many different configurations.
I agree that core KS has taken the right direction here.
The one place where this seems less true is the distinction between scoring and
matching, as I noted previously. My guess (because I don't know anything about
IR theory, or whatever) is that you assumed that of COURSE people wouldn't want
just matching without scoring, and the documentation and existing classes
reflect that assumption. But the documentation does not extend to the building
blocks that someone like me would need in order to create the theoretical
RangeQuery. It ends up looking like there's some random arbitrary difference
between the capabilities of Query and Filter, when in fact there is a perfectly
good reason they're different.
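
As a rough illustration of the matching/scoring split, here is a toy sketch in Python of what a match-only range condition could look like, and how it could be lifted into something score-shaped by giving every hit the same score. This is not how KinoSearch implements RangeFilter or any Query. `range_match` and `as_constant_score` are hypothetical names, and the constant score of 1.0 is an arbitrary assumption.

```python
def range_match(docs, field, lower, upper):
    """Ids of docs whose field value lies in [lower, upper], inclusive.

    Docs that lack the field never match.
    """
    return {
        doc_id
        for doc_id, doc in docs.items()
        if field in doc and lower <= doc[field] <= upper
    }


def as_constant_score(matches, score=1.0):
    """Lift a match-only result into (doc_id, score) pairs with one fixed score."""
    return [(doc_id, score) for doc_id in sorted(matches)]


# Hypothetical sample data; doc 4 has no "year" field at all.
docs = {
    1: {"year": 2005},
    2: {"year": 2006},
    3: {"year": 2007},
    4: {"title": "undated"},
}

hits = range_match(docs, "year", 2006, 2007)
scored = as_constant_score(hits)
```

In this model a range condition has nothing meaningful to say about relevance, which is one way to see why it fits more naturally as a filter than as a scoring query.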
Of course, I'm approaching it from a different direction, so I have different
assumptions. I want to treat KS more like a traditional database, which means I
have different expectations: 'unique' constraints, stuff like that.
hdp.