[KinoSearch] opening up the scorers
Marvin Humphrey
marvin at rectangular.com
Mon Apr 21 22:54:42 PDT 2008
On Apr 19, 2008, at 12:10 PM, Nathan Kurz wrote:
>> You mean how would you persuade QueryParser to use your ORQuery variant
>> rather than the default?
>
> Yes, I'm wondering how to get a variant to actually be used. As it
> is, the official way seems to be to rewrite QueryParser to use my
> own classes, but this seems onerous.
The problem is that everyone and their dog has extremely strong,
mutually incompatible opinions about how query parsers should behave.
On the Lucene user list, eruptions of incredulity and outrage over the
QueryParser design are a regular sideshow -- and the Lucene
QueryParser is *jammed* with features.
KinoSearch's approach is to provide only one simple implementation
that serves the most common case. Opening up the core QueryParser API
is a low priority (approaching zero) because I know the feature set
will never please everyone; more features just mean more hooks to
hang hate on, and I don't need the abuse.
Instead of opening up the core class, I'd be more inclined to write
and release KSx::Search::OpenQueryParser, which would look a lot like
the current QueryParser but single-field and with factory methods. In
theory, it *should* also be possible to adapt a general purpose module
like Search::QueryParser from CPAN to build KinoSearch Query trees --
the language of common search queries is quite small. Things get a
little complicated when you get into Analyzers and multiple fields,
though.
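To make the factory-method idea concrete, here is a minimal sketch, written in Python rather than Perl. Nothing in it is KinoSearch API: `OpenQueryParser`, `make_term_query`, `make_or_query`, and `MyORQuery` are hypothetical names, and the "grammar" is just whitespace-separated terms OR'd together. The point is only that a subclass can substitute its own ORQuery variant without rewriting the parser.

```python
from dataclasses import dataclass


@dataclass
class TermQuery:
    field: str
    term: str


@dataclass
class ORQuery:
    children: list


class OpenQueryParser:
    """Hypothetical single-field parser; every node is built by a factory method."""

    def __init__(self, field):
        self.field = field

    # Factory methods: subclasses override these to swap in their own classes.
    def make_term_query(self, term):
        return TermQuery(self.field, term)

    def make_or_query(self, children):
        return ORQuery(children)

    def parse(self, query_string):
        # Deliberately tiny grammar: whitespace-separated terms, OR'd together.
        terms = [self.make_term_query(t.lower()) for t in query_string.split()]
        if len(terms) == 1:
            return terms[0]
        return self.make_or_query(terms)


class MyORQuery(ORQuery):
    """Stand-in for a user's own ORQuery variant."""


class MyQueryParser(OpenQueryParser):
    # Only the factory method changes; parse() is inherited untouched.
    def make_or_query(self, children):
        return MyORQuery(children)
```

With this shape, `MyQueryParser("content").parse("foo bar")` builds a `MyORQuery` over two `TermQuery` nodes, while the base class keeps producing plain `ORQuery` objects.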
>> QueryParser doesn't parse 'NOT brobniquitz' down to a NOTQuery because
>> it's standard behavior for search engines to parse that kind of thing
>> as a void query with no result set rather than return the universe.
>
> I strongly think you want to 'return the universe' here.
Returning the universe is a perfectly reasonable behavior for some
applications. However, I strongly disagree that it should be the
default behavior for the core QueryParser.
If I write NOTQuery, at least then it becomes possible to implement
your desired behavior. It's probably best if I focus my energies on
that task.
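The two behaviors under discussion can be compared on a toy inverted index. This is a sketch in Python under simplifying assumptions, not KinoSearch code: `build_index`, `search_not_as_void`, and `search_not_as_universe` are hypothetical names, and documents are plain whitespace-tokenized strings.

```python
def build_index(docs):
    """Map each term to the set of doc ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search_not_as_void(index, doc_count, negated_term):
    # Parser default described above: a bare NOT clause matches nothing.
    return set()


def search_not_as_universe(index, doc_count, negated_term):
    # NOTQuery-style behavior: every document except those containing the term.
    return set(range(doc_count)) - index.get(negated_term, set())
```

Note that for a term absent from the index, such as 'brobniquitz', the universe variant returns every document in the collection, which is the result a general-purpose parser would avoid handing back by default.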
> Instead of thinking
> about this as a search engine (with standard search engine
> constraints) think of KinoSearch as a general purpose database with
> some really cool retrieval functions.
KinoSearch is built around the data structure of an inverted index,
which is not suitable for use as a general purpose database. If we
try to pretend that KS is a database, all kinds of annoyances start to
plague us: no unique keys, comparatively awkward updates, no standard
way of handling one-to-many relationships, etc. Database-like
thinking leads us to impose all kinds of constraints which fit poorly
and would require vastly more compromises than the search-engine model.
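Two of those annoyances, no unique keys and awkward updates, show up even in a toy model. The following Python sketch is an illustration only and does not reflect KinoSearch internals: `TinyInvertedIndex` and its methods are hypothetical. Documents are appended, never modified in place, so an "update" is a delete-by-term followed by an add, and nothing stops two documents from sharing the same id value.

```python
class TinyInvertedIndex:
    """Hypothetical append-only inverted index with deletion marks."""

    def __init__(self):
        self.postings = {}    # (field, term) -> list of doc ids, ascending
        self.docs = []        # doc id -> stored fields
        self.deleted = set()  # doc ids marked as deleted

    def add(self, fields):
        doc_id = len(self.docs)
        self.docs.append(fields)
        for field, text in fields.items():
            # set() so a term repeated within one field is posted only once
            for term in set(text.lower().split()):
                self.postings.setdefault((field, term), []).append(doc_id)
        return doc_id

    def delete(self, field, term):
        # Deletion only marks documents; the postings themselves stay put.
        self.deleted.update(self.postings.get((field, term.lower()), []))

    def search(self, field, term):
        hits = self.postings.get((field, term.lower()), [])
        return [d for d in hits if d not in self.deleted]

    def update(self, key_field, key, fields):
        # No in-place update: delete whatever matches the key, then re-add.
        self.delete(key_field, key)
        return self.add(fields)
```

Adding two documents with the same `id` value succeeds silently, and a later `update` on that id marks both as deleted before appending the replacement under a fresh doc id.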
I recognize that people want to be able to fuse excellent full-text
search with database-like behavior. In my judgment, the best way to
accommodate this wish is to finish the C API and make it possible to
hook KS into an existing database like MyFavoriteFlavorOfSQL.
Furthermore, there's a lot more unexplored territory in the field of
inverted indexing than in general purpose database design. The state
of the art in search sucks, right up to the highest levels -- Google
sucks too, they just suck less. Personally speaking, the prospect of
writing YetAnotherDatabase holds no thrill -- but I'm both excited and
optimistic about the possibilities for shoehorning many different
kinds of data into segmented inverted indexes and combining different
retrieval models. That's where I'd like to spend my time.
> If you do go with a RequiredAndOptionalScorer, though,
> I'd request that it be able to handle arbitrary subqueries under the
> Required half, rather than just straight Terms.
Absolutely.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/