[KinoSearch] opening up the scorers

Marvin Humphrey marvin at rectangular.com
Mon Apr 21 22:54:42 PDT 2008




On Apr 19, 2008, at 12:10 PM, Nathan Kurz wrote:

>> You mean how would you persuade QueryParser to use your ORQuery variant
>> rather than the default?
>
> Yes, I'm wondering how to get a variant to actually be used.  As it
> is, the official way seems to be to rewrite QueryParser to use my
> own classes, but this seems onerous.

The problem is that everyone and their dog has extremely strong,  
mutually incompatible opinions about how query parsers should behave.   
On the Lucene user list, eruptions of incredulity and outrage over the  
QueryParser design are a regular side show -- and the Lucene  
QueryParser is *jammed* with features.

KinoSearch's approach is to provide only one simple implementation
that serves the most common case.  Opening up the core QueryParser API
is a low priority (approaching zero) because I know the feature set
will never please everyone.  More features just mean more hooks to
hang hate on, and I don't need the abuse.

Instead of opening up the core class, I'd be more inclined to write  
and release KSx::Search::OpenQueryParser, which would look a lot like  
the current QueryParser but single-field and with factory methods.  In  
theory, it *should* also be possible to adapt a general purpose module  
like Search::QueryParser from CPAN to build KinoSearch Query trees --  
the language of common search queries is quite small.  Things get a  
little complicated when you get into Analyzers and multiple fields,  
though.
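
As a rough illustration of the factory-method idea, here is a minimal sketch in Python (not Perl, and not the KinoSearch API). Every name in it (OpenQueryParser, make_term_query, make_or_query, MyORQuery) is hypothetical, and the "grammar" is just whitespace-separated terms OR'd together. The point is only that a subclass can swap in its own ORQuery variant by overriding one method, without rewriting the parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermQuery:
    field: str
    term: str


@dataclass
class ORQuery:
    children: List[object]


class OpenQueryParser:
    """Hypothetical single-field parser; an assumption, not KinoSearch code."""

    def __init__(self, field: str):
        self.field = field

    # Factory methods: subclasses override these to substitute variants.
    def make_term_query(self, term: str):
        return TermQuery(self.field, term)

    def make_or_query(self, children):
        return ORQuery(list(children))

    def parse(self, query_string: str):
        # Toy grammar: whitespace-separated terms, all OR'd together.
        terms = query_string.split()
        return self.make_or_query(self.make_term_query(t) for t in terms)


@dataclass
class MyORQuery(ORQuery):
    """Stand-in for a user-supplied ORQuery variant."""


class MyParser(OpenQueryParser):
    # Only the OR node is replaced; parsing logic is inherited unchanged.
    def make_or_query(self, children):
        return MyORQuery(list(children))
```

With this shape, MyParser("content").parse("foo bar") builds a MyORQuery over two TermQuery nodes, while the base class keeps producing plain ORQuery nodes.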

>> QueryParser doesn't parse 'NOT brobniquitz' down to a NOTQuery because
>> it's standard behavior for search engines to parse that kind of thing
>> as a void query with no result set rather than return the universe.
>
> I strongly think you want to 'return the universe' here.

Returning the universe is a perfectly reasonable behavior for some  
applications.  However, I strongly disagree that it should be the  
default behavior for the core QueryParser.

If I write NOTQuery, at least then it becomes possible to implement  
your desired behavior.  It's probably best if I focus my energies on  
that task.
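
The two behaviors under discussion can be sketched over a toy inverted index. This is a Python illustration under stated assumptions: INDEX, ALL_DOCS, search_not and the return_universe flag are all invented for the example and say nothing about how a real NOTQuery would be implemented.

```python
from typing import Dict, Set

# Toy inverted index (assumed data): term -> set of matching document ids.
INDEX: Dict[str, Set[int]] = {
    "apple": {1, 2},
    "banana": {2, 3},
}
ALL_DOCS: Set[int] = {1, 2, 3, 4}


def search_not(term: str, return_universe: bool = False) -> Set[int]:
    """Evaluate a bare 'NOT term' query under either policy."""
    if not return_universe:
        # Void-query policy: a lone negation matches nothing.
        return set()
    # Universe policy: every document except those containing the term.
    return ALL_DOCS - INDEX.get(term, set())
```

Under the void policy, search_not("brobniquitz") is empty. Under the universe policy, the same query returns every document, because no document contains the term.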

> Instead of thinking
> about this as a search engine (with standard search engine
> constraints) think of KinoSearch as a general purpose database with
> some really cool retrieval functions.

KinoSearch is built around the data structure of an inverted index,  
which is not suitable for use as a general purpose database.  If we  
try to pretend that KS is a database, all kinds of annoyances start to  
plague us: no unique keys, comparatively awkward updates, no standard  
way of handling one-to-many relationships, etc.  Database-like  
thinking leads us to impose all kinds of constraints which fit poorly  
and would require vastly more compromises than the search-engine model.

I recognize that people want to be able to fuse excellent full-text  
search with database-like behavior.  In my judgment, the best way to  
accommodate this wish is to finish the C API and make it possible to  
hook KS into an existing database like MyFavoriteFlavorOfSQL.

Furthermore, there's a lot more unexplored territory in the field of  
inverted indexing than in general purpose database design.  The state  
of the art in search sucks, right up to the highest levels -- Google  
sucks too, they just suck less.  Personally speaking, the prospect of  
writing YetAnotherDatabase holds no thrill -- but I'm both excited and  
optimistic about the possibilities for shoehorning many different  
kinds of data into segmented inverted indexes and combining different  
retrieval models.  That's where I'd like to spend my time.

> If you do go with a RequiredAndOptionalScorer, though,
> I'd request that it be able to handle arbitrary subqueries under the
> Required half, rather than just straight Terms.

Absolutely.
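
To make the request concrete, here is a minimal sketch of required-plus-optional scoring where the required half is an arbitrary subquery rather than a single term. It is written in Python, models a scorer as a plain function from document id to score (0.0 meaning no match), and uses hypothetical helper names (term, and_, required_optional). Real scorers would iterate over posting lists instead of probing documents one at a time.

```python
from typing import Callable, Dict

# Assumption for this sketch: a scorer maps doc id -> score, 0.0 = no match.
Scorer = Callable[[int], float]


def term(postings: Dict[int, float]) -> Scorer:
    """Leaf scorer backed by a toy postings dict of doc id -> score."""
    return lambda doc_id: postings.get(doc_id, 0.0)


def and_(a: Scorer, b: Scorer) -> Scorer:
    """Compound scorer: matches only when both children match."""
    def score(doc_id: int) -> float:
        sa, sb = a(doc_id), b(doc_id)
        return sa + sb if sa > 0.0 and sb > 0.0 else 0.0
    return score


def required_optional(required: Scorer, optional: Scorer) -> Scorer:
    """The required scorer decides matching; optional only adds to the score."""
    def score(doc_id: int) -> float:
        r = required(doc_id)
        if r == 0.0:
            return 0.0  # optional matches alone are not enough
        return r + optional(doc_id)
    return score


foo = term({1: 1.0, 2: 1.0})
bar = term({1: 2.0, 3: 2.0})
baz = term({1: 0.5, 3: 0.5})

# The required half is itself a compound subquery (foo AND bar).
scorer = required_optional(and_(foo, bar), baz)
```

Because required_optional only ever calls its children as scorers, it does not care whether the required side is a term, an AND, or something more deeply nested.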

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

