[KinoSearch] opening up the scorers

Nathan Kurz nate at verse.com
Tue Apr 22 14:39:45 PDT 2008



On Mon, Apr 21, 2008 at 10:54 PM, Marvin Humphrey
<marvin at rectangular.com> wrote:
>  Instead of opening up the core class, I'd be more inclined to write and
> release KSx::Search::OpenQueryParser, which would look a lot like the
> current QueryParser but single-field and with factory methods.

I'm pretty happy with that approach, although I think there might be a
little more safe maneuvering room before the slope gets slippery.
Opening up the API to allow syntax changes seems excessive;
customizing the output classes strikes me as reasonable.  I ignored
QueryParser and wrote my own, but my fear is that the burden of
writing a parser is going to stop anyone from casually experimenting
with different scorers.
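
To make the "factory methods" idea concrete, here is a minimal sketch in Python rather than Perl. Every name in it (OpenQueryParser, make_term_query, make_and_query, BoostedTermQuery) is hypothetical and not part of KinoSearch, and the syntax is deliberately trivial. The point is only that the syntax stays fixed while a subclass swaps the output classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermQuery:
    field: str
    term: str


@dataclass
class ANDQuery:
    children: List[object]


class OpenQueryParser:
    """Hypothetical single-field parser: fixed syntax, replaceable output classes."""

    def __init__(self, field: str):
        self.field = field

    # Factory methods: subclasses override these to change what gets built.
    def make_term_query(self, term: str):
        return TermQuery(self.field, term)

    def make_and_query(self, children):
        return ANDQuery(list(children))

    def parse(self, text: str):
        # Toy syntax (an assumption): whitespace-separated terms, all required.
        terms = [self.make_term_query(t) for t in text.split()]
        return terms[0] if len(terms) == 1 else self.make_and_query(terms)


@dataclass
class BoostedTermQuery(TermQuery):
    # Stand-in for a query class that would compile to a custom scorer.
    boost: float = 2.0


class BoostingParser(OpenQueryParser):
    def make_term_query(self, term: str):
        return BoostedTermQuery(self.field, term)
```

With something along these lines, trying out a different scorer would mean overriding one method instead of writing a whole parser.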

A stray thought: QueryParser implies that it is parsing a Query,
whereas it's probably clearer to think of it as building a Query from
some text, with the output tree being the actual Query.  I don't
suppose that QueryBuilder strikes you as a better name?  It would
say more directly what the class does...

>  KinoSearch's approach is to provide only one simple implementation that
> serves the most common case.

It is the most common case, but possibly only because doing anything
else requires a lot of heavy lifting.  :)


> > I strongly think you want to 'return the universe' [for a bare NOT query].
> >
>
>  Returning the universe is a perfectly reasonable behavior for some
> applications.  However, I strongly disagree that it should be the default
> behavior for the core QueryParser.
>
>  If I write NOTQuery, at least then it becomes possible to implement your
> desired behavior.  It's probably best if I focus my energies on that task.

Probably an agree-to-disagree sort of situation.  If I picture the
QueryParser as being something that is task specific, this is probably
a fine solution.  My main preference would be to have the Scorer
capable of ordering and returning large numbers of results without
blowing up --- whether it does so by default is merely a detail.  So
yes, implementing a NOTQuery that the default parser optimizes out
would be just fine for my purposes, although I might try to argue that
this optimization should take place at some later stage to allow for a
simpler Parser.
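
A rough sketch of what "at some later stage" could mean, in Python with purely hypothetical names (NOTQuery here is just a placeholder for the class under discussion, and prune_bare_not and NoMatchQuery are invented for illustration). The parser emits the NOT node faithfully, and a separate pass decides what a bare top-level NOT should mean.

```python
from dataclasses import dataclass


@dataclass
class TermQuery:
    term: str


@dataclass
class NOTQuery:
    negated: object


@dataclass
class NoMatchQuery:
    """Placeholder for a query that matches nothing."""


def prune_bare_not(query, return_universe=False):
    """Post-parse pass over the root of the query tree.

    By default a bare top-level NOT is replaced by a query that matches
    nothing. With return_universe=True the NOT node is left in place, so
    a scorer could return every document that lacks the negated term.
    """
    if isinstance(query, NOTQuery) and not return_universe:
        return NoMatchQuery()
    return query
```

Keeping the policy in a pass like this means the parser never needs to know which behavior an application wants.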

> > Instead of thinking
> > about this as a search engine (with standard search engine
> > constraints) think of KinoSearch as a general purpose database with
> > some really cool retrieval functions.

My phrasing was poor.  I didn't mean at all that KinoSearch should try
to be a _relational_ database.  I probably should have said
'datastore'.  And by 'general' I meant that it should be agnostic
about the type of data it is storing, not that it should be general
enough to use for all types of problems.  I agree with you
completely that this is a more interesting problem than creating a
slightly better SomeSQL.

> I'm both excited and optimistic
> about the possibilities for shoehorning many different kinds of data into
> segmented inverted indexes and combining different retrieval models.  That's
> where I'd like to spend my time.

Yes, and yes.  And to help you along this path, I think it would be
good to start spending some time on some use cases that are a little
further away from TF/IDF scoring for full-text search.  The
'more-like-this' approach that follows from Jack's project seems like
it might be a good start.  I'd love it if you could think some about
kNN searches of ratings data.  Relational databases don't stand much
of a chance for either of these, whereas KinoSearch seems like it
could handle either quite well.
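
For the kNN case, a toy illustration in Python of why an inverted index fits. It assumes ratings are stored as postings keyed by item and that similarity is plain cosine; the function names and data layout are invented for this sketch and say nothing about how KinoSearch would actually store or score such data.

```python
import heapq
import math
from collections import defaultdict


def build_index(ratings):
    """ratings: {user: {item: rating}} -> (postings {item: [(user, rating)]}, norms)."""
    postings = defaultdict(list)
    norms = {}
    for user, items in ratings.items():
        norms[user] = math.sqrt(sum(r * r for r in items.values()))
        for item, r in items.items():
            postings[item].append((user, r))
    return postings, norms


def nearest_users(query, postings, norms, k=2):
    """Return the k users most similar to `query` ({item: rating}) by cosine similarity."""
    qnorm = math.sqrt(sum(r * r for r in query.values()))
    if qnorm == 0:
        return []
    # Only postings for items the query has rated are ever touched.
    dots = defaultdict(float)
    for item, r in query.items():
        for user, other in postings.get(item, ()):
            dots[user] += r * other
    scored = (
        (dot / (qnorm * norms[user]), user)
        for user, dot in dots.items()
        if norms[user] > 0
    )
    return [user for _, user in heapq.nlargest(k, scored)]
```

Users who share no rated item with the query never enter the candidate set, which is the same property that makes term-at-a-time scoring cheap for full-text search.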

Nathan Kurz
nate at verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



