[KinoSearch] opening up the scorers
Nathan Kurz
nate at verse.com
Tue Apr 22 14:39:45 PDT 2008
On Mon, Apr 21, 2008 at 10:54 PM, Marvin Humphrey
<marvin at rectangular.com> wrote:
> Instead of opening up the core class, I'd be more inclined to write and
> release KSx::Search::OpenQueryParser, which would look a lot like the
> current QueryParser but single-field and with factory methods.
I'm pretty happy with that approach, although I think there might be a
little more safe maneuvering room before the slope gets slippery.
Opening up the API to allow syntax changes seems excessive;
customizing the output classes strikes me as reasonable. I ignored
QueryParser and wrote my own, but my fear is that the burden of
writing a parser is going to stop anyone from casually experimenting
with different scorers.
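
To make "customizing the output classes" concrete, here is a rough
sketch of a parser whose syntax is fixed but whose output nodes come
from overridable factory methods. It is written in Python purely for
illustration; OpenQueryParser is borrowed from the name proposed above,
and every class and method name here (TermQuery, BoostedTermQuery,
ANDQuery, make_term_query, make_and_query) is hypothetical and not part
of KinoSearch's actual API.

```python
# Hypothetical sketch: none of these classes are KinoSearch's real API.
from dataclasses import dataclass


@dataclass(frozen=True)
class TermQuery:
    field: str
    term: str


@dataclass(frozen=True)
class BoostedTermQuery(TermQuery):
    boost: float = 2.0


@dataclass(frozen=True)
class ANDQuery:
    children: tuple


class OpenQueryParser:
    """Single-field parser: fixed syntax, overridable output classes."""

    def __init__(self, field):
        self.field = field

    # Factory methods: subclasses override these, not the parsing logic.
    def make_term_query(self, term):
        return TermQuery(self.field, term)

    def make_and_query(self, children):
        return ANDQuery(tuple(children))

    def parse(self, text):
        # Deliberately trivial syntax: whitespace-separated terms, all required.
        terms = text.lower().split()
        return self.make_and_query(self.make_term_query(t) for t in terms)


class BoostingParser(OpenQueryParser):
    """Swaps in a different leaf class without touching the syntax."""

    def make_term_query(self, term):
        return BoostedTermQuery(self.field, term, boost=3.0)


query = BoostingParser("content").parse("Open Scorers")
print(query)
```

Someone experimenting with a different scorer would only need the
small subclass at the bottom, which is the point: the parsing burden
stays in one place.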
A stray thought: the name QueryParser implies that it parses a Query,
whereas it's probably clearer to think of it as building a Query from
some text, with the output tree being the actual Query. I don't
suppose QueryBuilder strikes you as a better name? It would say more
directly what the class does.
> KinoSearch's approach is to provide only one simple implementation that
> serves the most common case.
It is the most common case, but possibly only because doing anything
else requires a lot of heavy lifting. :)
> > I strongly think you want to 'return the universe' [for a bare NOT query].
> >
>
> Returning the universe is a perfectly reasonable behavior for some
> applications. However, I strongly disagree that it should be the default
> behavior for the core QueryParser.
>
> If I write NOTQuery, at least then it becomes possible to implement your
> desired behavior. It's probably best if I focus my energies on that task.
Probably an agree-to-disagree sort of situation. If I picture the
QueryParser as being something that is task-specific, this is probably
a fine solution. My main preference would be to have the Scorer
capable of ordering and returning large numbers of results without
blowing up; whether it does so by default is merely a detail. So
yes, implementing a NOTQuery that the default parser optimizes out
would be just fine for my purposes, although I might try to argue that
this optimization should take place at some later stage to allow for a
simpler Parser.
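
A sketch of what "at some later stage" might mean: the parser emits a
NOTQuery naively, and a separate rewrite pass decides what a purely
negative query should do. This is Python for illustration only. The
node classes, the rewrite function, and the bare_not_matches_all flag
are all assumptions of mine, not existing KinoSearch code.

```python
# Hypothetical sketch: a post-parse rewrite pass, not KinoSearch's real API.
from dataclasses import dataclass


@dataclass(frozen=True)
class TermQuery:
    term: str


@dataclass(frozen=True)
class NOTQuery:
    negated: object


@dataclass(frozen=True)
class ANDQuery:
    children: tuple


@dataclass(frozen=True)
class MatchAllQuery:
    """Matches every document ('the universe')."""


@dataclass(frozen=True)
class NoMatchQuery:
    """Matches nothing."""


def is_purely_negative(query):
    """True for a bare NOT, or an AND whose children are all NOTs."""
    if isinstance(query, NOTQuery):
        return True
    return (
        isinstance(query, ANDQuery)
        and bool(query.children)
        and all(isinstance(c, NOTQuery) for c in query.children)
    )


def rewrite(query, bare_not_matches_all=False):
    """Decide, after parsing, what a query with no positive clause means."""
    if not is_purely_negative(query):
        return query  # has a positive clause (or is a plain term): leave alone
    if not bare_not_matches_all:
        return NoMatchQuery()  # conservative default: optimize the query away
    # Alternative policy: the universe minus the negated clauses.
    negations = query.children if isinstance(query, ANDQuery) else (query,)
    return ANDQuery((MatchAllQuery(),) + tuple(negations))


bare = NOTQuery(TermQuery("spam"))
print(rewrite(bare))
print(rewrite(bare, bare_not_matches_all=True))
```

With the policy in one function, the parser itself never needs to know
which behavior an application wants.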
> > Instead of thinking
> > about this as a search engine (with standard search engine
> > constraints) think of KinoSearch as a general purpose database with
> > some really cool retrieval functions.
My phrasing was poor. I didn't mean at all that KinoSearch should try
to be a _relational_ database. I probably should have said
'datastore'. And by 'general' I meant that it should be agnostic
about the type of data it is storing, not that it should be general
enough to use for all types of problems. I agree with you
completely that this is a more interesting problem than creating a
slightly better SomeSQL.
> I'm both excited and optimistic
> about the possibilities for shoehorning many different kinds of data into
> segmented inverted indexes and combining different retrieval models. That's
> where I'd like to spend my time.
Yes, and yes. And to help you along this path, I think it would be
good to start spending some time on use cases that are a little
further away from TF/IDF scoring for full-text search. The
'more-like-this' approach that follows from Jack's project seems like
it might be a good start. I'd love it if you could think some about
kNN searches of ratings data. Relational databases don't stand much
of a chance for either of these, whereas KinoSearch seems like it
could handle either quite well.
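
For the kNN case, here is a minimal sketch of why an inverted index is
a natural fit: treat each rated item as a "term" and each user as a
"document", then score only the users who share at least one item with
the query. It is plain Python with made-up toy data; cosine similarity
is my choice of metric for the example, and nothing here is KinoSearch
code.

```python
# Hypothetical sketch: kNN over ratings via an inverted index (toy data).
import heapq
import math
from collections import defaultdict

# user -> {item: rating}; invented values for illustration only.
ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 4, "b": 3, "c": 1},
    "cat": {"c": 5, "d": 4},
    "dan": {"a": 1, "d": 5},
}

# Inverted index: item -> posting list of (user, rating).
postings = defaultdict(list)
for user, items in ratings.items():
    for item, rating in items.items():
        postings[item].append((user, rating))

# Precomputed vector length per user, analogous to a stored field norm.
norms = {
    user: math.sqrt(sum(r * r for r in items.values()))
    for user, items in ratings.items()
}


def nearest_users(query, k=2):
    """Return the k users with the highest cosine similarity to `query`.

    Only the posting lists for items the query rated are visited, so
    users with no overlap are never scored.
    """
    qnorm = math.sqrt(sum(q * q for q in query.values()))
    if qnorm == 0:
        return []
    dots = defaultdict(float)
    for item, q in query.items():
        for user, rating in postings.get(item, ()):
            dots[user] += q * rating
    scored = ((dot / (qnorm * norms[user]), user) for user, dot in dots.items())
    return [(user, score) for score, user in heapq.nlargest(k, scored)]


neighbors = nearest_users({"a": 5, "b": 4}, k=2)
print(neighbors)
```

The accumulate-then-take-top-k loop has the same shape as ordinary
term scoring, just with a different weighting function.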
Nathan Kurz
nate at verse.com