[KinoSearch] API for subclassing Scorer (was adding a proximity scorer)

Nathan Kurz nate at verse.com
Sun Jun 17 02:08:01 PDT 2007


On 6/16/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> > Personally, though, I'd probably rather see a greater split between
> > the Perl and the C. I love them both individually, but I'd be more
> > comfortable with a standard C library (libidf?) with a Perl wrapper
> > and a clearly defined boundary.
>
> This is clearly the direction that KS is headed.

From the outside, I'm not sure that this is clear. Currently, the C
code (which I take to be proto-Lucy) seems very intimately tied to the
KinoSearch (and Lucene, and presumably Ferret) class hierarchies, and
the boundaries between the layers seem pretty malleable. Not that this
is a direction you want to go, but I'd be more comfortable with a
standard procedural C (hard to override) library with bindings that
allow the object hierarchy to be created in Perl or whatever.

Without prejudice, I can see why you've taken the route you have, but
I'd hesitate to call it standard.   I think a worthwhile question is
to ask whether an outsider considering implementing full-text-search
in another language would find it advantageous to link to your library
rather than implementing just the parts they felt they needed.  For
example (in a direction I've considered) if I were designing a search
component using an Apache module done purely in C, would I link to
this?

> Let's design the ideal API for subclassing Scorer, then work
> backwards to implement it and see how close we can get.

Probably only semantics, but I'd start by defining the problem a
little differently:  the goal is to allow someone to easily change the
way in which scoring happens.  Subclassing the existing Scorer is one
way to do this, but making the scoring procedure simple and clear
enough that they can implement their own Scorer should be a priority.
And making it possible to change the operation of an existing class
without subclassing is nice too.  That said...

>    * It should be possible to implement a Scorer class entirely in
>      Perl and have KS use it.  (Schema and FieldSpec sort of work
>      this way.)

Yes, that would be useful.  Even having examples of the code done in
Perl would be useful to make it easier to understand what's happening.
If the default KinoSearch could be those Perl examples selectively
overridden with C using the same mechanism that a user would use to
customize, that would be fantastic.

>    * It should be possible to override individual methods used by
>      a Scorer implemented in C with wrapped Perl subroutines.

This would be impressive.  I'd agree this would be ideal, but I'd be
willing to make this a lower priority --- the kind of thing one
designs well enough to make possible in the future but doesn't
implement right now.  Are there examples of this in other software
that could be used as a pattern?

>    * It should be possible to override individual methods used by
>      a Scorer implemented in C with C functions, as in the code
>      block at the top of this post.  (This is fairly easy.)

Yes, this seems like appropriate fruit.  In addition to the inline
approach, I'd like to see it possible to load an external shared
library and use a method in that. If possible, I'd also like to see it
possible to override the method directly in the base class (or
perhaps one instance of it), rather than only in the subclass.
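
To make the distinction concrete, here is a minimal sketch in C of
overriding a method for a whole class versus for a single instance.
Every name in it (Scorer, ScorerVTable, Scorer_override_class_score,
and so on) is hypothetical and chosen for illustration; none of it is
taken from the KinoSearch source, and it assumes a plain table of
function pointers per class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the KinoSearch API. */
typedef struct Scorer Scorer;

typedef struct ScorerVTable {
    float (*score)(Scorer *self);
} ScorerVTable;

struct Scorer {
    ScorerVTable *vtable;  /* normally points at the shared class table */
    ScorerVTable  own;     /* private copy, used after a per-instance override */
    float         weight;
};

static float default_score(Scorer *self) { return self->weight; }
static float doubled_score(Scorer *self) { return self->weight * 2.0f; }

/* One table shared by every instance of the base class. */
static ScorerVTable SCORER_VTABLE = { default_score };

void Scorer_init(Scorer *self, float weight) {
    self->vtable = &SCORER_VTABLE;
    self->own    = SCORER_VTABLE;
    self->weight = weight;
}

/* Override in the base class: every instance still using the shared
 * table sees the new function. */
void Scorer_override_class_score(float (*fn)(Scorer *)) {
    assert(fn != NULL);
    SCORER_VTABLE.score = fn;
}

/* Override for one instance: copy the class table into the object,
 * patch the copy, and point the object at it. */
void Scorer_override_instance_score(Scorer *self, float (*fn)(Scorer *)) {
    assert(self != NULL && fn != NULL);
    self->own       = *self->vtable;
    self->own.score = fn;
    self->vtable    = &self->own;
}

/* All callers dispatch through the table, never to a fixed function. */
float Scorer_score(Scorer *self) { return self->vtable->score(self); }
```

The same slot could in principle be filled with a function pointer
obtained from an external shared library, since the caller only ever
goes through the table.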

Currently, it's often difficult to get your subclass to be actually
used.  Thus I'd also like the code to avoid hardcoded constructors,
and provide a similar override mechanism to call your custom subclass
constructor:
KinoSearch::Search::BooleanScorer->override('newORScorer', 'MyORScorer_new');
Which is to say, hardcoded constructors should become class methods.
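
On the C side, a rough sketch of the same idea might look like the
following: the code that needs a sub-scorer calls through a
replaceable constructor slot instead of naming ORScorer_new directly.
The names here (ORScorer, BooleanScorer_override_new_or_scorer,
BooleanScorer_make_or_scorer) are invented for the example and are
not existing KinoSearch functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not KinoSearch code. */
typedef struct ORScorer {
    int custom;  /* 0 for the stock scorer, 1 for the user's replacement */
} ORScorer;

typedef ORScorer *(*ORScorerCtor)(void);

/* Stock constructor. */
static ORScorer *ORScorer_new(void) {
    ORScorer *self = malloc(sizeof *self);
    if (self != NULL) { self->custom = 0; }
    return self;
}

/* A user-supplied constructor for a custom scorer. */
static ORScorer *MyORScorer_new(void) {
    ORScorer *self = malloc(sizeof *self);
    if (self != NULL) { self->custom = 1; }
    return self;
}

/* Class-level slot holding whichever constructor is currently in force. */
static ORScorerCtor new_or_scorer_slot = ORScorer_new;

/* The "override" hook: swap the constructor for the whole class. */
void BooleanScorer_override_new_or_scorer(ORScorerCtor ctor) {
    assert(ctor != NULL);
    new_or_scorer_slot = ctor;
}

/* Internal code asks the slot for a new sub-scorer, so a replacement
 * constructor is picked up without editing this function. The caller
 * owns the result and releases it with free(). */
ORScorer *BooleanScorer_make_or_scorer(void) {
    return new_or_scorer_slot();
}
```

With something like this in place, a single call to
BooleanScorer_override_new_or_scorer(MyORScorer_new) would be enough
to get the custom subclass used everywhere the stock one was.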

>    * It should be possible to add additional Perl member variables
>      to a Scorer implemented in C.
>    * It should be possible to add additional C member variables to
>      a Scorer implemented in C.

I can see why you are interested in an inside-out object model.  I
wasn't familiar with it before you mentioned it.  I can see why it's
appealing, but it's still too new for me to evaluate.  At first
glance, this seems like it would be complex.

>    * It _must_ be possible to upgrade KS without encountering binary
>      compatibility problems such as reordered vtables or object
>      structs.

I'm sure you've thought about this part much more than I have.  Do you
mean that it must be possible to upgrade the Perl portion only while
leaving the C portion untouched?  Or vice versa?  Or both?  (perhaps
this is obvious --- I'm getting tired)

Have a good night,

Nathan Kurz
nate at verse.com


