[KinoSearch] adding a proximity scorer - Boilerplater
Nathan Kurz
nate at verse.com
Sat Jun 16 00:05:28 PDT 2007
On 6/15/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> Wow, hot damn.
I'm continuing along, and things seem to be going well. I have a
subclassed Parser, Query, and Scorer that do very little other than
inherit from their counterparts and print some junk to prove they are
being called, but it's exciting that they exist. BoilerPlater seems
solid and flexible so far.
> The BoilerPlater stuff came out well, but it wasn't and isn't really
> designed to be a public API. It arose out of necessity because the
> faked-up inheritance schemes that Dave Balmain was using with Ferret
> and I was using with KS 0.15 were messy and scaled poorly. The
> design was hashed out last fall on the Lucy developer's list.
I read through some of that when I was trying to get my bearings.
I'll try to read some more. My impression so far is that you've got a
great implementation of a lousy API, and leaving Lucene in the dust is
definitely the right plan.
That sounds too harsh: there's a lot of good thought in Lucene, but
it's a little too much accretive thought and too little reductive
thought. I'd certainly prefer a clearer code path and fewer of the
twisty mazes.
> Cool idea, but it would have to look slightly different, because of
> the limitations of C syntax. It would have to be either a function,
> or a multi-line macro like this:
I'm going with this for now, which seems reasonable to me:
/* ADOPT is an alternative to CREATE for a subclass constructor that wants to
   start where the parent constructor left off. For example:
     SubClass *SubClass_new(parent_args, new_arg) {
         ADOPT(self, Parent_new(parent_args), SubClass, SUBCLASS);
         self->var = new_arg;
         return self;
     }
*/
#define ADOPT(var, instance, type, vtable) \
    type *var = (type *) (instance); \
    var->_ = &(vtable); \
    var = KINO_REALLOCATE(var, 1, type);
Does it seem like this would work?
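To make the question concrete, here is a minimal self-contained sketch of how ADOPT might behave outside of KinoSearch. Everything except the macro body is a hypothetical stand-in: Parent, SubClass, the one-field VTable struct, and DEMO_REALLOCATE are invented for illustration, and I'm assuming KINO_REALLOCATE resizes a block to count * sizeof(type), which is not spelled out above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for KINO_REALLOCATE; assumed to resize the block
 * to hold `count` objects of `type`. */
#define DEMO_REALLOCATE(ptr, count, type) \
    ((type *) realloc((ptr), (count) * sizeof(type)))

/* Hypothetical minimal vtable: the real one would hold method pointers. */
typedef struct VTable { const char *class_name; } VTable;

/* The subclass struct repeats the parent's members in the same order,
 * then appends its own. */
typedef struct Parent {
    VTable *_;
    int     base_val;
} Parent;

typedef struct SubClass {
    VTable *_;
    int     base_val;
    int     var;
} SubClass;

static VTable PARENT   = { "Parent" };
static VTable SUBCLASS = { "SubClass" };

/* Same three steps as the macro above, using the stand-in allocator. */
#define ADOPT(var, instance, type, vtable) \
    type *var = (type *) (instance); \
    var->_ = &(vtable); \
    var = DEMO_REALLOCATE(var, 1, type);

static Parent *Parent_new(int base_val) {
    Parent *self = malloc(sizeof(Parent));
    assert(self != NULL);
    self->_        = &PARENT;
    self->base_val = base_val;
    return self;
}

static SubClass *SubClass_new(int base_val, int new_arg) {
    /* Take over the parent's object, swap the vtable, grow the block. */
    ADOPT(self, Parent_new(base_val), SubClass, SUBCLASS);
    assert(self != NULL);
    self->var = new_arg;   /* only valid after the block has grown */
    return self;
}
```

Calling SubClass_new(7, 42) in this sketch yields an object whose vtable pointer is &SUBCLASS, with base_val carried over from the parent constructor and var set by the subclass. A more cautious variant might reallocate before writing through the larger type and check the result of the reallocation, since the block is still parent-sized when the vtable pointer is assigned.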
> Ideally, our discussion will result in an improvement upon that
> scheme that will allow you to write your ORScorer subclass without
> touching BoilerPlater. Something like this:
>
> package MyORScorer;
> use base qw( KinoSearch::Search::ORScorer );
>
> __PACKAGE__->register_c_method( tally => 'my_tally' );
>
> use Inline => C << 'END_C';
>
> kino_Tally*
> my_tally(kino_OrScorer *self) {
> /* ... */
> }
>
> END_C
That seems like a great goal. For now I'm happy writing C. Perhaps
more useful for most people would be the ability to override a
BoilerPlated C method with a Perl function, with it automatically
wrapped in just enough C to push the args. You aren't already doing
this anywhere, are you?
Personally, though, I'd probably rather see a greater split between
the Perl and the C. I love them both individually, but I'd be more
comfortable with a standard C library (libidf?) with a Perl wrapper
and a clearly defined boundary. I guess I think that would be both
clearer and potentially faster*. But it sure did feel slick to be
able to overlay a single function in C!
Goodnight!
--nate
* Yes, I read your testing about the negligible effect of the class
finalization. But it's the function call overhead that worries me, not the
lookup. Being addicted to speed, I drool over the speedup that might be
possible if you flattened the scoring loop into something inline, especially
if you were going directly over the mmap'd indexes.