[KinoSearch] Re: Subclassable Highlighter (was: Re: KinoSearch feature suggestions)
Marvin Humphrey
marvin at rectangular.com
Sun Jan 27 17:14:33 PST 2008
On Jan 26, 2008, at 2:09 PM, Father Chrysostomos wrote:
> Do you mean to eliminate add_spec?
No, that was just an oversight while writing untested code for email.
> 2) Forget about get_(formatter|encoder), since each spec might have
> a different one.
Yes. I'd unconsciously reverted to the old API, where there was
only one formatter/encoder per highlighter. (: It's because I make
mistakes like these that KS has arg checking everywhere. :)
> 3) Make generate_excerpts call generate_excerpt (_gen_excerpt
> renamed); or maybe we should call it single_excerpt, to
> differentiate between it and the former more easily. single_excerpt
> will be called with its current args, and can be overridden in a
> subclass. The $spec passed to single_excerpt can be documented to
> contain the args passed to add_spec, with defaults filled in. So
> $spec->{limit} should be removed and calculated in the default
> single_excerpt method instead of in add_spec.
Sounds well thought through. I concur with making single_excerpt
public() with that API.
We'll need to add one extra named arg to the add_spec list. "hit"?
Or actually, how about "doc"?
    # User code:
    my $highlighter = KinoSearch::Highlight::Highlighter->new(
        searcher => $searcher,
    );
    $highlighter->add_spec( name => 'content' );
    my $excerpts = $highlighter->generate_excerpts($hit);

    # Internally, highlighter calls single_excerpt:
    for my $spec ( @{ $specs{$$self} } ) {
        $excerpts->{ $spec->{name} } = $self->single_excerpt(
            %$spec,
            doc => $hit,
        );
    }
The DocVector object would be retrieved within single_excerpt() --
which becomes possible once the Highlighter gets a Searcher at
construction time.
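To make the override point concrete, here is a minimal, self-contained sketch of how a subclass might replace single_excerpt() under the API discussed above. It does not use KinoSearch at all: MockHighlighter and ShoutingHighlighter are hypothetical stand-ins, the object is a plain hash rather than the inside-out layout implied by $specs{$$self}, the "doc" is a bare hashref, and the default excerpt logic (first 20 characters of the field) is a placeholder, not the real algorithm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Highlighter. Only the dispatch pattern
# from this thread is modelled; nothing here is real KinoSearch code.
package MockHighlighter;

sub new {
    my ( $class, %args ) = @_;
    return bless { searcher => $args{searcher}, specs => [] }, $class;
}

# Store one spec per call; each spec is a copy of the named args.
sub add_spec {
    my ( $self, %spec ) = @_;
    push @{ $self->{specs} }, {%spec};
    return;
}

# Call single_excerpt once per spec, passing the spec's args plus "doc".
sub generate_excerpts {
    my ( $self, $hit ) = @_;
    my %excerpts;
    for my $spec ( @{ $self->{specs} } ) {
        $excerpts{ $spec->{name} }
            = $self->single_excerpt( %$spec, doc => $hit );
    }
    return \%excerpts;
}

# Placeholder default: the first 20 characters of the named field.
sub single_excerpt {
    my ( $self, %args ) = @_;
    my $text = $args{doc}{ $args{name} };
    return substr( defined $text ? $text : '', 0, 20 );
}

# Hypothetical subclass that overrides only single_excerpt.
package ShoutingHighlighter;
our @ISA = ('MockHighlighter');

sub single_excerpt {
    my ( $self, %args ) = @_;
    return uc $self->SUPER::single_excerpt(%args);
}

package main;

my $highlighter = ShoutingHighlighter->new( searcher => undef );
$highlighter->add_spec( name => 'content' );
my $excerpts
    = $highlighter->generate_excerpts( { content => 'three blind mice' } );
print "$excerpts->{content}\n";    # prints "THREE BLIND MICE"
```

The point of the sketch is only that generate_excerpts() never needs to change: a subclass customizes excerpting by overriding the one method, and it sees the same named args that were given to add_spec plus "doc".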
I'm a little uncertain about dedicating the name "Hit" to the class
for the documents that Hits::fetch_hit returns. Sure, it works, but
"hit" is used elsewhere, e.g. the HitCollector class, which doesn't
deal with *this* kind of "hit". These are essentially Doc objects.
So I'm thinking make them a subclass of Doc called HitDoc, and have
the named arg for single_excerpt() be "doc". Sound good?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/