Subclassable Highlighter (was: Re: KinoSearch feature suggestions)
Father Chrysostomos
sprout at cpan.org
Sat Jan 26 14:09:08 PST 2008
On Jan 23, 2008, at 10:33 PM, Marvin Humphrey wrote:
>
> On Jan 23, 2008, at 12:48 PM, Father Chrysostomos wrote:
>
>> Since the highlighter’s main job is to create the excerpt, I think
>> it would actually be better if we made it easy to subclass by
>> dividing up its _gen_excerpt method.
>
> OK, I can see the rationale for this. The Excerpter would probably
> need access to the Formatter and the Encoder anyway. At that point,
> it's not really an excerpter -- it's a highlighter.
>
> However, I think that the internal methods of Highlighter ought to
> remain internal. They were not designed to be public. The division
> of labor amongst them isn't particularly elegant or clean. They
> call other, non-public methods within the KinoSearch suite.
>
> I think we need the public interface for Highlighter to be more
> general. And perhaps it usage shouldn't be integrated into Hits as
> it is now. I think this, which is along the lines of Peter's
> Search::Tools::HiLiter, would be the better:
>
> my $highlighter = KinoSearch::Highlight::Highlighter->new(
> searcher => $searcher,
> );
> my $hits = $searcher->search( query => $query_string );
> while ( my $hit = $hits->fetch_hit ) {
> my $excerpts = $highlighter->generate_excerpts($hit);
> ...
> }
>
> In fact, using Search::Tools::HiLiter would look almost the same:
>
> my $highliter = Search::Tools::HiLiter->new( query =>
> $query_string );
> my $hits = $searcher->search( query => $query_string );
> while ( my $hit = $hits->fetch_hit_hashref ) {
> my $excerpt = $highliter->light( $hit->{content} );
> ...
> }
>
> The only public methods on Highlighter would be
> generate_excerpts($hit), get_formatter(), and get_encoder(). If we
> add others -- and I can see how that would benefit you -- they
> should be more coherent than the current internal methods.
Do you mean to eliminate add_spec? For which fields will the default
highlighter create excerpts? Might I suggest the following:
1) Keep add_spec.
2) Forget about get_(formatter|encoder), since each spec might have a
different one.
3) Make generate_excerpts call generate_excerpt (_gen_excerpt
renamed); or maybe we should call it single_excerpt, to differentiate
between it and the former more easily. single_excerpt will be called
with its current args, and can be overridden in a subclass. The $spec
passed to singe_excerpt can be documented to contain the args passed
to add_spec, with default filled in. So $spec->{limit} should be
removed and calculated in the default single_excerpt method instead of
in add_spec.
4) All the suggestions you made in the other message (heat_map, $query-
>highlight_data, and find_sentence_boundaries).
More information about the kinosearch
mailing list