Subclassable Highlighter (was: Re: KinoSearch feature suggestions)

Father Chrysostomos sprout at cpan.org
Sat Jan 26 14:09:08 PST 2008




On Jan 23, 2008, at 10:33 PM, Marvin Humphrey wrote:

>
> On Jan 23, 2008, at 12:48 PM, Father Chrysostomos wrote:
>
>> Since the highlighter’s main job is to create the excerpt, I think  
>> it would actually be better if we made it easy to subclass by  
>> dividing up its _gen_excerpt method.
>
> OK, I can see the rationale for this.  The Excerpter would probably  
> need access to the Formatter and the Encoder anyway.  At that point,  
> it's not really an excerpter -- it's a highlighter.
>
> However, I think that the internal methods of Highlighter ought to  
> remain internal.  They were not designed to be public.  The division  
> of labor amongst them isn't particularly elegant or clean.  They  
> call other, non-public methods within the KinoSearch suite.
>
> I think we need the public interface for Highlighter to be more  
> general.  And perhaps it usage shouldn't be integrated into Hits as  
> it is now.  I think this, which is along the lines of Peter's  
> Search::Tools::HiLiter, would be the better:
>
>  my $highlighter = KinoSearch::Highlight::Highlighter->new(
>     searcher => $searcher,
>  );
>  my $hits = $searcher->search( query => $query_string );
>  while ( my $hit = $hits->fetch_hit ) {
>    my $excerpts = $highlighter->generate_excerpts($hit);
>    ...
>  }
>
> In fact, using Search::Tools::HiLiter would look almost the same:
>
>  my $highliter = Search::Tools::HiLiter->new( query =>  
> $query_string );
>  my $hits = $searcher->search( query => $query_string );
>  while ( my $hit = $hits->fetch_hit_hashref ) {
>    my $excerpt = $highliter->light( $hit->{content} );
>    ...
>  }
>
> The only public methods on Highlighter would be  
> generate_excerpts($hit), get_formatter(), and get_encoder().  If we  
> add others -- and I can see how that would benefit you -- they  
> should be more coherent than the current internal methods.

Do you mean to eliminate add_spec? For which fields will the default  
highlighter create excerpts? Might I suggest the following:

1) Keep add_spec.

2) Forget about get_(formatter|encoder), since each spec might have a  
different one.

3) Make generate_excerpts call generate_excerpt (_gen_excerpt  
renamed); or maybe we should call it single_excerpt, to differentiate  
between it and the former more easily. single_excerpt will be called  
with its current args, and can be overridden in a subclass. The $spec  
passed to singe_excerpt can be documented to contain the args passed  
to add_spec, with default filled in. So $spec->{limit} should be  
removed and calculated in the default single_excerpt method instead of  
in add_spec.

4) All the suggestions you made in the other message (heat_map, $query- 
 >highlight_data, and find_sentence_boundaries).





More information about the kinosearch mailing list