Subclassable Highlighter (was: Re: KinoSearch feature suggestions)

Father Chrysostomos sprout at cpan.org
Fri Jan 25 08:28:10 PST 2008




On Jan 25, 2008, at 2:11 AM, Marvin Humphrey wrote:

>   my $highlight_data = $query->highlight_data($doc_vector,  
> $field_name);
>
> In its most basic form, the highlight data could be an array of  
> positions.

Is there any reason this needs to be an array, rather than a list?

@highlight_data = ...
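
To show what I mean, here is a rough sketch of the two calling styles. The two subs are made-up stand-ins for $query->highlight_data(...), not anything in KinoSearch, and the positions are arbitrary:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $query->highlight_data($doc_vector, $field_name).
# Neither sub exists in KinoSearch; the positions are invented.
sub highlight_data_as_ref  { return [ 3, 17, 42 ] }    # array reference
sub highlight_data_as_list { return ( 3, 17, 42 ) }    # plain list

# Array-reference style: the caller has to dereference.
my $positions = highlight_data_as_ref();
print "first position (ref):  $positions->[0]\n";

# List style: the caller assigns straight into an array.
my @positions = highlight_data_as_list();
print "first position (list): $positions[0]\n";
```

The list form saves the caller a dereference; the reference form avoids copying when there are a lot of positions.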

> However, I think it ought to be something richer -- an array of  
> HighlightSpan objects.
>
>  my $highlight_span = KinoSearch::Highlight::HighlightSpan->new(
>    start_offset => 0,
>    end_offset   => 16,
>    weight       => 3.0
>  );
>
> Highlighter can offer a public method, heat_map(), which takes an  
> array of HighlightSpan objects as input, and returns a  
> KinoSearch::Highlight::HeatMap object.  This object would serve as a  
> vessel for the kind of information currently conveyed via  
> _starts_and_ends and _calc_best_location.  In theory, a HeatMap  
> object might supply an array of float, one per character in the  
> field; in practice, we'll need to dial that back.
>
> The default Highlighter would use the HeatMap to find a single  
> contiguous snippet.  Your subclass would use it to find multiple  
> snippets.
>
> As for the "rounding the ends" code... maybe a method called  
> find_sentence_boundaries? generate_excerpts() can then make use of  
> the boundary information however it sees fit.

Sounds more like a utility function to me: just pass it a string and  
it returns a two-element list, the number of chars to chop off each  
end. Could this be exported?
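
Roughly what I have in mind, as a sketch only. The package name is invented, the sentence detection is deliberately naive (it treats ., ! or ? as the only terminators and assumes the string starts and ends mid-sentence), and none of this is existing KinoSearch code:

```perl
use strict;
use warnings;

package My::Highlight::Util;    # hypothetical package name
use Exporter 'import';
our @EXPORT_OK = ('find_sentence_boundaries');

# Takes a string; returns a two-element list: the number of chars to
# chop off the start and the number to chop off the end.
sub find_sentence_boundaries {
    my ($text) = @_;
    my ( $chop_start, $chop_end ) = ( 0, 0 );

    # Leading fragment: everything up to and including the first
    # terminator and the whitespace that follows it.
    if ( $text =~ /\A(.*?[.!?]\s+)/s ) {
        $chop_start = length $1;
    }

    # Trailing fragment: everything after the last terminator.
    if ( $text =~ /[.!?](\s*[^.!?]*)\z/s ) {
        $chop_end = length $1;
    }

    # If chopping would leave nothing, leave the string alone.
    return ( 0, 0 ) if $chop_start + $chop_end >= length $text;
    return ( $chop_start, $chop_end );
}

package main;

# In a separate file this would be:
#   use My::Highlight::Util 'find_sentence_boundaries';
My::Highlight::Util->import('find_sentence_boundaries');

my $raw = "ick brown fox. The dog slept. Then it wo";
my ( $from_start, $from_end ) = find_sentence_boundaries($raw);
my $excerpt = substr( $raw, $from_start,
    length($raw) - $from_start - $from_end );
print "$excerpt\n";
```

Since it only needs a string, it would work equally well from generate_excerpts() or from a subclass that picks multiple snippets.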




