Subclassable Highlighter (was: Re: KinoSearch feature suggestions)
Father Chrysostomos
sprout at cpan.org
Fri Jan 25 08:28:10 PST 2008
On Jan 25, 2008, at 2:11 AM, Marvin Humphrey wrote:
> my $highlight_data = $query->highlight_data($doc_vector,
> $field_name);
>
> In its most basic form, the highlight data could be an array of
> positions.
Is there any reason this needs to be an array, rather than a list?
@highlight_data = ...
> However, I think it ought to be something richer -- an array of
> HighlightSpan objects.
>
> my $highlight_span = KinoSearch::Highlight::HighlightSpan->new(
> start_offset => 0,
> end_offset => 16,
> weight => 3.0
> );
>
> Highlighter can offer a public method, heat_map(), which takes an
> array of HighlightSpan objects as input, and returns a
> KinoSearch::Highlight::HeatMap object. This object would serve as a
> vessel for the kind of information currently conveyed via
> _starts_and_ends and _calc_best_location. In theory, a HeatMap
> object might supply an array of floats, one per character in the
> field; in practice, we'll need to dial that back.
>
> The default Highlighter would use the HeatMap to find a single
> contiguous snippet. Your subclass would use it to find multiple
> snippets.
>
> As for the "rounding the ends" code... maybe a method called
> find_sentence_boundaries? generate_excerpts() can then make use of
> the boundary information however it sees fit.
Sounds more like a utility function to me: you pass it a string and
it returns a two-element list giving the number of characters to chop
off each end. Could this be exported?
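A minimal sketch of such an exportable utility follows. The package name My::Highlight::Util and the function name find_sentence_boundaries (borrowed from the method name floated above) are hypothetical, not existing KinoSearch API. The boundary rule is an assumed crude heuristic: a sentence ends at ".", "!" or "?", and on the leading side the terminator must be followed by whitespace.

```perl
use strict;
use warnings;

package My::Highlight::Util;    # hypothetical package name
use Exporter 'import';
our @EXPORT_OK = ('find_sentence_boundaries');

# Returns ($chop_start, $chop_end): how many characters to remove from the
# start and end of $text so that it begins and ends on (assumed) sentence
# boundaries. Heuristic only; not how KinoSearch actually does it.
sub find_sentence_boundaries {
    my ($text) = @_;
    my $len = length $text;

    # Leading side: drop the partial sentence up to and including the
    # first terminator plus the whitespace after it.
    my $chop_start = 0;
    if ( $text =~ /[.!?]\s+/ ) {
        $chop_start = $+[0];    # offset just past the match
    }

    # Trailing side: drop everything after the last terminator
    # (greedy .* makes the match end at the last one).
    my $chop_end = 0;
    if ( $text =~ /.*[.!?]/s ) {
        $chop_end = $len - $+[0];
    }

    # If chopping would consume the whole string, leave it alone.
    return ( 0, 0 ) if $chop_start + $chop_end >= $len;
    return ( $chop_start, $chop_end );
}

package main;

my $excerpt = "tail of one. Full sentence here. Partial sen";
my ( $start, $end ) = My::Highlight::Util::find_sentence_boundaries($excerpt);
my $trimmed = substr( $excerpt, $start, length($excerpt) - $start - $end );
print "$trimmed\n";    # prints "Full sentence here."
```

If the package lived in its own .pm file, a caller would pull the function in with `use My::Highlight::Util qw(find_sentence_boundaries);`. Returning plain counts rather than a trimmed string leaves the caller (e.g. a Highlighter subclass) free to decide whether to apply them.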