Subclassable Highlighter (was: Re: KinoSearch feature suggestions)

Father Chrysostomos sprout at cpan.org
Wed Jan 23 14:10:24 PST 2008




On Jan 23, 2008, at 12:48 PM, I wrote:

> Since the highlighter’s main job is to create the excerpt, I think  
> it would actually be better if we made it easy to subclass by  
> dividing up its _gen_excerpt method.
>
> So, we’d have:
>
> • gen_excerpt
>
> This will call starts_and_ends and calc_best_location, then pass  
> beginning and ending offsets for the excerpt to  
> gen_excerpt_from_offsets. A subclass can override this to call the  
> latter multiple times.
>
> • starts_and_ends
>
> Just _starts_and_ends renamed, so that subclasses can call it while  
> still using the public API.
>
> • calc_best_location
>
> _calc_best_location renamed, and made to return a list in list  
> context.
>
> • get_excerpt_from_offsets
>
> This will ‘round off’ the offsets passed to it to the nearest  
> sentence boundary, if possible, and then call format_excerpt  
> (passing it a couple of flags to indicate whether ellipsis marks are  
> needed).
>
> • format_excerpt
>
> This will take care of all the formatting, calling the formatter and  
> encoder as needed, and adding ellipsis marks.
>

Here is an initial patch. It’s still missing docs and some arg-checking.

The tests in 303-highlighter.t all pass. I’ve not yet tried  
subclassing it though.
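
To make the proposed split concrete, here is a rough sketch of how the five methods could fit together and how a subclass might override gen_excerpt. It is written in Python purely for illustration and is not the patch itself. The method names follow the proposal (using the gen_excerpt_from_offsets spelling); the class names, the constructor arguments, the `slack` limit, the `<strong>` markup and the simplified scoring and sentence-boundary logic are all assumptions.

```python
import html
import re


class Highlighter:
    """Hypothetical stand-in: method names mirror the proposal, bodies are simplified."""

    def __init__(self, terms, excerpt_length=40, ellipsis="..."):
        self.terms = terms
        self.excerpt_length = excerpt_length
        self.ellipsis = ellipsis

    def gen_excerpt(self, text):
        # Find the hits, pick a window, then hand the offsets on.
        spans = self.starts_and_ends(text)
        start, end = self.calc_best_location(text, spans)
        return self.gen_excerpt_from_offsets(text, start, end, spans)

    def starts_and_ends(self, text):
        # (start, end) offsets of every occurrence of every term.
        spans = []
        for term in self.terms:
            for m in re.finditer(re.escape(term), text, re.IGNORECASE):
                spans.append((m.start(), m.end()))
        return sorted(spans)

    def calc_best_location(self, text, spans):
        # Simplified assumption: centre the window on the first hit.
        if not spans:
            return 0, min(len(text), self.excerpt_length)
        mid = (spans[0][0] + spans[0][1]) // 2
        start = max(0, mid - self.excerpt_length // 2)
        end = min(len(text), start + self.excerpt_length)
        return start, end

    def gen_excerpt_from_offsets(self, text, start, end, spans):
        # Snap each offset to a sentence boundary if one lies within
        # `slack` characters; otherwise flag that an ellipsis is needed.
        slack = 20
        prev_stop = text.rfind(". ", max(0, start - slack), start)
        if prev_stop != -1:
            start = prev_stop + 2
        next_stop = text.find(".", end, min(len(text), end + slack))
        if next_stop != -1:
            end = next_stop + 1
        leading = start > 0 and prev_stop == -1
        trailing = end < len(text) and next_stop == -1
        return self.format_excerpt(text, start, end, spans, leading, trailing)

    def format_excerpt(self, text, start, end, spans, leading, trailing):
        # Encode the text, wrap hits inside the window, add ellipsis marks.
        out, pos = [], start
        for s, e in spans:
            if s < pos or e > end:
                continue  # hit lies outside the window
            out.append(html.escape(text[pos:s]))
            out.append("<strong>" + html.escape(text[s:e]) + "</strong>")
            pos = e
        out.append(html.escape(text[pos:end]))
        head = self.ellipsis if leading else ""
        tail = self.ellipsis if trailing else ""
        return head + "".join(out) + tail


class MultiExcerptHighlighter(Highlighter):
    """Overrides gen_excerpt to build one excerpt per hit."""

    def gen_excerpt(self, text):
        spans = self.starts_and_ends(text)
        pieces, last_end = [], -1
        for span in spans:
            if span[0] < last_end:
                continue  # already inside the previous window
            start, end = self.calc_best_location(text, [span])
            pieces.append(self.gen_excerpt_from_offsets(text, start, end, spans))
            last_end = end
        return " ".join(pieces)
```

With this layout the subclass only touches gen_excerpt; the offset rounding and the formatting are reused unchanged, which is the point of splitting the method up.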

There is one difference in behaviour, however. The default ellipsis is  
no longer passed to the encoder. Is this a problem? (It’s not too  
hard to fix this, but it increases the complexity of the code a little.)
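
The difference can be shown with a small Python sketch. It is an illustration only: the function name echoes format_excerpt from the proposal, but its signature, the `encode_ellipsis` flag, the "..." default and the two encoders are assumptions made for the example.

```python
import html


def format_excerpt(fragment, ellipsis="...", encode=html.escape,
                   encode_ellipsis=False):
    # Hypothetical helper: the excerpt text always goes through the encoder;
    # the ellipsis does so only when encode_ellipsis is set.
    marks = encode(ellipsis) if encode_ellipsis else ellipsis
    return marks + encode(fragment) + marks


def dot_encoder(s):
    # Made-up encoder that rewrites full stops, so the difference shows.
    return s.replace(".", "&#46;")


# An HTML-escaping encoder leaves "..." alone, so both modes agree.
same = format_excerpt("a < b") == format_excerpt("a < b", encode_ellipsis=True)

# An encoder that alters full stops produces different output in each mode.
unencoded = format_excerpt("x.y", encode=dot_encoder)
encoded = format_excerpt("x.y", encode=dot_encoder, encode_ellipsis=True)
```

So the change would only be visible with an encoder that transforms the characters used in the ellipsis.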


Father Chrysostomos




