Subclassable Highlighter (was: Re: KinoSearch feature suggestions)
Father Chrysostomos
sprout at cpan.org
Wed Jan 23 12:48:42 PST 2008
On Jan 23, 2008, at 6:32 AM, Marvin Humphrey wrote:
>>> How about if we outsource excerpting to subclasses of a new class,
>>> KinoSearch::Highlight::Excerpter?
>>
>> I think I can have a patch for this in a couple of days.
>
> Sweet. :)
Since the highlighter’s main job is to create the excerpt, I think it
would actually be better if we made it easy to subclass by dividing up
its _gen_excerpt method.
So, we’d have:
• gen_excerpt
This will call starts_and_ends and calc_best_location, then pass
beginning and ending offsets for the excerpt to
gen_excerpt_from_offsets. A subclass can override this to call the
latter multiple times.
• starts_and_ends
Just _starts_and_ends renamed, so that subclasses can call it while
still using the public API.
• calc_best_location
_calc_best_location renamed, and made to return a list in list context.
• gen_excerpt_from_offsets
This will ‘round off’ the offsets passed to it to the nearest sentence
boundary, if possible, and then call format_excerpt (passing it a
couple of flags to indicate whether ellipsis marks are needed).
• format_excerpt
This will take care of all the formatting, calling the formatter and
encoder as needed, and adding ellipsis marks.
Please let me know if this is too complex and there is a better way I
haven’t thought of....
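To make the proposed split concrete, here is a rough, self-contained sketch of how the five methods might fit together. It is not KinoSearch's actual code: the package names (Sketch::Highlighter, Sketch::MultiExcerptHighlighter), the constructor arguments excerpt_length and max_snap, the _nearest helper and every method signature are assumptions made for illustration, and the formatter and encoder calls are left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the highlighter; not KinoSearch's real class.
package Sketch::Highlighter;

sub new {
    my ( $class, %args ) = @_;
    return bless { excerpt_length => 40, max_snap => 15, %args }, $class;
}

# Find hits, pick a window, then hand the offsets on.
sub gen_excerpt {
    my ( $self, $text, $term ) = @_;
    my ( $starts, $ends ) = $self->starts_and_ends( $text, $term );
    my ( $begin, $end ) = $self->calc_best_location( $text, $starts, $ends );
    return $self->gen_excerpt_from_offsets( $text, $begin, $end );
}

# Start and end offsets of every case-insensitive occurrence of $term.
sub starts_and_ends {
    my ( $self, $text, $term ) = @_;
    my ( @starts, @ends );
    while ( $text =~ /\Q$term\E/gi ) {
        push @starts, $-[0];
        push @ends,   $+[0];
    }
    return ( \@starts, \@ends );
}

# Centre a window on the first hit; returns a (begin, end) list.
sub calc_best_location {
    my ( $self, $text, $starts, $ends ) = @_;
    my $mid = @$starts ? int( ( $starts->[0] + $ends->[0] ) / 2 ) : 0;
    my $begin = $mid - int( $self->{excerpt_length} / 2 );
    $begin = 0 if $begin < 0;
    my $end = $begin + $self->{excerpt_length};
    $end = length($text) if $end > length($text);
    return ( $begin, $end );
}

# Snap each offset to the nearest sentence boundary when one is within
# max_snap characters; otherwise keep it and ask for an ellipsis.
sub gen_excerpt_from_offsets {
    my ( $self, $text, $begin, $end ) = @_;
    my @sent_starts = (0);
    my @sent_ends;
    while ( $text =~ /[.!?](?:\s+|\z)/g ) {
        push @sent_ends, $-[0] + 1;    # just past the terminator
        push @sent_starts, $+[0] if $+[0] < length($text);
    }
    my ( $lead, $trail ) = ( 1, 1 );
    my $s = _nearest( $begin, @sent_starts );
    if ( abs( $s - $begin ) <= $self->{max_snap} ) { $begin = $s; $lead = 0 }
    my $e = _nearest( $end, @sent_ends );
    if ( defined $e && $e > $begin && abs( $e - $end ) <= $self->{max_snap} ) {
        $end   = $e;
        $trail = 0;
    }
    my $excerpt = substr( $text, $begin, $end - $begin );
    return $self->format_excerpt( $excerpt, $lead, $trail );
}

# Add ellipsis marks; a real version would also highlight and encode.
sub format_excerpt {
    my ( $self, $excerpt, $lead, $trail ) = @_;
    $excerpt = "... $excerpt" if $lead;
    $excerpt .= " ..."        if $trail;
    return $excerpt;
}

# Return the candidate offset closest to $target (undef if none).
sub _nearest {
    my ( $target, @candidates ) = @_;
    my ($best) = sort { abs( $a - $target ) <=> abs( $b - $target ) } @candidates;
    return $best;
}

# A subclass that overrides gen_excerpt to produce one excerpt per hit.
package Sketch::MultiExcerptHighlighter;
our @ISA = ('Sketch::Highlighter');

sub gen_excerpt {
    my ( $self, $text, $term ) = @_;
    my ( $starts, $ends ) = $self->starts_and_ends( $text, $term );
    my @excerpts;
    for my $i ( 0 .. $#$starts ) {
        my ( $begin, $end ) =
          $self->calc_best_location( $text, [ $starts->[$i] ], [ $ends->[$i] ] );
        push @excerpts, $self->gen_excerpt_from_offsets( $text, $begin, $end );
    }
    return join "\n", @excerpts;
}

package main;

my $text = "Cats sleep a lot. Dogs bark at the door. "
         . "Birds sing at dawn. Dogs also dig holes.";
my $hl = Sketch::MultiExcerptHighlighter->new( excerpt_length => 20 );
print $hl->gen_excerpt( $text, 'dogs' ), "\n";
```

The point of the sketch is that the subclass only replaces gen_excerpt and reuses everything else through public method names, which is what the renaming of the underscore-prefixed methods is meant to allow.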
>
>
>> But the *offsets* of the page breaks need to be recorded. Counting
>> is not sufficient. I still have to think more about how this should
>> work—unless you have some ideas.
>
> We can modify that function to record offsets in a Perl array. This
> (untested) variant renders those offsets as counts of Unicode code
> points:
>
> [...]
I don’t know why I didn’t see this sooner, but the indexer/tokenizer/
whatever doesn’t need to care about form feeds. A highlighter subclass
can use your counting method (or y///) to see how many occur before
the excerpt, so that problem has solved itself, as it were.
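As a minimal sketch of that idea, a highlighter subclass could work out the page number like this. The helper name page_for_offset and the sample text are made up for illustration, and it assumes the stored text keeps its form feeds and that the excerpt offset is a character offset into that same string.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based page number of the character at $offset,
# assuming pages are separated by form feeds (\f) in the stored text.
sub page_for_offset {
    my ( $text, $offset ) = @_;
    my $before = substr( $text, 0, $offset );

    # tr/// (also spelled y///) with an empty replacement list and no /d
    # modifier leaves the string alone and returns the number of matches.
    my $form_feeds = ( $before =~ tr/\f// );
    return $form_feeds + 1;
}

my $doc           = "page one text\fpage two text\fpage three text";
my $excerpt_start = index( $doc, 'three' );
print page_for_offset( $doc, $excerpt_start ), "\n";    # prints 3
```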