[KinoSearch] Feature request: highlight without excerpt

Marvin Humphrey marvin at rectangular.com
Thu Jun 7 19:10:34 PDT 2007


On Jun 7, 2007, at 1:55 PM, Edward Betts wrote:

> I'd like to be able to highlight the matches in a field without
> creating an excerpt from it.

At least one other person has made the same feature request
(<http://rt.cpan.org/Ticket/Display.html?id=25400>).

The revised Highlighter API introduced in 0.20_03 is intended to  
facilitate such features.  You can even process the same field  
multiple times if you want.

   # Two specs for the same field, stored under different names.
   $highlighter->add_spec(
     field          => 'content',
     name           => 'less',
     excerpt_length => 50,
   );
   $highlighter->add_spec(
     field          => 'content',
     name           => 'more',
     excerpt_length => 2000,
   );
   ...
   print "$hit->{excerpts}{less}\n";
   ...
   print "$hit->{excerpts}{more}\n";

As I mentioned in my reply to that bug report, you can sort of fake  
up a non-excerpted excerpt by making excerpt_length a large number.   
However, as was pointed out to me, Highlighter will tack on an  
ellipsis unless the field ends with a period.

That's a bug that needs fixin'.  Highlighter should not tack on an  
ellipsis if the end of the excerpt coincides with the end of the  
field value.
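
To make the intended rule concrete, here's a minimal sketch in plain
Perl.  It is not the Highlighter code; excerpt_with_ellipsis() is a
hypothetical helper, and it assumes the excerpt is a simple substring
of the field value.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch: cut an excerpt out of
# $text and append an ellipsis only if text remains after the excerpt.
sub excerpt_with_ellipsis {
    my ( $text, $start, $length ) = @_;
    my $excerpt = substr( $text, $start, $length );
    my $end     = $start + length($excerpt);

    # The end of the excerpt coincides with the end of the field value
    # exactly when $end equals the length of the text.
    $excerpt .= '...' if $end < length($text);
    return $excerpt;
}

# Excerpt covers the whole field: no ellipsis.
print excerpt_with_ellipsis( 'A short title', 0, 2000 ), "\n";
# Excerpt stops short of the end: ellipsis added.
print excerpt_with_ellipsis( 'A short title', 0, 7 ), "\n";
```

The check only looks at offsets, so whether the field happens to end
with a period no longer matters.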

> A typical use case would be for
> highlighting in titles, like Google does.

Another use would be highlighting within URLs, something Google also  
does.
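
For illustration, this is roughly what whole-field highlighting
amounts to for a title or a URL.  It's a sketch under stated
assumptions, not how Highlighter works: highlight_whole() is a
hypothetical function, it matches terms with a regex on word
boundaries instead of using analyzed token positions, and it does no
HTML escaping.

```perl
use strict;
use warnings;

# Hypothetical function: wrap each occurrence of the given terms in
# <strong> tags, leaving the rest of the text untouched and untrimmed.
sub highlight_whole {
    my ( $text, @terms ) = @_;
    return $text unless @terms;

    # Try longer terms first so they win over shorter alternatives.
    my $alternation = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @terms;

    ( my $out = $text ) =~ s{\b($alternation)\b}{<strong>$1</strong>}gi;
    return $out;
}

print highlight_whole( 'Perl search tips',     'search' ), "\n";
print highlight_whole( 'http://www.perl.org/', 'perl' ),   "\n";
```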

> I would have a go at implementing it, but I'm not sure how best to fit
> it into the class hierarchy, and where to put the result in the data
> structure returned by fetch_hit_hashref.

It should still go under $hit->{excerpts}.

I think there's a bit of a disconnect because of the name of that  
hash key and the name of the Hits method, "create_excerpts".  Those  
names sort of imply that you can't use the Highlighter without  
excerpting.  Maybe that Hits method should be named "set_highlighter"  
instead, though having the word "highlighter" in there sort of  
implies the opposite -- that you can't create excerpts without  
highlighting -- which is just as misleading.

In any case, there should be a way to turn off excerpting via the  
Highlighter->add_spec API.  I think the best way to do that is to add  
an extract_excerpt parameter to add_spec():

   $highlighter->add_spec(
     field           => 'title',
     extract_excerpt => 0,  # default 1
   );

Another possibility would be to treat an explicit undef supplied to  
excerpt_length as an indication that no excerpting should be  
performed, but I think people reading the docs wouldn't find that as  
easily.
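
To compare the two conventions side by side, here's a rough sketch of
the argument handling.  normalize_spec() is a hypothetical stand-in,
not the real add_spec(), and the default excerpt_length of 200 is an
assumption made for the sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_spec()'s argument handling.
sub normalize_spec {
    # Convention 1: extract_excerpt defaults to 1; callers pass 0 to
    # turn excerpting off.
    my %args = ( extract_excerpt => 1, @_ );

    # Convention 2: an explicit undef excerpt_length also turns
    # excerpting off.  exists() separates "passed as undef" from
    # "not passed at all".
    if ( exists $args{excerpt_length} && !defined $args{excerpt_length} ) {
        $args{extract_excerpt} = 0;
    }

    if ( $args{extract_excerpt} ) {
        # Assumed default length, for this sketch only.
        $args{excerpt_length} = 200 unless defined $args{excerpt_length};
    }
    else {
        delete $args{excerpt_length};    # meaningless without excerpting
    }
    return \%args;
}

my $title = normalize_spec( field => 'title', extract_excerpt => 0 );
print "title: extract_excerpt = $title->{extract_excerpt}\n";
```

The second convention depends on the difference between a missing key
and an undef value, which is the part I'd expect people to overlook.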

Do you feel like taking this on?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




