[KinoSearch] Subclassable Highlighter

Father Chrysostomos sprout at cpan.org
Mon Jan 28 18:12:53 PST 2008


On Jan 28, 2008, at 5:27 PM, Marvin Humphrey wrote:

>
> On Jan 28, 2008, at 3:39 PM, Father Chrysostomos wrote:
>
>> On Jan 27, 2008, at 7:56 PM, Marvin Humphrey wrote:
>>
>>> my $highlighter = KinoSearch::Highlight::Highlighter->new(
>>>     searcher => $searcher,
>>>     query    => $query,
>>> );
>>
>> Another problem with this approach is that the highlighter can only  
>> be used for one query. If a second search is made with the same  
>> $searcher, another highlighter is needed.
>
> True, but I can't think of where that would cause a problem.  Can  
> you think of one?

I thought of this only when I looked at 303-highlight.t, which makes  
more than one search with the same $searcher. I don’t see how it would  
cause a problem in a real-world situation.

> In fact, we could simplify further.  Now that we don't have to stick  
> all our excerpts into $hashref->{excerpts}, we can return the  
> excerpts as scalars, one-at-a-time -- eliminating both add_spec()  
> and generate_excerpts().
>
>  my $highlighter = KinoSearch::Highlight::Highlighter->new(
>    searcher       => $searcher,   # required
>    query          => $query,      # required
>    field          => 'content',   # required
>    excerpt_length => 150,         # default: 200
>    formatter      => $formatter,  # default: a SimpleHTMLFormatter
>    encoder        => $encoder,    # default: a SimpleHTMLEncoder
>  );
>  for my $hit ( $hits->fetch_hit ) {

while(my $hit = $hits->fetch_hit) {  # :-)

>
>     my $excerpt = $highlighter->single_excerpt($hit);
>     ...
>  }
>
> Juggling how params get set is a superficial change compared with  
> e.g. making single_excerpt() public, so it isn't that important.   
> However, I wonder if this lighter-weight vision for a highlighter  
> makes you more comfortable.

It certainly does. Having the constructor and add_spec combined makes  
it much easier to use.
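
On the loop correction above, here is a minimal sketch of why the two forms differ. MockHits is a hypothetical stand-in, not a KinoSearch class; it only assumes that fetch_hit returns one hit per call and undef when the results run out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Hits object: fetch_hit() returns one hit per
# call and undef once the results are exhausted.
package MockHits;
sub new { my ( $class, @hits ) = @_; return bless { hits => [@hits] }, $class }
sub fetch_hit { my $self = shift; return shift @{ $self->{hits} } }

package main;

# for: the parenthesized expression is evaluated once, in list context,
# so fetch_hit() is called a single time and the loop body runs once.
my $hits = MockHits->new(qw( doc_a doc_b doc_c ));
my @seen_by_for;
for my $hit ( $hits->fetch_hit ) {
    push @seen_by_for, $hit;
}

# while: the condition is re-evaluated before every pass, so fetch_hit()
# is called until it returns undef.
$hits = MockHits->new(qw( doc_a doc_b doc_c ));
my @seen_by_while;
while ( my $hit = $hits->fetch_hit ) {
    push @seen_by_while, $hit;
}

print "for saw:   @seen_by_for\n";
print "while saw: @seen_by_while\n";
```

With this mock, the for loop sees only doc_a, while the while loop sees all three hits.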

>> Also, when it comes to the highlight_data method, which class  
>> should be responsible for removing duplicate HighlightSpans? Should  
>> I make this a method of Highlighter itself?
>
> When would there be duplicates?

Well, the current _starts_and_ends checks for them. I just thought it  
must be doing it for a reason.
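
If the check does turn out to be needed, it could live in a small helper. The following is only a sketch under assumptions: the spans are plain hashes with hypothetical start and end keys rather than real HighlightSpan objects, and unique_spans is a made-up name, not an existing method.

```perl
use strict;
use warnings;

# Hypothetical helper: drop spans whose start and end offsets repeat an
# earlier span, keeping the first occurrence and the original order.
sub unique_spans {
    my @spans = @_;
    my %seen;
    return grep { !$seen{ join ',', $_->{start}, $_->{end} }++ } @spans;
}

my @spans = (
    { start => 0,  end => 5 },
    { start => 12, end => 17 },
    { start => 0,  end => 5 },     # duplicate of the first span
    { start => 12, end => 20 },    # same start, different end: kept
);

my @unique = unique_spans(@spans);

my @labels;
push @labels, "[$_->{start},$_->{end}]" for @unique;
print "@labels\n";
```

This says nothing about when duplicates would actually arise; it only shows that removing them is cheap wherever the method ends up.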

>> I don’t remember whether I told you: I’m working on these changes  
>> to Highlighter, and I think I will have a patch ready soon.
>
> I'm working on the Doc class right now.  You should see some commits  
> over the next few hours.

The patch I thought I would have ready soon isn’t working yet: it  
appears to highlight words more or less at random. I’m sending it  
anyway, just to show you what I’m doing. Parts of it will need to be  
refactored, too, now that I’ve read this message of yours.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ks-highlighter.diff
Type: application/octet-stream
Size: 50534 bytes
Desc: not available
Url : http://rectangular.com/pipermail/kinosearch/attachments/20080128/d329517b/attachment-0002.obj 