Subclassable Highlighter (was: Re: KinoSearch feature suggestions)

Father Chrysostomos sprout at cpan.org
Sun Jan 27 18:26:57 PST 2008


On Jan 27, 2008, at 5:14 PM, Marvin Humphrey wrote:

> We'll need to add one extra named arg to the add_spec list.  "hit"?   
> Or actually, how about "doc"?
>
>   # User code:
>   my $highlighter = KinoSearch::Highlight::Highlighter->new(
>      searcher => $searcher,
>   );
>   $highlighter->add_spec( name => 'content' );
>   my $excerpts = $highlighter->generate_excerpts($hit);
>
>   # Internally, highlighter calls single_excerpt:
>   for my $spec ( @{ $specs{$$self} } ) {
>      $excerpts->{ $spec->{name} } = $self->single_excerpt(
>         %$spec,
>         doc => $hit,
>      );
>   }

Actually, what I’ve done so far uses ->single_excerpt($hit, \%spec), but  
I think what you have is better. I’ve also made doc_vector an  
attribute of Hit (see the attached file [if I remember to attach it  
after typing this message]). Now I’m not certain that the hit needs a  
reference to the doc number, but it’s in there. Also, it has a  
reference to the query, so that single_excerpt can call

	$hit_doc->highlight_data( $excerpt_field )

and just has to pass one arg. This highlight_data method calls the  
method of the same name on the query and then sorts its return value  
and removes duplicates.
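For illustration, here is a rough sketch of that sort-and-dedupe step. It assumes, hypothetically, that the query's highlight_data returns an array ref of plain hash refs with start_offset, end_offset and weight keys. MockQuery and SketchHitDoc are made-up names standing in for the real query and hit classes; they are not KinoSearch API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a query object. The real query's
# highlight_data() signature and return format are assumptions here.
package MockQuery;

sub new {
    my ( $class, %spans_by_field ) = @_;
    my $self = {%spans_by_field};
    return bless $self, $class;
}

sub highlight_data {
    my ( $self, $field ) = @_;
    return $self->{$field} || [];
}

# Hypothetical hit document that keeps a reference to the query, so
# callers only need to pass the field name.
package SketchHitDoc;

sub new {
    my ( $class, %args ) = @_;
    my $self = { query => $args{query} };
    return bless $self, $class;
}

sub highlight_data {
    my ( $self, $field ) = @_;
    my $raw = $self->{query}->highlight_data($field);

    # Sort by start offset, then by end offset.
    my @sorted = sort {
        $a->{start_offset} <=> $b->{start_offset}
            || $a->{end_offset} <=> $b->{end_offset}
    } @$raw;

    # Drop spans covering exactly the same offsets as one already kept.
    my %seen;
    my @unique =
        grep { !$seen{"$_->{start_offset}:$_->{end_offset}"}++ } @sorted;
    return \@unique;
}

package main;

my $query = MockQuery->new(
    content => [
        { start_offset => 10, end_offset => 15, weight => 1 },
        { start_offset => 0,  end_offset => 4,  weight => 2 },
        { start_offset => 10, end_offset => 15, weight => 1 },
    ],
);
my $hit_doc = SketchHitDoc->new( query => $query );
my $spans   = $hit_doc->highlight_data('content');
print join( ', ', map {"$_->{start_offset}-$_->{end_offset}"} @$spans ), "\n";
# prints "0-4, 10-15"
```

Because the hit carries the query, single_excerpt only has to supply the field name.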

> The DocVector object would be retrieved within single_excerpt() --  
> which becomes possible once the Highlighter gets a Searcher at  
> construction time.

DocVector is currently documented as a private class. Do we want a  
‘publicly subclassable’ method to have to deal with it?
>
>
> I'm a little uncertain about dedicating the name "Hit" to the class  
> for the documents that Hits::fetch_hit returns.  Sure, it works, but  
> "hit" is used elsewhere, e.g. the HitCollector class, which doesn't  
> deal with *this* kind of "hit".  These are essentially Doc objects.   
> So I'm thinking make them a subclass of Doc called HitDoc, and have  
> the named arg for single_excerpt() be "doc".  Sound good?

This sounds fine to me, though I don’t know what the Doc class is  
currently for.

Also, do we need a HighlightSpan object? Won’t a simple hash do?  
Likewise with a heat map.
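To make the question concrete, here is a sketch in which spans are plain hash refs rather than HighlightSpan objects, and a "heat map" is just the list of non-overlapping segments obtained by summing the weights of overlapping spans. The flatten_spans name and this definition of a heat map are assumptions for illustration only, not existing KinoSearch code.

```perl
use strict;
use warnings;

# Hypothetical: each highlight span is a plain hash ref.
my @spans = (
    { start_offset => 0, end_offset => 10, weight => 1 },
    { start_offset => 5, end_offset => 15, weight => 2 },
);

# Flatten overlapping spans into non-overlapping segments. Each segment's
# weight is the sum of the weights of all spans covering it; the result is
# again an array of plain hash refs.
sub flatten_spans {
    my ($spans) = @_;
    my %bounds;
    for my $span (@$spans) {
        $bounds{ $span->{start_offset} } = 1;
        $bounds{ $span->{end_offset} }   = 1;
    }
    my @points = sort { $a <=> $b } keys %bounds;

    my @segments;
    for my $i ( 0 .. $#points - 1 ) {
        my ( $start, $end ) = @points[ $i, $i + 1 ];
        my $weight = 0;
        for my $span (@$spans) {
            $weight += $span->{weight}
                if $span->{start_offset} <= $start
                && $span->{end_offset} >= $end;
        }
        push @segments,
            { start_offset => $start, end_offset => $end, weight => $weight }
            if $weight > 0;
    }
    return \@segments;
}

my $heat_map = flatten_spans( \@spans );
printf "%d-%d: %d\n", @{$_}{qw(start_offset end_offset weight)} for @$heat_map;
# prints:
# 0-5: 1
# 5-10: 3
# 10-15: 2
```

Nothing here needs methods on the span or heat map itself, which is the case for plain hashes; a class would mainly buy type checking and a documented interface.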


Father Chrysostomos

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Hit.pm
Type: text/x-perl-script
Size: 2197 bytes
Desc: not available
Url : http://rectangular.com/pipermail/kinosearch/attachments/20080127/b5fe2475/attachment-0001.bin 
-------------- next part --------------