[KinoSearch] Subclassable Highlighter

Father Chrysostomos sprout at cpan.org
Sun Jan 27 13:49:47 PST 2008

On Jan 26, 2008, at 2:09 PM, I wrote:

> 1) Keep add_spec.
>
> 2) Forget about get_(formatter|encoder), since each spec might have  
> a different one.
>
> 3) Make generate_excerpts call generate_excerpt (_gen_excerpt  
> renamed); or maybe we should call it single_excerpt, to  
> differentiate between it and the former more easily. single_excerpt  
> will be called with its current args, and can be overridden in a  
> subclass. The $spec passed to single_excerpt can be documented to  
> contain the args passed to add_spec, with defaults filled in. So  
> $spec->{limit} should be removed and calculated in the default  
> single_excerpt method instead of in add_spec.
>
> 4) All the suggestions you made in the other message (heat_map,  
> $query->highlight_data, and find_sentence_boundaries).
>

I’m trying to implement this now. I’ve noticed what I believe to be a  
couple of bugs in _calc_best_location, but I want to check with you to  
make sure:

sub _calc_best_location {
...
     for my $loc_index ( 0 .. $#$posits ) {
         # only score positions that are in range
         my $location        = $posits->[$loc_index][0];
         my $other_loc_index = $loc_index - 1;
         while ( $other_loc_index > 0 ) {

Should this not be >= 0? As written, the loop never compares against
$posits->[0].

             my $diff = $location - $posits->[$other_loc_index][0];
             last if $diff > $window;
             my $num_tokens_at_pos = $posits->[$other_loc_index][2];
             $locations{$location}
                 += ( 1 / ( 1 + log($diff) ) ) * $num_tokens_at_pos;
             --$other_loc_index;
         }
         $other_loc_index = $loc_index + 1;
         while ( $other_loc_index <= $#$posits ) {
             my $diff = $posits->[$other_loc_index] - $location;

Shouldn’t $posits->[$other_loc_index] have [0] on the end? Without it,
$diff is computed from an array reference instead of a position.

             last if $diff > $window;
             my $num_tokens_at_pos = $posits->[$other_loc_index][2];
             $locations{$location}
                 += ( 1 / ( 1 + log($diff) ) ) * $num_tokens_at_pos;
             ++$other_loc_index;
         }
     }

     # return the highest scoring position
     return ( sort { $locations{$b} <=> $locations{$a} } keys %locations )[0];
}
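
For reference, here is a minimal standalone sketch with both of the changes I am asking about applied. It is not the KinoSearch source. The name calc_best_location_sketch, passing $window as an argument, and the layout of each $posits entry ([ position, unused slot, number of tokens ]) are my assumptions for illustration.

```perl
use strict;
use warnings;

# Standalone sketch, not the KinoSearch source. Each entry of $posits is
# assumed to be [ position, (unused here), num_tokens ], sorted by position,
# with no two entries sharing a position (log(0) would die otherwise).
sub calc_best_location_sketch {
    my ( $posits, $window ) = @_;
    my %locations;

    for my $loc_index ( 0 .. $#$posits ) {
        my $location = $posits->[$loc_index][0];

        # Walk backwards; ">= 0" lets the first entry take part.
        my $other_loc_index = $loc_index - 1;
        while ( $other_loc_index >= 0 ) {
            my $diff = $location - $posits->[$other_loc_index][0];
            last if $diff > $window;
            $locations{$location}
                += ( 1 / ( 1 + log($diff) ) ) * $posits->[$other_loc_index][2];
            --$other_loc_index;
        }

        # Walk forwards; "[0]" picks the position out of the entry.
        $other_loc_index = $loc_index + 1;
        while ( $other_loc_index <= $#$posits ) {
            my $diff = $posits->[$other_loc_index][0] - $location;
            last if $diff > $window;
            $locations{$location}
                += ( 1 / ( 1 + log($diff) ) ) * $posits->[$other_loc_index][2];
            ++$other_loc_index;
        }
    }

    # Highest-scoring position.
    return ( sort { $locations{$b} <=> $locations{$a} } keys %locations )[0];
}

# Position 10 sits within the window of a 5-token hit at position 0.
my $best = calc_best_location_sketch(
    [ [ 0, 0, 5 ], [ 10, 0, 1 ], [ 100, 0, 1 ] ], 20 );
print "$best\n";
```

With "> 0" in the first loop, position 10 in this example would never be scored against the entry at index 0, which is the case the first question is about.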
