[KinoSearch] Subclassable Highlighter
Father Chrysostomos
sprout at cpan.org
Sun Jan 27 13:49:47 PST 2008
On Jan 26, 2008, at 2:09 PM, I wrote:
> 1) Keep add_spec.
>
> 2) Forget about get_(formatter|encoder), since each spec might have
> a different one.
>
> 3) Make generate_excerpts call generate_excerpt (_gen_excerpt
> renamed); or maybe we should call it single_excerpt, to
> differentiate between it and the former more easily. single_excerpt
> will be called with its current args, and can be overridden in a
> subclass. The $spec passed to single_excerpt can be documented to
> contain the args passed to add_spec, with defaults filled in. So
> $spec->{limit} should be removed and calculated in the default
> single_excerpt method instead of in add_spec.
>
> 4) All the suggestions you made in the other message (heat_map,
> $query->highlight_data, and find_sentence_boundaries).
>
I’m trying to implement this now. I’ve noticed what I believe to be a
couple of bugs in _calc_best_location, but I want to check with you to
make sure:
sub _calc_best_location {
    ...
    for my $loc_index ( 0 .. $#$posits ) {
        # only score positions that are in range
        my $location = $posits->[$loc_index][0];
        my $other_loc_index = $loc_index - 1;
        while ( $other_loc_index > 0 ) {

Should this not be >= ?

            my $diff = $location - $posits->[$other_loc_index][0];
            last if $diff > $window;
            my $num_tokens_at_pos = $posits->[$other_loc_index][2];
            $locations{$location}
                += ( 1 / ( 1 + log($diff) ) ) * $num_tokens_at_pos;
            --$other_loc_index;
        }
        $other_loc_index = $loc_index + 1;
        while ( $other_loc_index <= $#$posits ) {
            my $diff = $posits->[$other_loc_index] - $location;

Shouldn’t $posits->[$other_loc_index] have [0] on the end?

            last if $diff > $window;
            my $num_tokens_at_pos = $posits->[$other_loc_index][2];
            $locations{$location}
                += ( 1 / ( 1 + log($diff) ) ) * $num_tokens_at_pos;
            ++$other_loc_index;
        }
    }
    # return the highest scoring position
    return ( sort { $locations{$b} <=> $locations{$a} } keys %locations )[0];
}
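
To make the two questions concrete, here is a rough standalone sketch of the loops with both suspected fixes applied. It is not the code in the distribution: the name calc_best_location_sketch, passing $window as an argument, and the assumption that each entry of $posits is [ start, end, num_tokens ] sorted by start with no duplicate starts are all mine. As far as I can tell, with "> 0" the backward loop never looks at the first position, and without the [0] the forward loop subtracts from an array reference, which numifies to a memory address, so $diff is huge and the loop exits on its first pass.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the scoring loops, with both
# suspected fixes applied. Each entry of $posits is assumed to be
# [ $start, $end, $num_tokens ], sorted by $start, with no two entries
# sharing a start (so $diff is never 0 and log($diff) is defined).
sub calc_best_location_sketch {
    my ( $posits, $window ) = @_;
    my %locations;

    for my $loc_index ( 0 .. $#$posits ) {
        my $location = $posits->[$loc_index][0];

        # Look backward; >= 0 so that the first position is included.
        my $other_loc_index = $loc_index - 1;
        while ( $other_loc_index >= 0 ) {
            my $diff = $location - $posits->[$other_loc_index][0];
            last if $diff > $window;
            my $num_tokens_at_pos = $posits->[$other_loc_index][2];
            $locations{$location}
                += ( 1 / ( 1 + log($diff) ) ) * $num_tokens_at_pos;
            --$other_loc_index;
        }

        # Look forward; [0] so that offsets, not array refs, are compared.
        $other_loc_index = $loc_index + 1;
        while ( $other_loc_index <= $#$posits ) {
            my $diff = $posits->[$other_loc_index][0] - $location;
            last if $diff > $window;
            my $num_tokens_at_pos = $posits->[$other_loc_index][2];
            $locations{$location}
                += ( 1 / ( 1 + log($diff) ) ) * $num_tokens_at_pos;
            ++$other_loc_index;
        }
    }

    # Highest-scoring position; undef if no position has a neighbour
    # within the window.
    return ( sort { $locations{$b} <=> $locations{$a} } keys %locations )[0];
}
```

With positions at 0, 5, 8 and 100 and a window of 20, the position at 5 should win, since it has close neighbours on both sides. A lone position has nothing to score against, so the sketch returns undef in that case.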