Subclassable Highlighter (was: Re: KinoSearch feature suggestions)
Father Chrysostomos
sprout at cpan.org
Thu Jan 24 09:45:16 PST 2008
On Jan 23, 2008, at 10:33 PM, Marvin Humphrey wrote:
> However, I think that the internal methods of Highlighter ought to
> remain internal. They were not designed to be public. The division
> of labor amongst them isn't particularly elegant or clean. They
> call other, non-public methods within the KinoSearch suite.
>
> I think we need the public interface for Highlighter to be more
> general. And perhaps its usage shouldn't be integrated into Hits as
> it is now. I think this, which is along the lines of Peter's
> Search::Tools::HiLiter, would be better:
>
>     my $highlighter = KinoSearch::Highlight::Highlighter->new(
>         searcher => $searcher,
>     );
>     my $hits = $searcher->search( query => $query_string );
>     while ( my $hit = $hits->fetch_hit ) {
>         my $excerpts = $highlighter->generate_excerpts($hit);
>         ...
>     }
Sounds good.
> The only public methods on Highlighter would be
> generate_excerpts($hit), get_formatter(), and get_encoder(). If we
> add others -- and I can see how that would benefit you -- they
> should be more coherent than the current internal methods.
I’d certainly like to avoid copying and pasting the code for
calculating the best location and for ‘rounding’ the ends to the
nearest sentence. What would you suggest (or dictate, since you’re in
charge :-) that the methods be?
>
>
> Having the Highlighter operate standalone a la
> Search::Tools::HiLiter just makes more sense. Unfortunately, the
> problem with that design is that $hits->fetch_hit_hashref returns a
> plain old hashref, and that's not enough. Crucially, the hashref
> does not convey the document number, which is needed to retrieve the
> DocVector associated with the hit from the $searcher.
>
> I've pondered how to associate the return value of
> $hits->fetch_hit_hashref with a document number for a while, without
> arriving at a satisfactory solution.
>
> 1) Stuffing it into the hashref like $hashref->{_kino_doc_num} is
> cheesy. It's unexpected, messes up the parallels with DBI's
> fetchrow_hashref, and would get in the way if someone wanted to
> iterate over their fields using Perl's hash-manipulation functions.
>
> 2) It might make sense to implement KinoSearch::Document as a
> blessed hashref. However, it would be the only class in all of KS
> which didn't subclass KinoSearch::Obj, and thus which couldn't be
> used at the C level.
>
> 3) Implementing KinoSearch::Document as a standard KS object and
> giving it a $document->to_hashref method would work...
>
>     while ( my $hit = $hits->fetch_hit ) {
>         my $hashref = $hit->to_hashref;
>         my $excerpts = $highlighter->generate_excerpts($hit);
>         ...
>     }
>
> ... but it's a little less elegant than returning a plain old
> hashref and there's extra overhead from unnecessary string copying.
The object could have %{} overloading, but that would add overhead,
as the overload handler would have to check ‘caller’ on every
dereference, to make sure the access isn’t coming from
KinoSearch::Highlight::Highlighter. But since the number of hits is
likely to be relatively small, maybe that’s not too bad.
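A rough sketch of what I mean by %{} overloading with a ‘caller’ check, just to make the idea concrete. My::Hit and My::Highlighter are made-up stand-ins, not KinoSearch classes, and the hash layout (doc_num, fields) is an assumption for illustration only. It relies on the documented special case that a dereference overload returning the object itself makes perl use the underlying hash directly.

```perl
use strict;
use warnings;

{
    package My::Hit;    # hypothetical stand-in for a hit/document class

    # Packages allowed to see the object's real internals (assumed names).
    my %trusted = map { $_ => 1 } ( __PACKAGE__, 'My::Highlighter' );

    use overload
        '%{}' => sub {
            my $self            = shift;
            my $calling_package = caller;    # package doing the dereference
            # Returning the object itself makes perl use the real hash.
            return $self if $trusted{$calling_package};
            # Everyone else only sees the stored fields.
            return $self->{fields};
        },
        fallback => 1;

    sub new {
        my ( $class, %args ) = @_;
        return bless { doc_num => $args{doc_num}, fields => $args{fields} },
            $class;
    }
}

{
    package My::Highlighter;    # hypothetical stand-in for the highlighter

    # Trusted package, so this reaches the real hash and finds doc_num.
    sub doc_num_of {
        my ( $class, $hit ) = @_;
        return $hit->{doc_num};
    }
}

package main;

my $hit = My::Hit->new( doc_num => 42, fields => { title => 'Hello' } );

# Outside code iterates over stored fields only; doc_num stays hidden.
print join( ',', sort keys %$hit ), "\n";

# The highlighter can still get at the document number.
print My::Highlighter->doc_num_of($hit), "\n";
```

The cost is one extra sub call and a ‘caller’ lookup per dereference, which is the overhead mentioned above.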
Another thing you could do is to use Hash::Util::FieldHash(::Compat)
to assign a document number to the hash ref. This would be quite
inelegant, though, and it’s not clear exactly which class should own
the field hash.
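For what it’s worth, the field hash idea would look roughly like the following. This is only a sketch: fetch_hit_hashref and doc_num_of here are made-up stand-ins, and I’ve sidestepped the ownership question by putting the field hash at file scope. Hash::Util::FieldHash keys entries by reference identity and drops them when the referenced hash is garbage-collected.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

# Side table keyed by reference identity; an entry disappears when the
# hash it refers to is garbage-collected.
fieldhash my %doc_num_for;

# Hypothetical stand-in for $hits->fetch_hit_hashref.
sub fetch_hit_hashref {
    my ($doc_num) = @_;
    my $hashref = { title => "Document $doc_num" };
    $doc_num_for{$hashref} = $doc_num;    # recorded outside the hashref
    return $hashref;
}

# Hypothetical lookup that a highlighter could use.
sub doc_num_of {
    my ($hashref) = @_;
    return $doc_num_for{$hashref};
}

my $hit = fetch_hit_hashref(7);

# The hashref itself contains nothing but the stored fields.
print join( ',', keys %$hit ), "\n";

# The document number is recoverable from the side table.
print doc_num_of($hit), "\n";

# Once the hashref goes away, so does its entry.
undef $hit;
print scalar( keys %doc_num_for ), "\n";
```

The hashref stays a plain old hashref, so the parallel with DBI is preserved, but the document number lives in a table that some class has to own and expose.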
Father Chrysostomos