Subclassable Highlighter (was: Re: KinoSearch feature suggestions)
Father Chrysostomos
sprout at cpan.org
Sun Jan 27 18:26:57 PST 2008
On Jan 27, 2008, at 5:14 PM, Marvin Humphrey wrote:
> We'll need to add one extra named arg to the add_spec list. "hit"?
> Or actually, how about "doc"?
>
>     # User code:
>     my $highlighter = KinoSearch::Highlight::Highlighter->new(
>         searcher => $searcher,
>     );
>     $highlighter->add_spec( name => 'content' );
>     my $excerpts = $highlighter->generate_excerpts($hit);
>
>     # Internally, highlighter calls single_excerpt:
>     for my $spec ( @{ $specs{$$self} } ) {
>         $excerpts->{ $spec->{name} } = $self->single_excerpt(
>             %$spec,
>             doc => $hit,
>         );
>     }
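The quoted calling convention can be sketched as a small, self-contained Perl program. This is a hypothetical stand-in, not KinoSearch's actual Highlighter: the package names are made up, specs live in a plain hash field instead of the inside-out `%specs` hash, and the "doc" is just a hashref of field values. It shows why passing the spec plus `doc => $hit` as named args makes `single_excerpt` easy to override in a subclass.

```perl
use strict;
use warnings;

package My::Highlighter;    # hypothetical stand-in, not KinoSearch's class

sub new {
    my ( $class, %args ) = @_;
    return bless { searcher => $args{searcher}, specs => [] }, $class;
}

# Each spec is stored as a hashref of named args, e.g. { name => 'content' }.
sub add_spec {
    my ( $self, %spec ) = @_;
    push @{ $self->{specs} }, \%spec;
    return;
}

# Calls single_excerpt once per spec, adding the hit as the "doc" named arg.
sub generate_excerpts {
    my ( $self, $hit ) = @_;
    my %excerpts;
    for my $spec ( @{ $self->{specs} } ) {
        $excerpts{ $spec->{name} } = $self->single_excerpt( %$spec, doc => $hit );
    }
    return \%excerpts;
}

# Default behaviour (assumed for this sketch): return the field's raw text.
sub single_excerpt {
    my ( $self, %args ) = @_;
    return $args{doc}{ $args{name} };
}

package My::ShoutingHighlighter;    # hypothetical subclass overriding one method
our @ISA = ('My::Highlighter');

sub single_excerpt {
    my ( $self, %args ) = @_;
    return uc $self->SUPER::single_excerpt(%args);
}

package main;

my $highlighter = My::ShoutingHighlighter->new( searcher => undef );
$highlighter->add_spec( name => 'content' );
my $excerpts = $highlighter->generate_excerpts( { content => 'foo bar' } );
print "$excerpts->{content}\n";    # prints "FOO BAR"
```

Because the subclass only sees named args, new args (such as `doc`) can be added later without breaking existing overrides.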
Actually, what I’ve done so far is ->single_excerpt($hit, \%spec), but
I think what you have is better. I’ve also made doc_vector an
attribute of Hit (see the attached file [if I remember to attach it
after typing this message]). Now I’m not certain that the hit needs a
reference to the doc number, but it’s in there. It also has a
reference to the query, so that single_excerpt can call
$hit_doc->highlight_data( $excerpt_field )
and only has to pass one arg. This highlight_data method calls the
method of the same name on the query, then sorts its return value
and removes duplicates.
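The Hit design described above (a doc_vector attribute, a reference to the query, and a one-arg highlight_data that delegates to the query, then sorts and dedupes) might look roughly like the sketch below. It is not the attached Hit.pm. The package names, the stub query, and the span format (plain hashes with `start` and `length` keys) are all assumptions made for illustration.

```perl
use strict;
use warnings;

package My::StubQuery;    # hypothetical stand-in for a real query object

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# Assumed to return an arrayref of span hashes for the given field.
sub highlight_data {
    my ( $self, $field ) = @_;
    return $self->{spans}{$field} || [];
}

package My::HitDoc;    # hypothetical; not the attached Hit.pm

sub new {
    my ( $class, %args ) = @_;
    return bless {
        doc_num    => $args{doc_num},
        doc_vector => $args{doc_vector},
        query      => $args{query},
    }, $class;
}

sub doc_vector { return $_[0]{doc_vector} }

# One arg: ask the query for spans, sort by position, drop exact duplicates.
sub highlight_data {
    my ( $self, $field ) = @_;
    my $spans  = $self->{query}->highlight_data($field);
    my @sorted = sort {
        $a->{start} <=> $b->{start} || $a->{length} <=> $b->{length}
    } @$spans;
    my ( %seen, @unique );
    for my $span (@sorted) {
        push @unique, $span unless $seen{"$span->{start}:$span->{length}"}++;
    }
    return \@unique;
}

package main;

my $query = My::StubQuery->new(
    spans => {
        content => [
            { start => 20, length => 4 },
            { start => 3,  length => 5 },
            { start => 20, length => 4 },    # duplicate, will be dropped
        ],
    },
);
my $hit_doc = My::HitDoc->new( doc_num => 42, query => $query );
my $spans   = $hit_doc->highlight_data('content');
print join( ' ', map {"$_->{start}+$_->{length}"} @$spans ), "\n";    # prints "3+5 20+4"
```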
> The DocVector object would be retrieved within single_excerpt() --
> which becomes possible once the Highlighter gets a Searcher at
> construction time.
DocVector is currently documented as a private class. Do we want a
‘publicly subclassable’ method to have to deal with it?
>
>
> I'm a little uncertain about dedicating the name "Hit" to the class
> for the documents that Hits::fetch_hit returns. Sure, it works, but
> "hit" is used elsewhere, e.g. the HitCollector class, which doesn't
> deal with *this* kind of "hit". These are essentially Doc objects.
> So I'm thinking make them a subclass of Doc called HitDoc, and have
> the named arg for single_excerpt() be "doc". Sound good?
This sounds fine to me, but I don’t know what the Doc class is
currently for...
Also, do we need a HighlightSpan object? Won’t a simple hash do?
Likewise with a heat map.
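To make the "simple hash" question concrete, here is one way spans and a heat map could be represented without dedicated classes. This is a sketch under assumptions: each span is a plain hash with hypothetical `start`, `length`, and `weight` keys, and the heat map is taken to be a list of non-overlapping spans whose weights are summed wherever input spans overlap. `flatten_spans` is a made-up helper, not a KinoSearch method.

```perl
use strict;
use warnings;

# Hypothetical helper: spans are plain hashes { start, length, weight }.
# Returns an arrayref of non-overlapping spans with summed weights.
sub flatten_spans {
    my @spans = @_;

    # Every span start and end is a boundary where the total weight may change.
    my %bounds;
    for my $s (@spans) {
        $bounds{ $s->{start} } = 1;
        $bounds{ $s->{start} + $s->{length} } = 1;
    }
    my @points = sort { $a <=> $b } keys %bounds;

    my @heat_map;
    for my $i ( 0 .. $#points - 1 ) {
        my ( $lo, $hi ) = @points[ $i, $i + 1 ];

        # Sum the weights of all spans covering the interval [$lo, $hi).
        my $weight = 0;
        for my $s (@spans) {
            $weight += $s->{weight}
                if $s->{start} <= $lo && $s->{start} + $s->{length} >= $hi;
        }
        push @heat_map, { start => $lo, length => $hi - $lo, weight => $weight }
            if $weight > 0;
    }
    return \@heat_map;
}

my $heat_map = flatten_spans(
    { start => 0, length => 10, weight => 1 },
    { start => 5, length => 10, weight => 2 },
);

# Three segments: 0..5 (weight 1), 5..10 (weight 3), 10..15 (weight 2).
print join( ' ', map {"$_->{start}:$_->{weight}"} @$heat_map ), "\n";    # prints "0:1 5:3 10:2"
```

Plain hashes keep the data easy to sort and inspect; the trade-off is that a typo in a key name fails silently, which accessor methods on a HighlightSpan class would catch.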
Father Chrysostomos
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Hit.pm
Type: text/x-perl-script
Size: 2197 bytes
Desc: not available
Url : http://rectangular.com/pipermail/kinosearch/attachments/20080127/b5fe2475/attachment-0001.bin