[KinoSearch] Subclassable Highlighter
Father Chrysostomos
sprout at cpan.org
Fri Feb 1 17:59:28 PST 2008
On Feb 1, 2008, at 4:47 PM, Marvin Humphrey wrote:
> Another nice patch. :) Sorry it's taken me a bit to respond --
> been sick. Today I'm just narcoleptic instead of comatose, though,
> so I've been able to complete the review.
I hope you get well soon.
>
>
>> I saw in another message that you wanted the Scorer to provide the
>> HighlightSpans.
>
> Having pondered the subject a little longer, and having seen your
> patch, I now lean towards Weight as the best place to put the
> highlight_spans() method.
>
> [...]
>
> Compiling a Query to a Weight, though, is a little expensive to be
> doing for each document. I think the better solution is to
> have the Highlighter compile the Query to a Weight once and cache it
> as a member var, then have the cached Weight do the work.
Since the searcher has to do the same, is there any way to steal its
Weight object before it discards it? Or does that cease to be
feasible when one uses SearchClient et al.?
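
For illustration, the compile-once caching described above might look
roughly like the sketch below. It is not KinoSearch code: MockQuery,
MockWeight, MockHighlighter and spans_for are hypothetical stand-ins,
and the argument lists are assumptions. Only the method names
make_weight and highlight_spans come from this thread.

```perl
use strict;
use warnings;

# Stand-in classes: MockQuery, MockWeight and MockHighlighter are
# hypothetical, and the argument lists are assumptions. Only the method
# names make_weight and highlight_spans come from this thread.
package MockWeight;
sub new { my ($class) = @_; return bless {}, $class }

sub highlight_spans {
    my ( $self, $doc_num ) = @_;
    return [];    # a real Weight would return spans for the document
}

package MockQuery;
our $compile_count = 0;
sub new { my ($class) = @_; return bless {}, $class }

sub make_weight {
    $compile_count++;    # stands in for the expensive compilation
    return MockWeight->new;
}

package MockHighlighter;

sub new {
    my ( $class, %args ) = @_;
    return bless { query => $args{query}, weight => undef }, $class;
}

sub spans_for {
    my ( $self, $doc_num ) = @_;

    # Compile on first use only; later calls reuse the cached Weight.
    $self->{weight} ||= $self->{query}->make_weight;
    return $self->{weight}->highlight_spans($doc_num);
}

package main;

my $highlighter = MockHighlighter->new( query => MockQuery->new );
$highlighter->spans_for($_) for 1 .. 100;
print "compiled $MockQuery::compile_count time(s)\n";    # compiled 1 time(s)
```

The Weight is built lazily on the first call, so a Highlighter that is
never used pays nothing, and one that is used for many documents pays
for the compilation only once.
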
> We'll need to make some more APIs public in order for you to access
> these capabilities in your custom Highlighter subclass.
>
> * Weight
> * Query::make_weight.
> * Weight::highlight_spans.
>
> Existing subclasses of Weight like TermWeight will stay private.
Sounds good.
> PS: I saw this comment in HeatMap.pm:
>
> # XXX: This calls the same methods over and over, as does the block
> # below. Is there any way to speed this up?
> my @orig_posits = sort {
>     $a->get_start_offset <=> $b->get_start_offset ||
>     $b->get_end_offset <=> $a->get_end_offset
> } @$spans;
>
> If that section turns out to be a bottleneck, it's trivial to port
> it to C, where it will be lightning fast.
Another way to do it is to cache the offsets in a hash before sorting,
so that each method is called only once per span, but I imagine that is
still slower than C.
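
A rough sketch of the hash approach, untested against KinoSearch
itself: MockSpan and sort_spans are hypothetical names used only to
make the example self-contained, and only get_start_offset and
get_end_offset come from the HeatMap.pm code quoted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a span object; only the two accessor
# names are taken from the quoted HeatMap.pm code.
package MockSpan;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}
sub get_start_offset { $_[0]{start} }
sub get_end_offset   { $_[0]{end} }

package main;

# Sort spans by ascending start offset, then by descending end offset,
# calling each accessor only once per span.
sub sort_spans {
    my ($spans) = @_;
    my ( %start, %end );

    # Keys are the stringified references, which are unique while the
    # objects are alive.
    for my $span (@$spans) {
        $start{$span} = $span->get_start_offset;
        $end{$span}   = $span->get_end_offset;
    }
    return [
        sort {
            $start{$a} <=> $start{$b}
                || $end{$b} <=> $end{$a}
        } @$spans
    ];
}

my $spans = [
    MockSpan->new( start => 5, end => 10 ),
    MockSpan->new( start => 0, end => 3 ),
    MockSpan->new( start => 5, end => 20 ),
    MockSpan->new( start => 0, end => 8 ),
];
my $sorted = sort_spans($spans);
print join( ' ',
    map { $_->get_start_offset . '-' . $_->get_end_offset } @$sorted ),
    "\n";    # prints "0-8 0-3 5-20 5-10"
```

The comparison block then does only hash lookups, so the number of
method calls grows with the number of spans rather than with the number
of comparisons.
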
Speaking of optimisations, I found a redundant line of code when
poking around inside Similarity.pm:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ks-similarity.diff
Type: application/octet-stream
Size: 477 bytes
Desc: not available
Url : http://rectangular.com/pipermail/kinosearch/attachments/20080201/5e040ead/attachment-0002.obj
-------------- next part --------------
Father Chrysostomos