KinoSearch::Highlight::Highlighter

Michael Greb michael at thegrebs.com
Wed Jul 16 06:23:49 PDT 2008



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Jul 15, 2008, at 8:04 PM, Marvin Humphrey wrote:
> Hello, Michael,
>
>> I'm using KinoSearch to develop a search engine for the IRC logs I  
>> have browsable on my website.  I am currently using the developer  
>> release, version 0.20_051 due to a need for non-score based sorting  
>> (sort by date).  I am very pleased with KinoSearch so far.  For IRC  
>> logs it makes the most sense to break on line breaks versus periods  
>> for excerpts.  This is an easy one-line patch[1] in
>> Highlight/Highlighter.pm, but it seems a bit overkill to subclass
>> Highlighter for a one-line patch to _gen_excerpt.
>>
>> Perhaps it would make sense to have an argument that lets you
>> specify a character/string to prefer breaking on, defaulting to
>> '\.'.  Allowing regex syntax would be most flexible, and I think
>> most people overriding the default wouldn't have an issue escaping
>> things, but you are the author ;).  I'm really not sure what, other
>> than periods and newlines, someone may want to break on (perhaps
>> tabs), so I would definitely understand should you decide this
>> feature request wouldn't be used widely enough to merit inclusion.
>
> Sorry for the delayed response.

Not a problem.

> I've been working on Highlighter lately, and I think the answer is
> to define a couple methods that the user can override:
> find_sentence_boundaries() and raw_excerpt().  If you're interested
> in discussing API design for those, we should take up the matter on
> the KinoSearch mailing list:
> <http://www.rectangular.com/mailman/listinfo/kinosearch/>

Indeed, this makes sense and allows for even more specialization.
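
To make the original request concrete, here is a minimal standalone
sketch of the "prefer breaking on a configurable pattern" idea.  It is
not KinoSearch code and does not use any KinoSearch API: the
excerpt_start() function, its arguments, and the sample log text are
all hypothetical, and the default pattern is only my guess at a
sensible "break on periods" rule.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of KinoSearch.  Returns the offset at
# which an excerpt should start: just past the last boundary match that
# ends at or before $target.  Returns 0 if no boundary precedes $target.
sub excerpt_start {
    my ( $text, $target, $boundary ) = @_;

    # Assumed default: break after a period and any trailing whitespace.
    $boundary = qr/\.\s*/ unless defined $boundary;

    my $start = 0;
    while ( $text =~ /$boundary/g ) {
        last if $+[0] > $target;    # this boundary ends past the hit
        $start = $+[0];             # remember where this boundary ends
    }
    return $start;
}

# Made-up IRC log: two lines, with the search hit on the second one.
my $log = "10:01 <ann> hi. anyone here?\n10:02 <bob> yes. try KinoSearch\n";
my $hit = index( $log, 'KinoSearch' );

# Default pattern starts the excerpt mid-line, after "yes. ".
print substr( $log, excerpt_start( $log, $hit ) );

# A newline pattern starts the excerpt at the beginning of the IRC line.
print substr( $log, excerpt_start( $log, $hit, qr/\n/ ) );
```

With overridable methods, the same choice would presumably live in a
subclass instead of an argument, which is why that approach allows for
more specialization than a single pattern parameter.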

Re the mailing list: yes, my mistake.  I'm too used to small modules
that don't have lists.  I subscribed to this one a couple of days ago
and am CCing this reply there.

Mike
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iD8DBQFIffZl0Qbp4bPZvesRAryNAKCFNBWbExBIxMpJc9ZqlIdrbOGgbACeNw69
QFU7BwgJGgoscT6k+7sVH1E=
=DkXV
-----END PGP SIGNATURE-----



