[KinoSearch] Field-specific terms vs. query filter?

Larry Leszczynski larryl at emailplus.org
Sun Dec 2 10:03:17 PST 2007




On Sat, 1 Dec 2007, Marvin Humphrey wrote:

>> Is there any syntax to query an analyzed field for an exact match? 
>> E.g. if I want to match "cream" but not "creamy"?
>
> What ends up in the index depends on the Analyzer assigned to that 
> field. The choice the KS docs shunt you towards as the easiest is a 
> PolyAnalyzer that's actually a series of three other analyzers. 
> Here's the code from PolyAnalyzer's constructor:
>
>   # create a default set of analyzers if language was specified
>   if ( !defined $args->{analyzers} ) {
>       confess("Must specify either 'language' or 'analyzers'")
>           unless $language;
>       $args->{analyzers} = [
>           KinoSearch::Analysis::LCNormalizer->new,
>           KinoSearch::Analysis::Tokenizer->new,
>           KinoSearch::Analysis::Stemmer->new( language => $language ),
>       ];
>   }
>
> The element that reduces "creamy" to "cream" is the Stemmer.  If you 
> take it out of the loop, then searches for "creamy" will no longer 
> match docs containing "cream" and vice versa.  Whether that's 
> desirable depends on your application.  It's perfectly reasonable to 
> index the same content twice, once with stemming and once without, 
> so long as you have the resources to spare (you'd probably want only 
> one of the two fields to be "stored").
>
> Take the Tokenizer out of the loop too, and then searches for 
> "creamy" will no longer match a field whose value is "Creamy 
> Goodness".  Take the LCNormalizer out of the loop too, and then 
> there's no more analysis being performed -- so only exact matches 
> will succeed.

Coolness, info much appreciated as usual!
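
For anyone finding this thread in the archives, here is a rough sketch 
of how I read the explanation above.  It does not use KinoSearch at 
all: lc_normalize, tokenize and toy_stem are hypothetical stand-ins I 
made up for LCNormalizer, Tokenizer and Stemmer, and toy_stem only 
strips a trailing "y" (the real Stemmer does much more).  The point is 
just to show which matches go away as each stage is removed, assuming 
the same chain is applied to both the field value and the query.

```perl
use strict;
use warnings;

# Toy stand-ins for the three analysis stages.  These are NOT the
# KinoSearch classes, just enough to show what each stage contributes.
sub lc_normalize { return map { lc($_) } @_ }
sub tokenize     { return map { /(\w+)/g } @_ }    # simplified word split

sub toy_stem {
    # Hypothetical "stemmer": strips a trailing "y" and nothing else.
    return map { ( my $term = $_ ) =~ s/y$//; $term } @_;
}

# Run a text through a chain of stages and return the resulting terms.
sub analyze {
    my ( $stages, $text ) = @_;
    my @terms = ($text);
    @terms = $_->(@terms) for @$stages;
    return @terms;
}

# A query matches if every analyzed query term is among the field's terms.
sub matches {
    my ( $stages, $query, $field_value ) = @_;
    my %indexed = map { $_ => 1 } analyze( $stages, $field_value );
    for my $term ( analyze( $stages, $query ) ) {
        return 0 unless $indexed{$term};
    }
    return 1;
}

my @full    = ( \&lc_normalize, \&tokenize, \&toy_stem );
my @no_stem = ( \&lc_normalize, \&tokenize );
my @lc_only = ( \&lc_normalize );
my @none    = ();

for my $case (
    [ 'full chain',   \@full,    'creamy',          'cream' ],
    [ 'no stemmer',   \@no_stem, 'creamy',          'cream' ],
    [ 'no stemmer',   \@no_stem, 'creamy',          'Creamy Goodness' ],
    [ 'no tokenizer', \@lc_only, 'creamy',          'Creamy Goodness' ],
    [ 'no tokenizer', \@lc_only, 'creamy goodness', 'Creamy Goodness' ],
    [ 'no analysis',  \@none,    'creamy goodness', 'Creamy Goodness' ],
    [ 'no analysis',  \@none,    'Creamy Goodness', 'Creamy Goodness' ],
) {
    my ( $label, $stages, $query, $value ) = @$case;
    printf "%-13s query '%s' vs field '%s': %s\n", $label, $query, $value,
        matches( $stages, $query, $value ) ? 'match' : 'no match';
}
```

Under those assumptions, indexing the same text twice (once through 
the full chain, once through the chain without the stemmer) would let a 
search pick the stemmed field for loose matching or the unstemmed one 
when "cream" must not match "creamy".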


_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



