[KinoSearch] using kinosearch without stemming

Marvin Humphrey marvin at rectangular.com
Thu Jun 7 19:07:35 PDT 2007


On Jun 7, 2007, at 11:31 AM, Hans Dieter Pearcey wrote:

> I like using KS.  It's fast, and though I sometimes get lost in the
> twisty maze of classes, the documentation is generally pretty good.

Thanks!

I'll have more to say about navigating the twisty maze later... maybe  
over the weekend...

> I especially like using it for things that I might have previously
> used a database for -- log files and the like, where I want quick and
> flexible searching.

Thanks, it's good to know how people are using KS beyond the  
archetypal setup of CGI search for a website.

> Assume a field called "action" that can have (among other values)
> "rejected".  I don't want this to be stemmed, because rather than
> being ordinary speech it's effectively like an enum.  So my first
> instinct is to make a FieldSpec subclass with
>
>   sub analyzed { 0 }
>
> However, this only seems to take effect while building the invindex.

You're right.  It's a bug in QueryParser.  Here's the code that's  
been misbehaving:

    for my $field (@$fields) {
        # custom analyze for each field unless override
        my $analyzer = $supplied_analyzer;
        $analyzer = $schema->fetch_analyzer($field)
            unless defined $analyzer;

        my @token_texts = grep {length} $analyzer->analyze_raw($text);
        my $query
            = $self->_gen_single_field_query( $field, \@token_texts );
        push @queries, $query if defined $query;
    }

QueryParser was finding the "correct" analyzer for the field -- since  
none was specified, fetch_analyzer() returns the main analyzer for  
$schema.  However, QueryParser wasn't obeying the field's analyzed()  
property, as you discovered.
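
To illustrate the idea, here is a minimal, self-contained sketch of what "obeying analyzed()" means at query time. It is not the actual change in the repository: the Sketch::* packages, the %field_specs hash, and token_texts_for() are hypothetical stand-ins invented for this example, and the "stemmer" is just a crude suffix stripper.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a stemming analyzer (not a KinoSearch class).
package Sketch::Stemmer;
sub new { return bless {}, shift }

# Lowercase, split on whitespace, and strip a trailing "ed".
sub analyze_raw {
    my ( $self, $text ) = @_;
    return map { my $t = lc $_; $t =~ s/ed\z//; $t } split ' ', $text;
}

# Hypothetical stand-in for a field spec with an analyzed() property.
package Sketch::FieldSpec;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub analyzed { return $_[0]{analyzed} }

package main;

my %field_specs = (
    content => Sketch::FieldSpec->new( analyzed => 1 ),
    action  => Sketch::FieldSpec->new( analyzed => 0 ),
);
my $analyzer = Sketch::Stemmer->new;

# Run the analyzer only for fields whose spec says they are analyzed;
# otherwise pass the query text through untouched as a single term.
sub token_texts_for {
    my ( $field, $text ) = @_;
    if ( $field_specs{$field}->analyzed ) {
        return grep { length $_ } $analyzer->analyze_raw($text);
    }
    return ($text);
}

print join( ',', token_texts_for( 'content', 'Rejected mail' ) ), "\n";
print join( ',', token_texts_for( 'action',  'rejected' ) ),      "\n";
```

With a check like this in place, a query against the un-analyzed "action" field looks up "rejected" verbatim, matching what was stored at index time, while the analyzed field still gets stemmed terms.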

The problem is fixed as of Subversion repository revision 2465.

   svn co -r 2465 http://www.rectangular.com/svn/kinosearch/trunk ks

> When I search for "action:rejected",

You may have seen this in a recent post of mine, but just FYI the  
'field_name:term_text' syntax is now off by default in QueryParser.   
You can get it back via $query_parser->set_heed_colons(1).

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




