[KinoSearch] Weight requires a Similarity

Marvin Humphrey marvin at rectangular.com
Sat Mar 1 12:56:28 PST 2008


On Mar 1, 2008, at 10:57 AM, Father Chrysostomos wrote:

> I just got this error message when testing my RegexpTermWeight class:
>
> Error in function kino_Weight_init_from_hash at
> ../c_src/KinoSearch/Search/Weight.c:24: Can't find 'similarity'
>
> Although this is not documented, Weight apparently requires a  
> Similarity object to be passed to its constructor. Is this permanent  
> (in which case I can write a doc patch)? or is this going to change?

It's permanent.  I've committed a doc patch, and would appreciate  
review:

   http://xrl.us/bgzv6 (Link to www.rectangular.com)

Similarity objects are assigned via Schema, almost exactly like  
Analyzers.  The Schema itself has one primary Similarity; individual  
FieldSpecs may override FieldSpec::similarity() to provide another.   
The only difference is that Schema::analyzer() is an abstract method  
that every subclass has to implement, while Schema::similarity()  
returns a standard Similarity object by default.

If every Weight subclass were associated with a field, it would be
possible to automatically retrieve the correct similarity like so:

    my $sim = $searchable->get_schema->fetch_sim($field);

However, some Weight subclasses don't have a field -- e.g.  
BooleanWeight.
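
The lookup described above (one primary Similarity per Schema, with optional per-field overrides) can be modelled with a toy class. This is only a sketch: ToySchema, its constructor arguments, and the plain strings standing in for Similarity objects are assumptions made for illustration, not the KinoSearch API.

```perl
use strict;
use warnings;

# Toy stand-in only; this is NOT KinoSearch::Schema.
package ToySchema;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        primary_sim => $args{primary_sim},         # schema-wide default
        field_sims  => $args{field_sims} || {},    # per-field overrides
    };
    return bless $self, $class;
}

# Return the field's own similarity if one was supplied,
# otherwise fall back to the schema's primary similarity.
sub fetch_sim {
    my ( $self, $field ) = @_;
    return exists $self->{field_sims}{$field}
        ? $self->{field_sims}{$field}
        : $self->{primary_sim};
}

package main;

my $schema = ToySchema->new(
    primary_sim => 'standard',
    field_sims  => { title => 'title_boosting' },
);

print $schema->fetch_sim('title'), "\n";    # field with an override
print $schema->fetch_sim('body'),  "\n";    # no override: primary is used
```

A field-specific Weight can resolve its similarity this way because it knows which field it belongs to; a fieldless Weight has nothing to pass as $field.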

It's tempting to default to the Schema's primary similarity, but  
that's not failsafe design.  If a field-specific Weight subclass fails  
to supply a value for "similarity", it should trigger an error rather  
than the silently incorrect behavior of defaulting to the wrong object.
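
A minimal sketch of that fail-fast behavior, using a hypothetical ToyWeight class rather than the real Weight (the class name, the accessor, and the string used in place of a Similarity object are all assumptions):

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical class for illustration; not KinoSearch::Search::Weight.
package ToyWeight;

sub new {
    my ( $class, %args ) = @_;

    # Refuse to guess: a missing similarity is an error, not a default.
    Carp::croak("Can't find 'similarity'")
        unless defined $args{similarity};
    return bless { similarity => $args{similarity} }, $class;
}

sub get_similarity { $_[0]->{similarity} }

package main;

# Supplying a similarity succeeds.
my $ok = ToyWeight->new( similarity => 'title_boosting' );
print $ok->get_similarity, "\n";

# Omitting it dies immediately instead of silently picking a default.
my $bad = eval { ToyWeight->new() };
my $err = $@;
print "error: $err" unless defined $bad;
```

The point of the design is that the mistake surfaces at construction time, where it is cheap to diagnose, rather than later as subtly wrong scores.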

> Note that this also prevents the example  
> in ::Cookbook::WildCardQuery from working.


I changed the Cookbook example in the commit as well.  However, both  
Weight and Cookbook::WildCardQuery now refer to Schema::fetch_sim,  
which isn't currently exposed as a public method.  We'll need to fix  
that.

I think Nathan will argue that Schema::fetch_sim should be renamed to  
Schema::fetch_similarity() before it goes public.  If he does, he's  
probably right.  :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



