[KinoSearch] fuzzy searches

Peter Karman peter at peknet.com
Sun Mar 14 21:45:15 PDT 2010


Dermot wrote on 2/5/10 5:23 AM:
> Hi,
> 
> I've been asked to see if I can make my searches more *fuzzy*. I had a
> look at this thread in the archives:
> 
> http://www.rectangular.com/pipermail/kinosearch/2006-May/000165.html
> 
> The thread is a bit old and some of the modules referred to have
> changed names. So I could use some up-to-date strategies on how I can
> make searches return fuzzier results.
> 
> I am not sure that altering the default pattern in the Tokenizer would
> edge me towards my goal. I had thought, like the person in the thread
> above, to use ASpell to find spelling suggestions, pass these to
> PhraseQuery or ORQuery, and append them to the query. Another option
> I was considering was to break the query into letter substrings and
> append them to the query. But I think that what I am trying to do has
> already been done. Looking at the pod for QueryParser and
> Search::Compiler, I suspect that there may be ways to do what I want
> without resorting to hacking at the query string.
> 
> Can anyone offer any suggestions?

fwiw, I like to offer 2 versions of an index, a 'strict' version and a 'fuzzy'
version, with some kind of selector in my UI to toggle which one is used.

The strict version has stemming off, case-folding on, stopwords off, and --
depending on the business rules -- is the default index.

The fuzzy version has stemming on, case-folding on, stopwords heavily applied
(targeting non-nouns), and then is augmented with alternate spellings
(misspellings) and thesaurus terms for each noun term. I'm also very interested
in experimenting with LSI[0] which I see on the KS BrainDump list[1].

[0] http://www.knowledgesearch.org/lsi/lsa_definition.htm
[1] http://www.rectangular.com/kinosearch/wiki/VectorSpaceModel
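
To make the strict/fuzzy distinction concrete, here is a rough sketch of the
two analysis chains in plain Python. It is not KinoSearch code: the word lists,
the toy stemmer, and the function names are all made up for illustration, and
a real setup would use a proper stemmer, a part-of-speech-aware stoplist, and
spelling/thesaurus data instead.

```python
import re

# Hypothetical word lists, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "was", "to", "in", "and"}
# Alternate spellings and thesaurus terms, keyed by stemmed term (assumption).
ALTERNATES = {
    "colour": ["color"],
    "car": ["automobile", "auto"],
}


def tokenize(text):
    """Split on runs of word characters and case-fold."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def crude_stem(term):
    """Toy suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def strict_terms(text):
    """Strict chain: case-folding only, no stemming, no stopword removal."""
    return tokenize(text)


def fuzzy_terms(text):
    """Fuzzy chain: case-fold, drop stopwords, stem, then add alternates."""
    terms = []
    for token in tokenize(text):
        if token in STOPWORDS:
            continue
        stem = crude_stem(token)
        for variant in [stem] + ALTERNATES.get(stem, []):
            if variant not in terms:
                terms.append(variant)
    return terms


print(strict_terms("The Colour of Cars"))
print(fuzzy_terms("The Colour of Cars"))
```

The same text goes through both chains: one set of terms feeds the strict
index, the other feeds the fuzzy index, and the UI selector just decides which
index (and matching query analysis) is used.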


-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com
