[KinoSearch] Finding matching search terms
Marvin Humphrey
marvin at rectangular.com
Mon Dec 31 13:57:47 PST 2007
On Mon, Dec 31, 2007 at 09:21:24AM -0800, colossus forbin wrote:
> This approach would make sense for a large site that expects a large
> set of search terms, but what about a small site expecting a limited
> number of terms, such as a small ecommerce site with a limited number
> of products. If a user misspells a product name, it would make sense
> to not only offer a corrected spelling, but perhaps suggest a similar
> product which is carried by the site. These actions would be done at
> run-time so it would be important to know which terms did not
> contribute to any hits.
There's a method on Searcher, doc_freq(), which returns an integer telling you
how many documents a given term occurs in. Terms with a doc_freq of 0
have no chance of contributing to a score.
Searcher->doc_freq isn't public yet, but there's a good chance it will
be exposed in time -- document frequency information will always be
needed during the Query-to-Scorer compilation phase for weighting. I don't
think the API has changed since 0.05, so go ahead and use it, with caution. :)
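A rough sketch of that filtering step, for illustration only. It is in Python rather than Perl, and the doc_freq() here is a hypothetical stand-in backed by a plain dict, not the KinoSearch method (whose exact signature isn't shown above). The sample terms and counts are made up.

```python
# Hypothetical stand-in for an index: term -> number of documents containing it.
DOC_FREQS = {"garden": 12, "hose": 7, "sprinkler": 3}


def doc_freq(term):
    """Stand-in for a document-frequency lookup; unknown terms give 0."""
    return DOC_FREQS.get(term, 0)


def dead_terms(query_terms):
    """Return the query terms that occur in no documents, in input order."""
    return [term for term in query_terms if doc_freq(term) == 0]
```

Calling dead_terms(["garden", "hoze"]) on this toy data returns ["hoze"], which is the term you would then try to correct.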
Once you've identified the terms that match nothing, you need to figure out
what to suggest. Aspell would help with ordinary words, but might not help
with product names, which won't be in its dictionary. Maybe one of the
edit-distance CPAN modules could help.
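To make the edit-distance idea concrete, here is a minimal sketch in Python (not Perl, and not any particular CPAN module). It assumes you can get a list of known product terms out of your catalog. The names edit_distance and suggest and the max_distance cutoff of 2 are all made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with the usual two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


def suggest(term, vocabulary, max_distance=2):
    """Return the closest known term within max_distance edits, or None."""
    best = None
    for candidate in vocabulary:
        distance = edit_distance(term, candidate)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, candidate)
    return best[1] if best else None
```

With a vocabulary of ["garden", "hose", "sprinkler"], suggest("hoze", ...) picks "hose" (one substitution away), and a term with nothing close gets None. A linear scan like this is only reasonable because the product list is small.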
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/