[KinoSearch] get doc/query similarity

Nathan Kurz nate at verse.com
Sat Apr 19 10:54:09 PDT 2008



On Fri, Apr 18, 2008 at 7:22 AM,  <jack_tanner at yahoo.com> wrote:
>  $doc1 = $invindex->get_doc(id_field => 'doc_id', id_value => $id1);
>  $doc2 = $invindex->get_doc(id_field => 'doc_id', id_value => $id2);

Something like this seems like a fine idea.  Being able to pull a doc
out of the index seems useful.

>  $similarity = $doc1->get_cosine($doc2);
>  $similarity = $doc1->get_similarity($doc2, $my_similarity_fxn);

I'm less happy with this approach.  This relates to my general feeling
that the way to solve this problem is not by adding new interfaces,
but by generalizing the scoring mechanism to allow alternative scorers
like this one.  The 'right' solution would be to create a
CosineScorer, and make it work with the existing infrastructure.
Convenience methods to score individual documents would be great, but
they belong on the Scorer (or Similarity? or Weight?) rather than on
the Doc.
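
As a rough sketch of the shape I have in mind, here is a standalone
cosine scorer in plain Perl. Everything in it is an assumption made
for illustration: the CosineScorer package is a hypothetical stand-in
rather than KinoSearch's Scorer class, and a "doc" is reduced to a
bare hash of term => weight instead of something pulled from an
invindex. The point is only that the similarity logic lives in the
scorer, not in the Doc.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT KinoSearch's Scorer class: a doc here is
# just a hash ref mapping term => weight (e.g. a term frequency).
package CosineScorer;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Euclidean length of a term-weight vector.
sub _norm {
    my ($vec) = @_;
    my $sum = 0;
    $sum += $_ ** 2 for values %$vec;
    return sqrt($sum);
}

# Cosine of the angle between two term-weight vectors.
# Returns 0 if either vector is empty or all-zero.
sub score {
    my ($self, $vec_a, $vec_b) = @_;
    my $denom = _norm($vec_a) * _norm($vec_b);
    return 0 if $denom == 0;
    my $dot = 0;
    for my $term (keys %$vec_a) {
        $dot += $vec_a->{$term} * $vec_b->{$term}
            if exists $vec_b->{$term};
    }
    return $dot / $denom;
}

package main;

my $scorer     = CosineScorer->new;
my $doc1       = { apple => 1, banana => 1 };
my $doc2       = { apple => 1, cherry => 1 };
my $similarity = $scorer->score($doc1, $doc2);
printf "%.2f\n", $similarity;    # prints 0.50
```

A different similarity function then becomes a different scorer
object with the same score() method, rather than another method
bolted onto the Doc.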

For background, one of the ways I was abusing KinoSearch in the past
was trying to use it as a framework for the Netflix Prize, which is a
contest to predict movie ratings: http://www.netflixprize.com
Loosely, my Docs were Users, my Terms were Movies.  Much like Jack was
doing for similarity between documents, I wanted to use KinoSearch
to do kNN searches for similar users.  If you squint hard enough, this
problem is really similar to full text search.  I failed and went with
a custom framework, but I don't see any reason why KinoSearch couldn't
accommodate other similar abuses.
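
The Users-as-Docs mapping can be sketched in a few lines of plain
Perl. This is not the custom framework I ended up with, and none of
it is KinoSearch API: the ratings data, the cosine() helper, and
nearest_users() are all made up for illustration, and the brute-force
scan stands in for what an index would do more cleverly.

```perl
use strict;
use warnings;

# Hypothetical data: each "doc" is a user, each "term" is a movie,
# and the weight is that user's rating.
my %ratings = (
    alice => { alien => 5, brazil => 4, casablanca => 1 },
    bob   => { alien => 5, brazil => 5 },
    carol => { casablanca => 5, dumbo => 4 },
);

# Cosine similarity between two sparse rating vectors (hash refs).
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $norm_u, $norm_v) = (0, 0, 0);
    $norm_u += $_ ** 2 for values %$u;
    $norm_v += $_ ** 2 for values %$v;
    return 0 unless $norm_u && $norm_v;
    for my $movie (keys %$u) {
        $dot += $u->{$movie} * $v->{$movie} if exists $v->{$movie};
    }
    return $dot / sqrt($norm_u * $norm_v);
}

# Names of the $k users most similar to $target, best match first.
# Ties are broken alphabetically so the order is deterministic.
sub nearest_users {
    my ($target, $k) = @_;
    my %sim;
    for my $other (keys %ratings) {
        next if $other eq $target;
        $sim{$other} = cosine($ratings{$target}, $ratings{$other});
    }
    my @ranked = sort { $sim{$b} <=> $sim{$a} || $a cmp $b } keys %sim;
    $k = @ranked if $k > @ranked;
    return @ranked[0 .. $k - 1];
}

my @neighbors = nearest_users('alice', 1);
print "@neighbors\n";    # prints bob
```

Scanning every user like this is the part a search index should be
able to replace: only users who share at least one movie with the
target can score above zero, which is exactly what a posting list
gives you.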

Nathan Kurz
nate at verse.com
