[KinoSearch] get doc/query similarity

Marvin Humphrey marvin at rectangular.com
Thu Apr 10 15:30:40 PDT 2008




On Apr 10, 2008, at 2:33 PM, jack_tanner at yahoo.com wrote:

> 1) I'd like to compute TF and IDF between a query and one specific
> indexed document. What's the best way to do that?

Hmm, IDF for a *query*, not just a term?  A query could be a lot of  
different things.  To know the IDF, you have to know how many  
documents the query matches.  To do that for an arbitrary query, you  
have to run a search.  KinoSearch::Search::Similarity has a private  
idf() method, but it works on terms, not arbitrary queries...
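
To make the document-count dependency concrete, here is a minimal sketch in plain Perl that does not touch KinoSearch at all. It assumes the Lucene-style formulas tf = sqrt(freq) and idf = 1 + log(num_docs / (doc_freq + 1)); those formulas are an assumption for illustration, not copied from KinoSearch::Search::Similarity. The names sketch_tf and sketch_idf are hypothetical, and the counts are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: Lucene-style term-frequency weight (an assumption,
# not KinoSearch's actual implementation).
sub sketch_tf {
    my ($freq) = @_;
    return sqrt($freq);
}

# Hypothetical helper: Lucene-style inverse document frequency.
# $doc_freq is the number of documents containing the term;
# $num_docs is the total number of documents in the index.
sub sketch_idf {
    my ( $doc_freq, $num_docs ) = @_;
    return 1 + log( $num_docs / ( 1 + $doc_freq ) );
}

# Made-up numbers: the term occurs 4 times in one document and
# appears in 9 of the 1000 documents in the index.
my $tf  = sketch_tf(4);
my $idf = sketch_idf( 9, 1000 );
printf "TF: %.3f IDF: %.3f\n", $tf, $idf;
```

Note that sketch_idf needs $doc_freq up front. For a single term that count is cheap to look up; for an arbitrary query you would only know it after running the search.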

Let's assume you mean a term, for the sake of getting things started.   
Let's also assume that you don't really mean "one specific document",  
even though that's exactly what you said. :)

Here's some code that goes in that general direction: it prints out TF  
for each document which matches a specific term.  It requires svn  
trunk and uses some private methods:

   # Open the invindex and get a reader for it.
   my $invindex = MySchema->open('/path/to/invindex');
   my $reader = KinoSearch::Index::IndexReader->open(
     invindex => $invindex,
   );
   # Iterate over every document containing 'foo' in the 'title' field.
   my $posting_list = $reader->posting_list(
     field => 'title',
     term  => 'foo',
   );
   # Similarity object for the field; tf() turns a raw count into a weight.
   my $sim = $invindex->get_schema->fetch_sim('title');
   while ( my $doc_num = $posting_list->next ) {
     my $doc             = $reader->fetch_doc($doc_num);
     my $posting         = $posting_list->get_posting;
     my $num_occurrences = $posting->get_freq;
     my $tf              = $sim->tf($num_occurrences);
     print "'$doc->{title}' FREQ: $num_occurrences TF: $tf\n";
   }

> P.S. FYI, I could not subscribe to this list, post messages, or  
> apparently even e-mail marvin at rectangular directly from my  
> hotmail account.

Interesting.  I received your private email and wrote back.  Maybe  
hotmail is blocking rectangular.com or something.  AOL blockaded me  
once because the previous tenants on the Comcast IP block  
rectangular.com got assigned to weren't good netizens.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





