[KinoSearch] get doc/query similarity

jack_tanner at yahoo.com
Fri Apr 11 12:07:05 PDT 2008



> From: Marvin Humphrey <marvin at rectangular.com>
>
> Let's assume you mean a term, for the sake of getting things started.   
> Let's also assume that you don't really mean "one specific document",  
> even though that's exactly what you said. :)

Thanks for that example. Let me be clearer about what I need: I want to compute the similarity of two indexed documents, using a metric more sophisticated than raw term overlap. At a minimum, it could be Jaccard similarity (the number of shared terms divided by the number of distinct terms across both documents). Preferably, it would also take corpus statistics into account. For example, if some term T has high TF and low IDF in my corpus (it occurs often and in many docs), then T should be downweighted. Could you suggest a way of doing this, ideally with KS 0.162?
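
To make the request concrete, here is a rough sketch in plain Python (not KinoSearch code) of the two kinds of metric I have in mind: Jaccard over term sets, and cosine similarity over TF-IDF vectors. The function names, the toy corpus, and the smoothed IDF formula are my own assumptions for illustration only. In practice the term counts and document frequencies would have to come from the index.

```python
import math
from collections import Counter


def jaccard(doc_a, doc_b):
    """Jaccard similarity: shared distinct terms / all distinct terms."""
    a, b = set(doc_a), set(doc_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def idf_weights(corpus):
    """Smoothed IDF per term (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus get weight 0."""
    tf = Counter(doc)
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_similarity(doc_a, doc_b, corpus):
    """Cosine similarity of two tokenized docs, weighted by corpus-wide IDF."""
    idf = idf_weights(corpus)
    return cosine(tfidf_vector(doc_a, idf), tfidf_vector(doc_b, idf))


# Hypothetical toy corpus: each document is a list of already-tokenized terms.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "bird", "flew"],
]
print(jaccard(corpus[0], corpus[1]))
print(tfidf_similarity(corpus[0], corpus[1], corpus))
```

In this sketch "the" appears in every document, so it gets the lowest IDF and contributes the least to the cosine score, which is the downweighting behavior I am after.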

> Interesting.  I received your private email and wrote back.  Maybe  
> hotmail is blocking rectangular.com or something.  AOL blockaded me  
> once because the previous tenants on the Comcast IP block  
> rectangular.com got assigned to weren't good netizens.

I never got your response at my hotmail address, not even in the spam folder. If you like, I could forward a complaint to Hotmail's postmaster. Please send a test e-mail to my @yahoo and cc my @hotmail, and I'll forward that along.






_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



