[KinoSearch] get doc/query similarity
jack_tanner at yahoo.com
Fri Apr 11 12:07:05 PDT 2008
> From: Marvin Humphrey <marvin at rectangular.com>
>
> Let's assume you mean a term, for the sake of getting things started.
> Let's also assume that you don't really mean "one specific document",
> even though that's exactly what you said. :)
Thanks for that example. Let me be clearer about what I need: I want to compute the similarity of two indexed documents, using a metric more sophisticated than raw term overlap. At a minimum, it could be Jaccard (i.e., the number of shared terms divided by the number of distinct terms across both docs). Preferably it would take corpus statistics into account. For example, if some term T in my corpus has high TF and low IDF (it occurs often and in many docs), then that term should be downweighted. Could you suggest a way of doing this? Ideally with KS 0.162?
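To make the two metrics concrete, here is a minimal sketch in plain Python. It is not KinoSearch code and uses no KinoSearch API: it assumes each document's terms have already been pulled out of the index as a list of strings, and the function names (jaccard, idf_table, tfidf_vector, cosine) and the toy corpus are made up for illustration. The IDF formula log(N / df) is one common choice among several.

```python
import math
from collections import Counter

def jaccard(terms_a, terms_b):
    """Shared distinct terms divided by all distinct terms in both docs."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def idf_table(corpus):
    """Map each term to log(N / df); a term in every doc gets weight 0."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(terms, idf):
    """Sparse TF-IDF vector: raw term count times the term's IDF."""
    tf = Counter(terms)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: "the" appears in every document.
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "barked"],
]
idf = idf_table(corpus)
vectors = [tfidf_vector(doc, idf) for doc in corpus]

print("jaccard(0, 2):", jaccard(corpus[0], corpus[2]))
print("tf-idf cosine(0, 2):", cosine(vectors[0], vectors[2]))
```

In this toy corpus, documents 0 and 2 share only "the". Jaccard still gives them a nonzero score, while the TF-IDF cosine ignores the match because a term found in every document carries zero weight, which is the downweighting behavior described above.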
> Interesting. I received your private email and wrote back. Maybe
> hotmail is blocking rectangular.com or something. AOL blockaded me
> once because the previous tenants on the Comcast IP block
> rectangular.com got assigned to weren't good netizens.
I never got your response at my hotmail address, not even in the spam folder. If you like, I could forward a complaint to Hotmail's postmaster. Please send a test e-mail to my @yahoo and cc my @hotmail, and I'll forward that along.