[KinoSearch] fuzzy searches

Nick Wellnhofer wellnhofer at aevum.de
Mon Mar 15 07:18:12 PDT 2010


On 15.03.2010 06:15, Marvin Humphrey wrote:
> LSI/LSA (Latent Semantic Indexing/Analysis, "LSA" seems to have become more
> common) fell out of patent a couple of years ago.  The matrix algebra needed
> to perform the data reduction is heavy-duty math, beyond my capabilities.  But
> it sure is interesting to think about it in terms of vector space clustering.

There are also other approaches besides LSA. But internally, KinoSearch would
only have to work with the "topic" (or "concept") vectors of each document,
and it could support different pluggable models for computing those vectors
from the term-document matrix.
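A minimal sketch of that separation, assuming Python with NumPy: the search side only ever sees one topic vector per document, and the model that produces those vectors is a pluggable object. `TopicModel`, `ExactLSA` and `rank_documents` are hypothetical names for illustration, not KinoSearch's API. Plain LSA is used as the example model, computed with a full SVD truncated to `k` dimensions.

```python
from __future__ import annotations

from typing import Protocol

import numpy as np


class TopicModel(Protocol):
    """Hypothetical plug-in interface: term-document matrix in, topic vectors out."""

    def fit(self, term_doc: np.ndarray) -> np.ndarray:
        """Return one k-dimensional topic vector per document (shape: docs x k)."""
        ...


class ExactLSA:
    """LSA via an exact SVD, keeping only the k largest singular values."""

    def __init__(self, k: int) -> None:
        self.k = k

    def fit(self, term_doc: np.ndarray) -> np.ndarray:
        # term_doc has one row per term and one column per document.
        _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
        # Row j of the result is document j's topic vector, i.e. column j
        # of diag(s_k) @ Vt_k.
        return vt[: self.k].T * s[: self.k]


def rank_documents(doc_vectors: np.ndarray, query_vector: np.ndarray) -> list[int]:
    """Order documents by cosine similarity to a vector in topic space."""
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    sims = doc_vectors @ query_vector / np.where(norms == 0, 1.0, norms)
    return [int(i) for i in np.argsort(-sims)]
```

For example, with four terms ("cat", "kitten", "car", "engine") as rows and four documents as columns, `rank_documents(ExactLSA(k=2).fit(a), vecs[0])` ranks the two cat documents ahead of the two car documents. Swapping in another model only means passing a different object with a `fit` method; the ranking code does not change.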

If anyone is interested in working on something like that, I would gladly
contribute. I also have a little mathematical background and did some
research on matrix approximation recently. It seems that the fastest
algorithms are based on random sampling. A very good introductory paper is:

FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR
CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS
http://arxiv.org/PS_cache/arxiv/pdf/0909/0909.4061v1.pdf
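A minimal sketch, assuming Python with NumPy, of the two-stage randomized scheme that paper describes: first find an orthonormal basis Q that approximately captures the range of the matrix by multiplying it with a Gaussian random matrix, then compute an exact SVD of the much smaller matrix Q^T A. The function name `randomized_svd` and the default values for `oversample` and `power_iters` are illustrative choices, not taken from the paper or from KinoSearch.

```python
import numpy as np


def randomized_svd(a, k, oversample=10, power_iters=2, seed=None):
    """Approximate the k leading singular triplets of the m x n matrix a."""
    rng = np.random.default_rng(seed)
    _, n = a.shape
    # Sample a few more directions than needed (illustrative oversampling).
    l = min(k + oversample, n)

    # Stage A: sample the range of a with a Gaussian test matrix.
    omega = rng.standard_normal((n, l))
    q, _ = np.linalg.qr(a @ omega)

    # Subspace (power) iterations sharpen the basis when singular values
    # decay slowly; re-orthonormalizing each step keeps it numerically stable.
    for _ in range(power_iters):
        z, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ z)

    # Stage B: project a onto the basis and take an exact SVD of the small matrix.
    b = q.T @ a
    ub, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ ub
    return u[:, :k], s[:k], vt[:k]
```

The expensive work is a handful of products with the large sparse-ish term-document matrix plus factorizations of matrices with only about k + oversample columns or rows. In an LSA setting, the document topic vectors would then be the columns of diag(s) @ vt, exactly as with a full SVD.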

Nick


-- 
aevum gmbh
rumfordstr. 4
80469 münchen
germany

tel: +49 89 3838 0653
http://aevum.de/
