[KinoSearch] rfc: faceted search
Nathan Kurz
nate at verse.com
Wed Jul 30 11:03:27 PDT 2008
On Tue, Jul 29, 2008 at 2:53 PM, Nathan Kurz <nate at verse.com> wrote:
> My mental model of faceted search (likely faulty) has two
> requirements: to break down search results by facet, and to allow
> filtering by facet. Can't this be done with just Boolean queries and
> stored field values? Why does it require bitvectors and wrapped
> queries?
Sorry to reply to myself, but I thought about this a bit more last
night. I'm now more sure that bit vectors and wrapped queries are
_not_ a good solution for faceted search. The bit vectors become
unworkable once you have a large number of facets (think of using
'author' as a facet on a large dataset), and the wrapped query
approach doesn't distribute well across a cluster (each node would be
forced to round-trip the full list of matching docs over the network,
not just the top-n).
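To put a rough number on the bit vector concern, here is a
back-of-envelope sketch. It assumes one uncompressed bit vector per
facet value with one bit per document; the corpus size and author count
are made-up illustrative figures, not measurements of any real index,
and the function name is hypothetical.

```python
def bitvector_bytes(num_docs, num_facet_values):
    """Bytes needed for one uncompressed bit vector per facet value.

    Assumes one bit per document per vector (an illustrative model,
    not a description of any particular KinoSearch structure).
    """
    bytes_per_vector = (num_docs + 7) // 8  # round up to whole bytes
    return bytes_per_vector * num_facet_values


# Hypothetical corpus: 10 million docs, 1 million distinct authors.
total_bytes = bitvector_bytes(10_000_000, 1_000_000)
print(total_bytes / 1e12, "TB")
```

Under those assumptions the vectors alone come to about 1.25 TB, which
is why a high-cardinality facet like 'author' looks unworkable unless
the vectors are compressed or built lazily.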
Instead, the problem can be solved more scalably by reusing the
existing inverted and document indexes. Really the only change
necessary is accepting that facet counting needs to be done during
the main query, and not as a later step like gathering excerpts. This
makes sense, though, as a facet count applies to the full query, and
the counts can be considered a query response just as the matching
docs are.
Instead of wrapping a query, one just uses a FacetCollector. Hit by
hit, this collector (which could be wrapping a HitCollector) steps
through the facet field's DocVector (is this where the forward index
of terms is kept?) adding up term occurrences.
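A minimal sketch of the collector idea, in Python rather than
KinoSearch's own Perl/C. Everything here is hypothetical: the class
names, the collect(doc_id, score) signature, and the plain dict standing
in for the forward index are assumptions for illustration, not the
actual KinoSearch API.

```python
from collections import Counter


class TopNCollector:
    """Stand-in for an ordinary hit collector: keeps the n best hits."""

    def __init__(self, n):
        self.n = n
        self.hits = []  # list of (score, doc_id), best first

    def collect(self, doc_id, score):
        self.hits.append((score, doc_id))
        self.hits.sort(reverse=True)
        del self.hits[self.n:]


class FacetCollector:
    """Counts facet terms for every hit, then forwards to an inner collector."""

    def __init__(self, facet_terms_by_doc, inner=None):
        # facet_terms_by_doc: assumed forward index, doc_id -> facet terms
        self.facet_terms_by_doc = facet_terms_by_doc
        self.inner = inner
        self.counts = Counter()

    def collect(self, doc_id, score):
        # Counting happens for every matching doc, not just the top n.
        for term in self.facet_terms_by_doc.get(doc_id, ()):
            self.counts[term] += 1
        if self.inner is not None:
            self.inner.collect(doc_id, score)


# Toy data: an 'author' facet for four docs, three of which match the query.
forward_index = {1: ["kurz"], 2: ["humphrey"], 3: ["kurz"], 4: ["kurz", "humphrey"]}
matches = [(1, 0.9), (3, 0.4), (4, 0.7)]

top = TopNCollector(2)
facets = FacetCollector(forward_index, inner=top)
for doc_id, score in matches:
    facets.collect(doc_id, score)
```

After the loop, facets.counts holds the per-author totals over all
matches, while top.hits holds only the two best docs.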
These facet counts are then returned as part of the response along
with the top matches. Caching, if it is to occur, happens in a
centralized way (memcached) at the top level for the entire query,
rather than component by component within the cluster.
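A sketch of what the top level might do, assuming each node returns its
local top hits plus its facet counts. The function names, the response
shape, and the dict used in place of memcached are all hypothetical.

```python
import heapq
from collections import Counter


def merge_node_responses(responses, n):
    """Merge per-node (top_hits, facet_counts) pairs into one response.

    top_hits is a list of (score, doc_id); facet_counts is a Counter.
    Only these small structures cross the network, never the full
    list of matching docs.
    """
    total_counts = Counter()
    all_hits = []
    for hits, counts in responses:
        total_counts.update(counts)  # facet counts simply add up
        all_hits.extend(hits)
    return heapq.nlargest(n, all_hits), total_counts


def cached_search(query, n, cache, fetch_from_nodes):
    """Cache the whole merged response under one key (dict stands in for memcached)."""
    key = (query, n)
    if key not in cache:
        cache[key] = merge_node_responses(fetch_from_nodes(query, n), n)
    return cache[key]


# Toy responses from two nodes for the same query.
node_responses = [
    ([(0.9, "a1"), (0.2, "a2")], Counter(kurz=3, humphrey=1)),
    ([(0.8, "b1")], Counter(kurz=1, smith=2)),
]
calls = []


def fake_fetch(query, n):
    calls.append(query)  # record how often the cluster is actually queried
    return node_responses


cache = {}
top_hits, facet_counts = cached_search("author facet demo", 2, cache, fake_fetch)
cached_search("author facet demo", 2, cache, fake_fetch)  # served from cache
```

The second call never reaches the nodes, which is the point of caching
once for the entire query instead of inside each component.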
Your servant of Ockham,
Nathan Kurz
nate at verse.com
ps. I can't tell at a glance how KinoSearch currently treats fields
in its indexes. For example, does each field get its own lexicon?
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch