[KinoSearch] Feature request: Search facet counts in Kinosearch?
Marvin Humphrey
marvin at rectangular.com
Thu Jun 12 12:53:03 PDT 2008
On Jun 12, 2008, at 11:16 AM, Nathan Kurz wrote:
> Is the faceted approach layered on top of the search as
> a post-processing filter, or are the facets being handled directly by
> the search engine?
The main trick for obtaining the facet counts is massive server-side
caching.
You cache doc sets for each facet. A BitVector works well for facets
which match lots of documents; for more sparse sets, a SortedVIntList,
which encodes a set of integers using a compressed format, may use
less memory.
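To make the memory trade-off concrete, here is a minimal Python sketch. It is only an illustration: the function names are hypothetical, and the byte layout (delta encoding with 7 payload bits per byte) is an assumption about how a compressed sorted integer list might work, not the actual SortedVIntList format.

```python
def encode_vints(sorted_doc_ids):
    """Delta-encode a sorted list of doc numbers as variable-length ints."""
    out = bytearray()
    prev = 0
    for doc_id in sorted_doc_ids:
        delta = doc_id - prev
        prev = doc_id
        # 7 payload bits per byte; a set high bit means "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_vints(data):
    """Inverse of encode_vints: rebuild the sorted doc numbers."""
    doc_ids = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            doc_ids.append(prev)
            value = 0
            shift = 0
    return doc_ids


def bit_vector_bytes(max_doc):
    """Bytes needed for a bit vector with one bit per document."""
    return (max_doc + 7) // 8
```

A bit vector costs the same no matter how many docs the facet matches, while the encoded list grows with the number of matches. For a facet matching a handful of docs in a large index the list is far smaller; for a facet matching a large fraction of the index the bit vector wins.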
When you search, you use a dual-purpose HitCollector which wraps both
a TopDocCollector and a BitCollector. The TopDocCollector gets you
your standard search results ranked by score.
The BitCollector gets you a list of all the doc numbers that matched.
For each facet that you want a result for, you count the number of
docs in the intersection of the main result set with the facet's
cached result set.
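The collector arrangement and the intersection count described above could be sketched roughly as follows. The class names echo the ones in this message, but the Python classes are hypothetical stand-ins rather than the KinoSearch or Lucene API, and a plain Python integer is assumed as the bit set.

```python
import heapq


class TopDocCollector:
    """Keeps the `size` highest-scoring docs seen so far."""

    def __init__(self, size):
        self.size = size
        self.heap = []  # min-heap of (score, doc_id)

    def collect(self, doc_id, score):
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, (score, doc_id))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, doc_id))

    def top_docs(self):
        # Highest score first.
        return [doc_id for _, doc_id in sorted(self.heap, reverse=True)]


class BitCollector:
    """Records every matching doc number as a set bit."""

    def __init__(self):
        self.bits = 0

    def collect(self, doc_id, score):
        self.bits |= 1 << doc_id


class DualCollector:
    """Forwards each hit to both wrapped collectors."""

    def __init__(self, top_collector, bit_collector):
        self.top_collector = top_collector
        self.bit_collector = bit_collector

    def collect(self, doc_id, score):
        self.top_collector.collect(doc_id, score)
        self.bit_collector.collect(doc_id, score)


def facet_counts(result_bits, facet_cache):
    """Count docs in the intersection of the results with each cached facet."""
    return {
        name: bin(result_bits & facet_bits).count("1")
        for name, facet_bits in facet_cache.items()
    }
```

The search loop calls `collect` once per hit on the DualCollector. Afterwards `top_docs()` supplies the ranked results, and `facet_counts(bit_collector.bits, cache)` supplies one count per cached facet without re-running the query.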
The other problem is how to decide which facets to evaluate each query
against. I think most people use a sort of drill-down, where top-level
queries are compared against general categories, and once you select
one of those categories (e.g. by clicking on "DVDs", or "Books"), the
facet set changes. However, I don't believe that Solr constrains you
with regard to how you select facets.
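One simple way to picture the drill-down is a tree of facets, where the current selection decides which facets get counted for the next query. The sketch below is purely illustrative: the tree layout, the subcategory names, and the function name are all made up for the example and are not taken from Solr or KinoSearch.

```python
# Hypothetical facet tree: each key maps to the facets shown once it is selected.
FACET_TREE = {
    "root": ["DVDs", "Books"],
    "DVDs": ["DVDs/Comedy", "DVDs/Drama"],
    "Books": ["Books/Fiction", "Books/History"],
}


def facets_to_evaluate(selected=None):
    """Return the facets to count for a query, given the selected category."""
    # With nothing selected, compare the query against the general categories.
    return FACET_TREE.get(selected or "root", [])
```

A top-level query is counted against "DVDs" and "Books" only; after the user clicks "DVDs", the next query is counted against the DVD subcategories instead, so only a small slice of the cached facet sets is intersected per search.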
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/