[KinoSearch] Feature request: Search facet counts in Kinosearch?

Marvin Humphrey marvin at rectangular.com
Thu Jun 12 12:53:03 PDT 2008


On Jun 12, 2008, at 11:16 AM, Nathan Kurz wrote:

> Is the faceted approach layered on top of the search as
> a post-processing filter, or are the facets being handled directly by
> the search engine?

The main trick for obtaining the facet counts is massive server-side  
caching.

You cache doc sets for each facet.  A BitVector works well for facets  
which match lots of documents; for more sparse sets, a SortedVIntList,  
which encodes a set of integers using a compressed format, may use  
less memory.
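
To make the memory tradeoff concrete, here is a rough Python sketch. It is
not KinoSearch or Lucene code: the function names are made up for
illustration, a Python int stands in for the bit vector, and the compressed
list is assumed to be delta-encoded with 7 data bits per byte.

```python
def to_bitvector(doc_ids):
    """Pack doc numbers into a Python int used as a bit vector (illustrative)."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def encode_vints(sorted_doc_ids):
    """Delta-encode ascending doc numbers as variable-length integers.

    Each byte carries 7 data bits; the high bit means "more bytes follow".
    """
    out = bytearray()
    prev = 0
    for doc_id in sorted_doc_ids:
        delta = doc_id - prev
        prev = doc_id
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_vints(data):
    """Inverse of encode_vints: rebuild the ascending list of doc numbers."""
    doc_ids = []
    value = shift = prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            doc_ids.append(prev)
            value = shift = 0
    return doc_ids


MAX_DOC = 1_000_000                       # assumed index size
sparse_facet = [5, 300, 70_000, 999_999]  # a facet matching only four docs

bitvector_bytes = (MAX_DOC + 7) // 8      # one bit per doc, however few match
vint_bytes = len(encode_vints(sparse_facet))
```

With these made-up numbers the bit vector costs the same 125000 bytes no
matter how few docs match, while the encoded list costs only a handful of
bytes. For a facet matching a large fraction of the index the comparison
goes the other way.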

When you search, you use a dual-purpose HitCollector which wraps both  
a TopDocCollector and a BitCollector.  The TopDocCollector gets you  
your standard search results ranked by score.
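
A minimal sketch of the wrapping idea, in Python rather than the real
KinoSearch or Lucene classes. The class names mirror the ones above, but the
collect(doc_id, score) signature and everything else here are assumptions
for illustration only.

```python
import heapq


class TopDocCollector:
    """Keeps the `size` highest-scoring docs seen so far (illustrative)."""

    def __init__(self, size):
        self.size = size
        self.heap = []  # min-heap of (score, doc_id); lowest kept score on top

    def collect(self, doc_id, score):
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, (score, doc_id))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, doc_id))

    def top_docs(self):
        # Best score first.
        return [(doc_id, score) for score, doc_id in sorted(self.heap, reverse=True)]


class BitCollector:
    """Records every matching doc number as a set bit; the score is ignored."""

    def __init__(self):
        self.bits = 0

    def collect(self, doc_id, score):
        self.bits |= 1 << doc_id


class DualCollector:
    """Forwards each hit to both wrapped collectors."""

    def __init__(self, top_n):
        self.top = TopDocCollector(top_n)
        self.bit = BitCollector()

    def collect(self, doc_id, score):
        self.top.collect(doc_id, score)
        self.bit.collect(doc_id, score)


collector = DualCollector(top_n=2)
for doc_id, score in [(3, 0.4), (7, 0.9), (12, 0.1), (20, 0.7)]:
    collector.collect(doc_id, score)
```

After the loop, collector.top holds only the two best hits (docs 7 and 20),
while collector.bit has a bit set for all four matching docs.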

The BitCollector gets you the set of all the doc numbers that matched.
For each facet that you want a result for, you count the number of  
docs in the intersection of the main result set with the facet's  
cached result set.
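
The counting step can be sketched like this, again in Python with Python
ints standing in for bit vectors. The facet names, doc numbers, and helper
functions are invented for the example.

```python
def bitset(doc_ids):
    """Build a bit vector (as a Python int) from doc numbers."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def facet_counts(result_bits, facet_cache):
    """Count docs in the intersection of the result set with each cached facet set."""
    return {
        name: bin(result_bits & facet_bits).count("1")
        for name, facet_bits in facet_cache.items()
    }


# Cached once, ahead of time: one doc set per facet.
facet_cache = {
    "DVDs": bitset([1, 3, 7, 20]),
    "Books": bitset([2, 3, 12]),
    "Music": bitset([5, 9]),
}

# Produced per query, e.g. by a bit-collecting hit collector.
result_bits = bitset([3, 7, 12, 20])

counts = facet_counts(result_bits, facet_cache)
# counts == {"DVDs": 3, "Books": 2, "Music": 0}
```

Each count costs one AND plus a population count over the cached set, which
is why keeping the facet sets in memory matters so much.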

The other problem is how to decide which facets to evaluate each query  
against.  I think most people use some sort of drill-down, where top-level  
queries are compared against general categories, and once you select  
one of those categories (e.g. by clicking on "DVDs", or "Books"), the  
facet set changes.  However, I don't believe that Solr constrains you  
with regard to how you select facets.
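
One simple way to picture drill-down is a lookup from the currently
selected category to the facets worth evaluating next. This is only a
sketch: the category tree and the function name are hypothetical, and it
says nothing about how Solr or any particular application does it.

```python
# Hypothetical category tree: None is the top level, before anything is selected.
FACET_TREE = {
    None: ["DVDs", "Books"],
    "DVDs": ["Comedy", "Drama"],
    "Books": ["Fiction", "Non-fiction"],
}


def facets_to_evaluate(selected=None):
    """Return the facets to count for the next query, given the selected category."""
    return FACET_TREE.get(selected, [])


top_level = facets_to_evaluate()           # ["DVDs", "Books"]
after_click = facets_to_evaluate("DVDs")   # ["Comedy", "Drama"]
```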

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



