[KinoSearch] rfc: faceted search
Nathan Kurz
nate at verse.com
Tue Jul 29 13:53:40 PDT 2008
On Tue, Jul 29, 2008 at 5:33 AM, Andrew Bramble
<bramble.andrew at gmail.com> wrote:
> To provide faceted search (FS) capability for KinoSearch (KS)
> requires, to quote Marvin, "massive server-side caching". We like!!
My mental model of faceted search (likely faulty) has two
requirements: to break down search results by facet, and to allow
filtering by facet. Can't this be done with just Boolean queries and
stored field values? Why does it require bitvectors and wrapped
queries?
For a simple example, presume that each 'doc' is given a color and a
size. We want to search for large, blue widgets. Can't we just
search for "size:large && color:blue && text:widget"? This seems like
it would already be pretty efficient.
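A minimal sketch of that Boolean approach, in Python rather than KS
itself. The toy documents, the "field:term" keys, and the search_and()
helper are all made up for illustration and are not KinoSearch API;
the point is only that filtering by facet reduces to intersecting
posting lists.

```python
# Toy inverted index: "field:term" -> set of doc ids.
# Documents and field names are hypothetical, for illustration only.
docs = {
    1: {"size": "large", "color": "blue", "text": "widget"},
    2: {"size": "small", "color": "blue", "text": "widget"},
    3: {"size": "large", "color": "red", "text": "widget"},
    4: {"size": "large", "color": "blue", "text": "gadget"},
}

postings = {}
for doc_id, fields in docs.items():
    for field, value in fields.items():
        postings.setdefault(f"{field}:{value}", set()).add(doc_id)


def search_and(*clauses):
    """Return ids of docs matching every 'field:term' clause (Boolean AND)."""
    sets = [postings.get(clause, set()) for clause in clauses]
    if not sets:
        return set()
    # Start from the smallest posting list so each intersection stays cheap.
    sets.sort(key=len)
    result = set(sets[0])
    for s in sets[1:]:
        result &= s
    return result


# Equivalent of "size:large && color:blue && text:widget".
hits = search_and("size:large", "color:blue", "text:widget")
```

Each facet filter costs one more intersection, and a selective clause
shrinks the candidate set early.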
And on the display side, wouldn't it be easier just to have a forward
index listing the combined facets for each doc? You'd have to buzz
over this list for the results of each new query, but it seems like it
would be easy to cache the facet counts at the application level.
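The display side could be sketched roughly like this, again in Python
and again with made-up names: forward_index, facet_counts(), and the
query-string cache key are assumptions for illustration, not anything
KS provides.

```python
from collections import Counter

# Hypothetical forward index: doc id -> combined facet values for that doc.
forward_index = {
    1: ("size:large", "color:blue"),
    2: ("size:small", "color:blue"),
    3: ("size:large", "color:red"),
    4: ("size:large", "color:blue"),
}

# Application-level cache, keyed by the query string (an assumption).
_facet_cache = {}


def facet_counts(query, hit_ids):
    """Count facet values across all hits of a query, caching per query."""
    if query not in _facet_cache:
        counts = Counter()
        for doc_id in hit_ids:
            # One pass over the hit list; each doc adds all of its facets.
            counts.update(forward_index[doc_id])
        _facet_cache[query] = counts
    return _facet_cache[query]


# Suppose "text:widget" matched docs 1, 2 and 3.
counts = facet_counts("text:widget", [1, 2, 3])
```

The cost is one pass over the full hit list per uncached query, so the
cache only pays off when the same queries recur.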
What am I missing?
Nathan Kurz
nate at verse.com
>
> A FS class would trawl the index on startup (or even better during
> index time and store with the invindex, there's an API for index
> overlays... right?!?) and generate bitvectors for desired field(s)
> terms - storing a 1 in doc_num position for documents possessing the
> given term. For a field with 100 facets or terms - you'd need 100 bit
> vectors of at most max_docs bits long.
>
> An FS query would wrap a regular KS query - AND'ing the query results
> with each term's cached bitvector to derive a count of documents
> within the wrapped query that possess that term ( 100 ANDs + 100 counts
> of bitvectors no greater than maxdoc bits ).
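
For comparison, the bitvector scheme quoted above can be sketched as
follows. This is only an illustration in Python, using plain integers
as bit vectors; term_docs, make_bitvector(), and count_facets() are
hypothetical names and do not reflect KS BitVector or any KS API.

```python
# Hypothetical doc numbers per term for one faceted field ("color").
term_docs = {"blue": [0, 1, 3], "red": [2], "green": []}


def make_bitvector(doc_nums):
    """Build an int used as a bit vector: bit n is 1 if doc n is present."""
    bits = 0
    for n in doc_nums:
        bits |= 1 << n
    return bits


# One cached bit vector per term, built once (e.g. at startup).
facet_vectors = {term: make_bitvector(d) for term, d in term_docs.items()}


def count_facets(query_bits):
    """AND the query's hit vector with each term vector and count set bits."""
    return {
        term: bin(query_bits & bits).count("1")
        for term, bits in facet_vectors.items()
    }


# Suppose the wrapped query matched docs 0, 2 and 3.
query_bits = make_bitvector([0, 2, 3])
result = count_facets(query_bits)
```

With T terms this is T ANDs plus T bit counts per query, as described
in the quoted text.
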
>
> I have made a VERY naive implementation of this without glueing into
> KinoSearch XS/C, since I confess to _barely_ grokking charmony + XS +
> C beyond kindergarten level.
>
> Facet::Counter2 (yes I embrace version control really) is quite
> hopeless from a practical standpoint and will only count the facets of
> documents returned by KS::Search::Hits , limited to num_wanted.
>
> My next goal would be to better understand KS internals so as to
> * use KS BitVectors to replace scalars and vec
> * make Facet::Counter into KSx::Search::FacetQuery to collect the >0
> scored results of a child query and count facets this way instead of
> using KS::Search::Hits
>
> Constructive abuse welcome.
>
> AB
>
_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch