[KinoSearch] Compacting the core

Marvin Humphrey marvin at rectangular.com
Thu Jun 12 19:08:41 PDT 2008


Greets,

(I'm cc'ing this to lucy-dev at lucene.apache.org, because I think Lucy  
should follow the same design principles described in this post.)

KinoSearch is spinning off a few modules to cut down on core size  
and complexity.  For now, they will continue to be distributed with  
the KinoSearch tarball, but eventually they will become separate  
distributions.

KinoSearch::Search::SearchServer and KinoSearch::Search::SearchClient  
have moved to KSx::Remote::SearchServer and  
KSx::Remote::SearchClient.  Eventually, they will be distributed under  
KSx::Remote.

The rationale for breaking out SearchServer/SearchClient is that there  
are many ways to have machines interconnect: the Socket/faked-up-rpc  
approach taken by SearchClient/SearchServer, the XML approach used by  
Solr, etc.  For core, the only crucial requirement is that the messages  
sent over the network be serializable using *some* technique --  
it's not important what technique is chosen.
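
As a rough illustration of "serializable using *some* technique", here is a minimal sketch that round-trips a request through Perl's core Storable module.  The message layout (the 'method' and 'args' keys and their values) is hypothetical and is not the actual wire format used by SearchClient/SearchServer; any other serializer could stand in for Storable.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# Hypothetical request message.  The keys and values are illustrative
# only, not the real SearchClient/SearchServer protocol.
my %request = (
    method => 'top_docs',
    args   => { query => 'salty snacks', num_wanted => 10 },
);

# Serialize to a byte string that could be written to a socket ...
my $frozen = freeze( \%request );

# ... and rebuild an equivalent, independent structure on the other end.
my $thawed = thaw($frozen);

print "$thawed->{method}: $thawed->{args}{query}\n";
```

The calling code only depends on getting an equivalent data structure back, which is why the choice of transport and encoding can live outside of core.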

The other spinoff is Filter.  KinoSearch::Search::Filter,  
KinoSearch::Search::QueryFilter, and KinoSearch::Search::PolyFilter  
have all been removed; their functionality is now encapsulated in  
KSx::Search::Filter, which has been refactored as a subclass of  
Query.  The last filter subclass, KinoSearch::Search::RangeFilter, has  
been replaced by a new core class, KinoSearch::Search::RangeQuery  
(which behaves similarly to Lucene's ConstantScoreRangeQuery with a  
fixed score of 0).

The standard KS search methods no longer take a 'filter' argument.   
Here's the new Filter API in action:

   my %category_filters;
   for my $category (qw( sweet sour salty bitter )) {
     my $cat_query  = KinoSearch::Search::TermQuery->new(
       field => 'category',
       term  => $category,
     );
     $category_filters{$category} = KSx::Search::Filter->new(
        query => $cat_query,
     );
   }

   while ( my $cgi = CGI::Fast->new ) {
     my $user_query = $cgi->param('q');
     my $filter = $category_filters{$cgi->param('category')};
     my $and_query = KinoSearch::Search::ANDQuery->new;
     $and_query->add_child($user_query);
     $and_query->add_child($filter);
     my $hits = $searcher->search( query => $and_query );
     ...

Filter is moving outside of core because it is essentially nothing  
more than a caching optimization.  Logically, the following code would  
produce exactly the same results as the code above:

   while ( my $cgi = CGI::Fast->new ) {
     my $user_query = $cgi->param('q');
     my $category_query = KinoSearch::Search::TermQuery->new(
       field => 'category',
       term  => $cgi->param('category'),
     );
     $category_query->set_boost(0);
     my $and_query = KinoSearch::Search::ANDQuery->new;
     $and_query->add_child($user_query);
     $and_query->add_child($category_query);
     my $hits = $searcher->search( query => $and_query );
     ...

The only significant differences are that the Filter runs its query  
only once, and that it can't be serialized and sent over the network  
in a search cluster (because the search results are cached in a  
BitVector, which is too big to send).
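
To make the "caching optimization" point concrete, here is a toy sketch in plain Perl.  It does not use the KinoSearch API: the corpus, the run_category_query() and cached_filter() helpers, and the list of user hits are all hypothetical stand-ins.  It shows a category match being computed once, stored as a bit vector with one bit per document, and then reused across requests.

```perl
use strict;
use warnings;

# Toy corpus: index is the doc id, value is its category (made-up data).
my @category_of = qw( sweet sour sweet salty bitter sweet );

my $runs = 0;    # how many times the category "query" actually executes

# Scan every document and return a bit vector with one bit per doc id,
# set when the document belongs to the requested category.
sub run_category_query {
    my ($category) = @_;
    $runs++;
    my $bits = '';
    for my $doc_id ( 0 .. $#category_of ) {
        vec( $bits, $doc_id, 1 ) = $category_of[$doc_id] eq $category ? 1 : 0;
    }
    return $bits;
}

# Cache the bit vector per category so the scan happens only once.
my %cache;
sub cached_filter {
    my ($category) = @_;
    $cache{$category} = run_category_query($category)
        unless exists $cache{$category};
    return $cache{$category};
}

# Doc ids matched by some user query (hypothetical).
my @user_hits = ( 0, 1, 2, 4 );

# Two "requests" against the same filter: intersect hits with the bits.
my $bits   = cached_filter('sweet');
my @first  = grep { vec( $bits, $_, 1 ) } @user_hits;
$bits      = cached_filter('sweet');
my @second = grep { vec( $bits, $_, 1 ) } @user_hits;

print "hits: @first; category query ran $runs time(s)\n";
# prints "hits: 0 2; category query ran 1 time(s)"
```

Because the cached vector needs one bit per document in the index, its size grows with the index rather than with the query, which is the reason it is awkward to ship between nodes.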

Lucene provides classes called RemoteCachingWrapperFilter and  
FilterManager that address the problem of filter caching in search  
clusters, and whose functionality might eventually end up in either  
KSx::Remote or KSx::Search::Filter.  Again, though, they are caching  
optimizations with serialization limitations and as such belong  
outside of core.

I thought about keeping Filter as an abstract base class, and putting  
the actual functionality into KSx::Search::QueryFilter or something  
like that.  However, after reviewing the various Filter subclasses in  
both Lucene's core and contrib, it looked to me as though nearly all  
of them (all except for the SpanFilter subclasses, which would need to  
be different anyway) could be realized using either ordinary Queries  
or Queries in conjunction with this new implementation of Filter.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



