[KinoSearch] Queries with large number of hits.

Marvin Humphrey marvin at rectangular.com
Sun Sep 14 16:36:43 PDT 2008


On Sep 13, 2008, at 1:56 PM, Dan wrote:

> So now I have made claims... :)
> I'll try to give more details.

In my book, benchmarking claims presented without code, corpus, stats,  
raw data, and detailed methodological descriptions qualify as  
"anecdotal evidence".  If you have a scientific background, you know  
what that means: not to be ignored, but requiring a high degree of  
skepticism and not particularly useful.
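
As a rough illustration of what "stats" and "raw data" could mean in practice, here is a minimal timing sketch.  It is not the KinoSearch benchmark suite; `run_query` is a hypothetical callable standing in for whatever engine is under test, and the function names are made up for this example.

```python
import statistics
import time


def benchmark(run_query, queries, repeats=5):
    """Time each query `repeats` times and keep every raw sample.

    `run_query` is a hypothetical callable taking a query string; it
    stands in for the search engine being measured.
    """
    raw = {}
    for query in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            samples.append(time.perf_counter() - start)
        raw[query] = samples
    return raw


def summarize(raw):
    """Reduce raw samples to summary stats without discarding the raw data."""
    summary = {}
    for query, samples in raw.items():
        summary[query] = {
            "min": min(samples),
            "median": statistics.median(samples),
            "mean": statistics.mean(samples),
            # stdev needs at least two samples
            "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        }
    return summary
```

Publishing both the dictionary returned by `benchmark` and the one returned by `summarize`, along with the corpus and the query list, is what lets someone else rerun the experiment and check the numbers.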

> So as you can see this whole "test" is pretty simple with many
> possible holes to try and get this Apples Vs Oranges test running.

KinoSearch is a low-level engine analogous to Lucene; Solr is a
higher-level library built on top of Lucene that does a lot of extra
stuff, including copious caching.

A comparison of Lucene to KinoSearch would be more germane from a  
development standpoint.  By using Solr rather than Lucene, you've  
polluted the experiment with an extra layer of variables.  I actually  
think that testing with all of Solr's default caching mechanisms *on*  
would be more interesting in a sense than what we've gotten from you  
so far.  It wouldn't be helpful for development in terms of  
identifying optimization opportunities within KS, but it might be more  
interesting for decision makers.

> Is there anything I can do to make these searches perform better?

There are a couple of known issues on the todo list that affect
search speed.  One is a bugfix (SegPList_Skip_To had to be temporarily
disabled due to corrupt .skip files), and the other is a design flaw,
described in
<http://www.mail-archive.com/java-dev@lucene.apache.org/msg15825.html>.
Additionally, implementing the PForDelta compression algorithm
for postings should speed up searching, but I'd planned to put that off.
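
To show why a disabled skip_to costs search speed, here is a minimal sketch of skipping over a sorted posting list with a sparse skip table.  It is not KinoSearch's SegPList_Skip_To or its .skip file format; the class name, the `interval` parameter, and the `steps` counter are assumptions made up for this illustration.

```python
from bisect import bisect_right


class PostingList:
    """Sorted doc ids plus a sparse skip table (hypothetical layout)."""

    def __init__(self, doc_ids, interval=16):
        self.doc_ids = doc_ids
        self.interval = interval
        # One skip entry per block: the doc id at positions 0, interval, ...
        self.skips = doc_ids[::interval]
        self.pos = -1    # positioned before the first posting
        self.steps = 0   # number of postings examined, for comparison

    def next(self):
        """Advance one posting; return its doc id, or None when exhausted."""
        self.pos += 1
        self.steps += 1
        if self.pos < len(self.doc_ids):
            return self.doc_ids[self.pos]
        return None

    def skip_to_linear(self, target):
        """Fallback without skip data: call next() until doc id >= target."""
        doc = self.next()
        while doc is not None and doc < target:
            doc = self.next()
        return doc

    def skip_to(self, target):
        """Jump to the last block starting at or before target, then scan."""
        block = bisect_right(self.skips, target) - 1
        jump = block * self.interval - 1
        if block >= 0 and jump > self.pos:
            self.pos = jump  # never move backwards
        return self.skip_to_linear(target)


doc_ids = list(range(0, 1000, 3))
slow = PostingList(doc_ids)
fast = PostingList(doc_ids)
slow_doc = slow.skip_to_linear(500)
fast_doc = fast.skip_to(500)
```

Both calls land on the same document, but `fast.steps` ends up far smaller than `slow.steps`, because the linear fallback has to examine every posting before the target.  With many hits per term, that difference is paid on every conjunction.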

However, measuring progress on those issues using a closed source
benchmark with "many possible holes" would be foolish.  If we're going
to do benchmarking at all, we're going to do it right:
<http://www.rectangular.com/kinosearch/benchmarks.html>.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



