[KinoSearch] Search in a clustered environment

Marvin Humphrey marvin at rectangular.com
Tue Jan 16 17:59:10 PST 2007


On Jan 16, 2007, at 4:58 PM, Miles Crawford wrote:

> I notice that your workaround as described in
>
> http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Docs/NFS.html
>
> does not help when multiple processes may be updating an index.

The "stale NFS filehandle" problem is still "solved" so long as you  
don't allow readers and writers to be open against the same index at  
the same time.

The workaround doesn't address the problem of coordinating multiple
machines writing to an index on a shared volume, though.  That's a
new problem I figured out moments ago.  Simply releasing a version
of KS which reverts to the old behavior of throwing an exception
rather than deleting the lock file mitigates the contention issue
-- you still have to deal with the exceptions, but there is no more
potential for index corruption.
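
To make the "throw an exception rather than delete the lock file"
behavior concrete, here is a minimal sketch in Python.  It is not
KinoSearch code: the names LockError, acquire_lock and release_lock
are hypothetical, and it assumes that exclusive file creation is
atomic on the volume in question, which is worth verifying for your
particular NFS setup.

```python
import os


class LockError(Exception):
    """Raised when another writer already holds the lock (hypothetical name)."""


def acquire_lock(lock_path):
    """Create the lock file exclusively; never delete an existing one."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Surface the contention to the caller instead of clearing the lock.
        raise LockError("index is locked: " + lock_path)
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return lock_path


def release_lock(lock_path):
    """Remove the lock file; only the holder should call this."""
    os.remove(lock_path)
```

A writer that gets LockError can wait and retry, or give up and
report the failure.  Either way, it never writes to an index that
another machine may still be modifying.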

> Still, using this strategy I could perhaps have a process that runs
> once a minute and indexes everything created across all
> cluster-members since the last run. Clumsy, and it would be much
> better from a usability perspective to block until the index is
> updated in my case.

In general, inverted indexes do not update as nimbly as traditional  
databases, and if you enter with the mindset that they must, you will  
end up disappointed.
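
The once-a-minute batch strategy quoted above can be sketched as
follows.  This is an illustration in Python under stated assumptions,
not KinoSearch code: run_batch, the "created" field and the
index_batch callback are all hypothetical, and the timestamps are
assumed to be comparable across cluster members.

```python
def select_new_items(items, last_run, now):
    """Return items created after the previous run, up to and including now."""
    return [it for it in items if last_run < it["created"] <= now]


def run_batch(items, last_run, now, index_batch):
    """Index everything new in one batch and return the new watermark."""
    new_items = select_new_items(items, last_run, now)
    if new_items:
        # One writer session per batch keeps lock contention low.
        index_batch(new_items)
    # Items created after `now` are picked up by the next run.
    return now
```

Bounding the selection at `now` means an item created while the batch
is running is neither skipped nor indexed twice.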

>> SearchServer and SearchClient were designed to support distributed  
>> search. However, I doubt you'll need them for 2,000 searches.  Are  
>> you anticipating an increase in volume, or are you concerned  
>> primarily about multi-machine interop?
>
> It's not 2,000 searches, it's 2,000 new items added to the index  
> per day. There is no search functionality yet, so god only knows  
> how many searches per day will be run.
>
> I'm really only interested in multi-machine interoperability at  
> this point. I'd love to extend the search feature to other tools we  
> run at some point, but they'd probably be using their own separate  
> indexes.  That said, is there anything I should know about using  
> SearchServer/Client to achieve this?

SearchServer and SearchClient are there to diffuse the cost of
searching a large corpus over several machines.  They know nothing
about how the indexes were created, but the MultiSearcher with which
you would aggregate several SearchClients assumes that each
sub-searcher is responsible for unique content.
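
To show why the unique-content assumption matters, here is a minimal
sketch in Python of merging hit lists from several sub-searchers.  It
is not MultiSearcher's actual implementation: merge_hits and the
(score, doc_id) tuple layout are hypothetical.

```python
import heapq


def merge_hits(per_searcher_hits, num_wanted):
    """Merge (score, doc_id) hits from all sub-searchers, best scores first."""
    all_hits = (hit for hits in per_searcher_hits for hit in hits)
    # No de-duplication: each sub-searcher is assumed to own unique content.
    return heapq.nlargest(num_wanted, all_hits, key=lambda hit: hit[0])
```

Because a merge like this never compares document identities, a
document indexed by two sub-searchers would show up twice in the
results.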

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




