[KinoSearch] ProximityQuery
Marvin Humphrey
marvin at rectangular.com
Mon Mar 15 21:49:07 PDT 2010
On Mon, Mar 15, 2010 at 10:57:28PM -0500, Peter Karman wrote:
> I'd like to offer a proximity query type in my app, so that I can search like:
>
> foo NEAR10 bar
>
> to find all instances of 'foo' within 10 token positions of 'bar'.[0]
>
> It seems like the place to start, if I were to take the route of
> subclassing/extending an existing class, is the PhraseQuery feature,
> specifically the PhraseScorer and the internal winnow_anchors() function. Am I
> on the right track here?
As you seem to have noted already, the hard part will be the Matcher class,
not the Query.

Within the existing KS code base, PhraseScorer would be the closest thing to
what you want. It wasn't really built to handle nearness, but maybe it can be
adapted.
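
To make the nearness check concrete, here is a minimal sketch of the position test a proximity matcher would need for one document: given the sorted token positions of each term, decide whether any pair lies within the requested distance. This is illustrative Python, not KinoSearch or PhraseScorer code. The function name within_proximity and its arguments are hypothetical, and it assumes that "within 10 token positions" is unordered (either term may come first).

```python
def within_proximity(positions_a, positions_b, max_distance):
    """Return True if some position in positions_a lies within
    max_distance token positions of some position in positions_b.

    Hypothetical helper for illustration only. Both lists are assumed
    to be sorted ascending, as term positions for one document would be.
    """
    i = j = 0
    while i < len(positions_a) and j < len(positions_b):
        gap = positions_a[i] - positions_b[j]
        if abs(gap) <= max_distance:
            return True
        # Advance whichever list is currently behind; moving the other
        # pointer could only widen the gap.
        if gap < 0:
            i += 1
        else:
            j += 1
    return False
```

Under these assumptions, a query like "foo NEAR10 bar" would call within_proximity(foo_positions, bar_positions, 10) for each candidate document. The two-pointer walk visits each position at most once, so the cost is linear in the combined length of the two position lists.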
If you want to see other prior art, Lucene has SpanNearQuery
and SpanScorer:
http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanNearQuery.java
http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanScorer.java
Also, Lucene's PhraseScorer takes a "slop" parameter, which KinoSearch's does
not. I forget exactly what it does and how it differs from
SpanNearQuery/SpanScorer.
http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/PhraseScorer.java
> [0] I believe Lucene syntax for that query is "foo bar"~10
Yes.

http://lucene.apache.org/java/3_0_1/queryparsersyntax.html#Proximity%20Searches

That '10' is the 'slop' parameter.

Do you have an idea yet as to how you might publish this?

Marvin Humphrey