[KinoSearch] Stemming and Term/TermQuery

Marvin Humphrey marvin at rectangular.com
Tue Aug 14 19:59:10 PDT 2007




On Aug 14, 2007, at 4:19 PM, Evaldas Imbrasas wrote:

>  The first series is done using a simple query call:
>     my $hits = $searcher->search(query => 'organic');
>
> The second series is done using TermQuery:
>     my $term = KinoSearch::Index::Term->new(title => 'organic');
>     my $by_title = KinoSearch::Search::TermQuery->new(term => $term);
>     my $hits = $searcher->search(query => $by_title);
>
> I expect both series to produce the same results, since there's only
> one field indexed per document.

They will not.  The string query passed straight to the Searcher gets
additional processing: crucially, it is run through an Analyzer, which
stems it.  A Term handed to TermQuery is looked up verbatim.  In the
first case you are searching for 'organ', which is in the index.  In
the second, you are searching for 'organic', which is not.
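
One way to make the TermQuery series agree with the string-query series
is to stem the term yourself before building the Term.  What follows is
only a minimal sketch.  It assumes the index was built with an English
stemming analyzer and that Lingua::Stem::Snowball is installed.  The
helper names index_form and stemmed_title_query are hypothetical, and a
real analyzer chain may do more than lowercase and stem.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;
use KinoSearch::Index::Term;
use KinoSearch::Search::TermQuery;

# Assumption: the index was analyzed with an English Snowball stemmer.
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );

# Hypothetical helper: lowercase and stem a word so that it matches
# the form stored in the index.
sub index_form {
    my $word = shift;
    return scalar $stemmer->stem( lc $word );
}

# Hypothetical helper: build a TermQuery on the 'title' field from
# the stemmed form rather than the raw input.
sub stemmed_title_query {
    my $word = shift;
    my $term = KinoSearch::Index::Term->new( title => index_form($word) );
    return KinoSearch::Search::TermQuery->new( term => $term );
}
```

With the $searcher from the quoted code, the second series would then
become something like
$searcher->search( query => stemmed_title_query('organic') ).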

> However, the output is different for
> some search terms:
>
> Test searches:
>
>   cotton:       10 results
>   bags:         29 results
>   organic:      18 results
>   bamboo:       7 results
>   clothes:      7 results
>
> Test term searches:
>
>   cotton:       10 results
>   bags:         0 results
>   organic:      0 results
>   bamboo:       7 results
>   clothes:      0 results

Try out each of these terms at
<http://snowball.tartarus.org/demo.php>.  The terms whose stemmed
output is identical to the input produce identical results in both
series.
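
The same check can be run locally instead of through the demo page.
This is a rough sketch, assuming Lingua::Stem::Snowball is installed and
that the index uses the English stemmer.  The helper name is_unchanged
is hypothetical.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

# Assumption: English is the language the index was stemmed with.
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );

# Hypothetical helper: true if stemming leaves the lowercased word
# as it was, i.e. a raw TermQuery on it can match the index.
sub is_unchanged {
    my $word = lc shift;
    my $stem = $stemmer->stem($word);
    return $stem eq $word;
}

for my $word (qw( cotton bags organic bamboo clothes )) {
    my $stem = $stemmer->stem($word);
    my $note = is_unchanged($word) ? 'unchanged' : 'changed';
    print "$word -> $stem ($note)\n";
}
```

Words reported as changed are the ones where a TermQuery on the raw
input should come back empty.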

PS to the list regarding my continuing absence...

I mentioned in a post a little while ago that for the contract job  
I've been working on, testing had begun and the project lead had  
left.  Testing has gone about as well as we might have expected.   
However, the absence of the project lead has made our troubleshooting  
significantly less efficient, and I have had to step it up to  
compensate as best I can.  There is a lot of money at stake for a lot  
of people, and I intend to continue pouring 100% of my efforts into  
this job until it is certain that we are free and clear.  I  
appreciate your continuing patience with this pause.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/






