[KinoSearch] Wildcards (was: Re: KinoSearch feature suggestions)

Father Chrysostomos sprout at cpan.org
Fri Jan 25 12:57:29 PST 2008




On Jan 25, 2008, at 11:34 AM, Marvin Humphrey wrote:

> I've just committed some revised docs for Lexicon.  Please let me  
> know if this is sufficiently clear:
>

>    KinoSearch::Index::Lexicon - Iterator for a field's Terms.
>
>    =head1 SYNOPSIS
[...]

Perfect. :-)

>
> So here's my latest thought:
>
>   my $seeked_lexicon        = $reader->lexicon( 'content', 'foo' );
>   my $unseeked_lexicon      = $reader->lexicon( 'content', undef );
>   my $seeked_posting_list   = $reader->posting_list( 'content', 'foo' );
>   my $unseeked_posting_list = $reader->posting_list( 'content', undef );
>
> Does that make sense?

Yes, it does.
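
To make the seeked/unseeked distinction concrete, here is a rough sketch in plain Perl. It does not use KinoSearch at all: `@content_terms` and `lexicon_from` are hypothetical stand-ins, and it assumes that "seeked" means positioned at the first term that sorts greater than or equal to the target, while undef means starting from the first term in the field.

```perl
use strict;
use warnings;

# Hypothetical model of one field's sorted term list; a real Lexicon
# would read its terms from the index.
my @content_terms = sort qw( bar baz foo food fool zebra );

# Assumed seek semantics: start at the first term that sorts greater
# than or equal to the target.  An undef target starts at the top.
sub lexicon_from {
    my ( $terms, $target ) = @_;
    my $pos = 0;
    if ( defined $target ) {
        $pos++ while $pos < @$terms && $terms->[$pos] lt $target;
    }
    # Return an iterator: each call yields the next term, or undef
    # once the list is exhausted.
    return sub { $pos < @$terms ? $terms->[ $pos++ ] : undef };
}

my $seeked   = lexicon_from( \@content_terms, 'foo' );
my $unseeked = lexicon_from( \@content_terms, undef );

my $first_seeked   = $seeked->();
my $first_unseeked = $unseeked->();

print "seeked starts at:   $first_seeked\n";
print "unseeked starts at: $first_unseeked\n";
```

Under those assumptions the seeked iterator begins at 'foo' and the unseeked one begins at 'bar', the first term in sort order.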

>  2. At some point, it would be nice to support non-text fields.

Since binary data can be stored in a string, it is already supported,  
is it not?


> Hmm.  Thinking over the second point, perhaps it would be best if  
> Lexicons only stored field values rather than terms.  In Lucene,  
> that wouldn't work because TermEnum objects handle multiple fields,  
> but in KS, the field is fixed.

Do you mean that the field contains the terms, which contain the field  
name? This does seem redundant.

> Making such a change wouldn't be trivial, but it's probably  
> worthwhile.

That would certainly make things simpler. Of course, it’s up to you.

> A RegexQuery class would be nice to have, but it would have some  
> significant limitations.  If it used the existing KS index data  
> structures, it would not behave like a typical SQL regex or LIKE  
> query, matching the regex against the non-tokenized contents for  
> each field.  If you did something like this...
>
>  my $regex_query = KSx::Search::RegexQuery->new(
>    field => 'content',
>    regex => qr/three blind/,
>  );
>
> ... and the 'content' field was tokenized, the regex wouldn't match  
> against any of the values in the Lexicon, since e.g. "blind" doesn't  
> match qr/three blind/.

I should have called it RegexpTermQuery. :-)  I still like the idea,  
though.
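
A small sketch of the limitation, again in plain Perl rather than the KinoSearch API. The `@lexicon` array and `matching_terms` are hypothetical, and whitespace splitting stands in for whatever tokenizer the field really uses. The regex is applied to each term separately, so a pattern that spans two tokens finds nothing, while a pattern that fits inside one term does match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the terms a Lexicon would hold for a
# tokenized 'content' field.
my @docs = ( 'three blind mice', 'see how they run' );
my %seen;
for my $doc (@docs) {
    $seen{$_} = 1 for split /\s+/, $doc;
}
my @lexicon = sort keys %seen;

# Return every term that matches the regex.  Each term is tested on
# its own; the original untokenized text is never consulted.
sub matching_terms {
    my ( $regex, @terms ) = @_;
    return grep { $_ =~ $regex } @terms;
}

my @phrase_hits = matching_terms( qr/three blind/, @lexicon );
my @term_hits   = matching_terms( qr/^bl/,         @lexicon );

print "qr/three blind/ matched ", scalar(@phrase_hits), " terms\n";
print "qr/^bl/ matched: @term_hits\n";
```

Here qr/three blind/ matches no terms and qr/^bl/ matches only 'blind', which is why a name like RegexpTermQuery describes the behaviour more accurately.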


_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



