[KinoSearch] Serialized Schema (was KinoSearch::FieldSpec::text)

Marvin Humphrey marvin at rectangular.com
Sat Sep 29 15:28:09 PDT 2007




On Sep 6, 2007, at 5:40 PM, Peter Karman wrote:

> On 9/6/07 5:26 PM, Marvin Humphrey wrote:
>
>> Peter, I know Swish works off of a configuration file.  What do  
>> you think of having Schema write out something analogous to the  
>> Swish config file during InvIndexer->finish?
>
> I think I am suffering a strange sense of deja vu all over again ;)
>
> http://www.rectangular.com/pipermail/kinosearch/2006-November/000560.html

The difference between then and now is that back then I didn't think  
it was going to be possible to serialize a Schema well enough that  
you'd not need the original class.  In fact, at the time I regarded  
that insight as a liberation: if you were stuck providing the  
Analyzer externally, you might as well put a whole slew of stuff into  
classes.

I was also ill-informed about the security of accepting regular
expressions from a potentially untrusted source: I didn't realize
that embedded code blocks such as /(?{$code})/ are disabled by
default in patterns interpolated at run time, and are only enabled
by "use re 'eval'".  It wasn't until after that discussion that I
became familiar with the 're' pragma.
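
A minimal sketch of the behavior in question, assuming the pattern arrives as a plain string (for example, read from a serialized schema file). The variable names are illustrative only and are not part of KinoSearch.

```perl
use strict;
use warnings;

# A pattern string as it might arrive from an untrusted source.
# The (?{ ... }) construct asks the regex engine to run Perl code.
my $untrusted = '(?{ print "code ran\n" })';

# Without "use re 'eval'" in scope, interpolating the string into a
# pattern is a fatal run-time error, so the embedded code never runs.
my $compiled = eval { my $re = qr/$untrusted/; 1 };
my $error    = $compiled ? '' : $@;

print $compiled ? "compiled (unexpected)\n" : "rejected: $error";
```

The same pattern written literally in source code would be accepted, since the restriction applies only to patterns built from interpolated variables.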

> Seriously though, I think it sounds like a fine idea. Swish has 3  
> native field types: text, int and date (which is really just an int  
> that gets output as a timestamp string). All the info about those  
> fields is stored in the Swish-e index header. So doing something  
> similar in KS, with more robust field types, makes perfect sense to  
> me, especially when you talk about the index format in the context  
> of Lucy (which is what I assume you are alluding to when you wrote  
> about accessing the index using other languages).

I'm not necessarily talking about Lucy.  What I'd like to do is write  
a formal spec for the "invindex" file format, opening things up for  
other apps.  Like the Lucene file format spec, except usable.

Since good programming is all about designing good data structures,  
formally defining the spec would be a useful exercise.  It might be  
nice to issue some RFCs on the Lucene list, PerlMonks, Swish list  
(?), etc.
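
As a rough illustration of what a serialized Schema might look like, here is a sketch that writes field definitions to JSON and reads them back without any custom class. JSON as the format, the field names, and the layout of the hash are all assumptions for the example. Nothing here reflects an actual KinoSearch or invindex format. The three types are the ones Peter lists for Swish.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical schema: field names and structure are illustrative only.
my %schema = (
    fields => {
        title   => { type => 'text' },
        pages   => { type => 'int' },
        updated => { type => 'date' },
    },
);

# canonical() sorts hash keys so the output is stable across runs.
my $json       = JSON::PP->new->canonical->pretty;
my $serialized = $json->encode( \%schema );
print $serialized;

# Reading the schema back requires only a JSON parser, not the
# original Perl class, so other languages could consume it too.
my $restored = $json->decode($serialized);
print join( ', ', sort keys %{ $restored->{fields} } ), "\n";
```

A language-neutral format like this is one way to open the index to other apps, though a formal spec would also have to pin down analyzers and other per-field settings.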

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


