[KinoSearch] Serialized Schema (was KinoSearch::FieldSpec::text)
Marvin Humphrey
marvin at rectangular.com
Sat Sep 29 15:28:09 PDT 2007
On Sep 6, 2007, at 5:40 PM, Peter Karman wrote:
> On 9/6/07 5:26 PM, Marvin Humphrey wrote:
>
>> Peter, I know Swish works off of a configuration file. What do
>> you think of having Schema write out something analogous to the
>> Swish config file during InvIndexer->finish?
>
> I think I am suffering a strange sense of deja vu all over again ;)
>
> http://www.rectangular.com/pipermail/kinosearch/2006-November/000560.html
The difference between then and now is that back then I didn't think
it was going to be possible to serialize a Schema well enough that
you'd not need the original class. In fact, at the time I regarded
that insight as a liberation: if you were stuck providing the
Analyzer externally, you might as well put a whole slew of stuff into
classes.

I was also ill-informed about the security of supplying regular
expressions via a potentially untrusted source: I didn't realize
that /(?{$code})/ was disabled by default. It wasn't until after
that discussion that I became familiar with the 're' pragma.
> Seriously though, I think it sounds like a fine idea. Swish has 3
> native field types: text, int and date (which is really just an int
> that gets output as a timestamp string). All the info about those
> fields is stored in the Swish-e index header. So doing something
> similar in KS, with more robust field types, makes perfect sense to
> me, especially when you talk about the index format in the context
> of Lucy (which is what I assume you were alluding to when you wrote
> about accessing the index using other languages).
I'm not necessarily talking about Lucy. What I'd like to do is write
a formal spec for the "invindex" file format, opening things up for
other apps. Like the Lucene file format spec, except usable.

Since good programming is all about designing good data structures,
formally defining the spec would be a useful exercise. It might be
nice to issue some RFCs on the Lucene list, PerlMonks, Swish list
(?), etc.
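
To make the idea of a serialized Schema a little more concrete, here is
a rough sketch of a schema written out as plain data and read back
without the class that produced it. Everything in it is an assumption
made for illustration: the JSON encoding, the key names ("format",
"fields", "analyzer", "stored") and the helper functions are
hypothetical, and only the three type names are borrowed from the Swish
field types mentioned above. It is not the invindex format.

```python
import json

# Field types borrowed from the Swish discussion; a real spec might define more.
KNOWN_TYPES = {"text", "int", "date"}

# Hypothetical schema: every key name below is invented for this sketch.
schema = {
    "format": 1,
    "fields": {
        "title": {"type": "text", "analyzer": "en", "stored": True},
        "pages": {"type": "int", "stored": True},
        "published": {"type": "date", "stored": False},
    },
}


def dump_schema(schema: dict) -> str:
    """Serialize the schema to deterministic, human-readable JSON text."""
    return json.dumps(schema, indent=2, sort_keys=True)


def load_schema(text: str) -> dict:
    """Rebuild the schema from text alone, rejecting unknown field types."""
    loaded = json.loads(text)
    for name, spec in loaded["fields"].items():
        if spec["type"] not in KNOWN_TYPES:
            raise ValueError(f"field {name!r} has unknown type {spec['type']!r}")
    return loaded


# Round trip: the reader needs only the serialized text, not the original class.
serialized = dump_schema(schema)
restored = load_schema(serialized)
```

Because the reader validates against a fixed list of type names and
never executes anything from the file, a format along these lines could
in principle be consumed from any language, which is the point of
writing a spec in the first place.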
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/