[KinoSearch] Serialized Schema (was KinoSearch::FieldSpec::text)
Nathan Kurz
nate at verse.com
Fri Sep 7 13:24:41 PDT 2007
On 9/7/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> My main goal
> with serializing Schema is to make the invindex file format self-
> describing, so that it becomes possible to read one without the need
> for any auxiliary information.
Thanks for the explanation. I understand better now.
I think I agree with all of that, with the small exception that I
don't think you gain much by specifying the tokenizer procedurally. I
think specifying it by name, as
"tokenizer: whitespace", and letting the reader handle the
implementation is wiser than spelling it out as the regex "\S+".
If you are trying to be language-agnostic, requiring the reader to be
able to handle what could be arbitrary expressions in a particular
regexp language seems onerous, even if it is a pretty standard one.
In particular, I can see wanting a straight C implementation using
flex rather than a regexp library.
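As a rough illustration of the by-name approach: a reader maps the tokenizer name to a native implementation and never needs a regex engine. This is a sketch only. The "tokenizer: <name>" line format, the TOKENIZERS registry, and the function names are hypothetical, not KinoSearch's actual file format or API. Python is used for brevity; in C the same dispatch would be a lookup over a table of names and function pointers.

```python
import re

# Hypothetical tokenizer implementation: the reader picks its own
# native code for each name that may appear in a serialized schema.
def whitespace_tokenizer(text):
    # str.split() with no argument splits on runs of whitespace and
    # drops empty strings, so no regex engine is involved.
    return text.split()

# Hypothetical registry mapping schema tokenizer names to implementations.
TOKENIZERS = {"whitespace": whitespace_tokenizer}

def tokenizer_from_schema_line(line):
    # Parse a hypothetical "tokenizer: <name>" line and look up the name.
    key, _, value = line.partition(":")
    if key.strip() != "tokenizer":
        raise ValueError(f"not a tokenizer line: {line!r}")
    name = value.strip()
    try:
        return TOKENIZERS[name]
    except KeyError:
        raise ValueError(f"unknown tokenizer: {name!r}") from None

if __name__ == "__main__":
    tokenize = tokenizer_from_schema_line("tokenizer: whitespace")
    text = "  serialized   schema\tis self-describing\n"
    # Named approach: native implementation chosen by the reader.
    print(tokenize(text))
    # Procedural approach: the reader must evaluate the pattern itself.
    print(re.findall(r"\S+", text))
```

On plain ASCII input like this, both calls print the same token list. The difference is that the named version leaves each reader free to implement "whitespace" however it likes, for example with flex in C.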
I don't feel strongly about this, though, since anyone who really
wants to do this could just do it non-portably.