[KinoSearch] Serialized Schema
Marvin Humphrey
marvin at rectangular.com
Sat Nov 3 14:06:39 PDT 2007
On Oct 5, 2007, at 6:17 AM, Peter Karman wrote:
> I'm not convinced you need XML; it's probably a little harder to read
> than YAML, but XML does have wider adoption at this point in history.
You're right that we don't need XML. The framework the XSD provides
is nice, but it's more than we require.
Furthermore, while I'm confident that I could write a basic round-
trip parser for handling KinoSearch-specific XML, I'm not confident
that I could write a water-tight spec and a parser that's guaranteed
to handle all possible corner cases generated by conforming
applications.
Using YAML presents slightly different problems. The full YAML spec
is sadly bloated; declaring that the InvIndex file format uses "YAML"
means that everyone who fully implements it needs a full-blown YAML
parser. To avoid that, we might want to limit allowable constructs,
but since there's no YAML equivalent of XSD, we have to add our own
ad hoc restrictions. That might be doable, but seems hackish and
fiddly.
Time to consider a third alternative: "All InvIndex metadata files
use UTF-8 encoded JSON."
The JSON spec is tiny compared to XML and YAML, but it's sufficient.
It has an official RFC (<http://www.ietf.org/rfc/rfc4627.txt>), and
we probably don't need to impose any additional constraints beyond
specifying the UTF-8 encoding and referring to the RFC -- though an
ASCII-only limitation might be worth considering.
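
To make the UTF-8 vs. ASCII-only question concrete, here is a rough
sketch in Python (chosen only for brevity; KinoSearch itself would do
this from Perl). The metadata keys and the helper names
encode_metadata/decode_metadata are made up for illustration and are
not part of any InvIndex spec.

```python
import json

# Hypothetical metadata; the field names are illustrative only.
metadata = {
    "format": 1,
    "fields": {"title": "text", "body": "text"},
    "analyzer_language": "français",
}

def encode_metadata(data, ascii_only=False):
    """Serialize to JSON text, then encode that text as UTF-8 bytes.

    With ascii_only=True, non-ASCII characters are written as \\uXXXX
    escapes, so the resulting file contains only ASCII bytes.
    """
    text = json.dumps(data, ensure_ascii=ascii_only, sort_keys=True, indent=2)
    return text.encode("utf-8")

def decode_metadata(raw):
    """Decode UTF-8 bytes and parse the JSON text."""
    return json.loads(raw.decode("utf-8"))

utf8_bytes = encode_metadata(metadata)
ascii_bytes = encode_metadata(metadata, ascii_only=True)
```

Both byte strings decode to the same data, so an ASCII-only rule would
restrict how files are written without changing what they can express.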
Leaving everything to the JSON spec itself would impose the
requirement for a full-blown JSON parser on all fully conforming
apps. However, that's less onerous than a YAML or XML parser, and it
would still be possible to write a miniature subset parser a la devel
KinoSearch's current YAML parser.
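
As a rough illustration of what a miniature subset parser could look
like, here is a sketch in Python. The choice of subset (a top-level
object containing nested objects, strings with a handful of escapes,
and integers) is my assumption for the example, not a proposal for
the spec, and every name in it is hypothetical.

```python
class SubsetJSONError(ValueError):
    """Raised for anything outside the supported subset."""

_WS = " \t\r\n"
_DIGITS = "0123456789"
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "n": "\n", "t": "\t"}

def parse_subset(text):
    """Parse a JSON object made only of objects, strings and integers."""
    pos = _skip_ws(text, 0)
    if pos >= len(text) or text[pos] != "{":
        raise SubsetJSONError("top level must be an object")
    value, pos = _parse_object(text, pos + 1)
    if _skip_ws(text, pos) != len(text):
        raise SubsetJSONError("trailing data at offset %d" % pos)
    return value

def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in _WS:
        pos += 1
    return pos

def _parse_value(text, pos):
    if pos >= len(text):
        raise SubsetJSONError("unexpected end of input")
    ch = text[pos]
    if ch == "{":
        return _parse_object(text, pos + 1)
    if ch == '"':
        return _parse_string(text, pos + 1)
    if ch == "-" or ch in _DIGITS:
        return _parse_int(text, pos)
    # Arrays, floats, true/false/null are deliberately left out.
    raise SubsetJSONError("unsupported construct at offset %d" % pos)

def _parse_object(text, pos):
    result = {}
    pos = _skip_ws(text, pos)
    if pos < len(text) and text[pos] == "}":
        return result, pos + 1
    while True:
        pos = _skip_ws(text, pos)
        if pos >= len(text) or text[pos] != '"':
            raise SubsetJSONError("expected string key at offset %d" % pos)
        key, pos = _parse_string(text, pos + 1)
        pos = _skip_ws(text, pos)
        if pos >= len(text) or text[pos] != ":":
            raise SubsetJSONError("expected ':' at offset %d" % pos)
        value, pos = _parse_value(text, _skip_ws(text, pos + 1))
        result[key] = value
        pos = _skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        if pos < len(text) and text[pos] == "}":
            return result, pos + 1
        raise SubsetJSONError("expected ',' or '}' at offset %d" % pos)

def _parse_string(text, pos):
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(chars), pos + 1
        if ch < " ":
            raise SubsetJSONError("raw control character at offset %d" % pos)
        if ch == "\\":
            pos += 1
            if pos >= len(text) or text[pos] not in _ESCAPES:
                raise SubsetJSONError("unsupported escape at offset %d" % pos)
            ch = _ESCAPES[text[pos]]
        chars.append(ch)
        pos += 1
    raise SubsetJSONError("unterminated string")

def _parse_int(text, pos):
    start = pos
    if text[pos] == "-":
        pos += 1
    digits_start = pos
    while pos < len(text) and text[pos] in _DIGITS:
        pos += 1
    digits = text[digits_start:pos]
    if not digits or (len(digits) > 1 and digits[0] == "0"):
        raise SubsetJSONError("bad integer at offset %d" % start)
    return int(text[start:pos]), pos
```

Anything outside the subset raises SubsetJSONError rather than being
guessed at, so files written by a full JSON library either parse to
the same data or fail loudly.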
> Guess it in part depends on (1) how hard it is to write
> your own parser for either,
There are several JSON parsers on CPAN. One of them seems to stand
out: JSON::XS. <http://search.cpan.org/perldoc?JSON%3A%3AXS> The
author, Marc Lehmann, is gruff, but knowledgeable. Our two main
concerns are that the JSON be correct and that the distro build
reliably. Glancing over the documentation, the test results, the
Changes file, and some of the code, it looks to be suitable for
adding as a dependency.
> and (2) if you have any philosophical agenda to
> promote.
My goal is to write an inverted index file format spec that is easy to
implement and easy to extend. Whether metadata gets encoded as YAML,
XML, or JSON is incidental.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/