[KinoSearch] Serialized Schema

Marvin Humphrey marvin at rectangular.com
Sat Nov 3 14:06:39 PDT 2007

On Oct 5, 2007, at 6:17 AM, Peter Karman wrote:

> I'm not convinced you need XML; it's
> probably a little harder to read than YAML, but XML does have wider
> adoption at this point in history.

You're right that we don't need XML.  The framework the XSD provides  
is nice, but it's more than we require.

Furthermore, while I'm confident that I could write a basic round-trip
parser for handling KinoSearch-specific XML, I'm not confident
that I could write a water-tight spec and a parser that's guaranteed
to handle all possible corner cases generated by conforming
applications.

Using YAML presents slightly different problems.  The full YAML spec  
is sadly bloated; declaring that the InvIndex file format uses "YAML"  
means that everyone who fully implements it needs a full-blown YAML
parser.  To avoid that, we might want to limit allowable constructs,  
but since there's no YAML equivalent of XSD, we have to add our own  
ad hoc restrictions.  That might be doable, but seems hackish and  
fiddly.

Time to consider a third alternative: "All InvIndex metadata files  
use UTF-8 encoded JSON."
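
To make the proposed rule concrete, here is a minimal sketch in Python (chosen only for illustration, not the project's own language) of writing and reading one metadata file under it. The function names write_metadata/read_metadata, the file name schema.json, and the schema keys are all hypothetical; the point is only that the bytes on disk are UTF-8 and that malformed UTF-8 is rejected rather than repaired.

```python
import json
import os
import tempfile


def write_metadata(path, metadata):
    """Serialize a metadata dict as UTF-8 encoded JSON text."""
    # ensure_ascii=False keeps non-ASCII characters literal; encoding the
    # resulting str explicitly pins the on-disk bytes to UTF-8.
    text = json.dumps(metadata, ensure_ascii=False, sort_keys=True, indent=2)
    with open(path, "wb") as fh:
        fh.write(text.encode("utf-8"))


def read_metadata(path):
    """Load a metadata file, rejecting bytes that are not valid UTF-8."""
    with open(path, "rb") as fh:
        raw = fh.read()
    # Strict decoding raises UnicodeDecodeError on malformed sequences
    # rather than silently substituting replacement characters.
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical schema contents; the keys are illustrative only.
    schema = {"fields": {"title": {"indexed": True}, "café": {"stored": False}}}
    path = os.path.join(tempfile.mkdtemp(), "schema.json")
    write_metadata(path, schema)
    assert read_metadata(path) == schema
```

Decoding explicitly before parsing keeps the encoding check separate from the JSON check, so a reader can report which rule a bad file broke.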

The JSON spec is tiny compared to XML and YAML, but it's sufficient.   
It has an official RFC (<http://www.ietf.org/rfc/rfc4627.txt>), and  
we probably don't need to impose any additional constraints beyond  
specifying the UTF-8 encoding and referring to the RFC -- though an  
ASCII-only limitation might be worth considering.
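
An ASCII-only limitation would not rule out non-ASCII content, because JSON strings can carry any character as a backslash-u escape. A rough Python sketch of what such a writer and check might look like follows; dump_ascii_only and is_ascii_only are hypothetical helpers, while json.dumps and its ensure_ascii flag are standard library.

```python
import json


def dump_ascii_only(metadata):
    """Serialize metadata so the output contains only 7-bit ASCII."""
    # ensure_ascii=True (the default) writes every non-ASCII character as a
    # \uXXXX escape, which RFC 4627 permits; characters outside the Basic
    # Multilingual Plane become UTF-16 surrogate pairs.
    text = json.dumps(metadata, ensure_ascii=True, sort_keys=True)
    text.encode("ascii")  # raises UnicodeEncodeError if any non-ASCII slipped through
    return text


def is_ascii_only(raw_bytes):
    """True if every byte is below 0x80 (and therefore also valid UTF-8)."""
    return all(b < 0x80 for b in raw_bytes)


if __name__ == "__main__":
    print(dump_ascii_only({"name": "café"}))
    # prints {"name": "caf\u00e9"}
```

An ASCII-only file is automatically valid UTF-8, so the restriction would be compatible with the UTF-8 rule; the cost is that non-English field names become unreadable escapes in a text editor.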

Leaving everything to the JSON spec itself would impose the  
requirement for a full-blown JSON parser on all fully conforming  
apps.  However, that's less onerous than requiring a YAML or XML
parser, and it would still be possible to write a miniature subset
parser à la the YAML parser in the current devel KinoSearch.
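
To give a sense of how small such a subset parser can be, here is a hypothetical recursive-descent sketch in Python. The chosen subset (objects, arrays, strings with simple escapes, integers, true/false/null) is an assumption made for illustration, not a decision from this thread; anything outside it, such as floats, exponents, or backslash-u escapes, raises an error instead of being guessed at.

```python
class SubsetJSONError(ValueError):
    """Raised when input falls outside the supported JSON subset."""


_WS = " \t\r\n"
# Backslash escapes accepted inside strings; \uXXXX is deliberately excluded.
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
            "n": "\n", "r": "\r", "t": "\t"}
_LITERALS = {"true": True, "false": False, "null": None}


def _skip(text, pos):
    # Skip insignificant whitespace; fail if the input ends here.
    while pos < len(text) and text[pos] in _WS:
        pos += 1
    if pos >= len(text):
        raise SubsetJSONError("unexpected end of input")
    return pos


def _string(text, pos):
    # pos points just past the opening quote.
    out = []
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch == '"':
            return "".join(out), pos
        if ch == "\\":
            esc = text[pos:pos + 1]
            if esc not in _ESCAPES:
                raise SubsetJSONError(f"unsupported escape at offset {pos}")
            out.append(_ESCAPES[esc])
            pos += 1
        elif ch < " ":
            raise SubsetJSONError(f"raw control character at offset {pos - 1}")
        else:
            out.append(ch)
    raise SubsetJSONError("unterminated string")


def _integer(text, pos):
    start = pos
    if text.startswith("-", pos):
        pos += 1
    digits_start = pos
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    digits = text[digits_start:pos]
    if not digits or (len(digits) > 1 and digits[0] == "0"):
        raise SubsetJSONError(f"unsupported value at offset {start}")
    return int(text[start:pos]), pos


def _value(text, pos):
    # Dispatch on the first non-whitespace character.
    pos = _skip(text, pos)
    ch = text[pos]
    if ch == '"':
        return _string(text, pos + 1)
    if ch in "{[":
        close = "}" if ch == "{" else "]"
        result = {} if ch == "{" else []
        pos = _skip(text, pos + 1)
        if text[pos] == close:
            return result, pos + 1
        while True:
            if ch == "{":
                pos = _skip(text, pos)
                if text[pos] != '"':
                    raise SubsetJSONError(f"expected string key at offset {pos}")
                key, pos = _string(text, pos + 1)
                pos = _skip(text, pos)
                if text[pos] != ":":
                    raise SubsetJSONError(f"expected ':' at offset {pos}")
                value, pos = _value(text, pos + 1)
                result[key] = value
            else:
                item, pos = _value(text, pos)
                result.append(item)
            pos = _skip(text, pos)
            if text[pos] == close:
                return result, pos + 1
            if text[pos] != ",":
                raise SubsetJSONError(f"expected ',' or {close!r} at offset {pos}")
            pos += 1
    for literal, literal_value in _LITERALS.items():
        if text.startswith(literal, pos):
            return literal_value, pos + len(literal)
    return _integer(text, pos)


def parse_subset(text):
    """Parse a complete document; trailing non-whitespace is an error."""
    value, pos = _value(text, 0)
    if text[pos:].strip(_WS):
        raise SubsetJSONError(f"trailing data after offset {pos}")
    return value
```

Because every document this accepts is also valid JSON, files written by a full parser stay readable by it as long as writers stick to the subset, which is the same bargain the devel YAML parser makes.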

> Guess it in part depends on (1) how hard it is to write
> your own parser for either,

There are several JSON parsers on CPAN.  One of them seems to stand  
out: JSON::XS.  <http://search.cpan.org/perldoc?JSON%3A%3AXS>  The  
author, Marc Lehmann, is gruff, but knowledgeable.  Our two main  
concerns are that the JSON be correct and that the distro build  
reliably.  Glancing over the documentation, the test results, the  
Changes file, and some of the code, it looks to be suitable for  
adding as a dependency.

> and (2) if you have any philosophical agenda to
> promote.

My goal is to write an inverted index file format spec that is easy to
implement and easy to extend.  Whether metadata gets encoded as YAML,  
XML, or JSON is incidental.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
