[KinoSearch] FieldSpec/InvIndexSpec API
Marvin Humphrey
marvin at rectangular.com
Mon Nov 20 16:17:23 PST 2006
So, checking out what stuff would look like in YAML as opposed to XML...
Here's the .cfsmeta compound file description...
---
seg_name: _1
sub_files:
  -
    name: _1.tii
    offset: 0
    length: 440
  -
    name: _1.tis
    offset: 440
    length: 2709
Looks good. Parsing the name-value pairs is cake. I have to wrap my
head around how to keep track of the indentation level and where each
data structure begins and ends, though.
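
One way to track the indentation is to record the indent of every
non-blank line up front and let a block run for as long as lines keep
that same indent. What follows is only a sketch in Python, not
KinoSearch code: parse_yaml_subset and its helpers are hypothetical
names, and it assumes space-only indentation, integers or plain strings
as values, and a "-" item that either carries a scalar on the same line
or sits alone above a nested block.

```python
import re

# "key: value", or a bare "key:" that opens a nested block.
KEY_VALUE = re.compile(r"(.+?):(?:\s+(.*))?$")

CFSMETA = """\
---
seg_name: _1
sub_files:
  -
    name: _1.tii
    offset: 0
    length: 440
  -
    name: _1.tis
    offset: 440
    length: 2709
"""


def parse_scalar(text):
    """Integers become ints, single-quoted strings lose their quotes."""
    text = text.strip()
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1].replace("''", "'")
    if re.fullmatch(r"-?[0-9]+", text):
        return int(text)
    return text


def is_item(content):
    return content == "-" or content.startswith("- ")


def parse_block(lines, pos, indent):
    """Dispatch on the first line: a '-' starts an array, else a hash."""
    if is_item(lines[pos][1]):
        return parse_sequence(lines, pos, indent)
    return parse_mapping(lines, pos, indent)


def opens_nested(lines, pos, indent, allow_same_indent_items):
    """True if the line at pos begins a block nested under the previous line."""
    if pos >= len(lines):
        return False
    if lines[pos][0] > indent:
        return True
    # YAML also allows "key:" followed by "-" items at the same indent.
    return (allow_same_indent_items and lines[pos][0] == indent
            and is_item(lines[pos][1]))


def parse_mapping(lines, pos, indent):
    result = {}
    # The hash ends at the first line whose indent differs from its own.
    while pos < len(lines) and lines[pos][0] == indent:
        match = KEY_VALUE.match(lines[pos][1])
        if match is None:
            raise ValueError("expected 'key: value', got %r" % lines[pos][1])
        key = parse_scalar(match.group(1))
        pos += 1
        if match.group(2):
            result[key] = parse_scalar(match.group(2))
        elif opens_nested(lines, pos, indent, True):
            result[key], pos = parse_block(lines, pos, lines[pos][0])
        else:
            result[key] = None
    return result, pos


def parse_sequence(lines, pos, indent):
    result = []
    while (pos < len(lines) and lines[pos][0] == indent
           and is_item(lines[pos][1])):
        rest = lines[pos][1][1:].strip()
        pos += 1
        if rest:
            result.append(parse_scalar(rest))
        elif opens_nested(lines, pos, indent, False):
            value, pos = parse_block(lines, pos, lines[pos][0])
            result.append(value)
        else:
            result.append(None)
    return result, pos


def parse_yaml_subset(text):
    lines = []
    for raw in text.splitlines():
        content = raw.strip()
        if content and content != "---":
            lines.append((len(raw) - len(raw.lstrip(" ")), content))
    if not lines:
        return None
    value, pos = parse_block(lines, 0, lines[0][0])
    if pos != len(lines):
        raise ValueError("unexpected indentation at %r" % lines[pos][1])
    return value
```

Calling parse_yaml_subset(CFSMETA) yields a hash whose sub_files entry
is an array of two hashes. Each block knows where it ends purely by
comparing indents, so no explicit end markers are needed.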
Since there's no requirement that everything be housed within a root
node -- unlike XML -- I haven't included one.
The delqueue file...
---
files:
  - _10.cfs
  - _10.cfsmeta
  - _11.cfs
  - _11.cfsmeta
The lock file...
---
invindex: /path/to/invindex
The per-segment .delmeta file...
---
seg_name: _2
num_deletions: 5
byte_size: 1291
Those are all straightforward. Let's consider the big one,
invindex.meta...
---
analyzer: 'CustomAnalyzer'
fields:
  'title':
    number: 0
    spec:
      name: 'KinoSearch::Index::DefaultFieldSpec'
      arguments:
        boost: 1
        indexed: 1
        analyzed: 1
        stored: 1
        compressed: 0
        vectorized: 1
  'body':
    number: 1
    spec:
      name: 'KinoSearch::Index::DefaultFieldSpec'
      arguments:
        boost: 2
        indexed: 1
        analyzed: 1
        stored: 1
        compressed: 0
        vectorized: 1
  'url':
    number: 2
    spec:
      name: 'KinoSearch::Index::DefaultFieldSpec'
      arguments:
        boost: 1
        indexed: 1
        analyzed: 0
        stored: 1
        compressed: 0
        vectorized: 0
  'date':
    number: 3
    spec:
      name: 'CustomFieldSpec'
I like it. While I was writing that, I was thinking ahead about just
how the FieldSpec API should work -- I wasn't distracted by the
challenge of visually parsing the data, as I would have been with
XML. It has the clarity a straight-up name-value pair config file
would have, but it maps onto a multi-level data structure of hashes
and arrays, and it's upwards-compatible with a reasonably popular spec.
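
To illustrate the mapping onto hashes and arrays, here is a rough
sketch of the writing side in Python. It is not KinoSearch code:
dump_lines and dump_document are hypothetical names, two-space
indentation is an arbitrary choice, and scalars are written out as-is
with no quoting, which assumes they contain nothing that needs it.

```python
def dump_lines(value, indent=0):
    """Return the lines for a nested structure of dicts and lists."""
    pad = " " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                # A nested hash or array goes on the following lines,
                # one indentation level deeper.
                lines.append("%s%s:" % (pad, key))
                lines.extend(dump_lines(item, indent + 2))
            else:
                lines.append("%s%s: %s" % (pad, key, item))
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                # A lone "-" marks an element whose body follows.
                lines.append("%s-" % pad)
                lines.extend(dump_lines(item, indent + 2))
            else:
                lines.append("%s- %s" % (pad, item))
    else:
        raise TypeError("top level must be a dict or a list")
    return lines


def dump_document(value):
    """Prefix the '---' document marker and join the lines."""
    return "\n".join(["---"] + dump_lines(value)) + "\n"
```

Passing the .delmeta data as a dict with seg_name, num_deletions and
byte_size gives back the three-line file shown earlier, preceded by
the "---" marker.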
I don't see any need for multi-line strings, or even double quoting
with C-style escapes, let alone any of YAML's more esoteric
extensions (like references). It's more complicated to write the
parser than it would have been for XML, but it's still doable.
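
With those features left out, reading a scalar comes down to a few
cases. The sketch below is in Python and is only an illustration of
that restriction: read_scalar is a hypothetical name, and treating
double quotes, block scalars and anchors/aliases as hard errors is an
assumption about how strict the parser ought to be.

```python
import re

INTEGER = re.compile(r"-?[0-9]+")


def read_scalar(text):
    """Read one scalar from the restricted subset."""
    text = text.strip()
    if text == "":
        raise ValueError("empty scalar")
    first = text[0]
    if first == '"':
        raise ValueError("double-quoted strings are outside the subset")
    if first in "|>":
        raise ValueError("multi-line block scalars are outside the subset")
    if first in "&*":
        raise ValueError("anchors and aliases are outside the subset")
    if first == "'":
        if len(text) < 2 or text[-1] != "'":
            raise ValueError("unterminated single-quoted string")
        # YAML's only escape inside single quotes is a doubled quote.
        return text[1:-1].replace("''", "'")
    if INTEGER.fullmatch(text):
        return int(text)
    # Anything else is taken as a plain, unquoted string.
    return text
```

Rejecting the unsupported forms outright, rather than misreading them
as plain strings, keeps files written by a full YAML emitter from being
silently mangled.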
Thoughts?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/