KinoSearch::Docs::DevGuide - Hacking/debugging KinoSearch.
Developer-only documentation. If you just want to build a search engine, you probably don't need to read this.
Most of the Perl classes in KinoSearch rely on KinoSearch::Util::Class and KinoSearch::Util::ToolSet.
At the C level, inheritance is implemented using the devel/boilerplater.pl
utility. The base class is KinoSearch::Util::Obj. If what's going on is not
immediately apparent to you after spelunking a few files in the c_src
directory, see boilerplater's documentation.
There are three access levels in KinoSearch: private, distro-level, and public.
All Perl member variables are treated as private. Multiple classes defined within a single source-code file may access each other's member variables directly. Everybody else has to use accessor methods.
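A minimal sketch of this convention, assuming plain blessed-hash objects (KinoSearch's own classes may store their members differently). The package names MyDemo::Posting and MyDemo::PostingList are hypothetical. Because both live in the same file, total_freq reads each posting's freq member directly, while code in any other file would be expected to call get_freq instead.

```perl
use strict;
use warnings;

# Hypothetical class whose member variables are treated as private.
package MyDemo::Posting;

sub new {
    my ( $class, %args ) = @_;
    return bless { doc_num => $args{doc_num}, freq => $args{freq} }, $class;
}

# Accessors: the sanctioned route for code outside this file.
sub get_doc_num { return $_[0]->{doc_num} }
sub get_freq    { return $_[0]->{freq} }

# Second hypothetical class defined in the same source file.
package MyDemo::PostingList;

sub new {
    my ( $class, @postings ) = @_;
    return bless { postings => \@postings }, $class;
}

# Same file as MyDemo::Posting, so direct member access is permitted.
sub total_freq {
    my $self  = shift;
    my $total = 0;
    $total += $_->{freq} for @{ $self->{postings} };
    return $total;
}

package main;

my $list = MyDemo::PostingList->new(
    MyDemo::Posting->new( doc_num => 3, freq => 2 ),
    MyDemo::Posting->new( doc_num => 7, freq => 5 ),
);
print $list->total_freq, "\n";    # 7
```

Nothing in Perl enforces the rule; it is a voluntary discipline, which is why the C-level scheme described next can afford to be looser.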
All C-struct member variables allow distro-level access: any code within the KinoSearch distribution may read and write them directly. C variables can have a more permissive scheme because C structs don't suffer from autovivification of misspelled names; a misspelled struct member is a compile-time error, whereas a misspelled Perl hash key is silently accepted. This does tend to encourage tight coupling between classes, which is unfortunate but manageable so long as the bad designs are purely internal. In the future, it may make sense to make C variables private by default, but introduce voluntary conventions for identifying protected and distro-level members.
Hash-style argument lists are verified to ensure that no parameter label has been misspelled. Stronger validation is performed ad hoc.
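The label check can be sketched as follows. This is an illustration only, assuming each constructor keeps a hash of default values whose keys are the only legal labels; the function name check_labels and the table %new_defaults are hypothetical and are not KinoSearch's actual API.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Hypothetical table of legal labels (and their defaults) for one method.
my %new_defaults = (
    analyzer => undef,
    boost    => 1,
);

# Hypothetical helper: die on any label absent from %$defaults, then
# return the defaults merged with the caller's arguments.
sub check_labels {
    my ( $defaults, %args ) = @_;
    for my $label ( keys %args ) {
        croak("Invalid parameter: '$label'")
            unless exists $defaults->{$label};
    }
    return ( %$defaults, %args );
}

my %ok = check_labels( \%new_defaults, boost => 2 );
print "$ok{boost}\n";    # 2

# A misspelled label ("boots") is caught instead of silently ignored.
eval { check_labels( \%new_defaults, boots => 2 ) };
print "caught: $@" if $@;
```

Checking labels against the defaults table catches typos cheaply; checks on the values themselves (types, ranges) are the "stronger validation" that remains ad hoc.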
KinoSearch's public API is defined by what you get when you run the suite through a well-behaved pod-to-whatever converter. Developer-only documentation is limited to comments and "invisible" =for/=begin POD blocks.
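To see why =for and =begin blocks stay invisible, the sketch below renders a small POD document with Pod::Text, the core pod-to-text converter. The class MyDemo::Searcher and the format name "devdocs" are made up for this example; any target name that the converter does not recognize should be skipped the same way.

```perl
use strict;
use warnings;
use Pod::Text;

# POD for a hypothetical class: one public section, plus two
# developer-only notes in =for and =begin/=end blocks.
my $pod = join "\n",
    '=head1 NAME', '',
    'MyDemo::Searcher - Execute searches.', '',
    '=for comment Developer note: hit collection happens in C.', '',
    '=begin devdocs', '',
    'Developer note: not thread-safe.', '',
    '=end devdocs', '',
    '=cut', '';

# Render to plain text, as a pod-to-whatever converter would.
my $text;
my $parser = Pod::Text->new;
$parser->output_string( \$text );
$parser->parse_string_document($pod);
print $text;
```

Only the NAME section should survive in the rendered output; both developer notes are dropped because Pod::Text does not accept the "comment" or "devdocs" targets.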
XS code in KinoSearch is stored faux-Inline-style, after an
__END__ token, and delimited by __XS__ and __POD__ markers. A heavily
customized Build.PL detects these code blocks and writes out hard files at
install-time, so the inlining is mostly for convenience while editing: the XS
code is often tightly coupled to the Perl code in a given module, and having
everything in one place makes it easier to see what's going on and move things
back and forth.
The content of KinoSearch.xs consists of the XS block from KinoSearch.pm, followed by all the other XS blocks in an undetermined order. Ultimately, only a single compiled library gets installed along with the Perl modules.
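The extraction and concatenation steps can be sketched as below. This is not the actual Build.PL logic; it assumes a hypothetical layout in which, after __END__, an "__XS__" line opens the XS section and it runs until a "__POD__" line or end of file. The function names extract_xs and assemble_xs are illustrative only.

```perl
use strict;
use warnings;

# Return the XS section of one module's source text, or undef if none.
# Assumed layout: __END__, then "__XS__" opening a section that runs
# until a "__POD__" line or end of file.
sub extract_xs {
    my ($source) = @_;
    my ( undef, $tail ) = split /^__END__\n/m, $source, 2;
    return if !defined $tail;
    my ($xs) = $tail =~ /^__XS__\n(.*?)(?=^__POD__\n|\z)/ms;
    return $xs;
}

# Concatenate XS sections: the main module's block first, then the
# others in whatever order they were passed in.
sub assemble_xs {
    my ( $main_source, @other_sources ) = @_;
    my @blocks = grep {defined} map { extract_xs($_) }
        ( $main_source, @other_sources );
    return join "\n", @blocks;
}

# Tiny stand-in module sources (contents are illustrative).
my $main_pm = join "\n", 'package KinoSearch;', '1;', '__END__', '__XS__',
    'MODULE = KinoSearch PACKAGE = KinoSearch', '__POD__', '=head1 NAME', '';
my $plain_pm = "package KinoSearch::Util::Class;\n1;\n";
my $instream_pm = join "\n", 'package KinoSearch::Store::InStream;', '1;',
    '__END__', '__XS__',
    'MODULE = KinoSearch PACKAGE = KinoSearch::Store::InStream', '';

my $combined = assemble_xs( $main_pm, $plain_pm, $instream_pm );
print $combined;
```

Because every block declares MODULE = KinoSearch, the concatenated file compiles into the single shared library described above; modules without an XS section simply contribute nothing.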
At runtime, the only module which calls XSLoader::load is KinoSearch. Because
the KinoSearch XS MODULE contains many PACKAGEs, use KinoSearch; loads all
of the XS routines in the entire KinoSearch suite. A pure-Perl version of
KinoSearch.pm which did the same thing might look like this:
    package KinoSearch;
    our $VERSION = 1.0;

    package KinoSearch::Index::TermInfo;

    sub get_doc_freq {
        # ...
    }

    package KinoSearch::Store::InStream;

    sub lu_read {
        # ...
    }

    # ...
To maximize clarity, when possible XS in KinoSearch is limited to "glue" code, while Perl and C do the heavy lifting. Exceptions occur when XS functions need to manipulate the Perl stack, for instance when returning more than one value.
Given pure-ASCII source material, KinoSearch 0.05 produced indexes that could be read by Java Lucene 1.4.3 and vice versa. That was the high-water mark for Lucene compatibility.
The file-format changed in version 0.06, the API was never that close, and KinoSearch 0.20 represents a further break both in terms of API and file format.
It has turned out to be impossible to provide full Lucene compatibility without making extraordinary sacrifices in both performance and code complexity -- so we have moved on without looking back.
When possible, KinoSearch's Perl code follows the recommendations set out in Damian Conway's book, "Perl Best Practices", and its XS/C code follows Apache's guidelines.
Perl code is auto-formatted using a PerlTidy-based helper app called kinotidy, which is basically perltidy with a profile set up to use the PBP settings.
It would be nice if there were a formatter for XS and C code that was as good as PerlTidy. Since there isn't, the code is formatted by hand to look as though it had been, with one important difference: a bias towards maximum parenthetical tightness, i.e. no padding spaces just inside parentheses.
In both Perl and XS/C, code is organized into commented paragraphs a few lines in length, as per PBP recommendations. Strong efforts are made to keep each paragraph's comment to a single line. Stupefyingly obvious "code narration" comments are used when something more literate doesn't present itself. The goal is to be able to grok the intended flow of a function by scanning the first line of each "paragraph", especially when the paragraph-summarizing comments are set off by syntax highlighting in a programmer's text editor.
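A short illustration of the layout, using a made-up function (doc_freqs is hypothetical and not part of KinoSearch). Each paragraph is a few lines long and opens with a one-line summary comment, so reading only the comments gives the function's flow: count, sort, return.

```perl
use strict;
use warnings;

# Hypothetical helper, shown only for its paragraph layout.
sub doc_freqs {
    my ($docs) = @_;

    # Count how many docs each term appears in.
    my %doc_freq;
    for my $doc (@$docs) {
        my %seen = map { $_ => 1 } split /\s+/, $doc;
        $doc_freq{$_}++ for keys %seen;
    }

    # Sort terms by descending doc freq, then alphabetically.
    my @terms = sort { $doc_freq{$b} <=> $doc_freq{$a} || $a cmp $b }
        keys %doc_freq;

    # Return term/freq pairs.
    return map { [ $_, $doc_freq{$_} ] } @terms;
}

my @pairs = doc_freqs( [ 'foo bar', 'bar baz', 'bar' ] );
print "$_->[0] $_->[1]\n" for @pairs;    # bar 3, baz 1, foo 1
```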
Copyright 2005-2007 Marvin Humphrey
See KinoSearch version 0.20.