[KinoSearch] KinoSearch 0.30_01

Marvin Humphrey marvin at rectangular.com
Thu Jun 18 22:26:26 PDT 2009


Greets,

I'm pleased to announce that KinoSearch 0.30_01 has been uploaded to CPAN.

It's been a little while, so the entry from the Changes file has some heft 
to it:

  Highlights:

    * Many new classes and methods.
    * Improved Searcher open times and decreased process memory footprint.
    * Improved sorting support.
    * Improved subclassing support.
    * Improved indexing speed.
    * Schemas serialized and stored with indexes.
    * Improved pluggability.
    * Expanded tutorial documentation.
    * Restored Windows compatibility.

  New public classes:

    * KinoSearch::Architecture
    * KinoSearch::Doc
    * KinoSearch::Doc::HitDoc
    * KinoSearch::Indexer (replaces InvIndexer)
    * KinoSearch::FieldType (replaces FieldSpec)
    * KinoSearch::FieldType::BlobType
    * KinoSearch::FieldType::FullTextType (replaces FieldSpec::text)
    * KinoSearch::FieldType::StringType
    * KinoSearch::Highlight::HeatMap
    * KinoSearch::Index::DataReader
    * KinoSearch::Index::DataWriter
    * KinoSearch::Index::DocReader
    * KinoSearch::Index::Lexicon
    * KinoSearch::Index::LexiconReader
    * KinoSearch::Index::PolyReader
    * KinoSearch::Index::PostingList
    * KinoSearch::Index::PostingsReader
    * KinoSearch::Index::Segment
    * KinoSearch::Index::SegReader
    * KinoSearch::Index::SegWriter
    * KinoSearch::Index::Snapshot
    * KinoSearch::Obj
    * KinoSearch::Search::ANDQuery
    * KinoSearch::Search::Compiler
    * KinoSearch::Search::HitCollector
    * KinoSearch::Search::HitCollector::BitCollector
    * KinoSearch::Search::LeafQuery
    * KinoSearch::Search::MatchAllQuery
    * KinoSearch::Search::Matcher
    * KinoSearch::Search::NoMatchQuery
    * KinoSearch::Search::NOTQuery
    * KinoSearch::Search::ORQuery
    * KinoSearch::Search::PolyQuery
    * KinoSearch::Search::RangeQuery (replaces RangeFilter)
    * KinoSearch::Search::RequiredOptionalQuery
    * KinoSearch::Search::SortRule (factored out of SortSpec)
    * KinoSearch::Search::Span
    * KinoSearch::Util::BitVector
    * KSx::Index::ByteBufDocReader
    * KSx::Index::ByteBufDocWriter
    * KSx::Index::ZlibDocReader
    * KSx::Index::ZlibDocWriter
    * KSx::Search::MockScorer

  New/updated documentation:

    * KinoSearch::Docs::Tutorial::Simple            (updated)
    * KinoSearch::Docs::Tutorial::BeyondSimple      (updated)
    * KinoSearch::Docs::Tutorial::FieldType         (new)
    * KinoSearch::Docs::Tutorial::Analysis          (new)
    * KinoSearch::Docs::Tutorial::Highlighter       (new)
    * KinoSearch::Docs::Tutorial::QueryObjects      (new)
    * KinoSearch::Docs::Cookbook::CustomQuery       (new)
    * KinoSearch::Docs::Cookbook::CustomQueryParser (new)
    * KinoSearch::Docs::DocIDs                      (new)

  Removed/redacted/replaced:

    * KinoSearch::Analysis::Token - redacted pending API overhaul.
    * KinoSearch::Analysis::TokenBatch - redacted pending API overhaul.
    * KinoSearch::Docs::DevGuide - removed.
    * KinoSearch::FieldSpec - replaced by FieldType.
    * KinoSearch::FieldSpec::text - replaced by FullTextType and StringType.
    * KinoSearch::Highlight::Encoder - rolled into Highlighter.
    * KinoSearch::Highlight::Formatter - rolled into Highlighter.
    * KinoSearch::Highlight::SimpleHTMLEncoder - rolled into Highlighter.
    * KinoSearch::Highlight::SimpleHTMLFormatter - rolled into Highlighter.
    * KinoSearch::Index::Term - removed.  Now any object can be a term.
    * KinoSearch::InvIndex - removed.
    * KinoSearch::InvIndexer - replaced by Indexer.
    * KinoSearch::Posting - redacted pending API overhaul.
    * KinoSearch::Posting::MatchPosting - redacted pending API overhaul.
    * KinoSearch::Posting::RichPosting - redacted pending API overhaul.
    * KinoSearch::Posting::ScorePosting - redacted pending API overhaul.
    * KinoSearch::Search::BooleanQuery - replaced by ANDQuery, ORQuery,
      NOTQuery, and RequiredOptionalQuery.
    * KinoSearch::Search::Filter - removed.  Filtering can now be achieved via
      ANDQuery, NOTQuery, etc.
    * KinoSearch::Search::PolyFilter - removed.
    * KinoSearch::Search::QueryFilter - replaced by KSx::Search::Filter.
    * KinoSearch::Search::RangeFilter - replaced by RangeQuery.
    * KinoSearch::Util::Class - removed.
    * KinoSearch::Util::ToolSet - permanently redacted.

  Renamed:

    * KinoSearch::Analysis::LCNormalizer => KinoSearch::Analysis::CaseFolder
    * KinoSearch::Search::SearchServer   => KSx::Remote::SearchServer
    * KinoSearch::Search::SearchClient   => KSx::Remote::SearchClient
    * KinoSearch::Simple                 => KSx::Simple
    * KinoSearch::Search::MultiSearcher  => KinoSearch::Search::PolySearcher

  API Changes:

    * KinoSearch::Analysis::Analyzer
      o analyze_batch() - redacted pending API overhaul.

    * KinoSearch::Analysis::PolyAnalyzer
      o get_analyzers() - added.

    * KinoSearch::Analysis::Tokenizer
      o new() - parameter "token_re" replaced by "pattern".

    * KinoSearch::Highlight::Highlighter
      o Highlighter objects are now single-field.
      o Fields must now be marked as "highlightable" at index time via
        their FieldType.
      o Excerpts are now created manually rather than automatically inserted
        via the Hits class.
      o new() - now takes four params instead of none: "searchable", "field",
        "query", and "excerpt_length".
      o add_spec() - removed.
      o create_excerpt(), highlight(), encode(), set_pre_tag(), get_pre_tag(),
        set_post_tag(), get_post_tag(), get_searchable(), get_query(),
        get_compiler(), get_excerpt_length(), get_field() - added.

    * KinoSearch::Index::IndexReader
      o open() - takes an "index" (string filepath or Folder object) instead
        of an "invindex", plus an optional "snapshot".  Always returns a
        PolyReader (instead of an unspecified IndexReader subclass).
      o max_doc() - replaced by doc_max(), which has slightly different
        semantics since doc ids now start at 1 rather than 0.
      o num_docs() - renamed to doc_count().
      o del_count(), seg_readers(), offsets(), fetch(), obtain() - added.

    * KinoSearch::Indexer (replaces KinoSearch::InvIndexer)
      o new() - parameters changed.  Old: "invindex", "lock_factory".  New:
        "schema", "index", "create", "truncate", "lock_factory".
      o add_doc() - now takes either a hash ref or a Doc object, and
        optionally takes labeled params.
      o finish() - refactored into commit(), prepare_commit(), and optimize().
      o add_invindexes() - replaced by add_index().
      o delete_by_term() - now takes labeled parameters rather than positional
        args.
      o delete_by_query() - added.
      

    * KinoSearch::QueryParser
      o tree(), expand(), expand_leaf(), prune(), make_term_query(),
        make_phrase_query(), make_and_query(), make_or_query(),
        make_not_query(), make_req_opt_query() - added.

    * KinoSearch::Schema
      o No longer an abstract class.
      o "%fields" hash eliminated.
      o Now gets serialized as JSON and stored with index.
      o clobber(), open(), read() - removed.
      o analyzer() - removed.
      o similarity() - removed.
      o pre_sort() - removed.
      o add_field() - replaced by spec_field(), which associates a field name
        with a FieldType object rather than a class name.
      o num_fields(), all_fields(), fetch_type(), fetch_sim(), architecture(),
        get_architecture(), get_similarity() - added.

    * KinoSearch::Search::Hits
      o fetch_hit_hashref() - replaced by next(), which returns a HitDoc by
        default.
      o create_excerpts() - removed.

    * KinoSearch::Search::PhraseQuery
      o new() - now takes params "field" and "terms".
      o add_term() - removed.
      o get_field(), get_terms() - added.

    * KinoSearch::Search::PolySearcher (formerly MultiSearcher)
      o Now supports SortSpec.

    * KinoSearch::Search::Query
      o make_compiler() - added.

    * KinoSearch::Search::Searchable
      o search() - renamed to hits().
      o new(), glean_query(), get_schema(), collect(), doc_max(), doc_freq(),
        fetch_doc() - added.

    * KinoSearch::Search::SortSpec
      o new() - takes new param "rules", an array of SortRules.
      o add() - removed.

    * KinoSearch::Search::TermQuery
      o new() - now takes "field" and "term" (which is a string rather than a
        Term object as before).

    * KinoSearch::Searcher
      o new() - now takes "index" (a string filepath, a Folder object, or an
        IndexReader object), rather than "invindex" or "reader".
      o search() - renamed to hits().
      o set_prune_factor() - removed.
      o collect(), doc_max(), doc_freq(), fetch_doc(), get_schema() - added.
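
  To make the API changes above concrete, here is a minimal sketch of an
  index/search/highlight round trip under the new interface.  It is an
  illustration, not an excerpt from the distribution.  The class names,
  the constructor parameters for Indexer, Searcher, TermQuery and
  Highlighter, and the spec_field(), commit(), hits(), next() and
  create_excerpt() calls follow the entries above; FullTextType is the
  name used in the "Removed/redacted/replaced" list.  Everything else is
  an assumption: the field names, the temporary index directory, the
  "language" and "analyzer" parameters, the "highlightable" constructor
  flag, the "name"/"type" labels for spec_field(), the "query" label for
  hits(), total_hits(), and hash-style access to HitDoc fields.

```perl
use strict;
use warnings;

use File::Temp qw( tempdir );
use KinoSearch::Schema;
use KinoSearch::Indexer;
use KinoSearch::Searcher;
use KinoSearch::Analysis::PolyAnalyzer;
use KinoSearch::FieldType::FullTextType;
use KinoSearch::Search::TermQuery;
use KinoSearch::Highlight::Highlighter;

# Hypothetical index location: a throwaway temp directory.
my $path_to_index = tempdir( CLEANUP => 1 );

# Schema is no longer abstract.  Fields are associated with FieldType
# objects via spec_field() rather than with class names via add_field().
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $type = KinoSearch::FieldType::FullTextType->new(
    analyzer      => $analyzer,
    highlightable => 1,    # assumed flag: mark the field at index time
);
my $schema = KinoSearch::Schema->new;
$schema->spec_field( name => 'title',   type => $type );
$schema->spec_field( name => 'content', type => $type );

# Indexer replaces InvIndexer; commit() takes over from finish().
my $indexer = KinoSearch::Indexer->new(
    schema => $schema,
    index  => $path_to_index,
    create => 1,
);
$indexer->add_doc( { title => 'Greets', content => 'hello search world' } );
$indexer->commit;

# Searcher takes "index" rather than "invindex"; search() is now hits().
my $searcher = KinoSearch::Searcher->new( index => $path_to_index );
my $query = KinoSearch::Search::TermQuery->new(
    field => 'content',
    term  => 'search',     # a plain string, not a Term object
);
my $hits = $searcher->hits( query => $query );
my $total = $hits->total_hits;

# Highlighters are single-field, and excerpts are created manually.
my $highlighter = KinoSearch::Highlight::Highlighter->new(
    searchable => $searcher,
    query      => $query,
    field      => 'content',
);

# next() hands back one HitDoc per call until the hits are exhausted.
while ( my $hit = $hits->next ) {
    my $excerpt = $highlighter->create_excerpt($hit);
    print "$hit->{title}: $excerpt\n";
}
```

  The TermQuery term is not run through the analyzer, so it has to match
  the indexed form of the token; "search" was chosen because English
  stemming should leave it unchanged.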

  Subclassing improvements:

    * Although KinoSearch is now implemented almost entirely in C, pure-Perl
      dynamic subclassing is supported.  (Public methods which are overridden
      in pure-Perl subclasses are automatically detected and invoked as
      callbacks by the internal KS object engine.)

  Significant internal changes:

    * All classes now implemented in C, with Perl and XS only where necessary.
    * Doc IDs now start at 1 rather than 0.
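
  The doc ID change mostly bites in loops that walk every document.  The
  sketch below uses plain integers in place of an IndexReader and two
  hypothetical helpers.  It assumes that the old max_doc() returned one
  greater than the largest 0-based doc number, and that doc_max() returns
  the highest 1-based doc ID; neither detail is spelled out above, so
  check both against the IndexReader documentation.

```perl
use strict;
use warnings;

# Hypothetical helpers: plain integers stand in for IndexReader results.

# Old style (assumed): doc numbers ran from 0 through max_doc() - 1.
sub old_doc_nums {
    my ($max_doc) = @_;
    my @nums = ( 0 .. $max_doc - 1 );
    return @nums;
}

# New style (assumed): doc IDs run from 1 through doc_max() inclusive.
sub new_doc_ids {
    my ($doc_max) = @_;
    my @ids = ( 1 .. $doc_max );
    return @ids;
}

# The same three documents, enumerated under each convention.
my @old = old_doc_nums(3);
my @new = new_doc_ids(3);
print "old: @old\n";    # old: 0 1 2
print "new: @new\n";    # new: 1 2 3
```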


Enjoy!

Marvin Humphrey




