[KinoSearch] KinoSearch 0.30_01
Marvin Humphrey
marvin at rectangular.com
Thu Jun 18 22:26:26 PDT 2009
Greets,
I'm pleased to announce that KinoSearch 0.30_01 has been uploaded to CPAN.
It's been a little while, so the entry from the Changes file has some heft
to it:
Highlights:
* Many new classes and methods.
* Improved Searcher open times and decreased process memory footprint.
* Improved sorting support.
* Improved subclassing support.
* Improved indexing speed.
* Schemas serialized and stored with indexes.
* Improved pluggability.
* Expanded tutorial documentation.
* Restored Windows compatibility.
New public classes:
* KinoSearch::Architecture
* KinoSearch::Doc
* KinoSearch::Doc::HitDoc
* KinoSearch::Indexer (replaces InvIndexer)
* KinoSearch::FieldType (replaces FieldSpec)
* KinoSearch::FieldType::BlobField
* KinoSearch::FieldType::FullTextField (replaces FieldSpec::text)
* KinoSearch::FieldType::StringField
* KinoSearch::Highlight::HeatMap
* KinoSearch::Index::DataReader
* KinoSearch::Index::DataWriter
* KinoSearch::Index::DocReader
* KinoSearch::Index::Lexicon
* KinoSearch::Index::LexiconReader
* KinoSearch::Index::PolyReader
* KinoSearch::Index::PostingList
* KinoSearch::Index::PostingsReader
* KinoSearch::Index::Segment
* KinoSearch::Index::SegReader
* KinoSearch::Index::SegWriter
* KinoSearch::Index::Snapshot
* KinoSearch::Obj
* KinoSearch::Search::ANDQuery
* KinoSearch::Search::Compiler
* KinoSearch::Search::HitCollector
* KinoSearch::Search::HitCollector::BitCollector
* KinoSearch::Search::LeafQuery
* KinoSearch::Search::MatchAllQuery
* KinoSearch::Search::Matcher
* KinoSearch::Search::NoMatchQuery
* KinoSearch::Search::NOTQuery
* KinoSearch::Search::ORQuery
* KinoSearch::Search::PolyQuery
* KinoSearch::Search::RangeQuery (replaces RangeFilter)
* KinoSearch::Search::RequiredOptionalQuery
* KinoSearch::Search::SortRule (factored out of SortSpec)
* KinoSearch::Search::Span
* KinoSearch::Util::BitVector
* KSx::Index::ByteBufDocReader
* KSx::Index::ByteBufDocWriter
* KSx::Index::ZlibDocReader
* KSx::Index::ZlibDocWriter
* KSx::Search::MockScorer
New/updated documentation:
* KinoSearch::Docs::Tutorial::Simple (updated)
* KinoSearch::Docs::Tutorial::BeyondSimple (updated)
* KinoSearch::Docs::Tutorial::FieldType (new)
* KinoSearch::Docs::Tutorial::Analysis (new)
* KinoSearch::Docs::Tutorial::Highlighter (new)
* KinoSearch::Docs::Tutorial::QueryObjects (new)
* KinoSearch::Docs::Cookbook::CustomQuery (new)
* KinoSearch::Docs::Cookbook::CustomQueryParser (new)
* KinoSearch::Docs::DocIDs (new)
Removed/redacted/replaced:
* KinoSearch::Analysis::Token - redacted pending API overhaul.
* KinoSearch::Analysis::TokenBatch - redacted pending API overhaul.
* KinoSearch::Docs::DevGuide - removed.
* KinoSearch::FieldSpec - replaced by FieldType.
* KinoSearch::FieldSpec::text - replaced by FullTextType and StringType.
* KinoSearch::Highlight::Encoder - rolled into Highlighter.
* KinoSearch::Highlight::Formatter - rolled into Highlighter.
* KinoSearch::Highlight::SimpleHTMLEncoder - rolled into Highlighter.
* KinoSearch::Highlight::SimpleHTMLFormatter - rolled into Highlighter.
* KinoSearch::Index::Term - removed. Now any object can be a term.
* KinoSearch::InvIndex - removed.
* KinoSearch::InvIndexer - replaced by Indexer.
* KinoSearch::Posting - redacted pending API overhaul.
* KinoSearch::Posting::MatchPosting - redacted pending API overhaul.
* KinoSearch::Posting::RichPosting - redacted pending API overhaul.
* KinoSearch::Posting::ScorePosting - redacted pending API overhaul.
* KinoSearch::Search::BooleanQuery - replaced by ANDQuery, ORQuery,
NOTQuery, and RequiredOptionalQuery.
* KinoSearch::Search::Filter - removed. Filtering can now be achieved via
ANDQuery, NOTQuery, etc.
* KinoSearch::Search::PolyFilter - removed.
* KinoSearch::Search::QueryFilter - replaced by KSx::Search::Filter.
* KinoSearch::Search::RangeFilter - replaced by RangeQuery.
* KinoSearch::Util::Class - removed.
* KinoSearch::Util::ToolSet - permanently redacted.
Renamed:
* KinoSearch::Analysis::LCNormalizer => KinoSearch::Analysis::CaseFolder
* KinoSearch::Search::SearchServer => KSx::Remote::SearchServer
* KinoSearch::Search::SearchClient => KSx::Remote::SearchClient
* KinoSearch::Simple => KSx::Simple
* KinoSearch::Search::MultiSearcher => KinoSearch::Search::PolySearcher
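For anyone updating existing code, the renames above lend themselves to a mechanical search-and-replace. Here is a minimal sketch of that idea. The mapping is copied from the list; rename_classes() is a hypothetical helper, not something shipped with KinoSearch. It only swaps class names, so the API changes listed below still need attention by hand.

```perl
use strict;
use warnings;

# Old => new class names, copied from the "Renamed" list above.
my %RENAMED = (
    'KinoSearch::Analysis::LCNormalizer' => 'KinoSearch::Analysis::CaseFolder',
    'KinoSearch::Search::SearchServer'   => 'KSx::Remote::SearchServer',
    'KinoSearch::Search::SearchClient'   => 'KSx::Remote::SearchClient',
    'KinoSearch::Simple'                 => 'KSx::Simple',
    'KinoSearch::Search::MultiSearcher'  => 'KinoSearch::Search::PolySearcher',
);

# Longest names first, so a shorter name can never shadow a longer one.
my $alternation = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } keys %RENAMED;

# Match only complete class names: not preceded by a word character or
# colon, and not followed by a word character or a further "::" component.
my $old_name_re = qr/(?<![\w:])($alternation)(?!\w|::)/;

# rename_classes() is a hypothetical helper, not part of KinoSearch.
sub rename_classes {
    my ($source) = @_;
    $source =~ s/$old_name_re/$RENAMED{$1}/g;
    return $source;
}

print rename_classes("use KinoSearch::Simple;\n");    # use KSx::Simple;
```

Anchoring the pattern on both sides keeps it from touching unrelated names that merely start with one of the old class names.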
API Changes:
* KinoSearch::Analysis::Analyzer
o analyze_batch() - redacted pending API overhaul.
* KinoSearch::Analysis::PolyAnalyzer
o get_analyzers() - added.
* KinoSearch::Analysis::Tokenizer
o new() - parameter "token_re" replaced by "pattern".
* KinoSearch::Highlight::Highlighter
o Highlighter objects are now single-field.
o Fields must now be marked as "highlightable" at index time via
their FieldType.
o Excerpts are now created manually rather than automatically inserted
via the Hits class.
o new() - now takes four params instead of none: "searchable", "field",
"query", and "excerpt_length".
o add_spec() - removed.
o create_excerpt(), highlight(), encode(), set_pre_tag(), get_pre_tag(),
set_post_tag(), get_post_tag(), get_searchable(), get_query(),
get_compiler(), get_excerpt_length(), get_field() - added.
* KinoSearch::Index::IndexReader
o open() - takes an "index" (string filepath or Folder object) instead
of an "invindex", plus an optional "snapshot". Always returns a
PolyReader (instead of an unspecified IndexReader subclass).
o max_doc() - replaced by doc_max(), which has slightly different
semantics since doc ids now start at 1 rather than 0.
o num_docs() - renamed to doc_count().
o del_count(), seg_readers(), offsets(), fetch(), obtain() - added.
* KinoSearch::Indexer (replaces KinoSearch::InvIndexer)
o new() - parameters changed. Old: "invindex", "lock_factory". New:
"schema", "index", "create", "truncate", "lock_factory".
o add_doc() - now takes either a hash ref or a Doc object, and
optionally takes labeled params.
o finish() - refactored into commit(), prepare_commit(), and optimize().
o add_invindexes() - replaced by add_index().
o delete_by_term() - now takes labeled parameters rather than positional
args.
o delete_by_query() - added.
takes "index" (a string filepath or Folder object),
"lock_factory", and
* KinoSearch::QueryParser
o tree(), expand(), expand_leaf(), prune(), make_term_query(),
make_phrase_query(), make_and_query(), make_or_query(),
make_not_query(), make_req_opt_query() - added.
* KinoSearch::Schema
o No longer an abstract class.
o "%fields" hash eliminated.
o Now gets serialized as JSON and stored with index.
o clobber(), open(), read() - removed.
o analyzer() - removed.
o similarity() - removed.
o pre_sort() - removed.
o add_field() - replaced by spec_field(), which associates a field name
with a FieldType object rather than a class name.
o num_fields(), all_fields(), fetch_type(), fetch_sim(), architecture(),
get_architecture(), get_similarity() - added.
* KinoSearch::Search::Hits
o fetch_hit_hashref() - replaced by next(), which returns a HitDoc by
default.
o create_excerpts() - removed.
* KinoSearch::Search::PhraseQuery
o new() - now takes params "field" and "terms".
o add_term() - removed.
o get_field(), get_terms() - added.
* KinoSearch::Search::PolySearcher (formerly MultiSearcher)
o Now supports SortSpec.
* KinoSearch::Search::Query
o make_compiler() - added.
* KinoSearch::Search::Searchable
o search() - renamed to hits().
o new(), glean_query(), get_schema(), collect(), doc_max(), doc_freq(),
fetch_doc() - added.
* KinoSearch::Search::SortSpec
o new() - takes new param "rules", an array of SortRules.
o add() - removed.
* KinoSearch::Search::TermQuery
o new() - now takes "field" and "term" (which is a string rather than a
Term object as before).
* KinoSearch::Searcher
o new() - now takes "index" (a string filepath, a Folder object, or an
IndexReader object), rather than "invindex" or "reader".
o search() - renamed to hits().
o set_prune_factor() - removed.
o collect(), doc_max(), doc_freq(), fetch_doc(), get_schema() - added.
Subclassing improvements:
* Although KinoSearch is now implemented almost entirely in C, pure-Perl
dynamic subclassing is supported. (Public methods which are overridden
in pure-Perl subclasses are automatically detected and invoked as
callbacks by the internal KS object engine.)
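To give a feel for what "automatically detected" can mean, here is a rough pure-Perl sketch of override detection. It is not how the KS object engine does it, since that lives in C. My::Base, My::Shouter, and is_overridden() are all made-up names for illustration. The idea is simply that a method has been overridden when the subclass resolves it to a different code ref than the base class does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a class whose methods live in C.
package My::Base;
sub new       { my $class = shift; return bless {}, $class }
sub transform { my ( $self, $text ) = @_; return $text }

# A pure-Perl subclass that overrides one public method.
package My::Shouter;
our @ISA = ('My::Base');
sub transform { my ( $self, $text ) = @_; return uc $text }

package main;

# A method counts as overridden when the subclass resolves it to a
# different code ref than the base class does.
sub is_overridden {
    my ( $class, $base, $method ) = @_;
    my $found    = $class->can($method) or return 0;
    my $original = $base->can($method)  or return 1;
    return $found != $original ? 1 : 0;
}

print is_overridden( 'My::Shouter', 'My::Base', 'transform' ), "\n";    # 1
print is_overridden( 'My::Shouter', 'My::Base', 'new' ),       "\n";    # 0
```

An engine that knows which methods are overridden can call back into Perl only for those, and stay in C for everything else.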
Significant internal changes:
* All classes now implemented in C, with Perl and XS only where necessary.
* Doc IDs now start at 1 rather than 0.
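The doc ID change is easy to trip over in loops, so here is a small sketch of the off-by-one involved. It rests on two assumptions which the Changes entry does not spell out: that the old max_doc() was one greater than the largest 0-based ID, and that the new doc_max() is the largest 1-based ID. Both helper functions are hypothetical and touch no index at all.

```perl
use strict;
use warnings;

# Hypothetical helpers; neither is part of KinoSearch.

# Old convention (assumed): IDs run 0 .. max_doc - 1.
sub old_doc_ids { my ($max_doc) = @_; return ( 0 .. $max_doc - 1 ) }

# New convention (assumed): IDs run 1 .. doc_max.
sub new_doc_ids { my ($doc_max) = @_; return ( 1 .. $doc_max ) }

my @old = old_doc_ids(3);
my @new = new_doc_ids(3);
print "old: @old\n";    # old: 0 1 2
print "new: @new\n";    # new: 1 2 3
```

Under those assumptions the same number serves as an exclusive bound in the old scheme and an inclusive one in the new, so any loop written against max_doc() needs both ends adjusted.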
Enjoy!
Marvin Humphrey