DBIx-TextIndex
Added method all_doc_ids().
There is no longer any need to sort doc_ids before passing
to add_doc().
Added more tests. WARNING: MySQL database 'test' must be available
on localhost for tests to succeed.
UPGRADE WARNING: collection table format changed, new index table
collection_all_docs_vector added, some field names have changed in
inverted tables. Any indexes created with 0.11 or earlier will have
to be deleted and recreated.
UPGRADE WARNING: all symbols with "document" have been renamed to
"doc" for brevity. Methods have also been renamed, e.g. add_document()
is now add_doc(). The old method names will work, but are deprecated.
Replaced option 'language' with 'charset'. iso-8859-1 is the
default charset.
Added call to Text::Unaccent::unac_string (www.senga.org) to replace
accented characters with their plain ASCII equivalents. Uses the
'charset' option to determine the mapping.
UPGRADE WARNING: added structured exceptions using Exception::Class.
Calls to search() now have to be wrapped with eval blocks to catch
query exceptions.
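A minimal sketch of the eval-wrapping pattern described above. It does not
use DBIx::TextIndex itself: My::QueryException and the search() function
below are hypothetical stand-ins, because the actual exception class names
are not listed in this changelog. Only the shape of the eval/$@ handling is
meant to carry over.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a structured exception class; the real
# class names used by DBIx::TextIndex are not given in this changelog.
package My::QueryException;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub error { return $_[0]{error} }

package main;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for $index->search(); throws on an empty query.
sub search {
    my ($query) = @_;
    die My::QueryException->new(error => 'empty query')
        unless defined $query && length $query;
    return { 1 => 1.0 };    # doc_id => score
}

# Wrap the call in eval so a query exception does not kill the caller.
sub safe_search {
    my ($query) = @_;
    my $results = eval { search($query) };
    if (my $err = $@) {
        if (blessed($err) && $err->isa('My::QueryException')) {
            return (undef, $err->error);    # report the query error
        }
        die $err;                           # rethrow anything unexpected
    }
    return ($results, undef);
}
```

A caller would then check the second return value, for example
my ($results, $error) = safe_search($query), before using the results.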
=head2 0.11
Bug fix: HTML tags are now replaced with a single space, instead of
an empty string, when indexing a document. This prevents adjacent
words from being concatenated in some cases.
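The effect of this fix can be illustrated with a small sketch. The
strip_tags() helper and its regex are assumptions made for illustration,
not the module's actual code; they only show why replacing a tag with an
empty string can join two words.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's actual implementation:
# replace every HTML tag in $html with $replacement.
sub strip_tags {
    my ($html, $replacement) = @_;
    (my $text = $html) =~ s/<[^>]*>/$replacement/g;
    return $text;
}

my $html = 'first line<br>second line';

# Old behaviour: the tag vanishes and "line" and "second" run together.
print strip_tags($html, ''), "\n";     # first linesecond line

# New behaviour: the tag becomes a space, so the words stay separate.
print strip_tags($html, ' '), "\n";    # first line second line
```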
=head2 0.10
Fixed collection table upgrade bug
=head2 0.09
Changed $MAX_WORD_LENGTH default to 20
Allow numbers to be indexed as words
Use HTML::Entities to decode entities in indexed documents, on by
default. Set option decode_html_entities to 0 to disable.
Use $dbh->tables to check for existence of tables (caution, may not
work in DBI > 1.30).
=head2 0.08
Bug fix: add_mask() was not inserting masks
=head2 0.07
UPGRADE WARNING: collection table format changed, use new method
$index->upgrade_collection_table() to recreate collection table.
Calling initialize() method for a new collection will also upgrade
collection table. Index backup recommended.
Added error_ prefix to error message column names in collection table
Added version column to collection table
Added language column to collection table, removed czech_language column
UPGRADE WARNING: instead of new({ czech_language => 1}), use
new({ language => 'cz' })
Bug fix: _store_collection_info() error if stop lists are not used
unscored_search() will now return a scalar error message
if an error occurs in search
search() will croak if passed an invalid field name to search on
Added documentation for mask operations
=head2 0.06
tripie's patch v2 updates:
- a bug in the document removal process, which caused incorrect
'occurency' data updates when multi-field documents were removed,
was fixed. The methods remove_document() and _inverted_remove()
were affected.
- a bug related to wildcards in queries in form of "+word +next%"
or "+word% +next%" was fixed
- a bug related to "%" wildcards used while searching
multi-field documents was fixed
- a bug related to stoplists and phrases that contain a
non-stoplisted word together with a stoplisted word was fixed
- a new full-featured solution for highlighting query words or
patterns in the content of resulting documents was added
I've written a new module HTML::Highlight, that can be used
either independently or together with DBIx::TextIndex. Its
advantages include:
- it makes highlighting very easy
- takes phrases and wildcards into account
- supports diacritics insensitive highlighting for iso-8859-2
languages
- takes HTML tags into account. That means when a user searches
for, say, 'font', the FONT element in <FONT COLOR="red">
does not get "highlighted".
The module provides a very nice Google-like highlighting using
different colors for different words or phrases.
The module works together with DBIx::TextIndex using its new method
html_highlight().
The module can also be used to preview a context in which query
words appear in resulting documents.