Search-Tools
- optimizations to HeatMap and Snipper sentence detection, which has the
nice side effect of avoiding breaking HTML entities in snipped HTML. To
take advantage, use as_sentences => 1.
0.77 15 Aug 2012
- add stemming support for Query->matches_html and Query->matches_text
- add HiLiter->html_stemmer with passthrough to plain_stemmer until
failing test cases materialize.
- some fixes for stemming support, mostly turning off optimizations based
on regular expressions.
0.76 7 Aug 2012
- finally(!) add real stemming tests and support to Snipper and HiLiter
0.75 6 Aug 2012
- add some tests for Perl 5.17.x test failures
- fix edge case where short snip generated spurious ellipses
0.74 21 May 2012
- yank some meta data from a test doc to avoid security scan problems on
CPAN
0.73 13 May 2012 (Happy Mothers Day)
- fix edge case with snipping phrases that contain non-word characters
other than spaces.
0.72 30 April 2012
- more fixes, similar to 0.71 (for the now-missing Keywords class)
0.71 28 Feb 2012
- fix failing tests due to removed classes in 0.70
0.70 23 Feb 2012
- refactor XML->escape for some performance gain
- remove long-deprecated Keywords classes
0.69 22 Feb 2012
- fix XML->escape() to preserve UTF-8 flag on the returned SV*
0.68 15 Jan 2012
- add missing dTHX macro per
https://rt.cpan.org/Ticket/Display.html?id=74022
0.67 12 Jan 2012
- bolster Tokenizer sentence detection, adding list of abbreviations from
Lingua::EN::Tagger.
- fix missing 'lang' param for SpellCheck
- fix placement of dSP macro in tokenize() C func to properly scope stack
variables.
- add slurp() method to Search::Tools
0.66 05 Dec 2011
- undo 0.65 change, since HTML entities are case sensitive
(http://www.w3.org/TR/html4/charset.html#h-5.3.2)
0.65 02 Dec 2011
- lowercase named entity matches. patch from Adam Lesperance.
0.64 02 Dec 2011
- optimizations to regex matching in Query->matches and HiLiter
- according to the Unicode spec, \xfeff (BOM) is deprecated as a whitespace
character in favor of \x2060. HTML whitespace definition changed
accordingly.
- fix edge case in HiLiter where match on single letter could cause
infinite loop.
- add Query->fields method to see the fields searched for.
- fix XML->unescape_named to support entities with \d in them, and
case-insensitive. https://rt.cpan.org/Ticket/Display.html?id=72904
0.63 06 Oct 2011
- change __func__ macro to use FUNCTION__ instead since Perl core
implements that portable macro.
0.62 26 Aug 2011
- remove ';' as sentence boundary character (it was marked as TODO in
search-tools.c) because character entities use it (e.g. &amp;).
0.61 29 July 2011
- add term_min_length option to QueryParser, to ignore terms unless they
are N chars or longer. Useful for skipping single-character words when
Snipping or HiLiting. For backwards compatibility the default is 1.
- fix treat_uris_like_phrases regex to add / character in addition to @.\
0.60 13 July 2011
- fix whitespace def to include (broke HTML::HiLiter)
0.59 19 June 2011
- add normalize_whitespace feature to XML->no_html() method.
- add several Unicode whitespace defs to $whitespace regex in XML class
per http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters
0.58 27 May 2011
- fix unescaped string in regex in HiLiter
0.57 22 Feb 2011
- extend bug-fix from 0.56 to prevent false matches on match markers.
0.56 10 Feb 2011
- fix bug where query terms 'span' or 'style' were breaking hiliting by
"double-dipping"
0.55 25 Oct 2010
- disable one more test for perl >= 5.14 (see 0.54)
0.54 24 Oct 2010
- fixes for Search::Query 0.18
- disabled some tests that break under perl >= 5.14. See
https://rt.cpan.org/Ticket/Display.html?id=62417
0.53 26 June 2010
- add ->matches_text and ->matches_html methods to Query class
0.52 22 June 2010
- tweak locale tests because some OSes (linux) use 'UTF8' instead of
'UTF-8' naming.
- small optimizations to HiLiter
0.51 23 May 2010
- singularizer in XML->perl_to_xml will now treat common English plurals
0.50 19 May 2010