HTML-Parser

Changes  view on Meta::CPAN

         be reported if requested via the 'attr' or 'tokens' argspecs
         for the 'end' handler.
       - Parse '</:comment>' and '</ comment>' as comments unless
         strict_comment is enabled.  Previous versions of the parser
         would report these as text.  If these comments contain
         quoted words prefixed by space or '=' these words can
         contain '>' without terminating the comment.
       - Parse '<! "<>" foo>' as comment containing ' "<>" foo'.
         Previous versions of the parser would terminate the comment
         at the first '>' and report the rest as text.
       - Legacy comment mode:  Parse with comments terminated with a
         lone '>' if no '-->' is found before eof.
       - Incomplete tag at eof is reported as a 'comment' instead
         of 'text' unless strict_comment is enabled.

3.28     2003-04-16
  * When 'strict_comment' is off (which it is by default)
     treat anything that matches <!...> as a comment.
  * Should now be more efficient on threaded perls.

3.27     2003-01-18
  * Typo fixes to the documentation.
  * HTML::Entities::escape_entities_numeric contributed
     by Sean M. Burke <sburke@cpan.org>.
  * Included one more example program 'hlc' that shows
     how to downcase all tags in an HTML file.

3.26     2002-03-17
  * Avoid core dump in some cases where the callback croaks.
     The perl_call_method and perl_call_sv calls need the G_EVAL flag
     to be safe.
  * New parser attributes; 'attr_encoded' and 'case_sensitive'.
     Contributed by Guy Albertelli II <guy@albertelli.com>.
  * HTML::Entities
         - don't encode \r by default as suggested by Sean M. Burke.
  * HTML::HeadParser
         - ignore empty http-equiv
         - allow multiple <link> elements.  Patch by
           Timur I. Bakeyev <timur@gnu.org>
  * Avoid warnings from bleadperl on the uentities test.

3.25     2001-05-11
  * Minor tweaks for build failures on perl5.004_04, perl-5.6.0,
     and for macro clash under Windows.
  * Improved parsing of <plaintext>...  :-)

3.24     2001-05-09
  * $p->parse(CODE)
  * New events: start_document, end_document
  * New argspecs: skipped_text, offset_end
  * The offset/line/column counters were not properly reset
     after eof.

3.23     2001-05-01
  * The $p->ignore_elements filter did not work as it should if
     handlers for start/end events were not registered.

3.22     2001-04-17
  * The <textarea> element is now parsed in literal mode, i.e. no other tags
     recognized until the </textarea> tag is seen.  Unlike other literal elements,
     the text content is not 'cdata'.
  * The XML &apos; entity is decoded.  The apos-char itself is still encoded as
     &#39; as &apos; is not really an HTML entity, and not recognized by many HTML
     browsers.

3.21     2001-04-10
  * Fix a memory leak which occurred when using filter methods.
  * Avoid a few compiler warnings (DEC C):
        - Trailing comma found in enumerator list
        - "unsigned char" is not compatible with "const char".
  * Doc update.

3.20     2001-04-02
  * Some minor documentation updates.

3.19_94     2001-03-30
  * Implemented 'tag', 'line', 'column' argspecs.
  * HTML::PullParser doc update.
     eg/hform is an example of HTML::PullParser usage.

3.19_93     2001-03-27
  * Shorten 'report_only_tags' to 'report_tags'.
     I think it reads better.
  * Bleadperl portability fixes.

3.19_92     2001-03-25
  * HTML::HeadParser made more efficient by using 'ignore_elements'.
  * HTML::LinkExtor made more efficient by using 'report_only_tags'.
  * HTML::TokeParser generalized into HTML::PullParser.  HTML::PullParser
     only supports the get_token/unget_token interface of HTML::TokeParser,
     but is more flexible because the information that makes up a token
     is customisable.  HTML::TokeParser is made into an HTML::PullParser
     subclass.

3.19_91     2001-03-19
  * Array references can be passed to the filter methods.  Makes it easier
     to use them as constructor options.
  * Example programs updated to use filters.
  * Reset ignored_element state on EOF.
  * Documentation updates.
  * The netscape_buggy_comment() method now generates a mandatory warning
     about its deprecation.

3.19_90     2001-03-13
  * This is a developer-only release.  It contains some new
     experimental features.  The interface to these might still change.
  * Implemented filters to reduce the number of callbacks generated:
        - $p->ignore_tags()
        - $p->report_only_tags()
        - $p->ignore_elements()
  * New @attr argspec.  Less overhead than 'attr' and allows
     compatibility with XML::Parser style start events.
  * The whole argspec can be wrapped up in @{...} to signal
     flattening.  Only makes a difference when the target is an
     array.

3.19     2001-03-09
  * Avoid the entity2char global.  That should make the module
     more thread safe.   Patch by Gurusamy Sarathy <gsar@ActiveState.com>.

3.18     2001-02-24


3.04     2000-01-15
  * Backed out 3.03-patch that checked for legal handler and attribute
    names in the HTML::Parser constructor.
  * Documentation typo fixed by Michael.

3.03     2000-01-14
  * We did not get out of comment mode for comments ending with an
    odd number of "-" before ">".  Patch by la mouton <kero@3sheep.com>
  * Documentation patch by Michael.

3.02     1999-12-21
  * Hide ~-magic IV-pointer to 'struct p_state' behind a reference.
    This allows copying of the internal _hparser_xs_state element, and
    will make HTML-Tree-0.61 work again.
  * Introduced $p->init() which might be useful for subclasses that
    only want the initialization part of the constructor.
  * Filled out DIAGNOSTICS section of the HTML::Parser POD.

3.01     1999-12-19
  * Rely on ~-magic instead of a DESTROY method to deallocate
    the internal 'struct p_state'.  This avoids memory leaks
    when people simply wipe the content of the object hash.
  * One of the assertions in hparser.c had opposite logic.  This made
    the parser fail when compiled with a -DDEBUGGING perl.
  * Don't assume any specific order of hash keys in the t/cases.t.
    This test failed with some newer development releases of perl.

3.00     1999-12-14
  * Documentation update (most of it from Michael)
  * Minor patch to eg/hstrip so that it uses a "" handler
    instead of &ignore.
  * Test suite patches from Michael

2.99_96     1999-12-13
  * Patches from Michael:
    - A handler of "" means that the event will be ignored.
      More efficient than using 'sub {}' as handler.
    - Don't use a perl hash for looking up argspec keywords.
    - Documentation tweaks.

2.99_95     1999-12-09
  * (this is a 3.00 candidate)
  * Fixed core dump when "<" was followed by an 8-bit character.
    Spotted and test case provided by Doug MacEachern.  Doug had
    been running HTML-Parser-XS through more than 1 million urls that
    had been downloaded via LWP.
  * Handlers can now invoke $p->eof to request the parsing to terminate.
    HTML::HeadParser has been simplified by taking advantage of this.
    Also added a title-extraction example that uses this.
  * Michael once again fixed my bad English in the HTML::Parser
    documentation.
  * netscape_buggy_comment will carp instead of warn
  * updated TODO/README
  * Documented that HTML::Filter is deprecated.
  * Made backslash reserved in literal argspec strings.
  * Added several new test scripts.

2.99_94     1999-12-08
  * (should almost be a 3.00 candidate)
  * Renamed 'cdata_flag' as 'is_cdata'.
  * Dropped support for wrapping callback handler and argspec
    in an array and passing a reference to $p->handler.  It
    created ambiguities when you want to pass an array as
    handler destination and not update argspec.  The wrapping
    for constructor arguments is unchanged.
  * Reworked the documentation after updates from Michael.
  * Simplified internal check_handler().  It should probably simply
    be inlined in handler() again.
  * Added argspec 'length' and 'undef'
  * Fix statement-less label.  Fix suggested by Matthew Langford
    <langfml@Eng.Auburn.EDU>.
  * Added two more example programs: eg/hstrip and eg/htext.
  * Various minor patches from Michael.

2.99_93     1999-12-07
  * Documentation update
  * $p->bool_attr_value renamed as $p->boolean_attribute_value
  * Internal renaming: attrspec --> argspec
  * Introduced internal 'enum argcode' in hparser.c
  * Added eg/hrefsub

2.99_92     1999-12-05
  * More documentation patches from Michael
  * Renamed 'token1' as 'token0' as suggested by Michael
  * For artificial end tags we now report 'tokens', but not 'tokenpos'.
  * Boolean attribute values show up as (0, 0) in 'tokenpos' now.
  * If $p->bool_attr_value is set it will influence 'tokens'
  * Fix for core dump when parsing <a "> when $p->strict_names(0).
    Based on fix by Michael.
  * Will av_extend() the tokens/tokenpos arrays.
  * New test suite script by Michael: t/attrspec.t

2.99_91     1999-12-04
  * Implemented attrspec 'offset'
  * Documentation patch from Michael
  * Some more cleanup/updated TODO

2.99_90     1999-12-03
  * (first beta for 3.00)
  * Using "realloc" as a parameter name in grow_tokens created
    problems for some people.  Fix by Paul Schinder <schinder@pobox.com>
  * Patch by Michael that makes array handler destinations really work.
  * Patch by Michael that makes HTML::TokeParser use this.  This gave
    a speedup of about 80%.
  * Patch by Michael that makes t/cases into a real test.
  * Small HTML::Parser documentation patch by Michael.
  * Renamed attrspec 'origtext' to 'text' and 'decoded_text' to 'dtext'
  * Split up Parser.xs.  Moved stuff into hparser.c and util.c
  * Dropped html_ prefix from internal parser functions.
  * Renamed internal function html_handle() as report_event().

2.99_17     1999-12-02
  * HTML::Parser documentation patch from Michael.
  * Fix memory leaks in html_handler()
  * Patch that makes an array legal as handler destination.
    Also from Michael.
  * The end of marked sections does not eat the successive newline
    any more.
  * The artificial end event for an empty tag in xml_mode did not
    report an empty origtext.


