HTML-Tidy-libXML



  my $dom = $tidy->html2dom($string, $encoding);

This is analogous to

  my $lx = XML::LibXML->new;
  $lx->recover_silently(1);
  my $dom = $lx->parse_html_string($string);

There is one major difference, however.  L<HTML::Tidy::libXML> does not
trust the charset declared in
C<< <meta http-equiv="content-type" content="text/html; charset=foo"> >>,
while L<XML::LibXML> tries to use it.  Consider this:

  my $dom = $lx->parse_html_file('http://example.com');

This B<kinda> works, since L<XML::LibXML> can fetch a document directly
from a URL.  But L<XML::LibXML> does not honor the charset given in the
HTTP C<Content-Type> header.  Here is a better practice:

  require LWP::UserAgent;
  require HTTP::Response::Encoding;
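The header matters because the same bytes decode differently depending on which charset you believe, and HTML's encoding-sniffing rules give the charset from the HTTP C<Content-Type> header precedence over a C<< <meta> >> declaration.  The sketch below illustrates that with core L<Encode> only; it does not touch L<HTML::Tidy::libXML> or L<XML::LibXML>.  The page content and both charset values are made up for illustration, assuming a server that sends C<charset=UTF-8> while the page still carries a stale C<< <meta> >> claiming ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical page: served with "Content-Type: text/html; charset=UTF-8"
# but still carrying a stale <meta> tag that claims ISO-8859-1.
my $header_charset = 'UTF-8';
my $meta_charset   = 'ISO-8859-1';

# Raw bytes as they would arrive over the wire (UTF-8 encoded).
my $bytes = encode('UTF-8', "<p>caf\x{e9}</p>");

# Decoding with the charset from the HTTP header recovers the text.
my $good = decode($header_charset, $bytes);

# Trusting the <meta> tag instead turns each byte of the two-byte
# UTF-8 sequence for U+00E9 into its own Latin-1 character (mojibake).
my $bad = decode($meta_charset, $bytes);

print length($good), " vs ", length($bad), "\n";    # prints "11 vs 12"
```

Decoding with the header charset yields 11 characters; trusting the C<< <meta> >> tag yields 12, because U+00E9 comes back as the two characters U+00C3 U+00A9.  That is the kind of damage you invite when the HTTP header is ignored.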


