HTML-Tidy-libXML
lib/HTML/Tidy/libXML.pm
my $dom = $tidy->html2dom($string, $encoding);
This is analogous to:
require XML::LibXML;
my $lx = XML::LibXML->new;
$lx->recover_silently(1);
my $dom = $lx->parse_html_string($string);
The one major difference is that L<HTML::Tidy::libXML> does not trust
C<< <meta http-equiv="content-type" content="text/html; charset=foo"> >>,
while L<XML::LibXML> tries to honor it. Consider this:
my $dom = $lx->parse_html_file('http://example.com'); # libxml2 fetches the URL itself (needs HTTP support compiled in)
This B<kinda> works, since L<XML::LibXML> is capable of fetching
documents directly. But L<XML::LibXML> does not honor the HTTP
C<Content-Type> header, so the charset declared by the server is ignored.
Here is a better practice:
require LWP::UserAgent;
require HTTP::Response::Encoding;
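A fuller version of that practice might look like the following. It is a minimal sketch under stated assumptions, not the module's own example: the helper C<response2dom> is hypothetical, C<html2dom> is assumed to take the raw (undecoded) body bytes plus an encoding name, as in C<< $tidy->html2dom($string, $encoding) >> above, and C<< HTML::Tidy::libXML->new >> is assumed to need no arguments. To stay runnable offline, it builds the C<HTTP::Response> by hand instead of fetching it; the C<Content-Type> header declares ISO-8859-1, while the C<< <meta> >> in the body wrongly claims UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use HTTP::Response::Encoding;   # adds charset() and encoding() to HTTP::Response
use HTML::Tidy::libXML;

# Hypothetical helper: build a DOM from an HTTP::Response, taking the
# encoding from the HTTP Content-Type header instead of any <meta> tag.
sub response2dom {
    my ($tidy, $res) = @_;
    # encoding() comes from HTTP::Response::Encoding and maps the header's
    # charset to an Encode name; it is false if no charset is found.
    my $encoding = $res->encoding;
    # Assumption: html2dom() accepts the raw body bytes plus an encoding name.
    return $tidy->html2dom($res->content, $encoding);
}

# Offline stand-in for LWP::UserAgent->new->get($url): the header says
# ISO-8859-1, while the <meta> inside the body claims UTF-8.
my $html = '<html><head>'
  . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
  . "<title>caf\x{e9}</title></head><body><p>hello</p></body></html>";
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=ISO-8859-1' ],
    encode('iso-8859-1', $html),    # body as Latin-1 bytes, matching the header
);

my $tidy = HTML::Tidy::libXML->new;
my $dom  = response2dom($tidy, $res);
print $dom->toStringHTML;
```

With a live fetch, the hand-built response would be replaced by C<< my $res = LWP::UserAgent->new->get($url) >>, followed by a C<< $res->is_success >> check before calling C<response2dom>.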