HTML-Gumbo
Use it to hardcode a specific encoding.
$gumbo->parse( $octets, input_is => 'octets', encoding => 'latin-1' );
=item BOM
UTF-8/UTF-16 BOMs are checked.
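As a rough illustration of what a BOM check involves, the sketch below inspects the leading bytes of an octet string. The helper name C<bom_encoding> is hypothetical and this is not the module's own code, only a minimal sketch of the idea.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of HTML::Gumbo).
# Returns the encoding implied by a leading byte order mark,
# or nothing when the octets carry no BOM.
sub bom_encoding {
    my ($octets) = @_;
    return 'UTF-8'    if $octets =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $octets =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $octets =~ /\A\xFF\xFE/;
    return;
}

my $encoding = bom_encoding("\xEF\xBB\xBF<p>hello</p>");
```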
=item C<encoding_content_type> argument
Encoding from the transport layer, i.e. the charset given in the Content-Type header.
$gumbo->parse( $octets, input_is => 'octets', encoding_content_type => 'latin-1' );
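The charset usually has to be extracted from the header value before it can be passed in. A minimal sketch follows, assuming the raw Content-Type header is available as a string; the helper C<charset_from_content_type> is hypothetical and not provided by this module, and whether a given charset name is accepted depends on the module's encoding handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Gumbo): pull the charset
# parameter out of a Content-Type header value, lowercased.
sub charset_from_content_type {
    my ($header) = @_;
    return unless defined $header;
    # Accepts charset=latin-1 as well as charset="utf-8".
    if ( $header =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i ) {
        return lc $1;
    }
    return;
}

my $charset = charset_from_content_type('text/html; charset=ISO-8859-1');

# With HTML::Gumbo loaded and $octets holding the raw response body,
# the value can then be handed to the parser:
#
#   $gumbo->parse(
#       $octets,
#       input_is => 'octets',
#       ( defined $charset ? ( encoding_content_type => $charset ) : () ),
#   );
```

Omitting the argument when no charset is found leaves the decision to the remaining detection steps.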
=item Prescan
Not implemented yet; follow L<issue 58|https://github.com/google/gumbo-parser/issues/58> for status.
HTML5 defines a L<prescan algorithm|http://www.w3.org/TR/html5/syntax.html#prescan-a-byte-stream-to-determine-its-encoding>
that extracts encoding from meta tags in the head.