HTML-Gumbo


lib/HTML/Gumbo.pm

Use it to hardcode a specific encoding.

    $gumbo->parse( $octets, input_is => 'octets', encoding => 'latin-1' );

=item BOM

The input is checked for a UTF-8 or UTF-16 byte order mark (BOM).

=item C<encoding_content_type> argument

Encoding from the transport layer, i.e. the charset given in the Content-Type header.

    $gumbo->parse( $octets, input_is => 'octets', encoding_content_type => 'latin-1' );
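A minimal sketch of how the charset might be taken from a raw Content-Type
header value and handed to C<parse>. The helper C<charset_from_content_type>
and the sample header and octets are assumptions made for illustration. They
are not part of this module.

    use strict;
    use warnings;
    use HTML::Gumbo;

    # Hypothetical helper (not part of HTML::Gumbo): extract the charset
    # parameter from a Content-Type header value, or return nothing.
    sub charset_from_content_type {
        my ($header) = @_;
        return unless defined $header;
        return lc $1 if $header =~ /;\s*charset\s*=\s*"?([^\s;"]+)/i;
        return;
    }

    my $octets  = "<p>caf\xE9</p>";    # sample Latin-1 encoded bytes
    my $charset = charset_from_content_type('text/html; charset=ISO-8859-1');

    my $gumbo = HTML::Gumbo->new;
    my $res   = $gumbo->parse(
        $octets,
        input_is => 'octets',
        # Pass the argument only when the header actually named a charset.
        ( defined $charset ? ( encoding_content_type => $charset ) : () ),
    );

When the header carries no charset parameter the argument is simply omitted.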

=item Prescan

Not implemented yet; follow L<issue 58|https://github.com/google/gumbo-parser/issues/58> for progress.

HTML5 defines a L<prescan algorithm|http://www.w3.org/TR/html5/syntax.html#prescan-a-byte-stream-to-determine-its-encoding>
that extracts the encoding from meta tags in the head.
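Until then, a caller could approximate the idea by hand. The sketch below is
a rough simplification and not the HTML5 prescan algorithm: it does not skip
comments or validate attributes. The helper C<guess_meta_charset>, the
1024-octet limit and the sample input are assumptions made for illustration.

    use strict;
    use warnings;
    use HTML::Gumbo;

    # Hypothetical helper (not part of HTML::Gumbo): look for a charset
    # declaration in a meta tag near the start of the input.
    sub guess_meta_charset {
        my ($octets) = @_;
        my $head = substr $octets, 0, 1024;    # inspect the leading octets only
        return lc $1
            if $head =~ /<meta\s[^>]*?\bcharset\s*=\s*["']?([A-Za-z0-9_.:-]+)/i;
        return;
    }

    my $octets  = qq{<html><head><meta charset="latin-1"></head><body>caf\xE9</body></html>};
    my $charset = guess_meta_charset($octets);

    my $gumbo = HTML::Gumbo->new;
    my $res   = $gumbo->parse(
        $octets,
        input_is => 'octets',
        # Hardcode the guessed encoding only when a declaration was found.
        ( defined $charset ? ( encoding => $charset ) : () ),
    );

The same pattern also matches the older
C<< <meta http-equiv="Content-Type" content="text/html; charset=..."> >> form,
because the charset parameter appears inside the tag.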
