Encode-Repair

 view release on metacpan or  search on metacpan

lib/Encode/Repair.pm  view on Meta::CPAN

    binmode STDOUT, ':encoding(UTF-8)';

    # prints: small ae: ä
    print repair_double("small ae: \xc3\x83\xc2\xa4\n");

    # prints: beta: β
    print repair_double("beta: \xc4\xaa\xc2\xb2\n", {via => 'Latin-7'});


    # Advanced usage
    # assumes you have a sample text both correctly decoded in a
    # character string, and as a wrongly encoded buffer

    use Encode::Repair qw(repair_encoding learn_recoding);
    use charnames qw(:full);
    binmode STDOUT, ':encoding(UTF-8)';

    my $recoding_pattern  = learn_recoding(
        from        => "beta: \xc4\xaa\xc2\xb2",
        to          => "beta: \N{GREEK SMALL LETTER BETA}",
        encodings   => ['UTF-8', 'Latin-1', 'Latin-7'],

lib/Encode/Repair.pm  view on Meta::CPAN

=over

=item repair_double

Repairs the common case when a UTF-8 string was read as another encoding,
and was encoded as UTF-8 again. The other encoding defaults to ISO-8859-1 aka
Latin-1, and can be overridden with the C<via> option:

    my $repaired = repair_double($buffer, {via => 'ISO-8859-2' });

It expects an octet string as input, and returns a decoded character string.

=item learn_recoding

Given a sample of text twice, once correctly decoded and once mistreated,
attemps to find a sequence of encoding and decoding that turns the mistreated
text into the correct form.

    my $coding_pattern = learn_recoding(
        from        => $mistreated_buffer,
        to          => $correct_string,
        encodings   => \@involved_encodings,
        depth       => 5,
        search      => 'first',
    );



( run in 0.335 second using v1.01-cache-2.11-cpan-0d8aa00de5b )