Encode-Repair
lib/Encode/Repair.pm
use Encode::Repair qw(repair_double);
binmode STDOUT, ':encoding(UTF-8)';
# prints: small ae: ä
print repair_double("small ae: \xc3\x83\xc2\xa4\n");
# prints: beta: β
print repair_double("beta: \xc4\xaa\xc2\xb2\n", {via => 'Latin-7'});
# Advanced usage
# assumes you have a sample text both correctly decoded in a
# character string, and as a wrongly encoded buffer
use Encode::Repair qw(repair_encoding learn_recoding);
use charnames qw(:full);
binmode STDOUT, ':encoding(UTF-8)';
my $recoding_pattern = learn_recoding(
from => "beta: \xc4\xaa\xc2\xb2",
to => "beta: \N{GREEK SMALL LETTER BETA}",
encodings => ['UTF-8', 'Latin-1', 'Latin-7'],
lib/Encode/Repair.pm
=over
=item repair_double
Repairs the common case in which a UTF-8 string was misread as another
encoding and then encoded as UTF-8 again. The other encoding defaults to
ISO-8859-1 (Latin-1) and can be overridden with the C<via> option:
my $repaired = repair_double($buffer, {via => 'ISO-8859-2'});
It expects an octet string as input, and returns a decoded character string.
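To make the double-encoding repair concrete, here is a minimal sketch that performs the three steps by hand, using only the core Encode module. The helper name undo_double_encoding is hypothetical and is not part of Encode::Repair; the module's real implementation may differ.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper for illustration only; not part of Encode::Repair.
sub undo_double_encoding {
    my ($octets, $via) = @_;
    $via //= 'ISO-8859-1';    # same default the documentation describes

    # 1. The buffer is valid UTF-8, so decoding it yields the
    #    misread characters (for example "\x{c3}\x{a4}" instead of "ä").
    my $misread = decode('UTF-8', $octets);

    # 2. Encoding those characters with the wrongly assumed encoding
    #    recovers the original UTF-8 octets.
    my $original_octets = encode($via, $misread);

    # 3. Decoding the recovered octets as UTF-8 gives the intended text.
    return decode('UTF-8', $original_octets);
}

# The doubly encoded "ä" from the synopsis becomes one character again.
my $fixed = undo_double_encoding("small ae: \xc3\x83\xc2\xa4");
```

As with repair_double, the input is an octet string and the result is a decoded character string, so it still needs an output layer such as ':encoding(UTF-8)' when printed.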
=item learn_recoding
Given the same sample of text twice, once correctly decoded and once mistreated,
attempts to find a sequence of encoding and decoding steps that turns the
mistreated text into the correct form.
my $coding_pattern = learn_recoding(
from => $mistreated_buffer,
to => $correct_string,
encodings => \@involved_encodings,
depth => 5,
search => 'first',
);
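The idea of searching for a recoding sequence can be illustrated with a small breadth-first search over alternating decode and encode steps. This is only a sketch under stated assumptions: the function find_recoding, its argument order, and the list of step descriptions it returns are all hypothetical, and nothing here is taken from how learn_recoding is actually implemented or what it returns.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical brute-force search; NOT the implementation of learn_recoding.
# Returns an array ref of step descriptions, or nothing if no path is found.
sub find_recoding {
    my ($from, $to, $encodings, $max_depth) = @_;

    # Each queue entry: [current string, true if it holds octets, steps so far]
    my @queue = ([$from, 1, []]);

    while (my $state = shift @queue) {
        my ($str, $is_octets, $steps) = @$state;
        next if @$steps >= $max_depth;

        for my $enc (@$encodings) {
            # Octets can only be decoded, characters can only be encoded.
            my $op   = $is_octets ? 'decode' : 'encode';
            my $next = eval {
                $is_octets
                    ? decode($enc, $str, FB_CROAK | LEAVE_SRC)
                    : encode($enc, $str, FB_CROAK | LEAVE_SRC);
            };
            next unless defined $next;    # step impossible in this encoding

            my @path = (@$steps, "$op $enc");

            # Success: a decode step produced exactly the target characters.
            return \@path if $is_octets && $next eq $to;

            push @queue, [$next, !$is_octets, \@path];
        }
    }
    return;
}

my $path = find_recoding(
    "beta: \xc4\xaa\xc2\xb2",          # mistreated buffer from the synopsis
    "beta: \x{3b2}",                   # correct string (Greek small beta)
    ['UTF-8', 'Latin-1', 'Latin-7'],
    5,
);
```

Because the search is breadth-first, the first path returned is among the shortest ones, which loosely mirrors the role a depth limit and a 'first' search mode would play.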