App-zipdetails
comments) appear to be garbled. This section walks through the reasons and
mitigations that can be applied to work around these issues.
### Background
When zip files were first created in the 1980s, there was no Unicode or
UTF-8. Issues around character set encoding interoperability were not a
major concern.
Initially, the only official encoding supported in zip files was IBM Code
Page 437 (AKA `CP437`). As time went on, users in locales where `CP437`
wasn't appropriate stored filenames in the encoding native to their locale.
If you were running a system that matched the locale of the zip file, all
was well. If not, you had to post-process the filenames after unzipping the
zip file.
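
The mismatch described above can be reproduced with the core `Encode` module. This is a minimal sketch, not code from `zipdetails`: the sample bytes and the choice of `cp1252` as the writer's locale encoding are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical filename bytes as a writer in a CP1252 locale might
# have stored them: "café", with é as the single byte 0xE9.
my $stored = "caf\xE9";

# A reader that assumes the official CP437 encoding decodes 0xE9 as
# GREEK CAPITAL LETTER THETA (U+0398), so the name looks garbled.
my $as_cp437 = decode('cp437', $stored);

# A reader that knows the writer's locale encoding recovers é (U+00E9).
my $as_cp1252 = decode('cp1252', $stored);

printf "cp437:  U+%04X\n", ord(substr($as_cp437,  3, 1));   # cp437:  U+0398
printf "cp1252: U+%04X\n", ord(substr($as_cp1252, 3, 1));   # cp1252: U+00E9
```

The stored bytes are identical in both cases; only the encoding assumed by the reader differs, which is why post-processing was needed on systems with a different locale.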
Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the `Language
encoding flag` (also known as `EFS`) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.
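
As a rough illustration of how a reader can act on that flag, the sketch below parses a zip local file header and picks a decoder based on the `EFS` bit. It is not how `zipdetails` is implemented: `parse_local_header` is a hypothetical helper, the header is built by hand rather than read from a file, and falling back to `cp437` when the bit is clear is an assumption based on the legacy default described above. The flag sits at bit 11 (mask `0x0800`) of the general purpose bit flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bit 11 of the general purpose bit flag is the Language encoding flag (EFS).
use constant EFS_MASK => 0x0800;

# Hypothetical helper: given the raw bytes of a local file header,
# return the filename bytes and whether the EFS bit is set.
sub parse_local_header
{
    my ($bytes) = @_;

    # Offsets (little-endian): signature at 0, general purpose flag at 6,
    # filename length at 26; the filename itself starts at offset 30.
    my ($sig, $flags, $name_len) = unpack 'V x2 v x18 v', $bytes;

    die "not a local file header\n"
        unless defined $name_len && $sig == 0x04034b50;

    my $name = substr($bytes, 30, $name_len);

    return ($name, ($flags & EFS_MASK) ? 1 : 0);
}

# Hand-built sample header for a member named "café.txt" stored as UTF-8.
my $name_bytes = "caf\xC3\xA9.txt";
my $header = pack('V v v v v v V V V v v',
                  0x04034b50, 20, EFS_MASK, 0, 0, 0,
                  0, 0, 0, length($name_bytes), 0)
           . $name_bytes;

my ($raw, $is_utf8) = parse_local_header($header);

# Assumption: treat names without the EFS bit as CP437.
my $filename = decode($is_utf8 ? 'UTF-8' : 'cp437', $raw);
```

A real reader would also consult the central directory, which carries its own copy of the flag and filename for each member.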
To ensure maximum interoperability when sharing zip files, store 7-bit
filenames as-is in the zip file. For anything else, the `EFS` bit needs to
bin/zipdetails
sub getNativeLocale
{
    state $enc;

    if (! defined $enc)
    {
        eval
        {
            require encoding ;
            my $encoding = encoding::_get_locale_encoding() ;
            if (! $encoding)
            {
                # CP437 is the legacy default for zip files
                $encoding = 'cp437';
                # ::warning undef, "Cannot determine system charset: defaulting to '$encoding'"
            }
            $enc = Encode::find_encoding($encoding) ;
        } ;
    }
bin/zipdetails
comments) appear to be garbled. This section walks through the reasons and
mitigations that can be applied to work around these issues.
=head3 Background
When zip files were first created in the 1980s, there was no Unicode or
UTF-8. Issues around character set encoding interoperability were not a
major concern.
Initially, the only official encoding supported in zip files was IBM Code
Page 437 (AKA C<CP437>). As time went on, users in locales where C<CP437>
wasn't appropriate stored filenames in the encoding native to their locale.
If you were running a system that matched the locale of the zip file, all
was well. If not, you had to post-process the filenames after unzipping the
zip file.
Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the C<Language
encoding flag> (also known as C<EFS>) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.
To ensure maximum interoperability when sharing zip files, store 7-bit
filenames as-is in the zip file. For anything else, the C<EFS> bit needs to
t/002-main.t
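{
    state $enc;

    if (! $enc)
    {
        $enc = 'unknown';
        eval
        {
            require encoding ;
            my $encoding = encoding::_get_locale_encoding() // 'cp437';
            $enc = Encode::find_encoding($encoding) ;
        } ;

        # Only an Encode object has a name() method; if the eval failed,
        # $enc is still the plain string 'unknown'.
        $enc = $enc->name()
            if ref $enc;
    }

    return $enc;
}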
t/002-main.t
        $string = $1;
    }

    # warn "GOT [$Perl][$]][$string]\n";
    return $string ;
}

sub zapGolden
{
    my $locale_charset = getNativeLocale();
    $_[0] =~ s<^(#\s*System Default Encoding:\s*)('.+?')><$1'$locale_charset'>mg ;

    # # Encode changed from using utf8 to UTF-8 at some point
    my $UTF = getUTF8String();
    $_[0] =~ s<\S+ (\S+) does not map to Unicode><$UTF $1 does not map to Unicode>g ;
}