App-zipdetails


README.md

comments) appear to be garbled. This section walks through the reasons and
mitigations that can be applied to work around these issues.

### Background

When zip files were first created in the 1980s, there was no Unicode or
UTF-8. Issues around character set encoding interoperability were not a
major concern.

Initially, the only official encoding supported in zip files was IBM Code
Page 437 (AKA `CP437`). As time went on, users in locales where `CP437`
wasn't appropriate stored filenames in the encoding native to their locale.
If you were running a system that matched the locale of the zip file, all
was well. If not, you had to post-process the filenames after unzipping the
zip file.
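
As an illustration of why a locale mismatch garbles names, here is a minimal
sketch that decodes the same raw bytes under two different encodings. The
filename bytes are made up for the example, and `cp1252` merely stands in for
"some other locale's encoding"; neither comes from a real zip file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw filename bytes as they might be stored in an old zip entry
# (illustrative only).
my $raw = "caf\x82.txt";

# Interpreted as CP437, byte 0x82 is e-acute (U+00E9).
my $as_cp437 = decode('cp437', $raw);

# Interpreted as Windows-1252, the same byte is a low single quote (U+201A).
my $as_cp1252 = decode('cp1252', $raw);

binmode STDOUT, ':encoding(UTF-8)';
print "cp437:  $as_cp437\n";
print "cp1252: $as_cp1252\n";
```

The bytes are identical in both cases; only the assumed encoding differs, which
is why a reader has to know (or guess) the encoding the writer used.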

Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the `Language
encoding flag` (also known as `EFS`) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.
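
The flag lives in bit 11 of the 2-byte "general purpose bit flag" field of each
zip header. The sketch below shows one way to test it on the first bytes of a
local file header. The helper name `has_efs_flag` and the hand-built headers are
assumptions for illustration; this is not how `zipdetails` itself is
implemented.

```perl
use strict;
use warnings;

# Bit 11 of the general purpose bit flag is the Language encoding flag (EFS).
use constant EFS_BIT => 1 << 11;

# Hypothetical helper: given the leading bytes of a local file header,
# return 1 if EFS is set, 0 if not, and nothing if the bytes are not a header.
sub has_efs_flag
{
    my ($header) = @_;

    # Local file header starts with: 4-byte signature, 2-byte "version
    # needed to extract", 2-byte general purpose bit flag (little-endian).
    my ($sig, $version, $flags) = unpack 'V v v', $header;

    return unless defined $flags && $sig == 0x04034b50;
    return ($flags & EFS_BIT) ? 1 : 0;
}

# Hand-built headers for demonstration; a real one would be read from a file.
my $utf8_header   = pack 'V v v', 0x04034b50, 20, 0x0800;
my $legacy_header = pack 'V v v', 0x04034b50, 20, 0x0000;

printf "UTF-8 entry:  %d\n", scalar has_efs_flag($utf8_header);
printf "legacy entry: %d\n", scalar has_efs_flag($legacy_header);
```

When the bit is clear, the name is in some legacy encoding and the reader has
to fall back on `CP437` or a locale-based guess.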

To ensure maximum interoperability when sharing zip files, store 7-bit
filenames as-is in the zip file. For anything else the `EFS` bit needs to

bin/zipdetails


    sub getNativeLocale
    {
        state $enc;

        if (! defined $enc)
        {
            eval
            {
                require encoding ;
                my $encoding = encoding::_get_locale_encoding() ;
                if (! $encoding)
                {
                    # CP437 is the legacy default for zip files
                    $encoding = 'cp437';
                    # ::warning undef, "Cannot determine system charset: defaulting to '$encoding'"
                }
                $enc = Encode::find_encoding($encoding) ;
            } ;
        }

bin/zipdetails

comments) appear to be garbled. This section walks through the reasons and
mitigations that can be applied to work around these issues.

=head3 Background

When zip files were first created in the 1980s, there was no Unicode or
UTF-8. Issues around character set encoding interoperability were not a
major concern.

Initially, the only official encoding supported in zip files was IBM Code
Page 437 (AKA C<CP437>). As time went on, users in locales where C<CP437>
wasn't appropriate stored filenames in the encoding native to their locale.
If you were running a system that matched the locale of the zip file, all
was well. If not, you had to post-process the filenames after unzipping the
zip file.

Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the C<Language
encoding flag> (also known as C<EFS>) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.

To ensure maximum interoperability when sharing zip files, store 7-bit
filenames as-is in the zip file. For anything else the C<EFS> bit needs to

t/002-main.t

{
    state $enc;

    if (! $enc)
    {
        $enc = 'unknown';

        eval
        {
            require encoding ;
            my $encoding = encoding::_get_locale_encoding() // 'cp437';
            $enc = Encode::find_encoding($encoding) ;
        } ;

        $enc = $enc->name()
            if $enc;
    }

    return $enc;
}

t/002-main.t


        $string = $1;
    }

    # warn "GOT [$Perl][$]][$string]\n";
    return $string ;
}

sub zapGolden
{
    my $locale_charset = getNativeLocale();
    $_[0] =~ s<^(#\s*System Default Encoding:\s*)('.+?')><$1'$locale_charset'>mg ;

    # Encode changed from using utf8 to UTF-8 at some point
    my $UTF = getUTF8String();
    $_[0] =~ s<\S+ (\S+) does not map to Unicode><$UTF $1 does not map to Unicode>g ;
}


