App-zipdetails


    # Done

## Advanced Analysis

If you have a corrupt or non-standard zip file, particularly one where the
`Central Directory` metadata at the end of the file is absent/incomplete, you
can use either the `--walk` option or the `--scan` option to search for
any zip metadata that is still present in the file.

When either of these options is enabled, this program will bypass the
initial step of reading the `Central Directory` at the end of the file and
simply scan the zip file sequentially from the start of the file looking
for zip metadata records. Although this can be error prone, for the most
part it will find any zip file metadata that is still present in the file.

The difference between the two options is how aggressive the sequential
scan is: `--walk` is optimistic, while `--scan` is pessimistic.

To understand the difference in more detail you need to know a bit about
how zip file metadata is structured. Under the hood, a zip file uses a
series of 4-byte signatures to flag the start of each of the metadata
records it uses. When the `--walk` or the `--scan` option is enabled, both
work identically by scanning the file from the beginning looking for any
of these valid 4-byte metadata signatures. When a 4-byte signature is
found, both options will blindly assume that it marks a valid metadata
record and display it.
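
As an illustration of the signature search described above, here is a minimal Python sketch. It is not how `zipdetails` is implemented; the helper names `find_signatures` and `make_zip` are made up for this example, and only four of the signatures defined in the zip specification are listed.

```python
import io
import zipfile

# A few of the 4-byte signatures defined in the zip specification (APPNOTE.TXT).
SIGNATURES = {
    b"PK\x03\x04": "Local File Header",
    b"PK\x01\x02": "Central Directory Header",
    b"PK\x05\x06": "End of Central Directory",
    b"PK\x07\x08": "Data Descriptor",
}


def find_signatures(data: bytes):
    """Return (offset, record name) for every signature match in data.

    Hypothetical helper: it reports every byte sequence that looks like a
    signature, so false positives inside payload data are possible.
    """
    hits = []
    for offset in range(len(data) - 3):
        name = SIGNATURES.get(data[offset:offset + 4])
        if name is not None:
            hits.append((offset, name))
    return hits


def make_zip(members):
    """Build a small uncompressed zip in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


sample = make_zip({"hello.txt": b"hello world"})
for offset, name in find_signatures(sample):
    print(f"{offset:#06x}  {name}")
```

For a single-member archive like this one, the search finds one local file header, one central directory header and the end of central directory record, in that order.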

### `--walk`

The `--walk` option optimistically assumes that it has found a real zip
metadata record and so starts the scan for the next record directly after
the record it has just output.

### `--scan`

The `--scan` option is pessimistic and assumes the 4-byte signature
sequence may have been a false-positive, so before starting the scan for
the next record, it will rewind to the location in the file directly after
the 4-byte sequence it just processed. This means it will rescan data that
has already been processed. For very large zip files the `--scan` option
can be very slow, so try the `--walk` option first.

**Important Note**: If the zip file being processed contains one or more
nested zip files, and the outer zip file uses the `STORE` compression
method, the `--scan` option will display the zip metadata for both the
outer & inner zip files.
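
The difference between the two strategies, and the nested-zip behaviour noted above, can be sketched in Python. This is a simplified model under stated assumptions, not the actual `zipdetails` code: the names `record_length` and `sequential_search` are invented for the example, and data descriptors, Zip64 records and truncated records are ignored.

```python
import io
import struct
import zipfile

LOCAL, CENTRAL, EOCD = b"PK\x03\x04", b"PK\x01\x02", b"PK\x05\x06"


def record_length(data, pos):
    """Length of the record at pos, trusting the record's own length fields.

    Returns None when pos does not start with a recognised signature.
    """
    sig = data[pos:pos + 4]
    if sig == LOCAL:
        (csize,) = struct.unpack_from("<L", data, pos + 18)
        nlen, xlen = struct.unpack_from("<HH", data, pos + 26)
        return 30 + nlen + xlen + csize   # header + name + extra + payload
    if sig == CENTRAL:
        nlen, xlen, clen = struct.unpack_from("<HHH", data, pos + 28)
        return 46 + nlen + xlen + clen    # header + name + extra + comment
    if sig == EOCD:
        (clen,) = struct.unpack_from("<H", data, pos + 20)
        return 22 + clen                  # header + archive comment
    return None


def sequential_search(data, rescan):
    """Offsets of the records found; rescan=False models walk, True models scan."""
    found, pos = [], 0
    while pos + 4 <= len(data):
        length = record_length(data, pos)
        if length is None:
            pos += 1                      # no signature here, move on one byte
            continue
        found.append(pos)
        # walk: trust the record and jump past it.
        # scan: distrust it and resume right after the 4-byte signature.
        pos += 4 if rescan else length
    return found


def make_zip(name, payload):
    """Build a one-member zip in memory using the STORE method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


inner = make_zip("inner.txt", b"data")
outer = make_zip("inner.zip", inner)      # nested zip, stored uncompressed

walk_hits = sequential_search(outer, rescan=False)
scan_hits = sequential_search(outer, rescan=True)
print("walk:", len(walk_hits), "records")
print("scan:", len(scan_hits), "records")
```

In this model the walk-style search skips over the stored payload and reports only the three records of the outer archive, while the scan-style search also finds the three records of the inner archive. The cost is that scan examines every byte of the file, including payload data that walk jumps over.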

## Filename Encoding Issues

Sometimes when displaying the contents of a zip file the filenames (or
comments) appear to be garbled. This section walks through the reasons and
mitigations that can be applied to work around these issues.

### Background

When zip files were first created in the 1980s, there was no Unicode or
UTF-8. Issues around character set encoding interoperability were not a
major concern.

Initially, the only official encoding supported in zip files was IBM Code
Page 437 (AKA `CP437`). As time went on users in locales where `CP437`
wasn't appropriate stored filenames in the encoding native to their locale.
If you were running a system that matched the locale of the zip file, all
was well. If not, you had to post-process the filenames after unzipping the
zip file.

Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the `Language
encoding flag` (also known as `EFS`) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.

To ensure maximum interoperability when sharing zip files, store 7-bit
filenames as-is in the zip file. For anything else, the `EFS` bit needs to
be set and the filename encoded in UTF-8. Although this rule is followed
for the most part, there are exceptions out in the wild.
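
To see the rule in practice, the following Python sketch writes two members with the standard `zipfile` module and then inspects their flags. The `EFS` flag is bit 11 (`0x0800`) of the general purpose bit flag; the constant name `EFS_FLAG` is chosen for this example only. Python's `zipfile` is used here simply as one implementation that follows the rule, not as a reference for what every zip writer does.

```python
import io
import zipfile

EFS_FLAG = 0x0800  # bit 11 of the general purpose bit flag

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"")        # 7-bit name, stored as-is
    zf.writestr("caf\u00e9.txt", b"")    # non-ASCII name, stored as UTF-8

with zipfile.ZipFile(buf) as zf:
    efs = {info.filename: bool(info.flag_bits & EFS_FLAG)
           for info in zf.infolist()}

for name, flag in efs.items():
    print(f"{name!r}: EFS set = {flag}")
```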

### Dealing with Encoding Errors

The most common filename encoding issue is where the `EFS` bit is not set and
the filename is stored in a character set that doesn't match the system
encoding. This mostly impacts legacy zip files that predate the
introduction of Unicode.

To deal with this issue you first need to know what encoding was used in
the zip file. For example, if the filename is encoded in `ISO-8859-1` you
can display the filenames using the `--encoding` option:

    zipdetails --encoding ISO-8859-1 myfile.zip
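
The effect of choosing the right encoding can be illustrated with a few lines of Python. The sketch assumes a filename that was stored as `ISO-8859-1` bytes without the `EFS` bit, and a reader that falls back to `CP437`; the filename itself is made up for the example.

```python
# Bytes as they would be stored in the zip file (EFS bit not set).
raw = "r\u00e9sum\u00e9.txt".encode("iso-8859-1")

as_cp437 = raw.decode("cp437")        # wrong guess: the name comes out garbled
as_latin1 = raw.decode("iso-8859-1")  # correct encoding: the name is restored

print("decoded as CP437:     ", as_cp437)
print("decoded as ISO-8859-1:", as_latin1)
```

The stored bytes are identical in both cases; only the interpretation changes, which is why the fix is a display option rather than a repair of the zip file.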

A less common variation of this is where the `EFS` bit is set, signalling
that the filename will be encoded in UTF-8, but the filename is not encoded
in UTF-8. To deal with this scenario, use the `--no-language-encoding`
option along with the `--encoding` option.
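
A short Python sketch shows why the `EFS` bit cannot always be trusted. It assumes a made-up filename that was written as `ISO-8859-1` bytes by a tool that set the `EFS` bit anyway: decoding those bytes as UTF-8 fails, so the declared encoding has to be ignored and the real one supplied by hand.

```python
# Filename bytes written in ISO-8859-1 even though the EFS bit claims UTF-8.
raw = "r\u00e9sum\u00e9.txt".encode("iso-8859-1")

try:
    name = raw.decode("utf-8")        # what the EFS bit tells a reader to do
    valid_utf8 = True
except UnicodeDecodeError:
    # Ignore the flag and fall back to the encoding the user knows is right.
    name = raw.decode("iso-8859-1")
    valid_utf8 = False

print("valid UTF-8:", valid_utf8)
print("filename:   ", name)
```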

# LIMITATIONS

The following zip file features are not supported by this program:

- Multi-part/Split/Spanned Zip Archives.

    This program cannot give an overall report on the combined parts of a
    multi-part zip file.

    The best you can do is run with either the `--scan` or `--walk` options
    against individual parts. Some will contain zip file metadata, which will
    be detected, and some will only contain compressed payload data.

- Encrypted Central Directory

    When pkzip _Strong Encryption_ is enabled in a zip file this program can
    still parse most of the metadata in the zip file. The exception is when the
    `Central Directory` of a zip file is also encrypted. This program cannot
    parse any metadata from an encrypted `Central Directory`.

- Corrupt Zip files

    When `zipdetails` encounters a corrupt zip file, it will do one or more of
    the following:

    - Display details of the corruption and carry on
    - Display details of the corruption and terminate
    - Terminate with a generic message


