File-Unpack


MANIFEST

t/08-survive-corrupt.t

t/data/Desktop.directory
t/data/IPA-snippet.pfa
t/data/Times-Roman-snippet.afm
t/data/columns-snippet.fo
t/data/empty.odt
t/data/ruhyphal.tex
t/data/test2.tga
t/data/wzbc-2009-06-28-17-00.m3u
t/data/xterm-snippet.desktop
t/data/lxknf09SCc0.bin
t/data/monotone.info
t/data/pdftex-a.txt
t/data/test.mht
t/data/Archive.pax
t/data/bad34.pdf
t/data/good10.pdf

META.yml                                 Module meta-data (added by MakeMaker)
META.json                                Module JSON meta-data (added by MakeMaker)

Unpack.pm


$u->mime(buf => "#!/bin ...", file => "what-was-read")

$u->mime(fd => \*STDIN, file => "what-was-opened")

Determines the MIME type (and optionally additional information) of a file.
The file can be specified by filename, by a provided buffer, or by an open file descriptor.
In the latter two cases, the filename is optional and is used only for diagnostics.

C<mime> uses libmagic by Christos Zoulas, exposed via File::LibMagic, and also uses
the shared-mime-info database from freedesktop.org, exposed via
File::MimeInfo::Magic, if available.  Either one is sufficient, but having both
is better: libmagic sometimes reports 'text/x-pascal' for what is really a F<.desktop>
file, or reports 'text/plain' while its own description contradicts that.

C<File::MimeInfo::Magic::magic> is consulted where the libmagic output is dubious, e.g. when
the description says something interesting like 'Debian binary package (format 2.0)' but the
mimetype says 'application/octet-stream'. Combining both libraries makes MIME type
recognition more reliable than relying on either one alone.
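
The fallback idea described above can be sketched roughly as follows. This is a
minimal illustration, not the module's actual code: the helper names
C<looks_dubious> and C<pick_mime> are hypothetical, the "second opinion" is
passed in as a plain code reference so that the sketch runs without
File::LibMagic or File::MimeInfo::Magic installed, and the MIME type returned
by the stand-in lookup is an illustrative value only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Unpack): decide whether a
# libmagic-style (mimetype, description) pair looks dubious.
sub looks_dubious {
    my ($mime, $desc) = @_;
    $desc = '' unless defined $desc;

    # A generic type paired with a specific description, e.g.
    # 'application/octet-stream' + 'Debian binary package (format 2.0)'.
    return 1 if $mime eq 'application/octet-stream' && $desc !~ m{^(?:data|)$};

    # Source-code types that tend to be reported for non-source text files.
    return 1 if $mime =~ m{^text/x-(?:pascal|fortran)$};

    return 0;
}

# Hypothetical helper: keep the first answer unless it is dubious and a
# second opinion (a code ref returning a MIME type or undef) yields a result.
sub pick_mime {
    my ($mime, $desc, $second_opinion) = @_;
    return $mime unless looks_dubious($mime, $desc);

    # The second lookup may die or return nothing; fall back to $mime then.
    my $alt = $second_opinion ? eval { $second_opinion->() } : undef;
    return (defined($alt) && length($alt)) ? $alt : $mime;
}

my $picked = pick_mime(
    'application/octet-stream',
    'Debian binary package (format 2.0)',
    sub { 'application/x-debian-package' },    # stand-in for a real lookup
);
print "$picked\n";
```

In a real setting the code reference would wrap a call such as
C<File::MimeInfo::Magic::magic($fd)>; keeping it injectable means the decision
logic can be exercised without either library present.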

This implementation also features multi-level MIME type recognition for efficient unpacking.
When e.g. unpacking a large bzipped tar archive, this saves us from creating a
huge temporary tar-file which C<unpack> would extract in a second step.  The multi-level recognition

Unpack.pm

  if ($mime1 =~ m{^application/xml})
    {
      # This is horrible from a greedy text cruncher perspective:
      # although xml is a plain text syntax, it is reported by flm to be 
      # outside text/*
      $r[0] = "text/x-application-xml";
    }

  if ($mime1 =~ m{^text/x-(?:pascal|fortran)$})
    {
      # xterm.desktop
      # ['text/x-pascal; charset=utf-8','UTF-8 Unicode Pascal program text']
      # 'application/x-desktop'
      #
      # Times-Roman.afm
      # ['text/x-fortran; charset=us-ascii','ASCII font metrics']
      # 'application/x-font-afm'
      #
      # debian/rules
      # ['text/x-pascal; charset=us-ascii','a /usr/bin/make -f  script text']
      # 'text/x-makefile'
      if ($mime2 ||= eval { open my $fd,'<',\$in{buf}; File::MimeInfo::Magic::magic($fd); })
        {

Unpack.pm

weaknesses, and consult File::MimeInfo::Magic and some own logic, for e.g.
detecting LZMA compression which fails to provide any recognizable magic.
Required if you use C<mime>; otherwise not a hard requirement.

=item File::MimeInfo::Magic

Uses both magic information and file suffixes to determine the mimetype. Its
magic() function is used in the few cases where File::LibMagic fails; e.g. as
of June 2010, libmagic does not recognize 'image/x-targa'.
File::MimeInfo::Magic may be slower, but it features the shared-mime-info
database from freedesktop.org.  Recommended if you use C<mime>.

=item String::ShellQuote 

Used to call external MIME helpers. Required.

=item BSD::Resource

Used to reliably restrict the maximum file size. Recommended.

=item File::Path

t/02-mime.t

  ## these two are from SUSE:Factory:Head/qpdf%5.1.0%r23/qpdf-5.1.0/qpdf/qtest/qpdf/
  'bad34.pdf' => 
  	[ 'application/pdf', 'us-ascii', 'PDF document, version 1.3' ],
  'good10.pdf' => 
  	[ 'application/pdf', 'us-ascii', 'PDF document, version 1.3' ],

  ## 0.22 used to say application/x-lzma, but this is plain binary data, not even compressed.
  'lxknf09SCc0.bin' => 
  	[ 'application/octet-stream', qr{^(binary|unknown|)$} ], 

  ## actually 'application/x-desktop' or 'text/x-desktop'
  'Desktop.directory' => 
  	[ 'text/plain', 'utf-8', 'UTF-8 Unicode text' ],

  ## text/plain seen on 12.1, was text/x-desktop before
  'xterm-snippet.desktop' => 
  	[ qr{^text/(plain|x\-desktop)$}, 'utf-8', 
	 'UTF-8 Unicode Pascal program text', ['text/x-pascal','application/x-desktop']],

  'IPA-snippet.pfa' => 
  	[ 'text/x-font-type1', qr{^(us-ascii|)$}, 
	  'PostScript Type 1 font text (OmegaSerifIPA 001.000)', 
	  [ 'text/plain', 'application/x-font-type1' ] ],

  'Times-Roman-snippet.afm' => 
  	[ qr{^(application|text)/x-font-sunos-news$}, 
	  'us-ascii','ASCII font metrics',['text/x-fortran','application/x-font-sunos-news']], 

t/data/Desktop.directory

[Desktop Entry]
BgImage=
Encoding=UTF-8
Icon=user-desktop
Name=Desktop
Name[af]=Werkskerm
Name[ar]=سطح المكتب
Name[az]=Masa Üstü
Name[be]=Працоўны стол
Name[bg]=Работен плот
Name[bn]=ডেস্কটপ
Name[br]=Gorretaol
Name[bs]=Radna površina
Name[ca]=Escriptori


