Archive-Tar-Stream


lib/Archive/Tar/Stream.pm

package Archive::Tar::Stream;

use strict;
use warnings;

# the block size is fixed by the tar format
use constant BLOCKSIZE => 512;

# read and write this many blocks at a time
use constant BLOCKCOUNT => 2048;
use constant BUFSIZE => BLOCKSIZE * BLOCKCOUNT;

# dependencies
use IO::File;
use IO::Handle;
use File::Temp;
use List::Util qw(min);

# XXX - make this an OO attribute
our $VERBOSE = 0;

=head1 NAME

Archive::Tar::Stream - pure perl IO-friendly tar file management

=head1 VERSION

Version 0.05

=cut

our $VERSION = '0.05';


=head1 SYNOPSIS

Archive::Tar::Stream grew from a requirement to process very large
archives containing email backups, where the IO hit for unpacking
a tar file, repacking parts of it, and then unlinking all the files
was prohibitive.

Archive::Tar::Stream takes two file handles, one purely for reads
and one purely for writes.  It does no seeking; it simply unpacks
individual records from the input filehandle and packs records
to the output filehandle.

This module does not attempt to do any file handle management or
compression for you.  External zcat and gzip are quite fast and
run as separate processes, so they can use separate CPU cores.

    use Archive::Tar::Stream;

    my $ts = Archive::Tar::Stream->new(outfh => $tarfh);
    $ts->AddFile($name, -s $filefh, $filefh);

    # remove large non-jpeg files from a tar.gz
    my $infh = IO::File->new("zcat $infile |") || die "oops";
    my $outfh = IO::File->new("| gzip > $outfile") || die "double oops";
    my $ts = Archive::Tar::Stream->new(infh => $infh, outfh => $outfh);
    $ts->StreamCopy(sub {
        my ($header, $outpos, $fh) = @_;

        # we want all small files
        return 'KEEP' if $header->{size} < 64 * 1024;
        # and any other jpegs
        return 'KEEP' if $header->{name} =~ m/\.jpg$/i;

        # ask for the file contents: the callback is called again
        # with a filehandle to a spooled copy of the record
        return 'EDIT' unless $fh;

        return 'KEEP' if mimetype_of_filehandle($fh) eq 'image/jpeg';

        # ok, we don't want other big files
        return 'SKIP';
    });


=head1 SUBROUTINES/METHODS

=head2 new

    my $ts = Archive::Tar::Stream->new(%args);

Args:
   infh      - filehandle to read from
   outfh     - filehandle to write to
   inpos     - initial offset in infh
   outpos    - initial offset in outfh
   safe_copy - boolean, defaults to true.

Offsets are for informational purposes only, but can be
useful if you are tracking offsets of items within your
tar files separately.  All read and write functions
update these offsets.  If you don't provide offsets, they
will default to zero.
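Because tar stores everything in 512-byte blocks, each entry advances the offsets by one header block plus the file data rounded up to a whole number of blocks.  A minimal sketch of that arithmetic (an illustrative helper, not part of this module's API):

```perl
use strict;
use warnings;

use constant BLOCKSIZE => 512;

# Bytes a file of $size occupies in a tar stream: one header
# block plus the data padded up to a whole number of blocks.
# (Illustrative only; not part of Archive::Tar::Stream's API.)
sub entry_span {
    my $size = shift;
    my $datablocks = int(($size + BLOCKSIZE - 1) / BLOCKSIZE);
    return BLOCKSIZE * (1 + $datablocks);
}
```

So even a zero-length file advances the offset by 512 bytes for its header, and a 513-byte file occupies three blocks in total.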

Safe Copy is the default - you have to explicitly turn it
off.  If Safe Copy is set, every file is first extracted to
a temporary file before being written to the output stream,
so a failure partway through reading a record cannot leave
a corrupt partial entry in the output tar.

