Archive-BagIt


README.mkdn  view on Meta::CPAN

    }

Similar for tagmanifests

## How fast is [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt)?

I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:

- Calculation of checksums: switching from the "Digest" module to OpenSSL, accessed through [Net::SSLeay](https://metacpan.org/pod/Net%3A%3ASSLeay), yielded a significant
   speed increase.
- Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. With
   the [IO::Async](https://metacpan.org/pod/IO%3A%3AAsync) module, the files are loaded asynchronously and the checksums are calculated in parallel.
   If the underlying file system supports parallel access, the performance gain is large.
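
The first point can be illustrated with a minimal sketch that computes the same SHA-256 digest twice: once with the core `Digest::SHA` module and once through the EVP digest functions of `Net::SSLeay`. The helper names `digest_core` and `digest_openssl` are hypothetical and are not part of Archive::BagIt; the sketch only compares the two approaches and skips the OpenSSL variant if `Net::SSLeay` is not installed.

```perl
use strict;
use warnings;
use Digest::SHA ();
use File::Temp qw(tempfile);

# Baseline: stream the file through the core Digest::SHA module.
sub digest_core {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');    # 'b' reads the file in binary mode
    return $sha->hexdigest;
}

# OpenSSL variant using Net::SSLeay's EVP digest interface.
# Returns undef (in scalar context) when Net::SSLeay is not installed.
sub digest_openssl {
    my ($path) = @_;
    return unless eval { require Net::SSLeay; 1 };
    Net::SSLeay::OpenSSL_add_all_digests();
    my $md  = Net::SSLeay::EVP_get_digestbyname('sha256');
    my $ctx = Net::SSLeay::EVP_MD_CTX_create();
    Net::SSLeay::EVP_DigestInit($ctx, $md);
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    while (read($fh, my $chunk, 65536)) {    # feed the file in 64 KiB chunks
        Net::SSLeay::EVP_DigestUpdate($ctx, $chunk);
    }
    close $fh;
    my $raw = Net::SSLeay::EVP_DigestFinal($ctx);
    Net::SSLeay::EVP_MD_CTX_destroy($ctx);
    return unpack 'H*', $raw;    # binary digest to hex string
}

# Demo on a small temporary file containing "abc".
my ($tmp_fh, $tmp) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'abc';
close $tmp_fh;

my $core = digest_core($tmp);
my $ssl  = digest_openssl($tmp);
print "Digest::SHA: $core\n";
print "Net::SSLeay: ", (defined $ssl ? $ssl : 'not available'), "\n";
```

Both functions should print the same hex string; any speed difference only becomes visible on large files.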

On my system with 8 cores, an SSD, and a large 9 GB bag with 568 payload files, the results for `verify_bag()` are:

                     processing time          run time             throughput
    Version       user time    system time    total time    total    MB/s
     v0.71        38.31s        1.60s         39.938s       100%     230
     v0.81        25.48s        1.68s         27.1s          67%     340
     v0.82        48.85s        3.89s          6.84s         17%    1346
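
The derived columns can be recomputed from the measured times. The sketch below assumes the "9GB" bag is exactly 9 GiB (9216 MB), which is an assumption and not stated in the text, so the throughput values only approximately match the table. It also computes the ratio of CPU time to wall-clock time, which indicates how many cores were busy on average.

```perl
use strict;
use warnings;

my $bag_mb = 9 * 1024;    # assumption: the bag is exactly 9 GiB

# Measured values from the table: version, user, system, total (seconds).
my @runs = (
    [ 'v0.71', 38.31, 1.60, 39.938 ],
    [ 'v0.81', 25.48, 1.68, 27.1   ],
    [ 'v0.82', 48.85, 3.89, 6.84   ],
);
my $baseline = $runs[0][3];    # total time of v0.71 is the 100% mark

my %result;
for my $run (@runs) {
    my ($version, $user, $system, $total) = @$run;
    $result{$version} = {
        mbps        => $bag_mb / $total,             # throughput
        percent     => 100 * $total / $baseline,     # run time vs. v0.71
        parallelism => ($user + $system) / $total,   # busy cores on average
    };
    printf "%-6s %7.1f MB/s  %5.1f%%  %.1f cores busy\n",
        $version, @{ $result{$version} }{qw(mbps percent parallelism)};
}
```

For v0.82 the CPU time exceeds the wall-clock time several times over, which is what one would expect when the digests are calculated in parallel on an 8-core machine.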

README.mkdn  view on Meta::CPAN

    #read in an existing bag:
    my $bag_dir = "/path/to/bag";
    my $bag = Archive::BagIt->new(
        bag_path => $bag_dir,
    );

The arguments are:

- `bag_path` - path to bag-directory
- `force_utf8` - if set, the warnings about non-portable filenames are disabled (default: enabled)
- `use_async` - if set, IO::Async is used to read payload files asynchronously; only useful under Linux.
- `use_parallel` - if set, Parallel::parallel\_map is used to calculate digests of payload files in parallel;
      only useful if the underlying filesystem supports parallel reads and multiple CPU cores are available.
- `use_plugins` - expected manifest plugin strings; if set, the requested plugins are used,
      for example `Archive::BagIt::Plugin::Manifest::SHA256`.
      NOTE: this option \*disables\* the forced fixity check in `verify_bag()`!

The bag object will use $bag\_dir, BUT an existing $bag\_dir is not read. If you use `store()`, an existing bag will be overwritten!

See `load()` if you want to parse/modify an existing bag.
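
As a rough sketch of how the options above might be combined: the snippet enables `use_async` only on Linux and only if IO::Async can be loaded, and it always requests `use_parallel`. The path is the placeholder from the example above, and the assumption that `verify_bag()` dies on failure is mine and should be checked against the module documentation. The bag is only touched if both the directory and Archive::BagIt are available.

```perl
use strict;
use warnings;

my $bag_dir = '/path/to/bag';    # placeholder path, as in the example above

# Enable async reads only where they are described as useful (Linux)
# and only if IO::Async can actually be loaded.
my $can_async = ($^O eq 'linux') && eval { require IO::Async::Loop; 1 } ? 1 : 0;

my %opts = (
    bag_path     => $bag_dir,
    use_async    => $can_async,
    use_parallel => 1,
);

if (-d $bag_dir && eval { require Archive::BagIt; 1 }) {
    # Assumption: verify_bag() dies when verification fails.
    my $ok = eval {
        my $bag = Archive::BagIt->new(%opts);
        $bag->verify_bag();
        1;
    };
    print $ok ? "bag verified\n" : "verification failed: $@\n";
}
else {
    print "skipped: Archive::BagIt or $bag_dir is not available\n";
}
```

Whether the two flags help depends on the filesystem and the number of cores, so it is worth timing `verify_bag()` with and without them on the target machine.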

lib/Archive/BagIt.pm  view on Meta::CPAN


=head2 How fast is L<Archive::BagIt>?

I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:

=over

=item Calculation of checksums: switching from the "Digest" module to OpenSSL, accessed through L<Net::SSLeay>, yielded a significant
   speed increase.

=item Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. With
   the L<IO::Async> module, the files are loaded asynchronously and the checksums are calculated in parallel.
   If the underlying file system supports parallel access, the performance gain is large.

=back

On my system with 8 cores, an SSD, and a large 9 GB bag with 568 payload files, the results for C<verify_bag()> are:

                    processing time          run time             throughput
   Version       user time    system time    total time    total    MB/s
    v0.71        38.31s        1.60s         39.938s       100%     230
    v0.81        25.48s        1.68s         27.1s          67%     340

lib/Archive/BagIt.pm  view on Meta::CPAN

    );

The arguments are:

=over 1

=item C<bag_path> - path to bag-directory

=item C<force_utf8> - if set, the warnings about non-portable filenames are disabled (default: enabled)

=item C<use_async> - if set, IO::Async is used to read payload files asynchronously; only useful under Linux.

=item C<use_parallel> - if set, Parallel::parallel_map is used to calculate digests of payload files in parallel;
      only useful if the underlying filesystem supports parallel reads and multiple CPU cores are available.

=item C<use_plugins> - expected manifest plugin strings; if set, the requested plugins are used,
      for example C<Archive::BagIt::Plugin::Manifest::SHA256>.
      NOTE: this option *disables* the forced fixity check in C<verify_bag()>!

=back

lib/Archive/BagIt/Role/OpenSSL.pm  view on Meta::CPAN

=head1 VERSION

version 0.100

=head2 has_async_support()

Returns true if asynchronous I/O is possible, that is, if IO::Async could be loaded; otherwise returns false.

=head2 get_hash_string($fh)

Calls the synchronous or asynchronous function to calculate the digest of the file, depending on the result of $bag->use_async().
Returns the digest as a hex string.

=head1 AVAILABILITY

The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.

=head1 BUGS AND LIMITATIONS

lib/Archive/BagIt/Role/OpenSSL/Async.pm  view on Meta::CPAN

package Archive::BagIt::Role::OpenSSL::Async;
use strict;
use warnings;
use Moo;
use namespace::autoclean;
use IO::Async::Loop;
use IO::Async::Stream;
use Net::SSLeay ();
our $VERSION = '0.100'; # VERSION
# ABSTRACT: handles asynchronous digest calculation using openssl

sub BEGIN {
    # Register all OpenSSL digest algorithms before they are looked up by name.
    Net::SSLeay::OpenSSL_add_all_digests();
}

has 'name' => (
    required => 1,
    is       => 'ro',
);

lib/Archive/BagIt/Role/OpenSSL/Async.pm  view on Meta::CPAN

1;

__END__

=pod

=encoding UTF-8

=head1 NAME

Archive::BagIt::Role::OpenSSL::Async - handles asynchronous digest calculation using openssl

=head1 VERSION

version 0.100

=head1 AVAILABILITY

The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.

lib/Archive/BagIt/Role/OpenSSL/Sync.pm  view on Meta::CPAN

package Archive::BagIt::Role::OpenSSL::Sync;
use strict;
use warnings FATAL => 'all';
use Moo;
use namespace::autoclean;
use Net::SSLeay ();
our $VERSION = '0.100'; # VERSION
# ABSTRACT: handles synchronous digest calculation using openssl

sub BEGIN {
    Net::SSLeay::OpenSSL_add_all_digests();
}

has 'name' => (
    required => 1,
    is       => 'ro',
);

lib/Archive/BagIt/Role/OpenSSL/Sync.pm  view on Meta::CPAN

1;

__END__

=pod

=encoding UTF-8

=head1 NAME

Archive::BagIt::Role::OpenSSL::Sync - handles synchronous digest calculation using openssl

=head1 VERSION

version 0.100

=head1 AVAILABILITY

The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.


