Archive-BagIt


README.mkdn

- enhanced test suite
- reduce complexity
- use modern Perl code
- add a flag to enable very strict verification

# Backward Compatibility

To reduce the complexity of the current module, support for the following features has been removed:

- parallel processing
- synchronous I/O

The current code is fast enough that removing them causes no performance loss.

In the near future, support for [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast) will also be removed, because it needs hooks that increase the complexity of the current module without any performance benefit.

# FAQ

## How to access the manifest-entries directly?

README.mkdn

    }

The same applies to tagmanifests.
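
The module's own accessor code for this FAQ entry is not shown here. As an independent illustration, a manifest (or tagmanifest) is a plain-text file in which, per the BagIt specification (RFC 8493), each line holds a checksum and a file path separated by whitespace. Inside the path, CR, LF and `%` are percent-encoded. The minimal sketch below reads such a file into a path-to-checksum hash without using Archive::BagIt at all. `read_manifest_entries` is a hypothetical helper, not part of the module's API.

```perl
use strict;
use warnings;

# Hypothetical helper (not Archive::BagIt's API): reads a BagIt manifest or
# tagmanifest, where each line is "<checksum> <filepath>" (RFC 8493), and
# returns a hashref mapping filepath => checksum.
sub read_manifest_entries {
    my ($fh) = @_;
    my %entries;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF line ending
        next if $line =~ /\A\s*\z/;        # tolerate empty lines
        # checksum, then one or more spaces/tabs, then the file path
        my ($checksum, $path) = $line =~ /\A(\S+)[ \t]+(.+)\z/
            or die "malformed manifest line: '$line'\n";
        # RFC 8493 percent-encodes CR, LF and '%' inside file paths
        $path =~ s/%(0[Dd]|0[Aa]|25)/chr hex $1/ge;
        $entries{$path} = $checksum;
    }
    return \%entries;
}
```

Called on an open handle to, for example, `manifest-sha512.txt` in the bag's base directory, it returns the entries as a hashref. For tagmanifests, the same function applies to `tagmanifest-sha512.txt`.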

## How fast is [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt)?

I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:

- Calculation of checksums: switching from the "Digest" module to OpenSSL via [Net::SSLeay](https://metacpan.org/pod/Net%3A%3ASSLeay) yielded a significant speed increase.
- Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. Using the [IO::Async](https://metacpan.org/pod/IO%3A%3AAsync) module, the files are now loaded asynchronously, which gives a large performance gain.
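
The checksum switch described above can be illustrated with a minimal sketch. This is not Archive::BagIt's own code, which lives in Archive::BagIt::Role::OpenSSL::Sync and is not shown here. The sketch feeds a file handle to OpenSSL's EVP digest interface through Net::SSLeay in fixed-size chunks. When Net::SSLeay cannot be loaded, it falls back to the core Digest::SHA module. The helper name `sha256_hex_of_fh`, the choice of SHA-256 and the 1 MiB chunk size are assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Prefer OpenSSL (through Net::SSLeay) when it can be loaded; otherwise fall
# back to the core Digest::SHA module. Both produce the same SHA-256 value.
my $HAVE_SSLEAY = eval {
    require Net::SSLeay;
    Net::SSLeay::OpenSSL_add_all_digests();
    1;
};

# Hypothetical helper (not Archive::BagIt's code): hex SHA-256 of everything
# readable from $fh, fed to the digest in 1 MiB chunks so large payload files
# are never held in memory at once.
sub sha256_hex_of_fh {
    my ($fh) = @_;
    binmode $fh;
    my ($add, $finish);
    if ($HAVE_SSLEAY) {
        my $md = Net::SSLeay::EVP_get_digestbyname('sha256')
            or die "OpenSSL does not provide sha256\n";
        my $ctx = Net::SSLeay::EVP_MD_CTX_create();
        Net::SSLeay::EVP_DigestInit($ctx, $md);
        $add    = sub { Net::SSLeay::EVP_DigestUpdate($ctx, $_[0]) };
        $finish = sub {
            my $raw = Net::SSLeay::EVP_DigestFinal($ctx);   # binary digest
            Net::SSLeay::EVP_MD_CTX_destroy($ctx);
            return unpack 'H*', $raw;                       # binary -> hex
        };
    }
    else {
        my $sha = Digest::SHA->new(256);
        $add    = sub { $sha->add($_[0]) };
        $finish = sub { $sha->hexdigest };
    }
    my ($buf, $n);
    while ($n = read($fh, $buf, 1 << 20)) {
        $add->($buf);
    }
    defined $n or die "read failed: $!\n";
    return $finish->();
}
```

Both branches return a hex string, which is the form checksums take in BagIt manifests. The streaming loop is identical for either backend, so only the digest engine changes.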

On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, the results for `verify_bag()` are:

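| Version | User time | System time | Total run time | Run time relative to v0.71 | Throughput (MB/s) |
| ------- | --------: | ----------: | -------------: | -------------------------: | ----------------: |
| v0.71   |   38.31 s |      1.60 s |       39.938 s |                       100% |               230 |
| v0.81   |   25.48 s |      1.68 s |         27.1 s |                        67% |               340 |
| v0.82   |   48.85 s |      3.89 s |         6.84 s |                        17% |              1346 |

User and system time are CPU times summed over all cores, which is why they can exceed the total (wall-clock) run time, as they do for v0.82.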

## How fast is [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast)?

lib/Archive/BagIt.pm


=back

=head1 Backward Compatibility

To reduce the complexity of the current module, support for the following features has been removed:

=over

=item *

parallel processing

=item *

synchronous I/O

=back

The current code is fast enough that removing them causes no performance loss.

In the near future, support for L<Archive::BagIt::Fast> will also be removed, because it needs hooks that increase the
complexity of the current module without any performance benefit.

=head1 FAQ

lib/Archive/BagIt.pm


=head2 How fast is L<Archive::BagIt>?

I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:

=over

=item *

Calculation of checksums: switching from the "Digest" module to OpenSSL via L<Net::SSLeay> yielded a significant
speed increase.

=item *

Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. Using the
L<IO::Async> module, the files are now loaded asynchronously, which gives a large performance gain.

=back

On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, the results for C<verify_bag()> are:

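   Version   user time   system time   total run time   relative to v0.71   MB/s
   v0.71        38.31s         1.60s          39.938s                100%    230
   v0.81        25.48s         1.68s            27.1s                 67%    340
   v0.82        48.85s         3.89s            6.84s                 17%   1346

User and system time are CPU times summed over all cores, which is why they can exceed the total (wall-clock) run time,
as they do for v0.82. Throughput is given in MB/s.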

lib/Archive/BagIt/Role/OpenSSL.pm

=head1 VERSION

version 0.101

=head2 has_async_support()

Returns true if asynchronous I/O is possible, i.e. if L<IO::Async> could be loaded; otherwise returns false.
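
The documentation only states that C<has_async_support()> depends on whether IO::Async can be loaded. A common Perl idiom for this kind of probe is a C<require> guarded by C<eval>. The minimal sketch below assumes the module does something similar. C<module_available> is a hypothetical helper, not part of Archive::BagIt.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::BagIt): returns 1 if the named
# module can be loaded with require, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Turn a package name into the file name require expects,
    # e.g. IO::Async::Loop -> IO/Async/Loop.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Decide once, up front, whether the asynchronous code path can be used.
my $async_ok = module_available('IO::Async::Loop');
print $async_ok ? "IO::Async found: async digest calculation possible\n"
                : "IO::Async missing: falling back to synchronous digest calculation\n";
```

Requiring by file path avoids a string C<eval>, so an unusual module name cannot inject code.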

=head2 get_hash_string($fh)

Calculates the digest of the file behind C<$fh>, calling the synchronous or the asynchronous implementation depending
on the result of C<< $bag->use_async() >>. Returns the digest as a hex string.

=head1 AVAILABILITY

The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.

=head1 BUGS AND LIMITATIONS

lib/Archive/BagIt/Role/OpenSSL/Async.pm

package Archive::BagIt::Role::OpenSSL::Async;
use strict;
use warnings;
use Moo;
use namespace::autoclean;
use IO::Async::Loop;
use IO::Async::Stream;
use Net::SSLeay ();
our $VERSION = '0.101'; # VERSION
# ABSTRACT: handles asynchronous digest calculation using openssl

sub BEGIN {
    Net::SSLeay::OpenSSL_add_all_digests();
}

has 'name' => (
    required => 1,
    is       => 'ro',
);

lib/Archive/BagIt/Role/OpenSSL/Async.pm

1;

__END__

=pod

=encoding UTF-8

=head1 NAME

Archive::BagIt::Role::OpenSSL::Async - handles asynchronous digest calculation using openssl

=head1 VERSION

version 0.101

=head1 AVAILABILITY

The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.

lib/Archive/BagIt/Role/OpenSSL/Sync.pm

package Archive::BagIt::Role::OpenSSL::Sync;
use strict;
use warnings FATAL => 'all';
use Moo;
use namespace::autoclean;
use Net::SSLeay ();
our $VERSION = '0.101'; # VERSION
# ABSTRACT: handles synchronous digest calculation using openssl

sub BEGIN {
    Net::SSLeay::OpenSSL_add_all_digests();
}

has 'name' => (
    required => 1,
    is       => 'ro',
);

lib/Archive/BagIt/Role/OpenSSL/Sync.pm

1;

__END__

=pod

=encoding UTF-8

=head1 NAME

Archive::BagIt::Role::OpenSSL::Sync - handles synchronous digest calculation using openssl

=head1 VERSION

version 0.101

=head1 AVAILABILITY

The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.


