Archive-BagIt
README.mkdn
}
The same applies to tagmanifests.
## How fast is [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt)?
I have made great efforts to optimize Archive::BagIt for high throughput. Two factors limit it:
- Calculation of checksums: switching from the module "Digest" to OpenSSL via [Net::SSLeay](https://metacpan.org/pod/Net%3A%3ASSLeay)
gave a significant speed increase.
- Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. With the
[IO::Async](https://metacpan.org/pod/IO%3A%3AAsync) module, the files are loaded asynchronously and the checksums are calculated in parallel.
If the underlying file system supports parallel access, the performance gain is large.
On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, `verify_bag()` gives the following results:
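
| Version | processing time: user | processing time: system | run time: total | run time: relative | throughput (MB/s) |
|---------|-----------------------|-------------------------|-----------------|--------------------|-------------------|
| v0.71   | 38.31 s               | 1.60 s                  | 39.938 s        | 100%               | 230               |
| v0.81   | 25.48 s               | 1.68 s                  | 27.1 s          | 67%                | 340               |
| v0.82   | 48.85 s               | 3.89 s                  | 6.84 s          | 17%                | 1346              |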
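The columns of this table are related by simple arithmetic. The following Perl sketch recomputes them from the raw times. It assumes that "9GB" means 9 × 1024 = 9216 MB, so the derived values only approximately match the printed ones. It also adds one quantity that is not in the table: CPU seconds (user + system) per second of total run time, which roughly counts the busy cores.

```perl
use strict;
use warnings;

# Raw times copied from the verify_bag() benchmark table.
# Assumption: "9GB" is taken as 9 * 1024 = 9216 MB, so results are approximate.
my $bag_mb = 9 * 1024;
my %runs = (
    'v0.71' => { user => 38.31, sys => 1.60, wall => 39.938 },
    'v0.81' => { user => 25.48, sys => 1.68, wall => 27.1 },
    'v0.82' => { user => 48.85, sys => 3.89, wall => 6.84 },
);

# Throughput in MB/s: bag size divided by the total (wall-clock) run time.
sub throughput { my ($r) = @_; return $bag_mb / $r->{wall} }

# Total run time as a percentage of the v0.71 baseline.
sub relative_runtime { my ($r) = @_; return 100 * $r->{wall} / $runs{'v0.71'}{wall} }

# CPU seconds (user + system) per wall-clock second: roughly the number of busy cores.
sub busy_cores { my ($r) = @_; return ($r->{user} + $r->{sys}) / $r->{wall} }

for my $version (sort keys %runs) {
    my $r = $runs{$version};
    printf "%s: %5.0f MB/s, %5.1f%% of v0.71 run time, %.1f busy cores\n",
        $version, throughput($r), relative_runtime($r), busy_cores($r);
}
```

With these inputs, v0.71 and v0.81 come out at about 1.0 busy core, so they ran essentially on a single core. v0.82 reaches about 7.7 on the 8-core machine, which is why its user time can exceed its total run time.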
README.mkdn

    # create a bag object for an existing bag directory:
    my $bag_dir = "/path/to/bag";
    my $bag = Archive::BagIt->new(
        bag_path => $bag_dir,
    );
The arguments are:
- `bag_path` - path to the bag directory
- `force_utf8` - if set, warnings about non-portable filenames are disabled (by default, these warnings are enabled)
- `use_async` - if set, IO::Async is used to read payload files asynchronously; only useful under Linux.
- `use_parallel` - if set, Parallel::parallel\_map is used to calculate digests of payload files in parallel;
only useful if the underlying filesystem supports parallel reads and multiple CPU cores are available.
- `use_plugins` - names of the expected manifest plugins; if set, the requested plugins are used,
e.g. `Archive::BagIt::Plugin::Manifest::SHA256`.
HINT: this option \*disables\* the forced fixity check in `verify_bag()`!
The bag object will use $bag\_dir, BUT an existing bag in $bag\_dir is not read. If you call `store()`, an existing bag will be overwritten!
See `load()` if you want to parse/modify an existing bag.
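The `use_parallel` option spreads digest calculation for payload files over several CPU cores. The following sketch illustrates that idea using only core modules. It replaces Parallel::parallel_map with plain `fork()` and pipes, and uses Digest::SHA instead of OpenSSL, so it is not how Archive::BagIt implements the option. `sha256_file` and `parallel_digests` are hypothetical helpers.

```perl
use strict;
use warnings;
use Digest::SHA ();
use POSIX ();

# Hypothetical helper: hex SHA-256 digest of one file, read in binary mode.
sub sha256_file {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');
    return $sha->hexdigest;
}

# Hypothetical helper: digests @paths using at most $workers child processes at a time.
# Returns a hash ref mapping each path to its hex digest; dies if any child fails.
sub parallel_digests {
    my ($workers, @paths) = @_;
    $workers = 1 if $workers < 1;
    my (%digest, %running);    # %running: child pid => [read end of its pipe, path]
    while (@paths || %running) {
        # Start new children until the worker limit is reached.
        while (@paths && scalar(keys %running) < $workers) {
            my $path = shift @paths;
            pipe(my $reader, my $writer) or die "pipe: $!";
            my $pid = fork() // die "fork: $!";
            if ($pid == 0) {
                # Child: compute one digest, send it to the parent, then exit
                # without running END blocks or destructors inherited from the parent.
                close $reader;
                my $ok = eval { print {$writer} sha256_file($path); 1 };
                close $writer;
                POSIX::_exit($ok ? 0 : 1);
            }
            close $writer;
            $running{$pid} = [ $reader, $path ];
        }
        # Parent: reap whichever child finishes first and read its result.
        my $pid = waitpid(-1, 0);
        die "waitpid failed: $!" if $pid <= 0;
        my $status = $?;
        my $job = delete $running{$pid} or next;
        my ($reader, $path) = @$job;
        my $hex = do { local $/; <$reader> };
        close $reader;
        die "digest of '$path' failed\n" if $status != 0 || !defined $hex || $hex eq '';
        $digest{$path} = $hex;
    }
    return \%digest;
}

# Usage: print "digest  path" lines for the files named on the command line.
my $digests = parallel_digests(4, @ARGV);
printf "%s  %s\n", $digests->{$_}, $_ for sort keys %$digests;
```

Each child handles one file, and at most `$workers` run at once. As with `use_parallel`, this only pays off when the file system can serve parallel reads and several cores are free.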
lib/Archive/BagIt.pm
=head2 How fast is L<Archive::BagIt>?
I have made great efforts to optimize Archive::BagIt for high throughput. Two factors limit it:
=over
=item Calculation of checksums: switching from the module "Digest" to OpenSSL via L<Net::SSLeay> gave a significant
speed increase.
=item Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. With
the L<IO::Async> module, the files are loaded asynchronously and the checksums are calculated in parallel.
If the underlying file system supports parallel access, the performance gain is large.
=back
On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, C<verify_bag()> gives the following results:

             processing time         run time
    Version  user time  system time  total time  relative  throughput (MB/s)
    v0.71    38.31s     1.60s        39.938s     100%      230
    v0.81    25.48s     1.68s        27.1s       67%       340
lib/Archive/BagIt.pm
);
The arguments are:
=over 1
=item C<bag_path> - path to the bag directory
=item C<force_utf8> - if set, warnings about non-portable filenames are disabled (by default, these warnings are enabled)
=item C<use_async> - if set, IO::Async is used to read payload files asynchronously; only useful under Linux.
=item C<use_parallel> - if set, Parallel::parallel_map is used to calculate digests of payload files in parallel;
only useful if the underlying filesystem supports parallel reads and multiple CPU cores are available.
=item C<use_plugins> - names of the expected manifest plugins; if set, the requested plugins are used,
e.g. C<Archive::BagIt::Plugin::Manifest::SHA256>.
HINT: this option *disables* the forced fixity check in C<verify_bag()>!
=back
lib/Archive/BagIt/Role/OpenSSL.pm
=head1 VERSION
version 0.100
=head2 has_async_support()
Returns true if asynchronous IO is possible because IO::Async could be loaded; otherwise returns false.
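`has_async_support()` answers one question: can IO::Async be loaded? A common Perl idiom for such a check is a `require` inside `eval`. The sketch below assumes that approach. It is not the module's own code, and `module_available` and `async_io_possible` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $module can be loaded at run time, else 0.
# The name is validated first so it can safely be turned into a file path for require.
sub module_available {
    my ($module) = @_;
    return 0 unless defined $module && $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Sketch of the documented check: async IO is possible if IO::Async can be loaded.
sub async_io_possible { return module_available('IO::Async::Loop') }

printf "async IO is %s\n", async_io_possible() ? 'possible' : 'not possible';
```

Because the check runs at run time, the module still compiles and works on systems without IO::Async. Only the asynchronous code path becomes unavailable.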
=head2 get_hash_string($fh)
Calls the synchronous or asynchronous function to calculate the digest of the file, depending on the result of C<< $bag->use_async() >>.
Returns the digest as a hex string.
=head1 AVAILABILITY
The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit L<http://www.perl.com/CPAN/> to find a CPAN
site near you, or see L<https://metacpan.org/module/Archive::BagIt/>.
=head1 BUGS AND LIMITATIONS
lib/Archive/BagIt/Role/OpenSSL/Async.pm
package Archive::BagIt::Role::OpenSSL::Async;
use strict;
use warnings;
use Moo;
use namespace::autoclean;
use IO::Async::Loop;
use IO::Async::Stream;
use Net::SSLeay ();
our $VERSION = '0.100'; # VERSION
# ABSTRACT: handles asynchronous digest calculation using openssl
sub BEGIN {
Net::SSLeay::OpenSSL_add_all_digests();
}
has 'name' => (
required => 1,
is => 'ro',
);
lib/Archive/BagIt/Role/OpenSSL/Async.pm
1;
__END__
=pod
=encoding UTF-8
=head1 NAME
Archive::BagIt::Role::OpenSSL::Async - handles asynchronous digest calculation using openssl
=head1 VERSION
version 0.100
lib/Archive/BagIt/Role/OpenSSL/Sync.pm
package Archive::BagIt::Role::OpenSSL::Sync;
use strict;
use warnings FATAL => 'all';
use Moo;
use namespace::autoclean;
use Net::SSLeay ();
our $VERSION = '0.100'; # VERSION
# ABSTRACT: handles synchronous digest calculation using openssl
sub BEGIN {
Net::SSLeay::OpenSSL_add_all_digests();
}
has 'name' => (
required => 1,
is => 'ro',
);
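The OpenSSL roles compute digests through Net::SSLeay. The following synchronous sketch shows one way to do this. It streams a filehandle through Net::SSLeay's EVP digest functions in fixed-size chunks and returns a hex string, which matches what `get_hash_string($fh)` is documented to return. It is not the module's own code. The name `hex_digest_of_fh`, the 64 KiB chunk size and the fallback to the core Digest::SHA module (used when Net::SSLeay is not installed) are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Use OpenSSL via Net::SSLeay when it can be loaded; otherwise fall back to Digest::SHA.
my $HAVE_SSLEAY = eval { require Net::SSLeay; Net::SSLeay::OpenSSL_add_all_digests(); 1 } ? 1 : 0;

# Hypothetical helper: hex digest of everything readable from $fh.
# $algorithm is an OpenSSL digest name such as 'sha256' or 'sha512'.
sub hex_digest_of_fh {
    my ($fh, $algorithm) = @_;
    my ($add, $finish);
    if ($HAVE_SSLEAY) {
        my $md = Net::SSLeay::EVP_get_digestbyname($algorithm)
            or die "unknown digest '$algorithm'";
        my $ctx = Net::SSLeay::EVP_MD_CTX_create();
        Net::SSLeay::EVP_DigestInit($ctx, $md) or die "EVP_DigestInit failed";
        $add    = sub { Net::SSLeay::EVP_DigestUpdate($ctx, $_[0]) or die "EVP_DigestUpdate failed" };
        $finish = sub {
            my $raw = Net::SSLeay::EVP_DigestFinal($ctx);    # binary digest
            Net::SSLeay::EVP_MD_CTX_destroy($ctx);
            return unpack('H*', $raw);
        };
    }
    else {
        # Fallback: map 'sha256' etc. to the bit length Digest::SHA expects.
        my ($bits) = $algorithm =~ /\Asha(\d+)\z/ or die "unsupported digest '$algorithm'";
        my $sha = Digest::SHA->new($bits) or die "unsupported digest '$algorithm'";
        $add    = sub { $sha->add($_[0]) };
        $finish = sub { $sha->hexdigest };
    }
    binmode $fh;
    while (1) {
        my $n = read($fh, my $buffer, 64 * 1024);    # 64 KiB chunks keep memory use flat
        die "read failed: $!" unless defined $n;
        last if $n == 0;
        $add->($buffer);
    }
    return $finish->();
}

# Usage: digest an in-memory filehandle; a handle on a real file works the same way.
my $text = "The quick brown fox jumps over the lazy dog";
open my $in, '<', \$text or die "open: $!";
print hex_digest_of_fh($in, 'sha256'), "\n";
# prints d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
close $in;
```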
lib/Archive/BagIt/Role/OpenSSL/Sync.pm
1;
__END__
=pod
=encoding UTF-8
=head1 NAME
Archive::BagIt::Role::OpenSSL::Sync - handles synchronous digest calculation using openssl
=head1 VERSION
version 0.100