Archive-BagIt

=head1 NAME

Archive::BagIt - The main module to handle Bags

=head1 SOURCE

The original development version was on GitHub at L<http://github.com/rjeschmi/Archive-BagIt>
and may be cloned from there.

The current development version is available at L<https://git.fsfe.org/art1pirat/Archive-BagIt>

=head1 Conformance to RFC8493

The module should fulfill the RFC requirements, with the following limitations:

=over

=item only encoding UTF-8 is supported

=item version 0.97 or 1.0 allowed

=item version 0.97 requires tag-/manifest-files with md5-fixity

=item version 1.0 requires tag-/manifest-files with sha512-fixity

=item BOM is not supported

=item Carriage returns in bagit files are not allowed

=item fetch.txt is unsupported

=back

At the moment only Linux-style file paths are supported.

For a more detailed overview, see the test suite under F<t/verify_bag.t> and the corresponding test bags from the Library of Congress BagIt conformance test suite under F<bagit_conformance_suite/>.

See L<https://datatracker.ietf.org/doc/rfc8493/?include_text=1> for details.
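Some of the limitations listed above can be checked on the raw bytes of a F<bagit.txt> file before the bag is loaded at all. The following is a minimal sketch using only core Perl. The helper name C<check_bagit_declaration> is hypothetical and not part of L<Archive::BagIt>, and the module's own verification may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Archive::BagIt): return a list of
# problems found in the raw bytes of a bagit.txt declaration.
sub check_bagit_declaration {
    my ($bytes) = @_;
    my @problems;

    # A UTF-8 byte order mark and carriage returns are both unsupported.
    push @problems, 'BOM found'             if $bytes =~ /\A\xEF\xBB\xBF/;
    push @problems, 'carriage return found' if $bytes =~ /\r/;

    # Only versions 0.97 and 1.0 are allowed.
    my ($version) = $bytes =~ /^(?:\xEF\xBB\xBF)?BagIt-Version: (\d+\.\d+)\r?$/m;
    if ( !defined $version ) {
        push @problems, 'missing BagIt-Version';
    }
    elsif ( $version ne '0.97' && $version ne '1.0' ) {
        push @problems, "unsupported version $version";
    }

    # Only UTF-8 is supported as tag file encoding.
    my ($encoding) = $bytes =~ /^Tag-File-Character-Encoding: (\S+)\r?$/m;
    push @problems, 'encoding is not UTF-8'
        if !defined $encoding || uc($encoding) ne 'UTF-8';

    return @problems;
}
```

To use it, read F<bagit.txt> with the C<:raw> layer so that a BOM or carriage returns reach the helper unchanged. An empty result list means none of the checked limitations is violated.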

=head1 TODO

=over

=item enhanced testsuite

=item reduce complexity

=item use modern perl code

=item add flag to enable very strict verify

=back

=head1 Backward Compatibility

To reduce the complexity of the code in the current module, support for

=over

=item parallel processing

=item synchronous I/O

=back

has been removed. The existing code is very fast, so there is no performance loss.

In the near future, support for L<Archive::BagIt::Fast> will be removed, because it needs hooks that increase code
complexity in the current module without any performance benefit.

=head1 FAQ

=head2 How to access the manifest-entries directly?

Try this:

   foreach my $algorithm ( keys %{ $self->manifests }) {
       my $entries_ref = $self->manifests->{$algorithm}->manifest_entries();
       # $entries_ref is a hashref like:
       # {
       #     "data/hello.txt" => "e7c22b994c59d9cf2b48e549b1e24666636045930d3da7c1acb299d1c3b7f931f94aae41edda2c2b207a36e10f8bcb8d45223e54878f5b316e7ce3b6bc019629"
       # }
   }

The same approach works for tagmanifests.
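To make the shape of that hashref concrete, here is a minimal sketch that builds the same path-to-checksum mapping from the text of a manifest file, where each line holds a checksum, whitespace and a file path. The helper C<parse_manifest_text> is hypothetical and only illustrates the data structure. It is not the parser used by L<Archive::BagIt>.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the module's own parser): turn the text of a
# manifest file into a hashref mapping file path to checksum.
sub parse_manifest_text {
    my ($text) = @_;
    my %entries;
    for my $line ( split /\n/, $text ) {
        next if $line =~ /\A\s*\z/;    # skip empty lines

        # Each line: checksum, whitespace, then the file path.
        # The path may itself contain spaces, so capture the rest of the line.
        my ( $digest, $path ) = $line =~ /\A(\S+)\s+(.+)\z/
            or die "malformed manifest line: $line";
        $entries{$path} = $digest;
    }
    return \%entries;
}
```

Looking up C<< $entries->{'data/hello.txt'} >> in the returned hashref then yields the checksum recorded for that file.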

=head2 How fast is L<Archive::BagIt>?

I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:

=over

=item Calculation of checksums. Switching from the C<Digest> module to OpenSSL via L<Net::SSLeay> gave a significant
   speed increase.

=item Loading the files referenced in the manifest files. This was previously done serially with synchronous I/O.
   With the L<IO::Async> module the files are loaded asynchronously, which cut the total run time considerably
   (see the table below).

=back

On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, the results for C<verify_bag()> are:

                    processing time          run time             throughput
   Version       user time    system time    total time    total    MB/s
    v0.71        38.31s        1.60s         39.938s       100%     230
    v0.81        25.48s        1.68s         27.1s          67%     340
    v0.82        48.85s        3.89s          6.84s         17%    1346

=head2 How fast is L<Archive::BagIt::Fast>?

It depends. On my system with 8 cores, an SSD and a 38 MB bag with 48 payload files, the results for C<verify_bag()> are:

                  Rate         Base         Fast
   Base         3.01/s           --         -21%
   Fast         3.80/s          26%           --

On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, the results for C<verify_bag()> are:

                s/iter         Base         Fast
   Base           74.6           --          -9%
   Fast           68.3           9%           --

But you should measure which variant is best for you. In general the default L<Archive::BagIt> is fast enough.
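One way to run such a measurement yourself is the core L<Benchmark> module, which prints comparison tables in the same layout as the ones above. The following is a minimal sketch. The helper name C<compare_variants> is hypothetical, and the code references you pass in decide what is actually measured.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical helper: run each named code reference $count times and
# print a comparison table of the results.
sub compare_variants {
    my ( $count, %variants ) = @_;

    # Style 'none' suppresses the per-variant timing lines.
    my $results = timethese( $count, \%variants, 'none' );

    # Prints the rate (or s/iter) table with relative differences.
    cmpthese($results);

    return $results;
}
```

Assuming both classes accept the bag directory in C<new()>, a call could look like C<< compare_variants( 5, Base => sub { Archive::BagIt->new($dir)->verify_bag() }, Fast => sub { Archive::BagIt::Fast->new($dir)->verify_bag() } ) >>. Use a bag that is representative of your own data, because the two tables above show that the difference depends on bag size.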

=head2 How to update an old bag of version v0.97 to v1.0?

You could try this:

   use Archive::BagIt;
   my $bag=Archive::BagIt->new( $my_old_bag_filepath );
   $bag->load();
   $bag->store();

=head2 How to create UTF-8 based paths under MS Windows?

For versions before Windows 10: I have no idea, and suggestions for a portable solution are very welcome!

For Windows 10: as described at L<https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686>,
you have to enable UTF-8 support via 'System Administration' -> 'Region' -> 'Administrative'
-> 'Region Settings' -> Flag 'Use Unicode UTF-8 for worldwide language support'.

Hint: The better way is to use only portable filenames. See L<perlport> for details.

=head1 BUGS

None known yet.

=head1 THANKS

Thanks to Rob Schmidt <rjeschmi@gmail.com> for the trusting handover of the project and for the initial work!
I would also like to thank Patrick Hochstenbach and Russell McOrmond for their valuable and especially detailed advice!
And without the helpful, if sometimes rude, advice of the IRC channel #perl I would have been stuck on a lot of problems.
Without the support of my colleagues at SLUB Dresden, the project would never have made it this far.

=head1 SYNOPSIS

This module will hopefully help with the basic commands needed to create
and verify a bag. This part supports BagIt 1.0 according to RFC 8493 (L<https://tools.ietf.org/html/rfc8493>).
