Archive-BagIt
# NAME
Archive::BagIt - The main module to handle bags.
# VERSION
version 0.101
# SOURCE
The original development version was hosted on GitHub at [http://github.com/rjeschmi/Archive-BagIt](http://github.com/rjeschmi/Archive-BagIt)
and may be cloned from there.
The current development version is available at [https://git.fsfe.org/art1pirat/Archive-BagIt](https://git.fsfe.org/art1pirat/Archive-BagIt)
# Conformance to RFC8493
The module should fulfill the RFC requirements, with the following limitations:
- only encoding UTF-8 is supported
- version 0.97 or 1.0 allowed
- version 0.97 requires tag-/manifest-files with md5-fixity
- version 1.0 requires tag-/manifest-files with sha512-fixity
- BOM is not supported
- carriage returns in bagit files are not allowed
- fetch.txt is unsupported
At the moment, only Linux-style file paths are supported.
For a more detailed overview, see the test suite in `t/verify_bag.t` and the corresponding test bags from the Library of Congress BagIt conformance test suite under `bagit_conformance_suite/`.
See [https://datatracker.ietf.org/doc/rfc8493/?include\_text=1](https://datatracker.ietf.org/doc/rfc8493/?include_text=1) for details.
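Several of the limitations above can be checked before a bag is handed to the module. The following is a minimal sketch, not part of Archive::BagIt: the helper name `bagit_declaration_problems` is hypothetical, it inspects only the raw bytes of `bagit.txt`, and the case-insensitive comparison of the encoding name is an assumption of this sketch rather than documented module behavior.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Archive::BagIt: returns a list of
# human-readable problems found in the raw bytes of a bagit.txt file.
sub bagit_declaration_problems {
    my ($raw) = @_;
    my @problems;

    # A UTF-8 byte order mark is the byte sequence EF BB BF.
    push @problems, 'starts with a BOM' if $raw =~ /\A\xEF\xBB\xBF/;

    # Carriage returns (CR or CRLF line endings) are not accepted.
    push @problems, 'contains carriage returns' if $raw =~ /\r/;

    # The two declaration lines defined by RFC 8493.
    my ($version)  = $raw =~ /^BagIt-Version:[ \t]*(\S+)/m;
    my ($encoding) = $raw =~ /^Tag-File-Character-Encoding:[ \t]*(\S+)/m;

    push @problems, 'BagIt-Version is not 0.97 or 1.0'
        unless defined $version && ( $version eq '0.97' || $version eq '1.0' );

    # Assumption of this sketch: the encoding name is compared case-insensitively.
    push @problems, 'encoding is not UTF-8'
        unless defined $encoding && lc($encoding) eq 'utf-8';

    return @problems;
}

my @problems = bagit_declaration_problems(
    "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
);
print @problems ? join( "\n", @problems ) . "\n" : "ok\n";
```

To check a real bag, read `bagit.txt` in `:raw` mode so that a BOM or carriage returns are not silently translated before the check runs.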
# TODO
- enhanced testsuite
- reduce complexity
- use modern perl code
- add flag to enable very strict verify
# Backward Compatibility
To reduce code complexity in the current module, support for
- parallel processing
- synchronous I/O
has been removed. The existing code is very fast, so there is no performance loss.
In the near future, support for [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast) will be removed, because it needs hooks that increase code
complexity in the current module without any performance benefit.
# FAQ
## How to access the manifest-entries directly?
Try this:

    foreach my $algorithm ( keys %{ $self->manifests } ) {
        my $entries_ref = $self->manifests->{$algorithm}->manifest_entries();
        # $entries_ref is a hashref that maps each payload path to its digest, like:
        # {
        #     "data/hello.txt" => "e7c22b994c59d9cf2b48e549b1e24666636045930d3da7c1acb299d1c3b7f931f94aae41edda2c2b207a36e10f8bcb8d45223e54878f5b316e7ce3b6bc019629"
        # }
    }

Tag manifests can be accessed in a similar way.
## How fast is [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt)?
I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:
- Calculation of checksums: switching from the "Digest" module to OpenSSL via [Net::SSLeay](https://metacpan.org/pod/Net%3A%3ASSLeay) gave a significant
speed increase.
- Loading the files referenced in the manifest files: this was previously done serially with synchronous I/O. With
the [IO::Async](https://metacpan.org/pod/IO%3A%3AAsync) module the files are loaded asynchronously, which gives a large performance gain.
On my system with 8 cores, an SSD, and a large 9 GB bag with 568 payload files, the results for `verify_bag()` are:

                   processing time         run time        throughput
    Version    user time   system time    total time   total     MB/s
    v0.71        38.31s         1.60s       39.938s     100%      230
    v0.81        25.48s         1.68s        27.1s       67%      340
    v0.82        48.85s         3.89s         6.84s      17%     1346
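
The relative run times in the results can be recomputed from the raw timings. The sketch below assumes that "total time" is elapsed wall-clock time and that the percentages are truncated, not rounded. The helper names `relative_percent` and `cpu_per_wall` are made up for this illustration.

```perl
use strict;
use warnings;

# Timings in seconds, copied from the results for verify_bag().
my %timing = (
    'v0.71' => { user => 38.31, system => 1.60, total => 39.938 },
    'v0.81' => { user => 25.48, system => 1.68, total => 27.1 },
    'v0.82' => { user => 48.85, system => 3.89, total => 6.84 },
);
my $baseline = $timing{'v0.71'}{total};

# Run time as a truncated percentage of the v0.71 baseline.
sub relative_percent {
    my ($total) = @_;
    return int( 100 * ( $total / $baseline ) );
}

# CPU seconds (user + system) spent per second of run time.
sub cpu_per_wall {
    my ( $user, $system, $total ) = @_;
    return ( $user + $system ) / $total;
}

for my $version ( sort keys %timing ) {
    my $t = $timing{$version};
    printf "%s: %3d%% of baseline run time, %.1f CPU seconds per second of run time\n",
        $version,
        relative_percent( $t->{total} ),
        cpu_per_wall( @{$t}{qw(user system total)} );
}
```

Under these assumptions the ratio of CPU time to run time stays close to 1 for v0.71 and v0.81 and rises to roughly 7.7 for v0.82. That is consistent with the work being spread across the 8 cores, and it explains why v0.82 shows more user time than v0.71 yet finishes much sooner.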