# NAME
Archive::BagIt - The main module to handle bags.
# VERSION
version 0.101
# SOURCE
The original development version was on GitHub at [http://github.com/rjeschmi/Archive-BagIt](http://github.com/rjeschmi/Archive-BagIt)
and may be cloned from there.
The current development version is available at [https://git.fsfe.org/art1pirat/Archive-BagIt](https://git.fsfe.org/art1pirat/Archive-BagIt).
# Conformance to RFC8493
The module should fulfill the RFC requirements, with the following limitations:
- only the UTF-8 encoding is supported
- only BagIt versions 0.97 and 1.0 are allowed
- version 0.97 requires tag and manifest files with MD5 fixity
- version 1.0 requires tag and manifest files with SHA-512 fixity
- a BOM (byte order mark) is not supported
- carriage returns in BagIt files are not allowed
- fetch.txt is unsupported

At the moment, only Linux-style file paths are supported.
For a more detailed overview, see the test suite in `t/verify_bag.t` and the corresponding test bags from the Library of Congress BagIt conformance test suite under `bagit_conformance_suite/`.
See [https://datatracker.ietf.org/doc/rfc8493/?include\_text=1](https://datatracker.ietf.org/doc/rfc8493/?include_text=1) for details.
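The version, encoding, BOM and carriage-return limitations above can all be checked on a bag's `bagit.txt` before anything else is read. The following is a minimal pure-Perl sketch, not Archive::BagIt's own code: the sub name `check_bagit_declaration` is hypothetical, and its version-to-algorithm mapping simply restates the list above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::BagIt): checks the raw bytes of a
# bagit.txt against the limitations listed above and returns the fixity
# algorithm those limitations require for the declared version.
sub check_bagit_declaration {
    my ($raw) = @_;
    die "BOM is not supported\n"             if $raw =~ /\A\xEF\xBB\xBF/;
    die "carriage returns are not allowed\n" if $raw =~ /\r/;

    # bagit.txt consists of "Label: Value" lines
    my %decl;
    for my $line (split /\n/, $raw) {
        my ($label, $value) = $line =~ /\A([^:]+):[ \t]*(.*)\z/
            or die "malformed line: $line\n";
        $decl{$label} = $value;
    }

    my $encoding = $decl{'Tag-File-Character-Encoding'} // '';
    die "only UTF-8 is supported\n" unless uc($encoding) eq 'UTF-8';

    # version => forced fixity algorithm, as stated in the list above
    my %forced = ( '0.97' => 'md5', '1.0' => 'sha512' );
    my $version = $decl{'BagIt-Version'} // '';
    my $algorithm = $forced{$version}
        or die "unsupported BagIt version '$version'\n";
    return $algorithm;
}

# prints "sha512"
print check_bagit_declaration(
    "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"), "\n";
```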
# TODO
- enhance the test suite
- reduce complexity
- use modern Perl code
- add a flag to enable very strict verification
# Backward Compatibility
To reduce the complexity of the code in the current module, support for
- parallel processing
- synchronous I/O

has been removed. The existing code is very fast, so there is no performance loss.
In the near future, support for [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast) will also be removed, because it needs hooks that increase the
complexity of the current module without any performance benefit.
# FAQ
## How to access the manifest-entries directly?
Try this:
```perl
# $bag is an Archive::BagIt object, e.g. from Archive::BagIt->new($bag_dir)
foreach my $algorithm ( keys %{ $bag->manifests } ) {
    my $entries_ref = $bag->manifests->{$algorithm}->manifest_entries();
    # $entries_ref is a hashref like:
    # {
    #     "data/hello.txt" => "e7c22b994c59d9cf2b48e549b1e24666636045930d3da7c1acb299d1c3b7f931f94aae41edda2c2b207a36e10f8bcb8d45223e54878f5b316e7ce3b6bc019629",
    # }
}
```
The same approach applies to tagmanifests.
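The hashref returned by `manifest_entries()` mirrors the lines of a manifest file. To illustrate that shape, here is a minimal pure-Perl sketch that parses manifest text as RFC 8493 describes it: one `checksum filepath` pair per line, with CR, LF and `%` percent-encoded in the path. The sub `parse_manifest` is hypothetical; in real code, use `manifest_entries()`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::BagIt): parses the text of a
# manifest file into a hashref { filepath => checksum }.
# Each line is "<checksum> <filepath>"; the path may contain spaces, and
# CR, LF and % are percent-encoded as %0D, %0A and %25.
sub parse_manifest {
    my ($text) = @_;
    my %entries;
    for my $line (split /\n/, $text) {
        next if $line eq '';
        my ($checksum, $path) = $line =~ /\A(\S+)[ \t]+(.+)\z/
            or die "malformed manifest line: $line\n";
        # undo the percent-encoding of CR, LF and %
        $path =~ s/%(0[DA]|25)/chr hex $1/gie;
        $entries{$path} = $checksum;
    }
    return \%entries;
}

my $entries = parse_manifest("abc123  data/hello world.txt\n");
print "$_ => $entries->{$_}\n" for sort keys %$entries;
```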
## How fast is [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt)?
I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:
- calculation of checksums: switching from the `Digest` module to OpenSSL via [Net::SSLeay](https://metacpan.org/pod/Net%3A%3ASSLeay) achieved a significant
speed increase.
- loading the files referenced in the manifest files: this was previously done serially, using synchronous I/O. By
using the [IO::Async](https://metacpan.org/pod/IO%3A%3AAsync) module, the files are loaded asynchronously, and the performance gain is huge.
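On my system with 8 cores, an SSD and a large 9 GB bag with 568 payload files, the results for `verify_bag()` are:

| Version | User time | System time | Total run time | Relative run time | Throughput (MB/s) |
|---------|-----------|-------------|----------------|-------------------|-------------------|
| v0.71   | 38.31s    | 1.60s       | 39.938s        | 100%              | 230               |
| v0.81   | 25.48s    | 1.68s       | 27.1s          | 67%               | 340               |
| v0.82   | 48.85s    | 3.89s       | 6.84s          | 17%               | 1346              |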
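The checksum step described above boils down to streaming each payload file through a digest. As a minimal sketch, the following uses the core module Digest::SHA instead of the Net::SSLeay/OpenSSL route that Archive::BagIt takes, so it illustrates the idea, not the module's fast path; the sub name `sha512_of_file` is hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: returns the hex SHA-512 fixity value of one file.
# Core Digest::SHA is used here purely for illustration; Archive::BagIt
# itself computes checksums via OpenSSL (Net::SSLeay).
sub sha512_of_file {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename or die "cannot open $filename: $!\n";
    my $sha = Digest::SHA->new(512);
    $sha->addfile($fh);    # reads the handle in chunks, not all at once
    close $fh;
    return $sha->hexdigest;
}
```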
## How fast is [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast)?
It depends. On my system with 8 cores, an SSD and a 38 MB bag with 48 payload files, the results for `verify_bag()` are:

|      | Rate   | Base | Fast |
|------|--------|------|------|
| Base | 3.01/s | --   | -21% |
| Fast | 3.80/s | 26%  | --   |

On the same system with a large 9 GB bag with 568 payload files, the results for `verify_bag()` are:

|      | s/iter | Base | Fast |
|------|--------|------|------|
| Base | 74.6   | --   | -9%  |
| Fast | 68.3   | 9%   | --   |

But you should measure which variant works best for you. In general, the default [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt) is fast enough.
## How to update an old bag of version v0.97 to v1.0?
You could try this:
```perl
use Archive::BagIt;
my $bag = Archive::BagIt->new( $my_old_bag_filepath );
$bag->load();   # parse the existing bag first, otherwise it is overwritten
$bag->store();  # write the bag anew
```
## How to create UTF-8 based paths under MS Windows?
For versions older than Windows 10: I have no idea, and suggestions for a portable solution are very welcome!
For Windows 10, thanks to [https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686](https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686):
you have to enable UTF-8 support via 'System Administration' -> 'Region' -> 'Administrative'
-> 'Region Settings' -> flag 'Use Unicode UTF-8 for worldwide language support'.
Hint: The better way is to use only portable filenames. See [perlport](https://metacpan.org/pod/perlport) for details.
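One conservative reading of "portable filenames" is the POSIX portable filename character set (`A-Z`, `a-z`, `0-9`, `.`, `_`, `-`). The following sketch, with the hypothetical sub `is_portable_path`, checks a bag-relative path against that set. It is an assumption made for illustration and is stricter than what BagIt itself requires.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every component of a bag-relative path uses
# only the POSIX portable filename character set (A-Z a-z 0-9 . _ -).
# Empty components and "." / ".." are rejected as well.
sub is_portable_path {
    my ($path) = @_;
    return 0 if !defined $path || $path eq '';
    for my $component (split m{/}, $path, -1) {
        return 0 if $component eq '' || $component eq '.' || $component eq '..';
        return 0 if $component =~ /[^A-Za-z0-9._-]/;
    }
    return 1;
}

for my $path ('data/report_2024-01.txt', 'data/my file.txt') {
    printf "%-26s %s\n", $path, is_portable_path($path) ? 'portable' : 'not portable';
}
```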
# BUGS
None known yet.
# THANKS
Thanks to Rob Schmidt <rjeschmi@gmail.com> for the trustful handover of the project and thanks for your initial work!
I would also like to thank Patrick Hochstenbach and Rusell McOrmond for their valuable and especially detailed advice!
And without the helpful, if sometimes rude, support of the IRC channel #perl, I would have been stuck on a lot of problems.
Without the support of my colleagues at SLUB Dresden, the project would never have made it this far.
# SYNOPSIS
This module will hopefully help with the basic commands needed to create
and verify a bag. This part supports BagIt 1.0 according to RFC 8493 ([https://tools.ietf.org/html/rfc8493](https://tools.ietf.org/html/rfc8493)).
You only need to know the following methods first:
## read a BagIt
```perl
use Archive::BagIt;
# read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new($bag_dir);
```
## construct a BagIt around a payload
```perl
use Archive::BagIt;
my $bag2 = Archive::BagIt->make_bag($bag_dir);
```
## verify a BagIt-dir
```perl
use Archive::BagIt;
# Validate a BagIt archive against its manifest
my $bag3 = Archive::BagIt->new($bag_dir);
my $is_valid1 = $bag3->verify_bag();
# Validate a BagIt archive against its manifest, report all errors
my $bag4 = Archive::BagIt->new($bag_dir);
my $is_valid2 = $bag4->verify_bag( {report_all_errors => 1} );
```
## read a BagIt-dir, change something, store
Because all methods operate lazily, make sure the bag is parsed *BEFORE* you modify it.
Otherwise it will be overwritten!
```perl
use Archive::BagIt;
my $bag5 = Archive::BagIt->new($bag_dir); # lazy, nothing has happened yet
$bag5->load();  # updates the object representation by parsing the given $bag_dir
$bag5->store(); # writes the bag anew
```
# METHODS
## Constructor
The constructor creates a bag object from a single argument:
```perl
use Archive::BagIt;
# read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new($bag_dir);
```
or from named arguments:
```perl
use Archive::BagIt;
# read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new(
    bag_path => $bag_dir,
);
```
The arguments are:
- `bag_path` - path to bag-directory
- `force_utf8` - if set, warnings about non-portable filenames are disabled (by default, these warnings are enabled)
- `use_plugins` - expected manifest plugin strings; if set, the requested plugins are used,
for example `Archive::BagIt::Plugin::Manifest::SHA256`.
HINT: this option \*disables\* the forced fixity check in `verify_bag()`!
The bag object will use $bag\_dir, BUT an existing $bag\_dir is not read. If you use `store()` an existing bag will be overwritten!
See `load()` if you want to parse/modify an existing bag.
## has\_force\_utf8()
Checks whether `force_utf8` was set.
If set, warnings about potential file path problems are ignored.
## bag\_path(\[$new\_value\])
Getter/setter for bag path
## metadata\_path()
Getter for metadata path
## payload\_path()
Getter for payload path
## checksum\_algos()
Getter for registered checksum algorithms
## bag\_version()
Getter for bag version
## bag\_encoding()
Getter for bag encoding.
HINT: the current version of Archive::BagIt only supports UTF-8, but the method may return other values depending on the given bag.
## bag\_info(\[$new\_value\])
Getter/Setter for bag info. Expects/returns an array of HashRefs implementing simple key-value pairs.
HINT: RFC8493 does not allow \*reordering\* of entries!
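To illustrate the ordered array-of-HashRefs shape described for `bag_info()`, here is a minimal sketch that parses `bag-info.txt` text into that structure, keeping order and repeated labels. The sub `parse_baginfo` is hypothetical; lines indented with spaces or tabs continue the previous value, as RFC 8493 allows, and joining them with a single space is one possible choice, not necessarily what Archive::BagIt does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::BagIt): parses bag-info.txt text
# into an ordered arrayref of single-pair hashrefs. Order and repeated
# labels are preserved; indented lines continue the previous value.
sub parse_baginfo {
    my ($text) = @_;
    my @entries;
    for my $line (split /\n/, $text) {
        next if $line eq '';
        if ($line =~ /\A[ \t]+(.*)\z/) {
            # continuation line: append to the value of the previous entry
            die "continuation line without a preceding entry\n" unless @entries;
            my ($label) = keys %{ $entries[-1] };
            $entries[-1]{$label} .= " $1";
        }
        elsif ($line =~ /\A([^:]+?)[ \t]*:[ \t]*(.*)\z/) {
            push @entries, { $1 => $2 };
        }
        else {
            die "malformed bag-info line: $line\n";
        }
    }
    return \@entries;
}
```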
## has\_bag\_info()
returns true if bag info exists.
## errors()
Getter to return collected errors after a `verify_bag()` call with the option `report_all_errors`
## warnings()
Getter to return collected warnings after a `verify_bag()` call
## digest\_callback()
Derived classes may reimplement this method to handle fixity checks in their own way. The
getter returns an anonymous function with the following interface:
```perl
my $digest = $self->digest_callback;
$digest->( $digestobject, $filename );
```
This anonymous function MUST use the `get_hash_string()` function of the [Archive::BagIt::Role::Algorithm](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3ARole%3A%3AAlgorithm) role,
which is implemented by each [Archive::BagIt::Plugin::Algorithm::XXXX](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3APlugin%3A%3AAlgorithm%3A%3AXXXX) module.
See [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast) for details.
## get\_baginfo\_values\_by\_key($searchkey)
Returns all values which match $searchkey, undef otherwise
## is\_baginfo\_key\_reserved\_as\_uniq($searchkey)
returns true if the key is reserved and should be unique
## is\_baginfo\_key\_reserved( $searchkey )
returns true if key is reserved
## verify\_baginfo()
Checks the bag-info keys. Returns true if all is fine; otherwise returns undef and pushes the message to `errors()`.
Warnings are pushed to `warnings()`.
## delete\_baginfo\_by\_key( $searchkey )
Deletes the entry with the given $searchkey, if it exists.
If multiple entries with $searchkey exist, only the last one is deleted.
## exists\_baginfo\_key( $searchkey )
returns true if a given $searchkey exists
## append\_baginfo\_by\_key($searchkey, $newvalue)
Appends a key-value pair to bag\_info.
HINT: check the return value to see whether the append was successful, because some keys need to be unique.
## add\_or\_replace\_baginfo\_by\_key($searchkey, $newvalue)
It replaces the first entry with $newvalue if $searchkey exists, otherwise it appends.
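The lookup and modification rules above (delete removes the last match, add-or-replace updates the first match or appends) can be illustrated on a plain array of single-pair HashRefs. The subs below are hypothetical stand-ins written for this sketch, not Archive::BagIt's implementation; exact, case-sensitive key comparison is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-ins (not Archive::BagIt's code) that mimic the documented
# semantics on an ordered array of single-pair hashrefs. Keys are compared
# exactly (case-sensitive), which is an assumption of this sketch.

# all values for $key as an arrayref, or undef if the key does not occur
sub values_by_key {
    my ($info, $key) = @_;
    my @values = map { $_->{$key} } grep { exists $_->{$key} } @$info;
    return @values ? \@values : undef;
}

# number of entries using $key (true if the key exists)
sub key_exists {
    my ($info, $key) = @_;
    return scalar grep { exists $_->{$key} } @$info;
}

# removes only the LAST entry with $key; returns true if something was deleted
sub delete_last_by_key {
    my ($info, $key) = @_;
    for my $i (reverse 0 .. $#$info) {
        next unless exists $info->[$i]{$key};
        splice @$info, $i, 1;
        return 1;
    }
    return 0;
}

# replaces the FIRST entry with $key, or appends a new entry if none exists
sub add_or_replace_by_key {
    my ($info, $key, $value) = @_;
    for my $entry (@$info) {
        next unless exists $entry->{$key};
        $entry->{$key} = $value;
        return;
    }
    push @$info, { $key => $value };
    return;
}
```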
## forced\_fixity\_algorithm()
Getter to return the forced fixity algorithm depending on BagIt version
## manifest\_files()
Getter to find all manifest-files
## tagmanifest\_files()
Getter to find all tagmanifest-files
## payload\_files()
Getter to find all payload-files
## non\_payload\_files()
Getter to find all non payload-files
## plugins()
Getter/setter for algorithm plugins
## manifests()
Getter/setter for all manifests (objects)
## algos()
Getter/setter for all registered algorithms
## load\_plugins
By default, SHA512 and MD5 are loaded and therefore used. If you want to create a bag with only one or a specific
checksum algorithm, you can use this method to (re-)register it. It expects a list of strings with namespaces of the form
`Archive::BagIt::Plugin::Algorithm::XXX`, where XXX is your chosen fixity algorithm.
## load()
Triggers loading of an existing bag
## verify\_bag($opts)
A method to verify a bag deeply. If `$opts` is set with `{report_all_errors => 1}`, all fixity errors are reported.
The default is to croak with an error message if any error is detected.
HINT: You might also want to check [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast) to see a more direct way of accessing files (and thus faster).
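Conceptually, the fixity part of `verify_bag()` recomputes each payload digest and compares it with the manifest value. The sketch below shows that idea with core Digest::SHA for SHA-512 only. The sub `verify_fixity` and its arrayref return value are hypothetical; only the `report_all_errors` switch mirrors the option described above (collect every error instead of dying at the first).

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Spec;

# Hypothetical sketch (not Archive::BagIt's verify_bag()): $entries maps a
# bag-relative path to its expected SHA-512 hex digest, as in a manifest.
# Returns an arrayref of error messages; an empty arrayref means all fixity
# values matched. Without report_all_errors it dies at the first error.
sub verify_fixity {
    my ($bag_dir, $entries, $opts) = @_;
    $opts //= {};
    my @errors;
    for my $path (sort keys %$entries) {
        my $file = File::Spec->catfile($bag_dir, split m{/}, $path);
        my $error;
        if (!-f $file) {
            $error = "missing file: $path";
        }
        else {
            # 'b' = read the file in binary mode
            my $got = Digest::SHA->new(512)->addfile($file, 'b')->hexdigest;
            $error = "fixity mismatch: $path" if lc($got) ne lc($entries->{$path});
        }
        next unless defined $error;
        die "$error\n" unless $opts->{report_all_errors};
        push @errors, $error;
    }
    return \@errors;
}
```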
## calc\_payload\_oxum()
returns an array with the octet count and stream count of the payload directory
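RFC 8493 defines the Payload-Oxum as `OctetCount.StreamCount`: the total number of payload octets and the number of payload files. Here is a minimal pure-Perl sketch of those two numbers using core File::Find; `payload_oxum` is a hypothetical name, and counting only regular files is an assumption of the sketch.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical sketch (not Archive::BagIt's calc_payload_oxum()): returns the
# total number of octets and the number of regular files below $payload_dir,
# i.e. the two parts of an RFC 8493 Payload-Oxum "OctetCount.StreamCount".
sub payload_oxum {
    my ($payload_dir) = @_;
    my ($octets, $streams) = (0, 0);
    find(
        {
            no_chdir => 1,    # don't chdir, so paths stay valid for -f
            wanted   => sub {
                return unless -f $File::Find::name;
                $octets += (-s _) || 0;    # -s is false for empty files
                $streams++;
            },
        },
        $payload_dir,
    );
    return ($octets, $streams);
}

# usage: my $oxum = join '.', payload_oxum('/path/to/bag/data');
```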
## calc\_bagsize()
returns a string with the human-readable size of the payload
## create\_bagit()
creates a bagit.txt file
## create\_baginfo()
creates a bag-info.txt file
Hint: the entries 'Bagging-Date', 'Bag-Software-Agent', 'Payload-Oxum' and 'Bag-Size' will be automagically set,
existing values in internal bag-info representation will be overwritten!
## store()
stores a bagit object if the bagit directory structure has already been constructed.
## init\_metadata( $bag\_path, $options)
A constructor that will just create the metadata directory.
This won't make a bag, but it will create the conditions to do that eventually.
## make\_bag( $bag\_path, $options )
A constructor that will make and return a bag from a directory.
It expects that a preliminary bagit directory exists.
If a data directory already exists, it is assumed to be a bag already (there is no check for invalid files in the root).
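For orientation, here is a minimal sketch of the on-disk result RFC 8493 expects once the payload sits in `data/`: a `bagit.txt` declaration and a SHA-512 manifest. The sub `write_minimal_bag` is hypothetical and deliberately incomplete (no `bag-info.txt`, no tag manifests); it is not what `make_bag()` does internally, so use `make_bag()` for real bags.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Find;
use File::Spec;

# Hypothetical sketch (not Archive::BagIt's make_bag()): assumes the payload
# already sits in $bag_dir/data and writes bagit.txt (v1.0, UTF-8) plus
# manifest-sha512.txt. Returns the number of payload files.
sub write_minimal_bag {
    my ($bag_dir) = @_;
    my $data_dir = File::Spec->catdir($bag_dir, 'data');
    die "no payload directory $data_dir\n" unless -d $data_dir;

    my @files;
    find({ no_chdir => 1, wanted => sub { push @files, $File::Find::name if -f } }, $data_dir);

    open my $bagit, '>:raw', File::Spec->catfile($bag_dir, 'bagit.txt') or die "$!\n";
    print {$bagit} "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n";
    close $bagit;

    open my $manifest, '>:raw', File::Spec->catfile($bag_dir, 'manifest-sha512.txt') or die "$!\n";
    for my $file (sort @files) {
        my $rel = File::Spec->abs2rel($file, $bag_dir);    # e.g. data/hello.txt
        # percent-encode %, CR and LF in manifest paths (RFC 8493)
        $rel =~ s/%/%25/g;
        $rel =~ s/\r/%0D/g;
        $rel =~ s/\n/%0A/g;
        my $digest = Digest::SHA->new(512)->addfile($file, 'b')->hexdigest;
        print {$manifest} "$digest $rel\n";
    }
    close $manifest;
    return scalar @files;
}
```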
# AVAILABILITY
The latest version of this module is available from the Comprehensive Perl
Archive Network (CPAN). Visit [http://www.perl.com/CPAN/](http://www.perl.com/CPAN/) to find a CPAN
site near you, or see [https://metacpan.org/module/Archive::BagIt/](https://metacpan.org/module/Archive::BagIt/).
# BUGS AND LIMITATIONS
You can make new bug reports, and view existing ones, through the
web interface at [http://rt.cpan.org](http://rt.cpan.org).
# AUTHOR
Andreas Romeyke <cpan@andreas.romeyke.de>
# COPYRIGHT AND LICENSE
This software is copyright (c) 2025 by Rob Schmidt <rjeschmi@gmail.com>, William Wueppelmann and Andreas Romeyke.
This is free software; you can redistribute it and/or modify it under