Archive-BagIt

## How to access the manifest-entries directly?

Try this:

    use Archive::BagIt;
    my $bag = Archive::BagIt->new("/path/to/bag");
    foreach my $algorithm ( keys %{ $bag->manifests } ) {
        my $entries_ref = $bag->manifests->{$algorithm}->manifest_entries();
        # $entries_ref is a hashref mapping each payload file to its checksum, like:
        # {
        #     'data/hello.txt' => "e7c22b994c59d9cf2b48e549b1e24666636045930d3da7c1acb299d1c3b7f931f94aae41edda2c2b207a36e10f8bcb8d45223e54878f5b316e7ce3b6bc019629",
        # }
    }

Tag manifests can be accessed in a similar way.
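
If you want to inspect a tag manifest file without going through the bag object, you can also read it directly. Below is a minimal sketch, assuming the manifest line format of RFC 8493: a checksum, one or more spaces or tabs, then the file path, in which only LF, CR and `%` are percent-encoded. The helper `read_manifest_file` is hypothetical and not part of Archive::BagIt; it returns a hashref of the same shape as the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a BagIt (tag)manifest file such as
# tagmanifest-sha512.txt into a hashref { filepath => checksum }.
sub read_manifest_file {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "cannot open $path: $!\n";
    my %entries;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line ending
        next if $line eq '';
        # split into checksum and path; the path itself may contain spaces
        my ($checksum, $filepath) = split /[ \t]+/, $line, 2;
        die "malformed line in $path: $line\n" unless defined $filepath;
        # undo the RFC 8493 percent-encoding of LF (%0A), CR (%0D) and '%' (%25)
        # in a single pass, so an encoded "%250A" correctly becomes "%0A"
        $filepath =~ s/%(0[AaDd]|25)/chr(hex($1))/ge;
        $entries{$filepath} = $checksum;
    }
    close $fh;
    return \%entries;
}

# Usage (the file name is an example):
# my $tag_entries = read_manifest_file("/path/to/bag/tagmanifest-sha512.txt");
# print "$_ => $tag_entries->{$_}\n" for sort keys %$tag_entries;
```

Decoding all three escapes in one substitution avoids the classic double-decoding bug where `%25` is turned into `%` first and then misread as the start of another escape.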

## How fast is [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt)?

I have made great efforts to optimize Archive::BagIt for high throughput. There are two limiting factors:

- Checksum calculation: switching from the "Digest" module to OpenSSL via [Net::SSLeay](https://metacpan.org/pod/Net%3A%3ASSLeay) gave a significant
   speed-up.
- File loading: the files referenced in the manifests used to be read serially with synchronous I/O. With the
   [IO::Async](https://metacpan.org/pod/IO%3A%3AAsync) module they are now loaded asynchronously, which gives a large performance gain.

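On my system (8 cores, SSD), running `verify_bag()` on a large 9 GB bag with 568 payload files gives these results:

    Version   user time   system time   run time    run time         throughput
              (CPU)       (CPU)         (wall)      (v0.71 = 100%)   (MB/s)
    v0.71     38.31 s     1.60 s        39.938 s    100%               230
    v0.81     25.48 s     1.68 s        27.1 s       67%               340
    v0.82     48.85 s     3.89 s        6.84 s       17%              1346

User and system times are CPU times summed over all cores, so when the work runs on several cores in parallel (v0.82) they can exceed the wall-clock run time.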
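
The percentage and MB/s columns appear to be derived from the wall-clock run time: throughput is roughly the bag size divided by the run time, and the percentage is each run time relative to v0.71. A small worked sketch; it assumes "9GB" means 9 × 1024 = 9216 MB, since the exact bag size is not given, so the computed values only approximately match the table.

```perl
use strict;
use warnings;

# Assumption: the bag is taken as 9 * 1024 = 9216 MB; the exact size is not stated.
my $bag_mb = 9 * 1024;

# wall-clock run times of verify_bag() from the table above, in seconds
my %run_time = ( 'v0.71' => 39.938, 'v0.81' => 27.1, 'v0.82' => 6.84 );

# throughput = size / wall-clock time
sub throughput_mb_s { my ($size_mb, $seconds) = @_; return $size_mb / $seconds }

# run time as a percentage of a baseline run time
sub relative_pct { my ($seconds, $baseline) = @_; return 100 * $seconds / $baseline }

for my $version (sort keys %run_time) {
    printf "%s: %6.0f MB/s, %3.0f%% of the v0.71 run time\n",
        $version,
        throughput_mb_s($bag_mb, $run_time{$version}),
        relative_pct($run_time{$version}, $run_time{'v0.71'});
}
```

For example, 39.938 s / 6.84 s ≈ 5.8, so v0.82 verifies this bag almost six times faster than v0.71.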

## How fast is [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast)?

It depends. On my system (8 cores, SSD) with a 38 MB bag of 48 payload files, the results for `verify_bag()` are:

                   Rate         Base         Fast
    Base         3.01/s           --         -21%
    Fast         3.80/s          26%           --

On the same system with a large 9 GB bag of 568 payload files, the results for `verify_bag()` are:

                 s/iter         Base         Fast
    Base           74.6           --          -9%
    Fast           68.3           9%           --

Still, you should measure which variant works best for your bags. In general, the default [Archive::BagIt](https://metacpan.org/pod/Archive%3A%3ABagIt) is fast enough.
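
To measure this yourself, here is a sketch using Perl's core Benchmark module, whose `cmpthese` prints a comparison chart in the style of the tables above. It assumes that Archive::BagIt::Fast takes the bag path in its constructor just like Archive::BagIt; `compare_verify` is a hypothetical helper, not part of either module.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical helper: time verify_bag() for several bag classes on the same
# bag directory and print a comparison chart. $classes maps a label to a
# class name, e.g. { Base => 'Archive::BagIt', Fast => 'Archive::BagIt::Fast' }.
sub compare_verify {
    my ($bag_dir, $count, $classes) = @_;
    my %code;
    for my $label (sort keys %$classes) {
        my $class = $classes->{$label};
        # load the module at run time unless the class is already defined
        unless ($class->can('new')) {
            (my $file = "$class.pm") =~ s{::}{/}g;
            require $file;
        }
        # build a fresh object per iteration, so the lazy parsing is timed too
        $code{$label} = sub { $class->new($bag_dir)->verify_bag() };
    }
    my $results = timethese($count, \%code, 'none');    # run silently
    cmpthese($results);                                 # print the chart
    return $results;
}

# Usage (assumes both classes accept the bag path as their only argument):
# compare_verify('/path/to/bag', 5,
#     { Base => 'Archive::BagIt', Fast => 'Archive::BagIt::Fast' });
```

Benchmark also accepts a negative count, meaning "run for at least that many CPU seconds", but for multi-gigabyte bags a small fixed count is usually more practical.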

## How to update an old bag of version v0.97 to v1.0?

You could try this:

    use Archive::BagIt;
    my $bag = Archive::BagIt->new($my_old_bag_filepath);
    $bag->load();     # parse the existing v0.97 bag
    $bag->store();    # write the bag back to disk with the current module
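
To check whether the upgrade took effect, you can look at the version declared in `bagit.txt`, which RFC 8493 requires to contain a `BagIt-Version: M.N` line. A minimal sketch; `bagit_version` is a hypothetical helper, not part of Archive::BagIt.

```perl
use strict;
use warnings;

# Hypothetical helper: return the BagIt-Version declared in
# $bag_dir/bagit.txt, or undef if no such line is found.
sub bagit_version {
    my ($bag_dir) = @_;
    my $file = "$bag_dir/bagit.txt";
    open(my $fh, '<:encoding(UTF-8)', $file) or die "cannot open $file: $!\n";
    while (my $line = <$fh>) {
        return $1 if $line =~ /^BagIt-Version:\s*(\d+\.\d+)/;
    }
    return;
}

# Usage: expected to print 0.97 before and 1.0 after load()/store()
# print bagit_version($my_old_bag_filepath), "\n";
```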

## How to create UTF-8 based paths under MS Windows?

For Windows versions before Windows 10: I have no idea, and suggestions for a portable solution are very welcome!
For Windows 10: Thanks to [https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686](https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/...
you have to enable UTF-8 support via 'System Administration' -> 'Region' -> 'Administrative' -> 'Region Settings' and set the flag 'Use Unicode UTF-8 for worldwide language support'.

Hint: The better way is to use only portable filenames. See [perlport](https://metacpan.org/pod/perlport) for details.
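
To find names that may cause trouble before building a bag, you could scan the payload directory. A minimal sketch, assuming "portable" means the POSIX portable filename character set (ASCII letters, digits, `.`, `_` and `-`); further rules from perlport, such as length limits, are not checked. `non_portable_names` is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: return all paths below $dir whose file or directory
# name contains a character outside the POSIX portable filename character set.
sub non_portable_names {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # inside find(), $_ is the basename of the current entry
            # (for the start directory itself it is '.')
            push @found, $File::Find::name if /[^A-Za-z0-9._-]/;
        },
        $dir,
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage:
# warn "non-portable name: $_\n" for non_portable_names("/path/to/bag/data");
```

Because the names are checked as raw bytes, any non-ASCII (for example UTF-8 encoded) name is reported as well.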

# BUGS

None known yet.

# THANKS

Thanks to Rob Schmidt <rjeschmi@gmail.com> for trusting me with the project and for your initial work!
I would also like to thank Patrick Hochstenbach and Russell McOrmond for their valuable and especially detailed advice!
And without the helpful, if sometimes rude, support of the IRC channel #perl I would have been stuck on a lot of problems.
Without the support of my colleagues at SLUB Dresden, the project would never have made it this far.

# SYNOPSIS

This module will hopefully help with the basic commands needed to create
and verify a bag. It supports BagIt 1.0 according to RFC 8493 ([https://tools.ietf.org/html/rfc8493](https://tools.ietf.org/html/rfc8493)).

To get started, you only need to know the following methods:

## read a BagIt

    use Archive::BagIt;

    # read in an existing bag:
    my $bag_dir = "/path/to/bag";
    my $bag = Archive::BagIt->new($bag_dir);

## construct a BagIt around a payload

    use Archive::BagIt;
    my $bag2 = Archive::BagIt->make_bag($bag_dir);

## verify a BagIt-dir

    use Archive::BagIt;

    # Validate a BagIt archive against its manifest
    my $bag3 = Archive::BagIt->new($bag_dir);
    my $is_valid1 = $bag3->verify_bag();

    # Validate a BagIt archive against its manifest, report all errors
    my $bag4 = Archive::BagIt->new($bag_dir);
    my $is_valid2 = $bag4->verify_bag( {report_all_errors => 1} );
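
In scripts you usually want to turn the result into a clear message. A minimal sketch that treats both a false return value and an exception from `verify_bag()` as failure, since it is an assumption here which of the two your installed version uses for an invalid bag; `check_bag` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: verify a bag object (e.g. from Archive::BagIt->new)
# and return (1, undef) on success or (0, $message) on failure.
sub check_bag {
    my ($bag) = @_;
    my $ok = eval { $bag->verify_bag({ report_all_errors => 1 }) };
    return (1, undef) if $ok;
    # $@ is empty if verify_bag() returned false without dying
    my $message = $@ ne '' ? $@ : "verify_bag() returned a false value\n";
    return (0, $message);
}

# Usage:
# my ($valid, $why) = check_bag(Archive::BagIt->new($bag_dir));
# die "invalid bag $bag_dir: $why" unless $valid;
```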

## read a BagIt-dir, change something, store

Because all methods operate lazily, make sure the bag is parsed \*BEFORE\* you modify it.
Otherwise it will be overwritten!

    use Archive::BagIt;
    my $bag5 = Archive::BagIt->new($bag_dir); # lazy, nothing has happened yet
    $bag5->load();  # parses $bag_dir and updates the object representation
    $bag5->store(); # writes the bag back to disk

# METHODS


