Catmandu-OAI


lib/Catmandu/Importer/OAI.pm

    }
    elsif ($self->listIdentifiers) {
        return $self->_list_records;
    }
    elsif ($self->listSets) {
        return $self->_list_sets;
    }
    elsif ($self->getRecord) {
        return $self->_get_record;
    }
    elsif ($self->listMetadataFormats) {
        return $self->_list_metadata_formats;
    }
    elsif ($self->listRecords) {
        return $self->_list_records;
    }
    else {
        return $self->_list_records;
    }
}

sub generator {
    my ($self) = @_;

    return $self->dry ? $self->dry_run : $self->oai_run;
}

1;
__END__

=head1 NAME

Catmandu::Importer::OAI - Package that imports OAI-PMH feeds

=head1 SYNOPSIS

    # From the command line

    # Harvest records
    $ catmandu convert OAI --url http://myrepo.org/oai
    $ catmandu convert OAI --url http://myrepo.org/oai --metadataPrefix didl --handler raw

    # Harvest repository description
    $ catmandu convert OAI --url http://myrepo.org/oai --identify 1

    # Harvest identifiers
    $ catmandu convert OAI --url http://myrepo.org/oai --listIdentifiers 1

    # Harvest sets
    $ catmandu convert OAI --url http://myrepo.org/oai --listSets 1

    # Harvest metadataFormats
    $ catmandu convert OAI --url http://myrepo.org/oai --listMetadataFormats 1

    # Harvest one record
    $ catmandu convert OAI --url http://myrepo.org/oai --getRecord 1 --identifier oai:myrepo:1234

=head1 DESCRIPTION

L<Catmandu::Importer::OAI> is a L<Catmandu> importer to harvest metadata records
from an OAI-PMH endpoint.

=head1 CONFIGURATION

=over

=item url

OAI-PMH Base URL.

=item metadataPrefix

Metadata prefix to specify the metadata format. Set to C<oai_dc> by default.

=item handler( sub {} | $object | 'NAME' | '+NAME' )

Handler to transform each record from XML DOM (L<XML::LibXML::Element>) into
a Perl hash.

A handler can be provided as a function reference, as an instance of a Perl
package that implements C<parse>, or as a package NAME. A NAME is resolved
within the C<Catmandu::Importer::OAI::Parser> namespace unless it is prefixed
with C<+>, in which case it is used as a full package name. E.g.
C<foobar> will create a C<Catmandu::Importer::OAI::Parser::foobar> instance.

By default the handler L<Catmandu::Importer::OAI::Parser::oai_dc> is used for
metadataPrefix C<oai_dc>,  L<Catmandu::Importer::OAI::Parser::marcxml> for
C<marcxml>, L<Catmandu::Importer::OAI::Parser::mods> for
C<mods>, and L<Catmandu::Importer::OAI::Parser::struct> for other formats.
In addition there is L<Catmandu::Importer::OAI::Parser::raw> to return the XML
as it is.
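
The naming rule above can be sketched in plain Perl. This is only an
illustration of the documented rule, not the module's own code:
C<resolve_handler_package> and C<My::Custom::Parser> are hypothetical names
introduced for the example.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented naming rule;
# it is not part of Catmandu::Importer::OAI.
sub resolve_handler_package {
    my ($name) = @_;

    # '+NAME' is taken as a full package name.
    return $1 if $name =~ /^\+(.+)$/;

    # A bare NAME is looked up below the parser namespace.
    return "Catmandu::Importer::OAI::Parser::$name";
}

print resolve_handler_package('raw'), "\n";
print resolve_handler_package('+My::Custom::Parser'), "\n";
```

With this rule, C<--handler raw> selects one of the bundled parsers, while a
leading C<+> lets a parser live in any namespace.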

=item identifier

Optional. Return only results for this particular identifier.

=item set

An optional set for selective harvesting.

=item from

An optional datetime value (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as lower bound
for datestamp-based selective harvesting.

=item until

An optional datetime value (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as upper bound
for datestamp-based selective harvesting.
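
Values for C<from> and C<until> can be produced from epoch times with core
Perl. The following is a minimal sketch; C<oai_datestamp> and
C<looks_like_oai_datestamp> are hypothetical helpers, not functions of this
module. It assumes that both bounds are given in UTC and in the same
granularity.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time as a UTC datestamp,
# either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.
sub oai_datestamp {
    my ($epoch, $with_time) = @_;
    my $format = $with_time ? '%Y-%m-%dT%H:%M:%SZ' : '%Y-%m-%d';
    return strftime($format, gmtime($epoch));
}

# Hypothetical helper: check that a string has one of the two accepted shapes.
sub looks_like_oai_datestamp {
    my ($value) = @_;
    return $value =~ /\A\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}Z)?\z/ ? 1 : 0;
}

# Harvest window covering the last seven days, at day granularity.
my $now   = time;
my $from  = oai_datestamp($now - 7 * 86_400);
my $until = oai_datestamp($now);
print "--from $from --until $until\n";
```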

=item identify

Harvest the repository description instead of all records.

=item getRecord

Harvest one record instead of all records.

=item listIdentifiers

Harvest identifiers instead of full records.

=item listRecords

Harvest full records. Default operation.

=item listSets

Harvest sets instead of records.

=item listMetadataFormats

Harvest the metadata formats of records.

=item resumptionToken

An optional resumptionToken to start harvesting from.

=item dry

Don't do any HTTP requests but return URLs that data would be queried from.
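
To give an idea of what such a URL looks like, the sketch below assembles an
OAI-PMH request URL by hand. C<oai_request_url> is a hypothetical helper; the
URLs actually reported in dry mode are built by the underlying HTTP client
and may differ in parameter order and escaping. The example assumes ASCII
parameter values.

```perl
use strict;
use warnings;

# Hypothetical helper: append OAI-PMH arguments to a base URL as a query string.
sub oai_request_url {
    my ($base_url, %args) = @_;
    my @pairs;
    for my $key (sort keys %args) {
        next unless defined $args{$key};

        # Percent-encode everything except unreserved characters (ASCII input assumed).
        (my $value = $args{$key}) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
        push @pairs, "$key=$value";
    }
    return $base_url . '?' . join('&', @pairs);
}

print oai_request_url(
    'http://myrepo.org/oai',
    verb           => 'ListRecords',
    metadataPrefix => 'oai_dc',
), "\n";
```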

=item strict

Optionally validate all parameters against the OAI 2 specification before
sending them to an OAI server. Default: undef.

=item xslt

Preprocess XML records with XSLT script(s) given as comma separated list or
array reference. Requires L<Catmandu::XML>.

=item max_retries

When an OAI request fails, the importer will retry this number of times.
Set to '0' by default.

Internally, an exponential backoff algorithm is used. After every failed
request the importer chooses a random number between 0 (included) and
2^n (excluded), where n is the number of the retry, and waits that number
of seconds. The total amount of time before the importer gives up can
therefore differ between runs:

 first retry:
    wait [ 0..2^1 [ seconds
 second retry:
    wait [ 0..2^2 [ seconds
 third retry:
    wait [ 0..2^3 [ seconds

 ..
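
The waiting scheme described above can be sketched as follows. This is not
the module's internal code: C<backoff_delay> and C<with_retries> are
hypothetical names, and the sketch assumes whole seconds and one initial
attempt followed by at most C<max_retries> retries.

```perl
use strict;
use warnings;

# Hypothetical helper: whole seconds in [0, 2**$attempt).
sub backoff_delay {
    my ($attempt) = @_;
    return int(rand(2**$attempt));
}

# Hypothetical retry loop. $request is a code reference returning true on
# success; $sleeper defaults to the built-in sleep and can be replaced in tests.
sub with_retries {
    my ($request, $max_retries, $sleeper) = @_;
    $sleeper //= sub { sleep $_[0] };

    return 1 if $request->();    # initial attempt
    for my $attempt (1 .. $max_retries) {
        $sleeper->(backoff_delay($attempt));
        return 1 if $request->();
    }
    return 0;                    # all retries used up
}

# Demo: a request that succeeds on its third call; no real sleeping.
my $calls = 0;
my $ok = with_retries(sub { ++$calls >= 3 }, 5, sub { });
print "succeeded after $calls calls\n" if $ok;
```

Because each delay is drawn at random, the worst-case total wait for C<n>
retries in this sketch stays below C<2^1 + 2^2 + ... + 2^n> seconds, and the
typical wait is considerably shorter.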

=item sleep

Sleep a number of seconds between OAI-PMH calls to the endpoint (default 0).

=item realm

An optional realm value. This value is used when the importer harvests from a
repository which is secured with basic authentication through Integrated Windows
Authentication (NTLM or Kerberos).

=item username

An optional username value. This value is used when the importer harvests from a
repository which is secured with basic authentication.

=item password

An optional password value. This value is used when the importer harvests from a
repository which is secured with basic authentication.
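
For reference, HTTP Basic authentication transmits the username and password
as a Base64-encoded C<username:password> pair in the C<Authorization> header.
The sketch below shows the scheme itself, using only core modules; it does
not show how the importer passes credentials to its HTTP client, and
C<basic_auth_header> is a hypothetical helper.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper showing the Basic scheme itself.
sub basic_auth_header {
    my ($username, $password) = @_;

    # The empty second argument suppresses the trailing newline.
    return 'Basic ' . encode_base64("$username:$password", '');
}

print 'Authorization: ', basic_auth_header('Aladdin', 'open sesame'), "\n";
```

Since Base64 is an encoding and not encryption, these options are best used
with an HTTPS base URL.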

=back

=head1 METHOD

Every Catmandu::Importer is a L<Catmandu::Iterable>, so all its methods are
inherited. The Catmandu::Importer::OAI methods are not idempotent: OAI-PMH
feeds can only be read once.

In addition to methods inherited from L<Catmandu::Iterable>, this module
provides the following public methods:

=head2 handle_record( $dom )

Process an XML DOM with the configured xslt and handler, and return the
result.

=head1 ENVIRONMENT

If you are connected to the internet via a proxy server, you need to set the
address of this proxy in your environment:

    export http_proxy="http://localhost:8080"

If you are connecting to an HTTPS server and don't want to verify the validity
of the peer's certificates, you can set PERL_LWP_SSL_VERIFY_HOSTNAME to
false in your environment. This may be required to connect to broken SSL servers:

    export PERL_LWP_SSL_VERIFY_HOSTNAME=0
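
The same variables can also be set from within a Perl script. A minimal
sketch, assuming they must be in place before the importer creates its HTTP
client; the proxy address is the placeholder used above.

```perl
use strict;
use warnings;

# Set the variables at compile time so that they are in place before any
# module loaded later in the script creates an HTTP client.
BEGIN {
    $ENV{http_proxy}                   = 'http://localhost:8080';
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;    # disables hostname verification
}

print "proxy: $ENV{http_proxy}\n";
```

Values set this way apply to the running process and to any child processes
it starts, not to the calling shell.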

=head1 SEE ALSO

L<Catmandu>,
L<Catmandu::Importer>

=head1 AUTHOR

Nicolas Steenlant, C<< <nicolas.steenlant at ugent.be> >>

=head1 CONTRIBUTOR

Patrick Hochstenbach, C<< <patrick.hochstenbach at ugent.be> >>

Jakob Voss, C<< <nichtich at cpan.org> >>


