Catmandu-OAI
lib/Catmandu/Importer/OAI.pm
    }
    elsif ($self->listIdentifiers) {
        # listIdentifiers is dispatched to _list_records as well
        return $self->_list_records;
    }
    elsif ($self->listSets) {
        return $self->_list_sets;
    }
    elsif ($self->getRecord) {
        return $self->_get_record;
    }
    elsif ($self->listMetadataFormats) {
        return $self->_list_metadata_formats;
    }
    elsif ($self->listRecords) {
        return $self->_list_records;
    }
    else {
        # listRecords is the default operation
        return $self->_list_records;
    }
}

sub generator {
    my ($self) = @_;
    # With the dry option, return the request URLs instead of harvesting
    return $self->dry ? $self->dry_run : $self->oai_run;
}
1;
__END__
=head1 NAME
Catmandu::Importer::OAI - Package that imports OAI-PMH feeds
=head1 SYNOPSIS
    # From the command line

    # Harvest records
    $ catmandu convert OAI --url http://myrepo.org/oai
    $ catmandu convert OAI --url http://myrepo.org/oai --metadataPrefix didl --handler raw

    # Harvest repository description
    $ catmandu convert OAI --url http://myrepo.org/oai --identify 1

    # Harvest identifiers
    $ catmandu convert OAI --url http://myrepo.org/oai --listIdentifiers 1

    # Harvest sets
    $ catmandu convert OAI --url http://myrepo.org/oai --listSets 1

    # Harvest metadataFormats
    $ catmandu convert OAI --url http://myrepo.org/oai --listMetadataFormats 1

    # Harvest one record
    $ catmandu convert OAI --url http://myrepo.org/oai --getRecord 1 --identifier oai:myrepo:1234
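The same harvest can also be started from Perl. The following is a minimal sketch, not taken from the module's own documentation: it assumes L<Catmandu> and this importer are installed, reuses the placeholder URL from the examples above, and uses the C<from> and C<dry> options described under CONFIGURATION. The seven-day window is an arbitrary choice for illustration, and the exact shape of the items returned in dry mode is not asserted here.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Data::Dumper;
use Catmandu;

# Lower bound for selective harvesting: seven days ago, in UTC,
# formatted as YYYY-MM-DD (one of the two formats accepted by "from").
my $from = strftime('%Y-%m-%d', gmtime(time - 7 * 24 * 60 * 60));

my $importer = Catmandu->importer(
    'OAI',
    url            => 'http://myrepo.org/oai',  # placeholder endpoint
    metadataPrefix => 'oai_dc',
    from           => $from,
    dry            => 1,  # no HTTP requests, only report what would be queried
);

# Every Catmandu importer is iterable; dump whatever each item contains.
$importer->each(sub {
    my ($item) = @_;
    print Dumper($item);
});
```

Dropping C<dry> turns this into a real harvest against the endpoint, so it should only be done with a URL you are allowed to query.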
=head1 DESCRIPTION
L<Catmandu::Importer::OAI> is a L<Catmandu> importer to harvest metadata records
from an OAI-PMH endpoint.
=head1 CONFIGURATION
=over
=item url
OAI-PMH Base URL.
=item metadataPrefix
Metadata prefix to specify the metadata format. Set to C<oai_dc> by default.
=item handler( sub {} | $object | 'NAME' | '+NAME' )
Handler to transform each record from XML DOM (L<XML::LibXML::Element>) into a
Perl hash.
Handlers can be provided as a function reference, as an instance of a Perl
package that implements C<parse>, or as a package NAME. A NAME is resolved
within the C<Catmandu::Importer::OAI::Parser> namespace unless it is prefixed
with C<+>, which marks a full package name. E.g. C<foobar> will create a
C<Catmandu::Importer::OAI::Parser::foobar> instance.
By default the handler L<Catmandu::Importer::OAI::Parser::oai_dc> is used for
metadataPrefix C<oai_dc>, L<Catmandu::Importer::OAI::Parser::marcxml> for
C<marcxml>, L<Catmandu::Importer::OAI::Parser::mods> for
C<mods>, and L<Catmandu::Importer::OAI::Parser::struct> for other formats.
In addition there is L<Catmandu::Importer::OAI::Parser::raw> to return the XML
as it is.
=item identifier
An optional identifier. Only results for this particular identifier are returned.
=item set
An optional set for selective harvesting.
=item from
An optional datetime value (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as lower bound
for datestamp-based selective harvesting.
=item until
An optional datetime value (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as upper bound
for datestamp-based selective harvesting.
=item identify
Harvest the repository description instead of all records.
=item getRecord
Harvest one record instead of all records.
=item listIdentifiers
Harvest identifiers instead of full records.
=item listRecords
Harvest full records. Default operation.
=item listSets
Harvest sets instead of records.
=item listMetadataFormats
Harvest metadata formats instead of records.
=item resumptionToken
An optional resumptionToken to start harvesting from.
=item dry
Don't do any HTTP requests but return URLs that data would be queried from.
=item strict
Optionally validate all parameters against the OAI 2 specifications before
sending them to an OAI server. Default: undef.
=item xslt
Preprocess XML records with XSLT script(s) given as comma separated list or
array reference. Requires L<Catmandu::XML>.
=item max_retries
When an OAI request fails, the importer will retry this number of times.
Set to '0' by default.
Internally the exponential backoff algorithm is used for this. This means
that after every failed request the importer will choose a random number
between 0 and 2^collision (excluded), where collision counts the failed
requests so far, and wait that number of seconds. So the actual amount of
time before the importer stops can differ:
    first retry:
        wait [ 0..2^1 [ seconds
    second retry:
        wait [ 0..2^2 [ seconds
    third retry:
        wait [ 0..2^3 [ seconds
    ..
=item sleep
Sleep a number of seconds between OAI-PMH calls to the endpoint (default 0).
=item realm
An optional realm value. This value is used when the importer harvests from a
repository which is secured with basic authentication through Integrated Windows
Authentication (NTLM or Kerberos).
=item username
An optional username value. This value is used when the importer harvests from a
repository which is secured with basic authentication.
=item password
An optional password value. This value is used when the importer harvests from a
repository which is secured with basic authentication.
=back
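The exponential backoff described under C<max_retries> can be sketched in plain Perl as follows. This is an illustration of the wait windows only, not the importer's actual code: C<backoff_window> and C<backoff_seconds> are hypothetical helper names, and drawing a whole number of seconds is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: exclusive upper bound of the wait window for the
# n-th retry, i.e. 2^n for the window "[ 0..2^n [".
sub backoff_window {
    my ($retry) = @_;    # 1 for the first retry, 2 for the second, ...
    return 2**$retry;
}

# Hypothetical helper: random whole number of seconds in [0, 2^n).
# Assumption: whole seconds; the importer may pick its value differently.
sub backoff_seconds {
    my ($retry) = @_;
    return int(rand(backoff_window($retry)));
}

my $max_retries = 3;
for my $retry (1 .. $max_retries) {
    my $wait = backoff_seconds($retry);
    printf "retry %d: wait %d s, window [0, %d)\n",
        $retry, $wait, backoff_window($retry);
    # A real harvester would sleep $wait seconds here and repeat the request.
}
```

With C<max_retries> set to 3, the total waiting time in this sketch lies anywhere between 0 and 1 + 3 + 7 = 11 seconds, which is why the time before the importer gives up is not fixed.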
=head1 METHOD
Every L<Catmandu::Importer> is a L<Catmandu::Iterable>, so all its methods are
inherited. The Catmandu::Importer::OAI methods are not idempotent: OAI-PMH
feeds can only be read once.
In addition to methods inherited from L<Catmandu::Iterable>, this module
provides the following public methods:
=head2 handle_record( $dom )
Process an XML DOM with xslt and handler as configured and return the
result.
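A minimal sketch of C<handle_record> combined with a function-reference handler, run without any HTTP request. It assumes L<XML::LibXML> is installed, that constructing the importer does not contact the server, and that the handler receives the DOM node passed to C<handle_record>. The XML snippet and the C<title> key are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use Catmandu::Importer::OAI;

my $importer = Catmandu::Importer::OAI->new(
    url     => 'http://myrepo.org/oai',    # placeholder endpoint, never queried here
    handler => sub {
        my ($dom) = @_;
        # Pick the text of the first element named "title", ignoring namespaces.
        return { title => $dom->findvalue('.//*[local-name()="title"]') };
    },
);

# Made-up record fragment standing in for harvested metadata.
my $dom = XML::LibXML->load_xml(
    string => '<record><title>Hello</title></record>'
)->documentElement;

my $record = $importer->handle_record($dom);
print $record->{title}, "\n";
```

A handler written as a package with a C<parse> method would be exercised the same way, which makes C<handle_record> a convenient way to test a handler against stored XML.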
=head1 ENVIRONMENT
If you are connected to the internet via a proxy server you need to set the
address of this proxy in your environment:
    export http_proxy="http://localhost:8080"
If you are connecting to an HTTPS server and don't want to verify the validity
of the peer's certificates, you can set PERL_LWP_SSL_VERIFY_HOSTNAME to
0 in your environment. This may be required to connect to broken SSL servers:
    export PERL_LWP_SSL_VERIFY_HOSTNAME=0
=head1 SEE ALSO
L<Catmandu>,
L<Catmandu::Importer>
=head1 AUTHOR
Nicolas Steenlant, C<< <nicolas.steenlant at ugent.be> >>
=head1 CONTRIBUTOR
Patrick Hochstenbach, C<< <patrick.hochstenbach at ugent.be> >>
Jakob Voss, C<< <nichtich at cpan.org> >>