Datahub-Factory

L<Datahub::Factory> supports logging through L<Log::Log4perl>.

=head1 USE

=head2 Command line options

All commands share the following switches:

=over

=item --log_level -L [int]

Set the log level. Takes a numeric parameter. Supported levels are 1 (WARN),
2 (INFO) and 3 (DEBUG). The default is 1 (WARN).

=item --log_output

Select an output for the log messages. By default, messages are sent to STDERR
(pass STDERR as parameter). STDOUT (pass STDOUT as parameter) and a log file
are also supported.

=item --verbose -v

Set verbosity. Invoking the command with the --verbose, -v flag will render 
verbose output to the terminal.

=item --number -n [int]

Set the number of records to process. Invoking the transport command with the --number, -n flag processes the first [int] records instead of all records available at the data source.

=back

=head2 Available Commands

=head3 help COMMAND

Documentation about command line options.

=head3 L<transport OPTIONS|https://metacpan.org/pod/Datahub::Factory::Command::transport>

Fetch data from a local or remote source, convert the data to a target format and structure and export the data to a local or remote data sink.

=head3 L<index OPTIONS|https://metacpan.org/pod/Datahub::Factory::Command::index>

Fetch data from a local source and push it to an enterprise search engine in bulk. Currently, only Apache Solr (L<https://lucene.apache.org/solr/>) is supported.

=head1 CONFIGURATION

Pipelines are defined in configuration files that follow the INI structure expected by the L<Config::Simple> library. Every pipeline consists of four parts: a General block, an Importer block, a Fixer block and an Exporter block.

Examples can be found in L<https://github.com/thedatahub/Datahub-Factory-Pipelines>.

A simple example that pushes OAI data to a YAML output on STDOUT:

    [General]
    id_path = administrativeMetadata.recordWrap.recordID.0._

    [Importer]
    plugin = OAI

    [plugin_importer_OAI]
    endpoint = https://datahub.vlaamsekunstcollectie.be/oai
    handler = +Catmandu::Importer::OAI::Parser::lido
    metadata_prefix = oai_lido

    [Fixer]
    plugin = Fix

    [plugin_fixer_Fix]
    file_name = '/home/foobar/datahub.fix'

    [Exporter]
    plugin = YAML

    [plugin_exporter_YAML]

Note: The datahub.fix file is required, but can be left empty.
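The way each stage points at its own options section can be sketched in Perl. This is an illustration only, not the toolkit's own code: the C<plugin_E<lt>stageE<gt>_E<lt>nameE<gt>> naming pattern is inferred from sections such as C<[plugin_importer_OAI]> and C<[plugin_fixer_Fix]>, and both the C<%pipeline> hash and the C<plugin_options> helper are hypothetical names standing in for an already parsed configuration file.

```perl
use strict;
use warnings;

# Hypothetical in-memory form of a parsed pipeline: section name => options.
my %pipeline = (
    General             => { id_path => 'administrativeMetadata.recordWrap.recordID.0._' },
    Importer            => { plugin => 'OAI' },
    plugin_importer_OAI => {
        endpoint        => 'https://datahub.vlaamsekunstcollectie.be/oai',
        handler         => '+Catmandu::Importer::OAI::Parser::lido',
        metadata_prefix => 'oai_lido',
    },
    Fixer                => { plugin => 'Fix' },
    plugin_fixer_Fix     => { file_name => '/home/foobar/datahub.fix' },
    Exporter             => { plugin => 'YAML' },
    plugin_exporter_YAML => {},
);

# Return the options of the plugin selected for a stage
# ('Importer', 'Fixer' or 'Exporter').
# Assumption: the options live in a section named plugin_<stage>_<plugin>.
sub plugin_options {
    my ($config, $stage) = @_;
    my $plugin = $config->{$stage}{plugin}
        or die "No plugin configured for [$stage]\n";
    my $section = sprintf 'plugin_%s_%s', lc($stage), $plugin;
    return $config->{$section} // {};
}

# Print the option names found for each stage.
for my $stage (qw(Importer Fixer Exporter)) {
    my $options = plugin_options(\%pipeline, $stage);
    printf "%-8s %s\n", $stage, join ', ', sort keys %$options;
}
```

A plugin that needs no options, such as the YAML exporter here, still gets a section of its own, which is simply left empty.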

An example defining multiple fix transforms based on a context-dependent value:

    [General]
    id_path = 'administrativeMetadata.recordWrap.recordID.0._'

    [Importer]
    plugin = OAI

    [plugin_importer_OAI]
    # endpoint = 'http://collections.britishart.yale.edu/oaicatmuseum/OAIHandler'
    endpoint = https://datahub.vlaamsekunstcollectie.be/oai
    handler = +Catmandu::Importer::OAI::Parser::lido
    metadata_prefix = oai_lido

    [Fixer]
    plugin = Fix

    [plugin_fixer_Fix]
    condition_path = '_metadata.administrativeMetadata.0.recordWrap.recordSource.0.legalBodyName.0.appellationValue.0._'
    fixers = MSK, GRO

    [plugin_fixer_GRO]
    condition = 'Musea Brugge - Groeningemuseum'
    file_name = '/Users/foobar/groeninge.fix'
    
    [plugin_fixer_MSK]
    condition = 'Museum voor Schone Kunsten Gent'
    file_name = '/Users/foobar/msk.fix'

    [Exporter]
    plugin = YAML

    [plugin_exporter_YAML]

Note: C<condition_path> contains the Fix path to the node that holds the context-dependent value. The C<condition> parameter in each fixer contains the value against which the conditional check is performed.
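The conditional check can be pictured with a small, self-contained Perl sketch. It is not the toolkit's implementation: C<value_at> and C<select_fix_file> are hypothetical helpers, the path walker is a simplified stand-in for a real Fix path, and the record is shortened to a single made-up node (C<source.0.name>) instead of the full LIDO path.

```perl
use strict;
use warnings;

# Walk a dot-separated path through nested hashes and arrays.
# Numeric segments index arrays; other segments are hash keys.
sub value_at {
    my ($data, $path) = @_;
    for my $key (split /\./, $path) {
        if (ref $data eq 'ARRAY' && $key =~ /^\d+$/) {
            $data = $data->[$key];
        }
        elsif (ref $data eq 'HASH') {
            $data = $data->{$key};
        }
        else {
            return;    # the path does not exist in this record
        }
    }
    return $data;
}

# Return the fix file whose condition equals the value at condition_path,
# or nothing when no fixer matches.
sub select_fix_file {
    my ($record, $condition_path, $fixers) = @_;
    my $value = value_at($record, $condition_path);
    return unless defined $value;
    for my $fixer (@$fixers) {
        return $fixer->{file_name} if $fixer->{condition} eq $value;
    }
    return;
}

# The two fixers from the example configuration.
my @fixers = (
    { condition => 'Museum voor Schone Kunsten Gent', file_name => '/Users/foobar/msk.fix' },
    { condition => 'Musea Brugge - Groeningemuseum',  file_name => '/Users/foobar/groeninge.fix' },
);

# Hypothetical record, reduced to the one node the check needs.
my $record = { source => [ { name => 'Musea Brugge - Groeningemuseum' } ] };

my $file = select_fix_file($record, 'source.0.name', \@fixers);
print "$file\n";
```

For this record the sketch selects C</Users/foobar/groeninge.fix>. What the toolkit itself does with a record that matches none of the conditions is not covered here.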

=head1 API

Datahub::Factory has a plugin-based architecture, which makes it straightforward to extend the toolkit with new functionality.

New commands can be added by creating a separate Perl module that provides a C<command_name.pm> file in the C<lib/Datahub/Factory/Command> path.
Datahub::Factory uses the L<Datahub::Factory::Command> namespace and relies on L<App::Cmd> internally.

New L<Datahub::Factory::Importer>, L<Datahub::Factory::Exporter>, L<Datahub::Factory::Fixer> and L<Datahub::Factory::Indexer> plugins can be added in the same way.

=head1 AUTHORS

=over

=item Matthias Vandermaesen C<< matthias.vandermaesen@vlaamsekunstcollectie.be >>

=item Pieter De Praetere C<< pieter@packed.be >>

=back

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2016, 2019 by PACKED, vzw, Vlaamse Kunstcollectie, vzw.

This is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, Version 3, June 2007.

=cut


