Datahub-Factory
lib/Datahub/Factory/Command/transport.pm
=item C<--exporter>
Location of the exporter configuration file.
=item C<--verbose>
Set this flag for pretty output of the ETL processing.
=back
=head2 Pipeline configuration file
The I<pipeline configuration file> is in the L<INI format|http://search.cpan.org/~sherzodr/Config-Simple-4.59/Simple.pm#INI-FILE> and its location is
provided to the application using the C<--pipeline> switch.
The file is broadly divided into two parts: the first (shortest) part configures
the pipeline itself and sets the plugins to use for the I<import>, I<fix> and
I<export> actions. The second part sets options specific to the plugins used.
=head3 Pipeline configuration
This part has three sections: C<[Importer]>, C<[Fixer]> and C<[Exporter]>.
Every section has a C<plugin> option; set it to the plugin you
want to use for that action. The C<[Importer]> section additionally takes
the C<id_path> option described under I<Importer plugin>.
All currently supported plugins are in the C<Importer> and C<Exporter> folders.
For the C<[Fixer]>, only the I<Fix> plugin is supported.
Supported I<Importer> plugins:
=over
=item L<TMS|Datahub::Factory::Importer::TMS>
=item L<Adlib|Datahub::Factory::Importer::Adlib>
=item L<OAI|Datahub::Factory::Importer::OAI>
=back
Supported I<Exporter> plugins:
=over
=item L<Datahub|Datahub::Factory::Exporter::Datahub>
=item L<LIDO|Datahub::Factory::Exporter::LIDO>
=item L<YAML|Datahub::Factory::Exporter::YAML>
=back
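As a minimal sketch, the first part of a pipeline file that imports from TMS and exports LIDO could look like the fragment below. The plugin choice is arbitrary and the C<id_path> value is a hypothetical placeholder. The plugin-specific sections that must accompany it are left out here.

```ini
; Part one: choose one plugin per action.
[Importer]
plugin = TMS
; Hypothetical path; use the identifier path of your own records.
id_path = 'record.id'

[Fixer]
plugin = Fix

[Exporter]
plugin = LIDO
```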
=head3 Plugin configuration
  [Importer]
  plugin = OAI
  id_path = 'lidoRecID.0._'
  [plugin_importer_OAI]
  endpoint = https://oai.my.museum/oai
  [Fixer]
  plugin = Fix
  [plugin_fixer_Fix]
  file_name = '/home/datahub/my.fix'
  [Exporter]
  plugin = YAML
  [plugin_exporter_YAML]
Every plugin has its own configuration section called
C<[plugin_type_name]>, where C<type> is I<importer>, I<exporter>
or I<fixer> and C<name> is the name of the plugin.
Each plugin defines its own options; every parameter the plugin accepts
is a valid item in its configuration section.
If a plugin requires no options, you still need to create the (empty)
configuration section (e.g. C<[plugin_exporter_YAML]> in the above
example).
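The naming rule can be illustrated with a different exporter. The fragment below assumes, purely for illustration, that the I<LIDO> exporter needs no options; the section name is built as C<plugin_> + type + C<_> + plugin name, and the section is present even though it is empty.

```ini
[Exporter]
plugin = LIDO

; Assumed to take no options; the empty section is still required.
[plugin_exporter_LIDO]
```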
=head4 Importer plugin
The C<id_path> option contains the path (in Fix syntax) of the identifier of
each record in your data after the fix has been applied, but before it is
submitted to the I<Exporter>. It is used for reporting and logging.
=head4 Fixer plugin
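  ; Either: select a fix file per record (see "Conditional fixers")
  [plugin_fixer_Fix]
  condition = record.institution_name
  fixers = FOO, BAR
  ; Or: load a single fix file for every record
  [plugin_fixer_Fix]
  file_name = /home/datahub/my.fix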
The C<[plugin_fixer_Fix]> section can either load a single fix file directly
(via the C<file_name> option) or select a fix file per record, so that
multiple fix files can serve the same data stream (e.g.
when two institutions with different data models use the same API
endpoint). The second mode is enabled by setting the C<condition> and C<fixers>
options.
=head4 Conditional fixers
  [plugin_fixer_Fix]
  condition = record.institution_name
  fixers = FOO, BAR
  [plugin_fixer_FOO]
  condition = 'Museum of Foo'
  file_name = '/home/datahub/foo.fix'
  [plugin_fixer_BAR]
  condition = 'Museum of Bar'
  file_name = '/home/datahub/bar.fix'
If you want to separate the data stream into multiple (smaller) streams with
a different fix file for each stream, you can do this by setting the appropriate
options in the C<[plugin_fixer_Fix]> block. Note that C<id_path> is still mandatory.
Set C<condition> to the Fix-compatible path in the original stream that holds
the condition you want to use to split the stream.
Provide a comma-separated list of fixer plugins in C<fixers>.
For every fixer plugin in C<fixers>, create a configuration block called
C<[plugin_fixer_name]> and provide the following options:
=over
=item C<condition>
The value that the C<condition> from C<[plugin_fixer_Fix]> must have for
the record to belong to this block.
=item C<file_name>
The location of the fix file that must be executed for every record in this
block.
=back
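To show how the conditional blocks sit inside a complete pipeline file, the sketch below combines them with the OAI importer and YAML exporter sections used earlier in this document. The combination itself is an assumption made for illustration. How records matching neither condition are handled is not covered here.

```ini
[Importer]
plugin = OAI
id_path = 'lidoRecID.0._'

[plugin_importer_OAI]
endpoint = https://oai.my.museum/oai

[Fixer]
plugin = Fix

; Route each record by the value found at record.institution_name.
[plugin_fixer_Fix]
condition = record.institution_name
fixers = FOO, BAR

[plugin_fixer_FOO]
condition = 'Museum of Foo'
file_name = '/home/datahub/foo.fix'

[plugin_fixer_BAR]
condition = 'Museum of Bar'
file_name = '/home/datahub/bar.fix'

[Exporter]
plugin = YAML

[plugin_exporter_YAML]
```

A record whose C<record.institution_name> is C<Museum of Foo> would be processed with F</home/datahub/foo.fix>, and one from C<Museum of Bar> with F</home/datahub/bar.fix>.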
=head4 Example configuration file
  [Importer]
  plugin = Adlib
  id_path = 'record.id'
  [Fixer]
  plugin = Fix
  [Exporter]
  plugin = Datahub
  [plugin_importer_Adlib]
  file_name = '/tmp/adlib.xml'
  data_path = 'recordList.record.*'
  [plugin_fixer_Fix]
  file_name = '/tmp/msk.fix'