Alvis-Pipeline


=head1 NAME

Alvis::Pipeline - Perl extension for passing XML documents along the Alvis pipeline

=head1 SYNOPSIS

 use Alvis::Pipeline;
 my $in = Alvis::Pipeline::Read->new(host => "harvester.alvis.info",
                                     port => 16716,
                                     spooldir => "/home/alvis/spool");
 my $out = Alvis::Pipeline::Write->new(port => 29168);
 # Blocking read: waits until the next document is available.
 while (my $xml = $in->read(1)) {
     my $transformed = process($xml);   # process() is supplied by the application
     $out->write($transformed);
 }

=head1 DESCRIPTION

This module provides a simple means for components in the Alvis
pipeline to pass documents between themselves without needing to know
about the underlying transfer protocol.  Pipe objects may be created
either for reading or writing; components in the middle of the
pipeline will create one of each.  Pipes support exactly one method,
which is either C<read()> or C<write()> depending on the type of the
pipe.  The granularity of reading and writing is
the XML document; neither smaller fragments nor larger aggregates can
be transferred.

The documents expected to pass through this pipeline are those
representing documents acquired for, and being analysed by, Alvis.
These documents are expressed as XML constructed according to the
specifications described in the Metadata Format for Enriched
Documents.  However, while this is the motivating example pipeline
that led to the creation of this module, there is no reason why other
kinds of documents should not also be passed through a pipeline using
this software.

The pipeline protocol is described below, to facilitate the
development of independent implementations in other languages.

=head1 METHODS

=head2 new()

 my $in = Alvis::Pipeline::Read->new(host => "harvester.alvis.info",
                                     port => 16716,
                                     spooldir => "/home/alvis/spool");
 my $out = Alvis::Pipeline::Write->new(port => 29168);

Creates a new pipeline, either for reading or for writing.  Any number
of I<name>-I<value> pairs may be passed as parameters.  Among these,
most are optional but some are mandatory:

=over 4

=item *

Read-pipes must specify both the C<host> and C<port> of the component
that they will read from, and C<spooldir>,
a directory that is writable by the user the process is running as.
(When documents become available by being written down a write-pipe,
they are immediately read in the background, then stored in the
specified spool directory until picked up by a reader.)

=item *

Pipes may specify C<loglevel> [default 0]: higher levels
provide some commentary on under-the-hood behaviour.

=back

=head2 option()

 $old = $pipe->option("foo");
 $pipe->option(bar => 23);

Can be used to set the value for a specific option, or to retrieve its
value.

=head2 read()

 # Read-pipes only
 $xml = $in->read($block);

Reads an XML document from the specified inbound pipe, and returns it
as a string.  If there is no document ready to read, it
either returns an undefined value (if no argument is provided, or if
the argument is false) or blocks if the argument is provided and true.
C<read()> throws an exception if an error occurs.

Once a document has been read in this way, it will no longer be
available for subsequent C<read()>s, so a sequence of C<read()> calls
will read all the available records one at a time.
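
The non-blocking form of C<read()> lends itself to a loop that drains
whatever is currently ready. The following is a minimal sketch and not
part of the module: C<drain_pipe()> and the C<StubReadPipe> class are
hypothetical names introduced here. The stub only imitates the
documented behaviour of C<read()> (a document string, or an undefined
value when nothing is ready) so that the example runs without a live
pipeline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a read-pipe: read() hands out queued documents
# one at a time and returns undef once nothing is left.
package StubReadPipe;
sub new  { my ($class, @docs) = @_; return bless { queue => [@docs] }, $class }
sub read { my ($self, $block) = @_; return shift @{ $self->{queue} } }

package main;

# Collect every document that is ready now, without blocking.
# An exception thrown by read() is caught and returned to the caller.
sub drain_pipe {
    my ($in) = @_;
    my @docs;
    my $ok = eval {
        while (defined(my $xml = $in->read(0))) {
            push @docs, $xml;
        }
        1;
    };
    my $error = $ok ? undef : ($@ || "unknown error");
    return (\@docs, $error);
}

my $in = StubReadPipe->new("<doc id='1'/>", "<doc id='2'/>");
my ($docs, $error) = drain_pipe($in);
print scalar(@$docs), " document(s) read\n";
```

With a real read-pipe, the same function would presumably be called
with the object returned by the constructor shown under C<new()>.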

Once a document has been read, it is the responsibility of the reader
to process it and pass it on to the next component in the pipeline.
If something catastrophic happens, and the record is lost, then an
out-of-band mechanism may be used to request a new copy of the record
from the writer.  The C<Alvis::Pipeline> module does not directly
support such requests; they are considered to be application-level and
therefore not appropriate for this low-level module to deal with.

(As a matter of application design, we offer the observation that, in
Alvis, the C<id> attribute on the top-level element specifies the
identity of the record, and should remain unchanged even if the record
itself is updated; so any out-of-band request for records to be
re-sent should do so by specifying the IDs of the required records.)

=head2 write()

 # Write-pipes only
 $out->write($xmlDocument);

Writes an XML document to the specified outbound pipe.  The document
may be passed in either as a DOM tree (C<XML::LibXML::Element>) or a
string containing the text of the document.  Throws an exception if an
error occurs.

This method returns only when the record has been successfully
transferred to the receiver at the other end of the pipeline; so the
sender is then able to forget about the transferred record, which is now the
responsibility of the next component in the pipeline.
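
Because C<write()> either returns after a successful transfer or throws
an exception, a caller can use C<eval> to decide whether it is still
responsible for a document. The following is a minimal sketch under
that reading: C<hand_over()> and the C<StubWritePipe> class are
hypothetical names introduced here, and the stub only imitates the
documented behaviour so that the example runs without a live pipeline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a write-pipe: write() either records the
# document or dies, imitating the documented exception on failure.
package StubWritePipe;
sub new {
    my ($class, %opts) = @_;
    return bless { sent => [], fail => $opts{fail} }, $class;
}
sub write {
    my ($self, $doc) = @_;
    die "transfer failed\n" if $self->{fail};
    push @{ $self->{sent} }, $doc;
    return;
}

package main;

# Returns 1 once write() has returned normally, meaning the next component
# has taken over the document; returns 0 and warns if write() threw.
sub hand_over {
    my ($out, $doc) = @_;
    my $ok = eval { $out->write($doc); 1 };
    warn "write failed: $@" unless $ok;
    return $ok ? 1 : 0;
}

my $good = StubWritePipe->new;
my $bad  = StubWritePipe->new(fail => 1);
print "delivered\n"      if     hand_over($good, "<doc id='1'/>");
print "kept for retry\n" unless hand_over($bad,  "<doc id='2'/>");
```

What to do with a document whose transfer failed (retry, log, discard)
is an application-level decision that this sketch leaves open.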


