Alvis-Convert


lib/Alvis/Buffer.pm

# $Id: Buffer.pm,v 1.1 2006/12/01 09:40:24 buntine Exp $

package Alvis::Buffer;

use strict;
use warnings;
use Time::Simple;

use encoding 'utf8';
use open ':utf8';
binmode STDIN, ":utf8";
binmode STDERR, ":utf8";

our $VERSION = '0.10';

=head1 NAME

Alvis::Buffer - Perl extension for buffering utilities for the Alvis pipeline

=head1 SYNOPSIS

 use Alvis::Buffer;
 $Buffer::BUFFER = "/tmp/building.xml";
 $Buffer::verbose++;
 &Buffer::fix() or die "Cannot Buffer::fix";
 $in = new Alvis::Pipeline::Read(host => "harvester.alvis.info",
                                 port => 16716,
                                 spooldir => "/home/alvis/spool");
 while ($xml = $in->read(1)) {
     &clean_wrapping(\$xml);
     &Buffer::add($xml);
     if ( $Buffer::docs>1000 ) {
        $filename = &Buffer::save();
        if ( !$filename ) {
           &Buffer::close();
           die "Cannot Buffer::save";
        }
     }
 }
 $filename = &Buffer::save();
 &Buffer::close();

=head1 DESCRIPTION

This module provides a way of buffering Alvis XML into manageable chunks
as it is read in from a pipeline (Alvis::Pipeline).
Chunks can be controlled by file size or document count, but this is
done externally to the module; the module simply provides a
function to save the current buffer contents.

Files of collected Alvis XML documents, with appropriate XML header
and footer parts, are saved in the relative directory "xml-add/"
under numbers 1,2,3, ...  Each time a batch is stored, the
directory is checked to see which number to use for the latest
batch.  If "xml-add/" is empty, then "xml/" is checked instead.
Presumably, files in "xml-add/" are being processed into "xml/".
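
The numbering rule above can be sketched as follows. This is only an
illustration, not the module's actual code: the helper names
highest_number() and next_batch_number() are hypothetical, and the sketch
assumes that batch files are named with bare numbers and no extension.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Alvis::Buffer: return the highest
# all-digit file name found in $dir, or 0 if the directory is missing
# or holds no numbered files.
sub highest_number {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 0;
    my $max = 0;
    for my $name (readdir $dh) {
        # Assumption: batches are stored as "1", "2", "3", ...
        next unless $name =~ /^(\d+)\z/;
        $max = $1 if $1 > $max;
    }
    closedir $dh;
    return $max;
}

# Hypothetical helper: choose the number for the next batch. The first
# directory (e.g. "xml-add") is consulted first; the second directory
# (e.g. "xml") is used only when the first has no numbered files.
sub next_batch_number {
    my ($add_dir, $done_dir) = @_;
    my $max = highest_number($add_dir) || highest_number($done_dir);
    return $max + 1;
}
```

Calling next_batch_number("xml-add", "xml") would then give the number
under which the next batch could be stored.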

The implementation is independent of any pipeline,
and assumes a number of fixed directories.
It also assumes that files are in UTF-8, and that documents are present
in elements named <documentRecord>.
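
As a rough illustration of the <documentRecord> assumption, a document
count for a chunk of XML text could be obtained as below. The helper
count_documents() is hypothetical and is not taken from this module; it
simply counts opening tags with a regular expression and does not parse
the XML.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Alvis::Buffer: count the opening
# <documentRecord> tags in a string of XML. The character class after
# the tag name keeps longer names such as <documentRecords> from matching.
sub count_documents {
    my ($xml) = @_;
    my $count = () = $xml =~ /<documentRecord[\s>\/]/g;
    return $count;
}
```

A caller could compare such a count against its own threshold to decide
when to save the buffer.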

=head1 FUNCTIONS

=head2 fix()

 &Buffer::fix() or die "Cannot Buffer::fix";

Performs basic initialisation and checks that the output buffer
is OK, so that the current document count and size are held in memory.
Returns 1 if everything is OK, else 0.


