Alvis-Convert
lib/Alvis/Buffer.pm
# $Id: Buffer.pm,v 1.1 2006/12/01 09:40:24 buntine Exp $
package Alvis::Buffer;
use strict;
use warnings;
use Time::Simple;
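use utf8;                  # source is UTF-8 ('use encoding' is deprecated and unsupported in recent perls)
binmode STDOUT, ":utf8";   # 'use encoding' also put this layer on STDOUT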
use open ':utf8';
binmode STDIN, ":utf8";
binmode STDERR, ":utf8";
our $VERSION = '0.10';

=head1 NAME

Alvis::Buffer - buffering utilities for the Alvis pipeline

=head1 SYNOPSIS

  use Alvis::Pipeline;
  use Alvis::Buffer;

  $Alvis::Buffer::BUFFER = "/tmp/building.xml";
  $Alvis::Buffer::verbose++;
  &Alvis::Buffer::fix() or die "Cannot Alvis::Buffer::fix";

  my $in = Alvis::Pipeline::Read->new(host     => "harvester.alvis.info",
                                      port     => 16716,
                                      spooldir => "/home/alvis/spool");
  my $filename;
  while (my $xml = $in->read(1)) {
      &clean_wrapping(\$xml);
      &Alvis::Buffer::add($xml);
      # Save a chunk once more than 1000 documents are buffered.
      if ( $Alvis::Buffer::docs > 1000 ) {
          $filename = &Alvis::Buffer::save();
          if ( !$filename ) {
              &Alvis::Buffer::close();
              die "Cannot Alvis::Buffer::save";
          }
      }
  }
  # Save whatever remains, then close the buffer.
  $filename = &Alvis::Buffer::save();
  &Alvis::Buffer::close();

=head1 DESCRIPTION

This module buffers Alvis XML into manageable chunks as it is read in
from a pipeline (Alvis::Pipeline).

Chunk boundaries can be set by file size or by document count, but
that decision is made outside the module: the module simply provides
a function to save the current buffer contents.
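
Because the chunking policy lives in the caller, a caller-side check can combine both limits. The sketch below is not part of Alvis::Buffer: should_save, chunk_documents, max_docs and max_bytes are hypothetical names, and sizes are assumed to be measured in UTF-8 bytes. It simulates buffering on a list of strings to show where saves would fall.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical caller-side policy: save when either limit is reached.
# Both limits are optional; an undefined limit is ignored.
sub should_save {
    my ($docs, $bytes, %limit) = @_;
    return 1 if defined $limit{max_docs}  && $docs  >= $limit{max_docs};
    return 1 if defined $limit{max_bytes} && $bytes >= $limit{max_bytes};
    return 0;
}

# Simulates buffering a list of documents and returns the chunks
# (array refs) that would be saved, including the final partial chunk.
sub chunk_documents {
    my ($docs_ref, %limit) = @_;
    my (@chunks, @current);
    my $bytes = 0;
    for my $doc (@$docs_ref) {
        push @current, $doc;
        $bytes += length encode_utf8($doc);    # size in UTF-8 bytes, not characters
        if (should_save(scalar @current, $bytes, %limit)) {
            push @chunks, [@current];
            @current = ();
            $bytes   = 0;
        }
    }
    push @chunks, [@current] if @current;      # final save after the loop ends
    return @chunks;
}

my @chunks = chunk_documents([qw(a b c d e)], max_docs => 2);
print scalar(@chunks), " chunks\n";            # prints "3 chunks"
```

In a real read loop the document count would be the package variable docs used in the SYNOPSIS; the name of the module's size counter is not documented here, so a byte limit may need to be tracked by the caller.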

Each saved file holds a batch of Alvis XML documents wrapped in the
appropriate XML header and footer, and is stored in the relative
directory "xml-add/" under the next number in the sequence 1, 2, 3, ...
Each time a batch is stored, "xml-add/" is checked to decide which
number to use; if "xml-add/" is empty, "xml/" is checked instead.
Presumably, files in "xml-add/" are being processed into "xml/".
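
The numbering rule can be sketched as follows. This is an illustration, not the module's code: max_number_in and next_batch_number are hypothetical names. The sketch assumes batch files are named by a leading integer (for example "3" or "3.xml"), and it treats a directory with no numbered files as empty.

```perl
use strict;
use warnings;

# Returns the largest leading integer among the file names in $dir,
# or 0 if the directory is missing or holds no numbered files.
sub max_number_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 0;
    my $max = 0;
    while (defined(my $name = readdir $dh)) {
        next unless $name =~ /^(\d+)/;         # skips ".", ".." and unnumbered files
        $max = $1 if $1 > $max;
    }
    closedir $dh;
    return $max;
}

# Next batch number: look in the add directory first and fall back to
# the main directory only when the add directory has no numbered files.
sub next_batch_number {
    my ($add_dir, $main_dir) = @_;
    my $max = max_number_in($add_dir);
    $max = max_number_in($main_dir) if $max == 0;
    return $max + 1;
}

print "next batch number: ", next_batch_number("xml-add", "xml"), "\n";
```

Taking the maximum rather than counting files keeps the sequence increasing even if some earlier batches have already been moved or deleted.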

The implementation is independent of any particular pipeline, but it
assumes a number of fixed directories (such as "xml-add/" and "xml/").
It also assumes that files are encoded in UTF-8 and that each document
is contained in a <documentRecord> element.
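
One way to check both assumptions on a saved or partially built file is sketched below. buffer_stats is a hypothetical helper, not part of Alvis::Buffer. It validates the bytes as UTF-8 with the core Encode module and counts opening <documentRecord> tags with a regular expression, which assumes the tag never appears inside comments or CDATA sections. A real XML parser would be more robust.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns (record_count, byte_size) for a buffer file,
# or an empty list if the file cannot be read or is not valid UTF-8.
sub buffer_stats {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return;
    my $raw = do { local $/; <$fh> };          # slurp the raw bytes
    close $fh;
    $raw = '' unless defined $raw;
    my $bytes = length $raw;                   # measured before decode(), which may modify its input
    # FB_CROAK makes decode() die on malformed UTF-8; eval turns that into undef.
    my $text = eval { decode('UTF-8', $raw, FB_CROAK) };
    return unless defined $text;
    # Count opening tags such as <documentRecord> or <documentRecord id="...">,
    # but not </documentRecord> or longer names such as <documentRecords>.
    my $docs = () = $text =~ /<documentRecord[\s>\/]/g;
    return ($docs, $bytes);
}
```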

=head1 FUNCTIONS

=head2 fix()

  &Alvis::Buffer::fix() or die "Cannot Alvis::Buffer::fix";

Performs basic initialisation and checks that the output buffer is
usable, loading the current document count and size into memory.
Returns 1 if everything is OK, otherwise 0.
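
The checks fix() performs are only summarised above. The sketch below shows one plausible reading, not the module's actual code. fix_sketch, $size and the specific checks are assumptions; $BUFFER and $docs mirror the package variables used in the SYNOPSIS. A missing buffer file is treated as an empty buffer. Otherwise the sketch counts the records already in the file.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical stand-ins for the module's package variables.
our $BUFFER;       # path of the file being built, e.g. "/tmp/building.xml"
our $docs = 0;     # documents currently buffered
our $size = 0;     # bytes currently buffered (variable name is an assumption)

# Sketch of fix(): make sure the buffer file can be used and load its
# current document count and size. Returns 1 on success, else 0.
sub fix_sketch {
    return 0 unless defined $BUFFER && length $BUFFER;
    my $dir = dirname($BUFFER);
    return 0 unless -d $dir && -w $dir;        # output directory must be writable
    if (!-e $BUFFER) {
        ($docs, $size) = (0, 0);               # nothing buffered yet
        return 1;
    }
    return 0 unless -f $BUFFER && -r $BUFFER && -w $BUFFER;
    open(my $fh, '<:raw', $BUFFER) or return 0;
    my $raw = do { local $/; <$fh> };
    close $fh;
    $raw = '' unless defined $raw;
    $docs = () = $raw =~ /<documentRecord[\s>\/]/g;   # opening tags only
    $size = length $raw;                               # size in bytes
    return 1;
}
```

Used as fix_sketch() or die "...", it follows the same return convention as fix() in the SYNOPSIS.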