OpenOffice-PerlPoint
lib/OpenOffice/PerlPoint.pm view on Meta::CPAN
C<OpenOffice::PerlPoint> is a translator class to transform Open Office 1.0 and 2.0 (and
generally OASIS Open Document) documents into PerlPoint sources. It is based on
C<OpenOffice::OODoc>.
Once you have transformed an Open Office or Open Document document into PerlPoint, it may
be processed further using the PerlPoint utilities.
If you prefer, you do not need to perform an explicit transformation. Beginning with
release 0.40, C<PerlPoint::Package> can process Open Office / Open Document sources directly.
Please see C<PerlPoint::Parser> for details, or the documentation that comes with PerlPoint.
B<This software is in alpha state. It supports just a I<subset> of the source format features.>
Please see the I<NOTES> sections below.
=head1 METHODS
=cut
# declare package
package OpenOffice::PerlPoint;
# declare version
$VERSION=0.03;
# pragmata
use strict;
# load modules
use Carp;
use Safe;
use Storable;
use Net::Ping;
use Text::Wrapper;
use File::Basename;
use LWP::UserAgent;
use Text::Template;
use POSIX qw(strftime);
use OpenOffice::OODoc 2.00;
# declare attributes
use fields qw(
file
archive
metadata
docContent
docStyles
content
notes
metaData
userAgent
ping
skipmetadata
imagebufferdir
metadataTemplate
userdefinedDocdata
);
# define data: delimiter handling
my %delimiters=(
'text:footnote-citation' => {
begin => '[',
end => ']',
},
'text:footnote-body' => {
begin => '{NOTE: ',
end => '}',
},
'text:span' => {
begin => '<<',
end => '>>',
},
'text:list-item' => {
begin => '',
end => '',
},
);
# define data: style extraction directives for traversal (see traverseElement() below)
my @styles=(
['B', 'properties', 'fo:font-weight', 'bold', 0, '\B<', '>'],
['I', 'properties', 'fo:font-style', 'italic', 0, '\I<', '>'],
['U', 'properties', 'style:text-underline', 'single', 0, '\U<', '>'], # OO 1.0
['U', 'properties', 'style:text-underline-style', 'solid', 0, '\U<', '>'], # OD (missing the previous part -bug?!)
['F', 'properties', 'fo:color', qr/^(\#[\da-fA-F]{6})$/, 0, '\F{color="$1"}<', '>'], # first backslash is for highlighting
['C', 'properties', 'style:font-name', qr/^(Courier New)$/, 0, '\C<', '>'],
['BLOCK', 'references', 'style:parent-style-name', qr/^(Code)$/, 0, ' ' x 3, ''],
);
# init wrappers
my ($paragraphWrapper, $listWrapper);
$paragraphWrapper=Text::Wrapper->new(
columns => 76,
par_start => '',
body_start => ''
);
$listWrapper=Text::Wrapper->new(
columns => 76,
par_start => ' ',
body_start => ' '
);
=pod
=head2 new()
The constructor.
B<Parameters:>
All parameters except the first are named.
=over 4
=item file
The (absolute or relative) path to the Office document that should be converted.
=item imagebufferdir
OO documents refer to images that are either stored within the document or located externally
at a location specified by a URL. PerlPoint can access neither of these sources directly,
so the converter copies the images and refers to the I<copies>. The C<imagebufferdir>
option specifies where these intermediate copies are stored. The directory is created
if it does not already exist.
A I<relative> path will result in a directory relative to the document. An I<absolute> path
is suitable if images from various documents should be collected in one place, or if the
resulting PerlPoint document should be written to a special path.
=item metadataTemplate
A template used to include document meta data in the transformation result. The template is
expected to be in C<Text::Template> format and is evaluated in a C<Safe> compartment.
These data are available:
=over 4
=item %metaData
A hash of all document meta data. The keys of this hash are listed below, and the
values hold the document data assigned to those keys.
=over 4
=item title
document title
=item subject
document subject
=item description
document description
=item creator
document author
=item date
last modification
=item keywords
keywords describing the document
=item User defined fields
All names defined by C<userdefinedDocdata>.
=back
=item %tools
Keys: C<generator> holds the name of the program that wrote the OO document.
C<converter> holds the name of the converter, usually the name of this module.
=item $source
Source name, usually the name set by option C<file>.
=back
This option has no effect if C<skipmetadata> is set.
=item skipmetadata
If set to a true value, meta data processing is bypassed.
=item userdefinedDocdata
Each OO document can be described by various predefined data, which are set automatically
(like the modification date) or entered by the document author in a dialog (like the
document's title). Additionally, OO allows up to four user defined information fields. Called
C<info1> to C<info4> by default, they can be renamed individually if required.
This option expects a reference to an array of names for those user defined document data entries.
The names can be used in templates passed in via option C<metadataTemplate> to access the data
stored in the related document fields.
=back
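The template options above can be illustrated with a minimal sketch. It assumes the variable names that C<convertMetadata()> passes to the template (C<metaData>, C<tools> and C<source>). The sample values, the user defined field C<project> and the layout of the output lines are hypothetical, and the code mimics the template step in isolation rather than calling the module itself.

```perl
use strict;
use warnings;
use Safe;
use Storable qw(dclone);
use Text::Template;

# hypothetical sample data, standing in for what the converter reads from a document
my %metaData = (
    title   => 'My Talk',
    creator => 'unknown',   # fallback value used when a field is empty
    project => 'Demo',      # a user defined field, as named via userdefinedDocdata
);

# a template in Text::Template syntax: Perl fragments in braces are evaluated
my $templateText = <<'EOT';
// title:     {$metaData{title}}
// author:    {$metaData{creator}}
// project:   {$metaData{project}}
// source:    {$source}
// converter: {$tools{converter}} (written by {$tools{generator}})
EOT

# hash references become hashes in the template, plain strings become scalars
my %transfer = (
    metaData => dclone(\%metaData),
    tools    => {generator => 'unknown program', converter => 'OpenOffice::PerlPoint'},
    source   => 'talk.odt',
);

# build the template and fill it in within a Safe compartment
my $template = Text::Template->new(TYPE => 'STRING', SOURCE => $templateText)
    or die "Couldn't construct template: $Text::Template::ERROR\n";
my $perlpoint = $template->fill_in(SAFE => Safe->new, HASH => \%transfer);
die "Couldn't process template: $Text::Template::ERROR\n" unless defined $perlpoint;

print $perlpoint;
```

In real use, only the template string is passed to the constructor via C<metadataTemplate>; the data hash is assembled by the converter.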
B<Returns:> the new object.
B<Example:>
  # build an object (both file and imagebufferdir are required)
  my $oo2pp=OpenOffice::PerlPoint->new(file=>$ooFile, imagebufferdir=>$imageDir);
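How a relative or absolute C<imagebufferdir> might be resolved can be sketched as follows. This is an illustration under stated assumptions, not the module's own code: the helper C<resolveImageBufferDir()> is hypothetical and only mirrors the documented behaviour (a relative path is taken relative to the document, and the directory is created if it is missing).

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;

# hypothetical helper (not part of OpenOffice::PerlPoint): resolve the image
# buffer directory for a document and create it if it does not exist yet
sub resolveImageBufferDir
 {
  my ($file, $dir)=@_;

  # a relative directory is taken relative to the document's own directory
  $dir=File::Spec->catdir(dirname($file), $dir)
    unless File::Spec->file_name_is_absolute($dir);

  # create the directory (including missing parents) unless it is already there
  make_path($dir) unless -d $dir;

  # supply the resolved path
  return $dir;
 }
```

With this helper, a document C<docs/talk.odt> and the option value C<images> would lead to a buffer directory C<docs/images>, while an absolute option value would be used unchanged.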
=cut
sub new
{
# get parameters
my ($class, @pars)=@_;
# build parameter hash
confess "[BUG] The number of parameters should be even - use named parameters, please.\n" if @pars%2;
my %pars=@pars;
# check parameters
confess "[BUG] Missing class name.\n" unless $class;
confess "[BUG] Missing file parameter.\n" unless exists $pars{file};
confess "[BUG] Missing image buffer directory parameter.\n" unless exists $pars{imagebufferdir};
# build object
my __PACKAGE__ $me=fields::new($class);
# store configuration
$me->{$_}=$pars{$_} for qw(
file
imagebufferdir
skipmetadata
metadataTemplate
userdefinedDocdata
);
# aggregate a user agent object
$me->{userAgent}=new LWP::UserAgent;
$me->{userAgent}->timeout(1);
$me->{userAgent}->env_proxy;
# and a Net::Ping object
$me->{ping}=new Net::Ping;
# build archive object
$me->{archive}=ooFile($pars{file});
confess "[Error] $pars{file} is not a regular OpenOffice.org file.\n" unless $me->{archive};
# extract metadata
$me->{metadata}=ooMeta(archive => $me->{archive});
carp "[Warn] $pars{file} has no standard OOO properties, it looks strange.\n" unless $me->{metadata};
# extract document (in content and style parts)
$me->{docContent}=ooDocument(
archive => $me->{archive},
member => 'content',
delimiters => \%delimiters,
);
confess "[Error] No standard OOO content found in $pars{file}!\n" unless $me->{docContent};
$me->{docStyles}=ooDocument(
archive => $me->{archive},
member => 'styles',
delimiters => \%delimiters,
);
confess "[Error] No standard OOO styles found in $pars{file}!\n" unless $me->{docStyles};
# the strange next lines prevent the getText() method of
# OpenOffice::OODoc::Text (see the corresponding man page) from using
# its default tags for spans and footnotes
delete $me->{docContent}{delimiters}{'text:span'};
delete $me->{docContent}{delimiters}{'text:footnote-body'};
# here we select the tab as field separator for table field output
# (the default is ";" as for CSV output)
$me->{docContent}{field_separator}="\t";
# in the next sequence, we will extract all the footnotes, store them for
# later processing and remove them from the content
$me->{notes}=[$me->{docContent}->getFootnoteList];
$me->{docContent}->removeElement($_) for @{$me->{notes}};
# get the full list of text objects (without the previously removed footnotes)
$me->{content}=[$me->{docContent}->getTextElementList];
# reply the new object
$me;
}
# TODO: make document variable names configurable
sub convertMetadata
{
# get and check parameters
((my __PACKAGE__ $me), (my ($item, $guard)))=@_;
confess "[BUG] Missing object parameter.\n" unless $me;
confess "[BUG] Object parameter is no ", __PACKAGE__, " object.\n" unless ref $me and ref $me eq __PACKAGE__;
# variables
my ($perlpoint, $title, $subject, $description, $author, $date, $version, $generator, $copyright, $authormail);
# anything to do?
if ($me->{metadata})
{
# predefined meta data: title, subject, description, author
$me->{metaData}{$_}=$me->{metadata}->$_ || 'unknown' for qw(
creator
date
description
keywords
subject
title
);
# get user defined metadata, as set up by caller
my %userDefinedMetadata=$me->{metadata}->user_defined;
$me->{metaData}{$_}=$userDefinedMetadata{$_} || 'unknown' for @{$me->{userdefinedDocdata}};
# get generator
$generator=$me->{metadata}->generator;
$generator='unknown program' unless $generator;
}
# process meta data, if configured
if (defined $me->{metadataTemplate})
{
# build safe environment
my $safe=new Safe;
# clone meta data into a transfer variable
my %transfer=(
# meta data
metaData => $me->{metaData} ? Storable::dclone($me->{metaData}) : {},
# generator and converter
tools => {
generator => $generator,
converter => __PACKAGE__,
},
# more data
source => $me->{file},
);
# build a template object, process the template and add the result
# (template bugs stop the program immediately - as they are considered bugs, not (user) errors)
my $template=new Text::Template(TYPE => 'STRING', SOURCE => $me->{metadataTemplate})
or die "[BUG] Couldn't construct template: $Text::Template::ERROR\n";
$perlpoint.=$template->fill_in(SAFE => $safe, HASH => \%transfer)
or die "[BUG] Couldn't process template: $Text::Template::ERROR\n";
}
# supply result
$perlpoint;
}
#-----------------------------------------------------------------------------
# convert completely
=pod
=head2 oo2pp()
Perform conversion of the document specified in the constructor call.
B<Parameters:>
=over 4
=item object
An object as supplied by C<new()>.
=back
B<Returns:> the PerlPoint string.
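A complete conversion could look like the following sketch. The wrapper C<convertDocument()> and the directory name C<images> are hypothetical; only C<new()> with its C<file> and C<imagebufferdir> options and C<oo2pp()> are taken from the documentation above. The module is loaded at run time here so that the sketch compiles even where the module is not installed.

```perl
use strict;
use warnings;

# hypothetical wrapper: convert one document and return the PerlPoint text
sub convertDocument
 {
  my ($file, $imageDir)=@_;

  # load the converter on demand
  require OpenOffice::PerlPoint;

  # build a converter object for the document
  my $oo2pp=OpenOffice::PerlPoint->new(
                                       file           => $file,
                                       imagebufferdir => $imageDir,
                                      );

  # perform the conversion, supply the PerlPoint string
  return $oo2pp->oo2pp;
 }

# convert the document named on the command line, if there is one
print convertDocument($ARGV[0], 'images') if @ARGV;
```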