OpenOffice-PerlPoint

lib/OpenOffice/PerlPoint.pm


C<OpenOffice::PerlPoint> is a translator class to transform Open Office 1.0 and 2.0 (and
generally OASIS Open Document) documents into PerlPoint sources. It is based on
C<OpenOffice::OODoc>.

Once you have transformed an Open Office or Open Document document into PerlPoint, it may
be processed further using the PerlPoint utilities.

If you prefer, you do not need to perform an explicit transformation. Beginning with
release 0.40, C<PerlPoint::Package> can process Open Office / Open Document sources directly.
Please see C<PerlPoint::Parser> for details, or the documentation that comes with PerlPoint.

B<This software is in alpha state. It supports just a I<subset> of the source format features.>
Please see the I<NOTES> sections below.

=head1 METHODS

=cut


# declare package
package OpenOffice::PerlPoint;

# declare version
$VERSION=0.03;

# pragmata
use strict;

# load modules
use Carp;
use Safe;
use Storable;
use Net::Ping;
use Text::Wrapper;
use File::Basename;
use LWP::UserAgent;
use Text::Template;
use POSIX qw(strftime);
use OpenOffice::OODoc 2.00;

# declare attributes
use fields qw(
              file
              archive

              metadata
              docContent
              docStyles
              content
              notes

              metaData

              userAgent
              ping

              skipmetadata
              imagebufferdir
              metadataTemplate
              userdefinedDocdata
             );


# define data: delimiter handling
my %delimiters=(
                'text:footnote-citation' => {
                                             begin => '[',
                                             end   => ']',
                                            },
                'text:footnote-body'     => {
                                             begin => '{NOTE: ',
                                             end   => '}',
                                            },
                'text:span'              => {
                                             begin => '<<',
                                             end   => '>>',
                                            },
                'text:list-item'         => {
                                             begin => '',
                                             end   => '',
                                            },
               );


# define data: style extraction directives for traversal (see traverseElement() below)
my @styles=(
            ['B',     'properties', 'fo:font-weight', 'bold', 0, '\B<', '>'],
            ['I',     'properties', 'fo:font-style', 'italic', 0, '\I<', '>'],
            ['U',     'properties', 'style:text-underline', 'single', 0, '\U<', '>'],                # OO 1.0
            ['U',     'properties', 'style:text-underline-style', 'solid', 0, '\U<', '>'],           # OD (missing the previous part -bug?!)
            ['F',     'properties', 'fo:color', qr/^(\#[\da-fA-F]{6})$/, 0, '\F{color="$1"}<', '>'], # first backslash is for highlighting
            ['C',     'properties', 'style:font-name', qr/^(Courier New)$/, 0, '\C<', '>'],
            ['BLOCK', 'references', 'style:parent-style-name', qr/^(Code)$/, 0, ' ' x 3, ''],
           );



# init wrappers
my ($paragraphWrapper, $listWrapper);
$paragraphWrapper=Text::Wrapper->new(
                                     columns	=> 76,
                                     par_start	=> '',
                                     body_start	=> ''
                                    );
  
$listWrapper=Text::Wrapper->new(
                                columns		=> 76,
                                par_start	=> '  ',
                                body_start	=> '  '
                               );

=pod

=head2 new()

The constructor.

B<Parameters:>

All parameters except the first are named.



=item file

The (absolute or relative) path to the Office document that should be converted.

=item imagebufferdir

Images in an OO document are either stored within the document itself or located externally
at a location specified by a URL. PerlPoint cannot access either kind of source, so the
converter copies the images and refers to the I<copies>. The C<imagebufferdir> option
specifies where these intermediate copies are stored. The directory is created if it does
not exist yet.

A I<relative> path will result in a directory relative to the document. An I<absolute> path
is suitable if images from various documents should be collected in one place, or if the
resulting PerlPoint document should be written to a special path.
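
The relative/absolute rule can be sketched with core modules. This is only an
illustration, not the module's actual code: the helper name C<resolve_buffer_dir> is
hypothetical, and the sketch assumes that "relative to the document" means relative to
the directory that contains the document file.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper (not part of OpenOffice::PerlPoint): work out where
# image copies would go for a given document and imagebufferdir setting.
sub resolve_buffer_dir {
    my ($file, $bufferdir) = @_;

    # an absolute path is used as is
    return $bufferdir if File::Spec->file_name_is_absolute($bufferdir);

    # assumption: a relative path is anchored at the document's directory
    return File::Spec->catdir(dirname($file), $bufferdir);
}

print resolve_buffer_dir('talks/intro.odt', 'images'), "\n";
print resolve_buffer_dir('talks/intro.odt', '/var/tmp/images'), "\n";
```

With an absolute setting, several documents can share one image directory regardless
of where each document is stored.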


=item metadataTemplate

A template used to include document meta data in the transformation result. The template is
expected to be in C<Text::Template> format and is evaluated in a C<Safe> compartment.

These data are available:

=over 4

=item %metaData

A hash of all document meta data. The keys of this hash are listed below, and the
values hold the document data assigned to those keys.

=over 4
    
=item title

document title

=item subject

document subject

=item description

document description

=item creator

document author

=item date

date of last modification

=item keywords

keywords describing the document

=item User defined fields

All names defined by C<userdefinedDocdata>.

=back

=item %tools

Keys: C<generator> holds the name of the program that wrote the OO document.
C<converter> holds the name of the converter, usually the name of this module.

=item $source

Source name, usually the name set by option C<file>.

=back

This option has no effect if C<skipmetadata> is set.
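
As an illustration, a template along the following lines could turn the meta data into
a few PerlPoint comment and headline lines. This is a sketch, not a template shipped
with the module: the output layout and all sample values are invented. The module's
code hands the meta data hash to the template under the name C<metaData>, so the
sketch accesses it as C<$metaData{...}>, and it fills the template in a C<Safe>
compartment in the same way the converter does.

```perl
use strict;
use warnings;
use Safe;
use Text::Template;

# illustrative template text; the output layout is an assumption
my $templateText = join "\n",
    '// converted from {$source} by {$tools{converter}}',
    '// written with {$tools{generator}}',
    '',
    '={$metaData{title}}',
    '',
    'Author: {$metaData{creator}}, last modified: {$metaData{date}}',
    '';

# sample data shaped like the hash the converter passes in (values invented)
my %transfer = (
    metaData => { title => 'Demo talk', creator => 'J. Doe', date => '2005-01-01' },
    tools    => { generator => 'OpenOffice.org', converter => 'OpenOffice::PerlPoint' },
    source   => 'demo.sxw',
);

my $template = Text::Template->new(TYPE => 'STRING', SOURCE => $templateText)
    or die "Couldn't construct template: $Text::Template::ERROR\n";

# fill in the template inside a Safe compartment
my $result = $template->fill_in(SAFE => Safe->new, HASH => \%transfer);
defined $result or die "Couldn't process template: $Text::Template::ERROR\n";

print $result;
```

In real use, only the template text would be passed to C<new()> via
C<metadataTemplate>; the data hash is built by the converter itself.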


=item skipmetadata

If set to a true value, meta data processing is bypassed.


=item userdefinedDocdata

Each OO document can be described by various predefined data, which are set automatically
(like the modification date) or set up by the document author in a dialog (like the
document's title). Additionally, OO allows the author to define up to four user information
fields. Called C<info1> to C<info4> by default, they can be named individually if required.

This option expects a reference to an array of names for those user defined document data entries.
The names can be used in templates passed in via option C<metadataTemplate> to access the data
stored in the related document fields.
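
How the names end up in the template data can be sketched as follows. The field names
and values are invented for the example. The lookup mirrors what the module's
C<convertMetadata()> code does: each requested name is read from the document's user
defined fields and falls back to C<'unknown'> if the field is empty or missing.

```perl
use strict;
use warnings;

# invented sample of user defined fields as they might be stored in a document
my %userDefinedMetadata = (
    department => 'Research',
    copyright  => '',
);

# names as they would be passed via userdefinedDocdata (hypothetical values)
my @userdefinedDocdata = qw(department copyright version);

# collect the values, replacing empty or missing entries by 'unknown'
my %metaData;
$metaData{$_} = $userDefinedMetadata{$_} || 'unknown' for @userdefinedDocdata;

print "$_: $metaData{$_}\n" for sort keys %metaData;
```

In a constructor call, the same names would be supplied as
C<< userdefinedDocdata => [qw(department copyright version)] >>.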

=back

B<Returns:> the new object.

B<Example:>

 # build an object (file and imagebufferdir are mandatory)
 my $oo2pp=OpenOffice::PerlPoint->new(file=>$ooFile, imagebufferdir=>$imageDir);

=cut
sub new
 {
  # get parameters
  my ($class, @pars)=@_;

  # build parameter hash
  confess "[BUG] The number of parameters should be even - use named parameters, please.\n" if @pars%2;
  my %pars=@pars;

  # check parameters
  confess "[BUG] Missing class name.\n" unless $class;
  confess "[BUG] Missing file parameter.\n" unless exists $pars{file};
  confess "[BUG] Missing image buffer directory parameter.\n" unless exists $pars{imagebufferdir};

  # build object
  my __PACKAGE__ $me=fields::new($class);

  # store configuration
  $me->{$_}=$pars{$_} for qw(
                             file
                             imagebufferdir
                             skipmetadata
                             metadataTemplate
                             userdefinedDocdata
                            );

  # aggregate a user agent object
  $me->{userAgent}=new LWP::UserAgent;
  $me->{userAgent}->timeout(1);
  $me->{userAgent}->env_proxy;

  # and a Net::Ping object
  $me->{ping}=new Net::Ping;

  # build archive object
  $me->{archive}=ooFile($pars{file});
  confess "[Error] $pars{file} is no regular OpenOffice.org file.\n" unless $me->{archive};

  # extract metadata
  $me->{metadata}=ooMeta(archive => $me->{archive});
  carp "[Warn] $pars{file} has no standard OOO properties, it looks strange.\n" unless $me->{metadata};

  # extract document (in content and style parts)
  $me->{docContent}=ooDocument(
                               archive    => $me->{archive},
                               member     => 'content',
                               delimiters => \%delimiters,
                              );
  confess "[Error] No standard OOO content found in $pars{file}!\n" unless $me->{docContent};

  $me->{docStyles}=ooDocument(
                              archive    => $me->{archive},
                              member     => 'styles',
                              delimiters => \%delimiters,
                             );
  confess "[Error] No standard OOO styles found in $pars{file}!\n" unless $me->{docStyles};

  # the strange next lines prevent the getText() method of
  # OpenOffice::OODoc::Text (see the corresponding man page) from using
  # its default tags for spans and footnotes
  delete $me->{docContent}{delimiters}{'text:span'};
  delete $me->{docContent}{delimiters}{'text:footnote-body'};

  # here we select the tab as field separator for table field output
  # (the default is ";" as for CSV output)
  $me->{docContent}{field_separator}="\t";

  # in the next sequence, we will extract all the footnotes, store them for
  # later processing and remove them from the content
  $me->{notes}=[$me->{docContent}->getFootnoteList];
  $me->{docContent}->removeElement($_) for @{$me->{notes}};

  # get the full list of text objects (without the previously removed footnotes)
  $me->{content}=[$me->{docContent}->getTextElementList];

  # reply the new object
  $me;
 }

# TODO: make document variable names configurable
sub convertMetadata
 {
  # get and check parameters
  ((my __PACKAGE__ $me), (my ($item, $guard)))=@_;
  confess "[BUG] Missing object parameter.\n" unless $me;
  confess "[BUG] Object parameter is no ", __PACKAGE__, " object.\n" unless ref $me and ref $me eq __PACKAGE__;

  # variables
  my ($perlpoint, $title, $subject, $description, $author, $date, $version, $generator, $copyright, $authormail);

  # anything to do?
  if ($me->{metadata})
    {
     # predefined meta data: creator, date, description, keywords, subject, title
     $me->{metaData}{$_}=$me->{metadata}->$_ || 'unknown' for qw(
                                                                 creator
                                                                 date
                                                                 description
                                                                 keywords
                                                                 subject
                                                                 title
                                                                );

     # get user defined metadata, as set up by caller
     my %userDefinedMetadata=$me->{metadata}->user_defined;
     $me->{metaData}{$_}=$userDefinedMetadata{$_} || 'unknown' for @{$me->{userdefinedDocdata}};

     # get generator
     $generator=$me->{metadata}->generator;
     $generator='unknown program' unless $generator;
    }

  # process meta data, if configured
  if (defined $me->{metadataTemplate})
   {
    # build safe environment
    my $safe=new Safe;

    # clone meta data into a transfer variable
    my %transfer=(
                  # meta data
                  metaData => $me->{metaData} ? Storable::dclone($me->{metaData}) : {},

                  # generator and converter
                  tools    => {
                               generator => $generator,
                               converter => __PACKAGE__,
                              },

                  # more data
                  source   => $me->{file},
                 );

    # build a template object, process the template and add the result
    # (template bugs stop the program immediately - as they are considered bugs, not (user) errors)
    my $template=new Text::Template(TYPE => 'STRING', SOURCE => $me->{metadataTemplate})
        or die "[BUG] Couldn't construct template: $Text::Template::ERROR\n";
    $perlpoint.=$template->fill_in(SAFE => $safe, HASH => \%transfer)
        or die "[BUG] Couldn't process template: $Text::Template::ERROR\n";
   }

  # supply result
  $perlpoint;
 }


#-----------------------------------------------------------------------------

# convert completely
=pod

=head2 oo2pp()

Perform conversion of the document specified in the constructor call.

B<Parameters:>

=over 4

=item object

An object as supplied by C<new()>.

=back

B<Returns:> the PerlPoint string.


