streaming results from the CPAN

PerlPoint-Package

  my (
      @streamData,                # PerlPoint stream;
      %options,                   # option hash;
     );


==Option handling

Option handling is a highly individual part of software design. CPAN provides numerous modules which solve this task in several ways. This reflects the very different preferences of various people, and we do not want to restrict anyone in this field....

* Provide a way to use \XREF{name="Option files"}<option files>.

* Provide \XREF{name="Common options"}<common \PP options>.


===Option files

Option files allow to specify Perl script options \I<by files>, so they simply contain what would normally be specified in the commandline. This relieves a user from typing in typical options again and again. It also allows to \I<reuse> options, whic...

  \GREEN<# configure style>
  -style_dir /opt/perlpoint/pp2html/styles
  -style surprise

, to store this file as \C<style.cfg> and to invoke \C<pp2html> as in

  > pp2html \RED<@style.cfg> ...

Option files can be nested and cascaded, and you can use as many of them as you want. It is also possible to use \I<default> option files which do not need to be specified when calling the script but are resolved automatically. They make it very hand...

To provide option file usage, all you have to do is to integrate the following statement.

  \GREEN<# resolve option files>
  argvFile(default=>1, home=>1);

\C<argvFile()> is a function of \C<Getopt::ArgvFile> which was already \XREF{name="Modules to load"}<loaded> and performs three tasks in this call:

# It searches the users home directory for a file named \C<.<converter name\>>, e.g.
  \C<\B<.>pp2sdf>. All options found therein are "unshifted" into \C<@ARGV>. A default
  option file in ones home directory stores individual preferences of calling the converter.

# It searches the directory where the converter script is located for (probably another)
  \C<.<converter name\>> and integrates its options likewise, "unshifting" them to \C<@ARGV>
  as well. Such a global option file can be used to set up options to be used by all
  script users.

# It processes \C<@ARGV> to resolve any explicit and nested option files.

The result is an \C<@ARGV> array which contains all options, both specified directly and by file, ready to be processed by usual option handling modules like \BC<Getopt::Long>.


===Common options

I personally prefer \BC<Getopt::Long> for option handling. Your preferences may vary, but please provide at least the options specified in this example statement:

  \GREEN<# get options>
  GetOptions(\%options,

             "activeContents",    \GREEN<# evaluation of active contents;>
             "cache",             \GREEN<# control the cache;>
             "cacheCleanup",      \GREEN<# cache cleanup;>
             "docstreaming=s",    \GREEN<# document stream handling;>
             "help",              \GREEN<# online help, usage;>
             "includelib=s@",     \GREEN<# library pathes;>
             "nocopyright",       \GREEN<# suppress copyright message;>
             "noinfo",            \GREEN<# suppress runtime informations;>
             "nowarn",            \GREEN<# suppress runtime warnings;>
             "quiet",             \GREEN<# suppress all runtime messages except of error ones;>
             "safeOpcode=s@",     \GREEN<# permitted opcodes in active contents;>
             "set=s@",            \GREEN<# user settings;>
             "skipstream=s@",     \GREEN<# skip certain document streams;>
             "tagset=s@",         \GREEN<# add a tag set to the scripts own tag declarations;>
             "trace:i",           \GREEN<# activate trace messages;>
            );

:\BC<activeContents>: flags if active contents shall be evaluated or not.

:\BC<cache>: allows user to activate and deactivate the cache.

:\BC<cacheCleanup>: enforces a cleanup of existing cache files.

:\BC<docstreaming>: specifies how document streams shall be handled.

:\BC<help>: displays a usage message, which is usually the complete converter manpage.
            The converter is stopped after performing the display task.

:\BC<includelib>: specifies a directory to be scanned for files included via an
                  \C<\\INCLUDE> tag - think of \C<perl>'s \C<-I> option. Users are
                  allowed to specify as many directories as necessary, to be investigated
                  in the specified order. (Note: the environment variable \C<PERLPOINTLIB>
                  can be used for the same purpose, but \C<-includelib> pathes will be
                  scanned first.)

:\BC<nocopyright>, \BC<noinfo> and \BC<nowarn>: suppress informations a user can occasionally
                                                live without: copyright messages, informations
                                                and warnings.

:\BC<quiet>: combines \C<nocopyright>, \C<noinfo> and \C<nowarn>.

:\BC<safeOpcode>: Embedded code ("Active Contents") is usually executed in a \BC<Safe>
                  compartment. This way a user can control which operations shall be allowed
                  and which shall be denied. According to the interface of \C<Safe>, allowed 
                  operations are specified by Perl opcodes as defined by the \BC<Opcode>
                  module. With this option, a user can specify such an opcode to allow its
                  execution. It can be used multiply to accept several opcodes. Alternatively,
                  the user might pass the special string \C<ALL> which flags that Active
                  Contents shall be executed without any restriction - which will be done by
                  using \C<eval()> instead of \C<Safe>.

:\BC<set>: provides a way to inject user defined settings into Active Contents. This is helpful
           to process documents in various ways, without a need to modify document sources.

:\BC<skipstream>: allows a user to hide a certain document stream. Can be specified multiply.

:\BC<tagset>: can be used multiply to declare that foreign tags shall be accepted. Foreign tags
              are tags not supported by the converter, but defined for other converters. By 
              accepting them, a user can make a source pass to the converter even if it uses
              tags of another converter.

:\BC<trace>: activates several traces, at least of the frameset modules.

Luckily, the implementation of most of these options is as common as the options themselves and shown in the following sections. So in most cases it's no extra effort to provide these features.

For example, \C<quiet> can be implemented by

  @options{qw(nocopyright noinfo nowarn)}=() x 3 if exists $options{quiet};

It should be possible to control traces by an environment variable \BC<SCRIPTDEBUG> as well as by option \C<trace>:

  $options{trace}=$ENV{SCRIPTDEBUG} if not exists $options{trace} and exists $ENV{SCRIPTDEBUG};


==Usual startup operations

Now could be a good time to display a copyright message if you like.

  \GREEN<# display copyright unless suppressed>

Help requests can be fulfilled very quickly, because they do not need further operations.

  \GREEN<# check for a help request>
  (exec("pod2text $0 | less") or die "[Fatal] exec() cannot be called: $!\n") if $options{help};

doc/writing-converters-traditional.pp view on Meta::CPAN

Please note that because \C<-tagset> is intended to reflect definitions made for a certain converter, there is no way to load only a subset of another converters tags descriptions.

Different to usual \XREF{name="Modules to load"}<definition loading>, no warning is displayed here if a loaded foreign tag is named like an own one, and the \I<original> converter definition will remain established to give the converter first priorit...


==Set up Active Contents handling

If the special opcode \C<ALL> was passed, all embedded Perl operations are permitted and there's no need to perform them in a compartment provided by the \BC<Safe> module. This is flagged by a true \I<scalar> value. Otherwise, we need to construct a ...

  \GREEN<# Set up active contents handling. By default, we use a Safe object.>
  my $safe=new Safe;
  if (exists $options{safeOpcode})
   {
    unless (grep($_ eq 'ALL', @{$options{safeOpcode}}))
      {
       \GREEN<# configure compartment>
       $safe->permit(@{$options{safeOpcode}});
      }
    else
      {
       \GREEN<# simply flag that we want to execute active contents>
       $safe=1;
      }
   }

The variable \C<$safe> which is prepared by this code is intended to be passed to the parser soon.


==The parser call

And now, the parser can be called. Because it is implemented by a class, we first need to build an object, which is simple.

  \GREEN<# build parser>
  my $parser=new PerlPoint::Parser;

The objects method \C<run()> invokes the parser to process all document sources. Various parameters control how this task is performed and need to be set according to the converter options. It should be sufficient to simply copy this call and to slig...

  \GREEN<# call parser>
  $parser->run(
               stream          => \\@streamData,
               files           => \\@ARGV,

               filter          => 'perl|sdf|html',

               safe            => exists $options{activeContents} ? $safe : undef,

               activeBaseData  => {
                                   targetLanguage => 'SDF',
                                   userSettings   => {map {$_=>1} exists $options{set} ? @{$options{set}} : ()},
                                  },

               predeclaredVars => {
                                   CONVERTER_NAME    => basename($0),
                                   CONVERTER_VERSION => do {no strict 'refs'; ${join('::', __PACKAGE__, 'VERSION')}},
                                  },

               libpath         => exists $options{includelib} ? $options{includelib} : [],

               docstreams2skip => exists $options{skipstream} ? $options{skipstream} : [],

               docstreaming    => (exists $options{docstreaming} and ($options{docstreaming}==DSTREAM_HEADLINES or $options{docstreaming}==DSTREAM_IGNORE)) ? $options{docstreaming} : DSTREAM_DEFAULT,

               vispro          => 1,

               cache           =>   (exists $options{cache} ? CACHE_ON : CACHE_OFF)
                                  + (exists $options{cacheCleanup} ? CACHE_CLEANUP : 0),

               display         =>   DISPLAY_ALL
                                  + (exists $options{noinfo} ? DISPLAY_NOINFO : 0)
                                  + (exists $options{nowarn} ? DISPLAY_NOWARN : 0),

               trace           =>   TRACE_NOTHING
                                  + ((exists $options{trace} and $options{trace} &  1) ? TRACE_PARAGRAPHS : 0)
                                  + ((exists $options{trace} and $options{trace} &  2) ? TRACE_LEXER      : 0)
                                  + ((exists $options{trace} and $options{trace} &  4) ? TRACE_PARSER     : 0)
                                  + ((exists $options{trace} and $options{trace} &  8) ? TRACE_SEMANTIC   : 0)
                                  + ((exists $options{trace} and $options{trace} & 16) ? TRACE_ACTIVE     : 0),
              ) or exit(1);


So what happens here?

:\BC<stream>: passes a reference to an array which will be used to store the stream element
              in. It is suggested to pass an empty array (but currently new fields will be
              \I<added>, so existing entries will not be damaged).

:\BC<files>: passes an array of document source files to parse.

:\BC<filter>: declares which formats are allowed to be embedded or included. You can accept all
              formats which can be processed by the software which has to deal with the converter
              product, and "perl" to provide the full power of Active Contents. All formats not
              matching the filter will be \I<ignored>.

:\BC<safe>: pass the \XREF{name="Set up Active Contents handling"}<prepared> variable \C<$safe>.

:\BC<activeBaseData>: sets up a hash reference which is made accessible to Active Contents as
                      \C<$main::\PP>. The keys \C<targetLanguage> and \C<userSettings> are
                      provided by convention, but you may add whatever keys you need.

:\BC<libpath>: passes the library pathes a user specified via \C<-includelib>, see above
               for details. Please copy this code.

:\BC<docstreams2skip>: passes the names of document streams to skip which are specified via
                       user option \C<-skipstream>, see above for details. Please copy this
                       code.

:\BC<docstreaming>: specifies how document streams shall be handled. It is up to you
                    which fashions of docstream handling your converter will provide.
                    Nevertheless, two styles are supported directly by the parser (which
                    makes them available in all converters), so please pass \C<DSTREAM_HEADLINE>
                    and \C<DSTREAM_IGNORE> directly to the parser if they are specified by a user.
                    \C<DSTREAM_HEADLINES> transforms stream entry points into headlines,
                    while \C<DSTREAM_IGNORE> ignores all streams except the "main stream".
                    In all other cases, please set the value of this parameter to
                    \C<DSTREAM_DEFAULT>, which makes the parser supply all stream entry points
                    in the intermediate data so your converter can deal with them itself.
                    A final hint: for user convenience, your converter could allow to specify
                    streaming modes by strings instead of by numbers. (And it is recommended
                    to do so because the internal constants are subject to changes, but the
                    user interface should be as stable as possible.) So please adapt this
                    code appropriately.

:\BC<vispro>: if set to a true value, the parser will display runtime informations.
              Please copy this code.

:\BC<cache>: used to pass the cache settings. Please copy this code.

:\BC<display>: used to pass the display settings. Please copy this code.

:\BC<trace>: used to pass the trace settings. Please copy this code.


\C<run()> returns a true value if parsing was successful. It is recommended to evaluate this code and to stop processing in case of an error.

  \GREEN<# call parser>
  $parser->run(...) \RED<or exit(1)>;


==Backend construction

When the sources are parsed their data are represented in the stream where they can be processed to produce the final document(s). It is strongly recommended to do this by using the backend class shipped with the converter framework. In a first step,...

  \GREEN<# build a backend>
  my $backend=new PerlPoint::Backend(
                                     name    => 'pp2sdf',
                                     display =>   DISPLAY_ALL
                                                + (exists $options{noinfo} ? DISPLAY_NOINFO : 0)
                                                + (exists $options{nowarn} ? DISPLAY_NOWARN : 0),
                                     trace   =>   TRACE_NOTHING
                                                + ((exists $options{trace} and $options{trace} & 32) ? TRACE_BACKEND : 0),
                                     vispro  => 1,
                                    );

Names and behaviour of these constructor options are mostly known from \XREF{name="The parser call"}<call> of the parsers \C<run()> method.

:\BC<name>: a description used to identify the backend.

:\BC<display>: used to pass the display settings. Please copy this code.

:\BC<trace>: used to pass the trace settings. Please copy this code.

:\BC<vispro>: if set to a true value, the backend will display runtime informations.

Note: because backend processing does not consume stream data, it is possible to use more than one backend at a time, but there are still side effects. This may be improved by future framework versions.


==Backend callbacks

A backend is used to translate stream elements into appropriate expressions of the target format. Because of this, backends need to work very format (or converter) specific. The only common task is to read the stream and to ignore all parts which the...

When a backend is \XREF{name="Backend invokation: produce new code"}<started>, it (usually) runs through the stream. Every stream element is checked for its type, and then it is checked if a callback was specified to handle it. If so, the callback wi...

To make this work, callbacks need to be registered before the backend starts its stream run. This is done by the backend method \BC<register()>.

  \GREEN<# register backend handlers>
  $backend->register(DIRECTIVE_BLOCK, \\&handleBlock);
  ...

( run in 0.439 second using v1.01-cache-2.11-cpan-140bd7fdf52 )