Config-Wrest

This section has been written from the point-of-view of perl 5.8, although the concepts translate to perl 5.6's
slightly different Unicode handling.

First, it is important to distinguish between configuration data given to deserialize() as a string that contains
wide characters (i.e. code points > 255), and data that contains escape sequences for wide characters. Escape sequences
can only occur in certain places, whereas actual wide characters can be used in key names, block names, directives and
in values. This is because the parser uses regular expressions containing metacharacters such as "\w", and these can
match some wide characters.

Although you can use wide characters in directives, it may make no sense to try to "@include" a filename which contains
wide characters.

Configuration data will generally be read from or written to a file at some stage. You should be aware that
File::Slurp::WithinPolicy uses File::Slurp, which reads files in a byte-oriented fashion.
If this is not what you want, e.g. if your config files use a multi-byte encoding such as UTF-8,
then you should either read/write the file yourself using the appropriate layer
in the arguments to open(), or use the Encode module to convert between perl's Unicode-based strings and the required
encoding (e.g. your configuration files may be stored on disk as ISO-8859-1, but you want them to be read into perl
as Unicode characters, not as a stream of bytes). Similarly, you may wish to use Encode or similar to turn
a string into the correct encoding for your application to use.
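
As a minimal sketch of the two approaches above, the following reads a file through an encoding layer and, separately, converts a byte string with Encode. The helper name C<read_config_text> and the sample data are assumptions made for illustration and are not part of this module; the resulting character string is what you would then pass to deserialize().

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Config::Wrest): read a whole file through
# an encoding layer so that perl sees characters rather than raw bytes.
sub read_config_text {
    my ($path, $encoding) = @_;
    open(my $fh, "<:encoding($encoding)", $path)
        or die "Cannot open $path: $!";
    local $/;            # slurp mode: read the whole file in one go
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Alternative: convert bytes you already hold in memory using Encode.
my $bytes = "caf\xE9";                      # "cafe" with e-acute, as ISO-8859-1 bytes
my $chars = decode('ISO-8859-1', $bytes);   # character string, 4 characters long
my $utf8  = encode('UTF-8', $chars);        # the same text as UTF-8 bytes

# The decoded text can then be handed to the parser, e.g.
#   my $vars = $config_wrest_object->deserialize($chars);
```

Writing is the reverse: encode the string returned by serialize(), or print it to a filehandle opened with the matching output layer.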

Unicode specifies a number of different characters that should be considered as line endings: not just U+000A and U+000D,
but also U+0085 and several others. However, to keep this module compatible with perl versions before 5.8, it
splits data into lines on the sequence "\x0D\x0A" B<or> on the regular expression C</[\n\r]/>, and does B<not>
split on any of the other characters given in the Unicode standard. If you want your configuration data to use any of the
other line endings you must read the file yourself, change the desired line ending to C<\n> and pass that string
to deserialize(). Reverse the process when using serialize() and writing files. For example, a configuration file
stored on an OS/390 machine may use C<NEL> (i.e. "\x85") line endings, which need to be changed when the file is read
on a Unix machine.
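
The conversion described above might be sketched as follows. The function names are hypothetical, and the set of characters handled (NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR) is an assumption chosen for illustration; adjust it to whichever line endings your files actually use. The input is assumed to be a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper: map some additional Unicode line terminators to "\n"
# before the text is passed to deserialize().
sub normalise_newlines {
    my ($text) = @_;
    # U+0085 NEL, U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR
    $text =~ s/[\x{85}\x{2028}\x{2029}]/\n/g;
    return $text;
}

# Hypothetical helper for the reverse direction: turn "\n" back into NEL
# after serialize() and before the text is written out.
sub restore_nel {
    my ($text) = @_;
    $text =~ s/\n/\x{85}/g;
    return $text;
}
```

Because the module already splits on C<\n> and C<\r>, only the terminators it does not recognise need to be rewritten.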

This module has not been tested on EBCDIC platforms.

=head1 READING DATA

If you try to deserialize configuration data that has the wrong syntax (e.g. mis-nested blocks, or too many closing tags)
a fatal error will be raised.

Unrecognized directives cause a warning, as will key/value lines appearing in a list block, or list items appearing in a
hash block (see AllowEmptyValues in L</CONSTRUCTOR OPTIONS>). You also get a warning if there were too few closing tags
and the parser implicitly closed some for you.

=head1 WRITING DATA

The data structure you want to serialize must be a hash reference. The values may be strings, arrayrefs or hashrefs,
and so on recursively. Any bad reference types cause a fatal croak().
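
To illustrate the shape of data described above, here is a sketch of a pre-flight check you could run before calling serialize(). The function C<check_serializable> is hypothetical and is not the module's own validation code: it only checks reference types, does not check which characters appear in hash keys, and would recurse forever on a cyclic structure.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical check: the top level must be a hashref, and every nested
# value must be a plain string, an arrayref or a hashref.
sub check_serializable {
    my ($data) = @_;
    croak "top level must be a hash reference" unless ref($data) eq 'HASH';
    _check_value($data);
    return 1;
}

sub _check_value {
    my ($value) = @_;
    my $type = ref $value;
    if    ($type eq 'HASH')  { _check_value($_) for values %$value }
    elsif ($type eq 'ARRAY') { _check_value($_) for @$value }
    elsif ($type ne '')      { croak "unsupported reference type: $type" }
    return;
}

# A structure of the permitted shape: strings, arrayrefs and hashrefs.
my %vars = (
    Colour => 'red',
    Days   => [ 'Monday', 'Tuesday' ],
    Server => { Host => 'localhost', Ports => [ 80, 443 ] },
);
check_serializable(\%vars);
```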

You are only allowed to use a restricted set of characters as hash keys, i.e. the names of block elements
and the key in key/value pairs of data. If your data structure has a hash key that could create bad
config data a fatal error is thrown with croak(). Values in list blocks are also checked, and a fatal error is raised
if the value would create bad config data.

In general you will want to use the 'Escapes' option described above. This makes it hard to produce bad configuration files.

If you want to dump out cyclic / self-referential data structures you'll need to set the 'WriteWithReferences' option, otherwise the deep recursion
will be detected and the serialization will throw a fatal error.

=head1 SEE ALSO

parse_file(), write_file() and the '@include' directive load L<File::Slurp::WithinPolicy> on demand to perform the file input/output operations.
See L<perlunicode> for more details on perl's Unicode handling, and L<Encode> for character recoding.
See L<Any::Template>, and the relevant templating modules, if the 'Subs' option is true.

Although this module can read and write data structures it is not intended as an all-purpose serialization system. For that
see L<Storable>.

Unicode Newline Guidelines from http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G10213

=head1 VERSION

$Revision: 1.36 $ on $Date: 2006/08/22 14:09:50 $ by $Author: mattheww $

=head1 AUTHOR

IF&L Software Engineers <cpan _at_ bbc _dot_ co _dot_ uk>

=head1 COPYRIGHT

(c) BBC 2006. This program is free software; you can redistribute it and/or modify it under the GNU GPL.

See the file COPYING in this distribution, or http://www.gnu.org/licenses/gpl.txt 

=cut