Data-CTable


CTable.pm

=item _FDelimiter    ||= undef;     

_FDelimiter is the field delimiter between field names in the header
row (if any) and also between fields in the body of the file.  If
undef, read() will try to guess whether it is tab C<"\t"> or comma
C<",">, and set this parameter accordingly.  If there is only one field
in the file, then comma is assumed by read() and will be used by
write().

To guess the delimiter, the program looks for the first comma or tab
character in the header row (if present) or in the first record.
Whichever character is found first is assumed to be the delimiter.
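
The guessing rule above can be sketched as follows.  This is a minimal
illustration under stated assumptions, not Data::CTable's own code:
C<guess_delimiter()> is a hypothetical helper that receives the header
row (or first record) as a string, returns the first tab or comma it
finds, and falls back to comma when neither appears.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::CTable's documented API):
# return whichever of tab or comma appears first in $line; fall back
# to comma when neither appears (e.g. a single-field file).
sub guess_delimiter {
    my ($line) = @_;
    return $1 if $line =~ /([\t,])/;   # first tab or comma wins
    return ",";                        # single field: assume comma
}
```

For example, C<guess_delimiter("name\tage,years")> returns a tab,
because the tab occurs before the comma, while
C<guess_delimiter("single")> returns a comma.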

If you don't want the program to guess, or your data file format uses
a custom delimiter, specify the delimiter explicitly, either in the
object or when calling read(), or make a subclass that initializes
this value differently.  On write(), the delimiter defaults to comma
if it is empty or undef.

=item _QuoteFields    = undef unless exists

_QuoteFields controls how field values are quoted by write() when
writing the table to a delimited text file.

An undef value (the default) means "auto" -- each field is checked
individually and if it contains either the _FDelimiter character or a
double-quote character, the field value will be surrounded by
double-quotes as it is written to the file.  This mode is slower to
write but faster to read, and may make the output easier for humans to
read.

A true value means always put double-quotes around every field value.
This mode is faster to write but slower to read.

A zero value means never use double-quotes around field values and
never check whether they are needed.  This mode is the fastest to read
and write, and is appropriate only when you are certain that your data
cannot contain any special characters.  If you are wrong, and a field
does contain the active delimiter (such as comma or tab) or a
double-quote, this mode will produce a corrupted file.
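
The three _QuoteFields modes can be sketched as a single per-field
function.  C<quote_field()> is hypothetical and is not the module's
write() code.  The text above does not say how embedded double-quotes
are escaped, so this sketch ASSUMES the common CSV practice of doubling
them.

```perl
use strict;
use warnings;

# Hypothetical sketch of the three _QuoteFields modes.
# $mode: undef = auto, true = always quote, 0 = never quote.
# ASSUMPTION: embedded double-quotes are escaped by doubling them,
# as in common CSV practice; Data::CTable's own escaping may differ.
sub quote_field {
    my ($value, $delim, $mode) = @_;
    my $quoted = sub {
        (my $v = $value) =~ s/"/""/g;
        return qq{"$v"};
    };
    return $value if defined $mode && !$mode;   # 0: never quote, never check
    return $quoted->() if $mode;                 # true: always quote
    # undef (auto): quote only if the field holds the delimiter or a quote
    return (index($value, $delim) >= 0 || index($value, '"') >= 0)
        ? $quoted->()
        : $value;
}
```

In auto mode, C<quote_field("a,b", ",", undef)> yields C<"a,b"> with
surrounding quotes, while the same value written with a tab delimiter
is left bare, since it contains neither a tab nor a quote.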

=item _HeaderRow      = 1 unless exists

_HeaderRow is a boolean that says whether to expect a header row in
data files.  The default is true: a header row is required.  If false,
_FieldList MUST be present before calling read() or an error will be
generated.  In this latter case, _FieldList will be assumed to give
the correct names of the fields in the file, in order, before the file
is read.  In other words, the object expects that either a) it can get
the field names from the file's header row or b) you will supply them
before read() opens the file.
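
The rule above, that field names come either from the header row or
from a _FieldList supplied in advance, can be sketched as below.
C<field_names()> and its argument names are hypothetical.  The sketch
ASSUMES that the header row is used whenever _HeaderRow is true; how
Data::CTable reconciles a header row with a pre-set _FieldList is not
covered here.

```perl
use strict;
use warnings;

# Hypothetical sketch of the _HeaderRow rule.
# Arguments: header_row (bool), field_list (array ref),
# first_line (header text, already chomped), delim (field delimiter).
sub field_names {
    my (%args) = @_;
    if ($args{header_row}) {
        # ASSUMPTION: names are taken from the header row when expected.
        return [ split /\Q$args{delim}\E/, $args{first_line}, -1 ];
    }
    # No header row: the field list must already be supplied.
    die "_FieldList must be set when _HeaderRow is false\n"
        unless $args{field_list} && @{ $args{field_list} };
    return [ @{ $args{field_list} } ];
}
```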

=back 

=head2 Encoding of return characters within fields

=over 4 

=item _ReturnMap       = 1 unless exists

_ReturnMap says that returns embedded in fields should be decoded on
read() and encoded again on write().  The industry-standard encoding
for embedded returns is ^K (ascii 11 -- but see next setting to change
it).  This defaults to true but can be turned off if you want data
untouched by read().  This setting has no effect on data files where
no fields contain embedded returns.  However, it is vital to leave
this option ON when writing any data file whose fields could contain
embedded returns -- if you have such data and call write() with
_ReturnMap turned off, the resulting file will be an invalid Merge/CSV
file and might not be re-readable.

When these fields are decoded on read(), encoded returns are converted
to C<"\n"> in memory, whatever its interpretation may be on the current
platform (\x0A on Unix or DOS; \x0D on MacPerl).

IMPORTANT NOTE: When these fields are encoded by write(), any
occurrence of the current _LineEnding being used to write the file is
searched and encoded FIRST, and THEN, any occurrence of "\n" is also
searched and encoded.  For example, if using Mac line endings (^M) to
write a file on a Unix machine, any ^M characters in fields will be
encoded, and then any "\n" (^J) characters will ALSO be encoded.  This
may not be what you want, so be sure you know how your data is encoded
in cases where your field values might contain any ^J and/or ^M
characters.
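
The encoding order described above, and the matching decode on read(),
can be sketched as follows.  C<encode_returns()> and
C<decode_returns()> are hypothetical helpers, not the module's own
routines; the C<$enc> argument stands in for _ReturnEncoding and
defaults to C<"\x0B"> (^K).

```perl
use strict;
use warnings;

# Hypothetical sketch of _ReturnMap encoding on write(): the active
# line ending is encoded FIRST, then any remaining "\n".
sub encode_returns {
    my ($field, $line_ending, $enc) = @_;
    $enc = "\x0B" unless defined $enc;     # stand-in for _ReturnEncoding
    $field =~ s/\Q$line_ending\E/$enc/g;   # 1) current _LineEnding
    $field =~ s/\n/$enc/g;                 # 2) then any "\n"
    return $field;
}

# Hypothetical sketch of decoding on read(): encoded returns become "\n".
sub decode_returns {
    my ($field, $enc) = @_;
    $enc = "\x0B" unless defined $enc;
    $field =~ s/\Q$enc\E/\n/g;
    return $field;
}
```

On a Unix machine, C<encode_returns("a\rb\nc", "\x0D")> encodes both
the ^M and the ^J, giving C<"a\x0Bb\x0Bc">.  This is the double
encoding the note above warns about.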

IMPORTANT NOTE: If you turn _ReturnMap off, fields with returns in
them will still be double-quoted correctly.  Some parsers of tab- or
comma-delimited files are able to support reading such files.
HOWEVER, the parser in this module's read() method DOES NOT currently
support reading files in which a single field value appears to span
multiple lines in the file.  If you have a need to read such a file,
you may need to write your own parser as a subclass of this module.

=item _ReturnEncoding    ||= "\x0B";

This is the default encoding to assume when embedding return
characters within fields.  The industry standard is "\x0B" (ascii 11 /
octal \013 / ^K) so you should probably not ever change this setting.

When fields are encoded on write(), C<"\n"> is converted to this
value.  Note that different platforms use different ASCII values for
C<"\n">, which is another good reason to leave the _ReturnMap feature
enabled when calling write().

To summarize: this module likes to assume, and you should too, that
returns in data files on disk are encoded as "\x0B", but once loaded
into memory, they are encoded as the current platform's value of
C<"\n">.

=item _MacRomanMap       = undef unless exists

Data::CTable assumes by default that you want field data in memory to
be in the ISO 8859-1 character set (the standard for Latin 1 Roman
characters on Unix and Windows in the English and Western European
languages -- and also the default encoding for HTML Web pages).

_MacRomanMap controls the module's optional mapping of Roman
characters from Mac format on disk to ISO format in memory when
reading and writing data files.  These settings are recognized:

	undef   ## Auto: Read/write Mac chars if using Mac line endings  
	1       ## On:   Assume Mac char set in all fields
	0       ## Off:  Don't do any character mapping at all

The default setting is undef, which enables "Auto" mode: files found
to contain Mac line endings will be assumed to contain Mac upper-ASCII
characters and will be mapped to ISO on read(); and files to be
written with Mac line endings will be mapped back from ISO to Mac format
on write().
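
The three _MacRomanMap settings and a Mac-to-ISO conversion can be
sketched as below.  Both helpers are hypothetical.  The conversion
swaps in the core Encode module's C<from_to()> with its C<MacRoman>
encoding; it is not Data::CTable's own mapping table, which may differ
for characters that have no ISO 8859-1 equivalent.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical: decide whether Mac Roman mapping applies.
# $setting: undef = auto, 1 = on, 0 = off.
sub mac_map_enabled {
    my ($setting, $line_ending) = @_;
    return $setting ? 1 : 0 if defined $setting;   # explicit on/off
    return $line_ending eq "\x0D" ? 1 : 0;         # auto: Mac (^M) endings
}

# Hypothetical: map Mac Roman bytes to ISO 8859-1 bytes using Encode
# (a swapped-in technique, not the module's own table).
sub map_mac_to_iso {
    my ($bytes) = @_;
    Encode::from_to($bytes, 'MacRoman', 'iso-8859-1');
    return $bytes;
}
```

For example, the Mac Roman byte C<\x8E> (e-acute) becomes the
ISO 8859-1 byte C<\xE9>.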

If your data uses any non-Latin-1 character sets, or binary data, or
you really want Mac upper-ASCII characters in memory, or you just


