Data-CTable
    mac     "\x0D"    CR    13    "\015"    ^M
    unix    "\x0A"    LF    10    "\012"    ^J
See the section LINE ENDINGS, below, for accessor methods and
conversion utilities that help you get/set this parameter in either
symbolic format or string format as you prefer.
_FDelimiter ||= undef;
_FDelimiter is the field delimiter between field names in the header
row (if any) and also between fields in the body of the file. If
undef, read() will try to guess whether it is tab "\t" or comma
",", and set this parameter accordingly. If there is only one
field in the file, then comma is assumed by read() and will be used
by write().
To guess the delimiter, the program looks for the first comma or tab
character in the header row (if present) or in the first record.
Whichever character is found first is assumed to be the delimiter.
If you don't want the program to guess, or you have a data file
format that uses a custom delimiter, specify the delimiter
explicitly in the object or when calling read() or make a subclass
that initializes this value differently. On write(), this will
default to comma if it is empty or undef.
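The guessing rule described above can be sketched in a few lines of Perl. This is only an illustration of the rule, not the module's own code; guess_delimiter is a hypothetical helper name, and the fallback to comma reflects the single-field case mentioned earlier.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::CTable): return whichever of
# comma or tab appears first in $line, or comma if neither is present.
sub guess_delimiter {
    my ($line) = @_;
    return $line =~ /([,\t])/ ? $1 : ",";
}

# The tab comes before the comma here, so tab is chosen even though
# one field value contains a comma.
my $delim = guess_delimiter("First\tLast, Jr.\tAge");
print $delim eq "\t" ? "tab\n" : "comma\n";
```

Because only the first match counts, a header whose first field contains a comma in a tab-delimited file would be guessed wrongly, which is one reason to set _FDelimiter explicitly.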
_QuoteFields = undef unless exists
_QuoteFields controls how field values are quoted by write() when
writing the table to a delimited text file.
An undef value (the default) means "auto" -- each field is checked
individually and if it contains either the _FDelimiter character or
a double-quote character, the field value will be surrounded by
double-quotes as it is written to the file. This method is slower to
write but faster to read, and may make the output easier for humans
to read.
A true value means always put double-quotes around every field
value. This mode is faster to write but slower to read.
A zero value means never to use double-quotes around field values
and not to check for the need to use them. This method is the
fastest to read and write. You may use it when you are certain that
your data can't contain any special characters. However, if you're
wrong, this mode will produce a corrupted file in the event that one
of the fields does contain the active delimiter (such as comma or
tab) or a quote.
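The three quoting modes can be compared with a small sketch. quote_field is a hypothetical helper, not the module's API, and doubling embedded double-quotes is an assumption borrowed from common CSV practice; the text above does not say how write() escapes them.

```perl
use strict;
use warnings;

# Hypothetical helper: $mode is undef (auto), true (always) or 0 (never).
sub quote_field {
    my ($value, $delim, $mode) = @_;

    # Auto mode inspects the value; the other modes take $mode as given.
    my $needs_quotes = defined $mode
        ? $mode
        : (index($value, $delim) >= 0 || index($value, '"') >= 0);
    return $value unless $needs_quotes;

    # Assumption: embedded quotes are escaped by doubling them.
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

print quote_field("Smith, John", ",", undef), "\n";   # auto: quoted
print quote_field("Smith", ",", undef), "\n";         # auto: left bare
print quote_field("Smith, John", ",", 0), "\n";       # never: corrupts the row
```

The last call shows the risk of the zero setting: the comma inside the value is written as-is and will be read back as a field boundary.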
_HeaderRow = 1 unless exists
_HeaderRow is a boolean that says whether to expect a header row in
data files. The default is true: a header row is required. If false,
_FieldList MUST be present before calling read() or an error will be
generated. In this latter case, _FieldList will be assumed to give
the correct names of the fields in the file, in order, before the
file is read. In other words, the object expects that either a) it
can get the field names from the file's header row or b) you will
supply them before read() opens the file.
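The either/or rule for field names might be sketched as follows. resolve_field_names is a hypothetical helper written for illustration only; the error text and the naive split (which ignores quoting) are assumptions, not the module's behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: take the field names from the header row when one
# is expected, otherwise insist on a caller-supplied list.
sub resolve_field_names {
    my ($header_row, $field_list, $first_line, $delim) = @_;

    # Naive split for illustration; a real parser must honor quoting.
    return [ split /\Q$delim\E/, $first_line, -1 ] if $header_row;

    die "_FieldList must be supplied when _HeaderRow is false\n"
        unless $field_list && @$field_list;
    return $field_list;
}

my $from_file   = resolve_field_names(1, undef, "First,Last,Age", ",");
my $from_caller = resolve_field_names(0, [qw(First Last Age)], "Ann,Lee,41", ",");
```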
Encoding of return characters within fields
_ReturnMap = 1 unless exists
_ReturnMap says that returns embedded in fields should be decoded on
read() and encoded again on write(). The industry-standard encoding
for embedded returns is ^K (ascii 11 -- but see next setting to
change it). This defaults to true but can be turned off if you want
data untouched by read(). This setting has no effect on data files
where no fields contain embedded returns. However, it is vital to
leave this option ON when writing any data file whose fields could
contain embedded returns -- if you have such data and call write()
with _ReturnMap turned off, the resulting file will be an invalid
Merge/CSV file and might not be re-readable.
When these fields are decoded on read(), encoded returns are
converted to "\n" in memory, whatever its interpretation may be on
the current platform (\x0A on Unix or DOS; \x0D on MacPerl).
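Decoding on read() amounts to a simple substitution per field. The sketch below assumes the default ^K encoding and uses a hypothetical helper, decode_returns; it is not the module's own code, and the split on tab ignores quoting for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: turn encoded returns back into the platform "\n".
sub decode_returns {
    my ($value, $encoding) = @_;
    $encoding = "\x0B" unless defined $encoding;   # ^K, ascii 11
    $value =~ s/\Q$encoding\E/\n/g;
    return $value;
}

# One physical line from a tab-delimited file; the address field holds
# an encoded return, so the record still occupies a single line on disk.
my @fields = map { decode_returns($_) }
             split /\t/, "Ann\t12 Main St\x0BApt 4", -1;
```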
IMPORTANT NOTE: When these fields are encoded by write(), any
occurrence of the current _LineEnding being used to write the file
is searched and encoded FIRST, and THEN, any occurrence of "\n" is
also searched and encoded. For example, if using mac line endings
(^M) to write a file on a Unix machine, any ^M characters in fields
will be encoded, and then any "\n" (^J) characters will ALSO be
encoded. This may not be what you want, so be sure you know how your
data is encoded in cases where your field values might contain any
^J and/or ^M characters.
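The two-pass order described in this note can be sketched as below. encode_returns is a hypothetical helper used only to show the order of the substitutions; it assumes a platform where "\n" is \x0A.

```perl
use strict;
use warnings;

# Hypothetical helper: encode the active line ending first, then "\n".
sub encode_returns {
    my ($value, $line_ending, $encoding) = @_;
    $encoding = "\x0B" unless defined $encoding;   # ^K, ascii 11

    $value =~ s/\Q$line_ending\E/$encoding/g;      # pass 1: _LineEnding
    $value =~ s/\n/$encoding/g;                    # pass 2: platform "\n"
    return $value;
}

# Writing with mac line endings (^M) on a Unix machine: both the ^M and
# the ^J inside the field end up encoded as ^K.
my $encoded = encode_returns("line1\x0Dline2\x0Aline3", "\x0D");
```

Note that with DOS line endings the pair \x0D\x0A collapses to a single ^K in pass 1, while a lone \x0A is still caught by pass 2.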
IMPORTANT NOTE: If you turn _ReturnMap off, fields with returns in
them will still be double-quoted correctly. Some parsers of tab- or
comma-delimited files are able to support reading such files.
HOWEVER, the parser in this module's read() method DOES NOT
currently support reading files in which a single field value
appears to span multiple lines in the file. If you have a need to
read such a file, you may need to write your own parser as a
subclass of this module.
_ReturnEncoding ||= "\x0B";
This is the default encoding to assume when embedding return
characters within fields. The industry standard is "\x0B" (ascii 11
/ octal \013 / ^K) so you should probably not ever change this
setting.
When fields are encoded on write(), "\n" is converted to this
value. Note that different platforms use different ascii values for
"\n", which is another good reason to leave the ReturnEncoding
feature enabled when calling write().
To summarize: this module likes to assume, and you should too, that
returns in data files on disk are encoded as "\x0B", but once loaded
into memory, they are encoded as the current platform's value of
"\n".
_MacRomanMap = undef unless exists
Data::CTable assumes by default that you want field data in memory
to be in the ISO 8859-1 character set (the standard for Latin 1
Roman characters on Unix and Windows in the English and Western
European languages -- and also the default encoding for HTML Web
pages).
_MacRomanMap controls the module's optional mapping of Roman
characters from Mac format on disk to ISO format in memory when
reading and writing data files. These settings are recognized:
    undef    ## Auto: Read/write Mac chars if using Mac line endings
    1        ## On:   Assume Mac char set in all fields
    0        ## Off:  Don't do any character mapping at all
The default setting is undef, which enables "Auto" mode: files found
to contain Mac line endings will be assumed to contain Mac
upper-ASCII characters and will be mapped to ISO on read(); and
files to be written with Mac line endings will be mapped back from ISO
to Mac format on write().
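The three settings and the mapping itself can be illustrated with the core Encode module. This is a sketch under assumptions: should_map_mac and mac_to_iso are hypothetical helpers, and using Encode's "MacRoman" table stands in for whatever mapping the module applies internally.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper: decide whether to map, given the _MacRomanMap
# setting and the line ending in use.
sub should_map_mac {
    my ($mac_roman_map, $line_ending) = @_;
    return $mac_roman_map ? 1 : 0 if defined $mac_roman_map;  # 1 = on, 0 = off
    return $line_ending eq "\x0D" ? 1 : 0;                    # undef = auto
}

# Hypothetical helper: convert Mac Roman bytes to ISO 8859-1 bytes.
sub mac_to_iso {
    my ($bytes) = @_;
    from_to($bytes, "MacRoman", "iso-8859-1");   # converts the copy in place
    return $bytes;
}

# "e acute" is byte 0x8E in Mac Roman and 0xE9 in ISO 8859-1.
my $iso = should_map_mac(undef, "\x0D") ? mac_to_iso("caf\x8E") : "caf\x8E";
```

Mac Roman contains some characters that ISO 8859-1 lacks, so a conversion like this one cannot be lossless for every input.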
If your data uses any non-Latin-1 character sets, or binary data, or