File-BOM
subroutines only
* :vars
just %bom2enc and %enc2bom
VARIABLES
%bom2enc
Maps byte order marks (BOMs) to their encodings.
The keys of this hash are strings which represent the BOMs; the values
are their encodings, in a format which is understood by Encode.
The encodings represented in this hash are: UTF-8, UTF-16BE, UTF-16LE,
UTF-32BE and UTF-32LE.
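To make the shape of this hash concrete, here is a minimal sketch that
builds an equivalent table by hand using only the core Encode module.
The name %sketch_bom2enc is hypothetical, and producing the keys by
encoding U+FEFF is an assumption made for illustration, not a statement
about how File::BOM populates %bom2enc internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for %bom2enc: BOM byte string => encoding name.
my %sketch_bom2enc;
for my $enc (qw(UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE)) {
    my $bom_char = "\x{FEFF}";    # U+FEFF, the byte order mark character
    $sketch_bom2enc{ encode($enc, $bom_char) } = $enc;
}

# Print each encoding next to its BOM, shown as hex bytes.
for my $bom (sort { $sketch_bom2enc{$a} cmp $sketch_bom2enc{$b} }
             keys %sketch_bom2enc) {
    printf "%-8s %s\n", $sketch_bom2enc{$bom}, uc(unpack('H*', $bom));
}
```

Looking up the first bytes of some input in such a table is enough to
name its encoding, provided longer BOMs are tried before shorter ones.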
%enc2bom
A reverse-lookup hash for %bom2enc, with a few aliases used in Encode,
namely utf8, iso-10646-1 and UCS-2.
Note that UTF-16, UTF-32 and UCS-4 are not included in this hash, mainly
because Encode::encode automatically puts BOMs on output. See
Encode::Unicode.
FUNCTIONS
open_bom
$encoding = open_bom(HANDLE, $filename, $default_mode)
($encoding, $spill) = open_bom(HANDLE, $filename, $default_mode)
Opens HANDLE for reading on $filename, setting the mode to the
appropriate encoding for the BOM stored in the file.
On failure, a fatal error is raised; see the DIAGNOSTICS section for
details on how to catch these errors. Raising an error leaves the
return value(s) free to be used for other purposes.
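Because failure is fatal, callers that want to recover need to trap the
error. The following is a minimal sketch using a plain eval block, which
is standard Perl and not necessarily the approach the DIAGNOSTICS
section recommends. The path is made up and assumed not to exist, and
open_bom is called by its fully qualified name to avoid assuming
anything about what the module exports.

```perl
use strict;
use warnings;
use File::BOM ();

# Hypothetical path, assumed not to exist, so open_bom should die.
my $missing = '/nonexistent/dir/no_such_file.txt';

# eval traps the fatal error; on failure it returns undef and sets $@.
my $encoding = eval { File::BOM::open_bom(my $fh, $missing, ':utf8') };
my $failed   = !defined($encoding) && length($@) ? 1 : 0;

warn "open_bom failed: $@" if $failed;
```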
If the file doesn't contain a BOM, $default_mode is used instead. Hence:
open_bom(FH, 'my_file.txt', ':utf8')
Opens my_file.txt for reading in an appropriate encoding found from the
BOM in that file, or as a UTF-8 file if none is found.
In the absence of a $default_mode argument, the following two calls
should be equivalent:
open_bom(FH, 'no_bom.txt');
open(FH, '<', 'no_bom.txt');
If an undefined value is passed as the handle, a symbol will be
generated for it like open() does:
# create filehandle on the fly
$enc = open_bom(my $fh, $filename, ':utf8');
$line = <$fh>;
The filehandle will be positioned to read just after the BOM. Unseekable
files (e.g. fifos) cause open_bom to croak unless it is called in list
context to catch spillage from the handle. Any spillage will be
automatically decoded from the encoding, if one is found.
e.g.
# croak if my_socket is unseekable
open_bom(FH, 'my_socket');
# keep spillage if my_socket is unseekable
($encoding, $spillage) = open_bom(FH, 'my_socket');
# discard any spillage from open_bom
($encoding) = open_bom(FH, 'my_socket');
defuse
$enc = defuse(FH);
($enc, $spill) = defuse(FH);
FH should be a filehandle opened for reading. If a BOM is found, the
relevant encoding layer is pushed onto it by binmode. Spillage
should be Unicode, not bytes.
Any uncaptured spillage will be silently lost. If the handle is
unseekable, use list context to avoid data loss.
If no BOM is found, the mode will be unaffected.
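As an illustration of defuse on an already-open handle, the sketch below
writes a small UTF-16LE file with a BOM to a temporary path, reopens it
without any encoding layer, and lets defuse set the layer. The
temporary file and its contents are invented for the example, and the
function is called by its fully qualified name to avoid assuming what
the module exports.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);
use File::BOM ();

# Write a UTF-16LE file that starts with its BOM (FF FE).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
my $sample = "caf\x{E9}\n";
print {$out} "\xFF\xFE", encode('UTF-16LE', $sample);
close $out or die "close failed: $!";

# Open for reading with no encoding layer, then let defuse() read the
# BOM and push the matching layer onto the handle.
open my $fh, '<', $path or die "Cannot open $path: $!";
my $enc  = File::BOM::defuse($fh);
my $line = <$fh>;    # read through the layer defuse() pushed
close $fh;
```

The file here is seekable, so scalar context is enough; for a pipe or
socket the list-context form would be the one to use.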
decode_from_bom
$unicode_string = decode_from_bom($string, $default, $check)
($unicode_string, $encoding) = decode_from_bom($string, $default, $check)
Reads a BOM from the beginning of $string, decodes $string (minus the
BOM) and returns it to you as a perl unicode string.
If $string doesn't have a BOM, $default is used instead.
$check, if supplied, is passed to Encode::decode as the third argument.
If there's no BOM and no default, the original string is returned and
encoding is ''.
See Encode.
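A short usage sketch of the two cases described above: a byte string
that carries a UTF-8 BOM, and one with neither a BOM nor a default. The
sample strings are invented for the example, and the function is called
by its fully qualified name to avoid assuming what the module exports.

```perl
use strict;
use warnings;
use File::BOM ();

# UTF-8 BOM (EF BB BF) followed by the UTF-8 bytes for "cafe" with an
# acute accent on the final letter.
my $bytes = "\xEF\xBB\xBFcaf\xC3\xA9";

# List context returns the decoded string and the detected encoding.
my ($text, $encoding) = File::BOM::decode_from_bom($bytes);

# No BOM and no default: the input comes back as-is, encoding is ''.
my $ascii = 'plain ascii';
my ($plain, $none) = File::BOM::decode_from_bom($ascii);

printf "%d characters decoded from %s\n", length($text), $encoding;
```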
get_encoding_from_filehandle
$encoding = get_encoding_from_filehandle(HANDLE)
($encoding, $spillage) = get_encoding_from_filehandle(HANDLE)
Returns the encoding found in the given filehandle.
The handle should be opened in a non-unicode way (e.g. mode '<:bytes')
so that the BOM can be read in its natural state.
After calling, the handle will be set to read at a point after the BOM
(or at the beginning of the file if no BOM was found).
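The behaviour described so far for seekable handles can be sketched
with core Perl alone: read a few raw bytes, compare them against the
known BOMs, and seek to just past whichever one matched. This is an
illustration of the technique, not File::BOM's implementation; the
helper name sniff_bom and the @boms table are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# BOM byte strings, longest first, so that UTF-32LE (FF FE 00 00) is
# tried before UTF-16LE (FF FE).
my @boms = (
    [ "\x00\x00\xFE\xFF" => 'UTF-32BE' ],
    [ "\xFF\xFE\x00\x00" => 'UTF-32LE' ],
    [ "\xEF\xBB\xBF"     => 'UTF-8'    ],
    [ "\xFE\xFF"         => 'UTF-16BE' ],
    [ "\xFF\xFE"         => 'UTF-16LE' ],
);

# Hypothetical helper: returns the encoding name ('' if no BOM) and
# leaves a seekable handle positioned just after the BOM.
sub sniff_bom {
    my ($fh) = @_;
    my $start = tell($fh);
    defined(read($fh, my $head, 4)) or die "read failed: $!";
    for my $pair (@boms) {
        my ($bom, $enc) = @$pair;
        if (substr($head, 0, length($bom)) eq $bom) {
            seek($fh, $start + length($bom), SEEK_SET)
                or die "seek failed: $!";
            return $enc;
        }
    }
    # No BOM: rewind to where reading started.
    seek($fh, $start, SEEK_SET) or die "seek failed: $!";
    return '';
}

# Try it on a temporary file that starts with a UTF-8 BOM.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xEF\xBB\xBFhello\n";
close $out or die "close failed: $!";

open my $fh, '<:raw', $path or die "Cannot open $path: $!";
my $enc = sniff_bom($fh);
my $pos = tell($fh);    # offset of the first byte after the BOM
close $fh;
```

The seek calls are what make this approach unsuitable for unseekable
handles, which is why those need separate handling.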
If called in scalar context, unseekable handles cause a croak().
If called in list context, unseekable handles will be read byte-by-byte