Acme-InputRecordSeparatorIsRegexp
lib/Acme/InputRecordSeparatorIsRegexp.pm
}
return $removed;
}
1; #
__END__
=head1 NAME
Acme::InputRecordSeparatorIsRegexp - awk doesn't have to be better at something.
=head1 VERSION
Version 0.07
=head1 SYNOPSIS
use Acme::InputRecordSeparatorIsRegexp;
# open-then-tie
open my $fh, '<', 'file-with-Win-Mac-and-Unix-line-endings';
tie *$fh, 'Acme::IRSRegexp', $fh, '\r\n|\n|\r';
while (<$fh>) {
# $_ could have "\r\n", "\n", or "\r" line ending now
}
# tie-then-open
tie *{$fh=Symbol::gensym}, 'Acme::IRSRegexp', qr/\r\n|[\r\n]/;
open $fh, '<', 'file-with-ambiguous-line-endings';
$line = <$fh>;
# import open function and use :irs pseudo-layer
use Acme::InputRecordSeparatorIsRegexp 'open';
open my $fh, '<:irs(\r\n|\r|\n)', 'ambiguous.txt';
$line = <$fh>;
# import binmode and use :irs pseudo-layer
use Acme::InputRecordSeparatorIsRegexp 'binmode';
open my $fh, '<', 'ambiguous.txt';
binmode $fh, ':irs(\r\n|\r|\n)';
$line = <$fh>;
=head1 DESCRIPTION
In the section about the L<"input record separator"|perlvar/"$/">,
C<perlvar> famously quips
=over 4
Remember: the value of $/ is a string, not a regex. B<awk>
has to be better for something. :-)
=back
This module provides a mechanism to read records from a file
using a regular expression as a record separator.
A common use case for this module is reading a text file when
you don't know whether it uses Unix (C<\n>),
Windows/DOS (C<\r\n>), or Mac (C<\r>) style line endings,
or whether it mixes all three. To parse such a file
correctly, tie its file handle to this package with
an appropriate regular expression:
my $fh = Symbol::gensym;
tie *$fh, 'Acme::InputRecordSeparatorIsRegexp', '\r\n|\r|\n';
open $fh, '<', 'file-with-ambiguous-line-endings';
@lines = <$fh>;
# or
while (my $line = <$fh>) { ... }
Like the builtin C<readline> function and operator, the
C<< <$fh> >> expression returns lines that include the record
separator at the end of the line, so the lines returned may end
in C<\r\n>, C<\r>, or C<\n>.
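As a rough illustration of that behaviour (a core-Perl sketch, not this
module's implementation), the following splits an in-memory string on a
regular expression while keeping each terminator attached to its record.
The helper name C<split_records> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of this module: split $text into records,
# keeping the matched separator at the end of each record.
sub split_records {
    my ($text, $sep) = @_;
    my @records;

    # /g with \G walks the string one record at a time; /c keeps pos()
    # where the last match ended once no further separator is found.
    while ($text =~ /\G(.*?(?:$sep))/gcs) {
        push @records, $1;
    }

    # Whatever follows the last separator is a final, unterminated record.
    my $pos = pos($text) // 0;
    push @records, substr($text, $pos) if $pos < length $text;

    return @records;
}

my @lines = split_records("unix\nwindows\r\nmac\rlast", qr/\r\n|\r|\n/);
printf "%d records\n", scalar @lines;
```

Listing C<\r\n> before C<\r> in the alternation matters here: the regex
engine tries alternatives left to right, so a Windows line ending is
consumed as one separator rather than two.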
Another use case is files that contain multiple types of records
where a different sequence of characters is used to denote the
end of different types of records.
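To make that second use case concrete, here is a small core-Perl sketch
that classifies records by the terminator that ended them. The data format
(header records ending in C<;;>, data records ending in a newline) is made
up for the example, and the code does not use this module.

```perl
use strict;
use warnings;

# Made-up format: header records end with ";;", data records end with "\n".
my $text = "H1;;row1\nrow2\nH2;;row3\n";

my @records;
while ($text =~ /\G(.*?)(;;|\n)/gs) {
    my ($body, $terminator) = ($1, $2);
    push @records, {
        body => $body,
        type => $terminator eq ';;' ? 'header' : 'data',
    };
}

print "$_->{type}: $_->{body}\n" for @records;
```

With a tied handle the same classification could be made by inspecting the
end of each line returned by C<< <$fh> >>, since the separator is kept.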
=head1 tie STATEMENT
A typical use of this package might look like
my $fh = Symbol::gensym;
tie *$fh, 'Acme::InputRecordSeparatorIsRegexp', $record_sep_regex;
open $fh, '<', $filename;
where C<$record_sep_regex> is a string or a C<Regexp> object
(specified with the
L<< C<qr/.../>|perlop/"Quote and Quote-like Operators" >> notation)
containing the regular expression
you want to use for a file's line endings. Also see the convenience
function L<"open"> for an alternate way to obtain a file handle with
the features of this package.
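For readers unfamiliar with tied file handles, the toy class below shows
the general mechanism that C<tie> relies on: a C<TIEHANDLE> constructor and
a C<READLINE> method that Perl calls for C<< <$fh> >>. It is a minimal
sketch under simplifying assumptions (the whole input is passed in as a
string, and the class name C<My::ToyRegexpReader> is hypothetical); it is
not how this package is implemented.

```perl
use strict;
use warnings;
use Symbol ();

# Toy class for illustration only; the real package is more capable.
package My::ToyRegexpReader {
    # tie *$fh, 'My::ToyRegexpReader', $text, $separator
    sub TIEHANDLE {
        my ($class, $text, $sep) = @_;
        return bless { text => $text, sep => qr/$sep/ }, $class;
    }

    # Called for <$fh>: one record in scalar context, all remaining
    # records in list context.
    sub READLINE {
        my ($self) = @_;
        my @records;
        while (length $self->{text}) {
            if ($self->{text} =~ s/\A(.*?$self->{sep})//s) {
                push @records, $1;              # record plus its separator
            }
            else {
                push @records, $self->{text};   # unterminated tail
                $self->{text} = '';
            }
            last unless wantarray;
        }
        return wantarray ? @records : $records[0];
    }
}

my $fh = Symbol::gensym;
tie *$fh, 'My::ToyRegexpReader', "one\r\ntwo\rthree\n", '\r\n|\r|\n';

my $first = <$fh>;    # scalar context: next record
my @rest  = <$fh>;    # list context: everything left
```

The separator may be given as a string or as a C<qr//> object, because
C<qr/$sep/> accepts either.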
=head1 FUNCTIONS
=head2 open
Another way of using this package to attach a regular expression
to the input record separator of a file handle, available since
v0.04, is to import this package's C<open> function and to
specify an C<:irs(...)> I<pseudo-layer>.
use Acme::InputRecordSeparatorIsRegexp 'open';
$result = open FILEHANDLE, "<:irs(REGEXP)", EXPR
$result = open FILEHANDLE, "<:irs(REGEXP)", EXPR, LIST
$result = open FILEHANDLE, "<:irs(REGEXP)", REFERENCE
$result = open my $fh, '<:irs(\r\n|\r|\n)', "ambiguous-line-endings.txt";
The C<:irs(...)> layer may be combined with other layers.
open my $fh, '<:encoding(UTF-16):irs(\R)', "ambiguous.txt";
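Because the C<:irs(...)> argument is a regular expression embedded in a
mode string, Perl's quoting rules apply before this package ever sees it.
The core-Perl sketch below compares a single-quoted and a double-quoted
mode string. The extraction regex is only a stand-in for illustration and
is not taken from this package's source.

```perl
use strict;
use warnings;

# Single quotes keep the backslash, so the regexp text is \R.
my $single = '<:irs(\R)';

# Double quotes treat \R as an unrecognized string escape and drop the
# backslash (Perl warns about this; the warning is silenced here).
my $double = do { no warnings 'misc'; "<:irs(\R)" };

# Pull the regexp text out of each mode string for comparison.
my ($from_single) = $single =~ /:irs\((.*)\)/;
my ($from_double) = $double =~ /:irs\((.*)\)/;

printf "single-quoted: %s\n", $from_single;   # prints: single-quoted: \R
printf "double-quoted: %s\n", $from_double;   # prints: double-quoted: R
```

Escapes such as C<\r> and C<\n> survive double quotes only because Perl
turns them into the literal characters, which a regexp then matches as
themselves. Single-quoting the mode string avoids having to reason about
which escapes behave that way.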
See also: L<"binmode">
=head2 autochomp