Acme-InputRecordSeparatorIsRegexp


lib/Acme/InputRecordSeparatorIsRegexp.pm

    }
    return $removed;
}

1; # 

__END__

=head1 NAME

Acme::InputRecordSeparatorIsRegexp - awk doesn't have to be better at something.

=head1 VERSION

Version 0.07

=head1 SYNOPSIS

    use Acme::InputRecordSeparatorIsRegexp;

    # open-then-tie
    open my $fh, '<', 'file-with-Win-Mac-and-Unix-line-endings';
    tie *$fh, 'Acme::IRSRegexp', $fh, '\r\n|\n|\r';
    while (<$fh>) {
        # $_ could have "\r\n", "\n", or "\r" line ending now
    }

    # tie-then-open
    tie *{$fh=Symbol::gensym}, 'Acme::IRSRegexp', qr/\r\n|[\r\n]/;
    open $fh, '<', 'file-with-ambiguous-line-endings';
    $line = <$fh>;

    # import open function and use :irs pseudo-layer
    use Acme::InputRecordSeparatorIsRegexp 'open';
    open my $fh, '<:irs(\r\n|\r|\n)', 'ambiguous.txt';
    $line = <$fh>;

    # import binmode and use :irs pseudo-layer
    use Acme::InputRecordSeparatorIsRegexp 'binmode';
    open my $fh, '<', 'ambiguous.txt';
    binmode $fh, ':irs(\r\n|\r|\n)';
    $line = <$fh>;

=head1 DESCRIPTION

In the section about the L<"input record separator"|perlvar/"$/">,
C<perlvar> famously quips

=over 4

Remember: the value of $/ is a string, not a regex. B<awk>
has to be better for something. :-)

=back

This module provides a mechanism to read records from a file
using a regular expression as a record separator.

A common use case for this module is reading a text file whose
line-ending style is unknown: it might use Unix (C<\n>),
Windows/DOS (C<\r\n>), or classic Mac (C<\r>) line endings,
or even a mixture of all three. To parse such a file properly,
tie its file handle to this package with the appropriate
regular expression:

    my $fh = Symbol::gensym;
    tie *$fh, 'Acme::InputRecordSeparatorIsRegexp', '\r\n|\r|\n';
    open $fh, '<', 'file-with-ambiguous-line-endings';

    @lines = <$fh>;
    # or
    while (my $line = <$fh>) { ... }

The lines produced by the C<< <$fh> >> expression, like the
builtin C<readline> function and operator, include the record
separator at the end of the line, so the lines returned may end
in C<\r\n>, C<\r>, or C<\n>.
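
As a rough illustration of that behavior, the sketch below splits an
in-memory string into records using only core Perl. It is not this
module's implementation; the sample data and variable names are invented
for the example. It shows each record keeping whichever separator
ended it, and a final record with no separator still being returned.

```perl
use strict;
use warnings;

# Sample data with Windows, classic Mac, and Unix line endings,
# plus a last line that has no terminator at all.
my $data = "one\r\ntwo\rthree\nfour";

# Each match is either "shortest text up to and including a separator"
# or, failing that, whatever non-empty text remains at the end.
my @records = $data =~ /\G(.*?(?:\r\n|\r|\n)|.+\z)/gs;

# @records is now ("one\r\n", "two\r", "three\n", "four")
for my $record (@records) {
    (my $visible = $record) =~ s/\r/\\r/g;   # make the endings printable
    $visible =~ s/\n/\\n/g;
    print "[$visible]\n";
}
```
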

Another use case is files that contain multiple types of records
where a different sequence of characters is used to denote the
end of different types of records.

=head1 tie STATEMENT

A typical use of this package might look like

    my $fh = Symbol::gensym;
    tie *$fh, 'Acme::InputRecordSeparatorIsRegexp', $record_sep_regex;
    open $fh, '<', $filename;

where C<$record_sep_regex> is a string or a C<Regexp> object
(specified with the
L<< C<qr/.../>|perlop/"Quote and Quote-like Operators" >> notation)
containing the regular expression
you want to use for a file's line endings. Also see the convenience
function L<"open"> for an alternate way to obtain a file handle with
the features of this package.
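
When writing the separator expression, keep in mind that Perl tries
alternatives from left to right, so C<\r\n> should come before C<\r>
and C<\n>. The sketch below demonstrates the difference with core Perl
only. C<split_records> is a hypothetical helper written for this
illustration and is not part of this package.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: split a string into
# records, keeping each separator attached to the record it ends.
sub split_records {
    my ($sep, $data) = @_;
    return $data =~ /\G(.*?(?:$sep)|.+\z)/gs;
}

my $text = "a\r\nb\n";

# Longest alternative first: "\r\n" is treated as one separator.
my @good = split_records(qr/\r\n|\r|\n/, $text);   # ("a\r\n", "b\n")

# "\r" listed first: it matches before "\r\n" is ever tried, so the
# "\n" that follows becomes a spurious record of its own.
my @bad = split_records(qr/\r|\n|\r\n/, $text);    # ("a\r", "\n", "b\n")

printf "good: %d records, bad: %d records\n", scalar @good, scalar @bad;
```
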

=head1 FUNCTIONS

=head2 open

Another way to attach a regular expression to the input record
separator of a file handle, available since v0.04, is to import
this package's C<open> function and to specify an C<:irs(...)>
I<pseudo-layer> in the mode argument.

   use Acme::InputRecordSeparatorIsRegexp 'open';
   $result = open FILEHANDLE, "<:irs(REGEXP)", EXPR
   $result = open FILEHANDLE, "<:irs(REGEXP)", EXPR, LIST
   $result = open FILEHANDLE, "<:irs(REGEXP)", REFERENCE

   $result = open my $fh, '<:irs(\r\n|\r|\n)', 'ambiguous-line-endings.txt';

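The C<:irs(...)> layer may be combined with other layers. Use single
quotes for the mode string so that regexp escapes such as C<\R> reach
the layer intact; in a double-quoted string, C<"\R"> becomes a plain C<R>.

   open my $fh, '<:encoding(UTF-16):irs(\R)', 'ambiguous.txt';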

See also: L<"binmode">

=head2 autochomp


