File-CountLines

README

    The file is read in equally sized blocks. The size of the blocks can be
    supplied with the "blocksize" option. The default is 4096, and can be
    changed by setting $File::CountLines::BlockSize.

    Do not use a block size smaller than the length of the separator, since
    that might produce wrong results. (In general there's no reason to
    choose a smaller block size at all. Depending on the size of your file,
    a larger block size might speed things up a bit.)

Character Encodings
    If you supply a separator yourself, it should not be a decoded string.

    The file is read in binary mode, which implies that this module works
    fine for text files in ASCII-compatible encodings, including ASCII
    itself, UTF-8 and all the ISO-8859-* encodings (aka Latin-1, Latin-2,
    ...).

    Note that multi-byte encodings such as UTF-32, UTF-16le, UTF-16be and
    UCS-2 encode a line feed character so that the 0x0A byte is a substring
    of the encoded character, but searching blindly for that byte produces
    false positives. For example the *LATIN CAPITAL LETTER

lib/File/CountLines.pm

can be supplied with the C<blocksize> option. The default is 4096,
and can be changed by setting C<$File::CountLines::BlockSize>.

Do not use a block size smaller than the length of the separator, since
that might produce wrong results. (In general there's no reason to
choose a smaller block size at all. Depending on the size of your file,
a larger block size might speed things up a bit.)
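
To see why block boundaries matter for a multi-byte separator such as
C<"\r\n">, here is a minimal sketch using only core Perl. The helper
C<count_blockwise> and its C<$carry> flag are hypothetical and written for
this illustration; they are not the module's own code. Without the carry, a
separator split across two reads is missed; with it, the unscanned tail of
each buffer is prepended to the next block.

```perl
use strict;
use warnings;

# Hypothetical helper, not File::CountLines' own code. Counts a byte-string
# separator by scanning fixed-size blocks. With $carry set, the unscanned
# tail of each buffer (at most length($sep) - 1 bytes) is prepended to the
# next block so that a separator split across two reads is still found.
sub count_blockwise {
    my ($fh, $sep, $blocksize, $carry) = @_;
    my $keep  = length($sep) - 1;
    my $tail  = '';
    my $count = 0;

    while (read($fh, my $block, $blocksize)) {
        my $buf = $tail . $block;
        my $pos = 0;
        while ((my $hit = index($buf, $sep, $pos)) >= 0) {
            $count++;
            $pos = $hit + length($sep);
        }
        next unless $carry;
        # Keep the last $keep bytes, but never bytes already used by a match.
        my $start = length($buf) - $keep;
        $start = $pos if $pos > $start;
        $tail = substr($buf, $start);
    }
    return $count;
}

my $data = "a\r\nb\r\nc\r\n";    # three CRLF-terminated lines, 9 bytes

for my $carry (0, 1) {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    binmode $fh;
    printf "carry=%d: %d\n", $carry, count_blockwise($fh, "\r\n", 4, $carry);
    close $fh;
}
# prints "carry=0: 2" then "carry=1: 3"
```

With a block size of 4 the last C<"\r\n"> straddles the second and third
reads, which is why the scan without carry reports one line too few.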

=head1 Character Encodings

If you supply a separator yourself, it should not be a decoded string.
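
As an illustration of the difference, the following sketch turns a decoded
separator into the byte string that actually occurs in a UTF-8 file, using
the core C<Encode> module. The choice of U+2028 as separator and the sample
content are assumptions made for the example only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2028 LINE SEPARATOR as a decoded (character) string: one character.
my $sep_chars = "\x{2028}";

# The same separator as the bytes found in a UTF-8 file: E2 80 A8.
my $sep_bytes = encode('UTF-8', $sep_chars);

# Sample file content as raw bytes, as a binary-mode read would return it.
my $text    = "one\x{2028}two\x{2028}";
my $content = encode('UTF-8', $text);

# Count occurrences of the byte-string separator in the raw bytes.
my $hits = () = $content =~ /\Q$sep_bytes\E/g;
printf "%d bytes, %d hits\n", length($sep_bytes), $hits;
# prints "3 bytes, 2 hits"
```

The encoded form, C<$sep_bytes> in this sketch, is the kind of value to
supply as separator, since it is compared against raw bytes from the file.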

The file is read in binary mode, which implies that this module
works fine for text files in ASCII-compatible encodings, including
ASCII itself, UTF-8 and all the ISO-8859-* encodings (aka Latin-1,
Latin-2, ...).
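
Whether a given encoding is safe for byte-wise counting can be checked
directly. The sketch below, which assumes only the core C<Encode> module,
encodes a short text containing exactly one line feed and counts the
C<0x0A> bytes in the result. The sample text and the character U+010A are
chosen here purely for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+010A, the letter 'a', and a single line feed: one real line ending.
my $text = "\x{010A}a\n";

my $utf8  = encode('UTF-8',    $text);    # bytes: C4 8A 61 0A
my $utf16 = encode('UTF-16BE', $text);    # bytes: 01 0A 00 61 00 0A

# tr/// with an empty replacement list counts matching bytes.
my $lf_utf8  = ($utf8  =~ tr/\x0A//);
my $lf_utf16 = ($utf16 =~ tr/\x0A//);

printf "UTF-8: %d, UTF-16BE: %d\n", $lf_utf8, $lf_utf16;
# prints "UTF-8: 1, UTF-16BE: 2"
```

In UTF-8 every byte of a multi-byte character has its high bit set, so
C<0x0A> can only ever be a real line feed. In UTF-16BE the second byte of
U+010A is C<0x0A>, which a byte-wise search counts as an extra line.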

Note that multi-byte encodings such as UTF-32, UTF-16le, UTF-16be
and UCS-2 encode a line feed character so that the C<0x0A> byte is a
substring of the encoded character, but searching blindly for that
byte produces false positives. For example the I<LATIN CAPITAL


