String-Multibyte
If the specified string does not contain any character
in the search list,
the returned value equals the length of the string.
C<SEARCHLIST> can also be an C<ARRAYREF>.
For example, if a charset treats C<CRLF> as a single character,
the string C<"\r\n"> is a one-element list containing only C<"\r\n">.
A two-element list of C<"\r"> and C<"\n"> can be
given as C<["\r", "\n"]> (the string C<"\n\r"> also works,
since the order of characters in C<SEARCHLIST> does not matter to C<strcspn>).
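The behaviour described above can be sketched as follows. This is a minimal illustration, assuming the module is installed and using the C<Bytes> charset that appears in this document's other examples; the variable names and sample strings are made up for the illustration.

```perl
use strict;
use warnings;
use String::Multibyte;

# 'Bytes' treats every byte as one character.
my $bytes = String::Multibyte->new('Bytes');

# Length of the leading run that contains none of the listed characters.
my $n1 = $bytes->strcspn("key=value", "=");              # 3

# No listed character occurs, so the full length is returned.
my $n2 = $bytes->strcspn("abc", "xyz");                  # 3

# SEARCHLIST given as an ARRAYREF of two separate characters.
my $n3 = $bytes->strcspn("line\r\nnext", ["\r", "\n"]);  # 4

print "$n1 $n2 $n3\n";
```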
=back
=head2 Substring
=over 4
=item C<$mbcs-E<gt>substr(STRING or SCALAR REF, OFFSET)>
=item C<$mbcs-E<gt>substr(STRING or SCALAR REF, OFFSET, LENGTH)>
=item C<$mbcs-E<gt>substr(SCALAR, OFFSET, LENGTH, REPLACEMENT)>
Works like C<CORE::substr>, but with the character semantics
of the multibyte charset encoding.
If C<REPLACEMENT> is specified as the fourth argument, the selected
part of C<SCALAR> is replaced and the text that was there before is returned.
If a reference to a scalar variable is passed as the first argument,
an lvalue reference is returned that you can assign through.
    ${ $mbcs->substr(\$str,$off,$len) } = $replace;
works like
    CORE::substr($str,$off,$len) = $replace;
The returned lvalue is not multibyte-aware,
so successive assignments may lead to odd results.
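A short sketch of the read-only and lvalue-reference forms, assuming the C<Bytes> charset (where one character is one byte, so the results match C<CORE::substr>). The variable names are illustrative only.

```perl
use strict;
use warnings;
use String::Multibyte;

my $bytes = String::Multibyte->new('Bytes');
my $str   = 'This is perl.';

# Read-only forms: OFFSET and LENGTH count characters (bytes, for 'Bytes').
my $tail = $bytes->substr($str, 8);      # 'perl.'
my $word = $bytes->substr($str, 8, 4);   # 'perl'

# Passing a scalar reference yields an lvalue reference to assign through.
${ $bytes->substr(\$str, 8, 4) } = 'Perl';

print "$tail|$word|$str\n";              # perl.|perl|This is Perl.
```

With a genuinely multibyte charset the offsets would count characters rather than bytes, which is where this method differs from C<CORE::substr>.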
=back
=head2 Split
=over 4
=item C<$mbcs-E<gt>strsplit(SEPARATOR, STRING)>
=item C<$mbcs-E<gt>strsplit(SEPARATOR, STRING, LIMIT)>
This function emulates C<CORE::split>, but splits on the literal C<SEPARATOR> string,
not on a pattern.
Outside list context, it only returns the number of fields found
and does not split into the C<@_> array.
If an empty string is specified as C<SEPARATOR>, the specified string is split
into characters.
    $bytes->strsplit('', 'This is perl.', 7);
    # ('T', 'h', 'i', 's', ' ', 'i', 's perl.')
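To illustrate the literal-separator behaviour, here is a minimal sketch assuming the C<Bytes> charset; the sample string and variable names are hypothetical.

```perl
use strict;
use warnings;
use String::Multibyte;

my $bytes = String::Multibyte->new('Bytes');

# '.' is taken literally, not as a regex metacharacter.
my @fields = $bytes->strsplit('.', 'www.example.org');
# ('www', 'example', 'org')

# LIMIT caps the number of fields; the last field keeps the remainder.
my @two = $bytes->strsplit('.', 'www.example.org', 2);
# ('www', 'example.org')

# Outside list context only the number of fields is returned.
my $count = $bytes->strsplit('.', 'www.example.org');   # 3

print join('|', @fields), "\n", join('|', @two), "\n", "$count\n";
```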
=back
=head2 Character Range
=over 4
=item C<$mbcs-E<gt>mkrange(CHARLIST, ALLOW_REVERSE)>
Returns the list of characters (or, outside list context, a concatenated string)
obtained by parsing the specified character range.
The result depends on the character order of the charset concerned.
For the character order of each charset, see its definition file.
If the character order is undefined in the definition file,
a string identical to the specified string is returned.
A character range is specified with a hyphen (C<'-'>, or strictly
speaking, C<$obj-E<gt>{hyphen}>).
The backslashed combinations C<'\-'> and C<'\\'>
(strictly speaking, C<"$obj-E<gt>{escape}$obj-E<gt>{hyphen}">
and C<"$obj-E<gt>{escape}$obj-E<gt>{escape}">) stand for
the literal characters C<'-'> and C<'\'>, respectively.
A hyphen at the beginning or the end of the range
is also treated as a literal hyphen.
For example, C<$mbcs-E<gt>mkrange('+\-0-9A-F')> returns
C<('+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F')>
and C<scalar $mbcs-E<gt>mkrange('A-P')> returns C<'ABCDEFGHIJKLMNOP'>.
If a true value is specified as the second argument,
reverse character ranges such as C<'9-0'> and C<'Z-A'> are allowed.
    $bytes = String::Multibyte->new('Bytes');
    $bytes->mkrange('p-e-r-l', 1); # ponmlkjihgfefghijklmnopqrqponml
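The range rules above can be tried out as follows. This is only a sketch, assuming the C<Bytes> charset, whose character order is plain byte order; the variable names are illustrative.

```perl
use strict;
use warnings;
use String::Multibyte;

my $bytes = String::Multibyte->new('Bytes');

# List context: one element per character; '\-' is a literal hyphen.
my @chars = $bytes->mkrange('+\-0-9A-F');

# Scalar context: the characters are concatenated into one string.
my $upper = $bytes->mkrange('A-P');       # 'ABCDEFGHIJKLMNOP'

# A true second argument permits descending ranges.
my $down  = $bytes->mkrange('9-0', 1);    # '9876543210'

print join('', @chars), "\n$upper\n$down\n";
```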
=back
=head2 Transliteration
=over 4
=item C<$mbcs-E<gt>strtr(STRING or SCALAR REF, SEARCHLIST, REPLACEMENTLIST)>
=item C<$mbcs-E<gt>strtr(STRING or SCALAR REF, SEARCHLIST, REPLACEMENTLIST, MODIFIER)>
Replaces every occurrence of a character found in the search list
with the corresponding character in the replacement list.
If a reference to a scalar variable is specified as the first argument,
the number of characters replaced or deleted is returned;
otherwise, the transliterated string is returned and
the specified string is unaffected.
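The two calling styles might look like this in practice. This is a minimal sketch assuming the C<Bytes> charset and no modifier; the sample strings and variable names are hypothetical.

```perl
use strict;
use warnings;
use String::Multibyte;

my $bytes = String::Multibyte->new('Bytes');

# Plain string: a transliterated copy is returned, the original is untouched.
my $orig = 'hello';
my $copy = $bytes->strtr($orig, 'el', 'ip');    # 'hippo'

# Scalar reference: the referenced string is changed and
# the number of replaced characters is returned.
my $text  = 'hello';
my $count = $bytes->strtr(\$text, 'l', 'L');    # 2, and $text is 'heLLo'

print "$orig $copy $text $count\n";
```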
If the C<'h'> modifier is specified, a histogram hash is returned in list context,
or a reference to the histogram hash in scalar context;