Char-Replace
lib/Char/Replace.pm
$MAP[ ord('d') ] = '5'; # replace all 'd' characters by '5'
}
# we can now use our map to replace the string
is Char::Replace::replace( q[abcd], \@MAP ), q[AAbc5], "a -> AA ; d -> 5";
{ # trim XS helper
# remove spaces at the beginning and end of a string - XS helper
is Char::Replace::trim( qq[ Some spaces in this string.\n\r\n] ), q[Some spaces in this string.];
}
{ # trim_inplace: modify string in place (zero allocation)
my $str = qq[  Some spaces  \n];
my $removed = Char::Replace::trim_inplace( $str );
is $str, q[Some spaces], "trim_inplace modifies in place";
is $removed, 5, "5 whitespace bytes removed";
}
done_testing;
=head1 DESCRIPTION
Char::Replace
XS helpers to perform some basic character replacement on strings.
=over
=item replace: replace (transliterate) one or more ASCII characters
=item replace_inplace: fast in-place 1:1 character replacement (no allocation)
=item trim: remove leading and trailing spaces of a string
=item trim_inplace: in-place whitespace trimming (no allocation)
=back
=head1 Available functions
=head2 $output = replace( $string, $MAP )
Returns a new string C<$output> built by applying the replacement map C<$MAP> (an array ref indexed by character code) to C<$string>.
Map entries can be:
=over
=item a string (PV): replaces the character with that string
=item an empty string: deletes the character from the output
=item an integer (IV): replaces the character with C<chr(value)> (0-255)
=item undef: keeps the original character unchanged
=item a code ref: called with the character as argument; the return value is the replacement
(return undef to keep the original, an empty string to delete)
B<Note:> Code ref callbacks are significantly slower than static replacements due to
function call overhead. Avoid heavy computation inside callbacks. Callbacks receive
a single-character string argument; for tainted input, the output string inherits the
taint flag from the input (not from the callback return value).
=back
See L</SYNOPSIS> or the examples below.
Setting a map entry to an empty string deletes the character from the output:
$map->[ ord('x') ] = q[]; # delete 'x'
Char::Replace::replace( "fox", $map ) eq "fo" or die;
Setting a map entry to an integer replaces the character with chr(value):
$map->[ ord('a') ] = ord('A'); # replace 'a' with 'A'
Char::Replace::replace( "abc", $map ) eq "Abc" or die;
Setting a map entry to a code ref enables dynamic replacement:
$map->[ ord('a') ] = sub { uc $_[0] }; # uppercase callback
Char::Replace::replace( "abc", $map ) eq "Abc" or die;
# stateful callback
my $n = 0;
$map->[ ord('x') ] = sub { ++$n };
Char::Replace::replace( "xyx", $map ) eq "1y2" or die;
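The entry types listed above can be summarized with a small pure-Perl model. This is only a sketch of the documented behaviour, not the XS implementation: C<replace_model> and C<_is_integer_entry> are hypothetical names, and telling an integer (IV) entry apart from a string (PV) entry by inspecting scalar flags through the core C<B> module is an assumption about how such a check could be done in Perl. UTF-8 handling is not modelled.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true when the scalar currently holds an integer (IV)
# and no string value, e.g. ord('A') but not '5'.
sub _is_integer_entry {
    my $flags = B::svref_2object( \$_[0] )->FLAGS;
    return ( $flags & B::SVp_IOK() ) && !( $flags & B::SVp_POK() ) ? 1 : 0;
}

# Hypothetical pure-Perl model of replace( $string, $MAP ).
sub replace_model {
    my ( $string, $map ) = @_;
    my $out = '';
    for my $char ( split //, $string ) {
        my $entry = $map->[ ord $char ];

        if ( !defined $entry ) {
            $out .= $char;                        # undef: keep the character
        }
        elsif ( ref($entry) eq 'CODE' ) {
            my $result = $entry->($char);         # code ref: ask the callback
            $out .= defined $result ? $result : $char;
        }
        elsif ( _is_integer_entry($entry) ) {
            $out .= chr $entry;                   # integer: chr(value)
        }
        else {
            $out .= $entry;                       # string: may expand or delete
        }
    }
    return $out;
}

my @map;
$map[ ord 'a' ] = 'AA';                 # string: expand
$map[ ord 'x' ] = '';                   # empty string: delete
$map[ ord 'c' ] = ord 'C';              # integer: chr(67)
$map[ ord 'd' ] = sub { uc $_[0] };     # code ref: dynamic replacement
print replace_model( 'abcdx', \@map ), "\n";    # prints "AAbCD"
```

The model builds the output one character at a time, which is the reason a map entry may expand or delete a character here but not in the in-place variant.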
=head2 $map = identity_map()
Convenience helper to initialize an ASCII mapping.
It returns an array ref in which every character maps to itself.
You can then override one or more entries.
my $map = Char::Replace::identity_map();
$map->[ ord('a') ] = q[XYZ]; # replace 'a' by 'XYZ'
# replaces every 'a' with 'XYZ'
Char::Replace::replace( "abcdabcd", $map ) eq "XYZbcdXYZbcd" or die;
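To make the shape of such a map concrete, here is a minimal pure-Perl stand-in. C<identity_map_model> is a hypothetical name, and the choice of one entry per byte value (0 to 255) is an assumption; the real helper may cover a different range.

```perl
use strict;
use warnings;

# Hypothetical stand-in for identity_map(): index N holds chr(N).
# Assumes one entry per byte value, 0 .. 255.
sub identity_map_model {
    return [ map { chr($_) } 0 .. 255 ];
}

my $map = identity_map_model();
print scalar( @{$map} ), "\n";    # prints "256"
print $map->[ ord 'a' ], "\n";    # prints "a"

# Override a single entry; every other character still maps to itself.
$map->[ ord 'a' ] = 'XYZ';
```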
=head2 $map = build_map( char => replacement, ... )
Convenience constructor: takes a hash of single-character keys and their
replacement values, and returns an array ref suitable for C<replace()> or
C<replace_inplace()>. Starts from an identity map, so unmapped characters
pass through unchanged.
my $map = Char::Replace::build_map(
'a' => 'AA',
'd' => '', # delete
'x' => ord('X'), # IV
'z' => sub { uc $_[0] }, # callback
);
Char::Replace::replace( "abxd", $map ) eq "AAbX" or die;
Croaks if any key is not exactly one character.
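The behaviour described above (start from an identity map, overlay the pairs, reject keys that are not exactly one character) could look like this in pure Perl. C<build_map_model> is a hypothetical name, the error message is invented for illustration, and the 256-entry identity map is an assumption.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical pure-Perl sketch of build_map( char => replacement, ... ).
sub build_map_model {
    my (%pairs) = @_;

    # Assumed identity map: index N holds chr(N) for N in 0 .. 255.
    my @map = map { chr($_) } 0 .. 255;

    for my $char ( keys %pairs ) {
        croak "build_map_model: key must be exactly one character"
            unless length($char) == 1;
        $map[ ord $char ] = $pairs{$char};    # overlay the replacement
    }
    return \@map;
}

my $map = build_map_model( 'a' => 'AA', 'd' => '' );
print $map->[ ord 'a' ], "\n";    # prints "AA"
print $map->[ ord 'b' ], "\n";    # prints "b" (identity entry)
```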
=head2 $count = replace_inplace( $string, $MAP )
Modifies C<$string> in place, applying 1:1 byte replacements from C<$MAP>.
Returns the number of bytes actually changed.
Unlike C<replace()>, this function does B<not> allocate a new string: it
modifies the existing SV buffer directly. This makes it significantly faster
(up to 3.5x for long strings) but restricts map entries to single-character
replacements only:
=over
=item a single-character string (PV of length 1)
=item an integer (IV) in range 0-255
=item undef: keeps the original character unchanged
=back
Multi-character strings, empty strings (deletion), and code refs will cause a croak.
Use C<replace()> when you need expansion, deletion, or dynamic callbacks.
my $map = Char::Replace::identity_map();
$map->[ ord('a') ] = 'A';
my $str = "abcabc";
my $n = Char::Replace::replace_inplace( $str, $map );
# $str is now "AbcAbc", $n is 2
UTF-8 safety applies: multi-byte sequences are skipped, only ASCII bytes
are eligible for replacement.
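The return value counts bytes that actually changed, so identity entries do not contribute. The following pure-Perl sketch illustrates that rule under some assumptions: C<replace_inplace_model> is a hypothetical name, integer entries are left out for brevity, and validating entries lazily (only when a character hits them) is a guess rather than documented behaviour.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical pure-Perl model of replace_inplace( $string, $MAP ).
# Handles only undef and single-character string entries.
sub replace_inplace_model {
    my $map     = $_[1];
    my $changed = 0;

    for my $i ( 0 .. length( $_[0] ) - 1 ) {
        my $char = substr( $_[0], $i, 1 );
        my $code = ord $char;
        next if $code > 127;              # non-ASCII is left untouched

        my $entry = $map->[$code];
        next unless defined $entry;       # undef: keep the character

        croak "replace_inplace_model: entry must be a single character"
            if ref($entry) || length($entry) != 1;

        next if $entry eq $char;          # identity entry: nothing changed
        substr( $_[0], $i, 1, $entry );   # overwrite in the caller's scalar
        $changed++;
    }
    return $changed;
}

my $map = [ map { chr($_) } 0 .. 255 ];   # assumed identity map
$map->[ ord 'a' ] = 'A';
my $str = "abcabc";
my $n   = replace_inplace_model( $str, $map );
print "$str $n\n";                        # prints "AbcAbc 2"
```

Because every replacement has the same length as the character it replaces, the string never needs to grow or shrink, which is what makes the in-place approach possible.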
=head2 $string = trim( $string )
C<trim> removes leading and trailing whitespace from a string.
The characters C<' '>, C<'\r'>, C<'\n'>, C<'\t'> and C<'\f'> are treated as whitespace.
A new string is returned.
The removal is performed in XS and only needs to look at the beginning and end of the string.
The UTF-8 state of a string is preserved.
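The idea of looking only at the two ends of the string can be sketched in pure Perl with two indices that move inward. C<trim_model> is a hypothetical name and this is not the XS code, only an illustration of the scan.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model of trim( $string ).
sub trim_model {
    my ($string) = @_;
    my %is_space = map { $_ => 1 } ( ' ', "\r", "\n", "\t", "\f" );

    my $start = 0;
    my $end   = length($string) - 1;

    # Advance past leading whitespace, then back up over trailing whitespace.
    $start++ while $start <= $end && $is_space{ substr( $string, $start, 1 ) };
    $end--   while $end >= $start && $is_space{ substr( $string, $end, 1 ) };

    # The characters in between are copied untouched into a new string.
    return substr( $string, $start, $end - $start + 1 );
}

print trim_model("\t hello world \r\n"), "\n";    # prints "hello world"
```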
=head2 $count = trim_inplace( $string )
Modifies C<$string> in place, removing leading and trailing whitespace.
Returns the total number of whitespace bytes removed.
Unlike C<trim()>, this function does B<not> allocate a new string: it
modifies the existing SV directly. Uses C<sv_chop()> internally for
efficient leading-whitespace removal.
The same whitespace characters as C<trim()> are recognized:
C<' '>, C<'\r'>, C<'\n'>, C<'\t'>, C<'\f'>.
my $str = "  hello world  ";
my $n = Char::Replace::trim_inplace( $str );
# $str is now "hello world", $n is 4
The UTF-8 state of the string is preserved.
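For comparison, the same contract (modify the caller's variable, return how many whitespace bytes were removed) can be approximated with two substitutions in pure Perl. C<trim_inplace_model> is a hypothetical name; it works on a Perl-level alias and does not reproduce the C<sv_chop()> optimization.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model of trim_inplace( $string ).
# $_[0] is an alias of the caller's variable, so it is modified directly.
sub trim_inplace_model {
    my $before = length $_[0];
    $_[0] =~ s/\A[ \r\n\t\f]+//;    # leading whitespace
    $_[0] =~ s/[ \r\n\t\f]+\z//;    # trailing whitespace
    # Each recognized whitespace character is one byte, so the difference
    # in length equals the number of whitespace bytes removed.
    return $before - length $_[0];
}

my $str = "  hello world  ";
my $n   = trim_inplace_model($str);
print "[$str] $n\n";                # prints "[hello world] 4"
```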
=head1 Benchmarks
=head2 char_replace
#!perl
use strict;
use warnings;
use Test2::Bundle::Extended;
use Test2::Tools::Explain;
use Char::Replace;
our ( $STR, @MAP );
{
note "benchmark";
use Benchmark;