Char-Replace


lib/Char/Replace.pm

     $MAP[ ord('d') ] = '5';     # replace all 'd' characters by '5'
 }
 
 # we can now use our map to replace the string
 
 is Char::Replace::replace( q[abcd], \@MAP ), q[AAbc5], "a -> AA ; d -> 5";
 
 { # trim XS helper
     # remove spaces at the beginning and end of a string - XS helper
 
     is Char::Replace::trim( qq[ Some spaces in this string.\n\r\n] ), q[Some spaces in this string.];    
 }
 
 { # trim_inplace: modify string in place (zero allocation)
     my $str = qq[  Some spaces  \n];
     my $removed = Char::Replace::trim_inplace( $str );
     is $str, q[Some spaces], "trim_inplace modifies in place";
     is $removed, 5, "5 whitespace bytes removed";
 }
 
 done_testing;

=head1 DESCRIPTION

Char::Replace

XS helpers to perform some basic character replacement on strings.

=over

=item replace: replace (transliterate) one or more ASCII characters

=item replace_inplace: fast in-place 1:1 character replacement (no allocation)

=item trim: remove leading and trailing whitespace from a string

=item trim_inplace: in-place whitespace trimming (no allocation)

=back

=head1 Available functions

=head2 $output = replace( $string, $MAP )

Returns a new string C<$output> built by applying the replacement map C<$MAP>
(an array ref indexed by character code) to C<$string>.
Map entries can be:

=over

=item a string (PV) — replaces the character with that string

=item an empty string — deletes the character from the output

=item an integer (IV) — replaces the character with C<chr(value)> (0–255)

=item undef — keeps the original character unchanged

=item a code ref — called with the character as argument; return value is the replacement
(return undef to keep original, empty string to delete)

B<Note:> Code ref callbacks are significantly slower than static replacements due to
function call overhead. Avoid heavy computation inside callbacks. Callbacks receive
a single-character string argument; for tainted input, the output string inherits the
taint flag from the input (not from the callback return value).

=back

See L</SYNOPSIS> or the examples below.

Setting a map entry to an empty string deletes the character from the output:

    $map->[ ord('x') ] = q[];    # delete 'x'
    Char::Replace::replace( "fox", $map ) eq "fo" or die;

Setting a map entry to an integer replaces the character with chr(value):

    $map->[ ord('a') ] = ord('A');  # replace 'a' with 'A'
    Char::Replace::replace( "abc", $map ) eq "Abc" or die;

Setting a map entry to a code ref enables dynamic replacement:

    $map->[ ord('a') ] = sub { uc $_[0] };  # uppercase callback
    Char::Replace::replace( "abc", $map ) eq "Abc" or die;

    # stateful callback
    my $n = 0;
    $map->[ ord('x') ] = sub { ++$n };
    Char::Replace::replace( "xyx", $map ) eq "1y2" or die;
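
The two special callback return values listed above (C<undef> keeps the
character, an empty string deletes it) can be sketched as follows. This uses only
the functions documented in this section; the result noted in the comment is
derived from those rules, not from a recorded run:

    use Char::Replace;

    my $map = Char::Replace::identity_map();
    $map->[ ord('a') ] = sub { return undef };    # undef: keep 'a' unchanged
    $map->[ ord('b') ] = sub { return q[] };      # empty string: delete 'b'

    # by the rules above, 'a' is kept and 'b' is dropped, giving "ac"
    my $out = Char::Replace::replace( "abc", $map );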

=head2 $map = identity_map()

Convenience helper that initializes an ASCII mapping.
It returns an array ref in which every character maps to itself by default.

You can then override one or more entries.

    my $map = Char::Replace::identity_map();
    $map->[ ord('a') ] = q[XYZ]; # replace 'a' by 'XYZ'

    # replaces all 'a' by 'XYZ'
    Char::Replace::replace( "abcdabcd", $map ) eq "XYZbcdXYZbcd" or die;

=head2 $map = build_map( char => replacement, ... )

Convenience constructor: takes a hash of single-character keys and their
replacement values, and returns an array ref suitable for C<replace()> or
C<replace_inplace()>. Starts from an identity map, so unmapped characters
pass through unchanged.

    my $map = Char::Replace::build_map(
        'a' => 'AA',
        'd' => '',       # delete
        'x' => ord('X'), # IV
        'z' => sub { uc $_[0] },  # callback
    );
    Char::Replace::replace( "abxd", $map ) eq "AAbX" or die;

Croaks if any key is not exactly one character.

=head2 $count = replace_inplace( $string, $MAP )

Modifies C<$string> in place, applying 1:1 byte replacements from C<$MAP>.
Returns the number of bytes actually changed.

Unlike C<replace()>, this function does B<not> allocate a new string — it
modifies the existing SV buffer directly. This makes it significantly faster
(up to 3.5x for long strings) but restricts map entries to single-character
replacements only:

=over

=item a single-character string (PV of length 1)

=item an integer (IV) in range 0–255

=item undef — keeps the original character unchanged

=back

Multi-character strings, empty strings (deletion), and code refs will cause a croak.
Use C<replace()> when you need expansion, deletion, or dynamic callbacks.

    my $map = Char::Replace::identity_map();
    $map->[ ord('a') ] = 'A';

    my $str = "abcabc";
    my $n = Char::Replace::replace_inplace( $str, $map );
    # $str is now "AbcAbc", $n is 2
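
Because unsupported entries make C<replace_inplace()> croak, code that receives
a map of unknown shape may want a fallback. The following is a minimal sketch,
not part of the module: C<apply_map> is a hypothetical helper name, and it
attempts the in-place call on a copy on the assumption that a croak could
otherwise leave the caller's string partially modified.

    use Char::Replace;

    # hypothetical helper, not provided by Char::Replace
    sub apply_map {
        my ( $string, $map ) = @_;

        # try the fast 1:1 path on a copy; eval traps the croak
        my $copy = $string;
        my $ok   = eval { Char::Replace::replace_inplace( $copy, $map ); 1 };

        # fall back to replace() when the map holds entries that
        # replace_inplace() rejects (expansion, deletion, code refs)
        return $ok ? $copy : Char::Replace::replace( $string, $map );
    }

The copy costs one allocation, so this trades the in-place benefit for safety;
when the map is known to be strictly 1:1, calling C<replace_inplace()> directly
is the better choice.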

UTF-8 safety applies: multi-byte sequences are skipped, only ASCII bytes
are eligible for replacement.

=head2 $string = trim( $string )

C<trim> removes leading and trailing whitespace from a string.
The characters C<' '>, C<'\r'>, C<'\n'>, C<'\t'> and C<'\f'> are treated as whitespace.
A new string is returned.

The removal is performed in XS and only inspects the beginning and end of the string.

The UTF-8 state of a string is preserved.
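
For comparison, the behaviour described above can be approximated in pure Perl.
This is only a sketch of the documented semantics, not the XS implementation;
C<trim_pp> is a hypothetical name used for illustration.

    use strict;
    use warnings;

    # hypothetical pure-Perl counterpart of trim()
    sub trim_pp {
        my ($string) = @_;

        # strip the five documented whitespace characters from each end
        $string =~ s/\A[ \r\n\t\f]+//;
        $string =~ s/[ \r\n\t\f]+\z//;

        return $string;    # the caller's string is left untouched
    }

    my $clean = trim_pp( qq[ Some spaces in this string.\n\r\n] );
    # $clean is "Some spaces in this string."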

=head2 $count = trim_inplace( $string )

Modifies C<$string> in place, removing leading and trailing whitespace.
Returns the total number of whitespace bytes removed.

Unlike C<trim()>, this function does B<not> allocate a new string — it
modifies the existing SV directly. Uses C<sv_chop()> internally for
efficient leading-whitespace removal.

The same whitespace characters as C<trim()> are recognized:
C<' '>, C<'\r'>, C<'\n'>, C<'\t'>, C<'\f'>.

    my $str = "  hello world  ";
    my $n = Char::Replace::trim_inplace( $str );
    # $str is now "hello world", $n is 4

The UTF-8 state of the string is preserved.
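
The return value can be modelled in pure Perl as the difference in length before
and after trimming. This sketch only mirrors the documented semantics and does
not reflect the XS code; C<trim_inplace_pp> is a hypothetical name. All
recognized whitespace characters are single-byte ASCII, so counting characters
and counting bytes give the same result here.

    use strict;
    use warnings;

    # hypothetical pure-Perl counterpart of trim_inplace()
    sub trim_inplace_pp {
        # $_[0] is an alias to the caller's variable, so it is edited in place
        my $before = length $_[0];

        $_[0] =~ s/\A[ \r\n\t\f]+//;
        $_[0] =~ s/[ \r\n\t\f]+\z//;

        return $before - length $_[0];
    }

    my $str     = qq[  Some spaces  \n];
    my $removed = trim_inplace_pp($str);
    # $str is now "Some spaces", $removed is 5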

=head1 Benchmarks

=head2 char_replace

 #!perl
 
 use strict;
 use warnings;
 
 use Test2::Bundle::Extended;
 use Test2::Tools::Explain;
 
 use Char::Replace;
 
 our ( $STR, @MAP );
 
 {
     note "benchmark";
     use Benchmark;
 


