Encode
Back then, the range 0xD800-0xDFFF was not allocated. Let's split
that range in half and use the first half to represent the C<upper
half of a character> and the second half to represent the C<lower
half of a character>. That way, you can represent 1024 * 1024 =
1048576 more characters. Now we can store characters up to
\x{10ffff} even with 16-bit encodings. This pair of half-characters is
now called a I<surrogate pair>, and UTF-16 is the name of the encoding
that embraces them.
Here is a formula to ensurrogate a Unicode character \x{10000} and
above:
$hi = int(($uni - 0x10000) / 0x400) + 0xD800;
$lo = ($uni - 0x10000) % 0x400 + 0xDC00;
And to desurrogate:
$uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
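The two formulas can be tried out with a short round-trip script. This is only a sketch: the helper names C<ensurrogate> and C<desurrogate> are hypothetical and not part of any module's API, and the division is wrapped in C<int()> because Perl's C</> operator yields a floating-point result. The last step cross-checks the result against the code units that Encode's C<encode("UTF-16BE", ...)> emits.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: split a code point above the BMP into a surrogate pair.
sub ensurrogate {
    my ($uni) = @_;
    die sprintf("U+%04X is outside 0x10000..0x10FFFF\n", $uni)
        if $uni < 0x10000 || $uni > 0x10FFFF;
    my $hi = int(($uni - 0x10000) / 0x400) + 0xD800;   # upper 10 bits
    my $lo = ($uni - 0x10000) % 0x400 + 0xDC00;        # lower 10 bits
    return ($hi, $lo);
}

# Hypothetical helper: recombine a surrogate pair into one code point.
sub desurrogate {
    my ($hi, $lo) = @_;
    return 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
}

my $uni = 0x1F600;
my ($hi, $lo) = ensurrogate($uni);
printf "U+%X -> %04X %04X\n", $uni, $hi, $lo;          # U+1F600 -> D83D DE00
printf "round trip: U+%X\n", desurrogate($hi, $lo);    # round trip: U+1F600

# Cross-check against the 16-bit units Encode produces for the same character.
my @units = unpack "n*", encode("UTF-16BE", chr($uni));
printf "Encode:  %04X %04X\n", @units;                 # Encode:  D83D DE00
```

Bit shifts (C<<< >> 10 >>> and C<& 0x3FF>) would work equally well here; the arithmetic form is used only to mirror the formulas above.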
Note this move has made \x{D800}-\x{DFFF} into a forbidden zone but