Crypt-Bear
view release on metacpan or search on metacpan
src/x509/asn1.t0 view on Meta::CPAN
\ Read one byte, that should be a trailing UTF-8 byte, and complement the
\ current value. On error, value is set to -1.
: read-UTF8-chunk ( lim val -- lim val )
swap
\ If we are at the end of the value, report an error but don't fail.
dup ifnot 2drop 0 -1 ret then
read8 rot
dup 0< if swap drop ret then 6 <<
swap dup 6 >> 2 <> if 2drop -1 ret then
0x3F and + ;
: high-surrogate? ( x -- x bool )
dup 0xD800 0xDBFF between? ;
: low-surrogate? ( x -- x bool )
dup 0xDC00 0xDFFF between? ;
: assemble-surrogate-pair ( hi lim lo -- lim val )
low-surrogate? ifnot rot 2drop 0 ret then
rot 10 << + 0x35FDC00 - ;
\ Read a UTF-16 code point (big-endian). Returned value is 0 on error.
: read-UTF16BE ( lim -- lim val )
read16be
choice
high-surrogate? uf
swap dup ifnot 2drop 0 0 ret then
read16be assemble-surrogate-pair
enduf
low-surrogate? uf
drop 0
enduf
endchoice ;
\ Read a UTF-16 code point (little-endian). Returned value is 0 on error.
: read-UTF16LE ( lim -- lim val )
read16le
choice
high-surrogate? uf
swap dup ifnot 2drop 0 0 ret then
read16le assemble-surrogate-pair
enduf
low-surrogate? uf
drop 0
enduf
endchoice ;
\ Add byte to current pad value. Offset is updated, or set to 0 on error.
: pad-append ( off val -- off )
over dup 0= swap 256 >= or if 2drop 0 ret then
over addr-pad + set8 1+ ;
\ Add UTF-8 chunk byte to the pad. The 'nn' parameter is the shift count.
: pad-append-UTF8-chunk ( off val nn -- off )
>> 0x3F and 0x80 or pad-append ;
\ Test whether a code point is invalid when encoding. This rejects the
\ 66 noncharacters, and also the surrogate range; this function does NOT
\ check that the value is in the 0..10FFFF range.
: valid-unicode? ( val -- bool )
dup 0xFDD0 0xFDEF between? if drop 0 ret then
dup 0xD800 0xDFFF between? if drop 0 ret then
0xFFFF and 0xFFFE < ;
\ Encode a code point in UTF-8. Offset is in the pad; it is updated, or
\ set to 0 on error. Leading BOM are ignored.
: encode-UTF8 ( val off -- off )
\ Skip leading BOM (U+FEFF when off is 1).
dup2 1 = swap 0xFEFF = and if swap drop ret then
swap dup { val }
dup valid-unicode? ifnot 2drop 0 ret then
choice
dup 0x80 < uf pad-append enduf
dup 0x800 < uf
6 >> 0xC0 or pad-append
val 0 pad-append-UTF8-chunk
enduf
dup 0xFFFF < uf
12 >> 0xE0 or pad-append
val 6 pad-append-UTF8-chunk
val 0 pad-append-UTF8-chunk
enduf
18 >> 0xF0 or pad-append
val 12 pad-append-UTF8-chunk
val 6 pad-append-UTF8-chunk
val 0 pad-append-UTF8-chunk
endchoice ;
\ Read a string value into the pad; this function checks that the source
\ characters are UTF-8 and non-zero. The string length (in bytes) is
\ written in the first pad byte. Returned value is true (-1) on success,
\ false (0) on error.
: read-value-UTF8 ( lim -- lim bool )
read-length-open-elt
1 { off }
begin dup while
read-UTF8 dup ifnot drop skip-close-elt 0 ret then
off encode-UTF8 >off
repeat
drop off dup ifnot ret then 1- addr-pad set8 -1 ;
\ Decode a UTF-16 string into the pad. The string is converted to UTF-8,
\ and the length is written in the first pad byte. A leading BOM is
\ honoured (big-endian is assumed if there is no BOM). A code point of
\ value 0 is an error. Returned value is true (-1) on success, false (0)
\ on error.
: read-value-UTF16 ( lim -- lim bool )
read-length-open-elt
dup ifnot addr-pad set8 -1 ret then
1 { off }
read-UTF16BE dup 0xFFFE = if
\ Leading BOM, and indicates little-endian.
drop
begin dup while
read-UTF16LE dup ifnot drop skip-close-elt 0 ret then
off encode-UTF8 >off
repeat
else
dup ifnot drop skip-close-elt 0 ret then
\ Big-endian BOM, or no BOM.
begin
off encode-UTF8 >off
dup while
read-UTF16BE dup ifnot drop skip-close-elt 0 ret then
repeat
then
drop off dup ifnot ret then 1- addr-pad set8 -1 ;
\ Decode a latin-1 string into the pad. The string is converted to UTF-8,
\ and the length is written in the first pad byte. A source byte of
\ value 0 is an error. Returned value is true (-1) on success, false (0)
( run in 0.606 second using v1.01-cache-2.11-cpan-59e3e3084b8 )