BOM results from the CPAN

Crypt-Bear

view release on metacpan or search on metacpan


\ Test whether a code point is invalid when encoding. This rejects the
\ 66 noncharacters, and also the surrogate range; this function does NOT
\ check that the value is in the 0..10FFFF range.
: valid-unicode? ( val -- bool )
	dup 0xFDD0 0xFDEF between? if drop 0 ret then
	dup 0xD800 0xDFFF between? if drop 0 ret then
	0xFFFF and 0xFFFE < ;

\ Encode a code point in UTF-8. Offset is in the pad; it is updated, or
\ set to 0 on error. Leading BOM are ignored.
: encode-UTF8 ( val off -- off )
	\ Skip leading BOM (U+FEFF when off is 1).
	dup2 1 = swap 0xFEFF = and if swap drop ret then

	swap dup { val }
	dup valid-unicode? ifnot 2drop 0 ret then
	choice
		dup 0x80 < uf pad-append enduf
		dup 0x800 < uf
			6 >> 0xC0 or pad-append
			val 0 pad-append-UTF8-chunk
		enduf

src/x509/asn1.t0 view on Meta::CPAN

: read-value-UTF8 ( lim -- lim bool )
	read-length-open-elt
	1 { off }
	begin dup while
		read-UTF8 dup ifnot drop skip-close-elt 0 ret then
		off encode-UTF8 >off
	repeat
	drop off dup ifnot ret then 1- addr-pad set8 -1 ;

\ Decode a UTF-16 string into the pad. The string is converted to UTF-8,
\ and the length is written in the first pad byte. A leading BOM is
\ honoured (big-endian is assumed if there is no BOM). A code point of
\ value 0 is an error. Returned value is true (-1) on success, false (0)
\ on error.
: read-value-UTF16 ( lim -- lim bool )
	read-length-open-elt
	dup ifnot addr-pad set8 -1 ret then
	1 { off }
	read-UTF16BE dup 0xFFFE = if
		\ Leading BOM, and indicates little-endian.
		drop
		begin dup while
			read-UTF16LE dup ifnot drop skip-close-elt 0 ret then
			off encode-UTF8 >off
		repeat
	else
		dup ifnot drop skip-close-elt 0 ret then
		\ Big-endian BOM, or no BOM.
		begin
			off encode-UTF8 >off
			dup while
			read-UTF16BE dup ifnot drop skip-close-elt 0 ret then
		repeat
	then
	drop off dup ifnot ret then 1- addr-pad set8 -1 ;

\ Decode a latin-1 string into the pad. The string is converted to UTF-8,
\ and the length is written in the first pad byte. A source byte of

( run in 0.356 second using v1.01-cache-2.11-cpan-e9daa2b36ef )