Perl6-Pugs
docs/notes/unicode_draft
Just as XML always knows its xml:encoding and xml:lang,
Perl string literals and documentation should always know their
encoding/lang information, so that tools such as pod2html can present
them correctly (pod2html would depend on lang to render CJK fonts correctly).
The "lang" should never be inferred from the encoding. That makes no
sense, because language usage shifts over time: people now routinely
write Traditional Chinese in GBK.
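To illustrate the point, here is a minimal Python sketch (not Perl 6, and not part of the draft) in which encoding and language travel as two independent, explicitly declared fields. The class name TaggedBytes and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedBytes:
    """Raw bytes plus independently declared encoding and language.

    Hypothetical type for illustration; nothing here comes from the draft.
    """
    data: bytes
    encoding: str  # how to decode the bytes
    lang: str      # language tag: declared by the author, never guessed

    def text(self) -> str:
        return self.data.decode(self.encoding)


# Traditional Chinese text stored in GBK, an encoding usually
# associated with Simplified Chinese.
sample = TaggedBytes("繁體中文".encode("gbk"), encoding="gbk", lang="zh-Hant")
```

Guessing "Simplified Chinese" from the encoding field alone would be wrong for this sample, which is why lang is stored separately.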
2. BOM sniffing of .pl files exists, but the set it currently knows is
(UTF16[LB]E, UTF8+BOM, ASCII (really latin* as the default)).
It should be:
(UTF32[LB]E, UTF16[LB]E, UTF8 (default))
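The proposed detection order can be sketched in Python as follows. This is an illustration under stated assumptions, not how Pugs or perl5 implements sniffing; the function name sniff_encoding is hypothetical, and only the BOM constants come from Python's standard codecs module.

```python
import codecs
from typing import Tuple

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the UTF-32 marks are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF8, "utf-8"),
]


def sniff_encoding(head: bytes) -> Tuple[str, int]:
    """Return (encoding, bom_length) for the first bytes of a file.

    Falls back to UTF-8 with no BOM, matching the proposed default.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return "utf-8", 0
```

One known ambiguity of this approach: a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE.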
3. Per-handle stackable IO layers make sense.
But they should allow introspection into the different kinds of layer:
# storage layers (:mmap)
# textual transformation layers (:encoding, :crlf)
# format (MIME Header, HTTP Header, XML)
# semantic (:language(ja))
$fh.layers.pop;
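A rough Python model of an introspectable layer stack, assuming each layer carries a kind tag in addition to its name. The Layer and Handle classes, the of_kind method, and the ":encoding(utf-8)" spelling are all hypothetical; only the four layer kinds and the pop operation come from the notes above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    kind: str  # "storage", "text", "format", or "semantic"
    name: str  # e.g. ":mmap", ":crlf"


@dataclass
class Handle:
    layers: List[Layer] = field(default_factory=list)

    def push(self, kind: str, name: str) -> None:
        self.layers.append(Layer(kind, name))

    def of_kind(self, kind: str) -> List[Layer]:
        """Introspect the stack: return only layers of one kind, in order."""
        return [layer for layer in self.layers if layer.kind == kind]


fh = Handle()
fh.push("storage", ":mmap")
fh.push("text", ":encoding(utf-8)")
fh.push("text", ":crlf")
fh.push("semantic", ":language(ja)")

top = fh.layers.pop()  # mirrors $fh.layers.pop: removes the topmost layer
```

Keeping the kind on each layer lets a caller ask, for example, for just the textual transformation layers without knowing what else is stacked on the handle.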
$string.substr(0, 1, :bytes)
$string.index(5, :bytes)
The 0 and 1 should be of the "position" type. If you write them
out as literals, they respond to the lexical setting of the char unit:
.bytes # pretend strings are buffers
.codepoints # same as perl5 - not terribly useful
# - basically unsigned integers with 21 bits
.characters # this should be the default:
# - COMBINING MARKS
# - BOM (and other zero-width assertions)
.graphemes # visual rendering - includes metadata like
# - LANGUAGE TAG blocks
# - VARIATION SELECTOR
# - LTR/RTL SELECTOR
# - Act as pre-decomposed forms (for canonical decomposition)
A lexical pragma determines what 0 and 1 mean, but you can also
construct them explicitly with "pos(0, :byte)" or "character(1)"
(XXX the syntax needs work)
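The effect of the unit on positions can be sketched in Python. This is one possible reading of the levels above, not the draft's definition: "characters" is approximated here by attaching combining marks to the preceding base codepoint, bytes are assumed to be UTF-8, and the full grapheme level is not modelled. The function names clusters, length and substr are hypothetical.

```python
import unicodedata
from typing import List, Union


def clusters(s: str) -> List[str]:
    """Group each base codepoint with the combining marks that follow it."""
    out: List[str] = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def length(s: str, unit: str = "characters") -> int:
    if unit == "bytes":
        return len(s.encode("utf-8"))  # assumption: UTF-8 storage
    if unit == "codepoints":
        return len(s)
    if unit == "characters":
        return len(clusters(s))
    raise ValueError("unknown unit: " + unit)


def substr(s: str, start: int, count: int,
           unit: str = "characters") -> Union[str, bytes]:
    """Take count units starting at start, interpreted in the given unit."""
    if unit == "bytes":
        return s.encode("utf-8")[start:start + count]
    if unit == "codepoints":
        return s[start:start + count]
    if unit == "characters":
        return "".join(clusters(s)[start:start + count])
    raise ValueError("unknown unit: " + unit)


s = "e\u0301a"  # "e" + COMBINING ACUTE ACCENT + "a"
```

For this string the same position pair (0, 1) selects one byte, one codepoint, or the whole accented letter, depending on the unit in force.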