
Perl6-Pugs


docs/notes/unicode_draft


   Just as XML always knows about its xml:encoding and xml:lang,
   Perl string literals and documentation should always know their
   encoding/lang information for correct presentation by e.g. pod2html
   (which would depend on lang to render CJK fonts correctly).

   The "lang" should never be inferred from the encoding -- that makes
   no sense because language usage shifts over time: people are writing
   Traditional Chinese in GBK all the time now.

2. BOM sniffing of .pl files exists, but the set it currently knows is:
    (UTF16[LB]E, UTF8+BOM, ASCII (really latin* as the default))
   It should be:
    (UTF32[LB]E, UTF16[LB]E, UTF8 (default))
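
   The proposed sniffing order can be sketched as follows. This is a
   minimal illustration in Python, not the Pugs implementation; the
   names sniff_encoding and decode_source are hypothetical. It assumes
   that the longest BOMs are tested first, because the UTF-32LE BOM
   (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```python
import codecs
from typing import Tuple

# Longest BOMs first: the UTF-32LE BOM starts with the UTF-16LE BOM,
# so checking UTF-16 before UTF-32 would misdetect UTF-32LE input.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes) -> Tuple[str, int]:
    """Return (encoding, BOM length); BOM-less input defaults to UTF-8."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0


def decode_source(data: bytes) -> str:
    """Strip the BOM, if any, and decode with the sniffed encoding."""
    encoding, skip = sniff_encoding(data)
    return data[skip:].decode(encoding)
```

   One caveat of this sketch: UTF-16LE text whose first character is
   U+0000 is indistinguishable from a UTF-32LE BOM, so such input
   would be reported as UTF-32LE.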

3. Per-handle stackable IO layers make sense.
   But they should allow introspection into the different layer-chunks:
        # storage layers (:mmap)
        # textual transformation layers (:encoding, :crlf)
        # format (MIME Header, HTTP Header, XML)
        # semantic (:language(ja))
        $fh.layers.pop;
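
   One way to picture layer introspection is a stack whose entries are
   tagged with the kind of chunk they belong to. The sketch below is a
   hypothetical model in Python; Layer, LayerStack, of_kind and the
   four kind names are invented for illustration and are not an
   existing Perl or Pugs API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical kind names, one per comment line in the note above.
KINDS = ("storage", "transform", "format", "semantic")


@dataclass
class Layer:
    name: str  # e.g. "mmap", "encoding(utf-8)", "crlf", "language(ja)"
    kind: str  # one of KINDS


class LayerStack:
    """A per-handle stack of layers that can be queried by kind."""

    def __init__(self, layers: List[Layer]) -> None:
        for layer in layers:
            if layer.kind not in KINDS:
                raise ValueError(f"unknown layer kind: {layer.kind}")
        self._layers = list(layers)

    def names(self) -> List[str]:
        """All layer names, bottom of the stack first."""
        return [layer.name for layer in self._layers]

    def of_kind(self, kind: str) -> List[str]:
        """Names of the layers belonging to one chunk, bottom first."""
        return [layer.name for layer in self._layers if layer.kind == kind]

    def pop(self) -> Layer:
        """Remove and return the topmost layer."""
        return self._layers.pop()
```

   With such a model, popping the top layer removes only the most
   recently pushed one, while of_kind lets a caller inspect, say, the
   textual transformation layers without touching the rest.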


    $string.substr(0, 1, :bytes)
    $string.index(5, :bytes)

    The 0 and 1 should be of the "position" type. If you write them
    out as literals, they respond to the lexical setting of the char unit:
        .bytes      # pretend strings are buffers
        .codepoints # same as perl5 - not terribly useful
                    #  - basically unsigned integers with 21 bits
        .characters # this should be the default:
                    #  - COMBINING MARKS
                    #  - BOM (and other zero-width assertions)
        .graphemes  # visual rendering - includes metadata like
                    #  - LANGUAGE TAG blocks
                    #  - VARIATION SELECTOR
                    #  - LTR/RTL SELECTOR
                    #  - Act as pre-decomposed forms (for canonical decomposition)

    A lexical pragma determines what 0 and 1 mean, but you can also
    construct them explicitly with "pos(0, :byte)" or "character(1)"
    (XXX the syntax needs work)
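
    To make the difference between units concrete, here is a small
    Python sketch that measures and slices the same string in three of
    the units above. The function names are hypothetical, and the
    "characters" level is only approximated: a character is taken to
    be a base code point plus any following combining marks, while
    BOMs and other zero-width code points are not handled. The
    grapheme level is omitted because it needs more than the standard
    library offers.

```python
import unicodedata


def count_units(text: str) -> dict:
    """Length of text in bytes (UTF-8), code points, and characters."""
    # Approximation: every code point that is not a combining mark
    # starts a new character.
    characters = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return {
        "bytes": len(text.encode("utf-8")),
        "codepoints": len(text),
        "characters": characters,
    }


def first_unit(text: str, unit: str):
    """Rough analogue of substr(0, 1) under the given unit."""
    if unit == "bytes":
        return text.encode("utf-8")[:1]
    if unit == "codepoints":
        return text[:1]
    if unit == "characters":
        # Take the first code point plus any combining marks after it.
        end = 1
        while end < len(text) and unicodedata.combining(text[end]):
            end += 1
        return text[:end]
    raise ValueError(f"unknown unit: {unit}")
```

    For the string "e" + U+0301 (COMBINING ACUTE ACCENT) + "a", the
    three units give lengths of 4, 3 and 2 respectively, and only the
    character unit keeps the accent attached to its base letter when
    taking the first position.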
