Locale-Unicode

                Map base + punctuation, etc. to accented characters

        *       "ascii"

                Map from ASCII to the target, perhaps using different
                conventions

        *       "hex"

                Map characters from hex equivalents, trying all variants, eg
                "U+0061" to "a"; for hex variants see transform.xml
                <https://github.com/unicode-org/cldr/blob/maint/maint-41/com
                mon/bcp47/transform.xml>

        *       "morse"

                Map Morse Code to Unicode encoding

        *       "npinyin"

                Map the numeric form of pinyin to the tone format

        *       "publish"

                Map publishing characters, such as ", ", "—", to or from
                vanilla characters

        *       "zawgyi"

                Map Zawgyi Myanmar encoding to Unicode

    *   "t0"

        Machine Translation: used to indicate content that has been machine
        translated, or a request for a particular type of machine
        translation of content. The first subfield in a sequence would
        typically be a "platform" or vendor designation.

        For example: "ja-t-de-t0-und"

    *   "x0"

        Private Use.

        For example: "ja-t-de-t0-und-x0-medical"
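
    As a rough illustration of how the "t0" and "x0" examples above break
    down, here is a minimal Python sketch that splits a tag at the "t"
    singleton into target, source language, and key/value fields. The
    function name parse_transform is hypothetical, and the code neither
    uses this module's API nor validates subtags. It only assumes that
    field keys are one letter followed by one digit, as "m0", "t0" and
    "x0" are.

```python
import re

# Assumption: a field key in the -t- extension is a letter plus a digit.
TKEY = re.compile(r"^[a-z][0-9]$")


def parse_transform(tag):
    """Split a tag such as 'ja-t-de-t0-und' into its parts (sketch only)."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return {"target": tag.lower(), "source": None, "fields": {}}
    t_index = subtags.index("t")
    source_parts = []
    fields = {}
    current_key = None
    for subtag in subtags[t_index + 1:]:
        if len(subtag) == 1:
            break  # another singleton (for example "u") ends the extension
        if TKEY.match(subtag):
            current_key = subtag
            fields[current_key] = []
        elif current_key is None:
            source_parts.append(subtag)  # still inside the source language
        else:
            fields[current_key].append(subtag)
    return {
        "target": "-".join(subtags[:t_index]),
        "source": "-".join(source_parts) or None,
        "fields": {key: "-".join(value) for key, value in fields.items()},
    }


print(parse_transform("ja-t-de-t0-und-x0-medical"))
```

    For "ja-t-de-t0-und-x0-medical" this yields the target "ja", the
    source "de", and the fields "t0" = "und" and "x0" = "medical".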

  Collation Options
    Parametric settings
    <https://unicode.org/reports/tr35/tr35-collation.html#Setting_Options>
    can be specified in language tags or in rule syntax (in the form
    [keyword value]). For example, -ks-level2 or [strength 2] compares
    strings based only on their primary and secondary weights.

    The option descriptions below are taken from the LDML standard and
    reflect how the algorithm works when implemented by a web browser or
    other runtime environment. This module does not implement any of
    those algorithms. The documentation is here only for your
    convenience.

    See the standard documentation
    <https://unicode.org/reports/tr35/tr35-collation.html> and the DUCET
    (Default Unicode Collation Element Table)
    <https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element
    _Table> for more information.
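
    To make the tag form concrete, the following Python sketch pulls the
    collation keys out of the "u" extension of a tag and reports them
    under the long names listed in this section. It is an illustration
    only: collation_options and LONG_NAMES are hypothetical names, the
    code does not use this module's API, and it relies on the LDML
    convention that a key given without a value means "true".

```python
# Short keys and the long names used in this section.
LONG_NAMES = {
    "ka": "colAlternate",
    "kb": "colBackwards",
    "kc": "colCaseLevel",
    "kf": "colCaseFirst",
    "kh": "colHiraganaQuaternary",
    "kk": "colNormalization",
    "kn": "colNumeric",
    "kr": "colReorder",
    "ks": "colStrength",
}


def collation_options(tag):
    """Return the key/value pairs found after the 'u' singleton (sketch)."""
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}
    options = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break  # another singleton ends the extension
        if len(subtag) == 2:
            key = subtag  # two-character subtags are keys
            options[key] = []
        elif key is not None:
            options[key].append(subtag)  # longer subtags are values
    return {
        LONG_NAMES.get(name, name): "-".join(values) or "true"
        for name, values in options.items()
    }


print(collation_options("en-u-ks-level2"))
```

    Keys without a long name in the table, such as "kv" and "vt", are
    returned under their short name.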

    *   "ka" or "colAlternate"

        Sets alternate handling for variable weights.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L34> are optional and can
        be:

        *       "noignore" or "non-ignorable"

                Default value.

        *       "shifted"

    *   "kb" or "colBackwards"

        Sets collation parameter key for backward collation weight.

        If on, secondary (accent) differences are compared backwards,
        starting from the end of the string.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L39> are optional and can
        be: "true" or "yes", "false" (default) or "no"

    *   "kc" or "colCaseLevel"

        Sets collation parameter key for case level.

        Specifies a boolean. If "on", a level consisting only of case
        characteristics is inserted in front of the tertiary level, as a
        "Level 2.5". To ignore accents but take case into account, set
        strength to "primary" and case level to "on".

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L44> are optional and can
        be: "true" or "yes", "false" (default) or "no"

    *   "kf" or "colCaseFirst"

        Sets collation parameter key for ordering by case.

        If set to upper, causes upper case to sort before lower case. If set
        to lower, causes lower case to sort before upper case.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L49> are: "upper",
        "lower", "false" (default) or "no"

    *   "kh" or "colHiraganaQuaternary"

        Sets collation parameter key for special Hiragana handling.

        [...]

        shifted.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L55> are optional and can
        be: "true" (default) or "yes", "false" or "no"

    *   "kk" or "colNormalization"

        Sets collation parameter key for normalisation.

        Specifies a boolean. If on, then the normal UCA
        <https://www.unicode.org/reports/tr41/#UTS10> algorithm is used.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L60> are optional and can
        be: "true" (default) or "yes", "false" or "no"

    *   "kn" or "colNumeric"

        Sets collation parameter key for numeric handling.

        Specifies a boolean. If set to on, any sequence of Decimal Digits is
        sorted at a primary level with its numeric value.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L65> are optional and can
        be: "true" or "yes", "false" (default) or "no"
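
        The effect of numeric handling can be sketched in a few lines of
        Python. This is not the UCA algorithm and not something this
        module does. The hypothetical numeric_key function only handles
        ASCII digits, and compares the remaining text by code point, to
        show what "sorted with its numeric value" means in practice.

```python
import re


def numeric_key(text):
    """Sort key in which each run of ASCII digits compares by its value."""
    # With a capturing group, re.split alternates text and digit runs:
    # even indexes hold non-digit text, odd indexes hold digit runs.
    parts = re.split(r"([0-9]+)", text)
    key = []
    for index, part in enumerate(parts):
        if not part:
            continue
        if index % 2 == 1:
            key.append((0, int(part), ""))  # digits: compare as a number
        else:
            key.append((1, 0, part))  # other text: compare as a string
    return key


names = ["item10", "item9", "item2"]
print(sorted(names))                   # ['item10', 'item2', 'item9']
print(sorted(names, key=numeric_key))  # ['item2', 'item9', 'item10']
```

        Without the key, "item10" sorts before "item2" because the
        comparison is made digit by digit.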

    *   "kr" or "colReorder"

        Sets collation reorder codes.

        Specifies a reordering of scripts or other significant blocks of
        characters such as symbols, punctuation, and digits.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L70> are: "currency",
        "digit", "punct", "space", "symbol", or any BCP47 script ID.

        Also possible: "others" where all codes not explicitly mentioned
        should be ordered. The script code Zzzz (Unknown Script) is a
        synonym for others.

        For example:

        *       "en-u-kr-latn-digit"

                Reorder digits after Latin characters.

        *       "en-u-kr-arab-cyrl-others-symbol"

                Reorder Arabic characters first, then Cyrillic, and put
                symbols at the end, after all other characters.

        *       "en-u-kr-others"

                Remove any locale-specific reordering, and use DUCET order
                for reordering blocks.

    *   "ks" or "colStrength"

        Sets the collation parameter key for collation strength used for
        comparison.

        Possible values
        <https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db
        72205aa5b6fc3a/common/bcp47/collation.xml#L79> are:

        *       "level1" or "primary"

        *       "level2" or "secondary"

        *       "level3" (default) or "tertiary"

        *       "level4" or "quaternary" or "quarternary"

        *       "identic" or "identical"
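
        As a loose approximation of the first three levels, the Python
        sketch below builds comparison keys with the standard unicodedata
        module: level 1 ignores accents and case, level 2 ignores case
        only, and level 3 keeps both. Real collation weights come from
        the DUCET and CLDR data, so strength_key is a hypothetical helper
        for illustration, not an implementation of the standard.

```python
import unicodedata


def strength_key(text, level):
    """Approximate comparison key for strength levels 1 to 3 (sketch)."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    if level <= 2:
        decomposed = decomposed.casefold()  # drop case differences
    if level == 1:
        # Drop combining marks, that is, accent differences.
        decomposed = "".join(
            char for char in decomposed if not unicodedata.combining(char)
        )
    return decomposed


print(strength_key("Résumé", 1) == strength_key("resume", 1))  # True
print(strength_key("Résumé", 2) == strength_key("resume", 2))  # False
print(strength_key("Résumé", 2) == strength_key("résumé", 2))  # True
print(strength_key("Résumé", 3) == strength_key("résumé", 3))  # False
```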

    *   "kv"

        Sets the collation parameter key for "maxVariable", the last
        reordering group to be affected by "ka-shifted".

        Possible values are:

        *       "currency"

                Spaces, punctuation and all symbols are affected by
                ka-shifted.

        *       "punct"

                Spaces and punctuation are affected by ka-shifted (CLDR
                default).

        *       "space"

                Only spaces are affected by ka-shifted.

        *       "symbol"

                Spaces, punctuation and symbols except for currency symbols
                are affected by ka-shifted (UCA default).

    *   "vt"

        Sets the parameter key for the variable top.

        This is deprecated by the LDML standard.

AUTHOR
    Jacques Deguest <jack@deguest.jp>

SEE ALSO
    <https://github.com/unicode-org/cldr/tree/main/common/bcp47>,
    <https://en.wikipedia.org/wiki/IETF_language_tag>
