Locale-Unicode
Accented characters to map base + punctuation, etc
* "ascii"
Map from ASCII to the target, perhaps using different
conventions
* "hex"
Map characters from hex equivalents, trying all variants, e.g.
"U+0061" to "a"; for hex variants see transform.xml
<https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform.xml>
* "morse"
Map Morse Code to Unicode encoding
* "npinyin"
Map the numeric form of pinyin to the tone format
* "publish"
Map publishing characters, such as typographic quotes and dashes, to
and from vanilla characters
* "zawgyi"
Map Zawgyi Myanmar encoding to Unicode
* "t0"
Machine Translation: used to indicate content that has been machine
translated, or a request for a particular type of machine
translation of content. The first subfield in a sequence would
typically be a "platform" or vendor designation.
For example: "ja-t-de-t0-und"
* "x0"
Private Use.
For example: "ja-t-de-t0-und-x0-medical"
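The "t0" and "x0" examples above are fields of the "-t-" (transformed content) extension defined in RFC 6497. The Python sketch below splits such a tag into its primary language subtag, its source language and its fields. It is only an illustration under simplifying assumptions (field keys are one letter followed by one digit, the extension ends at the next single-character subtag, and subtags are not validated), not the API of this module; the function name "parse_t_extension" is hypothetical.

```python
import re

# RFC 6497 field separators are one letter followed by one digit, e.g. "t0", "x0", "s0".
FIELD_KEY = re.compile(r"^[a-z][0-9]$")


def parse_t_extension(tag: str) -> dict:
    """Hypothetical helper: split the -t- extension of a BCP 47 tag (sketch only)."""
    subtags = tag.lower().replace("_", "-").split("-")
    result = {"language": subtags[0], "source": None, "fields": {}}
    if "t" not in subtags[1:]:
        return result  # no -t- extension present
    start = subtags.index("t", 1) + 1
    # The extension runs until the next singleton, such as "u" or "x".
    end = start
    while end < len(subtags) and len(subtags[end]) != 1:
        end += 1
    ext = subtags[start:end]
    # Subtags before the first field key form the source language tag.
    i = 0
    while i < len(ext) and not FIELD_KEY.match(ext[i]):
        i += 1
    if i:
        result["source"] = "-".join(ext[:i])
    fields = {}
    key = None
    for sub in ext[i:]:
        if FIELD_KEY.match(sub):
            key = sub
            fields[key] = []
        else:
            fields[key].append(sub)
    result["fields"] = {k: "-".join(v) for k, v in fields.items()}
    return result


print(parse_t_extension("ja-t-de-t0-und-x0-medical"))
# {'language': 'ja', 'source': 'de', 'fields': {'t0': 'und', 'x0': 'medical'}}
```

A real implementation would also validate each subtag against the CLDR bcp47 data files; the sketch only shows how the fields are delimited.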
Collation Options
Parametric settings
<https://unicode.org/reports/tr35/tr35-collation.html#Setting_Options>
can be specified in language tags or in rule syntax (in the form
[keyword value]). For example, -ks-level2 or [strength 2] will only
compare strings based on their primary and secondary weights.
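To illustrate the correspondence between the tag form and the rule syntax, the following Python sketch extracts the "-u-" keyword/value pairs from a language tag and turns a "ks" value into the equivalent "[strength N]" setting. It assumes the UTS #35 conventions that keys are exactly two characters, types are 3 to 8 characters, and a key without a type means "true". The helper names "parse_u_keywords" and "strength_rule" are hypothetical and not part of this module.

```python
# Maps "ks" values (and their aliases) to the rule-syntax strength value.
STRENGTH = {
    "level1": "1", "primary": "1",
    "level2": "2", "secondary": "2",
    "level3": "3", "tertiary": "3",
    "level4": "4", "quaternary": "4", "quarternary": "4",
    "identic": "I", "identical": "I",
}


def parse_u_keywords(tag: str) -> dict:
    """Hypothetical helper: collect -u- keywords of a BCP 47 tag (sketch only)."""
    subtags = tag.lower().replace("_", "-").split("-")
    if "u" not in subtags[1:]:
        return {}
    i = subtags.index("u", 1) + 1
    keywords = {}
    key = None
    # Stop at the next singleton, such as "t" or "x".
    while i < len(subtags) and len(subtags[i]) != 1:
        sub = subtags[i]
        if len(sub) == 2:  # keys are exactly two characters long
            key = sub
            keywords[key] = []
        elif key is not None:  # attributes before the first key are skipped
            keywords[key].append(sub)
        i += 1
    # A key without a type is equivalent to "true".
    return {k: "-".join(v) if v else "true" for k, v in keywords.items()}


def strength_rule(tag: str):
    """Return e.g. "[strength 2]" for a tag containing -ks-level2, else None."""
    value = parse_u_keywords(tag).get("ks")
    return f"[strength {STRENGTH[value]}]" if value in STRENGTH else None


print(parse_u_keywords("en-u-ks-level2-kn"))  # {'ks': 'level2', 'kn': 'true'}
print(strength_rule("en-u-ks-level2"))        # [strength 2]
```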
The option descriptions below are taken from the LDML standard and
reflect how the algorithm works when implemented by a web browser or
another runtime environment. This module does not implement any of
those algorithms. The documentation is only here for your benefit and
convenience.
See the standard documentation
<https://unicode.org/reports/tr35/tr35-collation.html> and the DUCET
(Default Unicode Collation Element Table)
<https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table>
for more information.
* "ka" or "colAlternate"
Sets alternate handling for variable weights.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L34>
are optional and can be:
* "noignore" or "non-ignorable"
Default value.
* "shifted"
* "kb" or "colBackwards"
Sets collation parameter key for backward collation weight.
Specifies a boolean. If "on", secondary weights (usually accent
differences) are compared starting from the end of the string, as in
Canadian French collation.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L39>
are optional and can be: "true" or "yes", "false" (default) or "no"
* "kc" or "colCaseLevel"
Sets collation parameter key for case level.
Specifies a boolean. If "on", a level consisting only of case
characteristics is inserted in front of the tertiary level, as a
"Level 2.5". To ignore accents but take case into account, set
strength to "primary" and case level to "on".
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L44>
are optional and can be: "true" or "yes", "false" (default) or "no"
* "kf" or "colCaseFirst"
Sets collation parameter key for ordering by case.
If set to upper, causes upper case to sort before lower case. If set
to lower, causes lower case to sort before upper case.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L49>
are: "upper", "lower", "false" (default) or "no"
* "kh" or "colHiraganaQuaternary"
Sets collation parameter key for special Hiragana handling.
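If turned on, Hiragana code points get lower quaternary values than
all other non-variable code points when alternate handling is
"shifted".
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L55>
are optional and can be: "true" (default) or "yes", "false" or "no"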
* "kk" or "colNormalization"
Sets collation parameter key for normalisation.
Specifies a boolean. If on, then the normal UCA
<https://www.unicode.org/reports/tr41/#UTS10> algorithm is used.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L60>
are optional and can be: "true" (default) or "yes", "false" or "no"
* "kn" or "colNumeric"
Sets collation parameter key for numeric handling.
Specifies a boolean. If set to on, any sequence of Decimal Digits is
sorted at a primary level with its numeric value.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L65>
are optional and can be: "true" or "yes", "false" (default) or "no"
* "kr" or "colReorder"
Sets collation reorder codes.
Specifies a reordering of scripts or other significant blocks of
characters such as symbols, punctuation, and digits.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L70>
are: "currency", "digit", "punct", "space", "symbol", or any BCP47
script ID.
Also possible: "others", which marks where all codes not explicitly
mentioned should be ordered. The script code Zzzz (Unknown Script) is
a synonym for "others".
For example:
* "en-u-kr-latn-digit"
Reorder digits after Latin characters.
* "en-u-kr-arab-cyrl-others-symbol"
Reorder Arabic characters first, then Cyrillic, and put
symbols at the end, after all other characters.
* "en-u-kr-others"
Remove any locale-specific reordering, and use DUCET order
for reordering blocks.
* "ks" or "colStrength"
Sets the collation parameter key for collation strength used for
comparison.
Possible values
<https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L79>
are:
* "level1" or "primary"
* "level2" or "secondary"
* "level3" (default) or "tertiary"
* "level4" or "quaternary" or "quarternary"
* "identic" or "identical"
* "kv"
Sets the collation parameter key for "maxVariable", the last
reordering group to be affected by "ka-shifted".
Possible values are:
* "currency"
Spaces, punctuation and all symbols are affected by
ka-shifted.
* "punct"
Spaces and punctuation are affected by ka-shifted (CLDR
default).
* "space"
Only spaces are affected by ka-shifted.
* "symbol"
Spaces, punctuation and symbols except for currency symbols
are affected by ka-shifted (UCA default).
* "vt"
Sets the parameter key for the variable top.
This is deprecated by the LDML standard.
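Since this module only documents these options, the following Python sketch gives a rough idea of what the "kn" (colNumeric) option described above changes. It is a simplified approximation, not the UCA: runs of decimal digits are compared by numeric value, but all other text is compared by code point rather than by collation weights, and none of the other options are taken into account. The function name "numeric_sort_key" is hypothetical.

```python
import re

# Python's \d matches any Unicode decimal digit (general category Nd),
# and int() accepts those digits, so non-ASCII digit runs work as well.
DIGITS = re.compile(r"(\d+)")


def numeric_sort_key(text: str) -> list:
    """Hypothetical sort key approximating kn=true (colNumeric)."""
    parts = DIGITS.split(text)
    # re.split with a capturing group alternates text, digits, text, ...
    # so even positions stay strings and odd positions become integers,
    # and two keys never compare a str against an int.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


names = ["item10", "item2", "item1"]
print(sorted(names))                        # ['item1', 'item10', 'item2']
print(sorted(names, key=numeric_sort_key))  # ['item1', 'item2', 'item10']
```

Without the numeric key, "item10" sorts before "item2" because the digits are compared one character at a time; with it, the digit runs are compared as the numbers 2 and 10.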
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
<https://github.com/unicode-org/cldr/tree/main/common/bcp47>,
<https://en.wikipedia.org/wiki/IETF_language_tag>