Locale-Unicode
for hex transforms, using the xml syntax: `&#x61;`
- `xml10`
for hex transforms, using the xml decimal syntax: `&#97;`
- `s0`
Transform source: for non-languages/scripts, such as fullwidth-halfwidth conversion
See also `d0`
Possible [values](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform-destination.xml) are:
- `accents`
Map accented characters to base characters plus punctuation, etc.
- `ascii`
Map from ASCII to the target, perhaps using different conventions
- `hex`
Map characters from hex equivalents, trying all variants, eg `U+0061` to `a`; for hex variants see [transform.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform.xml)
- `morse`
Map Morse Code to Unicode encoding
- `npinyin`
Map the numeric form of pinyin to the tone format
- `publish`
Map publishing characters, such as typographic quotation marks and dashes, to vanilla characters
- `zawgyi`
Map Zawgyi Myanmar encoding to Unicode
- `t0`
Machine Translation: used to indicate content that has been machine translated, or a request for a particular type of machine translation of content. The first subfield in a sequence would typically be a `platform` or vendor designation.
For example: `ja-t-de-t0-und`
- `x0`
Private Use.
For example: `ja-t-de-t0-und-x0-medical`
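To make the structure of these tags concrete, here is a minimal sketch in Python that splits the `-t-` extension into its source language and its fields. It is not part of this module: the function name `parse_transform_extension` is hypothetical, and the sketch assumes a well-formed tag in which each field key is one letter followed by one digit (as with `s0`, `d0`, `t0` and `x0`).

```python
import re

# Assumption: a transform field key is one letter followed by one digit.
TKEY = re.compile(r"^[a-z][0-9]$")


def parse_transform_extension(tag):
    """Return the source language and fields of the -t- extension, or None."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return None
    source = []   # subtags of the source language, before the first field key
    fields = {}   # field key -> list of value subtags
    key = None
    for sub in subtags[subtags.index("t") + 1:]:
        if len(sub) == 1:
            break                      # next singleton: another extension starts
        if TKEY.match(sub):
            key = sub
            fields[key] = []
        elif key is None:
            source.append(sub)
        else:
            fields[key].append(sub)
    return {
        "source": "-".join(source) or None,
        "fields": {k: "-".join(v) for k, v in fields.items()},
    }


print(parse_transform_extension("ja-t-de-t0-und-x0-medical"))
# {'source': 'de', 'fields': {'t0': 'und', 'x0': 'medical'}}
```

The sketch only separates the parts of the tag; it does not validate the values against the CLDR lists linked above.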
## Collation Options
[Parametric settings](https://unicode.org/reports/tr35/tr35-collation.html#Setting_Options) can be specified in language tags or in rule syntax (in the form \[keyword value\] ). For example, -ks-level2 or \[strength 2\] will only compare strings base...
The option descriptions below are taken from the LDML standard, and reflect how the algorithm works when implemented by a web browser or other runtime environment. This module does not implement any of those algorithms. The documentation is only here for your...
See the [standard documentation](https://unicode.org/reports/tr35/tr35-collation.html) and the [DUCET (Default Unicode Collation Element Table)](https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table) for more information.
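As an illustration of how such settings appear inside a language tag, the following Python sketch extracts the keywords of the `-u-` extension into a dictionary. It is an assumption-laden example rather than this module's API: the function name `parse_unicode_extension` is hypothetical, the tag is assumed to be well formed, attributes before the first key are ignored, and a key given without a value is read as `true`.

```python
def parse_unicode_extension(tag):
    """Return a dict mapping -u- extension keys to their values."""
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}
    keywords = {}
    key = None
    for sub in subtags[subtags.index("u") + 1:]:
        if len(sub) == 1:
            break                      # next singleton: another extension starts
        if len(sub) == 2:
            key = sub                  # two-character subtags are keys
            keywords[key] = []
        elif key is not None:
            keywords[key].append(sub)  # longer subtags are values of the current key
    # Assumption: a key without a value stands for "true".
    return {k: "-".join(v) or "true" for k, v in keywords.items()}


print(parse_unicode_extension("de-u-ks-level2-kn"))
# {'ks': 'level2', 'kn': 'true'}
print(parse_unicode_extension("en-u-kr-latn-digit"))
# {'kr': 'latn-digit'}
```

The resulting dictionary only reports what the tag requests; applying the settings is left to the collation implementation of the runtime environment.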
- `ka` or `colAlternate`
Sets alternate handling for variable weights.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L34) are optional and can be:
- `noignore` or `non-ignorable`
Default value.
- `shifted`
- `kb` or `colBackwards`
Sets collation parameter key for backward collation weight.
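Specifies a boolean. If on, secondary-level differences (accents) are compared backwards, starting from the end of the string, as in "French" accent ordering.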
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L39) are optional and can be: `true` or `yes`, `false` (default) or `no`
- `kc` or `colCaseLevel`
Sets collation parameter key for case level.
Specifies a boolean. If `on`, a level consisting only of case characteristics will be inserted in front of tertiary level, as a "Level 2.5". To ignore accents but take case into account, set strength to `primary` and case level to `on`.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L44) are optional and can be: `true` or `yes`, `false` (default) or `no`
- `kf` or `colCaseFirst`
Sets collation parameter key for ordering by case.
If set to `upper`, upper case sorts before lower case. If set to `lower`, lower case sorts before upper case.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L49) are: `upper`, `lower`, `false` (default) or `no`
- `kh` or `colHiraganaQuaternary`
Sets collation parameter key for special Hiragana handling.
This is deprecated by the LDML standard.
Specifies a boolean. Controls special treatment of Hiragana code points on quaternary level. If turned on, Hiragana code points will get lower values than all the other non-variable code points in shifted.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L55) are optional and can be: `true` (default) or `yes`, `false` or `no`
- `kk` or `colNormalization`
Sets collation parameter key for normalisation.
Specifies a boolean. If on, then the normal [UCA](https://www.unicode.org/reports/tr41/#UTS10) algorithm is used.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L60) are optional and can be: `true` (default) or `yes`, `false` or `no`
- `kn` or `colNumeric`
Sets collation parameter key for numeric handling.
Specifies a boolean. If set to on, any sequence of Decimal Digits is sorted at a primary level with its numeric value.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L65) are optional and can be: `true` or `yes`, `false` (default) or `no`
- `kr` or `colReorder`
Sets collation reorder codes.
Specifies a reordering of scripts or other significant blocks of characters such as symbols, punctuation, and digits.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L70) are: `currency`, `digit`, `punct`, `space`, `symbol`, or any BCP47 script ID.
Also possible: `others`, which marks the position where all codes not explicitly mentioned should be ordered. The script code `Zzzz` (Unknown Script) is a synonym for `others`.
For example:
- `en-u-kr-latn-digit`
Reorder digits after Latin characters.
- `en-u-kr-arab-cyrl-others-symbol`
Reorder Arabic characters first, then Cyrillic, and put symbols at the end, after all other characters.
- `en-u-kr-others`
Remove any locale-specific reordering, and use DUCET order for reordering blocks.
- `ks` or `colStrength`
Sets the collation parameter key for collation strength used for comparison.
Possible [values](https://github.com/unicode-org/cldr/blob/5ae2965c8afed18f89f54195db72205aa5b6fc3a/common/bcp47/collation.xml#L79) are:
- `level1` or `primary`
- `level2` or `secondary`
- `level3` (default) or `tertiary`
- `level4` or `quaternary` or `quarternary`
- `identic` or `identical`
- `kv`
Sets the collation parameter key for `maxVariable`, the last reordering group to be affected by `ka-shifted`.
Possible values are:
- `currency`
Spaces, punctuation and all symbols are affected by ka-shifted.
- `punct`
Spaces and punctuation are affected by ka-shifted (CLDR default).
- `space`
Only spaces are affected by ka-shifted.
- `symbol`
Spaces, punctuation and symbols except for currency symbols are affected by ka-shifted (UCA default).
- `vt`
Sets the parameter key for the variable top.
**This is deprecated by the LDML standard.**
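To give a feel for what an option such as `kn` changes, here is a rough Python sketch of numeric ordering. It is only an approximation and not the UCA: the helper name `numeric_sort_key` is hypothetical, and the non-digit parts are compared by plain code point order instead of collation weights.

```python
import re


def numeric_sort_key(text):
    """Sort key that compares runs of decimal digits by their numeric value."""
    # re.split with a capturing group keeps the digit runs at odd indexes,
    # so text parts and number parts always line up between two keys.
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


names = ["item10", "item2", "item1"]
print(sorted(names))
# ['item1', 'item10', 'item2']
print(sorted(names, key=numeric_sort_key))
# ['item1', 'item2', 'item10']
```

The first call shows the order without numeric handling, where `item10` sorts before `item2`; the second shows the kind of order requested by a tag such as `en-u-kn-true`.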
# AUTHOR
Jacques Deguest <`jack@deguest.jp`>
# SEE ALSO
[https://github.com/unicode-org/cldr/tree/main/common/bcp47](https://github.com/unicode-org/cldr/tree/main/common/bcp47), [https://en.wikipedia.org/wiki/IETF\_language\_tag](https://en.wikipedia.org/wiki/IETF_language_tag)
[https://www.rfc-editor.org/info/bcp47](https://www.rfc-editor.org/info/bcp47)
[Unicode Locale Data Markup Language](https://unicode.org/reports/tr35/)
[BCP47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt)
[RFC6067 on the Unicode extensions](https://datatracker.ietf.org/doc/html/rfc6067)
[RFC6497 on the transformation extension](https://datatracker.ietf.org/doc/html/rfc6497)
# COPYRIGHT & LICENSE