Lingua-JA-NormalizeText
ï©ç«æå£ï¨½æ·ï©
ç¹ªï¨¾æ§ªæ´æ®¼è¦ºå¸å¶½æ¨ï¨¶æ¸´ï© å¸å·å¯¬æ¡ï©ç½è§éé·é¡ï¨¸
ï©æ¸æ°£ï©é¾åæ²ç§èææ§èå³½æ¾ç¹éï©©æï¨´ï©£åé©
å³è°å¾æ ææºªç¶ç¹¼
èè¢è¼é·èæç¼ºåååæª¢æ¬ç»ç¡ç¸£éªé¡¯é©å´æå»£æéèåï©é»æ¿ç¢é½
åæ«»å殺éåæ
æ£§è ¶è´æ®ï©çµ²ï©¡é½å
è¾æ¿å¯¦è寫ï©ï©ï©é壽æ¶ï©å¾æ¾
ç¸ç¸±ï©è
èï©ç·ï©ï¨¢æå¥¬å°æ¶çï¨ç¨±èä¹å©å£¤åæ¢æ·¨ççè®éå觸寢
æ¼çï¨ç¡åç²¹éé¨é«æ¸æ¨ç¨è²é齿ç«ï©å°æ°æ·ºæ½çºè¸é¢ç¦ªæ¾ï©ï¨±é
壯層ææå·¢çç¦ç¸½èè£é¨·å¢ï¨¿èèï©¥å½å±¬çºå¢®é«å°å¸¶æ»¯èºç§ææ¾¤å®ï¨·
æè½å彿·ç¡é²æè²éï©å»³å¾µï©è½æéï¨ééµè½é»å³ï¨¦é»¨ççç¶é¬å¾·
ç¨è®ï©å±ç¹©ï©¨è²³æ±è
¦é¸å»¢æï©è³£éº¥ç¼é«®æï©æè »ï¨µï©ç¥æ¿±ï©¤ï©ªï©ç侮
ï¨æä½å塀ç«è®é勉辨ç£è¾¯èæ¥ç©å¯¶è¥è±ï¨ºæ²é£æ¯è¬æ»¿ï¨²éºµé»é¤ æ¾å½
è¥è¯è±«é¤èè½ææ¨£è¬ ä¾è³´äºï¤è¦½ï§é¾ï¤¶å
©çµç¶ 壿·ï§åµç¦®é¸éé½¡ææ·
æï©éçå廊朗æ¨ééç£å ¯å·ææ§ï©ï¨ï©ç¤ï©ç¥¿ï©ç©°è°é
OUTPUT FOR INPUT:
äºæªå§å²çºå»å£±é¸ç¨²é£²é å¶æ è¡é§
è¬åç¸è¶å¡©å¥¥å¿æ¨ªæ¬§æ®´é»æ¸©ç©ä»®ä¾¡
ç¦ç»ä¼å£æææµ·çµµæ
¨æ¦æ¡æ®»è¦å¦å²³æ¥½åæ¸è¤å§å·»å¯ææ¼¢ç¼¶è¦³é¢é¥é¡å¨
æ¢å¸°æ°ç¥äºå½æ¯ç æ§æ æè峡æçé·é¿æå¤è¬¹åºé§å²è«å¾æµæ²æ¸çµç¶
èè軽é¶è¸ææ¬ å¹å£åæ¤æ¨©ç®ç çéºé¡é¨å³å¹åºæé±å·å½ç©é»æ¸ç æ
夿¡å殺é忍æ¡èè³æ®ç¥ç³¸è¦æ¯å
è¾æ¹¿å®èåç
®ç¤¾è
é寿åè徿¸
ç£ç¸¦ç¥ç²å¦æç·ç½²è«¸å奍尿¸ç¼ç¥¥ç§°è¨¼ä¹å°å£å¬¢æ¡æµç¶ç³è²é¸å±è§¦å¯
æ
çç¥å°½å³ç²é
é髿°æ¢ç¬å£°éææçªç¯å°æ¦æµ
æ½ç¹è·µéç¦
æ½ç¥å§å
å£®å±¤ææ¿å·£äºç©ç·èè£
é¨å¢æèèµè´å³å±ç¶å ä½å¯¾å¸¯æ»å°æ»ææ²¢åå
æ
èå£å¼¾æç´é
æ¼è«é³èåºå¾´æ²è´å
é®å¡éé転ç¹ä¼é½å
çç¯å½éå¾³
ç¬èªçªå±ç¸é£å¼æ©è³è¦å»ææ¢
売麦çºé«ªæç¹æ©è®åç¢ç§æµè³é »æç¶ä¾®
ç¦æä»ä½µå¡ä¸¦å¤è¾ºåå¼å¼å¼èæ©ç©å®è¤è±å¢¨æ²¡ç¿»æ¯ä¸æºå
麺é»é¤
æ»å¼¥
è¬è¨³äºä½ä¸èªæºæ§è¬¡æ¥é ¼ä¹±æ¬è¦§éç«è両çç·å¡æ¶é¡å±ç¤¼é·éé½¢æ¦æ´
æç·´é¬çå´å»ææ¥¼é鲿¹¾å°å·ææ§æ¸çªç¢ç¶ç¥ç¦ç¦ç©£è¡é¥
## tab2space
Converts CHARACTER TABULATION (U+0009) into SPACE (U+0020).
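
The conversion described above can be reproduced in plain Perl with a single transliteration. This is a minimal sketch of the behaviour only; `tab2space_sketch` is a hypothetical helper name, not a function exported by this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): replace every
# CHARACTER TABULATION (U+0009) with a SPACE (U+0020).
sub tab2space_sketch {
    my ($text) = @_;
    $text =~ tr/\t/ /;    # one space per tab; no tab-stop expansion
    return $text;
}

print tab2space_sketch("a\tb\t\tc"), "\n";    # prints "a b  c"
```

Each tab becomes exactly one space, so column alignment that relied on tab stops is not preserved.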
## remove\_controls
Removes the following control characters:

    U+0000 .. U+0008
    U+000B
    U+000C
    U+000E .. U+001F
    U+007F .. U+009F

Note that this option does not remove the following characters:

    U+0009  CHARACTER TABULATION
    U+000A  LINE FEED
    U+000D  CARRIAGE RETURN
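
The ranges listed above translate directly into a character class. The following is a sketch in plain Perl, assuming the input is a decoded character string; `remove_controls_sketch` is a hypothetical name used for illustration and is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: delete the control characters listed above while
# keeping U+0009 (tab), U+000A (line feed) and U+000D (carriage return).
sub remove_controls_sketch {
    my ($text) = @_;
    $text =~ s/[\x{0000}-\x{0008}\x{000B}\x{000C}\x{000E}-\x{001F}\x{007F}-\x{009F}]//g;
    return $text;
}

# NUL, ESC and NEXT LINE (U+0085) are removed; tab, CR and LF survive.
my $cleaned = remove_controls_sketch("a\x{0000}b\x{001B}c\tline\r\n\x{0085}");
print length($cleaned), "\n";
```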
## remove\_DFC
Removes the following Directional Formatting Characters:

    U+061C  ARABIC LETTER MARK
    U+2066  LEFT-TO-RIGHT ISOLATE
    U+2067  RIGHT-TO-LEFT ISOLATE
    U+2068  FIRST STRONG ISOLATE
    U+2069  POP DIRECTIONAL ISOLATE
    U+200E  LEFT-TO-RIGHT MARK
    U+200F  RIGHT-TO-LEFT MARK
    U+202A  LEFT-TO-RIGHT EMBEDDING
    U+202B  RIGHT-TO-LEFT EMBEDDING
    U+202C  POP DIRECTIONAL FORMATTING
    U+202D  LEFT-TO-RIGHT OVERRIDE
    U+202E  RIGHT-TO-LEFT OVERRIDE

See [http://www.unicode.org/reports/tr9/](http://www.unicode.org/reports/tr9/) for more information about Directional Formatting Characters.
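
As an illustration, the twelve code points above can be collected into one character class. This is a sketch under the assumption that the option simply deletes those characters from a decoded string; `remove_dfc_sketch` is a hypothetical helper, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the Directional Formatting Characters
# listed above (U+061C, U+200E, U+200F, U+202A-U+202E, U+2066-U+2069).
sub remove_dfc_sketch {
    my ($text) = @_;
    $text =~ s/[\x{061C}\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}]//g;
    return $text;
}

# A RIGHT-TO-LEFT OVERRIDE / POP DIRECTIONAL FORMATTING pair wrapped
# around part of a string is removed; the visible text is untouched.
print remove_dfc_sketch("abc\x{202E}def\x{202C}"), "\n";    # prints "abcdef"
```

Stripping these characters is a common defence against text whose displayed order differs from its stored order.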
## remove\_spaces
Removes SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000).
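
A minimal sketch of the same effect in plain Perl, assuming a decoded character string. `remove_spaces_sketch` is a hypothetical name. Only the two characters named above are deleted, so tabs, newlines and other Unicode space characters stay in place.

```perl
use strict;
use warnings;

# Hypothetical helper: delete SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000).
sub remove_spaces_sketch {
    my ($text) = @_;
    $text =~ tr/\x{0020}\x{3000}//d;    # /d deletes the listed characters
    return $text;
}

print remove_spaces_sketch("a b\x{3000}c"), "\n";    # prints "abc"
```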
## dakuon\_normalize, handakuon\_normalize, all\_dakuon\_normalize
See [Lingua::JA::Dakuon](https://metacpan.org/pod/Lingua::JA::Dakuon).
Note that Lingua::JA::NormalizeText enables the $Lingua::JA::Dakuon::EnableCombining flag.
## square2katakana, circled2kana, circled2kanji
See [Lingua::JA::Moji](https://metacpan.org/pod/Lingua::JA::Moji).
## decompose\_parenthesized\_kanji
Decomposes the following parenthesized kanji:
㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩㈪㈫㈬㈭㈮㈯㈰㈱㈲㈳㈴㈵㈶㈷㈸㈹㈺㈻㈼㈽㈾㈿㉀㉁㉂㉃
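
The text above does not spell out the result of the decomposition. The sketch below assumes two things: that the affected characters are the block U+3220 .. U+3243, and that each is replaced by its Unicode compatibility decomposition (for example, U+3231 becomes a left parenthesis, 株, and a right parenthesis). `decompose_parenthesized_kanji_sketch` is a hypothetical helper built on the core module Unicode::Normalize, not the module's own implementation.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: apply compatibility decomposition only to the
# parenthesized ideographs U+3220 .. U+3243 and leave all other
# characters (including other compatibility characters) unchanged.
sub decompose_parenthesized_kanji_sketch {
    my ($text) = @_;
    $text =~ s/([\x{3220}-\x{3243}])/NFKD($1)/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print decompose_parenthesized_kanji_sketch("\x{3231}"), "\n";    # prints "(株)"
```

Restricting the substitution to this range is what distinguishes the sketch from running NFKD or NFKC over the whole string, which would also rewrite full-width letters, circled digits and similar characters.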
# AUTHOR
pawa <pawapawa@cpan.org>
# SEE ALSO
[新旧字体表](http://www.asahi-net.or.jp/~ax2s-kmtn/ref/old_chara.html) (table of new and old kanji forms)
[康熙字典](http://ja.wikipedia.org/wiki/%E5%BA%B7%E7%86%99%E5%AD%97%E5%85%B8) (Kangxi Dictionary)
[Lingua::JA::Regular::Unicode](https://metacpan.org/pod/Lingua::JA::Regular::Unicode)
[Lingua::JA::Dakuon](https://metacpan.org/pod/Lingua::JA::Dakuon)
[Lingua::JA::Moji](https://metacpan.org/pod/Lingua::JA::Moji)
[Unicode::Normalize](https://metacpan.org/pod/Unicode::Normalize)
[Unicode::Number](https://metacpan.org/pod/Unicode::Number)
[HTML::Entities](https://metacpan.org/pod/HTML::Entities)
[HTML::Scrubber](https://metacpan.org/pod/HTML::Scrubber)
# LICENSE
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.