Lingua-JA-NormalizeText
èè¢è¼é·èæç¼ºåååæª¢æ¬ç»ç¡ç¸£éªé¡¯é©å´æå»£æéèåï©é»æ¿ç¢é½
åæ«»å殺éåæ
æ£§è ¶è´æ®ï©çµ²ï©¡é½å
è¾æ¿å¯¦è寫ï©ï©ï©é壽æ¶ï©å¾æ¾
ç¸ç¸±ï©è
èï©ç·ï©ï¨¢æå¥¬å°æ¶çï¨ç¨±èä¹å©å£¤åæ¢æ·¨ççè®éå觸寢
æ¼çï¨ç¡åç²¹éé¨é«æ¸æ¨ç¨è²é齿ç«ï©å°æ°æ·ºæ½çºè¸é¢ç¦ªæ¾ï©ï¨±é
壯層ææå·¢çç¦ç¸½èè£é¨·å¢ï¨¿èèï©¥å½å±¬çºå¢®é«å°å¸¶æ»¯èºç§ææ¾¤å®ï¨·
æè½å彿·ç¡é²æè²éï©å»³å¾µï©è½æéï¨ééµè½é»å³ï¨¦é»¨ççç¶é¬å¾·
ç¨è®ï©å±ç¹©ï©¨è²³æ±è
¦é¸å»¢æï©è³£éº¥ç¼é«®æï©æè »ï¨µï©ç¥æ¿±ï©¤ï©ªï©ç侮
ï¨æä½å塀ç«è®é勉辨ç£è¾¯èæ¥ç©å¯¶è¥è±ï¨ºæ²é£æ¯è¬æ»¿ï¨²éºµé»é¤ æ¾å½
è¥è¯è±«é¤èè½ææ¨£è¬ ä¾è³´äºï¤è¦½ï§é¾ï¤¶å
©çµç¶ 壿·ï§åµç¦®é¸éé½¡ææ·
æï©éçå廊朗æ¨ééç£å ¯å·ææ§ï©ï¨ï©ç¤ï©ç¥¿ï©ç©°è°é
OUTPUT FOR INPUT:
äºæªå§å²çºå»å£±é¸ç¨²é£²é å¶æ è¡é§
è¬åç¸è¶å¡©å¥¥å¿æ¨ªæ¬§æ®´é»æ¸©ç©ä»®ä¾¡
ç¦ç»ä¼å£æææµ·çµµæ
¨æ¦æ¡æ®»è¦å¦å²³æ¥½åæ¸è¤å§å·»å¯ææ¼¢ç¼¶è¦³é¢é¥é¡å¨
æ¢å¸°æ°ç¥äºå½æ¯ç æ§æ æè峡æçé·é¿æå¤è¬¹åºé§å²è«å¾æµæ²æ¸çµç¶
èè軽é¶è¸ææ¬ å¹å£åæ¤æ¨©ç®ç çéºé¡é¨å³å¹åºæé±å·å½ç©é»æ¸ç æ
夿¡å殺é忍æ¡èè³æ®ç¥ç³¸è¦æ¯å
è¾æ¹¿å®èåç
®ç¤¾è
é寿åè徿¸
ç£ç¸¦ç¥ç²å¦æç·ç½²è«¸å奍尿¸ç¼ç¥¥ç§°è¨¼ä¹å°å£å¬¢æ¡æµç¶ç³è²é¸å±è§¦å¯
æ
çç¥å°½å³ç²é
é髿°æ¢ç¬å£°éææçªç¯å°æ¦æµ
æ½ç¹è·µéç¦
æ½ç¥å§å
å£®å±¤ææ¿å·£äºç©ç·èè£
é¨å¢æèèµè´å³å±ç¶å ä½å¯¾å¸¯æ»å°æ»ææ²¢åå
æ
èå£å¼¾æç´é
æ¼è«é³èåºå¾´æ²è´å
é®å¡éé転ç¹ä¼é½å
çç¯å½éå¾³
ç¬èªçªå±ç¸é£å¼æ©è³è¦å»ææ¢
売麦çºé«ªæç¹æ©è®åç¢ç§æµè³é »æç¶ä¾®
ç¦æä»ä½µå¡ä¸¦å¤è¾ºåå¼å¼å¼èæ©ç©å®è¤è±å¢¨æ²¡ç¿»æ¯ä¸æºå
麺é»é¤
æ»å¼¥
è¬è¨³äºä½ä¸èªæºæ§è¬¡æ¥é ¼ä¹±æ¬è¦§éç«è両çç·å¡æ¶é¡å±ç¤¼é·éé½¢æ¦æ´
æç·´é¬çå´å»ææ¥¼é鲿¹¾å°å·ææ§æ¸çªç¢ç¶ç¥ç¦ç¦ç©£è¡é¥
=head2 tab2space
Converts CHARACTER TABULATION (U+0009) into SPACE (U+0020).
=head2 remove_controls
Removes the following control characters:
U+0000 .. U+0008
U+000B
U+000C
U+000E .. U+001F
U+007F .. U+009F
Note that this option does not remove the following characters:
U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000D CARRIAGE RETURN
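The ranges above can be expressed as a single transliteration. The following is a minimal sketch of the described behavior, not the module's actual source; C<strip_controls> is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::NormalizeText):
# deletes the control characters listed above and leaves
# U+0009 (TAB), U+000A (LF) and U+000D (CR) untouched.
sub strip_controls {
    my ($text) = @_;
    $text =~ tr/\x{0000}-\x{0008}\x{000B}\x{000C}\x{000E}-\x{001F}\x{007F}-\x{009F}//d;
    return $text;
}

my $cleaned = strip_controls("a\x{00}b\tc\x{1B}d\r\n\x{7F}\x{85}e");
# $cleaned is "ab\tcd\r\ne": NUL, ESC, DEL and U+0085 are gone,
# while the tab and the CR/LF pair survive.
```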
=head2 remove_DFC
Removes the following Directional Formatting Characters:
U+061C ARABIC LETTER MARK
U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE
U+200E LEFT-TO-RIGHT MARK
U+200F RIGHT-TO-LEFT MARK
U+202A LEFT-TO-RIGHT EMBEDDING
U+202B RIGHT-TO-LEFT EMBEDDING
U+202C POP DIRECTIONAL FORMATTING
U+202D LEFT-TO-RIGHT OVERRIDE
U+202E RIGHT-TO-LEFT OVERRIDE
See L<http://www.unicode.org/reports/tr9/> for more information about Directional Formatting Characters.
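As an illustration, the twelve code points listed above can be removed with one transliteration. This is a sketch under the assumption that removal simply means deleting those characters; C<strip_dfc> is a hypothetical name, not a function exported by this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::NormalizeText):
# deletes U+061C, U+2066..U+2069, U+200E, U+200F and U+202A..U+202E.
sub strip_dfc {
    my ($text) = @_;
    $text =~ tr/\x{061C}\x{2066}-\x{2069}\x{200E}\x{200F}\x{202A}-\x{202E}//d;
    return $text;
}

my $plain = strip_dfc("abc\x{200E}def\x{202E}ghi\x{2069}");
# $plain is "abcdefghi"
```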
=head2 remove_spaces
Removes SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000).
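A minimal sketch of this behavior, assuming only the two listed code points are affected; C<strip_spaces> is a hypothetical name used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::NormalizeText):
# deletes SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000) only.
sub strip_spaces {
    my ($text) = @_;
    $text =~ tr/\x{0020}\x{3000}//d;
    return $text;
}

my $packed = strip_spaces("a b\x{3000}c\td");
# $packed is "abc\td": this sketch leaves the tab in place.
```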
=head2 dakuon_normalize, handakuon_normalize, all_dakuon_normalize
See L<Lingua::JA::Dakuon>.
Note that Lingua::JA::NormalizeText enables the $Lingua::JA::Dakuon::EnableCombining flag.
=head2 square2katakana, circled2kana, circled2kanji
See L<Lingua::JA::Moji>.
=head2 decompose_parenthesized_kanji
Decomposes the following parenthesized kanji:
㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩㈪㈫㈬㈭㈮㈯㈰㈱㈲㈳㈴㈵㈶㈷㈸㈹㈺㈻㈼㈽㈾㈿㉀㉁㉂㉃
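The list above appears to cover the code points U+3220 .. U+3243. The sketch below assumes that "decompose" means applying the Unicode compatibility decomposition to those characters, which turns each one into the kanji wrapped in ASCII parentheses. Both that interpretation and the helper name C<decompose_parenthesized> are assumptions, not taken from the module's source.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper (not part of Lingua::JA::NormalizeText):
# replaces each character in U+3220..U+3243 with its compatibility
# decomposition and leaves every other character as it is.
sub decompose_parenthesized {
    my ($text) = @_;
    $text =~ s/([\x{3220}-\x{3243}])/NFKD($1)/ge;
    return $text;
}

my $out = decompose_parenthesized("\x{3231}\x{3220}");
# $out is "(\x{682A})(\x{4E00})"
```

Restricting the substitution to the listed range keeps the rest of the text free of the wider side effects of full NFKD normalization.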
=head1 AUTHOR
pawa E<lt>pawapawa@cpan.orgE<gt>
=head1 SEE ALSO
L<新旧字体表 (table of new and old character forms)|http://www.asahi-net.or.jp/~ax2s-kmtn/ref/old_chara.html>
L<康熙字典 (Kangxi Dictionary)|http://ja.wikipedia.org/wiki/%E5%BA%B7%E7%86%99%E5%AD%97%E5%85%B8>
L<Lingua::JA::Regular::Unicode>
L<Lingua::JA::Dakuon>
L<Lingua::JA::Moji>
L<Unicode::Normalize>
L<Unicode::Number>
L<HTML::Entities>
L<HTML::Scrubber>
=head1 LICENSE
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut