Lingua-JA-NormalizeText


README.md


Removes the following Directional Formatting Characters:

    U+061C  ARABIC LETTER MARK
    U+2066  LEFT-TO-RIGHT ISOLATE
    U+2067  RIGHT-TO-LEFT ISOLATE
    U+2068  FIRST STRONG ISOLATE
    U+2069  POP DIRECTIONAL ISOLATE
    U+200E  LEFT-TO-RIGHT MARK
    U+200F  RIGHT-TO-LEFT MARK
    U+202A  LEFT-TO-RIGHT EMBEDDING
    U+202B  RIGHT-TO-LEFT EMBEDDING
    U+202C  POP DIRECTIONAL FORMATTING
    U+202D  LEFT-TO-RIGHT OVERRIDE
    U+202E  RIGHT-TO-LEFT OVERRIDE

See [http://www.unicode.org/reports/tr9/](http://www.unicode.org/reports/tr9/) for more information about Directional Formatting Characters.
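
As a rough illustration of what removing these characters amounts to, here is a minimal sketch in core Perl. It is not the module's own implementation: `strip_directional_formatting` is a hypothetical helper name, and the character class is simply built from the twelve code points listed above. It assumes the input is already a decoded character string, not raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::NormalizeText): delete the
# twelve directional formatting characters listed above from a decoded string.
sub strip_directional_formatting {
    my ($text) = @_;
    # U+061C, U+200E-U+200F, U+202A-U+202E, U+2066-U+2069
    $text =~ s/[\x{061C}\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}]//g;
    return $text;
}

# "abc", RIGHT-TO-LEFT OVERRIDE, "def", RIGHT-TO-LEFT MARK
my $input  = "abc\x{202E}def\x{200F}";
my $output = strip_directional_formatting($input);
printf "%d -> %d characters\n", length $input, length $output;   # 8 -> 6 characters
```

The ranges in the character class cover exactly the listed code points, so other invisible characters (for example U+200B ZERO WIDTH SPACE) are left alone in this sketch.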

## remove\_spaces

Removes SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000).
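
The effect described above can be sketched in core Perl as follows. This is an illustrative stand-in under stated assumptions, not the module's code: `strip_spaces` is a hypothetical name, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::NormalizeText): delete
# SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000) from a decoded string.
sub strip_spaces {
    my ($text) = @_;
    $text =~ s/[\x{0020}\x{3000}]//g;
    return $text;
}

# "a", SPACE, "b", IDEOGRAPHIC SPACE, "c", TAB, "d"
my $spaced   = "a b\x{3000}c\td";
my $stripped = strip_spaces($spaced);
printf "%d -> %d characters\n", length $spaced, length $stripped;   # 7 -> 5 characters
```

Because the description names only these two code points, the sketch leaves other whitespace such as TAB and newlines untouched.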

lib/Lingua/JA/NormalizeText.pm


Removes the following Directional Formatting Characters:

  U+061C  ARABIC LETTER MARK
  U+2066  LEFT-TO-RIGHT ISOLATE
  U+2067  RIGHT-TO-LEFT ISOLATE
  U+2068  FIRST STRONG ISOLATE
  U+2069  POP DIRECTIONAL ISOLATE
  U+200E  LEFT-TO-RIGHT MARK
  U+200F  RIGHT-TO-LEFT MARK
  U+202A  LEFT-TO-RIGHT EMBEDDING
  U+202B  RIGHT-TO-LEFT EMBEDDING
  U+202C  POP DIRECTIONAL FORMATTING
  U+202D  LEFT-TO-RIGHT OVERRIDE
  U+202E  RIGHT-TO-LEFT OVERRIDE

See L<http://www.unicode.org/reports/tr9/> for more information about Directional Formatting Characters.


=head2 remove_spaces

Removes SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000).


