Lingua-Translit


Changes  view on Meta::CPAN

    * Fixed wrongly capitalised Cyrillic A in several context rules of both
      "BGN/PCGN RUS Standard" and "BGN/PCGN RUS Strict" - thanks to Nikola
      Lečić for providing the fix!
    * Spelling corrections in man page - thanks to Lucas Kanashiro for
      providing a patch!
    * Updated copyright (Netzum Sorglos Software GmbH).

0.27  -- 2017-04-26

    * Added "BGN/PCGN RUS Standard" and "BGN/PCGN RUS Strict"
      transliterations. Thanks to Nikola Lečić for contributing these tables!

0.26  -- 2016-05-23

    * Spelling corrections in man page - thanks to Lucas Kanashiro for
      providing a patch!
    * Maintainership passed from Lingua-Systems to Netzum Sorglos.
    * developer-manual__eng.pdf: removed - the documentation is now available
      online (see link in Lingua::Translit's man page).

0.25  -- 2016-04-05

    * Changed license terms from GPL v2/Artistic to Perl's license terms.
    * Added "IAST Devanagari" and "Devanagari IAST" transliterations. Thanks
      to Philip Kime for contributing these tables!

0.24  -- 2015-11-25

    * Define POD encoding (UTF-8) - thanks to Lucas Kanashiro and
      Mohammad Sajid Anwar for reporting this issue!

0.23  -- 2015-11-23

    * Added "ISO 8859-16 RON" transliteration. Thanks to Ștefan Suciu for

Changes  view on Meta::CPAN

      and contributors' lack of time to fix these.
    * Code cleanups

0.19  -- 2011-04-15

    * Added "ALA-LC RUS" transliteration (ALA-LC:1997). Thanks to Dmitry Smal
      and Rusar Publishing for contributing this table!

0.18  -- 2010-07-29

    * Added transliterations:
        * "GOST 7.79 RUS OLD" (GOST 7.79:2000 B, support for Old Russian)
        * "GOST 7.79 RUS"     (GOST 7.79:2000 B, reversible)
        * "GOST 7.79 UKR"     (GOST 7.79:2000 B, reversible)
    * lib/Lingua/Translit/Tables.pm: Improved translit_list_supported()
      output (v0.9)
    * translit: Minor improvements (v0.4)
    * tools/substitute_tables.pl: Typo fixed

0.17  -- 2010-01-20

Changes  view on Meta::CPAN

        * Suited MSWin32 check for CPAN Testers tools

0.12 -- 2008-04-23

    * Added common Polish transliteration ("Common POL")

0.11 -- 2008-04-21

    * xml/din_1460_bul.xml: changed transliteration of SOFT SIGN to plain
      ASCII apostrophe.
    * New transliterations:
        * Common SLK (Slovak)
        * Common SLV (Slovenian)

0.10 -- 2008-04-17

    * lib/Lingua/Translit.pm, xml/translit.dtd: Contexts can now contain two
      patterns at a time (before and after). These are combined using a
      logical AND and therefore act as a logical "in between".
    * xml/xml2dump.pl: Switched from XML::Simple to XML::LibXML in order to
      prevent loss of whitespace characters like U+180E.
    * xml/greeklish.xml: GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA (U+03AB)
      bug fixed.
    * t/13_tr_Greeklish.t: Added test for upsilon bugfix.
    * Added two transliterations for Bulgarian:
        * DIN 1460
        * Streamlined System

0.09 -- 2008-04-09

    * Added Mongolian transliteration "Common Classical MON". Thanks to
      Bayanzul Lodoysamba (baynaa@users.sourceforge.net) for contributing
      this table!
    * lib/Lingua/Translit.pm: "from" and "to" transliteration patterns are
      now evaluated literally only (within \Q and \E).

Changes  view on Meta::CPAN

    * Test fixes (require Perl 5.8 if the "utf8" pragma is used)

0.02 -- 2008-02-18

    * Added Greeklish
    * Code cleanup

0.01 -- 2008-02-13

    * Initial release
    * Set of supported transliterations:
        * ISO 9
        * ISO 843
        * DIN 5008

# vim: set ft=changelog sw=4 sts=4 ts=4 ai et:

lib/Lingua/Translit.pm  view on Meta::CPAN

character, so the original spelling can be reconstructed.
Practically this is not always the case and one single letter of the
original alphabet can be transcribed as two, three or even more letters.

Furthermore there is more than one transliteration scheme for one writing
system.
It is therefore essential to know which scheme will be or has been used to
transliterate a text, in order to work consistently and to be able to
reconstruct the original data.

Reconstruction is a problem, though, for non-unique transliterations: if no
language-specific knowledge is available, the resulting clusters of
letters may be ambiguous.
For example, the Greek character "PSI" maps to "ps", but "ps" could also
result from the sequence "PI", "SIGMA" since "PI" maps to "p" and "SIGMA"
maps to "s".
If a transliteration table leads to ambiguous conversions, the provided
table cannot be used in reverse.

Otherwise the table can be used in both directions, if desired.
So if ISO 9 is originally created to convert Cyrillic letters to
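
The "PSI" versus "PI", "SIGMA" ambiguity described above can be reproduced with a few lines of plain Perl. This is only a sketch: the three-entry table and the helper `toy_translit` are hypothetical and stand in for a real transliteration table; they are not taken from the module's data.

```perl
use strict;
use warnings;

# Toy forward table (illustrative assumption, not the module's data).
my %forward = (
    "\x{03C8}" => 'ps',    # GREEK SMALL LETTER PSI
    "\x{03C0}" => 'p',     # GREEK SMALL LETTER PI
    "\x{03C3}" => 's',     # GREEK SMALL LETTER SIGMA
);

# Transliterate character by character; unknown characters pass through.
sub toy_translit {
    my ($text) = @_;
    return join '', map { $forward{$_} // $_ } split //, $text;
}

my $from_psi      = toy_translit("\x{03C8}");
my $from_pi_sigma = toy_translit("\x{03C0}\x{03C3}");

# Both inputs collapse to the same Latin string, so the mapping
# cannot be inverted without additional knowledge.
print "$from_psi\n$from_pi_sigma\n";
print "ambiguous\n" if $from_psi eq $from_pi_sigma;
```

Because both source strings yield "ps", a reverse lookup has no way to decide which original to restore, which is why such a table would be marked as not reversible.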

lib/Lingua/Translit/Tables.pm  view on Meta::CPAN

  use Lingua::Translit::Tables qw/:list/;

  translit_list_supported();

=head1 DESCRIPTION

This module is primarily used to provide transliteration tables for
L<Lingua::Translit> and therefore allows one to separate data and algorithm.

Beyond that, it provides routines to check if a given transliteration is
supported and allows one to print a simple list of supported transliterations
along with some meta information.

=head1 EXPORTS

No symbols are exported by default.

Use either the routine's name or one of the following I<tags> to import
symbols to your namespace.

=over 4

lib/Lingua/Translit/Tables.pm  view on Meta::CPAN

sub translit_reverse_supported {
    my $table = _get_table_reference( _get_table_id( $_[0] ) );

    croak("Failed to retrieve table for $_[0].") unless ($table);

    return ( ( $table->{reverse} =~ /^true$/ ) ? 1 : 0 );
}

=head2 B<translit_list_supported()>

Prints a list of all supported transliterations to STDOUT (UTF-8 encoded),
providing the following information:

  * Name
  * Reversibility
  * Description

The same information is provided in this document as well:

=cut

t/xx_tr_template.t.pl  view on Meta::CPAN

use strict;
use Test::More tests => 3;  # number of tests

my $name        =   "";     # transliteration's name
my $reversible  =   0;      # is the transliteration reversible?

my $input       =   "";     # short corpus...
my $output_ok   =   "";     # ...its correct transliteration

my $context     =   "";     # context-sensitive example
my $context_ok  =   "";     # ...its correct transliteration

use Lingua::Translit;

translit  view on Meta::CPAN


=item B<--trans>, B<-t> I<name>

Use the transliteration standard named I<name>.

=item B<--reverse>, B<-r>

Transliterate in reverse direction.

B<NOTE>: Not every transliteration is capable of being reversed.
If unsure, have a look at the list of supported transliterations using the
B<--list> switch.

=item B<--infile>, B<-i> I<FILE>

Read text from I<FILE>. Defaults to STDIN.

=item B<--outfile>, B<-o> I<FILE>

Write the transliterated text to I<FILE>. Defaults to STDOUT.

=item B<--list>, B<-l>

Print a list of supported transliterations showing their

=over 4

=item *

name

=item *

reversibility

translit  view on Meta::CPAN

sub show_help {
    my $retval = shift();

    print "translit v$VERSION  --  ",
      "(c) 2007-2022 Netzum Sorglos Software GmbH\n\n",
      "usage: $0 -i FILE -o FILE -t NAME -r -l -v -h\n\n",
      "  --infile   -i  FILE       read input from FILE\n",
      "  --outfile  -o  FILE       write output to FILE\n",
      "  --trans    -t  NAME       use transliteration NAME\n",
      "  --reverse  -r             transliterate in reverse direction\n",
      "  --list     -l             list all supported transliterations\n\n",
      "  --verbose  -v             print verbose status messages\n",
      "  --help     -h             show this help\n\n",
      "Read translit(1) for details.\n";

    exit($retval);
}

sub show_list {
    print "Transliterations supported by Lingua::Translit v"
      . $Lingua::Translit::VERSION . ":\n\n";

    translit_list_supported();

    exit(0);
}

=head1 RESTRICTIONS

The input has to be UTF-8 encoded.
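
A caller that wants to verify this restriction before handing text over could check the bytes with the core Encode module. The following is a minimal sketch; the helper name `decode_utf8_strict` is hypothetical and not part of the translit script.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Return the decoded text, or undef if the bytes are not valid UTF-8.
sub decode_utf8_strict {
    my ($bytes) = @_;

    # decode() may modify its input when a CHECK value is given,
    # so work on a copy.
    my $copy = $bytes;
    my $text = eval { decode( 'UTF-8', $copy, FB_CROAK ) };
    return $text;
}

my $valid   = "\xCE\xA8";    # UTF-8 bytes of GREEK CAPITAL LETTER PSI
my $invalid = "\xFF\xFE";    # not a valid UTF-8 sequence

print( defined( decode_utf8_strict($valid) )   ? "ok\n" : "not UTF-8\n" );
print( defined( decode_utf8_strict($invalid) ) ? "ok\n" : "not UTF-8\n" );
```

Using the strict "UTF-8" encoding name together with FB_CROAK makes malformed input fail loudly instead of being silently replaced.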

xml/bgn-pcgn_rus_strict.xml  view on Meta::CPAN

             transliterated as '·e' after й; however, й should be
             transliterated as 'y·' before э, so an interpunct character
             appears anyway when one transliterates Йэ/йэ.
        -->
        <rule>
            <from>э</from>
            <to>·e</to>
            <context>
                <!-- capitals are already transliterated;
                     lowercase letters 'h', 'c' and 's' appear in
                     transliterations of Ж, Ш, Ц, Ч, Щ, Х, ТС/Тс
                     and ШЧ/Шч -->
                <after>[BVGDZhKLMNPRSTsFCScбвгджзклмнпрстфхцчшщ]</after>
            </context>
        </rule>

        <rule>
            <from>э</from>
            <to>e</to>
        </rule>
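
How an `<after>` context narrows a rule can be sketched in Perl with a lookbehind assertion. The rule structure and the helper `apply_rules` below are assumptions made for illustration; they mirror the two rules above but are not the module's actual implementation.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical in-memory form of the two rules (structure assumed).
my @rules = (
    {
        from    => 'э',
        to      => '·e',
        context => { after => '[BVGDZhKLMNPRSTsFCScбвгджзклмнпрстфхцчшщ]' },
    },
    { from => 'э', to => 'e' },
);

# Apply the rules in order; the context rule runs before the fallback.
sub apply_rules {
    my ($text) = @_;
    for my $rule (@rules) {
        my $after = $rule->{context} ? $rule->{context}{after} : undef;
        if ( defined $after ) {
            # Replace only where the preceding character matches.
            $text =~ s/(?<=$after)\Q$rule->{from}\E/$rule->{to}/g;
        }
        else {
            $text =~ s/\Q$rule->{from}\E/$rule->{to}/g;
        }
    }
    return $text;
}

print apply_rules('мэр'), "\n";    # э follows a consonant
print apply_rules('это'), "\n";    # э at the start of the word
```

Rule order matters in this sketch: the context-sensitive rule has to run first, otherwise the plain fallback would consume every occurrence of the letter.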

xml/gost_7-79_rus.xml  view on Meta::CPAN

                <before>[\p{IsUpper}\p{IsSpace}]</before>
                <after>[\p{IsUpper}\p{IsSpace}]</after>
            </context>
        </rule>
        <rule>
            <from>&#x0426;</from>   <!-- CCL TSE  -->
            <to>&#x0043;&#x007A;</to>       <!-- LCL C  LSL Z-->
        </rule>


        <!-- transliterations resulting in multiple characters -->
        <rule>
            <from>&#x0449;</from>       <!-- CSL SHCHA  -->
            <to>&#x0073;&#x0068;&#x0068;</to>   <!-- LSL S LSL H LSL H -->
        </rule>
        <rule> <!-- ALLCAPS -->
            <from>&#x0429;</from>       <!-- CCL SHCHA  -->
            <to>&#x0053;&#x0048;&#x0048;</to> <!-- LCL S LCL H LCL H -->
            <context>
                <before>[\p{IsUpper}\p{IsSpace}]</before>
                <after>[\p{IsUpper}\p{IsSpace}]</after>

xml/gost_7-79_ukr.xml  view on Meta::CPAN

                <before>[\p{IsUpper}\p{IsSpace}]</before>
                <after>[\p{IsUpper}\p{IsSpace}]</after>
            </context>
        </rule>
        <rule>
            <from>&#x0426;</from>   <!-- CCL TSE  -->
            <to>&#x0043;&#x007A;</to>       <!-- LCL C  LSL Z-->
        </rule>


        <!-- transliterations resulting in multiple characters -->
        <rule>
            <from>&#x0449;</from>       <!-- CSL SHCHA  -->
            <to>&#x0073;&#x0068;&#x0068;</to>   <!-- LSL S LSL H LSL H -->
        </rule>
        <rule> <!-- ALLCAPS -->
            <from>&#x0429;</from>       <!-- CCL SHCHA  -->
            <to>&#x0053;&#x0048;&#x0048;</to> <!-- LCL S LCL H LCL H -->
            <context>
                <before>[\p{IsUpper}\p{IsSpace}]</before>
                <after>[\p{IsUpper}\p{IsSpace}]</after>


