Lingua-Translit
* Fixed wrong capitalised Cyrillic A in several context rules of both
"BGN/PCGN RUS Standard" and "BGN/PCGN RUS Strict" - thanks to Nikola
Lečić for providing the fix!
* Spelling corrections in man page - thanks to Lucas Kanashiro for
providing a patch!
* Updated copyright (Netzum Sorglos Software GmbH).
0.27 -- 2017-04-26
* Added "BGN/PCGN RUS Standard" and "BGN/PCGN RUS Strict"
transliterations. Thanks to Nikola Lečić for contributing these tables!
0.26 -- 2016-05-23
* Spelling corrections in man page - thanks to Lucas Kanashiro for
providing a patch!
* Maintainership passed from Lingua-Systems to Netzum Sorglos.
* developer-manual__eng.pdf: removed - the documentation is now available
online (see link in Lingua::Translit's man page).
0.25 -- 2016-04-05
* Changed license terms from GPL v2/Artistic to Perl's license terms.
* Added "IAST Devanagari" and "Devanagari IAST" transliterations. Thanks
to Philip Kime for contributing these tables!
0.24 -- 2015-11-25
* Define POD encoding (UTF-8) - thanks to Lucas Kanashiro and
Mohammad Sajid Anwar for reporting this issue!
0.23 -- 2015-11-23
* Added "ISO 8859-16 RON" transliteration. Thanks to Ștefan Suciu for
and contributors lack of time to fix these.
* Code cleanups
0.19 -- 2011-04-15
* Added "ALA-LC RUS" transliteration (ALA-LC:1997). Thanks to Dmitry Smal
and Rusar Publishing for contributing this table!
0.18 -- 2010-07-29
* Added transliterations:
* "GOST 7.79 RUS OLD" (GOST 7.79:2000 B, support for Old Russian)
* "GOST 7.79 RUS" (GOST 7.79:2000 B, reversible)
* "GOST 7.79 UKR" (GOST 7.79:2000 B, reversible)
* lib/Lingua/Translit/Tables.pm: Improved translit_list_supported()
output (v0.9)
* translit: Minor improvements (v0.4)
* tools/substitute_tables.pl: Typo fixed
0.17 -- 2010-01-20
* Suited MSWin32 check for CPAN Testers tools
0.12 -- 2008-04-23
* Added common Polish transliteration ("Common POL")
0.11 -- 2008-04-21
* xml/din_1460_bul.xml: changed transliteration of SOFT SIGN to plain
ASCII apostrophe.
* New transliterations:
* Common SLK (Slovak)
* Common SLV (Slovenian)
0.10 -- 2008-04-17
* lib/Lingua/Translit.pm, xml/translit.dtd: Contexts can now contain two
patterns at a time (before and after). These are combined using AND
logic and therefore act as a logical "in between".
* xml/xml2dump.pl: Switched from XML::Simple to XML::LibXML in order to
prevent loss of whitespace characters like U+180E.
* xml/greeklish.xml: GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA (U+03AB)
bug fixed.
* t/13_tr_Greeklish.t: Added test for upsilon bugfix.
* Added two transliterations for Bulgarian:
* DIN 1460
* Streamlined System
0.09 -- 2008-04-09
* Added Mongolian transliteration "Common Classical MON". Thanks to
Bayanzul Lodoysamba (baynaa@users.sourceforge.net) for contributing
this table!
* lib/Lingua/Translit.pm: "from" and "to" transliteration patterns are
now evaluated literally only (within \Q and \E).
* Test fixes (require Perl 5.8 if the "utf8" pragma is used)
0.02 -- 2008-02-18
* Added Greeklish
* Code cleanup
0.01 -- 2008-02-13
* Initial release
* Set of supported transliterations:
* ISO 9
* ISO 843
* DIN 5008
# vim: set ft=changelog sw=4 sts=4 ts=4 ai et:
lib/Lingua/Translit.pm view on Meta::CPAN
character, so the original spelling can be reconstructed.
Practically this is not always the case and one single letter of the
original alphabet can be transcribed as two, three or even more letters.
Furthermore there is more than one transliteration scheme for one writing
system.
It is therefore important to know which scheme will be or has been used to
transliterate a text, both to work consistently and to be able to
reconstruct the original data.
Reconstruction is a problem for non-unique transliterations, though: if no
language-specific knowledge is available, the resulting clusters of
letters may be ambiguous.
For example, the Greek character "PSI" maps to "ps", but "ps" could also
result from the sequence "PI", "SIGMA" since "PI" maps to "p" and "SIGMA"
maps to "s".
If a transliteration table leads to ambiguous conversions, the provided
table cannot be used in reverse.
Otherwise the table can be used in both directions, if desired.
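The PSI example can be reproduced with a toy table. The sketch below is not Lingua::Translit code: the hash `%table` and the helper `toy_translit` are hypothetical names made up for illustration, and lowercase Greek letters are used for brevity.

```perl
use strict;
use warnings;
use utf8;    # the Greek literals below are UTF-8 source text

# Hypothetical three-entry table, loosely modelled on the example above.
my %table = (
    'ψ' => 'ps',    # PSI
    'π' => 'p',     # PI
    'σ' => 's',     # SIGMA
);

# Transliterate character by character; unknown characters pass through.
sub toy_translit {
    my ($text) = @_;
    return join '', map { $table{$_} // $_ } split //, $text;
}

my $from_psi      = toy_translit('ψ');
my $from_pi_sigma = toy_translit('πσ');

# Both inputs collapse to the same Latin string, so "ps" cannot be
# mapped back without extra knowledge about the language.
print "ambiguous\n" if $from_psi eq $from_pi_sigma;
```

A table with such a collision would have to be flagged as non-reversible.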
So if ISO 9 is originally created to convert Cyrillic letters to
lib/Lingua/Translit/Tables.pm view on Meta::CPAN
use Lingua::Translit::Tables qw/:list/;
translit_list_supported();
=head1 DESCRIPTION
This module is primarily used to provide transliteration tables for
L<Lingua::Translit> and therefore allows one to separate data and algorithm.
Beyond that, it provides routines to check if a given transliteration is
supported and allows one to print a simple list of supported transliterations
along with some meta information.
=head1 EXPORTS
No symbols are exported by default.
Use either the routine's name or one of the following I<tags> to import
symbols to your namespace.
=over 4
lib/Lingua/Translit/Tables.pm view on Meta::CPAN
sub translit_reverse_supported {
my $table = _get_table_reference( _get_table_id( $_[0] ) );
croak("Failed to retrieve table for $_[0].") unless ($table);
return ( ( $table->{reverse} =~ /^true$/ ) ? 1 : 0 );
}
=head2 B<translit_list_supported()>
Prints a list of all supported transliterations to STDOUT (UTF-8 encoded),
providing the following information:
* Name
* Reversibility
* Description
The same information is provided in this document as well:
=cut
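As a usage illustration, the reversibility check above might be called as follows. This is a minimal sketch that assumes Lingua::Translit is installed; the table names "ISO 9" and "Greeklish" are taken from the change log, and importing the routine by name follows the EXPORTS section. The hash `%reversible` is only a local helper for this example.

```perl
use strict;
use warnings;
use Lingua::Translit::Tables qw/translit_reverse_supported/;

# Table names taken from the change log; any supported name should work.
my @names = ( "ISO 9", "Greeklish" );

my %reversible;
for my $name (@names) {
    # Returns 1 or 0; croaks if the table cannot be retrieved.
    $reversible{$name} = translit_reverse_supported($name);
    printf "%-10s %s\n", $name,
        $reversible{$name} ? "reversible" : "one-way only";
}
```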
t/xx_tr_template.t.pl view on Meta::CPAN
use strict;
use Test::More tests => 3; # number of tests
my $name = ""; # transliteration's name
my $reversible = 0; # is the transliteration reversible?
my $input = ""; # short corpus...
my $output_ok = ""; # ...its correct transliteration
my $context = ""; # context-sensitive example
my $context_ok = ""; # ...its correct transliteration
use Lingua::Translit;
=item B<--trans>, B<-t> I<name>
Use the transliteration standard named I<name>.
=item B<--reverse>, B<-r>
Transliterate in reverse direction.
B<NOTE>: Not every transliteration is capable of being reversed.
If unsure, have a look at the list of supported transliterations using the
B<--list> switch.
=item B<--infile>, B<-i> I<FILE>
Read text from I<FILE>. Defaults to STDIN.
=item B<--outfile>, B<-o> I<FILE>
Write the transliterated text to I<FILE>. Defaults to STDOUT.
=item B<--list>, B<-l>
Print a list of supported transliterations showing their
=over 4
=item *
name
=item *
reversibility
sub show_help {
my $retval = shift();
print "translit v$VERSION -- ",
"(c) 2007-2022 Netzum Sorglos Software GmbH\n\n",
"usage: $0 -i FILE -o FILE -t NAME -r -l -v -h\n\n",
" --infile -i FILE read input from FILE\n",
" --outfile -o FILE write output to FILE\n",
" --trans -t NAME use transliteration NAME\n",
" --reverse -r transliterate in reverse direction\n",
" --list -l list all supported transliterations\n\n",
" --verbose -v print verbose status messages\n",
" --help -h show this help\n\n",
"Read translit(1) for details.\n";
exit($retval);
}
sub show_list {
print "Transliterations supported by Lingua::Translit v"
. $Lingua::Translit::VERSION . ":\n\n";
translit_list_supported();
exit(0);
}
=head1 RESTRICTIONS
The input has to be UTF-8 encoded.
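The command-line switches correspond to a handful of Lingua::Translit method calls. The following is a minimal sketch, assuming the module is installed and that `new`, `translit`, `can_reverse` and `translit_reverse` behave as described in its man page; the sample word is illustrative and not taken from the distribution's tests.

```perl
use strict;
use warnings;
use utf8;    # the Cyrillic literal below is UTF-8 source text
use Lingua::Translit;

binmode STDOUT, ':encoding(UTF-8)';

# Roughly what "--trans NAME" selects on the command line.
my $tr = Lingua::Translit->new("ISO 9");

my $input  = "Москва";
my $output = $tr->translit($input);
print "$output\n";

# Roughly what "--reverse" does; only attempt it when the table
# is marked as reversible.
my $restored;
if ( $tr->can_reverse() ) {
    $restored = $tr->translit_reverse($output);
    print "$restored\n";
}
```

Checking `can_reverse()` first avoids calling the reverse direction on a table that does not support it.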
xml/bgn-pcgn_rus_strict.xml view on Meta::CPAN
transliterated as '·e' after й; however, й should be
transliterated as 'y·' before э, so an interpunct character
appears anyway when one transliterates Йэ/йэ.
-->
<rule>
<from>э</from>
<to>·e</to>
<context>
<!-- capitals are already transliterated;
lowercase letters 'h', 'c' and 's' appear in
transliterations of Ж, Ш, Ц, Ч, Щ, Х, ТС/Тс
and ШЧ/Шч -->
<after>[BVGDZhKLMNPRSTsFCScбвгджзклмнпрстфхцчшщ]</after>
</context>
</rule>
<rule>
<from>э</from>
<to>e</to>
</rule>
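How a context-sensitive rule and its plain fallback interact can be sketched in a few lines of Perl. This is an illustration only, not the module's implementation: `apply_rules` and the hash layout are hypothetical, the character class is shortened to lowercase Cyrillic consonants, and `after` is assumed to mean "the match is preceded by this pattern" (`before`: "followed by"). The `from` string is quoted with `\Q...\E`, in line with the change log entry for 0.09.

```perl
use strict;
use warnings;
use utf8;    # Cyrillic literals and the interpunct are UTF-8 source text

# Hypothetical in-memory form of the two rules: the contextual rule
# comes first, the unconditional fallback second.
my @rules = (
    { from => 'э', to => '·e', after => '[бвгджзклмнпрстфхцчшщ]' },
    { from => 'э', to => 'e' },
);

sub apply_rules {
    my ( $text, @rules ) = @_;
    for my $r (@rules) {
        my ( $from, $to ) = @{$r}{qw(from to)};

        # Contexts become zero-width assertions, so only the "from"
        # text itself is replaced. Both may be present at once (AND).
        my $after  = defined $r->{after}  ? "(?<=$r->{after})" : '';
        my $before = defined $r->{before} ? "(?=$r->{before})" : '';

        $text =~ s/$after\Q$from\E$before/$to/g;
    }
    return $text;
}

my $after_consonant = apply_rules( 'мэр', @rules );    # э follows м
my $word_initial    = apply_rules( 'это', @rules );    # э has no predecessor
```

Rule order matters in this sketch: if the fallback ran first, no "э" would be left for the contextual rule to match.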
xml/gost_7-79_rus.xml view on Meta::CPAN
<before>[\p{IsUpper}\p{IsSpace}]</before>
<after>[\p{IsUpper}\p{IsSpace}]</after>
</context>
</rule>
<rule>
<from>Ц</from> <!-- CCL TSE -->
<to>Cz</to> <!-- LCL C LSL Z-->
</rule>
<!-- transliterations resulting in multiple characters -->
<rule>
<from>щ</from> <!-- CSL SHCHA -->
<to>shh</to> <!-- LSL S LSL H LSL H -->
</rule>
<rule> <!-- ALLCAPS -->
<from>Щ</from> <!-- CCL SHCHA -->
<to>SHH</to> <!-- LCL S LCL H LCL H -->
<context>
<before>[\p{IsUpper}\p{IsSpace}]</before>
<after>[\p{IsUpper}\p{IsSpace}]</after>
xml/gost_7-79_ukr.xml view on Meta::CPAN
<before>[\p{IsUpper}\p{IsSpace}]</before>
<after>[\p{IsUpper}\p{IsSpace}]</after>
</context>
</rule>
<rule>
<from>Ц</from> <!-- CCL TSE -->
<to>Cz</to> <!-- LCL C LSL Z-->
</rule>
<!-- transliterations resulting in multiple characters -->
<rule>
<from>щ</from> <!-- CSL SHCHA -->
<to>shh</to> <!-- LSL S LSL H LSL H -->
</rule>
<rule> <!-- ALLCAPS -->
<from>Щ</from> <!-- CCL SHCHA -->
<to>SHH</to> <!-- LCL S LCL H LCL H -->
<context>
<before>[\p{IsUpper}\p{IsSpace}]</before>
<after>[\p{IsUpper}\p{IsSpace}]</after>