Lingua-Translit
lib/Lingua/Translit.pm
package Lingua::Translit;

#
# Copyright (C) 2007-2008 ...
# Alex Linke <alinke@lingua-systems.com>
# Rona Linke <rlinke@lingua-systems.com>
# Copyright (C) 2009-2016 Lingua-Systems Software GmbH
# Copyright (C) 2016-2017 Netzum Sorglos, Lingua-Systems Software GmbH
# Copyright (C) 2017-2022 Netzum Sorglos Software GmbH
#
use strict;
use warnings;
require 5.008;
use Carp qw/croak/;
use Encode qw/encode decode/;
use Lingua::Translit::Tables;
our $VERSION = '0.29';
=pod

=encoding utf8

=head1 NAME

Lingua::Translit - transliterates text between writing systems

=head1 SYNOPSIS

  use Lingua::Translit;

  my $tr = Lingua::Translit->new("ISO 843");

  my $text_tr = $tr->translit("character oriented string");

  if ($tr->can_reverse()) {
      $text_tr = $tr->translit_reverse("character oriented string");
  }

=head1 DESCRIPTION

Lingua::Translit can be used to convert text from one writing system to
another, based on national or international transliteration tables.
Where possible, reverse transliteration is supported.

The term C<transliteration> describes the conversion of text from one
writing system or alphabet to another.
Ideally, the conversion is unique, mapping each character to exactly one
character, so that the original spelling can be reconstructed.
In practice, this is not always the case: a single letter of the original
alphabet may be transcribed as two, three or even more letters.

Furthermore, a single writing system often has more than one
transliteration scheme.
It is therefore essential to know which scheme will be or has been used to
transliterate a text, both to keep data consistent and to be able to
reconstruct the original data.

Reconstruction is a problem for non-unique transliterations, however:
without language-specific knowledge, the resulting clusters of letters may
be ambiguous.
For example, the Greek character "PSI" maps to "ps", but "ps" could also
result from the sequence "PI", "SIGMA", since "PI" maps to "p" and "SIGMA"
maps to "s".

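The ambiguity can be reproduced with a tiny table. The sketch below is not
part of Lingua::Translit: the hash C<%greek_to_latin> and the helper
C<translit_toy> are hypothetical, and the table covers only three lowercase
Greek letters. It maps each character independently and shows that both
"ψ" and "πσ" come out as "ps".

```perl
use strict;
use warnings;
use utf8;                                # Greek literals below are character strings
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical three-letter table, NOT the module's real ISO 843 rules
my %greek_to_latin = (
    'ψ' => 'ps',    # PSI
    'π' => 'p',     # PI
    'σ' => 's',     # SIGMA
);

# Hypothetical helper: replace each mapped character, keep all others as-is
sub translit_toy {
    my ( $text, $map ) = @_;
    return join '', map { exists $map->{$_} ? $map->{$_} : $_ } split //, $text;
}

my $from_psi      = translit_toy( 'ψ',  \%greek_to_latin );
my $from_pi_sigma = translit_toy( 'πσ', \%greek_to_latin );

print "ψ  -> $from_psi\n";
print "πσ -> $from_pi_sigma\n";

# Two different inputs produce the same output, so "ps" has no unique inverse
print "ambiguous: cannot reverse\n" if $from_psi eq $from_pi_sigma;
```
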
If a transliteration table leads to ambiguous conversions, the table cannot
be used in reverse.

Otherwise, the table can be used in both directions if desired.
For example, ISO 9 was originally created to convert Cyrillic letters to
the Latin alphabet; its reverse transliteration transforms Latin letters to
Cyrillic.

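When every source character maps to a distinct single target character, the
table can simply be inverted. The following sketch illustrates this with a
few ISO 9-style Cyrillic/Latin letter pairs chosen for illustration only; it
is not the module's ISO 9 table, and C<%cyr_to_lat> and C<map_chars> are
hypothetical names.

```perl
use strict;
use warnings;
use utf8;                                # Cyrillic and Latin literals are character strings
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical subset of one-to-one Cyrillic-to-Latin pairs (illustration only)
my %cyr_to_lat = (
    'а' => 'a', 'б' => 'b', 'в' => 'v', 'д' => 'd',
    'ж' => 'ž', 'ч' => 'č', 'ш' => 'š',
);

# Invert the table; it is safely invertible only if no two keys share a value
my %lat_to_cyr = reverse %cyr_to_lat;
die "table is not reversible\n"
    unless scalar( keys %lat_to_cyr ) == scalar( keys %cyr_to_lat );

# Hypothetical helper: map each character through the given table
sub map_chars {
    my ( $text, $map ) = @_;
    return join '', map { exists $map->{$_} ? $map->{$_} : $_ } split //, $text;
}

my $latin = map_chars( 'жаба', \%cyr_to_lat );    # forward: Cyrillic to Latin
my $back  = map_chars( $latin, \%lat_to_cyr );    # reverse: Latin to Cyrillic
print "$latin -> $back\n";
```

The distinct-values check suffices here only because every target is a
single character and the input contains only mapped letters. With
multi-letter targets such as "ps", distinct values are not enough, as the
PSI example above demonstrates.
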
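=head1 METHODS

=head2 new(I<"name of table">)

Creates a new object that uses the given transliteration table, e.g.
"ISO 9".
Croaks if no table name is given or if no table of that name exists.
The legacy name "DIN 5008" is accepted as an alias for "Common DEU".

=cut
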
sub new {
    my $class = shift();
    my $name  = shift();

    my $self = {};

    # Ensure that a table name was given
    croak("No transliteration name given.") unless $name;

    # Stay compatible with programs that use Lingua::Translit < 0.05
    if ( $name =~ /^DIN 5008$/i ) {
        $name = "Common DEU";
    }

    my $table = Lingua::Translit::Tables::_get_table_reference($name);

    # Check that a table reference was found for the given name
    croak("No table found for $name.") unless $table;

    # Ensure the table's data is complete
    croak("$name table: missing 'name'")    unless defined $table->{name};
    croak("$name table: missing 'desc'")    unless defined $table->{desc};
    croak("$name table: missing 'reverse'") unless defined $table->{reverse};
    croak("$name table: missing 'rules'")   unless defined $table->{rules};

    # Copy over the table's data
    $self->{name}  = $table->{name};
    $self->{desc}  = $table->{desc};
    $self->{rules} = $table->{rules};

    # Store reversibility as 1 or 0, derived from the natural language
    # string ("true" or otherwise) in the original transliteration table
    $self->{reverse} = ( $table->{reverse} =~ /^true$/i ) ? 1 : 0;

    undef($table);

    return bless $self, $class;
}