Locale-Maketext
lib/Locale/Maketext/TPJ13.pod
  sub { return "Your search expression was malformed." }
However, not everything you can write in Perl code can be written in
the above shorthand system -- not by a long shot. For example, consider
the Italian translator from the beginning of this article, who wanted
the Italian for "I didn't find any files" as a special case, instead
of "I found 0 files". That couldn't be specified (at least not easily
or simply) in our shorthand system, and it would have to be written
out in full, like this:
  sub {  # pretend the English strings are in Italian
    my($handle, $files, $dirs) = @_[0,1,2];
    return "I didn't find any files" unless $files;
    return join '',
      "I found ",
      $handle->quant($files, 'file'),
      " in ",
      $handle->quant($dirs,  'directory'),
      ".";
  }
Next to a lexicon full of shorthand code, that sort of sticks out like a
sore thumb -- but this I<is> a special case, after all; and at least
it's possible, if not as concise as usual.
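To see how such a written-out entry can sit next to plain strings in one lexicon, here is a minimal, self-contained sketch. It does not use Locale::Maketext itself: C<Toy::Handle>, its three-argument C<quant>, and the C<render> helper are all hypothetical stand-ins invented for this illustration, and explicit plural forms are passed so that the toy C<quant> has nothing to guess.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a language handle; only quant is provided.
package Toy::Handle;
sub new { return bless {}, shift }
sub quant {
    my ($self, $num, $singular, $plural) = @_;
    return "$num " . ($num == 1 ? $singular : $plural);
}

package main;

# A lexicon mixing a plain string with a sub written out in full.
my %Lexicon = (
    'malformed' => "Your search expression was malformed.",
    'found'     => sub {
        my ($handle, $files, $dirs) = @_;
        return "I didn't find any files" unless $files;   # the special case
        return join '',
            "I found ", $handle->quant($files, 'file', 'files'),
            " in ",     $handle->quant($dirs, 'directory', 'directories'),
            ".";
    },
);

# Hypothetical dispatcher: call code entries, return strings as they are.
sub render {
    my ($handle, $key, @params) = @_;
    my $entry = $Lexicon{$key};
    die "No lexicon entry for '$key'\n" unless defined $entry;
    return ref($entry) eq 'CODE' ? $entry->($handle, @params) : $entry;
}

my $handle = Toy::Handle->new;
print render($handle, 'found', 0, 2), "\n";
print render($handle, 'found', 3, 1), "\n";
```

The zero-files call takes the early return, so the special wording never has to pass through the quantifying code at all.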
As to how you'd implement the Russian example from the beginning of
the article, well, There's More Than One Way To Do It, but it could be
something like this (using English words for Russian, just so you know
what's going on):
  "I [quant,_1,directory,accusative] scanned."
This shifts the burden of complexity off to the quant method. That
method's parameters are: the numeric value it's going to use to
quantify something; the Russian word it's going to quantify; and the
parameter "accusative", which you're using to mean that this
sentence's syntax wants a noun in the accusative case there, although
that quantification method may have to overrule, for grammatical
reasons you may recall from the beginning of this article.
Now, the Russian quant method here is responsible not only for
implementing the strange logic necessary for figuring out how Russian
number-phrases impose case and number on their noun-phrases, but also
for inflecting the Russian word for "directory". How that inflection
is to be carried out is no small issue, and among the solutions I've
seen, some (like variations on a simple lookup in a hash where all
possible forms are provided for all necessary words) are
straightforward but I<can> become cumbersome when you need to inflect
more than a few dozen words; and other solutions (like using
algorithms to model the inflections, storing only root forms and
irregularities) I<can> involve more overhead than is justifiable for
all but the largest lexicons.
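As a rough illustration of the first kind of solution (a plain hash lookup), the sketch below lists every needed form by hand and lets the number pick the slot. Everything in it is an assumption made for the example: the C<%Forms> table, the slot names, the transliterated forms of one Russian noun, and a rule that covers only non-negative whole numbers. It is not the code of any real Russian quant method.

```perl
use strict;
use warnings;

# Hypothetical lookup table: every needed form is written out by hand.
# Forms are transliterated and cover a single illustrative noun.
my %Forms = (
    directory => {
        'acc.sg' => 'katalog',
        'gen.sg' => 'kataloga',
        'gen.pl' => 'katalogov',
    },
);

# Decide which form the number imposes on the noun.
# Assumes $num is a non-negative whole number.
sub form_slot {
    my ($num, $case) = @_;
    my $last_two = $num % 100;
    my $last     = $num % 10;
    return 'gen.pl'   if $last_two >= 11 && $last_two <= 14;
    return "$case.sg" if $last == 1;    # the sentence's own case survives
    return 'gen.sg'   if $last >= 2 && $last <= 4;
    return 'gen.pl';                    # the number overrules the case
}

sub quant {
    my ($num, $word, $case) = @_;
    my $slot = form_slot($num, $case);
    my $form = $Forms{$word}{$slot};
    die "No form '$slot' recorded for '$word'\n" unless defined $form;
    return "$num $form";
}

print quant($_, 'directory', 'acc'), "\n" for 1, 3, 5, 11, 21;
```

The table is trivial to read, but each new word needs its own entry for every slot, which is where this approach starts to feel cumbersome.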
Mercifully, this design decision becomes crucial only in the hairiest
of inflected languages, of which Russian is by no means the I<worst> case
scenario, but is worse than most. Most languages have simpler
inflection systems; for example, in English or Swahili, there are
generally no more than two possible inflected forms for a given noun
("error/errors"; "kosa/makosa"), and the
rules for producing these forms are fairly simple -- or at least,
simple rules can be formulated that work for most words, and you can
then treat the exceptions as just "irregular", at least relative to
your ad hoc rules. A simpler inflection system (simpler rules, fewer
forms) means that design decisions are less crucial to maintaining
sanity, whereas the same decisions could incur
overhead-versus-scalability problems in languages like Russian. It
may I<also> be likely that code (possibly in Perl, as with
Lingua::EN::Inflect, for English nouns) has already
been written for the language in question, whether simple or complex.
Moreover, a third possibility may be simpler than anything
discussed above: just require that all possible (or at least
applicable) forms be provided in the call to the given language's quant
method, as in:
  "I found [quant,_1,file,files]."
That way, quant just has to choose which form it needs, without having
to look up or generate anything. While possibly not optimal for
Russian, this should work well for most other languages, where
quantification is not as complicated an operation.
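A minimal sketch of this third approach follows. The C<expand> function is a toy written for this illustration, not how Locale::Maketext actually compiles bracket notation, and the C<%Methods> table with its two-form C<quant> is likewise hypothetical. It handles only the simple "[method,arg,...]" shape, with "_1", "_2", and so on standing for the caller's parameters.

```perl
use strict;
use warnings;

# Hypothetical method table; quant only chooses between the forms given.
my %Methods = (
    quant => sub {
        my ($num, $singular, $plural) = @_;
        return "$num " . ($num == 1 ? $singular : $plural);
    },
);

# Toy expander for "[method,arg,...]" groups in a template string.
sub expand {
    my ($template, @params) = @_;
    $template =~ s{\[(\w+),([^\]]*)\]}{
        my ($name, $arglist) = ($1, $2);
        my @args = split /,/, $arglist;
        # _1, _2, ... refer to the caller's parameters
        @args = map { /^_(\d+)$/ ? $params[$1 - 1] : $_ } @args;
        my $method = $Methods{$name} or die "Unknown method '$name'\n";
        $method->(@args);
    }ge;
    return $template;
}

print expand("I found [quant,_1,file,files].", 1), "\n";
print expand("I found [quant,_1,file,files].", 3), "\n";
```

Because both forms travel with the call, this quant needs neither a lookup table nor an inflection algorithm.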
=head2 The Devil in the Details
There's plenty more to Maketext than described above -- for example,
there are the details of how language tags ("en-US", "i-pwn", "fi",
etc.) or locale IDs ("en_US") interact with actual module naming
("BogoQuery/Locale/en_us.pm"), and what magic can ensue; there are the
details of how to record (and possibly negotiate) what character
encoding Maketext will return text in (UTF-8? Latin-1? KOI8?). There's
the interesting fact that Maketext is for localization, but has no
"C<use locale;>" anywhere in it. For the curious,
there are the somewhat frightening details of how I actually
implement something like data inheritance so that searches across
modules' %Lexicon hashes can parallel how Perl implements method
inheritance.
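The idea of data inheritance across %Lexicon hashes can be sketched in a few lines. This is only an illustration of the concept under simple assumptions (a depth-first, left-to-right walk of @ISA), not the code Maketext actually uses; the C<Toy::L10N> classes and the C<lex_lookup> function are hypothetical names made up for the example.

```perl
use strict;
use warnings;

# Hypothetical base lexicon class.
package Toy::L10N;
our %Lexicon = (
    'Hello' => 'Hello',
    'Color' => 'Color',
);

# Hypothetical subclass that overrides only one entry.
package Toy::L10N::en_gb;
our @ISA = ('Toy::L10N');
our %Lexicon = (
    'Color' => 'Colour',
);

package main;

# Look in the class's own %Lexicon first, then in each parent class in
# @ISA order, much as Perl searches for a method.
sub lex_lookup {
    my ($class, $key) = @_;
    no strict 'refs';
    my $lexicon = \%{"${class}::Lexicon"};
    return $lexicon->{$key} if exists $lexicon->{$key};
    for my $parent (@{"${class}::ISA"}) {
        my $found = lex_lookup($parent, $key);
        return $found if defined $found;
    }
    return;    # not found anywhere in the hierarchy
}

print lex_lookup('Toy::L10N::en_gb', 'Color'), "\n";   # overridden entry
print lex_lookup('Toy::L10N::en_gb', 'Hello'), "\n";   # inherited entry
```

A subclass therefore only has to list the entries that differ from its parent, just as a subclass only has to define the methods it overrides.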
And, most importantly, there are all the practical details of how to
actually go about deriving from Maketext so you can use it for your
interfaces, and the various tools and conventions for starting out and
maintaining individual language modules.
That is all covered in the documentation for Locale::Maketext and the
modules that come with it, available on CPAN. After having read this
article, which covers the whys of Maketext, the documentation,
which covers the hows of it, should be quite straightforward.
=head2 The Proof in the Pudding: Localizing Web Sites
Maketext and gettext have a notable difference: gettext is in C,
accessible through C library calls, whereas Maketext is in Perl, and
really can't work without a Perl interpreter (although I suppose
something like it could be written for C). Accidents of history (and
not necessarily lucky ones) have made C++ the most common language for
the implementation of applications like word processors, Web browsers,
and even many in-house applications like custom query systems. Current
conditions make it somewhat unlikely that the next one of any of these
kinds of applications will be written in Perl, albeit clearly more for
reasons of custom and inertia than out of consideration of what is the
right tool for the job.