Lingua-Ident


Ident.pm

example, you may get something like this:

   [['de.iso-8859-1', -317.980835274509],
    ['en.iso-8859-1', -450.804230119916], ...]

The elements are sorted in descending order by probability.  You can
use this data to assess the reliability of the categorization and make
your own decision using application-specific metrics.
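One possible application-specific metric is the gap between the two best scores. The sketch below works only on data in the shape shown above (pairs of language and log-probability, best first) and does not call the module itself. The helper name confident_guess and the threshold of 50 are assumptions made for illustration, not part of Lingua::Ident.

```perl
use strict;
use warnings;

# Ranked result in the shape shown above: [language, log-probability]
# pairs, best candidate first. Values are taken from the example.
my $ranked = [
    ['de.iso-8859-1', -317.980835274509],
    ['en.iso-8859-1', -450.804230119916],
];

# Hypothetical helper (not part of Lingua::Ident): accept the top
# candidate only if it beats the runner-up by at least $min_margin
# log units; otherwise return undef to signal "undecided".
sub confident_guess {
    my ($ranked, $min_margin) = @_;
    return unless @$ranked;                    # no candidates at all
    return $ranked->[0][0] if @$ranked == 1;   # nothing to compare against
    my $margin = $ranked->[0][1] - $ranked->[1][1];
    return $margin >= $min_margin ? $ranked->[0][0] : undef;
}

my $guess = confident_guess($ranked, 50);      # 50 is an arbitrary threshold
print defined $guess ? "$guess\n" : "undecided\n";
```

A suitable threshold depends on the input length, because log-probabilities accumulate over all trigrams; dividing the margin by the number of trigrams is one way to make it comparable across texts.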

When neither a trigram nor a bigram is found, the calculation deviates
slightly from the formula given by Dunning (1994).  According to
Dunning's formula, one would estimate the log probability as:

  p = log(1/#alph)

where #alph is the size of the alphabet of a particular language.
This penalizes different language models with different values because
the alphabet sizes of the languages differ.
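To make the size of this effect concrete, the following sketch evaluates log(1/#alph) for two alphabet sizes. The value 127 is the one quoted for zh.big5 in this documentation; the value 30 and the model name "hypothetical.small" are invented stand-ins for a model with a smaller alphabet, and unseen_penalty is an illustrative helper, not a function of the module.

```perl
use strict;
use warnings;

# Alphabet sizes per language model. 127 is the zh.big5 figure quoted
# in this documentation; 30 is a made-up value for comparison only.
my %alph = (
    'zh.big5'            => 127,
    'hypothetical.small' => 30,
);

# Penalty added to the score for one unseen trigram under the
# formula log(1/#alph). Perl's log is the natural logarithm.
sub unseen_penalty {
    my ($alph_size) = @_;
    return log(1 / $alph_size);
}

for my $model (sort keys %alph) {
    printf "%-20s %.3f\n", $model, unseen_penalty($alph{$model});
}
```

Each unseen trigram therefore costs the model with 127 symbols about 4.84 log units, but the model with 30 symbols only about 3.40, so the gap between the two grows with every unseen trigram.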

However, the size of the alphabet is much larger for Asian languages
than for European languages.  For example, for the sample data in the
Lingua::Ident distribution trainlid(1) reports #alph = 127 for zh.big5

Ident.pm

            {
               # $prob[$i] += log $matrices[$i]->{'_NULL'};
               if (exists $bigrams[$i]->{substr($trigram, 0, 2)})
               {
                  $prob[$i] +=
                      log (1 / $bigrams[$i]->{substr($trigram, 0, 2)});
               }
               else
               {
                  # When neither a trigram nor a bigram is found,
                  # according to Dunning's formula, we would now
                  # calculate:

                  #   $prob[$i] += log (1 / $matrices[$i]->{'#ALPH'});

                  # Thus, we penalize different language models with
                  # different values because of the language's
                  # alphabet size.

                  # However, the size of the alphabet (#ALPH) for
                  # Asian languages is much larger than for European

samples/sample.en

How do you take the measure of a woman's life at 50, when her
generation--or at least its passionate front line--has broken all the
rules? "There is no formula that I'm aware of for being a successful
or fulfilled woman today," Hillary Rodham Clinton once said. "Perhaps
it would be easier ... if we could be handed a pattern and cut it out,
just as our mothers and grandmothers and foremothers were. But that is
not the way it is today, and I'm glad it is not."

On Oct. 26, Hillary turns 50, which is a birthday that compels almost
any woman to step back and examine whether the drape and line of her
life fit the woman she once dreamed of becoming.  The cutting edge of
female Baby Boomers, of whom Hillary is the most famous, approached
adulthood with a wild, subversive earnestness. These women would


