Lingua-Identify-Blacklists
lib/Lingua/Identify/Blacklists.pm view on Meta::CPAN
=head1 Exported Functions
=head2 C<$langID = identify( $text [,%options] )>
Analyses a given text and returns a language ID as the result of the classification. C<%options> can be used to change the behaviour of the classifier. Possible options are:

  assumed    => $assumed_lang
  langs      => \@list_of_possible_langs
  use_margin => $score

If C<langs> are specified, it runs the classifier with blacklists for those languages (in a cascaded way, i.e. best1 = lang1 vs lang2, best2 = best1 vs lang3, ...). If C<use_margin> is specified, it runs all versus all and returns the language that w...
If the C<assumed> language is given, it runs the blacklist classifier for all languages that can be confused with C<$assumed_lang> (if blacklist models exist for them).
If neither C<langs> nor C<assumed> is specified, it first runs general-purpose language identification (using Lingua::Identify::CLD and Lingua::Identify) and then checks with the blacklist classifier whether the detected language can be confused w...
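
The cascade described above (best1 = lang1 vs lang2, best2 = best1 vs lang3, ...) can be sketched as follows. This is a minimal illustration, not the module's implementation: C<pick_better>, C<cascade> and the toy scores are hypothetical stand-ins for the pairwise blacklist decision.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a pairwise blacklist classifier: it returns
# whichever of the two languages has the higher toy score.
my %toy_score = ( bs => 2, hr => 7, sr => 5 );

sub pick_better {
    my ( $lang1, $lang2 ) = @_;
    return $toy_score{$lang2} > $toy_score{$lang1} ? $lang2 : $lang1;
}

# Cascade: the winner of each binary decision meets the next candidate.
sub cascade {
    my ( $decide, @langs ) = @_;
    my $best = shift @langs;
    $best = $decide->( $best, $_ ) for @langs;
    return $best;
}

my $winner = cascade( \&pick_better, qw(bs hr sr) );
print "$winner\n";    # prints "hr"
```

A cascade needs only one binary decision per additional language, but its result can depend on the order in which the candidates are listed.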
=cut
sub identify{
  my $text = shift;
  my %options = @_;
  # ...
}
=head2 Module-internal functions
The following functions are not exported and are mainly used for internal purposes (but may be used from the outside if needed).

  initialize()                      # reset the repository of blacklists
  identify_language($text)          # return lang-ID for $text (using CLD)
  classify(\%dic,%options)          # run the classifier
  classify_cascaded(\%dic,@langs)   # run a cascade of binary classifications

  # run all versus all and return the one that wins most binary decisions
  # (a score margin is used to adjust the reliability of the decisions)
  classify_with_margin(\%dic,$margin,@langs)

  load_blacklists($dir)             # load all blacklists available in $dir
  load_blacklist(\%list,$dir,       # load a lang-pair specific blacklist
                 $lang1,$lang2)
  read_file($file,\%dic,$max)       # read a file and count token frequencies
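
The all-versus-all strategy behind C<classify_with_margin> can be illustrated with a small voting sketch. How the module applies the margin is not shown in this excerpt, so the sketch assumes that a binary decision only counts when the absolute score difference reaches the margin. C<pair_score>, C<vote_with_margin> and the toy scores are hypothetical names and data.

```perl
use strict;
use warnings;

# Hypothetical pairwise scorer: positive favours $lang1, negative $lang2.
my %toy_score = ( bs => 2, hr => 7, sr => 5 );

sub pair_score {
    my ( $lang1, $lang2 ) = @_;
    return $toy_score{$lang1} - $toy_score{$lang2};
}

# Run every pair once and count how many decisions each language wins.
sub vote_with_margin {
    my ( $score, $margin, @langs ) = @_;
    my %wins = map { $_ => 0 } @langs;
    for my $i ( 0 .. $#langs ) {
        for my $j ( $i + 1 .. $#langs ) {
            my $s = $score->( $langs[$i], $langs[$j] );
            next if abs($s) < $margin;    # assumption: skip unreliable decisions
            $wins{ $s > 0 ? $langs[$i] : $langs[$j] }++;
        }
    }
    # Most wins first; ties are broken alphabetically to keep the result stable.
    my ($best) = sort { $wins{$b} <=> $wins{$a} || $a cmp $b } @langs;
    return $best;
}

print vote_with_margin( \&pair_score, 1, qw(bs hr sr) ), "\n";    # prints "hr"
```

Unlike a cascade, this needs a decision for every pair of languages, but the outcome does not depend on the order of the candidates (apart from tie-breaking).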
# ...
  my %options = @_;

  $options{langs} = '' unless ($options{langs});
  my @langs = ref($options{langs}) eq 'ARRAY' ?
    @{$options{langs}} : split( /\s+/, $options{langs} ) ;

  @langs = available_languages() unless (@langs);

  return &classify_with_margin( $dic, $options{use_margin}, @langs )
    if ($options{use_margin});
  return &classify_cascaded( $dic, @langs );
}

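
The code above accepts C<langs> either as an array reference or as a whitespace-separated string. A standalone sketch of that normalisation step follows; C<normalize_langs> is a hypothetical helper, not a function of the module, and it additionally drops empty fields produced by leading whitespace.

```perl
use strict;
use warnings;

# Accept either an array reference or a whitespace-separated string
# and always return a plain list of language IDs.
sub normalize_langs {
    my ($langs) = @_;
    return () unless defined $langs;
    return @$langs if ref($langs) eq 'ARRAY';
    return grep { length } split /\s+/, $langs;
}

my @from_ref    = normalize_langs( [qw(bs hr sr)] );
my @from_string = normalize_langs('bs hr sr');
print "@from_ref\n";       # prints "bs hr sr"
print "@from_string\n";    # prints "bs hr sr"
```

An empty result would then be the signal to fall back to all available languages, as the excerpt does with C<available_languages()>.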
sub classify_cascaded{
  my $dic = shift;
  my @langs = @_;

  my $lang1 = shift(@langs);
  foreach my $lang2 (@langs){

    # load blacklists on demand
    unless (exists $blacklists{"$lang1-$lang2"}){
      $blacklists{"$lang1-$lang2"}={};
      &load_blacklist($blacklists{"$lang1-$lang2"},