Lingua-Identify-Blacklists


lib/Lingua/Identify/Blacklists.pm  view on Meta::CPAN

=head1 Exported Functions

=head2 C<$langID = identify( $text [,%options] )>

Analyses the given text and returns the ID of the language it is classified as. C<%options> can be used to change the behaviour of the classifier. The possible options are:

  assumed    => $assumed_lang
  langs      => \@list_of_possible_langs
  use_margin => $score

If C<langs> is specified, the classifier uses the blacklists for those languages in a cascade, i.e. best1 = lang1 vs. lang2, best2 = best1 vs. lang3, and so on. If C<use_margin> is specified, it runs all versus all and returns the language that w...
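The cascaded strategy can be illustrated with a small, self-contained sketch that does not use the module. The blacklist format here (a hash keyed by C<"lang1-lang2">, as in the module's own C<%blacklists> keys, mapping words to signed weights, positive for the first language) and the helper names C<pair_score> and C<classify_cascaded_sketch> are hypothetical. They are chosen only to show how the winner of each binary decision is carried into the next one.

```perl
use strict;
use warnings;

# Hypothetical toy blacklists: $blacklists{"l1-l2"}{$word} = $weight,
# where a positive weight is evidence for l1 and a negative one for l2.
# (The real module's blacklist format may differ.)
my %blacklists = (
    'xx-yy' => { alpha => -1, beta => 1 },
    'yy-zz' => { alpha => 2,  gamma => -5 },
);

# Score one binary decision: > 0 favours $l1, < 0 favours $l2.
# If only the reversed pair "l2-l1" exists, its score is negated.
sub pair_score {
    my ($dic, $blacklists, $l1, $l2) = @_;
    my ($list, $sign) = exists $blacklists->{"$l1-$l2"}
        ? ($blacklists->{"$l1-$l2"}, 1)
        : ($blacklists->{"$l2-$l1"} // {}, -1);
    my $score = 0;
    $score += $dic->{$_} * $list->{$_} for grep { exists $list->{$_} } keys %$dic;
    return $sign * $score;
}

# Cascade: best1 = lang1 vs lang2, best2 = best1 vs lang3, ...
sub classify_cascaded_sketch {
    my ($dic, $blacklists, @langs) = @_;
    my $best = shift @langs;
    for my $other (@langs) {
        # the current winner is replaced only if $other wins this comparison
        $best = $other if pair_score($dic, $blacklists, $best, $other) < 0;
    }
    return $best;
}

my %dic = (alpha => 3, beta => 1);    # token frequencies of the input text
print classify_cascaded_sketch(\%dic, \%blacklists, qw(xx yy zz)), "\n";   # yy
```

With n candidate languages a cascade needs only n-1 binary decisions, but because pairwise decisions need not be transitive, the result can depend on the order of C<langs>.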

If the C<assumed> language is given, it runs the blacklist classifier for all languages that can be confused with C<$assumed_lang> (if blacklist models exist for them).
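One way to read "languages that can be confused with the assumed language" is: every language that has a pairwise blacklist together with it. The sketch below derives that set from C<"lang1-lang2"> pair names. The list of available pairs and the function name C<confusable_with> are hypothetical, and the module's actual lookup may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper: given pair names of the form "l1-l2", return (sorted)
# every language that appears in a pair together with $assumed.
sub confusable_with {
    my ($assumed, @pairs) = @_;
    my %seen;
    for my $pair (@pairs) {
        my ($l1, $l2) = split /-/, $pair, 2;
        next unless defined $l2;                 # skip malformed names
        if    ($l1 eq $assumed) { $seen{$l2} = 1 }
        elsif ($l2 eq $assumed) { $seen{$l1} = 1 }
    }
    return sort keys %seen;
}

# Hypothetical set of available blacklist pairs.
my @pairs = qw(bs-hr bs-sr hr-sr da-nb);
my @candidates = ('hr', confusable_with('hr', @pairs));
print "@candidates\n";   # hr bs sr
```

The assumed language plus its confusable partners then forms the candidate list that the blacklist classifier decides between.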

If neither C<langs> nor C<assumed> is specified, it first runs a general-purpose language identification (using Lingua::Identify::CLD and Lingua::Identify) and then checks with the blacklist classifier whether the detected language can be confused w...
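The fallback path amounts to a two-stage decision: a general-purpose guess, refined by the blacklists only when a confusable language exists. The sketch below expresses that control flow with three hypothetical callbacks (a detector, a lookup of confusable languages and a blacklist refiner), so it runs without CLD or Lingua::Identify installed. It illustrates the logic only and is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical two-stage identification:
#   $detect     - general-purpose identifier: text -> language code
#   $confusable - language code -> list of languages it can be confused with
#   $refine     - blacklist classifier: ($text, @candidates) -> language code
sub identify_sketch {
    my ($text, $detect, $confusable, $refine) = @_;
    my $lang   = $detect->($text);
    my @others = $confusable->($lang);
    return $lang unless @others;             # nothing to disambiguate
    return $refine->($text, $lang, @others); # let the blacklists decide
}

my $lang = identify_sketch(
    'some input text',
    sub { 'hr' },                                   # toy detector
    sub { $_[0] eq 'hr' ? qw(bs sr) : () },         # toy confusion lookup
    sub { my ($text, @cands) = @_; $cands[-1] },    # toy refiner
);
print "$lang\n";   # sr
```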

=cut

sub identify{
  my $text = shift;
  my %options = @_;

  ...

}


=head2 Module-internal functions

The following functions are not exported and are mainly used for internal purposes (but may be used from the outside if needed).

 initialize()                     # reset the repository of blacklists
 identify_language($text)         # return lang-ID for $text (using CLD)
 classify(\%dic,%options)         # run the classifier
 classify_cascaded(\%dic,@langs)  # run a cascade of binary classifications

 # run all versus all and return the one that wins most binary decisions
 # (a score margin is used to adjust the reliability of the decisions)

 classify_with_margin(\%dic,$margin,@langs)

 load_blacklists($dir)             # load all blacklists available in $dir
 load_blacklist(\%list,$dir,       # load a lang-pair specific blacklist
                $lang1,$lang2)
 read_file($file,\%dic,$max)       # read a file and count token frequencies
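C<read_file> fills the token-frequency hash (C<\%dic>) that the classifiers take as their first argument. A minimal sketch of such counting follows. The tokenisation (lower-casing, splitting on non-word characters), the reading of C<$max> as a cap on the number of tokens, and the name C<count_tokens> are all assumptions; the sketch also takes an open filehandle instead of a file name so that it can run on an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical read_file-like helper: counts lower-cased tokens into %$dic.
# $max is ASSUMED to limit the number of tokens read (0/undef = no limit).
# Returns the number of tokens counted.
sub count_tokens {
    my ($fh, $dic, $max) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        for my $tok (split /\W+/, lc $line) {
            next if $tok eq '';              # leading separator gives an empty field
            $dic->{$tok}++;
            $n++;
            return $n if $max && $n >= $max;
        }
    }
    return $n;
}

my %dic;
open my $fh, '<', \"The cat saw the dog.\n" or die "cannot open in-memory file: $!";
my $n = count_tokens($fh, \%dic);
print "$n tokens; 'the' seen $dic{the} times\n";   # 5 tokens; 'the' seen 2 times
```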

...

    my %options     = @_;
    $options{langs} = '' unless ($options{langs});

    my @langs = ref($options{langs}) eq 'ARRAY' ? 
	@{$options{langs}} : split( /\s+/, $options{langs} ) ;

    @langs = available_languages() unless (@langs);

    return &classify_with_margin( $dic, $options{use_margin}, @langs ) 
	if ($options{use_margin});
    return &classify_cascaded( $dic, @langs );
}
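When C<use_margin> is set, C<classify> hands over to C<classify_with_margin>, which according to the description of the internal functions runs all versus all and returns the language that wins most binary decisions. The sketch below is one possible reading: each pair is compared once, and a decision counts as a win only if the absolute score reaches the margin. That interpretation of the margin, the toy blacklist format and the helper names are assumptions, not the module's implementation.

```perl
use strict;
use warnings;

# Same hypothetical scorer as a toy model: > 0 favours $l1, < 0 favours $l2.
sub pair_score {
    my ($dic, $blacklists, $l1, $l2) = @_;
    my ($list, $sign) = exists $blacklists->{"$l1-$l2"}
        ? ($blacklists->{"$l1-$l2"}, 1)
        : ($blacklists->{"$l2-$l1"} // {}, -1);
    my $score = 0;
    $score += $dic->{$_} * $list->{$_} for grep { exists $list->{$_} } keys %$dic;
    return $sign * $score;
}

# All versus all; a pairwise decision only counts if |score| >= $margin
# (ASSUMED meaning of the margin). Ties go to the language listed first.
sub classify_with_margin_sketch {
    my ($dic, $blacklists, $margin, @langs) = @_;
    my %wins = map { $_ => 0 } @langs;
    for my $i (0 .. $#langs - 1) {
        for my $j ($i + 1 .. $#langs) {
            my $s = pair_score($dic, $blacklists, $langs[$i], $langs[$j]);
            next if abs($s) < $margin;            # too close to call: no vote
            $wins{ $s > 0 ? $langs[$i] : $langs[$j] }++;
        }
    }
    my $best = $langs[0];
    for my $l (@langs) { $best = $l if $wins{$l} > $wins{$best} }
    return $best;
}

my %blacklists = (
    'xx-yy' => { alpha => -1, beta => 1 },
    'xx-zz' => { alpha => 1 },
    'yy-zz' => { beta => 1 },
);
my %dic = (alpha => 3, beta => 1);
print classify_with_margin_sketch(\%dic, \%blacklists, 0,   qw(xx yy zz)), "\n";   # yy
print classify_with_margin_sketch(\%dic, \%blacklists, 2.5, qw(xx yy zz)), "\n";   # xx
```

A larger margin discards close pairwise decisions, so fewer, more reliable votes decide the outcome; in the example only the clear xx-vs-zz decision survives a margin of 2.5.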

sub classify_cascaded{
    my $dic = shift;
    my @langs = @_;

    my $lang1 = shift(@langs);
    foreach my $lang2 (@langs){

        # load blacklists on demand
        unless (exists $blacklists{"$lang1-$lang2"}){
            $blacklists{"$lang1-$lang2"}={};
            &load_blacklist($blacklists{"$lang1-$lang2"},


