Lingua-Identify-Blacklists
    
    
  
  
  
lib/Lingua/Identify/Blacklists.pm view on Meta::CPAN
=head1 Exported Functions
=head2 C<$langID = identify( $text [,%options] )>
Analyses a given text and returns a language ID as the result of the classification. C<%options> can be used to change the behaviour of the classifier. Possible options are:
  assumed    => $assumed_lang
  langs      => \@list_of_possible_langs
  use_margin => $score
If C<langs> are specified, it runs the classifier with blacklists for those languages (in a cascaded way, i.e. best1 = lang1 vs lang2, best2 = best1 vs lang3, ...). If C<use_margin> is specified, it runs all versus all and returns the language that w...
If the C<assumed> language is given, it runs the blacklist classifier for all languages that can be confused with C<$assumed_lang> (if blacklist models exist for them).
If neither C<langs> nor C<assumed> is specified, it first runs a general-purpose language identification (using Lingua::Identify::CLD and Lingua::Identify) and then checks with the blacklist classifier whether the detected language can be confused w...
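The cascaded scheme mentioned above (best1 = lang1 vs lang2, best2 = best1 vs lang3, ...) can be sketched in a few lines. This is a minimal illustration, not the module's implementation: C<decide_pair>, C<cascade>, the C<%markers> table and the language IDs C<l1>, C<l2>, C<l3> are all hypothetical, and the binary decision here simply counts marker-word hits instead of using real blacklists.

```perl
use strict;
use warnings;

# Toy per-language marker words (invented for illustration only;
# these are NOT the module's blacklists).
my %markers = (
    l1 => { foo => 1, bar  => 1 },
    l2 => { baz => 1 },
    l3 => { qux => 1, quux => 1, bar => 1 },
);

# Hypothetical binary decision: the language whose marker words cover
# more tokens of the text wins; a tie keeps the current best ($lang1).
sub decide_pair {
    my ( $dic, $lang1, $lang2 ) = @_;
    my %hits = ( $lang1 => 0, $lang2 => 0 );
    for my $lang ( $lang1, $lang2 ) {
        for my $word ( keys %$dic ) {
            $hits{$lang} += $dic->{$word} if $markers{$lang}{$word};
        }
    }
    return $hits{$lang2} > $hits{$lang1} ? $lang2 : $lang1;
}

# Cascade: the winner of each binary decision meets the next candidate.
sub cascade {
    my ( $dic, @langs ) = @_;
    my $best = shift @langs;
    $best = decide_pair( $dic, $best, $_ ) for @langs;
    return $best;
}

# Token frequencies of the text to classify.
my %dic = ( qux => 2, quux => 1, foo => 1 );
print cascade( \%dic, qw(l1 l2 l3) ), "\n";
```

With this toy data C<l1> beats C<l2> (one hit against none) and then loses to C<l3> (one hit against three), so the cascade ends on C<l3>. A cascade needs only one binary decision per additional language, but its result can depend on the order in which the languages are listed.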
=cut
sub identify{
  my $text = shift;
  my %options = @_;
}
=head2 Module-internal functions
The following functions are not exported and are mainly used for internal purposes (but may be used from the outside if needed).
 initialize()                     # reset the repository of blacklists
 identify_language($text)         # return lang-ID for $text (using CLD)
 classify(\%dic,%options)         # run the classifier
 classify_cascaded(\%dic,@langs)  # run a cascade of binary classifications
 # run all versus all and return the one that wins most binary decisions
 # (a score margin is used to adjust the reliability of the decisions)
 classify_with_margin(\%dic,$margin,@langs) 
 load_blacklists($dir)            # load all blacklists available in $dir
 load_blacklist(\%list,$dir,      # load a lang-pair specific blacklist
                $lang1,$lang2)  
 read_file($file,\%dic,$max)      # read a file and count token frequencies
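The all-versus-all strategy listed above for C<classify_with_margin> can be illustrated with a small voting sketch. It rests on a loudly stated assumption: a binary decision counts as a win only when the absolute value of its score exceeds the margin. The excerpt does not show how the module actually applies the margin, so this rule, the function C<vote_with_margin>, the C<%pair_score> table and its numbers are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pairwise scores: a positive value favours the first
# language of the pair, a negative value the second (invented numbers).
my %pair_score = (
    'l1-l2' =>  0.9,
    'l1-l3' => -0.1,
    'l2-l3' => -0.6,
);

# All-versus-all voting. ASSUMPTION: a decision only counts as a win
# when its absolute score exceeds $margin; the real module may use
# the margin differently.
sub vote_with_margin {
    my ( $scores, $margin, @langs ) = @_;
    my %wins = map { $_ => 0 } @langs;
    for my $i ( 0 .. $#langs ) {
        for my $j ( $i + 1 .. $#langs ) {
            my ( $x, $y ) = @langs[ $i, $j ];
            my $score = $scores->{"$x-$y"} // 0;
            next if abs($score) <= $margin;    # too close to call
            $wins{ $score > 0 ? $x : $y }++;
        }
    }
    # Most wins first; ties are broken by the order given in @langs.
    my %rank = map { $langs[$_] => $_ } 0 .. $#langs;
    my ($best) = sort { $wins{$b} <=> $wins{$a} || $rank{$a} <=> $rank{$b} } @langs;
    return $best;
}

print vote_with_margin( \%pair_score, 0,   qw(l1 l2 l3) ), "\n";
print vote_with_margin( \%pair_score, 0.2, qw(l1 l2 l3) ), "\n";
```

With a margin of 0 every decision counts and C<l3> wins two of them. With a margin of 0.2 the narrow C<l1> vs C<l3> decision is discarded, C<l1> and C<l3> are left with one win each, and the tie falls to C<l1> because it was listed first.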
    my %options     = @_;
    $options{langs} = '' unless ($options{langs});
    my @langs = ref($options{langs}) eq 'ARRAY' ? 
	@{$options{langs}} : split( /\s+/, $options{langs} ) ;
    @langs = available_languages() unless (@langs);
    return &classify_with_margin( $dic, $options{use_margin}, @langs ) 
	if ($options{use_margin});
    return &classify_cascaded( $dic, @langs );
}
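The handling of the C<langs> option in the code above accepts either an array reference or a whitespace-separated string. The standalone sketch below isolates that step; the helper name C<normalize_langs> is hypothetical, and the language codes are illustrative values only.

```perl
use strict;
use warnings;

# Accept an array reference or a whitespace-separated string and
# return a plain list, mirroring the langs handling shown above.
sub normalize_langs {
    my ($langs) = @_;
    $langs = '' unless $langs;
    return ref($langs) eq 'ARRAY' ? @$langs : split( /\s+/, $langs );
}

my @from_ref    = normalize_langs( [ 'bs', 'hr', 'sr' ] );
my @from_string = normalize_langs('bs hr sr');
my @empty       = normalize_langs(undef);

print "@from_ref\n";
print "@from_string\n";
print scalar(@empty), "\n";
```

Both call styles yield the same three-element list, and a missing option yields an empty list, which is the case where the original code falls back to C<available_languages()>. One caveat of splitting on C</\s+/>: a string with leading whitespace produces an empty first element.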
sub classify_cascaded{
    my $dic = shift;
    my @langs = @_;
    my $lang1 = shift(@langs);
    foreach my $lang2 (@langs){
        # load blacklists on demand
        unless (exists $blacklists{"$lang1-$lang2"}){
            $blacklists{"$lang1-$lang2"}={};
            &load_blacklist($blacklists{"$lang1-$lang2"},