Algorithm-Viterbi
lib/Algorithm/Viterbi.pm
  my ($prob, $v_path, $v_prob) = $v->forward_viterbi($observations);

  -- or --
   
  my $training_data = [
    [ 'walk', 'Sunny' ],
    [ 'walk', 'Sunny' ],
    [ 'walk', 'Rainy' ],
    [ 'shop', 'Rainy' ],
    [ 'clean', 'Rainy' ],
    [ 'clean', 'Rainy' ],
    ...
  ];

  $v->train($training_data);
  my ($prob, $v_path, $v_prob) = $v->forward_viterbi($observations);

=head1 DESCRIPTION

Algorithm::Viterbi computes the forward probability, the Viterbi path
and the Viterbi probability of a sequence of observations, based on
given start, emission and transition probabilities.
Alternatively, the start, emission and transition probabilities can be
computed from a set of training data.

This module was inspired by the Wikipedia article on the Viterbi
algorithm. Rather than copying its text, I'm just including the link
to the page:
L<http://en.wikipedia.org/wiki/Viterbi_algorithm>.
I think the page is well written and I see no need to repeat the theory
here. Reading it may clarify the documentation below.
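
As a rough illustration of the recurrence described on that page, here
is a self-contained sketch of the Viterbi path computation. It does not
use this module and is not its implementation: the function name
C<viterbi_sketch> and all probabilities are assumptions chosen for the
example.

  use strict;
  use warnings;

  # Illustrative probabilities (assumed for this sketch, not module defaults).
  my %start      = (Rainy => 0.6, Sunny => 0.4);
  my %transition = (              # $transition{from}{to}
    Rainy => { Rainy => 0.7, Sunny => 0.3 },
    Sunny => { Rainy => 0.4, Sunny => 0.6 },
  );
  my %emission   = (              # $emission{observation}{state}
    walk  => { Rainy => 0.1, Sunny => 0.6 },
    shop  => { Rainy => 0.4, Sunny => 0.3 },
    clean => { Rainy => 0.5, Sunny => 0.1 },
  );

  sub viterbi_sketch {
    my @obs    = @_;
    my @states = sort keys %start;

    # $best{state} = [ probability of the best path ending in state, that path ]
    my %best;
    foreach my $s (@states) {
      $best{$s} = [ $start{$s} * $emission{ $obs[0] }{$s}, [$s] ];
    }

    foreach my $o (@obs[ 1 .. $#obs ]) {
      my %next;
      foreach my $to (@states) {
        # pick the predecessor that maximises the path probability
        my ($max_p, $max_path) = (-1, undef);
        foreach my $from (@states) {
          my $p = $best{$from}[0] * $transition{$from}{$to};
          ($max_p, $max_path) = ($p, $best{$from}[1]) if $p > $max_p;
        }
        $next{$to} = [ $max_p * $emission{$o}{$to}, [ @$max_path, $to ] ];
      }
      %best = %next;
    }

    # the most probable final state decides the overall path
    my ($winner) = sort { $best{$b}[0] <=> $best{$a}[0] } @states;
    return ($best{$winner}[0], $best{$winner}[1]);
  }

  my ($prob, $path) = viterbi_sketch(qw(walk shop clean));
  printf "%s (%.5f)\n", join(' ', @$path), $prob;   # Sunny Rainy Rainy (0.01344)

The sketch keeps whole paths per state instead of back-pointers, which
is wasteful for long sequences but keeps the example short.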

=cut

use strict;
use warnings;

=head1 METHODS

=over 8

=item new

Creates a new C<Algorithm::Viterbi> object. 
The following attributes can be set with the constructor:

  my $v = Algorithm::Viterbi->new(
    start_state => '$',
    unknown_emission_prob => undef,
    unknown_transition_prob => 0);

The values of the attributes in the example are the default values.
For a detailed description and use of these attributes, see below.
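
As a small illustration, assuming the module is installed: the
constructor stores its arguments as plain hash entries on the object,
so the attributes can be read back directly. The value 0.001 is an
arbitrary number picked for the example.

  use strict;
  use warnings;
  use Algorithm::Viterbi;

  my $v = Algorithm::Viterbi->new(unknown_emission_prob => 0.001);

  print $v->{start_state}, "\n";              # $      (default)
  print $v->{unknown_transition_prob}, "\n";  # 0      (default)
  print $v->{unknown_emission_prob}, "\n";    # 0.001  (as passed)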

=cut

sub new
{
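  my $class = shift;
  my $self = {@_};   # constructor arguments become the object's attributes
  bless $self, $class;

  # fall back to the documented defaults for attributes that were not passed
  $self->{unknown_transition_prob} = 0 if (!defined($self->{unknown_transition_prob}));
  $self->{start_state} = '$' if (!defined($self->{start_state}));

  return $self;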
}

=item train

This method computes the start, emission and transition probabilities 
from a set of observations and their associated states.
The probabilities are simple ratios of counts taken from the passed
observations, so if you require sophisticated smoothing of the start,
emission and/or transition probabilities, you're better off rolling
your own.

The C<start_state> attribute is a dummy state that serves as the source
state of the first transition. By default it is set to C<'$'>. You can
change it in the constructor, or later by setting the member directly.
See the example below.

This state can also serve as a separator between consecutive sequences
of observations. For example, you could assign this state (tag) to
every end-of-sentence symbol when training on a pre-tagged corpus.
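
A minimal sketch of the separator idea, assuming the module is
installed; the words and tags are made up for illustration. Each
sentence-final token is tagged with the start state, so the first word
of the following sentence is counted as a transition out of C<###>:

  use strict;
  use warnings;
  use Algorithm::Viterbi;

  my $v = Algorithm::Viterbi->new(start_state => '###');
  # equivalent, after construction: $v->{start_state} = '###';

  $v->train([
    [ 'dogs', 'NOUN' ], [ 'bark',  'VERB' ], [ '.', '###' ],
    [ 'cats', 'NOUN' ], [ 'sleep', 'VERB' ], [ '.', '###' ],
  ]);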

The set of observations is passed as a reference to an array as shown in the following example:

  use strict;
  use Algorithm::Viterbi;
  use Data::Dumper;

  my $observations = [
    [ 'work', 'rainy' ],
    [ 'work', 'sunny' ],
    [ 'walk', 'sunny' ],
    [ 'walk', 'rainy' ],
    [ 'shop', 'rainy' ],
    [ 'work', 'rainy' ],
  ];

  my $v = Algorithm::Viterbi->new(start_state => '###');
  $v->train($observations);

  print Dumper($v);

will produce:

  $VAR1 = bless( {
                 'transition' => {
                                   'sunny' => {
                                                'sunny' => '0.5',
                                                'rainy' => '0.25'
                                              },
                                   'rainy' => {
                                                'sunny' => '0.5',
                                                'rainy' => '0.5'
                                              },
                                   '###' => {
                                              'rainy' => '0.25'
                                            }
                                 },
                 'emission' => {
                                 'shop' => {
                                             'rainy' => '0.25'
                                           },
                                 'walk' => {
                                             'sunny' => '0.5',
                                             'rainy' => '0.25'
                                           },
                                 'work' => {
                                             'sunny' => '0.5',
                                             'rainy' => '0.5'
                                           }
                               },
                 'start_state' => '###',
                 'states' => [
                               'sunny',
                               'rainy'
                             ],
                 'unknown_transition_prob' => 0,
                 'start' => {
                              'sunny' => '0.333333333333333',
                              'rainy' => '0.666666666666667'
                            }
               }, 'Algorithm::Viterbi' );
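
The C<start> and C<emission> figures in this dump can be checked by
hand. The following self-contained sketch does not use the module, and
its variable names are invented for illustration. It recomputes those
figures under an assumption that is consistent with the numbers shown:
a start probability is a state's share of all training pairs, and an
emission probability is the observation count divided by the count of
its state.

  use strict;
  use warnings;

  my $observations = [
    [ 'work', 'rainy' ], [ 'work', 'sunny' ], [ 'walk', 'sunny' ],
    [ 'walk', 'rainy' ], [ 'shop', 'rainy' ], [ 'work', 'rainy' ],
  ];

  my (%state_count, %pair_count);
  foreach my $pair (@$observations) {
    my ($obs, $state) = @$pair;
    $state_count{$state}++;        # how often each state occurs
    $pair_count{$obs}{$state}++;   # how often each observation occurs per state
  }

  # start probability: share of all training pairs that carry the state
  my $total = @$observations;
  my %start;
  $start{$_} = $state_count{$_} / $total foreach keys %state_count;

  # emission probability: observation count divided by state count
  my %emission;
  foreach my $obs (keys %pair_count) {
    foreach my $state (keys %{ $pair_count{$obs} }) {
      $emission{$obs}{$state} = $pair_count{$obs}{$state} / $state_count{$state};
    }
  }

  printf "start(rainy)         = %.3f\n", $start{rainy};           # 0.667
  printf "emission(work|rainy) = %.2f\n", $emission{work}{rainy};  # 0.50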

=cut

sub train
{
  my ($self, $training_data) = @_;

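  my $ep = {};   # emission counts:   $ep->{observation}{state}
  my $tp = {};   # transition counts: $tp->{previous state}{state}
  my $sp = {};   # state counts:      $sp->{state}

  my $pt = $self->{start_state};   # previous state, initially the start state

  foreach my $o(@$training_data){
    my ($a, $t) = @$o;   # observation and its associated state
    $ep->{$a}{$t}++;
    $tp->{$pt}{$t}++;
    $pt = $t;
    $sp->{$t}++;
  }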

  #emission
  foreach my $a(keys %$ep){