Data::Dumper results from the CPAN

Algorithm-Viterbi

L<http://en.wikipedia.org/wiki/Viterbi_algorithm>.
I think the page is well-written and I see no need to repeat the theory 
here. Reading it may clarify the documentation below.

=cut

use strict;
use warnings;

=head1 METHODS

=over 8

=item new

Creates a new C<Algorithm::Viterbi> object. 
The following attributes can be set with the constructor:

  my $v = Algorthm::Viterbi->new(
    start_state => '$',
    unknown_emission_prob => undef,
    unknown_transition_prob => 0);

The values of the attributes in the example are the default values.
For a detailed description and use of these attributes, see below.

=cut

sub new
{
  my $class = shift; 
  my $self = {@_};
  bless $self, $class;

  $self->{unknown_transition_prob} = 0 if (!defined($self->{unknown_transition_prob}));
  $self->{start_state} = '$' if (!defined($self->{start_state}));

  return $self;
}

=item train

This method computes the start, emission and transition probabilities 
from a set of observations and their associated states.
The probabilities are simple averages of the passed observations,
so if you require sophisticated smoothing on the emission, start and/or
transition, then you're better off rolling your own.

The value of member start_state is a bogus state used to define the begin state of the first transition.
By default, this state is set to '$'. You can change this by setting the variable in the constructor
or later by accessing the member directly. See example below.

This state can also be used as a separator between the beginning and end of a sequence of observations. 
For example, you could assign this state (tag) to every end-of-sentence symbol when training on a 
pre-tagged corpus.

The set of observations is passed as a reference to an array as shown in the following example:

  use strict;
  use Algorithm::Viterbi;
  use Data::Dumper;

  my $observations = [
    [ 'work', 'rainy' ],
    [ 'work', 'sunny' ],
    [ 'walk', 'sunny' ],
    [ 'walk', 'rainy' ],
    [ 'shop', 'rainy' ],
    [ 'work', 'rainy' ],
  ];

  my $v = Algorithm::Viterbi->new(start_state => '###');
  $v->train($observations);

  print Dumper($v);

will produce:

  $VAR1 = bless( {
                 'transition' => {
                                   'sunny' => {
                                                'sunny' => '0.5',
                                                'rainy' => '0.25'
                                              },
                                   'rainy' => {
                                                'sunny' => '0.5',
                                                'rainy' => '0.5'
                                              },
                                   '###' => {
                                              'rainy' => '0.25'
                                            }
                                 },
                 'emission' => {
                                 'shop' => {
                                             'rainy' => '0.25'
                                           },
                                 'walk' => {
                                             'sunny' => '0.5',
                                             'rainy' => '0.25'
                                           },
                                 'work' => {
                                             'sunny' => '0.5',
                                             'rainy' => '0.5'
                                           }
                               },
                 'start_state' => '###',
                 'states' => [
                               'sunny',
                               'rainy'
                             ],
                 'unknown_transition_prob' => 0,
                 'start' => {
                              'sunny' => '0.333333333333333',
                              'rainy' => '0.666666666666667'
                            }
               }, 'Algorithm::Viterbi' );

=cut

sub train
{

lib/Algorithm/Viterbi.pm view on Meta::CPAN

        };
  my $v = Algorithm::Viterbi->new();
  $v->emission($emission_probability);

The keys of the emission hash constitute the dictionary, which is used to determine 
whether an observation is a known or an unknown observation.

Returns the emission probabilities.

=cut

sub emission
{
  my $self = shift;
  ($self->{emission}) = @_ if (@_);
  return $self->{emission};
}

=item transition

Initializes the transition probabilities. 
The transition is passed as a reference to a hash, as shown in
this example:

  my $transition_probability = {
   'Rainy' => {'Rainy'=> 0.7, 'Sunny'=> 0.3},
   'Sunny' => {'Rainy'=> 0.4, 'Sunny'=> 0.6},
  };
  my $v = Algorithm::Viterbi->new();
  $v->transition($transition_probability);

The transition hash can be 'sparse': it is sufficient to include only known 
transitions between states. See method get_transition.

Returns the transition probabilities.

=cut

sub transition
{
  my $self = shift;
  ($self->{transition}) = @_ if (@_);
  return $self->{transition};
}

=item forward_viterbi

This method calculates the forward probability, the Viterbi path 
and the Viterbi probability of a given sequence of observations.
For a detailed description of the Algorithm, see the Wikipedia page
L<http://en.wikipedia.org/wiki/Viterbi_algorithm>.

The difference with the algorithm described in the web page above, 
is that the emission and the transition are calculated somewhat
differently. See methods get_emission and get_transition.

Example:

  use strict;
  use Algorithm::Viterbi;
  use Data::Dumper;

    
  my $observations = [ 'walk', 'shop', 'clean' ];
   my $start = { 'Rainy'=> 0.6, 'Sunny'=> 0.4 };
   my $transition = {
      'Rainy' => {'Rainy'=> 0.7, 'Sunny'=> 0.3},
      'Sunny' => {'Rainy'=> 0.4, 'Sunny'=> 0.6},
      };

  my $emission = {
    'shop' => {
      'Sunny' => '0.3',
      'Rainy' => '0.4',
    },

    'walk' => {
      'Sunny' => '0.6',
      'Rainy' => '0.1'
    },
    'clean' => {
      'Sunny' => '0.1',
      'Rainy' => '0.5'
      }
  };

  my $v = Algorithm::Viterbi->new();
  $v->emission($emission);
  $v->transition($transition);
  $v->start($start);

  print Dumper ($v->forward_viterbi($observations));

produces:

  $VAR1 = '0.033612';
  $VAR2 = [
	    'Sunny',
	    'Rainy',
	    'Rainy',
	    'Rainy'
	  ];
  $VAR3 = '0.009408';

=cut

sub forward_viterbi
{
  my ($self, $observation) = @_;

  my $T = { };
  foreach my $state (@{$self->{states}}) {
    ##               prob.                   V. path   V. prob.
    $T->{$state} = [ $self->{start}{$state}, [$state], $self->{start}{$state} ]; 
  }

  foreach my $output (@$observation) {
    my $U = { };
    foreach my $next_state (@{$self->{states}}) {
      my $total = 0;
      my $argmax = [ ];

( run in 0.621 second using v1.01-cache-2.11-cpan-39bf76dae61 )