Algorithm-Viterbi
view release on metacpan or search on metacpan
lib/Algorithm/Viterbi.pm view on Meta::CPAN
L<http://en.wikipedia.org/wiki/Viterbi_algorithm>.
I think the page is well-written and I see no need to repeat the theory
here. Reading it may clarify the documentation below.
=cut
use strict;
use warnings;
=head1 METHODS
=over 8
=item new
Creates a new C<Algorithm::Viterbi> object.
The following attributes can be set with the constructor:
my $v = Algorthm::Viterbi->new(
start_state => '$',
unknown_emission_prob => undef,
unknown_transition_prob => 0);
The values of the attributes in the example are the default values.
For a detailed description and use of these attributes, see below.
=cut
sub new
{
my $class = shift;
my $self = {@_};
bless $self, $class;
$self->{unknown_transition_prob} = 0 if (!defined($self->{unknown_transition_prob}));
$self->{start_state} = '$' if (!defined($self->{start_state}));
return $self;
}
=item train
This method computes the start, emission and transition probabilities
from a set of observations and their associated states.
The probabilities are simple averages of the passed observations,
so if you require sophisticated smoothing on the emission, start and/or
transition, then you're better off rolling your own.
The value of member start_state is a bogus state used to define the begin state of the first transition.
By default, this state is set to '$'. You can change this by setting the variable in the constructor
or later by accessing the member directly. See example below.
This state can also be used as a separator between the beginning and end of a sequence of observations.
For example, you could assign this state (tag) to every end-of-sentence symbol when training on a
pre-tagged corpus.
The set of observations is passed as a reference to an array as shown in the following example:
use strict;
use Algorithm::Viterbi;
use Data::Dumper;
my $observations = [
[ 'work', 'rainy' ],
[ 'work', 'sunny' ],
[ 'walk', 'sunny' ],
[ 'walk', 'rainy' ],
[ 'shop', 'rainy' ],
[ 'work', 'rainy' ],
];
my $v = Algorithm::Viterbi->new(start_state => '###');
$v->train($observations);
print Dumper($v);
will produce:
$VAR1 = bless( {
'transition' => {
'sunny' => {
'sunny' => '0.5',
'rainy' => '0.25'
},
'rainy' => {
'sunny' => '0.5',
'rainy' => '0.5'
},
'###' => {
'rainy' => '0.25'
}
},
'emission' => {
'shop' => {
'rainy' => '0.25'
},
'walk' => {
'sunny' => '0.5',
'rainy' => '0.25'
},
'work' => {
'sunny' => '0.5',
'rainy' => '0.5'
}
},
'start_state' => '###',
'states' => [
'sunny',
'rainy'
],
'unknown_transition_prob' => 0,
'start' => {
'sunny' => '0.333333333333333',
'rainy' => '0.666666666666667'
}
}, 'Algorithm::Viterbi' );
=cut
sub train
{
lib/Algorithm/Viterbi.pm view on Meta::CPAN
};
my $v = Algorithm::Viterbi->new();
$v->emission($emission_probability);
The keys of the emission hash constitute the dictionary, which is used to determine
whether an observation is a known or an unknown observation.
Returns the emission probabilities.
=cut
sub emission
{
my $self = shift;
($self->{emission}) = @_ if (@_);
return $self->{emission};
}
=item transition
Initializes the transition probabilities.
The transition is passed as a reference to a hash, as shown in
this example:
my $transition_probability = {
'Rainy' => {'Rainy'=> 0.7, 'Sunny'=> 0.3},
'Sunny' => {'Rainy'=> 0.4, 'Sunny'=> 0.6},
};
my $v = Algorithm::Viterbi->new();
$v->transition($transition_probability);
The transition hash can be 'sparse': it is sufficient to include only known
transitions between states. See method get_transition.
Returns the transition probabilities.
=cut
sub transition
{
my $self = shift;
($self->{transition}) = @_ if (@_);
return $self->{transition};
}
=item forward_viterbi
This method calculates the forward probability, the Viterbi path
and the Viterbi probability of a given sequence of observations.
For a detailed description of the Algorithm, see the Wikipedia page
L<http://en.wikipedia.org/wiki/Viterbi_algorithm>.
The difference with the algorithm described in the web page above,
is that the emission and the transition are calculated somewhat
differently. See methods get_emission and get_transition.
Example:
use strict;
use Algorithm::Viterbi;
use Data::Dumper;
my $observations = [ 'walk', 'shop', 'clean' ];
my $start = { 'Rainy'=> 0.6, 'Sunny'=> 0.4 };
my $transition = {
'Rainy' => {'Rainy'=> 0.7, 'Sunny'=> 0.3},
'Sunny' => {'Rainy'=> 0.4, 'Sunny'=> 0.6},
};
my $emission = {
'shop' => {
'Sunny' => '0.3',
'Rainy' => '0.4',
},
'walk' => {
'Sunny' => '0.6',
'Rainy' => '0.1'
},
'clean' => {
'Sunny' => '0.1',
'Rainy' => '0.5'
}
};
my $v = Algorithm::Viterbi->new();
$v->emission($emission);
$v->transition($transition);
$v->start($start);
print Dumper ($v->forward_viterbi($observations));
produces:
$VAR1 = '0.033612';
$VAR2 = [
'Sunny',
'Rainy',
'Rainy',
'Rainy'
];
$VAR3 = '0.009408';
=cut
sub forward_viterbi
{
my ($self, $observation) = @_;
my $T = { };
foreach my $state (@{$self->{states}}) {
## prob. V. path V. prob.
$T->{$state} = [ $self->{start}{$state}, [$state], $self->{start}{$state} ];
}
foreach my $output (@$observation) {
my $U = { };
foreach my $next_state (@{$self->{states}}) {
my $total = 0;
my $argmax = [ ];
( run in 0.621 second using v1.01-cache-2.11-cpan-39bf76dae61 )