WordNet-Similarity
lib/WordNet/Similarity/path.pm
# Initialize traces.
$self->{traceString} = "" if($self->{'trace'});
# JM 1-21-04
# moved input validation code to WordNet::Similarity::parseWps()
my $ret = $self->parseWps ($wps1, $wps2);
ref $ret or return $ret;
my ($word1, $pos1, $sense1, $offset1, $word2, $pos2, $sense2, $offset2)
  = @{$ret};
$wps1 = "$word1#$pos1#$sense1";
$wps2 = "$word2#$pos2#$sense2";
# Now check if the similarity value for these two synsets is in
# fact in the cache... if so return the cached value.
my $relatedness =
  $self->{doCache} ? $self->fetchFromCache ($wps1, $wps2) : undef;
defined $relatedness and return $relatedness;
# JM 1/23/04
# Most of the code that does the work of finding the LCS and
# hypernym trees has been moved into LCSFinder and PathFinder
#my ($dist, $pathref) = $self->getShortestPath ($offset1, $offset2,
# $pos1, 'offset');
# there can be multiple shortest paths (i.e., paths of the same length)
my @paths = $self->getShortestPath ($offset1, $offset2, $pos1, 'offset');
my $path = shift @paths;
# $path will be undef if no path was found (error messages already generated)
unless (defined $path) {
  return $self->UNRELATED;
}
my $dist = $path->[0];
# JM 1-29-04
# most of the code that does path-finding is now in PathFinder
if ($dist > 0) {
  my $score = 1.0 / $dist;
  $self->{doCache} and $self->storeToCache ($wps1, $wps2, $score);
  return $score;
}
else {
  $self->{errorString} .= "\nWarning (${class}::getRelatedness()) - ";
  $self->{errorString} .= "Internal error while finding relatedness.";
  $self->{error} = ($self->{error} < 1) ? 1 : $self->{error};
  return undef;
}
}
# JM
# Much of the code that was shared between measures has been relocated.
# WordNet::Similarity contains code common to (almost) all measures,
# WordNet::Similarity::PathFinder has code common among path finding
# measures.
1;
__END__
=back
=head2 Discussion
If the two synsets being compared are the same, then the resulting relatedness
score will be 1. For example, the score for car#n#1 and car#n#1 is 1.
Due to multiple inheritance in the WordNet taxonomies, it is possible for
there to be a tie for the shortest path between synsets. If such a tie
occurs, then all of the paths that are tied will be printed to the
trace string.
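The tie case can be illustrated with a small sketch. Everything in it is an
assumption made for illustration: the taxonomy is a hypothetical toy (not real
WordNet data), the helper names are invented, and the search is a brute-force
join of hypernym chains at a common subsumer rather than the PathFinder code.
Path length is counted in nodes, which is what a score of 1 for identical
synsets implies.

```perl
use strict;
use warnings;

# Hypothetical toy taxonomy (NOT real WordNet data): node => its hypernyms.
# 'knife' and 'dagger' each have two hypernyms (multiple inheritance).
my %hypernyms = (
    entity => [],
    tool   => ['entity'],
    weapon => ['entity'],
    knife  => ['tool', 'weapon'],
    dagger => ['tool', 'weapon'],
    hammer => ['tool'],
);

# Return every upward chain that starts at $node:
# [knife], [knife, tool], [knife, tool, entity], ...
sub upward_chains {
    my ($node) = @_;
    my @chains = ([$node]);
    for my $parent (@{ $hypernyms{$node} }) {
        push @chains, map { [$node, @$_] } upward_chains($parent);
    }
    return @chains;
}

# Return all tied shortest paths between two nodes, joined at a common
# subsumer. Length is counted in nodes, so a node paired with itself
# yields the single path [$node] of length 1.
sub shortest_paths {
    my ($from, $to) = @_;
    my ($best, %seen, @paths);
    for my $up (upward_chains($from)) {
        for my $down (upward_chains($to)) {
            next unless $up->[-1] eq $down->[-1];    # chains must meet
            my @rev  = reverse @$down;
            my @path = (@$up, @rev[1 .. $#rev]);     # drop the shared node once
            my $len  = @path;
            next if defined $best && $len > $best;
            if (!defined $best || $len < $best) {    # strictly shorter: restart
                $best  = $len;
                @paths = ();
                %seen  = ();
            }
            push @paths, \@path unless $seen{ join ' ', @path }++;
        }
    }
    return @paths;
}

for my $pair (['knife', 'knife'], ['knife', 'hammer'], ['knife', 'dagger']) {
    my @paths = shortest_paths(@$pair);
    my $len   = @{ $paths[0] };
    printf "%s - %s: length %d, score %.3f, %d shortest path(s)\n",
        @$pair, $len, 1 / $len, scalar @paths;
    print '    ', join(' -> ', @$_), "\n" for @paths;
}
```

In this toy data the knife/dagger pair has two paths of equal length, one
through each shared hypernym. The score is the same whichever path is used,
so a tie affects only what appears in the trace.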
The relatedness value returned by C<getRelatedness()> is the multiplicative
inverse of the path length between the two synsets (1/path_length). This
has a slightly subtle effect: it shifts the relative magnitude of scores.
For example, if we have the following pairs of synsets with the given
path lengths:
  synset1 synset2: 3
  synset3 synset4: 4
  synset5 synset6: 5
We observe that the difference in path length between synset1-synset2 and
synset3-synset4 is the same as that between synset3-synset4 and
synset5-synset6 (one in each case). When we take the multiplicative inverse
of each path length, we get:
  synset1 synset2: .333
  synset3 synset4: .25
  synset5 synset6: .2
Now the difference between the scores for synset3-synset4 and synset5-synset6
(.05) is less than the difference between the scores for synset1-synset2 and
synset3-synset4 (.083). This can have negative consequences when computing
correlation coefficients. It might be useful
to compute relatedness as S<max_distance - path_length>, where max_distance
is the longest possible shortest path between two synsets. The original path
length can be easily determined by taking the multiplicative inverse
of the returned relatedness score: S<1/score = 1/(1/path_length) = path_length>.
The computation of max_distance is left as an exercise for the reader.
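The arithmetic in this discussion can be checked with a short sketch. The
function names are invented for illustration, and C<$max_distance> is set to
an arbitrary placeholder value, since the real maximum depends on the taxonomy
and is not given here.

```perl
use strict;
use warnings;

# Path lengths taken from the example in the discussion.
my @pairs = (
    ['synset1 synset2', 3],
    ['synset3 synset4', 4],
    ['synset5 synset6', 5],
);

# Hypothetical placeholder: the true value must be computed for the taxonomy.
my $max_distance = 10;

# Score as returned by the path measure: 1/path_length.
sub inverse_score {
    my ($path_length) = @_;
    return 1 / $path_length;
}

# Recover the path length from a score. Path lengths are whole numbers,
# so rounding absorbs any floating-point error in 1/score.
sub path_length_from_score {
    my ($score) = @_;
    return int(1 / $score + 0.5);
}

# Alternative suggested in the text: max_distance - path_length.
sub linear_score {
    my ($path_length, $max) = @_;
    return $max - $path_length;
}

for my $pair (@pairs) {
    my ($name, $len) = @$pair;
    my $score = inverse_score($len);
    printf "%s: 1/len = %.3f, recovered len = %d, max - len = %d\n",
        $name, $score, path_length_from_score($score),
        linear_score($len, $max_distance);
}

# Gaps between consecutive pairs: unequal for 1/len, equal for max - len.
for my $i (0 .. $#pairs - 1) {
    my ($first, $second) = ($pairs[$i][1], $pairs[$i + 1][1]);
    printf "gap %d: inverse %.3f, linear %d\n", $i + 1,
        inverse_score($first) - inverse_score($second),
        linear_score($first, $max_distance) - linear_score($second, $max_distance);
}
```

The choice of C<$max_distance> shifts every linear score by the same amount,
so it does not change the gaps between them.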
If two different word senses (wps strings) are given as input to getRelatedness,
but both word senses belong to the same synset, then 1 is returned (e.g.,
car#n#1 and auto#n#1 both belong to the same synset).
=head2 Usage
The semantic relatedness modules in this distribution are built as classes
that define the following methods:
  new()
  getRelatedness()
  getError()
  getTraceString()
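A minimal sketch of how these methods might be called together follows. It
assumes that WordNet::QueryData and a WordNet database are installed, that
C<new()> takes a WordNet::QueryData object, and that C<getError()> returns an
error level followed by a message. The C<format_result> helper is hypothetical
and not part of this distribution. The modules are loaded at run time so that
the script still runs, and says so, when they are missing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the distribution): turn the results of
# getRelatedness() and getError() into a one-line report. A nonzero error
# level is treated as a problem (assumption: warnings and errors alike).
sub format_result {
    my ($wps1, $wps2, $value, $error, $error_string) = @_;
    return "error: $error_string" if $error;
    return "no relatedness value for $wps1 and $wps2" unless defined $value;
    return sprintf '%s %s: %.4f', $wps1, $wps2, $value;
}

# Load the modules at run time so a missing installation is not fatal.
my $ready = eval {
    require WordNet::QueryData;
    require WordNet::Similarity::path;
    1;
};

if ($ready) {
    # Construction can fail if the WordNet database cannot be located.
    my $wn      = eval { WordNet::QueryData->new };
    my $measure = $wn ? WordNet::Similarity::path->new($wn) : undef;
    if ($measure) {
        # Two senses from the same synset, as in the discussion.
        my ($wps1, $wps2) = ('car#n#1', 'auto#n#1');
        my $value = $measure->getRelatedness($wps1, $wps2);
        my ($error, $error_string) = $measure->getError();
        print format_result($wps1, $wps2, $value, $error, $error_string), "\n";
        # getTraceString() is only informative when tracing is enabled.
    }
    else {
        print "Could not set up the measure (is the WordNet database installed?)\n";
    }
}
else {
    print "WordNet::Similarity is not installed; skipping the live query\n";
}
```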