AI-Embedding
lib/AI/Embedding.pm view on Meta::CPAN
}
# Return Embedding as a CSV string
sub embedding {
my ($self, $text, $verbose) = @_;
my $response = $self->_get_embedding($text);
if ($response->{'success'}) {
my $embedding = decode_json($response->{'content'});
return join (',', @{$embedding->{'data'}[0]->{'embedding'}});
}
$self->{'error'} = 'HTTP Error - ' . $response->{'reason'};
return $response if defined $verbose;
return undef;
}
# Return Embedding as an array
sub raw_embedding {
my ($self, $text, $verbose) = @_;
my $response = $self->_get_embedding($text);
if ($response->{'success'}) {
my $embedding = decode_json($response->{'content'});
return @{$embedding->{'data'}[0]->{'embedding'}};
}
$self->{'error'} = 'HTTP Error - ' . $response->{'reason'};
return $response if defined $verbose;
return; # undef in scalar context, empty list in list context
}
# Return Test Embedding
sub test_embedding {
my ($self, $text, $dimension) = @_;
$self->{'error'} = '';
$dimension = 1536 unless defined $dimension;
if ($text) {
# Seed the generator with the word count so the same text gives a repeatable vector
srand scalar split /\s+/, $text;
}
my @vector;
for (1 .. $dimension) {
push @vector, rand(2) - 1;
}
return join ',', @vector;
}
# Convert a CSV Embedding into a hashref
sub _make_vector {
my ($self, $embed_string) = @_;
if (!defined $embed_string) {
$self->{'error'} = 'Nothing to compare!';
return;
}
my %vector;
my @embed = split /,/, $embed_string;
for (my $i = 0; $i < @embed; $i++) {
$vector{'feature' . $i} = $embed[$i];
}
return \%vector;
}
# Return a comparator to compare to a set vector
sub comparator {
my($self, $embed) = @_;
$self->{'error'} = '';
my $vector1 = $self->_make_vector($embed);
return sub {
my($embed2) = @_;
my $vector2 = $self->_make_vector($embed2);
return $self->_compare_vector($vector1, $vector2);
};
}
# Compare 2 Embeddings
sub compare {
my ($self, $embed1, $embed2) = @_;
my $vector1 = $self->_make_vector($embed1);
my $vector2;
if (defined $embed2) {
$vector2 = $self->_make_vector($embed2);
} else {
$vector2 = $self->{'comparator'};
}
if (!defined $vector2) {
$self->{'error'} = 'Nothing to compare!';
return;
}
if (scalar keys %$vector1 != scalar keys %$vector2) {
$self->{'error'} = 'Embeds are unequal length';
return;
}
return $self->_compare_vector($vector1, $vector2);
}
# Compare 2 Vectors
sub _compare_vector {
my ($self, $vector1, $vector2) = @_;
my $cs = Data::CosineSimilarity->new;
$cs->add( label1 => $vector1 );
$cs->add( label2 => $vector2 );
return $cs->similarity('label1', 'label2')->cosine;
}
1;
__END__
=encoding utf8
=head1 NAME
AI::Embedding - Perl module for working with text embeddings using various APIs
=head2 raw_embedding
my @raw_embedding = $embedding->raw_embedding('Some text passage', [$verbose]);
Generates an embedding for the given text and returns it as an array. The C<raw_embedding> method takes a single parameter, the text to generate the embedding for.
It is not normally necessary to use this method as the Embedding will almost always be used as a single homogeneous unit.
If the method call fails, it sets the L</"error"> message and returns C<undef>. If the optional C<verbose> parameter is supplied, the complete L<HTTP::Tiny> response hashref is returned instead, to help debug issues when using this module.
=head2 test_embedding
my $test_embedding = $embedding->test_embedding('Some text passage', $dimensions);
Used for testing code without making a chargeable call to the API.
Provides a CSV string of the same size and format as L</embedding> but with meaningless random data.
Returns a random embedding. Both parameters are optional. If a text string is provided, the returned embedding will always be the same random embedding; otherwise it will be random and different every time. The C<dimension> parameter controls the n...
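The repeatable behaviour described above can be sketched in plain Perl. This is a minimal illustration, not the module's API: C<fake_embedding> is a hypothetical stand-alone function, and it assumes the seed is taken from the word count of the text, as the module source does.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the idea (not part of AI::Embedding):
# seed the random number generator from the text so that the same text
# yields the same vector.
sub fake_embedding {
    my ($text, $dimension) = @_;
    $dimension = 1536 unless defined $dimension;

    if (defined $text && length $text) {
        my @words = split /\s+/, $text;
        srand(scalar @words);    # assumption: the seed is the word count
    }

    # Each component is drawn uniformly from [-1, 1).
    return join ',', map { rand(2) - 1 } 1 .. $dimension;
}

my $first  = fake_embedding('Some text passage', 8);
my $second = fake_embedding('Some text passage', 8);
print $first eq $second ? "repeatable\n" : "different\n";
```

Because the seed in this sketch is only the word count, two different texts with the same number of words produce the same vector. That is fine for exercising storage and comparison code, but the values carry no meaning.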
=head2 comparator
$embedding->comparator($csv_embedding2);
Sets a vector as a C<comparator> for future comparisons and returns a code reference that performs those comparisons.
The B<comparator> method takes a single parameter, the comma-separated Embedding string to use as the comparator.
The following two are functionally equivalent. However, where multiple Embeddings are to be compared to a single Embedding, using a C<comparator> is significantly faster.
my $similarity = $embedding->compare($csv_embedding1, $csv_embedding2);
my $cmp = $embedding->comparator($csv_embedding2);
my $similarity = $cmp->($csv_embedding1);
See L</"Comparator">
The returned code reference returns the cosine similarity between the Embedding used to call the C<comparator> method and the Embedding supplied to the code reference. See L</compare> for an explanation of the cosine similarity.
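The closure pattern behind a comparator can be sketched as follows. This is an illustration under stated assumptions rather than the module's implementation: C<make_comparator> is a hypothetical name, the cosine similarity is computed directly instead of through L<Data::CosineSimilarity>, and caching the norm of the fixed vector is an extra step added for this sketch.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical stand-in for the comparator pattern (not part of
# AI::Embedding): parse the fixed embedding and compute its norm once,
# then reuse both inside the returned closure.
sub make_comparator {
    my ($csv_fixed) = @_;
    my @fixed      = split /,/, $csv_fixed;
    my $fixed_norm = sqrt(sum0(map { $_ ** 2 } @fixed));

    return sub {
        my ($csv_other) = @_;
        my @other = split /,/, $csv_other;
        return unless @other == @fixed;              # lengths must match

        my $other_norm = sqrt(sum0(map { $_ ** 2 } @other));
        return unless $fixed_norm && $other_norm;    # avoid dividing by zero

        my $dot = sum0(map { $fixed[$_] * $other[$_] } 0 .. $#fixed);
        return $dot / ($fixed_norm * $other_norm);
    };
}

# Compare several made-up embeddings against one fixed embedding.
my $cmp    = make_comparator('1,0,1');
my @scores = map { $cmp->($_) } ('1,0,1', '0,1,0', '-1,0,-1');
printf "%.2f\n", $_ for @scores;
```

The fixed embedding is split into numbers only once, however many times the closure is called. That is the kind of saving that makes a comparator worthwhile when many embeddings are compared against one.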
=head2 compare
my $similarity_with_other_embedding = $embedding->compare($csv_embedding1, $csv_embedding2);
Compares two embeddings and returns the cosine similarity between them. The B<compare> method takes two parameters: $csv_embedding1 and $csv_embedding2 (both comma-separated embedding strings).
Returns the cosine similarity as a floating-point number between -1 and 1, where 1 represents identical embeddings, 0 represents no similarity, and -1 represents opposite embeddings.
The absolute number is not usually relevant for text comparison. It is usually sufficient to rank the comparison results from high to low, so that they run from the best match to the worst match.
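As a minimal sketch of what the cosine similarity measures and how results can be ranked, the following plain Perl computes it directly from comma-separated strings. C<cosine_csv> is a hypothetical helper, not part of this module, and the three candidate embeddings are made-up toy values.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical helper (not part of AI::Embedding): cosine similarity of
# two comma-separated embedding strings, computed directly.
sub cosine_csv {
    my ($csv1, $csv2) = @_;
    my @a = split /,/, $csv1;
    my @b = split /,/, $csv2;
    return unless @a && @a == @b;        # embeddings must be the same length

    my $dot    = sum0(map { $a[$_] * $b[$_] } 0 .. $#a);
    my $norm_a = sqrt(sum0(map { $_ ** 2 } @a));
    my $norm_b = sqrt(sum0(map { $_ ** 2 } @b));
    return unless $norm_a && $norm_b;    # avoid dividing by zero

    return $dot / ($norm_a * $norm_b);
}

# Rank made-up candidate embeddings against a query, best match first.
my $query      = '1,0,1';
my %candidates = (
    same     => '2,0,2',
    partial  => '1,1,0',
    opposite => '-1,0,-1',
);
my @ranked = sort {
    cosine_csv($query, $candidates{$b}) <=> cosine_csv($query, $candidates{$a})
} keys %candidates;

print join(', ', @ranked), "\n";
```

Only the direction of each vector matters here: C<2,0,2> scores the same against the query as C<1,0,1> would, because the lengths are divided out.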
=head1 SEE ALSO
L<https://openai.com> - OpenAI official website
=head1 AUTHOR
Ian Boddison <ian at boddison.com>
=head1 BUGS
Please report any bugs or feature requests to C<bug-ai-embedding at rt.cpan.org>, or through
the web interface at L<https://rt.cpan.org/NoAuth/ReportBug.html?Queue=bug-ai-embedding>. I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.
=head1 SUPPORT
You can find documentation for this module with the perldoc command.
perldoc AI::Embedding
You can also look for information at:
=over 4
=item * RT: CPAN's request tracker (report bugs here)
L<https://rt.cpan.org/NoAuth/Bugs.html?Dist=AI-Embedding>
=item * Search CPAN
L<https://metacpan.org/release/AI::Embedding>
=back
=head1 ACKNOWLEDGEMENTS
Thanks to the help and support provided by members of Perl Monks L<https://perlmonks.org/>.
Especially L<Ken Cotterill (KCOTT)|https://metacpan.org/author/KCOTT> for assistance with unit tests and L<Hugo van der Sanden (HVDS)|https://metacpan.org/author/HVDS> for suggesting the current C<comparator> implementation.
=head1 COPYRIGHT AND LICENSE
This software is copyright (c) 2023 by Ian Boddison.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
=cut