SQLite-VecDB



  my @results = $coll->search(
    vector => [0.1, 0.2, ...],  # elided: one float per dimension
    limit  => 5,
  );

  for my $r (@results) {
    say $r->id;        # 'doc1'
    say $r->distance;  # 0.042
    say $r->metadata->{title};  # 'Hello World' (metadata is a hashref)
    say $r->content;   # 'Original text content'
  }

=head1 DESCRIPTION

SQLite::VecDB turns SQLite into a vector database using the
L<sqlite-vec|https://github.com/asg017/sqlite-vec> extension. It supports
storing vectors with metadata, KNN (k-nearest neighbor) search, and
optional automatic embedding generation via L<Langertha>.
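Conceptually, a KNN search ranks every stored vector by its distance to the query vector and keeps the C<limit> closest ones. The pure-Perl sketch below illustrates that idea with a brute-force scan and Euclidean (L2) distance. It is only an illustration: the C<knn> and C<l2_distance> helpers and the in-memory C<%store> are hypothetical and say nothing about how sqlite-vec actually executes searches.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Euclidean (L2) distance between two equal-length vectors.
sub l2_distance {
    my ($x, $y) = @_;
    return sqrt( sum( map { ($x->[$_] - $y->[$_]) ** 2 } 0 .. $#$x ) );
}

# Hypothetical brute-force KNN: score every stored vector, sort ascending
# by distance, and return the $limit closest as [id, distance] pairs.
sub knn {
    my ($store, $query, $limit) = @_;
    my @scored = sort { $a->[1] <=> $b->[1] }
                 map  { [ $_, l2_distance($store->{$_}, $query) ] }
                 keys %$store;
    $limit = @scored if $limit > @scored;
    return @scored[0 .. $limit - 1];
}

# Toy 3-dimensional "collection" keyed by document id.
my %store = (
    doc1 => [0.1, 0.2,  0.3],
    doc2 => [0.9, 0.8,  0.7],
    doc3 => [0.1, 0.25, 0.3],
);

my @hits = knn(\%store, [0.1, 0.2, 0.3], 2);
printf "%s %.3f\n", @$_ for @hits;
```

Here the query equals C<doc1>, so it prints C<doc1 0.000> followed by C<doc3 0.050>; C<doc2> falls outside the limit of 2. A linear scan like this costs one distance computation per stored vector, which is the work an index-backed search tries to avoid.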

=head2 db_file

Path to the SQLite database file. Use C<:memory:> for an in-memory database.

=head2 dimensions

The number of dimensions for vectors in this database. Must match the
embedding model you are using (e.g. 768 for nomic-embed-text, 1536 for
OpenAI text-embedding-3-small).
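A dimension mismatch is easiest to catch before a vector reaches the database. The sketch below shows one way to guard against it; C<check_dimensions> is a hypothetical helper written for illustration, not part of the SQLite::VecDB API.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical guard: croak unless $vector is an array reference with
# exactly $dimensions entries. Not part of SQLite::VecDB's API.
sub check_dimensions {
    my ($vector, $dimensions) = @_;
    croak "vector must be an array reference" unless ref $vector eq 'ARRAY';
    my $got = scalar @$vector;
    croak "expected $dimensions dimensions, got $got" unless $got == $dimensions;
    return 1;
}

# A 768-dimensional vector (e.g. from nomic-embed-text) passes ...
my $ok = eval { check_dimensions([ (0.0) x 768 ], 768) };
print $ok ? "768-dim vector accepted\n" : "rejected: $@";

# ... while a 384-dimensional one (e.g. from all-minilm) is rejected.
eval { check_dimensions([ (0.0) x 384 ], 768) };
print "384-dim vector rejected: $@" if $@;
```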

=head2 distance_metric

Distance metric used for vector search. Defaults to C<cosine>. Supported
values, all provided by sqlite-vec: C<cosine>, C<l2> (Euclidean), and
C<l1> (Manhattan).
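For reference, the three metrics can be written out in plain Perl as below. This sketch uses the textbook definitions (cosine distance as 1 minus cosine similarity); sqlite-vec computes the metrics natively, and the subroutine names here are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# l2: Euclidean distance, sqrt(sum((x_i - y_i)^2)).
sub l2_distance {
    my ($x, $y) = @_;
    return sqrt( sum( map { ($x->[$_] - $y->[$_]) ** 2 } 0 .. $#$x ) );
}

# l1: Manhattan distance, sum(|x_i - y_i|).
sub l1_distance {
    my ($x, $y) = @_;
    return sum( map { abs($x->[$_] - $y->[$_]) } 0 .. $#$x );
}

# cosine: 1 - (x . y) / (|x| * |y|). 0 = same direction, 1 = orthogonal,
# 2 = opposite direction. Undefined for zero vectors (division by zero).
sub cosine_distance {
    my ($x, $y) = @_;
    my $dot = sum( map { $x->[$_] * $y->[$_] } 0 .. $#$x );
    my $nx  = sqrt( sum( map { $_ ** 2 } @$x ) );
    my $ny  = sqrt( sum( map { $_ ** 2 } @$y ) );
    return 1 - $dot / ($nx * $ny);
}

my @p = (1, 0);
my @q = (0, 1);
printf "l2=%.4f l1=%.4f cosine=%.4f\n",
    l2_distance(\@p, \@q), l1_distance(\@p, \@q), cosine_distance(\@p, \@q);
```

For the orthogonal vectors (1, 0) and (0, 1) it prints C<l2=1.4142 l1=2.0000 cosine=1.0000>. Cosine distance ignores vector length and compares only direction, which is why it is a common default for text embeddings.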

=head2 embedding

Optional. A L<Langertha> engine instance that supports the
L<Langertha::Role::Embedding> role. When set, collections gain
C<add_text> and C<search_text> methods that automatically generate
embeddings.

=head2 sqlite_vec_path

Path to the sqlite-vec shared library. Auto-detected from
C<$ENV{SQLITE_VEC_PATH}> or L<Alien::sqlite_vec> if not specified.
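The lookup described above can be sketched as a small resolver, assuming the precedence reads as written: an explicit path first, then C<$ENV{SQLITE_VEC_PATH}>, then Alien::sqlite_vec. The C<resolve_sqlite_vec_path> function and its C<fallback> callback are hypothetical; the callback merely stands in for the Alien::sqlite_vec lookup, whose real API is not shown here, and the library paths are made up.

```perl
use strict;
use warnings;

# Hypothetical resolver: explicit path, then $ENV{SQLITE_VEC_PATH}, then a
# fallback callback standing in for the Alien::sqlite_vec lookup.
sub resolve_sqlite_vec_path {
    my (%args) = @_;
    return $args{path}
        if defined $args{path} && length $args{path};
    return $ENV{SQLITE_VEC_PATH}
        if defined $ENV{SQLITE_VEC_PATH} && length $ENV{SQLITE_VEC_PATH};
    return $args{fallback}->() if $args{fallback};
    die "sqlite-vec shared library not found\n";
}

# Made-up paths, for illustration only.
local $ENV{SQLITE_VEC_PATH} = '/opt/sqlite-vec/vec0.so';
print resolve_sqlite_vec_path(), "\n";                     # from the environment
print resolve_sqlite_vec_path(path => './vec0.so'), "\n";  # explicit path wins
```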

=head2 collection

  my $coll = $vdb->collection('documents');
  my $default = $vdb->collection;  # uses '_default'

Returns a L<SQLite::VecDB::Collection> for the given name. Creates the
underlying tables on first use.
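The exact schema SQLite::VecDB creates is not documented in this section. The sketch below only illustrates the kind of DDL a collection could need: a plain table for ids, content, and metadata, plus a sqlite-vec C<vec0> virtual table for the vectors. The C<collection_ddl> helper, the C<_items>/C<_vec> table names, and the column layout are all hypothetical. The C<distance_metric> column option follows sqlite-vec's C<vec0> syntax as we understand it; check the sqlite-vec documentation for your version.

```perl
use strict;
use warnings;

# Hypothetical DDL generator; table names and columns are illustrative,
# not SQLite::VecDB's real layout.
sub collection_ddl {
    my ($name, $dimensions, $metric) = @_;
    $metric //= 'cosine';
    # Collection names are interpolated into SQL, so restrict them.
    die "invalid collection name: $name\n" unless $name =~ /\A\w+\z/;
    die "dimensions must be a positive integer\n"
        unless $dimensions =~ /\A[1-9][0-9]*\z/;
    die "unsupported metric: $metric\n" unless $metric =~ /\A(?:cosine|l2|l1)\z/;
    return (
        # vec0 rows are keyed by rowid, so keep an integer key to join on.
        "CREATE TABLE IF NOT EXISTS ${name}_items "
          . "(pk INTEGER PRIMARY KEY, id TEXT UNIQUE NOT NULL, "
          . "content TEXT, metadata TEXT)",
        # sqlite-vec virtual table holding the vectors themselves.
        "CREATE VIRTUAL TABLE IF NOT EXISTS ${name}_vec "
          . "USING vec0(embedding float[$dimensions] distance_metric=$metric)",
    );
}

print "$_;\n" for collection_ddl('documents', 768);
```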

=head2 collections

  my @names = $vdb->collections;

Returns the names of all existing collections.

=head1 WITH LANGERTHA — AUTOMATIC EMBEDDINGS

  use SQLite::VecDB;
  use Langertha::Engine::OpenAI;

  my $engine = Langertha::Engine::OpenAI->new(
    api_key => $ENV{OPENAI_API_KEY},
  );

  my $vdb = SQLite::VecDB->new(
    db_file    => 'vectors.db',
    dimensions => 1536,
    embedding  => $engine,
  );

  my $coll = $vdb->collection('docs');

  # Text is automatically embedded
  $coll->add_text(
    id   => 'doc1',
    text => 'Kubernetes is a container orchestration platform.',
  );

  # Query is automatically embedded
  my @results = $coll->search_text(
    text  => 'container management',
    limit => 5,
  );

=head1 EMBEDDING SETUP

SQLite::VecDB stores and searches raw vectors. To generate embeddings from
text, pass any L<Langertha> engine that supports L<Langertha::Role::Embedding>
as the C<embedding> attribute.

=head2 Local Embeddings with Ollama (Recommended for Getting Started)

The easiest way to run embeddings locally — no API key, no cloud, free:

  # Start Ollama in Docker
  docker run -d -p 11434:11434 --name ollama ollama/ollama

  # Pull an embedding model (768 dimensions, ~270MB)
  docker exec ollama ollama pull nomic-embed-text

Then in Perl:

  use SQLite::VecDB;
  use Langertha::Engine::Ollama;

  my $engine = Langertha::Engine::Ollama->new(
    url             => 'http://localhost:11434',
    embedding_model => 'nomic-embed-text',
  );

  my $vdb = SQLite::VecDB->new(
    db_file    => 'my_vectors.db',
    dimensions => 768,
    embedding  => $engine,
  );

=head2 Popular Embedding Models

  Model                            Dimensions  Provider
  ─────────────────────────────────────────────────────
  nomic-embed-text (Ollama)        768         Local
  all-minilm (Ollama)              384         Local
  mxbai-embed-large (Ollama)       1024        Local
  text-embedding-3-small (OpenAI)  1536        Cloud
  text-embedding-3-large (OpenAI)  3072        Cloud
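To keep C<dimensions> in sync with the chosen model, the table can be turned into a lookup. The C<%MODEL_DIMENSIONS> hash and C<dimensions_for> helper below are hypothetical conveniences, not part of SQLite::VecDB; the numbers are copied from the table.

```perl
use strict;
use warnings;

# Dimensions copied from the table above; illustrative helper only.
my %MODEL_DIMENSIONS = (
    'nomic-embed-text'       => 768,
    'all-minilm'             => 384,
    'mxbai-embed-large'      => 1024,
    'text-embedding-3-small' => 1536,
    'text-embedding-3-large' => 3072,
);

# Look up a model's vector size, dying on unknown names rather than
# silently returning undef.
sub dimensions_for {
    my ($model) = @_;
    my $dims = $MODEL_DIMENSIONS{$model};
    die "unknown embedding model: $model\n" unless defined $dims;
    return $dims;
}

print "nomic-embed-text: ", dimensions_for('nomic-embed-text'), "\n";
```

The result would then be passed as C<< dimensions => dimensions_for($model) >> to C<< SQLite::VecDB->new >>, using the same model name that is configured on the Langertha engine.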

=head2 Cloud Embeddings with OpenAI

  use Langertha::Engine::OpenAI;

  my $engine = Langertha::Engine::OpenAI->new(
    api_key => $ENV{OPENAI_API_KEY},
  );

  my $vdb = SQLite::VecDB->new(
    db_file    => 'vectors.db',
    dimensions => 1536,   # text-embedding-3-small default
    embedding  => $engine,
  );

=head1 SQLITE-VEC EXTENSION

The sqlite-vec extension must be available as a shared library. SQLite::VecDB
finds it in this order:


