Langertha

version 0.502

=head1 SYNOPSIS

    my $system_prompt = 'You are a helpful assistant.';

    # Local models via Ollama
    use Langertha::Engine::Ollama;

    my $ollama = Langertha::Engine::Ollama->new(
        url           => 'http://127.0.0.1:11434',
        model         => 'llama3.1',
        system_prompt => $system_prompt,
    );
    print $ollama->simple_chat('Do you wanna build a snowman?');

    # OpenAI
    use Langertha::Engine::OpenAI;

    my $openai = Langertha::Engine::OpenAI->new(
        api_key       => $ENV{OPENAI_API_KEY},
        model         => 'gpt-4o-mini',
        system_prompt => $system_prompt,
    );
    print $openai->simple_chat('Do you wanna build a snowman?');

    # Anthropic Claude
    use Langertha::Engine::Anthropic;

    my $claude = Langertha::Engine::Anthropic->new(
        api_key => $ENV{ANTHROPIC_API_KEY},
        model   => 'claude-sonnet-4-6',
    );
    print $claude->simple_chat('Generate Perl Moose classes to represent GeoJSON data.');

    # Google Gemini
    use Langertha::Engine::Gemini;

    my $gemini = Langertha::Engine::Gemini->new(
        api_key => $ENV{GEMINI_API_KEY},
        model   => 'gemini-2.5-flash',
    );
    print $gemini->simple_chat('Explain the difference between Moose and Moo.');

=head1 DESCRIPTION

Langertha provides a unified Perl interface for interacting with various Large
Language Model (LLM) APIs. It abstracts away provider-specific differences,
giving you a consistent API whether you're using OpenAI, Anthropic Claude,
Ollama, Groq, Mistral, or other providers.

B<THIS API IS A WORK IN PROGRESS.>

=head2 Key Features

=over 4

=item * B<24 engines> -- unified API across cloud and local LLM providers

=item * B<Chat, streaming, embeddings, transcription, image generation>

=item * B<MCP tool calling> -- automatic multi-round tool loops via L<Net::Async::MCP>

=item * B<Raider> -- autonomous agent with history, compression, and plugins

=item * B<Response metadata> -- token usage, model, timing, rate limits

=item * B<Async/await> via L<Future::AsyncAwait>, sync via L<LWP::UserAgent>

=item * B<Langfuse observability> -- traces, generations, and tool spans

=item * B<Dynamic model discovery> -- query provider APIs with caching

=item * B<Chain-of-thought> -- native extraction and C<E<lt>thinkE<gt>> tag filtering

=item * B<Plugin system> for extending Raider, Chat, Embedder, and ImageGen

=back
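
The C<E<lt>thinkE<gt>> tag filtering mentioned above can be pictured with a
small standalone sketch. This is not Langertha's implementation; the function
name C<split_thinking> and the sample text are assumptions made up for
illustration. It only shows the general idea of separating reasoning blocks
from the visible answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Langertha): strip <think>...</think>
# blocks from a model reply and return them separately.
sub split_thinking {
    my ($text) = @_;
    my @thoughts;
    my $visible = $text;

    # Remove one block per iteration; $1 holds the captured reasoning.
    while ($visible =~ s{<think>(.*?)</think>\s*}{}s) {
        push @thoughts, $1;
    }
    return ($visible, \@thoughts);
}

my ($visible, $thoughts) = split_thinking(
    "<think>The user greets me.</think>\nHello there!"
);
print "visible:  $visible\n";
print "thoughts: ", scalar(@$thoughts), "\n";
```

A non-greedy match with the C</s> modifier keeps each block separate even when
the reasoning spans several lines.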

=head2 Class Sugar

Langertha can set up your package as a L<Langertha::Raider> subclass or a
L<Langertha::Plugin> subclass:

    # Build a custom Raider agent
    package MyAgent;
    use Langertha qw( Raider );
    plugin 'Langfuse';

    around plugin_before_llm_call => async sub {
        my ($orig, $self, $conversation, $iteration) = @_;
        $conversation = await $self->$orig($conversation, $iteration);
        # ... custom logic ...
        return $conversation;
    };

    __PACKAGE__->meta->make_immutable;

    # Build a custom Plugin
    package MyApp::Guardrails;
    use Langertha qw( Plugin );

    around plugin_before_tool_call => async sub {
        my ($orig, $self, $name, $input) = @_;
        my @result = await $self->$orig($name, $input);
        return unless @result;
        return if $name eq 'dangerous_tool';
        return @result;
    };

C<use Langertha qw( Raider )> imports L<Moose> and L<Future::AsyncAwait>,
sets L<Langertha::Raider> as superclass, and provides the C<plugin>
function for applying plugins by short name.

C<use Langertha qw( Plugin )> imports L<Moose> and
L<Future::AsyncAwait>, and sets L<Langertha::Plugin> as superclass.

=head2 Engine Discovery

Langertha discovers engine modules in scope via L<Module::Pluggable> across
both namespaces:

[...]

=item * L<Langertha::Engine::MiniMaxAnthropic> - MiniMax via legacy Anthropic-compatible endpoint

=item * L<Langertha::Engine::Gemini> - Google Gemini models (Flash, Pro)

=item * L<Langertha::Engine::vLLM> - vLLM inference server

=item * L<Langertha::Engine::SGLang> - SGLang inference server

=item * L<Langertha::Engine::HuggingFace> - HuggingFace Inference Providers

=item * L<Langertha::Engine::Perplexity> - Perplexity AI models

=item * L<Langertha::Engine::NousResearch> - Nous Research (Hermes models)

=item * L<Langertha::Engine::Cerebras> - Cerebras (wafer-scale, fastest inference)

=item * L<Langertha::Engine::OpenRouter> - OpenRouter (300+ models, meta-provider)

=item * L<Langertha::Engine::Replicate> - Replicate (thousands of open-source models)

=item * L<Langertha::Engine::OllamaOpenAI> - Ollama via OpenAI-compatible API

=item * L<Langertha::Engine::LlamaCpp> - llama.cpp server (chat, embeddings)

=item * L<Langertha::Engine::LMStudio> - LM Studio native local REST API

=item * L<Langertha::Engine::LMStudioOpenAI> - LM Studio via OpenAI-compatible API

=item * L<Langertha::Engine::LMStudioAnthropic> - LM Studio via Anthropic-compatible API

=item * L<Langertha::Engine::AKI> - AKI.IO native API (EU/Germany)

=item * L<Langertha::Engine::AKIOpenAI> - AKI.IO via OpenAI-compatible API

=item * L<Langertha::Engine::TSystems> - T-Systems AI Foundation Services / LLM Hub (EU/Germany)

=item * L<Langertha::Engine::Scaleway> - Scaleway Generative APIs (EU)

=item * L<Langertha::Engine::TranscriptionBase> - Slim base for OpenAI-shape
transcription-only engines (no chat / tools / embeddings / image generation).
L<Langertha::Engine::OpenAI> exposes a C<whisper> attribute returning an
instance of this class bound to the parent's C<api_key> / C<url>.

=item * L<Langertha::Engine::Whisper> - Self-hosted Whisper-compatible
transcription server (extends TranscriptionBase)

=back

=head2 Roles

Roles provide composable functionality to engines:

=over 4

=item * L<Langertha::Role::Capabilities> - C<engine_capabilities> registry
plus C<supports($cap)> helper, composed by L<Langertha::Role::Chat>

=item * L<Langertha::Role::Chat> - Synchronous and async chat methods,
including C<chat_f(messages =E<gt> [...], tools =E<gt> [...], tool_choice
=E<gt> ..., response_format =E<gt> ...)> for single-turn structured
calls and C<aggregate_tool_calls(\@chunks)> for combining tool calls
spread across streamed chunks

=item * L<Langertha::Role::HTTP> - HTTP request/response handling

=item * L<Langertha::Role::Streaming> - Streaming response processing

=item * L<Langertha::Role::JSON> - JSON encode/decode

=item * L<Langertha::Role::OpenAICompatible> - OpenAI-compatible API behaviour

=item * L<Langertha::Role::SystemPrompt> - System prompt attribute

=item * L<Langertha::Role::Temperature> - Temperature parameter

=item * L<Langertha::Role::ResponseSize> - Max response size parameter

=item * L<Langertha::Role::ResponseFormat> - Response format (JSON mode)

=item * L<Langertha::Role::ContextSize> - Context window size parameter

=item * L<Langertha::Role::Seed> - Deterministic seed parameter

=item * L<Langertha::Role::Models> - Model listing

=item * L<Langertha::Role::Embedding> - Embedding generation

=item * L<Langertha::Role::Transcription> - Audio transcription

=item * L<Langertha::Role::Tools> - Tool/function calling

=item * L<Langertha::Role::HermesTools> - Hermes-style tool calling via
C<E<lt>tool_callE<gt>> XML tags for models without native API tool support

=item * L<Langertha::Role::ImageGeneration> - Image generation

=item * L<Langertha::Role::KeepAlive> - Keep-alive duration for local models

=item * L<Langertha::Role::PluginHost> - Plugin system for wrapper classes and Raider

=item * L<Langertha::Role::Langfuse> - Langfuse observability integration (engine-level)

=item * L<Langertha::Role::OpenAPI> - OpenAPI spec support

=back
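
As a rough picture of what a capability registry with a C<supports($cap)>
helper looks like, here is a standalone sketch in plain Perl. The class
C<My::Sketch::Engine>, its constructor, and the capability names are all
hypothetical; the real behaviour is defined by L<Langertha::Role::Capabilities>
and may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT Langertha's implementation.
package My::Sketch::Engine;

sub new {
    my ($class, %caps) = @_;
    # Registry maps capability name => true/false (names are illustrative).
    return bless { caps => {%caps} }, $class;
}

# Return a copy of the registry so callers cannot mutate it.
sub engine_capabilities {
    my ($self) = @_;
    return { %{ $self->{caps} } };
}

# True if the capability is registered and enabled.
sub supports {
    my ($self, $cap) = @_;
    return $self->{caps}{$cap} ? 1 : 0;
}

package main;

my $engine = My::Sketch::Engine->new(chat => 1, streaming => 1, tools => 0);

for my $cap (qw( chat tools embedding )) {
    printf "%-10s %s\n", $cap, $engine->supports($cap) ? 'yes' : 'no';
}
```

Checking a capability before calling a feature lets calling code degrade
gracefully when an engine lacks it.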

=head2 Wrapper Classes

These classes wrap an engine with optional overrides and plugin lifecycle hooks:

=over 4

=item * L<Langertha::Chat> - Chat wrapper with system prompt, model, and temperature overrides

=item * L<Langertha::Embedder> - Embedding wrapper with optional model override

=item * L<Langertha::ImageGen> - Image generation wrapper with model, size, and quality overrides

=back

=head2 Plugins

=over 4

=item * L<Langertha::Plugin> - Base class for all plugins

=item * L<Langertha::Plugin::Langfuse> - Langfuse observability (traces, generations, spans)

=back

=head2 Data Objects

=over 4

=item * L<Langertha::Response> - LLM response with content, usage, and rate
limit metadata; C<tool_calls> is an ArrayRef of L<Langertha::ToolCall> and
the single source of truth for both native and synthesized tool calls

=item * L<Langertha::ToolCall> - Canonical tool invocation produced by an
LLM, with C<synthetic> flag for forced-tool fallbacks

=item * L<Langertha::ToolChoice> - Canonical tool-selection policy with
per-provider serializers (C<to_openai>, C<to_anthropic>, C<to_gemini>,
C<to_perplexity>)

=item * L<Langertha::Tool> - Canonical tool definition with cross-provider
serializers (C<to_openai>, C<to_anthropic>, C<to_gemini>, C<to_mcp>,
C<to_json_schema>) and constructors that accept provider formats
(C<from_openai>, C<from_anthropic>, C<from_mcp>, C<from_gemini>,
C<from_hash>)

=item * L<Langertha::Content> / L<Langertha::Content::Image> -
Provider-agnostic vision input

=item * L<Langertha::RateLimit> - Normalized rate limit data from HTTP response headers

=item * L<Langertha::Stream> - Iterator over streaming chunks

=item * L<Langertha::Stream::Chunk> - A single chunk from a streaming
response (with optional C<tool_calls> for engines that emit them mid-stream)

=item * L<Langertha::Raider> - Autonomous agent with history and tool calling

=item * L<Langertha::Raider::Result> - Typed raid result (final, question, pause, abort)

=item * L<Langertha::Request::HTTP> - Internal HTTP request object

=back
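
To illustrate why a canonical tool definition needs per-provider serializers,
the following standalone sketch converts one neutral hash into the tool shapes
used by the OpenAI chat API, the Anthropic messages API, and MCP. The function
names (C<sketch_to_openai> and friends) and the shape of C<%tool> are
assumptions for illustration only; they are not L<Langertha::Tool>'s API.

```perl
use strict;
use warnings;
use JSON::PP ();

# Provider-neutral tool description (hypothetical shape for this sketch).
my %tool = (
    name        => 'get_weather',
    description => 'Look up the current weather for a city',
    schema      => {
        type       => 'object',
        properties => { city => { type => 'string' } },
        required   => ['city'],
    },
);

# OpenAI chat style: wrapped in a "function" envelope, schema as "parameters".
sub sketch_to_openai {
    my ($t) = @_;
    return {
        type     => 'function',
        function => {
            name        => $t->{name},
            description => $t->{description},
            parameters  => $t->{schema},
        },
    };
}

# Anthropic messages style: flat hash, schema as "input_schema".
sub sketch_to_anthropic {
    my ($t) = @_;
    return {
        name         => $t->{name},
        description  => $t->{description},
        input_schema => $t->{schema},
    };
}

# MCP style: flat hash, schema as "inputSchema".
sub sketch_to_mcp {
    my ($t) = @_;
    return {
        name        => $t->{name},
        description => $t->{description},
        inputSchema => $t->{schema},
    };
}

my $json = JSON::PP->new->canonical->pretty;
print $json->encode(sketch_to_openai(\%tool));
print $json->encode(sketch_to_anthropic(\%tool));
print $json->encode(sketch_to_mcp(\%tool));
```

The JSON Schema itself is shared by all three; only the key names and the
nesting around it differ.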

=head2 Streaming

All engines that implement L<Langertha::Role::Chat> support streaming. There
are several ways to consume a stream:

B<Synchronous with callback:>

    $engine->simple_chat_stream(sub {
        my ($chunk) = @_;
        print $chunk->content;
    }, 'Tell me a story');

B<Synchronous with iterator (L<Langertha::Stream>):>

    my $stream = $engine->simple_chat_stream_iterator('Tell me a story');
    while (my $chunk = $stream->next) {
        print $chunk->content;
    }

B<Async with Future (traditional):>

    my $future = $engine->simple_chat_f('Hello');
    my $response = $future->get;

    my $stream_future = $engine->simple_chat_stream_f('Tell me a story');
    my ($content, $chunks) = $stream_future->get;

B<Async with Future::AsyncAwait (recommended):>

    use feature 'say';
    use Future::AsyncAwait;

    async sub chat_with_ai {
        my ($engine) = @_;
        my $response = await $engine->simple_chat_f('Hello');
        say "AI says: $response";
        return $response;
    }

    async sub stream_chat {
        my ($engine) = @_;
        my ($content, $chunks) = await $engine->simple_chat_stream_realtime_f(
            sub { print shift->content },
            'Tell me a story',
        );
        say "\nReceived ", scalar(@$chunks), " chunks";
        return $content;
    }

    chat_with_ai($engine)->get;
    stream_chat($engine)->get;

The C<_f> methods use L<IO::Async> and L<Net::Async::HTTP> internally, loaded
lazily only when you call them. See C<examples/async_await_example.pl> for
complete working examples.
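
Lazy loading of this kind is usually done with a run-time C<require> inside
the method that needs the module. The sketch below shows the general technique
using the core module L<JSON::PP> as a stand-in; the helper name
C<encode_lazily> is hypothetical, and this is not a copy of how Langertha
loads L<IO::Async> or L<Net::Async::HTTP>.

```perl
use strict;
use warnings;

# Hypothetical helper: the module is loaded on first call, not at compile time.
sub encode_lazily {
    my ($data) = @_;
    require JSON::PP;    # does nothing once the module is already loaded
    return JSON::PP->new->canonical->encode($data);
}

# %INC records every module file that has been loaded so far.
print "loaded before call: ", (exists $INC{'JSON/PP.pm'} ? 'yes' : 'no'), "\n";
print encode_lazily({ hello => 'world' }), "\n";
print "loaded after call:  ", (exists $INC{'JSON/PP.pm'} ? 'yes' : 'no'), "\n";
```

Programs that never call the helper never pay the cost of loading the module.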

B<Using with Mojolicious:>

    use Mojo::Base -strict;
    use Future::Mojo;
    use Langertha::Engine::OpenAI;

    my $openai = Langertha::Engine::OpenAI->new(
        api_key => $ENV{OPENAI_API_KEY},
