# Langertha: CLAUDE.md

## Overview

Langertha is a Perl LLM framework supporting 15+ engines via composable Moose roles. It provides chat, tool calling (MCP), streaming, embeddings, transcription, and an autonomous agent (Raider).

## Build System

Uses `[@Author::GETTY]` Dist::Zilla plugin bundle.

```bash
dzil test           # Build and test
prove -l t/         # Run tests directly
prove -lv t/60_tool_calling.t  # Single test, verbose
```

## Architecture

### Engine Hierarchy (lib/Langertha/Engine/)

```
Engine::Remote              url required, JSON + HTTP
  │
  ├── Engine::AnthropicBase /v1/messages format, x-api-key auth, SSE streaming
  │     │
  │     ├── Anthropic       Claude models, thinking blocks, tool_use
  │     ├── MiniMaxAnthropic MiniMax via legacy /anthropic/v1 shim endpoint
  │     └── LMStudioAnthropic LM Studio Anthropic-compatible endpoint
  │
  ├── Engine::OpenAIBase    /chat/completions format, Bearer auth, SSE streaming
  │     │
  │     │  Cloud providers (url has default, api_key from env)
  │     ├── OpenAI          gpt-4o, embeddings, whisper transcription, structured output
  │     ├── DeepSeek        deepseek-chat/reasoner, structured output
  │     ├── Groq            ultra-fast inference, whisper transcription, structured output
  │     ├── Mistral         EU-hosted, embeddings, structured output
  │     ├── MiniMax         Shanghai (default), 1M context window, M2.7
  │     ├── NousResearch    Hermes models, <tool_call> XML tool format
  │     ├── Cerebras        wafer-scale chips, fastest inference
  │     ├── OpenRouter      meta-provider, 300+ models, provider/model format
  │     ├── Replicate       thousands of open-source models, owner/model format
  │     ├── HuggingFace     Inference Providers, org/model format
  │     ├── Perplexity      search-augmented, citations; NO tool calling
  │     ├── AKIOpenAI       EU/Germany, GDPR-compliant
  │     ├── TSystems        T-Systems AIFS / LLM Hub, T-Cloud Germany + EU hyperscaler models
  │     ├── Scaleway        EU-hosted Generative APIs, drop-in OpenAI replacement
  │     │
  │     │  Self-hosted (url required, no api_key)
  │     ├── OllamaOpenAI    Ollama /v1 endpoint, embeddings
  │     ├── vLLM            high-throughput inference, single-model server
  │     ├── SGLang          SGLang OpenAI-compatible server, fast structured output
  │     ├── LlamaCpp        llama.cpp server, embeddings
  │     └── LMStudioOpenAI  LM Studio's OpenAI-compatible endpoint
  │
  ├── Engine::TranscriptionBase  Transcription-only OpenAI-shape base (no chat/tools)
  │     │
  │     └── Whisper         self-hosted faster-whisper-server etc.
  │
  │  Non-OpenAI formats (own request/response handling)
  ├── Gemini                ?key= auth, functionDeclarations, thought parts
  ├── Ollama                native /api/chat, NDJSON streaming, OpenAPI spec
  ├── AKI                   key-in-body auth, EU/Germany, /api/call/{model}
  └── LMStudio              LM Studio native API (non-OpenAI/non-Anthropic)
```

**LMStudio family**: LM Studio servers can expose three different
endpoints: `LMStudio` is the native API, `LMStudioOpenAI` is the
OpenAI-compatible endpoint, and `LMStudioAnthropic` is the
Anthropic-compatible endpoint. Pick whichever your LM Studio server is
configured to serve.

**AKI family**: `AKI` is the official AKI.IO native API, which changes
often and can break. `AKIOpenAI` is the more stable OpenAI-compatible
endpoint, but it sometimes lacks features. Both are provided so users
can pick their tradeoff; we don't endorse one over the other.

**Whisper / `->whisper` accessor**: `Whisper` no longer extends
`OpenAI` (since the post-0.404 refactor). It extends the new
`TranscriptionBase`, so it has only transcription functionality: no
chat / tools / embeddings / image generation. To get a transcription
handle from an existing `OpenAI` instance, use the `whisper` attribute;
it returns a `TranscriptionBase` configured with the parent's
`api_key` and `url` so credentials don't have to be restated.

### Roles (lib/Langertha/Role/)

- **Capabilities**: `engine_capabilities` registry + `supports($cap)`
  helper. Composed by `Chat` (and indirectly via every other capability
  role). Mapping role→cap-flag lives in one map in `Role::Capabilities`;
  engines override via `around engine_capabilities` for wire-reality
  corrections (e.g. clearing `tool_choice_named` on string-only providers).
- **Chat**: sync/async chat (`simple_chat`, `simple_chat_f`); also
  `chat_f(messages => [...], tools => [...], tool_choice => ...,
  response_format => ...)` for single-turn structured calls.
- **Tools**: MCP tool calling loop (`chat_with_tools_f`, `mcp_servers`)
- **HermesTools**: XML-tag tool calling for models without native support
- **Streaming**: SSE / NDJSON streaming responses
- **Embedding**: Vector embeddings (`simple_embedding`)
- **Transcription**: Audio transcription
- **HTTP**: HTTP transport (sync + async via IO::Async)
- **JSON**: JSON encoding/decoding (`$self->json->encode/decode`)
- **SystemPrompt**: System prompt management
- **Temperature**, **ResponseSize**, **ContextSize**, **Seed**: Generation parameters
- **ResponseFormat**: JSON mode / structured output, plus
  `$self->decode_loose_json($text)` for tolerant parsing of
  prose-wrapped or fenced JSON output (overridable per engine)
- **Models**: Model selection and defaults
- **Langfuse**: Observability (traces, spans, generations)
- **OpenAICompatible**: OpenAI-format request/response handling
- **OpenAPI**: OpenAPI spec validation
- **ThinkTag**: Chain-of-thought `<think>` tag filtering
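
To make "tolerant parsing of prose-wrapped or fenced JSON" concrete, here is a
minimal sketch in plain Perl using only core `JSON::PP`. It is not the code
behind `decode_loose_json`; the helper name `loose_json_sketch` and the
fallback order (strict decode, then fenced block, then outermost bracket span)
are assumptions made for illustration.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper, NOT Langertha's decode_loose_json.
# Tries progressively looser candidates and returns the first one that
# decodes to a hash or array reference, or undef if none does.
sub loose_json_sketch {
    my ($text) = @_;
    my $json  = JSON::PP->new;
    my $fence = '`' x 3;    # three backticks, built at runtime

    # 1. The text as-is (the model answered with bare JSON).
    my @candidates = ($text);

    # 2. The body of a Markdown code fence, with or without a "json" tag.
    push @candidates, $1
        if $text =~ /\Q$fence\E(?:json)?[ \t]*\n(.*?)\Q$fence\E/s;

    # 3. The span from the first opening bracket to the last closing one,
    #    which strips leading and trailing prose.
    push @candidates, $1
        if $text =~ /([\{\[].*[\}\]])/s;

    for my $candidate (@candidates) {
        my $data = eval { $json->decode($candidate) };
        return $data if ref $data;    # accept only objects and arrays
    }
    return undef;
}

my $reply = q{Here you go: {"city": "Oslo", "temp_c": 4} Anything else?};
my $data  = loose_json_sketch($reply);
print "city=$data->{city}\n" if $data;
```

A per-engine override would presumably change only the candidate extraction
for models with unusual output habits; that is a guess about intent, not
something this file states.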

### Core Classes

- **Langertha::Response**: LLM response with metadata, stringifies to
  content. `tool_calls` is an `ArrayRef[Langertha::ToolCall]` (single
  source of truth for emitted tool calls, native and synthetic).
- **Langertha::Stream** / **Stream::Chunk**: Streaming iteration.
  `Stream::Chunk` carries an optional `tool_calls` field; helper
  `aggregate_tool_calls(\@chunks)` on `Role::Chat` collects them.
- **Langertha::ToolCall**: canonical tool invocation produced by an
  LLM (with `synthetic` flag for forced-tool fallbacks).
- **Langertha::ToolChoice**: canonical tool-selection policy with
  per-provider serializers (`to_openai`, `to_anthropic`, `to_gemini`,
  `to_perplexity`).
- **Langertha::Tool**: canonical tool definition with cross-provider
  serializers (`to_openai`, `to_anthropic`, `to_gemini`, `to_mcp`,
  `to_json_schema`) and accepting constructors (`from_openai`,
  `from_anthropic`, `from_mcp`, `from_gemini`, `from_hash`).
- **Langertha::Content::Image**: provider-agnostic vision input.
- **Langertha::Request::HTTP**: Internal HTTP request wrapper
- **Langertha::Raider**: Autonomous agent (see below)
- **Langertha::Raider::Result**: Raid result with type handling
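
The cross-provider serializers are easier to picture with the wire shapes side
by side. The sketch below uses plain hashes and free functions rather than the
real `Langertha::Tool` class; the canonical field names (`name`,
`description`, `input_schema`) and every `*_sketch` function name are
assumptions. Only the output shapes follow the providers' published formats.

```perl
use strict;
use warnings;

# Hypothetical canonical tool definition (field names are assumptions).
my %weather_tool = (
    name         => 'get_weather',
    description  => 'Look up the current weather for a city',
    input_schema => {
        type       => 'object',
        properties => { city => { type => 'string' } },
        required   => ['city'],
    },
);

# OpenAI chat completions: {type => 'function', function => {...}},
# with the JSON Schema under "parameters".
sub tool_to_openai_sketch {
    my ($tool) = @_;
    return {
        type     => 'function',
        function => {
            name        => $tool->{name},
            description => $tool->{description},
            parameters  => $tool->{input_schema},
        },
    };
}

# Anthropic messages: flat entry with the schema under "input_schema".
sub tool_to_anthropic_sketch {
    my ($tool) = @_;
    return {
        name         => $tool->{name},
        description  => $tool->{description},
        input_schema => $tool->{input_schema},
    };
}

# Gemini: declarations grouped under "functionDeclarations".
sub tool_to_gemini_sketch {
    my ($tool) = @_;
    return {
        functionDeclarations => [
            {
                name        => $tool->{name},
                description => $tool->{description},
                parameters  => $tool->{input_schema},
            },
        ],
    };
}

# Forced-name tool choice in the two most common wire forms.
sub choice_to_openai_sketch {
    my ($name) = @_;
    return { type => 'function', function => { name => $name } };
}

sub choice_to_anthropic_sketch {
    my ($name) = @_;
    return { type => 'tool', name => $name };
}

my $openai = tool_to_openai_sketch(\%weather_tool);
print "openai tool name: $openai->{function}{name}\n";
```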

### Tool & Structured-Output Flow

Three inputs combine: caller arguments (`tools`/`tool_choice`/
`response_format`/`mcp_servers`), method (`chat_f` single-turn vs
`chat_with_tools_f` multi-turn loop), and engine caps. `chat_f`
auto-rewrites between forms when the wire reality demands it; every
case lands as a `Langertha::ToolCall` on `Response.tool_calls`.

| Caller passes | Engine has | What `chat_f` does |
|---|---|---|
| `tools` only (no choice) | `tools_native` | forwarded to wire (per-provider via `Tool->to_X`) |
| `tools` only | only `tools_hermes` | only via `chat_with_tools_f` (XML in prompt) |
| `tools` + `tool_choice={type=>'tool',name=>X}` | `tool_choice_named` | native forced-name |
| `tools` + `tool_choice={type=>'tool',name=>X}` | only `response_format_json_schema` (Perplexity) | **auto-rewrite**: clears tools/choice, sets `response_format=json_schema` from tool's schema; loose-parses content; attaches synthetic `ToolCall` |
| `response_format=json_*` | `response_format_json_*` | native (Gemini→`responseSchema`, Ollama→`format`) |
| `response_format=json_*` | only `tool_choice_named` (Anthropic) | engine-internal: synth tool + forced choice; `tool_use` input lifted into `Response.content` as JSON |
| `mcp_servers` set | `tools_native` or `tools_hermes` | use `chat_with_tools_f` for multi-turn loop |
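
The table can be read as a small decision function. The sketch below is a
standalone illustration, not code from `Role::Chat`: the function name
`plan_chat_f_sketch`, the returned strategy labels, the `unsupported`
fallback, and the order in which rows are checked are all assumptions. Only
the capability flag names and the row outcomes come from the table.

```perl
use strict;
use warnings;

# Hypothetical planner: maps caller arguments plus engine capability flags
# to a label naming the table row that applies.
sub plan_chat_f_sketch {
    my (%arg) = @_;
    my %cap    = map { $_ => 1 } @{ $arg{caps} || [] };
    my $choice = $arg{tool_choice};

    # Forced-name choice in the canonical {type => 'tool', name => X} form.
    my $forced = ref($choice) eq 'HASH' && ($choice->{type} // '') eq 'tool';

    # Any native JSON response-format capability (response_format_json_*).
    my $json_cap = grep { /^response_format_json_/ } keys %cap;

    if ($arg{mcp_servers}) {
        return 'use_chat_with_tools_f'
            if $cap{tools_native} || $cap{tools_hermes};
        return 'unsupported';
    }

    if ($arg{tools} && $forced) {
        return 'native_forced_tool' if $cap{tool_choice_named};
        # Perplexity-style rewrite: the tool schema becomes a json_schema
        # response format and the parsed content becomes a synthetic ToolCall.
        return 'rewrite_to_json_schema' if $cap{response_format_json_schema};
        return 'unsupported';
    }

    if ($arg{tools}) {
        return 'native_tools'          if $cap{tools_native};
        return 'use_chat_with_tools_f' if $cap{tools_hermes};
        return 'unsupported';
    }

    if ($arg{response_format}) {
        return 'native_response_format' if $json_cap;
        # Anthropic-style fallback: synthetic tool plus forced choice.
        return 'synthetic_forced_tool' if $cap{tool_choice_named};
        return 'unsupported';
    }

    return 'plain_chat';
}

print plan_chat_f_sketch(
    tools       => [ { name => 'extract' } ],
    tool_choice => { type => 'tool', name => 'extract' },
    caps        => ['response_format_json_schema'],
), "\n";
```

Whatever branch is taken, the table's guarantee is that the caller reads the
result from `Response.tool_calls` (or `Response.content` for the
response-format rows), so calling code does not need to know which rewrite
happened.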

Per-provider wire payload: OpenAI `tools=[{type=>'function',...}]` /
`tool_calls` in `choices[0].message`; Anthropic `tools=[{name,input_schema}]`
/ `tool_use` blocks in `content[]`; Gemini `functionDeclarations` +
`toolConfig.functionCallingConfig` / `functionCall` parts; Ollama
OpenAI-shape natively. Hermes engines (NousResearch, AKI, AKIOpenAI)