AI-Ollama-Client
ollama/ollama-curated.yaml
use_mmap:
type: boolean
description: |
Enable mmap. (Default: false)
use_mlock:
type: boolean
description: |
Enable mlock. (Default: false)
embedding_only:
type: boolean
description: |
Enable embedding only. (Default: false)
rope_frequency_base:
type: number
format: float
description: |
The base frequency used for RoPE (rotary position embedding). (Default: 1.0)
rope_frequency_scale:
type: number
format: float
description: |
The scaling factor applied to the RoPE frequency. (Default: 1.0)
num_thread:
type: integer
description: |
Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number o...
ResponseFormat:
type: string
description: |
The format to return a response in. Currently the only accepted value is json.
Enable JSON mode by setting the format parameter to json. This will structure the response as valid JSON.
Note: it's important to instruct the model to use JSON in the prompt. Otherwise, the model may generate large amounts of whitespace.
enum:
- json
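# Illustrative sketch (not part of the upstream spec): a request body that enables
# JSON mode. The `role` and `content` fields are assumed from the Message schema
# defined elsewhere in this file, and the prompt text is made up for the example.
# Note that the prompt itself asks for JSON, as the description above recommends.
#   model: llama2:7b
#   format: json
#   messages:
#     - role: user
#       content: List three primary colors. Respond using JSON.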
GenerateCompletionResponse:
type: object
description: The response class for the generate endpoint.
properties:
model:
type: string
description: *model_name
example: llama2:7b
created_at:
type: string
format: date-time
description: Date on which a model was created.
example: 2023-08-04T19:22:45.499127Z
response:
type: string
description: The response for a given prompt with a provided model.
example: The sky appears blue because of a phenomenon called Rayleigh scattering.
done:
type: boolean
description: Whether the response has completed.
example: true
context:
type: array
description: |
An encoding of the conversation used in this response. It can be sent in the next request to keep a conversational memory.
items:
type: integer
example: [ 1, 2, 3 ]
total_duration:
type: integer
description: Time spent in nanoseconds generating the response.
example: 5589157167
load_duration:
type: integer
description: Time spent in nanoseconds loading the model.
example: 3013701500
prompt_eval_count:
type: integer
description: Number of tokens in the prompt.
example: 46
prompt_eval_duration:
type: integer
description: Time spent in nanoseconds evaluating the prompt.
example: 1160282000
eval_count:
type: integer
description: Number of tokens in the response.
example: 113
eval_duration:
type: integer
description: Time in nanoseconds spent generating the response.
example: 1325948000
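# Worked example (an illustrative note, not part of the upstream spec): because the
# duration fields are reported in nanoseconds, generation speed in tokens per second
# can be derived as eval_count / eval_duration * 10^9. Using the example values above:
#   113 / 1325948000 * 10^9 ≈ 85.22 tokens/s
# The same formula applied to the prompt fields gives the prompt evaluation speed:
#   46 / 1160282000 * 10^9 ≈ 39.65 tokens/s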
GenerateChatCompletionRequest:
type: object
description: Request class for the chat endpoint.
properties:
model:
type: string
description: *model_name
example: llama2:7b
messages:
type: array
description: The messages of the chat. Earlier messages can be included to keep a chat memory.
items:
$ref: '#/components/schemas/Message'
format:
$ref: '#/components/schemas/ResponseFormat'
options:
$ref: '#/components/schemas/RequestOptions'
stream:
type: boolean
description: *stream
default: false
keep_alive:
type: integer
description: *keep_alive
required:
- model
- messages
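# Illustrative sketch (not part of the upstream spec): a request that carries chat
# memory by resending earlier turns in `messages`. The `role` and `content` fields
# are assumed from the Message schema defined elsewhere in this file, and the
# conversation text is made up for the example.
#   model: llama2:7b
#   stream: false
#   messages:
#     - role: user
#       content: Why is the sky blue?
#     - role: assistant
#       content: The sky appears blue because of a phenomenon called Rayleigh scattering.
#     - role: user
#       content: Does the same effect explain red sunsets?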
GenerateChatCompletionResponse:
type: object
description: The response class for the chat endpoint.
properties:
message:
$ref: '#/components/schemas/Message'