= ollama_chat
:type: processor
:status: experimental
:categories: ["AI"]

////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.

component_type_dropdown::[]

Generates responses to messages in a chat conversation, using the Ollama API.

Introduced in version 4.32.0.

[tabs]
======
Common::
+
--

```yml
# Common config fields, showing default values
label: ""
ollama_chat:
  model: llama3.1 # No default (required)
  prompt: "" # No default (optional)
  image: 'root = this.image.decode("base64") # decode base64 encoded image' # No default (optional)
  response_format: text
  max_tokens: 0 # No default (optional)
  temperature: 0 # No default (optional)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
```

--
Advanced::
+
--

```yml
# All config fields, showing default values
label: ""
ollama_chat:
  model: llama3.1 # No default (required)
  prompt: "" # No default (optional)
  system_prompt: "" # No default (optional)
  image: 'root = this.image.decode("base64") # decode base64 encoded image' # No default (optional)
  response_format: text
  max_tokens: 0 # No default (optional)
  temperature: 0 # No default (optional)
  num_keep: 0 # No default (optional)
  seed: 42 # No default (optional)
  top_k: 0 # No default (optional)
  top_p: 0 # No default (optional)
  repeat_penalty: 0 # No default (optional)
  presence_penalty: 0 # No default (optional)
  frequency_penalty: 0 # No default (optional)
  stop: [] # No default (optional)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
    gpu_layers: 0 # No default (optional)
    threads: 0 # No default (optional)
    use_mmap: false # No default (optional)
    use_mlock: false # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
  cache_directory: /opt/cache/connect/ollama # No default (optional)
  download_url: "" # No default (optional)
```

--
======

This processor sends prompts to your chosen Ollama large language model (LLM) and generates text from the responses, using the Ollama API.

By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the `server_address` field. You can https://ollama.com/download[download and install Ollama from the Ollama website^].

For more information, see the https://github.com/ollama/ollama/tree/main/docs[Ollama documentation^].

== Examples

[tabs]
======
Use Llava to analyze an image::
+
--

This example fetches image URLs from stdin and has a multimodal LLM describe the image.

```yaml
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - http:
        verb: GET
        url: "${!content().string()}"
    - ollama_chat:
        model: llava
        prompt: "Describe the following image"
        image: "root = content()"
output:
  stdout:
    codec: lines
```

--
======

== Fields

=== `model`

The name of the Ollama LLM to use. For a full list of models, see the https://ollama.com/models[Ollama website].

*Type*: `string`

```yml
# Examples

model: llama3.1

model: gemma2

model: qwen2

model: phi3
```

=== `prompt`

The prompt you want to generate a response for. By default, the processor submits the entire payload as a string.
This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions].

*Type*: `string`
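For instance, a prompt can interpolate a single field out of a structured message rather than submitting the whole payload. The following sketch assumes each message is a JSON document with a `document` field; the field name and wording are illustrative only:

```yml
# Hypothetical configuration: summarize one field of a structured message.
# The `document` field name is an assumption used for illustration.
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        prompt: 'Summarize the following text in one sentence: ${! this.document }'
```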
=== `system_prompt`

The system prompt to submit to the Ollama LLM.
This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions].

*Type*: `string`

=== `image`

The image to submit along with the prompt to the model. The result should be a byte array.

*Type*: `string`

Requires version 4.38.0 or newer

```yml
# Examples

image: 'root = this.image.decode("base64") # decode base64 encoded image'
```

=== `response_format`

The format of the response that the Ollama model generates. If you specify JSON output, your `prompt` should also instruct the model to respond in JSON.

*Type*: `string`

*Default*: `"text"`

Options: `text`, `json`.

=== `max_tokens`

The maximum number of tokens to predict and output. Limiting the amount of output means that requests are processed faster and have a fixed limit on the cost.

*Type*: `int`

=== `temperature`

The temperature of the model. Increasing the temperature makes the model answer more creatively.

*Type*: `int`

=== `num_keep`

Specify the number of tokens from the initial prompt to retain when the model resets its internal context. By default, this value is set to `4`. Use `-1` to retain all tokens from the initial prompt.

*Type*: `int`

=== `seed`

Sets the random number seed to use for generation. Setting this to a specific number makes the model generate the same text for the same prompt.

*Type*: `int`

```yml
# Examples

seed: 42
```

=== `top_k`

Reduces the probability of generating nonsense. A higher value, for example `100`, gives more diverse answers. A lower value, for example `10`, is more conservative.

*Type*: `int`

=== `top_p`

Works together with `top_k`. A higher value, for example `0.95`, leads to more diverse text. A lower value, for example `0.5`, generates more focused and conservative text.

*Type*: `float`

=== `repeat_penalty`

Sets how strongly to penalize repetitions. A higher value, for example `1.5`, penalizes repetitions more strongly. A lower value, for example `0.9`, is more lenient.

*Type*: `float`

=== `presence_penalty`

Positive values penalize new tokens if they have appeared in the text so far. This increases the model's likelihood to talk about new topics.

*Type*: `float`

=== `frequency_penalty`

Positive values penalize new tokens based on the frequency of their appearance in the text so far. This decreases the model's likelihood to repeat the same line verbatim.

*Type*: `float`

=== `stop`

Sets the stop sequences to use. When one of these patterns is encountered, the LLM stops generating text and returns the final response.

*Type*: `array`

=== `runner`

Options for the model runner that are used when the model is first loaded into memory.

*Type*: `object`

=== `runner.context_size`

Sets the size of the context window used to generate the next token. A larger context window uses more memory and takes longer to process.

*Type*: `int`

=== `runner.batch_size`

The maximum number of requests to process in parallel.

*Type*: `int`

=== `runner.gpu_layers`

This option allows offloading some layers to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically.

*Type*: `int`

=== `runner.threads`

Sets the number of threads to use during generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

*Type*: `int`

=== `runner.use_mmap`

Map the model into memory. This is only supported on Unix systems, and allows loading only the necessary parts of the model as needed.

*Type*: `bool`

=== `runner.use_mlock`

Lock the model in memory, preventing it from being swapped out when memory-mapped. This option can improve performance, but it reduces some of the advantages of memory-mapping because it uses more RAM and can slow down load times as the model loads into RAM.

*Type*: `bool`
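The sampling fields and the `runner` block can be combined in a single processor configuration. The sketch below is illustrative only; all values are placeholder assumptions rather than recommended settings:

```yml
# Hypothetical tuning sketch; the values shown are placeholders, not recommendations.
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        prompt: 'Answer briefly: ${! content() }'
        max_tokens: 256
        temperature: 1
        top_k: 40
        top_p: 0.9
        stop: ["\n\n"]
        runner:
          context_size: 4096
          batch_size: 8
          threads: 8
          use_mmap: true
```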
=== `server_address`

The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.

*Type*: `string`

```yml
# Examples

server_address: http://127.0.0.1:11434
```

=== `cache_directory`

If `server_address` is not set, the directory in which to download the Ollama binary and to use as a model cache.

*Type*: `string`

```yml
# Examples

cache_directory: /opt/cache/connect/ollama
```

=== `download_url`

If `server_address` is not set, the URL to download the Ollama binary from. Defaults to the official Ollama GitHub release for your platform.

*Type*: `string`
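As a final sketch, the processor can target an already running Ollama server rather than managing one itself. The host name and JSON-shaped prompt below are assumptions for illustration; replace them with your own values:

```yml
# Hypothetical example: use an existing Ollama server instead of a locally managed one.
# The server address shown is a placeholder.
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        prompt: 'Return a JSON object with a single "summary" field describing: ${! content() }'
        response_format: json
        server_address: http://ollama.internal.example:11434
```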