= ollama_embeddings
:type: processor
:status: experimental
:categories: ["AI"]

////
    THIS FILE IS AUTOGENERATED!

    To make changes, edit the corresponding source file under:

    https://github.com/redpanda-data/connect/tree/main/internal/impl/.

    And:

    https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.

component_type_dropdown::[]


Generates vector embeddings from text, using the Ollama API.

Introduced in version 4.32.0.


[tabs]
======
Common::
+
--

```yml
# Common config fields, showing default values
label: ""
ollama_embeddings:
  model: nomic-embed-text # No default (required)
  text: "" # No default (optional)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
```

--
Advanced::
+
--

```yml
# All config fields, showing default values
label: ""
ollama_embeddings:
  model: nomic-embed-text # No default (required)
  text: "" # No default (optional)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
    gpu_layers: 0 # No default (optional)
    threads: 0 # No default (optional)
    use_mmap: false # No default (optional)
    use_mlock: false # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
  cache_directory: /opt/cache/connect/ollama # No default (optional)
  download_url: "" # No default (optional)
```

--
======

This processor sends text to your chosen Ollama large language model (LLM) and creates vector embeddings, using the Ollama API. Vector embeddings are long arrays of numbers that represent values or objects, in this case text.

By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the `server_address` field. You can https://ollama.com/download[download and install Ollama from the Ollama website^].

For more information, see the https://github.com/ollama/ollama/tree/main/docs[Ollama documentation^].

== Examples

[tabs]
======
Store embedding vectors in Qdrant::
+
--

Compute embeddings for some generated data and store them within xref:component:outputs/qdrant.adoc[Qdrant].

```yaml
input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
  - ollama_embeddings:
      model: snowflake-arctic-embed
      text: "${!this.text}"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"
```

--
Store embedding vectors in ClickHouse::
+
--

Compute embeddings for some generated data and store them within https://clickhouse.com/[ClickHouse^].

```yaml
input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
  - branch:
      processors:
      - ollama_embeddings:
          model: snowflake-arctic-embed
          text: "${!this.text}"
      result_map: |
        root.embeddings = this
output:
  sql_insert:
    driver: clickhouse
    dsn: "clickhouse://localhost:9000"
    table: searchable_text
    columns: ["id", "text", "vector"]
    args_mapping: "root = [uuid_v4(), this.text, this.embeddings]"
```

--
======

== Fields

=== `model`

The name of the Ollama LLM to use. For a full list of models, see the https://ollama.com/models[Ollama website^].


*Type*: `string`


```yml
# Examples

model: nomic-embed-text

model: mxbai-embed-large

model: snowflake-arctic-embed

model: all-minilm
```

=== `text`

The text you want to create vector embeddings for. By default, the processor submits the entire payload as a string.
This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions].


*Type*: `string`
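For instance, if incoming messages are JSON documents, you can use an interpolated Bloblang query to embed only part of the payload rather than the whole message. The field names below (`title`, `body`) are hypothetical and used for illustration only:

```yml
# Examples (assuming JSON input with hypothetical `title` and `body` fields)

text: "${! this.body }"

text: '${! this.title + ": " + this.body }'
```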
=== `runner`

Options for the model runner that are used when the model is first loaded into memory.


*Type*: `object`


=== `runner.context_size`

Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process.


*Type*: `int`


=== `runner.batch_size`

The maximum number of requests to process in parallel.


*Type*: `int`


=== `runner.gpu_layers`

Allows offloading some layers to the GPU for computation, which generally results in increased performance. By default, the runtime decides the number of layers dynamically.


*Type*: `int`


=== `runner.threads`

Sets the number of threads to use during generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.


*Type*: `int`


=== `runner.use_mmap`

Map the model into memory. This is only supported on Unix systems, and allows loading only the necessary parts of the model as needed.


*Type*: `bool`


=== `runner.use_mlock`

Lock the model in memory, preventing it from being swapped out when memory-mapped. This option can improve performance, but reduces some of the advantages of memory-mapping because it uses more RAM to run and can slow down load times as the model loads into RAM.


*Type*: `bool`


=== `server_address`

The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.


*Type*: `string`


```yml
# Examples

server_address: http://127.0.0.1:11434
```

=== `cache_directory`

If `server_address` is not set, the directory to download the Ollama binary into and to use as a model cache.


*Type*: `string`


```yml
# Examples

cache_directory: /opt/cache/connect/ollama
```

=== `download_url`

If `server_address` is not set, the URL to download the Ollama binary from. Defaults to the official Ollama GitHub release for this platform.


*Type*: `string`
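To tie the self-managed options together, here is a minimal sketch of a processor that runs its own Ollama server. It assumes `server_address` is left unset; the cache path and runner values are illustrative, not recommendations:

```yml
pipeline:
  processors:
    - ollama_embeddings:
        model: nomic-embed-text
        text: "${!this.text}"
        # With no server_address set, the processor downloads and runs a local
        # Ollama server, caching the binary and models in cache_directory.
        cache_directory: /opt/cache/connect/ollama
        runner:
          context_size: 2048 # illustrative value
          batch_size: 8 # illustrative value
```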