= gcp_vertex_ai_embeddings
:type: processor
:status: experimental
:categories: ["AI"]


////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/<provider>.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.


component_type_dropdown::[]


Generates vector embeddings to represent input text, using the Vertex AI API.

Introduced in version 4.37.0.

```yml
# Config fields, showing default values
label: ""
gcp_vertex_ai_embeddings:
  project: "" # No default (required)
  credentials_json: "" # No default (optional)
  location: us-central1
  model: text-embedding-004 # No default (required)
  task_type: RETRIEVAL_DOCUMENT
  text: "" # No default (optional)
  output_dimensions: 0 # No default (optional)
```

This processor sends text strings to the Vertex AI API, which generates vector embeddings. By default, the processor submits the entire payload of each message as a string, unless you use the `text` configuration field to customize it.

For more information, see the https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings[Vertex AI documentation^].
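
As a minimal sketch of how the processor might sit in a pipeline, the following configuration embeds the full payload of each message. The project ID `my-gcp-project` is a placeholder rather than a value from this page, and the `input` and `output` sections are omitted.

```yml
pipeline:
  processors:
    - gcp_vertex_ai_embeddings:
        project: my-gcp-project # Placeholder project ID (assumption)
        location: us-central1
        model: text-embedding-004
        # No `text` field is set, so the entire message payload is submitted
```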

== Fields

=== `project`

The ID of the GCP project to use.


*Type*: `string`


=== `credentials_json`

An optional field that sets the Google service account credentials as JSON.

[CAUTION]
====
This field contains sensitive information that usually shouldn't be added to a configuration directly. For more information, see the xref:configuration:secrets.adoc[secrets page].
====


*Type*: `string`
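
One way to keep the credentials out of the configuration file is to reference an environment variable. This is a sketch only: the variable name `GCP_CREDENTIALS_JSON` and the project ID are assumptions, and the variable would need to hold the full service account JSON.

```yml
gcp_vertex_ai_embeddings:
  project: my-gcp-project # Placeholder project ID (assumption)
  model: text-embedding-004
  credentials_json: "${GCP_CREDENTIALS_JSON}" # Hypothetical environment variable
```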


=== `location`

The GCP location (region) that hosts the model.


*Type*: `string`

*Default*: `"us-central1"`

=== `model`

The name of the embedding model to use. For a full list of models, see the https://console.cloud.google.com/vertex-ai/model-garden[Vertex AI Model Garden^].


*Type*: `string`


```yml
# Examples

model: text-embedding-004

model: text-multilingual-embedding-002
```

=== `task_type`

The use case for which the model optimizes the embeddings it generates.


*Type*: `string`

*Default*: `"RETRIEVAL_DOCUMENT"`

|===
| Option | Summary

| `CLASSIFICATION`
| optimize for classifying texts according to preset labels
| `CLUSTERING`
| optimize for clustering texts based on their similarities
| `FACT_VERIFICATION`
| optimize for queries that prove or disprove a fact, such as "apples grow underground"
| `QUESTION_ANSWERING`
| optimize for searches phrased as proper questions, such as "Why is the sky blue?"
| `RETRIEVAL_DOCUMENT`
| optimize for documents that will be searched (also known as a corpus)
| `RETRIEVAL_QUERY`
| optimize for queries such as "What is the best fish recipe?" or "best restaurant in Chicago"
| `SEMANTIC_SIMILARITY`
| optimize for text similarity

|===
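
As an illustration of how these options might be paired in a search use case, the sketch below shows two separate processor configurations: one for a pipeline that indexes documents and one for a pipeline that embeds search queries. The project ID is a placeholder, and using the same model on both sides is an assumption made so that the resulting vectors are comparable.

```yml
# Indexing pipeline: embed the documents that will be searched
gcp_vertex_ai_embeddings:
  project: my-gcp-project # Placeholder project ID (assumption)
  model: text-embedding-004
  task_type: RETRIEVAL_DOCUMENT

# Query pipeline (a separate config): embed the incoming search queries
gcp_vertex_ai_embeddings:
  project: my-gcp-project # Placeholder project ID (assumption)
  model: text-embedding-004
  task_type: RETRIEVAL_QUERY
```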

=== `text`

The text you want to compute vector embeddings for. By default, the processor submits the entire payload as a string.
This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions].


*Type*: `string`
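
For structured messages, you may want to embed only one field. The following sketch assumes each message is a JSON document with a hypothetical `description` field, and uses an interpolation function to extract it. The project ID is also a placeholder.

```yml
gcp_vertex_ai_embeddings:
  project: my-gcp-project # Placeholder project ID (assumption)
  model: text-embedding-004
  text: '${! json("description") }' # `description` is a hypothetical field name
```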


=== `output_dimensions`

The maximum number of dimensions in each output embedding. If set, the output embeddings are truncated to this size.


*Type*: `int`
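
For example, to request shorter vectors you could set this field explicitly. The value `256` and the project ID are illustrative only; check which sizes the chosen model supports.

```yml
gcp_vertex_ai_embeddings:
  project: my-gcp-project # Placeholder project ID (assumption)
  model: text-embedding-004
  output_dimensions: 256 # Illustrative value (assumption)
```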