= gcp_vertex_ai_embeddings
:type: processor
:status: experimental
:categories: ["AI"]

////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.

component_type_dropdown::[]

Generates vector embeddings to represent input text, using the Vertex AI API.

Introduced in version 4.37.0.

```yml
# Config fields, showing default values
label: ""
gcp_vertex_ai_embeddings:
  project: "" # No default (required)
  credentials_json: "" # No default (optional)
  location: us-central1
  model: text-embedding-004 # No default (required)
  task_type: RETRIEVAL_DOCUMENT
  text: "" # No default (optional)
  output_dimensions: 0 # No default (optional)
```

This processor sends text strings to the Vertex AI API, which generates vector embeddings. By default, the processor submits the entire payload of each message as a string, unless you use the `text` configuration field to customize it.

For more information, see the https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings[Vertex AI documentation^].

== Fields

=== `project`

The ID of the GCP project to use.

*Type*: `string`

=== `credentials_json`

An optional field that sets the Google service account credentials, in JSON format.

[CAUTION]
====
This field contains sensitive information that usually shouldn't be added to a config directly. For more information, see the xref:configuration:secrets.adoc[secrets page].
====

*Type*: `string`

=== `location`

The location of the model.

*Type*: `string`

*Default*: `"us-central1"`

=== `model`

The name of the LLM to use. For a full list of models, see the https://console.cloud.google.com/vertex-ai/model-garden[Vertex AI Model Garden].


*Type*: `string`

```yml
# Examples

model: text-embedding-004

model: text-multilingual-embedding-002
```

=== `task_type`

How to optimize the embeddings that the model generates for a specific use case.

*Type*: `string`

*Default*: `"RETRIEVAL_DOCUMENT"`

|===
| Option | Summary

| `CLASSIFICATION`
| optimize for being able to classify texts according to preset labels
| `CLUSTERING`
| optimize for clustering texts based on their similarities
| `FACT_VERIFICATION`
| optimize for queries that are proving or disproving a fact such as "apples grow underground"
| `QUESTION_ANSWERING`
| optimize for search using proper questions such as "Why is the sky blue?"
| `RETRIEVAL_DOCUMENT`
| optimize for documents that will be searched (also known as a corpus)
| `RETRIEVAL_QUERY`
| optimize for queries such as "What is the best fish recipe?" or "best restaurant in Chicago"
| `SEMANTIC_SIMILARITY`
| optimize for text similarity

|===

=== `text`

The text you want to compute vector embeddings for. By default, the processor submits the entire payload as a string.
This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions].

*Type*: `string`

=== `output_dimensions`

The maximum size of the output embedding. If set, the output embeddings are truncated to this size.

*Type*: `int`
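
== Example

The following is a minimal sketch of how the fields described above might be combined in a pipeline, not a definitive configuration. It assumes that each message is a JSON document with a hypothetical `question` field, that `my-gcp-project` is a placeholder project ID, and that a hypothetical environment variable named `GCP_CREDENTIALS_JSON` holds the service account credentials so that they are not written into the config directly. The `output_dimensions` value of `256` is illustrative only; check that the model you choose supports it.

```yml
pipeline:
  processors:
    - gcp_vertex_ai_embeddings:
        project: my-gcp-project # Placeholder project ID (assumption)
        credentials_json: "${GCP_CREDENTIALS_JSON}" # Hypothetical environment variable
        location: us-central1
        model: text-embedding-004
        task_type: RETRIEVAL_QUERY # The input is a search query, not a corpus document
        text: '${! json("question") }' # Embed only the hypothetical `question` field
        output_dimensions: 256 # Illustrative value; truncates the embedding to this size
```

In this sketch, `task_type` is set to `RETRIEVAL_QUERY` because the text being embedded is a query. When embedding the documents that those queries search against, the default `RETRIEVAL_DOCUMENT` is the matching choice.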