= parquet_encode
:type: processor
:status: experimental
:categories: ["Parsing"]

////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.

component_type_dropdown::[]

Encodes https://parquet.apache.org/docs/[Parquet files^] from a batch of structured messages.

Introduced in version 4.4.0.

[tabs]
======
Common::
+
--

```yml
# Common config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
```

--
Advanced::
+
--

```yml
# All config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
  default_encoding: DELTA_LENGTH_BYTE_ARRAY
```

--
======

This processor uses https://github.com/parquet-go/parquet-go[https://github.com/parquet-go/parquet-go^], which is itself experimental. As a result, the behavior of this processor may change outside of major version releases.

== Examples

[tabs]
======
Writing Parquet Files to AWS S3::
+
--

In this example we use the batching mechanism of an `aws_s3` output to collect a batch of messages in memory. The batch is then converted into a Parquet file and uploaded.

```yaml
output:
  aws_s3:
    bucket: TODO
    path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet'
    batching:
      count: 1000
      period: 10s
      processors:
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd
```

--
======

== Fields

=== `schema`

Parquet schema.

*Type*: `array`

=== `schema[].name`

The name of the column.

*Type*: `string`

=== `schema[].type`

The type of the column. This is only applicable to leaf columns with no child fields. Some logical types, such as `UTF8`, can also be specified here.

*Type*: `string`

Options: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `UTF8`.

=== `schema[].repeated`

Whether the field is repeated.

*Type*: `bool`

*Default*: `false`

=== `schema[].optional`

Whether the field is optional.

*Type*: `bool`

*Default*: `false`

=== `schema[].fields`

A list of child fields.

*Type*: `array`

```yml
# Examples

fields:
  - name: foo
    type: INT64
  - name: bar
    type: BYTE_ARRAY
```

=== `default_compression`

The default compression type to use for fields.

*Type*: `string`

*Default*: `"uncompressed"`

Options: `uncompressed`, `snappy`, `gzip`, `brotli`, `zstd`, `lz4raw`.

=== `default_encoding`

The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support `DELTA_LENGTH_BYTE_ARRAY`, so it is best left unset where possible.

*Type*: `string`

*Default*: `"DELTA_LENGTH_BYTE_ARRAY"`

Requires version 4.11.0 or newer.

Options: `DELTA_LENGTH_BYTE_ARRAY`, `PLAIN`.
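
The fields above can be combined in a single schema. The following is a minimal sketch, not taken from the upstream examples, that shows a nested group column (`fields` with no `type`), `optional` and `repeated` columns, and an explicit `PLAIN` default encoding for readers that cannot handle `DELTA_LENGTH_BYTE_ARRAY`. All column names are hypothetical. The sketch also assumes that messages reach the processor already batched (for example through an input-level batching policy), because `parquet_encode` encodes a batch of messages rather than individual messages.

```yml
pipeline:
  processors:
    - parquet_encode:
        schema:
          # Hypothetical column names, for illustration only.
          - name: id
            type: INT64
          - name: name
            type: UTF8       # Logical type specified via `type`
            optional: true   # Column may be absent or null
          - name: tags
            type: UTF8
            repeated: true   # A list of UTF8 values
          - name: location   # Group column: has child fields, so no `type`
            optional: true
            fields:
              - name: lat
                type: DOUBLE
              - name: lon
                type: DOUBLE
        default_compression: snappy
        # Only set this if downstream readers lack DELTA_LENGTH_BYTE_ARRAY support.
        default_encoding: PLAIN
```

Under these assumptions, a message such as `{"id":1,"name":"foo","tags":["a","b"],"location":{"lat":51.5,"lon":-0.12}}` would map onto this schema, with `name` and `location` allowed to be missing.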