= parquet_encode
:type: processor
:status: experimental
:categories: ["Parsing"]


////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/<provider>.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.


component_type_dropdown::[]


Encodes https://parquet.apache.org/docs/[Parquet files^] from a batch of structured messages.

Introduced in version 4.4.0.


[tabs]
======
Common::
+
--

```yml
# Common config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
```

--
Advanced::
+
--

```yml
# All config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
  default_encoding: DELTA_LENGTH_BYTE_ARRAY
```

--
======

This processor uses https://github.com/parquet-go/parquet-go[https://github.com/parquet-go/parquet-go^], which is itself experimental. As a result, the behavior of this processor could change outside of major version releases.


== Examples

[tabs]
======
Writing Parquet Files to AWS S3::
+
--

In this example we use the batching mechanism of an `aws_s3` output to collect a batch of messages in memory. The `parquet_encode` processor then converts the batch into a Parquet file, which the output uploads.

```yaml
output:
  aws_s3:
    bucket: TODO
    path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet'
    batching:
      count: 1000
      period: 10s
      processors:
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd
```

--
======
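
The same processor can also be tried without an S3 bucket. The following is a minimal sketch, assuming the `generate` input (with its `batch_size` field) and the `file` output (with the `all-bytes` codec) are available in your build. The generated values and the output path are made up for illustration.

```yaml
# Hypothetical local test pipeline; values and output path are illustrative.
input:
  generate:
    count: 100        # generate a bounded number of messages, then stop
    batch_size: 100   # group the generated messages into a batch
    mapping: |
      root.id = random_int()
      root.weight = 1.5
      root.content = "hello world"

pipeline:
  processors:
    # Encodes the batch of structured messages as Parquet.
    - parquet_encode:
        schema:
          - name: id
            type: INT64
          - name: weight
            type: DOUBLE
          - name: content
            type: BYTE_ARRAY
        default_compression: zstd

output:
  file:
    path: ./example.parquet
    codec: all-bytes   # write the encoded message to the file in full
```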

== Fields

=== `schema`

The Parquet schema, given as a list of column definitions.


*Type*: `array`


=== `schema[].name`

The name of the column.


*Type*: `string`


=== `schema[].type`

The type of the column. This is only applicable to leaf columns, which have no child fields. Some logical types, such as `UTF8`, can also be specified here.


*Type*: `string`


Options: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `UTF8`.
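
As a rough illustration of how these options might be used, the sketch below declares one leaf column per type. The column names are hypothetical.

```yaml
# Column names are made up for illustration.
schema:
  - name: active
    type: BOOLEAN
  - name: retries
    type: INT32
  - name: id
    type: INT64
  - name: ratio
    type: FLOAT
  - name: weight
    type: DOUBLE
  - name: payload
    type: BYTE_ARRAY   # raw bytes
  - name: title
    type: UTF8         # logical string type
```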

=== `schema[].repeated`

Whether the field is repeated, meaning that it can hold zero or more values per row.


*Type*: `bool`

*Default*: `false`

=== `schema[].optional`

Whether the field is optional, meaning that its value can be absent from a row.


*Type*: `bool`

*Default*: `false`
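
The `repeated` and `optional` flags can be sketched as follows. The column names are hypothetical, and the sample message in the comment is an assumption about what matching input might look like, not output captured from the processor.

```yaml
# A message such as {"id": 1, "tags": ["a", "b"]} is assumed to fit this schema:
# `tags` holds a list of values and `nickname` may be left out.
schema:
  - name: id
    type: INT64
  - name: tags
    type: UTF8
    repeated: true
  - name: nickname
    type: UTF8
    optional: true
```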

=== `schema[].fields`

A list of child fields.


*Type*: `array`


```yml
# Examples

fields:
  - name: foo
    type: INT64
  - name: bar
    type: BYTE_ARRAY
```
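
For context, a child field list sits under a parent column within the top-level `schema`. The following is a minimal sketch with hypothetical column names. It assumes that a parent column is declared with `fields` and without a `type`, since `type` applies only to leaf columns.

```yaml
# Hypothetical schema: `address` is a parent column, so it has `fields` and no `type`.
schema:
  - name: id
    type: INT64
  - name: address
    fields:
      - name: city
        type: UTF8
      - name: postcode
        type: UTF8
        optional: true
```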

=== `default_compression`

The default compression type to use for fields.


*Type*: `string`

*Default*: `"uncompressed"`

Options: `uncompressed`, `snappy`, `gzip`, `brotli`, `zstd`, `lz4raw`.

=== `default_encoding`

The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support `DELTA_LENGTH_BYTE_ARRAY` and is therefore best left unset where possible.


*Type*: `string`

*Default*: `"DELTA_LENGTH_BYTE_ARRAY"`

Requires version 4.11.0 or newer.

Options: `DELTA_LENGTH_BYTE_ARRAY`, `PLAIN`.
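
If a downstream reader is known not to support `DELTA_LENGTH_BYTE_ARRAY`, the default can be overridden. This is a minimal sketch with a hypothetical single-column schema:

```yaml
parquet_encode:
  schema:
    - name: content   # hypothetical column name
      type: BYTE_ARRAY
  # Only set this when the consuming library cannot read DELTA_LENGTH_BYTE_ARRAY.
  default_encoding: PLAIN
```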