= parquet_decode :type: processor :status: experimental :categories: ["Parsing"] //// THIS FILE IS AUTOGENERATED! To make changes, edit the corresponding source file under: https://github.com/redpanda-data/connect/tree/main/internal/impl/. And: https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl //// // © 2024 Redpanda Data Inc. component_type_dropdown::[] Decodes https://parquet.apache.org/docs/[Parquet files^] into a batch of structured messages. Introduced in version 4.4.0. ```yml # Config fields, showing default values label: "" parquet_decode: {} ``` This processor uses https://github.com/parquet-go/parquet-go[https://github.com/parquet-go/parquet-go^], which is itself experimental. Therefore changes could be made into how this processor functions outside of major version releases. == Examples [tabs] ====== Reading Parquet Files from AWS S3:: + -- In this example we consume files from AWS S3 as they're written by listening onto an SQS queue for upload events. We make sure to use the `to_the_end` scanner which means files are read into memory in full, which then allows us to use a `parquet_decode` processor to expand each file into a batch of messages. Finally, we write the data out to local files as newline delimited JSON. ```yaml input: aws_s3: bucket: TODO prefix: foos/ scanner: to_the_end: {} sqs: url: TODO processors: - parquet_decode: {} output: file: codec: lines path: './foos/${! meta("s3_key") }.jsonl' ``` -- ======