= group_by
:type: processor
:status: stable
:categories: ["Composition"]


////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/<provider>.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.


component_type_dropdown::[]


Splits a xref:configuration:batching.adoc[batch of messages] into N batches, where each resulting batch contains a group of messages determined by a xref:guides:bloblang/about.adoc[Bloblang query].

```yml
# Config fields, showing default values
label: ""
group_by: [] # No default (required)
```

Once the groups are established a list of processors are applied to their respective grouped batch, which can be used to label the batch as per their grouping. Messages that do not pass the check of any specified group are placed in their own group.

The functionality of this processor depends on being applied across messages that are batched. You can find out more about batching xref:configuration:batching.adoc[in this doc].

== Fields

=== `[].check`

A xref:guides:bloblang/about.adoc[Bloblang query] that should return a boolean value indicating whether a message belongs to a given group.


*Type*: `string`


```yml
# Examples

check: this.type == "foo"

check: this.contents.urls.contains("https://benthos.dev/")

check: "true"
```

=== `[].processors`

A list of xref:components:processors/about.adoc[processors] to execute on the newly formed group.


*Type*: `array`

*Default*: `[]`

== Examples

[tabs]
======
Grouped Processing::
+
--

Imagine we have a batch of messages that we wish to split into a group of foos and everything else, which should be sent to different output destinations based on those groupings. We also need to send the foos as a tar gzip archive. For this purpose we can use the `group_by` processor with a xref:components:outputs/switch.adoc[`switch`] output:

```yaml
pipeline:
  processors:
    - group_by:
      - check: content().contains("this is a foo")
        processors:
          - archive:
              format: tar
          - compress:
              algorithm: gzip
          - mapping: 'meta grouping = "foo"'

output:
  switch:
    cases:
      - check: meta("grouping") == "foo"
        output:
          gcp_pubsub:
            project: foo_prod
            topic: only_the_foos
      - output:
          gcp_pubsub:
            project: somewhere_else
            topic: no_foos_here
```

--
======