= pg_stream
:type: input
:status: beta
:categories: ["Services"]

////
     THIS FILE IS AUTOGENERATED!

     To make changes, edit the corresponding source file under:

     https://github.com/redpanda-data/connect/tree/main/internal/impl/.

     And:

     https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
////

// © 2024 Redpanda Data Inc.

component_type_dropdown::[]


Streams changes from a PostgreSQL database using logical replication.

Introduced in version 4.39.0.


[tabs]
======
Common::
+
--

```yml
# Common config fields, showing default values
input:
  label: ""
  pg_stream:
    dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # No default (required)
    batch_transactions: true
    stream_snapshot: false
    snapshot_memory_safety_factor: 1
    snapshot_batch_size: 0
    schema: public # No default (required)
    tables: [] # No default (required)
    checkpoint_limit: 1024
    temporary_slot: false
    slot_name: ""
    pg_standby_timeout: 10s
    pg_wal_monitor_interval: 3s
    max_parallel_snapshot_tables: 1
    auto_replay_nacks: true
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
```

--
Advanced::
+
--

```yml
# All config fields, showing default values
input:
  label: ""
  pg_stream:
    dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # No default (required)
    batch_transactions: true
    stream_snapshot: false
    snapshot_memory_safety_factor: 1
    snapshot_batch_size: 0
    schema: public # No default (required)
    tables: [] # No default (required)
    checkpoint_limit: 1024
    temporary_slot: false
    slot_name: ""
    pg_standby_timeout: 10s
    pg_wal_monitor_interval: 3s
    max_parallel_snapshot_tables: 1
    auto_replay_nacks: true
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

--
======

Streams changes from a PostgreSQL database for Change Data Capture (CDC). Additionally, if `stream_snapshot` is set to `true`, the existing data in the database is streamed as well.

== Metadata

This input adds the following metadata fields to each message:

- mode (Either "streaming" or "snapshot", indicating whether the message is part of a streaming operation or snapshot processing)
- table (Name of the table that the message originated from)
- operation (Type of operation that generated the message, such as INSERT, UPDATE, or DELETE)
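As an illustrative sketch (the table name and the `_cdc` payload layout are assumptions for the example, not part of this component's contract), a pipeline can copy these metadata fields into the payload with a `mapping` processor so that downstream components can route on them:

```yml
# A minimal sketch: enrich each change event with its CDC metadata.
input:
  pg_stream:
    dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable
    schema: public
    tables: [ my_table ] # hypothetical table

pipeline:
  processors:
    - mapping: |
        root = this
        root._cdc.mode = @mode           # "streaming" or "snapshot"
        root._cdc.table = @table         # source table name
        root._cdc.operation = @operation # INSERT, UPDATE, or DELETE
```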
== Fields

=== `dsn`

The Data Source Name for the PostgreSQL database in the form of `postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]`. Please note that Postgres enforces SSL by default; you can override this with the parameter `sslmode=disable` if required.


*Type*: `string`


```yml
# Examples

dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable
```

=== `batch_transactions`

When set to `true`, transactions are batched into a single message.


*Type*: `bool`

*Default*: `true`

=== `stream_snapshot`

When set to `true`, the plugin first streams a snapshot of all existing data in the database before streaming changes. To use this feature, the tables being snapshotted MUST have a primary key set so that reading from them can be parallelized.


*Type*: `bool`

*Default*: `false`

```yml
# Examples

stream_snapshot: true
```

=== `snapshot_memory_safety_factor`

Determines the fraction of available memory that can be used for streaming the snapshot. Values between 0 and 1 represent the percentage of memory to use. Lower values make initial streaming slower but help prevent out-of-memory errors.


*Type*: `float`

*Default*: `1`

```yml
# Examples

snapshot_memory_safety_factor: 0.2
```

=== `snapshot_batch_size`

The number of rows to fetch in each batch when querying the snapshot. A value of 0 lets the plugin determine the batch size based on the `snapshot_memory_safety_factor` property.


*Type*: `int`

*Default*: `0`

```yml
# Examples

snapshot_batch_size: 10000
```

=== `schema`

The PostgreSQL schema from which to replicate data.


*Type*: `string`


```yml
# Examples

schema: public
```

=== `tables`

A list of table names to include in the logical replication. Each table should be specified as a separate item.


*Type*: `array`


```yml
# Examples

tables:
  - my_table
  - my_table_2
```

=== `checkpoint_limit`

The maximum number of messages that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level. To preserve at-least-once delivery guarantees, any given LSN is not acknowledged until all messages under that offset have been delivered.


*Type*: `int`

*Default*: `1024`

=== `temporary_slot`

If set to `true`, creates a temporary replication slot that is automatically dropped when the connection is closed.


*Type*: `bool`

*Default*: `false`

=== `slot_name`

The name of the PostgreSQL logical replication slot to use. If not provided, a random name will be generated. You can create this slot manually before starting replication if desired.


*Type*: `string`

*Default*: `""`

```yml
# Examples

slot_name: my_test_slot
```
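As a worked sketch of the fields above (the values are illustrative assumptions, not recommendations), a deployment that takes an initial snapshot and then resumes from a named, durable slot across restarts might look like:

```yml
input:
  pg_stream:
    dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable
    schema: public
    tables: [ my_table ]            # hypothetical table
    slot_name: my_test_slot         # durable slot, reused across restarts
    temporary_slot: false
    stream_snapshot: true           # emit existing rows before streaming changes
    snapshot_batch_size: 10000      # fixed batch size; 0 defers to snapshot_memory_safety_factor
    max_parallel_snapshot_tables: 4 # snapshot up to four tables at once
```

Because the slot is durable, restarting the pipeline should resume from the last acknowledged LSN rather than re-reading the WAL from scratch.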
=== `pg_standby_timeout`

Specify the standby timeout before refreshing an idle connection.


*Type*: `string`

*Default*: `"10s"`

```yml
# Examples

pg_standby_timeout: 30s
```

=== `pg_wal_monitor_interval`

How often to report changes to the replication lag.


*Type*: `string`

*Default*: `"3s"`

```yml
# Examples

pg_wal_monitor_interval: 6s
```

=== `max_parallel_snapshot_tables`

The number of tables to process in parallel during the snapshot processing stage.


*Type*: `int`

*Default*: `1`

=== `auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.


*Type*: `bool`

*Default*: `true`

=== `batching`

Allows you to configure a xref:configuration:batching.adoc[batching policy].


*Type*: `object`


```yml
# Examples

batching:
  byte_size: 5000
  count: 0
  period: 1s

batching:
  count: 10
  period: 1s

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

=== `batching.count`

The number of messages at which the batch should be flushed. A value of `0` disables count-based batching.


*Type*: `int`

*Default*: `0`

=== `batching.byte_size`

The number of bytes at which the batch should be flushed. A value of `0` disables size-based batching.


*Type*: `int`

*Default*: `0`

=== `batching.period`

A period in which an incomplete batch should be flushed regardless of its size.


*Type*: `string`

*Default*: `""`

```yml
# Examples

period: 1s

period: 1m

period: 500ms
```

=== `batching.check`

A xref:guides:bloblang/about.adoc[Bloblang query] that should return a boolean value indicating whether a message should end a batch.


*Type*: `string`

*Default*: `""`

```yml
# Examples

check: this.type == "end_of_transaction"
```

=== `batching.processors`

A list of xref:components:processors/about.adoc[processors] to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch; therefore splitting the batch into smaller batches using these processors is a no-op.


*Type*: `array`


```yml
# Examples

processors:
  - archive:
      format: concatenate

processors:
  - archive:
      format: lines

processors:
  - archive:
      format: json_array
```
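Tying the batching fields together, here is a sketch (the count and period are assumed values, and `my_table` is a placeholder) that flushes a batch every second or every 10,000 messages, whichever comes first, and archives each flush as a single JSON array message:

```yml
input:
  pg_stream:
    dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable
    schema: public
    tables: [ my_table ]   # hypothetical table
    batching:
      count: 10000         # flush after 10,000 messages...
      period: 1s           # ...or after one second, whichever comes first
      processors:
        - archive:
            format: json_array # emit each batch as one JSON array message
```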