The sink connector batches records and writes them to the bucket as files. Each file contains the records for a single partition, in offset order.
So, for example, if partition 1 has 10 records, the connector will write a file containing those records and upload it to the GCS bucket; the filename is made up of the partition and the last offset in the file, in this case 1_0000000010.json.
Kafka then delivers the next 10 records in the partition to the sink connector (it might be more, might be less), and the connector uploads a new file with those records. The second file would be named 1_0000000020.json and contain offsets 11 to 20.
To reconcile, we can then check that the file sequence is correct: the files are in order and the offsets in the filenames match the record count per file, e.g.:
1_0000000010.json contains 10 records
1_0000000020.json contains 10 records
10 + 10 = 20, which matches the offset in the name of the last file.
The script to calculate this depends on what tools you have available; in GCP, BigQuery is one option.
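As a minimal sketch, assuming you expose the connector's JSON files to BigQuery through an external table over the bucket (called luggage_files here, a hypothetical name) in newline-delimited JSON format so each record is one row, you can count records per file and parse the partition and last offset out of BigQuery's _FILE_NAME pseudo-column:

WITH per_file AS (
  SELECT
    _FILE_NAME AS file_name,  -- pseudo-column holding the source file URI
    COUNT(*)   AS record_count
  FROM luggage_files
  GROUP BY file_name
),
parsed AS (
  SELECT
    file_name,
    record_count,
    CAST(REGEXP_EXTRACT(file_name, r'(\d+)_\d+\.json$') AS INT64) AS partition_id,
    CAST(REGEXP_EXTRACT(file_name, r'_(\d+)\.json$') AS INT64)    AS last_offset
  FROM per_file
)
SELECT
  partition_id,
  file_name,
  record_count,
  last_offset,
  -- for every file after the first, this delta should equal record_count
  last_offset
    - LAG(last_offset) OVER (PARTITION BY partition_id ORDER BY last_offset) AS offset_delta
FROM parsed
ORDER BY partition_id, last_offset;

Any row where offset_delta differs from record_count points at a gap or duplicate to investigate.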
Once you have these counts, you can compare them against Kafka using Lenses. Head to SQL Studio and run a query to count messages per partition for the topic:
SELECT _meta.partition, COUNT(*)
FROM `airline-luggage`
GROUP BY _meta.partition
You can expand this query to filter by message values or by other metadata from the topic, e.g. the record timestamp.
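For example, here is a sketch that restricts the count to records after a point in time, using the record timestamp metadata (the epoch-milliseconds value is an assumed cut-off, not taken from the topic):

SELECT _meta.partition, COUNT(*)
FROM `airline-luggage`
WHERE _meta.timestamp >= 1700000000000
GROUP BY _meta.partition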
More information about Lenses SQL can be found in the Lenses documentation.