How to parallelize sinking data from Kafka to S3 using the Lenses S3 Connector?

ejhon_s · December 11, 2023, 11:33pm

I have a topic that I’m sinking into an S3 bucket in AVRO format, but it’s very slow. Is there a way to increase the data throughput?

grudtnerv · December 11, 2023, 11:41pm

Yes, you might be able to increase throughput by increasing parallelism, but you need to pay attention to a few points. First, make sure you are running the latest version of the sink or source connector, which you can get from the lensesio/stream-reactor repository on GitHub (a collection of open source Apache 2.0 Kafka connectors maintained by Lenses.io). How far parallelism helps depends on the number of partitions in your topic and the number of tasks; ideally, the two should be equal. Each partition is consumed by only one task at a time, so tasks beyond the partition count sit idle, while fewer tasks means each one has to handle several partitions.
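
To make the partitions-and-tasks point concrete, here is a minimal sketch of a sink configuration, assuming a topic with 6 partitions. The connector name, topic name, and bucket name are hypothetical placeholders, and the connector class and KCQL form are what I would expect for recent stream-reactor releases, so check them against the sink documentation for your version. Any authentication and region settings your environment needs are left out.

```json
{
  "name": "example-s3-sink",
  "config": {
    "connector.class": "io.lenses.streamreactor.connect.aws.s3.sink.S3SinkConnector",
    "tasks.max": "6",
    "topics": "example-topic",
    "connect.s3.kcql": "INSERT INTO example-bucket SELECT * FROM example-topic STOREAS `AVRO`"
  }
}
```

Here `tasks.max` is the standard Kafka Connect setting that caps the number of tasks, and it is set to match the assumed partition count. If your topic has fewer partitions than the parallelism you want, you would need to add partitions first.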

Additionally, you can change the number of messages per poll, which is set to 1024 by default, by adding a LIMIT clause to the KCQL statement in the connect.s3.kcql parameter. For example:

"connect.s3.kcql": "insert into dest_topic select * from source_bucket storeas Avro LIMIT 10000"

You also need to consider the network capacity of your Kafka brokers and the type of disk they use, as these are limiting factors. The documentation for the S3 sink and source covers the configuration in much more detail:

Sink: Kafka to AWS S3 open source connector | Lenses.io Documentation
Source: AWS S3 to Kafka open source connector | Lenses.io Documentation
