I have a topic that I’m sinking into an S3 bucket in AVRO format, but it’s very slow. Is there a way to increase the data throughput?
Yes, you may be able to increase throughput by increasing parallelism, but there are a few points to watch. First, make sure you are running the latest version of your sink or source connector from the lensesio/stream-reactor repository on GitHub (a collection of open source Apache 2.0 Kafka connectors maintained by Lenses.io). How much parallelism helps depends on the number of partitions in your topic and the number of connector tasks; ideally, the two should be equal.
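As a rough sketch of what matching tasks to partitions looks like in a sink connector configuration: the connector name, topic, bucket, prefix, and the partition count of 6 below are made-up placeholders, and the connector class and KCQL property name should be checked against the documentation for the connector version you are running. `tasks.max` is the standard Kafka Connect setting for the number of tasks.

```properties
# Illustrative only: name, topic, bucket and prefix are hypothetical placeholders.
name=s3-sink-example
# Assumed class name for the Stream Reactor S3 sink; verify for your version.
connector.class=io.lenses.streamreactor.connect.aws.s3.sink.S3SinkConnector
# Assuming the topic has 6 partitions, run 6 tasks so each task handles one partition.
tasks.max=6
topics=my_topic
# Assumed KCQL property name; verify for your version.
connect.s3.kcql=INSERT INTO my-bucket:my-prefix SELECT * FROM my_topic STOREAS `AVRO`
```

For a sink connector, setting `tasks.max` higher than the partition count does not help, because the extra tasks have no partitions assigned and sit idle. If you need more parallelism than that, the topic itself needs more partitions.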
Additionally, you can change the number of messages per poll, which is set to 1024 by default, by adding a LIMIT clause to the KCQL statement. For example:
insert into dest_topic select * from source_bucket storeas Avro LIMIT 10000
You also need to consider the network bandwidth available to your Kafka brokers and the type of disk you are using, as both can be limiting factors. The documentation for the S3 sink and source connectors also covers this in a lot of detail: