SQL Processors Best Practices

Hello,

I’m creating a SQL Processor and I’m trying to understand scaling runners. What’s really happening under the hood? I understand it’s running on Kubernetes, so does scaling mean more pods? And does that automatically parallelize processing across Kafka topics/partitions, or is there something else we need to do?

Thank you!

Great question! Let me break it down for you:

  • Yes, SQL Processors in Lenses run on Kubernetes. When you scale runners up or down, what you’re really doing is changing the replica count on the underlying Kubernetes deployment. Kubernetes then creates (or terminates) pods to converge on that desired state.

  • Parallelization happens through Kafka consumer groups. Each SQL Processor uses Kafka consumers to read data from its upstream topics, and the maximum parallelism equals the highest partition count among the input topics. Let’s take an example where we join two input topics together:

    • Topic A has 3 partitions

    • Topic B has 5 partitions

    • The maximum parallelism you can achieve in this instance is 5 (assuming that the output topic is not pre-created and the join is key-based).

  • Runner count is set manually. By default, a SQL Processor starts with one runner. Lenses does not automatically scale runners based on topic partitions—you would need to configure the number of runners to match your Kafka infrastructure and performance needs.

  • Advanced configuration allows you to tune pod resources at a finer level (CPU/memory per runner), which can help optimize performance based on throughput, partition size, and retention policies.
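To make the first bullet concrete, here is a toy model of how deployment scaling behaves. This is purely illustrative and assumes nothing about Lenses internals: in real Kubernetes the Deployment and ReplicaSet controllers do the reconciling, but the idea is the same—scaling only rewrites the desired replica count, and a control loop converges the actual pods to match.

```python
# Toy model of Kubernetes deployment scaling (illustrative only; the real
# reconciliation is performed by the Deployment/ReplicaSet controllers).

class ToyDeployment:
    def __init__(self, name: str, replicas: int):
        self.name = name
        self.desired_replicas = replicas
        self.pods: list[str] = []
        self.reconcile()

    def scale(self, replicas: int) -> None:
        # "Scaling runners" only changes the desired state...
        self.desired_replicas = replicas
        self.reconcile()  # ...the control loop then converges actual to desired

    def reconcile(self) -> None:
        # Create pods until actual matches desired
        while len(self.pods) < self.desired_replicas:
            self.pods.append(f"{self.name}-pod-{len(self.pods)}")
        # Terminate surplus pods when scaling down
        while len(self.pods) > self.desired_replicas:
            self.pods.pop()

processor = ToyDeployment("sql-processor", replicas=1)
processor.scale(3)
print(len(processor.pods))  # 3 pods now backing the processor
```

Scaling down works the same way in reverse: drop the desired count, and the surplus pods are terminated.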

So, to summarize: scaling up runners increases pods, but the real limit to parallelism is the number of Kafka partitions. You’ll want to size runners accordingly, up to that maximum; any runners beyond it will sit idle.
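Here is a small sketch of why partitions cap parallelism. It uses a simple round-robin assignment as a stand-in for Kafka’s group coordinator (the exact assignment strategy is an assumption here), applied to the 3-partition/5-partition join example above with 8 runners:

```python
from itertools import cycle

def assign_partitions(partitions: int, runners: list[str]) -> dict[str, list[int]]:
    """Round-robin a topic's partitions across runners, consumer-group style."""
    assignment: dict[str, list[int]] = {r: [] for r in runners}
    for partition, runner in zip(range(partitions), cycle(runners)):
        assignment[runner].append(partition)
    return assignment

# Joined topics with 3 and 5 partitions: parallelism is capped by the larger.
max_parallelism = max(3, 5)  # 5

# 8 runners reading the 5-partition side: only 5 get work, the rest sit idle.
assignment = assign_partitions(max_parallelism, [f"runner-{i}" for i in range(8)])
idle = [r for r, parts in assignment.items() if not parts]
print(len(idle))  # 3 idle runners
```

In other words, effective parallelism is `min(runner count, partition count)`, which is why sizing runners past the partition count buys you nothing.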