How does Lenses infer message Key/Value Serialisation type for a topic?

Hi Team!

Because of a problem with our configuration we had to rebuild the Lenses database by dropping and reimporting the configuration. However, after Lenses was running again, we detected that the types of keys and values for our topics were different. For example, a topic using Long keys and Avro payloads is now shown as key=Bytes and value=JSON. Could you please explain how key and value types are inferred and what might have gone wrong here?

Thank You!

Thank you for your question. Lenses uses two primary methods to infer the key and value types for Kafka topics:

  1. Auto-Detection via Interval Scanning: Lenses periodically scans topics to detect their data types. The frequency of this scan is controlled by the lenses.interval.type.detection configuration setting. Scanning involves sampling messages from the end of the log of each topic (specifically, from partition 0).

  2. Schema Registry Events: Lenses also detects when a new schema is added to the Schema Registry. When this happens, it immediately attempts to associate that schema with a topic. This process is most effective when your Schema Registry is using the TopicName subject naming strategy, as it creates a clear link between the schema and the topic.
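To illustrate the link that the TopicName strategy creates, here is a minimal sketch of the naming convention, where the subjects for a topic are "<topic>-key" and "<topic>-value". The helper names are hypothetical and this is not Lenses code; it only shows why a subject can be traced back to a topic under this strategy, and why it cannot under record-based strategies.

```python
from typing import Optional, Tuple


def subjects_for_topic(topic: str) -> dict:
    """Subjects a TopicName-strategy producer would register for a topic."""
    return {"key": f"{topic}-key", "value": f"{topic}-value"}


def topic_for_subject(subject: str) -> Optional[Tuple[str, str]]:
    """Map a subject back to (topic, part), or None if it does not follow
    the "<topic>-key" / "<topic>-value" convention."""
    for suffix, part in (("-key", "key"), ("-value", "value")):
        if subject.endswith(suffix) and len(subject) > len(suffix):
            return subject[: -len(suffix)], part
    return None


print(subjects_for_topic("payments"))
print(topic_for_subject("payments-value"))
print(topic_for_subject("com.acme.Payment"))  # record-name style subject
```

A subject such as "com.acme.Payment" carries no topic name, so there is nothing to match it against.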

Important Note: Lenses will not overwrite an existing key/value type configuration. Once a type has been set (either automatically by Lenses or manually by a user), it will be skipped in future scans. This ensures that your manual configurations are always respected.

By recreating the database, you discarded the previously stored key/value types, so Lenses had to auto-detect the types for every topic again.

You wrote: “For example, a topic using Long keys and Avro payloads is now shown as key=Bytes and value=JSON.”

A value/payload type of JSON is inferred for a topic whenever the messages sampled by the auto-detection process are valid JSON. I see two reasons why a wrong value format could be shown for this topic:

  • A Misconfigured Producer: It’s possible that a producer application is erroneously sending JSON-formatted messages to this topic instead of the expected Avro format.
  • Manual Override: A user may have manually set the topic’s value format to JSON within Lenses. If so, the audit logs should contain a record of the change (see the attached image for reference).
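To check the first possibility, you could inspect a few raw record values from the end of the topic. The sketch below is a diagnostic helper, not Lenses’ detection logic. It assumes that your Avro producers use the Confluent Schema Registry wire format (a zero magic byte followed by a 4-byte big-endian schema ID); the function name is hypothetical, and fetching the raw bytes from Kafka is left out.

```python
import json
import struct
from typing import Optional


def describe_value(payload: Optional[bytes]) -> str:
    """Roughly classify a raw record value taken from a topic."""
    if payload is None:
        return "null"
    # Assumed Confluent wire format: magic byte 0, then a 4-byte schema ID.
    if len(payload) >= 5 and payload[0] == 0:
        schema_id = struct.unpack(">I", payload[1:5])[0]
        return f"confluent-avro(schema_id={schema_id})"
    try:
        json.loads(payload.decode("utf-8"))
        return "json"
    except ValueError:  # covers invalid UTF-8 and invalid JSON
        return "unknown"


print(describe_value(b'{"id": 1}'))
print(describe_value(b"\x00\x00\x00\x00\x2a\x02"))
```

If the most recent records on partition 0 come back as "json", a producer is most likely writing JSON to the topic.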

Similarly, here are the likely reasons the key is being identified as Bytes instead of Long:

  • Incorrect Data Size: For Lenses to deserialise the key as a Long, the byte array must be exactly 8 bytes long. If the key is any other size and does not match another type, Lenses falls back to Bytes.
  • Manual Override [excluded]: As with the value, a user may have manually set the key’s format to Bytes.
  • Sampling Delay: It’s highly unlikely, but there could be a delay in Lenses sampling the topic. This usually resolves within a few minutes for an active topic.
  • Null Key Records: If the sampled records have a null key, Lenses will infer Bytes.
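The size rule and the null-key rule above can be sketched as follows. This is a simplified illustration that only distinguishes Long from Bytes; the function name is hypothetical, and the actual detection in Lenses considers other types as well.

```python
import struct
from typing import Optional


def guess_key_type(key: Optional[bytes]) -> str:
    """Simplified Long-versus-Bytes decision for a sampled record key."""
    if key is None:
        return "Bytes"  # null keys give no evidence of a Long
    if len(key) == 8:
        return "Long"   # a serialised Long is exactly 8 bytes
    return "Bytes"      # any other size cannot be a Long


# Kafka's LongSerializer writes an 8-byte big-endian value.
print(guess_key_type(struct.pack(">q", 1234567890123)))
# A producer that sends the number as text yields a different size.
print(guess_key_type(b"1234567890123"))
print(guess_key_type(None))
```

A common cause of the size mismatch is a producer that serialises the numeric key as a string rather than as a Long.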

You can also trigger auto-detection again for a single topic by navigating to the topic and choosing “Reset Types” from the topic actions menu.

I hope this explains why the detected topic types might differ from what you expected. Please let me know if you have any more questions.

Cheers,
Sebastian
