Reset offset files in S3 for Lenses s3 sink connector

Hi Team! I have added an S3 sink connector in MSK Connect. When I remove the connector and create it again, the .indexes folder that holds the offsets in the S3 bucket is not removed. Is there any configuration to remove that .indexes folder automatically? If it is not removed, I see an invalid task warning caused by an offset mismatch when I recreate the connector.

Hello Shaik_ahmed,

Your inquiry raises a valid scenario. Currently, the Connect framework lacks a mechanism to inform a connector when it is being removed. When combined with the frequent restarts of tasks (due to worker failures or the addition of new connectors to the cluster), the sink connector finds itself unable to distinguish between a connector being recreated and a task restart event.

When an offset mismatch occurs, the connector has no contextual information to tell whether it was recreated or whether there is a genuine bug or corruption, so it should not guess. Discarding data, especially a large volume of it, is not acceptable without certainty about the nature of the event.

For now, if you opt to recreate the connector, we recommend wiping the target bucket and the index folder associated with the connector to ensure data integrity.
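
As a rough illustration of the manual cleanup described above, here is a minimal Python sketch that deletes every object under a given prefix. It assumes boto3 is installed and credentials are configured. The helper name `delete_prefix`, the bucket name, and the exact layout under `.indexes/` are assumptions for illustration, so inspect your bucket before deleting anything. Stop or delete the connector first so no task is writing while you clean up.

```python
from typing import Any


def delete_prefix(s3_client: Any, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix` in `bucket`; return how many were deleted.

    `s3_client` is expected to behave like a boto3 S3 client.
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without "Contents" means nothing matched the prefix.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # list_objects_v2 returns at most 1000 keys per page, which is also
        # the maximum batch size accepted by delete_objects.
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": keys, "Quiet": True},
        )
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"Failed to delete {len(errors)} object(s): {errors}")
        deleted += len(keys)
    return deleted


# Example usage (hypothetical bucket name; the prefix is an assumption):
#   import boto3
#   s3 = boto3.client("s3")
#   removed = delete_prefix(s3, "my-sink-bucket", ".indexes/")
#   print(f"Removed {removed} index objects")
```

The same helper can be pointed at the prefix where the connector writes its data if you also want to wipe the target files. On a versioned bucket this only adds delete markers, so older object versions would need separate handling.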

Stefan

Hi stheppi,

Thanks for your input. Is there any possibility of adding such a feature in the future?

Thanks,
Samdan

Hi shaik_ahmed,

At this point, we will have to investigate. But as I said, Kafka Connect gives a sink connector no way to tell whether it is starting for the first time or has already run before.

Stefan
