Is it possible to configure the Lenses AWS S3 connector to create the same file without the partition offset, and overwrite the old file if it exists?
Hi Asaf,
Before addressing your question, it would be beneficial to gain a better understanding of your specific scenario.
To answer your question: generating the same file name again is intentionally not supported. Let me explain. The connector guarantees exactly-once writes to S3, and that guarantee requires the offset to be part of the file name. If the offset were stored separately from the file, every flush would become a dual write (the data file plus the offset record), which would require a more complex implementation along the lines of a two-phase commit.
In the event of a connector task restart, a common occurrence in distributed systems, the sink must resume where it left off. Because exactly-once delivery in Kafka is hard to guarantee end to end, the same record may be handed to the sink more than once. To address this, the connector task consults its source of truth, S3, on restart to determine the last offset written. Records at or below that offset are skipped, so nothing is written twice and data integrity is maintained.
In summary, the connector is intentionally designed to store the offset as part of the filename in order to achieve exactly-once semantics. Therefore, for a given topic partition, the same filename will never be used twice.
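To make the restart behaviour described above more concrete, here is a minimal Python sketch of the idea. It is not the connector's actual code: the key layout `<topic>/<partition>/<offset>.json` and the helper names `last_committed_offset` and `records_to_write` are assumptions made purely for illustration, and a plain list of keys stands in for an S3 listing.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical key layout: <topic>/<partition>/<offset>.json
# (an assumption for this sketch, not the connector's real naming scheme)
KEY_PATTERN = re.compile(r"^(?P<topic>[^/]+)/(?P<partition>\d+)/(?P<offset>\d+)\.json$")


def last_committed_offset(keys: Iterable[str], topic: str, partition: int) -> Optional[int]:
    """Return the highest offset encoded in the keys for one topic partition,
    or None if nothing has been written for it yet."""
    offsets = []
    for key in keys:
        match = KEY_PATTERN.match(key)
        if match and match["topic"] == topic and int(match["partition"]) == partition:
            offsets.append(int(match["offset"]))
    return max(offsets, default=None)


def records_to_write(records: Iterable[Tuple[int, str]],
                     last_offset: Optional[int]) -> List[Tuple[int, str]]:
    """Drop records whose offset is already covered by a file in the bucket."""
    return [(offset, value) for offset, value in records
            if last_offset is None or offset > last_offset]


# Example: after a restart the task is handed offsets 98..100 again,
# but the bucket already contains a file ending at offset 99.
existing_keys = ["config/0/41.json", "config/0/99.json", "config/1/7.json"]
resume_from = last_committed_offset(existing_keys, "config", 0)
pending = records_to_write([(98, "a"), (99, "b"), (100, "c")], resume_from)
```

Because the offset lives in the object key, a single write both stores the data and records progress. A fixed file name would lose that information, which is why overwriting the same file is at odds with this design.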
I hope this clarifies the rationale behind the design decision. I look forward to understanding more about your specific scenario so I can assist further.
Oh, I understand what you are saying.
We wanted to upload a configuration file from Postgres to S3, using the connector to write the configuration file on AWS.
I see that it is not OK to write the same file all the time.