S3, GCS and Azure Datalake source connectors are ignoring objects

It looks like your configuration is using the default bucket object ordering, which is alphanumeric. This is generally the most efficient approach, as it takes advantage of S3’s built-in ability to list objects starting from a watermark in lexicographical order. However, the connector only looks at names that sort after the last processed watermark, so an object that arrives later but whose name sorts before that watermark is never picked up.
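To make the skipping behaviour concrete, here is a minimal sketch that simulates watermark-based listing in plain Python. It is not the connector's code: the function poll_alphanumeric, the in-memory bucket, and the object names are all hypothetical, and the listing is reduced to "return every key that sorts after the watermark".

```python
def poll_alphanumeric(bucket_keys, watermark):
    """Return keys that sort strictly after the watermark, in lexicographic order.

    Hypothetical stand-in for a "list objects starting after this key" request.
    """
    return sorted(k for k in bucket_keys if watermark is None or k > watermark)


# First poll: two objects exist, both are returned.
bucket = {"data/2024-01-01.json", "data/2024-01-03.json"}
first = poll_alphanumeric(bucket, None)
watermark = first[-1]  # the last processed key becomes the watermark

# A late object arrives whose name sorts BEFORE the watermark.
bucket.add("data/2024-01-02.json")
second = poll_alphanumeric(bucket, watermark)

print(first)   # both initial objects
print(second)  # empty: the late object is never listed
```

In this simulation the late object is lost for good, because every subsequent poll starts from a watermark that already lies beyond its name.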

From the object names you’re using, it seems this is what is happening in your case. To resolve it, change the ordering by setting connect.s3.ordering.type to LastModified. With this setting, objects are processed in order of their last-modified timestamps rather than their names, so a late-arriving object is picked up regardless of how its name sorts. You can find more details in our documentation here.
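The effect of ordering by modification time can be sketched the same way. This is an illustration under stated assumptions, not the connector's implementation: poll_last_modified, the integer timestamps, and the (timestamp, key) watermark are hypothetical simplifications.

```python
def poll_last_modified(objects, watermark):
    """List ALL objects, sort them by (last-modified, key), return the unseen ones.

    objects:   dict mapping key -> last-modified timestamp (hypothetical integers)
    watermark: (timestamp, key) of the last processed object, or None
    """
    ordered = sorted((mtime, key) for key, mtime in objects.items())
    pending = [entry for entry in ordered if watermark is None or entry > watermark]
    new_watermark = pending[-1] if pending else watermark
    return [key for _, key in pending], new_watermark


# First poll: two objects exist.
objects = {"data/2024-01-01.json": 100, "data/2024-01-03.json": 200}
first_keys, watermark = poll_last_modified(objects, None)

# A late object arrives: its name sorts earlier, but its timestamp is newer.
objects["data/2024-01-02.json"] = 300
second_keys, watermark = poll_last_modified(objects, watermark)

print(first_keys)
print(second_keys)  # the late object is picked up this time
```

Note that every poll in this sketch sorts the full listing, which is where the extra cost of this ordering comes from.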

Please note that using LastModified requires the connector to list all objects and sort them by their modification time, which impacts performance. The impact grows as the number of objects in the bucket increases.