S3 Source connector - java.lang.IllegalStateException when using Regex

I’m trying to use the Lenses Stream Reactor S3 Source to restore data from S3 to MSK. My topic data is stored as bucket/prefix/topicname/partition=0/topicname+0+0000000000.json.

I have configured the following regex:

connect.s3.source.partition.extractor.type=regex
connect.s3.source.partition.extractor.regex=(?i)^(?:.*)\/(partition=[0-9]*)\/(?:[0-9]*)[.](?:Json|Avro|Parquet|Text|Csv|Bytes)$

I can see from the connector logs that it lists the files from S3 in the correct format, but it then throws a java.lang.IllegalStateException from scala.util.matching.Regex$MatchIterator.
Below is the stack trace:

[Worker-0acc4ac8eced061c4] java.lang.IllegalStateException
[Worker-0acc4ac8eced061c4] 	at scala.util.matching.Regex$MatchIterator.ensure(Regex.scala:848)
[Worker-0acc4ac8eced061c4] 	at scala.util.matching.Regex$MatchIterator.start(Regex.scala:858)
[Worker-0acc4ac8eced061c4] 	at scala.util.matching.Regex$MatchData.group(Regex.scala:660)
[Worker-0acc4ac8eced061c4] 	at scala.util.matching.Regex$MatchData.group$(Regex.scala:659)
[Worker-0acc4ac8eced061c4] 	at scala.util.matching.Regex$MatchIterator.group(Regex.scala:805)
[Worker-0acc4ac8eced061c4] 	at io.lenses.streamreactor.connect.aws.s3.model.RegexPartitionExtractor.extract(PartitionExtractor.scala:33)
[Worker-0acc4ac8eced061c4] 	at io.lenses.streamreactor.connect.aws.s3.source.config.SourceBucketOptions.$anonfun$getPartitionExtractorFn$4(S3SourceConfig.scala:65)

There are 2 issues with your regex.

Problem 1
Your regex has the string “partition=” inside the capturing group for the partition, but we only want to extract the numeric value of the partition. Move it outside the group:

connect.s3.source.partition.extractor.regex=(?i)^(?:.*)\/partition=([0-9]*)\/(?:[0-9]*)[.](?:Json|Avro|Parquet|Text|Csv|Bytes)$
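
To illustrate the difference between the two capture groups, here is a minimal Java sketch. It assumes java.util.regex behaves the same as the Scala Regex in your stack trace (Scala's Regex is a wrapper around it). The key is a hypothetical one with a digits-only filename, chosen so that both patterns match, and the class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionGroupDemo {
    // Original pattern: "partition=" sits inside the capturing group.
    static final String ORIGINAL =
        "(?i)^(?:.*)\\/(partition=[0-9]*)\\/(?:[0-9]*)[.](?:Json|Avro|Parquet|Text|Csv|Bytes)$";

    // Corrected pattern: only the digits are captured.
    static final String FIXED =
        "(?i)^(?:.*)\\/partition=([0-9]*)\\/(?:[0-9]*)[.](?:Json|Avro|Parquet|Text|Csv|Bytes)$";

    // Returns the text of capture group 1, or null when the pattern does not match.
    static String firstGroup(String regex, String key) {
        Matcher m = Pattern.compile(regex).matcher(key);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical key with a digits-only filename (not the naming from the question).
        String key = "prefix/topicname/partition=0/0000000000.json";
        System.out.println(firstGroup(ORIGINAL, key)); // prints "partition=0"
        System.out.println(firstGroup(FIXED, key));    // prints "0"
    }
}
```

The first result is not a number, so it cannot be used as a partition; the second one can.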

Problem 2

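Your regex tries to match the filename and fails: (?:[0-9]*)[.] only accepts digits before the extension, but your files are named like topicname+0+0000000000.json, so the pattern never matches. Matching the filename is probably unnecessary anyway. As we only need to extract the partition, you can massively simplify the regex: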

"(?i)^(?:.*)\\/partition=([0-9]*)\\/.*$"

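As a quick sanity check, the sketch below runs the original and the simplified pattern against the path from your question. It is written in Java on the assumption that java.util.regex behaves like the Scala Regex in your stack trace, and it uses find() because the trace goes through MatchIterator; how the connector calls the regex internally is not something I have verified. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionRegexCheck {
    // Pattern from the question: the filename part only accepts digits before the extension.
    static final String ORIGINAL =
        "(?i)^(?:.*)\\/(partition=[0-9]*)\\/(?:[0-9]*)[.](?:Json|Avro|Parquet|Text|Csv|Bytes)$";

    // Simplified pattern: captures the partition digits and ignores the filename.
    static final String SIMPLIFIED = "(?i)^(?:.*)\\/partition=([0-9]*)\\/.*$";

    // Path as given in the question.
    static final String KEY = "bucket/prefix/topicname/partition=0/topicname+0+0000000000.json";

    // True when the pattern is found in the key.
    static boolean matches(String regex, String key) {
        return Pattern.compile(regex).matcher(key).find();
    }

    // Extracts the partition number, failing with a readable message when there is no match.
    static int extractPartition(String regex, String key) {
        Matcher m = Pattern.compile(regex).matcher(key);
        if (!m.find()) {
            throw new IllegalArgumentException("No partition found in key: " + key);
        }
        return Integer.parseInt(m.group(1));
    }

    // Reads group 1 without checking the match result first and reports
    // whether that raised IllegalStateException.
    static boolean groupWithoutMatchThrows(String regex, String key) {
        Matcher m = Pattern.compile(regex).matcher(key);
        m.find(); // result deliberately ignored
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(ORIGINAL, KEY));                 // prints "false"
        System.out.println(extractPartition(SIMPLIFIED, KEY));      // prints "0"
        System.out.println(groupWithoutMatchThrows(ORIGINAL, KEY)); // prints "true"
    }
}
```

The last call reads a group from a matcher that did not match, which raises IllegalStateException. That looks like the same failure mode as in your stack trace.
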
Hope this helps!