The sink connector batches records and writes them to the bucket as files. Each file contains the records for a single partition, in offset order.
So, for example, if partition 1 has 10 records, the connector will write a file containing those records and upload it to the GCS bucket; the filename is made up of the partition and the last offset in the file, in this case 1_0000000010.json.
Kafka then delivers the next 10 records in the partition to the sink connector (it might be more, might be less), and the connector uploads a new file with those records. The second file would be named 1_0000000020.json and contain offsets 11 to 20.
To reconcile, we can then check that the file sequence is correct: the files are in order and the offsets in the filenames match the record count per file, e.g.:
1_0000000010.json contains 10 records
1_0000000020.json contains 10 records
10 + 10 = 20, which matches the offset in the name of the last file.
The script to calculate this depends on what tools you have available; in GCP, BigQuery is one option.
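As a minimal sketch, assuming you expose the connector's JSON files to BigQuery through an external table over the bucket (called luggage_files here, a hypothetical name) in newline-delimited JSON format so each record is one row, you can count records per file and parse the partition and last offset out of BigQuery's _FILE_NAME pseudo-column:

WITH per_file AS (
  SELECT
    _FILE_NAME AS file_name,  -- pseudo-column holding the source file URI
    COUNT(*)   AS record_count
  FROM luggage_files
  GROUP BY file_name
),
parsed AS (
  SELECT
    file_name,
    record_count,
    CAST(REGEXP_EXTRACT(file_name, r'(\d+)_\d+\.json$') AS INT64) AS partition_id,
    CAST(REGEXP_EXTRACT(file_name, r'_(\d+)\.json$') AS INT64)    AS last_offset
  FROM per_file
)
SELECT
  partition_id,
  file_name,
  record_count,
  last_offset,
  -- for every file after the first, this delta should equal record_count
  last_offset
    - LAG(last_offset) OVER (PARTITION BY partition_id ORDER BY last_offset) AS offset_delta
FROM parsed
ORDER BY partition_id, last_offset;

Any row where offset_delta differs from record_count points at a gap or duplicate to investigate.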
Once you have these counts, you can compare them against Kafka using Lenses. Head to SQL Studio and run a query to count messages per partition for the topic:
SELECT _meta.partition, COUNT(*)
FROM `airline-luggage`
GROUP BY _meta.partition
You can expand this query to filter by message values or by other metadata from the topic, e.g. the record timestamp.
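For example, here is a sketch that restricts the count to records after a point in time, using the record timestamp metadata (the epoch-milliseconds value is an assumed cut-off, not taken from the topic):

SELECT _meta.partition, COUNT(*)
FROM `airline-luggage`
WHERE _meta.timestamp >= 1700000000000
GROUP BY _meta.partition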
More information about Lenses SQL can be found in the Lenses documentation.