Kafka: feedback after 5 years

Abstract

  • one topic for all
    • 👎 one slow message delays all the others
    • 👎 complex monitoring
    • 👎 non-specialized consumers
  • functional topics
    • 👍 similar messages in processing time
    • 👍 simpler debugging and monitoring
  • dead letter
    • 👍 no message loss
    • 👍 limited impact from one tenant to another
    • 👎 message order not preserved
  • transactional messages
    • 👍 no message loss
    • 👍 no distributed transaction
    • 👎 queue-based implementation in the database
    • 👎 slight loss of real-time delivery
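
The "transactional messages" item can be sketched as follows. This is a minimal illustration, assuming it refers to an outbox table that is written in the same local database transaction as the business data and relayed to Kafka afterwards. The table layout, the function names and the send callback are hypothetical, and SQLite stands in for the real database.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one business table plus the outbox queue table.
    conn.execute("CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, key TEXT NOT NULL, payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def update_product(conn, product_id, name):
    # The business write and the message are committed in ONE local
    # transaction, so no distributed transaction is needed.
    payload = json.dumps({"type": "UpdateProduct", "id": product_id, "name": name})
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO product (id, name) VALUES (?, ?)",
            (product_id, name),
        )
        conn.execute(
            "INSERT INTO outbox (topic, key, payload) VALUES (?, ?, ?)",
            ("catalog.product.cmd", product_id, payload),
        )


def relay_outbox(conn, send):
    # Publish pending rows in insertion order. A row is marked as sent only
    # after send() returns; if send() raises, the row stays pending.
    rows = conn.execute(
        "SELECT id, topic, key, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, key, payload in rows:
        send(topic, key, payload)
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

In this sketch the relay runs periodically, which is where the slight loss of real time comes from. A crash between send() and the UPDATE would publish the same row again on the next run, so consumers should tolerate duplicates.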

Anti-patterns

  • manual batching in the producer
    • let Kafka do the batching (linger.ms and batch.size)
  • Kafka message size
    • do not send binaries
    • store the binaries in file storage
    • send a Kafka message containing the file URL
  • message processing time
    • keep processing time deterministic and roughly constant within a topic
    • set max.poll.interval.ms and max.poll.records according to the expected consumption rate
  • topic definition
    • do not use one topic for everything
    • use one topic per use case
    • define a naming convention
      • e.g. <business domain>.<data description>.<classification>
      • catalog.product.cmd, carrying the UpdateProduct and DeleteProduct messages
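
The relationship between max.poll.interval.ms and max.poll.records can be made concrete with a little arithmetic: a consumer has to finish one polled batch and call poll() again before max.poll.interval.ms elapses, otherwise it is removed from the group. The helper below is a rough sketch of that budget. The function name and the 50% safety factor are assumptions chosen for illustration, not recommendations from the Kafka documentation.

```python
def max_safe_poll_records(max_poll_interval_ms, worst_case_ms_per_record,
                          safety_factor=0.5):
    """Largest max.poll.records for which one batch, processed at the
    worst-case time per record, fits in safety_factor * max.poll.interval.ms."""
    if worst_case_ms_per_record <= 0:
        raise ValueError("worst_case_ms_per_record must be positive")
    budget_ms = max_poll_interval_ms * safety_factor
    # Always allow at least one record per poll.
    return max(1, int(budget_ms // worst_case_ms_per_record))


# With the Java client default of 300000 ms (5 minutes) and records that can
# take up to 2 seconds each, only 75 records fit in half of the interval.
print(max_safe_poll_records(300_000, 2_000))  # 75
```

This is also why similar processing times per topic matter: a single slow message type forces a small max.poll.records on every other message in the same topic.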

Number of partitions

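  • repartitioning breaks ordering: existing keys can move to a different partition
  • it is better to over-partition than to under-partition
  • choose a number of partitions with many divisors (e.g. 12, 24 or 36), so that partitions spread evenly across consumers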
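
Both points can be illustrated with a short sketch. It assumes the usual "hash of the key modulo the number of partitions" assignment. Kafka's default Java partitioner uses a murmur2 hash; crc32 is swapped in here only to keep the example deterministic and free of dependencies, so the partition numbers will not match a real cluster. All function names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in for the default partitioner: hash(key) mod partition count.
    # crc32 replaces murmur2 purely for illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count, new_count):
    # Keys whose partition changes when the partition count changes.
    # Messages for these keys end up split across two partitions, so their
    # relative order is no longer guaranteed.
    return [k for k in keys
            if partition_for(k, old_count) != partition_for(k, new_count)]


def even_consumer_counts(num_partitions):
    # Consumer group sizes that give every consumer the same number of partitions.
    return [c for c in range(1, num_partitions + 1) if num_partitions % c == 0]


keys = ["tenant-%d" % i for i in range(1000)]
print(len(moved_keys(keys, 12, 24)), "of", len(keys), "keys change partition")
print(even_consumer_counts(12))  # [1, 2, 3, 4, 6, 12]
print(even_consumer_counts(10))  # [1, 2, 5, 10]
```

A count such as 12 leaves more ways to scale the consumer group evenly than 10 does, and over-partitioning up front avoids the key movement that a later repartition would cause.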

Partition key

  • partition by a homogeneous business value

Monitoring

  • add a correlation id to the messages
  • monitor the message production throughput
  • monitor the consumer group lags
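
As a sketch of the first and last points: a correlation id can travel in a message header, and consumer group lag is, per partition, the log end offset minus the committed offset. The header name "correlation-id" and both function names are assumptions made for this example. In practice the offsets would come from the admin client or from a monitoring tool rather than from plain dictionaries.

```python
import uuid

CORRELATION_HEADER = "correlation-id"  # hypothetical header name


def with_correlation_id(headers, correlation_id=None):
    """Return headers (a list of (name, bytes) pairs) that carry a correlation
    id, keeping an existing one so the id survives across services."""
    for name, _value in headers:
        if name == CORRELATION_HEADER:
            return list(headers)
    cid = correlation_id or str(uuid.uuid4())
    return list(headers) + [(CORRELATION_HEADER, cid.encode("utf-8"))]


def consumer_group_lag(end_offsets, committed_offsets):
    """Per-partition lag = log end offset - committed offset.
    Assumption: a partition without a committed offset counts from 0."""
    return {
        partition: max(0, end - committed_offsets.get(partition, 0))
        for partition, end in end_offsets.items()
    }


lag = consumer_group_lag({0: 100, 1: 50}, {0: 90, 1: 50})
print(lag)                # {0: 10, 1: 0}
print(sum(lag.values()))  # 10
```

A lag that keeps growing while production throughput stays flat points at the consumer side, which is why the two metrics are worth watching together.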

Access control

  • no manual actions
  • treat production Kafka like a production database

Checklist

  • producers
    • acks
    • max.in.flight.requests.per.connection
    • partition keys
  • consumers
    • max.poll.interval.ms
    • max.poll.records
    • monitor consumer group lags
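
The checklist can be turned into a starting point for review. The values below are illustrative, not recommendations from the talk. The keys are Java client property names, and enable.idempotence is an addition that is not on the checklist. The ordering_is_safe helper is hypothetical: it encodes the rule that retries can reorder messages within a partition unless either idempotence is enabled (with acks=all and at most 5 in-flight requests) or only one request is in flight at a time.

```python
PRODUCER_SETTINGS = {
    "acks": "all",
    "enable.idempotence": "true",  # not on the checklist; added for illustration
    "max.in.flight.requests.per.connection": "5",
}

CONSUMER_SETTINGS = {
    "max.poll.interval.ms": "300000",  # Java client default (5 minutes)
    "max.poll.records": "500",         # Java client default
}


def ordering_is_safe(producer_settings):
    """True if producer retries cannot reorder messages within a partition.
    Assumption: a missing enable.idempotence is treated as false, which is
    the conservative reading."""
    in_flight = int(
        producer_settings.get("max.in.flight.requests.per.connection", "5")
    )
    idempotent = producer_settings.get("enable.idempotence", "false") == "true"
    if idempotent:
        return in_flight <= 5 and producer_settings.get("acks") in ("all", "-1")
    return in_flight == 1


print(ordering_is_safe(PRODUCER_SETTINGS))  # True
```

The partition key does not appear in these settings because it is chosen per message, not per producer.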

Bonus: Kafka Properties Documentation