A beginner's guide to balancing your data across Apache Kafka partitions
Abstract
- the number of partitions depends on how data is consumed
- find the key with the highest cardinality and the most critical ordering
- keep data distribution over time in mind
- observe and monitor
- be ready for change
Partitioning:
- a mechanism to distribute the topic data across multiple brokers
- a way to parallelize the production and consumption of the messages
- the key to scale the system horizontally
Distributed systems are complex:
- durable messaging
- concurrent distributed transactions
- HA, scaling, security
Measuring performance:
- throughput: the number of messages that go through the system in a given amount of time
- how many people fit into a bus
- latency: overall time it takes to process each message
- total time for a traveler to complete the journey
- lag: delta between the last produced message and the last consumer’s committed message
- how far the last passenger is behind the front of the arriving group
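The three metrics can be sketched with hypothetical per-message timestamps and offsets (all numbers below are made up for illustration):

```python
# Sketch: computing throughput, latency, and lag from hypothetical data.
# Timestamps are in seconds; offsets are per-partition positions.

produce_times = {101: 0.0, 102: 0.5, 103: 1.0, 104: 1.5}   # offset -> produced at
consume_times = {101: 0.2, 102: 0.9, 103: 1.6}             # offset -> processed at

window_seconds = 2.0
throughput = len(consume_times) / window_seconds            # messages per second

# Latency: per-message time from production to processing.
latencies = [consume_times[o] - produce_times[o] for o in consume_times]
avg_latency = sum(latencies) / len(latencies)

# Lag: last produced offset minus last committed offset.
last_produced_offset = max(produce_times)                   # 104
last_committed_offset = max(consume_times)                  # 103
lag = last_produced_offset - last_committed_offset          # 1 message behind
```

A growing `lag` under steady production is the usual first sign that you have too few partitions or too few consumers.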
How many partitions do you need?
Having too few partitions:
- low throughput
- increase in lag
- caps the maximum number of consumers in a group (at most one consumer per partition) - leads to struggling consumers
Too many partitions:
- backups, upgrades, and other admin operations require more effort
- more potential consumer rebalances
- producers are slower at batching and distributing data, compression is less effective, etc.
Start with the end in mind.
Default partitioners
With the hash-based partitioner, changing the number of partitions may compromise the order of records, because the
hash(key) % number of partitions
formula may no longer send a record to the same partition it used before the change. Sticky partitioning may improve performance by ~50%.
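The formula can be sketched in Python; here `zlib.crc32` stands in for the Java client's murmur2 hash (the idea is the same, the actual hash function differs):

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for the default hash-based partitioner:
    # same idea (hash(key) % number of partitions), different hash function.
    return zlib.crc32(key) % num_partitions

keys = [b"user-1", b"user-2", b"user-3", b"user-4"]

before = {k: pick_partition(k, 6) for k in keys}   # topic with 6 partitions
after = {k: pick_partition(k, 8) for k in keys}    # topic grown to 8 partitions

# Keys whose partition changed: per-key ordering guarantees no longer
# hold across the boundary of the partition-count change for these keys.
moved = [k for k in keys if before[k] != after[k]]
```

Because the modulus changed, most keys map to a different partition after the change, which is why growing a keyed topic silently breaks ordering.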
The relevant producer settings are linger.ms (default is 0 ms) and batch.size (default is 16 KB).

Unbalanced partitions
How to avoid?
- pick the key with the highest cardinality and the most critical ordering
Data is not produced evenly over time, so we can tweak the linger.ms and batch.size settings.

Sticky partitioner problem, how to solve:
- Kafka >= 3.3:
  - set partitioner.adaptive.partitioning.enable=true
  - set partitioner.availability.timeout.ms > 0
- Kafka < 3.3:
  - increase linger.ms
  - rebalance partitions on a regular basis
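The producer settings above could look like this (Java-client property names; the values are illustrative, not recommendations):

```properties
# Give the producer time to fill batches (default linger.ms=0).
linger.ms=50
# Batch size in bytes (default 16384, i.e. 16 KB).
batch.size=32768
# Kafka >= 3.3 only: adaptive partitioning sends fewer records to slow brokers.
partitioner.adaptive.partitioning.enable=true
# A value > 0 excludes partitions whose broker does not respond in time.
partitioner.availability.timeout.ms=200
```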
Another cause: the fetch size is too low on the consumer side.
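Consumer-side fetch behavior is tuned with these properties (Java-client names; the values are illustrative):

```properties
# Minimum amount of data the broker returns per fetch (default 1 byte).
fetch.min.bytes=1024
# Upper bound per partition per fetch (default 1048576, i.e. 1 MB).
max.partition.fetch.bytes=2097152
```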
Measuring and monitoring
- consumer lag
- under-replicated partitions
- fetch-latency-avg and fetch-latency-max
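Consumer lag can be inspected with the CLI tool that ships with Kafka (the group name and broker address below are placeholders); the LAG column is the per-partition delta described above:

```shell
# Per-partition committed offset, log end offset, and lag for one group:
bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group
```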
Many hot partitions
- create a new topic and move the data
- useful for
- rebalancing records across partitions
- disaster recovery
- scaling up and down
- changing schema
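Creating the replacement topic is done with the standard CLI (topic name and counts below are placeholders); the data move itself is a consume/produce job, e.g. MirrorMaker 2 or a small custom copier:

```shell
# Create the replacement topic with the new partition count:
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic orders-v2 --partitions 12 --replication-factor 3

# Then copy records over with a consume/produce job and
# switch producers and consumers to the new topic.
```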