Designing Real Systems: Lesson 5.5
How to design a distributed message queue like Kafka
topics and partitions, consumer groups, offset management, message retention, producer acknowledgment, at-least-once delivery, ordering guarantees
Why Message Queues Exist
A message queue decouples producers from consumers. The producer does not wait for the consumer to process a message; it publishes to the queue and moves on. The queue buffers the backlog, which absorbs traffic spikes and lets producers and consumers scale independently.
Kafka Core Concepts
- Topic: named stream of messages (e.g., 'user.events')
- Partition: ordered, immutable log within a topic. Parallelism unit.
- Offset: position of a message within a partition. Consumers track their offset.
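- Consumer Group: a set of consumers that share the work of reading a topic. Each partition is consumed by exactly one consumer in the group, so N consumers can read N partitions in parallel. A consumer may own several partitions, and consumers beyond the partition count sit idle.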
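The concepts above can be sketched with a toy in-memory model. This is an illustration only, not how Kafka is implemented: the names `Topic`, `assign_partitions`, and `GroupOffsets` are hypothetical, and the round-robin assignment is a simplification of the assignment strategies Kafka offers.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a fixed list of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        """Append to one partition and return the new message's offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1


def assign_partitions(num_partitions, consumers):
    """Round-robin assignment: each partition gets exactly one owner in the group."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


class GroupOffsets:
    """Per-group offsets, one per partition (the next offset to read)."""

    def __init__(self, num_partitions):
        self.next_offset = [0] * num_partitions

    def poll(self, topic, partition):
        """Return unread messages for a partition and advance the group's offset."""
        start = self.next_offset[partition]
        batch = topic.partitions[partition][start:]
        self.next_offset[partition] = start + len(batch)
        return batch


topic = Topic("user.events", num_partitions=4)
topic.append(0, b"login")
topic.append(0, b"logout")
topic.append(1, b"signup")

assignment = assign_partitions(4, ["c1", "c2"])  # two consumers, two partitions each
offsets = GroupOffsets(4)
first = offsets.poll(topic, 0)    # both messages in partition 0
second = offsets.poll(topic, 0)   # nothing new since the last poll
```

Because the offset belongs to the group rather than to the log, a second group with its own `GroupOffsets` would read the same messages from the start without affecting the first.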
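from kafka import KafkaProducer, KafkaConsumer

# Producer: messages with the same key are routed to the same partition.
producer = KafkaProducer(bootstrap_servers='kafka:9092')
producer.send('user.events', key=b'user:123', value=b'{"action":"login"}')
producer.flush()  # send() is asynchronous; flush() blocks until buffered messages are sent

# Consumer: members sharing a group_id split the topic's partitions between them.
consumer = KafkaConsumer('user.events', group_id='analytics-service',
                         bootstrap_servers='kafka:9092')
for msg in consumer:
    process(msg.value)  # process() stands in for application logic

Ordering Guarantees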
Kafka guarantees ordering within a partition, not across a topic. With the default partitioner, messages with the same key hash to the same partition (key-based routing) as long as the partition count does not change, so all events for user 123 are processed in order.
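A minimal sketch of key-based routing, assuming a simple "hash the key, take it modulo the partition count" scheme. The function `partition_for` is hypothetical, and CRC32 is used only as a convenient stable hash; Kafka's default partitioner uses a murmur2 hash of the key instead.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as a stand-in)."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 6
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    (b"user:123", "login"),
    (b"user:456", "login"),
    (b"user:123", "add_to_cart"),
    (b"user:123", "checkout"),
    (b"user:456", "logout"),
]
for key, action in events:
    # Every event with the same key lands in the same append-only log.
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, action))

p = partition_for(b"user:123", NUM_PARTITIONS)
user_123_actions = [a for k, a in partitions[p] if k == b"user:123"]
```

The events for `user:123` come back in the order they were produced because they share one partition. Events for different users carry no such guarantee relative to each other, and changing `NUM_PARTITIONS` may send a key to a different partition.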
Delivery Semantics
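- At-most-once: offsets are typically committed before processing, so a crash may lose messages but never duplicates them
- At-least-once: offsets are committed after processing, so a crash may redeliver messages but never loses them (most common)
- Exactly-once: idempotent producers and Kafka transactions for reads and writes within Kafka; side effects outside Kafka still need idempotent consumers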
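The difference between the first two semantics comes down to whether the consumer commits its offset before or after processing. The simulation below is a rough sketch under that assumption: `consume` and its parameters are hypothetical, the "crash" is simulated by returning early, and deduplicating by message value stands in for deduplicating by a real message ID.

```python
def consume(log, commit_before_processing, crash_at=None, start=0, sink=None):
    """Process log[start:] and return (committed_offset, sink).

    crash_at is the offset at which a simulated crash happens, after the
    first of the two steps (commit, process) and before the second.
    """
    sink = [] if sink is None else sink
    committed = start
    for offset in range(start, len(log)):
        if commit_before_processing:
            committed = offset + 1        # commit first ...
            if offset == crash_at:
                return committed, sink    # ... crash before processing: message is lost
            sink.append(log[offset])
        else:
            sink.append(log[offset])      # process first ...
            if offset == crash_at:
                return committed, sink    # ... crash before commit: message is redelivered
            committed = offset + 1
    return committed, sink


log = ["m0", "m1", "m2"]

# At-most-once: crash at offset 1, then restart from the committed offset.
committed, sink = consume(log, commit_before_processing=True, crash_at=1)
_, at_most_once = consume(log, True, start=committed, sink=sink)

# At-least-once: same crash, same restart.
committed, sink = consume(log, commit_before_processing=False, crash_at=1)
_, at_least_once = consume(log, False, start=committed, sink=sink)

# Idempotent consumer: drop messages already seen, preserving order.
deduplicated = list(dict.fromkeys(at_least_once))
```

In this run the at-most-once consumer ends up without `m1`, the at-least-once consumer sees `m1` twice, and deduplication restores a single copy of each message, which is why at-least-once delivery is usually paired with idempotent processing.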
