How to handle cache invalidation in distributed systems
The Hardest Problem in Caching
Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. The problem is real: when the source of truth changes, every cached copy of that data is stale and must be removed or refreshed.
TTL-Based Invalidation
The simplest approach: set a TTL (time to live) on every cached key and accept that data can be stale for up to TTL seconds. This works well for content that can tolerate a short staleness window (product prices, catalog data).
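To make the staleness window concrete, here is a minimal in-process sketch of TTL expiry. The class name `TTLCache`, its methods, and the injectable `clock` parameter are illustrative assumptions, not part of any particular cache library; a shared cache server would normally handle expiry for you.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache where every key expires after a fixed TTL (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any) -> None:
        # Every write stamps the entry with its own expiry time.
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller falls back to the source of truth.
            del self._store[key]
            return None
        return value
```

A reader calls `get`, and on `None` loads the value from the database and calls `set` again. Until the entry expires, readers keep seeing the cached value even if the database has changed, which is exactly the staleness this strategy accepts.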
Event-Driven Invalidation
When data changes in the DB, publish an invalidation event. Cache consumers listen and delete the affected key.
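# On DB write (db, event_bus, and cache stand for your application's own clients):
db.update(user_id=123, name='Alice')
event_bus.publish('user.updated', {'user_id': 123})

# Cache invalidation consumer:
def on_user_updated(event):
    # Drop the stale entry; the next read repopulates it from the DB.
    cache.delete(f'user:{event["user_id"]}')

Cache Stampede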
When a popular key expires, many concurrent requests all miss the cache simultaneously and hit the DB at once. Solutions:
- Mutex/lock: first request refreshes cache; others wait
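- Probabilistic early expiration: each request has a small, growing chance of refreshing the entry shortly before its TTL expires, so refreshes do not all land at the same instant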
- Background refresh: a separate job pre-warms the cache before expiry
