Script Valley
Interview Prep: System Design Rounds
Caching Systems, Lesson 4.5

How to design a distributed cache system

cache sharding, cache cluster topology, replication for availability, hot key problem, local vs remote cache, cache hierarchy

Distributed Cache Architecture

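As a rule of thumb, a single Redis node tops out at around 100GB of memory and roughly 100K ops/second; the exact limits depend on hardware, value sizes, and command mix. Systems that need more capacity or throughput than that must spread the cache across multiple nodes.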
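The sizing argument can be turned into a quick back-of-the-envelope calculation. The sketch below is only illustrative: the function name min_cache_nodes and the 70% headroom factor are assumptions, not figures from this lesson, and the per-node limits default to the rule-of-thumb numbers quoted above.

```python
import math


def min_cache_nodes(dataset_gb, peak_ops_per_sec,
                    node_capacity_gb=100, node_ops_per_sec=100_000,
                    headroom=0.7):
    """Estimate how many nodes satisfy both the memory and the throughput need.

    headroom is an assumed safety margin: plan to use only this fraction
    of each node's nominal capacity.
    """
    by_memory = math.ceil(dataset_gb / (node_capacity_gb * headroom))
    by_throughput = math.ceil(peak_ops_per_sec / (node_ops_per_sec * headroom))
    # The tighter of the two constraints decides the cluster size.
    return max(by_memory, by_throughput, 1)


# 500GB of cached data at 1M ops/second: throughput, not memory, is the bottleneck.
print(min_cache_nodes(dataset_gb=500, peak_ops_per_sec=1_000_000))  # 15
```

In an interview, stating which constraint dominates (memory or throughput) is usually more valuable than the exact number.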

Sharding the Cache

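Use consistent hashing to map keys to cache nodes, so that adding or removing a node remaps only a small fraction of the keys. The client library (or a proxy like Twemproxy) handles routing. Redis Cluster uses a related scheme: each key is hashed with CRC16 into one of 16384 fixed hash slots, and the slots are assigned to nodes.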

# Cluster-aware client with redis-py (4.1+); it routes each key to the node that owns its hash slot
from redis.cluster import ClusterNode, RedisCluster

# Seed nodes used to discover the cluster topology (hostnames are examples)
startup_nodes = [
    ClusterNode('cache-0', 6379),
    ClusterNode('cache-1', 6379),
    ClusterNode('cache-2', 6379),
]
rc = RedisCluster(startup_nodes=startup_nodes, decode_responses=True)
rc.set('user:123', 'alice')  # automatically routed to the correct shard
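To make the consistent hashing idea concrete, here is a minimal sketch of a hash ring with virtual nodes. It is not how Redis Cluster works internally; the HashRing class, the choice of MD5, and the 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed on the ring many times to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-0", "cache-1", "cache-2"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-3")
after = {k: ring.get_node(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a node")
```

Going from three to four nodes should move on the order of a quarter of the keys, and every moved key lands on the new node. With naive hash(key) % N sharding, most keys would change nodes instead.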

Hot Key Problem

If one key (e.g., a viral post) gets 1M reads/second, sharding alone does not help: every request for that key hits the single cache node that owns it. Solutions:

  • Replicate hot keys to multiple nodes and randomly pick one per request
  • Use local in-process cache (L1) for ultra-hot keys, backed by Redis (L2)
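The first option can be sketched as key fan-out: write the hot value under several suffixed keys, which a sharded cache will typically spread over different nodes, and read from a random copy. The replica count of 8, the "#r" suffix format, and the helper names are hypothetical; DictCache is an in-memory stand-in for a client that exposes get and set.

```python
import random

HOT_KEY_REPLICAS = 8  # assumed fan-out; tune to the observed read rate


def replica_keys(key, replicas=HOT_KEY_REPLICAS):
    # Distinct key names hash to different positions, so copies spread out.
    return [f"{key}#r{i}" for i in range(replicas)]


def write_hot(cache, key, value, replicas=HOT_KEY_REPLICAS):
    # Every copy must be written (and invalidated) together.
    for rk in replica_keys(key, replicas):
        cache.set(rk, value)


def read_hot(cache, key, replicas=HOT_KEY_REPLICAS):
    # A random copy per request spreads the read load across the replicas.
    return cache.get(f"{key}#r{random.randrange(replicas)}")


class DictCache:
    """In-memory stand-in exposing the get/set subset of a cache client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


cache = DictCache()
write_hot(cache, "post:42", "viral content")
print(read_hot(cache, "post:42"))  # viral content
```

The trade-off is write amplification: each update touches every copy, so this suits keys that are read far more often than they are written.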

Local vs Remote Cache Hierarchy

  • L1 (in-process): microsecond access, limited size, per-instance cache (Guava Cache, Caffeine)
  • L2 (Redis cluster): millisecond access, shared across instances, larger
  • L3 (database): tens of milliseconds, source of truth
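The read path through this hierarchy can be sketched as a read-through lookup: check L1, then L2, then the database, filling the faster levels on the way back. This is a simplified illustration; TwoLevelCache, its 5-second L1 TTL, and the oldest-first eviction are assumptions, and DictCache stands in for a Redis client.

```python
import time


class TwoLevelCache:
    def __init__(self, l2, load_from_db, l1_ttl=5.0, l1_max_items=1024):
        self.l2 = l2                      # shared remote cache (get/set)
        self.load_from_db = load_from_db  # callable: key -> value or None
        self.l1_ttl = l1_ttl              # short TTL bounds L1 staleness
        self.l1_max_items = l1_max_items
        self._l1 = {}                     # key -> (expires_at, value)

    def get(self, key):
        entry = self._l1.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]               # L1 hit: no network call
        value = self.l2.get(key)
        if value is None:
            value = self.load_from_db(key)  # L3: source of truth
            if value is not None:
                self.l2.set(key, value)
        if value is not None:
            self._store_l1(key, value)
        return value

    def _store_l1(self, key, value):
        if key not in self._l1 and len(self._l1) >= self.l1_max_items:
            # Evict the oldest inserted entry (dicts keep insertion order).
            self._l1.pop(next(iter(self._l1)))
        self._l1[key] = (time.monotonic() + self.l1_ttl, value)


class DictCache:
    """In-memory stand-in exposing the get/set subset of a cache client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


db = {"user:123": "alice"}
db_reads = []


def load_from_db(key):
    db_reads.append(key)
    return db.get(key)


cache = TwoLevelCache(DictCache(), load_from_db)
cache.get("user:123")  # misses L1 and L2, loads from the database
cache.get("user:123")  # served from L1
print(len(db_reads))   # 1
```

Because each application instance holds its own L1, an update can leave stale copies on other instances until their TTL expires; keeping the L1 TTL short is the simplest way to bound that window.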