Script Valley
System Design: APIs, Caching & Scalability
Rate Limiting and Throttling
Lesson 4.1

Why APIs need rate limiting and how it works

What Rate Limiting Solves

Without rate limiting, a single client can send an unbounded number of requests, overwhelming your servers, degrading service for everyone else, and driving up infrastructure costs. Rate limiting enforces a maximum request rate per client, where the client is identified by IP address, API key, or user ID.
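To make "a maximum request rate per client" concrete, here is a minimal in-memory sketch. It uses a fixed-window counter only because that is the simplest thing to show; the class name `PerClientLimiter`, its parameters, and the injectable `clock` are assumptions for illustration, not part of any particular library.

```python
import time
from collections import defaultdict


class PerClientLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the sketch can be tested without sleeping
        # (client_id, window_index) -> requests seen in that window.
        # Old windows are never evicted here; a real limiter would expire them.
        self.counts = defaultdict(int)

    def allow(self, client_id):
        """Return True if the request is within the client's limit."""
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False  # caller would answer with 429
        self.counts[key] += 1
        return True
```

Each client ID gets its own counter, so one noisy client exhausts only its own budget. In production this state would usually live in a shared store rather than in process memory.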

Rate limiting and throttling are distinct. Rate limiting rejects excess requests immediately with 429 Too Many Requests. Throttling still serves them, but adds artificial delay to slow the client down. For HTTP APIs, rejecting with 429 is the usual default.
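The difference between the two behaviours can be sketched in a few lines. The function names, the `over_limit` flag, and the 0.5 second default delay are hypothetical; the point is only that one path rejects and the other delays.

```python
import time


def handle_with_rate_limit(over_limit, handler):
    """Rate limiting: reject excess requests immediately with 429."""
    if over_limit:
        return 429, "Too Many Requests"
    return 200, handler()


def handle_with_throttle(over_limit, handler, delay_seconds=0.5, sleep=time.sleep):
    """Throttling: still serve the request, but only after an artificial delay."""
    if over_limit:
        sleep(delay_seconds)  # slows the client down instead of rejecting it
    return 200, handler()
```

Note that the throttled path keeps a connection and a worker busy while it waits, which is one reason rejection tends to be preferred at the API edge.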

Response Headers

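Tell clients their current rate limit status on every response. The X-RateLimit-* headers are a widely used convention rather than a formal standard:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 43
X-RateLimit-Reset: 1718001234

When the limit is exceeded, reject the request and add Retry-After:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1718001234
Retry-After: 60

Retry-After is not strictly required on 429 responses (RFC 6585 says the response may include it), but you should always send it: it tells the client how long to wait before retrying, given either as a number of seconds or as an HTTP date.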
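As a rough sketch of how a server might assemble these headers, the helper below returns a status code and a header dictionary. The function name and its parameters are assumptions for illustration, and it treats X-RateLimit-Reset as a Unix timestamp, as in the sample response.

```python
import math


def build_rate_limit_response(allowed, limit, remaining, reset_epoch, now_epoch):
    """Return (status, headers) describing the client's rate limit state."""
    status = 200 if allowed else 429
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if not allowed:
        # Seconds until the window resets, rounded up and never negative.
        wait = max(0, math.ceil(reset_epoch - now_epoch))
        headers["Retry-After"] = str(wait)
    return status, headers
```

A well-behaved client would read Retry-After on a 429 and pause for at least that long before sending the request again.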

What to Limit By

IP-based limiting is easy to implement but easy to bypass by rotating IPs. Limiting by API key or user ID is more robust for authenticated endpoints. For public endpoints, combine both. Apply different limits per tier: for example, free users get 100 req/min and paid users get 10,000 req/min. Enforce rate limits at the gateway layer, not deep in application code, so that excess requests are rejected before any expensive processing runs.
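One way to read this advice in code: derive the keys a request should be counted under, and look up the limit from the caller's tier. The names `TIER_LIMITS`, `limit_keys`, and `limit_for` are hypothetical, and falling back to the free tier for unknown tiers is an assumption, not something the lesson specifies.

```python
# Requests per minute for each tier, using the example figures from the text.
TIER_LIMITS = {"free": 100, "paid": 10_000}


def limit_keys(client_ip, api_key=None):
    """Return the keys a request counts against.

    The IP key is always included; the API key is added when present,
    so a request must stay within both budgets.
    """
    keys = []
    if api_key:
        keys.append(f"key:{api_key}")
    keys.append(f"ip:{client_ip}")
    return keys


def limit_for(tier):
    """Look up the per-minute limit, defaulting to the free tier (assumption)."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])
```

For example, `limit_keys("203.0.113.7", "abc")` yields one key for the API key and one for the IP, and a gateway would check each key against its limiter before forwarding the request.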

Up next

Fixed window vs sliding window rate limiting algorithms