Rate Limiting
Rate limiting protects services from abuse, overload, and unfair usage by restricting how many requests a client can make over time. It is especially important for public APIs, login flows, and expensive endpoints.
Why Rate Limiting Matters
Without rate limiting, one user or bot can consume disproportionate capacity, hurt latency for everyone else, or brute-force sensitive endpoints. It is a foundational protection for both availability and security.
Common Algorithms
- Fixed window: counts requests in fixed time slots; simple, but a client can burst up to twice the limit by straddling a window boundary
- Sliding window: tracks a rolling time window per client, smoother and more accurate than fixed
- Token bucket: clients accumulate tokens over time and spend one per request, allows controlled bursting
- Leaky bucket: requests are processed at a constant rate regardless of arrival pattern, smooths traffic
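Of these, the token bucket is the one most often implemented by hand. A minimal single-process sketch (class and parameter names are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one.
    A full bucket lets a client burst up to `capacity` requests at once."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max tokens (burst size)
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=2.0)  # steady 2 req/s, bursts of 5
results = [bucket.allow() for _ in range(7)]
# The first 5 calls pass immediately (the burst); the next 2 are rejected
```

Note that a bucket starting full is what "allows controlled bursting": an idle client banks capacity and can spend it all at once, while a sustained sender is held to the refill rate.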
Where It Is Applied
- Per IP
- Per user
- Per API key
- Per tenant or subscription plan
- Per endpoint or route
HTTP Response Codes
A request that exceeds the limit returns 429 Too Many Requests. Well-designed APIs also return headers like X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After so clients can back off gracefully instead of hammering the server blindly.
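A framework-agnostic sketch of that response shape, assuming the limiter has already computed the remaining quota and the seconds until it resets:

```python
def rate_limit_response(limit: int, remaining: int, retry_after_s: int):
    """Return (status, headers) for a request checked against a rate limit.
    Retry-After is only attached when the request is actually rejected."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if remaining <= 0:
        headers["Retry-After"] = str(retry_after_s)
        return 429, headers
    return 200, headers

status, headers = rate_limit_response(limit=100, remaining=0, retry_after_s=30)
# status == 429, and headers tell the client to wait 30 seconds
```

Exposing the remaining quota on every response, not just on 429s, lets well-behaved clients slow down before they ever hit the wall.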
Distributed Design
At scale, counters are often stored in Redis or another shared store, because per-instance memory cannot enforce a global limit across many servers. Redis atomic operations like INCR and EXPIRE make it cheap to increment a counter and set its TTL, and they can be combined in a pipeline or Lua script so the check-and-count stays atomic under concurrent requests.
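The fixed-window variant of this pattern needs only those two commands. The sketch below uses a tiny in-memory stand-in for the Redis client so it runs standalone; in production you would pass a real client (e.g. redis-py) exposing the same `incr`/`expire` calls:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis commands the limiter needs."""
    def __init__(self):
        self.store = {}  # key -> [count, expires_at]

    def incr(self, key):
        entry = self.store.get(key)
        if entry is None or entry[1] <= time.monotonic():
            entry = [0, float("inf")]   # fresh window, no TTL yet
            self.store[key] = entry
        entry[0] += 1
        return entry[0]

    def expire(self, key, ttl):
        if key in self.store:
            self.store[key][1] = time.monotonic() + ttl

def allow(client, key: str, limit: int, window_s: int) -> bool:
    """Fixed-window counter: INCR the key, start the TTL on the first
    hit of the window, then compare the count against the limit."""
    count = client.incr(key)
    if count == 1:
        client.expire(key, window_s)
    return count <= limit

r = FakeRedis()
hits = [allow(r, "ratelimit:ip:203.0.113.7", limit=3, window_s=60) for _ in range(5)]
# First 3 requests allowed, the next 2 rejected within the same window
```

With a real Redis client, the `incr` and `expire` calls would typically be pipelined or wrapped in a Lua script so a crash between them cannot leave a counter without a TTL.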
Soft vs Hard Limits
Hard limits reject requests immediately once the quota is exceeded. Soft limits allow a small burst over the threshold before enforcing, giving legitimate clients some slack during traffic spikes without fully opening the floodgates.
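The difference is a single comparison. A sketch, where `soft_margin` (a name assumed here) is the grace band above the quota:

```python
def check(count: int, limit: int, soft_margin: int = 0) -> str:
    """Hard limit: reject as soon as count exceeds limit (soft_margin=0).
    Soft limit: tolerate a small burst above the quota before rejecting."""
    if count <= limit:
        return "allow"
    if count <= limit + soft_margin:
        return "allow-with-warning"  # over quota but inside the grace band
    return "reject"

check(105, limit=100, soft_margin=0)   # hard limit: 'reject'
check(105, limit=100, soft_margin=10)  # soft limit: 'allow-with-warning'
```

Requests in the grace band are often logged or surfaced to the client via headers, so legitimate users learn they are brushing the ceiling before enforcement kicks in.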
Rate Limiting vs Throttling
Rate limiting enforces a maximum number of requests over a time window and rejects excess requests outright. Throttling slows down the processing of excess requests rather than rejecting them, useful when degraded service is preferable to complete denial.
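One simple way to throttle is to convert the overflow into added latency instead of a 429. The linear penalty below is an illustrative policy, not a standard formula:

```python
def throttle_delay(count: int, limit: int, window_s: float) -> float:
    """Throttling sketch: requests past the limit are slowed, not rejected.
    Each excess request waits a bit longer, spreading the overflow across
    the window instead of denying service outright."""
    excess = count - limit
    if excess <= 0:
        return 0.0
    return excess * (window_s / limit)  # linear penalty per excess request

throttle_delay(100, limit=100, window_s=60)  # 0.0  (within quota, no delay)
throttle_delay(105, limit=100, window_s=60)  # 3.0  (5 excess * 0.6s each)
```

The server would sleep for (or schedule the request after) the returned delay, degrading response times gracefully rather than failing hard.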
Client Backoff Strategies
Good API clients implement exponential backoff with jitter when they receive a 429. This prevents thundering herd problems where many clients retry simultaneously and immediately overwhelm the server again after a brief cooldown.
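A common variant is "full jitter": the delay ceiling grows exponentially per attempt, and the actual wait is drawn uniformly below that ceiling so retrying clients desynchronize. A minimal sketch (the base and cap values are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: the ceiling doubles each attempt
    (capped), and the actual wait is a random draw below the ceiling so
    clients that failed together do not all retry at the same instant."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)

# Attempt 0 waits up to 0.5s, attempt 3 up to 4s, attempt 10 is capped at 60s.
```

When the 429 carries a Retry-After header, a well-behaved client uses that value as the floor for its next attempt rather than retrying sooner.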
Abuse Detection and Blocking
Rate limiting can be combined with anomaly detection to identify and block abusive clients beyond simple quota enforcement. Patterns like sudden spikes, repeated auth failures, or unusual geographic distribution can trigger temporary or permanent bans.
Rate Limiting at the Gateway
Centralizing rate limiting at the API gateway keeps the logic out of individual services. The gateway tracks counters per client and route, applies the correct policy, and rejects requests before they consume any downstream resources.
Interview Tip
If the API is public, proactively mention a token bucket backed by Redis. Also bring up the Retry-After header and client backoff to show you have thought about the full request lifecycle, not just server-side enforcement.