
Rate Limiting

Rate limiting protects services from abuse, overload, and unfair usage by restricting how many requests a client can make over time. It is especially important for public APIs, login flows, and expensive endpoints.

Reading time: 9 min

Tags: rate limiting, redis, security, api

Why Rate Limiting Matters

Without rate limiting, one user or bot can consume disproportionate capacity, hurt latency for everyone else, or brute-force sensitive endpoints. It is a foundational protection for both availability and security.

Common Algorithms

  • Fixed window: counts requests in fixed time slots; simple, but a client can burst up to twice the limit by straddling a window boundary
  • Sliding window: tracks a rolling time window per client; smoother and more accurate than fixed window
  • Token bucket: tokens accumulate at a steady refill rate and each request spends one; allows controlled bursts up to the bucket capacity
  • Leaky bucket: requests drain at a constant rate regardless of arrival pattern; smooths traffic at the cost of queueing delay

Where It Is Applied

  • Per IP
  • Per user
  • Per API key
  • Per tenant or subscription plan
  • Per endpoint or route

HTTP Response Codes

A request that exceeds the limit returns 429 Too Many Requests. Well-designed APIs also return headers like X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After so clients can back off gracefully instead of hammering the server blindly.
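A rejected request might look like the fragment below. Note that the `X-RateLimit-*` header names are a widespread convention rather than a standard, so exact names and values vary between APIs:

```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
Retry-After: 30
```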

Distributed Design

At scale, limits are often stored in Redis or another shared counter store, because per-instance in-memory counters cannot enforce a global limit across many instances. Redis atomic operations like INCR and EXPIRE, pipelined or wrapped in a Lua script, make it cheap to increment a counter and set its TTL in a single round trip.
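A fixed-window counter built on INCR/EXPIRE semantics can be sketched as follows. To keep the example self-contained, a tiny `FakeRedis` stub (a stand-in written for this sketch, not a real library) replaces an actual Redis client; the `allowed` function and its key format are illustrative, not a standard API.

```python
class FakeRedis:
    """Minimal in-memory stand-in for a Redis client (INCR/EXPIRE only)."""

    def __init__(self):
        self.store = {}

    def incr(self, key: str) -> int:
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

    def expire(self, key: str, ttl: int) -> None:
        pass  # TTL bookkeeping elided in this stub


def allowed(client, user_id: str, window_start: int, limit: int) -> bool:
    # Bucket the key by window start, e.g. epoch_seconds // 60 for 1-minute windows.
    key = f"rate:{user_id}:{window_start}"
    count = client.incr(key)
    if count == 1:
        # First hit in this window: set a TTL so the key cleans itself up.
        client.expire(key, 60)
    return count <= limit
```

Against a real Redis deployment you would issue the same INCR and EXPIRE via a pipeline or Lua script so both happen atomically in one round trip.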

Soft vs Hard Limits

Hard limits reject requests immediately once the quota is exceeded. Soft limits allow a small burst over the threshold before enforcing, giving legitimate clients some slack during traffic spikes without fully opening the floodgates.
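The difference can be captured in a small classification helper. This is a sketch under assumed semantics: `soft_margin` (the fraction of extra headroom a soft limit tolerates) and the returned labels are invented for illustration.

```python
def check(count: int, limit: int, soft_margin: float = 0.1) -> str:
    """Classify a request given the current window count against the quota."""
    if count <= limit:
        return "allow"
    if count <= limit * (1 + soft_margin):
        # Over quota, but within the soft limit's burst allowance.
        return "allow-soft"
    return "reject"
```

A hard limit is the same function with `soft_margin=0`: everything past the quota is rejected outright.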

Rate Limiting vs Throttling

Rate limiting enforces a maximum number of requests over a time window and rejects excess requests outright. Throttling slows down the processing of excess requests rather than rejecting them, useful when degraded service is preferable to complete denial.

Client Backoff Strategies

Good API clients implement exponential backoff with jitter when they receive a 429. This prevents thundering herd problems where many clients retry simultaneously and immediately overwhelm the server again after a brief cooldown.

Abuse Detection and Blocking

Rate limiting can be combined with anomaly detection to identify and block abusive clients beyond simple quota enforcement. Patterns like sudden spikes, repeated auth failures, or unusual geographic distribution can trigger temporary or permanent bans.

Rate Limiting at the Gateway

Centralizing rate limiting at the API gateway keeps the logic out of individual services. The gateway tracks counters per client and route, applies the correct policy, and rejects requests before they consume any downstream resources.

Interview Tip

If the API is public, proactively mention a token bucket backed by Redis. Also bring up the Retry-After header and client backoff to show you have thought about the full request lifecycle, not just server-side enforcement.