Internet-scale distributed systems implement an API rate limiter for high availability and security. Let’s break down the design components of this system.

Requirements

Data storage

Database schema

The NoSQL document store such as MongoDB is used to store the rate limiting rules and the throttled data

Type of data store

High-level design

Throttling type and algorithms

Workflow

  1. The client creates an HTTP connection to the web server
  2. The web server forwards the request to the rate limiter service
  3. The rate limiter service queries the rules cache to check the rate limit rule for the requested API endpoint
  4. The read replicas of rules NoSQL data store are queried on a cache miss to identify the rate limit rule
  5. The distributed lock such as Redis lock is used to handle concurrency when the same user makes multiple requests at the same time in a distributed system
  6. The rate limiter service queries the throttle cache to verify if the throttle threshold is exceeded
  7. The throttle cache uses a write-behind (write-back) cache pattern by storing the throttle data on the message queue to improve the latency
  8. The throttle sync service executes a batch operation to store the throttle data on the message queue to the NoSQL document store
  9. The NoSQL document stores persist the throttle data for fault tolerance, long-living rate limits, analytics, and auditing
  10. The dropped requests (throttle threshold exceeded) are stored on the message queue for analytics and auditing
  11. If the client request is not exceeding the throttle limit, the rate limiter delegates the client request to the API server
  12. LRU cache eviction is used for cache servers
  13. HTTP response headers indicate the relevant throttle limit data
  14. Consistent hashing is used to redirect the request from a user to the same subset of servers

References


Also published here.

Featured image source.