Benefits
- prevent DoS attacks, either intentional or unintentional
- cost: fewer servers needed; also saves on third-party API calls
High Level Design
As cloud microservices have become popular, they typically sit behind a component called an API gateway, which supports rate limiting, SSL termination, serving static content, etc.
Rate Limiting Algorithms
Token bucket algorithm
Tokens are added to the bucket at a fixed refill rate (e.g. every second). The bucket holds at most bucket-capacity tokens; extra tokens overflow and are discarded.
Each request takes one token and then proceeds. If the bucket is empty, the request is dropped.
There is one bucket per rate-limiting granularity. If the rule is something like 5 posts per second per user, each user gets its own bucket; buckets can also be per IP address.
Pros: simple to implement, allows bursts of requests, low memory consumption. Cons: the two parameters (bucket capacity and refill rate) can be hard to tune.
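A minimal sketch of the token bucket in Python (class and parameter names are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket sketch: the two tunables are capacity and refill_rate."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # max tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # start full
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time; overflow beyond capacity is discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a burst of up to `capacity` requests is allowed immediately, which is the behavior noted in the pros above.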
Leaking Bucket Algorithm
Requests are added to a fixed-size queue (the bucket) and processed at a constant rate. If the queue is full, incoming requests are dropped.
Pros: good when the service needs a stable processing rate; memory efficient.
Cons: a burst of traffic fills the queue with old requests, and newer requests get dropped.
Fixed window counter algo
Time is divided into fixed windows. Each window allows some number of requests. If a request arrives and the quota is finished, it is dropped.
One issue is that a burst of traffic at the boundary between two windows can let through more than the quota. Consider 1-minute windows that each allow n requests: within ±5 seconds of a boundary, up to 2n requests can pass.
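A sketch of the fixed window counter; the test of the boundary problem below uses an explicit `now` so the behavior is deterministic (names are illustrative):

```python
import time

class FixedWindowCounter:
    """One counter per fixed time window; reject once the window's quota is used."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window index -> request count

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        key = int(now // self.window)                 # which window this falls in
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Note the boundary weakness: with a limit of n per minute, n requests at 59.9s and another n at 60.1s all succeed, since they land in different windows.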
Sliding window log algo
Store the timestamps of request arrivals in a sorted log. When a request arrives, its timestamp is inserted and outdated timestamps (those outside the window) are removed. If the log size is less than or equal to the allowed count, the request is allowed; otherwise it is rejected.
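The log-based approach can be sketched as follows; per the description above, rejected requests still leave their timestamp in the log (names are illustrative):

```python
from collections import deque

class SlidingWindowLog:
    """Keep a sorted log of timestamps; size of the in-window log is the count."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps arrive in order, so the deque stays sorted

    def allow(self, now):
        # Evict timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        self.log.append(now)
        return len(self.log) <= self.limit
```

This is very accurate (no boundary problem) but memory-hungry: one timestamp per request, even rejected ones.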
Sliding Window Counter algo
Divide time into fixed windows (say 1 minute each). Suppose a request arrives 30% of the way into the current window (i.e. 18 seconds in). Estimate the rolling count as the number of requests in the current window plus 70% of the number of requests in the previous window. If the estimate is below the per-minute limit, the request is allowed to go through.
This smooths out traffic at window boundaries. It assumes requests were evenly distributed within the previous window, but according to studies that approximation has little impact in practice.
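The weighted estimate above can be sketched like this, tracking only two counters per client instead of a full log (names are illustrative):

```python
class SlidingWindowCounter:
    """Estimate the rolling rate from the current and previous windows' counters."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.curr_window = 0

    def allow(self, now):
        window = int(now // self.window)
        if window != self.curr_window:
            # Roll forward; a window older than the immediately previous one counts as 0.
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_count = 0
            self.curr_window = window
        elapsed_fraction = (now % self.window) / self.window
        # Weight the previous window by how much of it still overlaps the sliding window.
        estimated = self.prev_count * (1 - elapsed_fraction) + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```

For example, with 8 requests in the previous 1-minute window and a request arriving 18 seconds (30%) into the current one, the estimate is 8 × 0.7 = 5.6, so a limit of 10 still admits it.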
High Level Design
Counters need to be stored somewhere. An in-memory data store like Redis is used, since disk access in a traditional database is too slow.
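The typical Redis pattern is a per-client, per-window key incremented with INCR and expired with EXPIRE. A runnable sketch of that pattern, using a plain dict to stand in for Redis (the key format and function names are illustrative):

```python
import time

# In-memory dict standing in for Redis; comments show the equivalent Redis commands.
store = {}  # key -> (count, expiry_time)

def allow(client_id, limit, window_seconds, now=None):
    now = time.time() if now is None else now
    # One key per client per window, e.g. "rate:user42:12345".
    key = f"rate:{client_id}:{int(now // window_seconds)}"
    count, expiry = store.get(key, (0, now + window_seconds))  # EXPIRE key window_seconds
    if now >= expiry:
        count = 0
    count += 1                                                 # INCR key
    store[key] = (count, expiry)
    return count <= limit
```

With real Redis, INCR and EXPIRE would run server-side; making the read-increment-check sequence atomic (e.g. via a Lua script) is what avoids race conditions between multiple rate limiter instances.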
Distributed Environment
We might need multiple rate limiter instances behind load balancers to handle the load. Since our compute is stateless and we don't want to use sticky sessions, keeping the counters in a centralized data store solves the issue.
Wrap up
- Hard rate limiting means requests can never exceed the threshold. Soft means they can exceed it for a short period
- can also do rate limiting at IP layer using iptables
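For example, the iptables `limit` match module can cap packet rates at the firewall; this is a config fragment with illustrative values, not something from the source:

```shell
# Accept at most 25 new TCP connections per minute to port 80, with an initial
# burst of 100; packets beyond the limit fall through to the DROP rule.
iptables -A INPUT -p tcp --dport 80 -m limit --limit 25/minute --limit-burst 100 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j DROP
```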