TL;DR
Implement API rate limiting using token bucket or sliding window algorithms. Return standard IETF rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) and 429 Too Many Requests when limits are exceeded. Modern PetstoreAPI implements rate limiting with per-user quotas and clear error responses.
Introduction
A client makes 10,000 requests to your API in one minute. Your database crashes. Your monitoring alerts fire. Your other customers can’t access the API. You’re under attack—or maybe just dealing with a buggy client in a retry loop.
Rate limiting prevents this. It caps how many requests a client can make in a time window. When they exceed the limit, you return 429 Too Many Requests. The client backs off, and your API stays healthy.
The old Swagger Petstore doesn’t implement rate limiting at all. Modern PetstoreAPI implements rate limiting with standard IETF headers, per-user quotas, and clear error responses.
In this guide, you’ll learn rate limiting algorithms, standard headers, and how Modern PetstoreAPI implements rate limiting correctly.
Why APIs Need Rate Limiting
Rate limiting protects your API from abuse and ensures fair usage.
Protection Against Abuse
1. Denial-of-Service (DoS) attacks
An attacker floods your API with requests to make it unavailable. Rate limiting caps their impact.
2. Credential stuffing
Attackers try thousands of username/password combinations. Rate limiting slows them down.
3. Data scraping
Bots scrape your entire dataset. Rate limiting makes scraping impractical.
4. Cost control
If your API calls expensive services (AI models, third-party APIs), rate limiting prevents runaway costs.
Fair Usage
1. Prevent one client from monopolizing resources
Without rate limiting, one client making 1,000 req/sec can starve other clients.
2. Predictable performance
Rate limiting ensures consistent response times for all clients.
3. Tiered access
Free tier: 100 req/hour. Paid tier: 10,000 req/hour. Rate limiting enforces these tiers.
Operational Benefits
1. Capacity planning
Per-client limits put an upper bound on load: at most (number of clients × limit per client) requests per window.
2. Cost predictability
Rate limits cap infrastructure costs.
3. Graceful degradation
Under load, rate limiting prevents cascading failures.
Rate Limiting Algorithms
Different algorithms have different tradeoffs.
1. Fixed Window
Count requests in fixed time windows.
How it works:
Window 1 (00:00-00:59): 100 requests allowed
Window 2 (01:00-01:59): 100 requests allowed
Implementation:
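import time
import redis

r = redis.Redis(decode_responses=True)  # assumes a Redis server on localhost

def is_allowed(user_id):
    current_minute = int(time.time() // 60)  # index of the current 60-second window
    key = f"rate_limit:{user_id}:{current_minute}"
    count = r.incr(key)   # atomic increment; creates the key on first use
    r.expire(key, 60)     # let the counter expire once the window is over
    return count <= 100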
Pros:
- Simple to implement
- Low memory usage
Cons:
- Burst problem: Client can make 100 requests at 00:59 and 100 at 01:00 (200 in 2 seconds)
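The burst problem is easy to reproduce without Redis. The sketch below is an illustration only: FixedWindowLimiter is a hypothetical in-memory class, and the clock is passed in as `now` so the window boundary can be hit exactly.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed-window counter (illustration only, not thread-safe)."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (user_id, window index) -> request count

    def is_allowed(self, user_id, now):
        window_index = int(now // self.window_seconds)
        self.counts[(user_id, window_index)] += 1
        return self.counts[(user_id, window_index)] <= self.limit


limiter = FixedWindowLimiter(limit=100, window_seconds=60)

# 100 requests in the last second of window 1 (t = 59 s)...
late_burst = sum(limiter.is_allowed("alice", now=59.0) for _ in range(100))
# ...and 100 more in the first second of window 2 (t = 60 s).
early_burst = sum(limiter.is_allowed("alice", now=60.0) for _ in range(100))

print(late_burst + early_burst)  # 200: every request is accepted
```

Both bursts pass because each one lands in a different window, even though they are about a second apart.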
2. Sliding Window
Count requests in a rolling time window.
How it works:
At 01:30, count requests from 00:30 to 01:30 (last 60 minutes).
Implementation:
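import time
import uuid
import redis

r = redis.Redis(decode_responses=True)  # assumes a Redis server on localhost

def is_allowed(user_id):
    now = time.time()
    window_start = now - 3600  # 1 hour ago
    key = f"rate_limit:{user_id}"
    # Remove requests that have fallen out of the window
    r.zremrangebyscore(key, 0, window_start)
    # Count requests still inside the window
    count = r.zcard(key)
    if count < 100:
        # Unique member: two requests with the same timestamp must both be counted
        r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
        r.expire(key, 3600)
        return True
    return False
# Note: the count and the add are separate commands, so concurrent requests can
# slip past the limit; wrap them in a Lua script or MULTI/EXEC if that matters.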
Pros:
- No burst problem
- Accurate rate limiting
Cons:
- Higher memory usage (stores timestamp for each request)
- More complex
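If the per-request memory cost is a concern, one commonly used alternative is a sliding window counter. It is a different technique from the log-based version above: it keeps only two counters per user and estimates the rolling count by weighting the previous window. The sketch below is an in-memory illustration under that assumption; SlidingWindowCounter and its parameters are hypothetical names, and the result is an approximation, not an exact count.

```python
class SlidingWindowCounter:
    """Approximate sliding window using two fixed-window counters per user."""

    def __init__(self, limit=100, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self.state = {}  # user_id -> (window index, current count, previous count)

    def is_allowed(self, user_id, now):
        index = int(now // self.window_seconds)
        saved_index, current, previous = self.state.get(user_id, (index, 0, 0))
        if index == saved_index + 1:
            previous, current = current, 0  # rolled into the next window
        elif index > saved_index + 1:
            previous, current = 0, 0        # idle for more than a full window
        # Weight the previous window by the share of it that still overlaps
        # the rolling window ending at `now`.
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        estimated = previous * (1 - elapsed_fraction) + current
        allowed = estimated < self.limit
        if allowed:
            current += 1
        self.state[user_id] = (index, current, previous)
        return allowed


limiter = SlidingWindowCounter(limit=100, window_seconds=3600)
first_hour = sum(limiter.is_allowed("alice", now=3599.0) for _ in range(100))
# Half an hour into the next window, half of the previous window still counts.
second_hour = sum(limiter.is_allowed("alice", now=5400.0) for _ in range(100))
print(first_hour, second_hour)  # 100 50
```

The estimate assumes requests were spread evenly across the previous window, which is the price paid for the constant memory footprint.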
3. Token Bucket
Tokens are added to a bucket at a fixed rate. Each request consumes a token.
How it works:
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request: consumes 1 token
Implementation:
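import time
import redis

# decode_responses=True makes hgetall return str keys, so data.get('tokens') works
r = redis.Redis(decode_responses=True)  # assumes a Redis server on localhost

def is_allowed(user_id):
    now = time.time()
    key = f"rate_limit:{user_id}"
    # Get current state (a new user starts with a full bucket)
    data = r.hgetall(key)
    tokens = float(data.get('tokens', 100))
    last_refill = float(data.get('last_refill', now))
    # Refill tokens for the time elapsed since the last accepted request
    elapsed = now - last_refill
    tokens = min(100, tokens + elapsed * 10)  # 10 tokens/sec
    if tokens >= 1:
        tokens -= 1
        r.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        r.expire(key, 3600)
        return True
    return False
# Note: the read and the write are separate commands, so concurrent requests can
# race; run the same logic in a Lua script if you need it to be atomic.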
Pros:
- Allows bursts (up to bucket capacity)
- Smooth rate limiting
- Industry standard
Cons:
- More complex than fixed window
- Requires storing state
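To see the burst-then-steady behavior concretely, here is a small in-memory sketch using the numbers from the example (capacity 100, refill 10 tokens/second). TokenBucket and seconds_until_next_token are hypothetical names, and the clock is passed in as `now` so the run is deterministic.

```python
class TokenBucket:
    """In-memory token bucket for a single client (illustration only)."""

    def __init__(self, capacity=100, refill_rate=10.0, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = now

    def is_allowed(self, now):
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_next_token(self):
        """How long a rejected client would have to wait for one token."""
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.refill_rate


bucket = TokenBucket(capacity=100, refill_rate=10.0, now=0.0)
burst = sum(bucket.is_allowed(now=0.0) for _ in range(150))
after_two_seconds = sum(bucket.is_allowed(now=2.0) for _ in range(150))
print(burst, after_two_seconds)  # 100 20
```

The first burst drains the full bucket; two seconds later only the 20 refilled tokens are available. The wait computed by seconds_until_next_token is the kind of value a server could use when telling a client how long to back off.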
4. Leaky Bucket
Requests are added to a queue and processed at a fixed rate.
How it works:
Queue capacity: 100 requests
Process rate: 10 requests/second
Pros:
- Smooth output rate
- Good for protecting downstream services
Cons:
- Adds latency (requests wait in queue)
- Complex to implement
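A minimal in-memory sketch of the queue-based variant described above, using the same numbers (capacity 100, 10 requests/second). LeakyBucket, submit, and drain are hypothetical names; in a real service a worker would call drain on a timer and hand the returned requests to the downstream system.

```python
from collections import deque


class LeakyBucket:
    """In-memory leaky-bucket queue (illustration only, not thread-safe)."""

    def __init__(self, capacity=100, leak_rate=10.0, now=0.0):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests processed per second
        self.queue = deque()
        self.last_leak = now

    def submit(self, request):
        """Queue a request, or return False if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return the queued requests that are due for processing by `now`."""
        due = int((now - self.last_leak) * self.leak_rate)
        if due <= 0:
            return []
        # Advance the clock by whole requests only, keeping the fractional rest.
        self.last_leak += due / self.leak_rate
        return [self.queue.popleft() for _ in range(min(due, len(self.queue)))]


bucket = LeakyBucket(capacity=100, leak_rate=10.0, now=0.0)
accepted = sum(bucket.submit(i) for i in range(120))
processed = bucket.drain(now=1.0)
print(accepted, len(processed), len(bucket.queue))  # 100 10 90
```

Of 120 submitted requests, 100 fit in the queue and 20 are rejected. After one second, exactly 10 have been released in arrival order, no matter how bursty the input was.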
Which Algorithm to Use?
For most APIs: Token Bucket
It’s the industry standard, allows reasonable bursts, and provides smooth rate limiting.
Modern PetstoreAPI uses token bucket with per-user quotas.
Standard Rate Limit Headers
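Use the headers defined by the IETF HTTPAPI working group (draft-ietf-httpapi-ratelimit-headers). This is still an Internet-Draft rather than a published RFC, and field names have changed between revisions, so check the revision you are targeting.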
Standard Headers
RateLimit-Limit: Maximum requests allowed in the time window
RateLimit-Limit: 100
RateLimit-Remaining: Requests remaining in current window
RateLimit-Remaining: 45
RateLimit-Reset: Seconds until the rate limit resets
RateLimit-Reset: 3600
Example Response
GET /pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3600
{
"data": [...]
}
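As a rough sketch of where these values come from, the helper below derives the three headers from a fixed-window counter. rate_limit_headers and its parameters are hypothetical, and it assumes windows are aligned to multiples of window_seconds since the Unix epoch.

```python
import math


def rate_limit_headers(limit, used, window_seconds, now):
    """Build RateLimit-* header values for a fixed window aligned to the epoch."""
    remaining = max(0, limit - used)
    window_end = (math.floor(now / window_seconds) + 1) * window_seconds
    reset = math.ceil(window_end - now)  # whole seconds until the window ends
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset),
    }


# First request of a fresh one-hour window.
headers = rate_limit_headers(limit=100, used=1, window_seconds=3600, now=7200.0)
print(headers)
```

With one request used at the very start of the window, this produces the same values as the example response: 100, 99, and 3600.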
Legacy Headers (Deprecated)
Many APIs use non-standard headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1710331200
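Don’t use these in new APIs. RFC 6648 deprecates the X- prefix convention, and the format isn’t standardized: some APIs send X-RateLimit-Reset as a Unix timestamp (as above), others as a number of seconds.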
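Clients that call several APIs may still meet both styles. The helper below is a client-side sketch for normalizing them, not part of any API. seconds_until_reset is a hypothetical name, and it assumes the legacy X-RateLimit-Reset value is a Unix timestamp as in the example above, which is not true for every API.

```python
def seconds_until_reset(headers, now):
    """Return seconds until the limit resets, or None if no reset header exists.

    Assumes RateLimit-Reset is delta-seconds and X-RateLimit-Reset is a
    Unix timestamp (an assumption; check each API's documentation).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if "ratelimit-reset" in lowered:
        return max(0, int(lowered["ratelimit-reset"]))
    if "x-ratelimit-reset" in lowered:
        return max(0, int(lowered["x-ratelimit-reset"]) - int(now))
    return None


print(seconds_until_reset({"RateLimit-Reset": "3600"}, now=1710331080))          # 3600
print(seconds_until_reset({"X-RateLimit-Reset": "1710331200"}, now=1710331080))  # 120
```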
How Modern PetstoreAPI Implements Rate Limiting
Modern PetstoreAPI implements token bucket rate limiting with standard headers.
Rate Limits by Tier
Free tier:
- 100 requests/hour
- 1,000 requests/day
Pro tier:
- 10,000 requests/hour
- 100,000 requests/day
Enterprise tier:
- Custom limits
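One way to express these tiers in code is a lookup table plus a check that both the hourly and the daily quota still have room. This is a sketch only: TIER_LIMITS, limits_for, and is_within_quota are hypothetical names, and how Modern PetstoreAPI stores its tiers is not specified here.

```python
# Limits from the tier list above; enterprise limits are supplied per customer.
TIER_LIMITS = {
    "free": {"hour": 100, "day": 1_000},
    "pro": {"hour": 10_000, "day": 100_000},
}


def limits_for(tier, custom_limits=None):
    """Look up the limits for a tier; enterprise must pass its own limits."""
    if tier == "enterprise":
        if custom_limits is None:
            raise ValueError("enterprise tier needs explicit limits")
        return custom_limits
    return TIER_LIMITS[tier]


def is_within_quota(limits, used_this_hour, used_today):
    """A request is allowed only if both windows still have room."""
    return used_this_hour < limits["hour"] and used_today < limits["day"]


free = limits_for("free")
print(is_within_quota(free, used_this_hour=99, used_today=500))    # True
print(is_within_quota(free, used_this_hour=10, used_today=1_000))  # False: daily quota used up
```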
Implementation
Successful request:
GET /v1/pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3540
{
"data": [...]
}
Rate limit exceeded:
GET /v1/pets
429 Too Many Requests
Content-Type: application/problem+json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120
{
"type": "https://petstoreapi.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded the rate limit of 100 requests per hour",
"instance": "/v1/pets",
"retryAfter": 120,
"limit": 100,
"window": "1h"
}
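A framework-neutral sketch of how a handler might assemble the rejection above. build_rate_limit_response and its parameters are hypothetical; the field names and the type URL are copied from the example response.

```python
import json


def build_rate_limit_response(limit, retry_after, instance, window="1h", window_name="hour"):
    """Return (status, headers, body) for a 429 in the format shown above."""
    headers = {
        "Content-Type": "application/problem+json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(retry_after),
        "Retry-After": str(retry_after),
    }
    body = {
        "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
        "title": "Rate Limit Exceeded",
        "status": 429,
        "detail": f"You have exceeded the rate limit of {limit} requests per {window_name}",
        "instance": instance,
        "retryAfter": retry_after,
        "limit": limit,
        "window": window,
    }
    return 429, headers, json.dumps(body)


status, headers, body = build_rate_limit_response(limit=100, retry_after=120, instance="/v1/pets")
print(status, headers["Retry-After"])
print(body)
```

Building the headers and the body from the same arguments keeps Retry-After, RateLimit-Reset, and retryAfter from drifting apart.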
Per-User vs Per-IP
Per-user (authenticated requests):
Rate limit by user ID or API key. More accurate and fair.
user_id = get_authenticated_user()
is_allowed(user_id)
Per-IP (unauthenticated requests):
Rate limit by IP address. Less accurate (shared IPs, VPNs) but better than nothing.
ip_address = request.remote_addr
is_allowed(ip_address)
Modern PetstoreAPI uses per-user rate limiting for authenticated requests and per-IP for public endpoints.
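The two cases can share one limiter if the key encodes which identity was used. A minimal sketch, with rate_limit_key as a hypothetical helper and the key prefixes chosen only for illustration:

```python
def rate_limit_key(user_id, ip_address):
    """Prefer the authenticated identity; fall back to the client IP."""
    if user_id is not None:
        return f"rate_limit:user:{user_id}"
    return f"rate_limit:ip:{ip_address}"


print(rate_limit_key("u_42", "203.0.113.7"))  # rate_limit:user:u_42
print(rate_limit_key(None, "203.0.113.7"))    # rate_limit:ip:203.0.113.7
```

Separate prefixes keep a user ID from ever colliding with an IP address. Behind a reverse proxy or load balancer, the socket address is usually the proxy’s, so the client IP has to come from a forwarding header that your own proxy sets.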
Rate Limit Response Format
When rate limits are exceeded, return 429 with a body in the RFC 9457 (Problem Details for HTTP APIs) format.
Response Structure
{
"type": "https://petstoreapi.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded your rate limit. Please try again later.",
"instance": "/v1/pets",
"retryAfter": 120,
"limit": 100,
"remaining": 0,
"reset": 120,
"window": "1h"
}
Headers
429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120
Retry-After: Tells clients how long to wait before retrying. The value is either a number of seconds (as here) or an HTTP date.
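On the client side, Retry-After can arrive either as a number of seconds or as an HTTP date, so a well-behaved client should handle both. The sketch below uses only the standard library; retry_after_seconds is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After value (delta-seconds or HTTP-date) to seconds."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    now = now or datetime.now(timezone.utc)
    retry_at = parsedate_to_datetime(value)  # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    return max(0, int((retry_at - now).total_seconds()))


print(retry_after_seconds("120"))  # 120

fixed_now = datetime(2015, 10, 21, 7, 26, 0, tzinfo=timezone.utc)
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))  # 120
```

A client would sleep for the returned number of seconds before retrying, ideally with a little random jitter so that many clients do not retry at the same instant.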
Testing Rate Limits with Apidog
Apidog helps you test rate limiting behavior.
Test Scenarios
1. Normal usage:
Send 50 requests → All succeed
Check RateLimit-Remaining decreases
2. Exceed limit:
Send 101 requests → 101st returns 429
Verify error response format
Check Retry-After header
3. Reset behavior:
Exceed limit → Wait for reset → Verify limit restored
4. Different tiers:
Test free tier (100/hour)
Test pro tier (10,000/hour)
Verify limits are enforced correctly
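The "exceed limit" scenario can also be scripted outside a GUI tool. The sketch below shows only the shape of such a check: first_rejection takes any callable that performs one request and returns the status code, and make_fake_api is a hypothetical stand-in so the example runs without a server. Against a real API you would pass a function that makes the HTTP call.

```python
def first_rejection(send, attempts):
    """Call send() up to `attempts` times; return the 1-based index of the first 429."""
    for attempt in range(1, attempts + 1):
        if send() == 429:
            return attempt
    return None


def make_fake_api(limit):
    """Stand-in for a real HTTP call: 200 for the first `limit` requests, then 429."""
    served = 0

    def send():
        nonlocal served
        served += 1
        return 200 if served <= limit else 429

    return send


print(first_rejection(make_fake_api(limit=100), attempts=101))  # 101
print(first_rejection(make_fake_api(limit=100), attempts=50))   # None
```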
Apidog Test Example
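// Test rate limit headers
pm.test("Rate limit headers present", () => {
    pm.response.to.have.header("RateLimit-Limit");
    pm.response.to.have.header("RateLimit-Remaining");
    pm.response.to.have.header("RateLimit-Reset");
});
// Test rate limit exceeded
// pm.sendRequest is asynchronous and needs a full URL, so send the requests
// one at a time and assert on the last response inside the callback.
// "baseUrl" is an assumed environment variable; add auth headers if required.
const url = pm.environment.get("baseUrl") + "/v1/pets";
function sendRequests(remaining) {
    pm.sendRequest(url, (err, res) => {
        if (remaining > 1) {
            return sendRequests(remaining - 1);
        }
        pm.test("Returns 429 when limit exceeded", () => {
            pm.expect(err).to.equal(null);
            pm.expect(res.code).to.equal(429);
        });
    });
}
// Make 101 requests
sendRequests(101);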
Rate Limiting Best Practices
1. Use standard headers
Use the IETF RateLimit-* headers, not custom X- headers.
2. Return 429, not 403
429 means “too many requests.” 403 means “forbidden.” Don’t confuse them.
3. Include Retry-After
Tell clients when they can retry.
4. Document your limits
Make rate limits visible in documentation.
5. Provide different tiers
Free tier: low limits. Paid tier: higher limits.
6. Rate limit by user where you can
Per-user limits are more accurate and fair. Fall back to per-IP limits only for unauthenticated endpoints.
7. Allow bursts
Token bucket allows reasonable bursts without penalizing normal usage.
8. Monitor rate limit hits
Track how often clients hit rate limits. A high rate of 429s can mean the limits are too low, a client is stuck in a retry loop, or someone is abusing the API.
9. Provide rate limit status endpoint
GET /v1/rate-limit
200 OK
{
"limit": 100,
"remaining": 45,
"reset": 3540
}
10. Test rate limiting
Use Apidog to test rate limit behavior before deployment.
Conclusion
Rate limiting protects your API from abuse and ensures fair usage. Use token bucket algorithm with standard IETF headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset). Return 429 Too Many Requests with RFC 9457 error format when limits are exceeded.
Modern PetstoreAPI implements rate limiting correctly with per-user quotas, standard headers, and clear error responses. Check the documentation for implementation details.
Test your rate limiting with Apidog to ensure it works correctly under load and handles edge cases properly.
FAQ
What rate limits should I set?
Start conservative: 100 requests/hour for free tier, 10,000/hour for paid. Adjust based on usage patterns and infrastructure capacity.
Should I rate limit by IP or user?
Rate limit by user (API key) for authenticated requests. Use IP-based rate limiting only for public endpoints.
What happens if a client exceeds the rate limit?
Return 429 Too Many Requests with Retry-After header. Don’t block the client permanently—let them retry after the window resets.
How do I handle rate limits for webhooks?
Webhooks are server-to-server, so rate limits should be higher. Consider separate limits for webhooks vs API calls.
Should I rate limit internal services?
Yes, but with much higher limits. Rate limiting prevents cascading failures even in internal systems.
How do I test rate limiting?
Use Apidog to send multiple requests and verify 429 responses, rate limit headers, and reset behavior.
What if my API is behind a CDN?
CDN caching reduces load, but you still need rate limiting for cache misses and POST/PUT/DELETE requests.
How do I implement rate limiting across multiple servers?
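Use a shared data store (Redis, Memcached) to track rate limits across all servers. Don’t use per-server local memory: each server would keep its own counter, so a client whose requests are spread across N servers could get up to N times the limit.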
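The difference is easy to demonstrate with a toy model. In the sketch below, a plain dict stands in for the data store, Server is a hypothetical class, and requests are spread round-robin over two servers the way a load balancer might.

```python
class Server:
    """Toy API server with a fixed-count limiter backed by `store`."""

    def __init__(self, store, limit=100):
        self.store = store  # a dict standing in for Redis or local memory
        self.limit = limit

    def handle(self, user_id):
        self.store[user_id] = self.store.get(user_id, 0) + 1
        return self.store[user_id] <= self.limit


def accepted(servers, total_requests):
    """Send requests round-robin across servers; count how many are accepted."""
    return sum(
        servers[i % len(servers)].handle("alice") for i in range(total_requests)
    )


# Each server keeps its own counter in local memory.
local = [Server(store={}), Server(store={})]
# Both servers read and write the same counter.
shared_store = {}
shared = [Server(store=shared_store), Server(store=shared_store)]

accepted_local = accepted(local, 300)
accepted_shared = accepted(shared, 300)
print(accepted_local, accepted_shared)  # 200 100
```

With local counters, each server allows its own 100 requests, so the client gets 200 through. With a shared counter, the limit of 100 holds regardless of which server handles the request.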



