How Should You Implement API Rate Limiting?

API rate limiting prevents abuse and ensures fair usage. Learn token bucket, sliding window, and how Modern PetstoreAPI implements rate limiting with standard IETF headers.

Ashley Innocent

13 March 2026

TL;DR

Implement API rate limiting using token bucket or sliding window algorithms. Return standard IETF rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) and 429 Too Many Requests when limits are exceeded. Modern PetstoreAPI implements rate limiting with per-user quotas and clear error responses.

Introduction

A client makes 10,000 requests to your API in one minute. Your database crashes. Your monitoring alerts fire. Your other customers can’t access the API. You’re under attack—or maybe just dealing with a buggy client in a retry loop.

Rate limiting prevents this. It caps how many requests a client can make in a time window. When they exceed the limit, you return 429 Too Many Requests. The client backs off, and your API stays healthy.

The old Swagger Petstore doesn’t implement rate limiting at all. Modern PetstoreAPI implements rate limiting with standard IETF headers, per-user quotas, and clear error responses.

If you’re building or testing REST APIs, Apidog helps you test rate limiting behavior, validate rate limit headers, and ensure your API handles excessive requests correctly. You can simulate high-volume scenarios and verify rate limit responses.

In this guide, you’ll learn rate limiting algorithms, standard headers, and how Modern PetstoreAPI implements rate limiting correctly.

Why APIs Need Rate Limiting

Rate limiting protects your API from abuse and ensures fair usage.

Protection Against Abuse

1. Denial-of-Service (DoS) attacks

An attacker floods your API with requests to make it unavailable. Rate limiting caps their impact.

2. Credential stuffing

Attackers try thousands of username/password combinations. Rate limiting slows them down.

3. Data scraping

Bots scrape your entire dataset. Rate limiting makes scraping impractical.

4. Cost control

If your API calls expensive services (AI models, third-party APIs), rate limiting prevents runaway costs.

Fair Usage

1. Prevent one client from monopolizing resources

Without rate limiting, one client making 1000 req/sec can starve other clients.

2. Predictable performance

Rate limiting ensures consistent response times for all clients.

3. Tiered access

Free tier: 100 req/hour. Paid tier: 10,000 req/hour. Rate limiting enforces these tiers.

Operational Benefits

1. Capacity planning

Limits put an upper bound on the load any one client can generate, which makes capacity easier to plan.

2. Cost predictability

Rate limits cap infrastructure costs.

3. Graceful degradation

Under load, rate limiting prevents cascading failures.

Rate Limiting Algorithms

Different algorithms have different tradeoffs.

1. Fixed Window

Count requests in fixed time windows.

How it works:

Window 1 (00:00-00:59): 100 requests allowed
Window 2 (01:00-01:59): 100 requests allowed

Implementation:

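import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def is_allowed(user_id, limit=100):
    current_minute = int(time.time() // 60)  # index of the current 60-second window
    key = f"rate_limit:{user_id}:{current_minute}"
    count = r.incr(key)  # INCR is atomic, so concurrent requests are counted correctly
    if count == 1:
        r.expire(key, 60)  # set the TTL once, when the window's key is created
    return count <= limit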

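Pros:

Simple to implement
Low memory: one counter per client per window

Cons:

Bursts at window boundaries: a client can send 100 requests in the last second of one window and 100 more in the first second of the next, 200 requests in about two seconds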
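To see the boundary problem concretely, here is a minimal in-memory version of the same counter, suitable only for a single process. The `FixedWindowLimiter` class and its explicit `now` argument are illustrative assumptions, not part of any library; passing the time in makes the scenario reproducible.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed-window counter (single process only; illustrative)."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window  # window length in seconds
        self.counts = defaultdict(int)  # (client, window index) -> request count

    def is_allowed(self, client_id, now):
        # Every timestamp inside the same window maps to the same index
        key = (client_id, int(now // self.window))
        self.counts[key] += 1
        # Old keys are never purged in this sketch; Redis handles that with EXPIRE
        return self.counts[key] <= self.limit


limiter = FixedWindowLimiter(limit=100, window=60)
# 100 requests in the last half-second of the first window...
late = sum(limiter.is_allowed("alice", now=59.5) for _ in range(100))
# ...and 100 more in the first half-second of the next one
early = sum(limiter.is_allowed("alice", now=60.5) for _ in range(100))
print(late + early)  # prints 200: all accepted within one second
```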

2. Sliding Window

Count requests in a rolling time window.

How it works:

At 01:30, count requests from 00:30 to 01:30 (last 60 minutes).

Implementation:

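import time
import uuid
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def is_allowed(user_id, limit=100, window=3600):
    now = time.time()
    window_start = now - window  # 1 hour ago
    key = f"rate_limit:{user_id}"

    # Remove requests that have left the window
    r.zremrangebyscore(key, 0, window_start)

    # Count requests still in the window
    count = r.zcard(key)

    # Note: these calls are not atomic, so two concurrent requests can both
    # pass the check; running the logic as a Lua script (EVAL) closes the gap
    if count < limit:
        # Unique member, so two requests with the same timestamp aren't merged
        r.zadd(key, {f"{now}:{uuid.uuid4()}": now})
        r.expire(key, window)
        return True
    return False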

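Pros:

Accurate: no bursts at window boundaries

Cons:

Memory grows with the limit, since every request in the window is stored
More work per request: several Redis commands instead of one INCR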
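A single-process sketch of the same sliding-window log, using a `deque` of timestamps per client instead of a Redis sorted set. `SlidingWindowLog` and the explicit `now` parameter are illustrative names; like the Redis version, it records only accepted requests. Replaying a burst around a minute boundary shows the difference from a fixed window.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding-window log (single process only; illustrative).

    Assumes calls arrive in time order, so each deque stays sorted.
    """

    def __init__(self, limit=100, window=3600):
        self.limit = limit
        self.window = window  # window length in seconds
        self.logs = {}  # client id -> deque of accepted request timestamps

    def is_allowed(self, client_id, now):
        log = self.logs.setdefault(client_id, deque())
        # Evict timestamps at or before the start of the rolling window
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False


limiter = SlidingWindowLog(limit=100, window=60)
late = sum(limiter.is_allowed("alice", now=59.5) for _ in range(100))
early = sum(limiter.is_allowed("alice", now=60.5) for _ in range(100))
print(late, early)  # prints 100 0: the minute boundary no longer resets the count
```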

3. Token Bucket

Tokens are added to a bucket at a fixed rate. Each request consumes a token.

How it works:

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request: consumes 1 token

Implementation:

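import time
import redis

# decode_responses=True makes hgetall return str keys; with the default
# (bytes keys), data.get('tokens') would always miss and the bucket never empties
r = redis.Redis(decode_responses=True)

CAPACITY = 100    # bucket size: the largest burst allowed
REFILL_RATE = 10  # tokens added per second

def is_allowed(user_id):
    now = time.time()
    key = f"rate_limit:{user_id}"

    # Get current state; a missing key means a full bucket
    data = r.hgetall(key)
    tokens = float(data.get('tokens', CAPACITY))
    last_refill = float(data.get('last_refill', now))

    # Refill tokens for the time elapsed since the last update
    elapsed = now - last_refill
    tokens = min(CAPACITY, tokens + elapsed * REFILL_RATE)

    # This read-modify-write is not atomic; use a Lua script under concurrency
    if tokens >= 1:
        tokens -= 1
        r.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        r.expire(key, 3600)
        return True
    return False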

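Pros:

Allows bursts up to the bucket capacity while enforcing an average rate
Small state per client: a token count and a timestamp

Cons:

Two parameters to tune (capacity and refill rate)
The read-modify-write must be atomic when many servers share the bucket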
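With Redis removed, the refill arithmetic is easier to follow. This is a minimal in-memory sketch for one client, assuming the same capacity (100) and refill rate (10 tokens/second) as above; `TokenBucket` and its explicit `now` argument are illustrative, not a library API.

```python
class TokenBucket:
    """In-memory token bucket for one client (single process; illustrative)."""

    def __init__(self, capacity=100, refill_rate=10.0, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.last_refill = now

    def is_allowed(self, now):
        # Add tokens for the elapsed time, never exceeding capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False


bucket = TokenBucket(capacity=100, refill_rate=10.0, now=0.0)
burst = sum(bucket.is_allowed(now=0.0) for _ in range(150))
print(burst)  # prints 100: the full bucket absorbs a burst of 100
after_one_second = sum(bucket.is_allowed(now=1.0) for _ in range(150))
print(after_one_second)  # prints 10: one second refills 10 tokens
```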

4. Leaky Bucket

Requests are added to a queue and processed at a fixed rate.

How it works:

Queue capacity: 100 requests
Process rate: 10 requests/second

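Pros:

Smooth, constant output rate, which protects downstream services

Cons:

Bursts are queued rather than served immediately, adding latency
When the queue is full, new requests are rejected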
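To make the queueing behavior concrete, here is a minimal sketch that models the bucket as a queue of scheduled start times: each accepted request starts `1 / rate` seconds after the previous one, and new requests are rejected once `capacity` are waiting. `LeakyBucket` and `offer` are hypothetical names, and a real server would still need a worker that runs each request at its scheduled time.

```python
from collections import deque


class LeakyBucket:
    """Leaky bucket as a queue: requests drain at a fixed rate (illustrative)."""

    def __init__(self, capacity=100, rate=10.0):
        self.capacity = capacity      # max requests waiting in the queue
        self.interval = 1.0 / rate    # seconds between processed requests
        self.queue = deque()          # scheduled start times of waiting requests
        self.next_slot = 0.0          # earliest time the next request can start

    def offer(self, now):
        """Return the time the request will be processed, or None if rejected."""
        # Requests whose start time has arrived have left the queue
        while self.queue and self.queue[0] <= now:
            self.queue.popleft()
        if len(self.queue) >= self.capacity:
            return None  # queue full: reject (the API would answer 429)
        start = max(now, self.next_slot)
        self.next_slot = start + self.interval
        self.queue.append(start)
        return start


bucket = LeakyBucket(capacity=3, rate=1.0)
print([bucket.offer(now=0.0) for _ in range(5)])  # prints [0.0, 1.0, 2.0, 3.0, None]
```

With `capacity=3` and one request per second, five simultaneous requests play out as follows: the first starts immediately, the next three wait at one-second intervals, and the fifth is rejected.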

Which Algorithm to Use?

For most APIs: Token Bucket

It’s widely used, allows bursts up to the bucket capacity, and enforces a steady average rate over time.

Modern PetstoreAPI uses token bucket with per-user quotas.

Standard Rate Limit Headers

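Use the headers from the IETF draft (draft-ietf-httpapi-ratelimit-headers). Note that it is still an Internet-Draft, not a published RFC, and later revisions replace these three fields with combined RateLimit and RateLimit-Policy fields, so check the current revision before settling on field names.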

Standard Headers

RateLimit-Limit: Maximum requests allowed in the time window

RateLimit-Limit: 100

RateLimit-Remaining: Requests remaining in current window

RateLimit-Remaining: 45

RateLimit-Reset: Seconds until the rate limit resets

RateLimit-Reset: 3600

Example Response

GET /pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3600

{
  "data": [...]
}
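Filling in these values is mostly bookkeeping. A minimal sketch for a fixed window follows; `rate_limit_headers` is a hypothetical helper, and it assumes windows aligned to Unix time (an hourly quota here, matching the 3600-second reset above). `RateLimit-Reset` is emitted as delta seconds, rounded up so clients never retry too early.

```python
import math


def rate_limit_headers(limit, used, window_seconds, now):
    """Build RateLimit-* headers for a fixed window (illustrative).

    limit: requests allowed per window
    used: requests already counted in the current window
    window_seconds: window length, e.g. 3600 for an hourly quota
    now: current Unix time in seconds
    """
    window_start = (now // window_seconds) * window_seconds
    seconds_left = window_start + window_seconds - now
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, limit - used)),  # never negative
        "RateLimit-Reset": str(math.ceil(seconds_left)),   # delta seconds, not a timestamp
    }


print(rate_limit_headers(limit=100, used=1, window_seconds=3600, now=60.0))
# prints {'RateLimit-Limit': '100', 'RateLimit-Remaining': '99', 'RateLimit-Reset': '3540'}
```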

Legacy Headers (Deprecated)

Many APIs use non-standard headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1710331200

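Don’t use these. RFC 6648 deprecates the X- prefix convention, and the semantics vary between providers: X-RateLimit-Reset, for example, is a Unix timestamp in some APIs (as above) and a number of seconds in others.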

How Modern PetstoreAPI Implements Rate Limiting

Modern PetstoreAPI implements token bucket rate limiting with standard headers.

Rate Limits by Tier

Free tier: 100 requests/hour

Pro tier: 10,000 requests/hour

Enterprise tier:

Implementation

Successful request:

GET /v1/pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3540

{
  "data": [...]
}

Rate limit exceeded:

GET /v1/pets
429 Too Many Requests
Content-Type: application/problem+json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120

{
  "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded the rate limit of 100 requests per hour",
  "instance": "/v1/pets",
  "retryAfter": 120,
  "limit": 100,
  "window": "1h"
}
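A minimal sketch of building that 429 response. It is not PetstoreAPI's actual code: `too_many_requests` is a hypothetical, framework-neutral helper that returns `(status, headers, body)`, and it assumes the limiter has already computed how long the client must wait. The body follows RFC 9457, with `retryAfter`, `limit`, and `window` as extension members, as in the example above.

```python
import json
import math


def too_many_requests(instance, limit, retry_after_seconds,
                      window="1h", window_name="hour"):
    """Build a 429 response as (status, headers, body); illustrative only."""
    retry_after = math.ceil(retry_after_seconds)  # whole seconds, rounded up
    body = {
        # Standard RFC 9457 members
        "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
        "title": "Rate Limit Exceeded",
        "status": 429,
        "detail": f"You have exceeded the rate limit of {limit} requests per {window_name}",
        "instance": instance,
        # Extension members
        "retryAfter": retry_after,
        "limit": limit,
        "window": window,
    }
    headers = {
        "Content-Type": "application/problem+json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        # Assumes the quota resets at the moment the client may retry
        "RateLimit-Reset": str(retry_after),
        "Retry-After": str(retry_after),
    }
    return 429, headers, json.dumps(body)


status, headers, body = too_many_requests("/v1/pets", limit=100, retry_after_seconds=119.2)
print(status, headers["Retry-After"])  # prints 429 120
```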

Per-User vs Per-IP

Per-user (authenticated requests):

Rate limit by user ID or API key. More accurate and fair.

user_id = get_authenticated_user()  # hypothetical helper: resolves the caller from the API key
is_allowed(user_id)

Per-IP (unauthenticated requests):

Rate limit by IP address. Less accurate (shared IPs, VPNs) but better than nothing.

ip_address = request.remote_addr  # client address in Flask; behind a proxy this is the proxy's IP
is_allowed(ip_address)

Modern PetstoreAPI uses per-user rate limiting for authenticated requests and per-IP for public endpoints.
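Choosing which key to limit on can be isolated in one small function. This is a sketch with a hypothetical `rate_limit_key` helper; the `user:` and `ip:` prefixes are an assumption that keeps the two namespaces from colliding. Behind a load balancer or reverse proxy, the socket address (such as Flask's `request.remote_addr`) is the proxy's, so the real client IP has to come from a forwarding header that only trusted proxies are allowed to set.

```python
def rate_limit_key(user_id, client_ip):
    """Pick the rate-limit bucket for a request (illustrative).

    Authenticated requests are limited per user; anonymous ones per IP.
    The "user:" / "ip:" prefixes stop a user ID that happens to look
    like an IP address from sharing a bucket with that IP.
    """
    if user_id is not None:
        return f"user:{user_id}"
    return f"ip:{client_ip}"


print(rate_limit_key("u_42", "203.0.113.7"))  # prints user:u_42
print(rate_limit_key(None, "203.0.113.7"))    # prints ip:203.0.113.7
```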

Rate Limit Response Format

When rate limits are exceeded, return 429 with RFC 9457 error format.

Response Structure

{
  "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded your rate limit. Please try again later.",
  "instance": "/v1/pets",
  "retryAfter": 120,
  "limit": 100,
  "remaining": 0,
  "reset": 120,
  "window": "1h"
}

Headers

429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120

Retry-After: Tells clients how long to wait before retrying. It can be a number of seconds, as here, or an HTTP date.
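On the client side, backing off means honoring Retry-After in either form and falling back to exponential backoff when it is missing. `retry_delay` is a hypothetical helper; the fallback uses capped exponential backoff with full jitter (a random delay between zero and the cap), which is a common choice rather than something the server mandates. The optional `now` argument exists only to make the HTTP-date case testable.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(headers, attempt, base=1.0, cap=60.0, now=None):
    """Seconds to wait before retrying after a 429 (illustrative).

    headers: response headers as a dict with canonical names
    attempt: 0 for the first retry, 1 for the second, and so on
    """
    value = headers.get("Retry-After")
    if value is not None:
        value = value.strip()
        # Form 1: delay in seconds, e.g. "120"
        if value.isascii() and value.isdigit():
            return float(value)
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # malformed header: fall back to backoff below
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # treat as UTC
            if now is None:
                now = datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable Retry-After: capped exponential backoff with full jitter
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(retry_delay({"Retry-After": "120"}, attempt=0))  # prints 120.0
```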

Testing Rate Limits with Apidog

Apidog helps you test rate limiting behavior.

Test Scenarios

1. Normal usage:

Send 50 requests → All succeed
Check RateLimit-Remaining decreases

2. Exceed limit:

Send 101 requests → 101st returns 429
Verify error response format
Check Retry-After header

3. Reset behavior:

Exceed limit → Wait for reset → Verify limit restored

4. Different tiers:

Test free tier (100/hour)
Test pro tier (10,000/hour)
Verify limits are enforced correctly

Apidog Test Example

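// Test rate limit headers on the current response
pm.test("Rate limit headers present", () => {
  pm.response.to.have.header("RateLimit-Limit");
  pm.response.to.have.header("RateLimit-Remaining");
  pm.response.to.have.header("RateLimit-Reset");
});

// Test rate limit exceeded.
// pm.sendRequest is asynchronous, so requests are chained one at a time
// and the status is checked inside the callback of the last one.
const url = pm.request.url.toString(); // reuse this request's URL
const remaining = Number(pm.response.headers.get("RateLimit-Remaining"));

function send(i) {
  pm.sendRequest(url, (err, res) => {
    if (i <= remaining) return send(i + 1); // these should still succeed
    pm.test("Returns 429 when limit exceeded", () => {
      pm.expect(err).to.not.be.ok;
      pm.expect(res.code).to.equal(429);
    });
  });
}
send(1);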

Rate Limiting Best Practices

1. Use standard headers

Use IETF standard headers, not custom X- headers.

2. Return 429, not 403

429 means “too many requests.” 403 means “forbidden.” Don’t confuse them.

3. Include Retry-After

Tell clients when they can retry.

4. Document your limits

Make rate limits visible in documentation.

5. Provide different tiers

Free tier: low limits. Paid tier: higher limits.

6. Rate limit by user, not IP

Per-user limits are more accurate and fair.

7. Allow bursts

Token bucket allows reasonable bursts without penalizing normal usage.

8. Monitor rate limit hits

Track how often clients hit rate limits. A high rate suggests either limits that are too tight for legitimate use or a misbehaving client.

9. Provide rate limit status endpoint

GET /v1/rate-limit
200 OK
{
  "limit": 100,
  "remaining": 45,
  "reset": 3540
}
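A sketch of computing that status payload from a sliding-window log of request timestamps. `rate_limit_status` is a hypothetical helper, and the meaning of `reset` is a design choice: here it is the time until the oldest request leaves the window, when at least one more request will be allowed. Reporting the time until the whole quota is restored (newest timestamp + window - now) is an equally valid reading.

```python
import math


def rate_limit_status(timestamps, limit, window, now):
    """Status payload for a client's sliding-window log (illustrative).

    timestamps: times (seconds) of the client's accepted requests
    """
    # Keep only requests still inside the rolling window
    recent = [t for t in timestamps if t > now - window]
    remaining = max(0, limit - len(recent))
    # A slot frees up when the oldest request in the window ages out
    reset = math.ceil(min(recent) + window - now) if recent else 0
    return {"limit": limit, "remaining": remaining, "reset": reset}


print(rate_limit_status([60.0, 120.0, 3000.0], limit=100, window=3600, now=3600.0))
# prints {'limit': 100, 'remaining': 97, 'reset': 60}
```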

10. Test rate limiting

Use Apidog to test rate limit behavior before deployment.

Conclusion

Rate limiting protects your API from abuse and ensures fair usage. Use token bucket algorithm with standard IETF headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset). Return 429 Too Many Requests with RFC 9457 error format when limits are exceeded.

Modern PetstoreAPI implements rate limiting correctly with per-user quotas, standard headers, and clear error responses. Check the documentation for implementation details.

Test your rate limiting with Apidog to ensure it works correctly under load and handles edge cases properly.


FAQ

What rate limits should I set?

Start conservative: 100 requests/hour for free tier, 10,000/hour for paid. Adjust based on usage patterns and infrastructure capacity.

Should I rate limit by IP or user?

Rate limit by user (API key) for authenticated requests. Use IP-based rate limiting only for public endpoints.

What happens if a client exceeds the rate limit?

Return 429 Too Many Requests with Retry-After header. Don’t block the client permanently—let them retry after the window resets.

How do I handle rate limits for webhooks?

Webhooks are server-to-server, so rate limits should be higher. Consider separate limits for webhooks vs API calls.

Should I rate limit internal services?

Yes, but with much higher limits. Rate limiting prevents cascading failures even in internal systems.

How do I test rate limiting?

Use Apidog to send multiple requests and verify 429 responses, rate limit headers, and reset behavior.

What if my API is behind a CDN?

CDN caching reduces load, but you still need rate limiting for cache misses and POST/PUT/DELETE requests.

How do I implement rate limiting across multiple servers?

Use a shared data store (Redis, Memcached) to track rate limits across all servers. Don’t use local memory: each server would count requests separately, so a client whose requests are spread across N servers could make up to N times the limit.
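The problem with local memory shows up in a toy simulation. `CounterStore` is a stand-in for wherever counts live (it is not a Redis client), and `serve` assumes a round-robin load balancer: with one store per server, the client's effective limit multiplies by the number of servers.

```python
class CounterStore:
    """Stand-in for a counter store: local memory or a shared Redis (illustrative)."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def serve(stores, requests, limit=100):
    """Send requests round-robin across servers; return how many are accepted."""
    accepted = 0
    for i in range(requests):
        store = stores[i % len(stores)]  # the load balancer picks a server
        if store.incr("rate_limit:alice") <= limit:
            accepted += 1
    return accepted


local = [CounterStore() for _ in range(3)]  # each server keeps its own counts
shared = CounterStore()                     # all servers use one store
print(serve(local, 500))                    # prints 300: three separate limits
print(serve([shared, shared, shared], 500)) # prints 100: one shared limit
```

With a real shared store, keep each check-and-update atomic, for example with Redis INCR or a Lua script run via EVAL, so concurrent requests on different servers can't both slip under the limit.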
