Rate Limits

Overview

The gateway enforces limits per API key in three layers:

  1. Burst limit over a 10-second window
  2. Sustained requests-per-minute limit
  3. Daily and monthly cost budgets

The current public KB query endpoint records 1 cent of estimated cost per successful request.

Tiers

Tier          Burst (10s)  RPM    Daily Budget  Monthly Budget
Free          20           60     500 cents     2,500 cents
Professional  60           300    5,000 cents   25,000 cents
Enterprise    200          1,000  50,000 cents  250,000 cents
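
To make the tier numbers concrete, the sketch below works out how many KB queries a daily budget covers and how quickly a client running at the full RPM limit would exhaust it. The tier values are copied from the table and the 1-cent cost from the Overview; the function names are illustrative and not part of the API.

```python
# Tier values copied from the table above.
TIERS = {
    "free": {"rpm": 60, "daily_cents": 500, "monthly_cents": 2_500},
    "professional": {"rpm": 300, "daily_cents": 5_000, "monthly_cents": 25_000},
    "enterprise": {"rpm": 1_000, "daily_cents": 50_000, "monthly_cents": 250_000},
}

# Estimated cost the KB query endpoint records per successful request.
COST_PER_REQUEST_CENTS = 1


def daily_request_capacity(tier):
    """Successful KB queries that fit inside one daily budget."""
    return TIERS[tier]["daily_cents"] // COST_PER_REQUEST_CENTS


def minutes_to_exhaust_daily_budget(tier):
    """Minutes of traffic at the full RPM limit before the daily budget runs out."""
    return daily_request_capacity(tier) / TIERS[tier]["rpm"]


for name in TIERS:
    print(name, daily_request_capacity(name), round(minutes_to_exhaust_daily_budget(name), 1))
```

On these figures the cost budget, not the RPM limit, is the binding constraint for any client that sustains the full request rate for more than a few minutes.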

Budget windows are tracked in UTC:

  • daily budgets reset at the next UTC midnight
  • monthly budgets reset on the first day of the next UTC month
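
A client that wants to predict these reset points locally can compute them from the current UTC time. This is a minimal sketch using only the standard library; the function names are illustrative, and the input must be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now):
    """Return the next UTC midnight strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_monthly_reset(now):
    """Return 00:00 UTC on the first day of the month after `now`."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


now = datetime.now(timezone.utc)
print("daily budget resets in", int((next_daily_reset(now) - now).total_seconds()), "seconds")
print("monthly budget resets at", next_monthly_reset(now).isoformat())
```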

429 Responses

When enforcement blocks a request, the API returns 429 Too Many Requests and includes these headers:

Header                 Meaning
Retry-After            Seconds until the caller should retry
X-RateLimit-Limit      Limit for the exhausted window
X-RateLimit-Remaining  Always 0 on the blocked response
X-RateLimit-Reset      Unix timestamp for the reset point
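
The headers above can be turned into typed values with a small helper. This sketch assumes the headers arrive as a plain dict keyed exactly as listed in the table; most HTTP clients expose a case-insensitive mapping that can be passed in the same way. The function name and the sample values are illustrative.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers):
    """Convert the 429 headers listed above into typed values."""
    return {
        "retry_after_seconds": int(headers["Retry-After"]),
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        # X-RateLimit-Reset is a Unix timestamp; interpret it as UTC.
        "reset_at": datetime.fromtimestamp(
            int(headers["X-RateLimit-Reset"]), tz=timezone.utc
        ),
    }


# Sample values for illustration only.
sample = {
    "Retry-After": "10",
    "X-RateLimit-Limit": "20",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000000",
}
print(parse_rate_limit_headers(sample))
```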

Example burst-limit response:

{
  "detail": {
    "error": "rate_limit_exceeded",
    "type": "burst",
    "retry_after": 10,
    "limit": 20,
    "window_seconds": 10
  }
}

Example daily-budget response:

{
  "detail": {
    "error": "daily_budget_exceeded",
    "type": "cost",
    "retry_after": 86399,
    "daily_limit_cents": 500,
    "current_cost_cents": 500
  }
}
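
A client can branch on the body's type field to tell throttling apart from budget exhaustion. The sketch below relies only on the fields shown in the two examples; it treats "cost" as budget exhaustion and any other type as throttling, which is an assumption because the type value for sustained-limit responses is not shown here. The function name is illustrative.

```python
import json


def classify_429(body_text):
    """Return (kind, error, retry_after_seconds) for a 429 response body."""
    detail = json.loads(body_text)["detail"]
    # "cost" marks budget exhaustion; anything else is treated as throttling.
    kind = "budget" if detail.get("type") == "cost" else "throttle"
    return kind, detail["error"], int(detail["retry_after"])


burst_body = """
{"detail": {"error": "rate_limit_exceeded", "type": "burst",
            "retry_after": 10, "limit": 20, "window_seconds": 10}}
"""
print(classify_429(burst_body))
```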

Success Responses

Successful requests currently return X-Request-Id, but they do not yet expose remaining quota headers. Build integrations around 429 handling rather than assuming success-side rate-limit headers are present.

Fail-Open Behavior

If Upstash Redis is disabled, unconfigured, or temporarily unavailable, the gateway fails open: it allows all requests rather than blocking traffic. Usage accounting is independent of Redis and still attempts to write the daily usage row to PostgreSQL.
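
The fail-open pattern itself can be sketched in a few lines. This is not the gateway's implementation: check_limit is a hypothetical callable standing in for the Redis-backed check, and backend_errors is a placeholder for whatever exception types the real client raises.

```python
def allow_request(check_limit, api_key, backend_errors=(OSError,)):
    """Return True if the request may proceed, failing open on backend errors.

    check_limit is a hypothetical callable returning True when the key is
    within its limits. None models a disabled or unconfigured limiter.
    """
    if check_limit is None:
        return True  # limiter disabled or unconfigured: allow all
    try:
        return bool(check_limit(api_key))
    except backend_errors:
        # Backend temporarily unavailable: allow rather than block traffic.
        return True


def unreachable_backend(api_key):
    # ConnectionError is a subclass of OSError, so the default tuple catches it.
    raise ConnectionError("limiter backend unreachable")


print(allow_request(unreachable_backend, "key-123"))
```

The trade-off is availability over enforcement: during a limiter outage, callers are not throttled at all.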

Retry Guidance

For burst and sustained throttling, wait exactly the number of seconds given in Retry-After before retrying. For budget exhaustion, stop sending requests until the budget window resets instead of retrying in a tight loop.
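
One way to apply this guidance in a client is sketched below. The send callable is hypothetical: it stands in for whatever HTTP call the integration makes and returns a (status, headers, detail) tuple, where detail is the "detail" object from the JSON body. BudgetExhausted and call_with_retry are illustrative names, not part of any SDK.

```python
import time


class BudgetExhausted(Exception):
    """Raised when a 429 reports cost-budget exhaustion (type == "cost")."""

    def __init__(self, retry_after_seconds):
        super().__init__(f"budget exhausted; resets in {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, detail = send()
        if status != 429:
            return status, detail
        # Prefer the header; fall back to the body field, then to 1 second.
        wait = int(headers.get("Retry-After", detail.get("retry_after", 1)))
        if detail.get("type") == "cost":
            # Budget exhaustion: surface the reset delay instead of looping.
            raise BudgetExhausted(wait)
        if attempt < max_attempts:
            sleep(wait)  # throttling: wait exactly the advertised delay
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Raising on budget exhaustion hands the decision to the caller, which can schedule work for after the reset rather than holding a thread asleep for hours.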