This page describes the throughput and throttling limits to keep in mind when calling the API.

What to check

  • request rate per time window
  • concurrency limits (how many requests may be in flight at once)
  • whether different models or capabilities have separate quotas
  • whether limits differ between free, test, and production environments
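Once you know the values for the items above, you can mirror them on the client so requests are held back before the platform rejects them. The following is a minimal sketch, assuming each model has its own quota expressed as a request count per sliding window plus a cap on in-flight requests. The `Quota` and `ModelLimiter` names, the model names, and any limit values are hypothetical illustrations, not part of this API; substitute the limits that apply to your plan and environment.

```python
import collections
import threading
import time
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class Quota:
    """Hypothetical per-model limits; real values depend on your plan and environment."""
    max_requests: int      # requests allowed per window
    window_seconds: float  # length of the sliding rate window
    max_concurrent: int    # requests allowed in flight at once


class ModelLimiter:
    """Client-side guard tracking a sliding-window rate and an in-flight count per model."""

    def __init__(self, quotas: Dict[str, Quota], clock: Callable[[], float] = time.monotonic):
        self._quotas = quotas
        self._clock = clock
        self._lock = threading.Lock()
        self._history: Dict[str, Deque[float]] = collections.defaultdict(collections.deque)
        self._in_flight: Dict[str, int] = collections.defaultdict(int)

    def try_acquire(self, model: str) -> bool:
        """Reserve a slot and return True if `model` is under both limits, else return False."""
        quota = self._quotas[model]
        now = self._clock()
        with self._lock:
            history = self._history[model]
            # Drop timestamps that have fallen out of the sliding window.
            while history and now - history[0] >= quota.window_seconds:
                history.popleft()
            if len(history) >= quota.max_requests:
                return False  # rate limit for the current window reached
            if self._in_flight[model] >= quota.max_concurrent:
                return False  # concurrency limit reached
            history.append(now)
            self._in_flight[model] += 1
            return True

    def release(self, model: str) -> None:
        """Call once the request for `model` has finished, whether it succeeded or failed."""
        with self._lock:
            self._in_flight[model] -= 1
```

Keying the state by model name keeps separate quotas independent, so exhausting one model's budget does not block calls to another. Passing a `clock` function makes the limiter easy to test without real waiting.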

Engineering guidance

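  • centralize retry, backoff, and circuit breaking in a server-side layer instead of scattering them across clients
  • use caching or queueing to absorb high-frequency request flows
  • decouple spikes in business traffic from spikes in model invocations, for example by buffering requests in a queue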
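A server-side retry layer for the guidance above might combine capped exponential backoff with jitter and a simple circuit breaker. This is a minimal single-threaded sketch, not this API's client: `ThrottledError` is a hypothetical exception your own client would raise on a throttling response, and `retry_after` assumes the API returns a retry hint (such as an HTTP `Retry-After` header), which you should confirm. Only throttling errors are retried and counted toward the breaker; other exceptions propagate unchanged.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical exception raised by your API client on a throttling response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("throttled")
        self.retry_after = retry_after  # seconds to wait, if the API provided a hint


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the upstream call is skipped."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial call after `reset_seconds`."""

    def __init__(self, failure_threshold: int = 5, reset_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: let a trial call through once the reset period has elapsed.
        return self._clock() - self._opened_at >= self.reset_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open the circuit


def call_with_retry(fn: Callable[[], T], breaker: CircuitBreaker, max_attempts: int = 4,
                    base_delay: float = 0.5, max_delay: float = 8.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` on ThrottledError using capped exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; not calling upstream")
        try:
            result = fn()
        except ThrottledError as exc:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            if exc.retry_after is not None:
                delay = exc.retry_after  # prefer the server's hint when present
            else:
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
        else:
            breaker.record_success()
            return result
    raise AssertionError("unreachable")
```

Jitter spreads retries from many callers over time so they do not hit the API in synchronized waves, and the breaker stops a throttled upstream from being hammered while it recovers. Because this state lives in one server-side component, every client benefits from the same backoff policy.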

Suggested debugging order

  1. confirm whether you hit a platform-level throttle
  2. confirm whether the specific model or capability has its own rate cap
  3. check whether the client is retrying or resubmitting requests unexpectedly, since duplicate traffic inflates your effective request rate
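The debugging order above can be partly automated from a client-side request log. The following is a rough sketch under stated assumptions: `LoggedCall` is a hypothetical log record you would build yourself, throttling is assumed to be signalled with HTTP 429 (Too Many Requests), which you should verify for this API, and the 5-second duplicate window is an arbitrary example value. Throttles spread across all models hint at a platform-level limit, while throttles on a single model hint at a per-model cap; identical payloads re-sent within a short window are candidates for unintended retries, not proof of them.

```python
import hashlib
import json
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoggedCall:
    """One outgoing request as recorded by a hypothetical client-side log."""
    timestamp: float
    model: str
    payload: dict
    status: int  # HTTP status returned by the API


def is_throttled(call: LoggedCall) -> bool:
    # Assumption: the API signals throttling with HTTP 429 (Too Many Requests).
    return call.status == 429


def throttles_by_model(calls: List[LoggedCall]) -> Dict[str, int]:
    """Steps 1 and 2: count throttled responses per model."""
    return dict(Counter(c.model for c in calls if is_throttled(c)))


def duplicate_submissions(calls: List[LoggedCall], window: float = 5.0) -> int:
    """Step 3: count requests whose identical payload was already sent to the
    same model within `window` seconds (likely retries or resubmissions)."""
    last_seen: Dict[str, float] = {}
    duplicates = 0
    for call in sorted(calls, key=lambda c: c.timestamp):
        digest = hashlib.sha256(json.dumps(call.payload, sort_keys=True).encode()).hexdigest()
        key = call.model + ":" + digest
        if key in last_seen and call.timestamp - last_seen[key] <= window:
            duplicates += 1
        last_seen[key] = call.timestamp
    return duplicates
```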