Deepseek Rate Limits
DeepSeek does not publish fixed numerical rate limits. Instead, the API dynamically caps each user's concurrency based on current server load and returns HTTP 429 once a caller reaches that concurrency ceiling. Because the public documentation lists no hard requests-per-minute or tokens-per-minute cap, sustained throughput is best-effort and varies in real time. The server also closes any inference connection that has not begun streaming within ten minutes of being accepted.
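A minimal client-side sketch of coping with this behavior, assuming Python's asyncio: cap in-flight requests with a shared semaphore and retry on HTTP 429 with exponential backoff and full jitter. The names call_with_limits and send, and the values MAX_CONCURRENCY, MAX_ATTEMPTS, and BASE_DELAY, are hypothetical. Because DeepSeek publishes no concurrency number, the cap is a starting guess to tune, not a documented limit. Here send stands in for a real API call and only needs to return the HTTP status code.

```python
import asyncio
import random
from typing import Awaitable, Callable

# Hypothetical tuning values: DeepSeek publishes no fixed concurrency limit,
# so these are starting guesses, not documented numbers.
MAX_CONCURRENCY = 4
MAX_ATTEMPTS = 5
BASE_DELAY = 1.0  # seconds


async def call_with_limits(
    send: Callable[[], Awaitable[int]],
    sem: asyncio.Semaphore,
    max_attempts: int = MAX_ATTEMPTS,
    base_delay: float = BASE_DELAY,
) -> int:
    """Run `send` under a shared concurrency cap, retrying while it returns 429.

    `send` is a hypothetical coroutine that makes one API call and returns
    its HTTP status code. Returns the last status seen.
    """
    status = 429
    for attempt in range(max_attempts):
        if attempt > 0:
            # Exponential backoff with full jitter; sleep outside the semaphore
            # so a waiting retry does not occupy a concurrency slot.
            await asyncio.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
        async with sem:  # hold a slot only while the request is in flight
            status = await send()
        if status != 429:
            break
    return status


async def main() -> int:
    # Fake transport: throttled twice, then succeeds.
    responses = iter([429, 429, 200])

    async def fake_send() -> int:
        return next(responses)

    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    return await call_with_limits(fake_send, sem, base_delay=0.0)


print(asyncio.run(main()))  # 200
```

The semaphore only enforces the cap if every task shares the same instance; creating one per request would allow unlimited concurrency again.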
Deepseek Rate Limits is the machine-readable rate-limit profile for DeepSeek on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 2 rate-limit definitions, measuring concurrent_requests and seconds_to_first_token.
The profile also defines 4 backoff/retry policies and documents response codes for the throttled and serviceUnavailable conditions.
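The profile's four policies are not reproduced here, so the following is only a generic sketch of a retry policy keyed on the two documented conditions. It assumes throttled corresponds to HTTP 429 and serviceUnavailable to HTTP 503, and that a Retry-After value should be honored if one is ever supplied (the source does not say DeepSeek sends that header). The function retry_delay, the RETRYABLE table, and all default parameters are hypothetical.

```python
from typing import Optional

# Assumed mapping from HTTP status codes to the profile's documented conditions.
RETRYABLE = {429: "throttled", 503: "serviceUnavailable"}


def retry_delay(
    status: int,
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    max_retries: int = 8,
    retry_after: Optional[float] = None,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up.

    Non-retryable statuses and exhausted retry budgets return None.
    """
    if status not in RETRYABLE or attempt >= max_retries:
        return None
    if retry_after is not None:
        # Prefer an explicit server hint when one is available.
        return retry_after
    # Capped exponential backoff: base, 2*base, 4*base, ... up to `cap`.
    return min(cap, base * 2 ** attempt)


print([retry_delay(429, a) for a in range(8)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Returning None for non-retryable codes keeps the give-up decision in one place. In practice you would usually add random jitter to the computed delay so that many clients throttled at the same moment do not retry in lockstep.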
Tagged areas include AI, Artificial Intelligence, Chat, LLM, and Large Language Models.