Qwen · Rate Limits

Name: Qwen Rate Limits
Creator: Qwen
Keywords: AI, LLM, Alibaba, Rate Limiting

Alibaba Cloud Model Studio enforces per-model requests-per-minute (RPM), tokens-per-minute (TPM), and concurrent-task quotas. Limits are configurable per workspace and visible in the console; defaults vary by model and tier.

Qwen Rate Limits is the machine-readable rate-limit profile for Qwen on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 3 rate-limit definitions, measuring requests-per-minute, tokens-per-minute, and concurrent-tasks.

The profile also defines 2 policies, covering backoff/retry and limit increases, and documents the response code returned for throttled requests (HTTP 429).

Tagged areas include AI, LLM, Alibaba, and Rate Limiting.

Limits: 3 · Throttle response code: 429

Limits

Default RPM
Scope: account
Metric: requests-per-minute
Limit: see workspace console
Notes: Per-model defaults; varies by Qwen variant.

Default TPM
Scope: account
Metric: tokens-per-minute
Limit: see workspace console

Concurrent Tasks
Scope: account
Metric: concurrent-tasks
Limit: see workspace console
Notes: Image/video generation has separate concurrency caps.
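
Because all three limits apply at once, a client may want to check them together before sending a request. The following is a minimal sketch of one way to do that on the client side, using a 60-second sliding window for requests and tokens plus a semaphore for concurrency. The class name `ClientSideLimiter` and the numeric limits in the usage note are hypothetical; the real values come from the workspace console, and the server's own accounting may differ from this approximation.

```python
import threading
import time
from collections import deque


class ClientSideLimiter:
    """Hypothetical client-side guard for RPM, TPM, and concurrent-task quotas."""

    def __init__(self, rpm, tpm, max_concurrent, clock=time.monotonic):
        self.rpm = rpm                      # assumed value, read from the console
        self.tpm = tpm                      # assumed value, read from the console
        self._clock = clock
        self._events = deque()              # (timestamp, tokens) within the last 60 s
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def _prune(self, now):
        # Drop entries that have left the 60-second window.
        while self._events and now - self._events[0][0] >= 60.0:
            self._events.popleft()

    def try_acquire(self, tokens):
        """Non-blocking: reserve a concurrency slot, one request, and `tokens` tokens."""
        if not self._slots.acquire(blocking=False):
            return False                    # too many tasks in flight
        with self._lock:
            now = self._clock()
            self._prune(now)
            used_tokens = sum(t for _, t in self._events)
            if len(self._events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
                self._slots.release()       # give the slot back; nothing was sent
                return False
            self._events.append((now, tokens))
            return True

    def release(self):
        """Call once the request has finished to free its concurrency slot."""
        self._slots.release()
```

A caller would construct it with placeholder numbers, for example `ClientSideLimiter(rpm=60, tpm=100_000, max_concurrent=4)`, call `try_acquire(estimated_tokens)` before each request, and call `release()` when the request completes. Since image and video generation have separate concurrency caps, a separate limiter instance per model family is one reasonable arrangement.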

Policies

Backoff Strategy

Exponential backoff with jitter; honor Retry-After.
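
The backoff policy above can be sketched as follows. This is an illustrative example, not code from Alibaba Cloud: `retry_delay` and `call_with_backoff` are hypothetical helper names, the `send` callable stands in for whatever HTTP client is used, and the base delay, cap, and retry count are assumed defaults. It uses the "full jitter" variant (a random delay between zero and the exponential ceiling) and prefers a numeric `Retry-After` value when the server supplies one.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status_code, headers)


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retrying; `attempt` is 0 for the first retry."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # honor the server's hint
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back
    ceiling = min(cap, base * (2 ** attempt))    # exponential growth, capped
    return rng() * ceiling                       # full jitter


def call_with_backoff(send: Callable[[], Response], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` and retry on HTTP 429 up to `max_retries` times."""
    attempt = 0
    while True:
        status, headers = send()
        if status != 429 or attempt >= max_retries:
            return status, headers
        # Header lookup is case-sensitive here; real clients usually normalize.
        sleep(retry_delay(attempt, headers.get("Retry-After")))
        attempt += 1
```

Jitter spreads retries from many clients over time so they do not all hit the API again at the same instant after a shared throttle.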

Limit Increase

Submit a quota request via Alibaba Cloud console for higher RPM/TPM.

Sources

https://www.alibabacloud.com/help/en/model-studio