Qwen Rate Limits
Alibaba Cloud Model Studio enforces per-model RPM/TPM and concurrent-task quotas. Limits are configurable per workspace and visible in the console; defaults vary by model and tier.
Qwen Rate Limits is the machine-readable rate-limit profile for Qwen on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 3 rate-limit definitions, measuring requests-per-minute, tokens-per-minute, and concurrent-tasks.
The profile also defines 2 backoff/retry policies and documents the response code returned for throttled requests (429).
Tagged areas include AI, LLM, Alibaba, and Rate Limiting.
Throttle: 429
Limits
Default RPM (account): see workspace console. Per-model defaults; vary by Qwen variant.
Default TPM (account): see workspace console.
Concurrent Tasks (account): see workspace console. Image/video generation has separate concurrency caps.
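Because the actual quotas are only visible in the workspace console, a client may want to enforce its own budgets before sending requests. The following is a minimal sketch of one way to do that, not part of the profile itself: `MinuteWindowLimiter` is a hypothetical helper, and the numbers 60, 100_000, and 4 are placeholders rather than real Qwen limits.

```python
import threading
import time
from collections import deque


class MinuteWindowLimiter:
    """Sliding 60-second window over (timestamp, cost) events.

    Use cost=1 per call for an RPM budget, or cost=token_count for a TPM budget.
    """

    def __init__(self, limit_per_minute, clock=time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock
        self.events = deque()  # (timestamp, cost), oldest first
        self.used = 0
        self.lock = threading.Lock()

    def _expire(self, now):
        # Drop events that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            _, cost = self.events.popleft()
            self.used -= cost

    def try_acquire(self, cost=1):
        """Record the cost and return True if it fits the window, else False."""
        with self.lock:
            now = self.clock()
            self._expire(now)
            if self.used + cost > self.limit:
                return False
            self.events.append((now, cost))
            self.used += cost
            return True


# Placeholder budgets: replace with the values shown in your workspace console.
rpm = MinuteWindowLimiter(limit_per_minute=60)
tpm = MinuteWindowLimiter(limit_per_minute=100_000)
concurrency = threading.BoundedSemaphore(4)  # cap on in-flight tasks
```

A caller would check `rpm.try_acquire()` and `tpm.try_acquire(estimated_tokens)` before each request and hold `concurrency` for the duration of the call. Since image/video generation has separate concurrency caps, a separate semaphore per task type is probably the safer design.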
Policies
Backoff Strategy
Exponential backoff with jitter; honor Retry-After.
Limit Increase
Submit a quota request via Alibaba Cloud console for higher RPM/TPM.
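The Backoff Strategy policy above can be sketched in a few lines. This is an illustration under stated assumptions, not an official client: `Throttled` is a hypothetical exception standing in for whatever your HTTP client raises on a 429, the `base` and `cap` values are arbitrary, and only the seconds form of Retry-After is handled (the header may also carry an HTTP date).

```python
import random
import time


class Throttled(Exception):
    """Hypothetical exception a client raises on an HTTP 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("throttled (429)")
        self.retry_after = retry_after  # seconds from Retry-After, if present


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint wins
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling  # "full jitter": uniform in [0, ceiling)


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it succeeds or attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return send()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            sleep(backoff_delay(attempt, exc.retry_after))
```

Jitter spreads retries from many clients over time so they do not all hit the endpoint at the same instant after a throttling event. If 429s persist even with backoff, the Limit Increase policy is the likely next step.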