Zhipu AI · Rate Limits

Name: Zhipu AI Rate Limits
Creator: Zhipu AI
Keywords: AI, LLM, GLM, Rate Limiting, Quotas

Z.ai applies model-specific concurrency limits to API users with a prepaid balance. GLM Coding Plan subscribers are metered on 5-hour and weekly usage windows, with peak/off-peak quota multipliers.

Zhipu AI Rate Limits is the machine-readable rate-limit profile for Zhipu AI on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 3 rate-limit definitions, one for each of these metrics: concurrent-requests, window, and quota-multiplier.

The profile also defines 2 backoff/retry policies and documents the response code returned for throttled requests (429).

Tagged areas include AI, LLM, GLM, Rate Limiting, and Quotas.

3 Limits · Throttle: 429

Limits

API Default Concurrency
Scope: account
Metric: concurrent-requests

Default per-model concurrency for prepaid API users; varies by package.
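A client can stay under a concurrency cap by gating calls behind a semaphore. The following is a minimal sketch, not Z.ai's own client: run_bounded is a hypothetical helper, and the limit value must be supplied by the caller because this profile does not state the actual per-model number. Since the limit is per model, one semaphore per model would be needed in practice.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    tasks: Sequence[Callable[[], Awaitable[object]]],
    limit: int,
) -> List[object]:
    """Run zero-argument coroutine functions with at most `limit` in flight.

    `limit` is an assumption supplied by the caller; the real value depends
    on the account's package and is not given in this profile.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task: Callable[[], Awaitable[object]]) -> object:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await task()

    # gather preserves the order of the input tasks in its result list.
    return await asyncio.gather(*(guarded(task) for task in tasks))
```

Each element of tasks would wrap one API request. The semaphore only bounds requests issued by this process, so several workers sharing one account would need to split the limit between them.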

GLM Coding Plan Window
Scope: account
Metric: window
Limit: 5-hour and weekly

Usage allowances reset on rolling 5-hour and weekly windows.

Peak Hours Multiplier
Scope: account
Metric: quota-multiplier
Limit: 2-3x

During peak hours (14:00-18:00 UTC+8), requests to GLM-5.1 and GLM-5-Turbo consume quota at 2-3x the off-peak rate.
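To plan usage around the peak window, a client can convert the current time to UTC+8 and check whether it falls between 14:00 and 18:00. The sketch below rests on several assumptions: is_peak and quota_cost are hypothetical helpers, the end of the window is treated as exclusive, and the default multiplier of 3.0 is simply the upper end of the 2-3x range. It does not check which model is being called.

```python
from datetime import datetime, time, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))
PEAK_START = time(14, 0)
PEAK_END = time(18, 0)  # assumed exclusive; the profile does not say


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware moment falls in 14:00-18:00 UTC+8."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(UTC8)
    return PEAK_START <= local.time() < PEAK_END


def quota_cost(units: float, moment: datetime,
               peak_multiplier: float = 3.0) -> float:
    """Estimate quota consumed by `units` of off-peak-equivalent usage.

    peak_multiplier defaults to the conservative end of the 2-3x range.
    """
    return units * (peak_multiplier if is_peak(moment) else 1.0)
```

Calling quota_cost with datetime.now(timezone.utc) gives an upper-bound estimate for a request sent right now. Deferring batch work until is_peak returns False avoids the multiplier altogether.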

Policies

Backoff Strategy

Exponential backoff with jitter; honor Retry-After headers.
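One way to apply this policy is sketched below. It is not an official client: call_with_backoff and its helpers are hypothetical, the send callable is assumed to return a (status, headers, body) tuple, and the base delay, cap, and attempt count are illustrative defaults rather than values from this profile. The jitter variant used is "full jitter", where the delay is drawn uniformly between zero and the exponential ceiling. Only the numeric form of Retry-After is parsed, and the header lookup assumes the exact key "Retry-After".

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

THROTTLED = 429  # throttle status code listed in this profile

Response = Tuple[int, Mapping[str, str], str]


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds, or None if absent or non-numeric."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None  # HTTP-date form is not handled in this sketch


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> Tuple[int, str]:
    """Call send() until it is not throttled or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != THROTTLED or attempt == max_attempts - 1:
            break
        # Prefer the server's hint; fall back to jittered exponential delay.
        hinted = parse_retry_after(headers)
        sleep(hinted if hinted is not None else backoff_delay(attempt, rng=rng))
    return status, body
```

Passing sleep and rng as parameters keeps the retry loop testable without real waiting. If every attempt is throttled, the function returns the final 429 so the caller can decide whether to give up or queue the work.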

Concurrency Upgrade

Higher concurrency requires contacting Z.ai sales or upgrading to a higher plan tier.
