Cerebras Rate Limits

Scaffolded rate limit definitions for the Cerebras API surface. Captures per-tier quotas, burst behavior, response signaling, and recovery semantics. Defaults are scaffold values to be replaced with published provider limits.

Cerebras Rate Limits is the machine-readable rate-limit profile for Cerebras on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures two rate-limit definitions, one for the free tier (10) and one for the pro tier (120), both measured in requests_per_minute.

The profile also documents response codes for the throttled, quotaExceeded, and serviceUnavailable conditions; throttled and quotaExceeded are both signaled with 429.
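As a rough illustration of how a client might act on these response codes, the sketch below computes a wait time before retrying. It is not part of the profile: the function name `retry_delay`, the use of 503 for serviceUnavailable, the presence of a `Retry-After` header, and the backoff parameters are all assumptions chosen for the example.

```python
import random

# 429 is taken from the profile summary (Throttle: 429, Quota: 429).
# 503 for serviceUnavailable is an ASSUMPTION based on common HTTP usage;
# the profile text does not state the code.
RETRYABLE_STATUS = {429, 503}


def retry_delay(status, attempt, retry_after=None, base=1.0, cap=60.0,
                rng=random.random):
    """Return seconds to wait before retrying, or None if not retryable.

    `retry_after` is the raw Retry-After header value, if the server sent
    one (whether Cerebras sends it is an assumption here).
    """
    if status not in RETRYABLE_STATUS:
        return None
    if retry_after is not None:
        try:
            # Honor a delay given in seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date or unparseable value: fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

A caller would sleep for the returned number of seconds and then resend the request, giving up after a fixed number of attempts. Jitter is used so that many clients throttled at the same moment do not all retry in lockstep.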

Tagged areas include AI Inference, Large Language Models, Wafer Scale, Hardware, and Cloud.

2 Limits · Throttle: 429 · Quota: 429
Tags: AI Inference, Large Language Models, Wafer Scale, Hardware, Cloud, OpenAI Compatible, LLM, SDK, Accelerator, High Performance Computing, Rate Limiting, Quotas, Throttling

Limits

Free Tier Default api-key
requests_per_minute · minute · 10

Pro Tier Default api-key
requests_per_minute · minute · 120
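To stay under these per-minute limits on the client side, one option is a sliding-window limiter keyed to the tier. The sketch below is a minimal illustration, assuming the scaffold values listed above and assuming the quota behaves like a rolling 60-second window (the profile does not say whether the provider uses a fixed or sliding window). The names `LIMITS_PER_MINUTE` and `SlidingWindowLimiter` are hypothetical.

```python
import time
from collections import deque

# Scaffold values from the limits listed above; replace with published limits.
LIMITS_PER_MINUTE = {"free": 10, "pro": 120}


class SlidingWindowLimiter:
    """Tracks request timestamps and reports how long to wait before the next."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit      # max requests allowed per window
        self.window = window    # window length in seconds
        self.clock = clock      # injectable clock, useful for testing
        self.sent = deque()     # timestamps of requests inside the window

    def wait_time(self):
        """Seconds until another request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Window is full: wait until the oldest request expires.
        return self.window - (now - self.sent[0])

    def record(self):
        """Record that a request was sent now."""
        self.sent.append(self.clock())
```

In use, a client would call `wait_time()`, sleep for that long if it is non-zero, send the request, and then call `record()`. A limiter like this reduces 429 responses but does not replace handling them, since the server's own accounting remains authoritative.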