
Cerebras Rate Limits

Name: Cerebras Rate Limits
Creator: Cerebras
Keywords: AI Inference, Large Language Models, Wafer Scale, Hardware, Cloud, OpenAI Compatible, LLM, SDK, Accelerator, High Performance Computing, Rate Limiting, Quotas, Throttling

Scaffolded rate limit definitions for the Cerebras API surface. Captures per-tier quotas, burst behavior, response signaling, and recovery semantics. Defaults are scaffold values to be replaced with published provider limits.

Cerebras Rate Limits is the machine-readable rate-limit profile for Cerebras on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures two rate-limit definitions, one for the free tier and one for the pro tier, both measured in requests_per_minute.

The profile also documents response codes for the throttled, quotaExceeded, and serviceUnavailable conditions.

Tagged areas include AI Inference, Large Language Models, Wafer Scale, Hardware, and Cloud.

2 Limits · Throttle: 429 · Quota: 429
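Because the summary lists 429 for both the throttle and quota conditions, a client will usually want one retry path that covers both. The following is a minimal sketch, not Cerebras's documented recovery behavior. It assumes that 503 is the code used for serviceUnavailable (the standard HTTP meaning, not confirmed by this profile), that the server may send a `Retry-After` header in seconds, and that `send` is a hypothetical callable standing in for whatever HTTP client is actually used.

```python
import random
import time

# Assumption: 429 covers throttling and quota exhaustion (per the summary);
# 503 is the standard HTTP "Service Unavailable" code, not confirmed here.
RETRYABLE_STATUS = {429, 503}


def parse_retry_after(value):
    """Return the hint as seconds, or None if absent or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the retry that follows 0-based `attempt`."""
    hinted = parse_retry_after(retry_after)
    if hinted is not None:
        # Prefer the server's hint when one is present, bounded by `cap`.
        return min(hinted, cap)
    # Otherwise use exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays. A quota that has genuinely run out will not recover within a few retries, so callers may want to surface a final 429 to the user rather than keep waiting.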


Limits

Free Tier Default · api-key

requests_per_minute · minute

Pro Tier Default · api-key

requests_per_minute · minute · 120
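A client can also stay under a requests_per_minute ceiling proactively instead of waiting for a 429. The sketch below assumes the listed value of 120 is such a ceiling, counted per API key over a sliding 60-second window. The profile describes its defaults as scaffold values and does not say how the window or bursts are actually enforced, so the class name, the parameters, and the windowing strategy are all illustrative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` calls per `window` seconds.

    Hypothetical helper: the defaults mirror the scaffold value of 120
    requests per minute and should be replaced with published limits.
    """

    def __init__(self, limit=120, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def try_acquire(self):
        """Record a call and return True if under the limit, else False."""
        if self.wait_time() > 0:
            return False
        self._sent.append(self.clock())
        return True
```

A caller would check `try_acquire()` before each request and sleep for `wait_time()` when it returns False. A sliding window is the conservative choice here because it never allows more than `limit` calls in any 60-second span, whereas a fixed window or token bucket could permit short bursts that the server may or may not tolerate.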