Cerebrium Rate Limits

Cerebrium does not throttle deployed function endpoints with classic per-minute request quotas; instead, throughput is governed by autoscaling GPU/CPU concurrency limits per app and per account plan tier. The number of concurrent replicas an app can scale to is configured per deployment (min/max instances) and capped by the plan tier, with Enterprise offering unlimited GPU concurrency. Async runs are bounded by a maximum execution window (up to 12 hours) and a configurable response grace period. This profile does not record specific per-account concurrency caps; see the provider documentation for those values.

Cerebrium Rate Limits is the machine-readable rate-limit profile for Cerebrium on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 4 rate-limit definitions, measuring instances, gpu_instances, and seconds.

The profile also defines 3 backoff/retry policies and documents the response code returned for throttled requests.

Tagged areas include AI, GPU, Serverless, Inference, and ML Infrastructure.

4 Limits · Throttle: 429
Tags: AI, GPU, Serverless, Inference, ML Infrastructure, Rate Limiting, Quotas, Throttling

Limits

Concurrent Replicas (per app)
Scope: app
Unit: instances
Limit: see provider documentation
Configured via min/max instances in cerebrium.toml; capped by plan tier.

GPU Concurrency (per account)
Scope: account
Unit: gpu_instances
Limit: see provider documentation
Bounded by plan tier; Enterprise offers unlimited GPU concurrency.

Async Execution Window
Scope: request
Unit: seconds
Limit: up to 12 hours
Async runs are bounded by response_grace_period (default 15 minutes, up to 12 hours).

Cold Start / Scale-to-Zero
Scope: app
Unit: instances
Limit: scales to zero when idle
Apps scale down to zero idle instances; cold-start latency applies on first request after idle.
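
The way these limits interact can be illustrated with a small model. The following is only a sketch, not Cerebrium's actual autoscaler: the function name `target_replicas`, the `per_replica_concurrency` parameter, and the idea of sizing replicas from in-flight requests are all assumptions made for illustration. It shows a replica count that follows demand, stays within the app's configured min/max, is further capped by a plan-tier limit, and drops to zero when the app is idle and the minimum is zero.

```python
import math


def target_replicas(in_flight, per_replica_concurrency,
                    min_replicas, max_replicas, plan_cap=None):
    """Illustrative replica count for a given load (hypothetical model).

    in_flight: number of requests currently being served or queued.
    per_replica_concurrency: assumed requests one replica handles at once.
    plan_cap: assumed plan-tier ceiling; None models an unlimited tier.
    """
    if per_replica_concurrency < 1:
        raise ValueError("per_replica_concurrency must be >= 1")
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")

    # Replicas needed to serve the current load.
    demand = math.ceil(in_flight / per_replica_concurrency)

    # The app-level maximum is itself capped by the plan tier, if any.
    ceiling = max_replicas if plan_cap is None else min(max_replicas, plan_cap)
    floor = min(min_replicas, ceiling)

    # Clamp demand into [floor, ceiling]; with floor == 0 an idle app
    # scales to zero and the next request pays a cold start.
    return max(floor, min(demand, ceiling))
```

For example, under these assumptions `target_replicas(12, 4, 0, 5)` gives 3 replicas, while the same app with 100 in-flight requests is held at its maximum of 5, or at 2 if the plan cap is 2.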

Policies

Tiered Concurrency
Concurrency caps rise as accounts move from Hobby to Standard to Enterprise (unlimited GPU concurrency).
Autoscaling
Apps autoscale replicas between configured min and max based on incoming traffic.
Backoff Strategy
Clients should implement exponential backoff with jitter and honor Retry-After on any throttling responses.
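
The backoff recommendation above can be sketched as follows. This is a minimal, transport-agnostic example rather than a Cerebrium client: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, and the base delay, cap, and attempt count are assumed values. It uses full-jitter exponential backoff and treats a numeric `Retry-After` header as a minimum wait; the HTTP-date form of `Retry-After` and case-insensitive header lookup are deliberately left out.

```python
import random
import time


def parse_retry_after(headers):
    """Return Retry-After in seconds, or None if absent or not numeric."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return None  # header missing, or given as an HTTP-date


def backoff_delay(attempt, base=0.5, cap=30.0, retry_after=None,
                  rng=random.random):
    """Full-jitter delay for a zero-based attempt number."""
    ceiling = min(cap, base * (2 ** attempt))
    delay = rng() * ceiling  # uniform in [0, ceiling)
    if retry_after is not None:
        delay = max(delay, retry_after)  # never retry sooner than asked
    return delay


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt,
                                retry_after=parse_retry_after(headers)))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Passing `sleep` and `rng` as parameters keeps the retry loop easy to test without real waiting; in production code the defaults are used as-is.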

Sources