Beam Cloud Rate Limits

Beam governs workloads primarily through container concurrency limits rather than classic per-minute request quotas. Each usage tier caps the number of GPU and CPU containers that may run simultaneously (Developer 5 GPU / 30 CPU, Team 50 GPU / 1,000 CPU, Growth custom / unlimited), and the platform autoscales deployments up to those ceilings. Synchronous web endpoints are additionally bound by an invocation time limit of roughly 180 seconds, beyond which work should move to asynchronous task queues.
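
The tier ceilings and the synchronous-timeout guidance can be captured in a small lookup, for example to sanity-check a planned deployment before launching it. This is a minimal sketch: `TIER_LIMITS`, `fits_tier`, and `choose_invocation_mode` are hypothetical helpers, not part of Beam's SDK. `None` stands in for Growth's custom/unlimited ceilings, and the safety `margin` below the ~180 second limit is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate synchronous endpoint limit from this profile (~180 s).
SYNC_ENDPOINT_LIMIT_S = 180


@dataclass(frozen=True)
class TierLimits:
    gpu_containers: Optional[int]  # None = custom/unlimited (not enforced here)
    cpu_containers: Optional[int]


# Hypothetical lookup table built from the tier values listed in this profile.
TIER_LIMITS = {
    "developer": TierLimits(gpu_containers=5, cpu_containers=30),
    "team": TierLimits(gpu_containers=50, cpu_containers=1000),
    "growth": TierLimits(gpu_containers=None, cpu_containers=None),
}


def fits_tier(tier: str, gpu: int, cpu: int) -> bool:
    """Return True if the requested concurrent containers fit the tier's ceilings."""
    limits = TIER_LIMITS[tier.lower()]
    gpu_ok = limits.gpu_containers is None or gpu <= limits.gpu_containers
    cpu_ok = limits.cpu_containers is None or cpu <= limits.cpu_containers
    return gpu_ok and cpu_ok


def choose_invocation_mode(expected_seconds: float, margin: float = 0.8) -> str:
    """Send work that approaches the ~180 s limit to a task queue.

    `margin` keeps a buffer below the approximate limit (an assumption).
    """
    if expected_seconds < SYNC_ENDPOINT_LIMIT_S * margin:
        return "endpoint"
    return "task_queue"


if __name__ == "__main__":
    print(fits_tier("Developer", gpu=4, cpu=20))  # True
    print(fits_tier("Developer", gpu=6, cpu=20))  # False: 6 GPUs exceeds 5
    print(choose_invocation_mode(30))             # endpoint
    print(choose_invocation_mode(600))            # task_queue
```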

Beam Cloud Rate Limits is the machine-readable rate-limit profile for Beam on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures four rate-limit definitions, measured in containers and seconds.

The profile also defines two backoff/retry policies and documents the response code returned for throttled requests (429).

Tagged areas include Serverless, GPU, Python, Inference, and Containers.

4 limits · Throttle response: 429
Tags: Serverless, GPU, Python, Inference, Containers, Rate Limiting, Quotas, Throttling

Limits

GPU Container Concurrency
Scope: account · Unit: containers · Limit: 5 (Developer) / 50 (Team) / custom (Growth)
Maximum number of GPU containers that may run concurrently, per tier.

CPU Container Concurrency
Scope: account · Unit: containers · Limit: 30 (Developer) / 1,000 (Team) / unlimited (Growth)
Maximum number of CPU containers that may run concurrently, per tier.

Synchronous Endpoint Timeout
Scope: endpoint · Unit: seconds · Limit: ~180
Web endpoints are intended for synchronous work that completes in under ~180 seconds; longer-running work belongs in task queues.

Autoscaling
Scope: deployment · Unit: containers · Limit: up to the tier's concurrency ceiling
Deployments scale out with the number of concurrent inputs, up to the account's GPU or CPU concurrency limit.
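
To reason about how far a deployment will scale, one can model the target container count as the number of concurrent inputs divided by how many inputs each container handles, clamped to the tier ceiling. This sketch assumes a hypothetical `inputs_per_container` setting, a hypothetical `min_containers` floor, and a plain ceiling-division rule; Beam's actual autoscaler may use different signals or smoothing.

```python
import math
from typing import Optional


def target_containers(
    concurrent_inputs: int,
    inputs_per_container: int,
    tier_ceiling: Optional[int],
    min_containers: int = 0,
) -> int:
    """Estimate the container count a deployment scales to (illustrative model).

    concurrent_inputs: inputs (requests or tasks) in flight right now.
    inputs_per_container: hypothetical per-container concurrency setting.
    tier_ceiling: the account's GPU or CPU concurrency limit; None means no
        fixed ceiling in this model (e.g. Growth's custom/unlimited tiers).
    min_containers: hypothetical warm floor; the tier ceiling still wins.
    """
    if inputs_per_container <= 0:
        raise ValueError("inputs_per_container must be positive")
    # Enough containers to give every in-flight input a slot...
    needed = max(math.ceil(concurrent_inputs / inputs_per_container), min_containers)
    # ...but never more than the tier allows; excess inputs wait.
    if tier_ceiling is not None:
        needed = min(needed, tier_ceiling)
    return needed


if __name__ == "__main__":
    print(target_containers(12, 1, tier_ceiling=5))     # 5: capped by the Developer GPU ceiling
    print(target_containers(12, 4, tier_ceiling=50))    # 3
    print(target_containers(10, 3, tier_ceiling=None))  # 4
```

With the Developer GPU ceiling of 5, twelve concurrent single-input requests cap at five containers; under the Concurrency-Based Throttling policy, the remaining inputs queue rather than fail.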

Policies

Concurrency-Based Throttling
New invocations are queued, or trigger additional containers up to the tier ceiling, rather than being rejected outright.
Backoff Strategy
Clients should implement exponential backoff with jitter and honor Retry-After on 429 responses.
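
A minimal client-side sketch of that policy: exponential backoff with full jitter, where a `Retry-After` header on a 429 overrides the computed delay. `call_with_backoff`, `backoff_delay`, their parameters (`base`, `cap`, `max_attempts`), and the `(status, headers, body)` callable shape are hypothetical, not a Beam client API. `Retry-After` is parsed only in its delay-seconds form; HTTP also allows an HTTP-date value, which this sketch does not handle.

```python
import math
import random
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    r = rng or random
    return r.uniform(0, min(cap, base * (2 ** attempt)))


def parse_retry_after(headers: Dict[str, str]) -> Optional[float]:
    """Return Retry-After in seconds if it is a finite, non-negative number."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                seconds = float(value)
            except ValueError:
                return None  # HTTP-date form is not handled in this sketch
            return seconds if math.isfinite(seconds) and seconds >= 0 else None
    return None


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Response:
    """Retry `send` on 429, preferring the server's Retry-After when present."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        # Return on success/non-throttle, or give up after the last attempt.
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        delay = parse_retry_after(headers)
        if delay is None:
            delay = backoff_delay(attempt - 1, rng=rng)
        sleep(delay)
```

Injecting `sleep` and `rng` keeps the retry loop testable without real waits; full jitter spreads retries from many clients so they do not hit the service in lockstep.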
