Beam Cloud Rate Limits
Beam governs workloads primarily through container concurrency limits rather than classic per-minute request quotas. Each usage tier caps the number of GPU and CPU containers that may run simultaneously (Developer 5 GPU / 30 CPU, Team 50 GPU / 1,000 CPU, Growth custom / unlimited), and the platform autoscales deployments up to those ceilings. Synchronous web endpoints are additionally bound by an invocation time limit of roughly 180 seconds, beyond which work should move to asynchronous task queues.
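To make the tier ceilings and the synchronous time limit concrete, the following is a minimal sketch in plain Python. It does not use the Beam SDK; the names `TIER_LIMITS`, `remaining_capacity`, and `choose_invocation_mode` are hypothetical helpers invented for illustration, and the 180-second figure is the approximate value quoted above.

```python
from typing import Optional

# Concurrency ceilings copied from the tier description above.
# None means no fixed number: Growth is "custom" for GPU and "unlimited" for CPU.
TIER_LIMITS = {
    "developer": {"gpu": 5, "cpu": 30},
    "team": {"gpu": 50, "cpu": 1000},
    "growth": {"gpu": None, "cpu": None},
}

# Approximate synchronous invocation limit quoted above (assumed exact here).
SYNC_TIME_LIMIT_SECONDS = 180


def remaining_capacity(tier: str, kind: str, running: int) -> Optional[int]:
    """Return how many more containers of `kind` may start, or None if uncapped."""
    limit = TIER_LIMITS[tier][kind]
    if limit is None:
        return None
    return max(limit - running, 0)


def choose_invocation_mode(expected_seconds: float) -> str:
    """Pick a synchronous endpoint for short work, a task queue otherwise."""
    if expected_seconds < SYNC_TIME_LIMIT_SECONDS:
        return "sync_endpoint"
    return "task_queue"
```

For example, `remaining_capacity("developer", "gpu", 3)` leaves room for two more GPU containers, and a job expected to run for ten minutes would be routed to a task queue rather than a synchronous endpoint.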
Beam Cloud Rate Limits is the machine-readable rate-limit profile for Beam on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 4 rate-limit definitions, measured in containers and seconds.
The profile also defines 2 backoff/retry policies and documents the response codes returned for throttled requests.
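The profile's backoff/retry policies are not spelled out here, so the following is only a generic sketch of one common approach: capped exponential backoff with optional jitter. The `Throttled` exception, the `backoff_delays` and `call_with_retry` helpers, and every numeric parameter are assumptions made for illustration, not values taken from the profile or from Beam.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class Throttled(Exception):
    """Hypothetical signal that the service rejected a call as throttled."""


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> List[float]:
    """Return capped exponential delays: base, base*factor, ... up to cap."""
    return [min(base * factor ** i, cap) for i in range(attempts)]


def call_with_retry(
    fn: Callable[[], T],
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> T:
    """Call fn, sleeping between attempts whenever it raises Throttled."""
    for delay in delays:
        try:
            return fn()
        except Throttled:
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, delay) if jitter else delay)
    # Final attempt: let Throttled propagate to the caller if it still fails.
    return fn()
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would typically wrap the request to a deployed endpoint in `fn` and raise `Throttled` when a throttling response code is seen.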
Tagged areas include Serverless, GPU, Python, Inference, and Containers.