Prime Intellect Rate Limits
This is a best-effort summary of Prime Intellect API rate-limiting and quota behavior. The public documentation does not publish a comprehensive table of per-tier RPM/TPM (requests-per-minute and tokens-per-minute) values. Control-plane and inference APIs authenticate with bearer tokens, and per-key quotas are managed in the Prime dashboard. Reserved quotas for GPU pods are governed by the marketplace availability service rather than by HTTP throttling.
Prime Intellect Rate Limits is the machine-readable rate-limit profile for Prime Intellect on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 3 rate-limit definitions.
The profile also documents response codes for three conditions: throttled, quotaExceeded, and unauthorized.
Tagged areas include Rate Limiting, Quotas, GPU Compute, and Inference.
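The sketch below shows one way a client might act on the three conditions named in the profile (throttled, quotaExceeded, unauthorized). The summary above does not state which HTTP status codes correspond to each condition, so the mapping here (429, 403, 401) is an assumption based on common HTTP conventions, not something confirmed for Prime Intellect. The function names, the backoff parameters, and the use of a Retry-After header are likewise hypothetical.

```python
from typing import Mapping, Optional

# ASSUMPTION: the source does not give the actual status codes.
# These follow common HTTP conventions and may differ for Prime Intellect.
ASSUMED_STATUS_TO_CONDITION = {
    429: "throttled",
    403: "quotaExceeded",
    401: "unauthorized",
}


def classify(status: int) -> Optional[str]:
    """Map an HTTP status code to a profile condition name, or None."""
    return ASSUMED_STATUS_TO_CONDITION.get(status)


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry is pointless.

    Only the "throttled" condition is retried. A quota or credential problem
    is not expected to clear by waiting a few seconds, so the caller should
    surface it instead.
    """
    if classify(status) != "throttled":
        return None

    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    retry_after = normalized.get("retry-after")
    if retry_after is not None:
        try:
            # Only the delay-in-seconds form is handled in this sketch.
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # HTTP-date form: fall through to exponential backoff

    # Exponential backoff: base * 2^attempt, bounded by cap.
    return min(base * (2 ** attempt), cap)
```

A caller would typically sleep for the returned delay and retry, and treat a None result for a non-success status as an error to report: a quotaExceeded result points to the per-key quota in the dashboard, and an unauthorized result points to the bearer token.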