Prime Intellect Rate Limits

Name: Prime Intellect Rate Limits
Creator: Prime Intellect
Keywords: Rate Limiting, Quotas, GPU Compute, Inference

Best-effort summary of Prime Intellect API rate-limiting and quota behavior. The public documentation does not publish a comprehensive table of per-tier RPM/TPM values; control-plane and inference APIs use bearer tokens with per-key quotas managed in the Prime dashboard. Reserved quotas for GPU pods are governed by the marketplace availability service rather than HTTP throttling.

Prime Intellect Rate Limits is the machine-readable rate-limit profile for Prime Intellect on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 3 rate-limit definitions.

The profile also documents response codes for the throttled, quotaExceeded, and unauthorized conditions.

Tagged areas include Rate Limiting, Quotas, GPU Compute, and Inference.

Limits: 3. Throttle response: 429. Quota response: 429.
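Because throttling and quota exhaustion both surface as HTTP 429 in this profile, a client usually wants one retry path for 429 and a separate, non-retried path for the unauthorized case. The following is a minimal sketch of that pattern, not Prime Intellect's documented client behavior. Several things in it are assumptions: that unauthorized maps to HTTP 401, that the server may send a `Retry-After` header in seconds, and the names `call_with_backoff`, `backoff_delay`, and `send`, which are all hypothetical. The `send` callable stands in for whatever HTTP call the application makes with its bearer token.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


class Unauthorized(Exception):
    """Raised on HTTP 401 (assumed code for 'unauthorized'); retrying will not help."""


class RateLimited(Exception):
    """Raised when HTTP 429 persists after all retries."""


def backoff_delay(attempt: int, headers: Mapping[str, str],
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt.

    Uses Retry-After when it is present and numeric (an assumption: the
    profile does not say whether the header is sent); otherwise falls back
    to capped exponential backoff: base, 2*base, 4*base, ...
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to exponential backoff
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Response],
                      max_retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() and retry only on HTTP 429, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status == 401:
            raise Unauthorized(body)
        if status != 429:
            return status, headers, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt, headers))
    raise RateLimited(f"still throttled after {max_retries} retries")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A 429 caused by an exhausted quota will not clear through retries alone, so the retry count is kept small.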

Limits

Per-key quotas are managed in the Prime dashboard.

Usage metering is returned in the response body, and cost is computed per request.
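Since usage is reported per response and no public per-tier table exists, a client may want to keep its own running total to compare against the dashboard. The sketch below shows one way to do that. The response shape is an assumption: the profile does not name the fields, so the `usage` object and its `total_tokens` key are hypothetical placeholders to be replaced with whatever the actual response body contains. `UsageTotals` is likewise an illustrative name.

```python
import json
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running client-side tally of metered usage across requests."""
    requests: int = 0
    total_tokens: int = 0

    def add(self, body: str) -> None:
        """Record one response body (a JSON string).

        Assumes a hypothetical {"usage": {"total_tokens": N}} shape; a body
        without that object still counts as one request with zero tokens.
        """
        usage = json.loads(body).get("usage") or {}
        self.requests += 1
        self.total_tokens += int(usage.get("total_tokens", 0))


totals = UsageTotals()
totals.add('{"usage": {"total_tokens": 120}}')
totals.add('{"usage": {"total_tokens": 30}}')
totals.add('{"result": "no usage reported"}')
```

After these three calls, `totals.requests` is 3 and `totals.total_tokens` is 150.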

Concurrent sandbox count is per-team and visible in the dashboard.
