Scalable Inference Serving Rate Limits

Scalable Inference Serving does not publish API rate limits that could be located in this reconciliation pass; limits are governed by the customer's commercial or partner agreement. Consumers should honor HTTP 429 (Too Many Requests) and 503 (Service Unavailable) responses where they appear and follow standard exponential-backoff guidance.

Scalable Inference Serving Rate Limits is the machine-readable rate-limit profile for Scalable Inference Serving on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures one rate-limit definition, whose unit of measure is listed as "varies".

The profile also defines two backoff/retry policies and documents response codes for the throttled (429) and serviceUnavailable (503) conditions.

Tagged areas include AI, Inference, Kubernetes, and Rate Limiting.


Limits

Contracted limits (account)
Limit varies; negotiated under the commercial / partner agreement and not publicly published.

Policies

Backoff on 429
Clients should implement exponential backoff with jitter on HTTP 429 responses, honoring the Retry-After header when present.
Quota raise via account team
Quota and rate-limit changes are coordinated through the Scalable Inference Serving account team or partner manager rather than self-service.
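The "Backoff on 429" policy can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a client published by Scalable Inference Serving: the function names, the `send` callable (a hypothetical stand-in for whatever HTTP call the client makes), the 0.5 s base and 30 s cap, the five-attempt default, and treating 503 the same way as 429 are all illustrative choices. Jitter uses the common "full jitter" variant (a uniform draw up to the exponential ceiling), and a Retry-After header, in either its delta-seconds or HTTP-date form, takes precedence over the computed delay.

```python
import email.utils
import random
import time
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Tuple

# Assumption: retry on the 429 / 503 cases named in this profile.
RETRYABLE_STATUSES = {429, 503}

Response = Tuple[int, Dict[str, str]]  # (status code, response headers)


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value (delta-seconds or HTTP-date) to seconds, or None."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delta-seconds form, e.g. "120"
    try:
        when = email.utils.parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # unparseable: ignore the header
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are expressed in GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a date in the past means "retry now"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt))."""
    # Clamp the exponent so very large attempt numbers cannot overflow a float.
    return rng() * min(cap, base * 2 ** min(attempt, 32))


def next_delay(attempt: int, headers: Dict[str, str],
               rng: Callable[[], float] = random.random) -> float:
    """Prefer the server's Retry-After hint; otherwise fall back to jittered backoff."""
    # HTTP header names are case-insensitive, so match without regard to case.
    retry_after = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    hinted = parse_retry_after(retry_after)
    return hinted if hinted is not None else backoff_delay(attempt, rng=rng)


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, headers  # success, a non-retryable error, or out of attempts
        sleep(next_delay(attempt, headers, rng=rng))
    raise AssertionError("unreachable")
```

Passing `sleep` and `rng` as parameters keeps the retry loop deterministic under test. Because quota increases go through the account team rather than self-service, repeated exhaustion of retries is better treated as a signal to revisit the contracted limits than as a reason to retry more aggressively.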

Sources