Nvidia Rate Limits

NVIDIA's developer API surface spans several products. Hosted NIM endpoints on build.nvidia.com have per-account / per-API-key rate limits and free-credit budgets that change with each promotion; specific RPM/TPM numbers are not consistently published. Self-hosted NIM (via an AI Enterprise license) has no NVIDIA-side rate limits; throughput is bounded by the customer's GPU hardware. NGC downloads are throttled per-IP by the catalog CDN.

Nvidia Rate Limits is the machine-readable rate-limit profile for Nvidia on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 3 rate-limit definitions, whose metrics are recorded as varies or requests_per_second.

The profile also defines 4 backoff/retry policies and documents response codes for unauthorized, forbidden, throttled, and serviceUnavailable.

Tagged areas include GPU, AI, Machine Learning, Computing, and Graphics.

Limits: 3
Throttle status code: 429
Tags: GPU, AI, Machine Learning, Computing, Graphics, Rate Limiting

Limits

build.nvidia.com hosted endpoints
Scope: api-key
Metric: varies
Notes: per-key throttle and free-credit budget; specific RPM/TPM not publicly documented

Self-hosted NIM (AI Enterprise)
Scope: cluster
Metric: requests_per_second
Notes: bounded by customer GPU hardware; no NVIDIA-imposed rate limit

NGC Catalog downloads
Scope: IP
Metric: varies
Notes: CDN-throttled per-IP; not numerically published

Policies

API key required
All hosted NIM endpoints (build.nvidia.com, integrate.api.nvidia.com) require an NVIDIA developer API key, passed as a Bearer token in the Authorization header.
Backoff
Implement exponential backoff with jitter on 429 responses. Honor Retry-After when present.
Free-credit exhaustion
When free-trial credits on build.nvidia.com are exhausted, requests return an authorization error rather than a throttle; obtain an AI Enterprise license or cloud-marketplace credentials for production.
Self-host for high throughput
For sustained high-throughput inference, deploy NIM containers on customer GPU infrastructure under AI Enterprise rather than calling hosted endpoints.
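
The API-key, backoff, and free-credit policies above can be combined in a small client-side retry loop. The following is a minimal sketch in Python, not NVIDIA's own client code. The helper names (auth_headers, backoff_delay, call_with_retry), the base delay of 1 second, the 60-second cap, and the choice to retry 503 in addition to 429 are all assumptions made for illustration. The send callable is a hypothetical stand-in for whatever HTTP call the application makes; it is assumed to return a (status, headers, body) tuple with a "Retry-After" key spelled exactly that way.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Status classes the profile documents: unauthorized, forbidden,
# throttled, serviceUnavailable.
RETRYABLE = {429, 503}   # assumption: 503 is retried the same way as 429
FATAL_AUTH = {401, 403}  # bad key or, per the policy above, exhausted credits

Response = Tuple[int, Mapping[str, str], str]


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header required by the hosted endpoints."""
    return {"Authorization": f"Bearer {api_key}"}


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Handles only the delta-seconds form of Retry-After;
            # an HTTP-date value falls through to computed backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status in FATAL_AUTH:
            # Retrying will not help: the key is invalid or credits are exhausted.
            raise PermissionError(f"HTTP {status}: check API key or credit balance")
        if status not in RETRYABLE:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    # Out of attempts: hand the last throttled response back to the caller.
    return status, headers, body
```

Treating 401 and 403 as fatal rather than retryable follows from the free-credit policy: an exhausted budget surfaces as an authorization error, so backing off would only delay the failure. Injecting sleep and rng as parameters keeps the loop testable without real waiting.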
