Nvidia · Rate Limits

Nvidia Rate Limits

Name: Nvidia Rate Limits
Creator: Nvidia
Keywords: GPU, AI, Machine Learning, Computing, Graphics, Rate Limiting

NVIDIA's developer API surface is multi-product. build.nvidia.com hosted NIM endpoints have per-account / per-API-key rate limits and free-credit budgets that rotate by promotion; specific RPM/TPM numbers are not consistently published. Self-hosted NIM (via AI Enterprise license) has no NVIDIA-side rate limits — throughput is bounded by the customer's GPU hardware. NGC downloads are throttled per-IP by the catalog CDN.

Nvidia Rate Limits is the machine-readable rate-limit profile for Nvidia on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 3 rate-limit definitions, measuring varies and requests_per_second.

The profile also includes 4 backoff/retry policies defined and response codes documented for unauthorized, forbidden, throttled, and serviceUnavailable.

Tagged areas include GPU, AI, Machine Learning, Computing, and Graphics.

3 Limits Throttle: 429

GPUAIMachine LearningComputingGraphicsRate Limiting

Limits

build.nvidia.com hosted endpoints api-key

varies

per-key throttle and free-credit budget; specific RPM/TPM not publicly documented

Self-hosted NIM (AI Enterprise) cluster

requests_per_second

bounded by customer GPU hardware; no NVIDIA-imposed rate limit

NGC Catalog downloads IP

varies

CDN-throttled per-IP; not numerically published

Policies

API key required

All hosted NIM endpoints (build.nvidia.com, integrate.api.nvidia.com) require an NVIDIA developer API key passed via Authorization Bearer header.

Backoff

Implement exponential backoff with jitter on 429 responses. Honor Retry-After when present.

Free-credit exhaustion

When free-trial credits on build.nvidia.com are exhausted, requests return an authorization error rather than a throttle; obtain an AI Enterprise license or cloud-marketplace credentials for production.

Self-host for high throughput

For sustained high-throughput inference, deploy NIM containers on customer GPU infrastructure under AI Enterprise rather than calling hosted endpoints.

Nvidia Rate Limits

Limits

Policies

Sources