Hugging Face Rate Limits

Name: Hugging Face Rate Limits
Creator: Hugging Face
Keywords: Rate Limiting, AI, Inference, Machine Learning

Hugging Face does not publish a single account-wide requests-per-second number. Inference Providers usage is metered per account or per token through monthly credits ($0.10 Free, $2 PRO, $2 per seat for Team & Enterprise). Inference Endpoints quotas are tracked as concurrent instance counts and can be raised via support, while Hub API request limits are not publicly published and scale with subscription tier. For higher throughput, use Inference Endpoints (dedicated capacity) or partner-provider keys directly.

Hugging Face Rate Limits is the machine-readable rate-limit profile for Hugging Face on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 6 rate-limit definitions, measured in USD_per_month, USD_per_seat_per_month, concurrent_instances, requests_per_minute, and GPU_seconds_per_day.

The profile also defines 6 backoff/retry policies and documents response codes for the throttled, quotaExceeded, and serviceUnavailable conditions.

Tagged areas include Rate Limiting, AI, Inference, and Machine Learning.

Limits: 6
Throttle response: 429
Quota response: 429

Limits

Inference Providers monthly credits (Free)
Scope: account
Unit: USD_per_month
Period: month
Value: 0.10
Note: Free users can purchase additional credits to continue past the monthly allotment.

Inference Providers monthly credits (PRO)
Scope: account
Unit: USD_per_month
Period: month
Value: 2.00
Note: PRO subscribers receive 20x the inference credits of Free users.

Inference Providers monthly credits (Team / Enterprise)
Scope: organization
Unit: USD_per_seat_per_month
Period: month
Value: 2.00
Note: Pooled across all organization members; bill via the X-HF-Bill-To header.
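
The credit figures for the three tiers above can be checked with a little arithmetic. The following is a minimal sketch, assuming the dollar values listed in this profile; the constant and function names are hypothetical and are not part of any Hugging Face library. Decimal is used so the currency math stays exact.

```python
from decimal import Decimal

# Monthly Inference Providers credits as listed in this profile (USD).
FREE_CREDITS_USD = Decimal("0.10")
PRO_CREDITS_USD = Decimal("2.00")
PER_SEAT_CREDITS_USD = Decimal("2.00")  # Team / Enterprise, per seat


def pro_to_free_ratio() -> Decimal:
    """Ratio of PRO credits to Free credits."""
    return PRO_CREDITS_USD / FREE_CREDITS_USD


def pooled_org_credits(seats: int) -> Decimal:
    """Total monthly credits pooled across an organization's seats."""
    if seats < 0:
        raise ValueError("seats must be non-negative")
    return PER_SEAT_CREDITS_USD * seats
```

For example, pooled_org_credits(5) gives the shared monthly pool for a hypothetical five-seat organization, and pro_to_free_ratio() reproduces the 20x figure quoted for PRO.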

Inference Endpoints instance quota
Scope: account
Unit: concurrent_instances
Value: see the quotas page at https://ui.endpoints.huggingface.co
Note: Paused endpoints do not count; scaled-to-zero endpoints still count. Raise via support.
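
The counting rule above (paused endpoints are excluded, scaled-to-zero endpoints are included) can be illustrated with a short sketch. The state labels "running", "paused", and "scaled_to_zero" are hypothetical placeholders chosen for this example; they are not taken from the Inference Endpoints API.

```python
from typing import Iterable

# Hypothetical state label for an endpoint that does not consume quota.
# Per the note above, only paused endpoints are excluded from the count.
PAUSED = "paused"


def instances_counted(states: Iterable[str]) -> int:
    """Count endpoints that consume instance quota under the rule above."""
    return sum(1 for state in states if state != PAUSED)


def has_headroom(states: Iterable[str], quota: int) -> bool:
    """Return True if one more endpoint would still fit within `quota`."""
    return instances_counted(states) < quota
```

Under these assumptions, an account with one running, one paused, and one scaled-to-zero endpoint uses two instances of its quota.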

Hub API rate limits
Scope: api-key
Unit: requests_per_minute
Value: not publicly published; scales with account tier (Free / PRO / Team / Enterprise)

ZeroGPU quota (Free)
Scope: account
Unit: GPU_seconds_per_day
Value: dynamic; PRO users receive 8x the Free quota

Policies

Backoff Strategy

Honor 429 responses with exponential backoff and jitter. Use the Retry-After header when present.
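
One way to apply this policy is sketched below. It is a minimal illustration, not an official client: the function names, the base and cap values, and the (status, retry_after) response shape are assumptions made for this example. It also assumes Retry-After has already been parsed into seconds; the header may instead carry an HTTP date, which this sketch does not handle.

```python
import random
import time
from typing import Callable, Optional, Tuple

# A response is modelled as (status_code, retry_after_seconds_or_None).
Response = Tuple[int, Optional[float]]


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Return how long to sleep before retrying after `attempt` (0-based)."""
    if retry_after is not None:
        # The server stated a wait time; prefer it over our own estimate.
        return max(0.0, retry_after)
    # Exponential growth capped at `cap`, with full jitter: a uniform
    # random delay in [0, ceiling] so clients do not retry in lockstep.
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status, retry_after
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after))
    # Still throttled after the final attempt; hand the 429 to the caller.
    return status, retry_after
```

The sleep function is injectable so the retry loop can be exercised in tests without real waiting. Full jitter is one of several common jitter schemes and is used here only as an example.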

Credit Exhaustion

After monthly credits are exhausted, requests can continue under pay-as-you-go by purchasing additional credits via the billing settings page.

Pass-Through Provider Limits

When routing through Inference Providers, partner-provider rate limits and content policies also apply (e.g. Cerebras, Together, Replicate). Hugging Face does not add markup but forwards provider-imposed throttling.

Custom Provider Key

Users can supply their own provider key in HF settings to bypass HF billing/credit limits and be billed directly by the provider.

Organization Billing

Team / Enterprise organizations can centralize billing and set spending limits; requests are billed to the organization by sending the X-HF-Bill-To header.
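
As a rough illustration of how the X-HF-Bill-To header might be attached to a request, the sketch below builds a header dictionary only and sends nothing over the network. The helper name build_headers, the placeholder token, and the organization name "my-org" are hypothetical.

```python
from typing import Dict, Optional


def build_headers(token: str, bill_to: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, optionally billing usage to an organization."""
    # Hugging Face access tokens are sent as a standard Bearer credential.
    headers = {"Authorization": f"Bearer {token}"}
    if bill_to:
        # Bill this request to the named organization instead of the
        # personal account that owns the token.
        headers["X-HF-Bill-To"] = bill_to
    return headers


# Placeholder values for illustration only.
personal = build_headers("hf_placeholder_token")
org_billed = build_headers("hf_placeholder_token", bill_to="my-org")
```

Keeping the header optional lets the same helper serve personal and organization-billed calls.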

Quota Increases

Inference Endpoints and Spaces hardware quotas can be raised by contacting api-enterprise@huggingface.co.
