Hugging Face Rate Limits
Hugging Face does not publish a single account-wide requests-per-second number. Instead, limits are enforced per account and per token through monthly Inference Providers credits ($0.10 for Free, $2 for PRO, and $2 per seat for Team & Enterprise). Hub API and Inference Endpoints quotas are tracked as instance counts, which can be raised through support. For higher throughput, use Inference Endpoints (dedicated capacity) or call partner providers directly with their own keys. Rate limits scale with subscription tier.
Hugging Face Rate Limits is the machine-readable rate-limit profile for Hugging Face on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 6 rate-limit definitions, measured in USD_per_month, USD_per_seat_per_month, concurrent_instances, requests_per_minute, and GPU_seconds_per_day.
The profile also defines 6 backoff/retry policies and documents response codes for the throttled, quotaExceeded, and serviceUnavailable conditions.
Tagged areas include Rate Limiting, AI, Inference, and Machine Learning.
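As an illustration of what a backoff/retry policy of the kind mentioned above might look like on the client side, here is a minimal Python sketch of capped exponential backoff with jitter. It is not taken from the profile itself. The mapping of throttled and serviceUnavailable to HTTP 429 and 503 is an assumption, as are the names `call_with_retry`, `backoff_delay`, and `send`, and the default delays; quotaExceeded is deliberately treated as non-retryable here, since waiting a few seconds does not restore a monthly credit balance.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Assumption: "throttled" and "serviceUnavailable" correspond to HTTP 429 and 503.
# Adjust this set to whatever status codes the profile actually documents.
RETRYABLE_STATUSES = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential delay for a zero-based attempt index, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    (status_code, retry_after_seconds). retry_after_seconds is None when the
    server gave no Retry-After hint.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status = 0
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts: do not sleep after the final try
        if retry_after is not None:
            # Honor the server's hint as-is rather than shortening it.
            delay = retry_after
        else:
            delay = backoff_delay(attempt)
            if jitter:
                # "Full jitter": pick a random delay up to the computed bound
                # so that many clients do not retry in lockstep.
                delay = random.uniform(0, delay)
        sleep(delay)
    return status
```

In practice `send` would wrap a real HTTP call and parse the `Retry-After` header if present. Injecting `sleep` keeps the function easy to exercise without actually waiting.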