Groq Rate Limits

Name: Groq Rate Limits
Creator: Groq
Keywords: AI, LLM, Inference, LPU, Low Latency, Rate Limiting, Quotas, Throttling

GroqCloud enforces per-account rate limits on synchronous inference, expressed as RPM (requests per minute), RPD (requests per day), TPM (tokens per minute), TPD (tokens per day), and the audio-specific ASH/ASD (audio seconds per hour/day). Limits vary by model and by account spend tier, and are visible in the GroqCloud console. Specific per-model values are not reconciled in this artifact.

Groq Rate Limits is the machine-readable rate-limit profile for Groq on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 6 rate-limit definitions, measuring requests, tokens, audio_seconds, and jobs.

The profile also defines 2 policies (tiered limits and a backoff/retry strategy) and documents the response code returned for throttled requests (HTTP 429).

Tagged areas include AI, LLM, Inference, LPU, and Low Latency.

Limits: 6
Throttle response code: 429

Limits

Requests Per Minute (RPM)
Scope: account
Unit: requests
Limit: see provider documentation
Notes: Per-model RPM; varies by tier and model.

Requests Per Day (RPD)
Scope: account
Unit: requests
Limit: see provider documentation
Notes: Per-model RPD; varies by tier and model.

Tokens Per Minute (TPM)
Scope: account
Unit: tokens
Limit: see provider documentation
Notes: Per-model TPM; varies by tier and model.

Tokens Per Day (TPD)
Scope: account
Unit: tokens
Limit: see provider documentation
Notes: Per-model TPD; varies by tier and model.

Audio Seconds Per Hour / Day (ASH / ASD)
Scope: account
Unit: audio_seconds
Limit: see provider documentation
Notes: Applies to STT/TTS endpoints; varies by model.

Batch API
Scope: account
Unit: jobs
Limit: separate from sync limits
Notes: Batch jobs are queued and run at a 50% discount; they do not directly consume synchronous RPM/TPM quota.
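
The per-minute request and token limits can be approximated on the client side so that most requests are held back before they would be throttled. The following is a minimal sketch, not Groq's own accounting: it assumes a rolling 60-second window, and the class name MinuteWindowLimiter, its methods, and the limit values in the usage note are hypothetical. Daily limits (RPD/TPD) and audio limits (ASH/ASD) are not covered.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteWindowLimiter:
    """Hypothetical helper: tracks requests and tokens over a rolling 60 s window."""

    WINDOW = 60.0

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm          # assumed requests-per-minute budget
        self.tpm = tpm          # assumed tokens-per-minute budget
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Drop events that have left the 60-second window.
        while self.events and now - self.events[0][0] >= self.WINDOW:
            self.events.popleft()

    def wait_time(self, tokens: int) -> float:
        """Seconds until a request of `tokens` tokens fits; 0.0 if it fits now."""
        if tokens > self.tpm:
            raise ValueError("request is larger than the per-minute token budget")
        now = self.clock()
        self._prune(now)
        requests_used = len(self.events)
        tokens_used = sum(t for _, t in self.events)
        wait = 0.0
        # Expire the oldest events one by one until both budgets have room.
        for ts, t in self.events:
            if requests_used < self.rpm and tokens_used + tokens <= self.tpm:
                break
            requests_used -= 1
            tokens_used -= t
            wait = ts + self.WINDOW - now
        return wait

    def record(self, tokens: int) -> None:
        """Register a request that was actually sent."""
        self.events.append((self.clock(), tokens))
```

A caller would typically sleep for wait_time(estimated_tokens) seconds, send the request, then call record with the token count actually used, for example with MinuteWindowLimiter(rpm=30, tpm=6000), where both numbers are placeholders rather than real Groq limits. The server remains the source of truth, so a 429 can still occur and should be handled.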

Policies

Tiered Limits

Limits rise as accounts move from free to paid usage, and can be raised further through Enterprise agreements.

Backoff Strategy

Clients should implement exponential backoff with jitter and honor the Retry-After header.
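
The backoff policy can be sketched as follows. This is an illustration under stated assumptions rather than an official client: the function names retry_delay and call_with_backoff, the `send` callable returning a (status, headers, body) tuple, and the default base and cap values are all hypothetical. It uses the "full jitter" variant of exponential backoff and only handles Retry-After given as a number of seconds, not the HTTP-date form.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    retry_after = headers.get("Retry-After") or headers.get("retry-after")
    if retry_after is not None:
        try:
            # Server-provided delay in seconds takes priority over our own schedule.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back to backoff.
    # Full jitter: uniform between 0 and min(cap, base * 2**attempt).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      **delay_kwargs) -> Response:
    """Call `send` until it returns a non-429 status or retries are exhausted."""
    attempt = 0
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_retries:
            return status, headers, body
        sleep(retry_delay(attempt, headers, **delay_kwargs))
        attempt += 1
```

Here `send` would wrap whatever HTTP client is in use; injecting `sleep` and `rng` keeps the retry logic testable without real waiting. If retries run out, the last 429 response is returned so the caller can decide how to surface it.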
