Groq Rate Limits
GroqCloud enforces per-account rate limits on synchronous inference, expressed as RPM (requests per minute), RPD (requests per day), TPM (tokens per minute), TPD (tokens per day), and, for audio models, ASH and ASD (audio seconds per hour and per day). Limits vary by model and by account spend tier and are visible in the GroqCloud console. Specific per-model values are not reconciled in this artifact.
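Because a request counts against several of these limits at once (for example RPM and TPM), a client can check each budget before sending. The following is a minimal client-side sketch, not part of the Groq profile: the class name `SlidingWindowLimiter`, the helper `allow_request`, and every numeric limit are hypothetical, and the real limits must be read from the GroqCloud console.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Tracks usage over a rolling window, e.g. 60 s for RPM or TPM.

    Hypothetical helper for illustration; not a Groq API.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.events = deque()  # (timestamp, cost) pairs, oldest first
        self.used = 0

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, cost = self.events.popleft()
            self.used -= cost

    def available(self, cost=1):
        # True if `cost` more units fit in the current window.
        self._expire(self.clock())
        return self.used + cost <= self.limit

    def record(self, cost=1):
        # Count `cost` units against the window.
        self.events.append((self.clock(), cost))
        self.used += cost


def allow_request(rpm, tpm, estimated_tokens):
    """Admit a request only if both the request and token budgets allow it."""
    if rpm.available(1) and tpm.available(estimated_tokens):
        rpm.record(1)
        tpm.record(estimated_tokens)
        return True
    return False


if __name__ == "__main__":
    # Placeholder limits; substitute the values shown for your model and tier.
    rpm = SlidingWindowLimiter(limit=30, window_seconds=60)
    tpm = SlidingWindowLimiter(limit=6000, window_seconds=60)
    print(allow_request(rpm, tpm, estimated_tokens=500))
```

Both budgets are checked before either is charged, so a request rejected on tokens does not consume request quota. The same class could track RPD, TPD, ASH, or ASD by changing the window length and the cost unit.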
Groq Rate Limits is the machine-readable rate-limit profile for Groq on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 6 rate-limit definitions, measuring requests, tokens, audio_seconds, and jobs.
The profile also defines 2 backoff/retry policies and documents the response codes returned for throttled requests.
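The profile's own retry policies are not reproduced here, so the following is only a generic sketch of one common approach: capped exponential backoff with full jitter. It assumes, without confirmation from this artifact, that throttling is signalled by HTTP 429 and that the server may send a `retry-after` header in seconds. The names `backoff_delay`, `call_with_retry`, and the `send` callable are hypothetical.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Assumption: the header carries a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not numeric (e.g. an HTTP date); fall back to backoff.
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling  # full jitter: uniform in [0, ceiling)


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it is not throttled or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body),
    with lower-case header names.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:  # assumed throttling status code
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after=headers.get("retry-after")))
    return status, body  # still throttled after the final attempt
```

A server-provided wait time takes priority over the computed delay, and the jitter spreads out retries from clients that were throttled at the same moment.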
Tagged areas include AI, LLM, Inference, LPU, and Low Latency.