Together AI Rate Limits
Together AI enforces per-account rate limits on serverless inference that vary by model and by account tier (Build, Scale, or Enterprise, with the tier rising as account spend or credit grows). Limits include requests per minute (RPM) and tokens per minute (TPM) for each model. Specific per-model values are not reconciled in this artifact; see the Together console for the limits active on your account.
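As an illustration of how a client might stay under a per-model RPM and TPM budget, here is a minimal sketch of a sliding 60-second window tracker. The class name `MinuteWindowLimiter` and its parameters are hypothetical, and the limit values passed in are placeholders, not Together AI's actual limits; real values come from the Together console.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side tracker for requests and tokens over a sliding 60-second window.

    Hypothetical helper: rpm_limit and tpm_limit are supplied by the caller
    and are not taken from any published Together AI value.
    """

    WINDOW_SECONDS = 60.0

    def __init__(self, rpm_limit, tpm_limit, clock=time.monotonic):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each admitted request

    def _evict_expired(self, now):
        # Drop requests that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= self.WINDOW_SECONDS:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both budgets."""
        now = self.clock()
        self._evict_expired(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm_limit:
            return False  # would exceed requests per minute
        if used_tokens + tokens > self.tpm_limit:
            return False  # would exceed tokens per minute
        self.events.append((now, tokens))
        return True
```

One limiter instance per model would match the per-model scope described above. The server remains the source of truth, so a tracker like this only reduces throttled responses and does not replace handling them.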
Together AI Rate Limits is the machine-readable rate-limit profile for Together AI on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 5 rate-limit definitions, measuring requests, tokens, and jobs.
The profile also defines 2 backoff/retry policies and documents the response codes returned for throttled requests.
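The profile's own backoff/retry policies are not reproduced in this description, so the following is only a generic sketch of one common approach: exponential backoff with full jitter. It assumes that throttled requests are signalled with HTTP 429 and that a numeric `Retry-After` header may be present; both are assumptions to check against the profile. The function `call_with_backoff` and the `send` callable are hypothetical names.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Retry send() while it reports throttling, up to max_retries retries.

    send is a hypothetical zero-argument callable returning
    (status, headers, body). Status 429 is ASSUMED to mean "throttled".
    Returns the final (status, body) pair.
    """
    status, body = None, None
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; report the last throttled response
        delay = None
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                # Only the numeric (seconds) form is handled in this sketch.
                delay = max(0.0, float(retry_after))
            except ValueError:
                delay = None
        if delay is None:
            # Full jitter: random delay in [0, min(max_delay, base * 2^attempt)).
            cap = min(max_delay, base_delay * (2 ** attempt))
            delay = rng() * cap
        sleep(delay)
    return status, body
```

Passing `sleep` and `rng` as parameters keeps the function testable without real waiting. Jitter spreads retries from many clients so they do not all hit the API again at the same instant.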
Tagged areas include AI, LLM, Inference, Open Source, and Fine-tuning.