Leonardo Ai Rate Limits

Reconciled concurrency, queue, and rate-limit behaviour for the Leonardo.AI Production API. Leonardo enforces per-API-key concurrency limits (the number of generations that can be in flight in parallel) on top of a queueing system, rather than a strict RPM/TPM token bucket. Requests beyond the concurrency limit are queued, up to the queue depth, rather than rejected with HTTP 429.
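Because the limit is on parallel in-flight generations rather than requests per minute, a client can cap its own concurrency so that it relies less on the server-side queue. The following is a minimal sketch, not Leonardo's SDK: `run_bounded` and `MAX_IN_FLIGHT` are hypothetical names, the value 3 is a placeholder (actual limits are account-scoped), and each job is assumed to be an async callable that submits one generation and waits for it to finish.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

# Placeholder only: the real concurrency limit is account-scoped and
# not publicly documented, so read it from your own dashboard.
MAX_IN_FLIGHT = 3


async def run_bounded(
    jobs: List[Callable[[], Awaitable[T]]],
    limit: int = MAX_IN_FLIGHT,
) -> List[T]:
    """Run async jobs with at most `limit` of them in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # A job only starts once a slot is free; the slot is released
        # when the job returns or raises.
        async with sem:
            return await job()

    # gather preserves the order of `jobs` in the returned list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A caller would pass a list of zero-argument coroutine functions, for example `asyncio.run(run_bounded([submit_a, submit_b]))`, where `submit_a` and `submit_b` stand in for whatever code submits and polls a generation.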

Leonardo Ai Rate Limits is the machine-readable rate-limit profile for Leonardo.AI on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 2 rate-limit definitions across two tiers: Default Production API and Synchronous utility endpoints.

The profile also includes response codes documented for unauthorized, forbidden, notFound, validationError, rateLimited, and serverError.
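The response categories listed above can inform a client's retry policy. The sketch below rests on an assumption the profile does not state: that only rateLimited and serverError are worth retrying, while the other four indicate a problem with the request or credentials. `should_retry` and `backoff_delay` are hypothetical helpers, and the base and cap values are illustrative.

```python
import random
from typing import Callable

# Assumption: only these two categories are treated as transient.
RETRYABLE = {"rateLimited", "serverError"}


def should_retry(category: str) -> bool:
    """Return True if a response category is treated as transient."""
    return category in RETRYABLE


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Exponential backoff with full jitter, in seconds.

    `attempt` is zero-based; the un-jittered delay doubles each attempt
    and is capped at `cap` before being scaled by a random factor in [0, 1).
    """
    return min(cap, base * (2 ** attempt)) * rng()
```

Full jitter spreads retries from many clients over the whole backoff window, which avoids synchronized bursts against a queue that is already full.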

Tagged areas include AI, Rate Limiting, Concurrency, and Queue.

Limits

Default Production API: Concurrency and queue depth are account-scoped and not publicly documented. Use the in-app API Access dashboard, and contact support for production limit increases.
Synchronous utility endpoints: Designed for higher request rates than the generation endpoints, but still subject to a fair-use ceiling.

Sources