LlamaIndex Rate Limits

LlamaIndex / LlamaCloud rate limits are not published as per-second numbers on the pricing page. Limits are tiered and scale with the plan; Enterprise receives 5x the Pro limits. Limits are enforced per API key. Detailed per-endpoint throttling is documented inside the LlamaCloud product after sign-in.

LlamaIndex Rate Limits is the machine-readable rate-limit profile for LlamaIndex on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 4 rate-limit definitions, each with a value listed as "varies" because the limit depends on the plan tier.

The profile also defines 3 backoff/retry policies and documents the response codes for the throttled and quotaExceeded conditions.

Tagged areas include Rate Limiting, LLM, and RAG.

4 limits · Throttle: HTTP 429 · Quota: HTTP 429

Limits

Limit                       Scope     Value   Notes
API requests (Free)         api-key   varies  see plan tier; lowest tier in LlamaCloud
API requests (Starter)      api-key   varies  see plan tier
API requests (Pro)          api-key   varies  see plan tier
API requests (Enterprise)   contract  varies  5x Pro rate limits

Policies

Tiered scaling
Per-key request limits scale by plan tier; Enterprise contracts receive 5x Pro rate limits.
Backoff
On 429, clients should retry with exponential backoff and respect Retry-After when present.
Credits as soft quota
Per-month credit allotments act as a soft quota; once the allotment is used, pay-as-you-go (PAYG) billing continues at $1.25 per 1,000 credits up to a per-tier monthly cap.
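
The Backoff policy above can be sketched in a few lines of Python. This is a minimal illustration, not LlamaCloud's client code: the names call_with_backoff, parse_retry_after, and send are hypothetical, the default delays are arbitrary, and send stands in for any function that performs one request and returns (status, headers, body). It assumes Retry-After carries a number of seconds and falls back to exponential backoff with jitter otherwise.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, body); a hypothetical shape for this sketch.
Response = Tuple[int, Mapping[str, str], bytes]


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or non-numeric."""
    # Note: a plain dict lookup is case-sensitive; real HTTP clients usually
    # expose case-insensitive header mappings.
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        # The HTTP-date form of Retry-After is not handled in this sketch.
        return None


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on HTTP 429 up to max_retries times."""
    attempt = 0
    while True:
        response = send()
        status, headers, _body = response
        # Return on any non-429 response, or when retries are exhausted.
        if status != 429 or attempt >= max_retries:
            return response
        delay = parse_retry_after(headers)
        if delay is None:
            # Exponential backoff with full jitter: a random delay between
            # 0 and min(max_delay, base_delay * 2**attempt).
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
        attempt += 1
```

Passing sleep as a parameter keeps the function easy to test without real waiting. Because the profile lists 429 for both throttled and quota-exceeded responses, a caller may want to keep max_retries small so that an exhausted quota is not retried indefinitely.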
