llamaindex · Rate Limits

Llamaindex Rate Limits

Name: Llamaindex Rate Limits
Creator: llamaindex
Keywords: Rate Limiting, LLM, RAG

LlamaIndex / LlamaCloud rate limits are not publicly enumerated as per-second numbers on the pricing page; tiered limits scale with plan and Enterprise gets 5x Pro. Limits are enforced per API key. Detailed per-endpoint throttling is documented inside the LlamaCloud product after sign-in.

Llamaindex Rate Limits is the machine-readable rate-limit profile for llamaindex on the APIs.io network, conforming to the API Commons Rate Limits specification.

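It captures 4 rate-limit definitions, one per plan tier. Each is measured as "varies" because no fixed numeric values are published.

The profile also defines 3 policies (tiered scaling, backoff, and credits as soft quota) and documents the response codes returned for the throttled and quotaExceeded conditions.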

Tagged areas include Rate Limiting, LLM, and RAG.

4 limits · Throttle: 429 · Quota: 429

Limits

API requests (Free) · api-key · varies · see plan tier; lowest tier in LlamaCloud

API requests (Starter) · api-key · varies · see plan tier

API requests (Pro) · api-key · varies · see plan tier

API requests (Enterprise) · contract · varies · 5x Pro rate limits

Policies

Tiered scaling

Per-key request limits scale by plan tier; Enterprise contracts receive 5x Pro rate limits.

Backoff

On 429, clients should retry with exponential backoff and respect Retry-After when present.
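
The backoff policy above can be sketched in Python as follows. This is a minimal illustration and not LlamaCloud client code: `send` is a hypothetical callable standing in for whatever HTTP call the client makes, and the base delay, cap, and retry count are assumed values, since the profile does not publish any. Only the numeric-seconds form of Retry-After is handled here.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, body); a stand-in for a real HTTP response.
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds: respect it as-is.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch.
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying on 429 up to `max_retries` times."""
    response = send()
    for attempt in range(max_retries):
        if response[0] != 429:
            break
        # Header lookup is case-sensitive for a plain dict (assumption:
        # the caller normalizes the header name to "Retry-After").
        sleep(retry_delay(attempt, response[1].get("Retry-After")))
        response = send()
    return response
```

Jitter is included so that many clients throttled at the same moment do not all retry at once; when the server supplies Retry-After, that value takes precedence over the computed delay.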

Credits as soft quota

Per-month credit allotments act as a soft quota; PAYG continues at $1.25/1,000 credits up to a per-tier monthly cap.
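
As a worked example of the soft-quota arithmetic, the sketch below computes the pay-as-you-go charge for a month. Only the $1.25 per 1,000 credits rate comes from this profile; the included allotment and the cap are hypothetical inputs, because the per-tier values are not listed here, and the cap is assumed to be expressed in credits.

```python
from decimal import Decimal

# From the profile: PAYG continues at $1.25 per 1,000 credits.
PAYG_RATE_PER_1000 = Decimal("1.25")


def payg_charge(used: int, included: int, payg_cap: int) -> Decimal:
    """Dollar charge for credits used beyond the monthly allotment.

    `included` and `payg_cap` are hypothetical values supplied by the
    caller; the real per-tier numbers are not part of this profile.
    """
    overage = max(0, used - included)
    # Assumption: usage beyond the cap is blocked rather than billed.
    billable = min(overage, payg_cap)
    return (PAYG_RATE_PER_1000 * billable / 1000).quantize(Decimal("0.01"))


# 4,000 credits over a hypothetical 10,000-credit allotment.
print(payg_charge(14_000, 10_000, 50_000))  # prints 5.00
```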
