LlamaCloud · Rate Limits

Llamacloud Rate Limits

Name: Llamacloud Rate Limits
Creator: LlamaCloud
Keywords: AI, Document Parsing, Extraction, Indexing, Retrieval, RAG, Rate Limiting, Quotas, Throttling

LlamaCloud enforces per-account limits expressed as request concurrency and throughput on parsing, extraction, and retrieval, plus a monthly credit allowance per plan that effectively caps page/query volume. Concurrency and rate limits scale with the subscription tier, with Enterprise offering roughly 5x higher rate limits. Parsing, extraction, and indexing are asynchronous job-based workloads, so callers poll job status rather than holding long synchronous connections. Specific per-tier numeric limits are not reconciled in this artifact.

Llamacloud Rate Limits is the machine-readable rate-limit profile for LlamaCloud on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 5 rate-limit definitions, measuring concurrent_jobs, requests, queries, and credits.

The profile also defines 3 backoff/retry policies and documents the response code returned for throttled requests (429).

Tagged areas include AI, Document Parsing, Extraction, Indexing, and Retrieval.

5 Limits · Throttle: 429


Limits

Parse Job Concurrency (scope: account)

concurrent_jobs

see provider documentation

Number of in-flight LlamaParse jobs; scales with subscription tier.

Extraction Job Concurrency (scope: account)

concurrent_jobs

see provider documentation

Number of in-flight LlamaExtract jobs; scales with subscription tier.

Requests Per Minute (scope: account)

requests

see provider documentation

Per-account request rate; Enterprise offers ~5x higher limits.

Retrieval Queries (scope: account)

queries

see provider documentation

Retrieval throughput against managed indexes; metered at 1 credit per query.

Monthly Credit Allowance (scope: account)

credits

10,000 (Free) / 40,000 (Starter) / 400,000 (Pro) / custom (Enterprise)

Plan-included monthly credits; overage billed pay-as-you-go.
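
The credit allowances translate into a rough upper bound on included retrieval volume. The sketch below assumes the 1-credit-per-query rate listed for retrieval and treats any credits spent on parsing or extraction as a caller-supplied number, since their per-page rates are not given in this profile. The names `PLAN_CREDITS` and `max_queries` are illustrative and not part of any LlamaCloud SDK.

```python
# Monthly credit allowances as listed in this profile (Enterprise is custom).
PLAN_CREDITS = {"free": 10_000, "starter": 40_000, "pro": 400_000}

# Assumption: retrieval is metered at 1 credit per query, as stated above.
CREDITS_PER_QUERY = 1


def max_queries(plan: str, credits_used_elsewhere: int = 0) -> int:
    """Upper bound on plan-included retrieval queries for one month."""
    remaining = PLAN_CREDITS[plan] - credits_used_elsewhere
    return max(remaining, 0) // CREDITS_PER_QUERY


for plan in PLAN_CREDITS:
    print(plan, max_queries(plan))
```

Because overage is billed pay-as-you-go, this bound describes what the plan includes rather than a hard ceiling on usage.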

Policies

Tiered Limits

Concurrency and rate limits rise as accounts move from Free to Starter, Pro, and Enterprise agreements.
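
Since the per-tier concurrency ceilings are not enumerated in this profile, a client may want to keep its own in-flight cap configurable and adjust it when the plan changes. The following is a minimal sketch using `asyncio.Semaphore`; `run_bounded` and `max_in_flight` are hypothetical names, and the value passed for the cap is an assumption to be taken from the provider documentation.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    max_in_flight: int,
) -> List[T]:
    """Run job factories concurrently, never exceeding max_in_flight at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Each job waits here until one of the max_in_flight slots is free.
        async with gate:
            return await job()

    # gather preserves the input order of the results.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each element of `jobs` is a zero-argument callable returning an awaitable, for example a function that submits one document and waits for its result.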

Asynchronous Jobs

Parsing, extraction, and indexing run as async jobs; clients poll job status endpoints rather than blocking.
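
The polling pattern described here can be sketched generically. The example assumes the caller supplies a `get_status` function that wraps whatever job-status endpoint the provider exposes. The terminal state names, the interval, and the timeout are placeholders and are not taken from LlamaCloud documentation.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or time runs out."""
    terminal = {"SUCCESS", "ERROR", "CANCELLED"}  # hypothetical state names
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        # Give up rather than sleep past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(interval)
```

Each poll is itself a request, so the polling interval should be chosen with the per-account request rate in mind.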

Backoff Strategy

Clients should implement exponential backoff with jitter on 429 responses and honor Retry-After.
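
One way to implement this policy is sketched below, using "full jitter" (a uniformly random delay up to an exponentially growing ceiling) and preferring the server's `Retry-After` value when one is present. The helper names, the `send` callable returning `(status, headers)`, and the base, cap, and attempt counts are all assumptions made for illustration. Only the delta-seconds form of `Retry-After` is handled, and the header lookup assumes that exact capitalization.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying after 0-based attempt number `attempt`."""
    if retry_after is not None:
        try:
            # Honor the server hint when it is a number of seconds.
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date form is not parsed here; fall back to jitter.
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), retrying on HTTP 429 for at most max_attempts total calls."""
    for attempt in range(max_attempts):
        status, headers = send()
        # Return on any non-429 response, or when attempts are exhausted.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

The jitter spreads retries from many clients over time, so they do not all hit the API again at the same instant after a throttling event.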
