LlamaCloud · Rate Limits

LlamaCloud Rate Limits

LlamaCloud enforces per-account limits expressed as request concurrency and throughput on parsing, extraction, and retrieval, plus a monthly credit allowance per plan that effectively caps page/query volume. Concurrency and rate limits scale with the subscription tier, with Enterprise offering roughly 5x higher rate limits. Parsing, extraction, and indexing are asynchronous job-based workloads, so callers poll job status rather than holding long synchronous connections. Specific per-tier numeric limits are not reconciled in this artifact.

LlamaCloud Rate Limits is the machine-readable rate-limit profile for LlamaCloud on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 5 rate-limit definitions, measuring concurrent_jobs, requests, queries, and credits.

The profile also defines 3 backoff/retry policies and documents the response code returned for throttled requests (429).

Tagged areas include AI, Document Parsing, Extraction, Indexing, and Retrieval.

5 Limits · Throttle: 429
Tags: AI, Document Parsing, Extraction, Indexing, Retrieval, RAG, Rate Limiting, Quotas, Throttling

Limits

Parse Job Concurrency (scope: account)
Metric: concurrent_jobs
Limit: see provider documentation
Number of in-flight LlamaParse jobs; scales with subscription tier.

Extraction Job Concurrency (scope: account)
Metric: concurrent_jobs
Limit: see provider documentation
Number of in-flight LlamaExtract jobs; scales with subscription tier.

Requests Per Minute (scope: account)
Metric: requests
Limit: see provider documentation
Per-account request rate; Enterprise offers ~5x higher limits.

Retrieval Queries (scope: account)
Metric: queries
Limit: see provider documentation
Retrieval throughput against managed indexes; metered at 1 credit per query.

Monthly Credit Allowance (scope: account)
Metric: credits
Limit: 10000 (Free) / 40000 (Starter) / 400000 (Pro) / custom (Enterprise)
Plan-included monthly credits; overage billed pay-as-you-go.
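
As a worked example of how the credit allowance bounds retrieval volume, the sketch below converts a plan's monthly credits into the number of retrieval queries it covers, using the 1-credit-per-query rate listed above. The names PLAN_CREDITS and included_retrieval_queries are illustrative and not part of any LlamaCloud SDK. The credits_used_elsewhere argument is a caller-supplied figure for parsing or extraction spend, since this profile does not state those rates. Enterprise is omitted because its allowance is custom.

```python
# Plan allowances as listed in this profile (credits per month).
PLAN_CREDITS = {"Free": 10_000, "Starter": 40_000, "Pro": 400_000}

# Retrieval is metered at 1 credit per query, per the Retrieval Queries entry.
CREDITS_PER_QUERY = 1


def included_retrieval_queries(plan: str, credits_used_elsewhere: int = 0) -> int:
    """Return how many retrieval queries the plan's included credits still cover.

    `credits_used_elsewhere` is an assumed, caller-supplied number of credits
    already spent on other workloads this month.
    """
    remaining = max(0, PLAN_CREDITS[plan] - credits_used_elsewhere)
    return remaining // CREDITS_PER_QUERY


if __name__ == "__main__":
    for plan in PLAN_CREDITS:
        print(plan, included_retrieval_queries(plan))
```

Because overage is billed pay-as-you-go, the result is the volume covered by the plan rather than a hard ceiling on queries.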

Policies

Tiered Limits
Concurrency and rate limits rise as accounts move from Free to Starter, Pro, and Enterprise agreements.
Asynchronous Jobs
Parsing, extraction, and indexing run as async jobs; clients poll job status endpoints rather than blocking.
Backoff Strategy
Clients should implement exponential backoff with jitter on 429 responses and honor Retry-After.
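
The polling and backoff policies above can be sketched together as follows. This is a minimal illustration under stated assumptions, not LlamaCloud client code. The send and get_status callables stand in for whatever HTTP call the client makes and are assumed to return a (status, headers, body) tuple. The is_done predicate is hypothetical, because job status values are not listed in this profile. Retry-After is read only in its delay-seconds form, and the headers mapping is assumed to use that exact key casing.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Assumed response shape: (HTTP status, headers, parsed body).
Response = Tuple[int, Mapping[str, str], object]


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Honor the server hint when it is given in seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch.
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying on 429 until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers))
    return status, headers, body  # Still throttled after the last attempt.


def poll_job(get_status: Callable[[], Response],
             is_done: Callable[[object], bool],
             interval: float = 2.0, max_polls: int = 30,
             sleep: Callable[[float], None] = time.sleep) -> object:
    """Poll a job status call until `is_done(body)` is true."""
    for _ in range(max_polls):
        status, _headers, body = call_with_backoff(get_status, sleep=sleep)
        if status == 200 and is_done(body):
            return body
        sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")
```

Passing sleep as a parameter keeps the waiting logic testable without real delays; in production the default time.sleep applies.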

Sources