Contextual AI · Rate Limits

Contextual AI Rate Limits

The Contextual AI platform enforces per-workspace rate limits and request constraints across its APIs. Documented hard input constraints include a 32,000-token total limit on Generate requests, a 7,000-token total limit on LMUnit requests, and Parse file limits of 300 MB and 2,000 pages per file. Per-endpoint request-per-minute and token-per-minute throttles vary by tier and are not publicly reconciled in this artifact; enterprise agreements raise limits. Throttled requests return HTTP 429.

Contextual AI Rate Limits is the machine-readable rate-limit profile for Contextual AI on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 6 rate-limit definitions, measuring tokens, bytes, pages, and requests.

The profile also defines 3 policies (job result retention, tiered limits, and a backoff/retry strategy) and documents the response code returned for throttled requests (HTTP 429).

Tagged areas include AI, RAG, LLM, Grounded Language Model, and Enterprise.

Summary: 6 limits; throttle response: HTTP 429
Tags: AI, RAG, LLM, Grounded Language Model, Enterprise, Rate Limiting, Quotas, Throttling

Limits

Generate Request Tokens
Scope: request
Unit: tokens
Limit: 32,000
Total tokens (messages + knowledge + output) per Generate request.

LMUnit Request Tokens
Scope: request
Unit: tokens
Limit: 7,000
Total input tokens per LMUnit evaluation request.

Parse File Size
Scope: request
Unit: bytes
Limit: 314,572,800
Maximum 300 MB per file submitted to Parse.

Parse File Pages
Scope: request
Unit: pages
Limit: 2,000
Maximum 2,000 pages per file submitted to Parse.

Requests Per Minute (RPM)
Scope: workspace
Unit: requests
Limit: see provider documentation
Per-endpoint RPM varies by tier; not publicly reconciled.

Tokens Per Minute (TPM)
Scope: workspace
Unit: tokens
Limit: see provider documentation
Per-endpoint TPM varies by tier; not publicly reconciled.
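
The four hard per-request limits listed above can be checked on the client before a request is sent, which avoids spending a round trip on a request that would be rejected anyway. The following is a minimal sketch, not part of any Contextual AI SDK: the function names are hypothetical, and the caller is assumed to supply token counts from whatever tokenizer matches the target model (the profile does not say which tokenizer the platform uses).

```python
# Hard per-request limits copied from the Limits section of this profile.
GENERATE_MAX_TOKENS = 32_000
LMUNIT_MAX_TOKENS = 7_000
PARSE_MAX_BYTES = 300 * 1024 * 1024  # 300 MB = 314,572,800 bytes
PARSE_MAX_PAGES = 2_000


def check_generate(message_tokens: int, knowledge_tokens: int,
                   max_output_tokens: int) -> list[str]:
    """Return a list of violations for a Generate request (empty if OK).

    Hypothetical helper: token counts are estimates supplied by the caller.
    """
    total = message_tokens + knowledge_tokens + max_output_tokens
    if total > GENERATE_MAX_TOKENS:
        return [f"Generate total {total} tokens exceeds {GENERATE_MAX_TOKENS}"]
    return []


def check_lmunit(input_tokens: int) -> list[str]:
    """Return a list of violations for an LMUnit request (empty if OK)."""
    if input_tokens > LMUNIT_MAX_TOKENS:
        return [f"LMUnit input {input_tokens} tokens exceeds {LMUNIT_MAX_TOKENS}"]
    return []


def check_parse_file(size_bytes: int, page_count: int) -> list[str]:
    """Return a list of violations for a file submitted to Parse."""
    problems = []
    if size_bytes > PARSE_MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {PARSE_MAX_BYTES}")
    if page_count > PARSE_MAX_PAGES:
        problems.append(f"file has {page_count} pages; limit is {PARSE_MAX_PAGES}")
    return problems
```

Because client-side token counts may differ from the server's count, leaving some headroom below the 32,000 and 7,000 token ceilings is a reasonable precaution.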

Policies

Job Result Retention
Parse job status and results are retained for 30 days; requests for older jobs return HTTP 404.

Tiered Limits
Limits increase with paid usage and via Enterprise agreements.

Backoff Strategy
Clients should implement exponential backoff with jitter and honor the Retry-After header on HTTP 429 responses.
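
The backoff policy can be sketched as follows. This is an illustrative outline under stated assumptions, not an official client: `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple, headers are assumed to be a plain dict keyed by `"Retry-After"`, and the base delay, cap, and attempt count are arbitrary example values. The sketch uses "full jitter" (a random delay between zero and the exponential ceiling) and only handles the delay-seconds form of Retry-After; the HTTP-date form falls back to the computed backoff.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0,
                  rng=random.random):
    """Seconds to wait before the next retry.

    Honors a numeric Retry-After value when present; otherwise returns a
    full-jitter exponential delay in [0, min(cap, base * 2**attempt)).
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    return rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning
    (status, headers, body). The last response is returned either way.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, headers, body
```

Passing `sleep` and `rng` as parameters keeps the retry loop testable without real waiting; in production code the defaults apply.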

Sources