Contextual AI Rate Limits
The Contextual AI platform enforces per-workspace rate limits and request constraints across its APIs. Documented hard input constraints include a 32,000-token total limit on Generate requests, a 7,000-token total limit on LMUnit requests, and Parse file limits of 300 MB and 2,000 pages per file. Per-endpoint request-per-minute and token-per-minute throttles vary by tier and are not publicly reconciled in this artifact; enterprise agreements raise limits. Throttled requests return HTTP 429.
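The documented hard constraints lend themselves to a client-side pre-flight check, so that oversized requests are rejected before they are sent. The following is a minimal sketch, not part of the Contextual AI API: the function names are hypothetical, token counts are assumed to be computed by the caller (the tokenizer is not specified here), and "300 MB" is read as decimal megabytes, which is an assumption.

```python
# Hard input constraints quoted in the summary above.
GENERATE_MAX_TOKENS = 32_000
LMUNIT_MAX_TOKENS = 7_000
PARSE_MAX_PAGES = 2_000
# Assumption: "MB" is interpreted as 10**6 bytes (the stricter reading).
PARSE_MAX_BYTES = 300 * 10**6


def check_generate(total_tokens: int) -> list[str]:
    """Return violated Generate constraints (empty list if none)."""
    if total_tokens > GENERATE_MAX_TOKENS:
        return [f"Generate: {total_tokens} tokens exceeds {GENERATE_MAX_TOKENS}"]
    return []


def check_lmunit(total_tokens: int) -> list[str]:
    """Return violated LMUnit constraints (empty list if none)."""
    if total_tokens > LMUNIT_MAX_TOKENS:
        return [f"LMUnit: {total_tokens} tokens exceeds {LMUNIT_MAX_TOKENS}"]
    return []


def check_parse_file(size_bytes: int, pages: int) -> list[str]:
    """Return violated Parse file constraints (empty list if none)."""
    problems = []
    if size_bytes > PARSE_MAX_BYTES:
        problems.append(f"Parse: {size_bytes} bytes exceeds {PARSE_MAX_BYTES}")
    if pages > PARSE_MAX_PAGES:
        problems.append(f"Parse: {pages} pages exceeds {PARSE_MAX_PAGES}")
    return problems


if __name__ == "__main__":
    # A 350 MB, 2,500-page file violates both Parse limits.
    for problem in check_parse_file(350 * 10**6, 2_500):
        print(problem)
```

Returning a list of violations rather than raising lets a caller report every problem with a file at once. A check like this only covers the hard input limits; the per-minute throttles vary by tier and still have to be handled at request time.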
Contextual AI Rate Limits is the machine-readable rate-limit profile for Contextual AI on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 6 rate-limit definitions, measuring tokens, bytes, pages, and requests.
The profile also defines 3 backoff/retry policies and documents the response codes returned for throttled requests.
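This summary does not spell out the three policies, so the following is only an illustrative sketch of one common approach to HTTP 429 responses: exponential backoff with full jitter, preferring a `Retry-After` header when one is present. Whether Contextual AI sends `Retry-After` is an assumption, as are the function names, the base delay, the cap, and the attempt count. The `send` callable is a hypothetical stand-in for whatever HTTP client issues the request.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers) as returned by the caller-supplied `send`.
Response = Tuple[int, Mapping[str, str]]


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before the next try (attempt is 0-based)."""
    if retry_after is not None:
        # Assumption: the server may suggest a wait via Retry-After.
        return min(cap, max(0.0, retry_after))
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * 2**attempt)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            break
        raw = headers.get("Retry-After", "")
        # Only the integer-seconds form is handled; HTTP-date values are ignored.
        retry_after = float(raw) if raw.isdigit() else None
        sleep(backoff_delay(attempt, retry_after=retry_after))
    return status, headers
```

Injecting `sleep` and `rng` keeps the function testable without real waiting. After the final attempt the last 429 response is returned to the caller rather than raised, so the caller decides how to surface the failure.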
Tagged areas include AI, RAG, LLM, Grounded Language Model, and Enterprise.