Parasail Rate Limits

Parasail enforces requests-per-minute (RPM) ceilings, set by usage tier, on its OpenAI-compatible inference endpoints. No token-volume limit is currently enforced. Batch jobs are bounded by a 24-hour completion window rather than by RPM. Dedicated deployments have no global RPM ceiling; their throughput is limited only by what the provisioned replicas can serve.
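Because the limit is expressed in requests per minute, a client can stay under it by pacing its own calls. The following is a minimal sketch of a client-side sliding-window limiter, not Parasail's own mechanism. The class name `RpmLimiter` and the `rpm` value passed to it are hypothetical; the actual ceiling for a given tier is not stated here and would have to be supplied by the caller.

```python
import time
from collections import deque


class RpmLimiter:
    """Client-side sliding-window limiter (illustrative, hypothetical name).

    Allows at most `rpm` acquisitions in any rolling 60-second window.
    `rpm` is an assumption supplied by the caller, not a documented value.
    """

    WINDOW_SECONDS = 60.0

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        if rpm < 1:
            raise ValueError("rpm must be at least 1")
        self.rpm = rpm
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.sent = deque()     # timestamps of requests inside the window

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= self.WINDOW_SECONDS:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # Window is full: wait until the oldest request leaves it.
        return self.WINDOW_SECONDS - (now - self.sent[0])

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            self.sleep(delay)
            delay = self.wait_time()
        self.sent.append(self.clock())
```

Calling `limiter.acquire()` immediately before each request keeps a single process under the chosen ceiling. It does not coordinate across multiple processes or machines sharing one API key.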

Parasail Rate Limits is the machine-readable rate-limit profile for Parasail on the APIs.io network, conforming to the API Commons Rate Limits specification.

The profile also defines 3 backoff/retry policies.
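The details of those policies are not reproduced here, so the following is only a generic sketch of one common approach: capped exponential backoff with full jitter. It assumes the caller's own request code raises the hypothetical `RateLimited` exception when the service rejects a request for exceeding its rate limit (conventionally HTTP 429), optionally carrying a server-suggested wait in `retry_after`. Whether Parasail supplies such a hint is an assumption. The `base`, `cap`, and `max_attempts` defaults are illustrative values, not values from the profile.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical marker raised by the caller's code on a rate-limit rejection."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay for a 0-based attempt.

    The upper bound doubles each attempt and is capped at `cap` seconds;
    the delay is drawn uniformly between 0 and that bound.
    """
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(fn, max_attempts=5, sleep=time.sleep, **backoff_kwargs):
    """Call `fn()` and retry on RateLimited, up to `max_attempts` total calls."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if exc.retry_after is not None:
                delay = exc.retry_after  # prefer the server's hint when present
            else:
                delay = backoff_delay(attempt, **backoff_kwargs)
            sleep(delay)
```

Jitter spreads retries from many clients over time so they do not all return at the same instant after a shared rejection.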

Tagged areas include AI, Artificial Intelligence, GPU, Inference, and Large Language Models.

0 Limits
Tags: AI, Artificial Intelligence, GPU, Inference, Large Language Models, Open Source Models, Hugging Face, Batch, Embeddings, Tokenmaxxing, Supercloud

Policies