Predibase Rate Limits
Predibase enforces account-level limits that differ by surface. Serverless (shared endpoint) inference has a free-tier token allowance (approximately 1M tokens/day and 10M tokens/month) and per-account throughput limits; dedicated deployments scale throughput with the provisioned GPU accelerator and replica count rather than a fixed request quota. Fine-tuning and batch inference are queued jobs. Specific per-account values are visible in the Predibase console and are not reconciled in this artifact.
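As an illustration of the free-tier allowance described above, a client could keep its own running tally of tokens spent. The sketch below is not part of any Predibase SDK: the `TokenBudget` class and its `try_spend` method are hypothetical names, the defaults reuse the approximate 1M/day and 10M/month figures, and the calendar-day and calendar-month reset boundaries are an assumption. The authoritative values are in the Predibase console.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TokenBudget:
    """Hypothetical client-side tally against daily and monthly token allowances."""
    daily_limit: int = 1_000_000      # approximate free-tier figure, not an exact quota
    monthly_limit: int = 10_000_000   # approximate free-tier figure, not an exact quota
    _day: date = field(default_factory=date.today)
    _used_today: int = 0
    _used_month: int = 0

    def _roll(self, today: date) -> None:
        # Assumed reset boundaries: calendar day and calendar month.
        if (today.year, today.month) != (self._day.year, self._day.month):
            self._used_month = 0
        if today != self._day:
            self._used_today = 0
            self._day = today

    def try_spend(self, tokens: int, today: Optional[date] = None) -> bool:
        """Record `tokens` and return True only if both allowances have room."""
        self._roll(today or date.today())
        over_day = self._used_today + tokens > self.daily_limit
        over_month = self._used_month + tokens > self.monthly_limit
        if over_day or over_month:
            return False
        self._used_today += tokens
        self._used_month += tokens
        return True
```

Calling `try_spend` before each serverless request lets a client skip calls that would probably exceed the allowance, though only the server-side accounting is binding.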
Predibase Rate Limits is the machine-readable rate-limit profile for Predibase on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 5 rate-limit definitions, measuring tokens, requests, and jobs.
The profile also defines 2 backoff/retry policies and documents the response codes returned for throttled requests.
Tagged areas include AI, LLM, Fine-Tuning, Inference, and LoRA.
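The backoff/retry policies mentioned above are not spelled out in this summary, so the following is only a generic sketch of how a client might react to throttling. It assumes throttled requests return HTTP 429 and that the server may supply a retry hint in seconds. Both are assumptions to check against the profile. `call_with_backoff` and its `send` callable are hypothetical names, not Predibase APIs.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Assumption: throttling is signalled with HTTP 429; confirm against the profile.
THROTTLE_STATUS = 429


def call_with_backoff(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-throttled status or attempts run out.

    `send` is a hypothetical callable returning
    (status_code, retry_after_seconds or None).
    """
    status = THROTTLE_STATUS
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != THROTTLE_STATUS:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        if retry_after is not None:
            # Prefer the server-provided hint, capped at max_delay.
            delay = min(retry_after, max_delay)
        else:
            # Capped exponential backoff with full jitter.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    return status
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without real waiting. Jitter is included so that many clients throttled at the same moment do not all retry in lockstep.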