
Predibase Rate Limits

Predibase enforces account-level limits that differ by surface. Serverless (shared endpoint) inference has a free-tier token allowance (approximately 1M tokens/day and 10M tokens/month) and per-account throughput limits; dedicated deployments scale throughput with the provisioned GPU accelerator and replica count rather than a fixed request quota. Fine-tuning and batch inference are queued jobs. Specific per-account values are visible in the Predibase console and are not reconciled in this artifact.

Predibase Rate Limits is the machine-readable rate-limit profile for Predibase on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 5 rate-limit definitions, measuring tokens, requests, and jobs.

The profile also defines 2 backoff/retry policies and documents the response code returned for throttled requests.

Tagged areas include AI, LLM, Fine-Tuning, Inference, and LoRA.

5 Limits · Throttle: 429
Tags: AI, LLM, Fine-Tuning, Inference, LoRA, Rate Limiting, Quotas, Throttling

Limits

Serverless Token Allowance (Free)
Scope: account
Metric: tokens
Limit: ~1M tokens/day, ~10M tokens/month (free tier)
Shared endpoint serverless inference allowance before paid usage.

Serverless Throughput
Scope: account
Metric: requests
Limit: see provider console
Per-account concurrency / throughput on shared endpoints.

Dedicated Deployment Throughput
Scope: deployment
Metric: requests
Limit: scales with GPU accelerator and replica count
Throughput is governed by provisioned hardware, not a fixed quota.

Fine-Tuning Jobs
Scope: account
Metric: jobs
Limit: queued; concurrency varies by tier
Supervised and GRPO jobs queue and run on managed training infrastructure.

Batch Inference Jobs
Scope: account
Metric: jobs
Limit: queued; separate from realtime serving
Async batch jobs deploy the base model and load adapters automatically.
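
Because the free-tier token allowance is stated only approximately, a client may want to keep its own running count rather than rely on being throttled. The following is a minimal sketch of such a tracker, not part of any Predibase SDK. The `TokenBudget` class and its method names are hypothetical, the limits are the approximate figures quoted in this profile, and the reset behavior (calendar day and calendar month) is an assumption, since the profile does not say how the windows roll over.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Approximate free-tier figures quoted in this profile; actual values may differ.
DAILY_TOKENS = 1_000_000
MONTHLY_TOKENS = 10_000_000


@dataclass
class TokenBudget:
    """Hypothetical client-side counter for a daily and a monthly token allowance."""
    daily_limit: int = DAILY_TOKENS
    monthly_limit: int = MONTHLY_TOKENS
    _day: Optional[date] = None
    _day_used: int = 0
    _month_used: int = 0

    def _roll(self, today: date) -> None:
        # Assumption: windows reset on calendar day and calendar month boundaries.
        if self._day is None or (today.year, today.month) != (self._day.year, self._day.month):
            self._month_used = 0
        if today != self._day:
            self._day_used = 0
            self._day = today

    def remaining(self, today: date) -> int:
        """Tokens still available today under both the daily and monthly caps."""
        self._roll(today)
        left_today = self.daily_limit - self._day_used
        left_month = self.monthly_limit - self._month_used
        return max(0, min(left_today, left_month))

    def record(self, tokens: int, today: date) -> None:
        """Add tokens consumed by a completed request to both counters."""
        self._roll(today)
        self._day_used += tokens
        self._month_used += tokens
```

A caller would check `remaining(date.today())` before sending a request and call `record(...)` with the token count reported for the response. The counter is only an estimate; the Predibase console remains the authority on actual usage.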

Policies

Tiered Limits
Allowances and throughput increase by tier, from Free to Developer (dedicated deployments) to Enterprise agreements.
Backoff Strategy
Clients should implement exponential backoff with jitter and honor Retry-After on 429 responses.
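
The backoff policy above could be sketched as follows. This is an illustrative sketch under stated assumptions, not Predibase client code: `call_with_backoff`, `backoff_delay`, and `parse_retry_after` are hypothetical helpers, `send` stands in for whatever function performs the HTTP request and returns `(status, headers, body)`, and the base delay, cap, and attempt count are arbitrary defaults. The jitter variant used here is "full jitter" (a uniform random delay up to the exponential ceiling), and only the numeric-seconds form of `Retry-After` is handled.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds if given as a finite, non-negative number."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                seconds = float(value)
            except ValueError:
                return None  # HTTP-date form is not handled in this sketch
            return seconds if 0 <= seconds < float("inf") else None
    return None


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(max_attempts - 1):
        if response[0] != 429:
            break
        # Prefer the server's Retry-After hint; otherwise fall back to jittered backoff.
        retry_after = parse_retry_after(response[1])
        sleep(retry_after if retry_after is not None else backoff_delay(attempt))
        response = send()
    return response
```

Passing `sleep` and `rng` as parameters keeps the retry logic testable without real waiting. If every attempt is throttled, the last 429 response is returned so the caller can decide how to proceed.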

Sources