Predibase · Rate Limits

Predibase Rate Limits

Name: Predibase Rate Limits
Creator: Predibase
Keywords: AI, LLM, Fine-Tuning, Inference, LoRA, Rate Limiting, Quotas, Throttling

Predibase enforces account-level limits that differ by surface. Serverless (shared endpoint) inference has a free-tier token allowance (approximately 1M tokens/day and 10M tokens/month) and per-account throughput limits; dedicated deployments scale throughput with the provisioned GPU accelerator and replica count rather than a fixed request quota. Fine-tuning and batch inference are queued jobs. Specific per-account values are visible in the Predibase console and are not reconciled in this artifact.

Predibase Rate Limits is the machine-readable rate-limit profile for Predibase on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 5 rate-limit definitions, measuring tokens, requests, and jobs.

The profile also defines 2 policies (tiered limits and a backoff/retry strategy) and documents the response code returned for throttled requests (429).

Tagged areas include AI, LLM, Fine-Tuning, Inference, and LoRA.

Limits: 5
Throttle: 429

Limits

Serverless Token Allowance (Free)
Scope: account
Metric: tokens
Limit: ~1M tokens/day, ~10M tokens/month (free tier)
Shared endpoint serverless inference allowance before paid usage.

Serverless Throughput
Scope: account
Metric: requests
Limit: see provider console
Per-account concurrency / throughput on shared endpoints.

Dedicated Deployment Throughput
Scope: deployment
Metric: requests
Limit: scales with GPU accelerator and replica count
Throughput is governed by provisioned hardware, not a fixed quota.

Fine-Tuning Jobs
Scope: account
Metric: jobs
Limit: queued; concurrency varies by tier
Supervised and GRPO jobs queue and run on managed training infrastructure.

Batch Inference Jobs
Scope: account
Metric: jobs
Limit: queued; separate from realtime serving
Async batch jobs deploy the base model and load adapters automatically.
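To show how a client might stay inside the free-tier token allowance listed above, here is a minimal Python sketch of a local usage tally. The class name `TokenBudget` and its methods are hypothetical and not part of any Predibase SDK. The defaults reuse the approximate figures from this profile, and the assumption that the windows reset on calendar-day and calendar-month boundaries is ours; the provider's actual accounting may differ.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Tuple


@dataclass
class TokenBudget:
    """Client-side tally of tokens spent against a daily and a monthly allowance."""
    daily_limit: int = 1_000_000      # approximate free-tier figure from this profile
    monthly_limit: int = 10_000_000   # approximate free-tier figure from this profile
    _day: Optional[date] = field(default=None, init=False)
    _day_used: int = field(default=0, init=False)
    _month: Optional[Tuple[int, int]] = field(default=None, init=False)
    _month_used: int = field(default=0, init=False)

    def _roll(self, today: date) -> None:
        # Assumption: counters reset when the calendar day or month changes.
        if self._day != today:
            self._day, self._day_used = today, 0
        if self._month != (today.year, today.month):
            self._month, self._month_used = (today.year, today.month), 0

    def remaining(self, today: date) -> int:
        """Tokens still available under the tighter of the two windows."""
        self._roll(today)
        return max(0, min(self.daily_limit - self._day_used,
                          self.monthly_limit - self._month_used))

    def try_spend(self, tokens: int, today: date) -> bool:
        """Record usage if it fits in both windows; return False otherwise."""
        if tokens > self.remaining(today):
            return False
        self._day_used += tokens
        self._month_used += tokens
        return True
```

A caller would check `try_spend(estimated_tokens, date.today())` before sending a request and defer or reroute the request when it returns `False`. The tally is only an estimate; the Predibase console remains the source of truth for actual usage.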

Policies

Tiered Limits

Allowances and throughput increase from the Free tier to Developer (dedicated deployments) to Enterprise agreements.

Backoff Strategy

Clients should implement exponential backoff with jitter and honor Retry-After on 429 responses.
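The backoff policy above can be sketched in Python as follows. This is an illustration under stated assumptions, not Predibase client code: `send` is a hypothetical caller-supplied function that performs one request and returns `(status, headers, body)`, the `base` and `cap` values are arbitrary, and only the numeric (delay-in-seconds) form of `Retry-After` is handled.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the retry that follows 0-based `attempt`."""
    # Honor a numeric Retry-After value when the server sends one.
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall through
    # "Full jitter": pick uniformly from [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Return on success, on any non-throttle status, or on the last attempt.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        # Assumption: the headers mapping uses the exact key "Retry-After".
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays. The jitter spreads retries from many clients over time so they do not hit the endpoint in lockstep after a throttling event.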
