Ollama Rate Limits
Local Ollama (http://localhost:11434) has no rate limits or authentication. Ollama Cloud enforces tier-based concurrency limits (Free, Pro=3, Max=10 concurrent cloud models) and GPU-time quotas rather than per-second request ceilings. Cloud quotas reset on a 5-hour session cycle and a 7-day weekly cycle. Specific TPS / RPM ceilings are not publicly documented.
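Because the cloud limits are expressed as concurrency rather than requests per second, a client may want to cap its own in-flight calls. The following is a minimal sketch of that idea, not an Ollama API: `ConcurrencyGate`, `TIER_CONCURRENCY`, and `fake_request` are hypothetical names, the caps are copied from the tier summary above, and the Free tier is omitted because its value is not stated there. It also assumes one in-flight request per cloud model, which may not match how the service counts concurrency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed caps, copied from the tier summary; Free is omitted (value not stated).
TIER_CONCURRENCY = {"pro": 3, "max": 10}


class ConcurrencyGate:
    """Client-side cap on in-flight calls (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0  # highest number of simultaneous calls observed

    def run(self, fn, *args, **kwargs):
        with self._slots:  # blocks while `limit` calls are already running
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self._in_flight -= 1


def fake_request(i: int) -> int:
    """Stand-in for a real cloud call; sleeps briefly and echoes a value."""
    time.sleep(0.01)
    return i * 2


if __name__ == "__main__":
    gate = ConcurrencyGate(TIER_CONCURRENCY["pro"])
    # Eight worker threads compete, but the gate admits at most three at once.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda i: gate.run(fake_request, i), range(12)))
    print(len(results), "calls finished; peak concurrency:", gate.peak)
```

A semaphore is used here instead of a token bucket because the documented limit is on simultaneous work, not on request rate.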
Ollama Rate Limits is the machine-readable rate-limit profile for Ollama on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 5 rate-limit definitions, measuring requests_per_second, concurrent_requests, and gpu_time.
The profile also defines 4 backoff/retry policies and documents response codes for unauthorized, throttled, and serviceUnavailable.
Tagged areas include Artificial Intelligence, Large Language Models, Models, and Rate Limiting.
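The backoff/retry policies and response categories mentioned above can be illustrated with a small client-side sketch. It is not a reproduction of any of the profile's policies, whose parameters are not given here. It assumes the conventional HTTP status codes 401, 429, and 503 for unauthorized, throttled, and serviceUnavailable, and uses exponential backoff with full jitter as one common choice. The function name `call_with_backoff` and all of its parameters are hypothetical.

```python
import random
import time
from typing import Callable, Tuple

# Assumed HTTP status codes for the three documented response categories.
UNAUTHORIZED = 401
THROTTLED = 429
SERVICE_UNAVAILABLE = 503
RETRYABLE = {THROTTLED, SERVICE_UNAVAILABLE}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `send` on throttled/unavailable responses with jittered backoff.

    `send` is any callable that performs one request and returns
    (status_code, body). The last response is returned if retries run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == UNAUTHORIZED:
            # Retrying cannot fix a credential problem, so fail immediately.
            raise PermissionError("unauthorized: check credentials")
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Full jitter: wait a random time up to the capped exponential delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A caller that receives a `Retry-After` header could prefer that value over the computed delay, though whether the service sends one is not stated in this profile.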