Ollama Plans Pricing
Ollama is free and open-source for local inference (no auth, no charge, no rate limit on http://localhost:11434). Ollama Cloud is a hosted-inference add-on with three tiers (Free, Pro, Max), measured by GPU utilization rather than tokens. Cloud usage resets on a 5-hour session and 7-day weekly cycle. All tiers permit unlimited use of open models on the user's own hardware; tier differences apply only to ollama.com cloud-model concurrency and weekly cloud usage.
Ollama Plans Pricing is the machine-readable pricing-plan profile for Ollama on the APIs.io network, conforming to the API Commons Plans specification.
It defines 4 plans, covering freemium and subscription tiers, with named plans including Local, Cloud — Free, Cloud — Pro, Cloud — Max.
Tagged areas include Artificial Intelligence, Large Language Models, and Models.
Plans
Run Ollama on your own hardware. Open source. No account or API key required for the local server at http://localhost:11434.
- Unlimited local inference
- Open-source models
- No authentication required for localhost
- Ollama API
- Ollama OpenAI Compatibility API
- Ollama Anthropic Compatibility API
Free hosted inference at ollama.com. Requires an ollama.com account and API key. Cloud usage is metered by GPU time, not tokens.
- Run cloud models from CLI/API
- Unlimited public models
- Run models on your own hardware
- Ollama Cloud API
$20/month or $200/year. Run 3 cloud models at a time; 50x more cloud usage than Free.
- 3 concurrent cloud models
- 50x cloud usage of Free
- Ollama Cloud API
$100/month. Run 10 cloud models at a time; 5x more usage than Pro.
- 10 concurrent cloud models
- 5x cloud usage of Pro
- Ollama Cloud API