Fireworks AI Plans Pricing
Fireworks AI offers serverless pay-per-token inference, on-demand dedicated GPU deployments billed per GPU-second, batch inference at 50% of serverless rates, cached input tokens at 50% of the standard input rate, and managed fine-tuning. New users get $1 in free credits, with postpaid billing available as usage grows.
Fireworks AI Plans Pricing is the machine-readable pricing-plan profile for Fireworks AI on the APIs.io network, conforming to the API Commons Plans specification.
It defines five plans covering usage-based and enterprise tiers: Serverless (Pay-as-you-go), Batch Inference, Fine-Tuning, On-Demand Deployments (Dedicated GPUs), and Enterprise.
Tagged areas include AI, LLM, Inference, Multimodal, and Fine-tuning.
Plans
Serverless (Pay-as-you-go): On-demand per-token inference with high rate limits, postpaid billing, and zero cold starts.
- Chat Completions
- Vision
- Embeddings
- Rerank
- Images
- Audio
Batch Inference: Asynchronous batch jobs priced at 50% of serverless input and output rates.
- Batch Chat Completions
- Batch Embeddings
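The two 50% multipliers (cached input tokens and batch jobs) are easy to misapply, so here is a minimal Python sketch of the arithmetic. The `TokenRates` class, both function names, and the per-1M-token rates are hypothetical placeholders, not Fireworks prices or an official API. The sketch also assumes rates are quoted per 1M tokens, and it does not combine the cached-input discount with the batch discount, since the profile does not say whether they stack.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
CACHED_INPUT_MULTIPLIER = 0.5  # cached input tokens: 50% of the standard input rate
BATCH_MULTIPLIER = 0.5         # batch jobs: 50% of serverless input and output rates


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical USD rates per 1M tokens for one model (placeholders)."""
    input_per_million: float
    output_per_million: float


def serverless_cost(rates: TokenRates, input_tokens: int, output_tokens: int,
                    cached_input_tokens: int = 0) -> float:
    """Estimate serverless cost in USD, discounting cached input tokens."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    # Cached tokens count at half weight toward billable input.
    billable_input = uncached + cached_input_tokens * CACHED_INPUT_MULTIPLIER
    total = (billable_input * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / TOKENS_PER_MILLION


def batch_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Estimate batch cost in USD: half the uncached serverless cost."""
    return BATCH_MULTIPLIER * serverless_cost(rates, input_tokens, output_tokens)
```

With placeholder rates of $1.00 (input) and $2.00 (output) per 1M tokens, 1M input tokens plus 500K output tokens come to $2.00 on serverless and $1.00 as a batch job.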
Fine-Tuning: Supervised fine-tuning (LoRA and full) priced per 1M training tokens by model size and method, plus reinforcement fine-tuning billed per GPU-hour at on-demand deployment rates.
- SFT (LoRA)
- SFT (Full)
- Reinforcement Fine-Tuning
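Because supervised fine-tuning is priced per 1M training tokens by model size and method, an estimate reduces to a rate lookup and a multiplication. The sketch below is illustrative only: the rate card values, the tier names, and the `sft_cost` function are hypothetical, and it assumes training tokens equal dataset tokens times epochs, which the profile does not state.

```python
# Hypothetical rate card: (model size tier, method) -> USD per 1M training tokens.
# These numbers are placeholders, not Fireworks prices.
EXAMPLE_SFT_RATES = {
    ("small", "lora"): 0.50,
    ("small", "full"): 1.00,
    ("large", "lora"): 3.00,
    ("large", "full"): 6.00,
}


def sft_cost(dataset_tokens: int, epochs: int, size_tier: str, method: str,
             rates: dict = EXAMPLE_SFT_RATES) -> float:
    """Estimate supervised fine-tuning cost in USD from a rate card."""
    try:
        rate_per_million = rates[(size_tier, method)]
    except KeyError:
        raise ValueError(f"no rate for tier={size_tier!r}, method={method!r}") from None
    # Assumption: billed training tokens = dataset tokens x epochs.
    training_tokens = dataset_tokens * epochs
    return training_tokens / 1_000_000 * rate_per_million
```

Reinforcement fine-tuning is billed differently (per GPU-hour), so it would be estimated from GPU time rather than from this token-based rate card.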
On-Demand Deployments (Dedicated GPUs): Pay-per-GPU-second dedicated GPU deployments with autoscaling and no cold-start charge.
- H100 / H200
- B200
- B300
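With per-GPU-second billing and autoscaling, cost follows the number of GPUs active over time. A minimal sketch, assuming the rate is quoted per GPU-hour and prorated to the second; the `deployment_cost` function and the rate used with it are hypothetical, not Fireworks prices:

```python
from typing import Iterable, Tuple

SECONDS_PER_HOUR = 3600


def deployment_cost(hourly_rate_usd: float,
                    intervals: Iterable[Tuple[int, float]]) -> float:
    """Estimate dedicated-deployment cost in USD.

    `intervals` is a sequence of (gpu_count, seconds) pairs, one per period
    during which the autoscaler held a constant number of GPUs.
    """
    gpu_seconds = 0.0
    for gpu_count, seconds in intervals:
        if gpu_count < 0 or seconds < 0:
            raise ValueError("gpu_count and seconds must be non-negative")
        gpu_seconds += gpu_count * seconds
    # Assumption: a per-GPU-hour rate prorated per second.
    return gpu_seconds * hourly_rate_usd / SECONDS_PER_HOUR
```

For example, two GPUs for 30 minutes followed by one GPU for 10 minutes is 4,200 GPU-seconds, which a placeholder rate of $3.60 per GPU-hour would price at $4.20.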
Enterprise: Reserved capacity, dedicated regions, SOC 2 / HIPAA compliance support, and negotiated terms. Contact Fireworks sales.
- Custom Volume Pricing
- Reserved Capacity