Fireworks AI · Pricing Plans

Fireworks Ai Plans Pricing

Name: Fireworks Ai Plans Pricing
Creator: Fireworks AI
Keywords: AI, LLM, Inference, Multimodal, Fine-tuning, GPU, Plans

Fireworks AI offers serverless pay-per-token inference, on-demand dedicated GPU deployments billed per GPU-second, batch inference at 50% of serverless rates, cached input tokens at 50% of the standard input rate, and managed fine-tuning. New users receive $1 in free credits, and postpaid billing is available as usage grows.

Fireworks Ai Plans Pricing is the machine-readable pricing-plan profile for Fireworks AI on the APIs.io network, conforming to the API Commons Plans specification.

It defines five plans across usage-based and enterprise tiers: Serverless (Pay-as-you-go), Batch Inference, Fine-Tuning, On-Demand Deployments (Dedicated GPUs), and Enterprise.

Tagged areas include AI, LLM, Inference, Multimodal, and Fine-tuning.


Plans

Serverless (Pay-as-you-go) · usage

On-demand per-token inference with high rate limits, postpaid billing, and zero cold starts.

Chat / Vision Tokens (tokens · month): priced per 1M tokens in USD; the rate varies by model (see the Fireworks pricing page)

Cached Input Tokens (tokens · month): 50% of the model's standard input rate (USD)

Embeddings (up to 150M params) (tokens · month): $0.008 per 1M tokens (USD)

Embeddings (150M-350M params) (tokens · month): $0.016 per 1M tokens (USD)

Embeddings (Qwen3 8B) (tokens · month): $0.10 per 1M tokens (USD)

Chat Completions
Vision
Embeddings
Rerank
Images
Audio
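A rough Serverless cost estimate can be sketched from the meters above. This is a minimal sketch, not Fireworks' billing logic: per-model chat rates are not listed here, so the caller supplies them, and the code assumes `cached_input_tokens` is the portion of a request's input tokens served from the cache. The function names and the example rate of $0.90 per 1M tokens are hypothetical; the embedding rates are the ones listed above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Serverless embedding rates listed above, in USD per 1M tokens.
EMBEDDING_RATES_PER_1M = {
    "up_to_150m": Decimal("0.008"),    # models up to 150M params
    "150m_to_350m": Decimal("0.016"),  # models from 150M to 350M params
    "qwen3_8b": Decimal("0.10"),       # Qwen3 8B
}

# Cached input tokens are billed at 50% of the standard input rate.
CACHED_INPUT_MULTIPLIER = Decimal("0.5")


def chat_cost(input_tokens, cached_input_tokens, output_tokens,
              input_rate_per_1m, output_rate_per_1m):
    """Estimate serverless chat/vision cost in USD (hypothetical helper).

    Rates are USD per 1M tokens and vary by model; look them up on the
    pricing page. Assumes cached_input_tokens is a subset of input_tokens.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be within [0, input_tokens]")
    in_rate = Decimal(str(input_rate_per_1m))
    out_rate = Decimal(str(output_rate_per_1m))
    uncached = input_tokens - cached_input_tokens
    total = (Decimal(uncached) * in_rate
             + Decimal(cached_input_tokens) * in_rate * CACHED_INPUT_MULTIPLIER
             + Decimal(output_tokens) * out_rate)
    return total / MILLION


def embedding_cost(tokens, tier):
    """Estimate serverless embedding cost in USD for one of the listed tiers."""
    return Decimal(tokens) * EMBEDDING_RATES_PER_1M[tier] / MILLION


if __name__ == "__main__":
    # Hypothetical model priced at $0.90 per 1M input and output tokens.
    print(chat_cost(2_000_000, 1_000_000, 500_000, "0.90", "0.90"))
    print(embedding_cost(10_000_000, "up_to_150m"))
```

With these inputs, 1M uncached input tokens cost $0.90, 1M cached input tokens cost $0.45, and 500K output tokens cost $0.45, for $1.80 in total; 10M tokens on the smallest embedding tier cost $0.08. `Decimal` is used instead of `float` so that fractional-cent rates such as $0.008 add up exactly.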

Batch Inference · usage

Asynchronous batch jobs priced at 50% of serverless input and output rates.

Batch Tokens (tokens · usage): 50% of the serverless input and output rates (USD)

Batch Chat Completions
Batch Embeddings
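The 50% batch discount can be illustrated by pricing the same workload both ways. This is a minimal sketch assuming the batch price is simply half of the model's serverless input and output rates, as stated above; it does not assume the batch discount stacks with the cached-input discount. The helper names and the $0.90 per 1M example rate are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
# Batch jobs are priced at 50% of the serverless input and output rates.
BATCH_MULTIPLIER = Decimal("0.5")


def serverless_cost(input_tokens, output_tokens,
                    serverless_input_per_1m, serverless_output_per_1m):
    """Cost in USD of a workload at full serverless rates (per 1M tokens)."""
    return (Decimal(input_tokens) * Decimal(str(serverless_input_per_1m))
            + Decimal(output_tokens) * Decimal(str(serverless_output_per_1m))) / MILLION


def batch_job_cost(input_tokens, output_tokens,
                   serverless_input_per_1m, serverless_output_per_1m):
    """Cost in USD of the same workload submitted as a batch job (hypothetical helper)."""
    in_rate = Decimal(str(serverless_input_per_1m)) * BATCH_MULTIPLIER
    out_rate = Decimal(str(serverless_output_per_1m)) * BATCH_MULTIPLIER
    return (Decimal(input_tokens) * in_rate
            + Decimal(output_tokens) * out_rate) / MILLION


if __name__ == "__main__":
    full = serverless_cost(4_000_000, 1_000_000, "0.90", "0.90")
    batch = batch_job_cost(4_000_000, 1_000_000, "0.90", "0.90")
    print(f"serverless ${full:.2f}, batch ${batch:.2f}")
```

For 4M input and 1M output tokens at a hypothetical $0.90 per 1M, this prints `serverless $4.50, batch $2.25`.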

Fine-Tuning · usage

Supervised fine-tuning (LoRA and full) is priced per 1M training tokens, varying by model size and method; reinforcement fine-tuning is billed per GPU-hour at on-demand deployment rates.

Supervised Fine-Tuning Tokens (tokens · usage): $0.50-$40.00 per 1M training tokens, depending on model size and method (USD)

Reinforcement Fine-Tuning (hours · usage): billed per GPU-hour at on-demand rates, $7.00-$12.00 per GPU-hour (USD)

Fine-Tuned Model Serving (tokens · month): same per-token price as the base model (USD)

SFT (LoRA)
SFT (Full)
Reinforcement Fine-Tuning
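Both fine-tuning meters reduce to simple products, sketched below. Two assumptions are labeled in the code: that billed SFT training tokens equal dataset tokens times epochs, and that RFT GPU-hours equal the number of GPUs times wall-clock hours with every GPU busy for the whole job. The function names and example inputs are hypothetical; the actual per-1M SFT rate for a given model and method must come from the pricing page.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def sft_cost(dataset_tokens, epochs, rate_per_1m):
    """Estimate supervised fine-tuning cost in USD (hypothetical helper).

    Assumption: billed training tokens = dataset tokens x epochs. The rate is
    USD per 1M training tokens ($0.50-$40.00 depending on model and method).
    """
    training_tokens = Decimal(dataset_tokens) * Decimal(epochs)
    return training_tokens * Decimal(str(rate_per_1m)) / MILLION


def rft_cost(num_gpus, wall_clock_hours, hourly_rate):
    """Estimate reinforcement fine-tuning cost in USD (hypothetical helper).

    RFT is billed per GPU-hour at on-demand rates; assumes every GPU runs
    for the full wall-clock duration, so GPU-hours = num_gpus x hours.
    """
    gpu_hours = Decimal(num_gpus) * Decimal(str(wall_clock_hours))
    return gpu_hours * Decimal(str(hourly_rate))


if __name__ == "__main__":
    # Hypothetical: 20M-token dataset, 3 epochs, $0.50 per 1M training tokens.
    print(sft_cost(20_000_000, 3, "0.50"))
    # Hypothetical: 8 H100s for 2.5 hours at the $7.00 on-demand rate.
    print(rft_cost(8, 2.5, "7.00"))
```

Under these assumptions the SFT run bills 60M training tokens for $30.00, and the RFT run bills 20 GPU-hours for $140.00.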

On-Demand Deployments (Dedicated GPUs) · usage

Dedicated GPU deployments billed per GPU-second (rates are quoted per GPU-hour), with autoscaling and no cold-start charge.

H100 / H200 (hours · usage): $7.00 per GPU-hour (USD)

B200 (hours · usage): $10.00 per GPU-hour (USD)

B300 (hours · usage): $12.00 per GPU-hour (USD)

H100 / H200
B200
B300
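Because deployments are billed per GPU-second while rates are quoted per GPU-hour, a cost estimate prorates the hourly rate. This is a minimal sketch assuming each GPU-second costs the hourly rate divided by 3600 and that the total is rounded to the nearest cent; both are illustrative assumptions rather than documented billing rules, and `deployment_cost` is a hypothetical name. The hourly rates are the ones listed above.

```python
from decimal import Decimal, ROUND_HALF_UP

# On-demand rates listed above, in USD per GPU-hour.
HOURLY_RATE_USD = {
    "H100": Decimal("7.00"),
    "H200": Decimal("7.00"),
    "B200": Decimal("10.00"),
    "B300": Decimal("12.00"),
}
SECONDS_PER_HOUR = Decimal(3600)


def deployment_cost(gpu_type, num_gpus, seconds):
    """Estimate a dedicated deployment's cost in USD (hypothetical helper).

    Assumes per-second billing prorates the hourly rate (rate / 3600 per
    GPU-second) and rounds the total to the nearest cent.
    """
    if num_gpus < 0 or seconds < 0:
        raise ValueError("num_gpus and seconds must be non-negative")
    gpu_seconds = Decimal(num_gpus) * Decimal(str(seconds))
    cost = gpu_seconds * HOURLY_RATE_USD[gpu_type] / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(deployment_cost("H100", 2, 90 * 60))  # two H100s for 90 minutes
    print(deployment_cost("B200", 1, 600))      # one B200 for 10 minutes
```

Two H100s for 90 minutes come to 3 GPU-hours, or $21.00; one B200 for 10 minutes comes to about $1.67. Per-second proration means a short-lived autoscaled replica costs only for the seconds it actually runs.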

Enterprise · enterprise

Reserved capacity, dedicated regions, SOC 2 / HIPAA compliance support, and negotiated terms. Contact Fireworks sales.

Enterprise Agreement (contract · year): contact sales for pricing (USD)

Custom Volume Pricing
Reserved Capacity
