Inferless Plans Pricing
Inferless uses usage-based, per-second GPU compute billing. You pay only for the inference seconds consumed while a request is being served; when minimum replicas are set to zero and there is no active traffic, no charge accrues. Rates vary by GPU machine type (T4, A10, A100) and by whether the instance is shared (fractional) or dedicated. New accounts receive free credit to start.
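As a rough illustration of per-second billing, the sketch below estimates a bill from the seconds spent serving requests. The rates in `HYPOTHETICAL_RATES` are placeholders, not Inferless's published prices, and the function name `estimate_cost` is invented for this example. It also assumes that replicas kept warm by a non-zero minimum replica count are billed for idle time at the same rate; the text above states only that nothing is charged when minimum replicas are zero and there is no traffic.

```python
from decimal import Decimal

# Placeholder USD-per-second rates keyed by (GPU type, instance kind).
# These numbers are hypothetical; check the provider's price list for real values.
HYPOTHETICAL_RATES = {
    ("T4", "shared"): Decimal("0.0001"),
    ("T4", "dedicated"): Decimal("0.0002"),
    ("A10", "shared"): Decimal("0.0002"),
    ("A10", "dedicated"): Decimal("0.0004"),
    ("A100", "shared"): Decimal("0.0005"),
    ("A100", "dedicated"): Decimal("0.0010"),
}


def estimate_cost(gpu, instance, busy_seconds, idle_seconds=0, min_replicas=0):
    """Estimate a usage charge in USD under per-second billing.

    busy_seconds: seconds spent actively serving requests.
    idle_seconds: seconds with no traffic.
    min_replicas: replicas kept warm while idle (assumed billable when > 0).
    """
    rate = HYPOTHETICAL_RATES[(gpu, instance)]
    # With min_replicas == 0 the idle term vanishes, matching scale-to-zero.
    billed_seconds = (
        Decimal(str(busy_seconds)) + Decimal(str(idle_seconds)) * min_replicas
    )
    return rate * billed_seconds


# One busy hour in a day, scaled to zero for the remaining 23 hours.
scaled_to_zero = estimate_cost("A10", "shared", 3600, idle_seconds=82800)
# Same traffic, but one replica kept warm all day.
always_warm = estimate_cost("A10", "shared", 3600, idle_seconds=82800, min_replicas=1)
print(scaled_to_zero, always_warm)
```

Under these placeholder rates, the gap between the two results shows why scale-to-zero matters for infrequent workloads: the idle hours dominate the bill once a replica is kept warm.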
Inferless Plans Pricing is the machine-readable pricing-plan profile for Inferless on the APIs.io network, conforming to the API Commons Plans specification.
It defines four plans covering free, usage, and enterprise tiers: Free Credit, Pay-as-you-go (Shared Instances), Pay-as-you-go (Dedicated Instances), and Enterprise.
Tagged areas include AI, ML Inference, Serverless GPU, Model Deployment, and Inference.
Plans
Free Credit
New users start with free GPU credit, with no credit card required, so they can deploy and test models before paying for usage.
- Inference Endpoints
- Model Import
Pay-as-you-go (Shared Instances)
Per-second GPU billing on shared (fractional) instances. GPU resources are shared among several users, giving cost-effective, variable-performance serving suited to smaller or infrequent workloads.
- Inference Endpoints
- Autoscaling to Zero
Pay-as-you-go (Dedicated Instances)
Per-second GPU billing on dedicated instances, which reserve the full GPU for a single workload to deliver consistent performance.
- Inference Endpoints
- Dedicated GPU
Enterprise
Volume plan for high-throughput production workloads with discounted per-second rates, higher GPU concurrency, extended log retention, custom credits, and negotiated terms. Contact Inferless sales.
- Discounted Per-Second Rates
- Higher GPU Concurrency
- Extended Log Retention
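The four plans and the features listed above can be held in a small data structure for programmatic comparison. This is only a sketch: the `Plan` class, its field names, and the tier assigned to each plan are assumptions made for illustration, and it does not reproduce the schema of the API Commons Plans specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    tier: str  # "free", "usage", or "enterprise" (tier assignment is assumed)
    features: tuple[str, ...]


# Plan names and features are copied from the listing; the structure is hypothetical.
PLANS = (
    Plan("Free Credit", "free", ("Inference Endpoints", "Model Import")),
    Plan(
        "Pay-as-you-go (Shared Instances)",
        "usage",
        ("Inference Endpoints", "Autoscaling to Zero"),
    ),
    Plan(
        "Pay-as-you-go (Dedicated Instances)",
        "usage",
        ("Inference Endpoints", "Dedicated GPU"),
    ),
    Plan(
        "Enterprise",
        "enterprise",
        (
            "Discounted Per-Second Rates",
            "Higher GPU Concurrency",
            "Extended Log Retention",
        ),
    ),
)


def plans_with_feature(feature):
    """Return the names of plans whose listed features include `feature`."""
    return [plan.name for plan in PLANS if feature in plan.features]


def plans_in_tier(tier):
    """Return the names of plans assigned to `tier`."""
    return [plan.name for plan in PLANS if plan.tier == tier]


print(plans_with_feature("Autoscaling to Zero"))
print(plans_in_tier("usage"))
```

A lookup only reflects what the listing states explicitly; a plan may offer capabilities that are not named in its feature list.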