
Inferless Plans Pricing

Inferless uses usage-based, per-second GPU compute billing. You pay only for the inference seconds consumed while a request is being served; when minimum replicas are set to zero and there is no active traffic, no charge accrues. Rates vary by GPU machine type (T4, A10, A100) and by whether the instance is shared (fractional) or dedicated. New accounts receive free credit to start.
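
As a rough illustration of the billing model described above, the following Python sketch computes a charge from the seconds spent serving requests. The function name and the workload figures are hypothetical and are not part of any Inferless API; the rate is the T4 (Shared) per-second rate listed under Plans.

```python
from decimal import Decimal

def inference_cost(active_seconds: float, rate_per_second: str) -> Decimal:
    """Charge for the seconds spent serving requests; idle time is not billed."""
    if active_seconds < 0:
        raise ValueError("active_seconds must be non-negative")
    return Decimal(str(active_seconds)) * Decimal(rate_per_second)

# Hypothetical workload: 1,200 seconds of active inference in a day on a
# shared T4 ($0.000092/sec). With minimum replicas set to zero, the idle
# remainder of the day adds nothing to the bill.
print(inference_cost(1200, "0.000092"))
```

Decimal arithmetic is used here only to avoid floating-point noise in very small per-second amounts.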

Inferless Plans Pricing is the machine-readable pricing-plan profile for Inferless on the APIs.io network, conforming to the API Commons Plans specification.

It defines four plans across the free, usage, and enterprise tiers: Free Credit, Pay-as-you-go (Shared Instances), Pay-as-you-go (Dedicated Instances), and Enterprise.

Tagged areas include AI, ML Inference, Serverless GPU, Model Deployment, and Inference.


Plans

Free Credit (free)

New users start with free GPU credit, with no credit card required, so they can deploy and test models before paying for usage.

Free GPU Credit (hours · onboarding): 10 hours free credit (plus promotional $30 credit), no credit card required, USD

Pay-as-you-go (Shared Instances) (usage)

Per-second GPU billing on shared (fractional) instances. GPU resources are shared among several users, giving cost-effective but variable-performance serving suited to smaller or infrequent workloads.

T4 (Shared) (gpu_seconds · usage): $0.000092/sec ($0.33/hr) USD
A10 (Shared) (gpu_seconds · usage): $0.000170/sec ($0.61/hr) USD
A100 (Shared) (gpu_seconds · usage): $0.000745/sec ($2.68/hr) USD
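
The shared rates above can be turned into a quick cost estimate. The sketch below is illustrative only: the function name and the workload parameters are hypothetical, and it assumes that billed time equals request count times seconds per request, ignoring any cold-start or model-loading time that may also be billed.

```python
from decimal import Decimal

# Per-second rates (USD) copied from the shared-instance list above.
SHARED_RATES = {
    "T4": Decimal("0.000092"),
    "A10": Decimal("0.000170"),
    "A100": Decimal("0.000745"),
}

def shared_cost(gpu: str, requests: int, seconds_per_request: float) -> Decimal:
    """Estimated charge for `requests` calls that each keep the GPU busy for
    `seconds_per_request` seconds (hypothetical workload parameters)."""
    billed_seconds = Decimal(requests) * Decimal(str(seconds_per_request))
    # Round the final amount to whole cents.
    return (SHARED_RATES[gpu] * billed_seconds).quantize(Decimal("0.01"))

# Hypothetical example: 10,000 requests at 2 seconds each on a shared A10.
print(shared_cost("A10", 10_000, 2))
```

Because shared instances have variable performance, the seconds per request would likely vary in practice, so an estimate like this is best treated as a rough bound rather than a quote.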

Pay-as-you-go (Dedicated Instances) (usage)

Per-second GPU billing on dedicated instances, which reserve the full GPU for a single workload for consistent performance.

T4 (Dedicated) (gpu_seconds · usage): $0.000185/sec ($0.66/hr) USD
A10 (Dedicated) (gpu_seconds · usage): $0.000341/sec ($1.22/hr) USD
A100 (Dedicated) (gpu_seconds · usage): $0.001491/sec ($5.36/hr) USD
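
To compare the two pay-as-you-go plans, the following sketch scales each listed per-second rate to an hourly figure and computes the dedicated-to-shared ratio. The dictionary layout and helper name are assumptions made for illustration; only the rates come from the lists above.

```python
from decimal import Decimal

# Per-second rates (USD) copied from the two pay-as-you-go lists.
RATES = {
    "T4": {"shared": Decimal("0.000092"), "dedicated": Decimal("0.000185")},
    "A10": {"shared": Decimal("0.000170"), "dedicated": Decimal("0.000341")},
    "A100": {"shared": Decimal("0.000745"), "dedicated": Decimal("0.001491")},
}

def hourly(rate_per_second: Decimal) -> Decimal:
    """Per-second rate scaled to one hour (3,600 seconds)."""
    return rate_per_second * 3600

for gpu, r in RATES.items():
    ratio = r["dedicated"] / r["shared"]
    print(f"{gpu}: shared ${hourly(r['shared']):.4f}/hr, "
          f"dedicated ${hourly(r['dedicated']):.4f}/hr, ratio {ratio:.2f}x")
```

On these numbers a dedicated instance costs roughly twice as much as its shared counterpart for every GPU type. Hourly figures derived this way can differ by about a cent from the listed hourly prices (for example, $0.000185/sec works out to $0.666/hr against a listed $0.66/hr), which suggests the per-second and hourly figures were rounded independently.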

Enterprise (enterprise)

Volume plan for high-throughput production workloads with discounted per-second rates, higher GPU concurrency, extended log retention, custom credits, and negotiated terms. Contact Inferless sales.

Enterprise Agreement (contract · month): discounted price (minimum 100,000 inference requests/month; GPU concurrency 50; 365-day log retention) USD
