Inferless · Pricing Plans

Inferless Plans Pricing

Name: Inferless Plans Pricing
Creator: Inferless
Keywords: AI, ML Inference, Serverless GPU, Model Deployment, Inference, Plans

Inferless uses usage-based, per-second GPU compute billing. You pay only for the inference seconds consumed while a request is being served; when minimum replicas are set to zero and there is no active traffic, no charge accrues. Rates vary by GPU machine type (T4, A10, A100) and by whether the instance is shared (fractional) or dedicated. New accounts receive free credit to start.
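The billing model described above reduces to multiplying billed seconds by a per-second rate. The following is a minimal sketch of that arithmetic, not Inferless's actual metering logic: the function name `inference_cost` is hypothetical, and the example rate is the shared T4 figure listed later in this profile.

```python
from decimal import Decimal


def inference_cost(billed_seconds: float, rate_per_second: Decimal) -> Decimal:
    """Return the USD cost of the GPU seconds spent serving requests.

    Hypothetical helper: assumes cost is simply seconds times the rate.
    """
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    # Convert via str() so the float does not drag binary noise into Decimal.
    return Decimal(str(billed_seconds)) * rate_per_second


# Shared T4 rate in USD per second, as listed in this profile.
T4_SHARED = Decimal("0.000092")

# 1,000 requests at 1.5 s each are billed as 1,500 GPU seconds.
busy = inference_cost(1000 * 1.5, T4_SHARED)

# With minimum replicas at zero and no traffic, nothing is billed.
idle = inference_cost(0, T4_SHARED)

print(f"busy: ${busy}, idle: ${idle}")
```

`Decimal` is used here because per-second rates are very small numbers, and summing many of them as floats can accumulate rounding error.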

Inferless Plans Pricing is the machine-readable pricing-plan profile for Inferless on the APIs.io network, conforming to the API Commons Plans specification.

It defines four plans across free, usage, and enterprise tiers: Free Credit, Pay-as-you-go (Shared Instances), Pay-as-you-go (Dedicated Instances), and Enterprise.
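For readers who want to work with the four plans programmatically, they can be modeled as plain records. This is an illustrative sketch only: the `Plan` class and its field names are hypothetical and are not the API Commons Plans schema. Plan names, tiers, and feature labels are copied from this profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical record for one plan; not the API Commons Plans schema."""
    name: str
    tier: str
    features: tuple = ()


PLANS = (
    Plan("Free Credit", "free", ("Inference Endpoints", "Model Import")),
    Plan("Pay-as-you-go (Shared Instances)", "usage",
         ("Inference Endpoints", "Autoscaling to Zero")),
    Plan("Pay-as-you-go (Dedicated Instances)", "usage",
         ("Inference Endpoints", "Dedicated GPU")),
    Plan("Enterprise", "enterprise",
         ("Discounted Per-Second Rates", "Higher GPU Concurrency",
          "Extended Log Retention")),
)


def plans_by_tier(tier: str) -> list:
    """Return the names of all plans in the given tier, in listing order."""
    return [plan.name for plan in PLANS if plan.tier == tier]


print(plans_by_tier("usage"))
```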

Tagged areas include AI, ML Inference, Serverless GPU, Model Deployment, and Inference.


Plans

Free Credit (tier: free)

New users start with free GPU credit, with no credit card required, so they can deploy and test models before paying for usage.

Free GPU Credit (hours · onboarding): 10 hours of free credit plus a promotional $30 credit (USD), no credit card required

Inference Endpoints
Model Import
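To get a feel for how far the promotional $30 credit might stretch, it can be divided by a per-second rate. This is a rough sketch under a stated assumption: that the dollar credit is drawn down at the pay-as-you-go rates listed further down in this profile, which the source does not confirm. The function name `credit_runway_hours` is hypothetical.

```python
def credit_runway_hours(credit_usd: float, rate_per_second: float) -> float:
    """Return how many GPU hours a dollar credit covers at a per-second rate.

    Assumption: credit is consumed at the listed pay-as-you-go rate.
    """
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return credit_usd / rate_per_second / 3600


# Rates in USD per second, copied from the plan listings in this profile.
t4_shared_hours = credit_runway_hours(30, 0.000092)
a100_dedicated_hours = credit_runway_hours(30, 0.001491)

print(f"Shared T4:      about {t4_shared_hours:.1f} GPU hours")
print(f"Dedicated A100: about {a100_dedicated_hours:.1f} GPU hours")
```

Because billing is per second of served traffic, these figures are hours of active inference, not wall-clock hours.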

Pay-as-you-go (Shared Instances) (tier: usage)

Per-second GPU billing on shared (fractional) instances. GPU resources are shared among several users, giving cost-effective but variable performance suited to smaller or infrequent workloads.

T4 (Shared) (gpu_seconds · usage): $0.000092/sec ($0.33/hr), USD

A10 (Shared) (gpu_seconds · usage): $0.000170/sec ($0.61/hr), USD

A100 (Shared) (gpu_seconds · usage): $0.000745/sec ($2.68/hr), USD

Inference Endpoints
Autoscaling to Zero
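The hourly figures for shared instances follow from the per-second rates multiplied by 3,600. The sketch below reproduces that conversion. The dictionary and function names are hypothetical, and rounding to the nearest cent is an assumption that happens to match the listed shared prices.

```python
from decimal import Decimal, ROUND_HALF_UP

# Shared-instance rates in USD per second, copied from the listing above.
SHARED_RATES = {
    "T4": Decimal("0.000092"),
    "A10": Decimal("0.000170"),
    "A100": Decimal("0.000745"),
}


def hourly(rate_per_second: Decimal) -> Decimal:
    """Convert a per-second rate to an hourly rate, rounded to the cent."""
    return (rate_per_second * 3600).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


for gpu, rate in SHARED_RATES.items():
    print(f"{gpu} (Shared): ${rate}/sec is about ${hourly(rate)}/hr")
```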

Pay-as-you-go (Dedicated Instances) (tier: usage)

Per-second GPU billing on dedicated instances, which reserve the full GPU for a single workload to give consistent performance.

T4 (Dedicated) (gpu_seconds · usage): $0.000185/sec ($0.66/hr), USD

A10 (Dedicated) (gpu_seconds · usage): $0.000341/sec ($1.22/hr), USD

A100 (Dedicated) (gpu_seconds · usage): $0.001491/sec ($5.36/hr), USD

Inference Endpoints
Dedicated GPU
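Comparing the two pay-as-you-go plans, each dedicated rate works out to roughly twice the shared rate for the same GPU. The sketch below computes that ratio from the per-second figures listed in this profile. The names `SHARED`, `DEDICATED`, and `dedicated_premium` are hypothetical.

```python
# Rates in USD per second, copied from the two pay-as-you-go listings.
SHARED = {"T4": 0.000092, "A10": 0.000170, "A100": 0.000745}
DEDICATED = {"T4": 0.000185, "A10": 0.000341, "A100": 0.001491}


def dedicated_premium(gpu: str) -> float:
    """Return the dedicated rate as a multiple of the shared rate."""
    return DEDICATED[gpu] / SHARED[gpu]


for gpu in SHARED:
    print(f"{gpu}: dedicated is about {dedicated_premium(gpu):.2f}x shared")
```

Note that multiplying the dedicated per-second rates by 3,600 gives values slightly above the listed hourly prices (for example, 0.000185 × 3,600 = 0.666 against a listed $0.66/hr), so the hourly figures appear to be truncated or rounded independently. For precise estimates, the per-second rate is the safer input.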

Enterprise (tier: enterprise)

Volume plan for high-throughput production workloads with discounted per-second rates, higher GPU concurrency, extended log retention, custom credits, and negotiated terms. Contact Inferless sales.

Enterprise Agreement (contract · month): discounted price, USD; minimum 100,000 inference requests/month, GPU concurrency 50, 365-day log retention

Discounted Per-Second Rates
Higher GPU Concurrency
Extended Log Retention
