Inferless · FinOps Profile

Inferless Finops

FinOps view of Inferless spend. Inferless bills per second of GPU compute consumed while serving inference, metered by GPU machine type (T4, A10, A100) and instance class (shared/fractional vs dedicated). Cost accrues only while a request runs; with min_replica set to 0, idle deployments incur no charge. Optimization centers on right-sizing the GPU, tuning concurrency and replica bounds, and scaling to zero.
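Right-sizing under per-second billing comes down to simple arithmetic: cost is busy seconds times the per-second rate for the chosen machine type and instance class. A faster GPU has a higher rate but may finish each request sooner. The minimal Python sketch below compares total cost for one workload across options. The rate table, the per-request latencies, and the function names are all hypothetical illustrations, not Inferless's published prices or API.

```python
# HYPOTHETICAL per-second USD rates for illustration only; these are not
# Inferless's published prices. Keys are (machine_type, instance_class).
HYPOTHETICAL_RATES: dict[tuple[str, str], float] = {
    ("T4", "shared"): 0.00010,
    ("T4", "dedicated"): 0.00020,
    ("A10", "shared"): 0.00030,
    ("A10", "dedicated"): 0.00060,
    ("A100", "shared"): 0.00080,
    ("A100", "dedicated"): 0.00160,
}


def estimate_cost(busy_seconds: float, machine: str, instance_class: str) -> float:
    """Per-second billing: only seconds spent serving requests are charged,
    so idle time on a deployment scaled to zero adds nothing."""
    return busy_seconds * HYPOTHETICAL_RATES[(machine, instance_class)]


def cheapest_option(
    requests: int, seconds_per_request: dict[tuple[str, str], float]
) -> tuple[tuple[str, str], float]:
    """Right-sizing: compare total cost of the same workload on each option,
    since a pricier GPU can still win if it finishes requests faster."""
    costs = {
        option: estimate_cost(requests * latency, *option)
        for option, latency in seconds_per_request.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


if __name__ == "__main__":
    # Hypothetical measured latency (seconds) per request on each option.
    latency = {("T4", "shared"): 2.0, ("A10", "shared"): 0.5, ("A100", "dedicated"): 0.2}
    option, cost = cheapest_option(10_000, latency)
    print(option, round(cost, 2))
```

With these made-up numbers the shared A10 is cheapest even though its per-second rate is three times the T4's, because each request finishes four times faster. Real decisions should use measured latencies and current Inferless rates.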

Inferless Finops is the FinOps profile for Inferless on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 4 billable meters, billed in USD on a monthly cycle under the usage-based pricing category.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, ML Inference, Serverless GPU, Model Deployment, and Inference.

Category: AI and Machine Learning · Pricing: Usage-Based · Billing: Monthly · FOCUS v1.3
Tags: AI, ML Inference, Serverless GPU, Model Deployment, Inference, FinOps, Cost Management, FOCUS

Framework Alignment

Framework: FinOps Foundation Framework
Data Spec: FOCUS v1.3

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Inferless
PricingCategory: Usage-Based
ProviderName: Inferless
PublisherName: Inferless
ServiceCategory: AI and Machine Learning
ServiceName: Inferless Serverless GPU Inference
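Because the eight mapped FOCUS columns hold fixed values for every Inferless charge, a cost-allocation pipeline only needs to add the per-line usage fields. The Python sketch below shows one way to do that. `ChargePeriodStart`, `ChargePeriodEnd`, `ChargeDescription`, `ConsumedQuantity`, `ConsumedUnit`, and `BilledCost` are FOCUS column names, but this profile does not map them, so how they are populated here is an assumption; `to_focus_row` is a hypothetical helper.

```python
from datetime import datetime, timezone

# The eight fixed column values from this profile's FOCUS mapping.
FOCUS_CONSTANTS: dict[str, str] = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Inferless",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Inferless",
    "PublisherName": "Inferless",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Inferless Serverless GPU Inference",
}


def to_focus_row(
    meter: str,
    unit: str,
    quantity: float,
    billed_cost_usd: float,
    period_start: datetime,
    period_end: datetime,
) -> dict[str, object]:
    """Hypothetical helper: merge the fixed columns with per-line usage
    fields. Filling these extra FOCUS columns this way is an assumption."""
    row: dict[str, object] = dict(FOCUS_CONSTANTS)
    row.update(
        {
            "ChargePeriodStart": period_start.isoformat(),
            "ChargePeriodEnd": period_end.isoformat(),
            "ChargeDescription": meter,  # e.g. "gpu_seconds"
            "ConsumedQuantity": quantity,
            "ConsumedUnit": unit,
            "BilledCost": billed_cost_usd,
        }
    )
    return row


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    end = datetime(2024, 2, 1, tzinfo=timezone.utc)
    print(to_focus_row("gpu_seconds", "seconds", 3600, 0.36, start, end))
```

Keeping the constants in one dictionary means a change to the profile mapping touches a single place rather than every emitted row.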

Meters

gpu_seconds (unit: seconds): Seconds of GPU compute consumed serving inference, billed per machine type and instance class.
shared_instance_seconds (unit: seconds): GPU seconds on shared (fractional) instances at the lower per-second rate.
dedicated_instance_seconds (unit: seconds): GPU seconds on dedicated instances reserving the full GPU.
free_credit (unit: hours): Onboarding free credit, in hours, applied against early GPU usage.
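The meters above mix units (seconds for usage, hours for the credit), so reconciling them takes a conversion step. The Python sketch below rolls request durations into the four meters and draws the credit down against the earliest usage. Several points are assumptions the profile does not state: that `gpu_seconds` equals shared plus dedicated seconds, that one credit hour offsets 3,600 GPU seconds regardless of instance class, and the `Request` record shape and `meter_usage` name, which are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One served inference request (hypothetical record shape)."""
    seconds: float
    instance_class: str  # "shared" or "dedicated"


def meter_usage(requests: list[Request], free_credit_hours: float) -> dict[str, float]:
    """Aggregate requests (in chronological order) into meter totals and
    apply the free credit to the earliest usage first (an assumption)."""
    totals = {"shared": 0.0, "dedicated": 0.0}
    billable = {"shared": 0.0, "dedicated": 0.0}
    remaining_credit_s = free_credit_hours * 3600  # hours -> seconds

    for r in requests:
        if r.instance_class not in totals:
            raise ValueError(f"unknown instance class: {r.instance_class}")
        totals[r.instance_class] += r.seconds
        covered = min(r.seconds, remaining_credit_s)  # credit consumed here
        remaining_credit_s -= covered
        billable[r.instance_class] += r.seconds - covered

    used_credit_hours = free_credit_hours - remaining_credit_s / 3600
    return {
        "gpu_seconds": totals["shared"] + totals["dedicated"],
        "shared_instance_seconds": totals["shared"],
        "dedicated_instance_seconds": totals["dedicated"],
        "free_credit": used_credit_hours,
        "billable_shared_seconds": billable["shared"],
        "billable_dedicated_seconds": billable["dedicated"],
    }


if __name__ == "__main__":
    log = [Request(3000, "shared"), Request(1200, "dedicated"), Request(500, "shared")]
    print(meter_usage(log, free_credit_hours=1))
```

In this example the one-hour credit covers the first 3,000 shared seconds and 600 of the dedicated seconds, leaving 600 dedicated and 500 shared seconds billable. Splitting billable seconds by class keeps the lower shared rate and the higher dedicated rate separately priceable.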

Sources