Predibase · FinOps Profile

Predibase Finops

FinOps view of Predibase spend. Predibase bills four usage meters: serverless inference per token (scaled by model size), batch inference at a flat per-million-token rate, fine-tuning (training) per token of training data (scaled by base-model size), and dedicated deployments per GPU-hour (priced by accelerator type). LoRAX multi-LoRA serving packs many adapters onto one GPU, which reduces dedicated serving cost compared with running one deployment per fine-tuned model.

Predibase Finops is the FinOps profile for Predibase on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 4 billable meters, billed in USD on a monthly cycle, under the usage-based pricing category.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, LLM, Fine-Tuning, Inference, and LoRA.

Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3
Tags: AI, LLM, Fine-Tuning, Inference, LoRA, FinOps, Cost Management, FOCUS

Framework Alignment

Framework
Data Spec

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Predibase
PricingCategory: Usage-Based
ProviderName: Predibase
PublisherName: Predibase
ServiceCategory: AI and Machine Learning
ServiceName: Predibase
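
The eight column values above are constant for every Predibase charge, so one way to use them is as defaults that are merged into each cost-allocation row. The following is a minimal sketch, not an official export format. The helper `to_focus_row` and the custom column `x_MeterName` are hypothetical names introduced for illustration. `ConsumedQuantity`, `ConsumedUnit`, and `BilledCost` are FOCUS columns that this profile does not itself map, and they are included here as an assumption about what a per-charge row would carry.

```python
# Static FOCUS column values exactly as listed in this profile.
PREDIBASE_FOCUS_DEFAULTS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Predibase",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Predibase",
    "PublisherName": "Predibase",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Predibase",
}


def to_focus_row(meter: str, quantity: float, unit: str, billed_cost: float) -> dict:
    """Build one FOCUS-style row for a single metered charge (hypothetical helper)."""
    row = dict(PREDIBASE_FOCUS_DEFAULTS)  # copy so the defaults are never mutated
    row.update(
        {
            "x_MeterName": meter,          # hypothetical custom column (x_ prefix)
            "ConsumedQuantity": quantity,  # assumption: not mapped by this profile
            "ConsumedUnit": unit,          # assumption: not mapped by this profile
            "BilledCost": billed_cost,     # assumption: not mapped by this profile
        }
    )
    return row


example_row = to_focus_row("dedicated_gpu_hours", 10.0, "hours", 25.0)
```

Copying the defaults per row keeps the profile-level values in one place, so a change to the profile only needs to be made once.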

Meters

serverless_inference_tokens
Unit: tokens
Tokens served via serverless / shared endpoints, billed per 1M tokens scaled by model size.

batch_inference_tokens
Unit: tokens
Tokens served via batch inference jobs, billed at a flat per-1M-token rate.

finetuning_training_tokens
Unit: tokens
Training tokens consumed by supervised / GRPO fine-tuning jobs, billed per 1M tokens scaled by base-model size.

dedicated_gpu_hours
Unit: hours
GPU-hours consumed by dedicated / private deployments, billed per accelerator type (A10, A100, H100).
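
To show how the four meters combine into a monthly estimate, here is a minimal sketch. All rates in it are hypothetical placeholders, not Predibase prices. Real serverless and fine-tuning rates vary with model size and real GPU rates vary with accelerator type, so the caller is assumed to pass the rate that applies to their model or accelerator. The names `Rates` and `estimate_cost` are illustrative only.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Per-meter rates in USD. Values must come from the current price list."""
    serverless_per_1m: float   # depends on model size
    batch_per_1m: float        # flat rate
    finetuning_per_1m: float   # depends on base-model size
    gpu_hour: float            # depends on accelerator type (A10, A100, H100)


def estimate_cost(usage: dict, rates: Rates) -> dict:
    """Return a per-meter cost breakdown plus a total, all in USD."""
    lines = {
        "serverless_inference_tokens":
            usage.get("serverless_inference_tokens", 0) / TOKENS_PER_MILLION
            * rates.serverless_per_1m,
        "batch_inference_tokens":
            usage.get("batch_inference_tokens", 0) / TOKENS_PER_MILLION
            * rates.batch_per_1m,
        "finetuning_training_tokens":
            usage.get("finetuning_training_tokens", 0) / TOKENS_PER_MILLION
            * rates.finetuning_per_1m,
        "dedicated_gpu_hours":
            usage.get("dedicated_gpu_hours", 0) * rates.gpu_hour,
    }
    lines["total"] = sum(lines.values())  # sum is taken before "total" is added
    return lines


# Hypothetical placeholder rates and usage, for illustration only.
placeholder_rates = Rates(
    serverless_per_1m=0.50, batch_per_1m=0.25, finetuning_per_1m=1.00, gpu_hour=2.00
)
monthly_usage = {
    "serverless_inference_tokens": 4_000_000,
    "batch_inference_tokens": 8_000_000,
    "finetuning_training_tokens": 3_000_000,
    "dedicated_gpu_hours": 10,
}
breakdown = estimate_cost(monthly_usage, placeholder_rates)
```

Keeping the breakdown keyed by meter name makes it straightforward to emit one cost-allocation row per meter rather than a single lump sum.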
