Profiling Replicate — One API, Thirteen Capabilities, a Million Models

Replicate is one of the most interesting providers on the network because it has the opposite shape from the enterprise fintech profiles I’ve been writing about. Mastercard has 101 APIs. Plaid has 39. Replicate has one.

But that single API decomposes into 13 Naftiko capabilities that span model lifecycle, prediction execution, deployments, hardware selection, and webhook orchestration — and behind it sits a marketplace of essentially unbounded open-source models. It’s a useful reminder that “API count” isn’t the right metric for surface area. Capability count is.

What the single API actually does

The thirteen capabilities in the Replicate profile map closely onto the operational verbs and nouns of a managed inference platform:

Models and Model — list, get, create, delete. The catalog and per-model surface.
Collections and Slug — curated groupings of models.
Predictions — the core verb: create a prediction, get its status, stream output.
Deployments — long-running, autoscaling endpoints with their own configuration.
Hardware — the GPU-tier selector (A100, H100, T4, etc.) exposed as an API.
Cancel — operationally critical when an inference job goes long.
Accounts, Owner, Name — multi-tenant addressing of the above.
Secrets — webhook signing key retrieval.

Each capability is a self-contained Naftiko workflow. “Run a model” looks like one verb to a human; in the API it’s create prediction → poll get prediction (or subscribe via webhook) → optionally cancel. The capability framing groups those operations together so a downstream agent or composer doesn’t have to derive the workflow from endpoint names.
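As a rough illustration of that grouping, here is a minimal sketch of “run a model” as a single function over the create, poll, and cancel operations. The `PredictionClient` interface, `run_model`, and `FakeClient` are hypothetical names invented for this example, not Replicate’s SDK or Naftiko’s format, and the status strings are assumed to follow the usual starting/processing/succeeded/failed/canceled lifecycle. The fake client lets the sketch run without a network call.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol

# Assumed terminal states of a prediction; check the provider docs.
TERMINAL = {"succeeded", "failed", "canceled"}


class PredictionClient(Protocol):
    """Hypothetical interface covering the three operations in the workflow."""

    def create(self, model: str, inputs: dict[str, Any]) -> dict[str, Any]: ...
    def get(self, prediction_id: str) -> dict[str, Any]: ...
    def cancel(self, prediction_id: str) -> dict[str, Any]: ...


def run_model(
    client: PredictionClient,
    model: str,
    inputs: dict[str, Any],
    *,
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Create a prediction, poll until it is terminal, cancel on timeout."""
    prediction = client.create(model, inputs)
    deadline = clock() + timeout_s
    while prediction["status"] not in TERMINAL:
        if clock() >= deadline:
            # The job ran long: cancel rather than keep polling.
            return client.cancel(prediction["id"])
        sleep(poll_interval_s)
        prediction = client.get(prediction["id"])
    return prediction


@dataclass
class FakeClient:
    """In-memory stand-in that succeeds after a fixed number of polls."""

    polls_until_done: int
    calls: list[str] = field(default_factory=list)

    def create(self, model: str, inputs: dict[str, Any]) -> dict[str, Any]:
        self.calls.append("create")
        return {"id": "p1", "status": "starting", "output": None}

    def get(self, prediction_id: str) -> dict[str, Any]:
        self.calls.append("get")
        self.polls_until_done -= 1
        if self.polls_until_done <= 0:
            return {"id": prediction_id, "status": "succeeded", "output": "done"}
        return {"id": prediction_id, "status": "processing", "output": None}

    def cancel(self, prediction_id: str) -> dict[str, Any]:
        self.calls.append("cancel")
        return {"id": prediction_id, "status": "canceled", "output": None}


if __name__ == "__main__":
    client = FakeClient(polls_until_done=2)
    result = run_model(client, "owner/name", {"prompt": "hi"}, sleep=lambda s: None)
    print(result["status"], client.calls)
```

A real client would implement the same three methods over HTTP; the point of the sketch is that a caller sees one verb while the three operations stay grouped behind it.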

What’s interesting about the shape

A few things that make Replicate worth profiling specifically:

The async-first model. Predictions are asynchronous by default rather than synchronous request/response: you create a prediction, then either poll for its status or wait for a webhook. This is the right shape for inference workloads, which can take seconds to minutes, but it’s still rare in the broader AI API ecosystem. OpenAI’s and Anthropic’s APIs are mostly synchronous. Replicate’s surface is explicitly built around the fact that ML inference is a job, not a function call.
Hardware as a first-class API. Many managed-inference providers hide hardware behind opaque “auto” selection. Replicate exposes it. That’s a quietly opinionated design choice: it makes the latency-vs-cost trade-off a programmable axis rather than an account-settings dropdown.
The webhook signing secret has its own capability. That Replicate exposes “Get the Signing Secret for the Default Webhook” as a dedicated operation under Secrets suggests the team treats webhook security as a first-class concern rather than a docs page. That’s the kind of detail you notice when you walk a catalog.
Deployments and predictions are separate. A deployment is a configured, autoscaling endpoint. A prediction is a single inference call. Some providers conflate these. Keeping them separate lets you reason about cost and lifecycle independently.
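
To make the signing-secret point concrete, here is a minimal sketch of what a receiver might do with that secret once it has been retrieved. It assumes an HMAC-SHA256 scheme in the style of the Standard Webhooks convention: a base64 key with a `whsec_` prefix, signed content of the form `id.timestamp.body`, and `v1,`-prefixed signatures. Those details, along with the function names `sign` and `verify`, are assumptions to confirm against Replicate’s webhook documentation; the profile itself does not specify them.

```python
import base64
import hashlib
import hmac


def sign(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Return the base64 HMAC-SHA256 signature for one webhook delivery."""
    # Assumption: the secret is "whsec_" followed by a base64-encoded key.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify(
    secret: str,
    msg_id: str,
    timestamp: str,
    body: bytes,
    signature_header: str,
    *,
    now: int,
    tolerance_s: int = 300,
) -> bool:
    """Check timestamp freshness, then compare signatures in constant time."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject stale or far-future deliveries to limit replay attacks.
    if abs(now - sent_at) > tolerance_s:
        return False
    expected = sign(secret, msg_id, timestamp, body).encode()
    # Assumption: the header holds space-separated "v1,<signature>" entries.
    for candidate in signature_header.split():
        version, _, value = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(value.encode(), expected):
            return True
    return False
```

The body must be the raw request bytes, not a re-serialized JSON object, since any change in whitespace or key order changes the signature.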

The takeaway

Replicate is the cleanest example in the catalog of “small API surface, large capability surface.” One spec, 13 capability groupings, async-first by design, hardware exposed, secrets surfaced as a primitive. That’s a lot of useful shape behind what looks at a glance like a one-API listing.

If you’re building an inference platform of your own — internal or commercial — providers.apis.io/providers/replicate is a clean reference to walk. The capability-per-verb decomposition in particular is portable: it doesn’t require a 30-API monolith to express, just a coherent set of operations that compose into the workflows your users actually need.