Inferless
Inferless is a serverless GPU inference platform for machine learning models. Teams import a model from Hugging Face, a Git repo, or a container, and Inferless auto-generates a scalable REST inference endpoint for it, billed per second of GPU compute. A workspace-scoped management API and CLI cover model import, deployment, settings, logs, secrets, and volumes.
APIs
Inferless Inference Endpoints API
Each deployed model exposes an auto-generated REST inference endpoint on a per-deployment host (m-
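As a rough illustration of calling such an endpoint, the sketch below builds (but does not send) a JSON POST request using only the Python standard library. The endpoint URL is a hypothetical placeholder, since the per-deployment host is not given in full here. The bearer-token header and the payload shape (a single named input carrying one string) are likewise assumptions, not confirmed details of the Inferless API.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the per-deployment URL shown for
# your own deployed model.
ENDPOINT_URL = "https://example.invalid/your-model/infer"


def build_inference_request(url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployed model."""
    # Assumed payload shape: one named input holding a single string.
    payload = {
        "inputs": [
            {"name": "prompt", "shape": [1], "datatype": "BYTES", "data": [prompt]}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: bearer token in the Authorization header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_inference_request(ENDPOINT_URL, "YOUR_API_KEY", "Hello")

# To actually send the request against a real endpoint:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes the headers and body easy to inspect before any billed GPU time is used.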
Inferless Model Management API
Workspace-scoped REST management API under https://api.inferless.com/rest for updating model autoscaling and machine settings (min/max replicas, machine type, concurrency, infer...
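A settings update of this kind might be prepared as in the sketch below, which validates the values locally and serializes them to JSON. Every field name here (model_id, min_replicas, max_replicas, machine_type, concurrency) and the example machine type are hypothetical stand-ins for the settings listed above. The actual route under https://api.inferless.com/rest and its request schema are not shown in this listing, so the sketch stops short of building a URL.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelSettings:
    # All field names are hypothetical; consult the API reference for the
    # real request schema.
    model_id: str
    min_replicas: int
    max_replicas: int
    machine_type: str
    concurrency: int

    def validate(self) -> None:
        """Reject obviously inconsistent settings before any request is made."""
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas < self.min_replicas:
            raise ValueError("max_replicas must be >= min_replicas")
        if self.concurrency < 1:
            raise ValueError("concurrency must be >= 1")


def settings_body(settings: ModelSettings) -> str:
    """Validate the settings and return them as a JSON request body."""
    settings.validate()
    return json.dumps(asdict(settings), sort_keys=True)


# Example values are placeholders, not real identifiers.
body = settings_body(
    ModelSettings(
        model_id="YOUR_MODEL_ID",
        min_replicas=0,
        max_replicas=2,
        machine_type="A100",
        concurrency=4,
    )
)
```

Checking that the minimum replica count does not exceed the maximum on the client side catches a common misconfiguration before it reaches the API.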
Inferless Workspaces and Deployments
Workspace, model import, and deployment workflow exposed through the Inferless CLI (inferless init, deploy, run, remote-run, model, workspace, runtime, secrets, volume) and back...
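For scripting the CLI from Python, a thin wrapper along the following lines could be used. It relies only on the subcommand names listed above and assumes no flags or arguments, since none are documented here. The helper names inferless_command and run_inferless are hypothetical.

```python
import shutil
import subprocess

# Subcommands named in the description above; no flags are assumed.
KNOWN_SUBCOMMANDS = {
    "init", "deploy", "run", "remote-run", "model",
    "workspace", "runtime", "secrets", "volume",
}


def inferless_command(subcommand: str) -> list[str]:
    """Return the argv list for a known inferless subcommand."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown inferless subcommand: {subcommand}")
    return ["inferless", subcommand]


def run_inferless(subcommand: str) -> int:
    """Run the CLI if it is installed and return its exit code."""
    if shutil.which("inferless") is None:
        raise FileNotFoundError("inferless CLI not found on PATH")
    return subprocess.run(inferless_command(subcommand), check=False).returncode
```

A typical sequence would be run_inferless("init") followed by run_inferless("deploy") from the project directory, though each subcommand may prompt for input or require options not covered here.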