Predibase Deployments API

The Predibase Deployments API: 4 operations (create, list, get, delete) across 2 paths for managing deployments.
OpenAPI Specification

openapi: 3.0.1
info:
  title: Predibase Deployments API
  description: 'Specification of the Predibase API surfaces documented at https://docs.predibase.com. Two planes are covered: (1) the inference data plane on https://serving.app.predibase.com, which exposes an OpenAI-compatible chat/completions interface plus native generate / generate_stream text-generation endpoints, scoped per tenant and deployment; and (2) the control plane on https://api.app.predibase.com, which manages fine-tuning jobs, adapter repositories, deployments, datasets, and base models. All endpoints authenticate with a Predibase API token sent as an HTTP Bearer token.'
  termsOfService: https://predibase.com/terms-of-service
  contact:
    name: Predibase Support
    email: support@predibase.com
    url: https://docs.predibase.com
  version: '2.0'
servers:
- url: https://serving.app.predibase.com/{tenant}/deployments/v2/llms/{model}
  description: Inference (serving) base. tenant is your Predibase tenant ID (Settings > My Profile); model is the deployment name (Deployments page). The OpenAI-compatible routes live under the /v1 suffix of this base.
  variables:
    tenant:
      default: TENANT_ID
      description: Predibase tenant ID.
    model:
      default: DEPLOYMENT_NAME
      description: Deployment name (base model deployment).
- url: https://api.app.predibase.com/v2
  description: Control plane base for fine-tuning, adapters, deployments, datasets, and models. The /deployments paths below resolve against this base.
security:
- bearerAuth: []
tags:
- name: Deployments
paths:
  /deployments:
    post:
      operationId: createDeployment
      tags:
      - Deployments
      summary: Create a dedicated deployment.
      description: Creates a dedicated or private serverless deployment of a base model on a selected GPU accelerator (e.g. a10_24gb, a100_80gb), with LoRA serving enabled for fine-tuned adapters.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DeploymentRequest'
      responses:
        '200':
          description: The created deployment.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
    get:
      operationId: listDeployments
      tags:
      - Deployments
      summary: List deployments.
      responses:
        '200':
          description: A list of deployments.
  /deployments/{deploymentName}:
    get:
      operationId: getDeployment
      tags:
      - Deployments
      summary: Get a deployment.
      parameters:
      - name: deploymentName
        in: path
        required: true
        schema:
          type: string
      responses:
        '200':
          description: The deployment.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
    delete:
      operationId: deleteDeployment
      tags:
      - Deployments
      summary: Delete a deployment.
      parameters:
      - name: deploymentName
        in: path
        required: true
        schema:
          type: string
      responses:
        '200':
          description: The deployment was deleted.
components:
  schemas:
    DeploymentRequest:
      type: object
      required:
      - name
      - base_model
      properties:
        name:
          type: string
        base_model:
          type: string
        accelerator:
          type: string
          description: GPU accelerator, e.g. a10_24gb or a100_80gb.
        min_replicas:
          type: integer
        max_replicas:
          type: integer
    Deployment:
      type: object
      properties:
        name:
          type: string
        base_model:
          type: string
        accelerator:
          type: string
        status:
          type: string
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: Predibase API token
      description: 'Predibase API token sent as Authorization: Bearer <PREDIBASE_API_TOKEN>. Generate a token from Settings in the Predibase console.'
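
The four operations above can be sketched as plain HTTP requests. This is a minimal illustration using only the Python standard library, not an official client: the helper names (build_request, create_deployment_request, deployment_request) and the placeholder values are hypothetical, and the requests are built but never sent, so the sketch runs offline. The base URL, paths, field names, and Bearer scheme are taken from the specification; the exact response bodies are not assumed.

```python
import json
import urllib.parse
import urllib.request

# Control plane base URL, as listed under `servers` in the spec.
CONTROL_PLANE = "https://api.app.predibase.com/v2"

# Optional DeploymentRequest properties; name and base_model are required.
OPTIONAL_FIELDS = {"accelerator", "min_replicas", "max_replicas"}


def build_request(method, path, token, body=None):
    """Return an unsent Request carrying the API token as a Bearer token."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        CONTROL_PLANE + path, data=data, headers=headers, method=method
    )


def create_deployment_request(token, name, base_model, **optional):
    """Build POST /deployments (createDeployment)."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"fields not in DeploymentRequest: {sorted(unknown)}")
    body = {"name": name, "base_model": base_model, **optional}
    return build_request("POST", "/deployments", token, body)


def list_deployments_request(token):
    """Build GET /deployments (listDeployments)."""
    return build_request("GET", "/deployments", token)


def deployment_request(method, token, deployment_name):
    """Build GET or DELETE /deployments/{deploymentName}."""
    if method not in ("GET", "DELETE"):
        raise ValueError("only GET and DELETE are defined on this path")
    # Percent-encode the name so it is safe as a single path segment.
    segment = urllib.parse.quote(deployment_name, safe="")
    return build_request(method, f"/deployments/{segment}", token)


if __name__ == "__main__":
    # Placeholder values; substitute a real token, name, and base model.
    req = create_deployment_request(
        "PREDIBASE_API_TOKEN",
        "DEPLOYMENT_NAME",
        "BASE_MODEL_NAME",
        accelerator="a100_80gb",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would actually send the request.
```

Building the request separately from sending it keeps the sketch testable without network access or a real token; passing the returned object to urllib.request.urlopen is the only extra step needed to call the API.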