NVIDIA NIM Health API

Liveness and readiness endpoints (/v1/health/live, /v1/health/ready) exposed by self-hosted NIM containers, which back Kubernetes liveness, readiness, and startup probes, plus a Prometheus /v1/metrics scrape endpoint for GPU utilization, request latency, and queue depth. Through the NIM Operator, these endpoints drive Kubernetes pod lifecycle and HPA scaling.
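A minimal sketch of a client that uses the two health endpoints, for example a deploy script that waits for the model engine to load before sending traffic. It assumes the spec's default server (http://localhost:8000) and treats any non-200 readiness answer (the spec documents 503) as "not ready yet". The helper names `probe` and `wait_until_ready`, the status-0 convention for "no HTTP response", and the default timeouts are hypothetical choices for this illustration, not part of any NVIDIA SDK.

```python
from __future__ import annotations

import json
import time
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> tuple[int, dict | None]:
    """GET a NIM health endpoint; return (HTTP status, parsed JSON body or None).

    Status 0 is this sketch's own convention for "no HTTP response at all"
    (connection refused or timed out), e.g. while the container is starting.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            raw = resp.read().decode("utf-8")
            status = resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as 503 raise HTTPError; the spec defines no
        # response body for 503, so none is returned.
        return err.code, None
    except OSError:
        # URLError, connection refused, and socket timeouts are OSError subclasses.
        return 0, None
    try:
        return status, json.loads(raw) if raw else None
    except ValueError:
        return status, None  # body was not JSON


def wait_until_ready(base_url: str, deadline_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll /v1/health/ready until it returns 200 or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        status, _ = probe(base_url, "/v1/health/ready")
        if status == 200:
            return True
        if time.monotonic() + interval_s > end:
            return False  # another sleep would overrun the deadline
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:8000", deadline_s=900)` returns True once the readiness endpoint answers 200. Inside a cluster the kubelet performs this polling itself from the probe configuration, so a helper like this is mainly useful for scripts and local testing.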

OpenAPI Specification

nvidia-nim-health-api-openapi.yml
openapi: 3.1.0
info:
  title: NVIDIA NIM Health API
  description: >
    Liveness, readiness, and metrics endpoints exposed by every self-hosted
    NIM container on port 8000. The NIM Operator uses these for Kubernetes
    probes; Prometheus scrapes /v1/metrics for GPU utilization, request
    latency, queue depth, and per-engine counters.
  version: '2026-05-25'
  contact:
    name: NVIDIA Developer Support
    url: https://forums.developer.nvidia.com/c/ai-data-science/nemo-llm-service/
  license:
    name: NVIDIA AI Enterprise License
    url: https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
servers:
  - url: http://localhost:8000
    description: Self-hosted NIM container default
tags:
  - name: Health
    description: Liveness, readiness, and metrics probes
paths:
  /v1/health/live:
    get:
      summary: Liveness Probe
      description: Returns 200 OK if the container process is alive. Used as Kubernetes livenessProbe.
      operationId: getLiveness
      tags:
        - Health
      responses:
        '200':
          description: Container is alive.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthStatus'
        '503':
          description: Container is unhealthy and should be restarted.
  /v1/health/ready:
    get:
      summary: Readiness Probe
      description: Returns 200 OK only once the model engine has loaded and the container can accept traffic.
      operationId: getReadiness
      tags:
        - Health
      responses:
        '200':
          description: Ready to serve.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthStatus'
        '503':
          description: Not ready yet (e.g. model still loading).
  /v1/metrics:
    get:
      summary: Prometheus Metrics
      description: Prometheus text exposition format. Includes GPU utilization, request latency histograms, queue depth, and engine-specific counters.
      operationId: getMetrics
      tags:
        - Health
      responses:
        '200':
          description: Prometheus metrics payload.
          content:
            text/plain:
              schema:
                type: string
components:
  schemas:
    HealthStatus:
      type: object
      properties:
        message:
          type: string
          example: Service is live.
        object:
          type: string
          example: health-response
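The spec types the /v1/metrics body only as a plain string in Prometheus text exposition format and does not list the metric names NIM actually exports. The sketch below is therefore a generic, stdlib-only reader for that format, plus a helper that derives a mean from a histogram's `_sum` and `_count` series. The function names `parse_metrics` and `histogram_mean` are hypothetical, and the parser deliberately handles only the common line shapes (see its docstring).

```python
from __future__ import annotations

import re

# A sample line: metric name, optional {label="value",...}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<ts>-?\d+))?\s*$"
)
# One label pair; allows backslash-escaped characters inside the quoted value.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse Prometheus text exposition into (name, labels, value) samples.

    Simplified: skips # HELP / # TYPE comments and blank lines, ignores
    timestamps, leaves label escapes undecoded, and silently drops lines it
    cannot match (for example a label value containing '}').
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def histogram_mean(samples, base_name: str) -> float | None:
    """Mean observed value of a histogram, from its _sum and _count series summed over all label sets."""
    total = sum(v for name, _, v in samples if name == base_name + "_sum")
    count = sum(v for name, _, v in samples if name == base_name + "_count")
    return total / count if count else None
```

Given the text returned by `GET http://localhost:8000/v1/metrics`, `histogram_mean(parse_metrics(text), "<latency histogram name>")` gives the average latency since the counters started, assuming the latency metric follows the usual `_sum`/`_count` histogram convention. For anything beyond quick checks, the `prometheus_client` package's `text_string_to_metric_families` parser covers the full format, including escaping.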