NVIDIA NIM

NVIDIA NIM Chat Completions API

OpenAI-compatible chat completions endpoint exposing 100+ foundation models — Meta Llama, Mistral, Mixtral, NVIDIA Nemotron, DeepSeek, Qwen, Microsoft Phi, Google Gemma, IBM Granite, and more — through a single /v1/chat/completions surface. Supports streaming, tool/function calling, structured outputs, vision inputs on multimodal models, and the standard temperature/top_p/max_tokens parameters. Switching models is a one-line change to the model string. Available hosted on integrate.api.nvidia.com or self-hosted via NIM containers on any GPU.

Documentation GitHub OpenAPI

OpenAPI Specification

openapi: 3.1.0
info:
  title: NVIDIA NIM Chat Completions API
  description: >
    OpenAI-compatible chat completions endpoint served by NVIDIA NIM. Available
    as a hosted service at https://integrate.api.nvidia.com/v1 and on every
    self-hosted NIM LLM container on port 8000. A single contract serves
    100+ foundation models — Llama, Mistral, NVIDIA Nemotron, DeepSeek, Qwen,
    Phi, Gemma, Granite — through the standard /v1/chat/completions surface.
  version: '2026-05-25'
  contact:
    name: NVIDIA Developer Support
    url: https://forums.developer.nvidia.com/c/ai-data-science/nemo-llm-service/
  license:
    name: NVIDIA AI Enterprise License
    url: https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
servers:
  - url: https://integrate.api.nvidia.com
    description: NVIDIA-hosted NIM endpoint (DGX Cloud)
  - url: http://localhost:8000
    description: Self-hosted NIM container default
security:
  - BearerAuth: []
tags:
  - name: Chat
    description: OpenAI-compatible chat completion operations
paths:
  /v1/chat/completions:
    post:
      summary: Create A Chat Completion
      description: >
        Generate a chat completion for the supplied messages. Compatible with the
        OpenAI chat completions schema; supports streaming via Server-Sent Events
        when `stream: true` is set, tool/function calling, JSON-mode structured
        outputs, and (on VLM models) image inputs inside the messages array.
      operationId: createChatCompletion
      tags:
        - Chat
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatCompletionRequest'
      responses:
        '200':
          description: Chat completion response (or SSE stream when stream=true).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChatCompletionResponse'
            text/event-stream:
              schema:
                type: string
                description: 'SSE stream of `data:` JSON deltas terminated by `data: [DONE]`.'
        '400':
          description: Invalid request.
        '401':
          description: Missing or invalid API key.
        '403':
          description: API key lacks access to the requested model.
        '404':
          description: Requested model not found in this endpoint's catalog.
        '429':
          description: Rate limit or quota exceeded.
        '500':
          description: Upstream inference error.
components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: nvapi-...
      description: 'NVIDIA developer API key. Use `Authorization: Bearer nvapi-...`.'
  schemas:
    ChatCompletionRequest:
      type: object
      required: [model, messages]
      properties:
        model:
          type: string
          description: Model identifier (e.g. `meta/llama-3.3-70b-instruct`, `nvidia/llama-3.1-nemotron-70b-instruct`).
        messages:
          type: array
          items:
            $ref: '#/components/schemas/ChatMessage'
        temperature:
          type: number
          minimum: 0
          maximum: 2
          default: 0.2
        top_p:
          type: number
          minimum: 0
          maximum: 1
          default: 0.7
        max_tokens:
          type: integer
          minimum: 1
          default: 1024
        stream:
          type: boolean
          default: false
        stop:
          oneOf:
            - type: string
            - type: array
              items:
                type: string
        n:
          type: integer
          minimum: 1
          default: 1
        seed:
          type: integer
        tools:
          type: array
          items:
            $ref: '#/components/schemas/Tool'
        tool_choice:
          oneOf:
            - type: string
              enum: [auto, none, required]
            - type: object
        response_format:
          type: object
          properties:
            type:
              type: string
              enum: [text, json_object, json_schema]
            json_schema:
              type: object
        frequency_penalty:
          type: number
        presence_penalty:
          type: number
    ChatMessage:
      type: object
      required: [role]
      properties:
        role:
          type: string
          enum: [system, user, assistant, tool]
        content:
          oneOf:
            - type: string
            - type: array
              items:
                $ref: '#/components/schemas/ContentPart'
        name:
          type: string
        tool_calls:
          type: array
          items:
            $ref: '#/components/schemas/ToolCall'
        tool_call_id:
          type: string
    ContentPart:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [text, image_url]
        text:
          type: string
        image_url:
          type: object
          properties:
            url:
              type: string
              description: HTTPS URL or `data:image/...;base64,...` payload.
    Tool:
      type: object
      required: [type, function]
      properties:
        type:
          type: string
          enum: [function]
        function:
          type: object
          required: [name]
          properties:
            name:
              type: string
            description:
              type: string
            parameters:
              type: object
    ToolCall:
      type: object
      properties:
        id:
          type: string
        type:
          type: string
          enum: [function]
        function:
          type: object
          properties:
            name:
              type: string
            arguments:
              type: string
    ChatCompletionResponse:
      type: object
      properties:
        id:
          type: string
        object:
          type: string
          example: chat.completion
        created:
          type: integer
        model:
          type: string
        choices:
          type: array
          items:
            type: object
            properties:
              index:
                type: integer
              message:
                $ref: '#/components/schemas/ChatMessage'
              finish_reason:
                type: string
                enum: [stop, length, tool_calls, content_filter]
        usage:
          type: object
          properties:
            prompt_tokens:
              type: integer
            completion_tokens:
              type: integer
            total_tokens:
              type: integer

NVIDIA NIM Chat Completions API

Documentation

Specifications

Schemas & Data

Other Resources

OpenAPI Specification