NVIDIA NIM

NVIDIA NIM (NVIDIA Inference Microservices) is a catalog of GPU-accelerated, containerized AI inference microservices that package optimized model engines (TensorRT-LLM, vLLM, SGLang, Triton) behind industry-standard OpenAI-compatible REST APIs. NIM covers large language models, embeddings and reranking, vision-language models, speech (Riva), visual generative AI, and biology (BioNeMo) — exposed identically whether consumed from the hosted endpoint at integrate.api.nvidia.com or self-hosted via Docker containers and the Kubernetes-native NIM Operator. NIM ships with NVIDIA AI Enterprise for commercial deployment and is the inference layer underneath NVIDIA AI Blueprints, NeMo Retriever, NeMo Guardrails, and the broader NVIDIA developer stack.

NVIDIA NIM publishes 11 APIs on the APIs.io network, including Completions API, Embeddings API, Reranking API, and 8 more. Tagged areas include AI, Artificial Intelligence, Inference, Microservices, and LLM.

The NVIDIA NIM catalog on APIs.io includes 1 JSON-LD context and 1 Spectral governance ruleset.

NVIDIA NIM’s developer surface includes authentication, developer portal, documentation, getting-started guide, signup flow, sandbox, pricing, and 68 more developer resources.

🌐 Visit website 📡 Source on GitHub

71.0/100 exemplar ▬ flat Agent 48/100 agent ready Full breakdown ↓
scored 2026-07-28 · rubric v0.6

AccessFreemiumSelf serve⚡ Free to try

11 APIs 1 MCP Servers 237 Agent Skills 19 Features

AIArtificial IntelligenceInferenceMicroservicesLLMFoundation ModelsGPUKubernetesNVIDIAOpenAI Compatible

Kin Score

Kin Score How this is scored →
scored 2026-07-28 · rubric v0.6

Composite quality — 71.0/100 · exemplar

Contract Quality 18.5 / 25

Developer Ergonomics 15.2 / 20

Commercial Clarity 15.8 / 20

Operational Transparency 4.8 / 13

Governance 8.4 / 12

Discoverability 8.3 / 10

Agent readiness — 48/100 · agent ready

Machine-Readable Contract 18 / 18

Agentic Access Contract 10 / 10

MCP Server 12 / 12

Machine-Readable Auth 10 / 10

Idempotency 0 / 9

Stable Error Semantics 8 / 8

Request/Response Examples 7 / 7

Rate-Limit Signaling 7 / 7

Typed Event Surface 0 / 6

Agent Skills 5 / 5

Well-Known Catalog 4 / 4

Consent & Bot Identity 0 / 3

A2A Agent Card 0 / 8

Dry-Run / Simulate Mode 0 / 4

Improve this rating by publishing the missing artifacts — every area above can be raised, and the full rubric is at apis.io/rating/. This rating is computed from github.com/api-evangelist/nvidia-nim: open an issue to ask a question, or submit a pull request to add artifacts. Want it done for you? Prioritized profiling — $2,500 →

APIs 11

Individual APIs this provider publishes, each with its own machine-readable definition.

NVIDIA NIM Completions API

Legacy OpenAI-compatible text completion endpoint (/v1/completions) for non-chat foundation models served by NIM. Accepts a raw prompt and returns generated text with the same s...

NVIDIA NIM Embeddings API

OpenAI-compatible embeddings endpoint (/v1/embeddings) backed by NVIDIA NeMo Retriever text embedding models including NV-Embed, NV-EmbedQA-E5, llama-3.2-nv-embedqa-1b, and BAAI...

NVIDIA NIM Reranking API

NeMo Retriever cross-encoder reranking endpoint (/v1/ranking) for scoring candidate passages against a query. Improves retrieval relevance on RAG pipelines and supports the llam...

NVIDIA NIM Models API

OpenAI-compatible model catalog endpoint (/v1/models) returning the list of models served by the NIM endpoint or container. Each entry includes id, owned_by, and created timesta...

NVIDIA NIM Vision Language Models API

Vision-language model inference through the standard /v1/chat/completions surface with image inputs (base64 or URL) in the messages payload. Supports NVIDIA NeVA, microsoft/kosm...

NVIDIA NIM Health API

Liveness, readiness, and startup probes exposed by self-hosted NIM containers (/v1/health/live, /v1/health/ready) and a Prometheus /v1/metrics scrape endpoint for GPU utilizatio...

NVIDIA NIM Biology (BioNeMo) API

BioNeMo NIMs for protein structure prediction (AlphaFold2, ESMFold, OpenFold), protein generation (ProtGPT2, RFDiffusion), molecular property prediction (MolMIM), small molecule...

NVIDIA NIM ASR API

Automatic speech recognition (speech-to-text)

NVIDIA NIM Chat API

OpenAI-compatible chat completion operations

NVIDIA NIM Images API

Text-to-image and image-to-image generation

NVIDIA NIM TTS API

Text-to-speech synthesis

Scroll for all 11

Postman Collections 10

Ready-to-run Postman collections for exercising this provider's APIs.

NVIDIA NIM Biology (BioNeMo) API

POSTMAN

NVIDIA NIM Chat Completions API

POSTMAN

NVIDIA NIM Completions API

POSTMAN

NVIDIA NIM Embeddings API

POSTMAN

NVIDIA NIM Health API

POSTMAN

NVIDIA NIM Image Generation API

POSTMAN

NVIDIA NIM Models API

POSTMAN

NVIDIA NIM Reranking API

POSTMAN

NVIDIA NIM Speech API

POSTMAN

NVIDIA NIM Vision Language Models API

POSTMAN

Scroll for all 10

Open Collections 10

Open, tool-agnostic API collections (OpenAPI-derived and Bruno).

NVIDIA NIM Biology (BioNeMo) API

OPEN COLLECTION

NVIDIA NIM Chat Completions API

OPEN COLLECTION

NVIDIA NIM Completions API

OPEN COLLECTION

NVIDIA NIM Embeddings API

OPEN COLLECTION

NVIDIA NIM Health API

OPEN COLLECTION

NVIDIA NIM Image Generation API

OPEN COLLECTION

NVIDIA NIM Models API

OPEN COLLECTION

NVIDIA NIM Reranking API

OPEN COLLECTION

NVIDIA NIM Speech API

OPEN COLLECTION

NVIDIA NIM Vision Language Models API

OPEN COLLECTION

Scroll for all 10

Arazzo Workflows 7

Multi-step API workflows described with the Arazzo specification.

NVIDIA NIM BioNeMo Drug Discovery

Fold a protein from sequence, dock a ligand into the predicted structure, then generate optimized analog molecules.

ARAZZO

NVIDIA NIM Discover And Chat

List the served models, confirm a target model's metadata, then run a chat completion against it.

ARAZZO

NVIDIA NIM Generate Image And Caption

Generate an image from a text prompt, then caption the generated image with a vision-language model.

ARAZZO

NVIDIA NIM Health Gated Completion

Check a self-hosted NIM container's readiness, and only run a text completion once the engine reports ready.

ARAZZO

NVIDIA NIM RAG Rerank And Answer

Embed a query, rerank candidate passages against it, then answer the question grounded in the top passage.

ARAZZO

NVIDIA NIM Vision Describe And Summarize

Describe an image with a vision-language model, then condense the description into a short caption with an LLM.

ARAZZO

NVIDIA NIM Voice Assistant Loop

Transcribe an audio clip with Riva ASR, answer the transcript with an LLM, then synthesize the reply with Riva TTS.

ARAZZO

Scroll for all 7

MCP Servers 1

Model Context Protocol servers that expose these APIs to AI agents.

nvidia-nim-mcp.yml

MCP SERVER

Agent Skills 24

Packaged agent skills for driving this provider's APIs from an AI assistant.

accelerated-computing-cudf

AGENT SKILL

aiq-deploy

AGENT SKILL

aiq-deploy

AGENT SKILL

aiq-research

AGENT SKILL

aiq-research

AGENT SKILL

cudaq-guide

AGENT SKILL

cufolio

AGENT SKILL

cuopt-developer

AGENT SKILL

cuopt-install

AGENT SKILL

cuopt-numerical-optimization-api-c

AGENT SKILL

cuopt-numerical-optimization-api-cli

AGENT SKILL

cuopt-numerical-optimization-api-python

AGENT SKILL

cuopt-numerical-optimization-formulation

AGENT SKILL

cuopt-routing-api-python

AGENT SKILL

cuopt-routing-formulation

AGENT SKILL

cuopt-server-api-python

AGENT SKILL

cuopt-server-common

AGENT SKILL

cuopt-skill-evolution

AGENT SKILL

cuopt-user-rules

AGENT SKILL

cuopt-user-rules

AGENT SKILL

cupynumeric-hdf5

AGENT SKILL

cupynumeric-install

AGENT SKILL

cupynumeric-migration-readiness

AGENT SKILL

cupynumeric-parallel-data-load

AGENT SKILL

Scroll for all 24

Pricing Plans 1

Published pricing tiers and plan structures.

Nvidia Nim Plans Pricing

3 plans

PLANS

Rate Limits 1

Documented rate limits and quota policies.

Nvidia Nim Rate Limits

0 limits

RATE LIMITS

FinOps 1

Cost, billing, and metering signals for API financial operations.

Nvidia Nim Finops

FINOPS

Features 19

Notable capabilities this provider offers.

OpenAI-compatible REST surface — /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models, /v1/ranking

100+ foundation models exposed through a single API contract — Llama 3.1/3.2/3.3, Mistral, Mixtral, NVIDIA Nemotron, DeepSeek-R1, Qwen 2.5, Microsoft Phi, Google Gemma, IBM Granite, and Falcon

Free hosted inference at build.nvidia.com on DGX Cloud — 1,000 credits on signup, 40 RPM rate limit

Self-hosted deployment via Docker containers shipping TensorRT-LLM, vLLM, or SGLang inference engines

Kubernetes-native deployment via the NIM Operator with NIMService, NIMCache, NIMPipeline CRDs

GPU-aware autoscaling, persistent model caches, and rolling upgrades managed by the operator

Multi-tenant licensing through NVIDIA AI Enterprise (commercial production use)

NeMo Retriever NIMs for embeddings, reranking, OCR, and PDF-to-Markdown extraction in RAG pipelines

Vision Language Model NIMs reusing the chat-completions surface for multimodal inputs

NVIDIA Riva speech NIMs (Parakeet ASR, Canary translation, Magpie TTS) with HTTP and gRPC adapters

BioNeMo NIMs for AlphaFold2, ESMFold, ProtGPT2, MolMIM, DiffDock, RFDiffusion

Visual generative AI NIMs — FLUX.1, SDXL, Shutterstock Edify Image, Edify 3D

NeMo Guardrails for input/output safety and topic policy enforcement

Function calling, JSON mode, tool use, and structured outputs across compatible LLMs

Streaming via Server-Sent Events on chat/completions

Prometheus /v1/metrics scrape endpoint and /v1/health/{live,ready} probes for Kubernetes

LangChain, LlamaIndex, Haystack, OpenAI SDK, and direct REST client compatibility

NVIDIA AI Blueprints — full reference RAG, multimodal search, drug discovery, and digital human stacks

Available on DGX Cloud, AWS, Azure, Google Cloud, Oracle Cloud, GKE, EKS, AKS, OpenShift, and on-prem

Scroll for all 19

JSON Schema 2

Standalone JSON Schema definitions for this provider's data models.

Get Started 4

Portal, sign-up, and the first successful call

Portal

GettingStarted

Signup

Sandbox

Documentation 4

Reference material describing how the API behaves

Documentation

Documentation

Documentation

Documentation

Agent Surfaces 5

MCP servers, agent skills, and machine-readable catalogs

AgenticAccess

AgentSkills

WellKnown

MCPServer

LLMsTxt

Design & Contract 13

Pagination, idempotency, versioning, errors, and events

Arazzo

Arazzo

Arazzo

Arazzo

Arazzo

Arazzo

Arazzo

Versioning

Conformance

ErrorCatalog

Lifecycle

Conventions

DataModel

Scroll for all 13

Build 16

SDKs, sample code, and the tooling you integrate with

PostmanWorkspace

PostmanWorkspace

GitHubOrganization

GitHubOrganization

GitHubOrganization

SDKs

SDKs

SDKs

SDKs

SDKs

SDKs

SDKs

SDKs

SDKs

CodeExamples

CodeExamples

Packages

CLI

Scroll for all 16

Access & Security 4

Authentication, authorization, and security posture

VulnerabilityDisclosure

VulnerabilityDisclosure

DomainSecurity

DomainSecurity

Authentication

Authentication

TrustCenter

TrustCenter

Operate 5

Status, limits, changes, and where to get help

StatusPage

Forums

RateLimits

RateLimits

ChangeLog

Commercial 5

Pricing, plans, and the legal terms of use

Pricing

TermsOfService

PrivacyPolicy

Plans

FinOps

Company 2

The organization behind the API

Blog

Blog

Blog

Blog

Other 17

Properties that don't map to a standard resource type

Models

KubernetesCRD

Protobuf

Protobuf

Protobuf

Protobuf

Protobuf

Overlay

Overlay

Overlay

Overlay

Overlay

Overlay

Overlay

Overlay

Overlay

Overlay

Scroll for all 17

Source (apis.yml)

aid: nvidia-nim
url: https://raw.githubusercontent.com/api-evangelist/nvidia-nim/refs/heads/main/apis.yml
apis:
- aid: nvidia-nim:nvidia-nim-completions-api
  name: NVIDIA NIM Completions API
  tags:
  - AI
  - Artificial Intelligence
  - Completions
  - LLM
  - OpenAI Compatible
  humanURL: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
  baseURL: https://integrate.api.nvidia.com/v1
  properties:
  - url: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
    type: Documentation
  - url: openapi/nvidia-nim-completions-api-openapi.yml
    type: OpenAPI
  description: Legacy OpenAI-compatible text completion endpoint (/v1/completions) for non-chat foundation models served by
    NIM. Accepts a raw prompt and returns generated text with the same streaming, sampling, and stopping-criterion controls
    as the chat endpoint.
- aid: nvidia-nim:nvidia-nim-embeddings-api
  name: NVIDIA NIM Embeddings API
  tags:
  - AI
  - Artificial Intelligence
  - Embeddings
  - Retrieval
  - RAG
  - OpenAI Compatible
  humanURL: https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/api-reference.html
  baseURL: https://integrate.api.nvidia.com/v1
  properties:
  - url: https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/api-reference.html
    type: Documentation
  - url: openapi/nvidia-nim-embeddings-api-openapi.yml
    type: OpenAPI
  - url: json-schema/nvidia-nim-embedding-schema.json
    type: JSONSchema
  description: OpenAI-compatible embeddings endpoint (/v1/embeddings) backed by NVIDIA NeMo Retriever text embedding models
    including NV-Embed, NV-EmbedQA-E5, llama-3.2-nv-embedqa-1b, and BAAI BGE-M3. Returns dense float vectors for documents
    or queries to power RAG, semantic search, and clustering. Supports `input_type=passage|query` for asymmetric retrieval
    and the standard `dimensions` parameter on models that permit dimension reduction.
- aid: nvidia-nim:nvidia-nim-reranking-api
  name: NVIDIA NIM Reranking API
  tags:
  - AI
  - Artificial Intelligence
  - Reranking
  - Retrieval
  - RAG
  humanURL: https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/api-reference.html
  baseURL: https://integrate.api.nvidia.com/v1
  properties:
  - url: https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/api-reference.html
    type: Documentation
  - url: openapi/nvidia-nim-reranking-api-openapi.yml
    type: OpenAPI
  description: NeMo Retriever cross-encoder reranking endpoint (/v1/ranking) for scoring candidate passages against a query.
    Improves retrieval relevance on RAG pipelines and supports the llama-3.2-nv-rerankqa-1b and NV-RerankQA-Mistral-4B-v3
    models. Accepts a query plus a list of passages and returns a sorted list of relevance scores.
- aid: nvidia-nim:nvidia-nim-models-api
  name: NVIDIA NIM Models API
  tags:
  - AI
  - Artificial Intelligence
  - Models
  - Catalog
  - OpenAI Compatible
  humanURL: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
  baseURL: https://integrate.api.nvidia.com/v1
  properties:
  - url: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
    type: Documentation
  - url: openapi/nvidia-nim-models-api-openapi.yml
    type: OpenAPI
  description: OpenAI-compatible model catalog endpoint (/v1/models) returning the list of models served by the NIM endpoint
    or container. Each entry includes id, owned_by, and created timestamp. Used by clients to discover the model name strings
    to pass to chat-completions / completions / embeddings.
- aid: nvidia-nim:nvidia-nim-vision-api
  name: NVIDIA NIM Vision Language Models API
  tags:
  - AI
  - Artificial Intelligence
  - Vision
  - Multimodal
  - VLM
  humanURL: https://docs.api.nvidia.com/nim/reference/vlm-apis
  baseURL: https://integrate.api.nvidia.com/v1
  properties:
  - url: https://docs.api.nvidia.com/nim/reference/vlm-apis
    type: Documentation
  - url: openapi/nvidia-nim-vision-api-openapi.yml
    type: OpenAPI
  description: Vision-language model inference through the standard /v1/chat/completions surface with image inputs (base64
    or URL) in the messages payload. Supports NVIDIA NeVA, microsoft/kosmos-2, Phi-3-vision, llama-3.2-90b-vision-instruct,
    and other VLMs hosted in the NIM catalog.
- aid: nvidia-nim:nvidia-nim-health-api
  name: NVIDIA NIM Health API
  tags:
  - Health
  - Observability
  - Kubernetes
  humanURL: https://docs.nvidia.com/nim/large-language-models/latest/observability.html
  properties:
  - url: https://docs.nvidia.com/nim/large-language-models/latest/observability.html
    type: Documentation
  - url: openapi/nvidia-nim-health-api-openapi.yml
    type: OpenAPI
  description: Liveness, readiness, and startup probes exposed by self-hosted NIM containers (/v1/health/live, /v1/health/ready)
    and a Prometheus /v1/metrics scrape endpoint for GPU utilization, request latency, and queue depth. Drives Kubernetes
    pod lifecycle and HPA scaling via the NIM Operator.
- aid: nvidia-nim:nvidia-nim-biology-api
  name: NVIDIA NIM Biology (BioNeMo) API
  tags:
  - AI
  - Biology
  - BioNeMo
  - Drug Discovery
  - Healthcare
  humanURL: https://docs.nvidia.com/nim/bionemo/latest/index.html
  baseURL: https://integrate.api.nvidia.com/v1
  properties:
  - url: https://docs.nvidia.com/nim/bionemo/latest/index.html
    type: Documentation
  - url: openapi/nvidia-nim-biology-api-openapi.yml
    type: OpenAPI
  description: BioNeMo NIMs for protein structure prediction (AlphaFold2, ESMFold, OpenFold), protein generation (ProtGPT2,
    RFDiffusion), molecular property prediction (MolMIM), small molecule generation, and molecular docking (DiffDock). Each
    model is a containerized microservice with the same OpenAPI surface.
- aid: nvidia-nim:nvidia-nim-asr-api
  name: NVIDIA NIM ASR API
  description: Automatic speech recognition (speech-to-text)
  humanURL: https://docs.api.nvidia.com/nim/reference/llm-apis
  baseURL: https://integrate.api.nvidia.com/v1
  tags:
  - ASR
  properties:
  - type: OpenAPI
    url: openapi/nvidia-nim-asr-api-openapi.yml
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/llm-apis
  - type: Documentation
    url: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
  - type: JSONSchema
    url: json-schema/nvidia-nim-chat-completion-schema.json
  - type: JSONLD
    url: json-ld/nvidia-nim-context.jsonld
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/visual-models
  - type: Documentation
    url: https://docs.nvidia.com/nim/riva/latest/index.html
- aid: nvidia-nim:nvidia-nim-chat-api
  name: NVIDIA NIM Chat API
  description: OpenAI-compatible chat completion operations
  humanURL: https://docs.api.nvidia.com/nim/reference/llm-apis
  baseURL: https://integrate.api.nvidia.com/v1
  tags:
  - Chat
  properties:
  - type: OpenAPI
    url: openapi/nvidia-nim-chat-api-openapi.yml
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/llm-apis
  - type: Documentation
    url: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
  - type: JSONSchema
    url: json-schema/nvidia-nim-chat-completion-schema.json
  - type: JSONLD
    url: json-ld/nvidia-nim-context.jsonld
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/visual-models
  - type: Documentation
    url: https://docs.nvidia.com/nim/riva/latest/index.html
- aid: nvidia-nim:nvidia-nim-images-api
  name: NVIDIA NIM Images API
  description: Text-to-image and image-to-image generation
  humanURL: https://docs.api.nvidia.com/nim/reference/llm-apis
  baseURL: https://integrate.api.nvidia.com/v1
  tags:
  - Images
  properties:
  - type: OpenAPI
    url: openapi/nvidia-nim-images-api-openapi.yml
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/llm-apis
  - type: Documentation
    url: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
  - type: JSONSchema
    url: json-schema/nvidia-nim-chat-completion-schema.json
  - type: JSONLD
    url: json-ld/nvidia-nim-context.jsonld
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/visual-models
  - type: Documentation
    url: https://docs.nvidia.com/nim/riva/latest/index.html
- aid: nvidia-nim:nvidia-nim-tts-api
  name: NVIDIA NIM TTS API
  description: Text-to-speech synthesis
  humanURL: https://docs.api.nvidia.com/nim/reference/llm-apis
  baseURL: https://integrate.api.nvidia.com/v1
  tags:
  - TTS
  properties:
  - type: OpenAPI
    url: openapi/nvidia-nim-tts-api-openapi.yml
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/llm-apis
  - type: Documentation
    url: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
  - type: JSONSchema
    url: json-schema/nvidia-nim-chat-completion-schema.json
  - type: JSONLD
    url: json-ld/nvidia-nim-context.jsonld
  - type: Documentation
    url: https://docs.api.nvidia.com/nim/reference/visual-models
  - type: Documentation
    url: https://docs.nvidia.com/nim/riva/latest/index.html
name: NVIDIA NIM
tags:
- AI
- Artificial Intelligence
- Inference
- Microservices
- LLM
- Foundation Models
- GPU
- Kubernetes
- NVIDIA
- OpenAI Compatible
kind: contract
accessModel:
  pricing: freemium
  onboarding: self-serve
  trial: false
  try_now: true
  public: false
  label: Freemium · Self-serve signup
  confidence: high
  source:
  - plans
  - authentication
  generated: '2026-07-22'
  method: derived
image: https://kinlane-images.s3.amazonaws.com/shared/apis-json/icons/nvidia-nim.png
access: 3rd-Party
common:
- type: AgenticAccess
  url: agentic-access/nvidia-nim-agentic-access.yml
- type: VulnerabilityDisclosure
  url: security/nvidia-nim-vulnerability-disclosure.yml
- type: DomainSecurity
  url: security/nvidia-nim-domain-security.yml
- type: Authentication
  url: authentication/nvidia-nim-authentication.yml
- name: Agent Skills
  url: https://github.com/NVIDIA/skills
  type: AgentSkills
- type: PostmanWorkspace
  url: https://www.postman.com/kinlaneapi/nvidia-nim/overview
- type: Arazzo
  url: arazzo/nvidia-nim-bionemo-drug-discovery-workflow.yml
  name: NVIDIA NIM BioNeMo Drug Discovery
- type: Arazzo
  url: arazzo/nvidia-nim-discover-and-chat-workflow.yml
  name: NVIDIA NIM Discover And Chat
- type: Arazzo
  url: arazzo/nvidia-nim-generate-image-and-caption-workflow.yml
  name: NVIDIA NIM Generate Image And Caption
- type: Arazzo
  url: arazzo/nvidia-nim-health-gated-completion-workflow.yml
  name: NVIDIA NIM Health Gated Completion
- type: Arazzo
  url: arazzo/nvidia-nim-rag-rerank-answer-workflow.yml
  name: NVIDIA NIM RAG Rerank And Answer
- type: Arazzo
  url: arazzo/nvidia-nim-vision-describe-and-summarize-workflow.yml
  name: NVIDIA NIM Vision Describe And Summarize
- type: Arazzo
  url: arazzo/nvidia-nim-voice-assistant-loop-workflow.yml
  name: NVIDIA NIM Voice Assistant Loop
- type: Portal
  url: https://build.nvidia.com
- type: Documentation
  url: https://docs.nvidia.com/nim/index.html
- type: Documentation
  url: https://docs.api.nvidia.com/nim/reference/llm-apis
- type: Documentation
  url: https://developer.nvidia.com/nim
- type: GettingStarted
  url: https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html
- type: Signup
  url: https://build.nvidia.com/explore/discover
- type: Sandbox
  url: https://build.nvidia.com/explore/discover
- type: Pricing
  url: https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
- type: GitHubOrganization
  url: https://github.com/NVIDIA
- type: GitHubOrganization
  url: https://github.com/NVIDIA-NIM-Agent-Blueprints
- type: StatusPage
  url: https://status.nvidia.com
- type: Blog
  url: https://developer.nvidia.com/blog/category/generative-ai/
- type: Blog
  url: https://blogs.nvidia.com/blog/category/generative-ai/
- type: Forums
  url: https://forums.developer.nvidia.com/c/ai-data-science/nemo-llm-service/
- type: TrustCenter
  url: https://www.nvidia.com/en-us/about-nvidia/legal-info/
- type: TermsOfService
  url: https://www.nvidia.com/en-us/about-nvidia/terms-of-service/
- type: PrivacyPolicy
  url: https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
- type: Documentation
  url: https://docs.nvidia.com/nim-operator/latest/index.html
  name: NIM Operator Documentation
- type: SDKs
  url: https://github.com/NVIDIA/nim-deploy
  name: NIM Deploy (Helm Charts and Reference Implementations)
- type: SDKs
  url: https://github.com/NVIDIA/k8s-nim-operator
  name: Kubernetes NIM Operator
- type: SDKs
  url: https://github.com/NVIDIA/GenerativeAIExamples
  name: Generative AI Examples
- type: SDKs
  url: https://github.com/NVIDIA/NeMo
  name: NeMo Toolkit
- type: SDKs
  url: https://github.com/NVIDIA/NeMo-Guardrails
  name: NeMo Guardrails
- type: SDKs
  url: https://github.com/NVIDIA/TensorRT-LLM
  name: TensorRT-LLM
- type: SDKs
  url: https://github.com/triton-inference-server/server
  name: Triton Inference Server
- type: SDKs
  url: https://github.com/langchain-ai/langchain-nvidia
  name: LangChain NVIDIA AI Endpoints
- type: SDKs
  url: https://pypi.org/project/openai/
  name: OpenAI Python SDK (compatible)
- type: CodeExamples
  url: https://github.com/NVIDIA/GenerativeAIExamples
  name: NVIDIA Generative AI Examples
- type: CodeExamples
  url: https://github.com/NVIDIA-AI-Blueprints
  name: NVIDIA AI Blueprints
- type: Models
  url: https://build.nvidia.com/explore/discover
- type: KubernetesCRD
  url: https://github.com/NVIDIA/k8s-nim-operator/tree/main/api
  name: NIMService / NIMCache / NIMPipeline CRDs
- type: RateLimits
  url: https://docs.api.nvidia.com/nim/reference/limits
- type: Versioning
  url: https://docs.nvidia.com/nim/large-language-models/latest/release-notes.html
- url: plans/nvidia-nim-plans-pricing.yml
  type: Plans
- url: rate-limits/nvidia-nim-rate-limits.yml
  type: RateLimits
- url: finops/nvidia-nim-finops.yml
  type: FinOps
- url: packages/nvidia-nim-packages.yml
  type: Packages
- url: well-known/nvidia-nim-well-known.yml
  type: WellKnown
- url: mcp/nvidia-nim-mcp.yml
  type: MCPServer
- url: llms/nvidia-nim-llms.txt
  type: LLMsTxt
- url: conformance/nvidia-nim-conformance.yml
  type: Conformance
- url: errors/nvidia-nim-problem-types.yml
  type: ErrorCatalog
- url: lifecycle/nvidia-nim-lifecycle.yml
  type: Lifecycle
- url: conventions/nvidia-nim-conventions.yml
  type: Conventions
- url: changelog/nvidia-nim-changelog.yml
  type: ChangeLog
- url: cli/nvidia-nim-cli.yml
  type: CLI
- url: data-model/nvidia-nim-data-model.yml
  type: DataModel
- url: grpc/nvidia-nim-riva_asr.proto
  type: Protobuf
  name: Riva Speech Recognition (ASR)
- url: grpc/nvidia-nim-riva_tts.proto
  type: Protobuf
  name: Riva Speech Synthesis (TTS)
- url: grpc/nvidia-nim-riva_nmt.proto
  type: Protobuf
  name: Riva Neural Machine Translation
- url: grpc/nvidia-nim-riva_audio.proto
  type: Protobuf
  name: Riva Audio Encoding
- url: grpc/nvidia-nim-riva_common.proto
  type: Protobuf
  name: Riva Common Types
- url: overlays/nvidia-nim-chat-completions-overlay.yaml
  type: Overlay
  name: Chat Completions enhancements
- url: overlays/nvidia-nim-completions-overlay.yaml
  type: Overlay
  name: Completions enhancements
- url: overlays/nvidia-nim-embeddings-overlay.yaml
  type: Overlay
  name: Embeddings enhancements
- url: overlays/nvidia-nim-reranking-overlay.yaml
  type: Overlay
  name: Reranking enhancements
- url: overlays/nvidia-nim-models-overlay.yaml
  type: Overlay
  name: Models enhancements
- url: overlays/nvidia-nim-vision-overlay.yaml
  type: Overlay
  name: Vision enhancements
- url: overlays/nvidia-nim-image-generation-overlay.yaml
  type: Overlay
  name: Image Generation enhancements
- url: overlays/nvidia-nim-speech-overlay.yaml
  type: Overlay
  name: Speech enhancements
- url: overlays/nvidia-nim-biology-overlay.yaml
  type: Overlay
  name: Biology (BioNeMo) enhancements
- url: overlays/nvidia-nim-health-overlay.yaml
  type: Overlay
  name: Health enhancements
- type: Features
  data:
  - OpenAI-compatible REST surface — /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models, /v1/ranking
  - 100+ foundation models exposed through a single API contract — Llama 3.1/3.2/3.3, Mistral, Mixtral, NVIDIA Nemotron, DeepSeek-R1,
    Qwen 2.5, Microsoft Phi, Google Gemma, IBM Granite, and Falcon
  - Free hosted inference at build.nvidia.com on DGX Cloud — 1,000 credits on signup, 40 RPM rate limit
  - Self-hosted deployment via Docker containers shipping TensorRT-LLM, vLLM, or SGLang inference engines
  - Kubernetes-native deployment via the NIM Operator with NIMService, NIMCache, NIMPipeline CRDs
  - GPU-aware autoscaling, persistent model caches, and rolling upgrades managed by the operator
  - Multi-tenant licensing through NVIDIA AI Enterprise (commercial production use)
  - NeMo Retriever NIMs for embeddings, reranking, OCR, and PDF-to-Markdown extraction in RAG pipelines
  - Vision Language Model NIMs reusing the chat-completions surface for multimodal inputs
  - NVIDIA Riva speech NIMs (Parakeet ASR, Canary translation, Magpie TTS) with HTTP and gRPC adapters
  - BioNeMo NIMs for AlphaFold2, ESMFold, ProtGPT2, MolMIM, DiffDock, RFDiffusion
  - Visual generative AI NIMs — FLUX.1, SDXL, Shutterstock Edify Image, Edify 3D
  - NeMo Guardrails for input/output safety and topic policy enforcement
  - Function calling, JSON mode, tool use, and structured outputs across compatible LLMs
  - Streaming via Server-Sent Events on chat/completions
  - Prometheus /v1/metrics scrape endpoint and /v1/health/{live,ready} probes for Kubernetes
  - LangChain, LlamaIndex, Haystack, OpenAI SDK, and direct REST client compatibility
  - NVIDIA AI Blueprints — full reference RAG, multimodal search, drug discovery, and digital human stacks
  - Available on DGX Cloud, AWS, Azure, Google Cloud, Oracle Cloud, GKE, EKS, AKS, OpenShift, and on-prem
  sources:
  - https://build.nvidia.com
  - https://docs.nvidia.com/nim/index.html
  - https://docs.api.nvidia.com/nim/reference/llm-apis
  - https://www.nvidia.com/en-us/ai-data-science/products/nim-microservices/
  - https://github.com/NVIDIA/k8s-nim-operator
  - https://github.com/NVIDIA/nim-deploy
  updated: '2026-05-25'
created: '2026-05-25'
modified: '2026-06-20'
position: Consuming
description: NVIDIA NIM (NVIDIA Inference Microservices) is a catalog of GPU-accelerated, containerized AI inference microservices
  that package optimized model engines (TensorRT-LLM, vLLM, SGLang, Triton) behind industry-standard OpenAI-compatible REST
  APIs. NIM covers large language models, embeddings and reranking, vision-language models, speech (Riva), visual generative
  AI, and biology (BioNeMo) — exposed identically whether consumed from the hosted endpoint at integrate.api.nvidia.com or
  self-hosted via Docker containers and the Kubernetes-native NIM Operator. NIM ships with NVIDIA AI Enterprise for commercial
  deployment and is the inference layer underneath NVIDIA AI Blueprints, NeMo Retriever, NeMo Guardrails, and the broader
  NVIDIA developer stack.
maintainers:
- FN: Kin Lane
  email: info@apievangelist.com
  X: apievangelist
  url: https://apievangelist.com
specificationVersion: '0.16'

NVIDIA NIM

APIs 11

Postman Collections 10

Open Collections 10

Arazzo Workflows 7

MCP Servers 1

Agent Skills 24

Pricing Plans 1

Rate Limits 1

FinOps 1

Features 19

Semantic Vocabularies 1

Spectral Rules 1

JSON Schema 2

Security Posture 3

Agentic Access 1

Get Started 4

Documentation 4

Agent Surfaces 5

Design & Contract 13

Build 16

Access & Security 4

Operate 5

Commercial 5

Company 2

Other 17

Source (apis.yml)