NVIDIA NIM

NVIDIA NIM Reranking API

NeMo Retriever cross-encoder reranking endpoint (/v1/ranking) that scores candidate passages against a query. It improves retrieval relevance in RAG pipelines and supports the nvidia/llama-3.2-nv-rerankqa-1b-v2 and nvidia/nv-rerankqa-mistral-4b-v3 models. A request carries a query plus a list of passages. The response returns each passage's original index with a relevance logit, sorted from most to least relevant.
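A minimal Python sketch of calling the endpoint with only the standard library. It assumes the request shape given in the OpenAPI spec on this page and the NVIDIA-hosted server URL. The helper names build_ranking_request and rank_passages are hypothetical, not part of any NVIDIA SDK.

```python
import json
import urllib.request

# NVIDIA-hosted server from the spec's `servers` list; a self-hosted
# container would use "http://localhost:8000" instead.
BASE_URL = "https://integrate.api.nvidia.com"


def build_ranking_request(model, query, passages, truncate="END"):
    """Build a RankingRequest body (hypothetical helper).

    The spec wraps the query and every passage in an object with a `text` field.
    """
    if truncate not in ("NONE", "END"):
        raise ValueError("truncate must be 'NONE' or 'END'")
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
        "truncate": truncate,
    }


def rank_passages(payload, api_key, base_url=BASE_URL, timeout=30):
    """POST a RankingRequest to /v1/ranking and return the decoded JSON.

    urlopen raises urllib.error.HTTPError for non-2xx statuses such as
    400 (invalid request), 401 (bad API key) or 429 (rate limited).
    """
    req = urllib.request.Request(
        url=f"{base_url}/v1/ranking",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # BearerAuth scheme from the spec
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, rank_passages(build_ranking_request("nvidia/llama-3.2-nv-rerankqa-1b-v2", "What is a reranker?", ["passage one", "passage two"]), api_key=...) sends one request and returns a dict with a rankings list.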

Documentation

https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/api-reference.html

Specifications

OpenAPI

https://raw.githubusercontent.com/api-evangelist/nvidia-nim/refs/heads/main/openapi/nvidia-nim-reranking-api-openapi.yml

OpenAPI Specification

openapi: 3.1.0
info:
  title: NVIDIA NIM Reranking API
  description: 'NeMo Retriever cross-encoder reranking endpoint that scores candidate passages against a query to improve retrieval relevance in RAG pipelines. Supports the nvidia/llama-3.2-nv-rerankqa-1b-v2 and nvidia/nv-rerankqa-mistral-4b-v3 reranker models.

    '
  version: '2026-05-25'
  contact:
    name: NVIDIA Developer Support
    url: https://forums.developer.nvidia.com/c/ai-data-science/nemo-llm-service/
  license:
    name: NVIDIA AI Enterprise License
    url: https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
servers:
- url: https://integrate.api.nvidia.com
  description: NVIDIA-hosted NIM endpoint
- url: http://localhost:8000
  description: Self-hosted NIM container default
security:
- BearerAuth: []
tags:
- name: Reranking
  description: Cross-encoder reranking operations
paths:
  /v1/ranking:
    post:
      summary: Rank Candidate Passages
      description: Score candidate passages against a query using a NeMo Retriever reranker.
      operationId: rankPassages
      tags:
      - Reranking
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RankingRequest'
      responses:
        '200':
          description: Ranked passages with relevance scores.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RankingResponse'
        '400':
          description: Invalid request.
        '401':
          description: Missing or invalid API key.
        '429':
          description: Rate limit exceeded.
components:
  schemas:
    RankingRequest:
      type: object
      required:
      - model
      - query
      - passages
      properties:
        model:
          type: string
          description: Reranker model ID, e.g. `nvidia/llama-3.2-nv-rerankqa-1b-v2`, `nvidia/nv-rerankqa-mistral-4b-v3`.
        query:
          type: object
          required:
          - text
          properties:
            text:
              type: string
        passages:
          type: array
          items:
            type: object
            required:
            - text
            properties:
              text:
                type: string
        truncate:
          type: string
          enum:
          - NONE
          - END
          default: END
    RankingResponse:
      type: object
      properties:
        rankings:
          type: array
          description: Scored passages, sorted from most to least relevant.
          items:
            type: object
            properties:
              index:
                type: integer
                description: Original index in the request `passages` array.
              logit:
                type: number
                description: Raw cross-encoder relevance logit (higher = more relevant).
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: nvapi-...
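Each RankingResponse entry carries only an index into the request's passages array and a logit, so callers map results back to their own passages. A minimal sketch, assuming the response has already been decoded into a Python dict. order_passages and logit_to_probability are hypothetical helpers, and squashing logits through a logistic sigmoid is an assumed convention for getting 0-to-1 scores; the spec itself only defines raw logits.

```python
import math


def order_passages(passages, response, top_k=None):
    """Return (passage, logit) pairs, most relevant first (hypothetical helper).

    Re-sorts by logit defensively instead of relying on server order, then
    follows each `index` back into the original passages list.
    """
    rankings = sorted(response["rankings"], key=lambda r: r["logit"], reverse=True)
    if top_k is not None:
        rankings = rankings[:top_k]
    return [(passages[r["index"]], r["logit"]) for r in rankings]


def logit_to_probability(logit):
    """Map a raw logit into (0, 1) with a numerically stable logistic sigmoid.

    Assumed convention only: the API returns raw logits, not probabilities.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # avoids overflow from exp(-logit) for very negative logits
    return z / (1.0 + z)
```

Keeping the raw logits for ordering and converting only when a bounded score is needed (for example, a relevance threshold) leaves the ranking unchanged, because the sigmoid is monotonic.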