Agent Skill · NVIDIA NIM

nemotron-speech

Routes NVIDIA Nemotron Speech (Riva) NIM tasks: deploying, running, and testing ASR, TTS, and NMT NIMs, either cloud-hosted on build.nvidia.com or self-hosted.

Provider: NVIDIA NIM
Path in repo: skills/nemotron-speech/SKILL.md

Skill body

Nemotron Speech Skills

Note: “Nemotron Speech” is the public-facing name for what NVIDIA documents today as Riva / Riva NIM. All commands, container images, gRPC APIs, Python imports, and documentation URLs still use “Riva” — the rename is brand-only. Do not rename commands, images, or doc URLs.

Agent: When walking the user through a multi-step workflow, announce each step before presenting it: Step N/M — Step Title (e.g., “Step 1/4 — Deploy the Container”).
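The announcement format above can be sketched as a small helper. This is only an illustration: the function name `announce_step` and its range check are assumptions, not part of the skill itself.

```python
def announce_step(n: int, total: int, title: str) -> str:
    """Format a step banner in the 'Step N/M, dash, Step Title' pattern."""
    if not 1 <= n <= total:
        raise ValueError(f"step {n} is outside 1..{total}")
    # \u2014 is the dash character used in the skill's own example.
    return f"Step {n}/{total} \u2014 {title}"


if __name__ == "__main__":
    print(announce_step(1, 4, "Deploy the Container"))
```

Building the banner from the step index and total keeps the N/M counter consistent across a multi-step walkthrough.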

Purpose

Single entry point for all NVIDIA Nemotron Speech (Riva) NIM workflows: ASR (speech-to-text), TTS (text-to-speech), and NMT (translation). Covers cloud-hosted inference via build.nvidia.com, self-hosted Docker deployment, client-protocol choice for ASR (gRPC, HTTP, WebSocket), custom NeMo model deployment via riva-build, ASR pipeline tuning (VAD, diarization, language models), and the prerequisite Docker / NGC / driver setup.

When to Use This Skill

Use this skill for any Nemotron Speech / Riva NIM task — deployment, testing, custom model build, system requirements check, or model selection across ASR / TTS / NMT modalities.

Workflow

Identify the user’s task type, then load the corresponding reference file from references/. The reference files contain the detailed per-workflow content; this SKILL.md is a routing surface. Load only the reference relevant to the task at hand.

Prerequisites

Instructions

Source of truth

For per-release detail — current model catalog, container IDs, function IDs, voice lists, VRAM minimums, per-model feature support — fetch or open the canonical NVIDIA doc rather than relying on text in this SKILL.md or the references. Each reference file includes its own routing table to the relevant doc pages.

Top-level landing pages:

Topic | URL
ASR support matrix | https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/asr.html
TTS support matrix | https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/tts.html
NMT support matrix | https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/nmt.html
Prerequisites (driver / GPU / OS) | https://docs.nvidia.com/nim/speech/latest/get-started/prerequisites.html
ASR pipeline configuration | https://docs.nvidia.com/nim/speech/latest/asr/customization/pipeline-configuration.html
ASR runtime customization | https://docs.nvidia.com/nim/speech/latest/asr/customization/customization.html
Cloud function IDs (per model) | https://build.nvidia.com/<org>/<model>/api
NGC catalog | https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/models
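The three support-matrix URLs listed above share one pattern, so an agent could derive the page for a modality instead of hard-coding each link. The sketch below assumes that pattern holds; the names `SUPPORT_MATRIX_BASE`, `MODALITIES`, and `support_matrix_url` are hypothetical, and the function only builds the URL (it does not fetch it).

```python
SUPPORT_MATRIX_BASE = (
    "https://docs.nvidia.com/nim/speech/latest/reference/support-matrix"
)
MODALITIES = ("asr", "tts", "nmt")


def support_matrix_url(modality: str) -> str:
    """Return the support-matrix page URL for one of the listed modalities."""
    key = modality.strip().lower()
    if key not in MODALITIES:
        raise ValueError(f"unknown modality {modality!r}; expected one of {MODALITIES}")
    return f"{SUPPORT_MATRIX_BASE}/{key}.html"


if __name__ == "__main__":
    for name in MODALITIES:
        print(name, support_matrix_url(name))
```

Rejecting unknown modalities avoids silently producing a URL that the documentation site may not serve.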

Examples

“Deploy a Parakeet ASR NIM” → load references/asr.md, follow Option B (self-hosted), Steps 1–4.

“Synthesize speech with Magpie” → load references/tts.md, follow Option A (cloud) or Option B (self-hosted).

“Translate English to German” → load references/nmt.md, follow the 4-step flow.

“Convert my fine-tuned .nemo to a NIM” → load references/asr-custom.md for the 4-phase pipeline and references/pipelines.md for build-time config.

“Can my GPU run this?” → load references/deployment-readiness-checks.md and run the 6-step system check.

“Which Riva model should I use?” → load references/model-selection.md, apply the decision framework, then fetch the support matrix for the specific current model name.
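The routing in the examples above amounts to a lookup from task type to reference file(s). A minimal sketch follows; the task-type keys (`"asr"`, `"custom-model"`, and so on) and the names `ROUTES` and `references_for` are hypothetical labels chosen for illustration, while the file paths are the ones named in the examples.

```python
# Task type -> reference files to load, taken from the examples above.
ROUTES = {
    "asr": ["references/asr.md"],
    "tts": ["references/tts.md"],
    "nmt": ["references/nmt.md"],
    "custom-model": ["references/asr-custom.md", "references/pipelines.md"],
    "readiness": ["references/deployment-readiness-checks.md"],
    "model-selection": ["references/model-selection.md"],
}


def references_for(task_type: str) -> list[str]:
    """Return only the reference files relevant to the given task type."""
    try:
        return list(ROUTES[task_type])  # copy so callers cannot mutate ROUTES
    except KeyError:
        raise ValueError(
            f"unknown task type {task_type!r}; expected one of {sorted(ROUTES)}"
        ) from None


if __name__ == "__main__":
    print(references_for("custom-model"))
```

Keeping the mapping in one table mirrors the instruction to load only the reference relevant to the task at hand.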

Naming & Terminology

Troubleshooting

For task-specific runtime or modality issues, use the relevant reference file (references/<task>.md). For cross-cutting readiness checks, use references/deployment-readiness-checks.md.

Limitations

Next Steps

Skill frontmatter

triggers: Nemotron Speech, deploy Riva NIM, deploy ASR/TTS/NMT NIM, Riva ASR, Riva TTS, Riva translation, Parakeet, Canary, Whisper, Nemotron ASR Streaming, Magpie TTS, DNT tag, nemo2riva, riva-build, riva-deploy, RMIR, Riva NIM setup, NGC API key, force_eou, Silero VAD, Sortformer diarization, chunk size Riva, Riva HTTP, Riva WebSocket, grpc.nvcf.nvidia.com, build.nvidia.com Riva
version: 1.0.0
license: Apache-2.0
metadata: {"author": "Nemotron Speech Team", "team": "riva", "tags": ["nvidia", "nemotron-speech", "riva", "nim", "asr", "tts", "nmt", "speech", "speech-to-text", "text-to-speech", "translation", "parakeet", "canary", "whisper", "magpie", "nemotron", "grpc", "http", "websocket", "cloud", "nvcf"], "domain": "ml"}