Gemini logo

Gemini

Google's Gemini API provides access to state-of-the-art generative AI models for text generation, multimodal understanding, code generation, and more.

15 APIs 11 Features
AgentsArtificial IntelligenceAudio UnderstandingBatch ProcessingDeep ResearchDocument UnderstandingEmbeddingsFunction CallingGenerative AiImage GenerationLarge Language ModelsMachine LearningMultimodalStructured OutputText-To-SpeechVideo GenerationVideo Understanding

APIs

Gemini REST API

REST API for accessing Gemini models for text generation, chat, embeddings, and multimodal tasks.

Gemini Python SDK

Python client library for the Gemini API.

Gemini Node.js SDK

Node.js client library for the Gemini API.

Gemini Go SDK

Go client library for the Gemini API, providing an interface for developers to integrate Google generative models into Go applications.

Gemini Java SDK

Java client library for the Gemini API, providing an interface for developers to integrate Google generative models into Java applications.

Gemini C# SDK

C# client library for the Gemini API, providing an interface for developers to integrate Google generative models into .NET applications.

Gemini Live API

Low-latency bidirectional streaming API enabling real-time voice and video interactions with Gemini models over WebSocket connections.

Gemini Interactions API

Unified interface for interacting with Gemini models and agents, simplifying state management, tool orchestration, and long-running tasks as an improved alternative to generateC...

Gemini Image Generation API

Image generation capabilities through the Gemini API, supporting text-to-image generation, image editing, and multi-turn conversational editing.

Gemini Video Generation API

Video generation capabilities through the Gemini API powered by Veo, supporting text-to-video and image-to-video generation in resolutions up to 4K.

Gemini Text-to-Speech API

Native audio generation text-to-speech capabilities through the Gemini API, supporting single and multi-speaker speech synthesis with natural language control over style, accent...

Gemini Files API

API for uploading and managing media files for use with Gemini models, supporting images, audio, video, and documents up to 2 GB per file with 20 GB per project storage.

Gemini Embeddings API

Text embedding capabilities through the Gemini API, generating vector representations for semantic search, classification, clustering, and retrieval augmented generation (RAG) a...

Gemini Batch API

Asynchronous batch processing API for submitting large volumes of Gemini API requests at 50 percent of the standard cost, with support for content generation, embeddings, and Op...

Gemini Deep Research API

Agentic research capability powered by the Interactions API that autonomously plans, executes, and synthesizes multi-step research tasks using web search and URL context to prod...

Features

Multimodal Understanding

Process and understand text, images, audio, video, and documents in a single model.

Function Calling

Define custom functions that Gemini can invoke to interact with external systems and APIs.

Structured Output

Generate JSON responses conforming to specified schemas for reliable data extraction.

Context Caching

Cache large context windows to reduce latency and cost for repeated queries.

Code Execution

Execute Python code in a sandboxed environment for computational tasks.

Grounding with Google Search

Ground model responses with real-time Google Search results for factual accuracy.

Live Streaming API

Real-time bidirectional voice and video interactions over WebSocket connections.

Image and Video Generation

Generate images and videos from text prompts using Gemini and Veo models.

Text-to-Speech

Native audio generation with multi-speaker support and natural language style control.

Deep Research

Autonomous multi-step research agent that synthesizes cited reports from web sources.

Thinking Mode

Extended reasoning capability for complex problem-solving and analysis tasks.

Use Cases

AI-Powered Chatbots

Build conversational AI assistants with multimodal understanding and function calling.

Document Processing

Extract structured data from documents, PDFs, and images using vision capabilities.

Content Generation

Generate text, images, and video content with AI for marketing and creative workflows.

Code Generation

Generate, explain, and debug code across multiple programming languages.

Semantic Search

Build search systems using Gemini embeddings for semantic similarity matching.

Real-Time Translation

Translate text and audio in real-time using multimodal capabilities.

Integrations

Google Cloud Vertex AI

Access Gemini models through Vertex AI for enterprise-grade deployment and management.

Google AI Studio

Prototype and test Gemini API calls with the web-based development environment.

LangChain

Use Gemini as a provider in LangChain for building AI application pipelines.

Firebase

Integrate Gemini with Firebase for mobile and web app AI features.

OpenAI Compatibility

Use Gemini through OpenAI-compatible API endpoints for easy migration.

Resources

🚀
GettingStarted
GettingStarted
🔑
Authentication
Authentication
💰
Pricing
Pricing
🔗
Models
Models
🔗
RateLimits
RateLimits
📜
TermsOfService
TermsOfService
📜
PrivacyPolicy
PrivacyPolicy
📰
Blog
Blog
💬
Support
Support
📦
SDK
SDK
📄
ChangeLog
ChangeLog
🟢
StatusPage
StatusPage
👥
GitHubRepository
GitHubRepository
🌐
Console
Console