Google Gemini Rate Limits

Name: Google Gemini Rate Limits
Creator: Google Gemini
Keywords: Generative AI, LLM, Google, Rate Limiting

Gemini API rate limits are scoped per usage tier (Free, Tier 1, Tier 2, Tier 3) and per model. Each tier defines RPM (requests per minute), TPM (tokens per minute), and RPD (requests per day) ceilings. Tier promotion is automatic based on cumulative spend and account age. Specific numerical limits are not statically published per model; they are visible in Google AI Studio per project. The Batch API has separate limits.

Google Gemini Rate Limits is the machine-readable rate-limit profile for Google Gemini on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 7 rate-limit definitions. Their metrics are monthly_spend_cap_USD, concurrent_requests, bytes, and tokens; the Free tier metric is recorded as "varies" because its ceilings depend on the model.

The profile also defines 5 backoff/retry policies and documents the response codes for the throttled, resourceExhausted, and quotaExceeded conditions.

Tagged areas include Generative AI, LLM, Google, and Rate Limiting.

7 limits. Throttle response code: 429. Quota response code: 403.

Limits

Free tier
Scope: project
Metric: varies
Limit: see AI Studio rate-limit page
Notes: Active project or free trial. Lowest RPM/TPM/RPD ceilings; varies by model.

Tier 1
Scope: project
Metric: monthly_spend_cap_USD
Limit: 250
Notes: Linked billing account. $250 monthly spend cap; higher RPM/TPM/RPD than Free.

Tier 2
Scope: project
Metric: monthly_spend_cap_USD
Limit: 2000
Notes: Reached after $100+ spent and 3 days on the account. $2,000 monthly spend cap.

Tier 3
Scope: project
Metric: monthly_spend_cap_USD
Limit: 100000
Notes: Reached after $1,000+ spent and 30 days. Spend cap ranges $20,000-$100,000+ subject to review.

Batch API concurrent batch requests
Scope: project
Metric: concurrent_requests
Limit: 100

Batch API input file size
Scope: batch_request
Metric: bytes
Limit: 2147483648
Notes: 2 GB.

Batch API enqueued tokens
Scope: project/model
Metric: tokens
Limit: see model-specific batch quota
Notes: Ranges from millions to billions depending on model and tier.
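
The two fixed Batch API limits above (100 concurrent batch requests per project, 2 GB per input file) lend themselves to a client-side pre-flight check. The following is a minimal sketch, not part of any Google SDK: the function name `can_submit_batch` is hypothetical, and treating a file of exactly 2,147,483,648 bytes as allowed is an assumption the profile does not confirm.

```python
# Values copied from the limits listed in this profile.
MAX_BATCH_INPUT_BYTES = 2_147_483_648  # 2 GB (2**31 bytes)
MAX_CONCURRENT_BATCHES = 100           # per project


def can_submit_batch(input_size_bytes: int, active_batches: int) -> tuple[bool, str]:
    """Hypothetical pre-submit check; returns (ok, reason).

    Assumption: the file-size limit is inclusive.
    """
    if input_size_bytes > MAX_BATCH_INPUT_BYTES:
        return False, "input file exceeds 2 GB"
    if active_batches >= MAX_CONCURRENT_BATCHES:
        return False, "concurrent batch limit reached"
    return True, "ok"
```

In practice `input_size_bytes` could come from `os.path.getsize(path)`, and `active_batches` from whatever bookkeeping your application keeps for in-flight batch jobs. The enqueued-token limit is left out because its value depends on model and tier.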

Policies

Tier promotion

Tier upgrades happen automatically once the spend and account-age thresholds are met. Higher tiers grant higher RPM/TPM/RPD across all models in your project.
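
The promotion thresholds listed in this profile can be expressed as a small lookup. This is an illustrative sketch only: the function `eligible_tier` is hypothetical, it reads "$100+" and "$1,000+" as greater-than-or-equal comparisons, and it does not model the review step that Tier 3 spend caps are subject to.

```python
def eligible_tier(total_spend_usd: float, account_age_days: int,
                  billing_linked: bool) -> str:
    """Return the highest tier whose documented thresholds are met.

    Thresholds come from this profile; the >= interpretation is an assumption.
    """
    if not billing_linked:
        return "Free"
    if total_spend_usd >= 1000 and account_age_days >= 30:
        return "Tier 3"
    if total_spend_usd >= 100 and account_age_days >= 3:
        return "Tier 2"
    return "Tier 1"
```

Both conditions must hold for a promotion, so an account that has spent $5,000 in its first 10 days would still map to Tier 2 under this reading.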

Live rate-limit visibility

View your current per-model RPM/TPM/RPD on the Google AI Studio Rate Limits page. The numeric values are not statically documented because they vary by tier and by model.
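
Because the numbers have to be read from AI Studio, a client-side guard can take them as configuration rather than hard-coding them. The sketch below is one possible approach, not an official mechanism: the class `MinuteWindowLimiter` is hypothetical, it uses a 60-second sliding window, and it covers only RPM and TPM (a daily RPD counter would need separate state).

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side guard for per-minute request and token budgets.

    rpm and tpm are values you copy from AI Studio for your project and model.
    """

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) within the last 60 seconds

    def _evict(self, now: float) -> None:
        # Drop entries that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both budgets."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and wait or queue when it returns `False`. The server remains the source of truth, so a 429 can still occur and should be handled.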

429 backoff

On a 429 ResourceExhausted response, retry using exponential backoff with jitter, and respect any retry hint metadata returned by the API.
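
A minimal sketch of this policy follows. It is deliberately independent of any SDK: `RateLimitedError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, its `retry_after` attribute is an assumed way of carrying the server's retry hint, and the delay parameters are illustrative defaults rather than recommended values.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 ResourceExhausted")
        self.retry_after = retry_after  # seconds, if the server sent a hint


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimitedError with jittered exponential delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Exponential cap: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": pick a random delay in [0, cap).
            delay = rng() * cap
            # Never wait less than the server's hint, when one is present.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            sleep(delay)
```

The `sleep` and `rng` parameters exist so the function can be exercised without real waiting; in production code the defaults are usually sufficient. Jitter spreads retries from many clients over time so they do not all hit the API again at the same instant.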

Batch API discount

The Batch API offers 50% lower per-token cost than synchronous calls and uses separate quota buckets, which makes it a primary FinOps lever for non-latency-sensitive workloads.
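
The effect of the 50% discount on a given workload is simple arithmetic, sketched below. The function `estimate_costs_usd` is hypothetical, and the per-million-token price in the usage note is a placeholder, not a published Gemini price; substitute the current price for your model.

```python
BATCH_DISCOUNT = 0.50  # 50% lower per-token cost, per this profile


def estimate_costs_usd(tokens: int, sync_price_per_million_usd: float) -> tuple[float, float]:
    """Return (synchronous_cost, batch_cost) in USD for a token volume."""
    sync_cost = tokens / 1_000_000 * sync_price_per_million_usd
    batch_cost = sync_cost * (1 - BATCH_DISCOUNT)
    return sync_cost, batch_cost
```

For example, at a placeholder price of $2.00 per million tokens, `estimate_costs_usd(10_000_000, 2.0)` gives `(20.0, 10.0)`: the same 10 million tokens cost $20 synchronously and $10 through the Batch API.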

Vertex AI alternative

For higher throughput needs, use Gemini via Vertex AI with provisioned throughput. Limits there are governed by Vertex AI quotas, which can be raised through the Cloud Console.
