Mistral AI · Rate Limits

Mistral AI's la Plateforme exposes a chat-completions API at api.mistral.ai/v1 with per-account, per-model rate limits enforced as requests per second and tokens per minute. Specific per-tier numbers are not shown on the public documentation or pricing pages we sampled; they are surfaced in-product in the la Plateforme console, and limits can be raised via support. An HTTP 429 response with a Retry-After header indicates throttling.
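
The throttling signal described above can be detected on the client. The following is a minimal Python sketch, assuming a response is available as a status code plus a plain dict of headers (it is not tied to any particular HTTP client or to Mistral's SDK). `retry_after_seconds` is a hypothetical helper name. It accepts the numeric-seconds form of Retry-After and, defensively, the HTTP-date form that the HTTP specification also permits.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def retry_after_seconds(status: int, headers: Dict[str, str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return how long to wait (seconds) if the response signals throttling.

    Returns None when the status is not 429 or no usable Retry-After is present.
    """
    if status != 429:
        return None
    # HTTP header names are case-insensitive; normalise before lookup.
    raw = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if raw is None:
        return None
    raw = raw.strip()
    try:
        # Usual form: a number of seconds, e.g. "2".
        return max(0.0, float(raw))
    except ValueError:
        pass
    # Fallback: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```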

Mistral AI Rate Limits is the machine-readable rate-limit profile for Mistral AI on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 3 rate-limit definitions, measuring requests_per_second, tokens_per_minute, and concurrent_requests.

The profile also defines 4 policies (retry behavior, per-model scoping, tier upgrades, and reasoning models) and documents response codes for the throttled and serviceUnavailable conditions.
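
The two response-code conditions named above can be mapped onto HTTP statuses in client code. In this small sketch, pairing `throttled` with 429 follows from the profile. Pairing `serviceUnavailable` with HTTP 503 is an assumption based on the standard meaning of that status. `ThrottleCondition`, `classify`, and `should_retry` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class ThrottleCondition(Enum):
    # Values mirror the profile's response-code labels.
    THROTTLED = "throttled"                     # HTTP 429 Too Many Requests
    SERVICE_UNAVAILABLE = "serviceUnavailable"  # HTTP 503 (assumed mapping)


_STATUS_TO_CONDITION = {
    429: ThrottleCondition.THROTTLED,
    503: ThrottleCondition.SERVICE_UNAVAILABLE,
}


def classify(status: int) -> Optional[ThrottleCondition]:
    """Map an HTTP status to a documented throttling condition, or None."""
    return _STATUS_TO_CONDITION.get(status)


def should_retry(status: int) -> bool:
    """Both conditions are treated here as transient, so they are retried after a delay."""
    return classify(status) is not None
```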

Tagged areas include Rate Limiting, AI, and Large Language Models.

Limits

Requests per second (per model, per workspace)
Metric: requests_per_second · Scope: account
Limit: See la Plateforme console; not publicly published per tier

Tokens per minute (per model, per workspace)
Metric: tokens_per_minute · Scope: account
Limit: See la Plateforme console; not publicly published per tier

Concurrent requests
Metric: concurrent_requests · Scope: account
Limit: See la Plateforme console
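
Because the three limits above have no published values, one way to stay under them is to enforce caller-chosen caps on the client. The sketch below is an illustration under assumptions, not Mistral's enforcement algorithm. A token bucket approximates `requests_per_second`, a 60-second sliding window approximates `tokens_per_minute`, and an in-flight counter covers `concurrent_requests`. The class name, its constructor arguments, and the idea of passing an estimated token count per request are all hypothetical.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Tuple


class ClientRateLimiter:
    """Client-side guard for the profile's three metrics, using caller-chosen caps."""

    def __init__(self, requests_per_second: float, tokens_per_minute: int,
                 concurrent_requests: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rps = float(requests_per_second)
        self.tpm = tokens_per_minute
        self.max_concurrent = concurrent_requests
        self.clock = clock
        self._lock = threading.Lock()
        # Token bucket for requests_per_second; holds at least one request's worth.
        self._capacity = max(1.0, self.rps)
        self._req_tokens = self._capacity
        self._last_refill = clock()
        # Sliding 60-second log of (timestamp, tokens) for tokens_per_minute.
        self._token_log: Deque[Tuple[float, int]] = deque()
        self._tokens_in_window = 0
        # Requests currently in flight, for concurrent_requests.
        self._in_flight = 0

    def try_acquire(self, estimated_tokens: int) -> bool:
        """Reserve capacity for one request; return False if any cap would be exceeded."""
        with self._lock:
            now = self.clock()
            # Refill the request bucket in proportion to elapsed time.
            self._req_tokens = min(self._capacity,
                                   self._req_tokens + (now - self._last_refill) * self.rps)
            self._last_refill = now
            # Forget token usage older than 60 seconds.
            while self._token_log and now - self._token_log[0][0] >= 60.0:
                self._tokens_in_window -= self._token_log.popleft()[1]
            if (self._req_tokens < 1.0
                    or self._tokens_in_window + estimated_tokens > self.tpm
                    or self._in_flight >= self.max_concurrent):
                return False
            self._req_tokens -= 1.0
            self._token_log.append((now, estimated_tokens))
            self._tokens_in_window += estimated_tokens
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Mark one in-flight request as finished."""
        with self._lock:
            self._in_flight = max(0, self._in_flight - 1)
```

A caller would copy its actual caps from the console, call `try_acquire` before each request (pausing briefly and trying again on `False`), and call `release` when the response arrives. The server remains the authority. A client-side guard only makes 429 responses less frequent.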

Policies

Honor Retry-After
429 responses include a Retry-After header (in seconds). Wait at least that long before retrying, and apply exponential backoff with jitter to repeated retries.
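
The following sketch shows one way to implement this retry rule, under several assumptions. `send` is a hypothetical zero-argument callable that performs one request and returns `(status, headers, body)`. The code parses only the numeric-seconds form of Retry-After, because that is the form the profile describes. The jitter scheme is "full jitter" (a uniform draw between zero and the capped exponential delay), which is one common choice and not something Mistral prescribes. The default delays are illustrative only. Each wait is the larger of the server's hint and the jittered backoff, so the client never retries sooner than the server asked.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Dict[str, str], str]


def call_with_retries(send: Callable[[], Response],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rand: Callable[[], float] = random.random) -> Response:
    """Call `send` until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            # Success, a non-throttling error, or out of attempts: hand back as-is.
            return status, headers, body
        # Full jitter: a uniform draw in [0, capped exponential delay).
        backoff = rand() * min(max_delay, base_delay * 2 ** (attempt - 1))
        # Retry-After is read as a number of seconds, as the profile describes.
        retry_after: Optional[float] = None
        raw = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        if raw is not None:
            try:
                retry_after = max(0.0, float(raw))
            except ValueError:
                retry_after = None  # non-numeric values are ignored in this sketch
        # Never retry sooner than the server asked for.
        sleep(backoff if retry_after is None else max(retry_after, backoff))
```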
Per-model scoping
Limits are enforced per model; heavy use of one model does not throttle others unless the workspace-wide budget is exhausted.
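
Per-model scoping with a shared workspace ceiling can be modelled as two layers of accounting. This sketch reflects one reading of the policy and is not Mistral's implementation. Each model gets its own tokens-per-minute window, and every request is also charged to a workspace-wide window. The class, the cap values, and the model names used with it are hypothetical. An unknown model name raises `KeyError`.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class ScopedTokenBudget:
    """Tokens-per-minute accounting at two scopes: per model and per workspace."""

    WINDOW = 60.0  # seconds

    def __init__(self, per_model_tpm: Dict[str, int], workspace_tpm: int) -> None:
        self.per_model_tpm = per_model_tpm  # placeholder caps, not published limits
        self.workspace_tpm = workspace_tpm
        self._model_log: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)
        self._workspace_log: Deque[Tuple[float, int]] = deque()

    def _used(self, log: Deque[Tuple[float, int]], now: float) -> int:
        # Evict entries that have left the 60-second window, then sum the rest.
        while log and now - log[0][0] >= self.WINDOW:
            log.popleft()
        return sum(tokens for _, tokens in log)

    def try_consume(self, model: str, tokens: int, now: float) -> bool:
        """Record usage only if neither the model's cap nor the workspace cap is exceeded."""
        model_log = self._model_log[model]
        if self._used(model_log, now) + tokens > self.per_model_tpm[model]:
            return False  # this model's own budget is exhausted
        if self._used(self._workspace_log, now) + tokens > self.workspace_tpm:
            return False  # the shared workspace budget is exhausted
        model_log.append((now, tokens))
        self._workspace_log.append((now, tokens))
        return True
```

With hypothetical caps of 100 tokens per model and 150 per workspace, exhausting `model-a` leaves `model-b` usable until the combined total reaches 150. After that, both models are refused until the window rolls over.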
Tier upgrades
Higher per-tier rate limits are unlocked by adding a payment method and incurring usage; explicit limit raises can be requested via support.
Reasoning models
Reasoning-effort settings can significantly increase output tokens; size tokens-per-minute budgets for the worst case.
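
The sizing advice reduces to simple arithmetic. This minimal sketch assumes that the tokens-per-minute limit counts prompt and output tokens together (the profile does not say) and that the request's output ceiling, such as a max-tokens setting, is the worst case. `max_requests_per_minute` is a hypothetical helper, and the numbers in the usage note are illustrative, not Mistral limits.

```python
def max_requests_per_minute(tokens_per_minute: int, prompt_tokens: int,
                            max_output_tokens: int) -> int:
    """Requests per minute that fit under a TPM cap if every call hits its output ceiling."""
    worst_case_per_request = prompt_tokens + max_output_tokens
    if worst_case_per_request <= 0:
        raise ValueError("token counts must sum to a positive number")
    return tokens_per_minute // worst_case_per_request
```

Take a hypothetical 100,000-token cap and 2,000-token prompts. Budgeting for a typical 500-token reply allows 40 requests per minute, but budgeting for an 8,000-token worst case allows only 10. A scheduler planned on the typical figure would start hitting 429s once long reasoning outputs appear.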

Sources