Microsoft Azure API Management · Rate Limits

Name: Microsoft Azure Api Management Rate Limits
Creator: Microsoft Azure API Management
Keywords: Rate Limiting, API Gateway, API Management, Microsoft Azure

Azure API Management is itself a rate-limit/quota engine for downstream APIs. Built-in policies (rate-limit, rate-limit-by-key, quota, quota-by-key) let operators throttle by subscription key, IP, or arbitrary expression. Service-level capacity caps depend on the tier (scale units).

Microsoft Azure Api Management Rate Limits is the machine-readable rate-limit profile for Microsoft Azure API Management on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 4 rate-limit definitions, measured in requests_per_minute, requests_per_period, or a tier-dependent unit (listed as "varies").

The profile also defines 6 backoff/retry policies and documents the response code returned for throttled requests.

Tagged areas include Rate Limiting, API Gateway, API Management, and Microsoft Azure.

4 limits. Throttle response code: 429.

Limits

Name: Consumption tier per-subscription
Scope: subscription
Metric: requests_per_minute
Limit: see policy definition; default no service-level cap
Notes: Operators define limits via rate-limit / quota policies; consumption tier is metered per call rather than capped at the service level.

Name: Tier scale units
Scope: service
Metric: varies
Limit: Developer 1, Basic 2, Basic v2 10, Standard 4, Standard v2 10, Premium 12 per region, Premium v2 30
Notes: Each scale unit yields documented gateway throughput; configure rate-limit policies on top.

Name: Rate-limit policy (per minute)
Scope: subscription/key/expression
Metric: requests_per_minute
Limit: configurable per policy

Name: Quota policy (per period)
Scope: subscription/key/expression
Metric: requests_per_period
Limit: configurable per policy

Policies

rate-limit and rate-limit-by-key

Apply per-minute limits (or sub-minute limits, via a shorter fixed period) per subscription, key, or arbitrary key expression. The gateway returns 429 with a Retry-After header when a limit is exceeded.
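
The behavior described above can be illustrated with a simplified fixed-window counter. This is only a sketch for intuition: the class name, the `check` method, and the window logic are assumptions for illustration, not the gateway's actual algorithm, which the source does not specify.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FixedWindowLimiter:
    """Hypothetical stand-in for a per-key rate limit (not the gateway's code)."""

    calls: int             # allowed calls per window
    renewal_period: float  # window length in seconds
    _windows: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def check(self, key: str, now: float) -> Tuple[int, int]:
        """Return (status, retry_after_seconds) for one request at time `now`."""
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.renewal_period:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.calls:
            # Over the limit: reject and report seconds until the window resets.
            return 429, math.ceil(start + self.renewal_period - now)
        self._windows[key] = (start, count + 1)
        return 200, 0


limiter = FixedWindowLimiter(calls=2, renewal_period=60)
for t in (0, 1, 10, 60):
    print(t, limiter.check("subscription-a", now=t))
```

Each key (a subscription, an IP address, or any computed expression) gets its own counter, so one noisy caller does not consume another caller's allowance.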

quota and quota-by-key

Apply longer-window call or bandwidth quotas, renewable per hour, day, week, or month, per subscription or key.
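
As a rough sketch, a quota-by-key policy fragment can be generated as below. It assumes the attribute names `calls`, `bandwidth` (kilobytes), `renewal-period` (seconds), and `counter-key` from the published policy reference; the helper name `quota_by_key`, the sample values, and the 30-day approximation of a month are illustrative choices, so verify against the current reference before use.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SECONDS_PER_DAY = 86_400


def quota_by_key(counter_key: str, renewal_period_s: int,
                 calls: Optional[int] = None,
                 bandwidth_kb: Optional[int] = None) -> str:
    """Build a quota-by-key element (hypothetical helper, illustrative only)."""
    if calls is None and bandwidth_kb is None:
        raise ValueError("set calls, bandwidth_kb, or both")
    attrs = {}
    if calls is not None:
        attrs["calls"] = str(calls)
    if bandwidth_kb is not None:
        attrs["bandwidth"] = str(bandwidth_kb)
    attrs["renewal-period"] = str(renewal_period_s)
    attrs["counter-key"] = counter_key
    return ET.tostring(ET.Element("quota-by-key", attrs), encoding="unicode")


# Example: 10,000 calls per (approximate) month, counted per subscription.
fragment = quota_by_key("@(context.Subscription.Id)",
                        30 * SECONDS_PER_DAY, calls=10_000)
print(fragment)
```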

llm-token-limit

AI-gateway TPM (tokens-per-minute) and token-quota policy for LLM/AI backends. Supports per-subscription, per-IP, or arbitrary counter keys, with optional prompt-token pre-estimation before the request is forwarded to the backend.
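
The idea of a per-key tokens-per-minute budget with prompt pre-estimation can be sketched as follows. Everything here is hypothetical: the four-characters-per-token estimator, the class and method names, and the way estimated and actual counts are combined are assumptions for illustration, since the source does not describe the gateway's internals.

```python
import math
from typing import Dict, Tuple


def estimate_prompt_tokens(prompt: str) -> int:
    """Crude stand-in estimator (about 4 characters per token); an assumption."""
    return max(1, math.ceil(len(prompt) / 4))


class TokenPerMinuteLimiter:
    """Hypothetical per-key TPM budget; not the gateway's implementation."""

    def __init__(self, tokens_per_minute: int) -> None:
        self.tokens_per_minute = tokens_per_minute
        self._usage: Dict[str, Tuple[int, int]] = {}  # key -> (minute, tokens)

    def _used(self, key: str, now: float) -> int:
        minute = int(now // 60)
        stored_minute, used = self._usage.get(key, (minute, 0))
        return used if stored_minute == minute else 0

    def admit(self, key: str, prompt: str, now: float) -> bool:
        """Pre-check: refuse before forwarding if the estimate exceeds the budget."""
        estimate = estimate_prompt_tokens(prompt)
        return self._used(key, now) + estimate <= self.tokens_per_minute

    def record(self, key: str, total_tokens: int, now: float) -> None:
        """After the response: charge the token count the backend reported."""
        self._usage[key] = (int(now // 60), self._used(key, now) + total_tokens)


limiter = TokenPerMinuteLimiter(tokens_per_minute=100)
if limiter.admit("subscription-a", "Summarize this paragraph.", now=0):
    limiter.record("subscription-a", total_tokens=95, now=0)
print(limiter.admit("subscription-a", "Another prompt of similar size.", now=30))
```

Pre-estimation lets a request that would clearly exceed the budget be rejected without spending backend capacity on it.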

llm-emit-token-metric

Emit prompt/completion/total token counts as Application Insights custom metrics with configurable dimensions (Client IP, API ID, user ID, etc.) for AI cost attribution.

Capacity vs throttling

Tier scale units define peak capacity; operators must size scale units to match configured policy ceilings.
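
A back-of-the-envelope sizing check might look like the sketch below. The per-tier figures are the ones listed under "Tier scale units" in this profile; the per-unit throughput, the 20% headroom, and both function names are hypothetical placeholders, so substitute the documented throughput for the tier in question.

```python
import math

# Per-tier scale-unit figures as listed in this profile (Premium is per region).
SCALE_UNIT_LIMITS = {
    "Developer": 1, "Basic": 2, "Basic v2": 10, "Standard": 4,
    "Standard v2": 10, "Premium": 12, "Premium v2": 30,
}


def required_scale_units(policy_ceiling_rps: int, rps_per_unit: int,
                         headroom_pct: int = 20) -> int:
    """Units needed to serve the summed policy ceilings plus headroom."""
    if rps_per_unit <= 0:
        raise ValueError("rps_per_unit must be positive")
    return math.ceil(policy_ceiling_rps * (100 + headroom_pct)
                     / (100 * rps_per_unit))


def fits_tier(tier: str, units: int) -> bool:
    """True if the tier's listed scale-unit figure covers `units`."""
    return units <= SCALE_UNIT_LIMITS[tier]


# Assumed inputs: policies allow 500 requests/s in total; one unit handles 200.
units = required_scale_units(policy_ceiling_rps=500, rps_per_unit=200)
print(units, fits_tier("Standard", units), fits_tier("Basic", units))
```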

Honor Retry-After

Clients should honor the Retry-After header returned with 429 responses when the gateway throttles a request. Backend circuit breakers in API Management likewise honor backend Retry-After headers for dynamic recovery.
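
On the client side, honoring Retry-After could be sketched as follows. The `send` callable, its `(status, headers)` return shape, and the default delay are assumptions for illustration; the header parsing accepts both forms HTTP allows (a number of seconds or an HTTP date).

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After (delta-seconds or HTTP-date) into a delay in seconds."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: let the caller pick a default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str]]],
                    max_attempts: int = 3, default_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (hypothetical), waiting as instructed after each 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            break
        delay = retry_after_seconds(headers.get("Retry-After"))
        sleep(default_delay if delay is None else delay)
    return status
```

Passing `sleep` as a parameter keeps the retry loop easy to exercise in tests without real waiting.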
