Microsoft Azure API Management Rate Limits

Azure API Management is itself a rate-limit and quota engine for the APIs it fronts. Built-in policies (rate-limit, rate-limit-by-key, quota, quota-by-key) let operators throttle by subscription key, IP address, or an arbitrary expression. The gateway's own capacity is set by its pricing tier and the number of scale units deployed.

Microsoft Azure API Management Rate Limits is the machine-readable rate-limit profile for Microsoft Azure API Management on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 4 rate-limit definitions, measured in requests_per_minute, requests_per_period, or a tier-dependent unit (recorded as varies).

The profile also defines 6 policies covering throttling, quotas, AI token limits, capacity planning, and retry behavior, and documents the response code returned when a request is throttled (429).

Tagged areas include Rate Limiting, API Gateway, API Management, and Microsoft Azure.

Summary: 4 limits; throttled requests return 429.

Limits

Consumption tier per-subscription
Scope: subscription
Metric: requests_per_minute
Limit: see policy definition; no service-level cap by default
Notes: Operators define limits via rate-limit / quota policies; the Consumption tier is metered per call rather than capped at the service level.

Tier scale units
Scope: service
Metric: varies
Limit (scale units per tier): Developer 1, Basic 2, Basic v2 10, Standard 4, Standard v2 10, Premium 12 per region, Premium v2 30
Notes: Each scale unit adds a documented amount of gateway throughput; configure rate-limit policies on top of that capacity.

Rate-limit policy (per minute)
Scope: subscription/key/expression
Metric: requests_per_minute
Limit: configurable per policy

Quota policy (per period)
Scope: subscription/key/expression
Metric: requests_per_period
Limit: configurable per policy

Policies

rate-limit and rate-limit-by-key
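Apply short-window call-rate limits (the renewal period is set in seconds, so 60 gives a per-minute limit) per subscription, or per key or arbitrary key expression with rate-limit-by-key. Returns 429 with a Retry-After header when the limit is exceeded.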
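To make the per-key semantics and the 429/Retry-After contract concrete, here is a minimal Python model. It is a sketch under stated assumptions, not API Management's implementation: it assumes a sliding window of `renewal_period` seconds that admits at most `calls` requests per counter key, and `SlidingWindowLimiter` and `check` are hypothetical names.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Hypothetical model of a per-key call-rate limit (not the gateway's code)."""

    def __init__(self, calls: int, renewal_period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.calls = calls                    # max requests per window
        self.renewal_period = renewal_period  # window length in seconds
        self.clock = clock                    # injectable for tests
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, counter_key: str) -> Tuple[int, Optional[int]]:
        """Return (status, retry_after); 200 admits, 429 throttles."""
        now = self.clock()
        hits = self._hits[counter_key]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.renewal_period:
            hits.popleft()
        if len(hits) < self.calls:
            hits.append(now)
            return 200, None
        # Throttled requests are not recorded; the oldest admitted request
        # leaves the window at hits[0] + renewal_period.
        retry_after = math.ceil(hits[0] + self.renewal_period - now)
        return 429, max(retry_after, 1)
```

Passing a subscription ID as `counter_key` approximates the plain rate-limit policy; passing a client IP or any computed string approximates the by-key variant.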
quota and quota-by-key
Apply longer-window call or bandwidth quotas, renewable per hour, day, week, or month, per subscription or key.
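A quota differs from a rate limit mainly in window length and in being able to count bandwidth as well as calls. The sketch below models that as a fixed window per key that restarts once `renewal_period` seconds have passed since it opened. `KeyQuota`, its fields, and byte-based accounting are illustrative assumptions, and the sketch returns only admit/deny, without modeling the gateway's response.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class _Window:
    started: float
    calls: int = 0
    bytes_used: int = 0


@dataclass
class KeyQuota:
    """Hypothetical fixed-window quota per counter key (calls and/or bytes)."""
    renewal_period: float                # e.g. 86400 for a daily quota
    max_calls: Optional[int] = None      # None means no call cap
    max_bytes: Optional[int] = None      # None means no bandwidth cap
    clock: Callable[[], float] = time.time
    _windows: Dict[str, _Window] = field(default_factory=dict, init=False, repr=False)

    def allow(self, counter_key: str, response_bytes: int = 0) -> bool:
        now = self.clock()
        w = self._windows.get(counter_key)
        # Open a new window on first use or once the period has elapsed.
        if w is None or now - w.started >= self.renewal_period:
            w = _Window(started=now)
            self._windows[counter_key] = w
        if self.max_calls is not None and w.calls + 1 > self.max_calls:
            return False
        # Bytes are charged up front for simplicity; a real gateway only
        # knows the response size after the backend replies.
        if self.max_bytes is not None and w.bytes_used + response_bytes > self.max_bytes:
            return False
        w.calls += 1
        w.bytes_used += response_bytes
        return True
```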
llm-token-limit
AI gateway policy that enforces a tokens-per-minute (TPM) limit and/or a token quota for LLM/AI backends. Supports per-subscription, per-IP, or arbitrary counter keys, with optional estimation of prompt tokens before the request is forwarded to the backend.
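Token limiting follows the same pattern, except that each request is charged a number of tokens rather than one call. The sketch below assumes a fixed one-minute window per counter key and a crude prompt estimator (about four characters per token, not a real tokenizer). When estimation is on, it rejects a request before forwarding if the estimated prompt alone would exceed the remaining budget. `TokenLimiter`, `estimate_tokens`, and `record_usage` are hypothetical names, and the sketch does not reconcile the estimate against actual prompt usage.

```python
import math
import time
from typing import Callable, Dict, Tuple


def estimate_tokens(prompt: str) -> int:
    """Assumed heuristic: roughly 4 characters per token (not a real tokenizer)."""
    return max(1, math.ceil(len(prompt) / 4))


class TokenLimiter:
    """Hypothetical tokens-per-minute limiter keyed by an arbitrary counter key."""

    def __init__(self, tokens_per_minute: int, estimate_prompt_tokens: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm = tokens_per_minute
        self.estimate = estimate_prompt_tokens
        self.clock = clock
        self._usage: Dict[str, Tuple[float, int]] = {}  # key -> (window_start, used)

    def _current(self, key: str) -> Tuple[float, int]:
        now = self.clock()
        start, used = self._usage.get(key, (now, 0))
        if now - start >= 60:  # fixed one-minute window has rolled over
            start, used = now, 0
        self._usage[key] = (start, used)
        return start, used

    def admit(self, key: str, prompt: str) -> bool:
        """Pre-forwarding check; reserves estimated prompt tokens if enabled."""
        start, used = self._current(key)
        needed = estimate_tokens(prompt) if self.estimate else 0
        if used >= self.tpm or used + needed > self.tpm:
            return False
        self._usage[key] = (start, used + needed)
        return True

    def record_usage(self, key: str, tokens: int) -> None:
        """Charge tokens the backend reported that were not reserved up front."""
        start, used = self._current(key)
        self._usage[key] = (start, used + tokens)
```

With estimation on, a caller would pass only the completion tokens to `record_usage`. With it off, the caller would pass the full prompt-plus-completion count.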
llm-emit-token-metric
Emit prompt/completion/total token counts as Application Insights custom metrics with configurable dimensions (Client IP, API ID, user ID, etc.) for AI cost attribution.
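Cost attribution comes down to summing token counts per combination of dimension values once the metrics are emitted. The sketch below performs that roll-up in memory to show the idea. It does not call Application Insights, and the function name `attribute_tokens` and the record fields (`prompt_tokens`, `completion_tokens`, `api_id`, `user_id`) are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Sequence, Tuple

Counts = Dict[str, int]


def attribute_tokens(records: Iterable[Mapping[str, Any]],
                     dimensions: Sequence[str]) -> Dict[Tuple[Any, ...], Counts]:
    """Sum prompt/completion/total tokens per combination of dimension values."""
    totals: Dict[Tuple[Any, ...], Counts] = defaultdict(
        lambda: {"prompt": 0, "completion": 0, "total": 0})
    for rec in records:
        key = tuple(rec.get(d) for d in dimensions)  # absent dimension -> None
        prompt = int(rec["prompt_tokens"])
        completion = int(rec["completion_tokens"])
        bucket = totals[key]
        bucket["prompt"] += prompt
        bucket["completion"] += completion
        bucket["total"] += prompt + completion
    return dict(totals)
```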
Capacity vs throttling
Tier scale units define peak capacity; operators must size scale units to match configured policy ceilings.
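A quick worst-case sizing check illustrates the capacity-versus-throttling point. The sketch uses the per-tier scale-unit figures listed in this profile, read here as maximums (Premium per region). The profile gives no per-unit throughput, so `unit_requests_per_second` is a hypothetical input the operator must supply from Azure's published guidance or their own load tests. Assuming every key hits its policy ceiling at once is deliberately conservative.

```python
import math
from typing import Dict

# Scale-unit figures per tier as listed in this profile (Premium is per region).
MAX_SCALE_UNITS: Dict[str, int] = {
    "Developer": 1, "Basic": 2, "Basic v2": 10, "Standard": 4,
    "Standard v2": 10, "Premium": 12, "Premium v2": 30,
}


def units_needed(active_keys: int, calls_per_minute_per_key: int,
                 unit_requests_per_second: float) -> int:
    """Scale units needed if every key runs at its policy ceiling at once."""
    peak_rps = active_keys * calls_per_minute_per_key / 60
    return max(1, math.ceil(peak_rps / unit_requests_per_second))


def fits_tier(tier: str, units: int) -> bool:
    """True if the tier can be scaled to the required number of units."""
    return units <= MAX_SCALE_UNITS[tier]
```

For example, 100 keys each allowed 600 calls per minute peak at 1,000 requests per second, which needs 2 units if one unit were assumed to sustain 500 requests per second.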
Honor Retry-After
Clients should honor the Retry-After header returned with 429 responses when the gateway throttles a request. Backend circuit breakers in API Management likewise honor backend Retry-After headers for dynamic recovery.
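On the client side, honoring the header means that after a 429 the client waits for the Retry-After interval when one is given and otherwise falls back to exponential backoff with jitter. Retry-After may carry either a number of seconds or an HTTP-date, so the sketch parses both. It assumes a caller-supplied `send` function returning `(status, headers)` with the header spelled `Retry-After`. `call_with_retry`, `retry_after_seconds`, and the backoff parameters are hypothetical names and are not part of any Azure SDK.

```python
import email.utils
import random
import time
from datetime import datetime, timezone
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP-date; None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry on 429, waiting for Retry-After when present, else jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        # Return anything that is not a throttle, or the final attempt's result.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        delay = retry_after_seconds(headers.get("Retry-After"))
        if delay is None:
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
        sleep(delay)
    raise AssertionError("unreachable")
```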

Sources