Anyscale Rate Limits
Anyscale is a control-plane API for managing Ray compute. Throughput limits primarily come from the underlying cloud quotas (per-region instance and GPU quotas in the customer's AWS / GCP account or Anyscale's hosted account). Control-plane API call rates are not publicly documented and are pending reconciliation; service-level rate limits on Ray Serve services are controlled by user code and autoscaling configuration.
Anyscale Rate Limits is the machine-readable rate-limit profile for Anyscale on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 4 rate-limit definitions, measuring requests, concurrent, and nodes.
The profile also includes 2 backoff/retry policies defined and response codes documented for throttled.
Tagged areas include AI, Distributed Computing, Ray, ML Platform, and Inference.