Vespa Ai Rate Limits
Vespa serving throughput is governed by application configuration (per-container HTTP threads, document API concurrency, query timeout) rather than a fixed per-key request quota. The values below reflect Vespa's default protection mechanisms and Vespa Cloud guidance.
Vespa Ai Rate Limits is the machine-readable rate-limit profile for Vespa on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures six rate-limit definitions, expressed in the units `queries_per_second`, `writes_per_second`, `concurrent`, `seconds`, and `documents`.
The profile also documents the response codes returned for the `throttled`, `serverError`, and `timeout` conditions.
Tagged areas include Rate Limiting, AI Search, and Vector Database.
Throttle response code: HTTP 429
Limits
Query throughput (scope: cluster)
Limit: scales with container cluster size
Tune feed/query container resources and search threads per query to scale.
Document API throughput (scope: cluster)
Limit: scales with content cluster size
Vespa Cloud customers typically achieve tens of thousands of writes per second per content cluster.
In-flight document operations (scope: container)
Limit: bounded by container thread pool
Returns HTTP 429 when the per-container in-flight limit is exceeded.
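A client can treat HTTP 429 as a signal to back off and retry the operation. The following is a minimal sketch of that pattern, not Vespa's own client logic. It assumes a hypothetical `send` callable that performs one document operation and returns the HTTP status code; the name `feed_with_backoff`, the attempt count, and the delay values are illustrative choices rather than Vespa defaults.

```python
import random
import time
from typing import Callable


def feed_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical callable (an assumption of this sketch) that
    performs one document operation and returns the HTTP status code.
    Returns the last status code seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    return status
```

The `sleep` parameter is injectable so the retry loop can be exercised in tests without real waiting.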
Default query timeout (scope: request)
Limit: 500ms
Default query timeout is 500ms; configurable per request via the `timeout` parameter.
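As an illustration of overriding the timeout on a single request, the sketch below builds a query URL that carries a `timeout` parameter. The endpoint `http://localhost:8080`, the YQL string, and the helper name `build_query_url` are assumptions for the example; only the `timeout` parameter itself comes from the text above.

```python
from urllib.parse import urlencode


def build_query_url(endpoint: str, yql: str, timeout: str = "500ms") -> str:
    """Return a query URL with an explicit per-request timeout.

    `endpoint` and `yql` are placeholders supplied by the caller; the
    default timeout mirrors the 500ms default described above.
    """
    params = {"yql": yql, "timeout": timeout}
    return f"{endpoint.rstrip('/')}/search/?{urlencode(params)}"


# Example: allow this one query up to two seconds.
url = build_query_url(
    "http://localhost:8080",
    "select * from sources * where true",
    timeout="2s",
)
```

Setting the timeout per request keeps the cluster-wide default tight while giving known-expensive queries more headroom.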
Default feed timeout (scope: request)
Limit: 180s
Default feed timeout is 180s; configurable per request.
Visit batch (scope: request)
Limit: set by `wantedDocumentCount`
Visit operations are paginated via the `continuation` token; batch sizing is governed by `wantedDocumentCount`.
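The pagination loop can be sketched as follows. This is a rough outline under stated assumptions: `fetch_page` is a hypothetical callable that sends one visit request with the given query parameters and returns the decoded JSON body, and the response is assumed to carry a `documents` list plus a `continuation` field that is absent once the visit is complete.

```python
from typing import Callable, Dict, Iterator, Optional


def visit_all(
    fetch_page: Callable[[Dict[str, str]], dict],
    wanted_document_count: int = 100,
) -> Iterator[dict]:
    """Yield every document from a visit, following continuation tokens.

    `fetch_page` is a hypothetical callable (an assumption of this sketch)
    that performs one visit request and returns the parsed response body.
    """
    continuation: Optional[str] = None
    while True:
        params = {"wantedDocumentCount": str(wanted_document_count)}
        if continuation is not None:
            params["continuation"] = continuation
        page = fetch_page(params)
        yield from page.get("documents", [])
        continuation = page.get("continuation")
        if not continuation:
            # Assumed convention: no token in the response ends the visit.
            break
```

Because `visit_all` is a generator, callers can stop iterating early without fetching the remaining pages.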