Vespa · Rate Limits

Vespa AI Rate Limits

Vespa serving throughput is governed by application configuration (per-container HTTP threads, document API concurrency, query timeout) rather than a fixed per-key request quota. The values below reflect Vespa's default protection mechanisms and Vespa Cloud guidance.

Vespa AI Rate Limits is the machine-readable rate-limit profile for Vespa on the APIs.io network and conforms to the API Commons Rate Limits specification.

It captures 6 rate-limit definitions, measuring queries_per_second, writes_per_second, concurrent, seconds, and documents.

The profile also documents response codes for the throttled (HTTP 429), serverError, and timeout cases.

Tagged areas include Rate Limiting, AI Search, and Vector Database.


Limits

Query throughput (scope: cluster)
Unit: queries_per_second
Limit: scales with container cluster size
Scale by tuning feed/query container resources and the number of search threads per query.

Document API throughput (scope: cluster)
Unit: writes_per_second
Limit: scales with content cluster size
Vespa Cloud customers typically achieve tens of thousands of writes per second per content cluster.

In-flight document operations (scope: container)
Unit: concurrent
Limit: bounded by the container thread pool
Vespa returns HTTP 429 when the per-container in-flight limit is exceeded.

Default query timeout (scope: request)
Unit: seconds · request
Limit: 0.5
The default query timeout is 0.5 s (500 ms); override it per request with the `timeout` parameter.

Default feed timeout (scope: request)
Unit: seconds · request
Limit: 180
The default feed timeout is 180 s; it can be overridden per request.

Visit batch (scope: request)
Unit: documents · request
Limit: set by `wantedDocumentCount`
Visit operations are paginated with the `continuation` token; batch size is governed by the `wantedDocumentCount` parameter.
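
The two throughput limits only say that capacity scales with cluster size. As a rough first-order sizing aid, the sketch below assumes throughput grows linearly with node count and that you have measured a per-node rate with your own load test. Both the linear model and the `nodes_needed` helper are illustrative assumptions, not Vespa guidance; real scaling depends on cluster topology and workload.

```python
import math


def nodes_needed(target_rate: float, measured_rate_per_node: float,
                 headroom: float = 0.3) -> int:
    """Estimate how many nodes a target rate needs (hypothetical helper).

    Assumes linear scaling with node count, which is a simplification.
    measured_rate_per_node should come from your own load test; headroom
    reserves spare capacity (0.3 means nodes run at most at 70% of the
    measured rate).
    """
    if measured_rate_per_node <= 0 or not 0 <= headroom < 1:
        raise ValueError("rate must be positive and headroom in [0, 1)")
    usable_per_node = measured_rate_per_node * (1 - headroom)
    # Round up: a fractional node still means one more node.
    return max(1, math.ceil(target_rate / usable_per_node))
```

For example, with a measured 2,000 writes per second per node and 20% headroom, `nodes_needed(10000, 2000, 0.2)` returns 7.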
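
Because the in-flight document operations limit is enforced by returning HTTP 429, a client should slow down and retry instead of failing. A minimal sketch of capped exponential backoff with full jitter follows; `send` is a hypothetical callable standing in for whatever HTTP client you use, and the attempt count and delays are illustrative defaults, not Vespa-recommended values.

```python
import random
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 6,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Retry `send` while it returns HTTP 429 (hypothetical helper).

    `send` performs one request and returns (status_code, body). Any status
    other than 429 is returned immediately; after max_attempts tries the
    last response is returned as-is.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Backoff window doubles per retry and is capped at max_delay_s.
        window = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
        # Full jitter: a random delay in [0, window] spreads out clients.
        sleep(random.uniform(0.0, window))
        status, body = send()
    return status, body
```

Injecting `sleep` keeps the helper testable and lets an async client swap in its own delay function.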
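
Both timeout limits can be overridden per request. The sketch below only builds URLs. It assumes a self-hosted container at `http://localhost:8080`, the `/search/` query endpoint, the `/document/v1/<namespace>/<document-type>/docid/<id>` path, and that a bare numeric `timeout` value is read as seconds; the helper names, namespace, and document type are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint: a self-hosted Vespa container on its default port.
BASE_URL = "http://localhost:8080"


def query_url(yql: str, timeout_s: float = 0.5) -> str:
    """Build a query URL with an explicit timeout (hypothetical helper).

    0.5 s mirrors the default query timeout; the value overrides it for
    this request only and is sent as a bare number, assumed to be seconds.
    """
    params = {"yql": yql, "timeout": f"{timeout_s:g}"}
    return f"{BASE_URL}/search/?{urlencode(params)}"


def document_url(namespace: str, doctype: str, doc_id: str,
                 timeout_s: float = 180) -> str:
    """Build a /document/v1 URL for one document with a timeout (hypothetical helper).

    180 s mirrors the default feed timeout; lower it to fail fast rather
    than wait out the full default. Path segments are fully percent-encoded.
    """
    path = "/".join(quote(part, safe="") for part in (namespace, doctype))
    return (f"{BASE_URL}/document/v1/{path}/docid/{quote(doc_id, safe='')}"
            f"?{urlencode({'timeout': f'{timeout_s:g}'})}")
```

For instance, `document_url("ns", "music", "a/b", 10)` yields `http://localhost:8080/document/v1/ns/music/docid/a%2Fb?timeout=10`.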
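
Visiting returns documents in pages, so a client keeps requesting until no `continuation` token comes back. The generator below assumes each decoded /document/v1 visit response holds a `documents` list and, while more documents remain, a `continuation` string; `fetch` is a hypothetical callable that issues one GET with the given query parameters. Treat `wantedDocumentCount` as a hint for page size rather than an exact count.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def visit_all(
    fetch: Callable[[Dict[str, str]], Dict[str, Any]],
    wanted_document_count: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every document from a paginated visit (hypothetical helper)."""
    continuation: Optional[str] = None
    while True:
        params = {"wantedDocumentCount": str(wanted_document_count)}
        # Resume from where the previous page stopped, if there was one.
        if continuation is not None:
            params["continuation"] = continuation
        page = fetch(params)
        yield from page.get("documents", [])
        # A missing continuation token is taken to mean the visit is done.
        continuation = page.get("continuation")
        if continuation is None:
            break
```

Because it is a generator, callers can stop early without requesting the remaining pages.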

Sources