Pieces Rate Limits
The Pieces OS API is served on-device over the loopback interface (http://localhost:1000) and does not impose conventional per-account HTTP rate limits the way a hosted cloud API would. Throughput for local model inference is bounded by the host machine's CPU, GPU, and memory. Where Copilot requests are routed to cloud models, usage limits follow the account's plan tier (Free has limited cloud usage; Pro is unlimited), and the upstream cloud model provider's own limits apply. Specific numeric limits are not documented for the local API and are not reconciled here.
Pieces Rate Limits is the machine-readable rate-limit profile for Pieces on the APIs.io network, conforming to the API Commons Rate Limits specification.
It captures 4 rate-limit definitions, measured in requests and tokens.
The profile also defines 2 backoff/retry policies and documents the response codes returned for throttled requests.
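As an illustration of what a backoff/retry policy can look like on the client side, here is a minimal Python sketch of exponential backoff with full jitter. It is not taken from the Pieces profile: the throttle status codes (429 and 503), the delay parameters, and the names `THROTTLE_CODES`, `backoff_delay`, and `call_with_backoff` are all assumptions chosen for the example, and the actual policies in the profile may differ.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these status codes signal throttling. The profile's actual
# response codes are not listed in this description.
THROTTLE_CODES = {429, 503}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Return the maximum delay in seconds for a zero-based attempt number."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Tuple[int, str]:
    """Call send() until it returns a non-throttled status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in THROTTLE_CODES:
            return status, body
        # Sleep only if another attempt will follow.
        if attempt < max_attempts - 1:
            # Full jitter: a random fraction of the exponential delay.
            sleep(rng() * backoff_delay(attempt))
    # Every attempt was throttled; return the last response as-is.
    return status, body
```

In practice `send` would wrap an HTTP request, for example to the on-device address or to a cloud model endpoint. Passing `sleep` and `rng` as parameters keeps the retry loop testable without real waiting.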
Tagged areas include AI, Developer Tools, On-Device, Local API, and Long-Term Memory.