
Lamini Rate Limits

The Lamini Platform meters inference and tuning usage per account. Usage is governed primarily by available credit (spend) on the On-Demand tier and by reserved GPU capacity on the Enterprise tier. Concurrent inference requests and tuning jobs are bounded by account capacity rather than by fixed, published per-minute request quotas. Specific numeric limits are not publicly documented and are not recorded in this profile.

Lamini Rate Limits is the machine-readable rate-limit profile for Lamini on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures four rate-limit definitions, measured in requests, jobs, steps, and USD.

The profile also defines two policies, including a backoff/retry strategy, and documents the response code returned when requests are throttled.

Tagged areas include AI, LLM, Fine-Tuning, Memory Tuning, and Inference.

Limits: 4 · Throttle response code: 429
Tags: AI, LLM, Fine-Tuning, Memory Tuning, Inference, Rate Limiting, Quotas, Throttling

Limits

Concurrent Inference Requests
Scope: account. Unit: requests. Limit: see provider documentation.
Inference concurrency is bounded by account capacity and by credit/tier.

Concurrent Tuning Jobs
Scope: account. Unit: jobs. Limit: see provider documentation.
The number of simultaneous tuning jobs is bounded by GPU capacity and tier.

Tuning Throughput
Scope: account. Unit: steps. Limit: see provider documentation.
Burst tuning scales linearly across multiple GPUs/nodes on On-Demand.

Credit / Spend Ceiling
Scope: account. Unit: USD. Limit: see provider documentation.
On-Demand usage is bounded by available prepaid or free credit.
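Because concurrency is bounded by account capacity rather than by a published number, a client may want to impose its own cap on in-flight requests. The following is a minimal Python sketch using `asyncio.Semaphore`, assuming each inference request is wrapped as an async callable. The `run_bounded` helper and the `max_in_flight` value are hypothetical and chosen by the caller; they are not Lamini APIs or documented Lamini limits.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(calls: Iterable[Callable[[], Awaitable[T]]],
                      max_in_flight: int) -> List[T]:
    """Run async calls concurrently, with at most max_in_flight active at once.

    Hypothetical helper: max_in_flight is a client-side choice, not a
    Lamini-published limit. Results come back in the same order as `calls`.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(call: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits while max_in_flight calls are active
            return await call()

    return await asyncio.gather(*(guarded(c) for c in calls))
```

If requests are still throttled with this cap in place, lowering `max_in_flight` is one option; the right value depends on the account's actual capacity.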

Policies

Capacity-Based Limits
Throughput scales with On-Demand credit and Enterprise reserved GPU capacity rather than fixed per-minute quotas.
Backoff Strategy
Clients should implement exponential backoff with jitter and honor the Retry-After header on HTTP 429 (Too Many Requests) responses.
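A minimal Python sketch of this backoff strategy, assuming a generic `send()` callable that returns an HTTP status code and a header mapping. The helper names and the 0.5 s base delay, 30 s cap, and five-attempt limit are illustrative assumptions, not Lamini values. The sketch waits for the server's Retry-After value when one is present (delay-seconds or HTTP-date, per RFC 9110) and otherwise falls back to exponential backoff with "full jitter".

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header to seconds, or None if absent or invalid."""
    if value is None:
        return None
    value = value.strip()
    if value.isdecimal():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # treat naive dates as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup (HTTP header names are case-insensitive)."""
    return next((v for k, v in headers.items() if k.lower() == name.lower()), None)


def call_with_retries(send: Callable[[], Response],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), retrying on HTTP 429 until success or max_attempts is reached."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        server_delay = parse_retry_after(header(headers, "Retry-After"))
        # Honor the server's Retry-After when given; otherwise back off with jitter.
        sleep(server_delay if server_delay is not None else backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

Jitter spreads retries from many clients over time so they do not all hit the service again at the same moment; passing `sleep` as a parameter makes the retry loop easy to test without real delays.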
